High Performance Python tutorial v0.2 (from EuroPython 2011)

My updated High Performance Python tutorial is now available as a 55-page PDF. The goal is to take you on several journeys that show different ways of making Python code run much faster (up to 75* faster than CPython on the CPU, and faster still with a GPU).

UPDATE As of October 2014 I’ll be teaching High Performance Python and Data Science in London. Sign up here to join our announce list (no spam, just occasional notes about our upcoming courses).

UPDATE (2014) I’ve written the High Performance Python book with O’Reilly; it is being published this year.

UPDATE 1 This talk is superseded by my High Performance Python 1 tutorial from PyCon 2012.

UPDATE 2 I’m thinking of writing an updated guide. If you’re interested in hearing about it, please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announcement once I know more.

UPDATE 3 (Sept 2012) I was missing the EuroPython video; I’ve now embedded it below.

This is an update to the 49 page v0.1 I published three weeks ago after running the tutorial at EuroPython 2011 in Florence.

Topics covered:

  • Python profiling (cProfile, RunSnake, line_profiler) – find bottlenecks
  • PyPy – Python’s new Just In Time compiler, a note on the new numpy module
  • Cython – annotate your code and compile to C
  • numpy integration with Cython – fast numerical Python library wrapped by Cython
  • ShedSkin – automatic code annotation and conversion to C
  • numpy vectors – fast vector operations using numpy arrays
  • NumExpr on numpy vectors – automatic numpy compilation to multiple CPUs and vector units
  • multiprocessing – built-in module to use multiple CPUs
  • ParallelPython – run tasks on multiple computers
  • pyCUDA – run tasks on your Graphics Processing Unit
  • Other algorithmic choices and options you have
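To give a flavour of the first and sixth topics in the list above (profiling with cProfile, then replacing a Python loop with numpy vector operations), here is a minimal sketch. It is not taken from the PDF: the escape-count toy problem, the function names `escape_counts_pure`, `escape_counts_numpy` and `profile_call`, and the `max_iter` default are all assumptions made for illustration. Treat it as a starting point for your own experiments rather than as the tutorial's code.

```python
import cProfile
import io
import pstats

import numpy as np


def escape_counts_pure(points, max_iter=50):
    """Pure-Python loop: count updates of z = z*z + c until |z| > 2."""
    counts = []
    for c in points:
        z = 0j
        n = 0
        while n < max_iter and abs(z) <= 2.0:
            z = z * z + c
            n += 1
        counts.append(n)
    return counts


def escape_counts_numpy(points, max_iter=50):
    """Same calculation, but each step updates every point with numpy."""
    c = np.asarray(points, dtype=np.complex128)
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int64)
    active = np.ones(c.shape, dtype=bool)  # points that have not escaped yet
    for _ in range(max_iter):
        z[active] = z[active] * z[active] + c[active]
        counts[active] += 1
        active &= np.abs(z) <= 2.0
    return counts


def profile_call(func, *args):
    """Hypothetical helper: run func under cProfile, return (result, report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    # A small grid of complex points; the size is arbitrary.
    xs = np.linspace(-2.0, 1.0, 200)
    ys = np.linspace(-1.5, 1.5, 200)
    grid = [complex(x, y) for y in ys for x in xs]

    _, report_pure = profile_call(escape_counts_pure, grid)
    _, report_numpy = profile_call(escape_counts_numpy, grid)
    print(report_pure)
    print(report_numpy)
```

Running the script prints one cProfile report per version, so you can compare where the time goes. How much faster the numpy version is (if at all) depends on the problem size and your machine, so measure before drawing conclusions.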

The improvement over the last version (v0.1) is that I’ve now filled in all the sections, including pyCUDA (there are still a few IAN_TODOs marked; I hope to finish these in a future v0.3). I’ve also added a short section on Algorithmic Choices, linked to the new Cython prange operator and shown the new numpy module in PyPy.

Here’s the video from EuroPython (via pyvideo.org):

The source code is on my github page. The original slides are on slideshare too. If you’re after a challenge then at the end of the report I suggest some ported versions of the code that I’d like to see.

The report is licensed Creative Commons by Attribution (please link back here) – I’ll also happily accept a beer if you meet me in person! If you’re curious about this sort of work then note that I offer A.I. and high performance computing consulting and training via my Mor Consulting.

Update – ShedSkin 0.9 adds faster complex number support. I haven’t added it to the report yet, but evidence in the ShedSkin Group suggests its speed gets closer to that of the non-complex-number version (i.e. you don’t have to do more work, but you get a nice speed boost whilst still using complex numbers).

Update (Nov 2011) – Antonio and Armin posted a note which explains some of the slowness in PyPy and shows how it is competitive under the right conditions. Armin also contributed a C version which shows PyPy to run as fast as C (for their chosen configuration).

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.


  • Hi! :) Do you know if there is any substitute for PyCUDA? I have an ATI graphics card and I want to use a library for it.
  • “up to 75* on the CPU” — is that “75 times” or “75%”?
  • @snipe - pyOpenCL is the ATI equivalent but I don't have code samples yet (but I'd love to include one if someone makes one!) @Vasily - it is "up to 75* faster on the CPU compared to CPython" - see the ShedSkin/Cython examples in the PDF.
  • Hi Ian, Thanks for the great tutorial. You've given me a few new ideas for optimising my own code. Just one small typo I noticed on p35: "np.greater(..." (4th bullet point) should be "np.where(..." and so you might want to add a line detailing the "done = np.greater(..." inequality prior to that.
  • Tobu
    IPython has some quality support for SMP and clustering, with optional MPI integration: http://ipython.org/ipython-doc/dev/parallel/index.html
  • Valery
    Hi Ian, is there anything interesting regarding a PyPy+CUDA marriage?
  • Sorry, I haven't heard of anything for PyPy+PyCUDA.
  • MySchizoBuddy
    Is there any implementation using llvm-clang instead of gcc? Can you also talk more about which implementation to use for scipy/numpy on a webserver, in reference to an engineering/scientific web application?
  • Given that numpy only works properly with CPython (not PyPy) for now, that would be your main choice. ShedSkin may integrate with numpy in the future, Cython integrates well with numpy right now. Therefore CPython 2.x + numpy + Cython is probably the right choice. I don't know anything about llvm-clang I'm afraid. Cheers, Ian.
  • Rob
    Downloaded, the first few pages look good. Hope it's a good read. I will try to give my feedback once done. Regards, Rob http://www.brainwavelive.com/services/python-application-development.html