Artificial Intelligence, Python | July 25, 2011

High Performance Python tutorial v0.2 (from EuroPython 2011)

My updated High Performance Python tutorial is now available as a 55 page PDF. The goal is to take you on several journeys which show you different ways of making Python code run much faster (up to 75* on the CPU, faster with a GPU).

UPDATE As of October 2014 I’ll be teaching High Performance Python and Data Science in London; sign up here to join our announce list (no spam, just occasional notes about our upcoming courses).

UPDATE (2014) I’ve written the High Performance Python book with O’Reilly; it is being published this year.

UPDATE 1 This talk is superseded by my High Performance Python 1 tutorial from PyCon 2012.

UPDATE 2 I’m thinking of writing an updated guide. If you’re interested in hearing about it, please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announcement once I know more.

UPDATE 3 (Sept 2012) The EuroPython video was missing from this post; I’ve embedded it below.

This is an update to the 49 page v0.1 I published three weeks ago after running the tutorial at EuroPython 2011 in Florence.

Topics covered:

Python profiling (cProfile, RunSnakeRun, line_profiler) – find bottlenecks
PyPy – Python’s new Just In Time compiler, a note on the new numpy module
Cython – annotate your code and compile to C
numpy integration with Cython – fast numerical Python library wrapped by Cython
ShedSkin – automatic code annotation and conversion to C
numpy vectors – fast vector operations using numpy arrays
NumExpr on numpy vectors – automatic numpy compilation to multiple CPUs and vector units
multiprocessing – built-in module to use multiple CPUs
ParallelPython – run tasks on multiple computers
pyCUDA – run tasks on your Graphics Processing Unit
Other algorithmic choices and options you have
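
As a taste of the first topic in the list above (profiling to find bottlenecks), here is a minimal sketch using only the standard library's cProfile and pstats modules. It is not taken from the PDF: the function names (slow_sum_of_squares, run_workload, profile_workload) and the workload sizes are hypothetical, chosen purely for illustration, and the code assumes Python 3.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop: the hotspot we expect the profiler to flag."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Hypothetical workload that calls the hotspot several times."""
    return [slow_sum_of_squares(20000) for _ in range(20)]


def profile_workload():
    """Profile run_workload() and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run_workload()
    profiler.disable()

    # Write the report into a string instead of stdout so it can be inspected.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_workload()
    print(report)
```

The report lists each function with its call count and cumulative time, which is usually enough to decide where one of the other techniques in the list (Cython, numpy, multiprocessing and so on) is worth applying.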

The improvement over the last version (v0.1) is that I’ve now filled in all the sections, including pyCUDA (there are still a few IAN_TODOs marked; I hope to finish these in a future v0.3). I’ve also added a short section on Algorithmic Choices, linked to the new Cython prange operator and shown the new numpy module in PyPy.

Here’s the video from EuroPython (via pyvideo.org):

The source code is on my github page. The original slides are on slideshare too. If you’re after a challenge then at the end of the report I suggest some ported versions of the code that I’d like to see.

The report is licensed Creative Commons by Attribution (please link back here) – I’ll also happily accept a beer if you meet me in person! If you’re curious about this sort of work then note that I offer A.I. and high performance computing consulting and training via my Mor Consulting.

Update – ShedSkin 0.9 adds faster complex number support. I haven’t added it to the report yet, but evidence in the ShedSkin Group suggests its speed gets closer to that of the non-complex-number version (i.e. you don’t have to do more work but you get a nice speed boost whilst still using complex numbers).

Update (Nov 2011) – Antonio and Armin posted a note which explains some of the slowness in PyPy and shows how it is competitive under the right conditions. Armin also contributed a C version which shows PyPy running as fast as C (for their chosen configuration).

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

10 Comments

snipe
July 25, 2011 at 11:15 am
Hi! :) Do you know if there is any substitute for PyCUDA? I have an ATI graphics card and I want to use a library for it.
Vasiliy Faronov
July 25, 2011 at 1:39 pm
“up to 75* on the CPU” — is that “75 times” or “75%”?
Ian
July 25, 2011 at 1:55 pm
@snipe - pyOpenCL is the ATI equivalent but I don't have code samples yet (but I'd love to include one if someone makes one!) @Vasiliy - it is "up to 75* faster on the CPU compared to CPython" - see the ShedSkin/Cython examples in the PDF.
Roger Stuckey
July 26, 2011 at 4:37 am
Hi Ian, Thanks for the great tutorial. You've given me a few new ideas for optimising my own code. Just one small typo I noticed on p35: "np.greater(..." (4th bullet point) should be "np.where(..." and so you might want to add a line detailing the "done = np.greater(..." inequality prior to that.
Tobu
July 29, 2011 at 11:08 am
IPython has some quality support for SMP and clustering, with optional MPI integration: http://ipython.org/ipython-doc/dev/parallel/index.html
Valery
September 5, 2011 at 12:12 pm
Hi Ian, is there anything interesting regarding a PyPy+CUDA marriage?
Ian
September 5, 2011 at 3:55 pm
Sorry, I haven't heard of anything for PyPy+PyCUDA.
MySchizoBuddy
September 24, 2011 at 4:54 pm
Is there any implementation using llvm-clang instead of gcc? Can you also talk more about which implementation to use for scipy/numpy on a web server, in reference to an engineering/scientific web application?
Ian
September 27, 2011 at 2:01 pm
Given that numpy only works properly with CPython (not PyPy) for now, that would be your main choice. ShedSkin may integrate with numpy in the future; Cython integrates well with numpy right now. Therefore CPython 2.x + numpy + Cython is probably the right choice. I don't know anything about llvm-clang, I'm afraid. Cheers, Ian.
Rob
December 20, 2011 at 12:06 pm
Downloaded, the first few pages look good. Hope it's a good read. I will try to give my feedback once done. Regards, Rob http://www.brainwavelive.com/services/python-application-development.html
