High Performance Python at PyDataLondon 2014

Yesterday I spoke on The High Performance Python Landscape at PyDataLondon 2014 (our first PyData outside of the USA – see my write-up). I was blessed with a full room and interesting questions. With Micha I’m authoring a High Performance Python book with O’Reilly (email list for early access) and I took the topics from a few of our chapters.

“@ianozsvald providing eye-opening discussion of tools for high-performance #Python: #Cython, #ShedSkin, #Pythran, #PyPy, #numba… #pydata” – @davisjmcc

Overall I covered:

  • line_profiler for CPU profiling in a function
  • memory_profiler for RAM profiling in a function
  • memory_profiler’s %memit
  • memory_profiler’s mprof to graph memory use during program’s runtime
  • thoughts on adding network and disk I/O tracking to mprof
  • Cython on lists
  • Cython on numpy by dereferencing elements (which would normally be horribly inefficient) plus OpenMP
  • ShedSkin‘s annotated output and thoughts on using this as an input to Cython
  • PyPy and numpy in PyPy
  • Pythran with numpy and OpenMP support (you should check this out)
  • Numba
  • Concluding thoughts on why you should probably use JITs over Cython

Here’s my room full of happy Pythonistas 🙂

pydatalondon2014_highperformancepython

“Really useful and practical performance tips from @ianozsvald @pydata #pydata speeding up #Python code” – @iantaylorfb

Slides from the talk:

 

 

UPDATE Armin and Maciej came back today with some extra answers about the PyPy-numpy performance (here and here), the bottom line is that they plan to fix it (Maciej says it is now fixed – quick service!). Maciej also notes improvements planned using e.g. vectorisation in numpy.

VIDEO TO FOLLOW


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

9 Comments