Yesterday I spoke on The High Performance Python Landscape at PyDataLondon 2014 (our first PyData outside of the USA; see my write-up). I was blessed with a full room and interesting questions. Micha and I are writing a High Performance Python book for O’Reilly (email list for early access), and I took the topics from a few of our chapters.
“@ianozsvald providing eye-opening discussion of tools for high-performance #Python: #Cython, #ShedSkin, #Pythran, #PyPy, #numba… #pydata” – @davisjmcc
Overall I covered:
- line_profiler for CPU profiling in a function
- memory_profiler for RAM profiling in a function
- memory_profiler’s %memit
- memory_profiler’s mprof to graph memory use during a program’s runtime
- thoughts on adding network and disk I/O tracking to mprof
- Cython on lists
- Cython on numpy by dereferencing elements (which would normally be horribly inefficient) plus OpenMP
- ShedSkin’s annotated output and thoughts on using this as an input to Cython
- PyPy and numpy in PyPy
- Pythran with numpy and OpenMP support (you should check this out)
- Numba
- Concluding thoughts on why you should probably use JITs over Cython
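As a rough illustration of the profiling workflow in the first few items, here is a minimal sketch of a script that can be run under either line_profiler or memory_profiler without changes. The function `sum_of_squares` is a hypothetical stand-in, not code from the talk. Both tools inject a `profile` decorator into builtins when they launch the script, so the sketch falls back to a no-op decorator when run as plain Python.

```python
import builtins

# kernprof (line_profiler) and `python -m memory_profiler` both inject a
# `profile` decorator into builtins when they run a script. Under plain
# Python it is absent, so define a do-nothing stand-in.
if not hasattr(builtins, "profile"):
    def profile(func):
        return func


@profile
def sum_of_squares(n):
    """Hypothetical CPU-bound function used only as a profiling target."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(sum_of_squares(1_000_000))

# Typical invocations, assuming the script is saved as example.py:
#   kernprof -l -v example.py            # per-line CPU timings
#   python -m memory_profiler example.py # per-line RAM usage
#   mprof run example.py && mprof plot   # memory use over time
```

The fallback decorator means the same file stays runnable in normal use, which makes it easier to leave `@profile` markers in place between profiling sessions.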
Here’s my room full of happy Pythonistas 🙂
“Really useful and practical performance tips from @ianozsvald @pydata #pydata speeding up #Python code” – @iantaylorfb
Slides from the talk:
UPDATE: Armin and Maciej came back today with some extra answers about the PyPy-numpy performance (here and here). The bottom line is that they plan to fix it (Maciej says it is now fixed: quick service!). Maciej also notes planned improvements, e.g. using vectorisation in numpy.
VIDEO TO FOLLOW
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.