I’ve listed my next Higher Performance Python public class. It will run online over three mornings, June 1-3, during UK hours. We’ll use Zoom and Slack with pre-distributed Notebooks and modules, and you’ll run the code in an Anaconda environment. Here’s the write-up from my recent class.
We’ll focus on:
- Profiling to find what’s slow in your code so you spend your time fixing the right things (this is so important, our intuitions are always wrong!)
- Switching to NumPy to get benefits from vectorisation
- Compiling with Numba to get C-like speeds for very little effort (we’ll get a 200x speed-up overall)
- Running in parallel with OpenMP and with Joblib to take advantage of multiple cores
- Learning slow and faster ways of solving problems in Pandas (we’ll see a massive speed-up once we go slightly “under the covers” with Pandas and avoid doing silly access operations)
- Using Numba-compiled functions to process Pandas data (using the raw=True trick)
- Using Dask to process Pandas data in parallel across all your cores when your data fits in RAM
- Looking into using Dask to process bigger-than-RAM datasets
- Reviewing other tooling and process options to make you generally more performant in your work
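To give a flavour of the raw=True trick in the list above, here is a minimal sketch. It is not taken from the course material: the `row_range` function, the column names and the random data are all made up for illustration. The idea is that `DataFrame.apply(..., raw=True)` passes each row to your function as a plain NumPy array rather than a Pandas Series, which is the kind of input a Numba-compiled function can work with. Numba is treated as optional here, so the sketch falls back to plain Python if it isn't installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional: compiles the function on first call
except ImportError:
    def njit(func):
        # Fallback (assumption for this sketch): run uncompiled if Numba is missing.
        return func


@njit
def row_range(values):
    # `values` is a 1-D NumPy array holding one row of the DataFrame.
    lo = values[0]
    hi = values[0]
    for v in values:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return hi - lo


# Hypothetical data: 1,000 rows of 4 random floats.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1_000, 4)), columns=["a", "b", "c", "d"])

# Default apply builds a Series for every row, which is the slow path.
slow = df.apply(lambda row: row.max() - row.min(), axis=1)

# raw=True hands each row over as a NumPy array instead.
fast = df.apply(row_range, axis=1, raw=True)
```

Both calls should give the same answers; the difference is how much per-row overhead Pandas adds. How big the gap is depends on your data and function, so it's worth profiling on your own machine.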
Feel free to contact me if you have questions about the course. I’m currently not planning to run another iteration of this for some months (possibly October for the next one).
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.