Posted in Data science, Python on April 14, 2020

Notes on last week’s Higher Performance Python class

Last week I ran a two-morning Higher Performance Python class. We covered:

Profiling slow code (using a 2D particle infection model in an interactive Jupyter Notebook) with line_profiler & py-spy
Vectorising code with NumPy vs running the original with PyPy
Moving to Numba to make iterative and vectorised NumPy really fast (with up to a 200x improvement on one exercise)
Ways to make applying functions in Pandas much faster and multicore (with Dask & Swifter, along with Numba)
Best practice for each tool
Excellent discussion where I picked up a few new tips too (and in part this is why I love teaching smart crowds!)
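To make the vectorising and Numba points above a little more concrete, here is a minimal sketch (not one of the class exercises) that computes each particle's distance from a point in three ways: a pure-Python loop, a vectorised NumPy expression, and the same explicit loop compiled with Numba's `njit`. The function names are hypothetical. The sketch assumes NumPy is installed; if Numba is not installed, `njit` falls back to a no-op decorator so the "Numba" version simply runs as plain Python.

```python
import math

import numpy as np

try:
    from numba import njit  # use real Numba if it is installed
except ImportError:
    def njit(func):  # fallback: no-op decorator so the sketch still runs
        return func


def distances_loop(xs, ys, cx, cy):
    """Pure-Python loop: the interpreter does work for every particle."""
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = math.sqrt((xs[i] - cx) ** 2 + (ys[i] - cy) ** 2)
    return out


def distances_numpy(xs, ys, cx, cy):
    """Vectorised: NumPy runs the loop in compiled code over whole arrays."""
    return np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)


@njit
def distances_numba(xs, ys, cx, cy):
    """The same explicit loop as distances_loop, compiled by Numba if available."""
    out = np.empty(xs.shape[0])
    for i in range(xs.shape[0]):
        out[i] = math.sqrt((xs[i] - cx) ** 2 + (ys[i] - cy) ** 2)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.random(100_000)
    ys = rng.random(100_000)
    d_loop = distances_loop(xs.tolist(), ys.tolist(), 0.5, 0.5)
    d_numpy = distances_numpy(xs, ys, 0.5, 0.5)
    d_numba = distances_numba(xs, ys, 0.5, 0.5)  # first call includes compile time
    # All three approaches should agree numerically.
    assert np.allclose(d_loop, d_numpy) and np.allclose(d_numpy, d_numba)
```

To compare them, time each with `timeit` on your own data after a warm-up call to the Numba version. The speedup depends on array size and hardware, so no figures are claimed here.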

If you’d like to hear about upcoming iterations, please join my low-volume training announce list. I also offer a discount code in exchange for two minutes spent filling in this very brief survey about your training needs.

“Working through the exercises from day 1 of the high performance python course from. Who knew there was so much time to shave off from functions I use every day?.. apart from Ian of course” – Tim Williams

Here’s my happy class on the first morning:

We used Zoom to run the calls, with a mix of screen-sharing for my demos and group discussion. We took a break every hour. After the first morning I set some homework, and I’m waiting to hear how the take-home exercise worked out. In two weeks we’ll have a follow-up call to clear up any remaining questions. It became apparent that we need more time to discuss Pandas and “getting more rows into RAM”, so I’ll extend the next iteration to include this. A little of the class came directly from the 2nd edition of my High Performance Python book with O’Reilly (due out in May); almost all of it was freshly written for this class.

A bunch of interesting links were shared in the class Slack, and we got to discuss how several people use Numba successfully in their companies. Whilst I still need to gather feedback from the class, it feels like the session on “how to profile your code so you focus your effort on the real bottlenecks” was the winner, along with showing how easily we can use Numba to speed things up in Pandas (if you know the “raw=True” trick!).
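Here is a minimal sketch of what the “raw=True” trick looks like, assuming it refers to the `raw` parameter of `DataFrame.apply`. With the default `raw=False`, each row reaches your function as a pandas Series, which Numba cannot compile; with `raw=True`, each row arrives as a plain NumPy array, which a Numba-jitted function handles well. The `row_range` function and the random DataFrame are hypothetical, and the code assumes NumPy and Pandas are installed. If Numba is missing, `njit` falls back to a no-op decorator.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # use real Numba if it is installed
except ImportError:
    def njit(func):  # fallback: no-op decorator so the sketch still runs
        return func


@njit
def row_range(row):
    """Max minus min of one row; expects a 1D NumPy array, not a Series."""
    lo = row[0]
    hi = row[0]
    for v in row[1:]:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return hi - lo


df = pd.DataFrame(np.random.default_rng(0).random((1_000, 4)), columns=list("abcd"))

# raw=True passes each row to row_range as an ndarray, which Numba can compile.
# With the default raw=False each row would be a Series and the jitted call would fail.
ranges = df.apply(row_range, axis=1, raw=True)
```

The same idea applies to `Series.rolling(...).apply(func, raw=True)`, which also accepts a `raw` argument and passes NumPy arrays to `func` when it is set.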

I plan to run another iteration of this class, along with online-only versions of my Successfully Delivering Data Science Projects and Software Engineering for Data Scientists classes. Do let me know, or join my training email list, if you’d like to attend.

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

