Higher Performance Python

This course is for anyone who regularly uses Pandas, NumPy and the scientific Python stack and is frustrated that their code doesn't run fast enough!
The next course has not been planned - contact me for details of private runs.
Get a notification for future courses by filling in this notification form.
This is a three-morning virtual course (Zoom & Slack) for a small group of around 10 people. It combines numeric problem-solving scenarios with new high-performance tools and processes, exploring ways to help you and your team work more effectively and write faster solutions in Python. The focus is on scientific Python.

It is aimed at existing Python programmers who have 2+ years of prior programming experience and who need their Python code to run faster.

This course is aimed at any Pythonic data scientist who:

  • Wants to understand which parts of their code are slow, and why
  • Needs to make NumPy and Pandas operations faster
  • Needs to fit more data into RAM for Pandas without using new tools
  • Wants to understand how Dask and Vaex can help scale to multi-core and larger-than-RAM datasets

During the course we'll cover:

  • Profiling: figuring out what's slow in Notebooks and modules using cProfile, line_profiler, memory_profiler, %timeit and other tools
  • Comparing pure Python and NumPy for numeric simulations with a "pandemic spread simulator" (a 2D particle model): we look at a naive algorithm in pure Python, PyPy and NumPy, then compare compiling the better algorithm with Numba against choosing a smarter, more sophisticated algorithm. The goal is to weigh quick-and-dirty implementations, vectorisation, compilation and genuinely better algorithms against each other, where each step takes more time and effort
  • Making Pandas faster through better dtype choices and better algorithmic choices, looking behind the scenes and using less RAM
  • Using JobLib to run pure Python code on multiple cores, and Dask and Swifter to do the same for Pandas DataFrames
  • Taking a deeper look at Dask and Vaex for bigger-than-RAM scenarios
  • Using a variety of exercises to test everyone's knowledge
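As a small taste of the profiling topic, here is a minimal sketch using only the standard library's cProfile, pstats and timeit modules. It is illustrative and not taken from the course material; the function names are hypothetical examples. It profiles a pure-Python loop, then times it against a closed-form formula, echoing the course's point that a better algorithm can beat a faster loop.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum_of_squares(n):
    # Hypothetical example: pure-Python loop, one interpreted iteration per element
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    # Hypothetical example: closed form for 0^2 + 1^2 + ... + (n-1)^2
    return (n - 1) * n * (2 * n - 1) // 6


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report of the top 5 entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    n = 200_000
    result, report = profile_call(slow_sum_of_squares, n)
    print(report)
    assert result == fast_sum_of_squares(n)

    # IPython's %timeit magic is built on the timeit module
    slow_t = timeit.timeit(lambda: slow_sum_of_squares(n), number=3)
    fast_t = timeit.timeit(lambda: fast_sum_of_squares(n), number=3)
    print(f"loop: {slow_t:.4f}s  formula: {fast_t:.6f}s")
```

The profiler shows where the time goes; timeit then confirms whether a change actually helped. Measuring first, before reaching for NumPy, Numba or a new algorithm, keeps the effort focused on the parts of the code that matter.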

After the course you'll:

  • Have full solutions to all exercises that you can run offline
  • Get a cheatsheet to help you prioritise your choices back in the office
  • Receive a Certificate of Professional Development
  • Be able to join a follow-up call 2 weeks later to resolve any outstanding questions
  • Have continued access to my Slack channel to ask questions alongside past and future students

During 2020 the course scored 4.7/5.0 overall from 4 classes.

Testimonials

"I would highly recommend Ian's Higher Performance Python course to anyone who is looking for a solid understanding of optimizing native Python with tools like numpy, numba, dask, and many others. His training course is packed with practical examples of how to correctly profile and optimize code, hands-on tasks and in-depth discussions." 
- Elena Sharova

"Thanks Ian! a very practical course on how to improve python programs from the investigation to focus on what is relevant, to the solutions with some ready-to-use tools and techniques. Also a big thumbs up for the quick switch to online course, it was very smooth."
- Stephane, OasisLMF (Catastrophe modelers)
Get in contact with Ian
If you’d like to discuss how Ian can help your team, get in contact by emailing Ian[at]MorConsulting.com
  • Read my book

    O'Reilly High Performance Python by Micha Gorelick & Ian Ozsvald