Higher Performance Python (ODSC 2019)

Building on PyDataCambridge last week, I had the additional pleasure of talking on Higher Performance Python at ODSC 2019 yesterday. I had a brilliant room of 300 Pythonic data scientists at all levels who asked an interesting array of questions.

Happy smiling audience

This talk expanded on last week’s version at PyDataCambridge as I had some more time. The problem-intro was a little longer (and this helped set the scene as I had more first-timers in the room), then I dug a little further into Pandas and added extra advice at the end. Overall I covered:

  • Robert Kern’s line_profiler to profile performance in sklearn’s “fit” method against a custom numpy function
  • Pandas function calling using iloc/iterrows/apply and apply with raw=True (in increasingly-fast order)
  • Using Swifter and Dask to parallelise over many cores
  • Using Numba to get an easy additional 10x speed-up
  • Advice for highly performant teams, as a sanity check on some of the options
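As a rough illustration of the Pandas item above, here is a minimal sketch of the four row-wise calling styles applied to the same function. The function `row_range` and the random DataFrame are made up for this example and are not the code from the talk; the point is only that all four approaches give the same answer while doing progressively less per-row work.

```python
import numpy as np
import pandas as pd


def row_range(row):
    """Hypothetical per-row function: max minus min of the row's values."""
    return row.max() - row.min()


# Illustrative data only: 200 rows of 5 random floats
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 5)), columns=list("abcde"))

# 1) Index each row with iloc inside a Python loop (builds a Series per row)
via_iloc = [row_range(df.iloc[i]) for i in range(len(df))]

# 2) iterrows yields (index, Series) pairs
via_iterrows = [row_range(row) for _, row in df.iterrows()]

# 3) apply with axis=1 passes each row to the function as a Series
via_apply = df.apply(row_range, axis=1)

# 4) apply with raw=True passes each row as a plain NumPy array,
#    which avoids constructing a Series for every row
via_apply_raw = df.apply(row_range, axis=1, raw=True)
```

To compare the timings on your own machine, wrap each variant in `%timeit` in IPython or Jupyter. The relative speeds depend on the function and the shape of the data, so measure before committing to one approach.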

“It was a fantastic talk.” – Stewart

My publisher O’Reilly were also kind enough to send over a box of the 1st edition High Performance Python books for signing, just as I did in Cambridge last week. As usual I didn’t have enough free books for those hoping for a copy – sorry if you missed out (I only get given a limited set to give away). The new content for the 2nd edition is available online in O’Reilly’s Safari Early Access Programme.

Book signing

The talk ends with my customary note requesting a postcard if you learned something useful. Feel free to send me an email asking for my address; I love to receive postcards 🙂 I have an email announce list for my upcoming training in January, with a plan to introduce a High Performance Python training day, so join that list if you’d like low-volume announcements. I also have a twice-a-month email list, “Ian’s Thoughts & Jobs Listing”, which includes jobs I know about in our UK community along with my recommendations and notes. Join this if you’d like an idea of what’s happening in the UK Pythonic Data Science scene.

The 2nd edition of High Performance Python should be out next April; you can preview it in the Early Access Programme.

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.