Archives of Python

“Making Pandas Fly” at EuroPython 2020

I’ve had a chance to return to talking about High Performance Python at EuroPython 2020 after my first tutorial on this topic back in 2011 in Florence. Today I spoke on Making Pandas Fly with a focus on making Pandas run faster. This covered: Categories and RAM-saving datatypes to make 100-500x speed-ups (well, some of […]
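As a rough illustration of the category idea, here is a minimal sketch using plain pandas. It assumes a column of short, heavily repeated strings; the city labels and row count are made up, and exact byte counts will vary by pandas version. Each row of the object column holds a pointer to its own Python string. The categorical column stores a small integer code per row plus one shared table of labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one million rows drawn from a handful of repeated labels
rng = np.random.default_rng(0)
labels = np.array(["london", "paris", "berlin", "madrid"])
s_obj = pd.Series(rng.choice(labels, size=1_000_000), dtype=object)  # one Python str per row

# Convert to a categorical: small integer codes plus a single lookup table of labels
s_cat = s_obj.astype("category")

# deep=True counts the string payloads too, not just the pointers
obj_bytes = s_obj.memory_usage(deep=True)
cat_bytes = s_cat.memory_usage(deep=True)
print(f"object:   {obj_bytes / 1e6:.1f} MB")
print(f"category: {cat_bytes / 1e6:.1f} MB")

# Everyday operations still work on the categorical column
counts = s_cat.value_counts()
print(counts)
```

Memory is the easy win to measure. Whether a given operation also speeds up depends on the operation, so it is worth timing your own workload.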

Weekish notes

I’ve recently switched back from sourdough to dried packet yeast mix, after a colleague gave me a recipe (thanks Nick!). I immediately set to work modifying his recipe (well, cutting out steps if we’re honest). The first loaf looked fine but was bland – I’d cut out too much salt. The next was really very […]

Weekish notes

I gave another iteration of my Making Pandas Fly talk sequence for PyDataAmsterdam recently and received some lovely postcards from attendees as a result. I’ve also had time to list new iterations of my training courses for Higher Performance Python (October) and Software Engineering for Data Scientists (September); both will run virtually via Zoom & […]

Week note

Well, a mid-next-week note, I guess. Last night I gave another variant of my higher performance Python talk for PyDataUK to 250 people watching live; we had some good questions, cheers all. On Friday Micha & I heard that the 2nd edition of our High Performance Python book has gone to the printers – we’d said we’d […]

“Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex

This Saturday and Monday are my first experience of presenting at virtual conferences: on Saturday I spoke at Remote Pizza Python (brilliant line-up!) and on Monday (note: this post predates the talk; I’ll update it tomorrow after I’ve spoken) I’m speaking at BudapestBI. UPDATE: added a 2nd variant of Making Pandas Fly for a short-notice PyDataUK […]

New Higher Performance Python class (June 1-3)

I’ve listed my next Higher Performance Python public class; it’ll run online for 3 mornings on June 1-3 during UK hours. We’ll use Zoom and Slack with pre-distributed Notebooks and modules, and you’ll run it using an Anaconda environment. Here’s the write-up from my recent class. We’ll focus on Profiling to find what’s slow in […]
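Profiling comes first because it tells you where the time actually goes. The class notes mention line_profiler and PySpy. As a stdlib-only stand-in, here is a minimal sketch with `cProfile` and `pstats`. The functions `slow_sum_of_squares` and `run` are hypothetical examples, not course material.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive (hypothetical example): builds a full list before summing it
    total = 0
    for value in [i * i for i in range(n)]:
        total += value
    return total


def run():
    # Hypothetical workload: call the slow function several times
    return [slow_sum_of_squares(200_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
results = run()
profiler.disable()

# Sort by cumulative time so the most expensive call chains appear first
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
report = buffer.getvalue()
print(report)
```

`cProfile` reports per-function totals. A line-level tool such as line_profiler is the natural next step once you know which function to zoom in on.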

Notes on last week’s Higher Performance Python class

Last week I ran a two-morning Higher Performance Python class; we covered: profiling slow code (using a 2D particle infection model in an interactive Jupyter Notebook) with line_profiler & PySpy; vectorising code with NumPy vs running the original with PyPy; moving to Numba to make iterative and vectorised NumPy really fast (with up to a […]
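The vectorising step can be sketched in a few lines. This is a minimal example under made-up assumptions, not the course's infection model. Random particles are placed in a unit square, and we count how many fall within a radius of a centre point, first with a Python loop and then with whole-array NumPy arithmetic. `count_near_loop` and `count_near_vectorised` are hypothetical names. Both versions use the same squared-distance arithmetic, so they give identical counts.

```python
import numpy as np

# Hypothetical setup: random 2D particle positions in a unit square
rng = np.random.default_rng(42)
n_particles = 10_000
xs = rng.random(n_particles)
ys = rng.random(n_particles)
cx, cy, radius = 0.5, 0.5, 0.1


def count_near_loop(xs, ys, cx, cy, radius):
    # Pure-Python loop: one interpreted iteration per particle
    count = 0
    for x, y in zip(xs, ys):
        dx, dy = x - cx, y - cy
        if dx * dx + dy * dy < radius * radius:
            count += 1
    return count


def count_near_vectorised(xs, ys, cx, cy, radius):
    # NumPy version: the same arithmetic applied to whole arrays at once
    dx = xs - cx
    dy = ys - cy
    return int(np.count_nonzero(dx * dx + dy * dy < radius * radius))


loop_count = count_near_loop(xs, ys, cx, cy, radius)
vec_count = count_near_vectorised(xs, ys, cx, cy, radius)
print(loop_count, vec_count)
```

Wrapping each call in `timeit` is a quick way to compare the two on your own machine. The loop version is also the kind of code that PyPy or Numba target without rewriting it into array form.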

Another Successful Data Science Projects course completed

A week back I ran the 4th iteration of my 1-day Successful Data Science Projects course. We covered: how to write a Project Specification including a strong Definition of Done; how to derisk a new dataset quickly using Pandas Profiling, Seaborn and dabl; building interactive data tools using Altair to identify trends and outliers […]
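The course uses Pandas Profiling, Seaborn and dabl to derisk a new dataset. As a lightweight first pass before reaching for those tools, here is a minimal sketch using only core pandas. `quick_overview` is a hypothetical helper, and the columns and values are made up. It is not a substitute for the richer reports those libraries produce.

```python
import numpy as np
import pandas as pd


def quick_overview(df):
    """Summarise each column: dtype, missing values and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),  # NaN/None are excluded from the distinct count
    })


# Hypothetical toy dataset with some gaps and a suspicious outlier in "spend"
df = pd.DataFrame({
    "age": [34, 51, np.nan, 29, 51],
    "city": ["leeds", "york", "york", None, "hull"],
    "spend": [12.5, 80.0, 33.1, 12.5, 990.0],
})

overview = quick_overview(df)
print(overview)
print(df.describe())  # numeric summary: count, mean, std, min, quartiles, max
```

Columns with many missing values, unexpectedly low or high cardinality, or extreme min/max values are good candidates to plot next.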

Higher Performance Python (ODSC 2019)

Building on my PyDataCambridge talk last week, I had the additional pleasure of talking on Higher Performance Python at ODSC 2019 yesterday. I had a brilliant room of 300 Pythonic data scientists at all levels who asked an interesting array of questions. As I had more time, this talk expanded on last week’s PyDataCambridge version. […]