All posts of Ian

“Making Pandas Fly” at EuroPython 2020

I’ve had a chance to return to talking about High Performance Python at EuroPython 2020 after my first tutorial on this topic back in 2011 in Florence. Today I spoke on Making Pandas Fly with a focus on making Pandas run faster. This covered: Categories and RAM-saving datatypes to make 100-500x speed-ups (well, some of […]

Weekish notes

I’ve recently switched back from Sourdough yeast to dried packet yeast mix, given a recipe by a colleague (thanks Nick!). I immediately set to work modifying his recipe (well, cutting out steps if we’re honest). The first loaf looked fine but was bland – I cut out too much salt. The next was really very […]

Weekish notes

I gave another iteration of my Making Pandas Fly talk sequence for PyDataAmsterdam recently and received some lovely postcards from attendees as a result. I’ve also had time to list new iterations of my training courses for Higher Performance Python (October) and Software Engineering for Data Scientists (September), both will run virtually via Zoom & […]

“Making Pandas Fly” for PyDataAmsterdam 2020

I thank the PyDataAmsterdam 2020 organisers for another chance to speak on Making Pandas Fly (PyDataAmsterdam 2020). This variant of the talk focuses more on: Understanding when categories beat strings and smaller floats beat larger ones What’s happening with NumPy behind the scenes How we can save 50% of our RAM (and so fit in […]

Weeknote (dtype-diet)

Over the weekend I hacked on dtype_diet – a tool for Pandas users that checks their DataFrame to see if smaller datatypes might be applicable. If so they’d offer no data loss and a reduction in RAM, for Categorical data there’s also the possibility of faster calculations. This tool makes no changes, it recommends the […]

Week(ish) note

So – High Performance Python 2nd ed finally shipped (Amazon, Goodreads) – yay! In brief we’ve added notes on how you can be a “highly performant programmer”, added some more profiling, added Pandas onto NumPy, improved the Compiling to C chapter with more Numba and a new full section on GPUs (in the first edition […]

Week note

Well, mid-next-week note I guess. I gave another variant of my higher performance Python talk last night for PyDataUK to 250 live streamers, we had some good questions, cheers all. On Friday Micha & I heard that the 2nd edition of our Higher Performance Python book has gone to the printers – we’d said we’d […]

“Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex

This Saturday and Monday I’ve had my first experience presenting at virtual conferences – on Saturday it was for Remote Pizza Python (brilliant line-up!) and on Monday (note – this post predates the talk, I’ll update it tomorrow after I’ve spoken) at BudapestBI. UPDATE added 2nd variant of Making Pandas Fly for a short-notice PyDataUK […]

Recent “week notes”

I’ve not done a public “week notes” before. I’ve been hacking on various things and I figure it is worth sharing some of it. Using public Companies House data I’ve started to plot the decline in new company formations in the UK. Here’s a first crack, which shows a decline at the end of March. […]

New Higher Performance Python class (June 1-3)

I’ve listed my next Higher Performance Python public class, it’ll run online for 3 mornings on June 1-3 during UK hours. We’ll use Zoom and Slack with pre-distributed Notebooks and modules and you’ll run it using an Anaconda environment. Here’s the write-up from my recent class. We’ll focus on Profiling to find what’s slow in […]