All posts by Ian

Weekish notes

I gave another iteration of my Making Pandas Fly talk sequence for PyDataAmsterdam recently and received some lovely postcards from attendees as a result. I’ve also had time to list new iterations of my training courses for Higher Performance Python (October) and Software Engineering for Data Scientists (September); both will run virtually via Zoom & […]

“Making Pandas Fly” for PyDataAmsterdam 2020

I thank the PyDataAmsterdam 2020 organisers for another chance to speak on Making Pandas Fly (PyDataAmsterdam 2020). This variant of the talk focuses more on: understanding when categories beat strings and smaller floats beat larger ones; what’s happening with NumPy behind the scenes; how we can save 50% of our RAM (and so fit in […]
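As a rough illustration of the first and third points, here is a minimal sketch that compares memory use before and after converting a low-cardinality string column to a category and a float64 column to float32. The column names and data are made up for the example and are not taken from the talk, and note that float32 trades precision for space, so it only suits data that tolerates that.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a string column with few distinct values and a float64 column.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["Amsterdam", "London", "Budapest"], size=n),
    "value": rng.random(n),  # float64 by default
})

# deep=True counts the memory held by the string objects themselves.
before = df.memory_usage(deep=True).sum()

# Store the strings as a category and the floats at half the width.
slim = df.assign(
    city=df["city"].astype("category"),
    value=df["value"].astype("float32"),
)
after = slim.memory_usage(deep=True).sum()

print(f"before: {before / 1e6:.1f} MB, after: {after / 1e6:.1f} MB")
```

The actual saving depends on the data: categories help most when a column has few distinct values relative to its length.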

Weeknote (dtype-diet)

Over the weekend I hacked on dtype_diet – a tool for Pandas users that checks their DataFrame to see if smaller datatypes might be applicable. If so, they’d offer a reduction in RAM with no data loss; for Categorical data there’s also the possibility of faster calculations. This tool makes no changes; it recommends the […]
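To give a feel for the kind of check involved, here is a minimal sketch for integer columns only. It is not dtype_diet's code or API: the function name suggest_int_dtype and the example DataFrame are hypothetical. It looks at a column's minimum and maximum and reports the narrowest signed integer type that holds both, changing nothing.

```python
import numpy as np
import pandas as pd

def suggest_int_dtype(series: pd.Series):
    """Return a narrower signed integer dtype that holds every value, or None.

    Hypothetical helper for illustration; it only inspects, it never converts.
    """
    if not pd.api.types.is_integer_dtype(series):
        return None
    lo, hi = series.min(), series.max()
    # Try candidates from narrowest to widest and stop at the first that fits.
    for candidate in (np.int8, np.int16, np.int32):
        info = np.iinfo(candidate)
        if info.min <= lo and hi <= info.max:
            if np.dtype(candidate).itemsize < series.dtype.itemsize:
                return np.dtype(candidate)
            return None  # the column is already as narrow as it can be
    return None

# Made-up example data, stored as int64.
df = pd.DataFrame(
    {"age": [21, 35, 64], "population": [821_752, 8_982_000, 1_752_000]},
    dtype="int64",
)
for name in df.columns:
    print(name, df[name].dtype, "->", suggest_int_dtype(df[name]))
```

Because the check is driven by the observed range, a recommendation only holds for the data seen so far; later values outside that range would no longer fit.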

Week(ish) note

So – High Performance Python 2nd ed finally shipped (Amazon, Goodreads) – yay! In brief we’ve added notes on how you can be a “highly performant programmer”, added some more profiling, added Pandas onto NumPy, improved the Compiling to C chapter with more Numba and a new full section on GPUs (in the first edition […]

Week note

Well, mid-next-week note I guess. I gave another variant of my higher performance Python talk last night for PyDataUK to 250 live streamers; we had some good questions, cheers all. On Friday Micha & I heard that the 2nd edition of our High Performance Python book has gone to the printers – we’d said we’d […]

“Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex

This Saturday and Monday I’ve had my first experience presenting at virtual conferences – on Saturday it was for Remote Pizza Python (brilliant line-up!) and on Monday (note – this post predates the talk, I’ll update it tomorrow after I’ve spoken) at BudapestBI. UPDATE: added 2nd variant of Making Pandas Fly for a short-notice PyDataUK […]

Recent “week notes”

I’ve not done a public “week notes” before. I’ve been hacking on various things and I figure it is worth sharing some of it. Using public Companies House data I’ve started to plot the decline in new company formations in the UK. Here’s a first crack, which shows a decline at the end of March. […]

New Higher Performance Python class (June 1-3)

I’ve listed my next Higher Performance Python public class; it’ll run online for 3 mornings on June 1-3 during UK hours. We’ll use Zoom and Slack with pre-distributed Notebooks and modules, and you’ll run it using an Anaconda environment. Here’s the write-up from my recent class. We’ll focus on Profiling to find what’s slow in […]

Notes on last week’s Higher Performance Python class

Last week I ran a two-morning Higher Performance Python class. We covered: profiling slow code (using a 2D particle infection model in an interactive Jupyter Notebook) with line_profiler & PySpy; vectorising code with NumPy vs running the original with PyPy; moving to Numba to make iterative and vectorised NumPy really fast (with up to a […]
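For readers new to the vectorising step, here is a minimal sketch of the general idea, assuming a toy problem (distance of 2D points from the origin) rather than the infection model used in the class. The function names are made up for the example. The loop version handles one point at a time in Python, while the vectorised version hands whole arrays to NumPy.

```python
import math
import numpy as np

def distances_loop(xs, ys):
    """Pure-Python loop: one square root per iteration."""
    out = []
    for x, y in zip(xs, ys):
        out.append(math.sqrt(x * x + y * y))
    return out

def distances_vectorised(xs, ys):
    """Same calculation expressed on whole arrays, so the loop runs inside NumPy."""
    return np.sqrt(xs * xs + ys * ys)

# Made-up input: 10,000 random points in the unit square.
rng = np.random.default_rng(42)
xs = rng.random(10_000)
ys = rng.random(10_000)

# Both versions should agree to floating-point tolerance.
print(np.allclose(distances_loop(xs, ys), distances_vectorised(xs, ys)))
```

Timing the two (for example with %timeit in a Notebook) is a quick way to see whether vectorising pays off for a given array size.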

Notes from Zoom call on “Problems & Solutions for Data Science Remote Work”

On Friday I held an open Zoom call to discuss the problems and solutions posed by remote work for data scientists. I put this together as I’ve observed from my teaching cohorts and from conversation with colleagues that for anyone “suddenly working remotely” the process has typically not been smooth. I invited folk to join […]