Archives of Python

My first commit to Pandas

I’ve used the Pandas data science toolkit for over a decade and I’ve filed a couple of issues, but I’ve never contributed to the source. At the weekend I got to balance the books a little by making my first commit. With this pull request I addressed the recent request to update the pct_change docs […]

Skinny Pandas Riding on a Rocket at PyDataGlobal 2020

On November 11th we saw the most ambitious PyData conference ever – PyData Global 2020 was a combination of worldwide PyData groups putting on a huge event, both to build our international community and to make the most of the online-only conferences that we need to run during Covid-19. The conference brought together almost 2,000 attendees […]

“Making Pandas Fly” at EuroPython 2020

I’ve had a chance to return to talking about High Performance Python at EuroPython 2020, after my first tutorial on this topic back in 2011 in Florence. Today I spoke on Making Pandas Fly, with a focus on making Pandas run faster. This covered: Categories and RAM-saving datatypes to get 100-500x speed-ups (well, some of […]

Weekish notes

I’ve recently switched back from sourdough yeast to a dried packet yeast mix, using a recipe given to me by a colleague (thanks Nick!). I immediately set to work modifying his recipe (well, cutting out steps if we’re honest). The first loaf looked fine but was bland – I cut out too much salt. The next was really very […]

Weekish notes

I gave another iteration of my Making Pandas Fly talk sequence for PyDataAmsterdam recently and received some lovely postcards from attendees as a result. I’ve also had time to list new iterations of my training courses for Higher Performance Python (October) and Software Engineering for Data Scientists (September); both will run virtually via Zoom & […]

Week note

Well, mid-next-week note I guess. I gave another variant of my Higher Performance Python talk last night for PyDataUK to 250 live streamers. We had some good questions, cheers all. On Friday Micha & I heard that the 2nd edition of our Higher Performance Python book has gone to the printers – we’d said we’d […]

“Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex

This Saturday and Monday I’ve had my first experience presenting at virtual conferences – on Saturday it was for Remote Pizza Python (brilliant line-up!) and on Monday (note – this post predates the talk; I’ll update it tomorrow after I’ve spoken) at BudapestBI. UPDATE: added a 2nd variant of Making Pandas Fly for a short-notice PyDataUK […]

New Higher Performance Python class (June 1-3)

I’ve listed my next Higher Performance Python public class. It’ll run online for 3 mornings on June 1-3 during UK hours. We’ll use Zoom and Slack with pre-distributed Notebooks and modules, and you’ll run it using an Anaconda environment. Here’s the write-up from my recent class. We’ll focus on Profiling to find what’s slow in […]

Notes on last week’s Higher Performance Python class

Last week I ran a two-morning Higher Performance Python class. We covered: Profiling slow code (using a 2D particle infection model in an interactive Jupyter Notebook) with line_profiler & PySpy; Vectorising code with NumPy vs running the original with PyPy; Moving to Numba to make iterative and vectorised NumPy really fast (with up to a […]

Another Successful Data Science Projects course completed

A week back I ran the 4th iteration of my 1-day Successful Data Science Projects course. We covered: How to write a Project Specification including a strong Definition of Done; How to derisk a new dataset quickly using Pandas Profiling, Seaborn and dabl; Building interactive data tools using Altair to identify trends and outliers […]