I’ve not done a public “week notes” before. I’ve been hacking on various things and I figure it is worth sharing some of it.
Using public Companies House data I’ve started to plot the decline in new company formations in the UK. Here’s a first crack, which shows a decline at the end of March. This data comes monthly as a single dump so it didn’t contain April. Here’s a second crack going back 10 years; it shows co-ordinated drops in activity during UK public holidays (and this March still looks awful).
For this third crack I’ve used the Companies House API to augment the static dump with up-to-date data for April (which’ll be replaced when the new data dump is provided in a week). The most recent 3 weeks show “no dissolutions”, which I suspect means they’ve not yet been added to the public database. The decline in registrations is clear. I’m guessing registrations go via a different human process than dissolutions and dissolutions might be very laggy due to admin.
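As a rough illustration of the counting behind plots like these, here is a minimal Pandas sketch that bins incorporation dates into weekly totals. The rows are made up for the example, and the column names “CompanyNumber” and “IncorporationDate” are assumptions about the dump’s layout rather than something checked here.

```python
import pandas as pd

# Made-up rows standing in for the Companies House dump; the column
# names are assumptions and the real file may label them differently.
companies = pd.DataFrame(
    {
        "CompanyNumber": ["A1", "A2", "A3", "A4"],
        "IncorporationDate": pd.to_datetime(
            ["2020-03-02", "2020-03-03", "2020-03-10", "2020-03-24"]
        ),
    }
)

# Count new companies per week (weeks end on Sunday by default).
# Weeks with no incorporations appear as 0 rather than being dropped.
weekly = (
    companies.set_index("IncorporationDate")
    .resample("W")
    .size()
)
print(weekly)
```

Keeping the empty weeks as zeros matters for this sort of plot, because a missing bar and a zero bar tell different stories when the underlying data might simply be late.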
In Pandas I learned about the “memory_usage” method, which gives a per-column memory report. In a reply, Benjamin noted that this appears in Dask and CuDF too.
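A small sketch of what that report looks like, using a made-up DataFrame (the column names here are purely illustrative). Passing deep=True asks Pandas to measure the actual string contents rather than just the pointers to them.

```python
import numpy as np
import pandas as pd

# Illustrative data only: 1,000 rows with an integer and a string column.
df = pd.DataFrame(
    {
        "company_id": np.arange(1000, dtype="int64"),
        "status": ["active", "dissolved"] * 500,
    }
)

# Per-column memory in bytes (plus one entry for the index).
per_column = df.memory_usage(deep=True)
print(per_column)

# Repetitive strings are usually much cheaper as a categorical.
status_bytes = df["status"].memory_usage(index=False, deep=True)
category_bytes = df["status"].astype("category").memory_usage(index=False, deep=True)
print(status_bytes, category_bytes)
```

The report makes it easy to spot which columns are worth converting before reaching for a bigger machine or a different library.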
For my upcoming Remote Pizza Python talk (tomorrow) on Modin, Dask & Vaex I’ve delved further into Modin and Dask. The Modin folk gave useful feedback for “how Modin is working” and I’ve got an open question on Stack Overflow about Dask’s memory usage.
On Monday I give a talk remotely for PyDataBudapest which focuses more on how to get more out of Pandas in a smaller-data scenario. Experiences from both of these talks will go into my upcoming Higher Performance Python training (start of June).
My garden is doing well – I’m now eating my new radishes. Kilos of flour will arrive soon so I can expand my bread making experiments too!
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.