Over the weekend I hacked on dtype_diet – a tool for Pandas users that checks a DataFrame to see whether smaller datatypes could be used with no data loss, giving a reduction in RAM; for Categorical data there’s also the possibility of faster calculations. The tool makes no changes itself – it recommends code that you might copy into your project. I developed this as one of the ideas I didn’t get around to building whilst I was working on my book, but since that’s now published…
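To illustrate the idea, here is a minimal sketch of the kind of check dtype_diet performs – this is my hypothetical illustration, not dtype_diet’s actual API. For each integer column it asks whether a smaller integer dtype can hold the same values losslessly, and reports the potential RAM saving:

```python
# Hypothetical sketch of a dtype-shrinking check (NOT dtype_diet's real API):
# for each integer column, find the smallest lossless integer dtype and
# report how much RAM a conversion would save.
import numpy as np
import pandas as pd


def suggest_smaller_ints(df):
    suggestions = {}
    for col in df.select_dtypes(include="integer"):
        current = df[col].dtype
        # downcast="integer" picks the smallest integer dtype with no data loss
        smaller = pd.to_numeric(df[col], downcast="integer").dtype
        if smaller != current:
            saved = (df[col].memory_usage(deep=True)
                     - df[col].astype(smaller).memory_usage(deep=True))
            suggestions[col] = (str(current), str(smaller), saved)
    return suggestions


df = pd.DataFrame({
    "small_vals": np.arange(1000, dtype="int64"),       # fits in int16
    "big_vals": np.arange(1000, dtype="int64") * 10**11,  # needs int64
})
print(suggest_smaller_ints(df))
# small_vals can shrink from int64 to int16; big_vals stays as-is
```

The real tool goes further (e.g. spotting candidates for Categorical), but the principle is the same: measure first, then recommend a lossless conversion.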
Talking of which – Micha and I have cleaned up our profiles on Amazon and Goodreads for High Performance Python 2nd ed. It is a pity we can’t do any book signings at events (seeing as we won’t have physical events for a while!). One thing I might do via my “thoughts & jobs” email list is to organise a Friday “coffee morning” to chat with whoever turns up about high performance Python and other subjects.
I’ve also learned that if I make a 1kg sourdough dough, immediately after kneading I can put 0.5kg into the fridge and 4 days later it cooks just fine. This is a nice refinement. I’m now interspersing buying fresh bread (for variety and convenience) with making my own and it feels like a more sustainable practice.
All going well I’ll be talking this month at PyDataAmsterdam and next month at EuroPython on higher performance Python and Pandas; I figure the dtype_diet tool will fit in nicely with a discussion about the new Pandas dtypes and their benefits over numpy dtypes.
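As a quick taste of one of those benefits: a numpy-backed integer column is silently upcast to float64 as soon as it contains a missing value, whereas the newer pandas nullable "Int64" extension dtype keeps the integers and represents the gap as pd.NA. A tiny sketch:

```python
# One benefit of the newer pandas extension dtypes over numpy dtypes:
# numpy-backed ints with a missing value get upcast to float64,
# while the nullable "Int64" dtype stays integer and uses pd.NA.
import pandas as pd

numpy_backed = pd.Series([1, 2, None])             # upcast to float64
nullable = pd.Series([1, 2, None], dtype="Int64")  # stays integer, gap is pd.NA

print(numpy_backed.dtype)  # float64
print(nullable.dtype)      # Int64
```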
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.