Archives of Data science

Week(ish) note

So – High Performance Python 2nd ed finally shipped (Amazon, Goodreads) – yay! In brief we’ve added notes on how you can be a “highly performant programmer”, added some more profiling, added Pandas onto NumPy, improved the Compiling to C chapter with more Numba and a new full section on GPUs (in the first edition […]

Notes on last week’s Higher Performance Python class

Last week I ran a two-morning Higher Performance Python class, we covered: Profiling slow code (using a 2D particle infection model in an interactive Jupyter Notebook) with line_profiler & PySpy Vectorising code with NumPy vs running the original with PyPy Moving to Numba to make iterative and vectorised NumPy really fast (with up to a […]

Notes from Zoom call on “Problems & Solutions for Data Science Remote Work”

On Friday I held an open Zoom call to discuss the problems and solutions posed by remote work for data scientists. I put this together as I’ve observed from my teaching cohorts and from conversation with colleagues that for anyone “suddenly working remotely” the process has typically not been smooth. I invited folk to join […]

Another Successful Data Science Projects course completed

A week back I ran the 4th iteration of my 1 day Successful Data Science Projects course. We covered: How to write a Project Specification including a strong Definition of Done How to derisk a new dataset quickly using Pandas Profiling, Seaborn and dabl Building interactive data tools using Altair to identify trends and outliers […]

Higher Performance Python (ODSC 2019)

Building on PyDataCambridge last week I had the additional pleasure of talking on Higher Performance Python at ODSC 2019 yesterday. I had a brilliant room of 300 Pythonic data scientists at all levels who asked an interesting array of questions: This talk expanded on last week’s version at PyDataCambridge as I had some more time. […]

“Higher Performance Python” at PyDataCambridge 2019

I’ve had the pleasure of speaking at the first PyDataCambridge conference (2019), this is the second PyData conference in the UK after PyDataLondon (which colleagues and I co-founded 6 years back). I’m super proud to see PyData spread to 6 regional meetups and now 2 UK conferences. We had over 200 attendees and the conference […]

“A starter data science process for software engineers” – talk at PyLondinium 2019

I’ve just spoken on “A starter data science process for software engineers” (slides linked) at PyLondinium 2019, this talk is aimed at software engineers who are starting to ask data related questions and who are starting a data science journey. I’ve noted that many software engineers – without a formal data science background – are […]

Second Successfully Delivering Data Science Projects just over

I ran the second iteration of my Successfully Delivering Data Science Projects course last Friday to this happy group, we had a lovely day and good conversation has continued in the teaching slack over the weekend: Topics covered included the design and derisking of data projects (not just machine learning), building a project plan, communicating […]