AHL Python Data Hackathon

Yesterday I got to attend Man AHL’s first London Python Data hackathon (21-22 April – photos online). I went with the goal of publishing my ipython_memory_usage tool from GitHub to PyPI (success!), updating the docs (success!) and starting to work on the YellowBrick project (partial-success).

This is AHL’s first crack at running a public Python hackathon – from my perspective it went flawlessly. They use Python internally and they’ve been hosting my PyDataLondon meetup for a couple of years (and, all going well, for years to come), they support the Python ecosystem with public open source contributions and this hackathon was another method for them to contribute back. This is lovely (since so many companies aren’t so good at contributing and only consume from open source) and should be encouraged.

Here’s Bernd of AHL introducing the hackathon. We had 85 or so folk (10% women) in the room:

Bernd introducing Python Data hackathon at AHL

I (and 10 or so others) then introduced our projects. I was very happy to have 6 new contributors volunteer to my project. I introduced the goals, got everyone up to speed and then we split the work to fix the docs and to publish to the test PyPI server and then finally to the official PyPI public server.

This took around 3 hours, most of the team had some knowledge of a git workflow but none had seen my project before. With luck one of my colleagues will post a conda-forge recipe soon too. Here’s my team in action (photo taken by AHL’s own CTO Gary Collier):

Team at AHL hackathon

Many thanks to Hetal, Takuma, Robin, Lucija, Preyesh and Pav.

Robin had recently published his own project to PyPI so he had some handy links. Specifically we used twine and these notes. In addition the Pandas Sprint guide was useful for things like pulling the upstream master between our collaborative efforts (along with Robin’s notes).

This took about 3 hours. Next we had a crack at the sklearn-visualiser YellowBrick – first to get it running and tested and then to fix the docs on a recent code contribution I’d made (making a sklearn-compatible wrapper for statsmodels’ GLM) with some success. It turns out that we might need to work on the “get the tests” running process, they didn’t work well for a couple of us – this alone will make for a nice contribution once we’ve fixed it.

Overall this effort helped 6 people contribute to two new projects, where 5 of the collaborators had only some prior experience (as best I remember!) with making an open source contribution. I’m very happy with our output – thanks everyone!

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.