“Creating correct and capable classifiers” at PyDataAmsterdam 2018

This weekend I got to attend PyDataAmsterdam 2018 – this is my first trip to the Netherlands (Yay! It is lovely here). The conference grew on last year to 345 attendees with over 20% female speakers.

In addition to attending some lovely talks I also got to run another “Making your first open source contribution” session, with James Powell and a couple of people in 30 minutes we fixed some typos in Nick Radcliffe’s tdda project to improve his overview documentation. I’m happy to have introduced a couple of new people to the idea that a “contribution” can start with a 1 word typo-fix or adding notes to an existing bug report, without diving into the possibly harder world of making a code contribution.

We also had Segii along as our NumFOCUS representative (and Marci Garcia of the Pandas Sprints has done this before too). If you want to contribute to the community you might consider talking to NumFOCUS about how to be an ambassador at a future conference.

I gave an updated talk on my earlier presentation for PyDataLondon 2018, this time I spoke more on :

  • YellowBrick‘s ROC curves
  • SHAPley machine learning explanations
  • Along with my earlier ideas on diagnosis using Pandas and T-SNE

I had a lovely room, wide enough that I only got a third of my audience in the shot below:

My audience at PyDataAmsterdam 2018

I’ve updated some of the material from my London talk, particularly I’ve added a few slides on SHAPley debugging approaches to contrast against ELI5 that I used before. I’ll keep pushing this notion that we need to be debugging our ML models so we can explain why they work to colleagues (if we can’t – doesn’t that mean we just don’t understand the black box?).

Checking afterwards it is lovely to get supportive feedback, thank you Ondrej and Tobias:

Here are the slides (the code has been added to my data_science_delivered github repo):

I’m really happy with the growth of our international community (we’re up to 100 PyData meetups now!). As usual we had 5 minute lightning talks at the close of the conference. I introduced the nbdime Notebook diff tool.

I’m also very pleased to say that I’ve had a lot of people come up to say Thanks after the talk. This is no doubt because I now highlight the amount of work done by volunteer conference organisers and volunteer speakers (almost everyone involved in running a PyData conference is an unpaid volunteer – organisers and speakers alike). We need to continue making it clear that contributing back to the open source ecosystem is essential, rather than just consuming from it. James and I gave a lightning talk on this right at the end.

Update – I’m very happy to see this tweet about how James’ and my little talk inspired Christian to land a PR. I’m also very happy to see this exchange with Ivo about potentially mentoring newer community members. I wonder where this all leads?


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight and in his Mor Consulting, sign-up for Data Science tutorials in London. He also founded the image and text annotation API Annotate.io, lives in London and is a consumer of fine coffees.