PyDataLondon 2015 Write-up and my “Ship It!” talk on publishing data science products

(this post is still evolving June 22nd…)

We’ve just run our 2nd PyDataLondon conference: around 300 attendees, 3 keynotes and 3 tracks over 3 days. It has been fab! We’ve grown 50% on last year, and 20% of both our speakers and our attendees were female (both figures up on last year). I’m really happy with the results of all the hard work of our conference committee. Here’s Helena giving our opening keynote:

Video status – forthcoming. Slide status – they’ll be linked in this GitHub repo.

Our keynoters were Helena Bengtsson (Editor for Data Projects at The Guardian), Eric Drass (the data scientist’s artist-philosopher, see @bffbot2 and @theresamaybot) and Meta Brown (speaker and writer on statistics and business analytics). Meta gave me a copy of her latest book, Data Mining for Dummies, which covers the CRISP-DM process she discussed – yay and thanks!

Florian has posted a huge set of high-quality conference photos; go dig through them to find some gems!

Our monthly meetup is now at 1,650 members and our 13th meetup is scheduled for Tues July 7th at AHL (near Bank tube) – go RSVP now! If you have questions about Pythonic data science, you’ll get them answered by the 200+ folk at our meetups (probably in the pub afterwards – buy a beer and talk to folk!).

I gave a talk entitled “Ship It!”, breaking down 10 years of experience of building, running and deploying successful data science projects. It reflects on my recent experience consulting on automated contract recruitment for 1.5 years with ElevateDirect here in London. I looked at 10 years of my consulting projects, removed those that failed (noting the reasons why) and then categorised those that worked into the 4 groups that I start the talk with. After that I build up the lessons, since each group builds on the ones before it.

Peadar Coyle (@springcoil) spoke on deployment recently at PyConItaly; his talk is worth a watch. You’ll probably also want to catch up on the PyMC tutorial he gave over the weekend at PyDataLondon.

I’m thinking of writing a book (or something like that) in the future on building and shipping data science products; if you’re interested, take a look and join the announce list.

In my talk and during the closing notes I made a point to everyone: if there’s one simple thing you can do today to support open source projects (particularly if you use them but don’t contribute to them in other ways), please please Cite the Project in Public. scikit-learn has a citations page; it helps them raise money from funding bodies, because they can justify the funding by showing how the project helps companies do more business. All you have to do is write a paragraph-long testimonial and send it to your favourite project. The scikits, scipy, numpy, ML tools, matplotlib etc. would all love to have new testimonials. It’ll take you 15 minutes, so please go do it.

Other reviews:

Because the conference was a huge success, a good chunk of money was raised for NumFOCUS, the non-profit that backs the PyData conferences. As a result, the awards and scholarships that NumFOCUS provides to the community (including the John Hunter scholarship, diversity and women-in-tech grants, and development grants for projects like AstroPy, IPython, SymPy and Software Carpentry) will get a huge boost. Good job all!

“‘If you want to support open source projects publicly say you use them and write testimonials’ – @ianozsvald at #pydataldn15 YES PLEASE.” @drmaciver of Hypothesis

UPDATE – David has a testimonials page for his Hypothesis library.

I’ll call out a new project that I mentioned: DSADD (Data Scientists Against Dirty Data, now known as Engarde), a set of decorators you apply to functions that return Pandas DataFrames to set constraints on your data. This helps when dealing with dirty data.
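The idea behind constraint decorators can be sketched in a few lines of plain pandas. This is a hand-rolled illustration, not Engarde’s actual implementation or API: the decorator names `none_missing` and `within_range` and the `load_people` function are hypothetical. Each decorator calls the wrapped function, checks the DataFrame it returns, and raises `AssertionError` if a constraint is violated.

```python
import functools

import pandas as pd


def none_missing(func):
    """Hypothetical decorator: fail if the returned DataFrame has any missing values."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        df = func(*args, **kwargs)
        if df.isnull().values.any():
            raise AssertionError(f"{func.__name__} returned a DataFrame with missing values")
        return df
    return wrapper


def within_range(column, low, high):
    """Hypothetical decorator factory: fail if any value in `column` is outside [low, high]."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            df = func(*args, **kwargs)
            # between() is inclusive at both ends by default
            bad = ~df[column].between(low, high)
            if bad.any():
                raise AssertionError(
                    f"{func.__name__}: {int(bad.sum())} value(s) in {column!r} "
                    f"outside [{low}, {high}]"
                )
            return df
        return wrapper
    return decorator


# Hypothetical loader: the range check runs first, then the missing-value check
@none_missing
@within_range("age", 0, 120)
def load_people(rows):
    return pd.DataFrame(rows, columns=["name", "age"])


people = load_people([("Ann", 34), ("Bob", 51)])  # passes both checks
```

Calling `load_people([("Ann", 34), ("Bob", 150)])` would raise because 150 is out of range. Keeping the checks next to the function that produces the data means dirty data fails loudly at its source rather than further down a pipeline.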

I also got to do another book signing for my High Performance Python, along with Yves and his Python for Finance:

Our team (my co-chair Emlyn and team Cecilia, Graham, Florian, Slavi and Calvin) did a wonderful job, along with Leah and James (our International Team [they make all the background stuff happen – particularly Leah!]), and Bloomberg’s team including Amy, Kenny and Darren:

Our wonderful sponsors were Continuum (thanks for PyDatas and for Anaconda!), Bloomberg (thanks for the venue!), Pivigo, Pivotal, Adthena, Pluralsight, Plotly and Sainsbury’s. Huge thanks to you all for making this possible.

The party last night was in a local Bier Keller with a live Oompah Band (don’t ask!). Much conversation was had 🙂

It was encouraging to see more folk using Python 3.4 at the conference, though 2.7 was still in the majority. I wonder how much the news that the next Ubuntu (15.10 Wily Werewolf) is switching to Python 3.5 in October will help people’s transition.

If you’re interested in hearing about PyDataLondon 2016, join this announce list. It’ll be almost-zero-volume for the next 6 months; I’ll do something with it once we’re planning the next conference.

If you’re interested in other conferences, also check out:

Finally – if you’re after a Data Science Job, I run a very-low-volume jobs list (mostly London, but the UK in general); read about it here. My ModelInsight also runs data science Python training in London, and we announce new training courses on this list. All the lists use MailChimp (so you can unsubscribe instantly at any time), I rarely post to them and I keep it all relevant.


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.
