We’ve just drawn PyDataLondon 2014 to a close and it has been a wonderfully successful weekend. The growth of Python’s use for data science over the last few years here in the UK is pretty phenomenal. Many thanks to Continuum Analytics and NumFocus for backing and organising the PyData conferences.
“Start of the week after busy weekend attending 1st #PyData London. Thanks to organisers & speakers: a smashing set of talks & gr8 community.” @lindauruchurtu
“If it weren’t for the succession of great talks at #pydata Ldn, I’d be getting quite upset about the AUS v SA test. Thank you @PyDataConf” @davisjmcc
We’ve had a fab weekend and a packed schedule with training and varied talks including:
- Two great keynotes (Deutsche Börse finance by Felix Fernandez and ‘big data’ brain research on a budget by Gael Varoquaux)
- Machine learning (mostly scikit-learn) and text processing
- Practical discussion of how and why things work (including some hard-won lessons on processes and statistics) and strong lessons on mistakes to avoid
- Art and economics
- Lots of IPython Notebook
- Some R and Matlab (we need more of this!)
- Great lightning talks including live rocket-science-in-the-Notebook to close the weekend
- Speakers from both industry (including BAE, the Met Office and Hedge Funds through to fresh startups) and academia stretching through Europe
One outcome from Gael’s keynote was the importance of citing the open source projects that get used to help highlight their need for funding and resources:
“next time you write an article that uses scikit-learn and friends, cite the software you use, that will help authors, eg get funding #pydata” – @dimazest
I ran a panel asking “Shouldn’t more companies be using data science?” – the deliberately loaded question was addressed by a range of industrial representatives including James from New York, Jonathan, Johnny, Dirk, Ian and Philip. The short answer seemed to be that more companies were taking the risks (and winning the rewards) of analysing their data and that some more training (both for scientists and for managers) could help things along.
“#PyData panel first question: have you done anything data analysis related within the last six month? (half the room raises hands)” James Powell
Through my Mor Consulting I talked on The Landscape of High Performance Python, taking a look at profiling techniques and compiler options for single-machine multi-core speed-ups. Obviously this is somewhat connected to the High Performance Python book I’m working on (hopefully an early release of the first chapters will be out shortly).
“Like @ianozsvald ‘s ‘team velocity’ to describe how clean slow code can be better than complex fast code in terms of team development” – Mark Basham of Diamond Synchrotron
Renowned Brightonian artist Eric Drass spoke on the confluence of art, mass data, surveillance, the redaction of political positions (and how nothing is ever really removed from the internet – AlgoCameron) and Hugh Hefner:
Martin Goodson‘s “Most Winning A/B Test Results are Illusory” talk has hit HackerNews with good discussion via his published paper.
Linda (a reformed string-theorist) spoke on trying #sklearn as an avid R user for music recommendation, highlighting some of the highs and lows of both toolsets (and noting the silliness of the ‘language wars’):
My colleague Bart Baddeley discussed problems and solutions in clustering approaches, with an IPython Notebook of all the examples available online:
“Similarity matrices are a neat way of eye-balling whether you’ve chosen the right number of clusters #pydata” – Hugo Carr
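To make Hugo’s point concrete, here is a minimal illustrative sketch (not taken from Bart’s notebook). It assumes two made-up 2D blobs, a hypothetical similarity of `exp(-distance)`, and hand-assigned labels rather than a real clustering algorithm. Sorting the similarity matrix by cluster label should show clean blocks on the diagonal when the number of clusters is sensible, and bright off-diagonal blocks when the data has been split too finely.

```python
import math
import random


def similarity_matrix(points, labels):
    """Pairwise similarities, with rows and columns grouped by cluster label."""
    order = sorted(range(len(points)), key=lambda i: labels[i])
    sim = [
        [math.exp(-math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1]))
         for j in order]
        for i in order
    ]
    return sim, [labels[i] for i in order]


def block_contrast(sim, sorted_labels):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    within, between = [], []
    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            same = sorted_labels[i] == sorted_labels[j]
            (within if same else between).append(sim[i][j])
    return sum(within) / len(within) - sum(between) / len(between)


def render(sim, threshold=0.2):
    """Crude text 'heatmap': '#' for similar pairs, '.' for dissimilar ones."""
    return "\n".join(
        "".join("#" if value > threshold else "." for value in row)
        for row in sim
    )


# Made-up data: two well-separated blobs of ten points each (an assumption).
random.seed(0)
blob_a = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(10)]
blob_b = [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(10)]
points = blob_a + blob_b

# Hand-assigned labels standing in for a clustering algorithm's output.
two_clusters = [0] * 10 + [1] * 10
four_clusters = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5  # each blob split in half

two_sim, two_sorted = similarity_matrix(points, two_clusters)
four_sim, four_sorted = similarity_matrix(points, four_clusters)

print(render(two_sim))
print()
print("contrast with 2 clusters:", block_contrast(two_sim, two_sorted))
print("contrast with 4 clusters:", block_contrast(four_sim, four_sorted))
```

In a notebook you would more likely plot the matrix as an image than print it as text. The contrast number is only a rough stand-in for what the eye picks up from the block structure.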
Neri covered NLP and ML using NLTK and scikit-learn for real-time customer support at Conversocial (a successful London customer support startup):
Philippe Bracke spoke on house prices, rents and yields, modelled during his PhD:
“Interesting conclusion from @PhilippeBracke #pydata you earn less money from renting more expensive properties” – Ian Taylor
SkimLinks sponsored a fun Saturday party (they’re hiring!). The conference series is generously sponsored by Continuum Analytics (it all started in the USA – hello Bryan!) and supported through the non-profit NumFocus organisation (and Leah does a rather ace job of pulling all the loose strands into a cohesive whole!).
Level39 in Canary Wharf provided the venue. Additional sponsors include Lyst who are hiring (hi Seb!), Python Academy (hello Mike!), Python Software Foundation, Knowsis, DataRobot (hello Jeremy and Peter!), Python Weekly and O’Reilly.
The view from Level39 was rather nice (their space is ace – visit it if you get a chance – thanking Jacqui for the photo):
Clearly we have a strong base here to build from for future conferences. EuroSciPy 2014 (Cambridge, August) was discussed and PyDataBerlin was announced; it’ll happen in conjunction with EuroPython (July, Berlin). I’ll be at all three.
More write-ups are available:
For future events we’ll have to work on female attendance (I counted 10% – this surely can be improved). We also want more interdisciplinary talks (we had some R and Matlab – we need more languages and other approaches). Overall I’m super happy with the outcome: we organised this in under two months and we got a fab turn-out and a stellar set of speakers (from nearby, throughout Europe and out to the USA). The next event can only be stronger still.
We collected slides and everything was recorded; videos will hopefully be up in a week.
I thank the organising team – Leah (NumFocus) kept us all on track, Emlyn, Cecilia, Florian, Yves and James here and our past-PyData American supporters all kept things moving in what appeared to be a rather effortless way. It wouldn’t have worked without everyone’s support including all the custodians of local usergroups who kindly spread the word – many thanks to you all.
Ian applies Data Science as an AI/Data Scientist for companies in Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.