Archives of Data Science

PyDataLondon Conference 2015 Call for Proposals now OPEN (yay!) for June 19-21

PyDataLondon 2015 will take place June 19-21 at Bloomberg’s HQ in Central London. We’ll have 300 people, multiple tracks and a very solid set of speakers and teachers. You should come. You should probably speak and share your knowledge. In fact, you should submit a talk to our Call for Proposals, it opens this […]

A review of ModelInsight’s growth this last year

Early last year Chris and I founded ModelInsight, a boutique Python-focused Data Science agency in London. We’ve grown well, so I figure some reflection is in order. In addition, the Data Science scene has grown very well in London; I’ll put some notes on that down below too. Through consulting, training, workshops and coaching we’ve had […]

PyDataParis 2015 and “Cleaning Confused Collections of Characters”

I’m at PyDataParis. This is the first PyData in France and we have a 300-strong turnout. In my talk I asked about the split of academic and industrial folk: we have 70% industrialists here (at least among the 70 folk in my talk). The bulk of the attendees are in the Intro track and maybe […]

Scikit-learn training in London this April 7-8th

We’re running a 2-day scikit-learn and statsmodels training course through my ModelInsight agency with Jeff Abrahamson (ex-Google) at the start of April (7-8th) in central London. You should join this course if you’d like to: confidently use scikit-learn to solve machine learning problems; strengthen your statistical foundations so you know both what to use and why […]

Data-Science stuff I’m doing this year

2014 was an interesting year and 2015 looks to be even richer. Last year I got to publish my High Performance Python book, help co-organise the rather successful PyDataLondon2014 conference, teach High Performance in public (slides online) and in private, keynote on The Real Unsolved Problems in Data Science and start my ModelInsight AI agency. That […]

Starting Spark 1.2 and PySpark (and ElasticSearch and PyPy)

The latest PySpark (1.2) is feeling genuinely useful. Late last year I had a crack at running Apache Spark 1.0 and PySpark and it felt a bit underwhelming (too much fanfare, too many bugs). The media around Spark continues to grow; for example, today’s Hacker News thread on the new DataFrame API has a lot of […]

Lightning talk at PyDataLondon for Annotate

At this week’s PyDataLondon I did a 5-minute lightning talk on the Annotate text-cleaning service for data scientists that I made live recently. It was good to have a couple of chats afterwards with others who are similarly bored of cleaning their text data. The goal is to make it quick and easy to […]

Data Science Jobs UK (ModelInsight) – Python Jobs Email List

I’ve had people asking me how they can find data scientists in London, and through our PyDataLondon meetup we’ve had members announcing jobs. There’s no central location for data science jobs so I’ve put together a new list (administered through my ModelInsight agency). Sign up to the list here: Data Science Jobs UK (ModelInsight) Aimed […]

A first approach to automatic text data cleaning

In October I gave the opening keynote at PyConIreland on The Real Unsolved Problems in Data Science. One of the topics I covered was poor-quality data; by some estimates, data cleaning occupies 50-80% of a data scientist’s time. Personally I’ve just spent the better part of last year figuring out ways to convert poorly-represented […]