Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

Allergic Rhinitis (“Why do I always sneeze?!”) research project using Machine Learning

Since April my wife (@fluffyemily) and I have been running a research project around her allergies. She sneezes all year and we’re trying to figure out the cause. Allergic Rhinitis affects 10-30% of Westerners; in Emily’s case it is all-year, so it isn’t just pollen related. We figure that a good data-collection process coupled with robust analysis might reveal some of the causes of her sneezing, so that Emily is in better control of her Rhinitis.

Emily’s a senior iOS developer with Mozilla. She wrote an open source App for her iPhone to log her sneezes, antihistamine use and interactions with “things” like animals. The App gives us a time-stamp and geolocation for each event. Since she’s mostly in London we’ve got a rich source of events to join to other datasets.

This post is just to put down a marker. I’ve made some progress using Machine Learning to predict when an antihistamine might be used. Currently I can out-predict a Dummy (majority-class) classifier over many cross-validation runs. This is hardly brilliant, but we didn’t expect diagnosing a long-term allergy to be a simple affair! Exploratory data analysis shows lots of interesting behaviours and I hope to talk about some of these in the future.
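To make "out-predicting a Dummy classifier" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic (a made-up humidity feature and a made-up "antihistamine used" label), the one-feature threshold model is a toy stand-in rather than the model used in this project, and the helper names are hypothetical. In scikit-learn the baseline would normally be `DummyClassifier(strategy="most_frequent")`.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, k=5, seed=0):
    """Return per-fold accuracy; fit(X_train, y_train) must return predict(x)."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def fit_dummy(X_train, y_train):
    """Majority-class baseline: always predict the commonest training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def fit_threshold(X_train, y_train):
    """Toy model: choose the cut-off that best separates the training labels."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(X_train)):
        acc = mean((x >= cut) == bool(t) for x, t in zip(X_train, y_train))
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda x: int(x >= best_cut)


# Synthetic data only: humidity in 30..99, label mostly 1 when humidity >= 70,
# with every tenth label flipped to act as noise.
humidity = [30 + (i * 37) % 70 for i in range(200)]
used_antihistamine = [int(h >= 70) ^ (i % 10 == 0) for i, h in enumerate(humidity)]

dummy_scores = cross_validate(humidity, used_antihistamine, fit_dummy)
model_scores = cross_validate(humidity, used_antihistamine, fit_threshold)
print("dummy mean accuracy:", round(mean(dummy_scores), 3))
print("model mean accuracy:", round(mean(model_scores), 3))
```

The point of the comparison is that a model is only interesting if its cross-validated score beats the baseline that ignores the features entirely. On real allergy data the gap is likely to be far smaller than on this deliberately clean synthetic set.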

We’ve tried (and so far rejected) airborne particulates as a reason for her allergies, using King’s College London’s LondonAir data (thanks!). Weather data from a local wunderground station is more promising (Emily seems to be a little sensitive to humidity and windspeed). I’ve recently started work on MyFitnessPal logged data (the Python 3.4 port was thankfully easy) to start to look at alcohol (a known histamine modifier) and possibly other food.
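Joining time-stamped events to a weather feed usually means attaching the most recent reading at or before each event. The sketch below shows one way to do that with the standard library. The rows, field names (`humidity`, `wind_kph`) and the one-hour staleness limit are all made up for illustration and are not the project's real schema; pandas users would probably reach for `pandas.merge_asof` instead.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def asof_join(events, readings, max_age=timedelta(hours=1)):
    """Attach to each event the latest reading at or before it.

    events:   list of (timestamp, payload) tuples
    readings: list of (timestamp, measurements) tuples, sorted by timestamp
    A reading older than max_age is treated as missing (None).
    """
    reading_times = [t for t, _ in readings]
    joined = []
    for when, payload in events:
        # Index of the last reading whose timestamp is <= the event time.
        pos = bisect_right(reading_times, when) - 1
        if pos >= 0 and when - reading_times[pos] <= max_age:
            joined.append((when, payload, readings[pos][1]))
        else:
            joined.append((when, payload, None))
    return joined


# Made-up example rows (not real data).
weather = [
    (datetime(2015, 11, 1, 9, 0), {"humidity": 81, "wind_kph": 12}),
    (datetime(2015, 11, 1, 9, 30), {"humidity": 78, "wind_kph": 15}),
    (datetime(2015, 11, 1, 10, 0), {"humidity": 74, "wind_kph": 19}),
]
events = [
    (datetime(2015, 11, 1, 9, 42), "sneeze"),
    (datetime(2015, 11, 1, 10, 5), "antihistamine"),
    (datetime(2015, 11, 1, 14, 0), "sneeze"),
]

joined = asof_join(events, weather)
for when, what, reading in joined:
    print(when.isoformat(), what, reading)
```

Only looking backwards in time matters here: a reading taken after the sneeze can't have contributed to it, and letting it into the features would leak future information into the model.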

Behind the scenes I’ve got a collaborative group (thanks Frank and Giles!) in Slack and a private github repo, and I plan to talk a little on how this works. I think talking about ways we can collaborate on research projects has value; anything that helps us move on from just working in an office seems like a good idea.

If you’re interested in hearing updates about this project and maybe getting involved to log your own allergy data, join this email announce list. Your email will be kept private; I’ll just send you an email every now and again when we’ve made some progress (which will probably appear here) and when we need volunteers.

Ultimately we’d like to help predict the causes of allergies for other folk. We’ve been talking about this for around 2 years, and it is encouraging to see research like this pointing to the use of ML to predict and model the body’s behaviours.


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

Announcing PyDataLondon 2016 (May 6-8th)

We’re very happy to announce that Bloomberg will host us a second time for PyDataLondon 2016 (our 3rd annual conference). We’ll run the conference over May 6-8th (a tutorial day and 2 conference days as last time) with approximately 330 people in attendance. The location is Central London – near Bank underground station and London Bridge.

Our PyDataLondon meetup community has grown amazingly in the last year: we’ve almost doubled in size to 2,500+ members with 200 in the room each month. We’ve had 19 events in almost 2 years, mostly around Python (some with R, Julia and Matlab), mostly on data science (and stats, visualisation and high performance) and all with a lovely collaborative audience.

The conference Call for Proposals will be opened very soon (in a week or two). If you’d like to speak in front of 330 active data scientists in London’s most active data science community, get thinking on your topic. We’re interested in data science topics, mostly around Python (but we’re cool with other tech and theory). Extra attention will be paid to talks offering real-world stories (for both success and failure – all lessons are equally useful).

Sign up to this email announce list to be kept in the loop; I’ll write a couple of mails when the CfP is open and as the conference plans develop.

If you’ve not been to one of our conferences before, check out my write-ups from 2015 and 2014.

If you’re hiring or you have a relevant product – think on sponsoring. We expect to sell all of our spots this year due to increased demand for strong data scientists – if you’d like to have a prime spot in the central room (all the talk-rooms hang off of the central room so sponsors are in the thick of it), do get in contact.

You might also be interested in PyDataAmsterdam on March 12-13th (their Call for Proposals is already open).



“Data Science Delivered” (a collection of notes on getting stuff shipped)

Over the last year I’ve given a collection of keynotes and talks around shipping and supporting data science products with Python. I’ve started to gather up my notes into a document. They’re hosted on github as Data Science Delivered and currently it’s around 5 pages of A4. I put the rough form together after my last keynote of the year in Budapest.

Right now it has notes on how to approach a new project, ways of dealing with bad data, ways to ship working products and ways projects might get sunk.

I’m slowly going to add to this list. I think the rough structure is in place and there’s a lot of detail to add. If you’re interested in getting updates then add your email here and I’ll mail you on occasion when I’ve added a new chunk of information.



“Featherweight” data science API to publish Python functions on the web

One of the challenges I’ve encountered when coaching data science teams in smaller organisations is the difficulty of publishing proof-of-concept data science products via web calls, when the team doesn’t know anything about web programming. My preference is to use Flask (and flask-restful and maybe Swagger docs) but that’s an awful lot of learning to put onto a non-engineering researcher to help them publish code that another team can consume.

I’ve prototyped “featherweight” as a very simple solution to this problem. Behind the scenes Flask is used to publish your function(s) on a local server. You can then call the function with standard GET requests and key/value arguments (e.g. via cURL or a web browser or the requests module) and get a block of JSON that wraps whatever results your function returned.

The goal is to make it super-easy for a non-engineering researcher to take their Python function or method and to publish it on a web API, without knowing anything about web programming. Examples on github include publishing a simple math function and publishing scikit-learn’s Iris classifier.
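The general idea of wrapping a function so it answers GET requests with JSON can be sketched as below. This is not featherweight's actual API and it does not use Flask; it swaps in the standard library's WSGI support so that it runs with no installs. The `publish`, `call` and `serve` names, the float-only argument conversion and the `{"success": ..., "result": ...}` response shape are all assumptions made for this example.

```python
import json
from urllib.parse import parse_qsl
from wsgiref.simple_server import make_server


def publish(func):
    """Wrap a plain Python function as a WSGI app that replies in JSON."""
    def app(environ, start_response):
        try:
            # Query-string values arrive as text; this sketch treats them
            # all as floats, which is an assumption, not a general rule.
            pairs = parse_qsl(environ.get("QUERY_STRING", ""))
            kwargs = {key: float(value) for key, value in pairs}
            body, status = {"success": True, "result": func(**kwargs)}, "200 OK"
        except (TypeError, ValueError) as err:
            body, status = {"success": False, "error": str(err)}, "400 Bad Request"
        payload = json.dumps(body).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]
    return app


def call(app, query_string):
    """Invoke the WSGI app in-process, as a GET request would, for quick checks."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_string}
    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


def serve(app, host="127.0.0.1", port=8000):
    """Run a single-threaded development server (blocks until interrupted)."""
    make_server(host, port, app).serve_forever()


def add(a, b):
    """The researcher's function: no web code in sight."""
    return a + b


app = publish(add)
print(call(app, "a=1.5&b=2"))
```

Calling `serve(app)` would then let a browser, cURL or the requests module hit `http://127.0.0.1:8000/?a=1.5&b=2`. The researcher only writes `add`; all of the web plumbing lives in the wrapper, which is the property the post is after.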

Whilst this API won’t solve production use-cases (it is single-threaded, it doesn’t do any clever logging, there’s no additional security) it will solve proof-of-concept and dev-level usage. It also opens the door to moving from Featherweight to a custom Flask interface. Feedback happily received!



Opening Plenary at BudapestBI Forum 2015

I’ve just given my final talk for the year – I’m “at my other home” in Budapest (I’m half-Hungarian) and have had the honour of opening Bence and team’s BudapestBI Forum 2015. This conference has both an open-source-day and (tomorrow) an enterprise-day, all around analytics and with lots of Python and R.

This talk is an iteration of my previous Shipping talks. It is backed in part by results from our latest PyDataLondon survey, sent to 2,000 members, where we asked about member frustrations. I’ve integrated some of the results into this talk:

Shipping Data Science Products
(source)

Here are my slides:

In the room we had roughly 2/3 ‘engineers/builders’ and 1/3 ‘researchers/analysts’, and it seems that Python and R are used by a large number of folk here today.

I also ‘released’ a set of my notes that I’ve tentatively entitled “Data Science Delivered”. This is a github doc with a series of the notes that I wish I’d learned years ago. Right now these notes are super-rough, but I figure “release early, release often” will help me refine these.

It is based in part on my talking, teaching and coaching over the last couple of years. I intend to add more in the next couple of weeks (so hopefully by November 2015 it’ll be far less rough!) and I’d like to add some Notebooks as examples. You’re welcome to post bugs/requests and I’ll try to add notes if I know about those areas. Please feel free to share some of your experiences (via @ianozsvald, via email, via Bugs etc).

