About


This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, a Pythonista, co-founder of ShowMeDo and also a Londoner. Here's a little more about me.




13 May 2015 - 16:42 Data Science Deployed – Opening Keynote for PyConSE 2015

I’ve just had a fab couple of days at PyConSE in Stockholm, I really enjoyed giving the opening keynote (thanks!) and attending two days of interesting talks. The Saturday was packed with data science talks (see below), it felt like a mini PyData or EuroSciPy, most cool!

The goal of my talk was to show use-cases for why you should do data science, why it is valuable, how to do it successfully with Python and how to get the data products deployed. The whole shebang in 40 minutes. Tools mentioned include scikit-learn, statsmodels, textract, pandas, matplotlib, seaborn, bokeh, IPython and Notebooks, Spyder, PyCharm, Flask and Spyre.

Sidenote – this is the follow-on to my “The Real Unsolved Problems in Data Science” opening keynote at PyConIreland 2014.

My main points seemed to make it through, phew!

What I take from @ianozsvald talk:
“How can i turn our data into business value?”
“Log everything!”
Think + hypothesize + test @pythse

Exploiting your data is key to staying relevant in your business! Listening to @ianozsvald at #pyconse @scalior

Note – I’ll be updating this write-up a little over the next couple of days (it is the end of the conf and I’m rather shattered right now!).

The slides and video for my Data Science Deployed talk are below:

I’d like to acknowledge Ollie Glass along with Ferenc Huszár (Balderton) and Thomas Stone (Prediction.io) for feedback on early ideas for my talk – cheers gents!

I also plugged PyDataBerlin, our upcoming PyDataLondon (June 19-21, CfP open for just 1 more week) and EuroSciPy on stage, hopefully we’ll see a few more international visitors. I should have plugged PyConUK too, as there’s now a Science Track!

The following talks from yesterday will interest you, I hope the videos come online soon:

  • Analyzing data with Pandas
  • Data processing and machine learning with Python (slides)
  • Deep Learning and Deep Data Science
  • Hacking Human Language
  • IPython: How a notebook is changing science
  • The Hitchhikers Guide to Python

Here’s a couple of extra links that might be interesting:

Here’s Ilian Iliev’s review of the conference too.

I have a vague idea to write up these topics more in the future, I’m calling this Building Data Science Products with Python. There’s a mailing list, I’ll email a few questions over the coming months to figure out if/how I should write this.

Thanks everyone for a lovely conference!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

14 Comments | Tags: Life, pydata, Python

7 May 2015 - 20:22 12th PyDataLondon meetup at AHL

We’ve just had our 12th meetup – we’re fully a year old, we’ve nearly 1,500 members and now we’re planning our second conference (the Call for Proposals is open for just another 10 days!). Python Data Science has grown crazily popular in the last couple of years!

Here’s a photo from last week’s meetup, that’s over 220 people at our new host hedge-fund AHL (they’re hiring):

IMG_20150505_190654

Our two speakers were:

  • Slavi Marinov talking on using gensim for topic classification for financial prediction
  • Lasse Bohling talking on using statistics for football prediction at footballradar.com

Slides are linked in the meetup comments. We’ll take a break for a month to run the conference (June 19-21), then we’ll pick up again in July.



5 Comments | Tags: pydata, Python

3 May 2015 - 15:08 “#talkpay” tweet salary visualisation

This weekend the #talkpay tag has shown people outing their salaries, to democratise some of this information. This provides some interesting data for visualisation. If you’re curious about a discussion around salary data then @patio11’s blog entry is a good starting point.

@echen grabbed some of the data, I took a copy of the online sheet and made the following code to visualise the salaries. This is a very simplistic analysis, it is mostly US data, there’s no filtering for location (you’d expect San Francisco to pay significantly more than many other US cities).

First, here’s a histogram of the majority of the salaries listed (ignoring the top 9, which go up to $1.1 million and distort the plot):
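The plot and the code behind it aren't reproduced in this text, so here is a minimal sketch of the step described. It assumes the salaries are already loaded as plain numbers. The values below are invented for illustration and are not the #talkpay data, and `drop_top_n` and `plot_histogram` are hypothetical helper names. The idea is to drop the largest few values before binning, so a handful of outliers doesn't stretch the x-axis.

```python
import numpy as np

# Hypothetical salaries in USD; illustrative values only, not the real #talkpay data.
salaries = np.array([45_000, 80_000, 80_000, 95_000, 120_000, 120_000,
                     150_000, 200_000, 250_000, 1_100_000])


def drop_top_n(values, n):
    """Return the values sorted ascending with the n largest removed."""
    ordered = np.sort(values)
    return ordered[:len(ordered) - n] if n > 0 else ordered


# The post ignores the top 9 of its rows; with 10 toy rows we drop just 1.
trimmed = drop_top_n(salaries, n=1)

# Bin the remaining salaries into $50k-wide buckets from $0 to $300k.
bin_edges = np.arange(0, 300_001, 50_000)
counts, edges = np.histogram(trimmed, bins=bin_edges)


def plot_histogram(values, bins):
    """Draw the histogram with matplotlib (imported lazily, only when plotting)."""
    import matplotlib.pyplot as plt
    plt.hist(values, bins=bins)
    plt.xlabel("Salary (USD)")
    plt.ylabel("Number of reports")
    plt.show()
```

Calling `plot_histogram(trimmed, bin_edges)` would draw the chart. The bin width is a judgment call, and narrower bins would be needed to see peaks at specific salary values.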

Next we can filter by some text terms, here’s a similar histogram for software developers. Note the interesting peaks at $80k and $120k, then smaller but obvious bumps at $150k, $200k and $250k:
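Filtering by text terms can be sketched as below. This assumes each row pairs a free-text job title with a salary. The rows, the search terms and the `filter_by_terms` helper are all hypothetical, as the post doesn't say which terms were actually used.

```python
# Hypothetical (job title, salary in USD) rows; not the real #talkpay data.
rows = [
    ("Software Developer", 120_000),
    ("Senior software engineer", 150_000),
    ("High school teacher", 48_000),
    ("Web developer", 80_000),
    ("Teacher", 52_000),
]


def filter_by_terms(rows, terms):
    """Keep rows whose title contains any of the terms, ignoring case."""
    lowered = [term.lower() for term in terms]
    return [
        (title, salary)
        for title, salary in rows
        if any(term in title.lower() for term in lowered)
    ]


# The choice of terms is an assumption made for this example.
developer_rows = filter_by_terms(rows, ["developer", "engineer"])
teacher_rows = filter_by_terms(rows, ["teacher"])

developer_salaries = [salary for _, salary in developer_rows]
teacher_salaries = [salary for _, salary in teacher_rows]
```

Each filtered list of salaries can then be binned and plotted in the same way as the full set. With the data in a pandas DataFrame, `Series.str.contains` would do the same job.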

There’s much less data for teachers but you can get an idea of the difference in likely salaries:

Finally we can plot a normed (summed to 1.0) cumulative histogram, you can think of the data as probabilities to get an idea of the proportion of people who earn less/more than a certain amount:
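A normed cumulative histogram is just the running total of the bin counts divided by the overall count, so the final bin reaches 1.0. The sketch below shows that calculation on invented salaries (again, not the real data).

```python
import numpy as np

# Hypothetical salaries in USD; illustrative values only.
salaries = np.array([45_000, 80_000, 80_000, 95_000,
                     120_000, 120_000, 150_000, 200_000])

# $50k-wide bins from $0 to $250k.
bin_edges = np.arange(0, 250_001, 50_000)
counts, edges = np.histogram(salaries, bins=bin_edges)

# Running total of counts, scaled so the last value is 1.0.
cumulative = np.cumsum(counts) / counts.sum()

# cumulative[i] is the share of people earning less than edges[i + 1]
# (the final bin also includes values equal to its upper edge).
share_below_100k = cumulative[1]
```

In this toy data half of the reports fall below $100k. matplotlib's `hist` can draw the same curve directly via its `cumulative=True` and `density=True` arguments (older releases used `normed=True`).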

It is worth remembering that the data is thin, just 800 samples, it is also self-reported so most of the reports will be from people who are confident in being public. It is likely that the true distribution of salaries is lower, as people who aren’t confident are less likely to publish.



15 Comments | Tags: pydata, Python

2 May 2015 - 19:56 PyDataLondon Conference 2015 Call for Proposals now OPEN (yay!) for June 19-21

PyDataLondon 2015 will take place June 19-21 at Bloomberg’s HQ in Central London, we’ll have 300 people, multiple tracks and a very solid set of speakers and teachers. You should come. You should probably speak and share your knowledge. In fact – you should submit a talk to our Call for Proposals, it opens this weekend and closes May 18th. So You Don’t Have Long!

We have a set of Themes for the talks:

  • Medical and Bioinformatics
  • Tools (libraries, IDEs, hardware – whatever feels like a tool)
  • FinTech and Economics
  • Ecommerce and AdTech
  • Other goodies (including Art, Open Data, Data Journalism, NGOs, Gaming, IoTs and Robotics – but open to whatever you think is going to be interesting)

The first three topics are definitely of interest to companies in London, Tooling is important to everyone and the “Other goodies” theme is the catch-all for stuff that’s of interest beyond the normal body of companies we know about. The CfP is only open for less than 3 weeks so don’t hang around! Get a title and short abstract down on paper first and then you can fill in the rest online easily enough.

This conference builds upon the PyDataLondon 2014 Conference, we had 200 people at the top of Canary Wharf last year. This year we’ll be 50% bigger and in the centre of London. You want to come along!

Please forward this around to people who will find it interesting! We’re keen to have an even wider community than our usual 1,400 PyDataLondon meetup members, we’re friendly for non-Python talks (data science is our focus) and we’d love submissions from people around R, SAS, Julia, Hadoop and the like. Our CfP review committee is 50% female, 50% male, more industrial than academic and they’re all deeply active in the field. We want speakers covering beginner, intermediate and expert data science topics, don’t hold off if you’ve never spoken before, we’d love for you to get involved.

If you’re hiring then you’ll probably want to sponsor – we’ve already closed the first few sponsorship slots and the next set are under discussion so you should get in touch quickly. By sponsoring you’ll be visible to our 300 world-class actively-practising data scientists and you’ll get to meet the creative academic minds and active businesses in our London data science community. Seriously, you should sponsor and get involved, don’t hang around or you’ll be left with that little table at the end of the corridor and you don’t want that!

If you’re interested in the above then you might also be interested in PyConSweden (May 12-13) – I’m giving the Opening Keynote on Data Science Deployed (it’ll be written up here later) and there’s a set of very nice data science talks in the schedule. Very shortly after we’ll have PyDataBerlin on May 29-30 in the heart of Berlin, go grab your tickets before they sell out.

Even if you can’t make our conferences, do please join our monthly PyDataLondon meetup and get involved in our very active community. You’ll find slides from past presenters in the Comments for each of the meetups.



31 Comments | Tags: Data science, pydata, Python