About

Ian Ozsvald picture

This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

High Performance Python book with O'Reilly View Ian Ozsvald's profile on LinkedIn Visit Ian Ozsvald's data science consulting business Protecting your bits. Open Rights Group

11 October 2014 - 16:18My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

I’ve just given the opening keynote here at PyConIreland 2014 – many thanks to the organisers for letting me get on stage. This is based on 15 years experience running my own consultancies in Data Science and Artificial Intelligence. (Small note  – with the pic below James mis-tweeted ‘sexist’ instead of ‘sexiest’ (from my opening slide) <sigh>)

 

The slides for “The Real Unsolved Problems in Data Science” are available on speakerdeck, I’ll embed them with the video here when it is available. I wrote this for the more engineering-focused PyConIreland audience. These are the high level points, I did rather fill my hour:

  • Data Science is driven by companies needing new differentiation tactics (not by ‘big data’)
  • Problem 1 – People asking for too-complex stuff that’s not really feasible (‘magic’)
  • Problem 2 – Lack of statistical education for engineers – do go statistics courses!
  • Problem 3 – Dirty data is a huge cost – think about doing a Data Audit
  • Problem 4 – We need higher-level data cleaning APIs that understand human-level data (rather than numbers, strings and bools!) – much work is required here
  • Problem 5 – Visualisation with Python still hard and clunky, has a poor on-boarding experience for new users (and R does well here)
  • Problem 6 – Lots of go-faster/high-performance options but really Python should ‘handle this for us’ (and yes, I have written a book on this)
  • Problem 7 – Lack of shared vocabulary for statisticians & engineers
  • Problem 8 – Heterogeneous storage world is mostly non-Python (at least for high performance work), we need a “LAMP Stack for Data Science”
  • Problem 9 – Collaboration is still painful (but the IPython Notebook is improving this)
  • Problem 10 – We’re still building the same tools over and over (but the Notebook makes it easier) - we could do with some shared tools here
  • Linked Open Data is very useful and you should contribute to it and consume it
  • Our common tooling in Python is very powerful – please join numpy and scipy projects and contribute to the core
  • I noted a few times that the Python science stack works in Python 3 so you should just use Python 3.4+ for all new projects
  • PyData/EuroSciPy/SciPy/DataKind meetups are a great way to get involved
  • We need a “Design Patterns for Data Science with Python” book (and I want to know what you want to learn)

From discussions afterwards it seems that my message “you need clean data to do neat data science stuff” was well received. I’m certainly not the only person in the room battling with Unicode foolishness (not in Python of course as Python 3+ solves the Unicode problem :-).


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: High Performance Python Book, pydata, Python

5 September 2014 - 12:20Fourth PyDataLondon Meetup

We’ve just run our 4th PyDataLondon meetup (@PyDataLondon). Having over 500 members is superb for just 4 months growth, woot :-)

Many thanks to @GoPivotalEMEA for hosting us.

We had 3 speakers and 1 lightning talk.

Here are my slides on “The High Performance Python Landscape”:

I’m still collecting data for my two surveys (to discuss at a future PyData when I’ve got enough data), one on Data Science training needs and one on Why Are More Companies Not Using Data Science?

Next Dirk spoke on Data for Good and datamining water sources in Tanzania including some very honest thoughts on how to (hopefully) leave behind working systems that local teams can maintain. Dirk’s talk is built on a project called Taarifa (and source) that our Florian helped build.

Finally we had Matt from Plot.ly over from San Francisco, he gave very compelling reasons to investigate the online visualisation (and data sharing) system for plot.ly. The matplotlib 1-line converter was particularly nice.

Tariq gave a lightning talk on his Make Your Own Mandelbrot book (aimed at kids and newbiews to Python), his slides are online.

We’ve got a growing collection of Offer/Want cards which help connect folk in the pub afterwards, we’ll keep building these up:

Our next event is on October 7th, be sure to follow the @pydatalondon twitter account and join the PyDataLondon meetup group to see the forthcoming announce. The RSVPs for the 4th event were filled in 2 hours of the general announce!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: pydata, Python

30 August 2014 - 12:06Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:

IMG_20140828_155857

We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video in here once it is released, the slides are below.

I also got to do a book-signing for our High Performance Python book (co-authored with Micha Gorelick), O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science, please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Life, pydata, Python

26 August 2014 - 21:35Why are technical companies not using data science?

Here’s a quick question. How come more technical companies aren’t making use of data science? By “technical” I mean any company with data and the smarts to spot that it has value, by “data science” I mean any technical means to exploit this data for financial gain (e.g. visualisation to guide decisions, machine learning, prediction).

I’m guessing that it comes down to an economic question – either it isn’t as valuable as some other activity (making mobile apps? improving UX on the website? paid marketing? expanding sales to new territories?) or it is perceived as being valuable but cannot be exploited (maybe due to lack of skills and training or data problems).

I’m thinking about this for my upcoming keynote at PyConIreland, would you please give me some feedback in the survey below (no sign-up required)?

To be clear – this is an anonymous survey, I’ll have no idea who gives the answers.

Create your free online surveys with SurveyMonkey , the world’s leading questionnaire tool.

 

If the above is interesting then note that we’ve got a data science training list where we make occasional announcements about our upcoming training and we have two upcoming training courses. We also discuss these topics at our PyDataLondon meetups. I also have a slightly longer survey (it’ll take you 2 minutes, no sign-up required), I’ll be discussing these results at the next PyDataLondon so please share your thoughts.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, Data science, pydata, Python

20 August 2014 - 21:24Data Science Training Survey

I’ve put together a short survey to figure out what’s needed for Python-based Data Science training in the UK. If you want to be trained in strong data science, analysis and engineering skills please complete the survey, it doesn’t need any sign-up and will take just a couple of minutes. I’ll share the results at the next PyDataLondon meetup.

If you want training you probably want to be on our training announce list, this is a low volume list (run by MailChimp) where we announce upcoming dates and suggest topics that you might want training around. You can unsubscribe at any time.

I’ve written about the current two courses that run in October through ModelInsight, one focuses on improving skills around data science using Python (including numpy, scipy and TDD), the second on high performance Python (I’ve now finished writing O’Reilly’s High Performance Python book). Both courses focus on practical skills, you’ll walk away with working systems and a stronger understanding of key Python skills. Your developer skills will be stronger as will your debugging skills, in the longer run you’ll develop stronger software with fewer defects.

If you want to talk about this, come have a chat at the next PyData London meetup or in the pub after.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, pydata, Python

8 August 2014 - 17:59PyDataLondon 3rd event

This week we had our 3rd PyDataLondon meetup (@PyDataLondon), this builds on our 2nd event. We’re really happy to see the group grow to over 400 members, co-org Emlyn made a plot (see below) of our linear growth.

Our main speakers:

  • Andrew Clegg (chief Data Scientist at Pearson Publishing in London) spoke on his Snake Charmer vagrant distribution of common Python science packages. They use it to quickly run new experiments using disposable virtual machines. Andrew’s slides are online along with his IPython Notebook
  • Maria Rosario Mestre gave an introduction to Apache Spark based on recent usage at Skimlinks, the story was useful as it covered both pros and cons. We learned that Python is (currently) a second-class citizen, the API in general is rapidly evolving and debugging info is hard to come by – it feels not really ready for production usage (unless you want to put in additional hours). Slides here
  • Emlyn Clay gave a lightning talk debunking the ‘brain machine interface’. Slides here
  • I gave a lightning talk on my IPython Memory Usage Analyzer, slides here

Andrew’s talk gave a live demo of reading live wikipedia edit data and visualising, having rolled out a new environment using vagrant. This environment can be deleted and rebuilt easily allowing many local environments using entirely separate virtual box distributions:

Emlyn extracted the dates when each member joined the PyDataLondon meetup group, using this he plotted a cumulative growth chart. It looks rather like we have some growth ahead of us :-) The initial growth is after we announced the group at the start of May, a few months after our first conference. You can see some steps in the graph, that occurs during the run-up to each new event:

growth_image

Emlyn announced the growth during our new ‘news segment’, he showed textract as his module of the month. Please humour him and feed us some new news for next month’s event :-) I also got to announce that my High Performance Python book is days away from going to the publisher after 11 months work – yay! We also discussed Kim’s S2D2 (in the news) and the new Project Jupyter.

We ran the “want & need” card experiment to build on last month’s experiment, this enabled some of us to meet just-the-right-people in the pub after to swap helpful notes:

Finally I also announced the upcoming training courses that my ModelInsight will be running October, there’s a blog post here detailing the Intro to Data Science and High Performance Python courses (or sign-up to our low-volume announce list).


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, pydata, Python

4 July 2014 - 16:08Second PyDataLondon Meetup a Javascript/Analystic-tastic event

This week we ran our 2nd PyDataLondon meetup (@PyDataLondon), we had 70 in the room and a rather techy set of talks. As before we hosted by Pivotal (@gopivotal) via Ian – many thanks for the beer and pizza! I took everyone to the pub after for a beer on out  data science consultancy to help get everyone talking.

UPDATE As of October 2014 I’ll be teaching Data Science and High Performance Python in London, sign-up here if you’d like to be notified of our courses (no spam, just occasional notes about our plans).

As a point of admin – we’re very happy that people who were RSVPd but couldn’t make it were able to unRSVP to free up spots for those on the waitlist. This really helps with predicting the number of attendees (which we need for beer & pizza estimates) so we can get in everyone who wants to attend.

We’re now looking for speakers for our 3rd event – please get in contact via the meetup group.

First up we had Kyran Dale (my old co-founder in our ShowMeDo educational site) talking around his consulting speciality of JavaScript and Python, he covered ways to get started including ways to export Pandas data into D3 with example code, JavaScript pitfalls and linting in “Getting your Python data into the Browser“:

 

Next we had Laurie Clark-Michalek talking on “Day of the Ancient 2 Game Analysis using Python“, Laurie went low-level into Cython with profiling via gprof2dot (which incidently we cover in our HPC book) and gave some insight into the professional game-play and analysis world:

We then had 2 lightning talks:

We finished with a small experiment – I brought a set of cards and people filled in a list of problems they’d like to discuss and skills they could share. Here’s the set, we’ll run this experiment next month (and iterate, having learned a little from this one). In the pub after I had a couple of nice chats from my ‘want’ (around “company name cleaning” from free-text sources):

Topics listed on the cards included Apache Spark, network analysis, numpy, facial recognition, geospatial and a job post. I expect we’ll grow this idea over the next few events.

Please get in contact via the meetup group if you’d like to speak, next month we have a talk on a new data science platform. The event will be on Tues August 5th at the same location.

I’ll be out at EuroPython & PyDataBerlin later this month, I hope to see some of you there. EuroSciPy is in Cambridge this year in August.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: High Performance Python Book, Life, pydata, Python

26 June 2014 - 14:08PyDataLondon second meetup (July 1st)

Our second PyDataLondon meetup will be running on Tuesday July 1st at Pivotal in Shoreditch. The announce went out to the meetup group and the event was at capacity within 7 hours – if you’d like to attend future meetups please join the group (and the wait-list is open for our next event). Our speakers:

  1. Kyran Dale on “Getting your Python data onto a Browser” – Python+javascript from ex-academic turned Brighton-based freelance Javascript Pythonic whiz
  2. Laurie Clark-Michalek – “Defence of the Ancients Analysis: Using Python to provide insight into professional DOTA2 matches” – game analysis using the full range of Python tools from data munging, high performance with Cython and visualisation

We’ll also have several lightning talks, these are described on the meetup page.

We’re open to submissions for future talks and lightning talks, please send us an email via the meetup group (and we might have room for 1 more lightning talk for the upcoming pydata – get in contact if you’ve something interesting to present in 5 minutes).

Some other events might interest you – Brighton has a Data Visualisation event and recently Yves Hilpisch ran a QuantFinance training session and the slides are available. Also remember PyDataBerlin in July and EuroSciPy in Cambridge in August.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, Life, pydata, Python

4 June 2014 - 22:30First PyDataLondon meetup done, preparing the second

Last night we ran our first PyDataLondon meetup (@PyDataLondon). We had 80 data-focused Pythonistas in the room, co-organiser Emlyn lead the talks followed by a great set of Lightning Talks. Pivotal provided a cool venue (thanks Ian Huston!) with lovely pizza and beer in central Shoreditch – we’re much obliged to you. This was a grand first event and we look forward to running the next set this summer. Our ModelInsight got to sponsor the beers for everyone after, it was lovely to see everyone in the pub – helping to bind our young community is one of our goals for this summer.

Emlyn opened with a discussion on “MATLAB and Python for Life Sciences” covering syntax similarities, ways to port MATLAB libraries to Python and hardware interfacing:

pydatalondon_20140605_emlyn

After the break we had a wide range of lightning talks:

Here’s Jacqui talking on Viz using Python and D3 and introducing her part in the new Data Journalism book:

pydatalondon_20140605_jacqui

During the night I asked some questions of the audience. We had a room of mostly active Python users (mainly beginner or intermediate), the majority worked with data science on a weekly basis, almost all using Python 2 (not 3). 6 used R, 2 used MATLAB and 1 used Julia (and I’m still hoping to learn about Julia). A part of the reason for the question is that I’m interested in learning who needs what in our new community, I’m planning on re-running my 2 day High Performance Python tutorial in London in a couple of months and we aim to run an introduction to data science using Python too (mail me if you want to know more).

We’re looking for talk proposals for next month and the month after along with lightning talk proposals – either mail me or post via the meetup group (but do it quick).

I totally failed to remind everyone about the upcoming PyDataBerlin conference in Berlin in July, it runs inside EuroPython at the same venue (so come and stay all week, a bunch of us are!). I also forgot to announce EuroSciPy which runs here in Cambridge in August, you should definitely come to that too, I believe I’m teaching more High Performance Python.

The next event will be held on July 1st at the same location, keep an eye on the meetup group for details. I’m hoping next time to maybe put forward a Lightning Talk around my High Performance Python book as hopefully it’ll be mostly finished by then.

Thanks to my co-organisers Emlyn and Cecilia (and Florian – get well soon)!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Life, pydata, Python

26 April 2014 - 21:09PyDataLondon Meetup Number 1 (June – JS, NLP, Kinects)

Our first PyData London Meetup will occur on June 3rd at Pivotal in Shoreditch. At our first night we’ll have:

  • Pete Passaro talking on javascript and natural language processing using Python
  • Emlyn Clay on Matlab and Python for science
  • Chipp Jansen on auto-sculpting with a Kinect
  • Ian Huston on Pivotal’s open source tools (I had no idea they supported Redis and so many other tools!)
  • <1 more, TBC – submit an idea if you’re interested>

After this we’ll head to a local pub. Details will be announced via @pydatalondon. I haven’t run an event since the Five Pound App nights down in Brighton, I’m rather looking forward to getting data-focused Pythonistas together for interesting talks and good beer :-)

This new meetup will occur every month, it builds on the success of our PyData London conference back in February, we had over 200 people and a lot of rather superb presentations. My time to run the PyData London meetup is supported by my new ModelInsight Data Science consultancy.

You might also be interested in Yves’ new Python for Quant Finance meetup.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

2 Comments | Tags: pydata, Python