About

Ian Ozsvald picture

This is Ian Ozsvald's blog, I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

High Performance Python book with O'Reilly View Ian Ozsvald's profile on LinkedIn Visit Ian Ozsvald's data science consulting business Protecting your bits. Open Rights Group

30 August 2014 - 12:06Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:

IMG_20140828_155857

We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video in here once it is released, the slides are below.

I also got to do a book-signing for our High Performance Python book (co-authored with Micha Gorelick), O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science, please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Life, pydata, Python

26 August 2014 - 21:35Why are technical companies not using data science?

Here’s a quick question. How come more technical companies aren’t making use of data science? By “technical” I mean any company with data and the smarts to spot that it has value, by “data science” I mean any technical means to exploit this data for financial gain (e.g. visualisation to guide decisions, machine learning, prediction).

I’m guessing that it comes down to an economic question – either it isn’t as valuable as some other activity (making mobile apps? improving UX on the website? paid marketing? expanding sales to new territories?) or it is perceived as being valuable but cannot be exploited (maybe due to lack of skills and training or data problems).

I’m thinking about this for my upcoming keynote at PyConIreland, would you please give me some feedback in the survey below (no sign-up required)?

To be clear – this is an anonymous survey, I’ll have no idea who gives the answers.

Create your free online surveys with SurveyMonkey , the world’s leading questionnaire tool.

 

If the above is interesting then note that we’ve got a data science training list where we make occasional announcements about our upcoming training and we have two upcoming training courses. We also discuss these topics at our PyDataLondon meetups. I also have a slightly longer survey (it’ll take you 2 minutes, no sign-up required), I’ll be discussing these results at the next PyDataLondon so please share your thoughts.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, Data science, pydata, Python

20 August 2014 - 21:24Data Science Training Survey

I’ve put together a short survey to figure out what’s needed for Python-based Data Science training in the UK. If you want to be trained in strong data science, analysis and engineering skills please complete the survey, it doesn’t need any sign-up and will take just a couple of minutes. I’ll share the results at the next PyDataLondon meetup.

If you want training you probably want to be on our training announce list, this is a low volume list (run by MailChimp) where we announce upcoming dates and suggest topics that you might want training around. You can unsubscribe at any time.

I’ve written about the current two courses that run in October through ModelInsight, one focuses on improving skills around data science using Python (including numpy, scipy and TDD), the second on high performance Python (I’ve now finished writing O’Reilly’s High Performance Python book). Both courses focus on practical skills, you’ll walk away with working systems and a stronger understanding of key Python skills. Your developer skills will be stronger as will your debugging skills, in the longer run you’ll develop stronger software with fewer defects.

If you want to talk about this, come have a chat at the next PyData London meetup or in the pub after.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, pydata, Python

8 August 2014 - 17:59PyDataLondon 3rd event

This week we had our 3rd PyDataLondon meetup (@PyDataLondon), this builds on our 2nd event. We’re really happy to see the group grow to over 400 members, co-org Emlyn made a plot (see below) of our linear growth.

Our main speakers:

  • Andrew Clegg (chief Data Scientist at Pearson Publishing in London) spoke on his Snake Charmer vagrant distribution of common Python science packages. They use it to quickly run new experiments using disposable virtual machines. Andrew’s slides are online along with his IPython Notebook
  • Maria Rosario Mestre gave an introduction to Apache Spark based on recent usage at Skimlinks, the story was useful as it covered both pros and cons. We learned that Python is (currently) a second-class citizen, the API in general is rapidly evolving and debugging info is hard to come by – it feels not really ready for production usage (unless you want to put in additional hours). Slides here
  • Emlyn Clay gave a lightning talk debunking the ‘brain machine interface’. Slides here
  • I gave a lightning talk on my IPython Memory Usage Analyzer, slides here

Andrew’s talk gave a live demo of reading live wikipedia edit data and visualising, having rolled out a new environment using vagrant. This environment can be deleted and rebuilt easily allowing many local environments using entirely separate virtual box distributions:

Emlyn extracted the dates when each member joined the PyDataLondon meetup group, using this he plotted a cumulative growth chart. It looks rather like we have some growth ahead of us :-) The initial growth is after we announced the group at the start of May, a few months after our first conference. You can see some steps in the graph, that occurs during the run-up to each new event:

growth_image

Emlyn announced the growth during our new ‘news segment’, he showed textract as his module of the month. Please humour him and feed us some new news for next month’s event :-) I also got to announce that my High Performance Python book is days away from going to the publisher after 11 months work – yay! We also discussed Kim’s S2D2 (in the news) and the new Project Jupyter.

We ran the “want & need” card experiment to build on last month’s experiment, this enabled some of us to meet just-the-right-people in the pub after to swap helpful notes:

Finally I also announced the upcoming training courses that my ModelInsight will be running October, there’s a blog post here detailing the Intro to Data Science and High Performance Python courses (or sign-up to our low-volume announce list).


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, pydata, Python

4 July 2014 - 16:08Second PyDataLondon Meetup a Javascript/Analystic-tastic event

This week we ran our 2nd PyDataLondon meetup (@PyDataLondon), we had 70 in the room and a rather techy set of talks. As before we hosted by Pivotal (@gopivotal) via Ian – many thanks for the beer and pizza! I took everyone to the pub after for a beer on out  data science consultancy to help get everyone talking.

UPDATE As of October 2014 I’ll be teaching Data Science and High Performance Python in London, sign-up here if you’d like to be notified of our courses (no spam, just occasional notes about our plans).

As a point of admin – we’re very happy that people who were RSVPd but couldn’t make it were able to unRSVP to free up spots for those on the waitlist. This really helps with predicting the number of attendees (which we need for beer & pizza estimates) so we can get in everyone who wants to attend.

We’re now looking for speakers for our 3rd event – please get in contact via the meetup group.

First up we had Kyran Dale (my old co-founder in our ShowMeDo educational site) talking around his consulting speciality of JavaScript and Python, he covered ways to get started including ways to export Pandas data into D3 with example code, JavaScript pitfalls and linting in “Getting your Python data into the Browser“:

 

Next we had Laurie Clark-Michalek talking on “Day of the Ancient 2 Game Analysis using Python“, Laurie went low-level into Cython with profiling via gprof2dot (which incidently we cover in our HPC book) and gave some insight into the professional game-play and analysis world:

We then had 2 lightning talks:

We finished with a small experiment – I brought a set of cards and people filled in a list of problems they’d like to discuss and skills they could share. Here’s the set, we’ll run this experiment next month (and iterate, having learned a little from this one). In the pub after I had a couple of nice chats from my ‘want’ (around “company name cleaning” from free-text sources):

Topics listed on the cards included Apache Spark, network analysis, numpy, facial recognition, geospatial and a job post. I expect we’ll grow this idea over the next few events.

Please get in contact via the meetup group if you’d like to speak, next month we have a talk on a new data science platform. The event will be on Tues August 5th at the same location.

I’ll be out at EuroPython & PyDataBerlin later this month, I hope to see some of you there. EuroSciPy is in Cambridge this year in August.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: High Performance Python Book, Life, pydata, Python

26 June 2014 - 14:08PyDataLondon second meetup (July 1st)

Our second PyDataLondon meetup will be running on Tuesday July 1st at Pivotal in Shoreditch. The announce went out to the meetup group and the event was at capacity within 7 hours – if you’d like to attend future meetups please join the group (and the wait-list is open for our next event). Our speakers:

  1. Kyran Dale on “Getting your Python data onto a Browser” – Python+javascript from ex-academic turned Brighton-based freelance Javascript Pythonic whiz
  2. Laurie Clark-Michalek – “Defence of the Ancients Analysis: Using Python to provide insight into professional DOTA2 matches” – game analysis using the full range of Python tools from data munging, high performance with Cython and visualisation

We’ll also have several lightning talks, these are described on the meetup page.

We’re open to submissions for future talks and lightning talks, please send us an email via the meetup group (and we might have room for 1 more lightning talk for the upcoming pydata – get in contact if you’ve something interesting to present in 5 minutes).

Some other events might interest you – Brighton has a Data Visualisation event and recently Yves Hilpisch ran a QuantFinance training session and the slides are available. Also remember PyDataBerlin in July and EuroSciPy in Cambridge in August.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, Life, pydata, Python

4 June 2014 - 22:30First PyDataLondon meetup done, preparing the second

Last night we ran our first PyDataLondon meetup (@PyDataLondon). We had 80 data-focused Pythonistas in the room, co-organiser Emlyn lead the talks followed by a great set of Lightning Talks. Pivotal provided a cool venue (thanks Ian Huston!) with lovely pizza and beer in central Shoreditch – we’re much obliged to you. This was a grand first event and we look forward to running the next set this summer. Our ModelInsight got to sponsor the beers for everyone after, it was lovely to see everyone in the pub – helping to bind our young community is one of our goals for this summer.

Emlyn opened with a discussion on “MATLAB and Python for Life Sciences” covering syntax similarities, ways to port MATLAB libraries to Python and hardware interfacing:

pydatalondon_20140605_emlyn

After the break we had a wide range of lightning talks:

Here’s Jacqui talking on Viz using Python and D3 and introducing her part in the new Data Journalism book:

pydatalondon_20140605_jacqui

During the night I asked some questions of the audience. We had a room of mostly active Python users (mainly beginner or intermediate), the majority worked with data science on a weekly basis, almost all using Python 2 (not 3). 6 used R, 2 used MATLAB and 1 used Julia (and I’m still hoping to learn about Julia). A part of the reason for the question is that I’m interested in learning who needs what in our new community, I’m planning on re-running my 2 day High Performance Python tutorial in London in a couple of months and we aim to run an introduction to data science using Python too (mail me if you want to know more).

We’re looking for talk proposals for next month and the month after along with lightning talk proposals – either mail me or post via the meetup group (but do it quick).

I totally failed to remind everyone about the upcoming PyDataBerlin conference in Berlin in July, it runs inside EuroPython at the same venue (so come and stay all week, a bunch of us are!). I also forgot to announce EuroSciPy which runs here in Cambridge in August, you should definitely come to that too, I believe I’m teaching more High Performance Python.

The next event will be held on July 1st at the same location, keep an eye on the meetup group for details. I’m hoping next time to maybe put forward a Lightning Talk around my High Performance Python book as hopefully it’ll be mostly finished by then.

Thanks to my co-organisers Emlyn and Cecilia (and Florian – get well soon)!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Life, pydata, Python

26 April 2014 - 21:09PyDataLondon Meetup Number 1 (June – JS, NLP, Kinects)

Our first PyData London Meetup will occur on June 3rd at Pivotal in Shoreditch. At our first night we’ll have:

  • Pete Passaro talking on javascript and natural language processing using Python
  • Emlyn Clay on Matlab and Python for science
  • Chipp Jansen on auto-sculpting with a Kinect
  • Ian Huston on Pivotal’s open source tools (I had no idea they supported Redis and so many other tools!)
  • <1 more, TBC – submit an idea if you’re interested>

After this we’ll head to a local pub. Details will be announced via @pydatalondon. I haven’t run an event since the Five Pound App nights down in Brighton, I’m rather looking forward to getting data-focused Pythonistas together for interesting talks and good beer :-)

This new meetup will occur every month, it builds on the success of our PyData London conference back in February, we had over 200 people and a lot of rather superb presentations. My time to run the PyData London meetup is supported by my new ModelInsight Data Science consultancy.

You might also be interested in Yves’ new Python for Quant Finance meetup.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

2 Comments | Tags: pydata, Python

16 April 2014 - 21:112nd Early Release of High Performance Python (we added a chapter)

Here’s a quick book update – we just released a second Early Release of High Performance Python which adds a chapter on lists, tuples, dictionaries and sets. This is available to anyone who has bought it already (login into O’Reilly to get the update). Shortly we’ll follow with chapters on Matrices and the Multiprocessing module.

One bit of feedback we’ve had is that the images needed to be clearer for small-screen devices – we’ve increased the font sizes and removed the grey backgrounds, the updates will follow soon. If you’re curious about how much paper is involved in writing a book, here’s a clue:

We announce each updates along with requests for feedback via our mailing list.

I’m also planning on running some private training in London later in the year, please contact me if this is interesting? Both High Performance and Data Science are possible.

In related news – the PyDataLondon conference videos have just been released and you can see me talking on the High Performance Python landscape here.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: High Performance Python Book, Life, pydata, Python

24 February 2014 - 9:30PyDataLondon 2014 Write-up

We’ve just drawn PyDataLondon 2014 to a close, it has been a wonderfully successful weekend. The growth of Python’s use for data science in the last few years here in the UK is pretty phenomenal. Many thanks to Continuum Analytics and NumFocus for backing and organising the PyData conferences.

“Start of the week after busy weekend attending 1st #PyData London. Thanks to organisers & speakers: a smashing set of talks & gr8 community.” @lindauruchurtu

pydata_keynote_felix_fernandez_full_room

“If it weren’t for the succession of great talks at #pydata Ldn, I’d be getting quite upset about the AUS v SA test. Thank you @PyDataConf” @davisjmcc

We’ve had a fab weekend and a packed schedule with training and varied talks including:

  • Two great keynotes (Deutsche Börse finance by Felix Fernandez and ‘big data’ brain research on a budget by Gael Varoquaux)
  • Machine learning (mostly scikit-learn) and text processing
  • Lots of visualisation (mostly javascript)
  • Practical discussion of how and why things work (including some hard-won lessons on processes and statistics) and strong lessons on mistakes to avoid
  • Art and economics
  • Lots of IPython Notebook
  • Some R and Matlab (we need more of this!)
  • Great lightning talks including live rocket-science-in-the-Notebook to close the weekend
  • Speakers from both industry (including BAE, the Met Office and Hedge Funds through to fresh startups) and academia stretching through Europe

One outcome from Gael’s keynote was the importance of citing the open source projects that get used to help highlight their need for funding and resources:

“next time you write an article that uses scikit-learn and friends, cite the software you use, that will help authors, eg get funding #pydata” – @dimazest

I ran a panel asking “Shouldn’t more companies be using data science?” – the deliberately loaded question was addressed by a a range of industrial representatives including James from New York, Jonathan, Johnny, Dirk, Ian and Philip.  The short answer seemed to be that more companies were taking risks (and winning the rewards) of analysing their data and that some more training (both for scientists and for managers) could help things along.

“#PyData panel first question: have you done anything data analysis related within the last six month? (half the room raises hands)” James Powell

pydata_london_2041_panel

Through my Mor Consulting I talked on The Landscape of High Performance Python by taking a look at profiling techniques and compiler options for single-machine multi-core speed-ups, obviously this is somewhat connected to the High Performance Python book I’m working on (hopefully an early release of the first chapters will be out shortly).

“Like @ianozsvald ‘s ‘team velocity’ to describe how clean slow code can be better than complex fast code in terms of team development” – Mark Basham of Diamond Synchrotron

Renowned Brightonian artist Eric Drass spoke on the confluence of art, mass data, surveillance, the redaction of political positions (and how nothing is ever really removed from the internet – AlgoCameron) and Hugh Hefner:

pydata_eric_drass

Martin Goodson‘s “Most Winning A/B Test Results are Illusory” talk has hit HackerNews with good discussion via his published paper.

pydata_martin_abtests

(reformed string-theorist) Linda spoke on trying #sklearn as an avid R user for music recommendation, highlighting some of the highs and lows of both toolsets (and noting the sillyness of the ‘language wars’):

pydata_lynda

My colleague Bart Baddeley discussed problems and solutions in clustering approaches, IPython Notebook with all examples available online:

“Similarity matrices are a neat way of eye-balling whether you’ve chosen the right number of clusters #pydata” – Hugo Carr

pydata_bart

Kyran Dale (my co-founder from ShowMeDo and StrongSteam) spoke on powering javascript from live Python servers using techniques such as web sockets to visualise robot brain controllers and UK weather patterns:

pydata_kyran_dale

Neri covered NLP and ML using NLTK and scikitlearn for real-time customer support at Conversocial (a successful London customer support startup):

pydata_neri_nlp

Philippe Bracke spoke on house price rents and yields, modelled during his PhD:

“Interesting conclusion from @PhilippeBracke #pydata you earn less money from renting more expensive properties” – Ian Taylor

IMG_20140223_132947

SkimLinks sponsored a fun Saturday party (they’re hiring!). The conference series is generously sponsored by Continuum Analytics (it all started in the USA – hello Bryan!) and supported through the non-profit NumFocus organisation (and Leah does a rather ace job of pulling all the loose strands into a cohesive whole!).

Level39 in Canary Wharf provided the venue. Additional sponsors include Lyst who are hiring (hi Seb!), Python Academy (hello Mike!), Python Software Foundation, Knowsis, DataRobot (hello Jeremy and Peter!), Python Weekly and O’Reilly.

The view from Level39 was rather nice (their space is ace – visit it if you get a chance – thanking Jacqui for the photo):

pydata_view2

Clearly we have a strong base here to build from for future conferences. EuroSciPy 2014 (Cambridge, August) was discussed and PyDataBerlin was announced, it’ll happen in conjunction with EuroPython (July, Berlin). I’ll be at all three.

More write-ups are available:

For future events we’ll have to work on female attendance (I counted 10%  – this surely can be improved), we also want more interdisciplinary talks (we had some R and Matlab – we need more languages and other approaches). Overall I’m super happy with the outcome, we organised this in under two months, we got a fab turn-out and a stellar set of speakers (from nearby, throughout Europe and out to the USA). The next event can only be stronger still.

We collected slides and everything was recorded, videos will hopefully be up in a week.

I thank the organising team – Leah (NumFocus) kept us all on track, Emlyn, Cecilia, Florian, Yves and James here and our past-PyData American supporters all kept things moving in what appeared to be a rather effortless way. It wouldn’t have worked without everyone’s support including all the custodians of local usergroups who kindly spread the word – many thanks to you all.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

33 Comments | Tags: pydata, Python