About


This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


4 June 2014 - 22:30
First PyDataLondon meetup done, preparing the second

Last night we ran our first PyDataLondon meetup (@PyDataLondon). We had 80 data-focused Pythonistas in the room; co-organiser Emlyn led the talks, followed by a great set of Lightning Talks. Pivotal provided a cool venue (thanks Ian Huston!) with lovely pizza and beer in central Shoreditch – we’re much obliged to you. This was a grand first event and we look forward to running the next set this summer. Our ModelInsight got to sponsor the beers for everyone afterwards and it was lovely to see everyone in the pub – helping to bind our young community is one of our goals for this summer.

Emlyn opened with a discussion on “MATLAB and Python for Life Sciences” covering syntax similarities, ways to port MATLAB libraries to Python and hardware interfacing:

pydatalondon_20140605_emlyn

After the break we had a wide range of lightning talks:

Here’s Jacqui talking on Viz using Python and D3 and introducing her part in the new Data Journalism book:

pydatalondon_20140605_jacqui

During the night I asked some questions of the audience. We had a room of mostly active Python users (mainly beginner or intermediate), the majority worked with data science on a weekly basis and almost all were using Python 2 (not 3). 6 used R, 2 used MATLAB and 1 used Julia (and I’m still hoping to learn about Julia). Part of the reason for asking is that I’m interested in learning who needs what in our new community. I’m planning on re-running my 2 day High Performance Python tutorial in London in a couple of months and we aim to run an introduction to data science using Python too (mail me if you want to know more).

We’re looking for talk proposals for next month and the month after along with lightning talk proposals – either mail me or post via the meetup group (but do it quick).

I totally failed to remind everyone about the upcoming PyDataBerlin conference in Berlin in July, it runs inside EuroPython at the same venue (so come and stay all week, a bunch of us are!). I also forgot to announce EuroSciPy which runs here in Cambridge in August, you should definitely come to that too, I believe I’m teaching more High Performance Python.

The next event will be held on July 1st at the same location, keep an eye on the meetup group for details. I’m hoping next time to maybe put forward a Lightning Talk around my High Performance Python book as hopefully it’ll be mostly finished by then.

Thanks to my co-organisers Emlyn and Cecilia (and Florian – get well soon)!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Life, pydata, Python

2 June 2014 - 20:02
New High Performance Python chapters online & teaching a 2 day course on HPC

The last month has been crazy busy, not least because I got to run my first High Performance Python 2 day tutorial at a university. I was out at Aalborg University teaching a PhD group and we covered four blocks:

  1. Profiling (CPU and RAM)
  2. Compilers and JITs
  3. Multi-core and distributed
  4. Using less RAM, storage systems and lessons

UPDATE As of October 2014 I’ll be teaching High Performance Python and Data Science in London, sign-up here to join our announce list (no spam, just occasional updates about our courses).

Here’s a picture of my class, it all went rather swimmingly. I plan to run the same class in London in the coming months (details to follow):

class_aalborg_teaching

On the same note we pushed some more chapters for our High Performance Python book on to O’Reilly’s build system a week back, we now have:

  • Introduction
  • Performant Python
  • Tuples and Dictionaries
  • Iterators and Generators
  • Profiling
  • Matrices with numpy
  • Compiling and JITs

More chapters will go live in a couple of weeks, we’re in the final editing phase now.

Don’t forget that PyDataBerlin is coming up in a couple of months, it runs during EuroPython. If you’re out for EuroPython then it makes a lot of sense to go to PyDataBerlin too :-)



No Comments | Tags: High Performance Python Book, Python

26 April 2014 - 21:09
PyDataLondon Meetup Number 1 (June – JS, NLP, Kinects)

Our first PyData London Meetup will occur on June 3rd at Pivotal in Shoreditch. At our first night we’ll have:

  • Pete Passaro talking on javascript and natural language processing using Python
  • Emlyn Clay on Matlab and Python for science
  • Chipp Jansen on auto-sculpting with a Kinect
  • Ian Huston on Pivotal’s open source tools (I had no idea they supported Redis and so many other tools!)
  • <1 more, TBC – submit an idea if you’re interested>

After this we’ll head to a local pub. Details will be announced via @pydatalondon. I haven’t run an event since the Five Pound App nights down in Brighton, I’m rather looking forward to getting data-focused Pythonistas together for interesting talks and good beer :-)

This new meetup will occur every month, it builds on the success of our PyData London conference back in February, we had over 200 people and a lot of rather superb presentations. My time to run the PyData London meetup is supported by my new ModelInsight Data Science consultancy.

You might also be interested in Yves’ new Python for Quant Finance meetup.



2 Comments | Tags: pydata, Python

16 April 2014 - 21:11
2nd Early Release of High Performance Python (we added a chapter)

Here’s a quick book update – we just released a second Early Release of High Performance Python which adds a chapter on lists, tuples, dictionaries and sets. This is available to anyone who has bought it already (log in to O’Reilly to get the update). Shortly we’ll follow with chapters on Matrices and the Multiprocessing module.

One bit of feedback we’ve had is that the images needed to be clearer for small-screen devices – we’ve increased the font sizes and removed the grey backgrounds, the updates will follow soon. If you’re curious about how much paper is involved in writing a book, here’s a clue:

We announce each update along with requests for feedback via our mailing list.

I’m also planning on running some private training in London later in the year, please contact me if this is of interest. Both High Performance and Data Science are possible.

In related news – the PyDataLondon conference videos have just been released and you can see me talking on the High Performance Python landscape here.



No Comments | Tags: High Performance Python Book, Life, pydata, Python

10 March 2014 - 14:15
High Performance Book almost ready for Early Release preview

Our first few chapters of High Performance Python are nearly ready for the Early Release on O’Reilly’s platform, we’ll be releasing:

  • Understanding Performance Python (an overview of the virtual machine and modern PC hardware)
  • Profiling Python Code (for CPU and RAM profiling, lots of options)
  • Pure Python (generators, the guts of dicts and the like)

We’ll announce the release via our mailing list, sign-up if you want to know as soon as it is available. Overall Micha and I have written half the book now (although this first Early Release will be just the first 3 chapters), we aim to finish the other half in the next few months.

The process of writing is very iterative…which means it is far too easy to write a bunch of stuff and then realise there’s a bunch of holes that need filling in (some of which turn into real rabbit-holes one can fall into for days!). Out the other side you get a nice narrative, lots of verified results and some nice explanatory plots. It just takes rather a long time to get there.

Here’s the first couple of pages of the start of the (just being written-up) multiprocessing chapter, I’m using a Monte Carlo Pi estimator to discuss performance characteristics of threads vs processes. Notice the black pen and scrawls to improve the diagrams – this happens a lot:

HPC book multiprocessing scrawls

Right now I’ve been playing with Pi estimation using straight Python and numpy over many cores using both threads and processes (this chapter is just looking at multiprocessing, not JITs or compilers [that's a later chapter]). Processes obviously provide a linear speed-up for this sort of problem, exploring the right number of processes and the right job sizes (especially for variable-time workloads) was fun.
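As a rough illustration of the process-based approach (this is a minimal sketch and not the code from the book; the function names are made up for this example and it assumes Python 3), each worker counts how many random points in the unit square fall inside the quarter circle and the parent combines the counts:

```python
import random
from multiprocessing import Pool


def count_points_in_quarter_circle(nbr_samples, seed=None):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # private generator so each worker has its own stream
    hits = 0
    for _ in range(nbr_samples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, nbr_processes):
    """Split the sampling across processes and combine the hit counts."""
    samples_per_worker = total_samples // nbr_processes
    # One (job size, seed) pair per worker; the worker index doubles as the seed.
    jobs = [(samples_per_worker, worker_id) for worker_id in range(nbr_processes)]
    with Pool(processes=nbr_processes) as pool:
        hits = pool.starmap(count_points_in_quarter_circle, jobs)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * sum(hits) / (samples_per_worker * nbr_processes)


if __name__ == "__main__":
    print(estimate_pi(total_samples=400_000, nbr_processes=4))
```

Varying `nbr_processes` and the per-worker job size is one way to explore the trade-offs mentioned above. Equal-sized jobs are used here only to keep the sketch short.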

Threads on numpy do something interesting – you can actually use multiple cores and so code can run faster (about 10% in this case compared to a single core), although they’re often not as efficient as using multiple processes. Here’s a plot of CPU use for 4 cores (and 4 HyperThreads) over time with threads. It turns out that the random number generation is GIL bound but the vectorised operations for Pi’s estimation aren’t:

colour_08_pi_lists_parallel_py_4
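A threaded numpy variant might look something like the sketch below. It is illustrative only: the function names are invented for this example, it assumes Python 3 and a numpy recent enough to provide `numpy.random.default_rng`, and which steps hold the GIL will depend on the numpy version in use.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def count_hits_numpy(nbr_samples, seed):
    """Vectorised count of random points that fall inside the quarter circle."""
    rng = np.random.default_rng(seed)  # one generator per thread
    xs = rng.random(nbr_samples)
    ys = rng.random(nbr_samples)
    return int(np.count_nonzero(xs * xs + ys * ys <= 1.0))


def estimate_pi_with_threads(total_samples, nbr_threads):
    """Run the vectorised estimator in several threads and combine the counts."""
    samples_per_thread = total_samples // nbr_threads
    sizes = [samples_per_thread] * nbr_threads
    seeds = range(nbr_threads)
    with ThreadPoolExecutor(max_workers=nbr_threads) as executor:
        hits = list(executor.map(count_hits_numpy, sizes, seeds))
    return 4.0 * sum(hits) / (samples_per_thread * nbr_threads)


if __name__ == "__main__":
    print(estimate_pi_with_threads(total_samples=4_000_000, nbr_threads=4))
```

Timing this against a single-threaded call to `count_hits_numpy` with the same total number of samples is a simple way to see how much of the work actually runs in parallel on a given machine.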

Join the mailing list for updates. Work on the book is supported via my London based Artificial Intelligence agency Mor Consulting.



7 Comments | Tags: High Performance Python Book, Python

9 March 2014 - 12:52
PyPy commercial support now available by core devs

I meant to note this a week or so back – some of the core PyPy dev team (Fijal and Armin) have put together a new consultancy focused on PyPy commercial support. The move was warmly received on reddit. They aim to provide training, tuning and custom work and have a team in various bits of the world. I look forward to watching this develop.



2 Comments | Tags: Python

24 February 2014 - 9:30
PyDataLondon 2014 Write-up

We’ve just drawn PyDataLondon 2014 to a close, it has been a wonderfully successful weekend. The growth of Python’s use for data science in the last few years here in the UK is pretty phenomenal. Many thanks to Continuum Analytics and NumFocus for backing and organising the PyData conferences.

“Start of the week after busy weekend attending 1st #PyData London. Thanks to organisers & speakers: a smashing set of talks & gr8 community.” @lindauruchurtu

pydata_keynote_felix_fernandez_full_room

“If it weren’t for the succession of great talks at #pydata Ldn, I’d be getting quite upset about the AUS v SA test. Thank you @PyDataConf” @davisjmcc

We’ve had a fab weekend and a packed schedule with training and varied talks including:

  • Two great keynotes (Deutsche Börse finance by Felix Fernandez and ‘big data’ brain research on a budget by Gael Varoquaux)
  • Machine learning (mostly scikit-learn) and text processing
  • Lots of visualisation (mostly javascript)
  • Practical discussion of how and why things work (including some hard-won lessons on processes and statistics) and strong lessons on mistakes to avoid
  • Art and economics
  • Lots of IPython Notebook
  • Some R and Matlab (we need more of this!)
  • Great lightning talks including live rocket-science-in-the-Notebook to close the weekend
  • Speakers from both industry (including BAE, the Met Office and Hedge Funds through to fresh startups) and academia stretching through Europe

One outcome from Gael’s keynote was the importance of citing the open source projects that get used to help highlight their need for funding and resources:

“next time you write an article that uses scikit-learn and friends, cite the software you use, that will help authors, eg get funding #pydata” – @dimazest

I ran a panel asking “Shouldn’t more companies be using data science?” – the deliberately loaded question was addressed by a range of industrial representatives including James from New York, Jonathan, Johnny, Dirk, Ian and Philip. The short answer seemed to be that more companies were taking risks (and winning the rewards) of analysing their data and that some more training (both for scientists and for managers) could help things along.

“#PyData panel first question: have you done anything data analysis related within the last six month? (half the room raises hands)” James Powell

pydata_london_2041_panel

Through my Mor Consulting I talked on The Landscape of High Performance Python by taking a look at profiling techniques and compiler options for single-machine multi-core speed-ups, obviously this is somewhat connected to the High Performance Python book I’m working on (hopefully an early release of the first chapters will be out shortly).

“Like @ianozsvald ‘s ‘team velocity’ to describe how clean slow code can be better than complex fast code in terms of team development” – Mark Basham of Diamond Synchrotron

Renowned Brightonian artist Eric Drass spoke on the confluence of art, mass data, surveillance, the redaction of political positions (and how nothing is ever really removed from the internet – AlgoCameron) and Hugh Hefner:

pydata_eric_drass

Martin Goodson‘s “Most Winning A/B Test Results are Illusory” talk has hit HackerNews with good discussion via his published paper.

pydata_martin_abtests

(reformed string-theorist) Linda spoke on trying #sklearn as an avid R user for music recommendation, highlighting some of the highs and lows of both toolsets (and noting the silliness of the ‘language wars’):

pydata_lynda

My colleague Bart Baddeley discussed problems and solutions in clustering approaches, IPython Notebook with all examples available online:

“Similarity matrices are a neat way of eye-balling whether you’ve chosen the right number of clusters #pydata” – Hugo Carr

pydata_bart

Kyran Dale (my co-founder from ShowMeDo and StrongSteam) spoke on powering javascript from live Python servers using techniques such as web sockets to visualise robot brain controllers and UK weather patterns:

pydata_kyran_dale

Neri covered NLP and ML using NLTK and scikit-learn for real-time customer support at Conversocial (a successful London customer support startup):

pydata_neri_nlp

Philippe Bracke spoke on house prices, rents and yields, modelled during his PhD:

“Interesting conclusion from @PhilippeBracke #pydata you earn less money from renting more expensive properties” – Ian Taylor

IMG_20140223_132947

SkimLinks sponsored a fun Saturday party (they’re hiring!). The conference series is generously sponsored by Continuum Analytics (it all started in the USA – hello Bryan!) and supported through the non-profit NumFocus organisation (and Leah does a rather ace job of pulling all the loose strands into a cohesive whole!).

Level39 in Canary Wharf provided the venue. Additional sponsors include Lyst who are hiring (hi Seb!), Python Academy (hello Mike!), Python Software Foundation, Knowsis, DataRobot (hello Jeremy and Peter!), Python Weekly and O’Reilly.

The view from Level39 was rather nice (their space is ace – visit it if you get a chance – thanking Jacqui for the photo):

pydata_view2

Clearly we have a strong base here to build from for future conferences. EuroSciPy 2014 (Cambridge, August) was discussed and PyDataBerlin was announced, it’ll happen in conjunction with EuroPython (July, Berlin). I’ll be at all three.

More write-ups are available:

For future events we’ll have to work on female attendance (I counted 10%  – this surely can be improved), we also want more interdisciplinary talks (we had some R and Matlab – we need more languages and other approaches). Overall I’m super happy with the outcome, we organised this in under two months, we got a fab turn-out and a stellar set of speakers (from nearby, throughout Europe and out to the USA). The next event can only be stronger still.

We collected slides and everything was recorded, videos will hopefully be up in a week.

I thank the organising team – Leah (NumFocus) kept us all on track, Emlyn, Cecilia, Florian, Yves and James here and our past-PyData American supporters all kept things moving in what appeared to be a rather effortless way. It wouldn’t have worked without everyone’s support including all the custodians of local usergroups who kindly spread the word – many thanks to you all.



33 Comments | Tags: pydata, Python

23 February 2014 - 11:53
High Performance Python at PyDataLondon 2014

Yesterday I spoke on The High Performance Python Landscape at PyDataLondon 2014 (our first PyData outside of the USA – see my write-up). I was blessed with a full room and interesting questions. With Micha I’m authoring a High Performance Python book with O’Reilly (email list for early access) and I took the topics from a few of our chapters.

“@ianozsvald providing eye-opening discussion of tools for high-performance #Python: #Cython, #ShedSkin, #Pythran, #PyPy, #numba… #pydata” – @davisjmcc

Overall I covered:

  • line_profiler for CPU profiling in a function
  • memory_profiler for RAM profiling in a function
  • memory_profiler’s %memit
  • memory_profiler’s mprof to graph memory use during program’s runtime
  • thoughts on adding network and disk I/O tracking to mprof
  • Cython on lists
  • Cython on numpy by dereferencing elements (which would normally be horribly inefficient) plus OpenMP
  • ShedSkin‘s annotated output and thoughts on using this as an input to Cython
  • PyPy and numpy in PyPy
  • Pythran with numpy and OpenMP support (you should check this out)
  • Numba
  • Concluding thoughts on why you should probably use JITs over Cython
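line_profiler and memory_profiler are third-party tools, so as a rough standard-library stand-in (explicitly not the tools from the talk) the same "measure CPU, then measure RAM" workflow can be sketched with `cProfile` and `tracemalloc`. The helper names and the toy `build_squares` function are made up for this example, and this reports per-function rather than per-line figures:

```python
import cProfile
import io
import pstats
import tracemalloc


def build_squares(nbr_items):
    """Toy workload: build a list of squares."""
    return [i * i for i in range(nbr_items)]


def profile_cpu(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show at most the top 5 entries
    return result, buffer.getvalue()


def profile_ram(func, *args):
    """Run func under tracemalloc and return (result, peak traced bytes)."""
    tracemalloc.start()
    result = func(*args)
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    _, report = profile_cpu(build_squares, 1_000_000)
    print(report)
    _, peak_bytes = profile_ram(build_squares, 1_000_000)
    print("Peak traced memory: %.1f MB" % (peak_bytes / 1e6))
```

For the line-by-line view discussed in the talk you would still reach for line_profiler and memory_profiler themselves.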

Here’s my room full of happy Pythonistas :-)

pydatalondon2014_highperformancepython

“Really useful and practical performance tips from @ianozsvald @pydata #pydata speeding up #Python code” – @iantaylorfb

Slides from the talk:

 

 

UPDATE Armin and Maciej came back today with some extra answers about the PyPy-numpy performance (here and here), the bottom line is that they plan to fix it (Maciej says it is now fixed – quick service!). Maciej also notes improvements planned using e.g. vectorisation in numpy.

VIDEO TO FOLLOW



9 Comments | Tags: High Performance Python Book, pydata, Python

4 February 2014 - 10:28
PyData London Abstracts Announced

I’m very pleased to say that the talks and tutorials are public now, listed on the Abstracts page, this is the first draft of the acceptances so maybe there will be some changes but we’re treating it as ‘mostly done’. The schedule will follow later (the conference is backed by a non-profit and the organisation is all volunteer based).

Early bird tickets run for 1 week from this announcement, grab ‘em quick. We’ll cover lots of Python, some R and Matlab, lots of data analysis, visualisation and machine learning, also some economics and art. We’re aiming to bring a wide range of people together to help build the local data analysis community, the goal is to start lots of conversations and to encourage collaborations. We plan to have a panel, lightning talks and there will be lots of evening beer to drink.

Some of our speakers are coming in from both the USA and further into Europe, they have links to the older PyData and SciPy conferences along with EuroSciPy and EuroPython. You’ll recognise core contributors for numpy, scipy, scikit-learn and the like at the conference.



9 Comments | Tags: Python

21 January 2014 - 23:43
PyData London conference keynotes and topics coming together

We’ve got our keynoters for PyData London (Fri 21 to Sun 23 February):

  • Gael Varoquaux (INRIA) with “Building a Cutting-Edge Data Processing Environment on a Budget”
  • Felix Fernandez (Deutsche Börse) with “Python in the Financial Industry: The Universal Tool for End-to-End Development”

Gael is a core committer for scikit-learn and Felix is the Business CIO for the Cash & Derivatives IT department.

Along with the keynoters we have a nice set of talks lining up (the Call for Proposals is open!), the topics will probably include:

  • Javascript frameworks for data visualisation
  • Statistical approaches to problem solving
  • An economist’s view into data science
  • Art in the realm of Big Data
  • Data clustering techniques
  • High performance Python processing

Our Call for Proposals is open until the end of the month, I’m really keen to see stories and tutorials around the solving of interesting problems with data. Whilst the conference is themed for Python I’m keen to see proposals that use other languages (or no language – art & stats!) to do interesting things around data. I’m more focused on interesting topics and lively discussion. Are there any R, Matlab and Julia users who’d like to share their experience?

Please do consider putting forward a proposal, the conversation that will come out of the conference looks to be rather interesting already. If you’ve never spoken at a conference before then this would be a rather ideal place to start.



8 Comments | Tags: Python