About

Ian Ozsvald picture

This is Ian Ozsvald's blog, I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

High Performance Python book with O'Reilly View Ian Ozsvald's profile on LinkedIn Visit Ian Ozsvald's data science consulting business Protecting your bits. Open Rights Group

10 February 2013 - 14:28Applied Parallel Computing at PyCon 2013 (March)

Minesh B. Amin (MBA Sciences) and I (Mor Consulting) are teaching Applied Parallel Computing at PyCon in San Jose in just over a month, here’s an outline of the tutorial. The conference is sold out but there’s still tickets for the tutorials (note that they’re selling quickly too).

Typically a recording of the tutorial is released a couple of months after PyCon to PyVideo – you miss out on the networking but you can at least catch up on the material. The source code will also be released.

Our tutorial uses a lot of tools so we’re providing a VirtualBox image (32 bit requiring about 5GB of disk space, runs on Win/Lin/Mac). Those who choose not to use the VBox image will have to install the requirements themselves, for some parts this is a bit tough so we strong recommend using the VBox image. Details of the image will be provided to students a few weeks before the conference.

Parts of my tutorial build on my PyCon 2012 High Performance Python 1 tutorial. You might also be interested in the (slightly vague!) idea I have of writing a book on these topics – if so you should add your name to my High Performance Python Mailing List (it is an announce list for when/if I make progress on this project, very lightweight).

This year’s 3 hour tutorial is split into five sections:

  1. Types of parallelism
  2. Hard-won lessons in building reliable/debuggable/extensible parallel systems
  3. “List of tasks” – solving a Mandelbrot task using multiprocessing (single machine), parallelpython (can run multi-machine), redis queue (multi machine and language)
  4. “Map/reduce” – investigating and understanding a set of Tweets using Disco, practical guide to configuration, visualisation with word-cloud and matplotlib, possibly moving on to social network connectivity analysis and visualisation
  5. “Hyperparameter optimisation” – solving a many-paramemter optimisation problem whose parameter space is not fixed at the start of the run

During the Mandelbrot solver we’ll look at where the complexity lies in generating an image like this:

Mandelbrot Surface

During the Disco problem we’ll visualise the results using Andreas’ word-cloud tool, we may also cover the use of map/reduce for social network exploration:

Word-cloud of Apple mentions

Install requirements will be announced closer to the tutorial along with the (recommended!) VirtualBox image. I’m probably providing more material than we can cover for my two sections (Mandelbrot, Disco – how far we get depends on the size and capabilities of the class), all the material will be provided for keen students to continue and we’ll run an after-class session for those with more questions.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, Python

Comments are closed.