About


This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


26 June 2014 - 14:08 PyDataLondon second meetup (July 1st)

Our second PyDataLondon meetup will be running on Tuesday July 1st at Pivotal in Shoreditch. The announcement went out to the meetup group and the event was at capacity within 7 hours. If you’d like to attend future meetups please join the group (the wait-list is open for our next event). Our speakers:

  1. Kyran Dale on “Getting your Python data onto a Browser” – Python+javascript from ex-academic turned Brighton-based freelance Javascript Pythonic whiz
  2. Laurie Clark-Michalek – “Defence of the Ancients Analysis: Using Python to provide insight into professional DOTA2 matches” – game analysis using the full range of Python tools, from data munging to high performance with Cython and visualisation

We’ll also have several lightning talks; these are described on the meetup page.

We’re open to submissions for future talks and lightning talks. Please send us an email via the meetup group (we might have room for 1 more lightning talk at the upcoming PyData meetup, so get in contact if you’ve something interesting to present in 5 minutes).

Some other events might interest you: Brighton has a Data Visualisation event, and Yves Hilpisch recently ran a QuantFinance training session (the slides are available). Also remember PyDataBerlin in July and EuroSciPy in Cambridge in August.

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Data science, Life, pydata, Python

23 June 2014 - 22:47 High Performance Python manuscript submitted to O’Reilly

I’m super-happy to say that Micha and I have submitted the manuscript to O’Reilly for our High Performance Python book. Here’s the final chapter list:

  • Understanding Performant Python
  • Profiling to find bottlenecks (%timeit, cProfile, line_profiler, memory_profiler, heapy and more)
  • Lists and Tuples (how they work under the hood)
  • Dictionaries and Sets (under the hood again)
  • Iterators and Generators (introducing intermediate-level Python techniques)
  • Matrix and Vector Computation (numpy and scipy and Linux’s perf)
  • Compiling to C (Cython, Shed Skin, Pythran, Numba, PyPy) and building C extensions
  • Concurrency (getting past IO bottlenecks using Gevent, Tornado, AsyncIO)
  • The multiprocessing module (pools, IPC and locking)
  • Clusters and Job Queues (IPython, ParallelPython, NSQ)
  • Using less RAM (ways to store text with far less RAM, probabilistic counting)
  • Lessons from the field (stories from experienced developers on all these topics)

August is still the expected publication date; a soon-to-follow Early Release will have all the chapters included. Next up I’ll be teaching some of this in August at EuroSciPy in Cambridge.

Some related (but not covered in the book) bits of High Performance Python news:

  • PyPy.js is now faster than CPython (but not as fast as PyPy) – a crazy and rather cutting-edge effort to get Python code running on a JavaScript engine through the RPython PyPy toolchain
  • Micropython runs in tiny memory environments. It aims to run on embedded devices (e.g. ARM boards) with low RAM where CPython couldn’t possibly run; it is pretty advanced and lets us use Python code in a new class of environment
  • cytoolz offers Cython-compiled versions of the pytoolz extended iterator objects, running faster than pytoolz and, via iterators, probably using significantly less RAM than standard Python containers
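
The RAM point in the last item can be illustrated without installing anything. This is only a rough sketch using a plain standard-library generator as a stand-in for the toolz-style iterators (the `squares` function is made up for the illustration), and `sys.getsizeof` reports just the container itself, not the integers it points at:

```python
import sys
from itertools import islice


def squares(n):
    """Lazily yield the first n squares, one at a time."""
    for i in range(n):
        yield i * i


n = 100000
as_list = [i * i for i in range(n)]  # every result is held in RAM at once
as_iter = squares(n)                 # only the generator's own state is held

list_bytes = sys.getsizeof(as_list)  # size of the list's pointer array only
iter_bytes = sys.getsizeof(as_iter)  # small, and does not grow with n

# Items are produced on demand, so we can take a few without building the rest.
first_five = list(islice(as_iter, 5))

print(list_bytes, iter_bytes, first_five)
```

The list costs memory in proportion to `n` while the generator stays roughly constant. The trade-off is that a generator can only be walked once.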


No Comments | Tags: Data science, High Performance Python Book, Python

19 June 2014 - 16:34 Flask + mod_uwsgi + Apache + Continuum’s Anaconda

I’ve spent the morning figuring out how to use Flask through Anaconda with Apache and uWSGI on an Amazon EC2 machine, side-stepping the system’s default Python. I’ll log the main steps here; I found lots of hints on the web but nothing that tied it all together for someone like me who lacks Apache config experience. The reason for deploying using Anaconda is to keep a consistent environment against our dev machines.

First it is worth noting that mod_wsgi and mod_uwsgi (this is what I’m using) are different things. Flask’s Apache instructions talk about mod_wsgi and describe mod_uwsgi for nginx. Continuum’s Anaconda forum had a hint but not a worked solution.

I’ve used mod_wsgi before with a native (non-Anaconda) Python installation (plus a virtualenv of numpy, scipy etc). I wanted to do something similar using an Anaconda install, for a client’s internal recommender system. The following summarises my working notes; please add a comment if you can improve any steps.

  • Set up an Ubuntu 12.04 AMI on EC2
  • source activate production  # activate the Anaconda environment (I'm assuming you've set up an environment and put your src onto this machine)
  • conda install -c https://conda.binstar.org/travis uwsgi  # install uwsgi 2.0.2 into your Anaconda environment using binstar (other, newer versions might be available)
  • uwsgi --http :9090 --uwsgi-socket localhost:56708 --wsgi-file <path>/server.wsgi  # run uwsgi locally on a specified TCP/IP port
  • curl localhost:9090  # calls localhost:9090/ to test that your Flask app is responding via uwsgi

If you get uwsgi running locally and you can talk to it via curl then you’ve got an installed uwsgi gateway running with Anaconda – that’s the less-discussed-on-the-web part done.

Now setup Apache:

  • sudo apt-get install lamp-server^  # install the LAMP stack
  • sudo a2dissite 000-default  # disable the default Apache app
  • # I believe the following is sensible but if there's an easier or better way to talk to uwsgi, please leave me a comment (should I prefer unix sockets maybe?)
  • sudo apt-get install libapache2-mod-uwsgi  # install mod_uwsgi
  • sudo a2enmod uwsgi  # activate mod_uwsgi in Apache
  • # create myserver.conf (see below) to configure Apache
  • sudo a2ensite myserver.conf  # enable your server configuration in Apache
  • service apache2 reload  # somewhere around now you'll have to reload Apache so it sees the new configurations, you might have had to do it earlier

My server.wsgi lives with my source (outside of the Apache folders). As noted in the Flask wsgi page it contains:

import sys
sys.path.insert(0, "<path>/mysource")
from server import app as application

Note that it doesn’t need the virtualenv hack as we’re not using virtualenv; you’ve already got uwsgi running with Anaconda’s Python (rather than the system’s default Python).
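
For anyone wondering what uwsgi actually wants from that file: it loads the module and looks for a WSGI callable, by default one named `application`, which is why the Flask `app` is imported under that name. Here is a minimal standard-library sketch of such a callable, with a tiny in-process harness (`call_once` is a made-up helper) so it can be tried without Flask, uwsgi or Apache. It is a stand-in for illustration, not my real server:

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Tiny stand-in for the Flask app: answer every request with plain text."""
    body = b"hello from a WSGI callable\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_once(app, path="/"):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible request environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_once(application)
print(status, body)
```

A Flask `app` object implements the same calling convention, so the `from server import app as application` line hands uwsgi something of the same shape as `application` here.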

The Apache configuration lives in /etc/apache2/sites-available/myserver.conf and it has only the following lines (credit: Django uwsgi doc), note the specified port is the same as we used when running uwsgi:

<VirtualHost *:80>
  <Location />
    SetHandler uwsgi-handler
    uWSGISocket 127.0.0.1:56708
  </Location>
</VirtualHost>
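
Since a mismatch between the two port numbers is an easy mistake to make, here is a small sketch that pulls the port out of the uwsgi arguments and out of the Apache config and compares them. The two helper functions and the regular expressions are my own illustration (they only handle the simple `host:port` form used in this post), not part of uwsgi or Apache:

```python
import re


def uwsgi_socket_port(args):
    """Return the port given to --uwsgi-socket in a uwsgi argument string."""
    match = re.search(r"--uwsgi-socket\s+\S*?:(\d+)", args)
    return int(match.group(1)) if match else None


def apache_socket_port(conf_text):
    """Return the port on the uWSGISocket line of an Apache config."""
    match = re.search(r"^\s*uWSGISocket\s+\S*?:(\d+)", conf_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# The values used in this post.
UWSGI_ARGS = "--http :9090 --uwsgi-socket localhost:56708 --wsgi-file server.wsgi"
APACHE_CONF = """\
<VirtualHost *:80>
  <Location />
    SetHandler uwsgi-handler
    uWSGISocket 127.0.0.1:56708
  </Location>
</VirtualHost>
"""

uwsgi_port = uwsgi_socket_port(UWSGI_ARGS)
apache_port = apache_socket_port(APACHE_CONF)
ports_match = uwsgi_port is not None and uwsgi_port == apache_port
print(uwsgi_port, apache_port, ports_match)
```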

Once Apache is running, if you stop your uwsgi process then you’ll get 502 Bad Gateway errors; if you restart your uwsgi process then your server will respond again. There’s no need to restart Apache when you restart your uwsgi process.
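
A quick way to tell these states apart from a script is to probe both the uwsgi HTTP port and Apache. This is a rough sketch using only the standard library. The `probe`, `describe` and `report` helpers are made up for this post, and the URLs in `report` assume the ports used above:

```python
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error such as 502
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout or similar


def describe(status):
    """Turn a probe result into a short human-readable hint."""
    if status is None:
        return "no answer (is the process running?)"
    if status == 502:
        return "502 Bad Gateway (Apache is up, uwsgi is probably down)"
    return "HTTP %d" % status


def report():
    """Probe uwsgi's own HTTP port and Apache (assumed ports from this post)."""
    targets = [("uwsgi", "http://localhost:9090/"),
               ("apache", "http://localhost:80/")]
    return ["%s: %s" % (name, describe(probe(url))) for name, url in targets]
```

Calling `print("\n".join(report()))` on the server gives a two-line summary. A 502 from Apache alongside no answer from uwsgi points at the uwsgi process rather than the Apache config.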

For debugging note that /etc/apache2/mods-available/ will contain uwsgi.load once mod_uwsgi is installed. The uwsgi binary lives in your Anaconda environment (for me it is ~/anaconda/envs/production/bin/uwsgi), it’ll only be active once you’ve activated this environment. Useful(ish) error messages should appear in /var/log/apache2/error.log. uWSGI has best practices and a FAQ.

Having made this run at the command line it now needs to be automated. I’m using Circus. I’ve installed this via the system Python (not via Anaconda) as I wanted to treat it as being outside of the Anaconda environment (just as Upstart, cron etc would be outside of this environment), which meant a bit of tweaking was needed. Specifically PATH must be configured to point at Anaconda and a fully qualified path to uwsgi must be provided:

#circus.ini
[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
pubsub_endpoint = tcp://127.0.0.1:5556

[env:myserver]
PATH=/home/ubuntu/anaconda/bin:$PATH

[watcher:myserver]
cmd = <path_anaconda>/envs/production/bin/uwsgi
args = --http :9090 --uwsgi-socket localhost:56708  
  --wsgi-file <config_dir>/server.wsgi 
  --chdir <working_dir>
warmup_delay = 0
numprocesses = 1

 

This can be run with “circusd <config>/circus.ini --log-level debug”, which prints a lot of debug info to the console. Remember to run this from a login shell and not in the Anaconda environment if you’ve installed Circus without using Anaconda.
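
Before handing the file to circusd, the two requirements above (a fully qualified uwsgi path and a PATH that points at Anaconda) can be sanity-checked with a few lines of Python. This sketch reads the file as plain INI with the standard library `configparser` and does not use Circus itself. The `check_watcher` helper and the concrete paths in `SAMPLE` (such as `/srv/myserver`) are hypothetical stand-ins for the `<...>` placeholders:

```python
import configparser
import posixpath

# Hypothetical, filled-in version of the circus.ini shown above.
SAMPLE = """\
[circus]
check_delay = 5

[env:myserver]
PATH=/home/ubuntu/anaconda/bin:$PATH

[watcher:myserver]
cmd = /home/ubuntu/anaconda/envs/production/bin/uwsgi
args = --http :9090 --uwsgi-socket localhost:56708
  --wsgi-file /srv/myserver/server.wsgi
  --chdir /srv/myserver
numprocesses = 1
"""


def check_watcher(ini_text, watcher):
    """Return (problems, args) for one watcher in a circus-style ini file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems = []

    cmd = parser.get("watcher:" + watcher, "cmd")
    if not posixpath.isabs(cmd):
        problems.append("cmd should be a fully qualified path: " + cmd)

    path = parser.get("env:" + watcher, "PATH", fallback="")
    if "anaconda" not in path:
        problems.append("PATH does not mention anaconda")

    # Indented continuation lines are joined back into one argument string.
    args = " ".join(parser.get("watcher:" + watcher, "args").split())
    return problems, args


problems, args = check_watcher(SAMPLE, "myserver")
print(problems, args)
```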

Once this works it can be configured for control by the system. I’m using Upstart on Ubuntu via the Circus Deployment instructions with a /etc/init/circus.conf script, configured to its own directory.

If you know that mod_wsgi would have been a better choice then please let me know (though development of that project looks very slow; it says "it is resting"). I’m experimenting with mod_uwsgi as it seems to be more actively developed, but this is a foreign area for me and I’d be happy to learn of better ways to crack this nut. A quick glance suggests that both support Python 3.



No Comments | Tags: Data science, Python

9 June 2014 - 22:29 7 chapters of “High Performance Python” now live

O’Reilly have just released another update to our High Performance Python book. In total we’ve now released the following:

  • Understanding Performant Python
  • Profiling to find bottlenecks (%timeit, cProfile, line_profiler, memory_profiler, heapy)
  • Lists and Tuples
  • Dictionaries and Sets
  • Iterators and Generators
  • Matrix and Vector Computation (numpy and scipy)
  • Compiling to C (Cython, Shed Skin, Pythran, Numba, PyPy)

We’re in the final edit cycle and have a lot of edits to commit to the main chapters over the next week for the next Early Release. All going well, the book will be published in August.



No Comments | Tags: High Performance Python Book, Python

4 June 2014 - 22:30 First PyDataLondon meetup done, preparing the second

Last night we ran our first PyDataLondon meetup (@PyDataLondon). We had 80 data-focused Pythonistas in the room; co-organiser Emlyn led the talks, followed by a great set of Lightning Talks. Pivotal provided a cool venue (thanks Ian Huston!) with lovely pizza and beer in central Shoreditch – we’re much obliged to you. This was a grand first event and we look forward to running the next set this summer. Our company ModelInsight got to sponsor the beers for everyone afterwards; it was lovely to see everyone in the pub, and helping to bind our young community is one of our goals for this summer.

Emlyn opened with a discussion on “MATLAB and Python for Life Sciences” covering syntax similarities, ways to port MATLAB libraries to Python and hardware interfacing:

[Photo: Emlyn's talk at PyDataLondon]

After the break we had a wide range of lightning talks:

Here’s Jacqui talking on Viz using Python and D3 and introducing her part in the new Data Journalism book:

[Photo: Jacqui's lightning talk at PyDataLondon]

During the night I asked the audience some questions. We had a room of mostly active Python users (mainly beginner or intermediate); the majority worked with data science on a weekly basis and almost all used Python 2 (not 3). 6 used R, 2 used MATLAB and 1 used Julia (and I’m still hoping to learn about Julia). Part of the reason for asking is that I’m interested in learning who needs what in our new community. I’m planning on re-running my 2 day High Performance Python tutorial in London in a couple of months and we aim to run an introduction to data science using Python too (mail me if you want to know more).

We’re looking for talk proposals for next month and the month after along with lightning talk proposals – either mail me or post via the meetup group (but do it quick).

I totally failed to remind everyone about the upcoming PyDataBerlin conference in Berlin in July, it runs inside EuroPython at the same venue (so come and stay all week, a bunch of us are!). I also forgot to announce EuroSciPy which runs here in Cambridge in August, you should definitely come to that too, I believe I’m teaching more High Performance Python.

The next event will be held on July 1st at the same location, keep an eye on the meetup group for details. I’m hoping next time to maybe put forward a Lightning Talk around my High Performance Python book as hopefully it’ll be mostly finished by then.

Thanks to my co-organisers Emlyn and Cecilia (and Florian – get well soon)!



No Comments | Tags: Life, pydata, Python

2 June 2014 - 20:02 New High Performance Python chapters online & teaching a 2 day course on HPC

The last month has been crazy busy, not least because I got to run my first High Performance Python 2 day tutorial at a university. I was out at Aalborg University teaching a PhD group; we covered four blocks:

  1. Profiling (CPU and RAM)
  2. Compilers and JITs
  3. Multi-core and distributed
  4. Using less RAM, storage systems and lessons

UPDATE As of October 2014 I’ll be teaching High Performance Python and Data Science in London, sign-up here to join our announce list (no spam, just occasional updates about our courses).

Here’s a picture of my class; it all went rather swimmingly. I plan to run the same class in London in the coming months (details to follow):

[Photo: the class at Aalborg University]

On the same note we pushed some more chapters for our High Performance Python book on to O’Reilly’s build system a week back, we now have:

  • Introduction
  • Performant Python
  • Tuples and Dictionaries
  • Iterators and Generators
  • Profiling
  • Matrices with numpy
  • Compiling and JITs

More chapters will go live in a couple of weeks, we’re in the final editing phase now.

Don’t forget that PyDataBerlin is coming up in a couple of months, it runs during EuroPython. If you’re out for EuroPython then it makes a lot of sense to go to PyDataBerlin too :-)



No Comments | Tags: High Performance Python Book, Python