
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


25 July 2011 - 9:42 High Performance Python tutorial v0.2 (from EuroPython 2011)

My updated High Performance Python tutorial is now available as a 55 page PDF. The goal is to take you on several journeys which show you different ways of making Python code run much faster (up to 75x on the CPU, faster still with a GPU).

UPDATE (2014) I’ve written the High Performance Python book with O’Reilly; it is being published this year.

UPDATE 1 this talk is superseded by my High Performance Python 1 tutorial from PyCon 2012.

UPDATE 2 I’m thinking of writing an updated guide. If you’re interested in hearing about it please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announcement once I know more.

UPDATE 3 (Sept 2012) I was missing the EuroPython video, I’ve embedded it down below.

This is an update to the 49 page v0.1 I published three weeks ago after running the tutorial at EuroPython 2011 in Florence.

Topics covered:

  • Python profiling (cProfile, RunSnake, line_profiler) – find bottlenecks
  • PyPy – Python’s new Just In Time compiler, a note on the new numpy module
  • Cython – annotate your code and compile to C
  • numpy integration with Cython – fast numerical Python library wrapped by Cython
  • ShedSkin – automatic code annotation and conversion to C
  • numpy vectors – fast vector operations using numpy arrays
  • NumExpr on numpy vectors – automatic numpy compilation to multiple CPUs and vector units
  • multiprocessing – built-in module to use multiple CPUs
  • ParallelPython – run tasks on multiple computers
  • pyCUDA – run tasks on your Graphics Processing Unit
  • Other algorithmic choices and options you have
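As a flavour of the first topic, here is a minimal profiling sketch using only the standard library's cProfile and pstats modules. It is not taken from the tutorial: the function names (slow_sum_of_squares, run_workload, profile_workload) and the workload itself are made up for illustration, and the report's own examples differ.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop: the kind of hot spot a profiler should surface."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    """Hypothetical workload that calls the slow function several times."""
    return [slow_sum_of_squares(50_000) for _ in range(5)]


def profile_workload():
    """Profile run_workload and return (result, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run_workload()
    profiler.disable()

    # Capture the report in a string, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_workload()
    print(report)
```

The printed table lists call counts and times per function; the function with the largest cumulative time is usually the first candidate for the speed-up techniques listed above.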

The improvement over the last version (v0.1) is that I’ve now filled in all the sections, including pyCUDA (there are still a few IAN_TODOs marked, I hope to finish these in a future v0.3). I’ve also added a short section on Algorithmic Choices, linked to the new Cython prange operator and shown the new numpy module in PyPy.

Here’s the video from EuroPython (via pyvideo.org):

The source code is on my github page. The original slides are on slideshare too. If you’re after a challenge then at the end of the report I suggest some ported versions of the code that I’d like to see.

The report is licensed Creative Commons by Attribution (please link back here) – I’ll also happily accept a beer if you meet me in person! If you’re curious about this sort of work then note that I offer A.I. and high performance computing consulting and training via Mor Consulting.

Update – ShedSkin 0.9 adds faster complex number support. I haven’t added it to the report yet, but evidence in the ShedSkin Group suggests its speed gets closer to that of the non-complex-number version (i.e. you don’t have to do more work but you get a nice speed boost whilst still using complex numbers).

Update (Nov 2011) – Antonio and Armin posted a note which explains some of the slowness in PyPy and shows how it is competitive under the right conditions. Armin also contributed a C version which shows PyPy to run as fast as C (for their chosen configuration).

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight and Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

10 Comments | Tags: ArtificialIntelligence, Python

13 July 2011 - 22:40 SocialTies is coming

Slowly but surely we’re getting there with our social-discovery app for conferences. We aim to have SocialTies in the UK iPhone App Store in the next few weeks. I demoed it to several hundred folk at EuroPython a few weeks back and it was rather well received.

Currently the Android BETA is linked from the homepage, Emily is working on the iPhone version and we figure it is time to make it public (albeit just in the UK at first). Once we’ve had the initial round of feedback we’ll open it up to US and European users (I have to do some server-side plumbing for that to work yet). We’ll also put the Android v1 into the Android App Store shortly.

It is definitely just-out-of-beta. Having said that, we get good feedback and we’ve got users waiting for the iPhone release. If you’re curious, visit the site and add your email to the announce mailing list and we’ll let you know when the iPhone version is published.


No Comments | Tags: Life

13 July 2011 - 19:25 Dell E6420 with Ubuntu 11.04 (Natty Narwhal) 64 bit

I’ve just treated myself to a quad core (8 virtual core) Dell E6420 with 8GB RAM, 128GB SSD, NVIDIA NVS 4200M GPU (with integrated Intel GPU) and the high spec screen. It is rather nice. It comes with Win 7 Pro installed (fine for some libraries I’ll need and maybe some MS Office), I’ve just installed Ubuntu 11.04 64 bit. Obviously there were some hiccups…

Update – I’ve regressed to the 10.10 drivers from Dell (see: E6420 with Ubuntu 10.10), which seems to fix all the problems I had with 11.04.

Out of the box Ubuntu Natty Narwhal installed just fine; it took about 10 minutes from CD. It reported on first boot that I had an incompatible graphics card. This is because the Optimus NVIDIA technology (which swaps between low-power mode on the Intel card and high-power GPU mode with the 4200M) only works on Windows. I rebooted, dropped to the BIOS and disabled Optimus, and after this I got no more warnings from X Windows. Details on Optimus here.

On my next boot I asked the Additional Drivers system to enable my NVIDIA card. It installed, then I had to run ‘sudo nvidia-xconfig’. This worked fine. Note that at first (before disabling Optimus) I tried this and got a ‘VALIDATION ERROR: Data incomplete in file /etc/X11/xorg.conf.’ error – just disable Optimus and that problem goes away. You might need to manually move your /etc/X11/xorg.conf.backup file to xorg.conf if you’ve lost graphics along the way.

The touchpad (as a basic mouse), sound and wifi worked fine. Flash ran in YouTube but there was no sound – under the Sound dialog I had to go to Output and choose ‘Internal Audio Analogue Stereo’ rather than ‘HDA NVIDIA Digital Stereo (HDMI)’, in the background the YouTube video that was playing suddenly played through the speakers.

The touchpad doesn’t have multitouch features yet – there’s a proprietary driver on Windows which doesn’t exist on Linux yet (though progress is being made). Details here. If you touch the touchpad and it makes a click and you don’t like the behaviour, disable it here. Personally I’m happy with the touch behaviour.

I upgraded to the latest NVIDIA drivers using:

  • sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
  • sudo apt-get update
  • sudo apt-get install nvidia-current
  • sudo apt-get install nvidia-settings

After a reboot I could see the latest drivers (via Synaptic). Running a video in VLC took no extra CPU (meaning that the work is done on the GPU via VDPAU). There is an upcoming project called Bumblebee that will let us use the lower-power Intel GPU for normal graphics and will only switch to the NVIDIA card for intensive work (I want it for CUDA programming) but for now it looks like a bit of a faff. I’m just going to leave the NVIDIA 4200M running full time.

I’m also happy to see that HDMI support worked out of the box – I plugged in a cable, then had to go to the NVIDIA Settings tool to Auto-detect the monitors, then enabled TwinView and had a double-width monitor setup. The built-in Monitor tool didn’t see my extra monitor (but I guess this is all controlled by the NVIDIA stuff).

I’ve had about 3 hours for this session on the regular 6 Cell battery, that included downloading a lot of stuff and rebooting a number of times. This seems reasonable given that the NVIDIA 4200M is power hungry and runs all the time. Apparently in Windows I’d see up to 7 hours on the same hardware if Optimus is enabled. Ho hum.

I’m very impressed with the SSD – it is silent and everything feels much snappier. It was worth spending a few hundred extra pounds and losing a lot of space, the experience is far nicer. The screen is also beautiful (though the viewing angle isn’t amazing – but fine for single-use).

EDIT – installing CUDA 4 takes 10 minutes with these great instructions. It is fun to see the GPU clocking in at 90 degrees C whilst running randomFog, smokeParticles and the nbody demos.

EDIT – this CPU Frequency meter is nice as is this CPU/MEM/GPU meter.

EDIT – installing mongodb auto-starts it, it is controlled as a system service using ‘sudo [start|stop] mongodb’ as detailed here.

EDIT – installing matplotlib on Ubuntu 11.04 was a touch annoying. ‘pip install’ got a really old version (0.91!). Instead I grabbed the src for 1.0.1 and then manually had to install libfreetype6-dev (2.4.4-1ubuntu2) and libpng12-dev. After that the usual setup.py process worked fine.

EDIT – to fix the hang-on-reboot issue that I’ve noticed I followed this and added “reboot=pci” as noted to /etc/default/grub (and then ran ‘sudo update-grub’). Now reboots work correctly (previously only a Shutdown would work correctly).

EDIT – I was running laptop-mode-tools and PowerTop reported 13-15W usage. I’ve disabled it for now as I have a Suspend/Hibernate bug. Without laptop-mode, PowerTop is reporting 15-18W usage (both figures taken after boot, doing almost nothing – the CUDA card is power hungry!).


2 Comments | Tags: Ubuntu