
This is Ian Ozsvald's blog. I'm an entrepreneurial geek, an A.I. consultant, author of the A.I.Cookbook, professional screencast producer, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Brightonian.


26 January 2010 - 14:01

pyCUDA on Windows and Mac for super-fast Python math using CUDA

I’ve just started to play with pyCUDA which lets you run parallel math operations on a CUDA-compliant NVidia graphics card through Python.

CUDA stands for Compute Unified Device Architecture – it is an architecture that lets us program the Graphics Processing Unit (GPU) on a high-powered graphics card to do scientific or graphical math calculations rather than the usual texture processing for games. In essence the GPU is a mini supercomputer that is specialised just for fast math operations – if you can figure out how to use it.
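To give a flavour of what this looks like from Python, here is a minimal sketch rather than code from my client work. The function name sin_all is made up for illustration, and I've added a plain-Python fallback (my own assumption, so the snippet still runs on a machine with no CUDA device). The pyCUDA calls are gpuarray.to_gpu, cumath.sin and get.

```python
import math


def sin_all(values):
    """Return sin(x) for every x in values.

    Hypothetical helper: uses the GPU through pyCUDA when it is usable,
    otherwise falls back to an ordinary CPU loop.
    """
    try:
        import numpy as np
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
        import pycuda.cumath as cumath
    except Exception:
        # pyCUDA missing or no CUDA device found: compute on the CPU instead.
        return [math.sin(v) for v in values]

    # Copy the data to the card as 32-bit floats, run sin() on every
    # element in parallel, then copy the result back to the host.
    a_gpu = gpuarray.to_gpu(np.asarray(values, dtype=np.float32))
    return cumath.sin(a_gpu).get().tolist()


if __name__ == "__main__":
    print(sin_all([0.0, 0.5, 1.0]))
```

Note that the GPU path works in single precision, so its answers can differ from the CPU path in the last few decimal places.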

The goal is to off-load the CPU-intensive calculations for two of my clients (a physics company and a flood modelling company) to achieve 10x to 100x speed-ups using commodity graphics cards.

pyCUDA makes it easy to interactively program a CUDA device rather than hitting C++ code with the slow write/compile/debug loop.  Recent MacBooks (mine was bought in January 2009) have NVidia cards with CUDA-compatible devices built-in (mine is a 9400M).  For my desktop computer I have a 9800 GT (costing £100).

It turns out that this is bleeding-edge stuff: getting pyCUDA compiled on my MacBook and Win XP machine took some time (see the forum posts for Mac and Windows issues). Thankfully the group is helpful, and the wiki has an installation section for Windows, Mac and Linux along with some reasonable documentation.

Right now I’ve got as far as running some of the demo code on my MacBook (showing a 5x speed-up over the CPU) and my desktop (showing a 30x speed-up over the CPU). I’ll report more as I progress.
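For reference, a speed-up figure like these is just CPU time divided by GPU time for the same job. The sketch below is not the demo code. The helper names (time_call, speedup, cpu_sin_loop) are made up for illustration and it only times a CPU loop.

```python
import math
import time


def time_call(fn, repeats=3):
    """Best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


def cpu_sin_loop(n=100000):
    """Illustrative CPU workload: sum sin(i) in a plain Python loop."""
    total = 0.0
    for i in range(n):
        total += math.sin(i)
    return total


if __name__ == "__main__":
    cpu_time = time_call(cpu_sin_loop)
    print("CPU loop took %.4f seconds" % cpu_time)
```

When timing the GPU side, stop the clock only after the result has been copied back to the host, otherwise the measurement can miss work the card is still doing.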

Update – pyCUDA works inside IPython too, lovely.

Update – I don’t have OpenGL working for gl_interop.py. As noted here, you need “CUDA_ENABLE_GL = True” in siteconf.py and you need PyOpenGL installed. My MSVC threw a hissy fit when rebuilding, and since this demo isn’t essential to my work I’m skipping it.
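If you want to try it yourself, the two requirements can be sketched as below. The function pyopengl_installed is a made-up helper. It relies only on the fact that PyOpenGL is imported under the name OpenGL.

```python
import importlib.util


def pyopengl_installed():
    """True if the PyOpenGL package (import name: OpenGL) can be found."""
    return importlib.util.find_spec("OpenGL") is not None


# The setting quoted above, as it would appear in siteconf.py.
CUDA_ENABLE_GL = True

if __name__ == "__main__":
    print("PyOpenGL installed:", pyopengl_installed())
```

pyCUDA has to be rebuilt after siteconf.py changes, which is the step where my MSVC build fell over.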

Update – I’ve submitted a patch and two examples to the wiki (SimpleSpeedTest, Mandelbrot). I get 200x speed-ups on the speed test (using a for loop on a sin() calculation) and 5x to 20x speed-ups on Mandelbrots (it seems to scale very well vs numpy with increasing dimensions).
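To see why the Mandelbrot calculation suits a GPU, here is a plain-Python sketch of the escape-time calculation. This is not the wiki example, and the function names, grid bounds and iteration limit are my own illustrative choices.

```python
def escape_count(c, max_iter=50):
    """Iterations before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def mandelbrot_grid(width, height, max_iter=50):
    """Escape counts over the square [-2, 1] x [-1.5, 1.5].

    width and height must both be at least 2.
    """
    xs = [-2.0 + 3.0 * x / (width - 1) for x in range(width)]
    ys = [-1.5 + 3.0 * y / (height - 1) for y in range(height)]
    # Every pixel is computed independently of all the others.
    return [[escape_count(complex(x, y), max_iter) for x in xs] for y in ys]


if __name__ == "__main__":
    for row in mandelbrot_grid(40, 20):
        print("".join("#" if n == 50 else "." for n in row))
```

Because no pixel depends on any other, each one can be handed to its own GPU thread, and the amount of parallel work grows with the image dimensions.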

Update – There are lots of interesting papers for CUDA surfacing, like this one showing a 3x speed-up for voice recognition tasks (using CPU and GPU together) and yet another way to improve fluid dynamic simulations. This Tom’s 3D article gives a great write-up (starting with the history of audio cards) on where 3D is right now and how NVidia is beating ATI for scientific computing.

Books to read:

The following CUDA books will help you understand the basics of CUDA programming – I particularly like the first (Kirk and Hwu).


