Entrepreneurial Geekiness
pyCUDA on Windows and Mac for super-fast Python math using CUDA
I’ve just started to play with pyCUDA which lets you run parallel math operations on a CUDA-compliant NVidia graphics card through Python.
Update – I’ve written a High Performance Python tutorial (July 2011, 55 pages) which covers pyCUDA and other technologies, you might find it useful.
CUDA stands for Compute Unified Device Architecture – it is an architecture that lets us program the Graphics Processing Unit (GPU) on a high powered graphics card to do scientific or graphical math calculations rather than the usual texture processing for games. In essence it is a mini supercomputer that is specialised just for fast math operations – if you can figure out how to use it.
The goal is to off-load the CPU-intensive calculations for two of my clients (a physics company and a flood modelling company) to achieve 10* to 100* speed-ups using commodity graphics cards.
pyCUDA makes it easy to interactively program a CUDA device rather than hitting C++ code with the slow write/compile/debug loop. Recent MacBooks (mine was bought in January 2009) have NVidia cards with CUDA-compatible devices built-in (mine is a 9400M). For my desktop computer I have a 9800 GT (costing £100).
It turns out that this is bleeding-edge stuff – getting pyCUDA compiled on my MacBook and Win XP machine took some time (see the forum posts for Mac and Windows issues). Thankfully the group is helpful, and the wiki has an installation section for Windows, Mac and Linux along with some reasonable documentation.
Right now I’ve got as far as running some of the demo code on my MacBook (showing a 5* speed-up over the CPU) and my desktop (showing a 30* speed-up over the CPU). I’ll report more as I progress.
Update – pyCUDA works inside IPython too, lovely.
Update – I don’t have OpenGL working for gl_interop.py but as noted here you need “CUDA_ENABLE_GL = True” in siteconf.py and you need PyOpenGL installed. When I rebuilt, MSVC threw a hissy fit; the demo isn’t essential to my work so I’m skipping it.
Update – I’ve submitted a patch and two examples to the wiki (SimpleSpeedTest, Mandelbrot). I get 200* speed-ups on the speed test (using a for loop on a sin() calculation) and 5 to 20* speed-up on Mandelbrots (it seems to scale very well vs numpy with increasing dimensions).
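To give a feel for what a sin() speed comparison looks like, here is a minimal sketch. It is not the SimpleSpeedTest from the wiki: it only contrasts a plain Python loop with a single element-wise call through pyCUDA's gpuarray and cumath modules. The helper names (sin_cpu, sin_gpu, timed) are made up for this illustration, and the GPU path quietly returns None if numpy, pyCUDA or a CUDA device isn't available.

```python
import math
import time


def sin_cpu(values):
    """Baseline: call math.sin() once per element in a plain Python loop."""
    return [math.sin(v) for v in values]


def sin_gpu(values):
    """Compute sin() for every element on the GPU in one call.

    Returns None when numpy, pyCUDA or a usable CUDA device is missing.
    """
    try:
        import numpy as np
        import pycuda.autoinit  # noqa: F401 - sets up a context on the default device
        import pycuda.cumath as cumath
        import pycuda.gpuarray as gpuarray
    except Exception:
        return None
    # Copy the data to the card as float32, run sin() there, copy the result back.
    on_device = gpuarray.to_gpu(np.asarray(values, dtype=np.float32))
    return cumath.sin(on_device).get().tolist()


def timed(fn, values):
    """Return (result, seconds taken) for a single call of fn(values)."""
    start = time.perf_counter()
    result = fn(values)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = [i * 0.001 for i in range(200_000)]
    cpu_result, cpu_secs = timed(sin_cpu, data)
    gpu_result, gpu_secs = timed(sin_gpu, data)
    print(f"CPU loop: {cpu_secs:.3f}s")
    if gpu_result is None:
        print("No usable pyCUDA/GPU found, skipping the GPU run")
    else:
        print(f"GPU call: {gpu_secs:.3f}s")
```

Note that the GPU timing here includes the imports and the copy to and from the card, so a small array can easily come out slower than the CPU; the big speed-ups only show up when the amount of math per transferred byte is high.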
Update – There are lots of interesting papers for CUDA surfacing like this one showing a 3* speed-up for voice recognition tasks (using CPU and GPU together) and yet another way to improve fluid dynamic simulations. This Tom’s 3D article gives a great write-up (starting with the history of audio cards) on where 3D is right now and how NVidia is beating ATI for scientific computing.
Books to read:
The following CUDA books will help you understand the basics of CUDA programming – I particularly like the first (Kirk and Hwu).
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.
Come to my screencasting SkillSwap in Brighton on Jan 27th
On January 27th here in Brighton I’m co-running a SkillSwap evening, I’ll spend 45 minutes teaching screencasting (based on a Mac) and Andy White will spend 45 minutes teaching podcasting. We’ll cover planning, recording, editing, distributing and mics between us.
We’re both aiming the talks at freelancers (so they can communicate better with clients) and small companies (for training, marketing and demos). We’re also the authors of The Screencasting Handbook and Podcasting Unleashed.
I’ll cover at least these topics:
- Free and commercial tools on a Mac (and Windows/Linux if requested)
- Recording your first screencast with Jing and hosting it on the Web
- Planning your screencast so it meets the needs of your audience
- The differences between a sales/marketing screencast and a tutorial
- Using ScreenFlow to record, edit and produce a screencast and then upload it to YouTube
- Hosting your own screencast and other distribution options
If you bring a laptop then I can get you started with the free Jing so you can walk away with a recording and hosting solution for Mac and Windows.
If you’re in Brighton then the event is free, see details in Upcoming and sign-up on EventBrite. SkillSwap has been running for years – cheers to Nat and James for finding a spot for us.
Madgex will be sponsoring beer and nibbles, the atmosphere will be relaxed and friendly. Nat is recording the audio for a podcast and I intend to record a video of the evening for distribution via Vimeo (but of course that won’t be the same as being there and being able to ask questions!).
Text to Speech – Festival (cross platform) and MacSpeechX (Python on Mac)
I wanted to play with text to speech and I’ve been looking for a cross-platform open-source solution that sounds reasonable. I’m really impressed with the Festival project; the web demo lets you enter your own text.
Update – I’m including this post in my plans for an Artificial Intelligence Handbook.
Festival is cross-platform but compiling it on a Mac takes a touch of effort (it looks like it is easier on Linux and Win).
This article shows you how to use it and how to web-enable it with some php. For the simplest demo I used ‘bin/text2wave input.txt -o output.wav’ with input.txt containing a sentence.
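If you'd rather drive text2wave from Python than from php, a small wrapper is enough. The sketch below just builds and runs the same ‘bin/text2wave input.txt -o output.wav’ command; the function names and the festival_dir argument are my own invention for illustration, and it assumes Festival has already been compiled in that directory.

```python
import subprocess
from pathlib import Path


def text2wave_command(festival_dir, text_file, wav_file):
    """Build the argument list for 'bin/text2wave input.txt -o output.wav'."""
    text2wave = Path(festival_dir) / "bin" / "text2wave"
    return [str(text2wave), str(text_file), "-o", str(wav_file)]


def speak_to_file(festival_dir, sentence, wav_file, text_file="input.txt"):
    """Write the sentence to a text file and ask text2wave to render it as a WAV."""
    Path(text_file).write_text(sentence + "\n", encoding="utf-8")
    # check=True raises CalledProcessError if text2wave exits with an error.
    subprocess.run(text2wave_command(festival_dir, text_file, wav_file), check=True)
    return Path(wav_file)
```

Calling speak_to_file("festival", "Hello from Festival.", "output.wav") should then leave output.wav next to input.txt, assuming the festival directory is where the build lives.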
To get started, get the latest code. I have v1.96beta. You may also want the official festlang-talk list and possibly this more complete archive.
Compiling speech_tools-1.2.96-beta.tar.gz
It ought to have been as simple as ‘make clean; make’ but there are a few changes to make first. First we need this fix or we get a compile error in macosxaudio in kAudioUnitProperty_SetInputCallback:
If you add #include <AudioUnit/AUNTComponent.h> after the include block on lines 45-48 in audio/macosxaudio.cc the problem should be solved. By the way, remember to change the byte order if you have an intel mac, i.e. on line 131:
// For Intel
waveformat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
// For PowerPC
waveformat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsBigEndian;
The following was a trickier error to solve:
g++ -c -fno-implicit-templates -O3 -Wall -I../include sigpr_frame.cc
sigpr_frame.cc: In function ‘void lpc2cep(const EST_FVector&, EST_FVector&)’:
sigpr_frame.cc:318: error: ‘__isnan’ was not declared in this scope
make[1]: *** [sigpr_frame.o] Error 1
make: *** [sigpr] Error 2
The fix was known but the relevant archive was missing; some googling for ‘__isnan mac’ turned up this cached 2006 page:
--- ../test/speech_tools/include/EST_math.h 2006-08-03 08:49:35.000000000 -0500
+++ include/EST_math.h 2006-08-17 17:53:33.000000000 -0500
@@ -43,7 +43,7 @@
 #if defined(__APPLE__)
 /* Not sure why I need this here, but I do */
-extern "C" int isnan(double);
+extern "C" int isnan(float);
 #endif
 /* this isn't included from c, but just to be safe... */
@@ -101,7 +101,6 @@
 /* Apple OSX */
 #if defined(__APPLE__)
 #define isnanf(X) isnan(X)
-#define isnan(X) __isnan(X)
 #endif
 /* FreeBSD *and other 4.4 based systems require anything, isnanf is defined */
Compiling festival-1.96-beta.tar.gz
Once speech-tools is compiled, getting ‘festival-1.96-beta.tar.gz’ compiled is as easy as ‘make clean;make’.
Python’s MacSpeechX
I also had a play with the macspeechx module which ties Python to the Mac’s voice-synthesiser. See list_voice_name() in macspeechX.py for an example of how it all works.
It works to power the speech synthesiser but it doesn’t appear to let you record the speech to a file (unlike festival above).
Update – Mike Driscoll has a post about pyTTS which hooks into Microsoft’s SAPI on Windows and pyTTSX which is cross-platform, along with some speech recognition links.
ConceptNetDaily Twitter Bot
I’ve just launched my second Twitter bot – @ConceptNetDaily takes a random concept from the A.I. site ConceptNet and posts it to Twitter with a link back to the site. A tweet looks like:
“When humans own horses, humans groom and ride horses.” http://tinyurl.com/ydvf7vg
The TinyURL expands out to an address like: http://openmind.media.mit.edu/en/assertion/143313/
The aim of the site is to build a large repository of common-sense knowledge, exactly the kind of knowledge that humans take for granted and never write down as statements for a computer to understand. Currently it tracks over 1,026,553 statements.
Using the link you can vote on the concept. Vote up if the concept is solid (i.e. something a human would say is ‘right’) or down if it is wrong, silly or erroneous. The site supports OpenID which makes starting a touch easier.
My goal with this bot is to remind people every day to vote on the concepts and to add new knowledge. If a concept has many votes then we can have faith that it is ‘common-sense knowledge’. If a concept is voted down enough then we can have faith that it is ‘unhelpful or wrong’.
You’ll find a searchable list of Concepts and some random examples on the English homepage. For good examples see all the information that ConceptNet knows about humans, chess and girls.
Details:
I’ve written the bot in Python using PyYAML, Python-tinyurl and Python-twitter. It runs every day via a cron job. It works by guessing a random id for a raw_assertion and checking to see if a concept lives at the URL. See this XML example for id 143313, I extract the .yaml version via PyYAML but the .xml version renders nicely in your browser if you want a peek.
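The guess-an-id-and-check approach can be sketched roughly as below. This is not the bot's actual code: the URL pattern is simply copied from the example address earlier in the post (the site may well have moved since), the function names and the max_id value you pass in are hypothetical, and the exists argument stands in for whatever HTTP check you'd use to see whether a concept lives at the URL.

```python
import random

# Address pattern taken from the example URL in this post; treat it as an assumption.
ASSERTION_URL = "http://openmind.media.mit.edu/en/assertion/{id}/"


def assertion_url(assertion_id):
    """Return the page address for a given assertion id."""
    return ASSERTION_URL.format(id=assertion_id)


def pick_random_assertion(exists, max_id, rng=random, max_tries=50):
    """Guess ids until exists(url) is true; return (id, url) or None if none was found."""
    for _ in range(max_tries):
        candidate = rng.randint(1, max_id)
        url = assertion_url(candidate)
        if exists(url):
            return candidate, url
    return None


def format_tweet(sentence, short_url, limit=140):
    """Quote the sentence and append the short link, trimming the sentence to fit the limit."""
    room = limit - len(short_url) - 3  # two quote marks plus one space
    if len(sentence) > room:
        sentence = sentence[: room - 3] + "..."
    return f'"{sentence}" {short_url}'
```

Passing exists in as a function keeps the sketch independent of any particular HTTP library, and makes it easy to try the logic out without hitting the site at all.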
ConceptNet’s web API is well documented. ConceptNet itself is written in Python using Django but I’m not using the downloaded version here, just the web API.
My first Twitter bot – @BrightonJobDoom:
Just in case you live here in Brighton you might want to track @BrightonJobDoom to see how healthy (or…not) the job market is in the UK during this rather wobbly recession 🙂 I wrote this bot for our £5 App’s 5k coding competition.
Eucalyptus Clustering – follow-up
A month back I tried to build an Ubuntu-based Eucalyptus cloud/cluster environment for a client for a parallel processing research project. The project was thwarted by an overly aggressive corporate firewall and my lack of understanding of low-level network config-fu.
I’ve revisited the project using the same machines but with an external public internet connection (no firewall – yay!).
Grub2
On the node machine I still needed to dual-boot to Windows. Unfortunately whilst reboots to Linux are fine, if Windows is booted it ‘does something’ to the MBR and the machine is unbootable. I delved into the boot-loader and had to learn some Grub2-fu.
Grub2 was introduced in Ubuntu 9.10, it replaces Grub which in turn replaced boot managers like lilo. The wiki page is pretty good for recovering a boot-loader using an Ubuntu LiveCD but it didn’t work quite to plan.
The step for ‘sudo chroot /mnt’ fails as bash or sh can’t be run from within /mnt (which at this point is looking at the originally installed hd). There is something odd going on with the LiveCD, much googling didn’t seem to reveal the answer.
To run grub-install on the hd rather than via the CD (because chroot fails) I used ‘sudo grub-install --root-directory=/mnt /dev/sda’, which reports that ‘(hd0) /dev/sda’ is installed.
Sidenote – on later attempts somehow a reference to (fd0) got involved and this broke the boot process. I edited /mnt/boot/grub/device.map to remove the fd0 reference, leaving the hd0 reference. I ran grub-install again and all was fine. Now the machine can boot again.
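The device.map edit is a one-line job in any text editor, but if it needs repeating it can be scripted. A minimal sketch, assuming the usual one-entry-per-line layout such as ‘(hd0) /dev/sda’; the function name is made up, and you'd still read and write /mnt/boot/grub/device.map yourself (with sudo) around it.

```python
def drop_floppy_entries(device_map_text):
    """Return device.map content without any (fdN) lines, keeping the rest as-is."""
    kept = [
        line
        for line in device_map_text.splitlines()
        if not line.strip().startswith("(fd")
    ]
    return "\n".join(kept) + "\n"
```

After writing the filtered text back, grub-install needs to be run again as described above.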
Mounting a USB memory stick
Whilst an 8GB memory stick was recognised, it didn’t get mounted. I had to edit /etc/fstab and add:
/dev/sdf1 /mnt/stick auto umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0
After this I used ‘sudo mkdir /mnt/stick’, ‘sudo mount /dev/sdf1’ and it mounted just fine.
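For anyone unsure what each part of that fstab line means, the sketch below splits it into its six standard fields (device, mount point, filesystem type, options, dump flag, fsck pass). The class and function names are just for illustration.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: tuple
    dump: int
    fsck_pass: int


def parse_fstab_line(line):
    """Split one whitespace-separated fstab line into its six fields."""
    device, mount_point, fs_type, options, dump, fsck_pass = line.split()
    return FstabEntry(
        device, mount_point, fs_type, tuple(options.split(",")), int(dump), int(fsck_pass)
    )


entry = parse_fstab_line(
    "/dev/sdf1 /mnt/stick auto "
    "umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0"
)
print(entry.mount_point, entry.fs_type, "noauto" in entry.options)
```

The ‘noauto’ option means the stick isn't mounted at boot, which is why the explicit ‘sudo mount /dev/sdf1’ is still needed; mount looks up the mount point and options from this entry.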
Installing Eucalyptus
The install process this time around was much the same as before, except this time without the firewall it all ‘just worked’. Seeing the fnords part 1 took me through the basic install.
I got the feeling from later steps that the cloud controller needs a static IP so I switched the cluster controller from DHCP to a static IP and rebooted.
The discover nodes process (‘sudo euca_conf --no-rsync --discover-nodes’) also required that I’d set up ssh keys on the Node; step 6 in the NodeInstall doc has the instruction. Typo note – if you spell ‘eucalyptus’ wrong you’ll go round in circles trying to figure out why the password won’t work!
Sometimes I couldn’t get ‘euca-describe-availability-zones verbose’ to work, it’d just report ‘No route to host’. It seems that a reboot of the CC and the Node is required, plus a minute or so of patience after boot for Apache to sort itself out, before this problem just goes away.
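Rather than re-running the command by hand after a reboot, the "wait a minute and try again" routine can be wrapped in a small retry loop. This is only a sketch: the function name, the attempt count and the delay are arbitrary choices, and run_command is whatever callable you supply that runs the tool and returns its output as a string.

```python
import time


def wait_until_reachable(run_command, attempts=10, delay_secs=15, sleep=time.sleep):
    """Call run_command() until its output no longer contains 'No route to host'.

    Returns the first good output, or None if every attempt failed.
    """
    for attempt in range(attempts):
        output = run_command()
        if "No route to host" not in output:
            return output
        if attempt < attempts - 1:
            sleep(delay_secs)  # give the services time to come up before retrying
    return None
```

One way to supply run_command would be a function that calls subprocess.run(["euca-describe-availability-zones", "verbose"], capture_output=True, text=True) and returns stdout plus stderr.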
Using the Ubuntu Store
Having installed the CC and registered a Node, next I ran the web interface via ‘https://10.0.0.4:8443’. Note ‘https’. If you visit the website too soon after a reboot (i.e. <1 minute) then the webapp won’t respond or maybe it won’t recognise the admin user. Having logged in, the first login forces a password change.
Next check the ‘Configuration’ tab and verify the IP addresses. For reasons beyond my understanding our switch rebooted during my first attempt to set up the cluster and it switched from the ‘192.168.x.x’ address range to ‘10.x.x.x’ – this royally barfed my configuration. I chose to re-install the CC from scratch (I was plagued by ‘no route to host’ problems no matter how much tweaking I tried).
Next visit the ‘Store’ tab and download an image, I’m using ‘Ubuntu 9.10 Karmic Koala (i386)’. Today this works – I’ve spent 2.5 days building and re-building the cluster to get it to this point. Often the Store would download an image and then report ‘no route to host’. This process is pretty darned frustrating and seems to lack useful error messages.
But ultimately – no cigar
Rather frustratingly I can’t get my Node to run an image. ‘euca-describe-availability-zones verbose’ shows that a Node exists but doesn’t list its IP address, which is odd as the online docs say it should be shown.
If I run an image then it enters the ‘pending’ state and then the ‘terminating’ state. Digging around in Google shows that other people currently have the same problem; it might be related to the lack of Hypervisor instructions on my Node machine (though they’re not supposed to be required…). Possibly the current build is also unstable, as there’s a lot of bug-fixing going on.
Debug notes
Eucalyptus has a trouble-shooting guide, this blog series is very useful.
Conclusion
Eucalyptus should give you an EC2-like cloud that runs on your own machines, using an EC2-compatible API so you could move to the cloud when you want to scale up or are less concerned about the privacy of your data. Currently I can’t get it to work but others do have it working – it seems to depend upon your hardware. It also lacks clear error messages so debugging is hard – I resorted to clean installs on three occasions.