10 February 2010 - 4:11
Fix for ConceptNet error “Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined”

If you’re using ConceptNet and you see:

ImportError: Settings cannot be imported, because environment variable
DJANGO_SETTINGS_MODULE is undefined.

then the fix is simple (I’ve been hacking away at an idea whilst at IUI2010 – thanks Rob for the fix).

To replicate the error run:

from csc.nl import get_nl
en_nl = get_nl('en')
en_nl.is_stopword('the')

The fix is to run:

import csc.conceptnet.models

which sets up Django, then call is_stopword again and all is fine.
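
Putting the two snippets together, a minimal sketch of the working order might look like this. It uses only the calls shown above; the try/except wrapper is an addition so the snippet still runs on a machine without ConceptNet installed.

```python
# Import the models module first: per the fix above, this import has the side
# effect of setting up Django, so later calls can find their settings.
try:
    import csc.conceptnet.models  # imported only for its side effect
    from csc.nl import get_nl
except ImportError:
    # Assumption: ConceptNet (the csc package) is not installed on this machine.
    get_nl = None

if get_nl is not None:
    en_nl = get_nl('en')
    print(en_nl.is_stopword('the'))
```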


Ian produces professional screencasts (ProCasts), writes The Screencasting Handbook, programs Python, researches Artificial Intelligence (Mor Consulting) and is also a sea-side dweller and consumer of fine coffees.

No Comments | Tags: Python

26 January 2010 - 14:01
pyCUDA on Windows and Mac for super-fast Python math using CUDA

I’ve just started to play with pyCUDA which lets you run parallel math operations on a CUDA-compliant NVidia graphics card through Python.

CUDA stands for Compute Unified Device Architecture – it is an architecture that lets us program the Graphics Processing Unit (GPU) on a high powered graphics card to do scientific or graphical math calculations rather than the usual texture processing for games.  In essence it is a mini supercomputer that is specialised just for fast math operations – if you can figure out how to use it.

The goal is to off-load the CPU-intensive calculations for two of my clients (a physics company and a flood modelling company) to achieve 10* to 100* speed-ups using commodity graphics cards.

pyCUDA makes it easy to interactively program a CUDA device rather than hitting C++ code with the slow write/compile/debug loop.  Recent MacBooks (mine was bought in January 2009) have NVidia cards with CUDA-compatible devices built-in (mine is a 9400M).  For my desktop computer I have a 9800 GT (costing £100).

It turns out that this is bleeding-edge stuff – getting pyCUDA compiled on my MacBook and Win XP machine took some time (forum posts for Mac and Windows issues).  Thankfully the group is helpful, the wiki has an installation section for Windows, Mac and Linux, and there is some reasonable documentation.

Right now I’ve got as far as running some of the demo code on my MacBook (showing a 5* speed-up over the CPU) and my desktop (showing a 30* speed-up over the CPU).  I’ll report more as I progress.

Update – pyCUDA works inside IPython too, lovely.

Update – I don’t have OpenGL working for gl_interop.py but as noted here you need “CUDA_ENABLE_GL = True” in siteconf.py and you need PyOpenGL installed.  When I rebuilt, my MSVC threw a hissy fit; it isn’t essential to my work so I’m skipping this demo.

Update – I’ve submitted a patch and two examples to the wiki (SimpleSpeedTest, Mandelbrot). I get 200* speed-ups on the speed test (using a for loop on a sin() calculation) and 5 to 20* speed-up on Mandelbrots (it seems to scale very well vs numpy with increasing dimensions).

Update – There are lots of interesting papers for CUDA surfacing like this one showing a 3* speed-up for voice recognition tasks (using CPU and GPU together) and yet another way to improve fluid dynamic simulations. This Tom’s 3D article gives a great write-up (starting with the history of audio cards) on where 3D is right now and how NVidia is beating ATI for scientific computing.


No Comments | Tags: Python

13 December 2009 - 17:05
Text to Speech – Festival (cross platform) and MacSpeechX (Python on Mac)

I wanted to play with text to speech and I’ve been looking for a cross-platform open-source solution that sounds reasonable.  I’m really impressed with the Festival project; the web demo lets you enter your own text.

Festival is cross-platform but compiling it on a Mac takes a touch of effort (it looks like it is easier on Linux and Win).

This article shows you how to use it and how to web-enable it with some php.  For the simplest demo I used ‘bin/text2wave input.txt -o output.wav’ with input.txt containing a sentence.
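
If you would rather drive that command from Python, a minimal sketch follows. The function names and the festival_dir parameter are hypothetical; the only Festival detail used is the ‘bin/text2wave input.txt -o output.wav’ invocation quoted above, and the script is only run if it exists at that path.

```python
import os
import subprocess


def text2wave_command(text_path, wav_path, festival_dir="."):
    """Build the argument list for Festival's text2wave script."""
    script = os.path.join(festival_dir, "bin", "text2wave")
    return [script, text_path, "-o", wav_path]


def speak_to_file(sentence, text_path="input.txt", wav_path="output.wav",
                  festival_dir="."):
    """Write the sentence to a text file and render it to a WAV file.

    Returns the WAV path, or None if text2wave is not present
    (for example because Festival has not been compiled yet).
    """
    with open(text_path, "w") as f:
        f.write(sentence + "\n")
    cmd = text2wave_command(text_path, wav_path, festival_dir)
    if not os.path.exists(cmd[0]):
        return None
    subprocess.check_call(cmd)
    return wav_path
```

Call speak_to_file('Hello from Festival') from the directory where Festival was built, or pass that directory as festival_dir.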

To get started, get the latest code.  I have v1.96beta.  You may also want the official festlang-talk list and possibly this more complete archive.

Compiling speech_tools-1.2.96-beta.tar.gz

It ought to have been as simple as ‘make clean; make’ but there are a few changes to make first.  First we need this fix or we get a compile error in macosxaudio in kAudioUnitProperty_SetInputCallback:

If you add
#include <AudioUnit/AUNTComponent.h>
after the include block on lines 45-48 in audio/macosxaudio.cc the
problem should be solved.

By the way, remember to change the byte order if you have an intel
mac, i.e. on line 131:
     waveformat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger
         | kLinearPCMFormatFlagIsPacked;
     // For Intel:   | kLinearPCMFormatFlagIsPacked;
     // For PowerPC: | kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsBigEndian;

The following was a trickier error to solve:

g++ -c -fno-implicit-templates -O3 -Wall -I../include sigpr_frame.cc
sigpr_frame.cc: In function
‘void lpc2cep(const EST_FVector&, EST_FVector&)’:
sigpr_frame.cc:318: error: ‘__isnan’ was not declared in this scope
make[1]: *** [sigpr_frame.o] Error 1
make: *** [sigpr] Error 2

The fix was known but the relevant archive was missing; some googling for ‘__isnan mac’ turned up this cached 2006 page:

--- ../test/speech_tools/include/EST_math.h     2006-08-03 08:49:35.000000000 -0500
+++ include/EST_math.h  2006-08-17 17:53:33.000000000 -0500
@@ -43,7 +43,7 @@
#if defined(__APPLE__)
/* Not sure why I need this here, but I do */
-extern "C" int isnan(double);
+extern "C" int isnan(float);
#endif
/* this isn't included from c, but just to be safe... */
@@ -101,7 +101,6 @@
/* Apple OSX */
#if defined(__APPLE__)
#define isnanf(X) isnan(X)
-#define isnan(X) __isnan(X)
#endif
/* FreeBSD *and other 4.4 based systems require anything, isnanf is defined */

Compiling festival-1.96-beta.tar.gz

Once speech-tools is compiled, getting ‘festival-1.96-beta.tar.gz’ compiled is as easy as ‘make clean;make’.

Python’s MacSpeechX

I also had a play with the macspeechx module which ties Python to the Mac’s voice-synthesiser.  See list_voice_name() in macspeechX.py for an example of how it all works.

It works to power the speech synthesiser but it doesn’t appear to let you record the speech to a file (unlike festival above).


No Comments | Tags: ArtificialIntelligence, Python

9 April 2009 - 18:28
New Python tutorials at ShowMeDo using Learning Paths

The latest development at ShowMeDo is a new learning system called Learning Paths.  The Paths are ordered collections of videos and series where individual items are pulled together to make a journey (a ‘learning trajectory’) for the learner to achieve one particular goal.

The Paths also allow for dependencies, so ‘Fully worked Python Projects’ depends upon ‘Beginning Python Programming’ and that depends upon ‘Setting up Python’.

At present we have an initial set of Paths and more will follow very soon:

The Paths mix our free and Club content.  All authors have edit rights, so everyone can add the right material to the Paths so they tell exactly the right story.

We’re very keen to see the Paths used and I’ve already blogged about this on the main ShowMeDo Blog.  If you like what we’re trying to achieve, perhaps you could help us to spread the word by blogging or tweeting?


1 Comment | Tags: Python, ShowMeDo

31 March 2009 - 12:46
Adding PIL (Python Imaging Library) to Mac OS X

I continue my newbie MacBook exploits, currently I’m enjoying the fragmented installation process on a Mac…why is it harder to get stuff installed than on both Ubuntu (lovely apt-get!) and Windows?

Installing the Python Imaging Library takes a couple of steps.  There is a 3rd party installer but it assumes you’ve installed their base Python2.5 install…but Py2.5 comes pre-installed on Macs now anyway.

Thankfully there are instructions here for adding a soft-link that lets the installer find the existing Python 2.5.  Next, get the PIL diskimage (via Python Mac) and this time it’ll install happily.

Next I created ~/.bash_profile (not .bash_rc as suggested in the article – it didn’t get picked up) and added the required:

export PYTHONPATH=/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/

and then I started Python, did ‘from PIL import Image’ and all was well.  Woot-te-toot, now on with coding another ShowMeDo Club series (on File I/O for Python Beginners).
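
A quick way to confirm both halves of that setup (the path is picked up, and PIL imports) is sketched below. The helper names are hypothetical; the path is the one exported above.

```python
import sys

# The directory exported via PYTHONPATH in ~/.bash_profile above.
SITE_PACKAGES = "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/"


def on_python_path(directory, search_path=None):
    """Return True if directory is on the module search path.

    Trailing slashes are ignored when comparing entries.
    """
    if search_path is None:
        search_path = sys.path
    wanted = directory.rstrip("/")
    return any(entry.rstrip("/") == wanted for entry in search_path)


def pil_available():
    """Return True if 'from PIL import Image' succeeds."""
    try:
        from PIL import Image  # imported only to test availability
    except ImportError:
        return False
    return True


print("site-packages on sys.path: %s" % on_python_path(SITE_PACKAGES))
print("PIL importable: %s" % pil_available())
```

If the first line prints False, the shell never exported PYTHONPATH, which is the symptom I saw when the export lived in the wrong file.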


1 Comment | Tags: Python

13 March 2009 - 13:48
ShowMeDo server move + Python 3 videos

We’ve spent the last few weeks migrating ShowMeDo to its own server after 3 years operating out of a shared box.  Moving the site was a pain as I’m not a low-level Apache hacker, but all in all everything seems fine now and we have extra capacity to grow.

Kyran has skinned the blog so it fits with the overall theme.  The new Learning Paths feature is close to being released, this’ll really tie together all the learning resources in the site so visitors can get a threaded path through all the videos.

Kyran has explained some of the move and has configured ShowMeDo’s frontpage to show some of the posts, this is a really nice way to integrate the blog into the main site.  We also have a Hall of Fame now where all authors are ranked by a number of measures.

Two authors have added Python 3 videos, Gasto summarises some of the changes in 3.0 and chyld shows 3.0 in action in 2 videos on lists and del.icio.us.  These and all the other Python videos are here.


No Comments | Tags: Life, Python, ShowMeDo

26 February 2009 - 18:49
Screencast Interviews with Open-Source teachers

Recently I’ve interviewed four experienced open-source screencasters, three from ShowMeDo (all of whom are Pythonistas) and Remy, a Brighton local, who teaches jQuery to designers:

Each screencaster uses different techniques and platforms and all have learned to screencast ‘the hard way’, particularly by learning from their mistakes and improving with each new ‘cast.

If you’ve never screencasted but like the idea of a video that demos your preferred tool (whilst you sleep!), do consider playing with the free tools like Jing, ScreenCast-o-matic and ScreenToaster (and perhaps moving to more advanced tools like CamTasia and ScreenFlow).  You can distribute your videos through sites like ShowMeDo, Vimeo and YouTube.


No Comments | Tags: Python, Screencasting

13 February 2009 - 19:51
Brighton Python meetup – Weds 18th Feb, Farm Tavern (near Hove)

We’re having the 3rd in our intermittent series of Brighton Python meetups, next week we join 40+ local geek freelancers of Brighton Farm to help them celebrate their 6th anniversary.

Please come along on Wednesday 18th Feb at the Farm Tavern for Python discussion, celebrating our new mail list and the odd beer or three.  On the table we’ll have the recent Expert Python Programming.

From Brighton station you need to travel part-way into Hove.  The buses along Western Road will all take you in the right direction: get off by Palmeira Square, go back a couple of side-streets and you’ll find Farm Road heading North.  The Farm Tavern is the second pub up the street.

John and I will be there, along with others on the mailing list.


No Comments | Tags: Python

8 February 2009 - 17:54
Review: Expert Python Programming by Packt (2008)

Before Christmas I was asked if I’d like to review Packt’s new Expert Python Programming (at Packt).

I’ve always recommended Beginning Python – From Novice to Professional to Python first-timers (new 2nd ed) along with Python in a Nutshell for those who want a thorough reference guide.

I sat on the review for a while as at first I figured the book was a bit light on interesting material.  My mistake – upon proper investigation I realised there’s quite a lot in there, I’ll definitely be using the book in the future.

Here’s a summary by Chapter:

  1. Installing Python and IPython, not much here if you’re comfy installing Python
  2. Lower-level code basics including list comprehensions, iterators and decorators
  3. Subclassing and use of ‘super’
  4. Naming – a very important topic!  This is something I pick up on in my ShowMeDo videos with PEP 8; getting new developers to name consistently and sensibly is very important.  Includes a short description of building eggs, deprecating code, pyLint and CloneDigger (a useful-looking tool to find repeated code segments that could be refactored)
  5. Using DistUtils and SetupTools to create eggs and upload them to PyPI (the CheeseShop of old). I’ve never done this and this guide looks useful
  6. Building a new application using virtualenv.  This focuses on setting up a package to aggregate RSS feeds with feedparser and SQLAlchemy
  7. Distributing the entire app using zc.buildout using a self-contained directory structure
  8. Version control, focuses on the distributed version control tool Mercurial and the continuous integration tool BuildBot
  9. Project Lifecycle – covers Waterfall and Agile techniques, I thought this was a very light chapter.
  10. Documentation – how to document for others to read, writing using reStructuredText
  11. Test Driven Development – a very important chapter for any longer-running project, covers unittest, doctest and nose
  12. Optimisation introduction – profiling speed, memory usage (Guppy-PE) and network usage – a good intro if this is new to you
  13. Optimisation techniques – focuses on using more efficient collections, multi-processing with pyProcessing and caching
  14. Design Patterns – a simple overview of some of the more obvious patterns

Do I recommend the book?  I think if the above topics are new to you then you’d benefit from the book.  My main criticism is that the chapters aren’t very deep, sometimes the topics provide little more than an intro to further on-line research.  That said, if the topics are new then the advice generally seems helpful.

A nit pick is that the editorial control needs improvement, I came across typos and odd grammar.  Maybe I’m spoiled by O’Reilly books where you rarely find typos.

For anyone coming to the Brighton Python usergroup on Feb 18th (google group) I’ll bring along my copy.

Here’s a free chapter and a set of other reviews.


No Comments | Tags: Python

17 November 2008 - 18:16
Making Python math 196* faster with shedskin

Dr. Michael Thomas approached me with an interesting A.I. job to see if we could speed up his neural network code from a 10-year-old research platform called PlaNet. Using new Sun boxes they weren’t getting the speed-ups they expected; old libs or other monkey business were suspected.

As a first investigation I took Neil Schemenauer’s bpnn.py (a 200 line back-prop artificial neural network library with doc and comparison). The intention was to see how much faster the code might run using psyco and shedskin.

The results were really quite surprising, notes and src follow.

Addition – Leonardo Maffi has written a companion piece showing that his ShedSkin output is 1.5 to 7* slower than hand-coded C.  He also shows solutions using the D language and runtimes for Python 2.6 (I use Python 2.5 below).  He notes:

“I have translated the Python code to D (using my D libraries) in just few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. I have translated the D code to C in many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this despite I have used a rigorous enough method to perform the translation, and despite at the end I am not sure the C code is bug-free. This is an enormous difference.”

End addition.

Addition – Robert Bradshaw has created a Cython version with src, see comments. End addition.

The run-times in minutes for my harder test case are below.  Note that these are averages of 4 runs each:

  1. Vanilla Python 153 minutes
  2. Python + Psyco 1.6.0.final.0 57 minutes (2.6* faster)
  3. Shedskin 0.0.29 0.78 minutes [47 seconds] (196* faster)

The test machine uses Python 2.5.2 on Ubuntu 8.04. The box is an Intel Core Duo 2.4GHz running a single process.

The ‘hard’ problem trains the ANN using 508 patterns with 57 input neurons, 50 hidden and 62 output neurons over 1000 iterations. If you know ANNs then the configuration (0.1 learning rate, 0 momentum) might seem unusual, be assured that this is correct for my researcher’s problem.

There is a shorter version of this problem using just 2 patterns, this is useful if you want to replicate these results but don’t want to wait 3 hours on your first run.

My run times for the shorter problem are (again averaged using 4 runs):

  1. Vanilla Python 42 seconds
  2. Python + Psyco 14 seconds
  3. Shedskin 0.2 seconds (210* faster)
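
The speed-up factors in the two lists above are simply the vanilla run time divided by the faster run time. A small sketch of that arithmetic, using the rounded averages quoted in the lists (the helper name is hypothetical):

```python
def speedup(baseline, faster):
    """Return how many times faster 'faster' is than 'baseline' (same units)."""
    return float(baseline) / faster


# Hard problem, averaged run times in minutes (first list).
print("Psyco:    %.2f*" % speedup(153, 57))
print("Shedskin: %.0f*" % speedup(153, 0.78))

# Short problem, averaged run times in seconds (second list).
print("Psyco:    %.2f*" % speedup(42, 14))
print("Shedskin: %.0f*" % speedup(42, 0.2))
```

With these rounded inputs the Psyco figure for the hard problem comes out nearer 2.7 than the 2.6 quoted; the gap is presumably down to rounding in the averaged times.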

Shedskin has an issue with numerical stability – it seems that internally some truncation occurs with floating point math. Whilst the results for vanilla Python and Python+Psyco were identical, the results with Shedskin were similar but with fractional divergences in each result.

Whilst these divergences caused some very different results in the final weights for the ANN, my researcher confirms that all the results look equivalent.

Mark Dufour (Shedskin’s author) confirms that Python’s C double is used the same in Shedskin but notes that rounding (or a bug) may be the culprit. Shedskin is a young project, Mark will welcome extra eyes if you want to look into this.

Running the code with Shedskin was fairly easy. On Ubuntu I had to install libgc-dev and libpcre3-dev (detailed in the Shedskin docs) and g++, afterwards shedskin was ready. From download to first run was 15 minutes.

On my first attempt to compile bpnn.py with Shedskin I received an error as the ‘raise’ keyword isn’t yet supported. I replaced the ‘raise’ calls with ‘assert False’ for sanity, afterwards compilation was fine.

Edit – Mark notes that the basic form of ‘raise’ is supported but the version used in bpnn.py isn’t yet supported.  Something like ‘raise ValueError('some msg')’ works fine.
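
As a sketch of the form Mark describes, the function below raises by calling the exception class with a message. The function itself is hypothetical (it is not taken from bpnn.py), and the comma form mentioned in the comment is only my assumption about what bpnn.py used.

```python
def check_inputs(inputs, expected):
    """Reject an input vector of the wrong length."""
    if len(inputs) != expected:
        # Call-style raise: the basic form that Mark says Shedskin supports.
        # The older Python 2 comma form (raise ValueError, 'some msg') is,
        # I assume, the unsupported variant.
        raise ValueError('wrong number of inputs')
    return True
```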

Mark notes that Shedskin currently works well up to 500 lines (maybe up to 1000).  Since bpnn.py is only 200 lines, compilation is quick.

Note that if you can’t use Psyco because you aren’t on x86, Shedskin might be useful to you since it’ll work anywhere that Python and g++ compile.

Running this yourself

If you want to recreate my results, download bpnn_shedskin_src_20081117.zip. You’ll see bpnn_shedskin.py, this is the main code. bpnn_shedskin.py includes either ‘examples_short.py’ or ‘examples_full.py’, short is the easier 2 pattern problem and full has 508 patterns.

Note that these patterns are stored as lists of tuples (Shedskin doesn’t support the csv module so I hardcoded the input patterns to speed development), the full version is over 500 lines of Python and this slows Shedskin’s compilation somewhat.
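
To give a feel for what hard-coded patterns look like, here is a hypothetical stand-in for an examples file. The values, the sizes and the (inputs, targets) layout of each tuple are made up for illustration; the real files in the zip hold the 57-input, 62-output patterns.

```python
# Hypothetical stand-in for examples_short.py: a list of tuples, one per
# pattern, each holding (input values, target values).
examples = [
    ([0.0, 1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0, 1.0], [0.0, 1.0]),
]


def describe(patterns):
    """Return (pattern count, inputs per pattern, outputs per pattern)."""
    inputs, targets = patterns[0]
    return len(patterns), len(inputs), len(targets)


print("Using %d examples" % len(examples))
```

Because the data is plain Python literals, nothing needs to be parsed at run time, which is what sidesteps the missing csv module.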

By default the imports for Psyco are commented out and the short problem is configured. At the command line you’ll get an output like this:

python bpnn_shedskin.py
Using 2 examples
ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum
error 65.454309      2008-11-17 15:22:58.318593
error 45.176110      2008-11-17 15:22:59.060787
error 44.616933      2008-11-17 15:23:00.246280
error 44.026883      2008-11-17 15:23:01.743821
error 44.049276      2008-11-17 15:23:02.815876
error 44.905183      2008-11-17 15:23:03.860352
error 44.674506      2008-11-17 15:23:05.270307
error 43.365627      2008-11-17 15:23:06.757126
error 43.299160      2008-11-17 15:23:08.244466
error 42.540076      2008-11-17 15:23:09.732035
Elapsed: 0:00:41.472192

If you uncomment the two Psyco lines your code will run about 2.6* faster.
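
For reference, the usual two-line Psyco idiom looks like the sketch below. I am assuming these are the two lines in question; the try/except wrapper is an addition so the script still runs where Psyco is not installed.

```python
try:
    import psyco
    psyco.full()  # ask Psyco to compile every function it can
    HAVE_PSYCO = True
except ImportError:
    # Psyco is x86-only and not always installed; fall back to plain Python.
    HAVE_PSYCO = False

print("Psyco enabled: %s" % HAVE_PSYCO)
```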

Using Shedskin

To use shedskin, first run the Python through shedskin and then ‘make’ the result. The compiled binary will run much faster than the vanilla Python code: the result below shows the short problem taking 0.19 seconds compared to 41 seconds above.

shedskin bpnn_shedskin.py
*** SHED SKIN Python-to-C++ Compiler 0.0.29 ***
Copyright 2005-2008 Mark Dufour; License GNU GPL version 3 (See LICENSE)
[iterative type analysis..]
***
iterations: 3 templates: 519
[generating c++ code..]
*WARNING* bpnn_shedskin.py:178: function (class NN, 'weights') not called!
*WARNING* bpnn_shedskin.py:156: function (class NN, 'test') not called!

make
g++  -O2 -pipe -Wno-deprecated  -I. -I/usr/lib/shedskin/lib /usr/lib/shedskin/lib/string.cpp /usr/lib/shedskin/lib/random.cpp /usr/lib/shedskin/lib/datetime.cpp examples_short.cpp bpnn_shedskin.cpp /usr/lib/shedskin/lib/builtin.cpp /usr/lib/shedskin/lib/time.cpp /usr/lib/shedskin/lib/math.cpp -lgc  -o bpnn_shedskin

./bpnn_shedskin
Using 2 examples
ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum
error 65.454309      2008-11-17 16:11:08.452087
error 44.970416      2008-11-17 16:11:08.476869
error 46.444249      2008-11-17 16:11:08.506324
error 44.209054      2008-11-17 16:11:08.519375
error 44.058518      2008-11-17 16:11:08.532430
error 45.655892      2008-11-17 16:11:08.545741
error 44.518816      2008-11-17 16:11:08.558520
error 43.643572      2008-11-17 16:11:08.571705
error 44.800429      2008-11-17 16:11:08.584241
error 43.710905      2008-11-17 16:11:08.597465
Elapsed: 0:00:00.198747

Why is the math different?

An open question remains as to why the evolution of the floating point arithmetic is different between Python and Shedskin. If anyone is interested in delving into this, I’d be very interested in hearing from you.
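
I don’t have a diagnosis, but one general property is worth keeping in mind when comparing runs: floating point addition is not associative, so evaluating the same expression in a different order can change the last digits, and over 1000 training iterations small differences can compound. The sketch below only demonstrates that property in plain Python; it says nothing about what Shedskin actually does.

```python
# The same three numbers, summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(repr(a))
print(repr(b))
print("identical: %s" % (a == b))
```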

Extension modules

Mark notes that the extension module support is perhaps a more useful way to use Shedskin for this sort of problem.

A single module can be compiled (e.g. ‘shedskin -e module.py’) and with Python you just import it (e.g. ‘import module’) and use it…with a big speed-up.

This ties the code to your installed libs – not so great for easy distribution but great for lone researchers needing a speed boost.
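
A minimal sketch of a module you might compile this way is below. The file name and the dot function are hypothetical. The call under __main__ is there on the assumption that Shedskin infers types from the calls it can see (compare the ‘not called!’ warnings in the compiler output above), so each function you want to export should be exercised at least once.

```python
# module.py (hypothetical): a small numeric helper to compile with
# 'shedskin -e module.py' and then use via 'import module'.


def dot(xs, ys):
    """Return the dot product of two equal-length lists of floats."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


if __name__ == '__main__':
    # Call the function once with representative argument types so a
    # type-inferring compiler has a concrete call to analyse.
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The same file runs unchanged under plain Python, which makes it easy to compare results before and after compiling.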

Shedskin 0.1 in the works

Mark’s plan is to get 0.1 released over the coming months. One aim is to get the extension module to a similar level of functionality as SWIG and improve the core library support so that Shedskin comes with (some more) Batteries Included.

Mark is open to receiving code (up to 1000 lines) that doesn’t compile.  The project would always happily accept new contributors.

See the Shedskin homepage, blog and group.


4 Comments | Tags: ArtificialIntelligence, Programming, Python