17 November 2008 - 18:16Making Python math 196* faster with shedskin
Dr. Michael Thomas approached me with an interesting A.I. job to see if we could speed up his neural network code from a 10 year old research platform called PlaNet. Using new Sun boxes they weren’t getting the speed-ups they expected, old libs or other monkey business were suspected.
As a first investigation I took Neil Schemenauer’s bpnn.py (a 200 line back-prop artificial neural network library with doc and comparison). The intention was to see how much faster the code might run using psyco and shedskin.
The results were really quite surprising, notes and src follow.
Addition – Leonardo Maffi has written a companion piece showing that his ShedSkin output is 1.5 to 7* slower than hand-coded C. He also shows solutions using the D language and runtimes for Python 2.6 (I use Python 2.5 below). He notes:
“I have translated the Python code to D (using my D libraries) in just few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. I have translated the D code to C in many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this despite I have used a rigorous enough method to perform the translation, and despite at the end I am not sure the C code is bug-free. This is an enormous difference.”
Addition – Robert Bradshaw has created a Cython version with src, see comments. End addition.
The run-time in minutes for the my harder test case are below. Note that these are averages of 4 runs each:
- Vanilla Python 153 minutes
- Python + Psyco 1.6.0.final.0 57 minutes (2.6* faster)
- Shedskin 0.0.29 0.78 minutes [47 seconds] (196* faster)
The test machines uses Python 2.5.2 on Ubuntu 8.04. The box is an Intel Core Duo 2.4GHz running a single process.
The ‘hard’ problem trains the ANN using 508 patterns with 57 input neurons, 50 hidden and 62 output neurons over 1000 iterations. If you know ANNs then the configuration (0.1 learning rate, 0 momentum) might seem unusual, be assured that this is correct for my researcher’s problem.
There is a shorter version of this problem using just 2 patterns, this is useful if you want to replicate these results but don’t want to wait 3 hours on your first run.
My run times for the shorter problem are (again averaged using 4 runs):
- Vanilla Python 42 seconds
- Python + Psyco 14 seconds
- Shedskin 0.2 seconds (210* faster)
Shedskin has an issue with numerical stability – it seems that internally some truncation occurs with floating point math. Whilst the results for vanilla Python and Python+Psyco were identical, the results with Shedskin were similar but with fractional divergences in each result.
Whilst these divergences caused some very different results in the final weights for the ANN, my researcher confirms that all the results look equivalent.
Mark Dufour (Shedskin’s author) confirms that Python’s C double is used the same in Shedskin but notes that rounding (or a bug) may be the culprit. Shedskin is a young project, Mark will welcome extra eyes if you want to look into this.
Running the code with Shedskin was fairly easy. On Ubuntu I had to install libgc-dev and libpcre3-dev (detailed in the Shedskin docs) and g++, afterwards shedskin was ready. From download to first run was 15 minutes.
On my first attempt to compile bpnn.py with Shedskin I received an error as the ‘raise’ keyword isn’t yet supported. I replaced the ‘raise’ calls with ‘assert False’ for sanity, afterwards compilation was fine.
Edit – Mark notes that the basic form of ‘raise’ is supported but the version used in bpnn.py isn’t yet supported. Something like ‘raise ValueError(‘some msg’)’ works fine.
Mark notes that Shedskin currently works well up to 500 lines (maybe up to 1000), since bpnn.py is only 200 lines compilation is quick.
Note that if you can’t use Psyco because you aren’t on x86, Shedskin might be useful to you since it’ll work anywhere that Python and g++ compile.
Running this yourself
If you want to recreate my results, download bpnn_shedskin_src_20081117.zip. You’ll see bpnn_shedskin.py, this is the main code. bpnn_shedskin.py includes either ‘examples_short.py’ or ‘examples_full.py’, short is the easier 2 pattern problem and full has 508 patterns.
Note that these patterns are stored as lists of tuples (Shedskin doesn’t support the csv module so I hardcoded the input patterns to speed development), the full version is over 500 lines of Python and this slows Shedskin’s compilation somewhat.
By default the imports for Psyco are commented out and the short problem is configured. At the command line you’ll get an output like this:
python bpnn_shedskin.py Using 2 examples ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum error 65.454309 2008-11-17 15:22:58.318593 error 45.176110 2008-11-17 15:22:59.060787 error 44.616933 2008-11-17 15:23:00.246280 error 44.026883 2008-11-17 15:23:01.743821 error 44.049276 2008-11-17 15:23:02.815876 error 44.905183 2008-11-17 15:23:03.860352 error 44.674506 2008-11-17 15:23:05.270307 error 43.365627 2008-11-17 15:23:06.757126 error 43.299160 2008-11-17 15:23:08.244466 error 42.540076 2008-11-17 15:23:09.732035 Elapsed: 0:00:41.472192
If you uncomment the two Psyco lines your code will run about 2.6* faster.
To use shedskin, first run the Python through shedskin and then ‘make’ the result. The compiled binary will run much faster than the vanilla Python code, the result below shows the short problem taking 0.19 seconds compared to 41 seconds above.
shedskin bpnn_shedskin.py *** SHED SKIN Python-to-C++ Compiler 0.0.29 *** Copyright 2005-2008 Mark Dufour; License GNU GPL version 3 (See LICENSE) [iterative type analysis..] *** iterations: 3 templates: 519 [generating c++ code..] *WARNING* bpnn_shedskin.py:178: function (class NN, 'weights') not called! *WARNING* bpnn_shedskin.py:156: function (class NN, 'test') not called! make g++ -O2 -pipe -Wno-deprecated -I. -I/usr/lib/shedskin/lib /usr/lib/shedskin/lib/string.cpp /usr/lib/shedskin/lib/random.cpp /usr/lib/shedskin/lib/datetime.cpp examples_short.cpp bpnn_shedskin.cpp /usr/lib/shedskin/lib/builtin.cpp /usr/lib/shedskin/lib/time.cpp /usr/lib/shedskin/lib/math.cpp -lgc -o bpnn_shedskin ./bpnn_shedskin Using 2 examples ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum error 65.454309 2008-11-17 16:11:08.452087 error 44.970416 2008-11-17 16:11:08.476869 error 46.444249 2008-11-17 16:11:08.506324 error 44.209054 2008-11-17 16:11:08.519375 error 44.058518 2008-11-17 16:11:08.532430 error 45.655892 2008-11-17 16:11:08.545741 error 44.518816 2008-11-17 16:11:08.558520 error 43.643572 2008-11-17 16:11:08.571705 error 44.800429 2008-11-17 16:11:08.584241 error 43.710905 2008-11-17 16:11:08.597465 Elapsed: 0:00:00.198747
Why is the math different?
An open question remains as to why the evolution of the floating point arithmetic is different between Python and Shedskin. If anyone is interested in delving in to this, I’d be very interested in hearing from you.
Mark notes that the extension module support is perhaps a more useful way to use Shedskin for this sort of problem.
A single module can be compiled (e.g. ‘shedskin -e module.py’) and with Python you just import it (e.g. ‘import module’) and use it…with a big speed-up.
This ties the code to your installed libs – not so great for easy distribution but great for lone researchers needing a speed boost.
Shedskin 0.1 in the works
Mark’s plan is to get 0.1 released over the coming months. One aim is to get the extension module to a similar level of functionality as SWIG and improve the core library support so that Shedskin comes with (some more) Batteries Included.
Mark is open to receiving code (up to 1000 lines) that doesn’t compile. The project would always happily accept new contributors.
Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.