About

This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

26 October 2011 - 11:29 StrongSteam alpha, HackerNewsLondon, Startup-Chile

I’m a little behind with the blogging so here’s the short version. StrongSteam has been under constant dev for 2 months, we’re close to putting up the first AI tools behind a few Python demos (hopefully it’ll be up next week). I’m talking on this at HackerNewsLondon tomorrow night.

We haven’t (quite) finished the demos so it’ll be a slideshow. I’m thinking of running a workshop in a month or so to show what’s possible, talk through the limitations and possibilities and help people get comfy with the API.

I’m also very pleased to say that we were accepted into the StartupChile programme alongside RadicalRobot (my better half). In StrongSteam Kyran and I will get 6 months in Santiago with a $40k budget (for no equity!) to build our API and this opens the door to further travel. We’re also very happy to welcome Balthazar Rouberol (linkedin) to our team, he’ll be joining us remotely as an intern for 6 months.

Our biggest priority now is to get the alpha out there. If you’re curious to see what we’re doing please follow us via @strongsteamapi and join the mailing list on the strongsteam homepage.

We also have two surveys – the first is so you can tell us about your general AI interest, the second focuses on some of the points raised in the first to tell us more about your needs. We’d really appreciate your input here if you have 10 minutes to spare.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, Programming, Python

6 November 2010 - 16:04 Building a Social Microprinter

Over the last couple of months I’ve been building up a social microprinter (inspired by Tom Taylor’s implementation and Matt Webb’s original idea). Here’s the current version – Arduino+WiShield+CBM231+off-site server (powered partly by BenOSteen’s Python driver):

There’s a second quick video and talk for the £5 App event I ran earlier in the week.

The goal is to build a social microprinter – a printer that’d live in a social environment (currently The Skiff co-working office in Brighton) which would help bring people a little bit closer. Currently it prints tweets (for ‘theskiff’) and shows events, later it’ll show recent Gowalla check-ins and maybe some local news headlines or the weather (but there’s got to be better stuff to show, right?…ideas on a postcard please).

My original intent was to build a device that could be stuck on the wall in a cafe, it would show tweets on a screen (probably under the cafe’s or Brighton’s hashtag) and let non-Internet folk post their own messages back. Doing this nicely would have needed a screen, machine, wall space etc – using a receipt printer seemed like an easy way to prototype the idea.

Jumping forward, here’s an early version – this is a CBM231 connected to my Ubuntu laptop via a USB->RS232 lead (note – this lead is good, the cheap ones on eBay can be bad – see below). Here I’m using BenOSteen’s Python driver to send tweets via serial to the printer.

This device has done the rounds, here it is on display at BuildBrighton’s talk to the British Computer Society:

Here it is in use at Likemind Brighton showing international #likemind tweets as other groups meet around the world on Friday morning (note – unicode converted to ‘?’ as I haven’t figured out if/how to get international characters out of the printer yet!):
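The ‘?’ substitution mentioned above is what a lossy ASCII conversion typically produces. Here is a minimal Python sketch of that behaviour, assuming the printer is only being sent 7-bit ASCII. The function name `to_printer_ascii` is made up for illustration and is not part of Ben’s driver.

```python
def to_printer_ascii(text):
    """Replace every character outside 7-bit ASCII with '?'."""
    return text.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    # \u00fc is a lower-case u with an umlaut
    print(to_printer_ascii("Guten Morgen aus M\u00fcnchen #likemind"))
```

Each unsupported character becomes a single ‘?’, so the message length stays the same and only the international characters are lost.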

It ran during the weekend of Barcamp Brighton and printed out barcampy stuff, I added some notes about local cafes and a job ad for one of the companies:

The goal all along was to build an independent controller (so removing the laptop from the equation). For this I coupled an Arduino with a WiShield 1.0. The WiShield libraries are easy enough to work with: after an hour’s experimentation I got WPA2 working (it takes 25 seconds to negotiate the connection on each attempt). We use WPA2 at home and in The Skiff.

Coupling the Arduino to the printer was easy enough. I have been trying (and so far failing) to get a Max233 chip acting as a voltage level converter, so for now I’m using a pre-built RS232 Level Shifter. This converts the Arduino’s 0V/5V TTL to +12V/-12V RS232 levels (powered from the Arduino’s 5V out). To output text I’m using Roo Reynolds’ Arduino sketch, which handily includes some control codes to cut the receipt after printing.

Next I wanted live data. At first I simply put a short plain text file on a web site, used the WiShield to fetch it and Roo’s code to print it. Now I’m using a hacked version of Ben’s code to write tweets (including bold and underline control codes) to a text file which is stored online (microprinter.ianozsvald.com), this ready-to-print file is grabbed over the WiShield, printed and then cut. The online file is updated every 2 minutes.

The final tweak was to add a button to the printer. Using the Arduino’s demo button sketch I hooked up a big thumb-sized button. The Arduino’s main loop is looking for a combination of ‘at least 5 seconds have passed since the last print’ and ‘button pressed’, then it’ll kick off the web request for new data. Once this request returns it prints out the text.
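The button rule described above can be modelled in a few lines. This is a Python simulation of the logic only, not the Arduino sketch itself. The class name `PrintTrigger` is hypothetical, and two details are assumptions: the timestamp is recorded when the request is kicked off, and a press before any print has happened is always accepted.

```python
MIN_GAP_SECONDS = 5.0  # "at least 5 seconds have passed since the last print"


class PrintTrigger:
    """Decides when a button press should start a new fetch-and-print."""

    def __init__(self, min_gap=MIN_GAP_SECONDS):
        self.min_gap = min_gap
        self.last_print = None  # time of the last print, None before the first

    def should_print(self, button_pressed, now):
        if not button_pressed:
            return False
        if self.last_print is not None and now - self.last_print < self.min_gap:
            return False  # too soon after the previous print, ignore the press
        self.last_print = now
        return True


if __name__ == "__main__":
    trigger = PrintTrigger()
    # (button_pressed, time in seconds) samples, as a main loop might see them
    for pressed, t in [(True, 0.0), (True, 2.0), (False, 6.0), (True, 7.5)]:
        print(t, trigger.should_print(pressed, t))
```

The press at 2.0 seconds is ignored because it falls inside the 5 second window, which is what stops an impatient thumb from queuing up several web requests.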

I look for the pattern “--------------” (14 dashes) to start and end the message, before this we get HTTP headers (from the WiShield) that I didn’t want to print.
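The marker idea is easy to try out on a desktop before it goes anywhere near the Arduino. Below is a rough Python sketch, assuming the response arrives as one string and that the marker is 14 plain hyphens. The function name `extract_message` and the sample response are invented for illustration.

```python
DELIMITER = "-" * 14  # the 14-dash marker described above


def extract_message(response_text, delimiter=DELIMITER):
    """Return the text between the first two delimiter markers, or None."""
    start = response_text.find(delimiter)
    if start == -1:
        return None
    start += len(delimiter)
    end = response_text.find(delimiter, start)
    if end == -1:
        return None
    return response_text[start:end].strip("\r\n")


if __name__ == "__main__":
    raw = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n"
        "--------------\n"
        "@theskiff: coffee is on\n"
        "--------------\n"
    )
    print(extract_message(raw))
```

Anything before the first marker (the HTTP headers) and after the second marker is dropped, and a response with a missing marker returns None rather than printing half a message.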

Here’s the finished hardware:

This is a WiShield 1.0. The button (shown just out of shot top-left) is connected 3.3V->button, button->Pin 6 AND Ground (via a 15k resistor). For the printer I’m using Pin 8 for tx (blue lead on the RS232 level converter) and Ground, the level converter is powered by the 5V out.

Here’s the connector:

The connector is overly-connected in this image. I think all you actually need is Pin 2 from the RS232 Level Converter to Pin 3 on the 25 pin connector along with Pin 5 (GND) to Pin 7 (GND on 25 pin connector). With yellow wires I’ve shorted Pins 4&5 and 8&20 but I think this is overkill (they’re used for bus control but they’re probably ignored in this configuration).  Here’s a full pinout.

During all the hacking our faithful cat Mia has attempted to assist whenever she could. Here she’s taken ownership of the bag used to transport the early versions:

Along the way I also acquired an Epson TM T88 II receipt printer, it is ‘just another serial printer’ but takes different control codes (and it looks like it might have a smaller character set than the CBM 231). As yet I’ve only tried printing plain ASCII, I’d like to investigate further and build a library that supports this printer too.

Note on buying leads from eBay! Be aware that if you buy cheap leads from eBay (e.g. £2 silver/blue leads), perhaps as a pack of 5 (because if you buy 5 and one breaks, you’ve got 4 more that work, right?), you might have 5 dead-on-arrival leads. You could then report the problem and the nice people could then ship you a replacement set, but then you might discover that you’ve got another 5 DOA leads. You have been warned.

If you’re buying your first microprinter do try to buy a working serial lead with it (it’ll probably be a 9 pin to 25 pin converter lead) – if you get the wrong lead (null modem vs straight serial – I forget which you need!) then you won’t get anything (the bane of my first few weeks of testing). Buy a printer+lead that’s known to work and you won’t go wrong.

Spend the £8 per lead and buy from Amazon if you don’t want to waste hours wondering why your printer is just printing out reams of ‘?’ rubbish:

If you want to build your own then the first best source of info is the microprinter wiki. Roo Reynolds has Arduino drivers (which I hacked a bit for my implementation) that don’t depend on external data sources.

You’ll find my Python server source and Arduino sketch (which assumes you’ve got a WiShield 1.0) here: social_microprinter. Note that the code is horribly hacky, it was written over many short sessions when I could steal an hour or two from other projects.

It could do with being straightened out and commented and a few nice new features would include Gowalla check-in notifications, event RSS reading and weather printing.

Many thanks to my fellow hackers at BuildBrighton for help debugging my early serial problems and to Barney for the lend of his RS232 Shifter (I’ll soon get this Max233 working, promise!).

Here’s the finished, installed unit on the work bench at BuildBrighton in The Skiff (just by the social kitchen space). Once it is a bit more robust it’ll move to the front of the building:


2 Comments | Tags: Life, Programming, projectbrightonblogs, Python

18 June 2010 - 15:30 Talking on Artificial Intelligence next Tuesday at FlashBrighton

I’ve been invited to speak with John Montgomery next Tuesday at FlashBrighton – 7pm at The Werks for 1.5-2 hours or so of demos. We’ll be covering:

  • Head tracking robot (build your own in a few hours!)
  • Skiff Privacy Invasion – what we can learn from data mining the SkiffCam (the Gov’t can do it – now you can too)
  • Optical Character Recognition web service with an iPhone visual-assistant demo
  • Automatic transcription of OpenPlaques images (because Google can’t read images!)
  • Extracting text from videos to feed Google (because Google can’t read videos!)
  • Face detection proof of concept web service

Which, frankly, is quite a lot to cover in 1.5 hours and a couple of the demos still need some development…but that’s part of the fun, right? The demos are mostly in Python and will be written up on the A.I. Cookbook. The goal is to show non-A.I. programmers that a lot of A.I. is pretty accessible now via good open-source libraries.

Richard has given me a lovely Victorian-researcher inspired write-up, it is worth a proper read:

I have spoken this night with Sir Seb Lee-Delisle, the gentleman who runs the FlashBrighton club, an institution of long standing repute. He expressed great delight with my research into Artificial Intelligence, which he assuryes me he has been following with the greatest assiduity, and kindly invited me to present my findings at his club. I did of course accept, and have spent the remaynder of the day deliberating over how I might present these goode labours. I have settled on involving my £5 app collaborator Mr. John Montgomery, with whom I have been engaged on a number of projects for some little time now. …

We’ll hope to see you along!


No Comments | Tags: ArtificialIntelligence, Programming, Python, sussexdigital

17 May 2010 - 21:06 Extracting keyword text from screencasts with OCR

Last week I played with the Optical Character Recognition system tesseract applied to video data. The goal – extract keywords from the video frames so Google has useful text to index.

I chose to work with ShowMeDo’s screencasts as many show programming in action – there’s great keyword information in these videos that can be exposed for Google to crawl. This builds on my recent OCR for plaques project.

I’ll blog in the future about the full system, this is a quick how-to if you want to try the system yourself.

First – get a video. I downloaded video 10370000.flv from Introducing numpy arrays (part 1 of 11).

Next – extract a frame. Using ffmpeg I extracted a frame at 240 seconds as a JPG:

ffmpeg -i 10370000.flv -y -f image2 -ss 240 -sameq -t 0.001  10370000_240.jpg

Tesseract needs TIF input files (not JPGs) so I used GIMP to convert to TIF.

Finally I applied tesseract to extract text:

tesseract 10370000_240.tif 10370000_240 -l eng

This yields:

than rstupr .
See Also
linspate : Evenly spaced numbers with  careful handling of endpoints.
grid: Arrays of evenly spared numbers  in Nrdxmensmns
grid: Grid—shaped arrays of evenly spaced numbers in  Nwiunensxnns
Examples
>>> np.arange(3)
¤rr¤y([¤. 1.  2])
>>> np4arange(3.B)
array([ B., 1., 2.])
>>>  np.arange(3,7)
array([3, A, S, 6])
>>> np.arange(3,7,?)
·=rr··¤y<[3.  5])
III
Ill

Obviously there’s some garbage in the above but there are also a lot of useful keywords!
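The three steps in this how-to (extract a frame, convert it to TIF, run tesseract) could be scripted so that many frames can be processed. The sketch below only builds the command lines. The ffmpeg and tesseract arguments mirror the ones used in this post, while ImageMagick’s `convert` is an assumption standing in for the manual GIMP step. The helper names are hypothetical, and newer ffmpeg builds may not accept `-sameq`.

```python
import subprocess


def frame_commands(video, seconds, lang="eng"):
    """Build the three command lines for one frame; nothing is executed here."""
    stem = video.rsplit(".", 1)[0]
    base = f"{stem}_{seconds}"
    jpg, tif = base + ".jpg", base + ".tif"
    return [
        # Same flags as the ffmpeg call in the post.
        ["ffmpeg", "-i", video, "-y", "-f", "image2", "-ss", str(seconds),
         "-sameq", "-t", "0.001", jpg],
        # ASSUMPTION: ImageMagick's convert stands in for the manual GIMP step.
        ["convert", jpg, tif],
        # tesseract writes its text to <base>.txt
        ["tesseract", tif, base, "-l", lang],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in frame_commands("10370000.flv", 240):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them by hand before calling `run_all` on a whole batch of timestamps.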

To clean up the extraction I’ll be experimenting with:

  • Using the original AVI video rather than the FLV (which contains compression artefacts which reduce the visual quality), the FLV is also watermarked with ShowMeDo’s logo which hurts some images
  • Cleaning the image – perhaps applying some thresholding or highlighting to make the text stand out, possibly the green text is causing a problem in this image
  • Training tesseract to read the terminal fonts commonly found in ShowMeDo videos

I tried four images for this test, in all cases useful text was extracted. I suspect that by rejecting short words (less than four characters) and using words that appear at least twice in the video then I’ll have a clean set of useful keywords.
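The filtering rule suggested above (drop words shorter than four characters, keep words seen at least twice) might look something like this in Python. It is a sketch under assumptions: each frame’s OCR output is a string, only runs of letters count as words, and occurrences are totalled across all frames. The function name `keywords` is made up for this example.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z]+")


def keywords(frame_texts, min_length=4, min_count=2):
    """Keep words of at least min_length letters seen min_count times or more."""
    counts = Counter()
    for text in frame_texts:
        for word in WORD_RE.findall(text):
            if len(word) >= min_length:
                counts[word.lower()] += 1
    return sorted(w for w, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    frames = [
        "Evenly spaced numbers with careful handling of endpoints",
        "grid: Arrays of evenly spared numbers in Nrdxmensmns",
        "array([3, A, S, 6]) evenly spaced",
    ]
    print(keywords(frames))
```

One-off OCR mistakes such as ‘spared’ and ‘Nrdxmensmns’ only appear once, so they fall out, while words that tesseract reads correctly in more than one place survive.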

Update – the blog for the A.I. Cookbook is now active, more A.I. and robot updates will occur there.


1 Comment | Tags: ArtificialIntelligence, Life, Programming, Screencasting, ShowMeDo

4 April 2010 - 19:39 New book/wiki – a practical artificial intelligence ‘cookbook’

Having almost completed The Screencasting Handbook I’m now thinking about my next project. I’ve been involved in the field of artificial intelligence since my first computer (a Commodore 64 back in the 80s) and I’ve continued to be paid to work in this area since the end of the 90s.

Update – as mentioned below the new project has started – read more at the A.I. Cookbook blog.

My goal now is to write a collaborative book (probably using a wiki) that takes a very practical look at the use of artificial intelligence in web-apps and desktop software. The big goal would be to teach you how to effectively use A.I. techniques in your job and for your own research. Here’s a few of the topics that could be covered:

  • Using open source and commercial tools for face, object and speech recognition
  • Playing with open source and commercial text to speech tools (e.g. the open source festival)
  • Automated control of driving and flight simulators with artificial brains
  • Building chatbot systems using tools like AIML, CHAT-L and natural language parsing kits
  • Using natural language parsing to add some smarts to apps – maybe for reading and identifying interesting people in Twitter and on blogs
  • Building useful demos around techniques like neural networks and evolutionary optimisation
  • Adding brains to real robots with some Arduinos and open source robot kits
  • Teaching myself machine learning and pattern matching (an area I’m weak on) along with useful libraries like Bayesian classification (Python’s reverend is great for this)
  • Parallel computation engines like Amazon’s EC2, libcloud and GPU programming with CUDA and OpenCL
  • Using Python and C++ for prototyping (along with Matlab and some other relevant languages)
  • and a whole bunch of other stuff – your input is very welcome

I’ve noticed that there are an awful lot of open source (and commercial) toolkits but very few practical guides to using them in your own software. What I want to encourage are some fun projects that’ll run for a month or two, here are some ideas:

  • Using optical character recognition engines to augment projects like OpenPlaques.org with free meta data from real-world photos (for a start see my Tesseract OCR post)
  • Collaborating in real-world competitions like the Simulated Car Racing Competition 2010: Demolition Derby (they’re running a simulated project that’s not unlike the DARPA Grand Challenge)
  • Applying face recognition algorithms to flickr photos so we can track who is posting images of us for identity management
  • Creating a Twitter bot that responds to questions and maybe can have a chat (checking the weather should be easy, some memory could be useful – using Twitter as an interface to tools like OCR for plaques might be fun too) – I have one of these in development right now
  • Build a Zork-solving bot (using NLP and tools like ConceptNet) that can play interactive fiction, build maps and try to solve puzzles
  • Using evolutionary optimisation techniques like genetic algorithms on the traveling salesman problem
  • Building Braitenberg-like brains for open source robot kits (like those by Steve at BotBuilder)
  • Create a QR code and Bar Code reader, tied to a camera

LinkedIn has my history – here’s my work site (please forgive it being a little…simple) Mor Consulting Ltd, I’m the AI Consultant for Qtara.com and I used to be the Senior Programmer for the UK R&D arm of MasaGroup.net/BlueKaizen.com.

I don’t have a definite timeline for the book, I’ll be making that up with you and everyone else once I’ve finished The Screencasting Handbook (end of April).

The Artificial Intelligence Cookbook project has started – the blog is currently active (along with the @aicookbook Twitter account). There is a mailing list to join for occasional updates – email AICookbook@Aweber.com to join.

It will be a commercial project and I will be looking to make it very relevant to however you’re using AI. Sign-up and you’ll get some notifications from me as the project develops.


2 Comments | Tags: ArtificialIntelligence, Programming, Python

2 March 2010 - 15:13 Science companies around Brighton

Two years back I posted an entry listing the science companies I knew around Brighton who are involved in high-tech software (i.e. not science companies who make physical products). The list has changed a bit with some nice additions so I’ve updated it below.  If you know of one that I’m missing do send me an update. I’m interested because I’m an A.I. researcher for industry by trade.

  • PANalytical at SInC (one of my current employers for interesting A.I. work – I work on CUDA for parallelisation and pattern recognition and optimisation for solution finding, Prof. Paul Fewster is the head of the R&D team)
  • Qtara (a new employer of mine creating a cutting-edge Intelligent Virtual Human)
  • BrandWatch in the BrightonMediaCentre (a social metrics company using natural language processing)
  • SecondLife in the North Laines (this office is a big part of their European presence)
  • Ambiental at SInC (great flood-risk simulations and modelling, I help them with speeding up and improving the science behind their flood models, Justin Butler is the founder)
  • Proneta at SInC (very small company, John Hother sometimes has A.I. related questions)
  • Observatory Sciences at SInC (Philip Taylor is the main chap here, they use EPICS and LabView)
  • Ricardo in Shoreham (a big engineering consultancy)
  • Elektro Magnetix at SInC
  • NeuroRobotics at SInC
  • MindLab at SInC (they do non-invasive brain monitoring)
  • Animazoo in Shoreham (they build motion-capture suits for dancers and actors)
  • BotBuilder in Brighton (a robot focused design and build company)

Another nice addition to Brighton is the BrightonHackerSpace, a collective of like-minded souls who build new electronic devices and pull things apart to understand how they work. This HackerSpace has spawned BotBuilder (above) and I’m looking forward to seeing a few more created.

A little further away up in London I also know of:

  • Smesh who offer a brand monitoring system similar to BrandWatch
  • CognitiveMatch ‘who match customers to products in real time’
  • Maxeler Technologies in London create parallelised solutions, they appear to specialise in finance and oil modeling

And even further out in Cambridge:

  • EmotionAI create realistic emotion-expressing 3D avatars via the Cambridge Science Park

1 Comment | Tags: ArtificialIntelligence, Entrepreneur, Programming

1 December 2008 - 19:00 £5 App Xmas Special Listing Details

We’re plotting our 14th £5 App meet.  This, our second Christmas Special, will have a gamesy happy crimbo feel.  Picture (from a few months back) kudos to Josh:

£5 App evening in full swing

Date: Wednesday 10th December, sign-up on Upcoming please. Location: The Werks (Hove nr Palmeira Sq) – note their piggy bank for a second projector at the end of this post please.

Our very own Aleks Krotoski will lead the evening with the launch of the Guardian’s new on-line text adventure SpaceShip!

Following Aleks’ main talk we’ll have a set of shorter 10 minute demos:

  1. Lightsaber mobile phone duelling by Marko (Lastminute.com Labs)
  2. Fighting Mini-sumo robots by Emily
  3. In-development 3D iPhone game + backstory by Dominic Mason
  4. Flash 3D Snow and Xmas games by Seb (PluginMedia)
  5. Eye-controlled Pong by Ben Rubinstein (CogApp)

You’ll meet lots of local developers, freelancers and business founders including people of Farm Brighton, Girl Geeks, Sussex Innovation Centre, Inuda and The Skiff, The Werks, EuroGamer, BrandWatch, ClearLeft and Madgex.

The Ribots are sponsoring us with Festive Alcohol and (fingers crossed) Xmas Cakery.

Related – Seb’s Big Screen Bonanza Flash night is the day before ours, check it out for 200+ seatage Flash-demo crazyness.

Note – The Werks are looking for donations towards a second projector, do the right thing and support ‘em here:

Click here to lend your support to: The Werks and make a donation at www.pledgie.com !


1 Comment | Tags: Business Idea, Entrepreneur, Programming, projectbrightonblogs, sussexdigital, £5 App Meet

17 November 2008 - 18:16 Making Python math 196* faster with shedskin

Dr. Michael Thomas approached me with an interesting A.I. job to see if we could speed up his neural network code from a 10 year old research platform called PlaNet. Using new Sun boxes they weren’t getting the speed-ups they expected, old libs or other monkey business were suspected.

As a first investigation I took Neil Schemenauer’s bpnn.py (a 200 line back-prop artificial neural network library with doc and comparison). The intention was to see how much faster the code might run using psyco and shedskin.

The results were really quite surprising, notes and src follow.

Addition – Leonardo Maffi has written a companion piece showing that his ShedSkin output is 1.5 to 7* slower than hand-coded C.  He also shows solutions using the D language and runtimes for Python 2.6 (I use Python 2.5 below).  He notes:

“I have translated the Python code to D (using my D libraries) in just few minutes, something like 15-20 minutes, and the translation was mostly painless and sometimes almost mechanical. I have translated the D code to C in many hours. Translating Python => C may require something like 20-30 times the time you need to translate Python => D + my libs. And this despite I have used a rigorous enough method to perform the translation, and despite at the end I am not sure the C code is bug-free. This is an enormous difference.”

End addition.

Addition – Robert Bradshaw has created a Cython version with src, see comments. End addition.

The run-times in minutes for my harder test case are below. Note that these are averages of 4 runs each:

  1. Vanilla Python 153 minutes
  2. Python + Psyco 1.6.0.final.0 57 minutes (2.6* faster)
  3. Shedskin 0.0.29 0.78 minutes [47 seconds] (196* faster)
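The speed-up figures are simply the vanilla runtime divided by the faster runtime. A tiny sketch of that arithmetic, using the three averages from the list above (the function name `speedup` is just for illustration):

```python
def speedup(baseline_minutes, new_minutes):
    """How many times faster the new runtime is than the baseline."""
    return baseline_minutes / new_minutes


if __name__ == "__main__":
    vanilla, psyco, shedskin = 153.0, 57.0, 0.78  # minutes, from the list above
    print(f"Psyco:    {speedup(vanilla, psyco):.2f}x")
    print(f"Shedskin: {speedup(vanilla, shedskin):.2f}x")
```

This gives roughly 2.68 for Psyco (quoted as 2.6* above) and roughly 196.15 for Shedskin.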

The test machine uses Python 2.5.2 on Ubuntu 8.04. The box is an Intel Core Duo 2.4GHz running a single process.

The ‘hard’ problem trains the ANN using 508 patterns with 57 input neurons, 50 hidden and 62 output neurons over 1000 iterations. If you know ANNs then the configuration (0.1 learning rate, 0 momentum) might seem unusual, be assured that this is correct for my researcher’s problem.

There is a shorter version of this problem using just 2 patterns, this is useful if you want to replicate these results but don’t want to wait 3 hours on your first run.

My run times for the shorter problem are (again averaged using 4 runs):

  1. Vanilla Python 42 seconds
  2. Python + Psyco 14 seconds
  3. Shedskin 0.2 seconds (210* faster)

Shedskin has an issue with numerical stability – it seems that internally some truncation occurs with floating point math. Whilst the results for vanilla Python and Python+Psyco were identical, the results with Shedskin were similar but with fractional divergences in each result.

Whilst these divergences caused some very different results in the final weights for the ANN, my researcher confirms that all the results look equivalent.

Mark Dufour (Shedskin’s author) confirms that Python’s C double is used the same in Shedskin but notes that rounding (or a bug) may be the culprit. Shedskin is a young project, Mark will welcome extra eyes if you want to look into this.

Running the code with Shedskin was fairly easy. On Ubuntu I had to install libgc-dev and libpcre3-dev (detailed in the Shedskin docs) and g++, afterwards shedskin was ready. From download to first run was 15 minutes.

On my first attempt to compile bpnn.py with Shedskin I received an error as the ‘raise’ keyword isn’t yet supported. I replaced the ‘raise’ calls with ‘assert False’ for sanity, afterwards compilation was fine.

Edit – Mark notes that the basic form of ‘raise’ is supported but the version used in bpnn.py isn’t yet supported.  Something like ‘raise ValueError(‘some msg’)’ works fine.
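For reference, the call form of ‘raise’ that Mark describes looks like the sketch below. The function `check_inputs` and its message are hypothetical, loosely modelled on the kind of length check a network performs before an update. The unsupported form from bpnn.py is not shown in this post, so it is not reproduced here.

```python
def check_inputs(inputs, expected):
    """Fail loudly when the pattern length does not match the input layer."""
    if len(inputs) != expected:
        # Call form of raise, the style Mark says Shedskin accepts.
        raise ValueError("wrong number of inputs")
    return True


if __name__ == "__main__":
    print(check_inputs([0.0] * 57, 57))
    try:
        check_inputs([0.0] * 3, 57)
    except ValueError as err:
        print("rejected:", err)
```

Compared with swapping in ‘assert False’, this keeps a readable error message for whoever runs the code next.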

Mark notes that Shedskin currently works well up to 500 lines (maybe up to 1000), since bpnn.py is only 200 lines compilation is quick.

Note that if you can’t use Psyco because you aren’t on x86, Shedskin might be useful to you since it’ll work anywhere that Python and g++ compile.

Running this yourself

If you want to recreate my results, download bpnn_shedskin_src_20081117.zip. You’ll see bpnn_shedskin.py, this is the main code. bpnn_shedskin.py includes either ‘examples_short.py’ or ‘examples_full.py’, short is the easier 2 pattern problem and full has 508 patterns.

Note that these patterns are stored as lists of tuples (Shedskin doesn’t support the csv module so I hardcoded the input patterns to speed development), the full version is over 500 lines of Python and this slows Shedskin’s compilation somewhat.
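Hardcoding the patterns does not have to be done by hand. One option is a small pre-processing script, run under ordinary Python rather than Shedskin, that reads the CSV and writes out a module of literals. The sketch below assumes each CSV row holds the input values followed by the target values. The function name, the variable name `examples` and the (inputs, targets) layout are illustrative guesses, not the format used in the zip file.

```python
import csv
import io


def csv_to_module_source(csv_text, n_inputs, name="examples"):
    """Turn CSV rows into Python source defining a list of (inputs, targets)."""
    lines = [f"{name} = ["]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        values = [float(cell) for cell in row]
        inputs, targets = values[:n_inputs], values[n_inputs:]
        lines.append(f"    ({inputs!r}, {targets!r}),")
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "0,1,1\n1,1,0\n"
    print(csv_to_module_source(sample, n_inputs=2))
```

The generated text can be saved as a .py file and imported by the main script, so the compiled code never needs the csv module.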

By default the imports for Psyco are commented out and the short problem is configured. At the command line you’ll get an output like this:

python bpnn_shedskin.py
Using 2 examples
ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum
error 65.454309      2008-11-17 15:22:58.318593
error 45.176110      2008-11-17 15:22:59.060787
error 44.616933      2008-11-17 15:23:00.246280
error 44.026883      2008-11-17 15:23:01.743821
error 44.049276      2008-11-17 15:23:02.815876
error 44.905183      2008-11-17 15:23:03.860352
error 44.674506      2008-11-17 15:23:05.270307
error 43.365627      2008-11-17 15:23:06.757126
error 43.299160      2008-11-17 15:23:08.244466
error 42.540076      2008-11-17 15:23:09.732035
Elapsed: 0:00:41.472192

If you uncomment the two Psyco lines your code will run about 2.6* faster.

Using Shedskin

To use shedskin, first run the Python through shedskin and then ‘make’ the result. The compiled binary will run much faster than the vanilla Python code, the result below shows the short problem taking 0.19 seconds compared to 41 seconds above.

shedskin bpnn_shedskin.py
*** SHED SKIN Python-to-C++ Compiler 0.0.29 ***
Copyright 2005-2008 Mark Dufour; License GNU GPL version 3 (See LICENSE)
[iterative type analysis..]
***
iterations: 3 templates: 519
[generating c++ code..]
*WARNING* bpnn_shedskin.py:178: function (class NN, 'weights') not called!
*WARNING* bpnn_shedskin.py:156: function (class NN, 'test') not called!

make
g++  -O2 -pipe -Wno-deprecated  -I. -I/usr/lib/shedskin/lib /usr/lib/shedskin/lib/string.cpp /usr/lib/shedskin/lib/random.cpp /usr/lib/shedskin/lib/datetime.cpp examples_short.cpp bpnn_shedskin.cpp /usr/lib/shedskin/lib/builtin.cpp /usr/lib/shedskin/lib/time.cpp /usr/lib/shedskin/lib/math.cpp -lgc  -o bpnn_shedskin

./bpnn_shedskin
Using 2 examples
ANN uses 57 input, 50 hidden, 62 output, 1000 iterations, 0.100000 learning rate, 0.000000 momentum
error 65.454309      2008-11-17 16:11:08.452087
error 44.970416      2008-11-17 16:11:08.476869
error 46.444249      2008-11-17 16:11:08.506324
error 44.209054      2008-11-17 16:11:08.519375
error 44.058518      2008-11-17 16:11:08.532430
error 45.655892      2008-11-17 16:11:08.545741
error 44.518816      2008-11-17 16:11:08.558520
error 43.643572      2008-11-17 16:11:08.571705
error 44.800429      2008-11-17 16:11:08.584241
error 43.710905      2008-11-17 16:11:08.597465
Elapsed: 0:00:00.198747

Why is the math different?

An open question remains as to why the evolution of the floating point arithmetic is different between Python and Shedskin. If anyone is interested in delving into this, I’d be very interested in hearing from you.
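This is not a diagnosis of Shedskin, but it may help to see how easily two correct implementations of the same sum can disagree. Floating point addition is not associative, so merely changing the order of operations changes the last few bits. A small pure-Python illustration (the function name is made up for the example):

```python
def sum_left_to_right(values):
    """Add the values in the order given, starting from 0.0."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    values = [0.1, 0.2, 0.3]
    forward = sum_left_to_right(values)
    backward = sum_left_to_right(list(reversed(values)))
    print(repr(forward), repr(backward), forward == backward)
```

The two totals differ in the final digit. Over 1000 training iterations, where each result feeds the next weight update, differences of that size could plausibly compound into visibly different error values. Whether that is what happens here remains unconfirmed.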

Extension modules

Mark notes that the extension module support is perhaps a more useful way to use Shedskin for this sort of problem.

A single module can be compiled (e.g. ‘shedskin -e module.py’) and with Python you just import it (e.g. ‘import module’) and use it…with a big speed-up.
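As a rough idea of what such a module might contain, here is a small numeric function of the sort worth compiling. The file name module.py follows the example above and `weighted_sum` is a hypothetical function. The call under the main guard rests on an assumption: that Shedskin infers types from how functions are actually called (the compiler output earlier in this post warns about functions that are ‘not called’).

```python
# module.py (hypothetical file name, matching the 'module' example above)

def weighted_sum(inputs, weights):
    """Dot product of two equal-length lists of floats."""
    total = 0.0
    for i in range(len(inputs)):
        total += inputs[i] * weights[i]
    return total


if __name__ == "__main__":
    # ASSUMPTION: calling the function with representative arguments helps
    # Shedskin infer types for the compiled version.
    print(weighted_sum([1.0, 2.0, 3.0], [0.5, 0.25, 0.125]))
```

The same file runs unchanged under plain Python, which makes it easy to compare results before and after compiling.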

This ties the code to your installed libs – not so great for easy distribution but great for lone researchers needing a speed boost.

Shedskin 0.1 in the works

Mark’s plan is to get 0.1 released over the coming months. One aim is to get the extension module to a similar level of functionality as SWIG and improve the core library support so that Shedskin comes with (some more) Batteries Included.

Mark is open to receiving code (up to 1000 lines) that doesn’t compile.  The project would always happily accept new contributors.

See the Shedskin homepage, blog and group.


4 Comments | Tags: ArtificialIntelligence, Programming, Python

18 February 2008 - 13:23 First Brighton Python Meet – Weds 20th

John and I are holding our first Brighton Python meet this Wednesday 20th at The Hampton Arms. Paul Silver’s The Farm is running on the same night – we’ll be sitting on a nearby table.

We’ll have a copy of Learning Python on the table, I’ll have my laptop with ShowMeDo’s TurboGears code and John should have his laptop with the Django-based FivePoundApp.com code.

We can talk about A.I. and C-integration stuff (IPython, scipy, matplotlib, Numpy, ctypes) too, along with IDEs, resources and anything else you need to know. You can be experienced or ‘just interested’ – all are very welcome.

No Comments | Tags: Programming, ShowMeDo

11 October 2007 - 17:04 FivePoundApp *Day* during Digital Festival

Our Five Pound App Day [Upcoming] is listed on the Digital Festival site now, running on Saturday 10th November from 10am-6pm. Please sign-up on Upcoming so we can plan our numbers.

The theme is ‘moving start-ups a step forward’, we’ll have four sessions during the day on:

  1. The Perils of Bootstrapping (by me + other founders)
  2. Developing your £5 app (John Montgomery and others)
  3. Paul Silver discussing SEO
  4. A guide to successful Copy-writing (Ellen)

Each session lasts about 1.5 hours, there will be a short talk followed by an interactive session. Preferably several people will have flagged the issues they’d like to discuss, e.g. someone’s site copy-writing which needs improving or they’d like to learn about improving their search ranking results.

The first talk will focus on the ups and downs of boot-strapping (drawing on examples from ShowMeDo and others) with thoughts on why it may (or may not) be for you.

John will lead the second, talking about the ‘how’ of developing an application – looking at various technology areas and pointing out things which will save a new boot-strapped effort a lot of wasted time.

The third and fourth talks will focus on existing websites and how+why they can be improved.

We’ll be looking for volunteers to put up their site/business for use in the discussions, along (obviously) with questions during the sessions.

The day will be ad-hoc (i.e. you can come and go), you’ll need to provide your own drinks+food. We will provide office-space for the talks and work and wi-fi.

The event is kindly sponsored by Alan Newman, founder of Sensible Development.

No Comments | Tags: Entrepreneur, Programming, ShowMeDo, sussexdigital, £5 App Meet