About

This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I. Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

13 December 2009 - 17:05
Text to Speech – Festival (cross platform) and MacSpeechX (Python on Mac)

I wanted to play with text to speech, so I’ve been looking for a cross-platform, open-source solution that sounds reasonable. I’m really impressed with the Festival project; the web demo lets you enter your own text.

Update – I’m including this post in my plans for an Artificial Intelligence Handbook.

Festival is cross-platform but compiling it on a Mac takes a touch of effort (it looks easier on Linux and Windows).

This article shows you how to use it and how to web-enable it with some PHP. For the simplest demo I used ‘bin/text2wave input.txt -o output.wav’ with input.txt containing a sentence.
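To drive that simplest demo from Python, one approach is to write the sentence to a temporary file and call text2wave with the same arguments. This is a minimal sketch, assuming Festival was built in a local festival/ directory (so the script lives at festival/bin/text2wave); the helper names text2wave_command and speak_to_wav are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumption: Festival was built in ./festival, so the script is festival/bin/text2wave.
TEXT2WAVE = Path("festival") / "bin" / "text2wave"


def text2wave_command(input_txt, output_wav, text2wave=TEXT2WAVE):
    """Build the argument list for 'text2wave input.txt -o output.wav'."""
    return [str(text2wave), str(input_txt), "-o", str(output_wav)]


def speak_to_wav(text, output_wav, text2wave=TEXT2WAVE):
    """Write text to a temporary file and ask text2wave to render it as a WAV file."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        input_path = handle.name
    try:
        # check=True raises CalledProcessError if text2wave exits with an error.
        subprocess.run(text2wave_command(input_path, output_wav, text2wave), check=True)
    finally:
        Path(input_path).unlink()  # tidy up the temporary input file
```

With a working build, speak_to_wav("Hello from Festival.", "output.wav") should leave output.wav next to the script. Keeping the command builder separate makes it easy to check the arguments without invoking Festival.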

To get started, get the latest code.  I have v1.96beta.  You may also want the official festlang-talk list and possibly this more complete archive.

Compiling speech_tools-1.2.96-beta.tar.gz

It ought to have been as simple as ‘make clean; make’ but there are a few changes to make first. To begin with, we need this fix or we get a compile error in macosxaudio at kAudioUnitProperty_SetInputCallback:

If you add
#include <AudioUnit/AUNTComponent.h>
after the include block on lines 45-48 in audio/macosxaudio.cc the
problem should be solved.

By the way, remember to change the byte order if you have an Intel Mac, i.e. on line 131:
     waveformat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger
                             | kLinearPCMFormatFlagIsPacked;
     // For Intel:   | kLinearPCMFormatFlagIsPacked;
     // For PowerPC: | kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsBigEndian;
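Why the flag matters: Intel Macs are little-endian and PowerPC Macs are big-endian, so kLinearPCMFormatFlagIsBigEndian has to match how the samples actually sit in memory. The sketch below uses Python's standard struct module (not CoreAudio) and assumes 16-bit signed samples; the helper name pcm16_bytes is hypothetical.

```python
import struct
from typing import List


def pcm16_bytes(samples: List[int], big_endian: bool) -> bytes:
    """Pack signed 16-bit PCM samples with an explicit byte order."""
    order = ">" if big_endian else "<"  # '>' big-endian (PowerPC), '<' little-endian (Intel)
    return struct.pack(order + "h" * len(samples), *samples)


intel_bytes = pcm16_bytes([1, -2], big_endian=False)
powerpc_bytes = pcm16_bytes([1, -2], big_endian=True)

# Reading little-endian data as if it were big-endian scrambles every sample:
# the bytes 01 00 mean 1 on Intel but 256 when read big-endian.
misread = struct.unpack(">h", pcm16_bytes([1], big_endian=False))[0]
```

A wrong flag doesn't crash anything; the audio simply comes out as noise, which is why this change is easy to miss. Python's sys.byteorder reports 'little' on an Intel machine.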

The following was a trickier error to solve:

g++ -c -fno-implicit-templates -O3 -Wall -I../include sigpr_frame.cc
sigpr_frame.cc: In function
‘void lpc2cep(const EST_FVector&, EST_FVector&)’:
sigpr_frame.cc:318: error: ‘__isnan’ was not declared in this scope
make[1]: *** [sigpr_frame.o] Error 1
make: *** [sigpr] Error 2

The fix was known but the relevant archive was missing; some googling for ‘__isnan mac’ turns up this cached 2006 page:

--- ../test/speech_tools/include/EST_math.h     2006-08-03 08:49:35.000000000 -0500
+++ include/EST_math.h  2006-08-17 17:53:33.000000000 -0500
@@ -43,7 +43,7 @@
#if defined(__APPLE__)
/* Not sure why I need this here, but I do */
-extern "C" int isnan(double);
+extern "C" int isnan(float);
#endif
/* this isn't included from c, but just to be safe... */
@@ -101,7 +101,6 @@
/* Apple OSX */
#if defined(__APPLE__)
#define isnanf(X) isnan(X)
-#define isnan(X) __isnan(X)
#endif
/* FreeBSD *and other 4.4 based systems require anything, isnanf is defined */

Compiling festival-1.96-beta.tar.gz

Once speech-tools is compiled, getting ‘festival-1.96-beta.tar.gz’ compiled is as easy as ‘make clean;make’.

Python’s MacSpeechX

I also had a play with the macspeechx module which ties Python to the Mac’s voice-synthesiser.  See list_voice_name() in macspeechX.py for an example of how it all works.

It works for driving the speech synthesiser, but it doesn’t appear to let you record the speech to a file (unlike Festival above).

Update – Mike Driscoll has a post about pyTTS which hooks into Microsoft’s SAPI on Windows and pyTTSX which is cross-platform, along with some speech recognition links.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

1 Comment | Tags: ArtificialIntelligence, Python

12 December 2009 - 23:13
ConceptNetDaily Twitter Bot

I’ve just launched my second Twitter bot – @ConceptNetDaily takes a random concept from the A.I. site ConceptNet and posts it to Twitter with a link back to the site. A tweet looks like:

“When humans own horses, humans groom and ride horses.” http://tinyurl.com/ydvf7vg

The TinyURL expands out to an address like: http://openmind.media.mit.edu/en/assertion/143313/

The aim of the site is to build a large repository of common-sense knowledge, exactly the kind of knowledge that humans take for granted and never write down as statements for a computer to understand. Currently it tracks 1,026,553 statements.

Using the link you can vote on the concept.  Vote up if the concept is solid (i.e. something a human would say is ‘right’) or down if it is wrong, silly or erroneous.  The site supports OpenID which makes starting a touch easier.

My goal with this bot is to remind people every day to vote on the concepts and to add new knowledge.  If a concept has many votes then we can have faith that it is ‘common-sense knowledge’.  If a concept is voted down enough then we can have faith that it is ‘unhelpful or wrong’.

You’ll find a searchable list of Concepts and some random examples on the English homepage.  For good examples see all the information that ConceptNet knows about humans, chess and girls.

Details:

I’ve written the bot in Python using PyYAML, Python-tinyurl and Python-twitter.  It runs every day via a cron job.  It works by guessing a random id for a raw_assertion and checking to see if a concept lives at the URL.  See this XML example for id 143313, I extract the .yaml version via PyYAML but the .xml version renders nicely in your browser if you want a peek.
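The guess-a-random-id approach can be sketched without any network code by passing in the fetching step as a function. This is a minimal, hypothetical sketch rather than the bot's actual source: the URL pattern is copied from the example address above, MAX_ID is an invented upper bound, fetch stands in for the PyYAML lookup, and the 140-character limit is Twitter's 2009 tweet length. The short URL would come from Python-tinyurl.

```python
import random
from typing import Callable, Optional, Tuple

# URL pattern copied from the example assertion address; MAX_ID is a hypothetical bound.
ASSERTION_URL = "http://openmind.media.mit.edu/en/assertion/{id}/"
MAX_ID = 1_100_000


def assertion_url(assertion_id: int) -> str:
    return ASSERTION_URL.format(id=assertion_id)


def pick_random_assertion(
    fetch: Callable[[int], Optional[str]],
    max_id: int = MAX_ID,
    attempts: int = 20,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[int, str]]:
    """Guess random ids until fetch() returns a sentence, or give up after `attempts` tries.

    fetch is a hypothetical callable: given an id it returns the assertion's text,
    or None if nothing lives at that id.
    """
    rng = rng or random.Random()
    for _ in range(attempts):
        candidate = rng.randint(1, max_id)
        text = fetch(candidate)
        if text:
            return candidate, text
    return None


def format_tweet(text: str, short_url: str, limit: int = 140) -> str:
    """Quote the sentence and append the link, truncating the sentence to fit the limit."""
    suffix = " " + short_url
    room = limit - len(suffix) - 2  # two characters for the curly quotes
    if len(text) > room:
        text = text[: room - 3] + "..."
    return "\u201c" + text + "\u201d" + suffix
```

A daily cron job would then call pick_random_assertion, shorten assertion_url(id) and post format_tweet(text, short_url) via Python-twitter.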

ConceptNet’s web API is well documented.  ConceptNet itself is written in Python using Django but I’m not using the downloaded version here, just the web API.

My first Twitter bot – @BrightonJobDoom:

Just in case you live here in Brighton you might want to track @BrightonJobDoom to see how healthy (or…not) the job market is in the UK during this rather wobbly recession :-)  I wrote this bot for our £5 App’s 5k coding competition.


No Comments | Tags: ArtificialIntelligence, Life, £5 App Meet

11 December 2009 - 16:35
Eucalyptus Clustering – follow-up

A month back I tried to build an Ubuntu-based Eucalyptus cloud/cluster environment for a client for a parallel processing research project.  The project was thwarted by an overly aggressive corporate firewall and my lack of understanding of low-level network config-fu.

I’ve revisited the project using the same machines but with an external public internet connection (no firewall – yay!).

Grub2

On the node machine I still needed to dual-boot to Windows.  Unfortunately whilst reboots to Linux are fine, if Windows is booted it ‘does something’ to the MBR and the machine is unbootable.  I delved into the boot-loader and had to learn some Grub2-fu.

Grub2 was introduced in Ubuntu 9.10; it replaces Grub, which in turn replaced boot managers like LILO. The wiki page is pretty good for recovering a boot-loader using an Ubuntu LiveCD but it didn’t work quite to plan.

The step for ‘sudo chroot /mnt’ fails as bash or sh can’t be run from within /mnt (which at this point is looking at the originally installed hd).  There is something odd going on with the LiveCD, much googling didn’t seem to reveal the answer.

To run grub-install on the hd rather than via the CD (because chroot fails) I used ‘sudo grub-install --root-directory=/mnt /dev/sda’, which reports that ‘(hd0) /dev/sda’ is installed.

Sidenote – on later attempts somehow a reference to (fd0) got involved and this broke the boot process.  I edited /mnt/boot/grub/device.map to remove the fd0 reference, leaving the hd0 reference.  I ran grub-install again and all was fine.  Now the machine can boot again.
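The device.map edit can also be scripted. This is a minimal sketch, assuming the usual one-mapping-per-line format such as "(hd0)   /dev/sda"; the function names are hypothetical and the default path is the one used above.

```python
from pathlib import Path


def drop_device(device_map: str, device: str = "fd0") -> str:
    """Return device.map contents without the lines that map the given GRUB device."""
    kept = []
    for line in device_map.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == f"({device})":
            continue  # skip e.g. "(fd0)   /dev/fd0"
        kept.append(line)
    return "".join(kept)


def rewrite_device_map(path: str = "/mnt/boot/grub/device.map", device: str = "fd0") -> None:
    """Rewrite device.map in place (needs root); run grub-install again afterwards."""
    map_file = Path(path)
    map_file.write_text(drop_device(map_file.read_text(), device))
```

Only lines whose first field is exactly "(fd0)" are dropped, so the hd0 mapping survives untouched.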

Mounting a USB memory stick

Whilst an 8GB memory stick was recognised, it didn’t get mounted. I had to edit /etc/fstab and add:

/dev/sdf1 /mnt/stick auto umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0

After this I used ‘sudo mkdir /mnt/stick’ and ‘sudo mount /dev/sdf1’, and it mounted just fine.
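For reference, the six whitespace-separated fields of that fstab line are the device, the mount point, the filesystem type (auto lets mount detect it), the comma-separated options, the dump flag and the fsck pass number. The noauto option is why the stick still has to be mounted by hand, and user/users allow non-root users to mount it. A small, hypothetical parser makes the split explicit:

```python
from typing import Dict, List

# The six standard fstab columns, in order.
FSTAB_FIELDS: List[str] = ["device", "mount_point", "fs_type", "options", "dump", "pass"]


def parse_fstab_line(line: str) -> Dict[str, object]:
    """Split one fstab entry into named fields; options become a list."""
    fields = line.split()
    if len(fields) != len(FSTAB_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    entry: Dict[str, object] = dict(zip(FSTAB_FIELDS, fields))
    entry["options"] = fields[3].split(",")
    entry["dump"] = int(fields[4])
    entry["pass"] = int(fields[5])
    return entry
```

Feeding it the line above gives mount_point "/mnt/stick" and an options list that includes "noauto".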

Installing Eucalyptus

The install process this time around was much the same as before, except this time without the firewall it all ‘just worked’.  Seeing the fnords part 1 took me through the basic install.

I got the feeling from later steps that the cloud controller needs a static IP so I switched the cluster controller from DHCP to a static IP and rebooted.

The discover-nodes step for euca_conf (‘sudo euca_conf --no-rsync --discover-nodes’) also required that I’d set up SSH keys on the Node; step 6 in the NodeInstall doc has the instruction. Typo note – if you spell ‘eucalyptus’ wrong you’ll go round in circles trying to figure out why the password won’t work!

Sometimes I couldn’t get ‘euca-describe-availability-zones verbose’ to work; it would just report ‘No route to host’. It seems that a reboot of the CC and Node is required, plus a minute or so of patience after boot for Apache to sort itself out, before this problem goes away.
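The "wait a minute and try again" advice can be automated with a small retry loop. This is a hypothetical helper, not part of Eucalyptus' tooling: it calls a function until it stops raising, pausing between attempts. It retries on OSError by default because Python reports socket failures such as "No route to host" as OSError subclasses.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 6,
    delay: float = 10.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action() until it succeeds, sleeping `delay` seconds between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of patience: re-raise the last error
            sleep(delay)
    raise AssertionError("not reached")  # keeps type-checkers happy
```

For example, retry(lambda: socket.create_connection(("10.0.0.4", 8443), timeout=5).close()) would keep poking the web interface's port until something answers, giving up after about a minute with the default settings.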

Using the Ubuntu Store

Having installed the CC and registered a Node, I next opened the web interface at ‘https://10.0.0.4:8443’ (note ‘https’). If you visit the website too soon after a reboot (i.e. <1 minute) then the webapp won’t respond, or it may not recognise the admin user. The first login forces a password change.

Next check the ‘Configuration’ tab and verify the IP addresses. For reasons beyond my understanding our switch rebooted during my first attempt to set up the cluster and it switched from the ‘192.168.x.x’ address range to ‘10.x.x.x’ – this royally barfed my configuration. I chose to re-install the CC from scratch (I was plagued by ‘no route to host’ problems no matter how much tweaking I tried).

Next visit the ‘Store’ tab and download an image; I’m using ‘Ubuntu 9.10 Karmic Koala (i386)’. Today this works – I’ve spent 2.5 days building and re-building the cluster to get it to this point. Often the Store would download an image and then report ‘no route to host’. This process is pretty darned frustrating and seems to lack useful error messages.

But ultimately – no cigar

Rather frustratingly I can’t get my Node to run an image. I can see that the Node exists through ‘euca-describe-availability-zones verbose’, but it doesn’t list the Node’s IP address, which is odd as the online docs say it should be shown.

If I run an image then it enters the ‘pending’ state and then the ‘terminating’ state. Digging around in Google shows that other people currently have the same problem; it might be related to the lack of hypervisor instructions on my Node machine (though they’re not supposed to be required…). Possibly the current build is also unstable, as there’s a lot of bug-fixing going on.
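One quick way to check whether a Linux machine's CPU advertises hardware virtualisation is to look for the vmx (Intel VT-x) or svm (AMD-V) flags in /proc/cpuinfo. A minimal sketch follows; the function name is hypothetical, and a present flag may still need enabling in the BIOS.

```python
from typing import Set

# Flag names the Linux kernel uses for hardware virtualisation support.
VIRT_FLAGS = {"vmx", "svm"}  # vmx = Intel VT-x, svm = AMD-V


def virtualisation_flags(cpuinfo: str) -> Set[str]:
    """Return the hardware-virtualisation flags found in /proc/cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in VIRT_FLAGS)
    return found
```

On the Node, virtualisation_flags(open("/proc/cpuinfo").read()) returning an empty set would support the no-hypervisor-instructions theory.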

Debug notes

Eucalyptus has a trouble-shooting guide and this blog series is very useful.

Conclusion

Eucalyptus should give you an EC2-like cloud that runs on your own machines, using an EC2-compatible API so you could move to the cloud when you want to scale up or are less concerned about the privacy of your data.  Currently I can’t get it to work but others do have it working – it seems to depend upon your hardware.  It also lacks clear error messages so debugging is hard – I resorted to clean installs on three occasions.


No Comments | Tags: Life

4 December 2009 - 14:12
Sharing the Mac OS X clipboard with X11 apps

I’m using WingIDE on my MacBook and I couldn’t get copy/paste to work between WingIDE (running in X11) and native apps.  This meant copying URLs and code snippets was impossible…hugely frustrating!

There is a simple fix, as outlined here: run the Property List Editor, open the specified .plist, tick the 5 checkboxes, save and restart all of X11; the clipboard is then shared between X11 and native apps. Phew.


No Comments | Tags: Life

4 December 2009 - 13:29
£5 App Music-Themed Xmas Special

On Wednesday night we ran our music-themed £5 App Xmas Special (fivepoundapp.com).

It was fab! John and I had a great time organising things and watching the night run in such a down-to-earth way – and it seems that many others did too. I particularly liked:

“I bloomin’ love £5 app! The event that’s happy to be itself, and is more rewarding for all as a result. Here with @ribot & @lastminute teams” – ribotminimus

“Home from #fivepoundapp, letting the awesomeness sink in.” – j4mie

Get to the end of the very last video and you’ll hear a special £5 App rendition of Jingle Bells.

Note: I want photos!  Email me links to flickr’d images please.  I also want your blog write-ups, mail them to me or comment down below.

Particular thanks to our sponsors Alan Newman (Sensible Development) and Paul Silver (PaulSilver.co.uk), along with John (psychicorigami) and my ProCasts, for putting up cash to fund a few hours of free beer. Also super-huge thanks to the Ribots for supplying piles of mince pies (yummy!) and to John for baking a batch of crunchy cookies.

The event was organised through Philip and Declan of PlayGroup, they use Hector’s House for arts and science gigs (thanks BuildBrighton for the connection!).  Cheers chaps, it was exactly the space we needed!

“Seb’s Slightly Failed Music Career”

Seb spoke on the highs and lows of forming a band, showed previously-unseen footage and generally gave the lowdown on how it all works. Rick-rolling was included.  Seb has his own write-up.

Sadly Seb’s hard-drive died after the talk taking all his transcoded footage but on the flip-side Seb inspired Simon to share footage from his old cover band.

Here’s the 60 minute video of Seb’s talk:

£5 App #20 “Seb’s Slightly Failed Music Career” for the 2009 Xmas Special from IanProCastsCoUk on Vimeo.

We were absolutely honoured that Seb and Jenny unveiled their new Xmas song on the night; see it here and share it around:

“Toby Cole – Zero to Theremin in 20 days” (with demo)

Toby Cole shows the ThereThing constructed through BuildBrighton and unveiled at a live gig the previous month.

Paul Silver took a video of the ThereThing in action:

Sadly the ThereThing is slightly out of shot during the video of the talk but you can hear Toby and see the screen just fine (and the ThereThing link shows it in detail).

£5 App #20 “Toby Cole – Zero to Theremin in 20 days” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

“Jim – Mrmr/LiveAPI guitar-mounted iPhone ableton live interface”

Jim Purbrick showed Mrmr, the LiveAPI guitar-mounted iPhone interface for Ableton Live. Jim is also the head of Second Life (UK) and is known for building robots.

£5 App #20 “Jim Purbrick – Mrmr/LiveAPI guitar mounted ableton live interface” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

“Lastminute.com Labs with Bottle-Rock-It” (with an additional proper demo video)

Richard, Sam and Mathias (LastMinute.com Labs) came down from London (thanks guys!) to demo Bottle-Rock-It, a group iPhone musical instrument.

The background talk gives loads of detail, sadly the demo went a bit sideways so we sang Jingle Bells as a loud (and slightly tipsy) group instead.

Check this BBC News story to see Bottle Rock It in action.

£5 App #20 “Lastminute.com’s Bottle-Rock-It” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

100 Robots (band)

After the talks finished Jim Purbrick and Max went on to play live n’loud as 100 Robots.

2010 and beyond…

If you want to keep in touch with future £5 App events then join the £5 App Google Group – it is very low volume and is mostly there just for announcements.

We’ll probably run some more competitions next year. The 5k competition went very well and John wants to do more around that idea, while I want to play with some open-source A.I. kits. Details to follow.


2 Comments | Tags: Life, projectbrightonblogs, sussexdigital, £5 App Meet

2 December 2009 - 15:41
A quick look at four chatbots

This is a quick review of four chatbots that are easily found on the web. I’ll take a look at the granddaddy ELIZA (wikip), A.L.I.C.E. (wikip) which uses AIML, Fake Kirk (with speech synthesis and a face) and O2’s Ask Lucy.

The goal of these chats was to see how each of the bots broke down and to learn about how the different technologies worked.  ELIZA has a small, hard-coded rule set.  A.L.I.C.E. is rule driven but with a big AIML rule set.  Fake Kirk uses statistical training on Star Trek scripts.  I don’t know what O2 use (but it doesn’t feel very sophisticated at all!).

ELIZA
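ELIZA's small, hard-coded rule set can be sketched in a few lines: match the input against ordered patterns, swap pronouns in the captured fragment and slot it into a canned reply. The rules below are illustrative placeholders, not Weizenbaum's original DOCTOR script, and the function names are hypothetical.

```python
import random
import re
from typing import List, Optional, Tuple

# Illustrative rules only: (regex over lower-cased input, candidate replies).
RULES: List[Tuple[str, List[str]]] = [
    (r"i need (.*)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"i am (.*)", ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r"because (.*)", ["Is that the real reason?"]),
]
DEFAULT_REPLIES = ["Please tell me more.", "How does that make you feel?"]

# Swap first and second person so an echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())


def respond(sentence: str, rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    text = sentence.lower().strip(" .!?")
    for pattern, replies in RULES:
        match = re.fullmatch(pattern, text)
        if match:
            # Extra positional arguments are ignored by replies without a {0} slot.
            return rng.choice(replies).format(*(reflect(g) for g in match.groups()))
    return rng.choice(DEFAULT_REPLIES)
```

respond("I am sad about my job.") answers with something like "How long have you been sad about your job?". Anything outside the rules falls through to a stock reply, which is exactly where ELIZA conversations break down.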

A.L.I.C.E.
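AIML works differently from ELIZA's regexes: each category pairs an upper-case word pattern (where "*" matches one or more words) with a template in which <star/> is replaced by the matched words. Below is a minimal sketch of that idea. The categories are hypothetical, not taken from A.L.I.C.E., and the ordering (fewer wildcards first, then longer patterns) is a simplification of AIML's real matching priority.

```python
import string
from typing import Dict, List, Optional

# Hypothetical categories in the spirit of AIML: pattern -> template.
CATEGORIES: Dict[str, str] = {
    "HELLO": "Hi there!",
    "MY NAME IS *": "Nice to meet you, <star/>.",
    "WHAT IS *": "I don't know what <star/> is yet.",
    "*": "Tell me more.",
}


def tokenise(sentence: str) -> List[str]:
    """Strip punctuation and split into words, keeping the user's original casing."""
    return sentence.translate(str.maketrans("", "", string.punctuation)).split()


def match(pattern: List[str], words: List[str]) -> Optional[List[str]]:
    """Return the text captured by each '*' if the pattern matches, otherwise None."""
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # '*' must consume at least one word; try the shortest capture first.
        for split in range(1, len(words) + 1):
            tail = match(rest, words[split:])
            if tail is not None:
                return [" ".join(words[:split])] + tail
        return None
    if words and words[0].upper() == head:
        return match(rest, words[1:])
    return None


def respond(sentence: str) -> str:
    words = tokenise(sentence)
    for pattern in sorted(CATEGORIES, key=lambda p: (p.count("*"), -len(p.split()))):
        stars = match(pattern.split(), words)
        if stars is not None:
            reply = CATEGORIES[pattern]
            for star in stars:
                reply = reply.replace("<star/>", star, 1)
            return reply
    return "I have no answer for that."
```

respond("My name is Ian") gives "Nice to meet you, Ian.". A.L.I.C.E.'s strength comes from having a very large set of such categories rather than from cleverer matching.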

Fake Kirk
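The review doesn't say exactly how Fake Kirk's statistical training works. As an illustration of the general idea only, a bigram Markov chain is one of the simplest statistical text generators: it records which word follows which in the training lines, then walks those transitions at random. The training lines and function names below are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

START, END = "<s>", "</s>"  # sentinel tokens marking the start and end of a line


def train(lines: List[str]) -> Dict[str, List[str]]:
    """Record every word that follows each word across the training lines."""
    follows: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        words = [START] + line.split() + [END]
        for current, nxt in zip(words, words[1:]):
            follows[current].append(nxt)
    return dict(follows)


def generate(model: Dict[str, List[str]], rng: Optional[random.Random] = None, max_words: int = 30) -> str:
    """Walk the chain from START, picking each next word in proportion to how often it was seen."""
    rng = rng or random.Random()
    word, output = START, []
    for _ in range(max_words):
        word = rng.choice(model[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)
```

Trained on ["Beam me up", "Beam them up"], generate can only produce one of those two lines; a real script corpus gives far more varied (and stranger) output.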

I’m also adding this 10 min video on Fake Kirk by someone from Pandorabots (it might be the author of Fake Kirk):

O2’s Ask Lucy


3 Comments | Tags: Life