About

This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, a Pythonista, co-founder of ShowMeDo and also a Londoner. Here's a little more about me.

Archive

13 December 2009 - 17:05 Text to Speech – Festival (cross platform) and MacSpeechX (Python on Mac)

I wanted to play with text to speech, and I’ve been looking for a cross-platform open-source solution that sounds reasonable.  I’m really impressed with the festival project; the web demo lets you enter your own text.

Update – I’m including this post in my plans for an Artificial Intelligence Handbook.

Festival is cross-platform but compiling it on a Mac takes a touch of effort (it looks like it is easier on Linux and Windows).

This article shows you how to use it and how to web-enable it with some PHP.  For the simplest demo I used ‘bin/text2wave input.txt -o output.wav’ with input.txt containing a sentence.
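The text2wave call can also be driven from a script.  Here is a minimal sketch in Python, assuming Festival has been built so that ‘bin/text2wave’ exists relative to the working directory; the function names and the default path are placeholders of mine, not part of Festival:

```python
import os
import subprocess
import tempfile


def build_text2wave_command(input_path, output_path, text2wave="bin/text2wave"):
    """Return the argument list for 'bin/text2wave input.txt -o output.wav'."""
    return [text2wave, input_path, "-o", output_path]


def speak_to_wav(sentence, output_path, text2wave="bin/text2wave"):
    """Write the sentence to a temporary text file and ask text2wave to render it.

    Assumes the text2wave binary exists at the given path (hypothetical default).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(sentence + "\n")
        input_path = handle.name
    try:
        # Raises CalledProcessError if text2wave exits with a non-zero status.
        subprocess.check_call(build_text2wave_command(input_path, output_path, text2wave))
    finally:
        os.remove(input_path)
    return output_path
```

Calling speak_to_wav("Hello from Festival", "output.wav") should leave a wav file behind, provided the binary is where the sketch expects it.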

To get started, get the latest code.  I have v1.96beta.  You may also want the official festlang-talk list and possibly this more complete archive.

Compiling speech_tools-1.2.96-beta.tar.gz

It ought to have been as simple as ‘make clean; make’ but there are a few changes to make first.  First we need this fix or we get a compile error in macosxaudio in kAudioUnitProperty_SetInputCallback:

If you add
#include <AudioUnit/AUNTComponent.h>
after the include block on lines 45-48 in audio/macosxaudio.cc the
problem should be solved.

By the way, remember to change the byte order if you have an intel
mac, i.e. on line 131:
     waveformat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger
         | kLinearPCMFormatFlagIsPacked;
     // For Intel:   | kLinearPCMFormatFlagIsPacked;
     // For PowerPC: | kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsBigEndian;

The following was a trickier error to solve:

g++ -c -fno-implicit-templates -O3 -Wall -I../include sigpr_frame.cc
sigpr_frame.cc: In function ‘void lpc2cep(const EST_FVector&, EST_FVector&)’:
sigpr_frame.cc:318: error: ‘__isnan’ was not declared in this scope
make[1]: *** [sigpr_frame.o] Error 1
make: *** [sigpr] Error 2

The fix was known but the relevant archive was missing.  Some googling for ‘__isnan mac‘ turned up this cached 2006 page:

--- ../test/speech_tools/include/EST_math.h     2006-08-03 08:49:35.000000000 -0500
+++ include/EST_math.h  2006-08-17 17:53:33.000000000 -0500
@@ -43,7 +43,7 @@
#if defined(__APPLE__)
/* Not sure why I need this here, but I do */
-extern "C" int isnan(double);
+extern "C" int isnan(float);
#endif
/* this isn't included from c, but just to be safe... */
@@ -101,7 +101,6 @@
/* Apple OSX */
#if defined(__APPLE__)
#define isnanf(X) isnan(X)
-#define isnan(X) __isnan(X)
#endif
/* FreeBSD *and other 4.4 based systems require anything, isnanf is defined */

Compiling festival-1.96-beta.tar.gz

Once speech-tools is compiled, getting ‘festival-1.96-beta.tar.gz’ compiled is as easy as ‘make clean;make’.

Python’s MacSpeechX

I also had a play with the macspeechx module which ties Python to the Mac’s voice-synthesiser.  See list_voice_name() in macspeechX.py for an example of how it all works.

It works to power the speech synthesiser but it doesn’t appear to let you record the speech to a file (unlike festival above).

Update – Mike Driscoll has a post about pyTTS which hooks into Microsoft’s SAPI on Windows and pyTTSX which is cross-platform, along with some speech recognition links.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

1 Comment | Tags: ArtificialIntelligence, Python

12 December 2009 - 23:13 ConceptNetDaily Twitter Bot

I’ve just launched my second Twitter bot – @ConceptNetDaily takes a random concept from the A.I. site ConceptNet and posts it to Twitter with a link back to the site. A tweet looks like:

“When humans own horses, humans groom and ride horses.” http://tinyurl.com/ydvf7vg

The TinyURL expands out to an address like: http://openmind.media.mit.edu/en/assertion/143313/

The aim of the site is to build a large repository of common-sense knowledge, exactly the kind of knowledge that humans take for granted and never write down as statements for a computer to understand.  Currently it tracks over 1,026,553 statements.

Using the link you can vote on the concept.  Vote up if the concept is solid (i.e. something a human would say is ‘right’) or down if it is wrong, silly or erroneous.  The site supports OpenID which makes starting a touch easier.

My goal with this bot is to remind people every day to vote on the concepts and to add new knowledge.  If a concept has many votes then we can have faith that it is ‘common-sense knowledge’.  If a concept is voted down enough then we can have faith that it is ‘unhelpful or wrong’.

You’ll find a searchable list of Concepts and some random examples on the English homepage.  For good examples see all the information that ConceptNet knows about humans, chess and girls.

Details:

I’ve written the bot in Python using PyYAML, Python-tinyurl and Python-twitter.  It runs every day via a cron job.  It works by guessing a random id for a raw_assertion and checking to see if a concept lives at the URL.  See this XML example for id 143313.  I extract the .yaml version via PyYAML but the .xml version renders nicely in your browser if you want a peek.
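The guess-and-check step can be sketched as follows.  This is not the bot’s actual code: it uses only the standard library, leaves out the YAML parsing, the TinyURL shortening and the tweet itself, and assumes that ids fall below the statement count quoted earlier.  The URL pattern is the one the TinyURL expands to; the function names are my own.

```python
import random
import urllib.error
import urllib.request

BASE_URL = "http://openmind.media.mit.edu/en/assertion/%d/"  # pattern from the expanded TinyURL
MAX_ID = 1026553  # assumption: ids are guessed below the quoted statement count


def assertion_url(assertion_id):
    """Build the page address for one assertion id."""
    return BASE_URL % assertion_id


def url_exists(url):
    """Return True if the URL answers with HTTP 200, False on any HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def find_random_assertion(exists=url_exists, max_id=MAX_ID, attempts=20, rng=random):
    """Guess ids until one resolves; return its URL, or None after too many misses."""
    for _ in range(attempts):
        url = assertion_url(rng.randint(1, max_id))
        if exists(url):
            return url
    return None
```

Passing the existence check in as a parameter keeps the guessing loop testable without touching the network.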

ConceptNet’s web API is well documented.  ConceptNet itself is written in Python using Django but I’m not using the downloaded version here, just the web API.

My first Twitter bot – @BrightonJobDoom:

Just in case you live here in Brighton you might want to track @BrightonJobDoom to see how healthy (or…not) the job market is in the UK during this rather wobbly recession 🙂  I wrote this bot for our £5 App’s 5k coding competition.


No Comments | Tags: ArtificialIntelligence, Life, £5 App Meet

11 December 2009 - 16:35 Eucalyptus Clustering – follow-up

A month back I tried to build an Ubuntu-based Eucalyptus cloud/cluster environment for a client for a parallel processing research project.  The project was thwarted by an overly aggressive corporate firewall and my lack of understanding of low-level network config-fu.

I’ve revisited the project using the same machines but with an external public internet connection (no firewall – yay!).

Grub2

On the node machine I still needed to dual-boot to Windows.  Unfortunately whilst reboots to Linux are fine, if Windows is booted it ‘does something’ to the MBR and the machine is unbootable.  I delved into the boot-loader and had to learn some Grub2-fu.

Grub2 was introduced in Ubuntu 9.10.  It replaces Grub, which in turn replaced boot managers like lilo.  The wiki page is pretty good for recovering a boot-loader using an Ubuntu LiveCD but it didn’t work quite to plan.

The step for ‘sudo chroot /mnt’ fails as bash or sh can’t be run from within /mnt (which at this point is looking at the originally installed hd).  There is something odd going on with the LiveCD, much googling didn’t seem to reveal the answer.

To run grub-install on the hd, rather than via the CD (because chroot fails), I used ‘sudo grub-install --root-directory=/mnt /dev/sda’.  It reports that ‘(hd0) /dev/sda’ is installed.

Sidenote – on later attempts somehow a reference to (fd0) got involved and this broke the boot process.  I edited /mnt/boot/grub/device.map to remove the fd0 reference, leaving the hd0 reference.  I ran grub-install again and all was fine.  Now the machine can boot again.
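I made the device.map edit by hand, but the same change can be sketched in a few lines of Python.  The sample file content below is an assumption for illustration (the post doesn’t show the real file), and the function name is hypothetical:

```python
def drop_floppy_entries(device_map_text):
    """Return device.map content without lines that map a floppy such as (fd0)."""
    kept = []
    for line in device_map_text.splitlines():
        if line.strip().startswith("(fd"):
            continue  # skip floppy mappings, e.g. "(fd0) /dev/fd0"
        kept.append(line)
    return "\n".join(kept) + "\n"


# Assumed example content: one floppy mapping and one hard disk mapping.
example = "(fd0)\t/dev/fd0\n(hd0)\t/dev/sda\n"
print(drop_floppy_entries(example), end="")
```

With the floppy line gone only the (hd0) mapping remains, which matches the state the file was in before grub-install was run again.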

Mounting a USB memory stick

Whilst an 8GB memory stick was recognised, it didn’t get mounted.  I had to edit /etc/fstab and add:

/dev/sdf1 /mnt/stick auto umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0

After this I used ‘sudo mkdir /mnt/stick’, ‘sudo mount /dev/sdf1’ and it mounted just fine.
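To make the fstab line easier to read, here is a small sketch that splits it into the six standard fstab fields.  The field names in the code are my own labels rather than anything the system defines:

```python
FSTAB_FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")


def parse_fstab_entry(line):
    """Split one fstab line into its six whitespace-separated fields."""
    parts = line.split()
    if len(parts) != len(FSTAB_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FSTAB_FIELDS), len(parts)))
    entry = dict(zip(FSTAB_FIELDS, parts))
    entry["options"] = entry["options"].split(",")  # options are comma-separated
    return entry


entry = parse_fstab_entry(
    "/dev/sdf1 /mnt/stick auto "
    "umask=0,user,iocharset=iso8859-1,sync,codepage=850,noauto,exec,users 0 0"
)
print(entry["mount_point"], entry["options"])
```

The ‘noauto’ option means the stick is not mounted at boot, which is why the explicit mount command is still needed, and because fstab already names the mount point, ‘sudo mount /dev/sdf1’ needs no target directory.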

Installing Eucalyptus

The install process this time around was much the same as before, except this time without the firewall it all ‘just worked’.  Seeing the fnords part 1 took me through the basic install.

I got the feeling from later steps that the cloud controller needs a static IP so I switched the cluster controller from DHCP to a static IP and rebooted.

The discover nodes process (‘sudo euca_conf --no-rsync --discover-nodes’) for euca_conf also required that I’d set up ssh keys on the Node; step 6 in the NodeInstall doc has the instruction.  Typo note – if you spell ‘eucalyptus’ wrong you’ll go round in circles trying to figure out why the password won’t work!

Sometimes I couldn’t get ‘euca-describe-availability-zones verbose’ to work; it’d just report ‘No route to host’.  It seems that a reboot of the CC and Node is required, plus a minute or so of patience after boot for Apache to sort itself out, before this problem just goes away.
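The ‘minute or so of patience’ can be automated with a simple polling loop.  This is a sketch only: it assumes that the command exits with a non-zero status while it reports ‘No route to host’, which I haven’t verified, and the function names and timings are placeholders.

```python
import subprocess
import time


def wait_until_ok(check, attempts=10, delay_seconds=15, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # give the services time to come up
    return None


def zones_reachable():
    """True if 'euca-describe-availability-zones verbose' exits with status 0.

    Assumption: a failed connection produces a non-zero exit status.
    """
    command = ["euca-describe-availability-zones", "verbose"]
    try:
        return subprocess.call(command) == 0
    except OSError:
        return False  # command not installed or not on PATH
```

After a reboot, wait_until_ok(zones_reachable) would retry for a couple of minutes before giving up and returning None.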

Using the Ubuntu Store

Having installed the CC and registered a Node, next I ran the web interface via ‘https://10.0.0.4:8443’.  Note ‘https’.  If you visit the website too soon after a reboot (i.e. <1 minute) then the webapp won’t respond or maybe it won’t recognise the admin user.  Having logged in, the first login forces a password change.

Next check the ‘Configuration’ tab and verify the IP addresses.  For reasons beyond my understanding our switch rebooted during my first attempt to set up the cluster and it switched from the ‘192.168.x.x’ address range to ’10.x.x.x’ – this royally barfed my configuration.  I chose to re-install the CC from scratch (I was plagued by ‘no route to host’ problems no matter how much tweaking I tried).

Next visit the ‘Store’ tab and download an image; I’m using ‘Ubuntu 9.10 Karmic Koala (i386)’.  Today this works – I’ve spent 2.5 days building and re-building the cluster to get it to this point.  Often the Store would download an image and then report ‘no route to host’.  This process is pretty darned frustrating and seems to lack useful error messages.

But ultimately – no cigar

Rather frustratingly I can’t get my Node to run an image.  ‘euca-describe-availability-zones verbose’ shows that a Node exists but doesn’t list its IP address, which is odd because the online docs say it should be shown.

If I run an image then it enters the ‘pending’ state and then the ‘terminating’ state.  Digging around in Google shows that other people currently have the same problem.  It might be related to the lack of Hypervisor instructions on my Node machine (though they’re not supposed to be required…).  Possibly the current build is also unstable; there’s a lot of bug-fixing going on.

Debug notes

Eucalyptus has a trouble-shooting guide, and this blog series is very useful.

Conclusion

Eucalyptus should give you an EC2-like cloud that runs on your own machines, using an EC2-compatible API so you could move to the cloud when you want to scale up or are less concerned about the privacy of your data.  Currently I can’t get it to work but others do have it working – it seems to depend upon your hardware.  It also lacks clear error messages so debugging is hard – I resorted to clean installs on three occasions.


No Comments | Tags: Life

4 December 2009 - 14:12 Sharing the Mac OS X clipboard with X11 apps

I’m using WingIDE on my MacBook and I couldn’t get copy/paste to work between WingIDE (running in X11) and native apps.  This meant copying URLs and code snippets was impossible…hugely frustrating!

There is a simple fix, as outlined here: run the Property List Editor, open the specified .plist, tick the 5 checkboxes, save, restart all of X11 and then the clipboard is shared between X11 and native apps.  Phew.


No Comments | Tags: Life

4 December 2009 - 13:29 £5 App Music-Themed Xmas Special

On Wednesday night we ran our music-themed £5 App Xmas Special (fivepoundapp.com).

It was fab!  John and I had a fab time organising things and watching the night run so down-to-earthly – it seems that many others did too.  I particularly like:

“I bloomin’ love £5 app! The event that’s happy to be itself, and is more rewarding for all as a result. Here with @ribot & @lastminute teams” – ribotminimus

“Home from #fivepoundapp, letting the awesomeness sink in.” – j4mie

Get to the end of the very last video and you’ll hear a special £5 App rendition of Jingle Bells.

Note: I want photos!  Email me links to flickr’d images please.  I also want your blog write-ups, mail them to me or comment down below.

Particular thanks to our sponsors Alan Newman (Sensible Development) and Paul Silver (PaulSilver.co.uk) along with John (psychicorigami) and my ProCasts for putting up cash to fund a few hours of free beer.  Also super-huge thanks to the Ribots for supplying piles of mince pies (yummy!) and John for baking a batch of crunchy cookies.

The event was organised through Philip and Declan of PlayGroup, they use Hector’s House for arts and science gigs (thanks BuildBrighton for the connection!).  Cheers chaps, it was exactly the space we needed!

“Seb’s Slightly Failed Music Career”

Seb spoke on the highs and lows of forming a band, showed previously-unseen footage and generally gave the lowdown on how it all works. Rick-rolling was included.  Seb has his own write-up.

Sadly Seb’s hard-drive died after the talk taking all his transcoded footage but on the flip-side Seb inspired Simon to share footage from his old cover band.

Here’s the 60 minute video of Seb’s talk:

£5 App #20 “Seb’s Slightly Failed Music Career” for the 2009 Xmas Special from IanProCastsCoUk on Vimeo.

We were absolutely honoured that Seb and Jenny unveiled their new Xmas song tonight, see it here and share it around:

“Toby Cole – Zero to Theremin in 20 days” (with demo)

Toby Cole shows the ThereThing constructed through BuildBrighton and unveiled at a live gig the previous month.

Paul Silver took a video of the ThereThing in action:

Sadly the ThereThing is slightly out of shot during the video of the talk but you can hear Toby and see the screen just fine (and the ThereThing link shows it in detail).

£5 App #20 “Toby Cole – Zero to Theremin in 20 days” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

“Jim – Mrmr/LiveAPI guitar-mounted iPhone ableton live interface”

Jim Purbrick showed Mrmr, the LiveAPI guitar mounted iPhone Ableton live interface.  Jim’s also the head of Second Life (UK) and is known for building robots.

£5 App #20 “Jim Purbrick – Mrmr/LiveAPI guitar mounted ableton live interface” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

“Lastminute.com Labs with Bottle-Rock-It” (with an additional proper demo video)

Richard, Sam and Mathias (LastMinute.com Labs) came down from London (thanks guys!) to demo Bottle-Rock-It, a group iPhone musical instrument.

The background talk gives loads of detail.  Sadly the demo went a bit sideways so we sang Jingle Bells as a loud (and slightly tipsy) group instead.

Check this BBC News story to see Bottle Rock It in action.

£5 App #20 “Lastminute.com’s Bottle-Rock-It” for the 2009 Xmas Special from Ian Ozsvald on Vimeo.

100 Robots (band)

After the talks finished Jim Purbrick and Max went on to play live n’loud as 100 Robots.

2010 and beyond…

If you want to keep in touch with future £5 App events then join the £5 App Google Group – it is very low volume and is mostly there just for the announcements.

We’ll probably run some more competitions next year.  The 5k competition went very well, John wants to do more around that idea and I want to play with some open-source A.I. kits.  Details to follow.


2 Comments | Tags: Life, projectbrightonblogs, sussexdigital, £5 App Meet

2 December 2009 - 15:41 A quick look at four chatbots

This is a quick review of four chatbots that are easily found on the web.  I’ll take a look at the granddaddy ELIZA (wikip), A.L.I.C.E. (wikip) which uses AIML, Fake Kirk (with speech synthesis and a face) and O2’s Ask Lucy.

The goal of these chats was to see how each of the bots broke down and to learn about how the different technologies worked.  ELIZA has a small, hard-coded rule set.  A.L.I.C.E. is rule driven but with a big AIML rule set.  Fake Kirk uses statistical training on Star Trek scripts.  I don’t know what O2 use (but it doesn’t feel very sophisticated at all!).
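To show what a small, hard-coded rule set looks like in practice, here is a toy ELIZA-style responder in Python.  The three rules are invented for illustration and are not taken from ELIZA’s original script; the real program also swaps pronouns (‘my’ to ‘your’ and so on), which this sketch leaves out.

```python
import re

# Each rule pairs a pattern with a reply template; captured text fills the {0} slot.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def reply(message):
    """Return the response for the first rule whose pattern matches the message."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY  # nothing matched, so fall back to a stock phrase


print(reply("I need a holiday."))
print(reply("The weather is nice"))
```

Anything outside the rules drops straight to the stock phrase, which is one way a bot of this kind breaks down in conversation.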

ELIZA

A.L.I.C.E.

Fake Kirk

I’m also adding this 10 min video on Fake Kirk by someone from Pandorabots, it might be the author of Fake Kirk:

O2’s Ask Lucy


3 Comments | Tags: Life

22 November 2009 - 18:58 Printable local data sheet for visitors?

Here’s a simple idea to help visitors to a new area.  Maybe it’s been done before and someone can leave a comment about it?

The problem – when you visit a place you don’t know you have no idea what you need to see, where to get a map, which pubs and cafes are nice, where the worthy landmarks are etc.

Possible solution – visit a site that gives you 1-2 pages of printable (or iPhoneable) data culled from Wikipedia, OpenStreetMap/GMaps, OpenPlaques, Flickr, Twitter and more.  The pages would give you a summary of what’s there to see, some history, maps and also some recent information (probably via Twitter).

The printable option would be useful because iPhone coverage still isn’t great in the UK in the smaller and more interesting towns.

Personally I’d use this – we go walking to places that we don’t know every weekend and some background, a map and some topical info (e.g. are there any fairs or events happening today?) would be super useful.  I’d guess that this would be useful for anyone visiting an area, even just for parents coming to visit for the weekend.

Does anything like this already exist?


2 Comments | Tags: Entrepreneur

22 November 2009 - 13:30 How I’m writing The Screencasting Handbook

Many people have asked why I’m writing a book without a publisher.  The story has interested a bunch of people so I’ll outline the basics here.

Update: there’s a related article by Marc-André Cournoyer covering how he wrote his “Create your own programming language” eBook.

I started writing The Screencasting Handbook in the middle of this year (about 5 months back).  My primary motivation was to write a useful Handbook that teaches my 4 years of skills to new screencasters.  My main goals were to:

  • Release early, release often – so I can iterate based on the needs of my readers rather than the needs I’d guess that they have (based on some support at the Business of Software forum)
  • Get the written parts out as soon as possible – I didn’t want drafts kicking around for a year before a publisher released them to the readers, I wanted the chapters out in the hands of readers as soon as possible
  • Build a community (Google Group) around the Handbook – so my readers can ask and answer questions without me acting as a bottleneck

To achieve this I needed to create a site and determine if there was demand for the topic.  I had a WordPress theme created which signs potential readers up to an AWeber mailing list (costing $20USD/month) and I set up a Google Group.

I then put the word out to screencasters, mostly through ShowMeDo and by writing some useful blog posts that were picked up by screencasting companies.

At the same time I wrote a proposed Table of Contents (August) and released a survey via SurveyMonkey (free account).  I released this into the Google Group and asked for feedback.  I iterated a few times (September) based on feedback until everyone figured that I would cover the most beneficial topics.  At this point I added the Table of Contents as a PDF to the Handbook’s homepage.

By now I had 50 or so people signed up to the list – between the silent sign-ups and the active users in the Google Group I knew that the book would be in demand.  The survey detailed all the areas that caused problems for screencasters so I could be sure that by answering those questions, others would want the Handbook.

Pricing and releasing

At this point I cracked on with writing the Handbook.  I quickly went from 1,000 words to 10,300 and in October I announced that a new release was being prepared for sale.  I announced that the target price of the finished book would be $39USD and that early-bird purchasers could get it for $26USD (a 1/3 discount).  I also offer an unconditional refund at any time.

The payment gateway is PayPal and the front-end is e-junkie, which takes payment and offers downloads for just $5/month.  Integrating the e-junkie basket into WordPress involves copying over a few lines of JavaScript; it is all very simple.

At the start of November I released version 4 into the Google Group and announced it on the mailing list.  This was quickly followed by a 5th release which added a new chapter.  I’m also about to decrease the discount by $1, taking the price up to $27USD.
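As a quick check of the pricing arithmetic, this short Python sketch works out the discount as an exact fraction of the $39 target price.  The variable and function names are just labels for the figures quoted in this section:

```python
from fractions import Fraction

target_price = 39      # USD, announced price of the finished book
early_bird_price = 26  # USD, initial early-bird price
next_price = 27        # USD, after the discount shrinks by $1


def discount_fraction(full, discounted):
    """Return the discount as an exact fraction of the full price."""
    return Fraction(full - discounted, full)


print(discount_fraction(target_price, early_bird_price))  # 1/3
print(discount_fraction(target_price, next_price))        # 4/13, roughly 31%
```

So the early-bird price is exactly a third off, and the $1 rise trims the discount to a little under 31%.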

After purchase everyone gets invited onto a second emailing list for Handbook Updates (and they’re removed from the first mailing list).  The second list is used to mail out links to updated versions of the PDF.  I also mail out a second survey about a week after purchase to ask the reader if they found the book useful and to ask what else I need to cover soon.  The feedback from the surveys and the Google Group is invaluable.

Figures so far – in several months with only a little effort at publicity I signed up over 200 users to the mailing list.  Just over 10% of those became buyers in the first week of releasing version 4 (given that the book is only about 1/6th written I’m pretty happy with this).  Next week I’ll be writing a couple of extra chapters and then I’ll be increasing my publicity.

I’m releasing my beginner screencasts on the Handbook’s blog for free.  This will help prove the quality of the Handbook and it will bring in more visitors.

Print on demand?

Once I reach ‘edition 1’ I imagine I’ll release a print-on-demand version via lulu.  Several readers have already asked for a printed copy rather than a PDF.  ‘edition 1’ is a way off yet – probably early next year some time.

Tools

I’m writing the Handbook with Google Docs so I can edit it from home or whilst sitting in Cafe Delice.

To publish a new version I download a PDF.  I use Apple’s Preview to open the PDF and then ‘print to PDF’ a shorter version containing just the first 15 or so pages.

I upload the shorter version as the Outline to the Handbook’s homepage.  The longer version goes to e-junkie (for new purchasers) and to my second AWeber list (where everyone who has bought a copy gets notified about new releases).

I’ve used Google Website Optimizer to A/B test the landing page.  With the Google Website Optimizer plugin for WordPress you just copy over the JavaScript that GWO provides to three pages (A, B and result page) and it starts to track conversions.  If there’s interest I’ll write some details on the (few) things that I’ve learned about landing page design.

I’ve already discussed AWeber, SurveyMonkey and Google Groups above.

Having an ‘accountability buddy’ helps!

Andy White is writing Podcasting Unleashed at the same time, so we’re meeting every two weeks to push each other forwards and trade tips.  We’re both using WordPress and he’s about to move to AWeber so we’ll have pretty much the same setup.  Knowing that your partner is making progress when you’re having a slow day is a great motivator to write a few more pages!

Edition 2?

I’m thinking about the needs of a second edition, I’m wondering if a book format (with a linear series of pages) is wrong and perhaps a wiki is a better tool.  It would certainly allow collaborative content creation.  I’d also like to build some tools like an automatic de-noiser and a scripting tool.

Want to write your own eBook?

It occurs to me that the above process might be useful to other people who want to write their own book, particularly those who want to get early feedback from a potential audience before committing to write a full book.

One possibility is the construction of a site that makes ‘everything easy’ for a potential author.  If you’d like to know if I push this idea in the future, make a comment below which includes your email.


2 Comments | Tags: Business Idea, Entrepreneur, Life, ProCasts, Screencasting, The Screencasting Handbook

20 November 2009 - 11:37 Building a high performance cluster with Ubuntu 9.10 and Eucalyptus

I’ve spent the last day (almost) installing Eucalyptus on Ubuntu 9.10 to create a mini ‘high performance computing’ environment.  We’re testing the concept and could build 100+ machines if the prototype works as expected.

This is a running log of my notes, for this post I only have a partial setup.

Note – I have a Eucalyptus follow-up which gets further than this post but ultimately fails.

To start you need Ubuntu 9.10 Server edition, which includes the open Eucalyptus software.  Eucalyptus is an API for cluster computing that is compatible with Amazon’s EC2.  This means you can build an in-house network for testing and private computation and later switch to EC2 if you want to scale up.  This is great if some clients need privacy and some want true utility computing.

Note that the process of installing Eucalyptus requires at least one CD download, or two if you need both the 64bit edition for the Node and 32bit for your Cluster Controller because the machine is too old.  The hardware requirements are a bit steep (Node machines need 1GB+ RAM, 40GB+ HD etc).  Once installed you’ll also have to download at least one instance image that will run on the Nodes, these are about 180MB.  This is a lot to download if you have a tiny VPN pipe to the outside world.

Two good background papers from open.eucalyptus.com are:

Installing UEC via the CD (and UEC main page) is fairly easy, I actually followed these notes (first of three parts) before finding the official docs.

Installing the server took about 30 minutes, most of that was spent reading from the CD.  The questions were pretty easy.  Some notes:

  • For the hard disk setup I used a fresh 40GB disk and chose ‘Guided – Use entire disk’ (not the LVM option)
  • I chose no email configuration (I don’t know the SMTP local details here in the client’s office)
  • For apt-get I had to configure the proxy so it could see outside of the corporate firewall

To install the Node (1 client) I needed to dual-boot an existing Windows XP machine.  For this I had to use PartitionMagic to resize the 500GB Windows partition down to 100GB.  This didn’t work – we kept getting ‘error 983 while executing batch’ and the resize would abort.  The solution (as noted many times on the web) is to run ‘chkdsk /f’ at the command prompt – it reboots, does the check, in our case it didn’t report any changes, then PartitionMagic worked.

The candidate Node machine recognises that the Cluster Controller is running on another machine so it nominates itself as a Node.  Only a few questions are asked (e.g. the keyboard) and then everything is installed.  For the HD installation I chose ‘Use the largest contiguous free space’ having blanked 360GB via PartitionMagic earlier.

For reasons that aren’t clear, after installation it had trouble finding the network.  I had to ‘sudo /etc/init.d/networking restart’ before it could ‘ping slashdot.org’.  It still won’t do a full ‘sudo apt-get update’ (it completes just fine on the Cluster Controller) but I’ll assume that this isn’t a problem.

Now that the network is good, if I run ‘sudo euca_conf --no-rsync --discover-nodes’ on my Cloud Controller then it reports finding 1 Node.  I can accept the Node but after that I have some sort of authentication fail.  This might be due to the corporate network firewall.

If I jump a step forwards then I can run ‘sudo euca_conf --get-credentials mycreds.zip’, ‘unzip mycreds.zip’, ‘./eucarc’ but then when I run ‘euca-describe-availability-zones verbose’ I get an XML parse error much like this bug.

There are enough network errors here to suggest that the corporate firewall isn’t playing ball (it won’t be the first time).  I’ll restart installation on my two test machines when we have a public internet connection established that avoids the corporate firewall.  I’ll post another entry when I run the second experiment (December, all going well).

Update: I followed the NodeInstallation notes to set the Cloud Controller’s eucalyptus user’s public key into the Node Controller’s eucalyptus user’s authorized_keys file.  That hasn’t fixed the above two errors.

Books:

The following books will help you move forwards: the Eucalyptus one will make the above configuration easier and the second, on EC2, will help you see how Eucalyptus and EC2 compare.


1 Comment | Tags: Life

11 November 2009 - 12:12 £5 App Christmas Special – Weds 2nd December

John and I are very pleased to announce our upcoming music-themed £5 App Christmas Special on Wednesday 2nd December, 8-11pm at Hector’s House in collaboration with the lovely Playgroup guys.  Please do the usual – sign-up on Upcoming so we know how much beer to brew for you all.  If you don’t know what this is then see last year’s Xmas Special write-up and details of all the previous events (with videos).

We want 40-60 of you along this year so please spread the word – Tweets and blog posts would be hugely appreciated!

Outline:

  • Seb Lee-Delisle – “My life as a wannabe rock star at the birth of the internet music boom” – full description below
  • Toby Cole – “Zero to Theremin in 20 days” – How BuildBrighton built a feature rich, ultrasonic, laser etched MIDI controller in under three weeks
  • Tom Hume – “You’re all an orchestra, get over it” – Bluetooth devices will interact with the audience to create changing ambient music, created by Future Platforms for a Music Hack Day
  • Jim Purbrick – “A short talk on the Mrmr/LiveAPI guitar mounted iPhone ableton live interface by the head of Second Life Europe and later a demo with 100Robots”
  • lastminute.com labs – Bottle-Rock-It, a music game for n iPhones where (with any luck) n > 3 (Richard, Russ, Sam, Mathias)
  • 100Robots – Jim and Max Williams play live and loud for us

Seb has the main talk, his full blurb is:

“Before Seb Lee-Delisle was peddling his digital creations, he had an entirely different life. He spent most of his 20s setting up Solar Records and promoting his band Stargirl (later Laine). Investing over £50,000 of their own money, they released their own CDs, made it onto the radio and TV, played in front of 30,000 people, recorded at George Martin’s Air Studios and had full page spreads in the nationals.

They were at the forefront of the internet music boom of the late 90s. The future was looking rosy for this group of dynamic 20-somethings. So come and find out what it was like, how the hell they got the £50K, and why their plans didn’t quite reach fruition…”

Beer – several of us who are doing well this year will put up some bar-money (Alan of SensibleDevelopment, Paul Silver of Brighton Farm and my ProCasts so far, several more to come, get in contact if you want to share the love).

Food – maybe nibbles.

Next, please sign-up on Upcoming so we know how much beer to provide and tweet/post about the event to help us spread the word.  Cheers!


2 Comments | Tags: Life, projectbrightonblogs, sussexdigital, £5 App Meet