
This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


22 November 2009 - 18:58
Printable local data sheet for visitors?

Here’s a simple idea to help visitors to a new area.  Maybe it’s been done before and someone can leave a comment about it?

The problem – when you visit a place you don’t know you have no idea what you need to see, where to get a map, which pubs and cafes are nice, where the worthy landmarks are etc.

Possible solution – visit a site that gives you 1-2 pages of printable (or iPhoneable) data culled from Wikipedia, OpenStreetMap/Google Maps, OpenPlaques, Flickr, Twitter and more.  The pages would give you a summary of what’s there to see, some history, maps and some recent information (probably via Twitter).
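To make the idea a little more concrete, here is a minimal Python sketch of the assembly step only. It assumes the items have already been fetched from the various sources and lays them out as a plain-text sheet capped at a fixed number of lines, roughly one printed page. The `Item` class, the `build_sheet` function and the 60-line budget are hypothetical, and no real Wikipedia, OpenStreetMap, OpenPlaques, Flickr or Twitter API is called.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Item:
    source: str   # label such as "Wikipedia" or "OpenPlaques"; nothing is fetched here
    title: str
    summary: str


def build_sheet(place, items, max_lines=60, width=72):
    """Group items by source and render a plain-text sheet of at most max_lines lines."""
    lines = [place.upper(), "=" * len(place), ""]
    by_source = {}
    for item in items:
        by_source.setdefault(item.source, []).append(item)  # keeps first-seen source order
    for source, group in by_source.items():
        lines.append(f"{source}:")
        for item in group:
            lines.extend(textwrap.wrap(f"- {item.title}: {item.summary}",
                                       width=width, subsequent_indent="  "))
        lines.append("")
    if len(lines) > max_lines:
        # Keep the sheet to roughly one printed page
        lines = lines[:max_lines - 1] + ["[truncated to fit the page]"]
    return "\n".join(lines)


# Placeholder items; real content would come from each source's own fetch step
items = [
    Item("Wikipedia", "History", "Placeholder summary text."),
    Item("OpenPlaques", "Blue plaque", "Placeholder plaque text."),
]
print(build_sheet("Example Town", items))
```

Fetching from each source would be a separate step per service; keeping the layout step independent means a printable page and a phone-sized page could share the same data.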

The printable option matters because iPhone coverage still isn’t great in the smaller and more interesting UK towns.

Personally I’d use this – every weekend we go walking somewhere we don’t know, and some background, a map and some topical info (e.g. are there any fairs or events happening today?) would be super useful.  I’d guess that it would be useful for anyone visiting an area, even just for parents coming to visit for the weekend.

Does anything like this already exist?

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

2 Comments | Tags: Entrepreneur

22 November 2009 - 13:30
How I’m writing The Screencasting Handbook

Many people have asked why I’m writing a book without a publisher.  The story has interested a bunch of people so I’ll outline the basics here.

Update: there’s a related article by Marc-André Cournoyer covering how he wrote his “Create your own programming language” eBook.

I started writing The Screencasting Handbook in the middle of this year (about 5 months back).  My primary motivation was to write a useful Handbook that passes on the skills I’ve built up over 4 years to new screencasters.  My main goals were to:

  • Release early, release often – so I can iterate based on the needs of my readers rather than the needs I’d guess that they have (based on some support at the Business of Software forum)
  • Get the written parts out as soon as possible – I didn’t want drafts kicking around for a year before a publisher released them; I wanted the chapters in readers’ hands straight away
  • Build a community (Google Group) around the Handbook – so my readers can ask and answer questions without me acting as a bottleneck

To achieve this I needed to create a site and determine if there was demand for the topic.  I had a WordPress theme created which signs potential readers up to an AWeber mailing list (costing $20USD/month) and I set up a Google Group.

I then put the word out to screencasters, mostly through ShowMeDo and by writing some useful blog posts that were picked up by screencasting companies.

At the same time (August) I wrote a proposed Table of Contents and released a survey via SurveyMonkey (free account).  I shared it in the Google Group and asked for feedback.  I iterated a few times (September) based on that feedback until the group agreed that I was covering the most useful topics.  At that point I added the Table of Contents as a PDF to the Handbook’s homepage.

By now I had 50 or so people signed up to the list – between the silent sign-ups and the active users in the Google Group I knew that the book would be in demand.  The survey detailed all the areas that caused problems for screencasters so I could be sure that by answering those questions, others would want the Handbook.

Pricing and releasing

At this point I cracked on with writing the Handbook.  I quickly went from 1,000 words to 10,300 and in October I announced that a new release was being prepared for sale.  I announced that the target price of the finished book would be $39USD and that early-bird purchasers could get it for $26USD (a 1/3 discount).  I also offer an unconditional refund at any time.
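The early-bird arithmetic can be checked exactly with Python’s standard `fractions` module; `discount_fraction` is just an illustrative helper name.

```python
from fractions import Fraction


def discount_fraction(full_price, sale_price):
    """Discount as an exact fraction of the full price."""
    return Fraction(full_price - sale_price, full_price)


print(discount_fraction(39, 26))  # 1/3: $13 off a $39 target price
```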

The payment gateway is PayPal and the front-end is e-junkie, which takes payment and offers downloads for just $5/month.  Integrating the e-junkie basket into WordPress involves copying over a few lines of JavaScript; it is all very simple.

At the start of November I released version 4 into the Google Group and announced it on the mailing list; this was quickly followed by a 5th release which added a new chapter.  I’m also about to reduce the early-bird discount by $1, taking the price up to $27USD.

After purchase everyone gets invited onto a second mailing list for Handbook Updates (and they’re removed from the first mailing list).  The second list is used to mail out links to updated versions of the PDF.  I also mail out a second survey about a week after purchase to ask the reader if they found the book useful and what else I need to cover soon.  The feedback from the surveys and the Google Group is invaluable.
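The list handling described above can be modelled with plain Python sets. This is a hypothetical sketch of the bookkeeping only, not the AWeber API, and the exact seven-day delay is an assumption standing in for "about a week".

```python
from datetime import date, timedelta

SURVEY_DELAY = timedelta(days=7)  # assumption: "about a week" taken as exactly 7 days


def record_purchase(email, purchased_on, prospects, buyers, survey_due):
    """Move a buyer from the sign-up list to the updates list and schedule their survey."""
    prospects.discard(email)              # removed from the first mailing list
    buyers.add(email)                     # added to the Handbook Updates list
    survey_due[email] = purchased_on + SURVEY_DELAY


def surveys_to_send(today, survey_due):
    """Buyers whose follow-up survey is due on or before today."""
    return sorted(email for email, due in survey_due.items() if due <= today)


prospects = {"reader@example.com", "browser@example.com"}
buyers, survey_due = set(), {}
record_purchase("reader@example.com", date(2009, 11, 1), prospects, buyers, survey_due)
print(sorted(prospects), sorted(buyers))
print(surveys_to_send(date(2009, 11, 8), survey_due))
```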

Figures so far – in several months with only a little effort at publicity I signed up over 200 users to the mailing list.  Just over 10% of those became buyers in the first week of releasing version 4 (given that the book is only about 1/6th written I’m pretty happy with this).  Next week I’ll be writing a couple of extra chapters and then I’ll be increasing my publicity.

I’m releasing my beginner screencasts on the Handbook’s blog for free; they help demonstrate the quality of the Handbook and they will bring in more visitors.

Print on demand?

Once I reach ‘edition 1’ I imagine I’ll release a print-on-demand version via lulu.  Several readers have already asked for a printed copy rather than a PDF.  ‘edition 1’ is a way off yet – probably early next year some time.


I’m writing the Handbook in Google Docs, so I can edit it from home or whilst sitting in Cafe Delice.

To publish a new version I download the document from Google Docs as a PDF.  I open the PDF in Apple’s Preview and then ‘print to PDF’ a shorter version containing just the first 15 or so pages.

I upload the shorter version as the Outline to the Handbook’s homepage.  The longer version goes to e-junkie (for new purchasers) and to my second AWeber list (where everyone who has bought a copy gets notified about new releases).

I’ve used Google Website Optimizer (GWO) to A/B test the landing page.  With the Google Website Optimizer plugin for WordPress you just copy the javascript that GWO provides onto three pages (variant A, variant B and the result page) and it starts to track conversions.  If there’s interest I’ll write up the (few) things that I’ve learned about landing page design.
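Google Website Optimizer does its own statistics, and this is not a claim about its method. As an illustration of how two landing-page variants can be compared once conversions are counted, here is a standard two-proportion z-test in plain Python; the visitor and conversion counts are made up.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value); a positive z means variant B converted better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no real difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Made-up counts: 20 of 400 visitors converted on A, 36 of 400 on B
z, p = two_proportion_z(20, 400, 36, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value (conventionally below 0.05) suggests the difference is unlikely to be chance; with the modest traffic a new landing page gets, it is worth waiting for more visitors before picking a winner.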

I’ve already discussed AWeber, SurveyMonkey and Google Groups above.

Having an ‘accountability buddy’ helps!

Andy White is writing Podcasting Unleashed at the same time; we’re meeting every two weeks to push each other forwards and trade tips.  We’re both using WordPress and he’s about to move to AWeber, so we’ll have pretty much the same setup.  Knowing that your partner is making progress when you’re having a slow day is a great motivator to write a few more pages!

Edition 2?

I’m thinking about the needs of a second edition.  I’m wondering if a book format (with a linear series of pages) is wrong and a wiki would be a better tool; it would certainly allow collaborative content creation.  I’d also like to build some tools such as an automatic de-noiser and a scripting tool.
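As a hint of what the very simplest de-noiser might look like, here is a hypothetical noise gate in Python: it splits the audio into fixed-size frames and silences any frame whose RMS level falls below a threshold. The frame size and threshold are arbitrary assumptions, and a real de-noiser (for example spectral noise reduction) would do far more.

```python
import math


def noise_gate(samples, frame_size=256, threshold=0.02):
    """Silence each frame whose RMS level is below threshold.

    samples: floats in [-1.0, 1.0]. This only removes hiss in the gaps between
    speech; noise underneath the speech is left untouched.
    """
    out = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out


hiss = [0.001, -0.001] * 128   # one quiet frame of background noise
speech = [0.5, -0.5] * 128     # one loud frame standing in for speech
cleaned = noise_gate(hiss + speech)
```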

Want to write your own eBook?

It occurs to me that the above process might be useful to other people who want to write their own book, particularly those who want to get early feedback from a potential audience before committing to write a full book.

One possibility is to build a site that makes ‘everything easy’ for a potential author.  If you’d like to hear whether I pursue this idea, leave a comment below that includes your email.


2 Comments | Tags: Business Idea, Entrepreneur, Life, ProCasts, Screencasting, The Screencasting Handbook

20 November 2009 - 11:37
Building a high performance cluster with Ubuntu 9.10 and Eucalyptus

I’ve spent the last day (almost) installing Eucalyptus on Ubuntu 9.10 to create a mini ‘high performance computing’ environment.  We’re testing the concept and could build 100+ machines if the prototype works as expected.

This is a running log of my notes; so far I only have a partial setup.

Note – I have a Eucalyptus follow-up which gets further than this post but ultimately fails.

To start you need Ubuntu 9.10 Server edition, which includes the open-source Eucalyptus software.  Eucalyptus provides a cluster computing API that is compatible with Amazon’s EC2.  This means you can build an in-house network for testing and private computation and later switch to EC2 if you want to scale up.  This is great if some clients need privacy and some want true utility computing.

Note that installing Eucalyptus requires at least one CD download, or two if your Cluster Controller machine is too old for 64bit and needs the 32bit edition while the Node uses the 64bit one.  The hardware requirements are a bit steep (Node machines need 1GB+ RAM, a 40GB+ HD etc).  Once installed you’ll also have to download at least one instance image (about 180MB each) to run on the Nodes.  That is a lot to download if you have a tiny VPN pipe to the outside world.
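Before burning CDs it may be worth checking that a candidate Node meets the minimums above. This minimal Python sketch assumes a Linux machine (it uses `os.sysconf`, which is not available on Windows) and measures the root filesystem with `shutil.disk_usage`; the thresholds simply restate the 1GB RAM and 40GB disk figures in decimal units.

```python
import os
import shutil

# Decimal units throughout; thresholds restate the "1GB+ RAM" and "40GB+ HD" figures
MIN_RAM_BYTES = 1 * 10**9
MIN_DISK_BYTES = 40 * 10**9


def total_ram_bytes():
    """Physical RAM in bytes (Linux: page size times number of physical pages)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def meets_node_minimums(ram_bytes, disk_bytes):
    return ram_bytes >= MIN_RAM_BYTES and disk_bytes >= MIN_DISK_BYTES


if __name__ == "__main__":
    ram = total_ram_bytes()
    disk = shutil.disk_usage("/").total  # size of the root filesystem, not the whole drive
    print(f"RAM {ram / 10**9:.1f}GB, disk {disk / 10**9:.1f}GB,",
          "OK" if meets_node_minimums(ram, disk) else "below Node minimums")
```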

Two good background papers from open.eucalyptus.com are:

Installing UEC via the CD (see also the UEC main page) is fairly easy; I actually followed these notes (the first of three parts) before finding the official docs.

Installing the server took about 30 minutes, most of that was spent reading from the CD.  The questions were pretty easy.  Some notes:

  • For the hard disk setup I used a fresh 40GB disk and chose ‘Guided – Use entire disk’ (not the LVM option)
  • I chose no email configuration (I don’t know the SMTP local details here in the client’s office)
  • For apt-get I had to configure the proxy so it could see outside of the corporate firewall

To install the Node (1 client) I needed to dual-boot an existing Windows XP machine.  For this I had to use PartitionMagic to resize the 500GB Windows partition down to 100GB.  At first this didn’t work – we kept getting ‘error 983 while executing batch’ and the resize would abort.  The solution (as noted many times on the web) is to run ‘chkdsk /f’ at the command prompt; it reboots and runs the check (in our case it didn’t report any changes), after which PartitionMagic worked.

The candidate Node machine recognises that the Cluster Controller is running on another machine so it nominates itself as a Node.  Only a few questions are asked (e.g. the keyboard) and then everything is installed.  For the HD installation I chose ‘Use the largest contiguous free space’ having blanked 360GB via PartitionMagic earlier.

For reasons that aren’t clear, after installation the Node had trouble finding the network.  I had to run ‘sudo /etc/init.d/networking restart’ before it could ‘ping slashdot.org’.  It still won’t complete a full ‘sudo apt-get update’ (it completes just fine on the Cluster Controller) but I’ll assume that this isn’t a problem.
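`ping` uses ICMP, whereas apt-get needs TCP connections (possibly via the proxy), so a successful ping doesn’t prove that apt can get out. A minimal Python sketch of a TCP-level check is below; `can_connect` is a hypothetical helper, and behind a corporate firewall a direct connection may be refused even when proxied traffic works.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False
```

For example, `can_connect("archive.ubuntu.com", 80)` run on both machines would show whether the Node and the Cluster Controller see the outside world differently.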

Now that the network is up, if I run ‘sudo euca_conf --no-rsync --discover-nodes’ on my Cloud Controller it reports finding 1 Node.  I can accept the Node but after that I get some sort of authentication failure.  This might be due to the corporate network firewall.

If I jump a step forwards then I can run ‘sudo euca_conf --get-credentials mycreds.zip’, ‘unzip mycreds.zip’ and ‘./eucarc’, but then when I run ‘euca-describe-availability-zones verbose’ I get an XML parse error much like this bug.
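The `eucarc` file from the credentials zip is a shell script that is normally sourced (e.g. `. ./eucarc`) so that its `export` settings land in the current shell. When chasing errors like the one above it can help to see what those settings actually are. This hypothetical Python helper reads `eucarc` straight out of the zip, assuming it sits at the top level of the archive and uses simple `export NAME=value` lines; the example values are made up.

```python
import io
import zipfile


def read_eucarc(zip_source):
    """Return the `export NAME=value` settings from eucarc inside a credentials zip.

    Assumes eucarc sits at the top level of the archive; comments and any other
    shell lines are skipped.
    """
    with zipfile.ZipFile(zip_source) as zf:
        text = zf.read("eucarc").decode("utf-8")
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export ") or "=" not in line:
            continue
        name, _, value = line[len("export "):].partition("=")
        settings[name.strip()] = value.strip().strip("'\"")
    return settings


# Build a small in-memory zip with made-up values to show the output shape
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("eucarc", "# example\nexport EC2_URL=http://10.0.0.1:8773/services/Eucalyptus\n"
                          "export EC2_ACCESS_KEY='not-a-real-key'\n")
buf.seek(0)
print(read_eucarc(buf))
```

Against the real file this would be `read_eucarc("mycreds.zip")`; checking that the service URL points at the right host is a cheap first step before suspecting the firewall.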

There are enough network errors here to suggest that the corporate firewall isn’t playing ball (it won’t be the first time).  I’ll restart installation on my two test machines when we have a public internet connection established that avoids the corporate firewall.  I’ll post another entry when I run the second experiment (December, all going well).

Update: I followed the NodeInstallation notes to set the Cloud Controller’s eucalyptus user’s public key into the Node Controller’s eucalyptus user’s authorized_keys file.  That hasn’t fixed the above two errors.


The following books will help you move forwards, the Eucalyptus one will make the above configuration easier and the second on EC2 will help you see how Eucalyptus and EC2 compare.


1 Comment | Tags: Life

11 November 2009 - 12:12
£5 App Christmas Special – Weds 2nd December

John and I are very pleased to announce our upcoming music-themed £5 App Christmas Special on Wednesday 2nd December, 8-11pm at Hector’s House in collaboration with the lovely Playgroup guys.  Please do the usual – sign up on Upcoming so we know how much beer to brew for you all.  If you don’t know what this is then see last year’s Xmas Special write-up and details of all the previous events (with videos).

We want 40-60 of you along this year so please spread the word – Tweets and blog posts would be hugely appreciated!


  • Seb Lee-Delisle – “My life as a wannabe rock star at the birth of the internet music boom” – full description below
  • Toby Cole – “Zero to Theremin in 20 days” – how BuildBrighton built a feature-rich, ultrasonic, laser-etched MIDI controller in under three weeks
  • Tom Hume – “You’re all an orchestra, get over it” – Bluetooth devices will interact with the audience to create changing ambient music, created by Future Platforms for a Music Hack Day
  • Jim Purbrick – “A short talk on the Mrmr/LiveAPI guitar mounted iPhone ableton live interface by the head of Second Life Europe and later a demo with 100Robots”
  • lastminute.com labs – Bottle-Rock-It, a music game for n iPhones where (with any luck) n > 3 (Richard, Russ, Sam, Mathias)
  • 100Robots – Jim and Max Williams play live and loud for us

Seb has the main talk, his full blurb is:

“Before Seb Lee-Delisle was peddling his digital creations, he had an entirely different life. He spent most of his 20s setting up Solar Records and promoting his band Stargirl (later Laine). Investing over £50,000 of their own money, they released their own CDs, made it onto the radio and TV, played in front of 30,000 people, recorded at George Martin’s Air Studios and had full page spreads in the nationals.

They were at the forefront of the internet music boom of the late 90s. The future was looking rosy for this group of dynamic 20-somethings. So come and find out what it was like, how the hell they got the £50K, and why their plans didn’t quite reach fruition…”

Beer – several of us who are doing well this year will put up some bar-money (so far Alan of SensibleDevelopment, Paul Silver of Brighton Farm and my ProCasts, with several more to come; get in contact if you want to share the love).

Food – maybe nibbles.

Next, please sign up on Upcoming so we know how much beer to provide and tweet/post about the event to help us spread the word.  Cheers!


2 Comments | Tags: Life, projectbrightonblogs, sussexdigital, £5 App Meet