About


This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


21 September 2010 - 12:01
Scrapy, libxml + libxslt, Mac, “checking for libxml libraries >= 2.6.8… configure: error:”

In the hope that this’ll save someone else the bother… if you’re installing the Python web-scraping library Scrapy on your Mac (I’m on Leopard 10.5.8) and you come across an error like:

checking for libxml libraries >= 2.6.8... configure: error:
Version 2.6.7 found. You need at least libxml2 2.6.8 for this
version of libxslt

then here’s the solution.

Presumably you’ll be following the Scrapy install instructions. I used the supplied links for libxml2-2.7.3 and libxslt-1.1.24. libxml built and installed to /usr/local/lib just fine. libxslt wouldn’t ./configure – it kept reporting that it could only see the older libxml from /usr/lib, not the newer one in /usr/local/lib.

The fix is here, and this is my configure line:

 $ ./configure \
    --with-python=/Library/Frameworks/Python.framework/Versions/2.5/ \
    --prefix=/usr/local \
    --with-libxml-prefix=/usr/local \
    --with-libxml-include-prefix=/usr/local/include \
    --with-libxml-libs-prefix=/usr/local/lib

At this point libxslt configured, built and installed just fine. To make Python see it, I had to update my .bash_profile so that PYTHONPATH includes the default install directory:

export PYTHONPATH=$PYTHONPATH:/usr/local/lib/python2.5/site-packages

Side note: whatever you do, don’t mess with /usr/lib. I tried moving the default libxml and libxslt libraries and hit the same problem Kevin Watters describes: lots of system tools (including su!) depend on libxslt being in /usr/lib. I had to boot into Single User Mode to copy the files back before the system would work again.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: Python

17 September 2010 - 10:52
Demoing pyCUDA at the London Financial Python User Group

On Wednesday night I jumped on a train up to London to visit the London Financial Python User Group to give a short demo of pyCUDA. I’m using CUDA heavily for my physics consultancy and I figured the finance guys would be interested in 10-1000* speed-ups for their calculations.

The raw figures and the Mandelbrot demo that I gave are already covered in my earlier blog post: 22,937* faster Python math using pyCUDA.

To introduce pyCUDA I used P. Narayanan’s GPUs: For Graphics and Beyond PDF presentation (the first 13 pages); his explanation and diagrams are very clear.

To put CUDA in context against regular CPUs I used the recent Peak MHz graph and the main power/speed/transistor count graph in The Free Lunch is Over: A Fundamental Turn to Concurrency in Software. The main point here is that we’ve topped out at 2-3GHz CPUs and now we have to parallelise our code. On CPUs that gives us 4, 8 or 16 cores (and soon 24, then 32) to play with… but with CUDA, if the problem is mathematical, we have 480 cores to use!

If you’re interested in the general use of CUDA and GPUs then check out the excellent gpgpu.org.

You may wonder about real-world performance with CUDA. Without naming names I can say that I’m now delivering a 115* speed-up on a particularly gnarly problem (I mentioned during the talk that I’d reached 80* – I’ve managed to improve that in the last 2 days). On an earlier problem when I knew far less about CUDA I delivered a 100* speed-up for the same company.

It was grand to meet a lot of new faces at the group, along with a few people I’ve met before at PyCons (hi Ben! Giles!). Making contact with Didrik of Enthought was rather nice too. I hope to visit again.



3 Comments | Tags: Life, Python

7 September 2010 - 22:11
Selling ProCasts through Flippa.com

A couple of weeks ago I sold ProCasts.co.uk, the screencasting business I built over the last two years. Some of you know that I moved away from the business back at Christmas and left it idle (a rather silly thing to do). Here are some notes on how I sold it and how you could sell your own business. This was my first business sale and I learned some valuable lessons.

I listed the business on flippa.com a month ago; Flippa specialises in matching buyers and sellers of domain names and small businesses. Since ProCasts was, after 8 months of inactivity, essentially a lead-generating website with a client list, I figured a listing on Flippa would find some interested parties. I didn’t sell The Screencasting Handbook; I’m still happily developing the Handbook’s sales.

The new owners are Tintisha Technologies, a Leicester-based video production company who wanted to expand their screencasting brand. Rich of Tintisha discovered the ProCasts sale through Flippa by (happy!) accident, made a couple of bids at the end of the auction and came out on top. We completed the handover last week.

The reason for selling ProCasts was simple – I’d moved away from screencasting back at Christmas as I’d decided to return to my historic trade of artificial intelligence research and data science. I knew that a few of ProCasts’ competitors might be interested in the site and that a listing on flippa with money sent through escrow.com would make for a clean, safe sale.

I listed the site as an “Established lead generating screencasting site” with a two week auction. Flippa works differently to eBay: it uses an open auction (though private sales are possible) with a rolling end time (if a bid is placed within 4 hours of the end of the auction, the end time is pushed back by another 4 hours).
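A rolling end time is a simple anti-sniping rule, and it's easy to see how it behaves with a few lines of Python. This is a minimal sketch, not Flippa's actual implementation: it assumes a bid within 4 hours of the current end time adds 4 hours to that end time (rather than measuring from the bid itself) and that bids after the close are ignored. The function name `final_end_time` and the bid times are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: both the "late bid" window and the extension are 4 hours,
# and each extension is added to the current end time.
LATE_BID_WINDOW = timedelta(hours=4)
EXTENSION = timedelta(hours=4)


def final_end_time(scheduled_end, bid_times):
    """Return when the auction actually closes under a rolling end time.

    scheduled_end: the originally advertised closing time.
    bid_times: datetimes at which bids were placed (any order).
    """
    end = scheduled_end
    for bid in sorted(bid_times):
        if bid > end:
            break  # auction already closed; this and any later bids are too late
        if end - bid <= LATE_BID_WINDOW:
            end += EXTENSION  # a late bid pushes the close back
    return end


# Hypothetical auction closing at noon, with made-up bid times.
scheduled = datetime(2010, 1, 15, 12, 0)
bids = [
    datetime(2010, 1, 10, 9, 0),   # days before the end: no extension
    datetime(2010, 1, 15, 10, 0),  # 2 hours before 12:00: close moves to 16:00
    datetime(2010, 1, 15, 15, 0),  # 1 hour before 16:00: close moves to 20:00
]
print(final_end_time(scheduled, bids))  # prints 2010-01-15 20:00:00
```

Each late bid buys rivals another window to respond, which is why the final bids in an auction like this tend to cluster at the end.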

Take a look at the listing to see the details I included. I added:

  • Full business and site description
  • Details of past clients and warm leads
  • Bank statements to prove income
  • Verified Google Analytics traffic data
  • A Transfer Agreement listing all assets/processes for the sale

I made a point of responding to all questions (lots came via the private email channel) and updating the listing with new information. Fortuitously, a couple of older leads came back with requests for work during the auction, so these ‘very warm leads’ got a mention in the comments too.

At the end of the day the site sold for $4,002 (about £2,500). After the sale fee (£100) and escrow.com’s fees, I took away roughly £2,400. Not bad for a site that was otherwise of no value to me, but obviously not an ‘interesting exit’.

Here are some of the takehome lessons:

  • If you’re selling a business, a pure consultancy (with no consultants) isn’t super interesting to buyers, only to existing market players
  • Building a consultancy in a super-small niche (when I started I had 4 US competitors and 0 in the UK) means few buyers when you decide to exit (in fairness – I didn’t build the business to sell it, I know better for next time)
  • Design your business with an exit in mind – recurring or passive income has real value to a buyer, make sure you can be removed from the business without damaging it
  • A two week auction was fine but four weeks would have made more sense
  • I should have solicited private bids from competitors sooner rather than later
  • Adding a product or recurring income stream to the business would have added a lot of value (I decided to keep The Screencasting Handbook as an experimental platform)
  • BusinessesForSale is an alternative site that I didn’t know about when I started; its companies tend to have higher value (Flippa isn’t really for consultancy businesses, just simple web businesses)

What next?

Some of you know that I’ve been working in the field of artificial intelligence research for industry over the last 10 years (as a senior programmer, product designer and pure R&D bod) through my company, Mor Consulting. This role is evolving and I’m turning into a “Data Scientist” (the new shiny term for A.I. researchers!).

I’m also building some new IP by way of web services using A.I. technologies, these are designed with an exit in mind (I’m learning!). If you’re curious about using A.I. in industry see my new A.I.Cookbook.

I’m also continuing to develop The Screencasting Handbook; it is a useful experimental platform and I still very much enjoy teaching the art of screencasting.

If you have any questions, ask away.



No Comments | Tags: ArtificialIntelligence, Entrepreneur, Life, ProCasts, Screencasting, The Screencasting Handbook

5 September 2010 - 13:03
Saving power around the house with an EnviR

We took delivery of an EnviR from CurrentCost a week ago and we’ve been measuring power usage around the house since then. The unit itself is super easy to install: the LCD panel sits on a window sill and the measurement unit clips to the electric meter (it has a 30m range).

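Here’s what we measured (kW is kilowatts, W is watts). These are power readings, which are already a rate of energy use, so no ‘per hour’ is needed; the energy consumed is power multiplied by running time, measured in kWh: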

  • Kettle boiling – 2.9kW (for about 5 minutes)
  • Electric oven – 2kW at 220°C (used 30-60 minutes per day)
  • 800W Microwave running – 1kW
  • Washing machine – 300W
  • Electric clothes drier – 282W (used for 2 hours every few days)
  • Dehumidifier – 200W (used 1 hour each day)
  • Widescreen LCD TV – 90W-140W
  • Kitchen downlighters (5) – 112W
  • Fridge and freezer running – 94W (turns on and off throughout the day)
  • Media PC on and playing a video – 100W
  • Media PC on but idle – 40W
  • Macbook charger – 40W
  • Amplifier on media PC – 20W (idle most of the day)
  • Power saving lights – 9W-24W each
  • Microwave on standby – 4W
  • Dishwasher running – ??
  • Toaster running – ??
  • DECT phone, broadband router, standby power for media PC – 3W
  • The following readings are guesstimates – 1W seems to be the lowest reading the EnviR can make
  • Coffee grinder on standby – 1W
  • Toaster on standby – 1W
  • Widescreen LCD TV on standby – 1W
  • Electric oven and extractor fan on standby – 1W
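Because these readings are power rather than energy, what actually shows up on the bill is power multiplied by running time. A quick sketch using the readings and running times from the list above makes the point that the oven dominates even though the kettle draws more power. The helper name `energy_kwh` is just for illustration, and appliances with no stated running time (such as the cycling fridge-freezer) are left out.

```python
def energy_kwh(power_watts, hours):
    """Energy used in kWh: power in kW multiplied by running time in hours."""
    return power_watts / 1000 * hours


# Power readings and running times taken from the list above.
usage = {
    "kettle, one boil (2.9kW for ~5 minutes)": (2900, 5 / 60),
    "oven, short day (2kW for 30 minutes)": (2000, 0.5),
    "oven, long day (2kW for 60 minutes)": (2000, 1.0),
    "clothes drier, one run (282W for 2 hours)": (282, 2.0),
    "dehumidifier, one day (200W for 1 hour)": (200, 1.0),
}

for label, (watts, hours) in usage.items():
    print(f"{label}: {energy_kwh(watts, hours):.2f} kWh")
```

One boil of the kettle comes to about 0.24kWh, while a day's oven use is 1-2kWh.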

Has it changed our behaviour? We’ve started turning off the media PC when not in use (saving 40W overnight). We also turn off the microwave and coffee grinder (saving 5W for 23 hours a day). It is trivial, but the grinder gets warm and turning them on just to use them is easy.
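To put those savings in yearly terms, here is a rough sketch. It assumes "overnight" means 8 hours (an assumption; the post doesn't say) and that the standby and idle readings above stay constant all year. The `yearly_kwh` helper is hypothetical.

```python
def yearly_kwh(watts, hours_per_day, days=365):
    """Energy in kWh used over a year by a constant load running hours_per_day."""
    return watts * hours_per_day * days / 1000


# Microwave (4W) + coffee grinder (1W) standby, switched off 23 hours a day.
standby = yearly_kwh(5, 23)
# Media PC idling at 40W; ASSUMPTION: "overnight" is taken to mean 8 hours.
media_pc = yearly_kwh(40, 8)

print(f"Microwave + grinder standby: {standby:.1f} kWh/year")
print(f"Media PC overnight:          {media_pc:.1f} kWh/year")
```

Under those assumptions that is roughly 42kWh a year for the microwave and grinder and about 117kWh for the media PC.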

We’ve also stopped turning off the TV at night (it uses at most 1W on standby) and the toaster (again 1W at most). I had wondered whether the TV consumed a lot of standby power, and the toaster has a set of LEDs, but both use a trivial amount of power so I’ll ignore them for now.

I had no idea that the kitchen lights were so expensive; we won’t leave them on when not in use any more. I was really surprised by the oven: at 2kW for 30-60 minutes a day it swamps the usage of everything else! We really ought to use the outside line for some of the wet clothes too (but we don’t have back access to the tiny garden, so getting there is a bit of a faff…).

The Economist has a nice Watts Up article looking at how people underestimate the expense of some items (I certainly did!) and overestimate the savings they get from turning off things like lightbulbs.

Now the meter sits on a window ledge facing the sofa – we can monitor the house’s power usage and over time we’ll learn to play the game of keeping the numbers as low as seems reasonable. Feedback is a powerful thing!



2 Comments | Tags: Life