About


This is Ian Ozsvald's blog. I'm an entrepreneurial geek, an AI consultant, co-founder of the StrongSteam AI and data mining API, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Brightonian. Here's a little more about me.


31 January 2012 - 14:12 Data mining/AI/robots/hackerspace meet-up this Thursday

This Thursday at 7pm StrongSteam will run a friendly pub meet-up around:

  • Data mining
  • Artificial Intelligence (AI)
  • Robots
  • Hackerspaces

The goal is to bring together people from StartupChile and the local community who are interested in the above subjects. The meeting is just a pub meet-up; if there’s demand then I’ll organise speakers for the next one.

The location is Bar Lastarria, 70 Lastarria, Santiago (map).

Confirmed attendees include:

Here’s the official announcement.


Ian applies Artificial Intelligence as an Artificial Intelligence Researcher for companies (Mor Consulting), co-founded the StrongSteam A.I. datamining toolkit, co-authored SocialTies, programs Python, writes The Screencasting Handbook and is also a sea-side dweller and consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, Entrepreneur, Life, StartupChile

31 January 2012 - 14:02 StartupChile – we have our contracts, StrongSteam progress, PyCon

A few days back we signed our StartupChile contracts, so now we’re official. Apparently our ID cards are available but there’s no word on bank accounts yet. The admin rolls forward but it is a bit boring now. The feeling here is still very positive: we’ve gained some Return Value Agenda (RVA) points by meeting with the local university, and StrongSteam runs its first event this week (next post).

In StrongSteam we’ve made progress – we’re now working with Kasabi on an optical character recognition project on Latin plant labels. They have large plant data sets which we’ll marry up with a user’s experience whilst walking around places like Kew Gardens. We’re being interviewed by the BBC on this shortly.

Behind the scenes I’ve extended the python-tesseract wrapper with a nicer access class, which I’ll post to GitHub shortly. It makes it really easy to get characters and their co-ordinates from scenes. Image processing tools will be available via StrongSteam to make the task easier.

I’ve also bought my tickets for PyCon in March, where I’ll run my High Performance Computing class. I had no idea it’d take longer to fly from Santiago to Santa Clara than Heathrow to Santiago! It is 20 hours north vs 18 hours west.



No Comments | Tags: ArtificialIntelligence, StartupChile

14 January 2012 - 21:27 Santiago – first few days

I’d better log our first few days before the craziness of signing up to the programme kicks off on Monday. Emily (my fiancée) is also blogging for her TinyEars StartupChile project.

We arrived safely on Wednesday after 18 hours of travel – BA treated us well (reasonably comfy seats and reasonable food). We were hustled into a taxi at the airport (at a rather pricey £45) but got delivered quickly to our rather nice apartments in swanky Providencia.

We’ve had three nights of parties now, first with Jon and Anna (so lovely to catch up!), then lunch with Emily’s madrina Johanna (@J_Angulo) and on to meet our padrino Fernando (@fdelsolar), and finally two Phase 1 leaving dos last night. Pisco and rum seem to flow from all bottles. We seem to have found a nice Pale Ale too and London Pride has been sighted in bottles. We got to meet Fernando of SQMOS, the data guys of Junar and Tom of Rentalita (Tom’s Santiago tumblr) along with a whole bunch of others, some of whom are shortly off to travel South America.

Yesterday we climbed San Cristóbal (photo) and met a llama (pronounced ‘yama’). Today we had a nice run along the river at Tobalaba and Kyran has pointed out some other running sites.

Tonight we have another dinner, Sunday we chill (a touch, and prepare a demo), then Mon-Thurs are sign-up days, government ID card days, bank days and demo days all rolled into one lump. The week after we ‘officially’ start on our projects (even if we have launched StrongSteam to our first users already!).

Wifi tip – in the business district there are lots of Starbucks, and these have free wifi when you buy coffee.



No Comments | Tags: ArtificialIntelligence, Entrepreneur, StartupChile

9 January 2012 - 15:13 Heading to StartupChile

This is a quick update – we’re flying tomorrow to Santiago for 6 months of the StartupChile project ($40k funding, no equity, hundreds of projects flying in from all over the world). If you’re interested in taking 6 months to build your own project I’d suggest you take a look at applying to the next round.

Kyran Dale and I are flying out to build our StrongSteam AI and data mining toolkit (it’s a cloud API with local language bindings). We have our first client and we launched the alpha API to our first testers a couple of days back. Once we’re in Santiago we’ll add some more testers, expand the API and deliver our first project, then after March we can really ramp up the creation of data mining APIs for people to play with. We’re excited to be in talks with a few people about releasing the alpha at a couple of hackday events; it’ll be really interesting to see what people do with our optical character recognition, image matching, face detection and image manipulation tools. If you’re interested in trying out the Python API then do sign up to the mailing list on the homepage.

Emily (my fiancée) is also heading out with her TinyEars iPad app; she’ll build a child-friendly app that’ll help kids learn to read out loud by using speech recognition to spot errors in their speech. She’ll be looking for testers with iPad 2s and young kids who are learning to read, so do get in touch if you’re interested in the testing.

We had a fab sendoff at the Northern Lights a few days back, cheers to all who came along :-)

Finally – I’m a bit honoured to have been selected as a teacher at PyCon in the US in March, I’m running a half-day tutorial on High Performance Computing based on my tutorial at EuroPython. We’re using a bunch of these ideas in StrongSteam, it’ll be great to run the tutorial again.



No Comments | Tags: ArtificialIntelligence, Entrepreneur, StartupChile

26 November 2011 - 20:48 Five new Brighton businesses

Earlier in the year, through Matt Weston, a group of us met (funded by the Innovation and Growth Team) to start a peer-group for a set of four (wait for it…) new businesses. The group was successful – and for several of us it led to the realisation that our plans at the time weren’t right. Emily and I were working on SocialTies as our project and trying to find a business hidden in the app; we decided against it and looked to other ideas.

Here’s what we’re working on. I hope it’ll encourage a few other folk to think about building new businesses.

The IGT funding dried up and so we now meet informally, our projects are:

I mentioned that I’d do a little write-up before we leave the country, Chris sent me this blurb about MightyHumble:

mighty humble is a small organic clothing company that believes in creativity, good design and responsible business. We collaborate with hand-picked creative talent to produce unique products using the most ethical and environmentally sound materials, manufacturing and suppliers we can find. Our 100% cotton t-shirts are ethically made, certified organic by the Soil Association and manufactured solely using sustainable energy generated from wind power. We envisage our collection as wearable art which enables us to bring the work of some incredibly talented people to a wider audience. For mighty humble business is not just about turning a profit. Experience has taught us that there’s more to it than that! We believe a business can (and should) be a creative, fun and positive force.

Jo describes Bookish as:

the home of unique literary gifts, typographic loveliness and beautiful bookish things – for readers, writers, dreamers, thinkers and bibliophiles everywhere

Jackie says:

Sales Precruitment is all about helping MDs of growing digital and technology companies prepare for recruiting their first (and additional) sales person.  Setting realistic targets, putting measurements in place, interviewing and induction, these are just a few of the things we can help with.  All this is done face to face at present but 2012 is the year I work out how to offer some of this support online… wish me luck!!

From January Emily, Kyran and I are off to Chile for the StartupChile project, and we’re taking TinyEars and StrongSteam as our 6 month projects. A part of our requirement for StartupChile is that we help build the entrepreneurial community – given our work building OpenCoffeeSussex, SheSays, FivePoundApp and GirlGeekDinners we figure we’re well placed to help bring interesting folk together. The opportunity to network with several hundred other folk who have jumped country to found new businesses is simply too good to pass up (along with living in a growing, upbeat country with a strong economy, a new language to learn and some Tango to practice).

For StrongSteam we’re after alpha testers – we want non-AI developers (particularly web and mobile devs) who want access to image recognition, OCR, data mining and clustering tools. Emily is after collaborators and testers – particularly people with kids and iPad 2s.



1 Comment | Tags: ArtificialIntelligence, Entrepreneur, Life

26 October 2011 - 11:29 StrongSteam alpha, HackerNewsLondon, Startup-Chile

I’m a little behind with the blogging so here’s the short version. StrongSteam has been under constant dev for 2 months and we’re close to putting up the first AI tools behind a few Python demos (hopefully it’ll be up next week). I’m talking on this at HackerNewsLondon tomorrow night.

We haven’t (quite) finished the demos so it’ll be a slideshow. I’m thinking of running a workshop in a month or so to show what’s possible, talk through the limitations and possibilities and help people get comfy with the API.

I’m also very pleased to say that we were accepted into the StartupChile programme alongside RadicalRobot (my better half). In StrongSteam Kyran and I will get 6 months in Santiago with a $40k budget (for no equity!) to build our API and this opens the door to further travel. We’re also very happy to welcome Balthazar Rouberol (linkedin) to our team, he’ll be joining us remotely as an intern for 6 months.

Our biggest priority now is to get the alpha out there. If you’re curious to see what we’re doing please follow us via @strongsteamapi and join the mailing list on the strongsteam homepage.

We also have two surveys – the first is so you can tell us about your general AI interest, the second focuses on some of the points raised in the first to tell us more about your needs. We’d really appreciate your input here if you have 10 minutes to spare.



No Comments | Tags: ArtificialIntelligence, Programming, Python

31 August 2011 - 13:24 strongsteam – an “AppStore for A.I. and data mining tools”

Kyran and I are starting work on a new project – strongsteam offers a web API with artificial intelligence and data mining tools. The goal is to make it easy for you to do things like:

  • get the text out of images using optical character recognition
  • determine whether two images look the same and if one object (e.g. a certain book or a can of coke) can be found in another
  • use natural language processing to analyse, cluster and compare text
  • extract text from audio (e.g. to pull out keywords from podcasts)
  • use machine learning on text to derive new data

If you’d like to join the closed alpha then visit strongsteam and add your email to the announce list on the homepage.

We’ve started with Python bindings which make it easy to talk to the strongsteam web service. Initially we’ll wrap open source tools that we’ve used along with lots of our own A.I. data mining tools from years of work in my Mor Consulting A.I. consultancy.

At EuroSciPy last week I demo’d using O.C.R. to extract the words from plant labels at Wakehurst Place gardens so you can look up the plant on Wikipedia once you’ve taken a photo like this one:

Plant label for Ostrich Plume Fern at Wakehurst Place (Sussex)

Now we’re looking at applying O.C.R. to conference name-badges; this will be a bit of a mash-up of data used in our SocialTies conference app and Lanyrd.com’s data. Next we’ll look at image matching and some text processing tools.



No Comments | Tags: ArtificialIntelligence, Entrepreneur, Python

25 July 2011 - 9:42 High Performance Python tutorial v0.2 (from EuroPython 2011)

My updated High Performance Python tutorial is now available as a 55 page PDF. The goal is to take you on several journeys which show you different ways of making Python code run much faster (up to 75x faster on the CPU, and faster still with a GPU).

This is an update to the 49 page v0.1 I published three weeks ago after running the tutorial at EuroPython 2011 in Florence.

Topics covered:

  • Python profiling (cProfile, RunSnakeRun, line_profiler) – find bottlenecks
  • PyPy – Python’s new Just In Time compiler, a note on the new numpy module
  • Cython – annotate your code and compile to C
  • numpy integration with Cython – fast numerical Python library wrapped by Cython
  • ShedSkin – automatic code annotation and conversion to C
  • numpy vectors – fast vector operations using numpy arrays
  • NumExpr on numpy vectors – automatic numpy compilation to multiple CPUs and vector units
  • multiprocessing – built-in module to use multiple CPUs
  • ParallelPython – run tasks on multiple computers
  • pyCUDA – run tasks on your Graphics Processing Unit
  • Other algorithmic choices and options you have
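
As a minimal sketch of the first topic in the list above (profiling with the standard library's cProfile), the snippet below profiles a small pure-Python function and returns the report as text. The workload `sum_of_squares` and the helper `profile_call` are hypothetical stand-ins for illustration; they are not taken from the tutorial.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Hypothetical CPU-bound workload standing in for real code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(sum_of_squares, 100000)
    print(value)
    print(report)
```

The report shows where the time goes, which is the usual starting point before trying any of the speed-up approaches listed above.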

The improvement over the last version (v0.1) is that I’ve filled in all the sections now including pyCUDA (there are still a few IAN_TODOs marked, I hope to finish these in a future v0.3). I’ve also added a short section on Algorithmic Choices, linked to the new Cython prange operator and shown the new numpy module in PyPy.

The source code is on my github page. The original slides are on slideshare too. If you’re after a challenge then at the end of the report I suggest some ported versions of the code that I’d like to see.

The report is licensed Creative Commons by Attribution (please link back here) – I’ll also happily accept a beer if you meet me in person! If you’re curious about this sort of work then note that I offer A.I. and high performance computing consulting and training via my Mor Consulting.

Update – ShedSkin 0.9 adds faster complex number support. I haven’t added it to the report yet, evidence in the ShedSkin Group suggests it gets closer to the non-complex-number version (i.e. you don’t have to do more work but you get a nice speed boost whilst still using complex numbers).

Update (Nov 2011) – Antonio and Armin posted a note which explains some of the slowness in PyPy and shows how it is competitive, under the right conditions. Armin also contributed a C version which shows PyPy to run as fast as C (for their chosen configuration).



10 Comments | Tags: ArtificialIntelligence, Python

14 March 2011 - 17:00 “A.I. in the real world” 2011 lecture at Sussex Uni

I’ve just given my yearly lecture at Sussex Uni talking about Artificial Intelligence in the real world. It details some of my exploits over the last 10 years. The presentation is below and I’ve also linked to some of the videos.

For my real world examples I covered cars and a few other projects. Audi have their automated racing car which climbs Pikes Peak in 27 minutes (a human racer does it in 17 minutes). I haven’t managed to find a video of the race, but there is this video taken earlier in the year.

More interestingly Google have an automated car that drives the streets (has video) – this is a real automated vehicle really driving on the real roads. This is darned impressive. Sooner or later we’ll have low-accident-rate automatic cars which are far cheaper to drive than human-power-cars, this’ll have big economic implications (but could be years off yet).

IBM’s Watson played Jeopardy recently (has video), it is pretty scary watching the machine beat humans at tricky general knowledge questions.

I also mentioned Word Lens (a real time OCR-based translation system using a phone camera) and I demo’d Google’s Voice Translator with some French to English on my Android-based Galaxy S.

Finally I linked to my own project – Social Ties does data mining and natural language processing to give you a mobile ‘people radar’ – it helps you find interesting people at events and places. We’re in alpha at present, sign up on the site if you’d like to join the beta.

At the end of the talk I spoke about some local events that will help people move forwards with A.I. company ideas, these are:



No Comments | Tags: ArtificialIntelligence, SussexUniversity

30 January 2011 - 19:49 Review for Python Text Processing with NLTK 2.0 Cookbook (Packt, 2010)

Python Text Processing with NLTK 2.0 Cookbook (Amazon US, UK) is a cookbook for Python’s Natural Language Toolkit. I’d suggest that this book is seen as a companion for O’Reilly’s Natural Language Processing with Python (available for free at nltk.org). The older O’Reilly book gives a lot of explanation for how to use NLTK’s components, while Packt’s new book shows you lots of little recipes which build to larger projects, giving you a great hands-on toolkit.

Overall the book is easy to read, has a huge set of sample recipes and feels very useful. I’ll be referring to it for our upcoming @socialties mobile app.

You’ll need to download NLTK. You can also refer to some sample articles at Packt’s site and get Chapter 3 as a free PDF (see below). The author is Jacob Perkins; his blog links to many related articles and he also has a nice ‘how it started’ article.

Here are my thoughts on the book. Disclosure – I was sent a free copy of the book by Packt for review, the thoughts below are entirely my own.

Chapter 1: Tokenizing Text and WordNet Basics

If you haven’t tried tokenising text before you may not realise how complicated it can be (expressing even basic rules for English is jolly hard!). This chapter has a good overview of tokenisation and the excellent WordNet library. Filtering stopwords (low value words like ‘the’, ‘of’) and synset approaches (synonym groups in WordNet) are also covered. The word similarity measure was new to me; the book certainly throws up nice nuggets.
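
To give a flavour of tokenising and stopword filtering, here is a minimal sketch using only Python's standard library. It is not NLTK's tokenizer and not a recipe from the book: the regular expression, the tiny `STOPWORDS` set and the function names are all assumptions made for illustration.

```python
import re

# A tiny, hand-picked stopword list for illustration only; NLTK ships a
# much fuller list in its stopwords corpus.
STOPWORDS = {"the", "of", "a", "an", "and", "is", "to", "in"}


def tokenize(text):
    """Split text into lowercase word tokens, keeping inner apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop low-value words such as 'the' and 'of'."""
    return [token for token in tokens if token not in stopwords]


if __name__ == "__main__":
    tokens = tokenize("The cat can't find the end of the road.")
    print(tokens)
    print(remove_stopwords(tokens))
```

Even this toy version has to make a decision about apostrophes, which hints at why real tokenisers need many more rules.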

Chapter 2: Replacing and Correcting Words

Stemming approaches are covered, the goal is to find common root words (e.g. “running”, “runs” and “run” can each have “run” as their stem) to simplify your input text. Synonym replacement (e.g. converting “bday” to “birthday”) and negating words using antonyms are nicely treated. Babelfish is provided through NLTK for translation and the PyEnchant spellchecker is introduced.
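
The synonym replacement idea can be sketched with a plain dictionary lookup. This is an assumption-laden toy rather than the book's recipe: the `REPLACEMENTS` table and the `replace_words` function are hypothetical names, and a real table would be far larger and probably loaded from a file.

```python
# Hypothetical replacement table mapping abbreviations to full words.
REPLACEMENTS = {"bday": "birthday", "thx": "thanks"}


def replace_words(tokens, table=REPLACEMENTS):
    """Return tokens with any known abbreviation swapped for its full form."""
    return [table.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(replace_words(["happy", "bday", "and", "thx"]))
```

Tokens that are not in the table pass through unchanged, so the function is safe to run over any tokenised text.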

Chapter 3: Creating Custom Corpora (sample PDF chapter)

This chapter discusses MongoDB (a NoSQL document store) as a way to store your own corpora in NLTK’s format, it also introduces part of speech tagging. File locking using lockfile is mentioned in case you’re using multiple processes (discussed later).

Chapter 4: Part-of-Speech Tagging and Chapter 5: Extracting Chunks

I was less interested in this part, though I’ve had to extract Named Entities before and there’s a nice discussion in Chapter 5.

Chapter 6: Transforming Chunks and Trees

The section on filtering out insignificant words using part of speech tags was interesting (i.e. using the Determiner tag DT to filter words like “a”, “all”, “an”, “that”). Cardinals (numbers) are discussed, and I liked the recipe for swapping noun cardinal phrases so e.g. “Dec 10” becomes “10 Dec” (whilst “10 Dec” doesn’t change).
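
Both ideas can be sketched on a list of (word, tag) pairs. The sketch below assumes the text has already been tagged with Penn Treebank style tags (DT for determiners, CD for cardinals, NN* for nouns) and the input is hand-tagged; the function names are hypothetical and this is not the book's implementation, which works on chunk trees.

```python
def filter_determiners(tagged):
    """Drop tokens tagged DT (determiners such as 'a', 'an', 'that')."""
    return [(word, tag) for word, tag in tagged if tag != "DT"]


def swap_noun_cardinal(tagged):
    """Move a cardinal (CD) in front of the noun that directly precedes it."""
    result = list(tagged)
    i = 1
    while i < len(result):
        prev_tag, tag = result[i - 1][1], result[i][1]
        if tag == "CD" and prev_tag.startswith("NN"):
            result[i - 1], result[i] = result[i], result[i - 1]
            i += 2  # jump ahead so the moved noun is not swapped again
        else:
            i += 1
    return result


if __name__ == "__main__":
    print(filter_determiners([("the", "DT"), ("quick", "JJ"), ("fox", "NN")]))
    print(swap_noun_cardinal([("Dec", "NNP"), ("10", "CD")]))
    print(swap_noun_cardinal([("10", "CD"), ("Dec", "NNP")]))
```

A phrase that already has the cardinal first is left alone because the swap only fires when a noun is immediately followed by a cardinal.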

Chapter 7: Text Classification

This feels like it will be useful – bag of words classification and the Naive Bayes Classifier are discussed (along with some other classifiers). Here the author starts to build a movie rating classifier. Precision and Recall are explained nicely. A high-information classifier is built, this is useful as we can then remove low-information words (those that aren’t biased to a single class in the classifier) which can improve classification results. Combining classifiers to further improve results is also covered.
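
For readers new to precision and recall, here is a small worked sketch. It computes both measures for a single label from two parallel lists of labels; the `precision_recall` function and the example labels are made up for illustration and do not come from the book.

```python
def precision_recall(reference, predicted, label):
    """Compute (precision, recall) for one label from two parallel lists."""
    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for ref, pred in pairs if ref == label and pred == label)
    false_pos = sum(1 for ref, pred in pairs if ref != label and pred == label)
    false_neg = sum(1 for ref, pred in pairs if ref == label and pred != label)
    # Precision: of everything predicted as label, how much was right?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of everything that really is label, how much did we find?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    reference = ["pos", "pos", "pos", "pos", "neg"]
    predicted = ["pos", "pos", "neg", "neg", "neg"]
    print(precision_recall(reference, predicted, "pos"))
```

In this example every "pos" prediction is correct (precision 1.0) but only half of the true "pos" reviews are found (recall 0.5), which shows why the two numbers need to be read together.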

Chapter 8: Distributed Processing and Handling Large Datasets

This chapter has promise – I wasn’t aware of the share-nothing distributed execution engine execnet. Redis is also used, Jacob builds towards a distributed word scoring engine which uses Redis as a single storage system. I’ve yet to use Redis but really want to hook it into our future @socialtiesapp, distributed processing will definitely be on the agenda too.

Chapter 9: Parsing Specific Data

This is a little gem, tucked at the end of the book. Ages ago I’d come across a date parsing module (which I then forgot about), having needed it recently I was super-happy to see dateutil discussed. It makes the parsing of different date formats incredibly easy and also handles timezones.

The timex module in NLTK is introduced (I’d never heard of it before) – it takes a fuzzy reference to a date or time and marks it up. An example would be “let’s go sometime <TIMEX2>this week</TIMEX2>”, you can then extract the fuzzy reference and decide how to interpret it in your application.
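
The markup idea can be imitated with a regular expression. The sketch below is not NLTK's timex module and recognises only a handful of phrases; the `FUZZY_TIME` pattern and both function names are assumptions for illustration.

```python
import re

# A deliberately tiny pattern; a real tagger recognises far more phrasings.
FUZZY_TIME = re.compile(
    r"\b(?:this|next|last)\s+(?:week|month|year)\b"
    r"|\b(?:today|tomorrow|yesterday)\b",
    re.IGNORECASE,
)


def tag_times(text):
    """Wrap each fuzzy time reference in <TIMEX2> tags."""
    return FUZZY_TIME.sub(lambda m: "<TIMEX2>" + m.group(0) + "</TIMEX2>", text)


def extract_times(tagged_text):
    """Pull the tagged references back out for later interpretation."""
    return re.findall(r"<TIMEX2>(.*?)</TIMEX2>", tagged_text)


if __name__ == "__main__":
    tagged = tag_times("let's go sometime this week")
    print(tagged)
    print(extract_times(tagged))
```

Keeping the tagging and the extraction separate mirrors the workflow described above: mark up first, then decide in your application what "this week" should mean.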

lxml, Beautiful Soup and chardet (another gem) are used to write a web page scraper.

Overall I recommend this book, if you have the original O’Reilly book (and you really ought to) then this makes for a great companion. I also spotted these two other reviews.



5 Comments | Tags: ArtificialIntelligence, Python