Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

“A.I. in the real world” 2011 lecture at Sussex Uni

I’ve just given my yearly lecture at Sussex Uni on Artificial Intelligence in the real world. It details some of my exploits over the last 10 years. The presentation is below and I’ve also linked to some of the videos.

For my real-world examples I covered cars and a few other projects. Audi have their own automated racing car which climbs Pikes Peak in 27 minutes (a human racer does it in 17 minutes). I haven’t managed to find a video of the race, but there is this video taken earlier in the year.

More interestingly, Google have an automated car that drives the streets (has video); this is a genuinely automated vehicle driving on real roads. This is darned impressive. Sooner or later we’ll have low-accident-rate automatic cars which are far cheaper to run than human-driven cars, and this will have big economic implications (though it could be years off yet).

IBM’s Watson played Jeopardy recently (has video); it is pretty scary watching the machine beat humans at tricky general-knowledge questions.

I also mentioned Word Lens (a real time OCR-based translation system using a phone camera) and I demo’d Google’s Voice Translator with some French to English on my Android-based Galaxy S.

Finally I linked to my own project, Social Ties, which uses data mining and natural language processing to give you a mobile ‘people radar’ that helps you find interesting people at events and places. We’re in alpha at present; sign up on the site if you’d like to join the beta.

At the end of the talk I spoke about some local events that will help people move forwards with A.I. company ideas. These are:


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

Review for Python Text Processing with NLTK 2.0 Cookbook (Packt, 2010)

Python Text Processing with NLTK 2.0 Cookbook (Amazon US, UK) is a cookbook for Python’s Natural Language Toolkit (NLTK). I’d suggest that this book is seen as a companion to O’Reilly’s Natural Language Processing with Python (available for free at nltk.org). The older O’Reilly book gives a lot of explanation of how to use NLTK’s components, whereas Packt’s new book shows you lots of little recipes which build into larger projects, giving you a great hands-on toolkit.

Overall the book is easy to read, has a huge set of sample recipes and feels very useful. I’ll be referring to it for our upcoming @socialties mobile app.

You’ll need to download NLTK; you can also read some sample articles at Packt’s site and get Chapter 3 as a free PDF (see below). The author is Jacob Perkins; his blog links to many related articles and he also has a nice ‘how it started’ article.

Here are my thoughts on the book. Disclosure – I was sent a free copy of the book by Packt for review, the thoughts below are entirely my own.

Chapter 1: Tokenizing Text and WordNet Basics

If you haven’t tried tokenising text before you may not realise how complicated it can be (expressing even basic rules for English is jolly hard!). This chapter has a good overview of tokenisation and the excellent WordNet lexical database. Filtering stopwords (low-value words like ‘the’ and ‘of’) and synset approaches (synsets are synonym groups in WordNet) are also covered. The word similarity measure was new to me; the book certainly throws up nice nuggets.
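To make the tokenise-then-filter idea concrete, here is a minimal pure-Python sketch. It assumes a crude regex tokenizer and a tiny hand-picked stopword set, both hypothetical stand-ins; with NLTK installed you would normally reach for `nltk.word_tokenize` and the `nltk.corpus.stopwords` word list instead (both need their data packages downloaded first).

```python
import re

# Hypothetical, deliberately tiny stopword set; NLTK's English list is much longer.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is"}

def tokenize(text):
    """Crude tokenizer: lowercase, then keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop low-value words, preserving the order of the remaining tokens."""
    return [t for t in tokens if t not in stopwords]

tokens = tokenize("The Cookbook covers the basics of tokenisation.")
print(tokens)
print(remove_stopwords(tokens))
```

A regex this simple splits "U.S." into two tokens, which is a small illustration of why the chapter’s dedicated tokenizers are worth having.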

Chapter 2: Replacing and Correcting Words

Stemming approaches are covered; the goal is to reduce words to a common root (e.g. “running”, “runs” and “run” can each have “run” as their stem) to simplify your input text. Synonym replacement (e.g. converting “bday” to “birthday”) and negating words using antonyms are nicely treated. Babelfish translation is provided through NLTK and the PyEnchant spellchecker is introduced.
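Synonym replacement and antonym-based negation both boil down to table lookups over a token list. The sketch below assumes hand-written dictionaries (the names `SYNONYMS` and `ANTONYMS` are hypothetical) rather than looking antonyms up in WordNet; for stemming itself NLTK ships ready-made stemmers such as `nltk.stem.PorterStemmer`, so none is reimplemented here.

```python
def replace_synonyms(tokens, synonyms):
    """Swap each token for its preferred form if it appears in the mapping."""
    return [synonyms.get(token, token) for token in tokens]

def replace_negations(tokens, antonyms):
    """Collapse 'not X' into X's antonym when one is known; leave everything else alone."""
    result = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "not" and i + 1 < len(tokens) and tokens[i + 1] in antonyms:
            result.append(antonyms[tokens[i + 1]])
            i += 2  # skip both 'not' and the word it negated
        else:
            result.append(tokens[i])
            i += 1
    return result

# Hypothetical hand-written mappings for illustration only.
SYNONYMS = {"bday": "birthday"}
ANTONYMS = {"uglify": "beautify"}

print(replace_synonyms(["happy", "bday", "to", "you"], SYNONYMS))
print(replace_negations(["let's", "not", "uglify", "our", "code"], ANTONYMS))
```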

Chapter 3: Creating Custom Corpora (sample PDF chapter)

This chapter discusses MongoDB (a NoSQL document store) as a way to store your own corpora in NLTK’s format and also introduces part-of-speech tagging. File locking with lockfile is mentioned in case you’re using multiple processes (these are discussed later).
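The lockfile package itself isn’t reproduced here, but the underlying idea can be sketched with the standard library alone: creating a file with `O_CREAT | O_EXCL` succeeds for exactly one caller, so the file’s existence acts as the lock. This is a swapped-in stdlib technique, not lockfile’s API; the helper names are hypothetical and the sketch assumes a local filesystem and does nothing about stale locks left by a crashed process.

```python
import os
import tempfile

def acquire_lock(lock_path):
    """Atomically create lock_path; return True if this process now holds the lock."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one caller wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record the owner to help debug stale locks
    return True

def release_lock(lock_path):
    """Give the lock up by deleting the lock file."""
    os.remove(lock_path)

lock_path = os.path.join(tempfile.mkdtemp(), "corpus.lock")
if acquire_lock(lock_path):
    try:
        pass  # append to the shared corpus file here
    finally:
        release_lock(lock_path)
```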

Chapter 4: Part-of-Speech Tagging and Chapter 5: Extracting Chunks

I was less interested in this part, though I’ve had to extract named entities before and there’s a nice discussion of that in Chapter 5.

Chapter 6: Transforming Chunks and Trees

The section on filtering out insignificant words using part-of-speech tags was interesting (e.g. using the determiner tag DT to filter words like “a”, “all”, “an” and “that”). Cardinals (numbers) are discussed; I liked the recipe for swapping noun-cardinal phrases so that e.g. “Dec 10” becomes “10 Dec” (whilst “10 Dec” doesn’t change).
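Both recipes operate on already-tagged `(word, tag)` pairs, so they can be sketched without a tagger. This minimal version assumes Penn Treebank tags (DT for determiners, CD for cardinals, NN* for nouns) and, unlike a general chunk transform, only handles two-token phrases; the function names are illustrative rather than the book’s code.

```python
def filter_insignificant(tagged, skip_tags=("DT",)):
    """Drop (word, tag) pairs whose part-of-speech tag is in skip_tags."""
    return [(word, tag) for word, tag in tagged if tag not in skip_tags]

def swap_noun_cardinal(tagged):
    """Turn a two-token (noun, cardinal) phrase like 'Dec 10' into '10 Dec'.

    Phrases that already start with the cardinal are returned unchanged.
    """
    if len(tagged) == 2 and tagged[0][1].startswith("NN") and tagged[1][1] == "CD":
        return [tagged[1], tagged[0]]
    return list(tagged)

print(filter_insignificant([("the", "DT"), ("terrible", "JJ"), ("movie", "NN")]))
print(swap_noun_cardinal([("Dec", "NNP"), ("10", "CD")]))
print(swap_noun_cardinal([("10", "CD"), ("Dec", "NNP")]))
```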

Chapter 7: Text Classification

This feels like it will be useful: bag-of-words classification and the Naive Bayes classifier are discussed (along with some other classifiers). Here the author starts to build a movie rating classifier. Precision and recall are explained nicely. A high-information classifier is built; this is useful because we can then remove low-information words (those that aren’t biased towards a single class), which can improve classification results. Combining classifiers to further improve results is also covered.
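Two of the building blocks mentioned above are small enough to sketch directly. NLTK’s `NaiveBayesClassifier.train` expects a list of `(featureset, label)` pairs, and a bag-of-words featureset is just a dict marking each word as present. The precision/recall helper below is a hand-rolled illustration (not NLTK’s `nltk.metrics` module), and the labels and predictions are made-up toy data.

```python
def bag_of_words(words):
    """Mark every word as present; word order and counts are thrown away."""
    return {word: True for word in words}

def precision_recall(actual, predicted, label):
    """Precision and recall for one label, from parallel lists of true and predicted labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == label and p == label)
    predicted_pos = sum(1 for _, p in pairs if p == label)
    actual_pos = sum(1 for a, _ in pairs if a == label)
    # precision: of everything we called `label`, how much really was
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    # recall: of everything that really was `label`, how much we found
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Toy data for illustration only.
actual = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos"]
print(bag_of_words(["great", "plot", "great"]))
print(precision_recall(actual, predicted, "pos"))
print(precision_recall(actual, predicted, "neg"))
```

Here the classifier never wrongly says "pos" (precision 1.0) but misses one genuine "pos" (recall 2/3), which is the kind of trade-off the chapter’s explanation helps you reason about.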

Chapter 8: Distributed Processing and Handling Large Datasets

This chapter has promise: I wasn’t aware of execnet, a share-nothing distributed execution engine. Redis is also used; Jacob builds towards a distributed word-scoring engine which uses Redis as a single storage system. I’ve yet to use Redis but really want to hook it into our future @socialtiesapp, and distributed processing will definitely be on the agenda too.
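The shape of a share-nothing word-scoring job is map (each worker counts its own chunk) then reduce (merge the partial counts). The sketch below swaps in the standard library for both execnet and Redis: a `ThreadPoolExecutor` stands in for remote workers and a merged `Counter` stands in for the shared Redis store. That is an assumption made for a self-contained example; for CPU-bound work in CPython you would use processes or separate machines, which is what execnet is for.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Map step: each worker counts words in its own chunk and shares no state."""
    return Counter(chunk.lower().split())

def distributed_word_count(chunks, max_workers=4):
    """Reduce step: merge the per-chunk counts into a single total."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total

print(distributed_word_count(["the cat sat", "the dog sat", "a cat"]))
```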

Chapter 9: Parsing Specific Data

This is a little gem, tucked at the end of the book. Ages ago I’d come across a date parsing module (which I then forgot about); having needed it recently I was super-happy to see dateutil discussed. It makes the parsing of different date formats incredibly easy and also handles timezones.
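To see why dateutil is so handy, compare it with what the standard library alone gives you: `datetime.strptime` needs an explicit format for every layout you expect. This stdlib sketch (the `FORMATS` list is a hypothetical choice) tries each format in turn; dateutil’s `parser.parse` does this kind of guessing for you across far more layouts, without a format list.

```python
from datetime import datetime

# Hypothetical list of formats this sketch understands; dateutil needs no such list.
FORMATS = [
    "%Y-%m-%d",             # 2010-12-10
    "%d %b %Y",             # 10 Dec 2010
    "%b %d %Y",             # Dec 10 2010
    "%Y-%m-%dT%H:%M:%S%z",  # 2010-12-10T14:30:00+0100 (timezone-aware)
]

def parse_date(text):
    """Try each known format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong layout, try the next one
    raise ValueError(f"unrecognised date format: {text!r}")

print(parse_date("Dec 10 2010"))
print(parse_date("2010-12-10T14:30:00+0100"))
```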

The timex module in NLTK is introduced (I’d never heard of it before): it takes a fuzzy reference to a date or time and marks it up. An example would be “let’s go sometime <TIMEX2>this week</TIMEX2>”; you can then extract the fuzzy reference and decide how to interpret it in your application.
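Once text has been marked up with TIMEX2 tags, pulling the expressions back out and giving them an application-specific meaning can be as simple as the sketch below. It assumes the markup looks like the example above; the `interpret` rules (a Monday-to-Sunday "this week", a one-day "tomorrow") are hypothetical choices an application might make, not anything timex decides for you.

```python
import re
from datetime import date, timedelta

# Non-greedy so that two tagged expressions in one sentence stay separate.
TIMEX_RE = re.compile(r"<TIMEX2>(.*?)</TIMEX2>")

def extract_timex(tagged_text):
    """Return every fuzzy time expression wrapped in <TIMEX2> tags, in order."""
    return TIMEX_RE.findall(tagged_text)

def interpret(expression, today):
    """Hypothetical application rules: map a few phrases to (start, end) dates."""
    if expression == "this week":
        monday = today - timedelta(days=today.weekday())
        return monday, monday + timedelta(days=6)
    if expression == "tomorrow":
        tomorrow = today + timedelta(days=1)
        return tomorrow, tomorrow
    return None  # unknown phrase: let the caller decide

for phrase in extract_timex("let's go sometime <TIMEX2>this week</TIMEX2>"):
    print(phrase, interpret(phrase, date(2010, 12, 15)))
```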

lxml, Beautiful Soup and chardet (another gem) are used to write a web page scraper.
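The scraper recipe combines three jobs: work out the page’s encoding, parse the HTML, and pull out what you need. As a self-contained stand-in for lxml, Beautiful Soup and chardet, this sketch uses the standard library’s `html.parser.HTMLParser` and a crude UTF-8-then-Latin-1 fallback in place of chardet’s statistical detection. The class and function names are hypothetical, and text inside `<script>` or `<style>` tags is not filtered out.

```python
from html.parser import HTMLParser

class LinkAndTextParser(HTMLParser):
    """Collect href targets from <a> tags and the non-blank text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

def decode_page(raw_bytes):
    """Crude stand-in for chardet: try UTF-8, fall back to Latin-1 (which never fails)."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return raw_bytes.decode("latin-1")

def scrape(raw_bytes):
    """Decode the raw page bytes, parse them, and return (links, text fragments)."""
    parser = LinkAndTextParser()
    parser.feed(decode_page(raw_bytes))
    parser.close()
    return parser.links, parser.text

page = '<html><body><h1>Caf\u00e9</h1><a href="/menu">Menu</a></body></html>'
print(scrape(page.encode("utf-8")))
```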

Overall I recommend this book. If you have the original O’Reilly book (and you really ought to) then this makes a great companion. I also spotted these two other reviews.


And on with 2011

Earlier this year I switched back to my long-running artificial intelligence and CUDA high-performance computing consultancy work through Mor Consulting. Having sold ProCasts earlier in the year and moved entirely away from screencasting (leaving ShowMeDo in Kyran’s capable hands), I wanted to get back to the nitty-gritty of low-level algorithms and implementations.

My A.I.Cookbook project is coming along, albeit very slowly over the last few months. I’m looking forward to starting a few new projects in the Cookbook, the OCR project against the OpenPlaque images needs finishing first (I see more prizes ahead there…).

Emily and I have some joint A.I./mobile projects to publish this year and I’m always on the look-out for interesting parallel, high performance computing and artificial intelligence problems – if you have a problem in this area please do get in touch.

Lee Tucknott did the design for Mor Consulting and has provided a design for the A.I.Cookbook (which I’ve yet to implement); if you need a beautiful design do go and see his work.

And now, to push on with Social Ties, our first A.I./mobile product for January which’ll help you find interesting people at events…


Packt’s new “Python Text Processing with NLTK 2.0 Cookbook”

I’m rather excited to have received a review copy of Packt’s new NLP book “Python Text Processing with NLTK 2.0 Cookbook”, which is based around Python’s Natural Language Toolkit (NLTK).

I’ve been using the O’Reilly book for over a year and I’m curious to see what’s different between the two. I’ll post a full review once I’ve been through it.


£5 App #23 – “Things we built this summer”

Last Tuesday we had our 23rd £5 App event. Given that it is only our second event this year, we chose to let people “show and tell” the things they built this summer. We had 9 speakers; I bought the beer and John baked the cakes.

Shardcore and the Enlightenment Machine

Shardcore‘s Enlightenment Machine was installed at the WhiteNight festival a week back, here he explains what’s going on:

£5 App #23 – Shardcore and the Enlightenment Machine from Ian Ozsvald on Vimeo.

Jon and Digestly

Jon’s Digestly lets you summarise tweets, which can then be sent by email to e.g. your mum who wants to hear more about you:

£5 App #23 – Jon and Digestly from Ian Ozsvald on Vimeo.

Ian (me!) and the Social Microprinter

My Social Microprinter is a CBM 231 receipt printer + Arduino + WiShield + remote server; it prints tweets and useful info on a regular shop receipt printer via serial:

£5 App #23 – Ian and the Social Microprinter from Ian Ozsvald on Vimeo.

John and the Arduino Doorbell

John’s Arduino-powered doorbell couples a regular remote-control doorbell with Lego, wood and a big bell:

£5 App #23 – John and the Arduino Doorbell from Ian Ozsvald on Vimeo.

Seb and Geek Family Fortunes

Seb built a Family Fortunes clone recently (we played it at BarCamp Brighton) using Flash, Nun-chucks and an iPad:

£5 App #23 – Seb and Geek Family Fortunes from Ian Ozsvald on Vimeo.

Emily and SocialTies on the iPhone

Emily is working on an iPhone app with me that we’ve named SocialTies, it helps you find your friends and ‘similar people’ when you’re at an event or conference. It was inspired by the fruitless hours I’ve spent at events wondering if I’ll ever find anyone I know…

£5 App #23 – Emily and SocialTies from Ian Ozsvald on Vimeo.

Kyran and JavaScript Social Graph Visualisations

Kyran and I have been working on some social graph visualisations, Kyran’s interface lets you see where you sit in an event’s social network whilst reading real-time updates from attendees:

£5 App #23 – Kyran and JavaScript Social Graph Visualisations from Ian Ozsvald on Vimeo.

Mike and the Tardis Money Bank

Mike’s Tardis Money Bank was designed to help him and his son keep tabs on pocket money. It has gone on to be used by many families since its launch:

£5 App #23 – Mike and the Tardis Bank from Ian Ozsvald on Vimeo.

Jay and Twitter Election Predictions

Jay’s real-time election results predictor read Twitter during the UK elections; the results were interestingly accurate:

£5 App #23 – Jay and using Twitter to Predict Elections from Ian Ozsvald on Vimeo.

If you’re interested in keeping tabs on future events or would like to speak please join our £5 App Google Group.
