This is Ian Ozsvald's blog, I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.

25 November 2012 - 22:39 Testing 3 modern face detection libraries (face.com, OpenCV, libccv)

As a research project some months back, Balthazar and I tested 3 modern face detection libraries (definitely see Balthazar’s write-up). Face.com had just been acquired by Facebook; it had a great, free service which annotated not just face locations but also sex, age and emotion. We also tested OpenCV (popular and free) and the lesser-known libccv.

Previously I’d used OpenCV to build a face tracking robot head in Python, so we figured a review of what’s easily available might be fun.

Balthazar ran the face detection process with face.com and OpenCV, and I added libccv. We used 200 images kindly provided by Rosario Rascuna (@_sarhus), collected from Instagram and annotated by us. We listed 150 images with faces and 50 without to test how often faces are correctly detected and whether faces are seen where they shouldn’t be.

We did not test the locations of the face, just the absolute count per image. This means that a face could be incorrectly spotted in an image whilst the true face was missed – our scoring system would still say ‘1 was expected and 1 was found so that is correct’. Manual inspection suggested that this is a minor problem (though if I ran the experiment again I’d take the time to hand-annotate every face’s location and check that faces were detected in the right place).
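A minimal sketch of count-only scoring follows. It assumes an image counts as ‘correct’ when the number of detected boxes equals the annotated number of faces; the function names and the (x, y, w, h) box format are illustrative, not the code used for these results. It also shows the blind spot described above: a box in the wrong place still scores as correct.

```python
def count_matches(expected_count, detected_boxes):
    """Return True when the detector reported as many faces as were annotated.

    Box locations are ignored, so a box in the wrong place still counts.
    """
    return len(detected_boxes) == expected_count


def tally(results):
    """results is a list of (expected_count, detected_boxes) pairs, one per image.

    Returns (number of images scored correct, total number of images).
    """
    correct = sum(1 for expected, boxes in results if count_matches(expected, boxes))
    return correct, len(results)


if __name__ == "__main__":
    # Hypothetical detections as (x, y, w, h) tuples.
    results = [
        (1, [(400, 300, 40, 40)]),  # one face expected, one box found (location unchecked)
        (0, []),                    # no face expected, none found
        (2, [(10, 10, 50, 50)]),    # two faces expected, only one found
    ]
    print(tally(results))
```

Checking locations as well would need hand-annotated boxes and an overlap test between each annotated box and each detected box.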

OpenCV provides a set of pre-trained data files (as xml with names like alt_tree_cascade). We tested them individually and then combined all their detections into an uber-detector. The goal for OpenCV was just to see how well it might do without fine tuning.
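One way to pool several detectors is sketched below. The function name and the choice to simply concatenate every reported box (with no merging of overlapping boxes) are assumptions for illustration, not necessarily how the detections were combined here. Each detector is any callable that takes an image and returns a list of boxes; with OpenCV’s Python bindings such a callable could wrap cv2.CascadeClassifier(xml_path).detectMultiScale.

```python
def uber_detect(detectors, image):
    """Run every detector on the image and pool all of the boxes they report.

    No de-duplication is attempted, so one face seen by two detectors
    appears twice in the result.
    """
    boxes = []
    for detect in detectors:
        boxes.extend(tuple(box) for box in detect(image))
    return boxes


if __name__ == "__main__":
    # Stand-in detectors that return fixed (x, y, w, h) boxes for any image.
    def frontal(image):
        return [(10, 10, 50, 50)]

    def profile(image):
        return []

    print(uber_detect([frontal, profile], image=None))
```

Because the pooled list can contain the same face more than once, a count-based score would need overlapping boxes merged first, or would need to ask only whether any face was found in the image.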

For OpenCV we used v2.3, for libccv we used v0.1.

I’ll be posting some of the code that we used along with the dataset, and I’ll add a link here when I’ve done that.

Results:

  • face.com found 144 of the 150 images with faces with 0 false positives (i.e. it didn’t say once that an image without a face had a face)
  • OpenCV found 93 images with faces of the 150 images and an additional 4 that were false positives
  • libccv found 99 images with faces of the 150 images and an additional 6 that were false positives
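To compare the three results as rates, here is a short calculation using the counts listed above. It assumes the false positives are counted over the 50 images without faces; the names are illustrative.

```python
WITH_FACES = 150     # images that contain at least one face
WITHOUT_FACES = 50   # images that contain no face

# (images with faces that were found, false positives), from the results list
RESULTS = {
    "face.com": (144, 0),
    "OpenCV": (93, 4),
    "libccv": (99, 6),
}


def rates(found, false_positives):
    """Return (detection rate, false positive rate) as fractions."""
    return found / WITH_FACES, false_positives / WITHOUT_FACES


if __name__ == "__main__":
    for name, (found, false_positives) in RESULTS.items():
        detection, false_alarm = rates(found, false_positives)
        print("%-8s detection %.0f%%, false positives %.0f%%"
              % (name, detection * 100, false_alarm * 100))
```

On these counts face.com detects 96% of the images with faces, against 62% for OpenCV and 66% for libccv.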

The short story is that the open source tools are ‘pretty good’ but face.com was better (and is now unavailable). Since we did this work, Stephen’s LambdaLabs has started offering a RESTful face detection (and recognition) API, though I’ve not evaluated it.

There’s clearly room for a web based service in this area, and training it with feedback would be a nice feature. Adding face recognition (as LambdaLabs has, but OpenCV and libccv don’t) is an obvious bonus. I’ve seen face detection used for:

  • cropping uploaded faces in web profile pictures
  • filtering non-face photos from photo albums
  • filtering face photos from restaurant review sites

I suspect we’ll see more computer vision APIs that make it easier to annotate images (much the reason why I’ve registered this skeleton site for annotate.io), given the rise in photos on sites like Instagram (and flickr before).


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

No Comments | Tags: ArtificialIntelligence, Life

25 November 2012 - 19:50 StartupChile (Round 2.1) all finished, thoughts

The odd thing is that I’ve been trying to write this post for 3 months. Having started and stopped several times (including during the flight back from Chile on Oct 15th) I figure I ought to put something out. The journey was, it turns out, somewhat of a roller-coaster ride.

Early in January Kyran Dale and I flew to Santiago for Round 2.1 of StartupChile to build StrongSteam, a cloud-based computer vision API. Emily (my fiancée) also won funding and came out to build TinyEars. Sadly StrongSteam didn’t make it (my co-founder and I went in different directions, so it was easier to end the project).

The goal of the StartupChile project is to bring working entrepreneurs in from around the world to teach Chileans how to build start-ups. Teaching includes running events, building partnerships, explaining lessons-learnt in prior experiences and explaining that failure/experimentation is a part of the process. In return we stay for 6 months, get a $40k reimbursement package (90% of our expenses up to $40k USD are reimbursed via a slightly torturous bureaucratic process) and are free to leave at the end. We never have to register our business out there, give up shares or pay tax on foreign earnings.

During the last 8 months I:

  • ran a pair of Python programming courses (material open-sourced)
  • started private self-mentorship groups (now an official part of the StartupChile programme)
  • built a novel AI backend with Kyran for using Optical Character Recognition to replace the need for QR codes (which is now OpenPlants)
  • won ‘best choice for investment‘ on the Jason Calacanis show This Week in Startups (ha!)
  • played with Kinects and Python for rock-sizing with computer vision for the Chilean mining industry
  • organised some data meetups
  • spoke on agile lessons-learnt
  • presented to VCs and Angel groups (and got offered $500k investment lumps in both San Francisco and Chile)
  • received acquisition offers from companies in San Francisco and Chile
  • presented at conferences like PyCon and got mentions in places like the BBC
  • wrote up demo day meets
  • finished the programme by moving with Emily to San Francisco for 2 months to continue our networking

The main upsides of the programme are:

  • time to build your idea without the need to work/consult to pay the bills (your living expenses are covered)
  • lovely group of proactive people to meet from both around the world and locally
  • supportive (if overworked) staff members who do their best to help
  • lovely people in Chile in general (warm, friendly, interested, those building companies are particularly open and friendly)
  • increasing recognition in the investment/startup community which opens doors (e.g. The Economist and others covered it recently) – a few months ago StartupChile held its first Demo Day in San Francisco to ease fund-raising
  • easy access to North America if you’re coming in from outside the US (I used it as a springboard in our final two months to head to San Francisco to continue the networking)
  • you’re encouraged to travel within Chile to teach other groups, you also have easy access to places like Argentina and Uruguay if you fancy traveling (we certainly did) and can justify it as work-related
  • other related spaces like the Santiago Hackerspace and new co-work venues are popping up

The main goal of the programme definitely seems to be working for the Chileans. In our time in Chile we saw many Chileans step forwards with either young working companies or ideas (some high-tech, many not), who then got on with building, partnering and growing their businesses. The company registration process is being massively simplified and failure is becoming more acceptable (generally it is not socially acceptable to fail, much as was the case in the UK only 20 years ago, and thankfully that attitude is changing in Chile).

More Chileans are traveling around the world, more doors are being opened in cities like San Francisco and more money, connections and opportunities are flowing back into Chile. Being part of a government’s experiment to change their citizens’ attitude to risk (and seeing it work) has been a very rewarding experience.

On a personal level I’ve also made some lovely contacts – people I’d work with who I consider friends who I’d never have met otherwise. I suspect that the “StartupChile Mafia” (ex-StartupChile folk) will open doors for all of us in the programme in the future too. I’ve met a few ex-StartupChile folk here in London (one by accident in the pub last week – hi Michael!) and I’m wondering if we can run a Mafia meetup before Christmas.

There are several downsides to Chile which should be considered by future applicants:

  • there’s a reason we’re paid to be entrepreneurs in Chile – the ecosystem is lacking certain things and maybe you’d not set up shop there otherwise. Make sure your eyes are open to the very young/conservative investment scene, the small tech community and the conservative nature of businesses (bureaucracy and caution mean it takes a long time to get things done)
  • things that worked elsewhere in the world a few years ago will probably be successful now in Chile (e.g. people building online food services and education sites were doing well, persons trying to offer novel AI/data applications and things requiring iPads had a, well, harder time of it) so don’t assume your cutting edge idea from California will move quickly in Chile
  • the air in winter is polluted and horrid (bad news if e.g. you have asthma) but lovely in summer
  • the programme’s goals are focused on making Chile successful (and not you, per se, but that’s a nice side-effect for StartupChile if it occurs)
  • most people only speak the Chilean-variant of Spanish called Chileno (StartupChile participants and staff all speak some level of English) – this can make buying things in the street a bit of a challenge – try to learn some Spanish before you come
  • there was little explanation about the interests & needs of companies within Chile – for example it took me months to learn just how large and hungry the mining industry is for innovative solutions (and it is a rich industry)

I spoke with Mitch Altman (a founder of the San Franciscan hackerspace Noisebridge) recently and, paraphrased, he pointed out that in most places in the world (he travels a lot to promote hackerspaces) if you open the door to encourage experiments, accept failure and encourage small business and knowledge sharing then It Just Tends To Happen. I suspect that this model can be applied around the world, without big Government funding, and I expect to see many more countries try this bottom-up approach of bringing entrepreneurs in (rather than building expensive ‘innovation clusters’ that rarely seem to perform).

There are other positive and negative write-ups about the programme including Emily‘s, Liis Peetermanns‘s, another, Nathan Lustig, Maptia (lovely British team!). My posts here are under the startup-chile tag.

If you’re interested in building your business in South America then this is the go-to programme. If you need 6 months’ time in an interesting country with an increasing investor scene, this is not a bad choice. If you want mentorship and hands-on help or you want to deal with the large corporates that you might find in London, New York or Frankfurt then Chile hasn’t proven itself here yet (though it may, given time). What’s impressed me most about the programme is the way it keeps on improving – keep an eye on it, definitely consider it! Seek a wide set of opinions if you want to apply, as lots of people experience the programme differently.

Emily and I have discussed what we’d like to see in future StartupChile-like programmes (I suspect we’ll see more, with further innovation, as Governments wake up to the positive change that can occur):

  • invite academics and industrialists to a country to work on a specific problem for a fixed time period without heavy-handed IP controls but funded like StartupChile – this could be a wonderful way to foster innovation and collaboration and to build new IP that could be exploited (perhaps with a share in the IP being owned by all in these projects)
  • set up targets for sector improvement in a country – e.g. in Chile perhaps choose to make mining more energy efficient – then invite companies to come with industrial doors opened and primed for collaboration (so many StartupChile companies could have formed local partnerships if only doors had been opened so the incumbents knew we were coming!)
  • list the problems that entrepreneurs could solve and make it public – actively seek entrepreneurs to visit to try to fix things (e.g. in Chile the winter pollution must be fixable, education is super-expensive [which led to student protests] and surely can be improved, the mining industry suffers from growing energy and mine-discovery costs)
  • encourage an alumni group so past members can easily help future members (something that’s been long discussed in StartupChile but seems to be low on the agenda)
  • work harder to jump language & cultural barriers – in Chile we were told everyone on the programme would speak English but the locals notably didn’t so the very people we were trying to help were hard to communicate with – add language & cultural lessons to a programme to ease the transition for both sides

As of now I’m back to my AI consulting for natural language processing (working with the lovely team at AdaptiveLab in Shoreditch), tinkering on the side with industrial needs learned via StrongSteam in annotate.io. If you’re ex-StartupChile and you’d be interested in meeting in London, drop me a line.


1 Comment | Tags: ArtificialIntelligence, Business Idea, Entrepreneur, Life, StartupChile

1 November 2012 - 20:00 Making “from lxml import etree” work with virtualenv (Python)

Update – these steps are overly complicated and *unnecessary*! See fizyk and Marius’ comments below. I’ll leave this post just in case it helps anyone – hopefully anyone coming here will realise it isn’t hard (now) to install lxml, as long as the OS dependencies are installed.

I use virtualenv for all development. Recently I was stumped with the need for the lxml module – installing it using virtualenv on Linux requires a bit of work.

Let’s see the problem first:

$ virtualenv testlibxml
 New python executable in testlibxml/bin/python
 Installing distribute.............................................................................................................................................................................................done.
 Installing pip...............done.
.../virtualenvs/testlibxml $ source bin/activate
$ pip install lxml
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/home/ian/workspace/virtualenvs/testlibxml/build/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o
In file included from src/lxml/lxml.etree.c:254:0:
/home/ian/workspace/virtualenvs/testlibxml/build/lxml/src/lxml/includes/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

Following these instructions (taking care to follow the steps for *both* libxml2 and libxml, further below), I run configure with this change for my local path:

./configure --with-python=/home/ian/workspace/virtualenvs/testlibxml/bin/python

And now we can start python and import libxml2:

(testlibxml)ian@ian-Latitude-E6420 ~/workspace/virtualenvs/testlibxml $ python
 Python 2.7.3 (default, Aug  1 2012, 05:14:39)
 [GCC 4.6.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import libxml2 # works
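
The same check can be scripted for both modules. This is a small sketch and the helper name can_import is my own; it only reports whether each module imports inside the active virtualenv and does not install anything.

```python
import importlib


def can_import(module_name):
    """Return True if the named module imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # libxml2 is the binding imported above; lxml.etree is the module this post is about.
    for name in ("libxml2", "lxml.etree"):
        print(name, "OK" if can_import(name) else "missing")
```

Run it with the virtualenv activated so that the environment’s python, rather than the system one, does the importing.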

3 Comments | Tags: Life, Python