About


This is Ian Ozsvald's blog, I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


17 May 2010 - 21:06 Extracting keyword text from screencasts with OCR

Last week I played with the Optical Character Recognition system tesseract applied to video data. The goal – extract keywords from the video frames so Google has useful text to index.

I chose to work with ShowMeDo’s screencasts as many show programming in action – there’s great keyword information in these videos that can be exposed for Google to crawl. This builds on my recent OCR for plaques project.

I’ll blog in the future about the full system; this is a quick how-to if you want to try it yourself.

First – get a video. I downloaded video 10370000.flv from Introducing numpy arrays (part 1 of 11).

Next – extract a frame. Using ffmpeg I extracted a frame at 240 seconds as a JPG:

ffmpeg -i 10370000.flv -y -f image2 -ss 240 -sameq -t 0.001  10370000_240.jpg

Tesseract needs TIF input files (not JPGs) so I used GIMP to convert to TIF.

Finally I applied tesseract to extract text:

tesseract 10370000_30.tif 10370000_30 -l eng

This yields:

than rstupr .
See Also
linspate : Evenly spaced numbers with  careful handling of endpoints.
grid: Arrays of evenly spared numbers  in Nrdxmensmns
grid: Grid—shaped arrays of evenly spaced numbers in  Nwiunensxnns
Examples
>>> np.arange(3)
¤rr¤y([¤. 1.  2])
>>> np4arange(3.B)
array([ B., 1., 2.])
>>>  np.arange(3,7)
array([3, A, S, 6])
>>> np.arange(3,7,?)
·=rr··¤y<[3.  5])
III
Ill

Obviously there’s some garbage in the above but there are also a lot of useful keywords!
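The extract-and-OCR steps above could be scripted along the following lines. This is only a sketch: the helper names `frame_command`, `ocr_command` and `run` are made up for illustration, and it uses ffmpeg's `-frames:v 1` option to grab a single frame rather than the exact flags shown earlier. The JPG to TIF conversion is still a separate step.

```python
import subprocess
from pathlib import Path


def frame_command(video, seconds):
    """Build an ffmpeg command that grabs one frame at `seconds` as a JPG."""
    out = f"{Path(video).stem}_{seconds}.jpg"
    # -ss before -i seeks to the timestamp; -frames:v 1 writes a single frame.
    return ["ffmpeg", "-y", "-ss", str(seconds), "-i", str(video),
            "-frames:v", "1", out]


def ocr_command(image, lang="eng"):
    """Build a tesseract command; the text lands in <image stem>.txt."""
    image = Path(image)
    return ["tesseract", str(image), image.stem, "-l", lang]


def run(commands):
    """Run each command in turn, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Only build and show the commands here; call run([...]) to execute them.
print(frame_command("10370000.flv", 240))
print(ocr_command("10370000_240.tif"))
```

Building the commands as lists keeps file names with spaces safe and makes it easy to loop over several timestamps per video.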

To clean up the extraction I’ll be experimenting with:

  • Using the original AVI video rather than the FLV: the FLV contains compression artefacts which reduce the visual quality, and it is also watermarked with ShowMeDo’s logo, which hurts some images
  • Cleaning the image – perhaps applying some thresholding or highlighting to make the text stand out; possibly the green text is causing a problem in this image
  • Training tesseract to read the terminal fonts commonly found in ShowMeDo videos
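To give a feel for the thresholding idea, here is a minimal pure-Python sketch. It assumes a frame is just a grid of (R, G, B) tuples, and the helper names `to_gray` and `threshold` are hypothetical; a real pipeline would more likely use an imaging library.

```python
def to_gray(pixel):
    """Average the RGB channels into a single 0-255 brightness value."""
    r, g, b = pixel
    return (r + g + b) // 3


def threshold(image, cutoff=128):
    """Map every pixel to 0 (black) or 255 (white) around `cutoff`."""
    return [[255 if to_gray(p) >= cutoff else 0 for p in row] for row in image]


# A tiny 2x3 "frame": green terminal text on a near-black background.
frame = [
    [(0, 0, 0), (0, 255, 0), (0, 0, 0)],
    [(0, 255, 0), (0, 255, 0), (10, 10, 10)],
]
print(threshold(frame, cutoff=60))
```

Pure green averages to only 85, so with the default cutoff of 128 the text in this toy frame would be wiped out entirely. That hints at why green terminal text might need a lower cutoff or a per-channel test.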

I tried four images for this test and useful text was extracted in every case. I suspect that by rejecting short words (fewer than four characters) and keeping only words that appear at least twice in the video I’ll have a clean set of useful keywords.
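The filtering rule could look something like the sketch below. The function name `keywords` and its parameters are illustrative assumptions, and the sample strings are borrowed from the OCR output earlier in the post.

```python
import re
from collections import Counter


def keywords(frame_texts, min_length=4, min_count=2):
    """Return words of at least `min_length` letters seen `min_count`+ times."""
    counts = Counter()
    for text in frame_texts:
        # Only runs of ASCII letters count as words, so stray symbols split
        # garbled tokens into short fragments that the length check removes.
        counts.update(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return sorted(w for w, n in counts.items()
                  if len(w) >= min_length and n >= min_count)


frames = [
    "grid: Arrays of evenly spared numbers in Nrdxmensmns",
    "grid: Grid-shaped arrays of evenly spaced numbers in Nwiunensxnns",
    ">>> np.arange(3,7)\narray([3, A, S, 6])",
]
print(keywords(frames))  # ['arrays', 'evenly', 'grid', 'numbers']
```

One-off misreadings such as "spared" and "Nrdxmensmns" fall away because they only appear once, while genuine words repeated across frames survive.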

Update – the blog for the A.I. Cookbook is now active; more A.I. and robot updates will occur there.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

1 Comment | Tags: ArtificialIntelligence, Life, Programming, Screencasting, ShowMeDo

10 May 2010 - 18:59 “Artificial Intelligence in the Real World” lecture at Sussex University 2010

I’m chuffed to have delivered the second version of my “A.I. in the real world” lecture (I gave it last May too) to 2nd year undergraduates at Sussex University this afternoon.

The slides are below. I cover:

  • A.I. that I’ve seen and have been involved with in the last 10 years
  • Some project ideas for undergraduates
  • How to start a new tech business/project in A.I.

In the talk I also showed or talked about:

Artificial Intelligence in the Real World May 2010 Sussex University Guest Lecture

Here’s the YouTube video showing the Grand Challenge entries:




No Comments | Tags: ArtificialIntelligence, projectbrightonblogs, Python, ShowMeDo, sussexdigital, SussexUniversity, £5 App Meet

31 July 2009 - 17:59 The Micropreneur Academy

Rob Walling has been working to build the Micropreneur Academy – a focused site aimed at MicroISVs who want to grow their businesses into high-value, fast-growth affairs (original May announcement).

He contacted me a few months back to ask if I’d like to contribute an article on how screencasts can boost sales of software products; naturally I jumped at the chance!

I wasn’t sure what to expect when I first logged in. Rob had sent me a login as I wanted to know what the community looked like, and what they’d need, before I wrote the article. I was pleasantly surprised to see a sort of mini-BusinessOfSoftware forum (i.e. friendly, helpful people who are working on cool stuff) backed by an awful lot of solid start-up knowledge written by Rob.

Knowing that I’d have to add an article of similar quality I spent time dissecting Rob’s articles.  Having worked in start-ups for 10 years and having founded 3 of my own, I had an idea about a lot of the content, but I kept finding nuggets of really useful material in Rob’s articles.

I was left wishing I’d had access to this when Kyran and I had founded ShowMeDo back in 2005!  I must also confess – I ended up using some of Rob’s ideas on market research to help plan my new eBook entitled The Screencasting Handbook.

Anyhow, I ought to cut this long story short.  I wrote the article and found a set of happy readers inside Rob’s forum.  I’m now also a mentor in his group because of my experience with on-line community building in ShowMeDo.

Rob’s about to open up the site to new membership, so if you have an interest in finding a closed group of people who are all building their own online start-ups, backed by lots of solid knowledge, do take a look at the Micropreneur Academy.

You can see two other write-ups by paying members for an idea of the value they see in the site. Rob’s also very chatty; if you sign up you can easily get in touch with him to ask any questions.

The following are a couple of the start-ups run by members of the academy (I don’t know any of these, but I did recommend another start-up who has happily joined!):



No Comments | Tags: Business Idea, Entrepreneur, ProCasts, Screencasting, ShowMeDo

30 June 2009 - 18:46 New ShowMeDo series – OpenOffice Calc transition series for Excel users

I spent the weekend cracking through the creation of a new 12-part tutorial series for ShowMeDo; the result is OpenOffice Calc 3.1 for Microsoft Excel users. The overview is embedded below and links to all the episodes are further down. This series was created whilst wearing my screencast production hat for ProCasts.

The goal is to take the hand of an Excel user and easily and quickly move them over to using Calc with the minimum of fuss.

Common topics like sharing .xls files, using .ods native files, printing, .csv files and PDF export are covered.  I also spend three episodes (just over 20 minutes) going through the creation of a sheet to model a mythical eBook’s sales covering formulas, formatting and charting.  Finally I finish with a look at some of Calc’s easter eggs including Star Wars (heck, I didn’t even know there were any games in OOo!).

Note that this series is a part of ShowMeDo’s Club so you need to be a paying member or contributing author to get access to all the episodes.  I’ve also updated the OpenOffice Learning Path which collates all of ShowMeDo’s OpenOffice videos.

  1. Series Overview in 3 minutes (OpenOffice 3.1 for Excel users) (Free)
  2. Installing OpenOffice 3.1 on Windows XP (Free)
  3. Working with .xls Sheets in Calc
  4. Saving as an ODF Spreadsheet (.ods) and PDF
  5. Where to get help
  6. Constructing a Sales-Forecast Sheet
  7. Formatting your Sheet
  8. Creating a Chart
  9. Printing your Sheet
  10. Import and Export CSV files
  11. Re-associating .xls files with Excel after OpenOffice installation
  12. Light relief – Star Wars Easter Egg and others


No Comments | Tags: ProCasts, Screencasting, ShowMeDo

6 June 2009 - 15:47 Starting work on a ‘How to Screencast’ eBook

Having learned an awful lot about screencasting in the last 4 years, I’m starting work in ProCasts on developing an eBook that teaches How to Screencast.  With over 3 years of tutorial screencasting in ShowMeDo and 2 years of commercial screencasting for marketing, tech-support and training that led to ProCasts, I feel rather well-qualified to write the book on the subject.

Right now the book is in the design stages; I’ve posted a survey in SurveyMonkey that asks 10 easy questions about your current screencasting knowledge and what you want to learn about. If you want to learn more about screencasting – please fill in the survey. It doesn’t require registration and if you put in your email address for the first question you’ll be in the draw for 2 free licenses (one for you, one for a friend).

If you don’t want to fill in the survey but you do want to be notified about the release, sign-up to our eBook notification list.

Topics that I’ll probably cover include:

  • Using screencasts for Sales, Training and Technical Support
  • Story-boarding to explain all your benefits and features
  • Writing a short, powerful script that really grabs the viewer’s attention
  • Techniques for converting your viewer into a user of your software
  • The best software packages to use when recording and editing a screencast
  • Editing competently so you have a short, polished video
  • Using annotations, fades and highlights to focus the viewer’s attention on the key points
  • Improving your audio recording
  • Critiques of existing screencasts including ways of improving them
  • Check-lists for each step of the process so you can quickly complete each phase confidently


No Comments | Tags: Screencasting, ShowMeDo

4 June 2009 - 11:38 OpenOffice Writer tutorials at ShowMeDo

I’ve just finished a new series for OpenOffice users entitled OpenOffice 3.1 Writer for Microsoft Word users. This series is a part of ShowMeDo’s Club; in 45 minutes I cover 11 topics for the new Writer user, aimed at easing their transition across from MS Office:

  1. Series Overview in 4 minutes (OpenOffice 3.1 for Word users)
  2. Installing OpenOffice 3.1 on Windows XP
  3. Working with Word (.doc) files
  4. Working with an OpenDocument Format (.odt) file
  5. Help! Manuals, Forums and Mail lists
  6. Basic Formatting (bold, italic etc)
  7. Exporting to PDF, HTML, MediaWiki
  8. Printing
  9. Word-completion
  10. Find and Replace, Undo
  11. Spell Checker

We have plans to cover more OOo topics; right now I’m looking for feedback. It feels a bit odd to have one OpenOffice series in amongst all the Python series but we need to start this new tutorial thread somehow! I created this series whilst wearing my ProCasts screencast production hat.



No Comments | Tags: Screencasting, ShowMeDo

9 April 2009 - 18:28 New Python tutorials at ShowMeDo using Learning Paths

The latest development at ShowMeDo is a new learning system called Learning Paths.  The Paths are ordered collections of videos and series where individual items are pulled together to make a journey (a ‘learning trajectory’) for the learner to achieve one particular goal.

The Paths also allow for dependencies, so ‘Fully worked Python Projects’ depends upon ‘Beginning Python Programming’ and that depends upon ‘Setting up Python’.
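As an illustration of how such dependencies can be turned into a study order, here is a small sketch using Python's standard `graphlib` module (Python 3.9+). The `depends_on` mapping is an assumption made for the example and is not how ShowMeDo stores its Paths.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each Path maps to the set of Paths it depends upon.
depends_on = {
    "Fully worked Python Projects": {"Beginning Python Programming"},
    "Beginning Python Programming": {"Setting up Python"},
    "Setting up Python": set(),
}

# static_order() yields every Path after the Paths it depends upon.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

For this simple chain the learner is sent to ‘Setting up Python’ first and ‘Fully worked Python Projects’ last.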

At present we have an initial set of Paths and more will follow very soon:

The Paths mix our free and Club content. All authors have edit rights, so everyone can add the right material to the Paths so they tell exactly the right story.

We’re very keen to see the Paths used and I’ve already blogged about this on the main ShowMeDo Blog. If you like what we’re trying to achieve, perhaps you could help us to spread the word by blogging or tweeting?



1 Comment | Tags: Python, ShowMeDo

13 March 2009 - 13:48 ShowMeDo server move + Python 3 videos

We’ve spent the last few weeks migrating ShowMeDo to its own server after 3 years operating out of a shared box. Moving the site was a pain as I’m not a low-level Apache hacker, but all in all everything seems fine now and we have extra capacity to grow.

Kyran has skinned the blog so it fits with the overall theme. The new Learning Paths feature is close to being released; this’ll really tie together all the learning resources in the site so visitors can get a threaded path through all the videos.

Kyran has explained some of the move and has configured ShowMeDo’s frontpage to show some of the posts; this is a really nice way to integrate the blog into the main site. We also have a Hall of Fame now where all authors are ranked by a number of measures.

Two authors have added Python 3 videos: Gasto summarises some of the changes in 3.0 and chyld shows 3.0 in action in 2 videos on lists and del.icio.us. These and all the other Python videos are here.



No Comments | Tags: Life, Python, ShowMeDo

17 December 2008 - 23:50 Presenting at Sussex Uni Research Day

Last week I was kindly invited by Inman Harvey and Anil Seth to speak at a Research Day for the Informatics dept. at Sussex Uni. The plan was to talk about ‘life after my MSc [10 years back]’ and to figure out if and how we could get more students involved with tech companies in Brighton.

Of the 30 attendees it was lovely to spot 10 old faces including my evolutionary-hardware thesis supervisor Adrian and old-drinking-buddy Andy who is now involved with the MSc programme.

The first part of the morning was spent looking through a set of research proposals from members of the dept. who were presenting short, low-cost projects. The projects were vying for funding from a limited pot; in part the exercise was to present the wide range of research to everyone in the room. Projects included:

  • “Using Motion Capture to learn geometric transformations” (using Animazoo’s motion-capture systems)
  • “Music Interfaces for Mobiles” (involving novel iPhone development…I wish we’d had tech like iPhones back during my MSc)
  • “Further experiments in perceptual crossing”
  • “Optimal Computation meets Compiler Optimisation”
  • “Towards an earlier diagnosis of infants with, or at risk of, cerebral palsy” (most voted for – a clear contender for using the funds to prime the pump for a larger project)
  • “Network formations: from neural development to epidemiology, through ant foraging”
  • “Prototype interface for pilot studies of visual attention research around ‘design for attention’”

Three of the talks received full funding, two part funding and one required further work. The range of the topics was wide; I was particularly interested to see iPhones making an entry, and the visual tracking system applied to cerebral palsy detection was darned cool.

Later I got to speak on my past and the local industry.  I explained how I’d worked in MASA, started Mor Consulting and co-founded ShowMeDo and FivePoundApp and was heavily involved with the local tech scene.  Outlining the range of high-tech companies was fun (from NCsoft through FuturePlatforms and Madgex and out to Ambiental).

During discussion there and later in the pub it was clear that there is interest in encouraging links between the dept. and local high-tech companies. A possible way forward might be to encourage companies to propose MSc and 3rd year projects (talk to me if you’re curious).

I’ll also be posting research news about the dept. here (getting news out) and posting some of our events into the Alergic mail list (sending news in).  It was great seeing some ex-MScs at the last FivePoundApp, hopefully we’ll see more students out at our events in 2009.



No Comments | Tags: Academic Stuff, Entrepreneur, projectbrightonblogs, ShowMeDo, sussexdigital, £5 App Meet

14 September 2008 - 14:51 PyCon UK 2008 Write-up

We’re heading towards the end of the afternoon of a great weekend’s worth of PyCon. Right now Ted Leung is giving the keynote on Dynamic Languages. JavaScript’s acceptance and speed improvements are being highlighted; Ted seems to have some worries about JavaScript’s continual growth.

Yesterday we had Mark Shuttleworth’s keynote on how he wanted to see Python give better support to Transactional Memory and Cloud Computing. Both talks were interesting and inspiring. It felt like Mark really buys into the Python community, with lots of ‘our language’ references, and he’s a great speaker to boot. He dug at our GIL and said ‘please solve scaling above 1 core!’.

Raymond Hettinger gave some great talks including a look at Python 2.6 and 3.0, behind-the-scenes Python containers (cool src link) and A.I. with Python.  Did you know that a list’s growth pattern is 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, … elements and for large lists you never waste more than 12.5% of memory?  Neato.
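The growth pattern can be reproduced with a few lines of Python. The over-allocation rule below is written from my recollection of CPython's `listobject.c` in the 2.x era, so treat it as an assumption; newer interpreters may use a different rule. The function names are made up for the sketch.

```python
def over_allocate(newsize):
    """Slots reserved when a list must grow to hold `newsize` items."""
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def growth_pattern(steps):
    """Capacities seen while appending one item at a time to an empty list."""
    capacities = [0]
    while len(capacities) < steps:
        # The list is full, so the next append asks for one more slot.
        capacities.append(over_allocate(capacities[-1] + 1))
    return capacities


print(growth_pattern(10))  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]

# Spare slots as a fraction of the requested size for a large list.
n = 1_000_000
print((over_allocate(n) - n) / n)
```

The `newsize >> 3` term adds one eighth of the requested size, which is where the roughly 12.5% ceiling on spare capacity for large lists comes from.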

It looks like PyPy is coming up to a 1.0 release later this year. James Gardner gave a good Birds of a Feather session on Pylons; this is relevant to me due to TurboGears’ use of Pylons – ShowMeDo is written in TurboGears.

Zeth tells me that last year there were about 150 attendees and I think we have over 200 this year. Next year EuroPython meets PyConUK so we’ll have closer to 500. I also met a bunch of regional Python usergroups including Python Ireland – there’s obvious growth in our userbase with a lot of smart people coding away.

At the end of last night’s dinner we had a talk on the Lunar Society; it was long (2 hours!) but told a great story. The drinking, inevitably, went on late <ouch>.

I look forward to next year’s PyCon UK!  Pictures etc via the PyCon wiki which links to flickr.

As a side note it was great to talk to Pythonistas and hear that our ShowMeDo is fairly well known (given that we’ve never had a marketing budget, or time, or resources…) and well-respected.  I think I’ve recruited a few more authors along the way.

1 Comment | Tags: Life, Python, ShowMeDo