Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

strongsteam – an “AppStore for A.I. and data mining tools”

Kyran and I are starting work on a new project – strongsteam offers a web API with artificial intelligence and data mining tools. The goal is to make it easy for you to do things like:

  • get the text out of images using optical character recognition
  • determine whether two images look the same and if one object (e.g. a certain book or a can of coke) can be found in another
  • use natural language processing to analyse, cluster and compare text
  • extract text from audio (e.g. to pull out keywords from podcasts)
  • use machine learning on text to derive new data

If you’d like to join the closed alpha then visit strongsteam and add your email to the announce list on the homepage.

We’ve started with Python bindings which make it easy to talk to the strongsteam web service. Initially we’ll wrap open source tools that we’ve used along with lots of our own A.I. data mining tools from years of work in my Mor Consulting A.I. consultancy.

At EuroSciPy last week I demo’d using O.C.R. to extract the words from plant labels at Wakehurst Place gardens, so you can look up the plant on Wikipedia once you’ve taken a photo like this one:

Plant label for Ostrich Plume Fern at Wakehurst Place (Sussex)
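As a rough sketch of the final step (going from recognised label text to a Wikipedia lookup), the snippet below tidies an O.C.R. string and builds a Wikipedia search URL. The helper name `wikipedia_search_url` and the cleaning rules are assumptions made for illustration; this is not the strongsteam API, and the O.C.R. step itself is not shown.

```python
import re
from urllib.parse import urlencode


def wikipedia_search_url(ocr_text):
    """Build a Wikipedia search URL from raw O.C.R. output (hypothetical helper)."""
    # Assumed cleaning rule: keep ASCII letters, digits, hyphens and spaces,
    # and turn everything else (newlines, stray punctuation) into spaces.
    cleaned = re.sub(r"[^A-Za-z0-9\- ]+", " ", ocr_text)
    # Collapse runs of whitespace left behind by the substitution.
    cleaned = " ".join(cleaned.split())
    # Use the search endpoint rather than a direct page title, since
    # O.C.R. output rarely matches an article title exactly.
    return "https://en.wikipedia.org/w/index.php?" + urlencode({"search": cleaned})


if __name__ == "__main__":
    print(wikipedia_search_url("Ostrich Plume\nFern"))
```

A search URL is used instead of a direct article link because a slightly misread label can still find the right page. Note that this simple cleaning rule would strip accented characters from botanical names.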

Now we’re looking at applying O.C.R. to conference name-badges; this will be a bit of a mash-up of data from our SocialTies conference app and Lanyrd.com’s data. Next we’ll look at image matching and some text processing tools.


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

Closing The Screencasting Handbook’s email list

At the start of the year I published The Screencasting Handbook, my eBook on the art of screencasting, based on experience gained whilst building ShowMeDo and ProCasts. Writing a 129 page book was an interesting challenge and I’m very happy with my early peer-review approach. The book retailed at $39 at first; later I dropped the price to $19 as some of the material has dated.

For the last 5 months I haven’t made any changes but I did keep the emailing list. Now I’m shutting the email lists as there’s little point re-mailing everyone to say that the book is done. Really I should have shut the lists months ago! The Handbook is still for sale of course (at $19USD) and will be for quite some time to come.

Putting this older project into ‘life support’ mode is rather cathartic; now I get to focus on fresh projects like SocialTies (now in iPhone and Android app stores) and our forthcoming StrongSteam ‘Artificial Intelligence web service’.



Social Ties now available for UK iPhones and any Android

This is just a quick post to say that we’ve released the iPhone build of Social Ties to the iTunes app store in the UK. Currently it only supports UK events so we’ve limited it to the UK AppStore; international events will follow later. The latest features include Bookmarking of people you’d like to meet and a Met button to mark the people you’ve already met. As noted today by a couple of users:

Yay now I can know who I should talk to at a conference and everything about them. Thanks @socialtiesapp http://bit.ly/rhka9U – @juliancheal

Very impressed with the new @socialtiesapp for iPhone and Android. One to recommend to @briankelly I think! – @eventamplifier

Yay! @socialtiesapp is out on the iPhone and I’m unexpectedly famous! (see screenshots) – @bensummers

The Android BETA has been linked on our Social Ties homepage for a month; we’ll submit it to the Android AppStore once it reaches feature parity with the iPhone build.

For follow-up updates, follow @socialtiesapp; you can also follow us at @ianozsvald and @fluffyemily. If you want customised A.I. for your own project then talk to me; if you’d like mobile apps then talk to Emily.



Dell E6420 with Ubuntu 10.10 (Maverick Meerkat) 32 bit

Having hacked away with Natty Narwhal for a few weeks I’m regressing to the 10.10 distribution provided by Dell here. Installation took 20 minutes and it allowed me to use the previous ext4 partition (I had to edit it using the advanced configuration and set the ext4 partition’s mount point from blank to ‘/’). I formatted the partition too for good measure. I made sure to reload the package list (via Synaptic) and let it fetch updates.

Running ‘uname -a’ reports that this is 32 bit: “Linux ian-Latitude-E6420 2.6.35-30-generic-pae #54-Ubuntu SMP Tue Jun 7 20:28:33 UTC 2011 i686 GNU/Linux”
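The same check can be scripted. The sketch below is a minimal illustration, assuming that a machine string such as ‘i686’ indicates a 32 bit x86 kernel and that a ‘-pae’ suffix on the release marks a PAE kernel (as in the output quoted above). The helper name `describe_kernel` and the set of machine strings are my own assumptions, not an exhaustive list.

```python
import platform

# Machine strings commonly reported by 32 bit x86 kernels
# (an assumed, non-exhaustive list for illustration).
X86_32BIT = {"i386", "i486", "i586", "i686"}


def describe_kernel(release, machine):
    """Summarise a kernel from its release and machine strings (hypothetical helper)."""
    return {
        # True when the kernel reports a 32 bit x86 machine type.
        "is_32bit_x86": machine in X86_32BIT,
        # Ubuntu's PAE kernels of this era carried a '-pae' suffix.
        "is_pae": release.endswith("-pae"),
    }


if __name__ == "__main__":
    # platform.release() and platform.machine() return the same fields
    # that 'uname -r' and 'uname -m' print.
    print(describe_kernel(platform.release(), platform.machine()))
```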

Next I followed the instructions here to get access to sound and the touchpad (on the fresh install the ‘pad worked but had no side-scroll, now it has side-scroll). I used my previous instructions to get the edgers version of the NVIDIA drivers (not the ones on the Dell site), Optimus was already disabled and the NVIDIA drivers ‘just worked’. I had to install the Dell sound driver but then it also ‘just worked’. Flash with sound seems to have worked out of the box too.

Wifi was a pain – the Dell links didn’t work but downloading this (in Synaptic – the pae version) via this for my Broadcom BCM 5800 (ID: 0a5c:5800) gave me wifi on a reboot. I’ve also upgraded Firefox 3 to 5 via this.

Suspend and hibernate seem to be stable (unlike before with the 11.04 install, which randomly got stuck and lost my desktop). Rather pleasingly, although I was fetching a gig of Dropbox data over Wifi and compiling new sources, the battery tool reported 6 hours of battery life (which seemed true-ish, maybe 4 hours would have been right, though I did have the screen on its darkest setting as it was very late at night). This beats the max 2 hours I got before with 11.04.

Overall regressing to the 10.10 build from Dell seems to be the right move. Update two weeks later – using the Dell image is definitely the right thing to do, everything ‘just works’ like it is supposed to. I get 4-6 hours battery life using the NVIDIA graphics card as my primary display.

Update – I’ve uploaded a modified script that disables the touchpad for a fraction of a second when you’re typing. This is necessary as the ALPS touchpad identifies itself as a PS/2 mouse rather than a trackpad due to proprietary drivers. The script is in my github repo as Dell_E6420_Touchpad_AutoDisabler. It contains minor fixes from Philip Aston’s excellent version here.
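The core idea (ignore the touchpad for a short quiet period after each keystroke) can be sketched as below. This is a minimal illustration of the timing logic only and is not the script from the repo: the class name `TouchpadGate`, the 0.5 second quiet period and the injected clock are all assumptions. A real script would also need to watch keyboard events and switch the device on and off (for example with a tool such as `xinput`), which is not shown here.

```python
import time


class TouchpadGate:
    """Decide whether the touchpad should be enabled, based on recent typing.

    Hypothetical sketch of the timing logic only; it does not touch any device.
    """

    def __init__(self, quiet_period=0.5, clock=time.monotonic):
        self.quiet_period = quiet_period  # seconds to wait after a keystroke (assumed value)
        self.clock = clock                # injectable so the logic can be tested
        self.last_key_time = None         # no keystroke seen yet

    def key_pressed(self):
        """Record that a key was pressed just now."""
        self.last_key_time = self.clock()

    def touchpad_enabled(self):
        """Return True once the quiet period since the last keystroke has elapsed."""
        if self.last_key_time is None:
            return True
        return (self.clock() - self.last_key_time) >= self.quiet_period
```

A monitoring loop would call `key_pressed()` on every keystroke and poll `touchpad_enabled()` to decide when to re-enable the pad. Using a monotonic clock avoids surprises if the system time is adjusted, for example after a suspend.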

Update (Nov 2011) – Having used 10.10 for 2 months I’ve got some problems that I’ll list.

  1. About 1 in 20 lid closes do not cause the suspend behaviour to start. The result is that the laptop stays ‘on’ with the lid shut. After an hour it tries to go into (I guess) hibernate but for some reason it gets stuck. Next it gets hot, the fans run on full and after a while it is cooking at 80 degrees in my laptop bag, merrily eating the battery. If I get to it in time I can open the screen: the backlight is on but nothing responds and I have to force power-off (holding the power button for 5 seconds). If I don’t get to it in time it just kills the battery. Upon a reboot it boots a fresh session and everything is fine, sans all the previous session info (this hasn’t yet led to corruption).
  2. About once a month the machine freezes during use. It has happened just after a clean boot (after logging in, before doing anything). It has happened after days of use and many suspends. The behaviour is a total system lock, the screen doesn’t update, no mouse etc. A force power off is required.
  3. The in-built camera normally works with Skype, sometimes it fails to start and a reboot is required. The picture is grainy and doesn’t cope with low lighting conditions (I haven’t tried this on Windows). Using an older Logitech QuickCam Pro 9000 I get a bright, clear picture even in low light conditions for Skype.
  4. Power usage with the NVIDIA card on (Optimus off), using VirtualBox, with wifi and a bright screen is about 3 hours.

It is hard to know if this is a hardware fault (the BIOS-based self diagnostics which run for 30 mins report no problems) or a software fault. I’m inclined to think it is 10.10 and/or the Dell changes. I’m planning on trying 11.10 next in the hope that the SandyBridge chipset is better supported.

My take-home message so far is that if the manufacturer doesn’t support your OS (Dell only partially support Ubuntu), don’t buy from them. I believe HP might have been a better purchase.



High Performance Python tutorial v0.2 (from EuroPython 2011)

My updated High Performance Python tutorial is now available as a 55 page PDF. The goal is to take you on several journeys which show you different ways of making Python code run much faster (up to 75x on the CPU, faster with a GPU).

UPDATE As of October 2014 I’ll be teaching High Performance Python and Data Science in London, sign-up here to join our announce list (no spam, just occasional notes about our upcoming courses).

UPDATE (2014) I’ve written the High Performance Python book with O’Reilly, it is publishing this year.

UPDATE 1 this talk is superseded by my High Performance Python 1 tutorial from PyCon 2012.

UPDATE 2 I’m thinking of writing an updated guide; if you’re interested in hearing about it please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announcement once I know more.

UPDATE 3 (Sept 2012) I was missing the EuroPython video, I’ve embedded it down below.

This is an update to the 49 page v0.1 I published three weeks ago after running the tutorial at EuroPython 2011 in Florence.

Topics covered:

  • Python profiling (cProfile, RunSnakeRun, line_profiler) – find bottlenecks
  • PyPy – Python’s new Just In Time compiler, a note on the new numpy module
  • Cython – annotate your code and compile to C
  • numpy integration with Cython – fast numerical Python library wrapped by Cython
  • ShedSkin – automatic code annotation and conversion to C
  • numpy vectors – fast vector operations using numpy arrays
  • NumExpr on numpy vectors – automatic numpy compilation to multiple CPUs and vector units
  • multiprocessing – built-in module to use multiple CPUs
  • ParallelPython – run tasks on multiple computers
  • pyCUDA – run tasks on your Graphics Processing Unit
  • Other algorithmic choices and options you have
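To give a flavour of the first topic, here is a minimal sketch of profiling a function with the standard library’s cProfile and pstats modules. The function `slow_sum_of_squares` and the helper `profile_call` are made up for this illustration and are not taken from the tutorial’s source code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """A deliberately naive pure-Python loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    # Write the report into a string instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 1_000_000)
    print(value)
    print(report)
```

The report shows where the time goes before you reach for any of the compilers or libraries listed above, which is the point of profiling first.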

The improvement over the last version (v0.1) is that I’ve filled in all the sections now including pyCUDA (there are still a few IAN_TODOs marked, I hope to finish these in a future v0.3). I’ve also added a short section on Algorithmic Choices, linked to the new Cython prange operator and shown the new numpy module in PyPy.

Here’s the video from EuroPython (via pyvideo.org):

The source code is on my github page. The original slides are on slideshare too. If you’re after a challenge then at the end of the report I suggest some ported versions of the code that I’d like to see.

The report is licensed Creative Commons by Attribution (please link back here) – I’ll also happily accept a beer if you meet me in person! If you’re curious about this sort of work then note that I offer A.I. and high performance computing consulting and training via my Mor Consulting.

Update – ShedSkin 0.9 adds faster complex number support. I haven’t added it to the report yet, evidence in the ShedSkin Group suggests it gets closer to the non-complex-number version (i.e. you don’t have to do more work but you get a nice speed boost whilst still using complex numbers).

Update (Nov 2011) – Antonio and Armin posted a note which explains some of the slowness in PyPy and shows how it is competitive under the right conditions. Armin also contributed a C version which shows PyPy to run as fast as C (for their chosen configuration).

