All posts of Ian

Layers of “data science”?

The field of “data science” covers a lot of areas. It feels like there’s a continuum of layers to consider, and lumping them all together as “data science” is perhaps less helpful than it could be. Maybe by sharing my list you can help me with further insight. In terms of unlocking value in […]

Do self-driving cars make the courier redundant?

I’ll start with a quote via “Why workers are losing the war against the machines” taken from A Farewell to Alms by economist Gregory Clark: “There was a type of employee at the beginning of the Industrial Revolution whose job and livelihood largely vanished in the early twentieth century. This was the horse. The population […]

Map/Reduce (Disco) on millions of tweets

Whilst working on data sciencey problems for AdaptiveLab I’m becoming more involved in simple visualisations for proof-of-concepts for clients. This ties in nicely with my PyCon Parallel Computing tutorial with Minesh. I’ve been prototyping a Disco map/reduce tutorial (part 2 for PyCon) using tweets collected over the life of SocialTies during 2011-2012. Using 11,645,331 tweets […]

Office social graph connectivity using NetworkX

I wanted an excuse to play with the Python NetworkX graph visualisation library and recently I joined AdaptiveLab to consult on some data science & visualisation problems. Thus formed the question – how were we all connected together? I figured that looking at who follows us all will yield a little insight into the people […]

Testing 3 modern face detection libraries (face.com, OpenCV, libccv)

As a research project months back Balthazar and I tested 3 modern face detection libraries (definitely see Balthazar’s write-up). Face.com had just been acquired by Facebook; they had a great, free service which annotated not just face locations but also sex, age and emotion. We also tested OpenCV (popular and free) and the lesser […]

StartupChile (Round 2.1) all finished, thoughts

The odd thing is that I’ve been trying to write this post for 3 months. Having started and stopped several times (including during the flight back from Chile on Oct 15th) I figure I ought to put something out. The journey was, it turns out, somewhat of a roller-coaster ride. Early in January Kyran Dale […]

Making “from lxml import etree” work with virtualenv (Python)

Update – these steps are overly complicated and *unnecessary*! See fizyk and Marius’ comments below. I’ll leave this post just in case it helps anyone – hopefully anyone coming here will realise it isn’t hard (now) to install lxml, as long as the OS dependencies are installed. I use virtualenv for all development. Recently I […]

EuroSciPy Parallel Python tutorial now online

I taught Parallel Python at EuroSciPy 2012 last week in Brussels and I’ve uploaded all the necessary stuff. In the talk we covered: multiprocessing (built in); parallelpython (an easy shift from multiprocessing to do multi-machine and multi-core processing); gearman (cross-platform job server for heterogeneous job processing); picloud.com (was python only, now any infrastructure cloud-based processing using […]

EuroSciPy2012 Parallel Python tutorial requirements now online

My EuroSciPy 2012 Parallel Python tutorial requirements are online in this github repo. If you’re coming to my tutorial next Thursday please make sure everything is installed beforehand. The repo includes the slides (not quite yet finished) and a ‘solutions/’ directory which you shouldn’t peek at (that’s there in case we run behind in the […]

Kinect depth maps and Python

I had the opportunity to play with a Kinect over the weekend and I wanted to test out depth mapping using the built-in infrared cameras. The structured light approach is different to the stereopsis approach I was looking at with Kyran recently. Using the open source drivers for Ubuntu I quickly got the […]