About


This is Ian Ozsvald's blog. I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies App, author of the A.I.Cookbook, author of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps and also a Londoner. Here's a little more about me.


17 April 2013 - 11:38
Visualising London, Brighton and the UK using Geo-Tweets

Recently I’ve been grabbing Tweets for some natural language processing analysis (in Python using NetworkX and NLTK) – see this PyCon and PyData conversation analysis. Using the London dataset (visualised in the PyData post) I wondered if the geo-tagged tweets would give a good-looking map of London. It turns out that they do:

london_all_r1_nomap

You can see the bright centre of London, with the Thames visible wiggling left-to-right through the centre. The black region to the left of the centre is Hyde Park. If you look around the edges you can even see the M25 motorway circling the city. This is about a week’s worth of geo-filtered Tweets from the Twitter 10% firehose. It is easier to find your way around using the following Stamen tiles:

london_all_r5

Can you see Canary Wharf and the O2 arena to its east? How about Heathrow to the west edge of the map? And the string of reservoirs heading north north east from Tottenham?

Here’s a zoom around Victoria and London Bridge. We see a lot of Tweets around the railway stations, Oxford Street and Soho. I’m curious about all the dots in the Thames – presumably people Tweeting about their pleasure trips?

centrallondon_r3_map

Here’s a zoom around the Shoreditch/Tech City area. I was surprised by the cluster of Tweets in the roundabout (Old Street tube station). There’s also a cluster in Bonhill Street (where Google’s Campus is located – I work above there in Central Working). The cluster off Old Street on Rivington Street seems to be at the location of the new and fashionable outdoor eatery spot (with Burger Bear). Further to the east is a more pubby/restauranty area.

london_shoreditch_all

I’ve yet to analyse the content of these tweets (doing something like phrase extraction from the PyCon/PyData tweets onto this map would be great). As such I’m not sure what’s being discussed, probably a bunch of the banal along with chitchat between people (“I’m on my way”…). Hopefully some of it discusses the nearby environment.

I’m using Seth’s Python heatmap (inspired by his lovely visuals). In addition I’m using Stamen map tiles (via OpenStreetMap). I’m using curl to consume the Twitter firehose via a geo-defined area for London, saving the results to a JSON file which I consume later (shout if you’d like the code and I’ll put it in github) – here’s a tutorial.
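The step between the saved JSON file and the heatmap can be sketched roughly as below. This is not the code used for the maps above, just a minimal illustration that assumes one tweet per line in the file, the standard `coordinates` field of the Twitter streaming API (a GeoJSON point in longitude, latitude order) and a hand-picked bounding box for London. The names `tweet_points`, `bin_points` and `LONDON_BBOX` are hypothetical, and the simple grid of counts stands in for whatever the heatmap library does internally.

```python
import json

# Rough bounding box for Greater London as (west, south, east, north).
# These values are an assumption for illustration, not taken from the post.
LONDON_BBOX = (-0.56, 51.26, 0.28, 51.70)


def tweet_points(lines):
    """Yield (lon, lat) for every geo-tagged tweet in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # the streaming API sends blank keep-alive lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip truncated or corrupt records
        coords = tweet.get("coordinates") if isinstance(tweet, dict) else None
        if coords and coords.get("type") == "Point":
            lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
            yield lon, lat


def bin_points(points, bbox, width, height):
    """Count points per cell of a width x height grid; row 0 is the northern edge."""
    west, south, east, north = bbox
    grid = [[0] * width for _ in range(height)]
    for lon, lat in points:
        if not (west <= lon < east and south <= lat < north):
            continue  # outside the area of interest
        col = min(int((lon - west) / (east - west) * width), width - 1)
        up = min(int((lat - south) / (north - south) * height), height - 1)
        grid[height - 1 - up][col] += 1
    return grid
```

With a file saved from curl, something like `bin_points(tweet_points(open("london.json")), LONDON_BBOX, 800, 600)` would give a grid of counts that could be drawn as an image (the filename here is made up).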

During London Fashion Week I grabbed the tagged tweets (for “#lfw” and those mentioning “london fashion week” in the London area). If you zoom on the official event map you’ll see that the primary Tweet locations correspond to the official venue sites.

lfw

What about Brighton? Down on the south coast (about 1 hour on the train south of London), it is where I’ve spent the last 10 years (before my recent move to London). You can see the coastline, also Sussex University’s campus (north east corner). Western Road (the thick line running west a little way back from the sea) is the main shopping street with plenty of bars.

brighton_gps_to0103_nomap

It’ll make more sense with the Stamen tiles. Brighton Marina (south east corner) is clear, along with the small streets in the centre of Brighton:

brighton_gps_to0403_map

Zooming to the centre is very nice. The North Laines are obvious (to the north) and the pedestrianised area below (the “south laines”) is clear too. Further south we see the Brighton Pier reaching into the sea. To the north west on the edge of the map is another cluster inside Brighton Station:

brighton_gps_to0403_map_zoomed

Finally – what about all the geo-tagged Tweets for the UK (annoyingly I didn’t go far enough west to get Ireland)? I’m pleased to see that the entirety of the mainland is well defined. I’m guessing many of the tweets around the coastline come from scenic spots that people visit.

uk_gps_to0404_map_r5_zoomed

How might this compare with a satellite photograph of the UK at night? Population centres are clearly visible but tourist spots are far less visible, and the edge of the country is much less defined (via dailymail):

Europe satellite

I’m guessing we can use these Tweets for:

  • Understanding what people talk about in certain areas (e.g. Oxford Street at rush-hour?)
  • Learning why foursquare check-ins (below) aren’t in the same place as tweet locations (can we filter locations away by using foursquare data?)
  • Seeing how people discuss the weather – is it correlated with local weather reports?
  • Learning if people talk about their environment (e.g. too many cars, poor London tube climate control, bad air, too noisy, shops and signs, events)
  • Seeing how shops, gigs and events are discussed – could we recommend places and events in real time based on their discussion?
  • Figuring out how people discuss landmarks and tourist spots – maybe this helps with recommending good spots to visit?
  • Looking at the trail people leave as they Tweet over time – can we figure out their commute and what they talk about before and after? Maybe this is a sort of survey process that happens using public data?

Here are some other geo-based visualisations I’ve recently seen:

If you want help with this sort of work then note that I run my own AI consultancy. Analysing and visualising social media like Twitter is an active topic for me at present (and will be more so via my planned API at annotate.io).


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

13 Comments | Tags: ArtificialIntelligence, Data science, Entrepreneur, Life, Python

15 April 2013 - 14:03
More Python 3.3 downloads than Python 2.7 for past 3 months

Since PyCon 2013 I’ve been in a set of conversations that start with “should I be using Python 3.3 for science work?”. Here’s a recent reddit thread on the subject. Last year I solidly recommended using Python 2.7 for scientific work (as many key libraries weren’t yet supported). I’m on the cusp of changing my recommendation.

Update: there’s a nice thread on Reddit/r/python discussing what’s required and where the numbers are coming from.

I last looked at the rate of Python downloads via ShowMeDo during 2008 when Python 2.5 was the top dog. The Windows 2.5.1 installer was getting 500,000 downloads a month. In the last 3 months I’m pleasantly surprised to see that Python 3.3 for Windows is downloaded more each month than Python 2.7. We can see:

  • March 2013 Python 3.3 for Windows has 647k downloads vs Python 2.7 with 630k
  • February 2013 Python 3.3 for Windows has 553k downloads vs Python 2.7 with 498k
  • January 2013 Python 3.3 for Windows has 533k downloads vs Python 2.7 with 495k (Python 2.7 less popular since January 2013)
  • December 2012 Python 3.3 for Windows has 412k downloads vs Python 2.7 with 525k
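To make the comparison easier to read, the figures listed above can be turned into a Python 3.3 share of the combined Windows downloads for each month. This is only a small sketch over the four months quoted here; the helper name `py3_share` is made up for illustration.

```python
# (month, Python 3.3 downloads, Python 2.7 downloads), in thousands,
# copied from the list above in chronological order.
downloads = [
    ("December 2012", 412, 525),
    ("January 2013", 533, 495),
    ("February 2013", 553, 498),
    ("March 2013", 647, 630),
]


def py3_share(py3, py2):
    """Return Python 3.3 downloads as a percentage of the combined total."""
    return 100.0 * py3 / (py3 + py2)


for month, py3, py2 in downloads:
    leader = "3.3" if py3 > py2 else "2.7"
    print("%s: Python 3.3 share %.1f%% (Python %s ahead by %dk)"
          % (month, py3_share(py3, py2), leader, abs(py3 - py2)))
```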

These figures only tell a part of the story of course. For Windows you have to download Python. On Linux and Mac it comes pre-installed (so we can’t measure those numbers).

Python 2.7 has been the default on Ubuntu for a while; that’s changing with Ubuntu 13.04. There are two lists of Python-3 compatible packages. Django is on this list and at PyCon there was a how-to-port-to-py3 video. Flask isn’t there yet (update: Armin is tweeting for sprint help for Py3 support), SQLAlchemy is supported (but not MySQL-python) and Fabric isn’t ready yet. For web-dev it seems to be a mixed bag but I’m guessing Python 3 support will be across the board this year.

For scientific use we already have Python-3 compatible numpy, scipy and matplotlib. scikit-learn is ‘nearly’ ported, and Pillow (the recent fork of PIL) is ready for Python 3. NLTK is also being ported.

For scientific use around natural language processing the switch to unicode-by-default looks most attractive (the mix of strings and unicode datatypes has burnt hours for me over the years in Python 2.x). Here’s a PyCon video on the use of Python 3 for text processing and this reviews why Python 3.3 is superior to Python 2.7.
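As a rough illustration of why unicode-by-default helps with text work, here is a small Python 3 sketch. Bytes and text are separate types, so data is decoded once at the boundary and everything after that works on characters. The helper `to_text` and the sample string are made up for this example.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text at the boundary; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"caf\xc3\xa9 in London"  # UTF-8 bytes, e.g. as read from a socket or file
text = to_text(raw)

# Lengths differ because the accented character takes two bytes in UTF-8.
print(len(raw), len(text))

# Python 3 refuses to mix the two types silently, which surfaces bugs early.
try:
    text + raw
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
print("mixing allowed:", mixing_allowed)
```

In Python 2.x the same concatenation would attempt an implicit ASCII decode, which is the kind of behaviour that tends to fail only when non-ASCII data turns up.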

It is slightly too early for me yet to want to switch but I’m starting to experiment. I’ve added some __future__ imports to new code so I know I’m writing Python 2.7 in a 3-like style. I’m also increasingly using Ned Batchelder’s coverage.py via nosetests to make sure I have good coverage. I currently run 2to3 to check that things convert cleanly to Python 3 but rarely run the result with Python 3 (I haven’t needed to do this yet). There’s a set of useful advice on python3porting including various __future__ imports (including division, print_function, unicode_literals, absolute_import).
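A minimal sketch of what the four `__future__` imports named above change is shown here. It is written so the same file behaves identically under Python 2.7 and Python 3; the `mean` function and the variable names are only illustrative.

```python
# Must be the first statement in the module (after any docstring or comments).
from __future__ import absolute_import, division, print_function, unicode_literals


def mean(values):
    """With `division` imported, / is true division even for integers."""
    return sum(values) / len(values)


half = 1 / 2         # true division, as in Python 3
floor_half = 1 // 2  # floor division has to be requested explicitly
label = "caf\u00e9"  # unicode text on both versions thanks to unicode_literals

# print is a function, so this line is valid on both versions.
print("mean:", mean([1, 2]), "label length:", len(label))
```

On Python 3 these imports are accepted and have no effect, so new Python 2.7 code written this way should need fewer changes when it is eventually converted.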



28 Comments | Tags: Life, Python

2 April 2013 - 8:32
Applied Parallel Computing (PyCon 2013 Tutorial) slides and code

Minesh B. Amin (MBASciences) and I (Mor Consulting Ltd) taught Applied Parallel Computing over 3 hours at PyCon 2013. PyCon this year was a heck of a lot of fun: I did the fun run (mentioned below), received one of the 2,500 free Raspberry Pis that were given away, met an awful lot of interesting people and ran two birds-of-a-feather sessions (one on parallel computing following our tutorial, another on natural language processing).

I held posting this entry until the video was ready (it came out yesterday). All the code and slides are in the github repo. Currently (but not indefinitely) there’s a VirtualBox image with everything (Redis, Disco etc) pre-installed.

After the conference, partly as a result of the BoF NLP session I created a Twitter graph “Concept Map” based on #pycon tweets, then another for #pydata. They neatly summarise many of the topics of conversation.

Here’s our room of 60+ students. Slides and video are below:

Applied Parallel Computing PyCon 2013 (left side of room)

Applied Parallel Computing PyCon 2013 (left side)

The video runs for 2 hours 40 minutes:

Here’s a list of our slides:

  1. Intro to Parallelism (Minesh)
  2. Lessons Learned (Ian)
  3. List of Tasks with Mandelbrot set (Ian)
  4. Map/Reduce with Disco (Ian)
  5. Hyperparameter optimisation with grid and random search (Minesh)

These are each of the slide decks:

 

I also had fun in the 5k fun run (coming around 77th of 150 runners). We raised $7k or so for cancer research and the John Hunter Memorial Fund.



5 Comments | Tags: ArtificialIntelligence, Life, Python