Archives of #Nltk

Social Media Brand Disambiguator first steps

As noted a few days back I’m spending June working on a social-media focused brand disambiguator using Python, NLTK and scikit-learn. This project has grown out of frustrations using existing Named Entity Recognition tools (like OpenCalais and DBPediaSpotlight) to recognise brands in social media messages. These tools are generally trained to work on long-form clean […]

June project: Disambiguating “brands” in Social Media

Having returned from Chile last year, settled in to consulting in London, got married and now on honeymoon I’m planning on a change for June. I’m taking the month off from clients to work on my own project, an open sourced brand disambiguator for social media. As an example this will detect that the following […]

More Python 3.3 downloads than Python 2.7 for past 3 months

Since PyCon 2013 I’ve been in a set of conversations that start with “should I be using Python 3.3 for science work?”. Here’s a recent reddit thread on the subject. Last year I solidly recommended using Python 2.7 for scientific work (as many key libraries weren’t yet supported). I’m on the cusp of changing my […]

Analysing #pydata, London and Brighton tweets for concept mapping

Below I’ve visualised tweets for #PyData conference and the cities of London and Brighton – this builds on my ‘concept cloud‘ from a few days ago at the #PyCon conference. Props to Maksim for his Social Media Analysis tutorial for inspiration. Update – Maksim’s Analying Social Networks tutorial video is online. For the earlier #PyCon […]

Semantic map of PyCon2013 Twitter Topics

Maksim taught a lovely Social Graph Analytics course at PyCon the day before I taught Applied Parallel Computing. I took his demo for a “poor mans LDA/LSI analysis” of a Twitter topic (rather than using full LDA it just uses co-incident hashtags) and added usernames to produce the plot below. Update – Analysing #pydata conference […]

Review for Python Text Processing with NLTK 2.0 Cookbook (Packt, 2010)

Python Text Processing with NLTK 2.0 Cookbook (Amazon US, UK) is a cookbook for Python’s Natural Language Processing Toolkit. I’d suggest that this book is seen as a companion for O’Reilly’s Natural Language Processing with Python (available for free at The older O’Reilly book gives a lot of explanation for how to use NLTK’s […]