Blog

Entrepreneurial Geekiness

Visualising True Positives and False Positives against Features with scikit-learn

Here I’m starting to look into the errors caused in the social media brand disambiguator project. Below I look at true and false positives (correct and mistaken is-a-brand classifications) and plot them against the number of features that two different classifiers can use to calculate their class membership probabilities. First I’m using the default LogisticRegression […]

Demonstrating the first Brand Disambiguator (a hacky, crappy classifier that does something useful)

Last week I had the pleasure of talking at both BrightonPython and DataScienceLondon to about 150 people in total (Robin East wrote-up the DataScience night). The updated code is in github. The goal is to disambiguate the word-sense of a token (e.g. “Apple”) in a tweet as being either the-brand-I-care-about (in this case – Apple […]

Open Sourcing “The Screencasting Handbook”

Back in 2010 I released the finished version of my first commercial eBook The Screencasting Handbook. It was 129 pages of distilled knowledge for the budding screencaster, written in part to introduce my (then) screencasting company ProCasts to the world (which I sold years back) and based on experience teaching through ShowMeDo. Today I release […]

Social Media Brand Disambiguator first steps

As noted a few days back I’m spending June working on a social-media focused brand disambiguator using Python, NLTK and scikit-learn. This project has grown out of frustrations using existing Named Entity Recognition tools (like OpenCalais and DBPediaSpotlight) to recognise brands in social media messages. These tools are generally trained to work on long-form clean […]

Thoughts from a month’s backpacking honeymoon

I’m publishing this on the hoof, right now we’re in Istanbul near the end of our honeymoon back home. Here are some app-travelling notes (for our Nexus 4 Androids). Google Translate offers Offline dictionaries for all the European languages, each is 150mb. We downloaded new ones before each country hop. Generally they were very useful, […]

More Python 3.3 downloads than Python 2.7 for past 3 months

Since PyCon 2013 I’ve been in a set of conversations that start with “should I be using Python 3.3 for science work?”. Here’s a recent reddit thread on the subject. Last year I solidly recommended using Python 2.7 for scientific work (as many key libraries weren’t yet supported). I’m on the cusp of changing my […]