Entrepreneurial Geekiness

Some Natural Language Processing and ML Papers

After I spoke at DataScienceLondon in June I was given a set of paper references by a couple of people (the bulk were by Levente Török) – thanks to all. They’re listed below. Along the same lines I have one machine learning paper aimed at beginners to recommend (“A Few Useful Things to Know about […]

Ian

13 years ago

Overfitting with a Decision Tree

Below is a plot of Training versus Testing errors using a Precision metric (actually 1.0-precision, so lower is better) that shows how easy it is to over-fit a decision tree to the detriment of generalisation. It is important to check that a classifier isn’t overfitting to the training data such that it is just learning […]

Ian

13 years ago
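
The error measure mentioned in the decision tree teaser above (1.0-precision, lower is better) can be sketched in plain Python. This is only a minimal illustration, not the code from the post: the function name `one_minus_precision`, the labels and the predictions are all made up, and treating "no positive predictions" as the worst case is an assumption.

```python
def one_minus_precision(y_true, y_pred, positive=1):
    """Return 1.0 - precision for the positive class (lower is better)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if p == positive and t == positive)
    false_pos = sum(1 for t, p in zip(y_true, y_pred)
                    if p == positive and t != positive)
    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        # Precision is undefined with no positive predictions;
        # returning the worst score here is an assumption of this sketch.
        return 1.0
    return 1.0 - true_pos / predicted_pos


# Hypothetical labels and predictions, invented for illustration only.
train_true = [1, 1, 0, 0, 1, 0]
train_pred = [1, 1, 0, 0, 1, 0]  # the model reproduces the training labels
test_true = [1, 0, 1, 0, 0, 1]
test_pred = [1, 1, 1, 1, 0, 0]   # two false positives on unseen data

train_error = one_minus_precision(train_true, train_pred)
test_error = one_minus_precision(test_true, test_pred)
print("train:", train_error, "test:", test_error)
```

A training error near zero alongside a much larger testing error is the gap that signals overfitting.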

Visualising True Positives and False Positives against Features with scikit-learn

Here I’m starting to look into the errors caused in the social media brand disambiguator project. Below I look at true and false positives (correct and mistaken is-a-brand classifications) and plot them against the number of features that two different classifiers can use to calculate their class membership probabilities. First I’m using the default LogisticRegression […]

Ian

13 years ago

Visualising the internals of Logistic Regression on a Text Matrix

Below I have some plots that visualise the term matrix (as a binary matrix and as a TF-IDF matrix) for the brand disambiguation project followed by a visualisation of the coefficients used in scikit-learn’s LogisticRegression classifier using l1 and l2 penalties. Using a CountVectorizer with binary=True we can mark the absence or presence of a […]

Ian

13 years ago
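
To make the "absence or presence" idea in the teaser above concrete, here is a small plain-Python stand-in for a binary term matrix. It is not scikit-learn's CountVectorizer, only a sketch of the same idea: the helper names, the whitespace tokenisation and the example documents are assumptions made for illustration.

```python
def build_vocabulary(docs):
    """Collect the sorted set of lower-cased, whitespace-split tokens."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def term_matrix(docs, vocab, binary=False):
    """Build one row per document and one column per vocabulary term.

    With binary=False each cell holds the term count; with binary=True
    each cell is 1 if the term is present in the document, else 0.
    """
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc.lower().split():
            row[index[tok]] += 1
        if binary:
            row = [1 if count > 0 else 0 for count in row]
        rows.append(row)
    return rows


# Hypothetical documents, invented for illustration only.
docs = ["apple apple pie", "apple iphone"]
vocab = build_vocabulary(docs)
counts = term_matrix(docs, vocab)
presence = term_matrix(docs, vocab, binary=True)
print(vocab)
print(counts)
print(presence)
```

The binary form discards how often a term occurs, which is often acceptable for short texts such as tweets where most terms appear at most once.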

Demonstrating the first Brand Disambiguator (a hacky, crappy classifier that does something useful)

Last week I had the pleasure of talking at both BrightonPython and DataScienceLondon to about 150 people in total (Robin East wrote up the DataScience night). The updated code is on GitHub. The goal is to disambiguate the word-sense of a token (e.g. “Apple”) in a tweet as being either the-brand-I-care-about (in this case – Apple […]

Ian

13 years ago

Active Countermeasures for Privacy in a Social Networking age?

This is a bit of a rambling post covering some thoughts on data privacy, mobile phones and social networking. A general and continued decrease in personal privacy seems inevitable in our age of data (NSA Files at The Guardian). We generate a lot of data, we rarely know how or where it is stored and […]

Ian

13 years ago

Open Sourcing “The Screencasting Handbook”

Back in 2010 I released the finished version of my first commercial eBook The Screencasting Handbook. It was 129 pages of distilled knowledge for the budding screencaster, written in part to introduce my (then) screencasting company ProCasts to the world (which I sold years back) and based on experience teaching through ShowMeDo. Today I release […]

Ian

13 years ago

Social Media Brand Disambiguator first steps

As noted a few days back I’m spending June working on a social-media focused brand disambiguator using Python, NLTK and scikit-learn. This project has grown out of frustrations using existing Named Entity Recognition tools (like OpenCalais and DBPediaSpotlight) to recognise brands in social media messages. These tools are generally trained to work on long-form clean […]

Ian

13 years ago

Thoughts from a month’s backpacking honeymoon

I’m publishing this on the hoof; right now we’re in Istanbul, near the end of our honeymoon, heading back home. Here are some app-travelling notes (for our Nexus 4 Androids). Google Translate offers offline dictionaries for all the European languages; each is 150MB. We downloaded new ones before each country hop. Generally they were very useful, […]

Ian

13 years ago

June project: Disambiguating “brands” in Social Media

Having returned from Chile last year, settled into consulting in London, got married and now being on honeymoon, I’m planning a change for June. I’m taking the month off from clients to work on my own project, an open-sourced brand disambiguator for social media. As an example this will detect that the following […]

Ian

13 years ago
