Archives of Python

“Introducing Python for Data Science” talk at SkillsMatter

On Wednesday Bart and I spoke at SkillsMatter to 75 Pythonistas with an Introduction to Data Science using Python. A video of the 4 talks is now online. We covered: High Performance Python (profiling, line_profiler, memory_profiler, Cython, Numba); Natural Language Processing and Machine Learning (scikit-learn for brand detection) – based on my longer talk at […]

Future Cities Hackathon (@ds_ldn) Oct 2013 on Parking Usage Inefficiencies

On Saturday six of us attended the Future Cities Hackathon organised by Carlos and DataScienceLondon (@ds_ldn). I counted about 100 people in the audience (see lots of photos, original meetup thread); from asking around, there seemed to be a very diverse skill set (Python and R as expected, lots of Java/C, Excel and other tools). […]

PyConUK 2013

I’m just finishing with PyConUK; it has been a fun 3 days (and the sprints carry on tomorrow). Yesterday I presented a lightly tweaked version of my Brand Disambiguation with scikit-learn talk on natural language processing for social media. I had 65 people in the room (cripes!); 2/3 had used ML or NLP for […]

Writing a High Performance Python book

I’m terribly excited to announce that I’m co-authoring an O’Reilly book on High Performance Python, to be published next year. My co-author is the talented Micha Gorelick (github @mynameisfiber) of bit.ly; he’s already written a few chapters, and I’ll be merging an updated version of my older eBook and adding content based on past tutorials (PyCon […]

EuroSciPy 2013 write-up

The conference is over; tomorrow I’m sticking around to sprint on scikit-learn. As last year, it has been a lot of fun to catch up with colleagues out here in Brussels. Here’s Logilab’s write-up. Yesterday I spoke on Building an Open Source Data Science company. Topics included how companies benefit from open sourcing their tools, […]

Overfitting with a Decision Tree

Below is a plot of training versus testing errors using a precision metric (actually 1.0 - precision, so lower is better) that shows how easy it is to overfit a decision tree to the detriment of generalisation. It is important to check that a classifier isn’t overfitting to the training data such that it is just learning […]
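
A minimal sketch of the check described above (using a synthetic dataset and scikit-learn’s DecisionTreeClassifier, not the post’s actual data or plotting code):

    # Minimal sketch: compare 1.0 - precision on training vs. testing data
    # as a decision tree is allowed to grow deeper. Synthetic data only.
    from sklearn.datasets import make_classification
    from sklearn.metrics import precision_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (1, 2, 4, 8, 16, None):  # None = grow until leaves are pure
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)
        train_err = 1.0 - precision_score(y_train, tree.predict(X_train))
        test_err = 1.0 - precision_score(y_test, tree.predict(X_test))
        print(depth, round(train_err, 3), round(test_err, 3))

Typically the training error keeps falling as the tree deepens while the testing error flattens or rises, which is the overfitting signature the plot illustrates.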

Visualising True Positives and False Positives against Features with scikit-learn

Here I’m starting to look into the errors made in the social media brand disambiguator project. Below I look at true and false positives (correct and mistaken is-a-brand classifications) and plot them against the number of features that two different classifiers can use to calculate their class membership probabilities. First I’m using the default LogisticRegression […]
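
A minimal sketch of the idea (toy texts and labels invented for illustration; the project’s corpus and plotting code aren’t reproduced here):

    # Minimal sketch: split predictions into true and false positives for the
    # is-a-brand class and count how many vectoriser features each document
    # activates. Toy data only; all names here are placeholders.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["love my new apple phone", "baked an apple pie",
             "apple shares are up", "picked an apple from the tree"] * 10
    labels = np.array([1, 0, 1, 0] * 10)  # 1 = is-a-brand

    vectorizer = CountVectorizer(binary=True)
    X = vectorizer.fit_transform(texts)

    clf = LogisticRegression().fit(X, labels)  # default LogisticRegression
    probs = clf.predict_proba(X)[:, 1]         # class membership probabilities
    preds = (probs >= 0.5).astype(int)

    true_pos = (preds == 1) & (labels == 1)    # correct is-a-brand calls
    false_pos = (preds == 1) & (labels == 0)   # mistaken is-a-brand calls
    n_active = np.asarray((X > 0).sum(axis=1)).ravel()  # features per document
    print(n_active[true_pos], n_active[false_pos])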

Visualising the internals of Logistic Regression on a Text Matrix

Below I have some plots that visualise the term matrix (as a binary matrix and as a TF-IDF matrix) for the brand disambiguation project, followed by a visualisation of the coefficients used in scikit-learn’s LogisticRegression classifier with l1 and l2 penalties. Using a CountVectorizer with binary=True, we can mark the absence or presence of a […]
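
A minimal sketch of the two representations and the penalty comparison (toy corpus; the solver choice is an assumption of this sketch, as the post doesn’t name one):

    # Minimal sketch: build binary and TF-IDF term matrices for a toy corpus,
    # then compare LogisticRegression coefficients under l1 and l2 penalties.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["apple releases a new phone", "baked an apple pie",
             "apple shares are up", "picked an apple from the tree"] * 10
    labels = [1, 0, 1, 0] * 10  # 1 = is-a-brand

    X_binary = CountVectorizer(binary=True).fit_transform(texts)  # presence/absence
    X_tfidf = TfidfVectorizer().fit_transform(texts)              # weighted terms

    for name, X in (("binary", X_binary), ("tfidf", X_tfidf)):
        for penalty in ("l1", "l2"):
            # liblinear supports both penalties on small problems (an
            # assumption of this sketch)
            clf = LogisticRegression(penalty=penalty, solver="liblinear")
            clf.fit(X, labels)
            print(name, penalty, clf.coef_)

In general, l1 tends to drive many coefficients exactly to zero while l2 merely shrinks them, which is what a coefficient visualisation like the one described would make visible.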