This is Ian Ozsvald's blog (@IanOzsvald). I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, a Pythonista, co-founder of ShowMeDo and also a Londoner.


9 July 2013 - 13:55

Some Natural Language Processing and ML Papers

After I spoke at DataScienceLondon in June I was given a set of paper references by a couple of people (the bulk came from Levente Török); thanks to all. They’re listed below. Along the same lines, I have one machine learning paper aimed at beginners to recommend: “A Few Useful Things to Know about Machine Learning” by Pedro Domingos. It gives a set of real-world examples to work from, which is useful for someone short on experience who wants to learn whilst avoiding some of the worst mistakes.

Selection of references in no particular order:

Deep Learning for Efficient Discriminative Parsing, Ronan Collobert

A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Ronan Collobert

Latent Dirichlet Allocation (old article)
Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation

Rethinking LDA: Why priors matter (how to tune the hyperparameters which shouldn’t matter)
Dynamic Topic Models and the Document Influence Model (in which they deal with changes in the hidden topics (HMM))

Semi-supervised topic model notes:

Semi-supervised Extraction of Entity Aspects using Topic Models

Hierarchically Supervised Latent Dirichlet Allocation

Bridging the large gap between topic models and the bag-of-words approach:

Beyond Bag of words (presentation)

A note on Topical N-grams

PCFGs, Topic Models

Integrating Topics with Syntax

Syntactic Topic Models

Collective Latent Dirichlet Allocation (might be useful for Tweet collections)

R packages (from Levente):

topicmodels for R

lda for R

R Text Tools package (noted as the most advanced package; the website was offline when I visited it)


