Some Natural Language Processing and ML Papers

After I spoke at DataScienceLondon in June I was given a set of paper references by a couple of people (the bulk came from Levente Török) – thanks to all. They’re listed below. Along the same lines I have one machine learning paper aimed at beginners to recommend: “A Few Useful Things to Know about Machine Learning” by Pedro Domingos. It gives a set of real-world examples to work from, useful for someone short on experience who wants to learn whilst avoiding some of the worst mistakes.

A selection of references, in no particular order:

Deep Learning for Efficient Discriminative Parsing, Ronan Collobert

A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Ronan Collobert

Latent Dirichlet Allocation (the original article)
Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation

Rethinking LDA: Why priors matter (on tuning the hyperparameters that supposedly shouldn’t matter)
Dynamic Topic Models and the Document Influence Model (in which the hidden topics are allowed to change over time, HMM-style)
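
Since most of the references above build on plain LDA, here is a minimal sketch of fitting a two-topic model, just for orientation. I’ve used Python and gensim here as an assumption on my part (gensim isn’t in the reference list – the R packages further down cover the same ground), and the toy corpus and topic count are purely illustrative:

```python
# Minimal LDA sketch using gensim (my choice, not from the reference list).
from gensim import corpora, models

# A toy corpus: each document is a list of tokens (purely illustrative).
texts = [
    ["topic", "model", "lda", "dirichlet", "prior"],
    ["gibbs", "sampling", "lda", "inference"],
    ["syntax", "parsing", "pcfg", "grammar"],
    ["parsing", "neural", "network", "syntax"],
]

dictionary = corpora.Dictionary(texts)                 # token -> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors

# Fit a two-topic LDA model; the Dirichlet priors (alpha/eta) are the
# hyperparameters that "Rethinking LDA: Why priors matter" discusses tuning.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```

The R packages listed at the end (topicmodels, lda) provide roughly the same workflow if you’d rather stay in R.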

Semi-supervised topic model notes:

Semi-supervised Extraction of Entity Aspects using Topic Models

Hierarchically Supervised Latent Dirichlet Allocation

Moving topic models beyond the bag-of-words assumption:

Beyond Bag of words (presentation)

A note on Topical N-grams

PCFGs, Topic Models

Integrating Topics with Syntax

Syntactic Topic Models

Collective Latent Dirichlet Allocation (might be useful for Tweet collections)

R packages (from Levente):

topicmodels for R

lda for R

RTextTools package for R (noted as the most advanced package, though its website was offline when I visited)


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.
