After I spoke at DataScienceLondon in June, a couple of people gave me a set of paper references (the bulk came from Levente Török) – thanks to all. They’re listed below. Along the same lines, I have one machine learning paper aimed at beginners to recommend: “A Few Useful Things to Know about Machine Learning” by Pedro Domingos. It gives a set of real-world examples to work from, which is useful for someone short on experience who wants to learn whilst avoiding some of the worst mistakes.
Selection of references in no particular order:
Deep Learning for Efficient Discriminative Parsing, Ronan Collobert
A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Ronan Collobert
Latent Dirichlet Allocation (old article)
Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation
Rethinking LDA: Why priors matter (how to tune the hyperparameters that shouldn’t matter)
Dynamic Topic Models and the Document Influence Model (in which they deal with changes in the hidden topics (HMM))
Semi-supervised topic model notes:
Semi-supervised Extraction of Entity Aspects using Topic Models
Hierarchically Supervised Latent Dirichlet Allocation
Bridging the large gap between topic models and the bag-of-words approach:
Beyond Bag of words (presentation)
Integrating Topics with Syntax
Collective Latent Dirichlet Allocation (might be useful for Tweet collections)
R packages (from Levente):
R Text Tools package (noted as most advanced package, website offline when I visited it)
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.