Archives of Data science

Flask + mod_uwsgi + Apache + Continuum’s Anaconda

I’ve spent the morning figuring out how to use Flask through Anaconda with Apache and uWSGI on an Amazon EC2 machine, side-stepping the system’s default Python. I’ll log the main steps here; I found lots of hints on the web but nothing that tied it all together for someone like me who lacks Apache config […]

“Introducing Python for Data Science” talk at SkillsMatter

On Wednesday Bart and I gave an Introduction to Data Science using Python to 75 Pythonistas at SkillsMatter. A video of the 4 talks is now online. We covered: High Performance Python (profiling, line_profiler, memory_profiler, Cython, Numba); Natural Language Processing and Machine Learning (scikit-learn for brand detection), based on my longer talk at […]

Future Cities Hackathon (@ds_ldn) Oct 2013 on Parking Usage Inefficiencies

On Saturday six of us attended the Future Cities Hackathon organised by Carlos and DataScienceLondon (@ds_ldn). I counted about 100 people in the audience (see lots of photos, original meetup thread). From asking around, there seemed to be a very diverse skill set (Python and R as expected, lots of Java/C, Excel and other tools). […]

Visualising London, Brighton and the UK using Geo-Tweets

Recently I’ve been grabbing Tweets for some natural language processing analysis (in Python using NetworkX and NLTK); see this PyCon and PyData conversation analysis. Using the London dataset (visualised in the PyData post) I wondered if the geo-tagged tweets would give a good-looking map of London. It turns out that they do: You can […]

PowerPoint: Brief Introduction to NLProc. for Social Media

For my client (AdaptiveLab) I recently gave an internal talk on the state of the art of Natural Language Processing around Social Media (specifically Twitter and Facebook), having spent a few days digesting recent research papers. The area is fascinating (I want to do some work here via my as the text is so […]

Applied Parallel Computing at PyCon 2013 (March)

Minesh B. Amin (MBA Sciences) and I (Mor Consulting) are teaching Applied Parallel Computing at PyCon in San Jose in just over a month; here’s an outline of the tutorial. The conference is sold out but there are still tickets for the tutorials (note that they’re selling quickly too). Typically a recording of the tutorial is […]

Layers of “data science”?

The field of “data science” covers a lot of areas. It feels like there’s a continuum of layers that can be considered, and lumping them all together as “data science” is perhaps less helpful than it could be. Maybe by sharing my list you can help me with further insight. In terms of unlocking value in […]

Map/Reduce (Disco) on millions of tweets

Whilst working on data sciencey problems for AdaptiveLab I’m becoming more involved in simple visualisations for client proofs of concept. This ties in nicely with my PyCon Parallel Computing tutorial with Minesh. I’ve been prototyping a Disco map/reduce tutorial (part 2 for PyCon) using tweets collected over the life of SocialTies during 2011-2012. Using 11,645,331 tweets […]

Office social graph connectivity using NetworkX

I wanted an excuse to play with the Python NetworkX graph library, and I recently joined AdaptiveLab to consult on some data science & visualisation problems. That raised a question: how were we all connected? I figured that looking at who follows us all would yield a little insight into the people […]