All posts by Ian

A tiny foray into Apache Spark & Python

I’ve spent an afternoon playing with Apache Spark (1.0.1) to start forming an opinion on where it might be useful. Here are a couple of notes. We’re discussing this at PyDataLondon tonight. UPDATE: I cover PySpark 1.2, ElasticSearch and PyPy in 2015. You can run Spark out of the box on Linux (I’m using Ubuntu 13.10) […]

Python Training courses: Data Science and High Performance Python coming in October

I’m pleased to say that via our ModelInsight we’ll be running two Python-focused training courses in October. The goal is to give you strong new research & development skills; they’re aimed at folks in companies but would suit folks in academia too. UPDATE: training courses ready to buy (1 Day Data Science, 2 Day High […]

IPython Memory Usage interactive tool

I’ve written a tool (ipython_memory_usage) to help my colleague and me understand how RAM is allocated for large matrix work. It’ll work for any large memory allocation (numpy, regular Python or whatever), and the allocs/deallocs are reported after every command. Here’s an example – we make a matrix of 10,000,000 elements costing 76MB and […]
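As a quick sanity check on the 76MB figure (this is not code from ipython_memory_usage itself), here is a minimal sketch assuming the matrix uses numpy's default float64 dtype at 8 bytes per element. 10,000,000 × 8 bytes is 80,000,000 bytes, which comes to about 76 when counted in 2**20-byte units (MiB):

```python
import numpy as np

# np.ones defaults to float64, i.e. 8 bytes per element
arr = np.ones(10_000_000)

n_bytes = arr.nbytes      # 10,000,000 * 8 = 80,000,000 bytes
n_mib = n_bytes / 2**20   # 1 MiB = 1,048,576 bytes

print(f"{arr.size:,} elements, {n_bytes:,} bytes, {n_mib:.1f} MiB")
```

This prints `10,000,000 elements, 80,000,000 bytes, 76.3 MiB`, matching the "76MB" quoted in the post.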

Second PyDataLondon Meetup, a Javascript/Analytic-tastic event

This week we ran our 2nd PyDataLondon meetup (@PyDataLondon); we had 70 in the room and a rather techy set of talks. As before, we were hosted by Pivotal (@gopivotal) via Ian – many thanks for the beer and pizza! I took everyone to the pub afterwards for a beer on our data science consultancy to […]

PyDataLondon second meetup (July 1st)

Our second PyDataLondon meetup will be running on Tuesday July 1st at Pivotal in Shoreditch. The announcement went out to the meetup group and the event was at capacity within 7 hours – if you’d like to attend future meetups please join the group (the wait-list is open for our next event). Our speakers: […]

High Performance Python manuscript submitted to O’Reilly

I’m super-happy to say that Micha and I have submitted the manuscript to O’Reilly for our High Performance Python book. Here’s the final chapter list: Understanding Performant Python; Profiling to find bottlenecks (%timeit, cProfile, line_profiler, memory_profiler, heapy and more); Lists and Tuples (how they work under the hood); Dictionaries and Sets (under the hood again) […]

Flask + mod_uwsgi + Apache + Continuum’s Anaconda

I’ve spent the morning figuring out how to use Flask through Anaconda with Apache and uWSGI on an Amazon EC2 machine, side-stepping the system’s default Python. I’ll log the main steps here; I found lots of hints on the web but nothing that tied it all together for someone like me who lacks Apache config […]

7 chapters of “High Performance Python” now live

O’Reilly have just released another update to our High Performance Python book; in total we’ve now released the following: Understanding Performant Python; Profiling to find bottlenecks (%timeit, cProfile, line_profiler, memory_profiler, heapy); Lists and Tuples; Dictionaries and Sets; Iterators and Generators; Matrix and Vector Computation (numpy and scipy); Compiling to C (Cython, Shed Skin, Pythran, Numba, […]

First PyDataLondon meetup done, preparing the second

Last night we ran our first PyDataLondon meetup (@PyDataLondon). We had 80 data-focused Pythonistas in the room; co-organiser Emlyn led the talks, followed by a great set of Lightning Talks. Pivotal provided a cool venue (thanks Ian Huston!) with lovely pizza and beer in central Shoreditch – we’re much obliged to you. This was a […]

New High Performance Python chapters online & teaching a 2-day course on HPC

The last month has been crazy busy, not least because I got to run my first High Performance Python 2-day tutorial at a university. I was out at Aalborg University teaching a PhD group, and we covered four blocks: Profiling (CPU and RAM); Compilers and JITs; Multi-core and distributed; Using less RAM, storage systems and […]