Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

Installing the numpy module in PyPy

Working on the High Performance Python book (mailing list here for our occasional announcements) I’ve reinstalled PyPy a couple of times, and each time I forget how to install the numpy module. Note that PyPy’s numpy is different from, and much smaller than, CPython’s numpy. It does however work for smaller problems if you just need some of the core features (i.e. not the libs that numpy wraps). It used to be included in a branch; now it comes as a separate package.

I’m posting this as a reminder to myself and maybe as a bit of help to another intrepid soul. The numpy PyPy install instructions are in this Nov 2013 blog post. You need to clone their numpy repo and then install it as a regular module using the “setup.py” that’s provided (it takes just a couple of minutes and installs fine). Download PyPy from here and just extract it somewhere.

Once you have pypy you’ll also need pip. Follow the get-pip instructions but use “bin/pypy get-pip.py” to install pip; you can then use “bin/pip install git+https://bitbucket.org/pypy/numpy.git” as per their instructions.

NOTE: if you get “ImportError: No module named _numpypy” after all of this, maybe you’re using pypy3. As of June 2015 pypy3 doesn’t support numpy.
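If you're not sure which interpreter you ended up running, a quick check along these lines can help. This is only a sketch: the function name and the messages are made up for illustration, and the pypy3 remark simply repeats the June 2015 note above rather than checking anything about the current state of PyPy.

```python
import platform
import sys


def numpy_hint(implementation=None, major=None):
    """Return a short hint about which numpy install route applies.

    Hypothetical helper: defaults to the running interpreter, but both
    values can be passed in so the logic is easy to try out.
    """
    implementation = implementation or platform.python_implementation()
    major = major or sys.version_info[0]
    if implementation == "PyPy" and major >= 3:
        # Per the note in the post (June 2015), not re-verified here.
        return "pypy3: no numpy support as of June 2015"
    if implementation == "PyPy":
        return "pypy (2.x): install the separate PyPy numpy package"
    return "not PyPy: the regular numpy package applies"


if __name__ == "__main__":
    print(numpy_hint())
```

Running it with “bin/pypy” should report the pypy (2.x) case for the build shown below.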

Having installed it I can:
$ ../bin/pypy 
Python 2.7.3 (87aa9de10f9c, Nov 24 2013, 18:48:13)
[PyPy 2.2.1 with GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``it seems to me that once you
settle on an execution / object model and / or bytecode format, you've already
decided what languages (where the 's' seems superfluous) support is going to be
first class for''
>>>> import numpy as np
>>>> np.__version__
'1.8.0.dev-74707b0'

From here I can use the random module and do various vectorized operations. It isn’t as fast as CPython’s numpy for the Pi example I’m working on but it does work. Does anyone know which parts offer comparable speed to its bigger brother?
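The Pi example itself isn't shown in this post, so the following is an assumption about its general shape: a Monte Carlo estimate that throws random points at the unit square and counts those landing inside the quarter circle. The function names are illustrative. The pure-Python version needs only the standard library; the numpy version assumes that np.random.seed and np.random.rand are available in whichever numpy you have installed, which I haven't checked for every PyPy numpy build.

```python
import math
import random


def estimate_pi_pure(n, seed=0):
    """Monte Carlo estimate of pi using only the standard library."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def estimate_pi_numpy(n, seed=0):
    """Same estimate using vectorized numpy operations.

    Assumes np.random.seed and np.random.rand exist in the installed
    numpy (true for CPython's numpy; an assumption for PyPy's port).
    """
    import numpy as np
    np.random.seed(seed)
    xs = np.random.rand(n)
    ys = np.random.rand(n)
    inside = np.sum(xs * xs + ys * ys <= 1.0)
    return 4.0 * float(inside) / n


if __name__ == "__main__":
    print(estimate_pi_pure(100000), math.pi)
```

Timing both functions under CPython and PyPy is one rough way to see where the smaller numpy keeps up and where it doesn't.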


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

PyData London (Feb 21-23 2014)

PyData is coming to London and this’ll be the first PyData in Europe. The conference will focus on Python for Data Analytics, quite like SciPy and EuroSciPy but with a bit more of a focus on business rather than science (but only a bit, I rather like the science).

170 videos from past conferences are available free online, and all the content from this conference will go online afterwards. Obviously a big part of the conference is meeting people in-person; it’ll be held in Level39 in Canary Wharf at One Canada Square.

We’re looking for talk submissions. The timeline is short so please respond to the Call for Proposals by the end of Jan (that’s 2.5 weeks away). Submissions around data analysis, “big data” (“interesting data” is more what I’d like to see) and visualisation would be ace; tools like numpy, Pandas, Cython, Numba, IPython, Vincent etc are obviously all very relevant.

Sponsorship for a London event is rather reasonable. If you’re looking to hire or raise your profile with London data/Python people then contact Leah via the Admin address on the sponsorship page. You’ll get good exposure for less than the cost of hiring a recruiter and you’ll have direct access to people who help write the tools that we all use.



Progress on High Performance Python book

I figured a short update was in order. Micha (@mynameisfiber, github) and I are progressing on our High Performance Python book. We have a proposed chapter outline below and hope to have a rough cut of some early chapters for January. The book should be finalised ‘earlier in 2014’ though we won’t be drawn on a date just yet.

The book is aimed at Intermediate Pythonistas who need to go beyond single core performance with CPython 2.7 and who want to use multi-cores, other compilers and clusters. It is not an HPC-Expert level book, though we hope that Python Experts will find some of the chapters to be useful.

We have a mailing list (signup here) which we use very sparingly to update our group about our progress (and soon we’ll be posting the rough-cut chapter notifications via the list). We’re posting once a month or so, you can opt out at any time.

Some people have questioned our focus on CPython 2.7. We ran a survey a few months ago and a couple of hundred people told us that they mainly use CPython 2.7 for their number crunching work, so that’s the focus of the book. Almost everything will or should run in Python 3.3+ with little or no change so we’re not worrying about the smaller differences.

Planned chapter list (this might be subject to change):
  • Understanding Performance Programming (high level introduction)
  • Profiling Python code (focusing on CPU and RAM profiling)
  • Pure Python (mainly CPython 2.7)
  • Matrix Computation with numpy
  • Disks and Network Processing
  • Calculating in Parallel (introduction)
  • Multiprocessing (multi-core on a single machine)
  • Cluster Computing (commodity clusters)
  • Just In Time Compiling
  • Static Compiling
  • Using less RAM
  • Lessons from the Field

 



“Introducing Python for Data Science” talk at SkillsMatter

On Wednesday Bart and I spoke at SkillsMatter to 75 Pythonistas with an Introduction to Data Science using Python. A video of the 4 talks is now online. We covered:

Since the group is more of a general programming community we wanted to talk at a high level on the various ways that Python can be used for data science. It was lovely to have such a large turn-out and the following pub conversation was much fun.



What confusion leads from self driving vehicles and their talking to each other?

This is a light follow-up from my “Do self driving cars make the courier redundant?” post from January. I’m wondering which first- and second-order effects occur from self-driving cars talking to each other.

Let’s assume they can self-drive and self-park and that they have some ability to communicate with each other. Sharing their speed and intent should help self-driving cars make better use of the road (they could drive closer together). They could quickly signal a failure (e.g. “My brake readings have just become odd – everyone pull back! I’m slowing using the secondary brake system”), and they could signal that e.g. they intend to reverse park so that other cars slow further back along the road to avoid having to halt. It is hard to see how a sensibly designed system of self-driving cars could be worse than a similar sized pack of normal humans (who might be tired, overconfident, in a rush etc) behind the wheel.

Would cars deliberately lie? There are many running jokes about drivers (often “elsewhere” in the world) who signal one way and then exploit nearby gaps regardless of their signalled intention. Might cars do the same? By design or by poor coding? I’d guess people might mod their driving computer to help them get somewhere faster. Maybe they’d ask it to be less cautious in its manoeuvres (taking turns quicker, leaving less distance to other vehicles) or to hypermile more closely than a human would. Manufacturers would fight back as these sorts of modifications would increase their liabilities and accidents would damage their brand.

What about poorly implemented protocols? On the Internet with TCP/IP we suffer from bufferbloat: many intermediate devices between a packet’s source and destination have buffers of varying sizes, and they all queue packets to manage traffic, but we end up with lower throughput and odd jams that are rather unpredictable and contrary to the design goal. Cars could have poor implementations of communication protocols (just as some smartphones and laptop brands have trouble with certain WiFi routers), so they’d fail to talk or maybe talk with errors.

Maybe cars would not communicate directly but would implement some boids-like behaviours based on local sensing (probably more robust but also less efficient due to no longer-range negotiation). Even so local odd behaviours might emerge – two cars backing off from each other, then accelerating to close the gap, then repeating – maybe a group of cars get into an unstable ‘dance’ whilst driving down the motorway. This might only be visible from the air and would look rather inhuman.
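To make the ‘dance’ idea a bit more concrete, here is a toy simulation. It is a sketch under loud assumptions and not a model of any real vehicle controller: one follower car tracks a leader travelling at constant speed, and it accelerates in proportion to how far its gap is from a target gap, with no damping. All names and numbers are invented for illustration. With a rule this naive the gap never settles; it keeps overshooting the target in both directions.

```python
def simulate_follower(k=0.5, target_gap=20.0, start_gap=25.0,
                      lead_speed=30.0, dt=0.1, steps=600):
    """Return the gap (metres) at each time step between leader and follower.

    The follower's acceleration is k * (gap - target_gap) with no damping
    term: a deliberately naive local rule, assumed purely for illustration.
    """
    gap = start_gap
    speed = lead_speed
    gaps = []
    for _ in range(steps):
        speed += k * (gap - target_gap) * dt   # speed up when too far back
        gap += (lead_speed - speed) * dt       # gap closes when follower is faster
        gaps.append(gap)
    return gaps


def count_crossings(gaps, target_gap=20.0):
    """Count how often the gap passes through the target value."""
    crossings = 0
    for prev, cur in zip(gaps, gaps[1:]):
        if (prev - target_gap) * (cur - target_gap) < 0:
            crossings += 1
    return crossings


if __name__ == "__main__":
    gaps = simulate_follower()
    print(count_crossings(gaps), min(gaps), max(gaps))
```

A damping term (reacting to the difference in speed as well as the gap) would be the usual way to calm this down, which hints at how much the detail of the local rule matters.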

Presumably self-driving cars would have to avoid hitting humans at all costs. This might make humans less observant as they cross the road – why look if you know that a car is always anticipating (and avoiding) your arrival into the road? This presumably leaves self-driving cars at the mercy of mischievous humans – leaving out human-like dolls in the road that cause slow-and-avoid behaviours, just for kicks.

Governments are likely to introduce some kind of control overrides into the cars in the name of safety and national security (NSA/GCHQ – looking at you). This is likely to be as secure as the “unbreakable” DVD encryption, since any encryption system released into the wild is subject to various attacks. Having people steal cars or subvert their behaviours once the backdoors and overrides are noticed seems inevitable.

I wonder what sort of second order effects we’d see? I suspect that self-driving delivery vehicles would shift to more night work (when the roads are less congested and possibly petrol is dynamically priced to be cheaper), so roads could be less congested by day (and so could be filled by more humans as they commute longer distances to work?). Maybe people en masse forget how to drive? More people will never have to drive a car, so we’d need fewer driving instructors. Maybe we’d need fewer parking spaces as cars could self-park elsewhere and return when summoned. Maybe the addition of intelligence helps us use parking resources more efficiently?

If we have self-driving trucks then maybe the cost of removals and deliveries drops. No longer would I need to hire a large truck with a driver; instead the truck would drive itself (it’d still need loading/unloading of course). This would mean fewer people taking the larger-vehicle licensing exams, so fewer test centres (just as for driving schools) would be needed.

An obvious addition – if cars can self-drive then repair centres don’t need to be small and local. Whither the local street of car mechanics (inevitably of varying quality and, sadly, honesty)? I’d guess larger, out of town centralised garages more closely monitored by the manufacturers will surface (along with a fleet of pick-up trucks for broken-down vehicles). What happens to the local street of car mechanic shops? More hackspaces and assembly shops? Conversion to housing seems more likely.

If we need fewer parking spaces (e.g. in Hove [1927 photo!] there are huge boulevards, see Grand Avenue lanes here) then maybe we get more cycle lanes and maybe we can repurpose some of the road space for other usages such as communal green patches (for kids and/or for growing stuff?).

The NYTimes has a good article on how driverless cars could reshape cities.

Charles Stross has a nice thread on geo-political consequences of self-driving cars. One comment alludes to improved social lives: if we can get to and from a party/restaurant/pub/nice social scene very easily (without e.g. hoping for the last Tube train home in London or a less pleasant bus journey), maybe our social dimension increases? The comment on flying vs driving is interesting. You’d probably drive further rather than fly if you could sleep for much of the journey, so that hurts flight companies and increases the burden on road maintenance (but maybe preserves motorway service stations that might otherwise get less business since you’d be less in need of a break if you’re not concentrating on driving all the time!).

Hmmm…drone networks look like they might do interesting things for delivery to non-road locations, but drones have a limited range. What about coupling an HGV ‘mother truck’ with a drone fleet for the distribution of goods to remote locations, with the ‘mother truck’ containing a generator and a large storage unit of stuff-to-distribute? I’m thinking about feeding animals in winter that are stuck in fields, reaching hurricane survivors, more extreme running races (and hopefully helping to avoid deaths) or even supplying people living out of cities and in remote areas (maybe Amazon-by-drone deliveries whilst living up a mountain become feasible?).

