About

Ian Ozsvald picture

This is Ian Ozsvald's blog (@IanOzsvald), I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, a Pythonista, co-founder of ShowMeDo and also a Londoner. Here's a little more about me.

High Performance Python book with O'Reilly

View Ian Ozsvald's profile on LinkedIn

ModelInsight Data Science Consultancy London Protecting your bits. Open Rights Group

14 January 2014 - 23:05Installing the numpy module in PyPy

Working on the High Performance Python book (mailing list here for our occasional announces) I’ve reinstalled PyPy a couple of times, each time I forget how to install the numpy module. Note that PyPy’s numpy is different and much smaller than CPython’s numpy. It does however work for smaller problems if you just need some of the core features (i.e. not the libs that numpy wraps). It used to be included in a branch, now it comes as a separate package.

I’m posting this as a reminder to myself and maybe as  a bit of help to another intrepid soul. The numpy PyPy install instructions are in this Nov 2013 blog post. You need to clone their numpy repo and then install it as a regular module using the “setup.py” that’s provided (it takes just a couple of minutes and installs fine). Download PyPy from here and just extract it somewhere.

Once you have pypy you’ll also need pip, follow the get-pip instructions and but use “bin/pypy get-pip.py” and it’ll install pip, you can then use “bin/pip install git+https://bitbucket.org/pypy/numpy.git” as per their instructions.

NOTE if you get “ImportError: No module named _numpypy” after all of this – maybe you’re using pypy3 – as of June 2015 pypy3 doesn’t support numpy.

Having installed it I can:
$ ../bin/pypy 
Python 2.7.3 (87aa9de10f9c, Nov 24 2013, 18:48:13)
[PyPy 2.2.1 with GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``it seems to me that once you
settle on an execution / object model and / or bytecode format, you've already
decided what languages (where the 's' seems superfluous) support is going to be
first class for''
>>>> import numpy as np
>>>> np.__version__
'1.8.0.dev-74707b0'

From here I can use the random module and do various vectorized operations, it isn’t as fast as CPython’s numpy for the Pi example I’m working but it does work. Does anyone know which parts offer comparable speed to its bigger brother?


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

8 Comments | Tags: High Performance Python Book, Python

23 December 2013 - 14:52Progress on High Performance Python book

I figured a short update was in order. Micha (@mynameisfiber, github) and I are progressing on our High Performance Python book, we have a proposed chapter outline below and hope to have a rough cut of some early chapters for January. The book should be finalised ‘earlier in 2014’ though we won’t be drawn on a date just yet.

The book is aimed at an Intermediate Pythonista audience for people who need to go beyond single core performance with CPython 2.7 and who want to use multi-cores, other compilers and clusters. It is not an HPC-Expert level book, though we hope that Python Experts will find some of the chapters to be useful.

We have a mailing list (signup here) which we use very sparingly to update our group about our progress (and soon we’ll be posting the rough-cut chapter notifications via the list). We’re posting once a month or so, you can opt out at any time.

Some people have questioned our focus on CPython 2.7. We ran a survey a few months ago and a couple of hundred people told us that they mainly use CPython 2.7 for their number crunching work, so that’s the focus of the book. Almost everything will or should run in Python 3.3+ with little or no change so we’re not worrying about the smaller differences.

Planned chapter list (this might be subject to change):
  • Understanding Performance Programming (high level introduction)
  • Profiling Python code (focusing on CPU and RAM profiling)
  • Pure Python (mainly CPython 2.7)
  • Matrix Computation with numpy
  • Disks and Network Processing
  • Calculating in Parallel (introduction)
  • Multiprocessing (multi-core on a single machine)
  • Cluster Computing (commodity clusters)
  • Just In Time Compiling
  • Static Compiling
  • Using less RAM
  • Lessons from the Field

 


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

2 Comments | Tags: High Performance Python Book, Python