Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

High Performance Python 1 from PyCon 2012 (slides, video, src)

This is the follow-on to my PyCon 2012 notes from the end post. I gave a 3.5 hour tutorial on High Performance Python 1; below I link to the slides, the video and the source code.

UPDATE2 From October 2014 I’ll be training on High Performance Python and Data Science in London using Python – sign-up here to get on our announce list (no spam, it’ll just be occasional announces).

UPDATE I’m thinking of writing an updated guide (update: High Performance Python is now published by O’Reilly!). If you’re interested in hearing about it please join the High Performance Python Mailing List (I’ve only got a list right now). I’ll make an announce once I know more.

Topics covered:

  1. Profiling with cProfile and line_profiler
  2. Profile visualisations with runsnake
  3. PyPy for quick wins
  4. Cython for C-level speed
  5. ShedSkin for ‘quick wins’ on the right problems
  6. Cython+numpy for multi-core speed-ups (300x on this Mandelbrot problem)
  7. Multiprocessing for multi-core support
  8. ParallelPython for multi-machine support
  9. Numexpr for faster numpy math
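As a rough illustration of the first topic, here is a minimal sketch of profiling a small Mandelbrot-style calculation with cProfile from the standard library. The function names, grid size and iteration cap are my own assumptions for the example; this is not the tutorial's source (that lives in the github repository linked below).

```python
import cProfile
import io
import pstats


def escape_iterations(c, max_iter=50):
    """Count iterations until z = z*z + c escapes |z| > 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot_grid(width=60, height=40, max_iter=50):
    """Evaluate escape_iterations over a small grid of the complex plane."""
    rows = []
    for y in range(height):
        im = -1.0 + 2.0 * y / (height - 1)
        row = []
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)
            row.append(escape_iterations(complex(re, im), max_iter))
        rows.append(row)
    return rows


def profile_grid():
    """Run the grid under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = mandelbrot_grid()
    profiler.disable()
    # Collect the report in a string instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_grid()
    print(report)
```

The report is sorted by cumulative time, so the inner loop function should show up near the top; that is usually the place to start before reaching for any of the compilers listed above.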

The other topics in this high performance track (a part of the tutorial track) are:

and there’s a full set of videos here.

After EuroPython I wrote up my talk with additional material as a 55 page book. I was hoping to update the book this year, but things are moving so fast with our new StrongSteam AI/vision startup (presented at StartupRow at PyCon) that I can’t really justify the time right now. I’ll just link to the High Performance Python book from last year: the timings are out of date (they’re correct in the slides below) and the src is updated a bit, but the method and discussion are still correct.

Github code for HighPerformancePython_PyCon2012.

Slides:

 

Video (3.5 hours) via pyvideo.org:

 


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.
Read More

PyCon 2012 notes from the end

PyCon 2012 is just coming to a close. There were over 2,200 people here and too many talks to choose between. It was a bloody fine conference. Meeting so many of the Names of the Python world was rather grand; teaching High Performance Computing and getting pats on the back for the creation of ShowMeDo was also rather nice.

UPDATE my notes for my High Performance Python 1 tutorial at PyCon 2012 are now online.

I Send A Big Thank You To The Organisers (and I mean all the organisers – including the AV crew who did a very fine job). This was my first US PyCon, it won’t be my last.

On the Thursday morning I ran my High Performance Python 1 tutorial for 60 students. The 3 hours passed in a blur (much as it did at EuroPython last year). I’ll have an updated booklet (here’s last year’s EuroPython booklet) in a couple of weeks. Here’s the github code.

Here’s the 3 hour video of my tutorial: High Performance Python 1 at PyCon 2012:

On Friday Paul Graham gave a nice keynote on startups (“Frighteningly Ambitious Startup Ideas”), which rather set the stage for our attendance with StrongSteam. Keynote:

We won a booth on StartupRow on the Friday in the expo hall and I adorned our stand with some posters and props for our mobile phone demos. With StrongSteam we’re working to give a pair of eyes to mobile phones so phones can ‘see’ the world as a human does. What could you do with an API that lets you build your own Google Goggles?

Kyran and Balthazar had put together some cool demos – OCR on photos of labels to read text and open relevant Wikipedia entries, and also artwork recognition for Shardcore’s art. We won a few offers of angel investment, had an acquisition offer, got some users and found some collaborators. Not bad for the second day at the conference.

One criticism I have is that StartupRow wasn’t advertised. We were given small booths at the back of the hall behind the big shiny stands, so it looked a bit like we were the poor cousins to the ‘proper’ companies. A banner or other announcement priming folk to the idea that we were early stage would have been handy.

Today the expo hall was cleared for the poster session. This was huge, I was very happy to see a wide selection of science and HPC projects along with a handful of companies.

Now I need to sleep before the 4am wakeup for the return flight to Santiago. Then…finishing off our first StrongSteam client and moving towards inviting users into the API.



StartupChile, PyCon, StrongSteam

As ever with a startup – there’s always too much to do and the game is all about juggling burning balls whilst figuring out which shouldn’t be dropped. We’re rather busy here.

Yesterday Emily and I finished the paperwork for our first reimbursement round at StartupChile. This is the part of the process that gets the most complaints from the startups here. We spent 5 hours yesterday preparing our first £5k or so for refund (flights, visas, first month’s rent, various expenses). All going well we’ll get 90% of this money back in a few weeks. Quite possibly we’ll have missed a massively important but otherwise minor detail somewhere and the admin team will reject up to 90% of the receipts (which can be resubmitted in a month) – such horror stories abound from Round 1.

The future expenses that we’ve already paid for like my trip to PyCon (next month) and flights can’t be claimed yet as I’ve yet to attend – we can only reimburse for definitely-spent money. The argument is that we could refund a future plane or conference ticket having already claimed it here through StartupChile, so getting ‘money for nothing’. This means I’m carrying another few thousand pounds of expenses that I can’t refund for at least another 6 weeks. Ho hum. Cashflow is king, I’m glad we had reserves when we flew out here.

Emily notes that the next application round opens soon and I know that Round 3 starts to arrive in a week’s time. I hope everyone who is already here updates the wiki so the obvious newbie questions that we asked don’t get repeated all over again!

Talking of PyCon – I’m pretty excited to be teaching High Performance Computing 1 this year. I’ve made some updates from last year’s course and I’ll get to tell some stories this year as we’re using this tech in StrongSteam. Getting to catch up with Travis (numpy originator), Fijal (numpypy in PyPy) and others will be rather awesome. I’ve also accepted a teaching position for EuroSciPy in August.

StrongSteam continues to develop. We’re still not taking on alpha users; we’re focusing on our first client from London until the end of March and then we’ll invite people to come play with our first bit of tech. In April we release our first iPhone app. It’ll let you take photographs of Latin plant labels at botanical gardens; we’ll then match them using Optical Character Recognition and vision techniques to a database of plants and give you information, pictures and videos (via Wikipedia, GeoSpecies and BBC:Wildlife) in return. We’re working with Kasabi (data partner announce) as our data partner.

Everything is backed by Python. Our third member (Balthazar Rouberol @baltorouberol) joins us this week and he’ll wrap the client API as a Python package so we can start to distribute it to users who have joined our announce list (see our homepage).

We hope to expand this tech to make a similar app for use at the London Science Museum – getting videos and schematics for all the wonderful devices at the Science Museum direct to the smartphone seems like a wonderful way to enhance a trip (Steam Engines puffing! Babbage’s machines calculating!). We’re really excited to see what devs can do once they can reliably match text from labels, plaques and information cards – despite noise, distortion and obstruction – to a database of matching entries. This should make for some fun mobile apps.
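To give a feel for the "match noisy label text to a database" step, here is a minimal sketch using difflib from the Python standard library. It is only an assumption-laden illustration: the tiny plant dictionary, the match_label helper and the 0.6 cutoff are all hypothetical, and this is not how StrongSteam does its matching.

```python
import difflib

# Hypothetical stand-in for a plant database keyed by Latin name.
PLANT_DATABASE = {
    "Quercus robur": "English oak",
    "Ginkgo biloba": "Maidenhair tree",
    "Magnolia grandiflora": "Southern magnolia",
}


def match_label(ocr_text, database=PLANT_DATABASE, cutoff=0.6):
    """Return the (latin_name, info) entry closest to noisy OCR text, or None."""
    # Compare in lower case, but remember the original spelling of each key.
    lookup = {name.lower(): name for name in database}
    # Collapse stray whitespace that OCR output often contains.
    cleaned = " ".join(ocr_text.lower().split())
    candidates = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    if not candidates:
        return None
    name = lookup[candidates[0]]
    return name, database[name]


if __name__ == "__main__":
    # A label read with two character errors still finds its entry.
    print(match_label("Quercvs  rohur"))
    # Text that resembles nothing in the database gives no match.
    print(match_label("zzzz"))
```

A similarity cutoff like this trades false matches against missed ones; a real label reader would presumably tune it against photos with known answers.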

I’m also preparing to declare myself as ‘tribe leader’ for Data Mining here at StartupChile. This means our Data meetups will gather more of the Return Value Agenda points (the points you have to get to qualify for the $40k grant under the programme), and it’ll also give me more reasons to go open doors at the local telecoms companies.



Data mining/AI/robots/hackerspace meet-up this Thursday

This Thursday at 7pm StrongSteam will run a friendly pub meet around:

  • Data mining
  • Artificial Intelligence (AI)
  • Robots
  • Hackerspaces

The goal is to bring people together from StartupChile and the local community who are interested in the above subjects. The meeting is just a pub meetup; if there’s demand then I’ll organise speakers for the next one.

The location is Bar Lastarria, 70 Lastarria, Santiago (map). Here’s a photo:

Confirmed attendees include:

Here’s the official announce.



StartupChile – we have our contracts, StrongSteam progress, PyCon

A few days back we signed our StartupChile contracts, so now we’re official. Apparently our ID cards are available but there’s no word on bank accounts yet. The admin rolls forward but it is a bit boring now. The feeling here is still very positive: we’ve gained some Return Value Agenda (RVA) points by meeting with the local university and StrongSteam runs its first event this week (next post).

In StrongSteam we’ve made progress – we’re now working with Kasabi on an optical character recognition project on Latin plant labels. They have large plant data sets which we’ll marry up with a user’s experience whilst walking around places like Kew Gardens. We’re being interviewed by the BBC on this shortly.

Behind the scenes I’ve extended the python-tesseract wrapper with a nicer access class, which I’ll post to github shortly. It makes it really easy to get characters and co-ordinates from scenes. Image processing tools will be available via StrongSteam to make the task easier.

For March I’ve also bought my PyCon tickets to run my High Performance Computing class. I had no idea it’d take longer to fly from Santiago to Santa Clara than Heathrow to Santiago! It is 20 hours north vs 18 hours west.

