Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

Coaching

Training

Jobs

Products

Consulting

PyPy commercial support now available by core devs

Python March 9, 2014

I meant to note this a week or so back – some of the core PyPy dev team (Fijal and Armin) have put together a new consultancy focused on PyPy commercial support. The move was warmly received on reddit. They aim to provide training, tuning and custom work and have a team in various bits of the world. I look forward to watching this develop.

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

PyDataLondon 2014 Write-up

pydata, Python February 24, 2014

We’ve just drawn PyDataLondon 2014 to a close, it has been a wonderfully successful weekend. The growth of Python’s use for data science in the last few years here in the UK is pretty phenomenal. Many thanks to Continuum Analytics and NumFocus for backing and organising the PyData conferences.

“Start of the week after busy weekend attending 1st #PyData London. Thanks to organisers & speakers: a smashing set of talks & gr8 community.” @lindauruchurtu

“If it weren’t for the succession of great talks at #pydata Ldn, I’d be getting quite upset about the AUS v SA test. Thank you @PyDataConf” @davisjmcc

We’ve had a fab weekend and a packed schedule with training and varied talks including:

Two great keynotes (Deutsche Börse finance by Felix Fernandez and ‘big data’ brain research on a budget by Gael Varoquaux)
Machine learning (mostly scikit-learn) and text processing
Lots of visualisation (mostly javascript)
Practical discussion of how and why things work (including some hard-won lessons on processes and statistics) and strong lessons on mistakes to avoid
Art and economics
Lots of IPython Notebook
Some R and Matlab (we need more of this!)
Great lightning talks including live rocket-science-in-the-Notebook to close the weekend
Speakers from both industry (including BAE, the Met Office and Hedge Funds through to fresh startups) and academia stretching through Europe

One outcome from Gael’s keynote was the importance of citing the open source projects that get used to help highlight their need for funding and resources:

“next time you write an article that uses scikit-learn and friends, cite the software you use, that will help authors, eg get funding #pydata” – @dimazest

I ran a panel asking “Shouldn’t more companies be using data science?” – the deliberately loaded question was addressed by a a range of industrial representatives including James from New York, Jonathan, Johnny, Dirk, Ian and Philip. The short answer seemed to be that more companies were taking risks (and winning the rewards) of analysing their data and that some more training (both for scientists and for managers) could help things along.

“#PyData panel first question: have you done anything data analysis related within the last six month? (half the room raises hands)” James Powell

Through my Mor Consulting I talked on The Landscape of High Performance Python by taking a look at profiling techniques and compiler options for single-machine multi-core speed-ups, obviously this is somewhat connected to the High Performance Python book I’m working on (hopefully an early release of the first chapters will be out shortly).

“Like @ianozsvald ‘s ‘team velocity’ to describe how clean slow code can be better than complex fast code in terms of team development” – Mark Basham of Diamond Synchrotron

Renowned Brightonian artist Eric Drass spoke on the confluence of art, mass data, surveillance, the redaction of political positions (and how nothing is ever really removed from the internet – AlgoCameron) and Hugh Hefner:

Martin Goodson‘s “Most Winning A/B Test Results are Illusory” talk has hit HackerNews with good discussion via his published paper.

(reformed string-theorist) Linda spoke on trying #sklearn as an avid R user for music recommendation, highlighting some of the highs and lows of both toolsets (and noting the sillyness of the ‘language wars’):

My colleague Bart Baddeley discussed problems and solutions in clustering approaches, IPython Notebook with all examples available online:

“Similarity matrices are a neat way of eye-balling whether you’ve chosen the right number of clusters #pydata” – Hugo Carr

Kyran Dale (my co-founder from ShowMeDo and StrongSteam) spoke on powering javascript from live Python servers using techniques such as web sockets to visualise robot brain controllers and UK weather patterns:

Neri covered NLP and ML using NLTK and scikitlearn for real-time customer support at Conversocial (a successful London customer support startup):

Philippe Bracke spoke on house price rents and yields, modelled during his PhD:

“Interesting conclusion from @PhilippeBracke #pydata you earn less money from renting more expensive properties” – Ian Taylor

SkimLinks sponsored a fun Saturday party (they’re hiring!). The conference series is generously sponsored by Continuum Analytics (it all started in the USA – hello Bryan!) and supported through the non-profit NumFocus organisation (and Leah does a rather ace job of pulling all the loose strands into a cohesive whole!).

Level39 in Canary Wharf provided the venue. Additional sponsors include Lyst who are hiring (hi Seb!), Python Academy (hello Mike!), Python Software Foundation, Knowsis, DataRobot (hello Jeremy and Peter!), Python Weekly and O’Reilly.

The view from Level39 was rather nice (their space is ace – visit it if you get a chance – thanking Jacqui for the photo):

Clearly we have a strong base here to build from for future conferences. EuroSciPy 2014 (Cambridge, August) was discussed and PyDataBerlin was announced, it’ll happen in conjunction with EuroPython (July, Berlin). I’ll be at all three.

More write-ups are available:

DataRobot
Continuum‘s by Francesc (one of our speakers)
Ian Huston
Ian’s tweet collection (with many images)
Florian Rathberger
SandTable (London based consultancy)

For future events we’ll have to work on female attendance (I counted 10% – this surely can be improved), we also want more interdisciplinary talks (we had some R and Matlab – we need more languages and other approaches). Overall I’m super happy with the outcome, we organised this in under two months, we got a fab turn-out and a stellar set of speakers (from nearby, throughout Europe and out to the USA). The next event can only be stronger still.

We collected slides and everything was recorded, videos will hopefully be up in a week.

I thank the organising team – Leah (NumFocus) kept us all on track, Emlyn, Cecilia, Florian, Yves and James here and our past-PyData American supporters all kept things moving in what appeared to be a rather effortless way. It wouldn’t have worked without everyone’s support including all the custodians of local usergroups who kindly spread the word – many thanks to you all.

High Performance Python at PyDataLondon 2014

High Performance Python Book, pydata, Python February 23, 2014

Yesterday I spoke on The High Performance Python Landscape at PyDataLondon 2014 (our first PyData outside of the USA – see my write-up). I was blessed with a full room and interesting questions. With Micha I’m authoring a High Performance Python book with O’Reilly (email list for early access) and I took the topics from a few of our chapters.

“@ianozsvald providing eye-opening discussion of tools for high-performance #Python: #Cython, #ShedSkin, #Pythran, #PyPy, #numba… #pydata” – @davisjmcc

Overall I covered:

line_profiler for CPU profiling in a function
memory_profiler for RAM profiling in a function
memory_profiler’s %memit
memory_profiler’s mprof to graph memory use during program’s runtime
thoughts on adding network and disk I/O tracking to mprof
Cython on lists
Cython on numpy by dereferencing elements (which would normally be horribly inefficient) plus OpenMP
ShedSkin‘s annotated output and thoughts on using this as an input to Cython
PyPy and numpy in PyPy
Pythran with numpy and OpenMP support (you should check this out)
Numba
Concluding thoughts on why you should probably use JITs over Cython

Here’s my room full of happy Pythonistas 🙂

“Really useful and practical performance tips from @ianozsvald @pydata #pydata speeding up #Python code” – @iantaylorfb

Slides from the talk:

UPDATE Armin and Maciej came back today with some extra answers about the PyPy-numpy performance (here and here), the bottom line is that they ~~plan to fix it~~ (Maciej says it is now fixed – quick service!). Maciej also notes improvements planned using e.g. vectorisation in numpy.

VIDEO TO FOLLOW

PyData London Abstracts Announced

Python February 4, 2014

I’m very pleased to say that the talks and tutorials are public now, listed on the Abstracts page, this is the first draft of the acceptances so maybe there will be some changes but we’re treating it as ‘mostly done’. The schedule will follow later (the conference is backed by a non-profit and the organisation is all volunteer based).

Early bird tickets run for 1 week from this announce, grab ’em quick. We’ll cover lots of Python, some R and Matlab, lots of data analysis, visualisation and machine learning, also some economics and art. We’re aiming to bring a wide range of people together to help build the local data analysis community, the goal is to start lots of conversations and to encourage collaborations. We plan to have a panel, lightning talks and there will be lots of evening beer to drink.

Some of our speakers are coming in from both the USA and further into Europe, they have links to the older PyData and SciPy conferences along with EuroSciPy and EuroPython. You’ll recognise core contributors for numpy, scipy, scikit-learn and the like at the conference.

PyData London conference keynotes and topics coming together

Python January 21, 2014

We’ve got our keynoters for PyData London (Feb Fri 21- Sun 23):

Gael Varoquaux (INRIA) with “Building a Cutting-Edge Data Processing Environment on a Budget”
Felix Fernandez (Deutsche Börse) with “Python in the Financial Industry: The Universal Tool for End-to-End Development”

Gael is a core committer for scikit-learn and Felix is the Business CIO for the Cash & Derivatives IT department.

Along with the keynoters we have a nice set of talks lining up (the Call for Proposals is open!), the topics will probably include:

Javascript frameworks for data visualisation
Statistical approaches to problem solving
An economist’s view into data science
Art in the realm of Big Data
Data clustering techniques
High performance Python processing

Our Call for Proposals is open until the end of the month, I’m really keen to see stories and tutorials around the solving of interesting problems with data. Whilst the conference is themed for Python I’m keen to see proposals that use other languages (or no language – art & stats!) to do interesting things around data. I’m more focused on interesting topics and lively discussion. Are there any R, Matlab and Julia users who’d like to share their experience?

Please do consider putting forward a proposal, the conversation that will come out of the conference looks to be rather interesting already. If you’ve never spoken at a conference before then this would be a rather ideal place to start.

Ian Ozsvald
Read my book
Oreilly High Performance Python by Micha Gorelick & Ian Ozsvald
AI Consulting

Mor Consulting Ltd. is an A.I. focused consultancy offering strategic research and development owned by Ian Ozsvald, based in London (UK).
Co-organiser

PyData London provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
Trending Now
1
Leadership discussion session at PyDataLondon 2024
Data science, pydata, RebelAI
2
What I’ve been up to since 2022
pydata, Python
3
Upcoming discussion calls for Team Structure and Buidling a Backlog for data science leads
Data science, pydata, Python
4
My first commit to Pandas
Python
5
Skinny Pandas Riding on a Rocket at PyDataGlobal 2020
Data science, pydata, Python
Tags
Aim Api Artificial Intelligence Blog Brighton Conferences Cookbook Demo Ebook Email Emily Face Detection Few Days Google High Performance Iphone Kyran Laptop Linux London Lt Map Natural Language Processing Nbsp Nltk Numpy Optical Character Recognition Pycon Python Python Mailing Python Tutorial Robots Running Santiago Seb Skiff Slides Startups Tweet Tweets Twitter Ubuntu Ups Vimeo Wikipedia

Entrepreneurial Geekiness

PyPy commercial support now available by core devs

PyDataLondon 2014 Write-up

High Performance Python at PyDataLondon 2014

PyData London Abstracts Announced

PyData London conference keynotes and topics coming together

Read my book

AI Consulting

Co-organiser

Trending Now

Navigation

Recent Posts

About Ian

PyPy commercial support now available by core devs

PyDataLondon 2014 Write-up

High Performance Python at PyDataLondon 2014

PyData London Abstracts Announced

PyData London conference keynotes and topics coming together

Read my book

AI Consulting

Co-organiser

Trending Now

Tags

Navigation

Recent Posts

About Ian