Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

Coaching

Training

Jobs

Products

Consulting

My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

High Performance Python Book, pydata, Python October 11, 2014

I’ve just given the opening keynote here at PyConIreland 2014 – many thanks to the organisers for letting me get on stage. This is based on 15 years experience running my own consultancies in Data Science and Artificial Intelligence. (Small note – with the pic below James mis-tweeted ‘sexist’ instead of ‘sexiest’ (from my opening slide) <sigh>)

Sidenote – this is the precursor to my “Data Science Deployed” opening keynote at PyConSE 2015.

The slides for “The Real Unsolved Problems in Data Science” are available on speakerdeck along with the full video. I wrote this for the more engineering-focused PyConIreland audience. These are the high level points, I did rather fill my hour:

Data Science is driven by companies needing new differentiation tactics (not by ‘big data’)
Problem 1 – People asking for too-complex stuff that’s not really feasible (‘magic’)
Problem 2 – Lack of statistical education for engineers – do go statistics courses!
Problem 3 – Dirty data is a huge cost – think about doing a Data Audit
Problem 4 – We need higher-level data cleaning APIs that understand human-level data (rather than numbers, strings and bools!) – much work is required here
Problem 5 – Visualisation with Python still hard and clunky, has a poor on-boarding experience for new users (and R does well here)
Problem 6 – Lots of go-faster/high-performance options but really Python should ‘handle this for us’ (and yes, I have written a book on this)
Problem 7 – Lack of shared vocabulary for statisticians & engineers
Problem 8 – Heterogeneous storage world is mostly non-Python (at least for high performance work), we need a “LAMP Stack for Data Science”
Problem 9 – Collaboration is still painful (but the IPython Notebook is improving this)
Problem 10 – We’re still building the same tools over and over (but the Notebook makes it easier) – we could do with some shared tools here
Linked Open Data is very useful and you should contribute to it and consume it
Our common tooling in Python is very powerful – please join numpy and scipy projects and contribute to the core
I noted a few times that the Python science stack works in Python 3 so you should just use Python 3.4+ for all new projects
PyData/EuroSciPy/SciPy/DataKind meetups are a great way to get involved
We need a “Design Patterns for Data Science with Python” book (and I want to know what you want to learn)

From discussions afterwards it seems that my message “you need clean data to do neat data science stuff” was well received. I’m certainly not the only person in the room battling with Unicode foolishness (not in Python of course as Python 3+ solves the Unicode problem :-).

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

Fourth PyDataLondon Meetup

pydata, Python September 5, 2014

We’ve just run our 4th PyDataLondon meetup (@PyDataLondon). Having over 500 members is superb for just 4 months growth, woot 🙂

Many thanks to @GoPivotalEMEA for hosting us.

We had 3 speakers and 1 lightning talk.

Here are my slides on “The High Performance Python Landscape”:

I’m still collecting data for my two surveys (to discuss at a future PyData when I’ve got enough data), one on Data Science training needs and one on Why Are More Companies Not Using Data Science?

Next Dirk spoke on Data for Good and datamining water sources in Tanzania including some very honest thoughts on how to (hopefully) leave behind working systems that local teams can maintain. Dirk’s talk is built on a project called Taarifa (and source) that our Florian helped build.

Finally we had Matt from Plot.ly over from San Francisco, he gave very compelling reasons to investigate the online visualisation (and data sharing) system for plot.ly. The matplotlib 1-line converter was particularly nice.

Tariq gave a lightning talk on his Make Your Own Mandelbrot book (aimed at kids and newbiews to Python), his slides are online.

We’ve got a growing collection of Offer/Want cards which help connect folk in the pub afterwards, we’ll keep building these up:

Our next event is on October 7th, be sure to follow the @pydatalondon twitter account and join the PyDataLondon meetup group to see the forthcoming announce. The RSVPs for the 4th event were filled in 2 hours of the general announce!

Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Life, pydata, Python August 30, 2014

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:

We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video in here once it is released, the slides are below.

I also got to do a book-signing for our High Performance Python book (co-authored with Micha Gorelick), O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science, please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:

High Performance Python Training at EuroSciPy this afternoon

Python August 28, 2014

I’m training on High Performance Python this afternoon at EuroSciPy, my github source is here (as a shortlink: http://bit.ly/euroscipy2014hpc). There are prerequisites for the course.

This training is actually a tiny part of what I’ll teach on my 2 day High Performance Python course in London in October (along with a Data Science course). If you’re at EuroSciPy, please say Hi 🙂

Why are technical companies not using data science?

ArtificialIntelligence, Data science, pydata, Python August 26, 2014

Here’s a quick question. How come more technical companies aren’t making use of data science? By “technical” I mean any company with data and the smarts to spot that it has value, by “data science” I mean any technical means to exploit this data for financial gain (e.g. visualisation to guide decisions, machine learning, prediction).

I’m guessing that it comes down to an economic question – either it isn’t as valuable as some other activity (making mobile apps? improving UX on the website? paid marketing? expanding sales to new territories?) or it is perceived as being valuable but cannot be exploited (maybe due to lack of skills and training or data problems).

I’m thinking about this for my upcoming keynote at PyConIreland, would you please give me some feedback in the survey below (no sign-up required)?

To be clear – this is an anonymous survey, I’ll have no idea who gives the answers.

Create your free online surveys with SurveyMonkey , the world’s leading questionnaire tool.

If the above is interesting then note that we’ve got a data science training list where we make occasional announcements about our upcoming training and we have two upcoming training courses. We also discuss these topics at our PyDataLondon meetups. I also have a slightly longer survey (it’ll take you 2 minutes, no sign-up required), I’ll be discussing these results at the next PyDataLondon so please share your thoughts.

Ian Ozsvald
Read my book
Oreilly High Performance Python by Micha Gorelick & Ian Ozsvald
AI Consulting

Mor Consulting Ltd. is an A.I. focused consultancy offering strategic research and development owned by Ian Ozsvald, based in London (UK).
Co-organiser

PyData London provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.
Trending Now
1
Leadership discussion session at PyDataLondon 2024
Data science, pydata, RebelAI
2
What I’ve been up to since 2022
pydata, Python
3
Upcoming discussion calls for Team Structure and Buidling a Backlog for data science leads
Data science, pydata, Python
4
My first commit to Pandas
Python
5
Skinny Pandas Riding on a Rocket at PyDataGlobal 2020
Data science, pydata, Python
Tags
Aim Api Artificial Intelligence Blog Brighton Conferences Cookbook Demo Ebook Email Emily Face Detection Few Days Google High Performance Iphone Kyran Laptop Linux London Lt Map Natural Language Processing Nbsp Nltk Numpy Optical Character Recognition Pycon Python Python Mailing Python Tutorial Robots Running Santiago Seb Skiff Slides Startups Tweet Tweets Twitter Ubuntu Ups Vimeo Wikipedia

Entrepreneurial Geekiness

My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

Fourth PyDataLondon Meetup

Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

High Performance Python Training at EuroSciPy this afternoon

Why are technical companies not using data science?

Read my book

AI Consulting

Co-organiser

Trending Now

Navigation

Recent Posts

About Ian

My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

Fourth PyDataLondon Meetup

Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

High Performance Python Training at EuroSciPy this afternoon

Why are technical companies not using data science?

Read my book

AI Consulting

Co-organiser

Trending Now

Tags

Navigation

Recent Posts

About Ian