High Performance Python Book, pydata, Python
October 11, 2014

My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

I’ve just given the opening keynote here at PyConIreland 2014 – many thanks to the organisers for letting me get on stage. This is based on 15 years’ experience running my own consultancies in Data Science and Artificial Intelligence. (Small note – with the pic below, James mis-tweeted ‘sexist’ instead of ‘sexiest’ (from my opening slide) <sigh>.)

Sidenote – this is the precursor to my “Data Science Deployed” opening keynote at PyConSE 2015.

The slides for “The Real Unsolved Problems in Data Science” are available on speakerdeck along with the full video. I wrote this for the more engineering-focused PyConIreland audience. These are the high-level points (I did rather fill my hour):

Data Science is driven by companies needing new differentiation tactics (not by ‘big data’)
Problem 1 – People asking for too-complex stuff that’s not really feasible (‘magic’)
Problem 2 – Lack of statistical education for engineers – do go on statistics courses!
Problem 3 – Dirty data is a huge cost – think about doing a Data Audit
Problem 4 – We need higher-level data cleaning APIs that understand human-level data (rather than numbers, strings and bools!) – much work is required here
Problem 5 – Visualisation with Python is still hard and clunky, with a poor on-boarding experience for new users (R does well here)
Problem 6 – Lots of go-faster/high-performance options but really Python should ‘handle this for us’ (and yes, I have written a book on this)
Problem 7 – Lack of shared vocabulary for statisticians & engineers
Problem 8 – Heterogeneous storage world is mostly non-Python (at least for high performance work), we need a “LAMP Stack for Data Science”
Problem 9 – Collaboration is still painful (but the IPython Notebook is improving this)
Problem 10 – We’re still building the same tools over and over (but the Notebook makes it easier) – we could do with some shared tools here
Linked Open Data is very useful and you should contribute to it and consume it
Our common tooling in Python is very powerful – please join the numpy and scipy projects and contribute to the core
I noted a few times that the Python science stack works in Python 3 so you should just use Python 3.4+ for all new projects
PyData/EuroSciPy/SciPy/DataKind meetups are a great way to get involved
We need a “Design Patterns for Data Science with Python” book (and I want to know what you want to learn)
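
To make the Data Audit idea from Problem 3 a little more concrete, here is a minimal sketch of a first pass over a small CSV. It is only an illustration, not the approach from the talk: the function names (`audit_rows`, `guess_kind`), the list of missing-value markers and the sample data are all assumptions made up for this example, and a real audit would go much further.

```python
import csv
import io
from collections import Counter

# Hypothetical markers for "no value"; real datasets will need their own list.
MISSING_MARKERS = ("", "NA", "N/A", "null", "None")


def guess_kind(text):
    """Crudely classify a cell as 'number' or 'text'."""
    try:
        float(text)
        return "number"
    except ValueError:
        return "text"


def audit_rows(rows, missing_markers=MISSING_MARKERS):
    """Count total, missing and per-kind values for each column."""
    report = {}
    for row in rows:
        for column, value in row.items():
            stats = report.setdefault(
                column, {"total": 0, "missing": 0, "kinds": Counter()}
            )
            stats["total"] += 1
            text = (value or "").strip()
            if text in missing_markers:
                stats["missing"] += 1
            else:
                stats["kinds"][guess_kind(text)] += 1
    return report


# Made-up sample data, purely for illustration.
SAMPLE = """name,age,city
Alice,34,Dublin
Bob,,Cork
Carol,unknown,N/A
"""

report = audit_rows(csv.DictReader(io.StringIO(SAMPLE)))
for column, stats in report.items():
    print(column, stats["total"], stats["missing"], dict(stats["kinds"]))
```

Even a count like this shows where a column mixes numbers with free text (the `age` column above), which is the kind of dirty-data cost an audit is meant to surface before any modelling starts.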

From discussions afterwards it seems that my message “you need clean data to do neat data science stuff” was well received. I’m certainly not the only person in the room battling with Unicode foolishness (not in Python of course as Python 3+ solves the Unicode problem :-).
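
As a small illustration of the kind of Unicode foolishness that turns up in data cleaning, the sketch below shows Python 3 keeping bytes and text separate, and two strings that look identical but compare unequal until they are normalised. The example strings are my own assumption, chosen only to show the effect.

```python
import unicodedata

# Bytes arrive from files or the network; text (str) is what we work with.
raw = "café".encode("utf-8")
text = raw.decode("utf-8")

# The same visible word, stored two different ways.
composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)          # the two forms differ as stored
print(len(composed), len(decomposed))  # and have different lengths

# Normalising both to the same form (NFC here) makes them comparable.
normalised = unicodedata.normalize("NFC", decomposed)
print(normalised == composed)
```

Python 3 takes care of the bytes-versus-text confusion, but normalising human-entered text is still a cleaning step you have to remember to do yourself.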

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

25 Comments

pyacademy
October 11, 2014 at 4:21 pm
RT @ianozsvald: My Keynote at PyConIreland 2014 - "The Real Unsolved Problems in Data Science": I've just given the openin... http://t.co/6…
