About


This is Ian Ozsvald's blog (@IanOzsvald). I'm an entrepreneurial geek, a Data Science/ML/NLP/AI consultant, founder of the Annotate.io social media mining API, author of O'Reilly's High Performance Python book, co-organiser of PyDataLondon, co-founder of the SocialTies app, author of the A.I. Cookbook and of The Screencasting Handbook, a Pythonista, co-founder of ShowMeDo and FivePoundApps, and a Londoner. Here's a little more about me.


3 April 2015 - 11:05
PyDataParis 2015 and “Cleaning Confused Collections of Characters”

I’m at PyDataParis, the first PyData event in France, and we have a 300-strong turnout. In my talk I asked about the split of academic and industrial folk: at least 70% of the roughly 70 people in my session were from industry. The bulk of the attendees are in the Intro track, so the split there may well be different. All slides are up and videos are following; see them here.

Here’s a photo of Gael giving a really nice opening keynote on Scikit-Learn:

I spoke on data cleaning for text data. I packed quite a bit into my 40 minutes and got a nice set of questions. The slides are below; the talk covered:

  • Data extraction from text files, PDF, HTML/XML and images
  • Merging on columns of data
  • Correctly processing datetimes from files and the dangers of relying on the pandas defaults
  • Normalising text columns so we could join on otherwise messy data
  • Automated data transformation using my annotate.io (Python demo)
  • Ideas on automated feature extraction
  • Ideas on automating visualisation for new, messy datasets to get a “bird’s eye view”
  • Tips on getting started – make a Gold Standard!
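
To make the “normalise text columns so we can join” point concrete, here’s a minimal stdlib sketch; the `normalise` helper and the sample data are invented for illustration, not taken from the talk:

```python
import re
import unicodedata

def normalise(text):
    """Crude key normaliser: strip accents, lowercase, and collapse
    whitespace/punctuation so messy variants join cleanly."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()

# Two "tables" whose keys disagree on accents, case and punctuation
left = {"Acme  Ltd.": 10, "Café Nero": 20}
right = {"acme ltd": "London", "cafe nero": "Brighton"}

# Join on the normalised key rather than the raw messy string
joined = {
    key: (value, right[normalise(key)])
    for key, value in left.items()
    if normalise(key) in right
}
# joined pairs each left-hand row with its matched right-hand row
```

In pandas you’d apply the same normaliser to both key columns before a `merge`; the principle is identical.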

One question concerned the parsing of datetime strings from unusual sources. I’d mentioned dateutil’s parser in the talk, and a second parser is delorean. In addition I’ve also seen arrow (an extension of the standard datetime), which has a set of parsers including one for ISO 8601. The parsedatetime module has an NLP component to convert statements like “tomorrow” into a datetime.

I don’t know of other, better parsers – do you? In particular I want one that’ll take a list of datetimes and return one consistent converter that isn’t confused by individual instances (e.g. “1/1” is ambiguous between MM/DD and DD/MM).
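
As a sketch of what such a “one consistent converter” might look like (helper names are hypothetical; stdlib only, assuming a simple day/month/year or month/day/year shape):

```python
from datetime import datetime

def infer_day_first(samples):
    """Decide once, for the whole column, whether 'a/b/year' strings
    are DD/MM or MM/DD: any sample with a field > 12 settles it."""
    for s in samples:
        first, second, _ = s.split("/")
        if int(first) > 12:
            return True   # must be day-first
        if int(second) > 12:
            return False  # must be month-first
    return True  # ambiguous throughout; assume UK day-first here

def make_parser(samples):
    """Return one converter fixed by the evidence in the whole list,
    so individual ambiguous items can't flip the interpretation."""
    fmt = "%d/%m/%Y" if infer_day_first(samples) else "%m/%d/%Y"
    return lambda s: datetime.strptime(s, fmt)

dates = ["1/1/2015", "3/4/2015", "25/12/2014"]
parse = make_parser(dates)
# "1/1/2015" now consistently means 1 January, because
# "25/12/2014" fixed the whole column as day-first
```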

I’m also asking for feedback on the subject of automated feature extraction and automated column-join tools for messy data. If you’ve got ideas on these subjects I’d love to hear from you.

In addition I was reminded of DiffBot, which uses computer vision and NLP to extract meaning from web pages. I’ve never tried it – can any of you comment on its effectiveness? Olivier Grisel mentioned pyquery to me; it is an lxml-based parser which lets you make jQuery-like queries on HTML.

Update: I should have mentioned chardet, which detects encodings (UTF-8, CP1252 etc.) from raw text – very useful if you’re trying to figure out the encoding of a collection of bytes from a random data source! libextract looks like a young but nice tool for extracting text blocks from HTML/XML sources. boltons is a nice collection of ‘bolt-on’ additions to the standard library (e.g. timeutils, strutils, tableutils).
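
chardet does proper statistical detection; as a rough illustration of the problem it solves (and emphatically not how chardet works internally), a naive stdlib fallback might just try candidate codecs in order:

```python
def sniff_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Poor man's chardet: return the first candidate codec that
    decodes the bytes without error. Order matters -- UTF-8 is
    strict so it's a safe first guess; latin-1 never fails."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

raw = "Caffè 20 – £3".encode("cp1252")
# UTF-8 rejects the lone 0xE8/0x96/0xA3 bytes, so cp1252 is reported
```

The real chardet also returns a confidence score, which matters because (as here) several 8-bit codecs can decode the same bytes to different text.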

It might also be worth noting some useful data sources from which you can extract semi-structured data, e.g. ‘tech tags’ from stackexchange’s forums (and I also see a new hackernews dump).

Camilla Montonen has just spoken on Rush Hour Dynamics, visualising London Underground behaviour. She noted graph-tool, a nice graphing/viz library I’d not seen before. Fabian has just shown me his new project, which collects NLP IPython Notebooks, lists them and tries to extract titles or summaries (a gnarly sub-problem!). The AXA Data Innovation Lab have a nice talk on explaining machine-learned models.

Gilles Louppe’s slides for his ML/sklearn talk on trees and boosting are online, as are Alexandre Gramfort’s on sklearn linear models.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

13 Comments | Tags: Data science, Life, pydata, Python

21 February 2015 - 21:05
Data-Science stuff I’m doing this year

2014 was an interesting year and 2015 looks to be even richer. Last year I published my High Performance Python book, helped co-organise the rather successful PyDataLondon 2014 conference, taught High Performance Python in public (slides online) and in private, keynoted on The Real Unsolved Problems in Data Science and started my ModelInsight AI agency. That was a busy year (!) but deeply rewarding.

My High Performance Python published with O’Reilly in 2014

 

This year our consulting is branching out – we’ve already helped a new medical start-up define their data offering, I’m mentoring another data scientist (to avoid 10 years of my mistakes!) and we’re deploying new text mining IP for existing clients. We’ve got new private training this April for Machine Learning (scikit-learn) and High Performance Python (announce list) and Spark is on my radar.

Apache Spark maxing out 8 cores on my laptop

Python’s role in Data Science has grown massively (I think we have 5 euro-area Python-Data-Science conferences this year) and I’m keen to continue building the London and European scenes.

I’m particularly interested in dirty data and ways we can efficiently clean it up (hence my Annotate.io lightning talk a week back). If you have problems with dirty data I’d love to chat and maybe I can share some solutions.

For PyDataLondon-the-conference we’re getting closer to fixing our date (late May/early June); join this announce list to hear when we have key dates. In a few weeks we have our 10th monthly PyDataLondon meetup; you should join the group, as I write up each event for those who can’t attend, so you’ll always know what’s going on. To keep the meetup from degenerating into a shiny-suit-fest I’ve set up a separate data science jobs list; I curate it and only send relevant contract/permie job announcements.

This year I hope to be at PyDataParis, PyConSweden, PyDataLondon, EuroSciPy and PyConUK – do come say hello if you’re around!



5 Comments | Tags: ArtificialIntelligence, Data science, High Performance Python Book, Life, pydata, Python

19 February 2015 - 11:35
Starting Spark 1.2 and PySpark (and ElasticSearch and PyPy)

The latest PySpark (1.2) feels genuinely useful. Late last year I had a crack at running Apache Spark 1.0 and PySpark and it felt a bit underwhelming (too much fanfare, too many bugs). The media coverage of Spark continues to grow; today’s Hacker News thread on the new DataFrame API, for example, has a lot of positive discussion, and lazily evaluated pandas-like DataFrames built from a wide variety of data sources feel very powerful. Continuum have also just announced PySpark+GlusterFS.

One surprising fact is that Spark is Python 2.7-only at present; feature request 4897 asks for Python 3 support (go vote!), which requires some cloud-pickling fixes. Relying on the end-of-the-line Python 2 release feels a bit daft. I’m using Linux Mint 17.1, which is based on Ubuntu 14.04 64-bit, with the pre-built spark-1.2.0-bin-hadoop2.4.tgz from their downloads page, and ‘it just works’. Using my global Python 2.7.6 and an additional IPython install (via apt-get):

spark-1.2.0-bin-hadoop2.4 $ IPYTHON=1 bin/pyspark
...
IPython 1.2.1 -- An enhanced Interactive Python.
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/

Using Python version 2.7.6 (default, Mar 22 2014 22:59:56)
SparkContext available as sc.
>>>

Note the IPYTHON=1: without it you get a vanilla Python shell; with it, IPython is used if it is on the search path. IPython lets you interactively explore the “sc” Spark context using tab completion, which really helps at the start. To run one of the included demos (e.g. wordcount) you can use the spark-submit script:

spark-1.2.0-bin-hadoop2.4/examples/src/main/python 
$ ../../../../bin/spark-submit wordcount.py kmeans.py  # count words in kmeans.py
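
The wordcount demo boils down to the classic flatMap / map / reduceByKey pipeline; the same flow in plain Python (toy data, no Spark required) looks like:

```python
from collections import Counter

lines = ["to be or not to be", "to spark or not to spark"]

# flatMap: lines -> words; map: word -> (word, 1); reduceByKey: sum.
# Counter collapses the map+reduce stages into one step.
counts = Counter(word for line in lines for word in line.split())
```

In Spark the same logic is `sc.textFile(path).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)`, and each stage runs across the cluster partitions.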

For my use case we initially wanted sparse matrix support; sadly sparse matrices are only available for Scala/Java at present. By stepping back from my sklearn/scipy sparse solution for a minute and thinking a little more map/reduce, I could just as easily split the problem into a set of counts, and that parallelises very well in Spark (though I’d love to see sparse matrices in PySpark!).
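
As a sketch of that “think in counts” reframing (invented toy data; in Spark the Counter below becomes a map to `((a, b), 1)` pairs plus a `reduceByKey`):

```python
from collections import Counter
from itertools import combinations

# Instead of building a sparse item x item co-occurrence matrix,
# emit one ((item_a, item_b), 1) record per co-occurring pair and
# sum them -- a shape that partitions and reduces trivially.
baskets = [["a", "b", "c"], ["a", "c"], ["b", "c"]]

pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(basket), 2)  # sorted => canonical key
)
```

The resulting counts are exactly the non-zero cells of the sparse matrix, but as key/value records they need no matrix type at all.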

I’m doing this with my contract-recruitment client via my ModelInsight as we automate recruitment; there’s a press release out today outlining a bit of what we do. One of the goals is a more unified research+deployment approach: rather than building lots of tooling in R&D and then streamlining it for production, we hope to share similar tooling between R&D and production so that deployment, and different scales of data, are ‘easier’.

I tried the latest PyPy 2.5 (running Python 2.7) and it ran PySpark just fine. With PyPy 2.5 a prime-search example takes 6s versus 39s with vanilla Python 2.7, so in-memory processing using RDDs rather than numpy objects might be quick and convenient (has anyone trialled this?). To run under PyPy set PYSPARK_PYTHON:

$ PYSPARK_PYTHON=~/pypy-2.5.0-linux64/bin/pypy ./pyspark
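
For the curious, the prime-search benchmark was along these lines (a hypothetical reconstruction, not the exact script): tight pure-Python loops like this are exactly where PyPy’s JIT pays off over CPython.

```python
def count_primes(limit):
    """Naive trial-division prime counter.  Millions of interpreted
    loop iterations -- slow on CPython, JIT-friendly on PyPy."""
    count = 0
    for n in range(2, limit):
        for d in range(2, int(n ** 0.5) + 1):
            if n % d == 0:
                break
        else:  # no divisor found: n is prime
            count += 1
    return count
```

Wrapped in an RDD (`sc.parallelize(range(2, limit)).filter(is_prime).count()`) the same work also spreads across workers, independent of which interpreter runs it.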

I’m used to working with Anaconda environments, and for Spark I’ve set up a Python 2.7.8 environment (“conda create -n spark27 anaconda python=2.7”) with IPython 2.2.0. Whichever Python is on the search path, or is specified at the command line, is used by the pyspark script.

The next challenge to solve was integration with ElasticSearch for storing outputs. The official docs are a little tough to read as a non-Java/non-Hadoop programmer and they don’t mention PySpark integration; thankfully there’s a lovely 4-part blog sequence which “just works”:

  1. ElasticSearch and Python (no Spark but it sets the groundwork)
  2. Reading & Writing ElasticSearch using PySpark
  3. Sparse Matrix Multiplication using PySpark
  4. Dense Matrix Multiplication using PySpark

To summarise the above, here’s a trivial example that outputs to ElasticSearch from a local dictionary with no other data dependencies:

$ wget http://central.maven.org/maven2/org/elasticsearch/elasticsearch-hadoop/2.1.0.Beta2/elasticsearch-hadoop-2.1.0.Beta2.jar
$ ~/spark-1.2.0-bin-hadoop2.4/bin/pyspark --jars elasticsearch-hadoop-2.1.0.Beta2.jar
>>> res = sc.parallelize([1, 2, 3, 4])
>>> res2 = res.map(lambda x: ('key', {'name': str(x), 'sim': 0.22}))
>>> res2.collect()
[('key', {'name': '1', 'sim': 0.22}),
 ('key', {'name': '2', 'sim': 0.22}),
 ('key', {'name': '3', 'sim': 0.22}),
 ('key', {'name': '4', 'sim': 0.22})]

>>> res2.saveAsNewAPIHadoopFile(
...     path='-',
...     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
...     keyClass="org.apache.hadoop.io.NullWritable",
...     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
...     conf={"es.resource": "myindex/mytype"})

The above creates a list of 4 dictionaries and sends them to a local ES store, using “myindex” and “mytype” for each new document. Before I found the above I used this older solution, which also worked just fine.

Running the local interactive session using a mock cluster was pretty easy. The docs for spark-standalone are a good start:

sbin $ ./start-master.sh
# The log (the script reports its full path, so you can `tail -f` it) shows:
#   15/02/17 14:11:46 INFO Master: Starting Spark master at spark://ian-Latitude-E6420:7077
# which gives the link for the browser view of the master machine,
# probably on :8080 (as shown at http://www.mccarroll.net/blog/pyspark/).

# Next start a single worker:
sbin $ ./start-slave.sh 0 spark://ian-Latitude-E6420:7077
# The logs will show a link to another web page for each worker
# (probably starting at :4040).

# Next you can start a PySpark IPython shell for local experimentation:
$ IPYTHON=1 ~/data/libraries/spark-1.2.0-bin-hadoop2.4/bin/pyspark \
    --master spark://ian-Latitude-E6420:7077
# (similarly you could run spark-shell to do the same with Scala)

# Or run their demo code against the master node you've just configured:
$ ~/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
    --master spark://ian-Latitude-E6420:7077 \
    ~/spark-1.2.0-bin-hadoop2.4/examples/src/main/python/wordcount.py README.txt

Note: if you try to run the above spark-submit (which specifies the --master to connect to) without a master node running, you’ll see log messages like:

15/02/17 14:14:25 INFO AppClient$ClientActor: 
 Connecting to master spark://ian-Latitude-E6420:7077...
15/02/17 14:14:25 WARN AppClient$ClientActor: 
 Could not connect to akka.tcp://sparkMaster@ian-Latitude-E6420:7077: 
 akka.remote.InvalidAssociation: 
 Invalid address: akka.tcp://sparkMaster@ian-Latitude-E6420:7077
15/02/17 14:14:25 WARN Remoting: Tried to associate with 
 unreachable remote address 
 [akka.tcp://sparkMaster@ian-Latitude-E6420:7077]. 
 Address is now gated for 5000 ms, all messages to this address will 
 be delivered to dead letters. 
 Reason: Connection refused: ian-Latitude-E6420/127.0.1.1:7077

If you had a master node running but hadn’t set up a worker node, then after the spark-submit it’ll hang for 5+ seconds and then start to report:

15/02/17 14:16:16 WARN TaskSchedulerImpl: 
 Initial job has not accepted any resources; 
 check your cluster UI to ensure that workers are registered and 
 have sufficient memory

and if you google that without thinking about the missing worker node you’ll come to this diagnostic page, which leads down a small rabbit hole…

Stuff I’d like to know:

  • How do I read easily from MongoDB using an RDD (in Hadoop format) in PySpark (do you have a link to an example?)
  • Who else in London is using (Py)Spark? Maybe catch-up over a coffee?


10 Comments | Tags: ArtificialIntelligence, Data science, Life, pydata, Python

8 February 2015 - 23:54
Lightning talk at PyDataLondon for Annotate

At this week’s PyDataLondon I did a 5 minute lightning talk on the Annotate text-cleaning service for data scientists that I made live recently. It was good to have a couple of chats after with others who are similarly bored of cleaning their text data.

The goal is to make it quick and easy to clean data so you don’t have to figure out a method yourself. Behind the scenes it uses ftfy to fix broken Unicode, unidecode to remove foreign characters if needed, and a mix of regular expressions written on the fly depending on the data submitted.
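
A rough stdlib-only sketch of that kind of pipeline (using `unicodedata` as a crude stand-in for ftfy/unidecode; the helper name is invented):

```python
import re
import unicodedata

def clean_text(text):
    """Sketch of an Annotate-style pass: normalise the unicode,
    transliterate to ASCII (dropping accents and odd spaces, a
    poor man's unidecode) and collapse whitespace with a regex."""
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"\s+", " ", text).strip()

clean_text("Café\u00a0 Nero\n")  # -> "Cafe Nero"
```

The real service goes further: ftfy repairs mojibake (bytes decoded with the wrong codec) rather than just stripping the evidence, and the regexes are generated per-dataset.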

I suspect that adding some datetime fixers will be the next step (dealing with UK data is a pain when tools assume US format and read 1/3/13 as 3rd January rather than 1st March); maybe a fact extractor will follow.



5 Comments | Tags: Data science, pydata, Python

18 January 2015 - 19:40
Data Science Jobs UK (ModelInsight) – Python Jobs Email List

I’ve had people asking me about how they can find data scientists in London and through our PyDataLondon meetup we’ve had members announcing jobs. There’s no central location for data science jobs so I’ve put together a new list (administered through my ModelInsight agency).

Sign-up to the list here: Data Science Jobs UK (ModelInsight)

  • Aimed at Data Science jobs in the UK
  • Mostly Python (maybe R, Matlab, Julia if relevant)
  • It’ll include Permie and Contract jobs

The list will only work if you can trust it so:

  • Your email is private (it is never shared)
  • The list is on MailChimp so you can unsubscribe at any time
  • We vet the job posts and only forward them if they’re in the interests of the list
  • Nobody else can post into the list (all jobs are forwarded just by us)
  • It’ll be low volume and all posts will be very relevant

Sign-up to the list here: Data Science Jobs UK (ModelInsight)

Obviously if you’re interested in joining the London Python data science community then come along to our PyDataLondon meetups.



6 Comments | Tags: Data science, pydata, Python

11 October 2014 - 16:18
My Keynote at PyConIreland 2014 – “The Real Unsolved Problems in Data Science”

I’ve just given the opening keynote here at PyConIreland 2014 – many thanks to the organisers for letting me get on stage. This is based on 15 years’ experience running my own consultancies in Data Science and Artificial Intelligence. (Small note: with the pic below, James mis-tweeted ‘sexist’ instead of ‘sexiest’ from my opening slide <sigh>.)

 

The slides for “The Real Unsolved Problems in Data Science” are available on speakerdeck, along with the full video. I wrote this for the more engineering-focused PyConIreland audience. These are the high-level points (I did rather fill my hour):

  • Data Science is driven by companies needing new differentiation tactics (not by ‘big data’)
  • Problem 1 – People asking for too-complex stuff that’s not really feasible (‘magic’)
  • Problem 2 – Lack of statistical education for engineers – do go on statistics courses!
  • Problem 3 – Dirty data is a huge cost – think about doing a Data Audit
  • Problem 4 – We need higher-level data cleaning APIs that understand human-level data (rather than numbers, strings and bools!) – much work is required here
  • Problem 5 – Visualisation with Python is still hard and clunky, with a poor on-boarding experience for new users (R does well here)
  • Problem 6 – Lots of go-faster/high-performance options but really Python should ‘handle this for us’ (and yes, I have written a book on this)
  • Problem 7 – Lack of shared vocabulary for statisticians & engineers
  • Problem 8 – Heterogeneous storage world is mostly non-Python (at least for high performance work), we need a “LAMP Stack for Data Science”
  • Problem 9 – Collaboration is still painful (but the IPython Notebook is improving this)
  • Problem 10 – We’re still building the same tools over and over (but the Notebook makes it easier) - we could do with some shared tools here
  • Linked Open Data is very useful and you should contribute to it and consume it
  • Our common tooling in Python is very powerful – please join numpy and scipy projects and contribute to the core
  • I noted a few times that the Python science stack works in Python 3 so you should just use Python 3.4+ for all new projects
  • PyData/EuroSciPy/SciPy/DataKind meetups are a great way to get involved
  • We need a “Design Patterns for Data Science with Python” book (and I want to know what you want to learn)

From discussions afterwards it seems that my message – “you need clean data to do neat data science stuff” – was well received. I’m certainly not the only person in the room battling with Unicode foolishness (not in Python of course, as Python 3+ solves the Unicode problem :-)).



25 Comments | Tags: High Performance Python Book, pydata, Python

5 September 2014 - 12:20
Fourth PyDataLondon Meetup

We’ve just run our 4th PyDataLondon meetup (@PyDataLondon). Having over 500 members after just 4 months’ growth is superb, woot :-)

Many thanks to @GoPivotalEMEA for hosting us.

We had 3 speakers and 1 lightning talk.

Here are my slides on “The High Performance Python Landscape”:

I’m still collecting data for my two surveys (to discuss at a future PyData when I’ve got enough data), one on Data Science training needs and one on Why Are More Companies Not Using Data Science?

Next Dirk spoke on Data for Good and datamining water sources in Tanzania including some very honest thoughts on how to (hopefully) leave behind working systems that local teams can maintain. Dirk’s talk is built on a project called Taarifa (and source) that our Florian helped build.

Finally we had Matt from Plot.ly over from San Francisco, he gave very compelling reasons to investigate the online visualisation (and data sharing) system for plot.ly. The matplotlib 1-line converter was particularly nice.

Tariq gave a lightning talk on his Make Your Own Mandelbrot book (aimed at kids and newbies to Python); his slides are online.

We’ve got a growing collection of Offer/Want cards which help connect folk in the pub afterwards, we’ll keep building these up:

Our next event is on October 7th; be sure to follow the @pydatalondon twitter account and join the PyDataLondon meetup group to catch the announcement. The RSVP list for the 4th event filled within 2 hours of the general announcement!



3 Comments | Tags: pydata, Python

30 August 2014 - 12:06
Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students:


We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video here once it is released; the slides are below.

I also got to do a book signing for our High Performance Python book (co-authored with Micha Gorelick); O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science; please fill it in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:



No Comments | Tags: Life, pydata, Python

26 August 2014 - 21:35
Why are technical companies not using data science?

Here’s a quick question. How come more technical companies aren’t making use of data science? By “technical” I mean any company with data and the smarts to spot that it has value, by “data science” I mean any technical means to exploit this data for financial gain (e.g. visualisation to guide decisions, machine learning, prediction).

I’m guessing that it comes down to an economic question – either it isn’t as valuable as some other activity (making mobile apps? improving UX on the website? paid marketing? expanding sales to new territories?) or it is perceived as being valuable but cannot be exploited (maybe due to lack of skills and training or data problems).

I’m thinking about this for my upcoming keynote at PyConIreland, would you please give me some feedback in the survey below (no sign-up required)?

To be clear – this is an anonymous survey, I’ll have no idea who gives the answers.


If the above is interesting then note that we’ve got a data science training list where we make occasional announcements about our upcoming training and we have two upcoming training courses. We also discuss these topics at our PyDataLondon meetups. I also have a slightly longer survey (it’ll take you 2 minutes, no sign-up required), I’ll be discussing these results at the next PyDataLondon so please share your thoughts.



3 Comments | Tags: ArtificialIntelligence, Data science, pydata, Python

20 August 2014 - 21:24
Data Science Training Survey

I’ve put together a short survey to figure out what’s needed for Python-based Data Science training in the UK. If you want to be trained in strong data science, analysis and engineering skills please complete the survey; it doesn’t need any sign-up and will take just a couple of minutes. I’ll share the results at the next PyDataLondon meetup.

If you want training you probably want to be on our training announce list; this is a low-volume list (run by MailChimp) where we announce upcoming dates and suggest topics that you might want training on. You can unsubscribe at any time.

I’ve written about the two courses that run in October through ModelInsight: one focuses on improving skills around data science using Python (including numpy, scipy and TDD), the second on high performance Python (I’ve now finished writing O’Reilly’s High Performance Python book). Both courses focus on practical skills: you’ll walk away with working systems and a stronger understanding of key Python skills. Your developer and debugging skills will be stronger, and in the longer run you’ll develop stronger software with fewer defects.

If you want to talk about this, come have a chat at the next PyData London meetup or in the pub after.



3 Comments | Tags: Data science, pydata, Python