Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

New public course on Successfully Delivering Data Science Projects for Feb 1st

During my Pythonic data science team coaching I see various problems coming up that I’ve helped solve before. Based on these observations and my prior IP design and delivery for clients over the years, I’ve put together a one-day public course aimed at data scientists (any level) who want to be more confident with lower-risk approaches to delivering data science projects.

Successfully Delivering Data Science Projects runs on Friday February 1st 2019. Early bird tickets have sold out and a handful of regular tickets remain (be quick). This course suits any data scientist who has discovered just how vague and confusing a research-to-deployment project can be, and who’d like to be more confident in their plans and outcomes.

I’ve developed these techniques whilst working with the teams at companies like Hotels.com, Channel 4, QBE Insurance and smaller companies across domains including media, energy, adtech and travel.

The topics covered in the course will include:

  • Building a Project Plan that derisks uncertainties and identifies expected deliverables, based on a well-understood problem and data set (but starting from…we don’t know what we have or really what we want!)
  • Scenarios based on real-world (and sometimes very difficult) experience that have to be solved in small teams
  • Team best practice with practical exercises covering coding standards, code reviews, testing (during R&D and in production) and retrospectives using tools such as pyjanitor, engarde, pandas profiling and discover-feature-relationships.
  • Group discussion around the problems everyone faces, to be solved or moved forwards by everyone in the group (the group will have more experience than any single teacher)
  • A Slack channel that lives during and after the course for continued support and discussion among the attendees

You’re welcome to get in contact if you have questions. Further announcements will be made on my low-volume training email list. I will also link to upcoming courses from my every-two-weeks data scientist jobs and thoughts email list.

 


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

Talking on “High Performance Python” at Linuxing In London last week

Mario of PyLondonium (where I gave a keynote talk earlier this year) was kind enough to ask me along to speak at Linuxing in London. I gave an updated version of one of my older High Performance Python talks based on material I’d covered in my book, to show the more-engineering audience how to go about profiling and speeding up Python code. The audience was lovely; many were new to Python and also first-timers at the meetup. Here’s half the room:

Audience at Linuxing in London

We covered:

  • Profiling with line_profiler (in a Notebook – thanks Robert!) to identify slow code in functions
  • Using numpy incorrectly to try to get a speed up, then profiling it to see why it didn’t work
  • Using Anaconda’s Numba on the numpy code to get a 20x overall speedup
  • Using a different algorithm entirely to get a further 1000x speedup (!)
  • Thoughts on the two main ways to get a speed-up (do less work or spend less time waiting for data)
  • Looking at Py-Spy which hooks into an existing process to profile on-the-fly – a take-away for anyone in an engineering team
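As a rough illustration of the profile-then-do-less-work loop in the list above, here is a minimal sketch. It is not the code from the talk (which used line_profiler, numpy and Numba); it uses only the standard library's cProfile so it runs anywhere. The pair-counting task and the names count_pairs_slow, count_pairs_fast and profile_call are invented for this example.

```python
import cProfile
import io
import pstats
from collections import Counter


def count_pairs_slow(values, target):
    """Count index pairs (i < j) whose values sum to target: O(n^2)."""
    count = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                count += 1
    return count


def count_pairs_fast(values, target):
    """Same answer with a different algorithm: one pass, O(n)."""
    seen = Counter()  # how many times each value has appeared so far
    count = 0
    for v in values:
        count += seen[target - v]  # earlier values that complete a pair
        seen[v] += 1
    return count


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


values = [i % 100 for i in range(2000)]
slow_result, slow_report = profile_call(count_pairs_slow, values, 99)
fast_result, fast_report = profile_call(count_pairs_fast, values, 99)

# Both algorithms must agree before the timings are worth comparing.
assert slow_result == fast_result
print(slow_report)
print(fast_report)
```

The point of the sketch is the ordering: profile first to find where the time goes, check that the replacement gives the same answer, and only then compare timings. The function-level report from cProfile is coarser than the line-by-line view line_profiler gives.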

Here’s a link to the slides. Thanks to Brian, David and the other organisers for hosting the four of us. It was a fun evening.

I also mentioned my London-based jobs and training email lists and promised to link them here. It was fun to speak to a less-data-science focused audience (where PyData is pretty much my bubble-reality nowadays), especially to meet new folk transitioning into Python from entirely non-technical careers. I reminded everyone that they’re most welcome to visit our PyDataLondon meetups to widen their network. Of course London Python and PyConUK should definitely be on your radar too.

 



“On the Diagramatic Diagnosis of Data” at BudapestBI 2018

A couple of days back I spoke on using diagrams (matplotlib, seaborn, pandas profiling) to diagnose data during the exploratory data analysis phase. I also introduced my new tool discover_feature_relationships which helps prioritise which features to investigate in a new dataset by identifying pairs of features that have some sort of ‘interesting’ relationship. We finished with a short note on Bertil’s ‘data story’ concept for documenting the EDA process.

I had a lovely room of international folk. We had a higher proportion of Hungarians this year as the organiser Bence has worked to build up the local community. This was followed by a variety of interesting questions around ways to tackle the EDA challenge:

BudapestBI room for my talk

My new tool discover_feature_relationships uses a Random Forest to identify predictive (and possibly non-linear) relationships between all pairs of columns in a dataframe. Typically we’d like to identify which features predict a target in machine learning; here instead I’m asking “what relationships exist throughout my data?”. I’ve used this to help me understand how data ‘works’. This is especially useful in semi-structured business data dumps which aren’t necessarily the right source of data to solve a particular task, but where up-front we don’t know what we have and what we need. I’d certainly welcome feedback on this idea. You’ll see diagrams and examples for the Boston and Titanic datasets on the GitHub page.
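To make the "score every pair of columns" idea concrete, here is a minimal sketch under loud assumptions. It is not the discover_feature_relationships API and it does not use a Random Forest: a simple binned-mean predictor stands in as the non-linear model so that the example needs only the standard library. The names binned_r2 and pairwise_scores and the synthetic columns are hypothetical.

```python
import random
from itertools import permutations
from statistics import mean


def binned_r2(x_train, y_train, x_test, y_test, n_bins=10):
    """R^2 on held-out data when predicting y from equal-width bins of x.

    Stand-in for a real non-linear model such as a Random Forest.
    """
    lo, hi = min(x_train), max(x_train)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant x

    def bin_of(x):
        # Clamp so test values outside the training range still get a bin.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    grouped = {}
    for x, y in zip(x_train, y_train):
        grouped.setdefault(bin_of(x), []).append(y)
    bin_means = {b: mean(ys) for b, ys in grouped.items()}
    fallback = mean(y_train)  # used for bins with no training rows

    preds = [bin_means.get(bin_of(x), fallback) for x in x_test]
    y_bar = mean(y_test)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in y_test)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def pairwise_scores(columns, n_bins=10, seed=0):
    """Score every ordered (feature, target) pair of columns."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    train, test = idx[: n_rows // 2], idx[n_rows // 2:]
    scores = {}
    for feature, target in permutations(names, 2):
        x, y = columns[feature], columns[target]
        scores[(feature, target)] = binned_r2(
            [x[i] for i in train], [y[i] for i in train],
            [x[i] for i in test], [y[i] for i in test],
            n_bins,
        )
    return scores


# Synthetic data: one non-linear relationship and one unrelated column.
rng = random.Random(42)
a = [rng.uniform(-3, 3) for _ in range(2000)]
data = {
    "a": a,
    "a_squared": [v * v for v in a],
    "noise": [rng.uniform(-3, 3) for _ in range(2000)],
}
scores = pairwise_scores(data)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this sketch the scores are directional: "a" predicts "a_squared" well, but "a_squared" cannot recover the sign of "a", so the reverse score stays near zero. A linear correlation would miss the relationship in both directions, which is the motivation for scanning pairs with a non-linear model.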

Next year I’d like to run some courses on the subject of successful project delivery (which includes “what have I got and what do I need to solve this challenge?!”). If you’d like to hear about that then you might want to join my training notification list.

Here are the slides for my talk:

 

 



On helping to open the inaugural PyDataPrague meetup

A couple of weeks back I had the wonderful opportunity to open the PyDataPrague meetup – this is the second meetup I’ve opened after our PyDataLondon started back in 2014. The core organisers Ondřej Kokeš, Jakub Urban and Jan Pipek asked me to give two short talks on:

We had over 100 people in the room, many from the extant local Python meetup.

 

Štěpán Roučka also gave a talk on SymPy with lots of lovely demos (video). The organisers were lovely – do please think about speaking at PyDataPrague, you’ll get a warm reception. I also got to see the wonderful architecture in Prague and even visit the local observatory where we saw the sun’s corona.



On receiving the Community Leadership Award at the NumFOCUS Summit 2018

At the end of September I was honoured to receive the Community Leadership Award from NumFOCUS for my work building out the PyData community here in London and at associated events. This was awarded at the NumFOCUS 2018 Summit. I couldn’t attend the New York event, so James Powell gave my speech on my behalf (thanks James!).

I’m humbled to be singled out for the award – things only worked out so well because of the work of all of my colleagues (and alumni) at PyDataLondon and all the other wonderful folk at events like PyDataBerlin, PyDataAmsterdam, EuroPython (which has had a set of PyData sub-tracks) and PyConUK (with similar sub-tracks).

NumFOCUS posted a blog entry on the awards. In addition, Kelle Cruz received the Project Sustainability Award and Shahrokh Mortazavi received the Corporate Stewardship Award.

Cecilia Liao, Emlyn Clay and I started the first PyDataLondon conference in 2014 with lots of help, guidance and nudging from NumFOCUS (notably Leah – thanks!), James and, via Continuum (now Anaconda Inc), Travis and Peter. Many thanks to you all for your help – we’re now at 8,000+ members and our monthly events have 200+ attendees thanks to AHL’s hosting.

If you don’t know NumFOCUS – they’re the group who do a lot of the background support for a number of our PyData ecosystem packages (including numpy, Jupyter and Pandas and beyond to R and Julia), back the PyData conference series and help lots of associated events and groups. They’re a non-profit and an awful lot of work goes on that you never see – if you’d like to provide financial support, you can set up a monthly sponsorship here. If you currently don’t provide any contributions back into our open source ecosystem, setting up a regular monthly payment is the easiest possible thing you could do: it helps NumFOCUS raise more money, which in turn funds more development in our ecosystem.

https://twitter.com/holdenweb/status/1044221855341645825

