Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

New public course on Successfully Delivering Data Science Projects for March 1st

On Friday February 1st I ran my first Successfully Delivering Data Science Projects course; this is part of my new plan to run more training this year. It went really well and I got to both teach and learn a lot from my students. We talked through best practice, project design, derisking strategies and communication plans, and we tried various new tools that’ll improve workflow. Conversation has continued in our private Slack channel (which all attendees get access to).

The next iteration of Successfully Delivering Data Science Projects runs on March 1st and the course is already half sold out. If you’d like to improve your confidence around the successful delivery of Python data science projects, you’ll want to get a ticket soon. The material I teach is based on years of helping clients, from start-ups to corporates, successfully deliver data science projects.

I’m really happy that the discursive format gave students room to raise their own issues and to add recommendations for tools and books beyond my own. We continued our conversations afterwards in the pub whilst decompressing – there we got to dig into some of the hard topics (such as mental health, imposter syndrome and running open source projects) in a more relaxed setting.

The topics covered in the next iteration will include:

  • Building a Project Plan that derisks uncertainties and identifies expected deliverables, based on a well-understood problem and data set (but starting from…we don’t know what we have or really what we want!) – you take the project plan template away for use in your own projects
  • Scenarios based on real-world (and sometimes very difficult) experience that have to be solved in small teams
  • Team best practice with practical exercises covering coding standards, code reviews, testing (during R&D and in production) and retrospectives using tools such as nbdime, pandas-profiling and discover-feature-relationships – you take away the solutions and a guide to running code reviews to support relentless quality improvements in your team’s solutions
  • Group discussion around the problems everyone faces, to be solved or moved forwards by everyone in the group (the group will have more experience than any single teacher)
  • A slack channel that lives during and after the course for continued support and discussion among the attendees

You’re welcome to get in contact if you have questions. Further announcements will be made on my low-volume training email list. I will also link to upcoming courses from my every-two-weeks data scientist jobs and thoughts email list.

 


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

“discover feature relationships” – new EDA tool

I’ve built a new Exploratory Data Analysis tool. I used it in a few presentations last year with the code on GitHub, and I have now (finally) published it to PyPI.

The goal is to quickly check, using machine learning (sklearn’s Random Forests), whether any column in a DataFrame predicts any other column. I’m interested in the question “what relationships exist in my data?” – particularly if I’m working in an unknown domain and on new data. I’ve used this on client projects during the discovery phase to learn more about the sort of questions I should ask a client.
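The published tool does more than this, but the core idea can be sketched as follows – for each pair of columns, fit a Random Forest on one column alone and cross-validate how well it predicts the other (the tiny synthetic DataFrame and column names below are invented for illustration, this is not the tool’s actual code):

```python
# Minimal sketch of the idea behind discover-feature-relationships (the real
# tool does more); the DataFrame and column names here are made up.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def relationship_scores(df):
    """Cross-validated R^2 for how well each single column (row label)
    predicts each other column (column label) via a Random Forest."""
    cols = df.columns
    scores = pd.DataFrame(np.nan, index=cols, columns=cols)
    for target in cols:
        for feature in cols:
            if feature == target:
                continue
            model = RandomForestRegressor(n_estimators=20, random_state=0)
            scores.loc[feature, target] = cross_val_score(
                model, df[[feature]].values, df[target].values,
                cv=3, scoring="r2").mean()
    return scores


rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({"a": a,
                   "b": 2 * a + rng.normal(scale=0.1, size=200),
                   "noise": rng.normal(size=200)})
scores = relationship_scores(df)
print(scores.round(2))  # "a" and "b" should predict each other well
```

A high score in the resulting matrix hints at a relationship worth investigating; a score near zero (or negative) suggests the feature carries no signal for that target.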

The GitHub Readme includes screenshots of the Titanic classification and Boston regression examples, which will give you an idea of the output.

This is a very light project at the moment; I think the idea has value and I’m very open to feedback.



Looking back on 2018, looking to 2019

So last year was a damned hard year – ignoring Brexit and other international foolishness, on a personal level (without going into details) by mid-year I was emotionally wiped out. A collection of health issues amongst family and friends kept rearing their ugly heads, and over time I ran very low on emotionally supportive energy to share. Our not-so-old cat suddenly dying of kidney failure just about topped the year off. Thankfully by Christmas most of the health issues had sorted themselves out, massively reducing their induced stress.

From August to December I deliberately worked at a much lighter level to give myself time to recuperate; that’s paid off well and by Christmas I could consider myself “reasonably back to my old self”. Sometimes it pays to just be kind to ourselves.

This led to the odd situation later in the year when I was given the NumFOCUS Community Service Award – I had to accept it with a bit of a wry grin as I’d already stepped back from many organisational roles in PyDataLondon by this point. The lovely outcome of stepping back was that…nothing really changed for PyDataLondon. I’m immensely proud of the organising team we’ve built, everything just kept ticking along nicely. I’m now back to being more involved and I’m happy to say we’ve got so many suggested talks coming through that we’re now scheduled for a good chunk of the year ahead.

The continued growth in our PyDataLondon community (with 8,500+ members – AFAIK we’re the largest data science event in the UK) and the wider PyData community (over 127 international PyData communities) is lovely to see. I helped open the PyDataPrague meetup a few months back and was happy to share some of our lessons from growing our London community.

I’m also very happy to see the PyData conferences experiment with more non-traditional sessions. At PyDataLondon 2018 we added a crèche and ran sprints and workshops like “making your first open source contribution” and “understanding how git works” to help attendees get more involved in our open source ecosystem. Last year we had art and political hackathons and a woman-focused lunch. At PyDataAmsterdam we ran some similar experiments and I know others were tried at other events. This year I’m looking forward to seeing even more experiments; we’ll certainly run more at PyDataLondon 2019 (July 12-14).

Out of all of this there are a few things I’m particularly proud of:

  • We raised £91,000 for NumFOCUS from our volunteered efforts in PyDataLondon 2018 towards grants and work to support open source
  • We saw the opening of 6 regional PyData events in the UK (by recency: Oxford, Cambridge, Manchester, Edinburgh, Bristol, Cardiff)
  • I got to speak on ways of tackling new data science projects, high performance and how NumFOCUS works at a variety of international events
  • Via the “making your first open source contribution” sessions I ran I helped several groups of people start to contribute on github to Python projects

Whilst 2018 had some tough components, I’m really happy with the positive events that occurred.

Separately from all of this Chris and I have started to shut down ModelInsight after 5 years of collaboration. We only lightly worked on our consultancy in 2017 and we hardly touched it in 2018. The market for the combination of data science and data engineering that we were interested in exploring never grew, we had a lot of fun with our clients but it didn’t feel like we were taking the business anywhere special. Shutting this down was the right call.

I continue with my usual activities under my own name. In a few weeks I run a new course on Successfully Delivering Data Science Projects, I have other training planned, I’ve started to author on-line videos for Pluralsight and I continue to coach teams as Interim Chief Data Scientist whilst my jobs list continues to help companies recruit and folk get new jobs.

I’m also trying a few personal-focus experiments. Since Christmas I’ve set a daily time limit on my Android phone of 5 minutes for Twitter and 10 minutes for Reddit. I’ve also blocked The Independent (my preferred news site) in Firefox to reduce my time-wasting habits. I’ve set aside a day for personal development (I have such a pile of interesting math & data science stuff I want to read). Ask me in a few months how this is all turning out.



New public course on Successfully Delivering Data Science Projects for Feb 1st

During my Pythonic data science team coaching I see various problems coming up that I’ve helped solve before. Based on these observations and my prior IP design and delivery for clients over the years I’ve put together a 1 day public course aimed at data scientists (any level) who want to be more confident with lower-risk approaches to delivering data science projects.

Successfully Delivering Data Science Projects runs on Friday February 1st 2019; early-bird tickets have sold out and a handful of regular tickets remain (be quick). This course suits any data scientist who has discovered just how vague and confusing a research-to-deployment project can be, and who’d like to be more confident in their plans and outcomes.

I’ve developed these techniques whilst working with the teams at companies like Hotels.com, Channel 4, QBE Insurance and smaller companies across domains including media, energy, adtech and travel.

The topics covered in the course will include:

  • Building a Project Plan that derisks uncertainties and identifies expected deliverables, based on a well-understood problem and data set (but starting from…we don’t know what we have or really what we want!)
  • Scenarios based on real-world (and sometimes very difficult) experience that have to be solved in small teams
  • Team best practice with practical exercises covering coding standards, code reviews, testing (during R&D and in production) and retrospectives using tools such as pyjanitor, engarde, pandas-profiling and discover-feature-relationships
  • Group discussion around the problems everyone faces, to be solved or moved forwards by everyone in the group (the group will have more experience than any single teacher)
  • A slack channel that lives during and after the course for continued support and discussion among the attendees

You’re welcome to get in contact if you have questions. Further announcements will be made on my low-volume training email list. I will also link to upcoming courses from my every-two-weeks data scientist jobs and thoughts email list.

 



Talking on “High Performance Python” at Linuxing In London last week

Mario of PyLondinium (where I gave a keynote talk earlier this year) was kind enough to ask me along to speak at Linuxing In London. I gave an updated version of one of my older High Performance Python talks, based on material I’d covered in my book, to show the more engineering-focused audience how to go about profiling and speeding up Python code. The audience was lovely; many were new to Python and also first-timers at the meetup. Here’s half the room:

Audience at Linuxing in London

We covered:

  • Profiling with line_profiler (in a Notebook – thanks Robert!) to identify slow code in functions
  • Using numpy incorrectly to try to get a speed up, then profiling it to see why it didn’t work
  • Using Anaconda’s Numba on the numpy code to get a 20× overall speedup
  • Using a different algorithm entirely to get a further 1000× speedup (!)
  • Thoughts on the two main ways to get a speed-up (do less work or spend less time waiting for data)
  • Looking at Py-Spy which hooks into an existing process to profile on-the-fly – a take-away for anyone in an engineering team
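As a toy illustration (not the code from the talk) of the first way to get a speedup – do less work per element – here’s a Python-level loop versus a single vectorised numpy call, timed with the standard library:

```python
# Toy illustration (not the talk's demo): "do less work" by replacing a
# Python-level loop with one vectorised numpy call, then timing both.
import time

import numpy as np


def slow_sum_of_squares(xs):
    """Python-level loop: the interpreter pays overhead on every element."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def fast_sum_of_squares(xs):
    """One vectorised call: the loop runs inside optimised C/BLAS code."""
    return float(np.dot(xs, xs))


xs = np.random.default_rng(0).normal(size=1_000_000)

t0 = time.perf_counter()
slow = slow_sum_of_squares(xs)
t_slow = time.perf_counter() - t0

t0 = time.perf_counter()
fast = fast_sum_of_squares(xs)
t_fast = time.perf_counter() - t0

# Same answer (to within floating-point noise), far less wall-clock time.
print(f"roughly {t_slow / t_fast:.0f}x faster")
```

Numba applies the same idea by JIT-compiling the loop itself; profiling first (e.g. with line_profiler) tells you which loop is actually worth the effort.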

Here’s a link to the slides. Thanks to Brian, David and the other organisers for hosting the four of us – it was a fun evening.

I also mentioned my London-based jobs and training email lists and promised to link them here. It was fun to speak to a less data-science-focused audience (PyData is pretty much my bubble-reality nowadays), especially to meet new folk transitioning into Python from entirely non-technical careers. I reminded everyone that they’re most welcome to visit our PyDataLondon meetups to widen their network; of course London Python and PyConUK should definitely be on your radar too.

 

