Software Engineering for Data Scientists

This course draws on Ian's hard-won experience from client engagements and aims to get you over the line more reliably with working code.
The next public courses will run on July 8-10 and September 23-25.
Please get in touch if you'd like a private course in your company.
Get a notification for future courses by using this notification form.
This is a three-morning virtual course (Zoom & Slack) held for a small group of around 10 people. We'll focus on developing a standard process from R&D through to production, backed by code reviews, documentation, refactoring, unit tests and a Notebook-based git process.

It is aimed at existing Python programmers who have 2+ years of prior programming experience and who need their Python development process to be more efficient and trusted.

This course is aimed at any Pythonic data scientist who:

  • Wants more confidence that their code runs correctly during deployment to avoid downtime and friction with business colleagues
  • Needs to learn more about testing and debugging
  • Wants ideas on routes to deployment and solutions for data that changes after deployment
  • Wants to collaborate with confidence with other technical team members to raise the team's long-term velocity

During the course we'll:

  • Develop defensive coding practices in our Notebook which double up as documentation
  • Refactor Notebook code into modules for reuse and increased trust
  • Add unit tests to our modules for trust and for integration with a Continuous Integration pipeline
  • Review a git process that uses nbdime for collaboration on Jupyter Notebooks
  • Practice code reviews backed by a documented process you can take back to your team
  • Review a standard research-to-deployment process using cookiecutter
  • Discuss how to sell these new techniques to team members and senior staff to get the critical buy-in needed to see change occur after the course
  • Look at how "traditional" software engineering and "data science engineering" differ to highlight process differences that your software engineering colleagues probably haven't seen
  • Write useful documentation in our code to improve future support
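
To give a flavour of the first three items, here is a minimal sketch of a Notebook calculation refactored into a function with defensive checks and a pair of unit tests. It is an illustration only, not material from the course: the function name `fraction_missing` and the test names are hypothetical, and it uses plain Python so it runs without any extra libraries.

```python
def fraction_missing(values):
    """Return the fraction of entries in `values` that are None."""
    # Defensive check: fail loudly on input we did not plan for.
    if len(values) == 0:
        raise ValueError("values must not be empty")
    n_missing = sum(1 for v in values if v is None)
    fraction = n_missing / len(values)
    # The assertion doubles up as documentation of what we expect to be true.
    assert 0.0 <= fraction <= 1.0, f"fraction out of range: {fraction}"
    return fraction


def test_fraction_missing():
    # Two of the four entries are missing.
    assert fraction_missing([1, None, 3, None]) == 0.5
    assert fraction_missing([1, 2]) == 0.0


def test_fraction_missing_rejects_empty_input():
    try:
        fraction_missing([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Once the function lives in a module rather than a Notebook cell, a test runner such as pytest can collect the `test_` functions, which is what makes it possible to run them in a Continuous Integration pipeline.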

After the course you'll:

  • Have a working cookiecutter layout to demonstrate all of the processes to your team
  • Have a "cheat sheet" to help you quickly use these new techniques at the right time back in the office
  • Take home a practical guide for code reviews to significantly improve your team's code quality and overall velocity
  • Have gained answers to the questions you arrived with, so your personal blockers will be resolved
  • Have a plan for new tools and processes to introduce at work to make your team more efficient
  • Have access to our Slack channel to continue the conversation with classmates and to download any shared material. You'll be able to see conversations from previous courses and collaborate with past and current students
  • Receive a Certificate of Professional Development

"Ian's Software Engineering for Data Scientist's course provides an excellent overview of best practices with focus on testing, debugging and general code maintenance. Ian has a wealth of experience and also makes sure to keep on top of the latest tools and libraries in the Data Science world. I would especially recommend the course to Data Science practitioners coming from an academic rather than software engineering background."
- Mirka, LibertyGlobal

"Ian's course Software engineering for Data Scientist was really useful to me. I learned more about refactoring and testing which I implemented at work in my current project the week after the training. There are other good practices (including the use of libraries I didn't know) that I am willing to put in place in the future."
- Sandrine Pataut, QBE

Get in contact with Ian
If you’d like to discuss how Ian can help your team, get in contact by emailing Ian[at]
  • Read my book

    O'Reilly High Performance Python by Micha Gorelick & Ian Ozsvald