Entrepreneurial Geekiness

Ian is a London-based independent Chief Data Scientist who coaches teams, teaches and creates data products. More about Ian here.

“Higher Performance Python” at PyDataCambridge 2019

I’ve had the pleasure of speaking at the first PyDataCambridge conference (2019). This is the second PyData conference in the UK after PyDataLondon (which colleagues and I co-founded 6 years back). I’m super proud to see PyData spread to 6 regional meetups and now 2 UK conferences.

We had over 200 attendees (and a swanky black-tie conference dinner) and the single-track event had a rich set of topics (schedule). For me scikit-multiflow (extending sklearn to streaming data) was a hit along with model stability checking (by FarFetch) and an overview of GA2M (an extended Generalised Additive Model with explainability). Thanks to the speakers for fine talks, the audience for fine questions, and Cambridge Spark and the PyDataCambridge meetup for helping make it all happen!

I spoke on Higher Performance Python with a focus towards making Pandas operations go faster and an eye on the upcoming Second Edition of our High Performance Python (O’Reilly) book. The talk covers:

  • Using line_profiler to evaluate sklearn’s LinearRegression vs NumPy’s lstsq (spoiler – lstsq is much faster but that’s due to sklearn being much safer; the slow-down is all due to safety code in sklearn that helps keep your productivity higher overall)
  • Using Pandas for line-by-line iteration (slow) vs apply (faster) and apply with raw=True to expose NumPy arrays (fastest)
  • Using Numba to JIT compile lstsq using apply with raw=True for a huge speed-up
  • Using Dask to parallelise the Numba solution for further speed-ups
  • Advice on being a “highly performant data scientist”
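As a rough illustration of the Pandas points above (not the code from the talk), here is a minimal sketch that fits a straight line to every row of a DataFrame using NumPy’s lstsq, called three ways. The function name ols_slope and the random 1000x14 DataFrame are assumptions made up for this example. A Numba-compiled function would be passed to apply with raw=True in the same way.

```python
import numpy as np
import pandas as pd


def ols_slope(row):
    """Fit y = m*x + c to one row of values by least squares and return m."""
    y = np.asarray(row, dtype=float)
    x = np.arange(y.shape[0], dtype=float)
    # Design matrix: one column for the slope, one column of ones for the intercept
    A = np.vstack([x, np.ones_like(x)]).T
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    return m


# Hypothetical data: 1000 rows, each row is a short series of 14 observations
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 14)))

# 1. Row-by-row iteration: a Series is built for every row (slow)
slopes_iter = pd.Series(
    [ols_slope(row) for _, row in df.iterrows()], index=df.index
)

# 2. apply across rows: tidier, but still hands a Series to the function
slopes_apply = df.apply(ols_slope, axis=1)

# 3. apply with raw=True: the function receives plain NumPy arrays instead
slopes_raw = df.apply(ols_slope, axis=1, raw=True)
```

All three calls give the same slopes, so the difference is purely in run time. Timing each with %timeit in a Notebook (or line_profiler) on your own data is the sensible way to see whether the change is worth it.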

The last point is important – going “compiler happy” and writing highly efficient code may well slow down your team and your overall velocity. Amongst other items I recommended profiling first, maybe introducing Dask & Numba only with a team’s consent and looking at tools like Bulwark to add tests to DataFrames to avoid being derailed by strange data bugs.
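To give a feel for what “adding tests to DataFrames” can look like, here is a minimal sketch using plain Pandas. This is not Bulwark’s API; the helper names check_no_nans and check_in_range and the example columns are hypothetical, written only to show the idea of failing early on strange data.

```python
import pandas as pd


def check_no_nans(df):
    """Raise if any column contains missing values, else return df unchanged."""
    missing = df.isna().sum()
    bad = missing[missing > 0]
    if not bad.empty:
        raise ValueError(f"Missing values found in columns: {list(bad.index)}")
    return df


def check_in_range(df, column, low, high):
    """Raise if any value in `column` falls outside [low, high]."""
    if not df[column].between(low, high).all():
        raise ValueError(f"Column {column!r} has values outside [{low}, {high}]")
    return df


# Hypothetical data for illustration only
df = pd.DataFrame({"age": [23, 41, 35], "score": [0.2, 0.9, 0.5]})

# Each check returns the DataFrame, so checks chain with .pipe and
# sit at the top of a processing pipeline
checked = df.pipe(check_no_nans).pipe(check_in_range, "score", 0.0, 1.0)
```

Because each check raises on bad input, a surprising value stops the pipeline at the point it arrives rather than surfacing later as an odd result.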

Right now Micha and I are busily working to complete the second edition of our book. All going well it’ll be in for Christmas with a publication date around April 2020.

 


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.

“A starter data science process for software engineers” – talk at PyLondinium 2019

I’ve just spoken on “A starter data science process for software engineers” (slides linked) at PyLondinium 2019. This talk is aimed at software engineers who are starting to ask data related questions and who are starting a data science journey. I’ve noted that many software engineers – without a formal data science background – are joining our PyData/data science world but lack useful transitionary resources. [note – video to come]

In this talk (based in part upon my current training courses and my recent PyDataCambridge talk) I cover:

  • What enables a good data science project
  • Ways to plan a project spec for success (really, do this, it saves so much pain)
  • A live demo covering a Jupyter Notebook with Altair, matplotlib, sklearn, yellowbrick, Widgets and then serve this up with Voila and Binder

The Notebook lives in github and this link should start a live Binder version (in which Altair is interactive and the slider Widget at the bottom of the Notebook live-drives scikit-learn predictions).

After the talk it seems that both Altair and the message “make a project spec” were the main winners, with Voila as a close third.

PyLondinium were also kind enough to organise a book signing for my High Performance Python book where I got to talk a bit about our in-preparation 2nd edition (for January).

This conference builds on last year’s inaugural event; it has grown and has a lovely feel. You may want to think about putting in a talk for next year’s PyLondinium!

 



“On the Delivery of Data Science Projects” – talk at PyDataCambridge meetup

A few weeks ago I got to speak at PyDataCambridge (thanks for having me!). Slides are here for “On The Delivery of Data Science Projects”.

This talk is based on my experiences coaching teams (whilst building IP for clients) to help them derisk, design and deliver working data science products. This talk is really in two halves – it takes the important lessons from my two training classes and boils them down into a 30 minute talk. We cover:

  • What makes for a successful data science project?
  • Developing a Project Specification for shared agreement including a Definition of Done
  • Using standard tools and processes to standardize and simplify
  • Ideas around best practice

Let me know if you found this talk useful. I really think the ideas around successful project delivery need to be collected and shared; we’re still in the “wild west” and I’m keen to collate more examples of successful process.

 



Thoughts on how to start a PyData or Python meetup

At PyConLT 2019 (Lithuania) we just had a 10-person meeting on “how to start a new PyData or Python meetup” with existing organisers and some potential new event organisers. The night before in the conference bar Radovan and I had spent an hour helping someone from Latvia figure out their plan to start a new Python meetup. Given that I’m a co-founder for PyDataLondon and after 6 years we’re at 9,500+ members, I have some opinions. Maybe sharing these will help others. All going well we’ll see a new PyDataVilnius start with what looks like a 7+ person volunteer group, all organised at PyConLT.

I’m pretty sure that the key to starting a meetup is to have:

  • lots of marketing (I can’t stress this enough – to kick start a new event you have to help relevant people find you)
  • regular meetups (e.g. monthly or e.g. every 3 months – whatever fits your community and your level of engagement)
  • high quality speakers
  • the same venue every month (it is easier for attendees and much easier for the organisers to have a fixed venue)

None of the above are “magic”, they each help. Regular meetups mean people put them into the calendar. High quality speakers mean people get used to turning up because “they’ll either learn something or meet other interesting attendees” – I’m really proud of the quality of speakers we’ve had consistently over 6 years at the PyDataLondon meetups. The same venue means there’s no confusion about how to find you for the attendees and there’s no extra effort for the organisers to have to keep finding and negotiating new venues.

For the first few months keep your life simple – get set up on meetup.com (if you’re doing a PyData you can email NumFOCUS to ask for an official PyData branch to be created) or have a landing page (/blog/mailing list/facebook page). Find a couple of good speakers – ask around, you’ll quickly be surprised at how good your existing network already is. Choose a regular date (PyDataLondon is “the first Tuesday of the month” and we’ve rarely broken that rule). Find yourself a couple of co-organisers – 3 co-organisers is great, 1 is too much of a burden for 1 person I think.

Don’t stress too much about the first venue. You’ll probably grow out of it so you just need a stable base to start from. I’ve run prior meetups out of pub function rooms, renovated Victorian townhouses and co-working spaces. PyDataLondon started at Pivotal (thanks Ian!), then moved to Lyst (thanks Steve!) and then for 3+ years at Man AHL (super thanks Slavi, Gary and colleagues!).

Next – publicise it. Tell all of your friends who’d be even vaguely interested and ask them to tell three friends. Post to Facebook, Twitter, your favourite forums/slack/whatsapp groups. Ask friends in companies or at universities to advertise internally. Go to allied meetups that are already running and ask the organisers if you could have 2 minutes on-stage to announce to their audience (and of course offer to publicise their event to your audience in the future – always reciprocate).

If you’re stuck for speakers – look at the history of speakers at other related meetups and conferences, then reach out to them. Tell your friends that you’d like to reach persons Y and X and maybe they can do you intros.

At each meetup remind everyone that you’re a volunteer group – so many attendees don’t realise that the organisers & speakers aren’t paid to do this. Remind them that you’re all volunteers giving up your time, ask them to thank the speakers and your colleagues (and maybe to buy them beer or coffee). At the end of the event get your volunteers and speakers to stand and ask everyone to thank them (and maybe buy them a beer in the pub after) – make sure your volunteers are acknowledged. Also remind everyone “we want new speakers – please come and have a chat with the volunteers!” and you’ll develop a feed of people willing to speak for 0 extra effort on your part.

At PyDataLondon we get lots of first-time attendees now (maybe 30% of the room are first-timers). I’ve taken inspiration from other meetups (e.g. LinuxingInLondon and others) to make the start ‘more friendly’. We have 200 people in the room, many of whom don’t know who they sit next to. We do a 1 minute intro section where “you turn to your neighbour and ask ‘What do you do with Python?’ and then they reciprocate” – so then everyone has someone to talk to in the break. I also get and pass around “green newbie badges” – these are stickers that any newbie can stick on themselves, which gives them the super power to go and join any conversation in the breaks. I also get stickers for the speakers & volunteers (robots, unicorns, planets – the London Science Museum is great for these). These techniques work nicely.

Think about your motivations for running a meetup. It takes a lot of sustained effort to make it consistent and then it just takes effort to keep it going. You’ll want to grow an organising group to keep it maintainable. Ask yourself – why do you and your colleagues want to run this group? Make sure you’re all in alignment about your core values, else you may be ignoring issues which might make things weird later. Thankfully the groups I’ve been involved in haven’t “gone weird” but I’ve seen it happen (e.g. founders who won’t let go of the reins and then burn out). You can sensibly avoid some of these problems by having frank discussions up front.

Later on you may need to scale up your organising group. At PyDataLondon we’ve been blessed with a super feed of lovely volunteers – we run a committee with circa 13 organisers. My earlier meetup “fivepoundapps” was an awful lot of fun but my friend John and I realised after some years that nobody else wanted to run the event – no amount of cake baking (John) and beer carrying (me) would change the feel of the event from “John and Ian’s fivepoundapp” which was a pity. When we burned out it just died. Happy memories, but a pity that it didn’t sustain.

For PyDataLondon I put thought into sustainability and our organisers discussed this as we grew. In the background we have a rotating Release Manager (they check with the venue & speakers, send out a “please unRSVP if you can’t attend” email and do some housekeeping – essential and mostly invisible work), several of us look for new speakers, we used to do a lot of publicity but thankfully that’s mostly automatic now. I write an “update email” most months to our 9k+ members, that takes 1-2 hours to write and includes some links about upcoming relevant events and often requests new speakers. All the volunteers rotate to be on stage, not everyone is comfortable with that (and sadly some of us – myself, Emlyn, Marco – are more comfortable with it than others and we get the limelight) but we try to make sure that all volunteers get some visibility. I don’t know a better way to do this, I’m open to feedback.

You don’t have to start big, you just have to be consistent and it’ll grow sensibly over time. We’re 6 years in now with PyDataLondon and it just seems to keep growing. Build a nice small meetup, let it grow, delegate and find colleagues who want to help it grow and…let it. Beer generally helps (+wine+soft drinks+pizza+suitable alternative food – suit the needs of your audience). A code of conduct is very sensible (NumFOCUS sensibly insist on this for PyData events). Keep telling people. Find high quality speakers. Rinse and repeat. Enjoy the new community. Sometimes I joke that the reason I’ve put in 6 years of effort is, on a personal level, because I couldn’t find 50 interesting folk to come to the pub with me – now I get it for free every month. That’s magic.

Thanks to Radovan of PyDataBratislava, Jan of PyDataPrague and Aidis of PyConLT for direct feedback yesterday, to my colleagues at PyDataLondon and global PyData events and to others who have helped me figure out these approaches over the years.

ps. written at the airport – please excuse any typos!



PyCon Lithuania 2019 and a keynote on “Citizen Science with Python”

I’ve had the great pleasure of attending PyConLT 2019 – my first trip to Lithuania. I had no idea what to expect (I’ve never been to this part of Europe) – Vilnius is a lovely city full of lovely Pythonistas. There’s a bunch of lovely art hanging underneath bridges, an amazing Soviet Palace of Arts and Sports and a number of castles – it really is very lovely here.

This keynote talk builds on a couple of previous iterations. Here I spoke on “Citizen Science with Python”, covering:

  • Dirty North-Macedonian air analysis by Gorjan (from PyDataAmsterdam)
  • Improving child births via Anna (from PyDataWarsaw)
  • Monitoring saved and relocated Orangutans using SciPy and drones via Dirk (from PyDataAmsterdam)
  • Improving political engagement in the run up to the UK EU elections (via myself and colleagues at a political hackathon)
  • Short demo of using open data in a Notebook to visualise the 30 year increase in new breweries in the UK

The keynote was aimed at a mostly-engineer audience, to show that “with the power of the keyboard” we all have skills to make the world a better place that most of our fellow citizens lack.

Whilst here I got the chance to get involved in two very interesting discussions:

  • How can others who want to start a meetup get some advice? This (hopefully) leads to the formation of a new PyDataVilnius (it was announced on stage in the closing notes – yay!) and the expansion of a similar group back in Cyprus (which has already had some positive feedback – yay!)
  • How can folk take the next step with their data science project? This led to interesting chats over lunch

I spent most of my time on the PyData track and met a load of very smart local and international folk, organiser Aidis and his volunteer crew did a really good job of pulling this together. I also got to meet PyDataBratislava co-organiser Radovan and PyDataPrague co-organiser Jan (we had the pleasure to meet last year when I went to their first meeting). These events really do help to build such a great international network.

 

