I’ve used the Pandas data science toolkit for over a decade and I’ve filed a couple of issues, but I’ve never contributed to the source. At the weekend I got to balance the books a little by making my first commit. With this pull request I fixed the recent request to update the pct_change docs to make the final example more readable.
The change was trivial – adding “periods=-1” to the argument and updating the docstring. The build process was a lot more involved – thankfully I was on a call with PyLadies London to try to help others make their first contribution to Pandas and I had organiser Marco Gorelli (a core contributor) to help when needed.
Ultimately it boiled down to setting up a docker environment, running a new example in my shell, updating the relevant docstring on the local filesystem and then following the “contributing to the documentation” guide. My initial commit fell foul of the docstring style rules and the automated checking tools in the docker environment point this out. Once the local filesystem checker scripts were happy I pushed to my fork, created a PR and shortly after everything was done.
All in it took 45 minutes to get the environment setup, another 45 minutes to make my changes and figure out how to run the right scripts, then a bit longer to push and submit a PR (followed by overnight patience before it got picked up by the team).
When I teach my classes I always recommend that a good way to learn new development practices (like the automated use of black & flake8 in a precommit process) is to submit small fixes to open source projects – you learn so much along the way. I’ve not used docker in years and I don’t use automated docstring checking tools, so both presented nice little points for learning. I also have never used the pct_change function in Pandas…and now I have.
If you’ve not yet made a commit to an open source project, do have a think about it – you’ll get lots of hand holding (just be patient, positive and friendly when you leave comments) and you can stick a reference to the result on your CV for bragging rights. And you’ll have made the world a slightly better place.
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.