“discover feature relationships” – new EDA tool

I’ve built a new Exploratory Data Analysis tool. I used it in a few presentations last year, with the code on GitHub, and have now (finally) published it to PyPI.

The goal is to quickly check whether any column in a DataFrame predicts any other column, using machine learning (sklearn’s Random Forests). I’m interested in the question “what relationships exist in my data?”, particularly when I’m working in an unknown domain and on new data. I’ve used this on client projects during the discovery phase to learn more about the sort of questions I should ask a client.
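To make the idea concrete, here is a minimal sketch of the "does any column predict any other column" check. It is not the tool's actual API: the helper name `pairwise_predictive_scores`, its parameters and the synthetic demo data are all assumptions for illustration, and it only handles numeric columns with a regressor.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def pairwise_predictive_scores(df, n_estimators=50, cv=3, random_state=0):
    """Hypothetical helper: cross-validated R^2 for every (feature, target) pair."""
    rows = []
    for target in df.columns:
        for feature in df.columns:
            if feature == target:
                continue
            model = RandomForestRegressor(
                n_estimators=n_estimators, random_state=random_state
            )
            # Fit on a single feature column and score on held-out folds
            fold_scores = cross_val_score(
                model, df[[feature]], df[target], cv=cv, scoring="r2"
            )
            rows.append(
                {"feature": feature, "target": target, "score": fold_scores.mean()}
            )
    return pd.DataFrame(rows)


# Synthetic demo data (an assumption, not from the project):
# x determines x_squared, but x_squared cannot recover the sign of x.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300)
demo = pd.DataFrame(
    {
        "x": x,
        "x_squared": x ** 2 + rng.normal(0, 0.1, size=300),
        "noise": rng.normal(size=300),
    }
)

scores = pairwise_predictive_scores(demo)
# Rows are targets, columns are the single feature used to predict them
print(scores.pivot(index="target", columns="feature", values="score").round(2))
```

Scoring each ordered pair separately matters because the relationships need not be symmetric: in this toy data `x` should predict `x_squared` well, while the reverse direction should score poorly.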

The GitHub README includes a screenshot that gives you an idea of the output, using the Titanic classification and Boston regression examples.

This is a very light project at the moment. I think the idea has value, and I’m very open to feedback.

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.