“#talkpay” tweet salary visualisation

This weekend the #talkpay tag has shown people outing their salaries, to democratise some of this information. This provides some interesting data for visualisation. If you’re curious about a discussion around salary data then @patio11’s blog entry is a good starting point.

@echen grabbed some of the data, I took a copy of the online sheet and made the following code to visualise the salaries. This is a very simplistic analysis, it is mostly US data, there’s no filtering for location (you’d expect San Francisco to pay significantly more than many other US cities).

First, here’s a histogram of the majority of the salaries listed (ignoring the top-9 which go up to $1.1 million which distort the plot):

Next we can filter by some text terms, here’s a similar histogram for software developers. Note the interesting peaks at $80k and $120k, then smaller but obvious bumps at $150k, $200k and $250k:

There’s much less data for teachers but you can get an idea of the difference in likely salaries:

Finally we can plot a normed (summed to 1.0) cumulative histogram, you can think of the data as probabilities to get an idea of the proportion of people who earn less/more than a certain amount:

It is worth remembering that the data is thin, just 800 samples, it is also self-reported so most of the reports will be from people who are confident in being public. It is likely that the true distribution of salaries is lower, as people who aren’t confident are less likely to publish.

Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign-up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high energy Springer Spaniel and is a consumer of fine coffees.