More and more frequently I’m finding myself presenting data to clients. Whether it’s through code reviews or performance audits, taking a statistical look at code is an important part of my job, and it needs presenting to my clients in a way that is representative and honest. This has led me more and more into looking at different averages, and knowing when to use the correct one for each scenario. Interestingly, I have found that the mean, the average most commonly referred to as simply the average, is usually the least useful.
I want to step through the relative merits of each, using some real-life examples.
Disclaimers: Firstly, I’m by no means a statistician or data scientist; if anything in here isn’t quite correct, please let me know. Secondly, I deal with relatively small data sets, so we don’t need to worry about more complex measurements (e.g. standard deviation or interquartile ranges); sticking with simplified concepts is fine. Finally, all of the data in this article is fictional; please do not cite any figures in any other articles or papers.
The mean is the most common average we’re likely to be aware of. It works well when trying to get a representative overview of a data set with a small range; it is much less representative if we have extremities in our data. If you’ve ever split a bill at a restaurant, you’ve used a mean average. We arrive at the mean by adding all of our data points together and then dividing by the number of data points there were, e.g.:
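As a minimal sketch in Python, using made-up restaurant-bill figures:

```python
# Hypothetical restaurant bill: each diner's order total, in pounds.
bills = [22.50, 18.00, 25.75, 19.75]

# The mean: add all of the data points together, then divide by
# the number of data points there were.
mean = sum(bills) / len(bills)

print(mean)  # 21.5 — what each person pays when splitting the bill
```

With four diners, everyone pays £21.50 regardless of what they actually ordered, which is exactly the sense in which splitting a bill is a mean average.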