A common mistake I often see in data analysis is averaging percentages. To explain why this is a problem, let’s examine a simple example.
Suppose we conduct a survey on whether or not people like chocolate. We discover that 90% of children like chocolate, but only 60% of adults do. Can we then claim that 75% of the population like chocolate, since (90% + 60%)/2 = 75%? The answer is NO! There is one exception, which we will discuss at the end of this post.
This is not correct because the two percentages alone tell us nothing about the sample size of each group. Let’s provide more information. In the table below we learn that 100,000 children and 400,000 adults were surveyed. Of those 500,000 people, 330,000 like chocolate and 170,000 do not. Here, we can confirm that 90,000/100,000 x 100% = 90% of children, and 240,000/400,000 x 100% = 60% of adults like chocolate.

Group      Like chocolate   Do not like chocolate   Total surveyed
Children           90,000                  10,000          100,000
Adults            240,000                 160,000          400,000
Total             330,000                 170,000          500,000
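To calculate the percentage of people who like chocolate across the entire population, we divide the total number of people who like chocolate by the total number of people surveyed:

(90,000 + 240,000) / (100,000 + 400,000) x 100% = 330,000 / 500,000 x 100% = 66%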
The accurate calculation gives 66%, while averaging the percentages gives 75%. The simple average overstates the result because it gives the 100,000 children the same weight as the 400,000 adults. Averaging percentages is tempting, but it can produce inaccurate results.
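To make the comparison concrete, here is a minimal Python sketch of both calculations using the survey counts from the example. The names `groups`, `pooled_percentage`, and `naive_average_percentage` are my own choices for illustration, not part of any library.

```python
# Survey counts from the example: (number who like chocolate, number surveyed).
# The dictionary layout and function names are illustrative assumptions.
groups = {
    "children": (90_000, 100_000),
    "adults": (240_000, 400_000),
}


def pooled_percentage(groups):
    """Percentage of all respondents who like chocolate, from raw counts."""
    total_likes = sum(likes for likes, _ in groups.values())
    total_surveyed = sum(surveyed for _, surveyed in groups.values())
    return 100 * total_likes / total_surveyed


def naive_average_percentage(groups):
    """Unweighted mean of the per-group percentages (the mistake discussed here)."""
    percentages = [100 * likes / surveyed for likes, surveyed in groups.values()]
    return sum(percentages) / len(percentages)


print(pooled_percentage(groups))         # 66.0
print(naive_average_percentage(groups))  # 75.0
```

Working from the raw counts, rather than from the percentages, is what keeps each group's size in the calculation.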
As previously mentioned, there is one exception where the average of percentages agrees with the accurate percentage calculation. This occurs when the sample sizes of both groups are the same. For example, suppose 100,000 children and 100,000 adults are surveyed, as shown below. Here, we can confirm that 90,000/100,000 x 100% = 90% of children, and 60,000/100,000 x 100% = 60% of adults like chocolate.

Group      Like chocolate   Do not like chocolate   Total surveyed
Children           90,000                  10,000          100,000
Adults             60,000                  40,000          100,000
Total             150,000                  50,000          200,000
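To calculate the percentage of people who like chocolate across the entire population, we again divide the total number of people who like chocolate by the total number of people surveyed:

(90,000 + 60,000) / (100,000 + 100,000) x 100% = 150,000 / 200,000 x 100% = 75%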
Here, both calculations agree on 75%. However, equal group sizes are an exception and are not commonly found in real-world applications. Therefore, we should be careful not to average percentages.
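If only the percentages and the group sizes are available, the same result can be obtained by weighting each percentage by its sample size. The sketch below assumes a hypothetical helper, `weighted_average`, written for this post, and applies it to the two scenarios above.

```python
def weighted_average(percentages, sizes):
    """Average of percentages, weighting each one by its group's sample size.

    Hypothetical helper for illustration; not taken from any library.
    """
    if len(percentages) != len(sizes):
        raise ValueError("percentages and sizes must have the same length")
    return sum(p * n for p, n in zip(percentages, sizes)) / sum(sizes)


percentages = [90, 60]  # children, adults

# Unequal groups (100,000 children, 400,000 adults): adults dominate the result.
unequal = weighted_average(percentages, [100_000, 400_000])

# Equal groups (100,000 each): the weights cancel, leaving the simple average.
equal = weighted_average(percentages, [100_000, 100_000])

print(unequal)  # 66.0
print(equal)    # 75.0
```

With equal sizes, every percentage gets the same weight, which is why the simple average happens to be correct in that one case.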
I hope this helps clarify a common misconception. Let me know your thoughts!