Over 100%? Here’s Why That Statistic Isn’t Wrong — And What It Really Means

January 1, 2026 · By Kanja Mwingirwa

When I saw a set of pathway percentages that added up to well over 100%, my data analyst instincts kicked in.

I’ve seen this happen in real reports: perfectly accurate numbers end up causing frustration because users assume something is wrong. Moments like this are a good reminder that a big part of being a data analyst is storytelling: making the numbers make sense for everyone, ideally at first glance and without extra explanation.

In this post, I explore why percentages over 100% aren’t necessarily wrong and what they really mean in multi-label data scenarios.

It’s not the math that’s broken; it’s how it’s presented.

At first glance, it looks wrong because the percentages add up to more than 100%. That immediately creates friction for the reader. Most people are used to percentages summing to 100%, so when they don’t, the assumption is that there’s a mistake somewhere.

The first question that came to mind was: what exactly is being measured?
Are these pathways mutually exclusive, or can a candidate demonstrate potential in more than one pathway?

This distinction matters a lot in multi-label data analysis.

If each candidate is meant to fall into only one pathway, then the numbers are simply incorrect. But if candidates are being assessed across multiple dimensions — meaning one person can show potential in STEM and arts, for example — then the math itself may actually be fine.
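To make the distinction concrete, here is a minimal Python sketch with made-up candidates and pathways: when each pathway’s percentage is computed against the total number of candidates, a multi-label candidate counts toward every pathway they qualify for, so the percentages can legitimately sum past 100%.

```python
# Hypothetical multi-label assessment data: each candidate may show
# potential in more than one pathway. (Names and numbers are made up.)
candidates = {
    "Amina": {"STEM", "Arts"},
    "Brian": {"STEM"},
    "Chebet": {"Arts"},
    "Daudi": {"STEM", "Sports"},
    "Esther": {"Sports"},
}

total = len(candidates)
pathways = ["STEM", "Arts", "Sports"]

# Each percentage is (candidates with that label) / (total candidates),
# so a candidate with two labels contributes to two percentages.
counts = {p: sum(p in s for s in candidates.values()) for p in pathways}
for p in pathways:
    print(f"{p}: {counts[p] / total:.0%}")  # STEM: 60%, Arts: 40%, Sports: 40%

print(f"Sum: {sum(counts.values()) / total:.0%}")  # Sum: 140%
```

Nothing is double-counted *within* a pathway, only *across* pathways, which is exactly why the total exceeds 100%.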

This is where data analysis goes beyond calculation and into interpretation.

A clearer way to tell the story behind overlapping percentages

When categories overlap, good data communication should quantify the overlap, not just imply it. Simply stating that candidates can qualify for more than one pathway explains why the percentages exceed 100%, but it doesn’t help the reader understand how they overlap.

And that “how” is where the insight lives.

For instance, the results could have been reported like this (illustrative numbers):

- STEM: 60% of candidates showed potential
- Arts: 55% of candidates showed potential
- Note: candidates could qualify for more than one pathway

Or even more explicitly:

- STEM only: 30%
- Arts only: 25%
- Both STEM and Arts: 30%
- Neither: 15%
With this kind of breakdown, the original percentages suddenly make sense. More importantly, the reader actually learns something about the structure of the data instead of being left to guess.
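One way to produce this kind of breakdown from raw multi-label data is to group candidates by their exact combination of pathways. A minimal Python sketch, with hypothetical numbers:

```python
from collections import Counter

# Hypothetical results for 20 candidates; each entry is one
# candidate's set of qualifying pathways. (Numbers are made up.)
labels = [{"STEM"}] * 6 + [{"Arts"}] * 5 + [{"STEM", "Arts"}] * 6 + [set()] * 3
total = len(labels)

# Headline (non-exclusive) percentages: these can sum past 100%.
for p in ["STEM", "Arts"]:
    share = sum(p in s for s in labels) / total
    print(f"{p}: {share:.0%}")  # STEM: 60%, Arts: 55%

# Mutually exclusive breakdown: group candidates by their exact
# combination of pathways, so these percentages sum back to 100%.
combos = Counter(frozenset(s) for s in labels)
for combo, count in sorted(combos.items(), key=lambda kv: -kv[1]):
    name = " + ".join(sorted(combo)) if combo else "Neither"
    print(f"{name}: {count / total:.0%}")
```

The same underlying data produces both views: the headline numbers that exceed 100%, and the exclusive breakdown that explains why.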

This kind of reporting also changes the conversation. Instead of arguing about whether the math is “wrong,” we can start asking better questions:

- Why do so many candidates show potential in more than one pathway?
- Should candidates who qualify for multiple pathways be supported differently?
- Are certain pathways more likely to overlap than others?

None of those questions can even be asked if the overlap remains hidden.

See… the math isn’t wrong, but the presentation is everything. Data should reduce confusion, not create it. At its best, data makes it easier to understand reality and make decisions — not harder.

This is a small example, but it highlights a bigger point: data literacy isn’t just about getting the numbers right. It’s about communicating them responsibly.
