Should Subjects Be Required to Report Attainment on a Common Scale?
Why there is no perfect way to report attainment to students and parents
When we began visiting secondary schools to research this Substack, there was always one person who wanted to talk the most: the Head of Assessment and Data. Often newly promoted to leadership, almost always conscientious, and usually halfway through rewriting the assessment and reporting policy. And without fail, they would ask: Should we make all subjects report attainment in the same way? Percentages? Grades? Below/at/above expected?
They’d explain that they’d picked one approach. Then they’d tell us about the English department pushing back. Or PE. Or maths.
If you’ve read our earlier posts on reliability and discrimination, you’ll already know why the subjects disagree. And yet, as this post explains, there are many reasons why consistency in reporting is valuable. We are going to be forced into trade-offs, and the costs will land differently in every subject.
The desire for consistency
It’s no surprise that the first big decision for a newly appointed assessment lead is often: How should we report attainment across subjects? The question looms large not just because it’s contentious, but because the case for consistency is, on the face of it, quite strong.
Parents expect to be told how their child is doing. And they expect to understand it. They want to know where strengths lie, which subjects need more focus, whether their child is keeping up or falling behind. Consistent reporting across subjects helps make this picture legible.
Leaders, too, need to compare performance across subjects: to support individual students, allocate resources and support to departments, and make decisions about curriculum, class groupings, or intervention. If a student is visibly struggling across the board, or underachieving in just one or two areas, the reporting system needs to make that pattern clear at a glance.
Consistency, then, isn’t just a bureaucratic neatness. It underpins communication, decision-making, and in many cases, action. That’s why assessment leads reach for it first.
But what if the cost of consistency is meaning?
Why consistency is inherently problematic
The trouble is, consistency only works if the underlying assessments support it, and they often don’t.
As we’ve explored in recent posts on reliability and discrimination, subjects differ sharply in how precisely they can measure attainment, and how confidently they can rank students. In maths, it’s feasible to produce reliable percentage scores across a wide mark range. In PE or drama, it usually isn’t. The knowledge architecture, the types of tasks, the class time with students, and the limits of judgement all get in the way.
This is why some subjects prefer fine-grained numbers, while others stick to grouped descriptors like “expected” or broad grades. It’s not just habit; it reflects what their assessments can actually support.
The result is a tension: do we force uniformity and risk distortion? Or allow variation and lose comparability? There’s no perfect solution. But if we pretend these differences don’t exist, our reporting system may look consistent while saying very little at all.
The problem with groups
When schools try to enforce consistent reporting, the usual compromise is to ask subjects with fine-grained data, such as percentages or standardised scores, to group students into broad categories: grades, levels, or “expected” judgments. The reverse isn’t possible. You can’t conjure up precise rankings from a subject that only ever judges students as broadly similar.
Grouped judgements feel comfortable. They acknowledge uncertainty and avoid over-claiming. If we’re not sure whether two students performed differently, it feels fairer to place them in the same band than to split hairs.
But these groups come at a cost. As soon as a third student lands just below the boundary, they’re assigned a lower grade, and the whole thing starts to feel unjust. This cliff-edge effect is hard to avoid. In theory, larger groups reduce the number of questionable pairwise distinctions we have to make. But when a cut-off does bite, it bites harder. The student who missed the grade by one mark is reported as categorically different from the one who scraped in, even though their work was almost identical.
It is understandable why many teachers prefer grouping: in subjects with fractured attainment or fuzzy marking, we often don’t feel sure who did better. We can’t point to clear evidence. On another day, with a different task, the order might reverse. And yet, from a student’s perspective, groups can feel static and opaque. If you’re always told you’re “expected”, it’s hard to see how effort might shift anything. The feedback-effort loop breaks, and if the primary goal of assessment is to promote learning, that is catastrophic.
We group because we want to reflect uncertainty, but in doing so, we risk undermining motivation and creating new kinds of unfairness.
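To make the cliff edge concrete, here is a minimal sketch in Python. The cut-offs and the example marks are our own invention, purely for illustration; they are not drawn from any real school’s policy.

```python
# A minimal sketch of how fine-grained marks collapse into broad bands.
# The thresholds and example marks below are hypothetical.
CUT_OFFS = [(70, "above expected"), (40, "at expected"), (0, "below expected")]

def band(percentage: int) -> str:
    """Map a percentage mark onto the first band whose threshold it meets."""
    for threshold, label in CUT_OFFS:
        if percentage >= threshold:
            return label
    return CUT_OFFS[-1][1]  # fallback, only reached for negative input

# The cliff edge: one mark apart, a whole band apart.
print(band(70), "/", band(69))   # above expected / at expected

# The flip side: 28 marks apart, reported identically.
print(band(69), "/", band(41))   # at expected / at expected
```

Nothing about the sketch is sophisticated; the point is that any banding function behaves this way, however carefully the thresholds are chosen.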
Motivational goals
Ultimately, how we report attainment matters less than how it is interpreted. And interpretation depends not just on clarity, but on motivation.
Attainment information shapes two key beliefs for students:
Where do I stand relative to others?
Can I improve if I try?
The first belief fuels ambition and identity. The second sustains effort. But different students respond differently to the same information. As we will explore in forthcoming posts, some will feel energised by detailed attainment feedback; others defeated. This is the feedback-effort loop - and it is fragile. Too much precision, and you risk sending noisy signals about progress. Too little, and effort feels pointless because students are frozen into fixed positions.
And then there are parents.
Parents interpret reports with far less context than teachers or students. Yet their involvement, especially how they frame effort, aspiration and support, can have profound effects on learning. If reporting is inconsistent or opaque, they may either disengage or overcorrect. Neither is helpful.
We’ll return to the role of parents in future posts. But for now, it’s enough to say: motivation is not a secondary concern. It is typically the most important route by which reporting promotes learning.
What could we do instead?
There’s no universally valid way to report attainment. Percentage scores only make sense when we trust the precision of the underlying marks. Three-point scales like “below / expected / above” are easier to apply, but often so vague they hide real variation and make progress feel impossible to track.
Since no reporting scale works equally well for all subjects (or for all students), we’re left with a handful of options for managing the reporting consistency problem.
So what are the options?
Uniform Grouped Reporting Model
The most common approach we observe, especially from EYFS up to Key Stage 3, is to keep any detailed mark information private - sometimes even from students - and report to parents using a single, simple system across all subjects. This is typically a grouped judgement with just a few categories, such as below / at / above expected.
The reason it’s usually a 3-point scale is that it represents the lowest common denominator, something that even subjects with limited or hard-to-quantify assessment data (like PE or the arts) can reasonably produce. It’s administratively straightforward and feels fair across departments. But it’s often too blunt to support meaningful interpretation or motivation.
Dual Option Model
Some schools allow each subject to choose from a small menu of reporting styles, usually one grouped and one more granular. For instance, subjects might be given the option to report either on a 3-point scale or with percentage scores. Parents then receive a mixture of the two formats across subjects (though typically only one per subject), which remains manageable to interpret. This approach brings a degree of comfort to subject teams, who can select something that feels workable, even if they don’t have full autonomy.
Common Scale, Custom Detail Model
This approach combines a shared top layer with subject-level flexibility. All subjects report against a single universal scale - often three points - but are also permitted to include a second attainment measure that reflects their actual assessments. This could be a percentage score, rank percentile, or a brief comment. It preserves cross-subject consistency at the surface level while giving teachers space to report something meaningful.
But this flexibility only works if parents are clearly guided. Without tight oversight, second-layer data can easily become a jumble of inconsistent formats and unexplained labels. It’s the job of the assessment lead to ensure that every number, grade, or phrase on the report is properly defined and interpretable. A vague descriptor with no shared meaning just shifts the confusion onto families.
Our view is that this two-layered approach works best when the shared top layer uses a 5-point scale, rather than a 3-point scale. A 3-point system tends to collapse into sameness: too many students stuck as “expected”, regardless of their effort, trajectory, or actual differences. A 5-point scale gives just enough resolution to capture variation and direction of travel, without pretending to a level of precision we don’t have.
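As a sketch of how the two layers might sit side by side in a report, assuming a five-band top layer and an optional free-form detail field (the structure, field names, and example entries are ours, purely for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubjectReport:
    """One row of a hypothetical two-layer report."""
    subject: str
    common_band: int              # shared top layer: 1 (well below) to 5 (well above)
    detail: Optional[str] = None  # subject-defined second layer, only where the assessment supports it

term_report = [
    SubjectReport("Maths", common_band=4, detail="82% on the end-of-term paper"),
    SubjectReport("English", common_band=3, detail="Secure on analysis; extended writing still developing"),
    SubjectReport("PE", common_band=3),   # no fine-grained measure worth reporting
]
```

The shared band keeps the report comparable at a glance; the detail field is where each subject says something its own assessments can actually justify.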
It’s not perfect. But it might be the least-worst option we have.
Conclusion: Surface consistency isn’t enough
There is no perfect way to report attainment to students and parents. Any system that tries to enforce consistency across subjects has to grapple with deep technical and motivational trade-offs. A tidy format is no use if it leads to confusion, flat feedback, or stalled effort.
This post has only taken us as far as surface-level consistency: getting everyone to report in the same way. But even when we do that, a bigger problem remains: what does a “B” in music actually mean compared to a “B” in art? That’s the question we’ll explore next.