Weighing up the benefits of Comparative Judgement
What is Comparative Judgement?
If asked to rank ten biscuits in order of tastiness, chances are you would choose one, then pick a second and judge it as either better or worse than the first.
A third would be ranked in relation to the existing two and so on until you have ten in order of preference.
This method of ‘pairwise comparison’, or Comparative Judgement, developed by psychologist Louis Thurstone, rests on the idea that we are better at comparing items than at giving individual marks against a set of criteria.
How does Comparative Judgement work?
Applied to the world of education and assessment, this involves examiners reading two student scripts, quickly identifying the ‘better’ one, then repeating the process with another pair, and so on.
This could all be done without in-depth reading of the script, relying on judges having good knowledge of what ‘quality’ looks like, comparatively, rather than using a mark scheme.
After various rounds of comparing pairs, an algorithm would rank all the judged pieces into a hierarchy determined by the dominant judgements.
This would, in theory, identify the best and worst pieces, and order all the others in between.
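One common way of turning those pairwise judgements into a hierarchy is a Bradley-Terry model, fitted here with the classic minorise-maximise update. This is an illustrative sketch only: the article does not say which algorithm exam boards actually use, and the function and data names are invented for the example.

```python
from collections import defaultdict

def bradley_terry_rank(judgements, iterations=100):
    """Rank items from (winner, loser) pairs via the MM update.

    judgements: list of (winner, loser) tuples, one per comparison.
    Returns the items ordered from strongest to weakest.
    """
    items = {i for pair in judgements for i in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}   # start all items equal
    for _ in range(iterations):
        new = {}
        for i in items:
            # Denominator of the MM update: expected comparisons
            # weighted by current strengths
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    j = next(x for x in pair if x != i)
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())        # normalise strengths to sum to 1
        strength = {i: s / total for i, s in new.items()}

    return sorted(items, key=lambda i: strength[i], reverse=True)
```

For example, if A beats B twice, B beats C twice and A beats C once, `bradley_terry_rank` orders the scripts `["A", "B", "C"]` without any single comparison deciding the outcome on its own.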
What are the advantages of Comparative Judgement?
The biggest advantage of CJ in assessing academic performance is that each student's work is judged many times by many different examiners.
Because the final assessment of a script comes from a combination of all those judgements, it avoids the risk of a single rogue, but decisive, judgement.
Comparative Judgement and accuracy
CJ advocates point to studies appearing to show it is more accurate than traditional marking.
One review saw Scale Separation Reliability (SSR) scores – which could be thought of as a measure of the degree of agreement between the assessors – range from .73 to .99 where 1.0 would be total agreement.
This is higher than those usually achieved by examiners with a mark-scheme.
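In the CJ literature, SSR is usually computed from the fitted script measures and their standard errors, as the proportion of observed variance not accounted for by measurement error. The sketch below assumes that standard formula; the review cited above may calculate it differently.

```python
def scale_separation_reliability(measures, standard_errors):
    """SSR = (observed variance - mean squared standard error)
             / observed variance.

    measures: the estimated quality measure for each script.
    standard_errors: the standard error of each estimate.
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    mse = sum(se ** 2 for se in standard_errors) / n  # error variance
    return (observed_var - mse) / observed_var
```

With zero standard errors the score is 1.0 (total agreement); the larger the errors relative to the spread of the measures, the lower the score falls.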
Comparative Judgement and validity
Another claim is that it has greater validity, in that because it does not rely on a tightly-defined mark scheme, judges are assessing a greater breadth of knowledge and understanding.
Comparative Judgement is quick
A final benefit proponents highlight is speed. Because each judgement takes only moments, the process can be quicker than traditional marking when it comes to assessing open-ended answers that show students' understanding.
So far so good.
But if CJ offers so many benefits, why does it not dominate the field of marking?
Are there any downsides of using Comparative Judgement?
Well, one of CJ’s advantages - that an individual piece of work is seen multiple times by multiple examiners - is also one of its disadvantages.
All those people involved in making it work add a lot of cost to the process.
Comparative Judgement can be resource intensive
When there are a lot of items to be judged it can turn out to be much more time and resource intensive than current methods.
One study found that when it came to time spent per question, CJ may be best for relatively open items that assess understanding, rather than factual recall.
Who answers for a Comparative Judgement decision?
Another barrier to CJ’s implementation is that it is more opaque.
Because it is often difficult to work out what judges' decisions are based on, it could be challenging for students to appeal against what they see as an unfair mark.
And to convert the judges' decisions into a ranking, it uses an algorithm - something less familiar than adding up marks.
Does Comparative Judgement work for all forms of assessment?
The validity argument might hold for short, concise pieces of writing – but skimming an extended essay is likely not the best way to glean its nuances and assess understanding.
While it is possible to check whether CJ correctly ranks physical quantities such as height, which can be measured accurately by other means, it is not possible to do so for phenomena such as quality of writing, which are not objective constructs. And assessing formulae or factual recall appears to be more efficiently done with a rubric.
Data appearing to show its greater accuracy has been challenged on several grounds including that it is 'misleading' and ‘incomplete and inconsistent.’
Is there much research into Comparative Judgement?
Despite the concept being first aired 96 years ago, there is still relatively little scientific proof of the benefits of CJ.
Some who are not confident in applying CJ widely say there is sparse research into how it performs compared to other systems. Nor, they say, is there much evidence to prove that humans are better at comparative judgements than other forms of judgement.
Indeed, those calling for more research say this claim is usually backed only by the work of Thurstone himself and one other experimental psychologist, Donald Laming.
Is it time to start using Comparative Judgement more widely?
From a practical perspective, CJ is not yet a viable alternative to exam marking.
The cost in terms of time and resources needed in preparation and then execution makes it impractical to apply it on a large scale or to certain types of assessment.
What work has AQA done with Comparative Judgement?
Much research into CJ continues within AQA.
We successfully used it to confirm that grade boundaries across two different GCSE English Language series were at similar standards.
We explored, with Ofqual, using it to estimate grade boundaries in GCSE English Language and GCE Chemistry.
It also reliably ranked AS Geography essays and showed that examiners were making valid judgements.
All of this shows that Comparative Judgement is a useful research tool, but there are still many more questions to be answered before AQA has full confidence in using it for the exams themselves.
Read more on this subject:
Using adaptive comparative judgement to obtain a highly reliable rank order in summative assessment (aqa.org.uk)
Adaptive testing – tailoring the future of assessments? | AQi powered by AQA
Could comparative judgement replace traditional exam marking? | AQi powered by AQA