Weighing up the benefits of Comparative Judgement
What is Comparative Judgement?
If asked to rank ten biscuits in order of tastiness, chances are you would choose one, then pick a second and judge it as either better or worse than the first.
A third would be ranked in relation to the existing two and so on until you have ten in order of preference.
This method of ‘pairwise comparison’, or Comparative Judgement, developed by psychologist Louis Thurstone, rests on the idea that we are better at comparing items than at giving individual marks against a set of criteria.
How does Comparative Judgement work?
Applied to the world of education and assessment, this involves examiners reading two student scripts, quickly identifying the ‘better’ one, then repeating the process with another pair, and so on.
This could all be done without in-depth reading of the script, relying on judges having good knowledge of what ‘quality’ looks like, comparatively, rather than using a mark scheme.
After various rounds of comparing pairs, an algorithm would rank all the judged pieces into a hierarchy determined by the dominant judgements.
This would, in theory, identify the best and worst pieces, and order all the others in between.
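One common way of turning those pairwise judgements into a hierarchy is a Bradley-Terry model, fitted here with the classic minorise-maximise update. This is an illustrative sketch only: the article does not say which algorithm exam boards actually use, and the function and data names are invented for the example.

```python
from collections import defaultdict

def bradley_terry_rank(judgements, iterations=100):
    """Rank items from (winner, loser) pairs via the MM update.

    judgements: list of (winner, loser) tuples, one per comparison.
    Returns the items ordered from strongest to weakest.
    """
    items = {i for pair in judgements for i in pair}
    wins = defaultdict(int)          # total wins per item
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {i: 1.0 for i in items}   # start all items equal
    for _ in range(iterations):
        new = {}
        for i in items:
            # Denominator of the MM update: expected comparisons
            # weighted by current strengths
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    j = next(x for x in pair if x != i)
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())        # normalise strengths to sum to 1
        strength = {i: s / total for i, s in new.items()}

    return sorted(items, key=lambda i: strength[i], reverse=True)
```

For example, if A beats B twice, B beats C twice and A beats C once, `bradley_terry_rank` orders the scripts `["A", "B", "C"]` without any single comparison deciding the outcome on its own.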
What are the advantages of Comparative Judgement?
The biggest advantage of CJ in assessing academic performance is that each student's work is judged many times by many different examiners.
Because the final assessment of a script comes from a combination of all those judgements, it avoids the risk of a single rogue, but decisive, judgement.
Comparative Judgement and accuracy
CJ advocates point to studies appearing to show it is more accurate than traditional marking.
One review saw Scale Separation Reliability (SSR) scores – which could be thought of as a measure of the degree of agreement between the assessors – range from .73 to .99 where 1.0 would be total agreement.
This is higher than those usually achieved by examiners with a mark-scheme.
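In the CJ literature, SSR is usually computed from the fitted script measures and their standard errors, as the proportion of observed variance not accounted for by measurement error. The sketch below assumes that standard formula; the review cited above may calculate it differently.

```python
def scale_separation_reliability(measures, standard_errors):
    """SSR = (observed variance - mean squared standard error)
             / observed variance.

    measures: the estimated quality measure for each script.
    standard_errors: the standard error of each estimate.
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    mse = sum(se ** 2 for se in standard_errors) / n  # error variance
    return (observed_var - mse) / observed_var
```

With zero standard errors the score is 1.0 (total agreement); the larger the errors relative to the spread of the measures, the lower the score falls.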
Comparative Judgement and validity
Another claim is that it has greater validity, in that because it does not rely on a tightly-defined mark scheme, judges are assessing a greater breadth of knowledge and understanding.
Comparative Judgement is quick
A final benefit proponents highlight is speed. Because each judgement takes only moments, the process can be quicker than traditional marking when it comes to assessing open-ended answers that show students' understanding.
So far so good.
But if CJ offers so many benefits, why does it not dominate the field of marking?
Are there any downsides of using Comparative Judgement?
Well, one of CJ’s advantages - that an individual piece of work is seen multiple times by multiple examiners - is also one of its disadvantages.
All those people involved in making it work add a lot of cost to the process.
Comparative Judgement can be resource intensive
When there are a lot of items to be judged it can turn out to be much more time and resource intensive than current methods.
One study found that when it came to time spent per question, CJ may be best for relatively open items that assess understanding, rather than factual recall.
Who answers for a Comparative Judgement decision?
Another barrier to CJ’s implementation is that it is more opaque.
Because it is often difficult to work out what judges' decisions are based on, it could be challenging for students to appeal against what they see as an unfair mark.
And to convert the judges' decisions into a ranking, it uses an algorithm - something less familiar than adding up marks.
Does Comparative Judgement work for all forms of assessment?
The validity argument might hold for short, concise pieces of writing – but skimming an extended essay is likely not the best way to glean its nuances and assess understanding.
While it is possible to check whether CJ correctly ranks physical quantities such as height, which can be measured accurately by other means, it is not possible to do so for phenomena such as quality of writing, which are not objective constructs. And assessing formulae or factual recall appears to be more efficiently done with a rubric.
Data appearing to show its greater accuracy has been challenged on several grounds including that it is 'misleading' and ‘incomplete and inconsistent.’
Is there much research into Comparative Judgement?
Despite the concept being first aired 96 years ago, there is still relatively little scientific proof of the benefits of CJ.
Some who are not confident in applying CJ widely say there is sparse research into how it performs compared to other systems. Nor, they say, is there much evidence to prove that humans are better at comparative judgements than other forms of judgement.
Indeed, those calling for more research say this claim is usually backed only by the work of Thurstone himself and one other experimental psychologist, Donald Laming.
Is it time to start using Comparative Judgement more widely?
From a practical perspective, CJ is not yet a viable alternative to exam marking.
The cost in terms of time and resources needed in preparation and then execution makes it impractical to apply it on a large scale or to certain types of assessment.
What work has AQA done with Comparative Judgement?
Much research into CJ continues within AQA.
We successfully used it to confirm that grade boundaries across two different GCSE English Language series were at similar standards.
We explored, with Ofqual, using it to estimate grade boundaries in GCSE English Language and GCE Chemistry.
It also reliably ranked AS Geography essays and showed that examiners were making valid judgements.
All of this shows that Comparative Judgement is a useful research tool, but there are still many more questions to be answered before AQA has full confidence in using it for the exams themselves.
Read more on this subject:
Using adaptive comparative judgement to obtain a highly reliable rank order in summative assessment (aqa.org.uk)
Adaptive testing – tailoring the future of assessments? | AQi powered by AQA
Could comparative judgement replace traditional exam marking? | AQi powered by AQA