Assessment
Comparable Outcomes: Setting the standard?
What is the comparable outcomes framework, how does it underpin grade standards and are there alternatives?
Why is the role of comparable outcomes being debated?
The comparable outcomes framework has been at the heart of standards and grade setting for GCSEs, AS and A-levels in England over the last decade. Politicians credit it for having “halted” grade inflation.[1]
However, over time, stakeholders have raised concerns around its use and perceived impact on schools in particular.
This briefing describes how comparable outcomes is used, its key benefits, challenges and alternative approaches to setting standards.
What is the comparable outcomes framework?
Comparable outcomes is a framework that exam boards in England, under the oversight of the exams regulator Ofqual, use to guide the setting of grade boundaries for GCSEs, AS and A-levels. By applying the comparable outcomes framework, exam boards and the regulator are better able to standardise the grades awarded each year at a national level. The aim of the comparable outcomes framework is to ensure that grades are comparable over time and across exam boards.[2]
Why is the comparable outcomes framework used?
For students, the value of GCSE, AS and A-level grades depends on their currency with employers and education institutions, such as universities.
Without any attempt at standardisation, qualifications would lose their value. If an employer or university cannot be confident as to what level of attainment is represented by different grades awarded to a student, the value of that student’s qualifications is undermined.
The comparable outcomes framework is used in England as the preferred method for enabling greater standardisation of grades.
Crucially, applying the comparable outcomes framework means a grade in one subject in one year can more meaningfully be compared:
- with other years, e.g. a grade 6 in GCSE Maths in 2018 will reflect a similar level of attainment to a grade 6 in GCSE Maths in another year;
- across exam boards, e.g. the outcome of a student taking an AQA exam should be comparable to the outcome of a student taking the same qualification from another exam board.[3]
What is the principle underlying the comparable outcomes framework?
The basic principle that underpins the comparable outcomes framework is that for a qualification where the entry cohort is similar to previous years, the overall proportion of students achieving each grade should also be similar to previous years, with some adjustment for differences in the difficulty of the papers.[4]
How does the comparable outcomes approach work in practice?
Each exam board uses the same statistical process to model grade outcomes for each of their qualifications. Ofqual oversees the application of comparable outcomes by exam boards, and describes the approach as follows:[5]
“Predictions are based on the relationship [emphases added] between prior attainment and national results in a reference year. Exam boards use prior attainment at Key Stage 2 when predicting GCSE outcomes, and prior attainment at GCSE when predicting AS and A level outcomes.”
These predictions are used to generate statistically recommended grade boundaries, i.e. where the statistics suggest the grade boundaries should sit. Senior examiners then look at student exam papers around each boundary to see whether they represent an appropriate level of performance, and adjust the boundaries as they see necessary. The weighting given to the statistical predictions in making decisions about grade boundaries also depends on the robustness of the statistics.
Final grade boundaries are therefore decided by a combination of the examiner perspective and the statistics.
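The statistical step described above can be illustrated with a minimal sketch. It is not the procedure used by Ofqual or the exam boards: the prior-attainment bands, the reference-year figures, the marks and the function names are all hypothetical, and the examiner review of scripts around each boundary is not modelled. The sketch only shows the general idea of carrying a reference-year relationship between prior attainment and results forward to a new cohort, then reading off a statistically recommended boundary.

```python
# Hypothetical reference-year relationship: for each prior-attainment band,
# the share of students who achieved at least the given grade.
REFERENCE_MATRIX = {
    "low":    {"7": 0.05, "4": 0.30},
    "middle": {"7": 0.20, "4": 0.70},
    "high":   {"7": 0.50, "4": 0.90},
}


def predict_cumulative_share(cohort_counts, matrix, grade):
    """Predict the share of this year's cohort at or above `grade`.

    cohort_counts maps each prior-attainment band to the number of
    students in this year's entry who fall into that band.
    """
    total = sum(cohort_counts.values())
    expected = sum(n * matrix[band][grade] for band, n in cohort_counts.items())
    return expected / total


def recommended_boundary(marks, target_share, max_mark):
    """Return the mark whose 'at or above' share is closest to target_share."""
    n = len(marks)
    best_mark, best_gap = 0, float("inf")
    for boundary in range(max_mark + 1):
        share = sum(m >= boundary for m in marks) / n
        gap = abs(share - target_share)
        if gap < best_gap:
            best_mark, best_gap = boundary, gap
    return best_mark


if __name__ == "__main__":
    # Hypothetical cohort: 100 students split across the three bands,
    # with one student on each mark from 1 to 100.
    cohort = {"low": 20, "middle": 50, "high": 30}
    marks = list(range(1, 101))
    for grade in ("7", "4"):
        share = predict_cumulative_share(cohort, REFERENCE_MATRIX, grade)
        boundary = recommended_boundary(marks, share, max_mark=100)
        print(f"grade {grade}: predicted share {share:.3f}, boundary {boundary}")
```

In this sketch a harder paper lowers the marks students obtain, but the predicted share is unchanged, so the recommended boundary falls with it. A cohort with stronger prior attainment raises the predicted share instead.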
What are the benefits of the comparable outcomes framework?
In addition to improving comparability over time and across exam boards, there are other key benefits to the application of the comparable outcomes framework.
First, the comparable outcomes approach enables new qualifications – and new versions of existing qualifications – to be introduced without penalising the earliest cohorts who take these qualifications.[6]
When new qualifications are introduced, there can be a sudden drop in performance followed by a slow recovery, often referred to as the ‘Sawtooth Effect’.[7] There are a number of reasons why the first students to take a new qualification might be expected to perform worse relative to later cohorts.[8] For example, teachers may be less familiar with teaching the content of the new specification, and there may be fewer supporting materials available, such as textbooks and sample exam papers.[9]
The comparable outcomes framework ensures there is a smooth transition between old and new qualifications (e.g. in the wake of a reform of the national curriculum).
Second, the comparable outcomes approach makes it easier to separate out changes in performance caused by differences in the difficulty of the paper from changes caused by differences in the students taking the paper. For example, if a GCSE Physics paper is harder than the paper from the previous year – despite the many quality assurance processes exam boards use to make the difficulty of papers as similar as possible – the comparable outcomes approach is used to help ensure students sitting the harder paper aren’t unfairly penalised for doing so.[10]
Third, the comparable outcomes approach has proved very effective in addressing concerns about grade inflation. In periods of stability, there has been a historical tendency for average grades to increase over time, damaging trust in the qualifications.[11] The comparable outcomes framework helps stabilise the value of the qualifications for students, universities and employers.[12]
What are the challenges with applying the comparable outcomes framework in setting grade boundaries?
A number of stakeholders have criticised the use of the comparable outcomes approach to inform grade boundaries for GCSE, AS and A-levels in England.
These criticisms typically do not stem from concerns with the use of the comparable outcomes framework itself, nor the policy decision to attempt to standardise grades.
Instead, these concerns typically relate to the interaction of these policy choices with other aspects of the school and accountability system.
In particular, there is concern that the grade outcomes awarded under the comparable outcomes approach do not reflect any genuine rise (or fall) in attainment.[13] [14] Because outcomes are kept stable at a national level, it is sometimes perceived that individual schools cannot demonstrate improvements, or that they can do so only with a corresponding drop in another school’s results, even where there has been no decline in the standards achieved by that other school.[15] [16] It is therefore felt that comparable outcomes cannot fully reflect improvements in teaching and professional development.
Another concern arises from the way in which applying the comparable outcomes framework can mean a broadly comparable proportion of students is awarded a GCSE grade 3 to 1 each year. Although exam boards do not award ‘fail’ grades, the government’s policy of describing grade 4 as a ‘standard pass’ means grades 3 to 1 are often viewed implicitly as fail grades. Some stakeholders argue that in effect, some students will ‘inevitably’ fail, and that this is unacceptable in the context of student trust and experience, as well as damaging to student motivation.
Several points can be made in response to these concerns.
First, in practice, large cohort-level changes in performance in a single year are rare.[17] [18] Small changes at a national level can and do occur even within comparable outcomes: the process aims for stability over time; it does not fix outcomes. Over several years, trends in performance could therefore still be observed.
Second, as an additional safeguard, Ofqual has introduced the National Reference Test (NRT) to detect any improvements in teaching standards over time in GCSE English Language and Maths, at a national level.[19] The test is administered to a sample of Year 11 students in the March before they sit their GCSE examinations. The results could be used to amend the GCSE grade boundaries if any large changes in performance were observed.
Historically, prior to the adoption of the comparable outcomes approach, two other processes were used to ensure comparability of grades within the exam system in England. Both processes are, to some extent, still applied alongside the comparable outcomes framework, but in a more nuanced way.
Norm-referencing sees each grade awarded to the same percentage of each subject cohort year on year and across exam boards, without adjusting results based on previous years. This approach was popular in the 1950s but fell out of favour towards the 1980s.[20]
However, norm-referencing faces a number of the same challenges as the comparable outcomes approach – e.g. the inability to demonstrate systemic improvements in learning outcomes – without the additional flexibility that data from the National Reference Test provides for comparable outcomes.
The second approach, criterion-referencing, was popularised in the 1980s as a replacement for norm-referenced tests.[21] It aims to measure a candidate’s performance against a set of pre-defined assessment criteria; each grade is awarded to all those who satisfy the performance-related criteria stated for that grade.[22] In contrast to norm-referenced tests, in criterion-referenced tests the performance of other students does not affect a student’s score.[23]
Crucially, however, unlike the comparable outcomes framework, criterion-referencing used alone does not protect against the Sawtooth Effect described above.
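The difference between the two historical approaches can be shown with a short sketch. It is a simplification under stated assumptions: the grade quotas, mark thresholds, cohorts and function names are hypothetical, tied marks are not handled specially, and a fixed minimum mark stands in for "satisfying the stated performance criteria", which in practice is a qualitative judgement.

```python
def norm_referenced_grades(marks, quotas):
    """Award grades by rank so each grade goes to a fixed share of the cohort.

    quotas is a list of (grade, share) pairs from the top grade down;
    the shares are assumed to sum to 1.
    """
    ranked = sorted(range(len(marks)), key=lambda i: marks[i], reverse=True)
    grades = [None] * len(marks)
    start, cumulative = 0, 0.0
    for position, (grade, share) in enumerate(quotas):
        cumulative += share
        is_last = position == len(quotas) - 1
        # The last grade takes everyone remaining, whatever the rounding.
        end = len(marks) if is_last else round(cumulative * len(marks))
        for i in ranked[start:end]:
            grades[i] = grade
        start = end
    return grades


def criterion_referenced_grades(marks, thresholds, fallback="U"):
    """Award each student the highest grade whose minimum mark they reach.

    thresholds is a list of (grade, minimum_mark) pairs from the top down.
    Other students' marks play no part in the result.
    """
    grades = []
    for mark in marks:
        for grade, minimum in thresholds:
            if mark >= minimum:
                grades.append(grade)
                break
        else:
            grades.append(fallback)
    return grades


if __name__ == "__main__":
    quotas = [("A", 0.25), ("B", 0.50), ("C", 0.25)]
    thresholds = [("A", 80), ("B", 60), ("C", 0)]
    weaker_cohort = [40, 50, 60, 70]
    stronger_cohort = [60, 70, 80, 90]
    for name, cohort in (("weaker", weaker_cohort), ("stronger", stronger_cohort)):
        print(name, "norm:", norm_referenced_grades(cohort, quotas))
        print(name, "criterion:", criterion_referenced_grades(cohort, thresholds))
```

Under the norm-referenced function both hypothetical cohorts receive the same spread of grades even though one scored 20 marks higher throughout. Under the criterion-referenced function the stronger cohort's grades rise, which is also why a first cohort sitting a new specification would see its grades fall.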
What are the alternatives to standardising GCSE, AS and A-level grades?
If policymakers decided to end the use of the comparable outcomes framework and any attempt to standardise grades, and instead rely entirely on subjective examiner judgement, this would result in the loss of the benefits described above in relation to the currency of grades, grade inflation and the smooth transition between cohorts.
In addition to the loss of such benefits, policymakers would likely need to accept accompanying consequences. For example, employers and education institutions that no longer felt able to rely on GCSE, AS and A-level grades might develop bespoke entrance tests to differentiate between applicants.
However, the use of bespoke entrance tests would undoubtedly lead to concerns around fairness. For example, variations in the ability of individual schools (or families) to prepare students for the entrance tests of specific institutions would be likely to create challenges for social mobility.
What is the future for the comparable outcomes framework?
The principal beneficiaries of comparable outcomes are students, given the critical role the framework has had in underpinning the currency of GCSE, AS and A-level qualifications.
As noted, many criticisms of the use of the comparable outcomes approach to standardise grades do not reflect objections to the approach itself – or to the policy decision to standardise GCSE, AS and A-level grades – but rather relate to the interaction of these decisions with policy decisions around school accountability and the GCSE ‘pass’ framework.
There is no doubt that the comparable outcomes approach represents a complex process that is prone to misunderstanding by stakeholders,[24] with accompanying risks to public confidence.
Ultimately, policymakers may conclude that public confidence in comparable outcomes cannot be maintained alongside the existing school accountability framework. In this situation, policymakers may need to weigh the importance of grading standards against other policy objectives, or consider other options to meet those objectives.
[1] Gibb, N. (2016). Government Response to the Consultation on Ofqual’s National Reference Test. Department for Education. https://questions-statements.parliament.uk/written-statements/detail/2016-03-24/HCWS650
[2] Newton, P. E. (2020). What is the Sawtooth Effect? The nature and management of impacts from syllabus, assessment, and curriculum transitions in England. Ofqual.
[3] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017. publishing.service.gov.uk
[4] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017. publishing.service.gov.uk
[5] Ofqual (2017). Inter-board comparability of grade standards in GCSEs, AS and A levels 2017. publishing.service.gov.uk
[6] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[7] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[8] Jadhav, C. (2016). Setting standards for new AS qualifications. The Ofqual Blog. https://ofqual.blog.gov.uk/2016/08/09/setting-standards-for-new-as-qualifications/
[9] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[10] Newton, P. E. (2020). What is the Sawtooth Effect? The nature and management of impacts from syllabus, assessment, and curriculum transitions in England. Ofqual.
[11] Benton, T. & Sutch, T. (2014). Analysis of use of Key Stage 2 data in GCSE predictions, ARD Research Division. Ofqual
[12] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment
[13] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[14] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment
[15] Bousted, M. (2015). ‘England’s secondary heads and teachers are stuck in a zero-sum game from which it’s impossible to escape’. TES News.
[16] Benton, T. (2016). Comparable Outcomes: Scourge or Scapegoat? Cambridge Assessment Research Report. Cambridge, UK: Cambridge Assessment
[17] Sizmur, J., Ager, R., Bradshaw, J., Classick, R., Galvis, M., Packer, J., ... & Wheater, R. (2019). Achievement of 15-year-olds in England: PISA 2018 results: Research report, December 2019. GOV.UK.
[18] Coe, R. (2007). Changes in standards at GCSE and A-level: Evidence from ALIS and YELLIS: A report for the ONS.
[19] Stacey, G. (2015). The national reference test. The Ofqual Blog.
[20] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[21] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[22] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32
[23] Prince, R. (2016). Predicting Success in Higher Education: The Value of Criterion and Norm-Referenced Assessments. Practitioner Research in Higher Education, 10(1), 22-38.
[24] Newton, P. E. (2021). Demythologising A level Exam Standards. Research Papers in Education, 1-32