In spring 2020, due to the Covid-19 pandemic, UK schools were closed to all except the children of key workers. It was decided early on that this meant the normal A-level exams, administered at school-leaving age (usually 18), would have to be cancelled. However, these exam results are used both for future jobs and for university entrance. Ofqual, the body responsible for overseeing UK exams, was tasked with coming up with an alternative way to deliver grades, based on teachers’ predicted grades for their pupils, which are collected routinely every year to help universities make initial (usually conditional) offers. However, it was known that predicted grades might overstate actual performance and so might not be fair, either within the year (if different schools or teachers were more or less generous in their grading) or compared with previous years (if predicted grades were over-generous on average).
Ofqual created an algorithm that combined various factors including the predicted grade and statistics on the performance of each school in previous years. This was then used to create a corrected grade that could be higher or lower than the teachers’ predictions.
When the resulting calculated grades were published in the summer of 2020, they created uproar.
The system placed constraints on how many pupils could achieve particular grades and based its outputs on a school’s prior performance, downgrading around 40 per cent of predicted results (although overall results were still far higher than in previous years). The data across centres/schools was not of equal quality, and different centres had applied different principles in arriving at their assessed grades, which complicated standardization.
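To give a flavour of how this kind of standardization can override an individual pupil’s prediction, here is a deliberately simplified sketch of rank-based grade allocation. It is illustrative only: the function, the rounding rule and the example distribution are all assumptions made for this sketch, and Ofqual’s actual model was considerably more elaborate.

```python
# Illustrative sketch only -- NOT Ofqual's actual model.
# Pupils are ranked best-first within a centre (school), and grades are
# allocated so that the centre's grade distribution matches its history.

def standardise(ranked_pupils, historical_distribution):
    """Assign grades to pupils (ranked best-first) so that the share of
    each grade matches the centre's historical distribution.

    ranked_pupils: list of pupil ids, best first (from teacher rankings)
    historical_distribution: dict of grade -> fraction, best grade first,
        with fractions summing to 1.0
    """
    n = len(ranked_pupils)
    results = {}
    cursor = 0        # index of the next pupil to be given a grade
    cumulative = 0.0  # running share of the distribution used so far
    for grade, fraction in historical_distribution.items():
        cumulative += fraction
        upper = round(cumulative * n)  # pupils allowed this grade or better
        for pupil in ranked_pupils[cursor:upper]:
            results[pupil] = grade
        cursor = upper
    # any rounding remainder receives the lowest grade
    for pupil in ranked_pupils[cursor:]:
        results[pupil] = grade
    return results

# A centre that historically awarded 10% A, 40% B and 50% C:
pupils = [f"pupil{i}" for i in range(10)]
print(standardise(pupils, {"A": 0.10, "B": 0.40, "C": 0.50}))
# -> pupil0 gets an A, pupils 1-4 get Bs, pupils 5-9 get Cs
```

The sketch makes concrete where the unfairness bites: the constraint is on the distribution, not the individual, so even a pupil confidently predicted an A, at a centre that historically awarded few As, would be pushed down the grade scale.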
Individual pupils felt that their grades had been unfairly reduced, sometimes substantially, and furthermore analysis of the overall effect showed that it was operating unfairly between different parts of society. Overall, the algorithm allowed a level of ‘grade inflation’ compared with previous years, so that there were more top grades overall. However, analysis by journalists found that the increase amongst those going to fee-paying schools was around twice as high as that for state-funded schools [DM20]; that is, the algorithm seemed to entrench inequality.
There followed a media furore and protests by pupils outside Parliament. Government ministers blamed Ofqual and the Prime Minister described it as a ‘mutant algorithm’ [Co20]. Following the uproar from students and advocacy groups, who took Ofqual and its system to court [Fx20], the system was withdrawn and Ofqual ended up using the teachers’ predicted grades. Ofqual later took the highly unusual step of publishing a 318-page report, which explained the logic, goals and objectives of the system [Of20].
The sad thing is that Ofqual were trying to be fair. As soon as they knew the exams were to be cancelled, they gathered data on potential bias in teachers’ predicted grades (also known as centre-assessed grades) and produced a public report on their findings [LW20]. This included some data from the UK, but also from other countries. The remarkable thing on reading this report is how little data they had to go on. One might have thought that the body responsible for the oversight of qualifications would be constantly monitoring data about social, racial and gender equality, but in fact this was limited to a few targeted research projects. Possibly this is because the regular collection of data on protected characteristics would itself be seen as problematic.
Another factor evident in the whole process is a mutual lack of trust. Government and Ofqual did not trust the teachers, the pupils did not trust Ofqual and the whole rigour of the exam system is in part built on distrust of the pupils themselves. Exam grades are effectively a replacement for personal relationships.
Looking back at the algorithm itself, one of the problems was the unreasonable expectations placed upon it – it was being asked to do an intrinsically impossible task. Without the actual data on exam results, the belief was that it was possible to create an algorithm that would, in a sense, fairly (at an individual level) recreate the missing exam grades, whilst also being fair (in a group sense) within and between cohorts. The algorithm was expected to create data out of thin air. It later transpired that, early in the process, Ofqual had realised this and advised government to scrap the exams entirely rather than attempt an algorithmic correction, but were overruled.
However, perhaps the most enlightening aspect of the story is the way in which the failings of the algorithm held a mirror up to the existing systems.
The Guardian’s analysis in August 2020 [DM20] highlighted that the growth in top grades at independent (that is, fee-paying) schools was 4.7 percentage points compared with 2 percentage points for state schools. However, this was an uplift on the previous year, when around 45% of pupils at independent schools achieved top grades compared with around 20% of those at state schools; that is, both sectors saw an uplift of about 10% relative to their previous levels. It wasn’t so much that the algorithm was acting unfairly but that the existing A-level system is deeply socially divisive.
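To make the arithmetic explicit (a rough calculation, taking the baseline figures above as approximate):

\[
\frac{4.7}{45} \approx 10.4\% \qquad\qquad \frac{2.0}{20} = 10\%
\]

That is, relative to where each sector started, the uplift was almost identical; the headline gap largely reflects the pre-existing disparity in the baselines.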
Finally, one of the reasons why this caused so much discord is that A-level results are critically important to pupils. For those continuing to university, the grades determine which university they can attend or whether they are admitted at all. For those going into the workplace, these are the grades that will appear on their job applications for years to come. These grades quite literally shape the rest of a student’s life. Even in a normal year, a cold at the wrong time or a difficult time at home can make a difference to a student’s life that will last years. This can be a personal tragedy for anyone, but it is also evident, both from the Ofqual report and the raw grade figures, that these impacts do not fall uniformly across society.
As a postscript to this story, after the dust had settled, Ofqual updated the literature review from 2020. This revised report, based on UK and overseas evidence, concluded that centre-assessed grades were likely to disadvantage both poorer pupils and those with special educational needs [LN21]. Researchers at UCL also carried out a post-hoc survey, which highlighted the disproportionate impact of Covid on socially disadvantaged young people [AM21]. This survey included assessing the impact of the last-minute change from the Ofqual algorithm to using teachers’ grades. It found that dropping the algorithm had led to a 15% higher uplift for pupils with graduate parents [AM21b] – that is, dropping the algorithm had further increased social disparity.
In other words, despite it being a bad idea in the first place, despite the public protests, government opprobrium and media vilification, Ofqual probably did a good job after all.
References
[AM21] Anders, J., Macmillan, L., Sturgis, P. and Wyness, G. (2021). Inequalities in young people’s educational experiences and wellbeing during the Covid-19 pandemic (CEPEO Working Paper No. 21-08). Centre for Education Policy and Equalising Opportunities, UCL. https://EconPapers.repec.org/RePEc:ucl:cepeow:21-08
[AM21b] Anders, J., Macmillan, L., Sturgis, P. and Wyness, G. (2021). The ‘graduate parent’ advantage in teacher assessed grades. UCL Centre for Education Policy and Equalising Opportunities blog, 8 June 2021. https://blogs.ucl.ac.uk/cepeo/2021/06/08/thegraduate-parentadvantageinteacherassessedgrades/
[Co20] Sean Coughlan (2020). A-levels and GCSEs: Boris Johnson blames ‘mutant algorithm’ for exam fiasco. BBC News. 26 August 2020. https://www.bbc.co.uk/news/education-53923279
[DM20] Pamela Duncan, Niamh McIntyre, Rhi Storer and Cath Levett (2020). Who won and who lost: when A-levels meet the algorithm. The Guardian, 13 August 2020. https://www.theguardian.com/education/2020/aug/13/who-won-and-who-lost-when-a-levels-meet-the-algorithm
[Fx20] Foxglove Legal (2020). We put a stop to the A Level grading algorithm! Foxglove Blog, 17 August 2020. https://www.foxglove.org.uk/2020/08/17/we-put-a-stop-to-the-a-level-grading-algorithm/
[LW20] Ming Wei Lee and Merlin Walter (2020). Equality impact assessment: literature review. Office of Qualifications and Examinations Regulation (Ofqual), April 2020.
[LN21] Ming Wei Lee and Paul Newton (2021). Systematic divergence between teacher and test-based assessment. Office of Qualifications and Examinations Regulation (Ofqual), 17 May 2021. Ref. Ofqual/21/6781. https://www.gov.uk/government/publications/systematic-divergence-between-teacher-and-test-based-assessment
[Of20] Ofqual (2020). Awarding GCSE, AS & A levels in summer 2020: interim report. Office of Qualifications and Examinations Regulation (Ofqual), 13 August 2020. Ref: Ofqual/20/6656/1. https://www.gov.uk/government/publications/awarding-gcse-as-a-levels-in-summer-2020-interim-report