In January this year, with great sighs of relief, we submitted our final manuscript to SAGE. Entitled “Is assessment fair?” it examines what is meant by fairness in educational assessment, considers examples from this country and overseas and makes recommendations for the future. The book was written before the COVID-19 pandemic (with its tumultuous impact on education), but its subject matter has been set ablaze by the row this summer about the awarding of grades for A levels, GCSEs and equivalent qualifications when it was not possible for students to sit actual exams. The approach initially taken in all the countries of the UK, involving grades calculated using statistical information, was howled down as ‘unfair’. A scathing email to one of us from an academic colleague ended, “Fair assessment by algorithm? I don’t think so.” So, were the critics right? Was the approach originally proposed this summer unfair? And was the action eventually taken – awarding grades assessed by schools and colleges – any fairer?
We believe that our book will help provide a conceptual framework and a language for understanding and addressing such questions.
The first response may be obvious, but is no less important for that: it depends what you mean by “fair”. In our book we distinguish several senses of “fair” that are all potentially relevant to assessment. One of these is relational fairness – treating (relevantly) like cases alike. Most (if not all) assessments involve some kind of ‘discrimination’, meaning distinguishing between levels of achievement or between candidates who perform differently in relevant respects. The discrimination is fair if it is based on relevant considerations and unfair if it is based on something else, such as the candidate’s race or gender. Whilst no-one expects every exam candidate to get the same mark, we would think it unfair if students from some schools, boys rather than girls, or students from particular ethnic backgrounds were marked more strictly or more generously than others. We expect the same criteria to be applied to all candidates.
Another form of relational fairness is the expectation that the standards used for marking exams will remain stable over time. This concern, which is written into Ofqual’s statutory objectives, does seem relevant to fairness, at least over a limited period of years – if my son is competing for a scarce university place with someone whose exam was marked more generously a year later, then that might be unfair.
But that is not the only relevant meaning of “fair” – another is that a fair outcome is deserved. In the book we consider discussions of fairness by philosophers, going back as far as Aristotle. These largely derive from fundamental notions of some kind of equality (relational fairness) and desert. We tend to think of desert at an individual level – it is fair for a student to get the grade that his or her work deserves. Where that doesn’t happen for any reason, that seems unfair. This concern lies behind many of the distressing personal accounts which we heard when students received their results this summer. Linked to the importance of desert is the sense of “fair” in which a fair outcome is what those affected can legitimately expect.
A second response – perhaps just as obvious but also highly relevant – is that something can be fair in some respects and unfair in others. Traditional exams are often thought of – perhaps uncritically – as a paradigm of fairness, but there are unfairnesses in using them as a source – let alone the only source – of evidence for assessment.
Relational fairness can be judged at different levels. An exam in which economically disadvantaged students have poorer outcomes than those of their richer contemporaries may be technically fair in many senses – for example, it may have been scrutinised to avoid bias in the test questions – but the outcomes may be unfair at a higher level, because they reflect unjust inequalities in society. Some of the inequalities observed in the results this year may reflect those factors rather than the statistical model used.
Also, some aspects of fairness may be judged to be more important than others. And that judgement may differ in different circumstances. For example, it could be argued that although the maintenance of standards over time is one kind of fairness – and in normal times it helps to sustain confidence in the system – given the abnormality of this year, it was not as important as attempting to give each individual student the grades they deserved. Once the decision had been taken that the exams could not go ahead in the summer, the problem was how to enable students to get the grades they deserved in the absence of exam-based evidence about each individual, while maintaining relational fairness as much as possible.
So, was the approach proposed this summer fair? In our view it was an attempt to use statistics to help achieve relational fairness between as many candidates as possible. Statistical information has for many years been taken into account in marking exams, but it carried greater weight this year. Much analysis of alternative models took place in pursuit of that aim, and there were voluminous consultations at various stages. In the event, there was a lot of criticism of the calculated grades for favouring some kinds of centre and disadvantaging others. But these criticisms may also apply to the centre-assessed grades – possibly more so?
No statistical model could guarantee to give each individual student the grade they deserved. That was recognised at the time, and the main means of remedying individual unfairnesses that the statistical analysis could not capture was to be the appeals process. However, even if the eventual outcome is fair, should a student not have a legitimate expectation of receiving a grade that is as fair as possible, without having to go through the emotional turmoil of huge disappointment and lodging an appeal? There needed to be some way of balancing individual desert with systemic fairness. The approach proposed – using statistics for systemic fairness and picking up individual unfairnesses through appeals – was placing a burden on the appeals process that could not be sustained.
There is no silver bullet to guarantee fair assessment for all candidates in normal times, let alone now. We hope that our book will help to provide a language for understanding and evaluating assessment issues relating to fairness. That vocabulary should support the debate that is needed and which should be public and inclusive, not held behind closed doors. In the book we advocate a “situated” view of assessment – considering assessments in the broader context in which they are taken, rather than isolating them from their context and just getting them technically right. In the academic year that is starting now, when many students will have missed months of learning and society is still coping with a totemic crisis, the moral judgement required may be that relational fairness between candidates – and maintenance of standards over time – are less important than providing the best quality of evidence (whatever the precise nature of that evidence) to enable students to progress in their lives and in their education. That may mean more inconsistency in outcomes than is usually deemed acceptable. It should also require a more humane way for individuals who are not typical of their school or of any category in which they are analysed to produce evidence to show what they can really do.