
Exam standards

8/27/2012

The current, and of course annual, debate about the examination system rumbles on. Conservatives (with a small c) argue that better results mean a decline in standards, while others say that results are better because teaching and learning have improved. The apparent change in the grade boundaries for GCSE English has raised the question of norm referencing versus criterion referencing, but the wider debate is also about the purpose of exams, with a particular obsession about university entry. However, the answers to these two questions don’t have to be related. For what it’s worth, I believe that keeping standards the same over time is important (for reasons that aren’t discussed much) and that widening participation in higher education doesn’t need standards to change. I also have a method that would keep standards stable and couldn’t be ‘gamed’.

Peter Wilby* in the Guardian seems to argue that the proportion of passes goes up because we are allowing more people to go to university:

‘Gove should understand that exams are rationing devices. Success rates depend not on objective measures of performance but on the availability of the rationed goods: university places and positions in elite professional occupations. Fifty years ago, when barely 6% went to university, fewer than half the entrants to A-level were awarded passes. Now, with higher education available to 36%, the pass rate from a far larger pool of candidates is close to 100%, and more than 25% get A grades.’ (Guardian, 26 August 2012)

Now he’s absolutely right that one purpose of exams is to determine access to higher education. Indeed, this is also true for access to the jobs market. However, I would change the word rationing to sorting. For employers and universities the exam results are an indicator of ability (not fully reliable, mind) and thus can be used alongside other indicators to put the individual applicants in some kind of order, from best to worst. If an institution has ten places, it can take the top ten applicants; if it has one post, it can take the top one. Occasionally organisations might not recruit if they have no suitable applicants, but this is comparatively rare. The more prestigious courses, universities, and employers get too many qualified applicants and have to choose the best from them.

For simplicity, I’ll use the example of university entry and divide the institutions into three levels: the reality works the same but with more complexity. Let’s say that the top universities have 20,000 places, the middle universities have 50,000 places, and the bottom have 300,000. When the results come out we find that 20,000 potential students have ABB and above, 50,000 have CCC to BBB, and 250,000 have less than CCC. Any changes in the numbers of A-levels taken or the success rate (whether real or due to grade inflation) result in adaptation by the universities. So if 40,000 students are going to get ABB, then the top universities change their criteria to AAA to get the best 20,000. The middle and bottom universities follow suit so that the best candidates go to the best courses and the worst to the worst. Conversely, if a university has the ability to expand, it can take more candidates from lower down the pecking order, and the universities below it lose their best candidates and gain some from below their normal range. This is why it is best described as sorting.
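The sorting idea can be sketched in a few lines of Python. This is only an illustration with made-up applicants and tiny tier sizes: the function name `allocate` and all the numbers are my own assumptions, not real admissions data. It fills the tiers in rank order and shows that adding the same amount to every score (a crude stand-in for grade inflation) leaves everyone in the same tier.

```python
def allocate(scores, capacities):
    """Fill tiers in rank order, best applicants into the best tier first.

    scores: dict mapping applicant -> score (hypothetical numbers)
    capacities: list of (tier_name, places), best tier first
    """
    ranked = sorted(scores, key=lambda a: scores[a], reverse=True)
    allocation = {}
    start = 0
    for tier, places in capacities:
        for applicant in ranked[start:start + places]:
            allocation[applicant] = tier
        start += places
    return allocation


# Made-up example: six applicants, three tiers.
capacities = [("top", 1), ("middle", 2), ("bottom", 3)]
scores = {"a": 90, "b": 80, "c": 70, "d": 60, "e": 50, "f": 40}

# "Inflation" modelled as every score rising by the same amount.
inflated = {name: score + 10 for name, score in scores.items()}

before = allocate(scores, capacities)
after = allocate(inflated, capacities)
print(before == after)  # True: the order, and so the allocation, is unchanged
```

Only the order of the applicants enters the allocation, which is the point of calling it sorting rather than rationing.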

What’s important to note is that, for the purpose of university entrance, overall increases in ability or grade inflation are irrelevant. The sort more or less accurately puts the best candidates in the best courses and so on, all the way to the bottom: whether we have changes in grades or not, the Russell Group institutions or Oxbridge get the best candidates. Likewise, the growth in participation in HE doesn’t need changes to A-level standards; it just needs institutions to change their entry requirements to ensure they get the best candidates they can.

That said, the GCSE English story has a different complexion because of the status of the ‘C at GCSE’ as some kind of mystical threshold. There are reports of 16-year-olds unable to get into college because they got a D. Now if it was only about sorting, then the colleges would find they had spare places and would offer them to those with Ds (who would have got in last year with a C): this would be the rational response. However, the setting of arbitrary criteria for winning or losing a college place, or for considering someone literate or not, means that any movement in standards is especially unfair.

This, of course, is where we need to understand the two ways to maintain standards. Andrew Rawnsley described the events in the Guardian:

'What appears to have happened is that the exam boards suddenly panicked when they saw that there would be an unexpected rise in the top grade pass rate. So they steeply raised the grade boundaries at the last minute. As a result, many students who would have been rewarded with that vital C had they taken the exam in January have instead been stamped with a D.' (Guardian, 26 August 2012)

This implies that the exam boards had started with criterion referencing – ‘test[ing] with accuracy achievement measured against a carefully selected set of criteria’ – that allows results to be compared over time. However, this referencing is hard to do, and when it appeared to increase the pass rate again, raising fears that standards were slipping, they fell back on norm referencing (keeping the quota of grades the same). And because marks are whole numbers that can’t be divided into fractions, the pass mark can only move in whole steps, from 70 to 71, say, and this resulted in a fall in the pass rate. Furthermore, the fact that it was done at the last minute, with a change in the pass mark from January 2012, means that those losing out have every right to have it overturned, even if they didn’t actually deserve the C when compared to past years.

However, the discussion of norm v. criterion referencing is helpful for understanding why it is useful to keep standards stable. Norm referencing assumes that the ability or achievement of entrants stays broadly similar year to year. Each year we give the top X% of entrants an A, setting the pass mark accordingly. This might mean easier sorting (see above) as universities could just keep their entrance requirements the same, but then they would need to change them if they were increasing places or competing with new establishments. However, the downside is that we don’t know if improvements to education make for a better educated society. More people might be literate, but the pass rate would stay the same. Moreover, we wouldn’t know if Joe, who got an A in 1992, was any more or less literate than Jim, who got an A in 2002.
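To make the downside concrete, here is a small Python sketch of norm referencing. The marks, the 30% quota, and the function name `norm_referenced_boundary` are all invented for illustration; no exam board's actual procedure is implied. The boundary is set at the lowest whole mark that keeps the share of top grades within the quota.

```python
def norm_referenced_boundary(marks, quota):
    """Return (boundary, share) for the lowest whole-number boundary
    at which the share of candidates at or above it is within the quota."""
    n = len(marks)
    # max(marks) + 1 always gives a share of 0, so the loop always returns.
    for boundary in range(min(marks), max(marks) + 2):
        share = sum(m >= boundary for m in marks) / n
        if share <= quota:
            return boundary, share


# Hypothetical marks for ten candidates in one year.
year_one = [40, 50, 55, 60, 65, 70, 70, 75, 80, 90]
# The next year every candidate does 5 marks better.
year_two = [m + 5 for m in year_one]

print(norm_referenced_boundary(year_one, 0.3))  # (71, 0.3)
print(norm_referenced_boundary(year_two, 0.3))  # (76, 0.3)
```

Everyone improved by five marks, but the share getting the top grade is still 30%: the boundary simply moved up with them, so the results say nothing about whether the cohort is better educated. Note also that in the first year a boundary of 70 would let in 50%, so the whole-number step from 70 to 71 is what brings the share down to the quota.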

Criterion referencing means setting out a list of criteria to be reached for a particular mark and keeping it stable over time. This is how the US SAT is supposed to work**, where a number is given and there is ‘no failing score... each college or institution sets their own score standards for admission or awards’. If entrants are better taught, or cleverer overall, then the average mark goes up. Not only does this help employers, who may need to compare people who took exams in different years, it is of most help to educators and their managers, who could then tell whether education has improved and why. It would also be good for candidates, who would be encouraged by the knowledge that only their ability matters for the result, as opposed to where they are in the pecking order (the pecking order will still matter for job and HE applications, but that’s life), and that no-one can say the exams are easier.
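A minimal Python sketch of the criterion-referenced case, again with invented marks and a hypothetical fixed boundary of 70 (the names here are my own, for illustration only):

```python
def criterion_referenced_pass_rate(marks, boundary):
    """Share of candidates at or above a boundary that is fixed over time."""
    return sum(m >= boundary for m in marks) / len(marks)


FIXED_BOUNDARY = 70  # hypothetical mark that satisfies the criteria

# Hypothetical marks for ten candidates, then the same cohort 5 marks better.
year_one = [40, 50, 55, 60, 65, 70, 70, 75, 80, 90]
year_two = [m + 5 for m in year_one]

rate_one = criterion_referenced_pass_rate(year_one, FIXED_BOUNDARY)
rate_two = criterion_referenced_pass_rate(year_two, FIXED_BOUNDARY)
print(rate_one, rate_two)  # 0.5 0.6
```

With the boundary held still, a real improvement shows up as a higher pass rate, which is exactly the information that a fixed quota of grades would hide.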

However, criterion referencing is difficult, and this is why it’s open to accusations of getting easier. How, given the changing question styles, topics, and so on, can we say that a mark of X in a given exam is the same as a mark of X 20 years before, especially if there are rumours of political interference?

I have a solution to this that also allows us to keep the idea of grades and that draws on the combined experience of all the teachers who do the examining. It came to me about 20 years ago when I was doing some work for the various exam boards***. I thought that the small committees that decide on grade boundaries (see here for the current arrangements) weren’t best placed to do it: as they are just a handful of people they can easily be swayed from any standard by accident or design (see Cambridge Assessment, Zones of Uncertainty), and because the meetings are closed they aren’t seen to be impartial either. Instead, I had the idea that all the examiners (essentially teachers in the summer break) could decide on the boundaries collectively, and because they don’t see the overall picture they cannot decide based on any norms other than their own. It would require each examiner to give a mark and a grade, such that if a maths paper had a maximum mark-scheme score of, say, 80, the examiner would use their professional judgement to say whether a script scoring 45 was typical of an A, B, C, etc. By combining all the examiners’ opinions we could get a collective opinion as to where the grade boundaries ought to be. What a particular grade looked like would be the decision of hundreds of examiners, all voting according to their long-term experience of teaching and examining. The beauty of this is that any drift in standards would necessarily be extremely slow, and there is no way that politicians or managers could influence the system.
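One way the combining could work is sketched below in Python. The idea above doesn’t fix how the opinions are combined, so the rule used here is purely my assumption for illustration: take, for each examiner, the lowest mark they judged worthy of a grade, then take the median of those marks across examiners. The examiner ids, marks, and the function name `collective_boundaries` are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def collective_boundaries(judgements):
    """Combine examiners' (mark, grade) judgements into grade boundaries.

    judgements: iterable of (examiner_id, mark, grade) tuples.
    Assumed rule: boundary for a grade = median, across examiners,
    of the lowest mark each examiner judged to deserve that grade.
    """
    lowest = defaultdict(dict)  # grade -> {examiner: lowest mark given that grade}
    for examiner, mark, grade in judgements:
        current = lowest[grade].get(examiner)
        if current is None or mark < current:
            lowest[grade][examiner] = mark
    return {grade: median(by_examiner.values())
            for grade, by_examiner in lowest.items()}


# Hypothetical judgements from three examiners on an 80-mark paper.
judgements = [
    ("e1", 62, "A"), ("e1", 58, "B"), ("e1", 45, "B"),
    ("e2", 60, "A"), ("e2", 47, "B"),
    ("e3", 64, "A"), ("e3", 70, "A"), ("e3", 44, "B"),
]
boundaries = collective_boundaries(judgements)
print(boundaries)  # {'A': 62, 'B': 45}

# One examiner with a wildly lenient view of an A barely moves the median.
with_outlier = collective_boundaries(judgements + [("e4", 30, "A")])
print(with_outlier["A"])  # 61.0
```

The median is chosen in this sketch because a single examiner, however far out of line, can only nudge it slightly; with hundreds of examiners the nudge would be smaller still, which matches the claim that any drift would be slow.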

*The more important point of the article was that access to the best universities and jobs remains easier if you are already part of the elite, as parents buy good education and so on. This was true before, and is still true after, an expansion of HE, because of this sorting mechanism, and mere expansion couldn’t make society more egalitarian. Getting more people to have AAA at A-level, or more people into university, doesn’t mean that more people can become barristers, and the mass of AAAs or graduates can be discriminated between by other criteria.

**The SAT methods and scoring schemes have been changed on various occasions in the past.

***It was 22 years ago that I got my GCSE results, and 20 since my A-levels. Around this time I also did some work for the exam system, and did more after university too. As a child I used to get paid a few pence per script to check the addition: my mum was a school teacher who earned extra cash in the summer doing the marking. I also worked for UCLES as a young adult.
