Improving assessment: the key to education reform
Daisy Christodoulou, Director of Education, No More Marking
Research Ed, Saturday September 9th 2017
National exams and schools' internal assessment systems have a big impact on what gets taught in the classroom, and often lead to unintended and damaging consequences. How can we change assessment so that it helps to improve education rather than distorting it?
Why do we need to improve assessment?
• Better measurement leads to improvements and innovations
• Bad measurement leads to distortion and unintended consequences
Four assessment practices that are distorting
• Using prose descriptors to grade work and give pupils feedback
• Marking essays using absolute judgement
• Viewing grades as discrete categories
• Thinking that test scores matter!
Prose descriptors aren’t accurate…
‘Can compare two fractions to identify which is larger’
• Which is bigger: 3/7 or 5/7? 90% get this right
• Which is bigger: 3/4 or 4/5? 75% get this right
• Which is bigger: 5/7 or 5/9? 15% get this right
Quoted in Wiliam, Principled Assessment Design, SSAT 2014
…and they aren’t helpful!
‘I remember talking to a middle school student who was looking at the feedback his teacher had given him on a science assignment. The teacher had written, “You need to be more systematic in planning your scientific inquiries.” I asked the student what that meant to him, and he said, “I don’t know. If I knew how to be more systematic, I would have been more systematic the first time.” This kind of feedback is accurate—it is describing what needs to happen—but it is not helpful because the learner does not know how to use the feedback to improve. It is rather like telling an unsuccessful comedian to be funnier—accurate, but not particularly helpful, advice.’
Dylan Wiliam, Embedded Formative Assessment
Good and bad multiple-choice questions…
What is the capital of Moldova?
• Baku
• Tbilisi
• Chisinau
• Minsk
• Yerevan
Distractors: unambiguously wrong... but still plausible!
What is the capital of Moldova?
• Paris
• London
• Chisinau
• New York
• Mexico City
Distractors: unambiguously wrong... but not plausible!
Create unambiguously wrong but plausible distractors!
What is 20% of 300?
• 60 (correct answer)
• 20
• 15
• 30
Common pupil misconception: if you work out 10% by dividing by 10, then you must work out 20% by dividing by 20 (which gives 300 ÷ 20 = 15)
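The misconception on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `percent_of` and `divide_by_percent` are hypothetical, not from the talk. It shows that the "divide by N" rule happens to agree with the correct method at 10%, which may be part of why pupils generalise it, and that it produces the distractor 15 at 20%.

```python
def percent_of(percent, amount):
    """Correct method: a percentage is a fraction out of 100."""
    return amount * percent / 100


def divide_by_percent(percent, amount):
    """The misconception: 'to find N%, divide by N'."""
    return amount / percent


# At 10% the two methods give the same answer.
ten_correct = percent_of(10, 300)
ten_misconceived = divide_by_percent(10, 300)

# At 20% they diverge: the misconception lands on a distractor.
twenty_correct = percent_of(20, 300)
twenty_misconceived = divide_by_percent(20, 300)

print(ten_correct, ten_misconceived)        # 30.0 30.0
print(twenty_correct, twenty_misconceived)  # 60.0 15.0
```

A pupil who picks 15 has therefore told the teacher something specific about how they are thinking, which a prose descriptor would not reveal.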
Prose descriptors & written comments
Bad practice
• Using prose descriptors to judge work and give feedback
Good practice
• Define descriptors as questions and use those instead
What’s the research?
• Wolf, Alison. ‘Portfolio assessment as national policy: the National Council for Vocational Qualifications and its quest for a pedagogical revolution.’ Assessment in Education: Principles, Policy & Practice 5.3 (1998): 413-445, p.442.
• Polanyi, Michael. Personal knowledge. Routledge, 2012.
• Sadler, D.R. ‘Specifying and promulgating achievement standards.’ Oxford Review of Education 13 (1987): 191-209.
Absolute judgement
First list:
(1) Stealing a towel from a hotel
(2) Keeping a dime you find on the ground
(3) Poisoning a barking dog
Second list:
(1*) Testifying falsely for pay
(2*) Using guns on striking workers
(3*) Poisoning a barking dog
Mozer, Michael C., et al. ‘Decontaminating human judgments by removing sequential dependencies.’ Advances in Neural Information Processing Systems 23 (2010).
Comparative Judgement
Normally, we ask: does this essay meet the criteria?
Instead, we should ask: is this essay better than that essay?
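One way to see how many "which is better?" decisions can become a single scale is sketched below. This is an assumption-laden illustration, not the method used in the talk or by any particular tool: it fits a Bradley-Terry model (a standard paired-comparison model, swapped in here for simplicity) with a basic iterative update. The essay names and the judgement data are invented, and the function `bradley_terry` is hypothetical.

```python
from collections import defaultdict


def bradley_terry(judgements, iterations=200):
    """Estimate a quality score per essay from (winner, loser) pairs.

    Assumes every essay wins at least one comparison; otherwise its
    score collapses to zero and the model cannot be fitted sensibly.
    """
    essays = sorted({essay for pair in judgements for essay in pair})
    wins = defaultdict(int)       # total wins per essay
    meetings = defaultdict(int)   # comparisons per unordered pair
    for winner, loser in judgements:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    score = {essay: 1.0 for essay in essays}
    for _ in range(iterations):
        updated = {}
        for a in essays:
            denominator = sum(
                meetings[frozenset((a, b))] / (score[a] + score[b])
                for b in essays
                if b != a
            )
            updated[a] = wins[a] / denominator
        # Rescale so the scores sum to 1; only relative size matters.
        total = sum(updated.values())
        score = {essay: value / total for essay, value in updated.items()}
    return score


# Invented data: each pair of essays is judged four times.
judgements = (
    [("essay_1", "essay_2")] * 3 + [("essay_2", "essay_1")]
    + [("essay_2", "essay_3")] * 3 + [("essay_3", "essay_2")]
    + [("essay_1", "essay_3")] * 3 + [("essay_3", "essay_1")]
)
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

No judge ever assigns a mark here. The scale comes entirely from relative decisions, and the fitted scores place the essays on a continuum rather than in categories.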
Absolute judgement
Bad practice
• Trying to mark essays absolutely
Good practice
• Use comparative judgement instead
What’s the research?
• Thurstone, Louis L. ‘A law of comparative judgment.’ Psychological Review 34.4 (1927).
• Laming, Donald. Human judgment: the eye of the beholder. Cengage Learning EMEA, 2003.
Reporting
[Figure: pupils Paul, Ringo, John and George shown against the grade bands Working towards (WTS), Expected Standard (EXS) and Greater Depth (GDS)]
Viewing grades as discrete categories
• Grades ‘simply get layered on top of the scale’.
• ‘The labels chosen for performance standards [such as working towards, expected standard] have their own meanings independent of their use with the standards, and these clearly influence how people interpret the results they are given.’
Koretz, Daniel M. Measuring up. Harvard University Press, 2008.
Viewing grades as discrete categories
Bad practice
• Viewing grades as discrete categories
Good practice
• Recognise that grades are lines on a continuum
What’s the research?
• Koretz, Daniel M. Measuring up. Harvard University Press, 2008.
Thinking that test scores matter!
‘The really important idea here is that we are hardly ever interested in how well a student did on a particular assessment. What we are interested in is what we can say, from that evidence, about what the student can do in other situations, at other times, in other contexts. Some conclusions are warranted on the basis of the results of the assessment, and others are not. The process of establishing which kinds of conclusions are warranted and which are not is called validation.’
Wiliam, Dylan. Principled Assessment Design. SSAT: London, 2014.
‘Test scores reflect a small sample of behaviour and are valuable only insofar as they support conclusions about the larger domains of interest. This is perhaps the most fundamental principle of achievement testing.’
Koretz, Daniel M. Measuring up. Harvard University Press, 2008.
The sample and the domain
[Figure labels: The domain; The sample]
In political polling…
The domain: the entire electorate, 40m people
The sample: 1,000 voters who are representative of the larger domain
In TV advertising…
The domain: number of people watching particular TV channels
The sample: people watching at three specified weeks of the year
In the postal service…
The domain: delivery times to all addresses in the US, c. 300m people
The sample: delivery to 1,000 addresses
In a vocabulary test…
The domain: all the words you know, approx. 20,000
The sample: a 40-word vocab test
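The sample sizes quoted in these examples also limit how confident an inference about the domain can be. The sketch below is a rough illustration under stated assumptions, not something from the talk: it treats each sample as a simple random sample, uses the standard normal-approximation margin of error for a proportion at 95% confidence, and takes the worst case of a 50% proportion. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    Assumes a simple random sample drawn from a much larger domain.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


poll = margin_of_error(1000)   # 1,000 voters from the electorate
vocab = margin_of_error(40)    # a 40-word vocabulary test

print(f"{poll:.3f}")   # 0.031
print(f"{vocab:.3f}")  # 0.155
```

On these assumptions a 1,000-voter poll is accurate to roughly 3 percentage points, while a 40-word test is accurate to roughly 15. The score itself matters less than the inference it supports, and a small sample supports only a loose one.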
Any exam
The domain: all your skills and knowledge
The sample: what can be assessed in 2-3 hours
Goodhart’s Law / Campbell’s Law
• Goodhart’s Law: "When a measure becomes a target, it ceases to be a good measure."
• Campbell’s Law: "The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Thinking that test scores matter!
Bad practice
• Thinking that the test score by itself matters!
Good practice
• Recognise that what matters are the inferences we can make from the test score.
What’s the research?
• Wiliam, Dylan. Principled Assessment Design. SSAT: London, 2014.
• Koretz, Daniel M. Measuring up. Harvard University Press, 2008.
Four practical assessment errors
• Using prose descriptors to grade work and give pupils feedback
• Marking essays using absolute judgement
• Viewing grades as discrete categories
• Thinking that test scores matter!