Explore the purposes, structure, and format of large-scale assessments designed to support learning. Understand the role of teachers and the impact of contextual factors on assessment, and examine the Lake Wobegon effect and the challenge of reconciling competing pressures in assessment.
Designing Large-Scale Assessment to Support Learning Dylan Wiliam Institute of Education, University of London
Overview • The purposes of assessment • The structure of the assessment system • The locus of assessment • The extensiveness of the assessment • Assessment format • Scoring models • Quality issues • The role of teachers • Contextual issues
Functions of assessment • For evaluating institutions • For describing individuals • For supporting learning • Monitoring learning • Whether learning is taking place • Diagnosing (informing) learning • What is not being learnt • Forming learning • What to do about it
The Lake Wobegon effect • “All the women are strong, all the men are good-looking, and all the children are above average.” • [Chart: reported test scores drifting upward over time — axes: Scores vs. Time]
Goodhart’s law • All performance indicators lose their usefulness when used as objects of policy • Privatization of British Rail • Targets in the Health Service • “Bubble” students in high-stakes settings
Reconciling different pressures • The “high-stakes” genie is out of the bottle, and we cannot put it back • The only thing left to us is to try to develop “tests worth teaching to”
Sensitivity to instruction • [Figure: distributions of attainment for two cohorts one year of instruction apart, on an item highly sensitive to instruction — the distributions barely overlap]
Sensitivity to instruction (2) • [Figure: the same two cohorts on an item moderately sensitive to instruction — the distributions overlap substantially]
Sensitivity to instruction (3) • [Figure: the same two cohorts on an item relatively insensitive to instruction — the distributions almost coincide]
Insensitivity to instruction • A key, and under-investigated, idea • Strongly affected by the normal mechanisms of test development • Has profound consequences for the performance of cohorts over time • Social promotion vs. grade retention
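A minimal numerical sketch of this idea (my construction, not from the talk): treat an item's sensitivity to instruction as the effect size d of one extra year of teaching, with both cohorts' scores modelled as unit-variance normal distributions. The three effect sizes below are illustrative assumptions matched to the three figures above, not empirical values.

```python
from statistics import NormalDist

def p_untaught_wins(gain_d: float) -> float:
    """Probability that a student from the cohort with one year less
    instruction outscores one from the taught cohort, when scores are
    N(0, 1) and N(gain_d, 1): their difference is N(-gain_d, sqrt(2))."""
    return 1 - NormalDist(mu=-gain_d, sigma=2 ** 0.5).cdf(0)

# Illustrative effect sizes for the three figures above (assumed values)
for label, gain in [("highly sensitive", 1.5),
                    ("moderately sensitive", 0.5),
                    ("relatively insensitive", 0.1)]:
    print(f"{label:>22}: d = {gain:.1f}, "
          f"P(untaught student scores higher) = {p_untaught_wins(gain):.2f}")
```

On these assumptions, the insensitive item barely separates taught from untaught students (P ≈ 0.47, close to a coin flip), which is why a test built mainly from such items can mask whether learning is taking place at all.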
Curriculum design • Curricula must be designed vertically, not horizontally • Key idea: when someone gets better, what is it that gets better? • Learning progressions
The locus of assessment • Authority • Who creates the assessment? • Resources • What resources does the student have? • Interactivity • Negotiation of meaning of response • Scoring • Externalization of standards increases accuracy
The extensiveness of assessment • Are all students assessed on the same basis? • Notions of “fairness” • Adaptations and adjustments for some • But not all • Treating different people the same is no fairer than treating similar people differently
Assessment formats • Multiple-choice • Larger (but unsystematic) coverage • Low cost • Negative backwash effects • Constructed response • Large item-student effects • High cost • Positive backwash effects
Item format and sex differences • [Chart: effect of item format on sex differences in performance; the scale runs from “Girls do better” to “Boys do better”]
Scoring models • High cut scores • Positive experiences for students • Strong warrants for particular skills • Low reliability • Moderate cut scores • Negative experiences for some students • Weak warrants for particular skills • High reliability
Quality issues • Threats to validity • Construct under-representation • Construct-irrelevant variance • Unreliability • Dynamic trade-offs between these
The role of teachers • Using teacher assessment in certification is attractive: • Increases reliability (increased test time) • Increases validity (addresses construct under-representation) • But problematic • Problems of bias (construct-irrelevant variance) • Lack of trust (“Fox guarding the hen house”)
A possible model • All students are assessed at test time • Different students in the same class are assigned different tasks • The performance of the class defines an “envelope” of scores, e.g. • Advanced: 5 students • Proficient: 8 students • Basic: 10 students • Below basic: 2 students • Teacher allocates levels on the basis of whole-year performance
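A minimal sketch of how such an envelope allocation might work. This is my construction under stated assumptions, not a specification from the talk: four levels, hypothetical cut scores, and a teacher ranking of students by whole-year performance.

```python
from collections import Counter

LEVELS = ["Below basic", "Basic", "Proficient", "Advanced"]  # low to high

def score_to_level(score: float) -> str:
    """Map a test score to a level. The cut scores here are hypothetical."""
    if score < 30:
        return "Below basic"
    if score < 55:
        return "Basic"
    if score < 80:
        return "Proficient"
    return "Advanced"

def allocate_levels(test_scores: dict[str, float],
                    teacher_ranking: list[str]) -> dict[str, str]:
    """test_scores: student -> score on their (randomly assigned) task.
    teacher_ranking: students ordered best-first by whole-year work.
    The class's test results fix the envelope (how many of each level);
    the teacher's ranking decides which student receives which level."""
    envelope = Counter(score_to_level(s) for s in test_scores.values())
    ranked = iter(teacher_ranking)
    allocation = {}
    for level in reversed(LEVELS):            # hand out top levels first
        for _ in range(envelope[level]):
            allocation[next(ranked)] = level
    return allocation

# Example: the envelope is 1 Advanced, 2 Proficient, 1 Basic, 1 Below basic
scores = {"Ana": 85, "Ben": 62, "Cam": 40, "Dee": 70, "Eli": 25}
ranking = ["Dee", "Ana", "Ben", "Cam", "Eli"]  # teacher's whole-year order
print(allocate_levels(scores, ranking))
# -> {'Dee': 'Advanced', 'Ana': 'Proficient', 'Ben': 'Proficient',
#     'Cam': 'Basic', 'Eli': 'Below basic'}
```

Note how Dee, not Ana, receives the Advanced level: the test fixes only the class-level distribution, so a student's score on their own (randomly assigned) task need not determine their own level — which is also why individual scores are not “inspectable”, as the next slide notes.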
Benefits and problems • Benefits • The only way to teach to the test is to improve everyone’s performance on everything (which is what we want!) • Validity and reliability are enhanced • Problems • Students’ scores are not “inspectable” • Assumes student motivation
The effects of context • Assessment systems reflect national beliefs and values, including: • beliefs about what constitutes learning; • beliefs in the reliability and validity of the results of various tools; • a preference for and trust in numerical data, with a bias towards a single number; • trust in the judgments and integrity of the teaching profession; • belief in the value of competition between students; • belief in the value of competition between schools; • belief that test results measure school effectiveness; • fear of national economic decline and education's role in it; • belief that the key to schools' effectiveness is strong top-down management.
Conclusion • There is no “perfect” assessment system anywhere. • Each nation's assessment system is exquisitely tuned to local constraints and affordances. • Assessment practices have impacts on teaching and learning which may be strongly amplified or attenuated by the national context. • The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as it is by educational evidence and values.
Conclusion (2) • It is probably idle to draw up maps for the ideal assessment policy for a country, even though the principles and the evidence to support such an ideal might be clearly agreed within the ‘expert’ community. • Instead, focus on those arguments and initiatives which are least offensive to existing assumptions and beliefs, and which will nevertheless serve to catalyze a shift in them, while at the same time improving some aspects of present practice.