This article explores the differences in assessment practices between Europe and the United States, including the focus on achievement vs. effort, high-stakes assessment, education and the law, and the origins of intelligence assessment. It also discusses technical issues in assessment, such as interpretability over time and reliability.
Craft and science: European and American traditions in assessment • AEA-Europe 5th annual conference, Budapest, Hungary; November 2004 • Dylan Wiliam, ETS
What can we say about the differences? • Nothing really • If you’re not confused, you don’t really understand the situation • USA and Europe each have ~50 systems • Variability within is greater than difference between • But here goes…
Education in America • Highly localized • 50 states, 17 000 school districts, 100 000 schools • Education controlled and funded locally • High proportion of offices filled by election • Residential segregation • Huge discrepancies in per-student funding • Lower Merion school district: $19 600 per year • Rural Arizona: $3 000 per year • Grade-based system, but not operated as such • Structure of ethnicity and class quite different from Europe
Assessment in schools • European tradition: • Examination-based • Synoptic • Focus on achievement • American tradition: • Coursework-based • Component-based • Focus on effort • Correlation of IQ with school achievement: • UK: ~0.70 • US: ~0.45
Quality in education • US 20th century industrial success • Based on a mass education system of moderate quality • European emphasis: • Elite education of high quality • Scaled up without substantial loss of quality
Assessment for accountability • Demand for accountability • (don’t let the fox guard the chicken coop) • Introduction of accountability tests in US • largely by private-sector publishers • Profit margin on tests: 0–5% • Profit margin on textbooks: ~40%
High-stakes assessment • Assessment can be high-stakes for • Students • Teachers and schools • In Europe, high-stakes assessment of students has been used to evaluate teachers • In the USA assessment of schools has been broadened to matter for students
Education and the law • Key issues: • Precision in law-making • Constitution, Bill of Rights, • Availability of appeals and remedies • Litigation, ‘Grade court’ • Recovery of defendants’ costs • Possible in most of Europe • Not possible in most states in the USA
Entry to higher education • Key issues • Selection • Placement • Combined in most European countries • Separate in the USA
Origins of intelligence assessment • British empiricist tradition • Knowledge comes from experience (outside) • Tests of sensory acuity (Galton, Cattell) • Innate differences in acuity of individuals • Focus on measurement • Continental rationalist tradition • Knowledge comes from reasoning (inside) • Tests of reasoning (Binet) • All students share the same trajectory, at different speeds • Focus on classification
The big test • Binet & Simon’s ideas brought to USA by Goddard • US army recruits 3m new soldiers in 1917 • Yerkes proposes testing for the ‘feebleminded’ • Terman proposes testing all recruits • Otis develops the multiple choice format • 1 726 966 recruits tested by January 1919 • No use made of results • But mass group testing is here to stay
The development of the SAT • 1920: College Entrance Examination Board sets up a commission: to investigate and report on general intelligence examinations and other new types of examinations offered in several secondary school subjects • After several more commissions … • Scholastic Aptitude Test administered to College Board applicants in 1926
Technical issues in the SAT • Key issues: • Interpretability over time • All 82 versions of the SAT administered between April 1942 and May 1969 were equated to the original norm group taking the test in 1941 • Legal defensibility
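The slide does not say which equating method was used for the SAT. As a minimal sketch of what "equating to a reference group" can mean, the Python below shows linear equating under an assumed equivalent-groups design: a score on a new form is mapped to the reference scale by matching means and standard deviations. The function name and the score lists are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def linear_equate(x, new_form_scores, ref_form_scores):
    """Map score x on a new form onto the reference form's scale.

    Assumes the two groups of test takers are equivalent in ability,
    so differences in mean and spread reflect the forms, not the people.
    """
    mu_x, sd_x = mean(new_form_scores), pstdev(new_form_scores)
    mu_y, sd_y = mean(ref_form_scores), pstdev(ref_form_scores)
    # Same standardized position on both forms: (x - mu_x)/sd_x == (y - mu_y)/sd_y
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Invented scores, for illustration only (not real SAT data).
new_form = [40, 50, 60]
ref_form = [400, 500, 600]
equated_mid = linear_equate(50, new_form, ref_form)
equated_top = linear_equate(60, new_form, ref_form)
```

Chaining such conversions from form to form is one way a long series of test versions can be reported on the scale of a single original norm group, which is the interpretability-over-time issue the slide raises.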
Reliability • Consistency under changes in • occasion (test-retest) • scorer (mark-remark) • items (question-requestion) Europe USA
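Each kind of consistency listed on this slide is commonly summarized as a correlation between two sets of scores for the same students. The sketch below, in Python, computes a Pearson correlation for a mark-remark comparison; the helper name and the marks are invented for illustration, and the same function would apply to test-retest or parallel-items data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks given to the same six scripts by two scorers.
first_marking = [12, 15, 9, 18, 14, 11]
second_marking = [13, 14, 10, 17, 15, 10]

mark_remark_r = pearson_r(first_marking, second_marking)
```

A value near 1 would indicate that the two scorers rank and space the scripts almost identically; lower values indicate more scorer-related inconsistency.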
Speech acts • Perlocutionary speech acts are statements about what was, is or will be (e.g. Michael knows his number bonds to 10) • Illocutionary speech acts are performative: they create social facts (e.g. “I now pronounce you husband and wife”)
Social facts Interviewer: Did you call them the way you saw them, or did you call them the way they were? Umpire: The way I called them was the way they were.
Assessments as speech acts • Assessments in the US are treated as perlocutionary speech acts • Assessments in Europe are treated as illocutionary speech acts • That’s why there is no measurement error in Europe
Item-response modelling • All test theories assume an item response model • Classical test theory assumes a flat line • Guttman scaling assumes a step function • All real items are somewhere between the two • US modellers assumed a logistic curve • Computationally tractable (if unidimensionality is also assumed) • Can be made very close to cumulative normal • Others question these assumptions • e.g. Goldstein (1979, 1980, 1982, 1989)
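The three shapes named on this slide can be written as functions giving the probability of a correct response at ability `theta`. The Python sketch below is illustrative only: the parameter names `a` (slope) and `b` (difficulty) and the constant `p` for the flat line are assumptions, not taken from the slides. It also checks the "very close to cumulative normal" claim using the conventional scaling constant 1.702.

```python
import math


def flat_irf(theta, p=0.5):
    """Flat line: success probability does not depend on ability."""
    return p


def guttman_irf(theta, b=0.0):
    """Step function: certain failure below difficulty b, certain success at or above it."""
    return 1.0 if theta >= b else 0.0


def logistic_irf(theta, a=1.0, b=0.0):
    """Logistic curve with slope a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def normal_ogive_irf(theta, a=1.0, b=0.0):
    """Cumulative normal curve, computed from the error function."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


# Compare the logistic curve scaled by 1.702 with the cumulative normal
# over abilities from -4 to +4 in steps of 0.01.
D = 1.702
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logistic_irf(t, a=D) - normal_ogive_irf(t)) for t in grid)
```

With this scaling the two curves differ by less than 0.01 in probability at every point on the grid, which is why the logistic form is often used as a computationally simpler stand-in for the normal ogive.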
Assessment formats • Debates about assessment formats are often disguised debates about constructs • Bias is a property of inferences, not tests • So, multiple-choice tests are not biased • Multiple-choice vs Constructed-response • CR items yield more information, but take longer • MC items yield more information per minute • Fewer items means more student-task effects • Correlations between MC and CR formats are high, but can change (e.g. NAEP) • Reliance on MC items has backwash effects
Standard setting • Test-centred vs. examinee-centred • Key issue: do you set the cut-score before or after you see the results? • Policy-oriented vs. evidence-oriented • Key issue: do you adjust the cut-score to fit the test, or adjust the test to fit the cut-score?
Not invented here syndrome… • Constructivism • Standards-based assessment/Outcomes based assessment
Standards-based assessment • What?! • Originally criteria for high-school diploma set locally • Introduction of state tests • In many (most?) cases state tests are not aligned to district curricula
No Child Left Behind Act • Reauthorization of the Elementary and Secondary Education Act (ESEA) • Commanded bi-partisan support • Not a plot to declare all state schools failing • States must establish state standards • But are free to decide how to do this • Huge differences in standards • Students tested • Language and maths grades 3 to 8 and in high school • Science 3 times (in grades K-5, 6-8, 9-12)
Key features of NCLB • All students to be ‘proficient’ by 2014 • Achievement rather than growth • States determine intermediate steps to this goal • Some states opt for steady progress • Others go for ‘Balloon payments’ • Each year, each school must make adequate yearly progress to this goal • Cohort based • Disaggregation of key groups • Students with special needs • Ethnic minorities • Language learners • Failure to achieve AYP has profound impact
Exit from higher education • Key issues • Qualification • Licensure • Combined in most European countries • Separate in the USA
The mangle of practice • Andrew Pickering (1995) • Critique of traditional views of science • Science is what scientists do • Science as a series of truths waiting to be found • The development of traditions of assessment is not just bound up in culture • It is the result of messy, contingent, fragile, politically and personally influenced events
In summary • Viewed from outside, any national assessment system seems to work in practice, but not in theory • Assessment systems are much smarter than they appear… • …and are exquisitely attuned to the constraints and affordances provided by the contexts in which they operate. • We can learn from them, but we cannot import them