Craft and science: European and American traditions in assessment

AEA-Europe 5th annual conference, Budapest, Hungary; November 2004 • Dylan Wiliam, ETS

What can we say about the differences? • Nothing really • If you’re not confused, you don’t really understand the situation • USA and Europe each have ~50 systems • Variability within is greater than difference between • But here goes…

Ludicrously simplistic comparison

Education in America • Highly localized • 50 states, 17 000 school districts, 100 000 schools • Education controlled and funded locally • High proportion of offices filled by election • Residential segregation • Huge discrepancies in per-student funding • Lower Merion school district: $19 600 per year • Rural Arizona: $3 000 per year • Grade-based system, but not operated as such • Structure of ethnicity and class quite different from Europe

Assessment in schools • European tradition: • Examination-based • Synoptic • Focus on achievement • American tradition: • Coursework-based • Component-based • Focus on effort • Correlation of IQ with school achievement: • UK: ~0.70 • US: ~0.45

Quality in education • US 20th century industrial success • Based on a mass education system of moderate quality • European emphasis: • Elite education of high quality • Scaled up without substantial loss of quality

Assessment for accountability • Demand for accountability • (don’t let the fox guard the chicken coop) • Introduction of accountability tests in US • largely by private-sector publishers • Profit margin on tests: 0–5% • Profit margin on textbooks: ~40%

High-stakes assessment • Assessment can be high-stakes for • Students • Teachers and schools • In Europe, high-stakes assessment of students has been used to evaluate teachers • In the USA assessment of schools has been broadened to matter for students

Education and the law • Key issues: • Precision in law-making • Constitution, Bill of Rights • Availability of appeals and remedies • Litigation, ‘Grade court’ • Recovery of defendants’ costs • Possible in most of Europe • Not possible in most states in the USA

Entry to higher education • Key issues • Selection • Placement • Combined in most European countries • Separate in the USA

Origins of intelligence assessment • British empiricist tradition • Knowledge comes from experience (outside) • Tests of sensory acuity (Galton, Cattell) • Innate differences in acuity of individuals • Focus on measurement • Continental rationalist tradition • Knowledge comes from reasoning (inside) • Tests of reasoning (Binet) • All students share the same trajectory, at different speeds • Focus on classification

The big test • Binet & Simon’s ideas brought to USA by Goddard • US army recruits 3m new soldiers in 1917 • Yerkes proposes testing for the ‘feebleminded’ • Terman proposes testing all recruits • Otis develops the multiple choice format • 1 726 966 recruits tested by January 1919 • No use made of results • But mass group testing is here to stay

The development of the SAT • 1920: College Entrance Examination Board sets up a commission “to investigate and report on general intelligence examinations and other new types of examinations offered in several secondary school subjects” • After several more commissions … • Scholastic Aptitude Test administered to College Board applicants in 1926

Technical issues in the SAT • Key issues: • Interpretability over time • All 82 versions of the SAT administered between April 1942 and May 1969 were equated to the original norm group taking the test in 1941 • Legal defensibility
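
One way to picture what equating a new form to a fixed norm group involves is linear equating, sketched below. This is an illustrative assumption, not a description of the procedure actually used for the SAT; the function name `linear_equate` and the score lists are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(score, new_form_scores, reference_scores):
    """Map a raw score on a new form onto the reference scale.

    Assumes the two groups are equivalent in ability, so that any
    difference in mean and spread is due to the forms themselves.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_ref, sd_ref = mean(reference_scores), pstdev(reference_scores)
    # Express the score in standard-deviation units on the new form,
    # then re-express it on the reference scale.
    z = (score - mu_new) / sd_new
    return mu_ref + sd_ref * z


# Hypothetical data, for illustration only.
new_form = [40, 50, 60]
reference = [400, 500, 600]
equated = linear_equate(55, new_form, reference)
```

Under these assumptions a score halfway between the mean and one step above it on the new form lands at the corresponding point on the reference scale, which is what lets scores from different years be read against the same norm group.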

Reliability • Consistency under changes in • occasion (test-retest) • scorer (mark-remark) • items (question-requestion) • [Table comparing Europe and USA on each of the three; cell entries not recoverable]
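
Consistency across occasions or scorers is commonly summarised as a correlation between two sets of scores for the same students. The sketch below computes a Pearson correlation for a mark-remark comparison; the helper `pearson_r` and the marks are hypothetical, and a correlation is only one of several possible reliability indices.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marks for five scripts, marked twice by different scorers.
first_marking = [12, 15, 9, 18, 14]
second_marking = [11, 16, 10, 17, 13]
mark_remark = pearson_r(first_marking, second_marking)
```

The same function applies to test-retest data (two occasions) or to parallel sets of items; only the source of the second list changes.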

Speech acts • Perlocutionary speech acts are statements about what was, is or will be (e.g. “Michael knows his number bonds to 10”) • Illocutionary speech acts are performative: they create social facts (e.g. “I now pronounce you husband and wife”)

Social facts Interviewer: Did you call them the way you saw them, or did you call them the way they were? Umpire: The way I called them was the way they were.

Assessments as speech acts • Assessments in the US are treated as perlocutionary speech acts • Assessments in Europe are treated as illocutionary speech acts • That’s why there is no measurement error in Europe

Item-response modelling • All test theories assume an item response model • Classical test theory assumes a flat line • Guttman scaling assumes a step function • All real items are somewhere between the two • US modellers assumed a logistic curve • Computationally tractable (if unidimensionality is also assumed) • Can be made very close to cumulative normal • Others question these assumptions • e.g. Goldstein (1979, 1980, 1982, 1989)
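
The shapes named on this slide can be written down as functions giving the probability of a correct response at ability `theta`. The sketch below is illustrative only: the function names, the default parameters and the constant-probability reading of the "flat line" are assumptions, and the scaling constant 1.702 is the value conventionally used to bring the logistic curve close to the cumulative normal.

```python
import math


def flat_irf(theta, p=0.5):
    """Flat line: probability of success does not depend on ability."""
    return p


def step_irf(theta, b=0.0):
    """Step function: certain failure below difficulty b, certain success above."""
    return 1.0 if theta >= b else 0.0


def logistic_irf(theta, a=1.0, b=0.0, scale=1.702):
    """Logistic curve with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-scale * a * (theta - b)))


def normal_ogive_irf(theta, a=1.0, b=0.0):
    """Cumulative normal curve with the same parameters."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def max_gap(lo=-4.0, hi=4.0, steps=800):
    """Largest absolute difference between the two curves on a grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(logistic_irf(t) - normal_ogive_irf(t)) for t in grid)


gap = max_gap()
```

With this scaling the two curves differ by less than 0.01 at every point on the grid, which is the sense in which the logistic "can be made very close to cumulative normal".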

Assessment formats • Debates about assessment formats are often disguised debates about constructs • Bias is a property of inferences, not tests • So, multiple-choice tests are not biased • Multiple-choice vs. constructed-response • CR items yield more information, but take longer • MC items yield more information per minute • Fewer items means more student-task effects • Correlations between MC and CR formats are high, but can change (e.g. NAEP) • Reliance on MC items has backwash effects

Effect of assessment format

Standard setting • Test-centred vs. examinee-centred • Key issue: do you set the cut-score before or after you see the results? • Policy-oriented vs. evidence-oriented • Key issue: do you adjust the cut-score to fit the test, or adjust the test to fit the cut-score?

Standard setting

Not invented here syndrome… • Constructivism • Standards-based assessment/Outcomes based assessment

Standards-based assessment • What?! • Originally criteria for high-school diploma set locally • Introduction of state tests • In many (most?) cases state tests are not aligned to district curricula

No Child Left Behind Act • Reauthorization of the Elementary and Secondary Education Act (ESEA) • Commanded bi-partisan support • Not a plot to declare all state schools failing • States must establish state standards • But are free to decide how to do this • Huge differences in standards • Students tested • Language and maths grades 3 to 8 and in high school • Science 3 times (in grades K-5, 6-8, 9-12)

Key features of NCLB • All students to be ‘proficient’ by 2014 • Achievement rather than growth • States determine intermediate steps to this goal • Some states opt for steady progress • Others go for ‘balloon payments’ • Each year, each school must make adequate yearly progress (AYP) towards this goal • Cohort based • Disaggregation of key groups • Students with special needs • Ethnic minorities • Language learners • Failure to achieve AYP has profound impact
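
The difference between steady progress and a ‘balloon payment’ trajectory can be made concrete with a small sketch. Everything here is a simplifying assumption for illustration: the start year, the baseline, the cubic shape of the back-loaded path and the function names are hypothetical, and real state plans and AYP rules were more involved.

```python
def steady_targets(baseline, start=2002, end=2014, goal=100.0):
    """Annual targets rising by the same amount every year."""
    span = end - start
    return {
        year: baseline + (goal - baseline) * (year - start) / span
        for year in range(start, end + 1)
    }


def balloon_targets(baseline, start=2002, end=2014, goal=100.0, power=3):
    """Back-loaded targets: small early gains, most of the rise near the end."""
    span = end - start
    return {
        year: baseline + (goal - baseline) * ((year - start) / span) ** power
        for year in range(start, end + 1)
    }


def makes_ayp(percent_proficient_by_group, target):
    """Simplified check: every disaggregated group must meet the target."""
    return all(p >= target for p in percent_proficient_by_group.values())


# Hypothetical state starting with 40% of students proficient.
steady = steady_targets(40.0)
balloon = balloon_targets(40.0)
school = {"all students": 70.0, "language learners": 45.0}
meets_steady_2004 = makes_ayp(school, steady[2004])
meets_balloon_2004 = makes_ayp(school, balloon[2004])
```

Both paths reach 100% in 2014, but the back-loaded path sets lower targets in every intermediate year, so the same school can meet one state's early targets while missing another's.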

Exit from higher education • Key issues • Qualification • Licensure • Combined in most European countries • Separate in the USA

The mangle of practice • Andrew Pickering (1995) • Critique of traditional views of science • Science is what scientists do • Science as a series of truths waiting to be found • Traditions of assessment are not just bound up in culture • They are the result of messy, contingent, fragile, politically and personally influenced events

In summary • Viewed from outside, any national assessment system seems to work in practice, but not in theory • Assessment systems are much smarter than they appear… • …and are exquisitely attuned to the constraints and affordances provided by the contexts in which they operate. • We can learn from them, but we cannot import them
