Craft and science: European and American traditions in assessment

This article explores the differences in assessment practices between Europe and the United States, including the focus on achievement vs. effort, high-stakes assessment, education and the law, and the origins of intelligence assessment. It also discusses technical issues in assessment, such as interpretability over time and reliability.


Presentation Transcript


  1. Craft and science: European and American traditions in assessment • AEA-Europe 5th annual conference, Budapest, Hungary; November 2004 • Dylan Wiliam, ETS

  2. What can we say about the differences? • Nothing really • If you’re not confused, you don’t really understand the situation • USA and Europe each have ~50 systems • Variability within is greater than difference between • But here goes…

  3. Ludicrously simplistic comparison

  4. Education in America • Highly localized • 50 states, 17 000 school districts, 100 000 schools • Education controlled and funded locally • High proportion of offices filled by election • Residential segregation • Huge discrepancies in per-student funding • Lower Merion school district: $19 600 per year • Rural Arizona: $3 000 per year • Grade-based system, but not operated as such • Structure of ethnicity and class quite different from Europe

  5. Assessment in schools • European tradition: • Examination-based • Synoptic • Focus on achievement • American tradition: • Coursework-based • Component-based • Focus on effort • Correlation of IQ with school achievement: • UK: ~0.70 • US: ~0.45

  6. Quality in education • US 20th century industrial success • Based on a mass education system of moderate quality • European emphasis: • Elite education of high quality • Scaled up without substantial loss of quality

  7. Assessment for accountability • Demand for accountability • (don’t let the fox guard the chicken coop) • Introduction of accountability tests in US • largely by private-sector publishers • Profit margin on tests: 0–5% • Profit margin on textbooks: ~40%

  8. High-stakes assessment • Assessment can be high-stakes for • Students • Teachers and schools • In Europe, high-stakes assessment of students has been used to evaluate teachers • In the USA assessment of schools has been broadened to matter for students

  9. Education and the law • Key issues: • Precision in law-making • Constitution, Bill of Rights, • Availability of appeals and remedies • Litigation, ‘Grade court’ • Recovery of defendants’ costs • Possible in most of Europe • Not possible in most states in the USA

  10. Entry to higher education • Key issues • Selection • Placement • Combined in most European countries • Separate in the USA

  11. Origins of intelligence assessment • British empiricist tradition • Knowledge comes from experience (outside) • Tests of sensory acuity (Galton, Cattell) • Innate differences in acuity of individuals • Focus on measurement • Continental rationalist tradition • Knowledge comes from reasoning (inside) • Tests of reasoning (Binet) • All students share the same trajectory, at different speeds • Focus on classification

  12. The big test • Binet & Simon’s ideas brought to USA by Goddard • US army recruits 3m new soldiers in 1917 • Yerkes proposes testing for the ‘feebleminded’ • Terman proposes testing all recruits • Otis develops the multiple choice format • 1 726 966 recruits tested by January 1919 • No use made of results • But mass group testing is here to stay

  13. The development of the SAT • 1920: College Entrance Examination Board sets up a commission to investigate and report on general intelligence examinations and other new types of examinations offered in several secondary school subjects • After several more commissions … • Scholastic Aptitude Test administered to College Board applicants in 1926

  14. Technical issues in the SAT • Key issues: • Interpretability over time • All 82 versions of the SAT administered between April 1942 and May 1969 were equated to the original norm group taking the test in 1941 • Legal defensibility

  15. Reliability • Consistency under changes in • occasion (test-retest) • scorer (mark-remark) • items (question-requestion) • [Figure labels: Europe, USA]
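
One common way to put a number on consistency across occasions is to correlate the scores the same students obtain on two sittings. The sketch below is only an illustration and is not taken from the slides: the function name `pearson` and both score lists are invented for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for six students on two occasions (made-up data).
first_sitting = [52, 61, 70, 45, 80, 66]
second_sitting = [55, 58, 72, 50, 77, 63]

test_retest = pearson(first_sitting, second_sitting)
print(f"test-retest correlation: {test_retest:.2f}")
```

The same function could be applied to two markers' scores for the same scripts (mark-remark) or to scores on two parallel sets of items (question-requestion).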

  16. Speech acts • Perlocutionary speech acts are statements about what was, is or will be (e.g. “Michael knows his number bonds to 10”) • Illocutionary speech acts are performative: they create social facts (e.g. “I now pronounce you husband and wife”)

  17. Social facts • Interviewer: “Did you call them the way you saw them, or did you call them the way they were?” • Umpire: “The way I called them was the way they were.”

  18. Assessments as speech acts • Assessments in the US are treated as perlocutionary speech acts • Assessments in Europe are treated as illocutionary speech acts • That’s why there is no measurement error in Europe

  19. Item-response modelling • All test theories assume an item response model • Classical test theory assumes a flat line • Guttman scaling assumes a step function • All real items are somewhere between the two • US modellers assumed a logistic curve • Computationally tractable (if unidimensionality is also assumed) • Can be made very close to cumulative normal • Others question these assumptions • e.g. Goldstein (1979, 1980, 1982, 1989)
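
The three shapes named on this slide can be sketched as functions giving the probability of a correct response at ability `theta`. This is a rough illustration, not the speaker's formulation: the function names, the difficulty parameter `b`, and the flat-line value `p` are assumptions. The constant 1.702 is the conventional scaling factor that brings the logistic curve close to the cumulative normal.

```python
import math

def flat_line(theta, p=0.5):
    """Flat item response: success probability does not depend on ability."""
    return p

def guttman_step(theta, b=0.0):
    """Step function: certain failure below difficulty b, certain success at or above it."""
    return 1.0 if theta >= b else 0.0

def logistic_irf(theta, b=0.0, scale=1.702):
    """Logistic item response function with a slope scaling constant."""
    return 1.0 / (1.0 + math.exp(-scale * (theta - b)))

def normal_ogive_irf(theta, b=0.0):
    """Cumulative normal (normal ogive) item response function."""
    return 0.5 * (1.0 + math.erf((theta - b) / math.sqrt(2.0)))

def max_gap(scale, lo=-4.0, hi=4.0, steps=801):
    """Largest absolute gap between the logistic and normal curves on a grid."""
    gap = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        diff = abs(logistic_irf(theta, scale=scale) - normal_ogive_irf(theta))
        gap = max(gap, diff)
    return gap

print(f"max gap with scale 1.702: {max_gap(1.702):.4f}")
print(f"max gap with scale 1.0:   {max_gap(1.0):.4f}")
```

With the 1.702 scaling the two curves differ by less than 0.01 everywhere on the grid, which is one way to read "can be made very close to cumulative normal"; without the scaling the gap is more than ten times larger.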

  20. Assessment formats • Debates about assessment formats are often disguised debates about constructs • Bias is a property of inferences, not tests • So, multiple-choice tests are not biased • Multiple-choice vs Constructed-response • CR items yield more information, but take longer • MC items yield more information per minute • Fewer items means more student-task effects • Correlations between MC and CR formats are high, but can change (e.g. NAEP) • Reliance on MC items has backwash effects

  21. Effect of assessment format

  22. Standard setting • Test-centred vs. examinee-centred • Key issue: do you set the cut-score before or after you see the results • Policy-oriented vs. evidence-oriented • Key issue: do you adjust the cut-score to fit the test, or adjust the test to fit the cut-score

  23. Standard setting

  24. Not invented here syndrome… • Constructivism • Standards-based assessment/Outcomes based assessment

  25. Standards-based assessment • What?! • Originally criteria for high-school diploma set locally • Introduction of state tests • In many (most?) cases state tests are not aligned to district curricula

  26. No Child Left Behind Act • Reauthorization of the Elementary and Secondary Education Act (ESEA) • Commanded bi-partisan support • Not a plot to declare all state schools failing • States must establish state standards • But are free to decide how to do this • Huge differences in standards • Students tested • Language and maths in grades 3 to 8 and in high school • Science 3 times (in grades K-5, 6-8, 9-12)

  27. Key features of NCLB • All students to be ‘proficient’ by 2014 • Achievement rather than growth • States determine intermediate steps to this goal • Some states opt for steady progress • Others go for ‘Balloon payments’ • Each year, each school must make adequate yearly progress to this goal • Cohort based • Disaggregation of key groups • Students with special needs • Ethnic minorities • Language learners • Failure to achieve AYP has profound impact
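
The contrast between "steady progress" and "balloon payments" can be sketched as two target schedules that both reach 100% proficient by 2014. Everything numeric here is an assumption for illustration: the 40% baseline, the 2002 starting year, and the cubic back-loading are hypothetical and do not describe any particular state's plan.

```python
def steady_targets(baseline, start=2002, end=2014):
    """Equal yearly increments from the baseline to 100% proficient."""
    span = end - start
    return {
        year: baseline + (100.0 - baseline) * (year - start) / span
        for year in range(start, end + 1)
    }

def back_loaded_targets(baseline, start=2002, end=2014, power=3):
    """Small early increments, with most of the gain deferred to the final years."""
    span = end - start
    return {
        year: baseline + (100.0 - baseline) * ((year - start) / span) ** power
        for year in range(start, end + 1)
    }

# Hypothetical state starting at 40% proficient.
steady = steady_targets(40.0)
balloon = back_loaded_targets(40.0)

for year in sorted(steady):
    print(f"{year}: steady {steady[year]:5.1f}%   back-loaded {balloon[year]:5.1f}%")
```

Both schedules end at the same point, but the back-loaded one sets lower targets in every intermediate year and leaves the largest gains for the last few years.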

  28. Exit from higher education • Key issues • Qualification • Licensure • Combined in most European countries • Separate in the USA

  29. The mangle of practice • Andrew Pickering (1995) • Critique of traditional views of science • Science is what scientists do • Science as a series of truths waiting to be found • The development of traditions of assessment is not just bound up in culture • It is the result of messy, contingent, fragile, politically and personally influenced events

  30. In summary • Viewed from outside, any national assessment system seems to work in practice, but not in theory • Assessment systems are much smarter than they appear… • …and are exquisitely attuned to the constraints and affordances provided by the contexts in which they operate. • We can learn from them, but we cannot import them
