It’s a myth: High stakes cause test score inflation
Richard P. Phelps
International Test Commission, 11th Conference, July 4, 2018, Montréal, Canada
Educational testing in the US: early 1980s
International Test Commission, 11th Conference, Montreal, Canada
Educational testing in the US: 1980s
• Student testing with stakes reintroduced in the late 1970s and early 1980s
• Debra P. v. Turlington
• “Truth in testing” laws
John J. Cannell, M.D.
• Residency in rural, poor Appalachia, 1980s
• Surprised by claims that the state and school district scored “above average” on national tests
• Investigated and found that all US states claimed to be “above average”
“Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.”
- Garrison Keillor, A Prairie Home Companion
Cannell’s suspects
• Lax security
• Outdated or invalid norms
• Deliberate educator manipulation (i.e., cheating)
“While supporting Cannell’s general finding … our analyses lead us to conclusions that are different, and certainly less sensational, than the ones he reached.” (Linn, Graue, Sanders, CRESST, 1990)
“There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell.” (Linn, CRESST, 2000)
CRESST’s Lake Wobegon suspects
• Outdated or invalid norms
• High stakes, which induce “teaching to the test” (i.e., test coaching) under pressure
CRESST counters Cannell’s Lake Wobegon study with its own, 1991
• Students took a test for a few years. Scores rose.
• Then they took a “competing” test the district had used before. Scores fell.
CRESST 1991 “Generalization” Study
• 3 tests in the study:
  • Annual NRT (norm-referenced test)
  • Parallel form
  • A “competing” NRT
CRESST 1991 “Generalization” Study
• Unnamed school district
• Unnamed tests
• Neither replicable nor falsifiable
• A conference presentation; not peer-reviewed
• Called an “experiment,” but had no controls for test security or other factors
CRESST 1991 “Generalization” Study: the study’s assumptions
1. Publication of aggregate results = “high stakes”
2. “Competing” NRTs should get the same results
3. “Test coaching” improves scores
4. Low-stakes test scores are reliable and can be used to benchmark unreliable high-stakes scores
5. High stakes cause test score inflation?
1. Publication of aggregate results = high stakes?
Jim Popham’s 1987 definition of “high stakes”:
“... Such tests include the many statewide achievement tests whose results are reported by local newspapers on a school-by-school or district-by-district basis.”
2. Research: Comparability of different tests
Scores comparable? Studies finding scores not comparable:
• NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
• Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
• CRTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
3. Research: Effects of test coaching
• It works (significant score increase from learning format tricks): Alderman & Powers (1980); Samson (1985); Scruggs (1985); Roznowski & Bassett (1992); McMann (1994); Holmes, Keffer (1995); Camel & Chung (2002); Filizola (2008)
• It doesn’t work (negligible score increase): Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009)
4. Research: Low-stakes test reliability
• Not reliable (student effort varies; scores easy to manipulate): Rothe (1947); Jennings (1953); Uguroglu, Walberg (1979); Taylor & White (1981); Arvey, et al. (1990); Schmit, Ryan (1992); Brown & Walberg (1993); Kim, McLean (1995); Wolf, Smith (1995); Wolf, Smith, DiPaulo (1996); Schiel (1996); Sundre (1999); Sundre, Moore (2002); Sundre, Wise (2003); DeMars (2000); Wise (2006a, 2006b); Wise, DeMars (2005, 2005, 2006, 2010); Wise, et al. (2009); Hoyt (2001); Eklof (2006, 2007, 2010); List, Livingston, Neckerman (2016); etc.
• Reliable (“no incentive to manipulate scores”): Kiplinger, Linn (1992); O’Neil, Sugrue, Baker (1995)*; Hout, Elliott (2011)
* 1 of 2 groups
5. High stakes cause test score inflation?
Then why is there no score inflation with certification and licensure tests?
Large-scale, internally administered test, tight security
Large-scale, internally administered test, lax security
Test Score Inflation Occurs where Security is Lax
• Cannell found score inflation in elementary school tests in dozens of states; none of those tests had high stakes.
• Cannell also found score inflation in secondary school tests in dozens of states; only one had high stakes.
Harms of misinformation
1. Unfairly discredits a useful evaluation tool
2. Test security (in the U.S.) remains shoddy
3. Teachers given mixed messages
4. Now spreading worldwide
1. Uniquely useful evaluation tool is discredited
…and, in the US, the only objective measure available to the public (i.e., not under the control of insiders).
2. Test security (in the U.S.) remains shoddy
• ACT, SAT, PARCC, and SBAC are now administered statewide by schools, on varying dates.
• Test sponsors save money and hassle, and gain customers, by outsourcing (or ignoring) test security.
3. Teachers given mixed messages
• “Teaching to the test” is unethical; don’t do it! Teach content beyond the standards.
• “Teaching to the test” works! You and your students will be better off if you do it!
4. Misinformation spreading worldwide
Cover-up successful; most believe CRESST’s version
Cannell’s work was an opportunity to fix a large problem. Instead, US education chose to deny, confuse, and cover up. This unfortunate tendency blocks genuine progress.
http://nonpartisaneducation.org/Review/Articles/v6n3.htm
richard@nonpartisaneducation.org