It's a myth: High stakes cause test score inflation

The myth is popular among education insiders who oppose high-stakes or externally mandated tests, but is based on just two studies conducted without controls that employed an obscure definition of "high stakes". Both studies actually involved low-stakes tests administered without security protocols.

Harms caused by belief in the myth include: diverting attention from a widespread problem (at least in the US) of lax security in standardized test administration; encouraging ineffective and detrimental test preparation procedures (e.g., excessive drilling on format, learning “tricks” based on format in lieu of learning subject matter) and supporting an exploitive, predatory test preparation industry; encouraging teachers to teach to “a broader domain” (“away from the test”) – content different from the publicly mandated standards they are legally and ethically obligated to teach; encouraging numerous “wild goose chase” research studies using an unreliable low-stakes test score trend to “audit” a high-stakes test score trend; repeated declarations that a past (and contradictory) research literature does not exist; and justifying the use of value-added measures, calculated from student low-stakes test score trends, to judge teacher performance.

Presentation Transcript


  1. It’s a myth: High stakes cause test score inflation • Richard P. Phelps • researchED 2017 National Conference • 9 September, 2017 • Chobham Academy, London, UK

  2. Educational testing in the US: early 1980s researchED, London High stakes & test score inflation 9 September, 2017

  3. Educational testing in the US: 1980s • Student testing with stakes reintroduced late 1970s, early 1980s • Debra P. v. Turlington • “Truth in testing” laws

  4. John J. Cannell, M.D. • Residency in rural, poor Appalachia, 1980s • Surprised by claims that state and school district scored “above average” on national tests • Investigated: all US states claimed to be “above average”

  5. “Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” - Garrison Keillor, A Prairie Home Companion

  6. Cannell’s suspects • Lax security • Outdated or invalid norms • Deliberate educator manipulation (i.e., cheating)
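A minimal sketch of how the second suspect, outdated norms, can make every state look "above average". All numbers below are hypothetical illustrations, not Cannell's data, and the unweighted average of state means stands in for a current national norm only as a simplifying assumption.

```python
from statistics import mean


def share_above(scores, benchmark):
    """Return the fraction of scores strictly above the benchmark."""
    return sum(s > benchmark for s in scores) / len(scores)


# Hypothetical state average scores (illustrative only, not real data).
state_means = [51.0, 52.5, 53.0, 54.0, 55.5, 56.0]

# Assumed national average from the year the test was normed.
outdated_norm = 50.0

# Assumed current norm: simple unweighted average of today's state means.
current_norm = mean(state_means)

share_vs_outdated = share_above(state_means, outdated_norm)
share_vs_current = share_above(state_means, current_norm)

print(f"Above the outdated norm: {share_vs_outdated:.0%}")
print(f"Above the current norm:  {share_vs_current:.0%}")
```

Against the old benchmark every hypothetical state clears the bar; against a benchmark recomputed from current scores, only half do. The sketch covers norms only and says nothing about the other two suspects.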

  7. US Education Establishment Responds

  8. “While supporting Cannell’s general finding … our analyses lead us to conclusions that are different, and certainly less sensational, than the ones he reached.” — Linn, Graue, Sanders, CRESST, 1990 “There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell.” — Linn, CRESST, 2000

  9. CRESST’s Lake Wobegon suspects • Outdated or invalid norms • High stakes that induce “teaching to the test” (i.e., test coaching) under pressure

  10. “We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.” — Daniel Koretz, CRESST, 1992 “Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.” — Robert Linn, CRESST, 2000

  11. CRESST counters Cannell’s Lake Wobegon study with their own, 1991 • Students took a test for a few years. Scores rose. • Then they took the “competing test” the district had used before. Scores fell.

  12. CRESST 1991 “Generalization” Study • Unnamed school district • Unnamed tests • Neither replicable nor falsifiable • A conference presentation; not peer-reviewed

  13. CRESST 1991 “Generalization” Study • 3 tests in the study • Annual NRT • Parallel form • A “competing” NRT

  14. 1991 CRESST “Generalization” Study

  15. 1991 CRESST “Generalization” Study School district test was only “perceived to be high stakes”

  16. 1991 CRESST “Generalization” Study • Study’s assumptions: 1. Publication of aggregate results = “high stakes” 2. “Competing” NRTs should get same results 3. “Test coaching” improves scores 4. Low-stakes test scores are reliable and can be used to benchmark unreliable high-stakes scores

  17. Jim Popham “high stakes” definition 1987 1. Publication of aggregate results = high stakes? “... Such tests include the many statewide achievement tests whose results are reported by local newspapers on a school-by-school or district-by-district basis.”

  18. Jim Popham “high stakes” definition 1992 1. Publication of aggregate results = high stakes? A test “subject to legal scrutiny.” Tests such as those used “for employment, licensure, or a high school graduation requirement”

  19. 1. Publication of aggregate results = high stakes? Standards for Educational and Psychological Testing “High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p.176) “Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p.178)

  20. 1. Publication of aggregate results = high stakes? “...tests taken to obtain admission to an educational program or taken during and at the conclusion of a program to obtain a qualification.” “…high-stakes decisions, such as whether a student will move on to the next grade level or receive a diploma.”

  21. 1. Publication of aggregate results = high stakes? “A high-stakes test is a test with important consequences for the test taker. Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession.” (Wikipedia)

  22. 2. Research: Comparability of different tests Comparable ? Not Comparable NRTs Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011) Standards Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005) CRTs Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)

  23. 3. Research: Effects of test coaching • It works (significant score increase from learning format tricks): Alderman & Powers (1980); Samson (1985); Scruggs (1985); Roznowski & Bassett (1992); McMann (1994); Holmes, Keffer (1995); Camel & Chung (2002); Filizola (2008) • It doesn’t work (negligible score increase): Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009)

  24. 4. Research: Low-stakes test reliability • Not reliable (student effort varies; scores easy to manipulate): Rothe (1947); Jennings (1953); Uguroglu, Walberg (1979); Taylor & White (1981); Arvey, et al. (1990); Schmit, Ryan (1992); Brown & Walberg (1993); Kim, McLean (1995); Wolf, Smith (1995); Wolf, Smith, DiPaulo (1996); Schiel (1996); Sundre (1999); Sundre, Moore (2002); Sundre, Wise (2003); DeMars (2000); Wise (2006a, 2006b); Wise, DeMars (2005, 2005, 2006, 2010); Wise, et al. (2009); Hoyt (2001); Eklof (2006, 2007, 2010); … etc. • Reliable (“no incentive to manipulate scores”): Kiplinger, Linn (1992) O’Neil, Sugrue, Baker (1995) * Hout, Elliot (2011) * 1 of 2 groups

  25. 4. Research: Low-stakes test reliability “…for consequential exams, the average score on the motivation scale was quite high with a low standard deviation. Essentially, most of the students were displaying uniformly high levels of motivation (i.e., ceiling effect). However, for the nonconsequential groups, motivation played an important role in predicting test performance. The overall motivation scores for the no consequence groups were lower than the motivation for the consequential groups, with much greater variability.” —Cole, Bergin, Whittaker (2008), p. 612

  26. More left-out-variable bias • CRESST’s Linn (2000) cites higher gains on a federal anti-poverty program’s pre-post testing over 9 months than over 12 as evidence of inflation

  27. Cannell found score inflation in elementary school tests in dozens of states – none of those tests had high stakes. Cannell also found score inflation in secondary school tests in dozens of states – only one had high stakes.

  28. Test Security in South Carolina: Cannell’s score-inflated test “Unlike their other two tests, … teachers are allowed to look at test booklets, … teachers may obtain test booklets before the day of testing, … booklets are not sealed, and … testing is not routinely monitored by state officials. … Outside test proctors are not used, … test questions have not been rotated every year, and … answer sheets have not been scanned for suspicious erasures or analyzed for cluster variance. … There are no state regulations that govern test security and test administration for norm-referenced testing done independently in the local school districts.”

  29. Test Security in South Carolina: Tests not in Cannell’s study “South Carolina also administers a graduation exam and a criterion referenced test, both of which have significant security measures. … Teachers are not allowed to look at either of these two test booklets, … teachers may not obtain booklets before the day of testing, … the graduation test booklets are sealed, … testing is routinely monitored by state officials, … special education students are generally included in all tests, … outside test proctors administer the graduation exam, and … most test questions are rotated every year on the criterion referenced test.”

  30. Cannell’s test categorizations confirmed

  31. Confusions from misinformation • Tests sample from larger domains • Campbell’s Law • “Teaching to the test” & “Narrowing the curriculum” • Incentives and causes • Educators face many incentives; “high stakes” only one • No one wants to be responsible for test security

  32. 1. Tests only sample larger domains "Tests are about making a measurement, and generally, tests are trying to measure something huge." — Daniel Koretz TRUE of many tests, e.g., NRTs, aptitude, IQ tests NOT TRUE of well-done standards-based tests

  33. 2. Campbell’s Law — a truism "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." • Social indicators can be beneficial for: • understanding • monitoring progress • benchmarking • setting goals • improving processes

  34. 3. Teaching the test; Narrowing the curriculum

  35. 4. Incentives and causes • Question: • Do high stakes present an incentive to cheat on tests? • Answer: • Of course they do

  36. 5. Educators face many incentives • The incentive of test “stakes” is just one

  37. 6. No one inside education wishes to be responsible for test security … including test development firms.

  38. Large-scale test, tight security

  39. Large-scale test, lax security

  40. Harms of disinformation • 1. Acceptance of low standard for research as valid • 2. Unfairly discredits useful evaluation tool • 3. Test security (in U.S.) remains shoddy • 4. Teachers given mixed messages • 5. Now spreading worldwide • 6. Corruption of Test Standards barely averted

  41. 1. Acceptance of very low quality standard for popular research results • Koretz, Linn studies: no controls; secret test; secret location; secret definitions • Non-replicable, non-falsifiable • Contrary evidence suppressed, sometimes even declared nonexistent, and whistleblowers discredited

  42. 2. Uniquely useful evaluation tool is discredited …and, in the US, the only objective measure available to the public (i.e., not under the control of insiders).

  43. 3. Test security (in U.S.) remains shoddy ACT & SAT now administered statewide by schools. ACT, SAT save money, hassle, gain customers by outsourcing test security.

  44. 4. Teachers given mixed messages • “Teaching to the test” is unethical; don’t do it! Teach content beyond the standards. • “Teaching to the test” works! You and your students will be better off if you do it!

  45. 5. Standards corruption barely averted

  46. 6. Disinformation spreading worldwide

  47. Artificial test score gains (score inflation) are caused by neglect, incompetence, or deliberate educator manipulation, but always require means and opportunity. • Motive alone is not sufficient if test security is tight. • Means and opportunity exist only in the absence of security measures and item rotation.

  48. Lessons Learned • US education: research quality standards extremely low for popular results; impossibly high for unpopular results • If terms can be defined arbitrarily, and not specified, any research result is possible. • Cleverly disguised falsehoods and obfuscation can be well rewarded in US education schools (e.g., with endowed professorships at Harvard and Stanford).

  49. http://nonpartisaneducation.org/Review/Articles/v6n3.htm richard@nonpartisaneducation.org