This presentation discusses the distortions that can occur when using traditional vertical scales to measure growth and offers alternative methods to accurately represent student achievement and educator effectiveness.
Un-distorting Measures of Growth: Alternatives to Traditional Vertical Scales
Presentation on June 19, 2005, to the 25th Annual CCSSO Conference on Large-Scale Assessment
By Joseph A. Martineau, Psychometrician
Office of Educational Assessment & Accountability (OEAA), Michigan Department of Education (MDE)
Introduction
• Measurement of growth or “progress”
  • Growth models
• Measurement of educators’ contributions to student growth or progress
  • Value Added Models (VAM)
• Both require vertical scales that
  • Measure the “same thing” along the entire scale
  • Have the same meaning along the entire scale
Distortions in studies of growth
• Using traditional vertical scales to measure growth can result in the following distortions:
  • Identification of growth trajectories with little resemblance to true growth trajectories
  • Attribution of effects on growth to effects on initial status, and vice versa
  • Identification of false effects on initial status or growth
  • Failure to detect true effects on initial status or growth
  • Identification of effective interventions as harmful, and vice versa
Graphical demonstration of one kind of distortion in growth models
Grade 5 scale mostly measures differences in number sense.
Grade 6 scale mostly measures differences in algebra.
Graphical demonstration of one kind of distortion in growth models
Vertically “equated, unidimensional” scales have to bend to accommodate both the grade-5 and grade-6 content mixes.
The data can still fit a unidimensional model if number sense and algebra scores are strongly correlated, but strong correlations do not alleviate the distortions in measures of growth.
Graphical demonstration of one kind of distortion in growth models
Any given student’s true achievement may not lie near the vertical scale, so the vertical scale may be incapable of accurately representing student achievement.
Graphical demonstration of one kind of distortion in growth models
Therefore, the true multidimensional achievement of a student is projected onto the “unidimensional” vertical scale.
Graphical demonstration of one kind of distortion in growth models
The nearest point on the “unidimensional” vertical scale is the most likely estimate of “unidimensional” student ability.
Graphical demonstration of one kind of distortion in growth models
The true measure of growth and the “unidimensional” measure of growth can be remarkably different.
The distortion can be overestimation of growth (as shown here) or underestimation of growth.
This can have substantial effects on studies of growth.
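To make the geometry above concrete, here is a minimal numeric sketch (not from the original presentation; the content mixes, the student scores, and the use of Python are all assumptions made purely for illustration). Each grade's test is treated as measuring its own composite of number sense and algebra, and the vertical scale stitches the two composites together as if they were the same dimension.

```python
import numpy as np

def scaled_score(true_scores, test_direction):
    """Project a student's true (number sense, algebra) scores onto the
    composite direction a given grade's test actually measures."""
    d = np.asarray(test_direction, dtype=float)
    d /= np.linalg.norm(d)
    return float(np.asarray(true_scores, dtype=float) @ d)

# Hypothetical content mixes: the grade-5 test mostly measures number
# sense, the grade-6 test mostly measures algebra, and the vertical
# scale treats both composites as if they were one dimension.
grade5_mix = [0.95, 0.31]
grade6_mix = [0.31, 0.95]

# Made-up true scores (number sense, algebra) for one student.
grade5_true = [2.0, 0.5]   # strong number sense, weak algebra
grade6_true = [2.2, 0.8]   # genuine improvement on both dimensions

true_growth = np.subtract(grade6_true, grade5_true)
vertical_scale_growth = (scaled_score(grade6_true, grade6_mix)
                         - scaled_score(grade5_true, grade5_mix))

print(f"true growth (number sense, algebra): {true_growth}")
print(f"growth on the stitched vertical scale: {vertical_scale_growth:.2f}")
```

With these invented numbers, a student who truly improves on both dimensions shows a decline of roughly 0.6 points on the stitched scale; other numbers produce overestimation instead, which is exactly the arbitrariness the demonstration warns about.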
Distortions in studies of value added
• Using traditional vertical scales to measure educators’ contributions to student growth can result in the following distortions:
  • Mis-estimation of educator effectiveness simply because educators serve students whose growth occurs outside the range measured well by the test
  • Attribution of prior educators’ effectiveness to later educators
    • One promise of value added is to stop holding educators accountable for students’ prior experiences
    • This distortion betrays that promise
Graphical demonstration of one kind of distortion in value added models
Grade 5 scale mostly measures differences in number sense.
Grade 6 scale mostly measures differences in algebra.
The scale has to “bend” to accommodate both tests’ content.
Graphical demonstration of one kind of distortion in value added models
True average statewide scores are likely to lie close to (but not on) the vertical scale.
Graphical demonstration of one kind of distortion in value added models
Individual school (or teacher) average true scores are likely to lie farther off the vertical scale than statewide averages.
They are also likely to be quite different from the statewide averages.
Graphical demonstration of one kind of distortion in value added models
In this carefully chosen scenario, both the statewide averages and the average scores of a given school project onto the vertical scale at exactly the same place.
Graphical demonstration of one kind of distortion in value added models
Even though statewide and school averages are very different in two dimensions, they are estimated to be identical on the “unidimensional” score scale.
Graphical demonstration of one kind of distortion in value added models
The average statewide growth is overestimated and the average growth of school X is underestimated, such that the two appear equal.
In a vertical-scale-based value added model, this exceptionally effective school would be identified as average.
Overestimation of an individual school’s effectiveness can also result from these distortions.
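The school-versus-state scenario can be reproduced with the same kind of arithmetic (again, all numbers are invented for illustration and are not the presentation's data). Here the state average and school X's average land on the same point of the composite scale even though school X generates twice the state's growth on the content the grade-6 test emphasizes.

```python
import numpy as np

def composite_score(avg_scores, scale_direction=(1.0, 1.0)):
    """Position of an average (number sense, algebra) point along the single
    composite direction the 'bent' vertical scale ends up using."""
    d = np.asarray(scale_direction, dtype=float)
    d /= np.linalg.norm(d)
    return float(np.asarray(avg_scores, dtype=float) @ d)

# Made-up grade-5 and grade-6 averages (number sense, algebra).
state_g5, state_g6 = [1.2, 1.2], [1.5, 1.5]
school_g5, school_g6 = [1.2, 1.2], [1.2, 1.8]

for label, g5, g6 in [("state   ", state_g5, state_g6),
                      ("school X", school_g5, school_g6)]:
    true_growth = np.subtract(g6, g5)
    scaled_growth = composite_score(g6) - composite_score(g5)
    print(f"{label}: true growth {true_growth}, vertical-scale growth {scaled_growth:.3f}")

# Both rows report a vertical-scale growth of about 0.424, so a
# vertical-scale-based VAM would rate this exceptionally effective
# school as exactly average.
```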
Graphical Demonstration
• Table 1 on page 13 of the document
• Interpretation
  • Effect size of 0.00 is equivalent to 1 part truth, no parts distortion
  • Effect size of 0.25 is equivalent to 4 parts truth, 1 part distortion
  • Effect size of 1.00 is equivalent to the results of VAM being 1 part truth, 1 part distortion
  • Effect size of 2.00 is equivalent to 1 part truth, 2 parts distortion
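One way to read these equivalences (an inference from the bullets above, since Table 1 itself is not reproduced here) is that the effect size behaves as a distortion-to-truth ratio:

```latex
\[
  \text{effect size} \;=\; \frac{\text{distortion}}{\text{truth}},
  \qquad
  0.00 = \tfrac{0}{1}, \quad
  0.25 = \tfrac{1}{4}, \quad
  1.00 = \tfrac{1}{1}, \quad
  2.00 = \tfrac{2}{1}
\]
```

On this reading, an effect size of 2.00 means the distortion component is twice as large as the true signal.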
Alternatives to Traditional Vertical Scales
• Given that using vertical scales in growth-based statistical models results in distorted outcomes, where do we go from here?
• Michigan has investigated several alternatives:
  • Vertically moderated standard setting
  • Domain-referenced measurement of growth
  • Linking only adjacent grades
  • Providing stronger out-of-level content representation as vertical linking items
  • Matrix sampling
    • Large number of forms
• All of these are important to do, but they are insufficient to resolve the distortions arising from using vertical scales in growth-based models
Alternatives to Traditional Vertical Scales
• Michigan is investigating other alternatives
  • Additional testing
    • Fall and spring
    • More than twice per year
    • Eliminates the summer loss/gain problem
    • Completely eliminates distortions!
    • Yeah, whatever!
Alternatives to Traditional Vertical Scales
• Michigan is investigating other alternatives
  • Supplement grade-level content with substantial quantities of out-of-level items
    • Administer items like those on lower and/or higher grade-level tests
    • Provides less precise estimates of growth, but they should at least be undistorted
    • Could be done by either paper-and-pencil (P&P) or computer-based testing (CBT)
    • Implementing with computer-adaptive testing (CAT) would require little additional testing, because out-of-level items could inform the stopping rules (a sketch follows below)
    • May not work for NCLB because of its on-grade-level testing requirements
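As a rough illustration of the CAT point above (a generic sketch, not Michigan's actual design; the Rasch model, the item difficulties, the stopping threshold, and the simulated student are all assumptions), the loop below administers the most informative remaining item and stops once the posterior standard error is small enough. Because the pool includes below-grade and above-grade items, the estimate can converge even for a student whose ability lies outside the on-grade range.

```python
import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    """P(correct) under a Rasch (1PL) model with difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def posterior_mean_se(administered, responses, grid):
    """Posterior mean and SD of ability on a grid, with a N(0,1) prior."""
    log_post = -0.5 * grid**2
    for b, x in zip(administered, responses):
        p = rasch_prob(grid, b)
        log_post += np.log(p) if x == 1 else np.log(1.0 - p)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = float(np.sum(grid * post))
    sd = float(np.sqrt(np.sum((grid - mean) ** 2 * post)))
    return mean, sd

# Invented item pool: below-grade, on-grade, and above-grade difficulties.
# The out-of-level items extend the range in which ability (and therefore
# growth) can be measured with reasonable precision.
difficulties = np.concatenate([np.linspace(-3.0, -1.2, 15),   # below grade
                               np.linspace(-1.0,  1.0, 15),   # on grade
                               np.linspace( 1.2,  3.0, 15)])  # above grade

true_theta = -1.8            # simulated student working below grade level
grid = np.linspace(-4.0, 4.0, 161)
available = list(range(len(difficulties)))
administered, responses = [], []
theta_hat, se = 0.0, float("inf")

# Adaptive loop: administer the most informative remaining item (for a
# Rasch item, the one closest in difficulty to the current estimate) and
# stop once the posterior SE drops below the cutoff or the cap is reached.
while available and se > 0.30 and len(administered) < 25:
    nxt = min(available, key=lambda i: abs(difficulties[i] - theta_hat))
    available.remove(nxt)
    correct = int(rng.random() < rasch_prob(true_theta, difficulties[nxt]))
    administered.append(difficulties[nxt])
    responses.append(correct)
    theta_hat, se = posterior_mean_se(administered, responses, grid)

print(f"items used: {len(administered)}, estimate: {theta_hat:.2f} (SE {se:.2f})")
```

In this toy setup, restricting the pool to the on-grade items leaves the standard error well above the cutoff for the same student, which is the sense in which the out-of-level items inform the stopping rule.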
Alternatives to Traditional Vertical Scales
• Michigan is investigating other alternatives
  • More complex psychometric models
    • Without changing the administration model, the only way to address the distortions is to change the psychometric model
    • The psychometric model needs to acknowledge and exploit the multidimensional complexity of item response data
    • Multidimensional models also carry liabilities
      • Public relations (complexity of the model)
      • Possibility for error (complexity of the model)
      • Turnaround time (intensity of the analysis)
    • This area is promising as well as challenging
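The slide does not commit to a particular model, but a compensatory multidimensional 2PL is one standard way to acknowledge more than one dimension in item response data; the fragment below (invented parameters, Python used only as notation) shows how two students with the same "total" ability but opposite profiles get very different predicted response patterns, information a single vertical scale discards.

```python
import numpy as np

def mirt_2pl_prob(theta, a, d):
    """Compensatory multidimensional 2PL: P(correct) for ability vector
    theta, discrimination vector a, and scalar intercept d."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# Invented parameters for two items: one loads mostly on number sense,
# the other mostly on algebra (the two dimensions from the earlier slides).
item_number_sense = {"a": np.array([1.4, 0.2]), "d": -0.5}
item_algebra      = {"a": np.array([0.2, 1.4]), "d": -0.5}

# Two students with the same unidimensional "total" ability but very
# different profiles across the two dimensions.
students = {"A": np.array([ 1.0, -1.0]),   # strong number sense, weak algebra
            "B": np.array([-1.0,  1.0])}   # weak number sense, strong algebra

for name, theta in students.items():
    p_ns = mirt_2pl_prob(theta, **item_number_sense)
    p_alg = mirt_2pl_prob(theta, **item_algebra)
    print(f"student {name}: P(number-sense item) = {p_ns:.2f}, "
          f"P(algebra item) = {p_alg:.2f}")

# A unidimensional model would treat A and B as interchangeable; the
# multidimensional model predicts very different response patterns, which
# is the information a growth or value-added analysis loses when scores
# are collapsed onto one vertical scale.
```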
Conclusion
• Growth-based statistical models using vertically scaled student achievement data are much further along than they were several years ago
• They are still not robust enough to support high-stakes use
• Either the test administration model or the psychometric model needs to reflect the complexity of the intended analyses
• No existing methods have been proven to allow for high-stakes use of growth-based statistical models, including Value Added Models
Contact Information
Joseph Martineau, Psychometrician
Office of Educational Assessment & Accountability
Michigan Department of Education
P.O. Box 30008
Lansing, MI 48909
(517) 241-4710
martineauj@michigan.gov