NCLB and Growth Models: In Conflict or in Concert?
Susan L. Rigney, United States Department of Education
Joseph A. Martineau, Michigan Department of Education
Presented at the MARCES Conference on Longitudinal Modeling of Student Achievement, College Park, MD, November 7, 2005
Introduction “In response to your concerns about giving schools credit for improving student achievement, we are also considering the idea of a growth model…” (Margaret Spellings, 9/13/05)
Author Perspectives • Sue Rigney • Education specialist in the Office of Student Assessment and School Accountability (Title I) at the U.S. Department of Education • Primary responsibility = monitoring state compliance with the standards, assessment, and accountability requirements of NCLB • Secondary responsibility = contributing to ongoing discussion, clarification, and implementation of policies related to assessment and accountability
Author Perspectives • Joseph Martineau • Psychometrician for the Michigan Office of Educational Assessment and Accountability • Primary concerns = congruence of accountability systems with the values of educational research & adequacy of statistical & psychometric methodology • Secondary concerns = philosophy and policy of accountability in terms of both practicality and feasibility • Authorship should not be construed as an endorsement of NCLB as a whole
In conflict? CRS says • “Substantial interest…in the possible use of individual/cohort growth models… Such AYP models are not consistent with certain statutory provisions of NCLB as currently interpreted by USED” But NCLB (Sec. 4) says • “The Secretary shall take such steps as are necessary to provide for the orderly transition to, and implementation of, programs authorized by this Act”
In concert? • USED Growth Model Study Group • IES grant for longitudinal data systems • State Accountability Workbook Amendments
Types of Models • Definitions developed by a state collaborative through CCSSO (Goldschmidt et al., 2005) • Definitions • Cross-sectional models • Status models • Improvement models • Longitudinal models • Growth models • Residual Growth (RG) models • Commonly labeled “value added” models • Why we use the term RG
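The distinctions among these model families are easiest to see in a computation. The following minimal Python sketch, using entirely hypothetical scores and an illustrative proficiency cut of 400, shows how status, improvement, and growth models summarize assessment data differently.

```python
cut = 400  # illustrative proficiency cut score (hypothetical)

# Grade 4 scores in two successive years (two DIFFERENT cohorts of students)
grade4_2004 = [380, 410, 395, 420, 405]
grade4_2005 = [390, 415, 400, 430, 410]

# The SAME cohort of students measured in grade 4 (2004) and grade 5 (2005)
cohort_2004 = [380, 410, 395, 420, 405]
cohort_2005 = [392, 418, 401, 436, 415]

def pct_prof(scores):
    """Percent of students at or above the proficiency cut."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

# Status model: one group, one point in time
status = pct_prof(grade4_2005)

# Improvement model: same grade, successive cohorts (cross-sectional change)
improvement = pct_prof(grade4_2005) - pct_prof(grade4_2004)

# Growth model: the same students followed across grades (longitudinal change)
growth = sum(b - a for a, b in zip(cohort_2004, cohort_2005)) / len(cohort_2004)

print(f"Status:      {status:.0f}% proficient")
print(f"Improvement: {improvement:+.0f} percentage points (different students)")
print(f"Growth:      {growth:+.1f} scale-score points (same students)")
```

Residual growth models (not shown) go a step further, comparing each student's observed score with a prediction from prior scores rather than reporting raw gains.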
The Intersection of Policy and Growth Models • 3-8 Assessments Provide Longitudinal Data • Safe Harbor • Use of Improvement Index in AYP • CCSSO SCASS Activities • USED Assistant Secretary Luce
Systemic Coherence: A Standard for Evaluating Models • Three broad principles of systemic coherence • Models are consistent with policy goals • Models are integrated as a part of a consistent system of content standards, assessments, performance standards, and accountability criteria • Models are implemented in a manner consistent with the values of educational research
1. Standards-based • Assessments must cover depth and breadth • Results expressed in terms of performance levels • % Proficient is most influential component of AYP
2. All Students • Participate (95% rule) • Results reported for all • AYP = not all students visible • Full academic year • Minimum n • LEP exemption for ELA test • Held to same standards • Alternate assessments based on alternate achievement standards
3. School Improvement • Annual Measurable Objectives • Increased in 2004-05 • Adjustment for transition in 2005-06 • School accountable for subgroups • More visible in 2005-06 • Consequences • Can/should growth moderate consequences?
Consistency of Content Standards, Assessments, Performance Standards, and Accountability Criteria • Accountability based on academic indicators • Peer Review of State Assessment Systems • Alignment • Performance descriptors • Alternate assessments
Coherent Assessment System State assessments • Rational, coherent design • Relative contribution of different tests • Matrix forms equivalent • Comparability • English vs Spanish • Computer vs paper & pencil Local assessments • Aligned, equivalent, comparable results for subgroups, aggregable
Results understandable • Educators know what to do • Articulation across grades • Articulation across performance levels • A “progression matrix” that shows • Proficient is different from basic because… • Proficient in third grade is different from proficient in fourth grade because… • Administrators know how to allocate resources
Consistency with Values of Educational Research • As defined by Gregory N. Derry¹ • Free flow of information & curiosity • Replicability • Thorough peer review • Improvement • Honesty and open-mindedness • Willingness to consider multiple alternatives • Scrupulous investigations of weaknesses • Flexibility to adopt feasible improvements ¹ Professor of Physics at Loyola University and author of What Science Is and How It Works (Princeton University Press, 1999)
Attributes of Systemic Coherence Applicable in this Context • Alignment of standards and assessments • The same performance standards for all • Inclusion of all student groups • Explicit tracking of achievement gaps • Appropriate statistical and psychometric models • A program of ongoing research • Consistency of reports with all other attributes
1. Alignment of Standards and Assessments • Foundation of validity of school accountability decisions • USED expects independent verification of • Full range of content standards? • Address content and process skills? • Same degree and pattern of emphasis? • Scores reflect full range of achievement? • Procedures to maintain/improve?
Alignment methods • Alignment Methodology • Webb (SCASS TILSA) • Porter (SCASS SEC) • Achieve • Buros • Methods do not address articulation across grades • JM: Current instantiations of “independent review” may underestimate alignment
2. The Same Standards for All Students • Grade-level achievement standards • Except for students with most significant cognitive disabilities (1%) • All students proficient by 2013-14 • What about growth toward proficient? • What about length of time in system? • Proposals to balance fairness toward both educators and student groups should also be a part of any plan to implement growth models for accountability purposes. Fairness toward one should not be sacrificed for fairness toward the other.
2. The Same Standards for All Students • JM: The NCLB expectation that all students will be proficient by a given date seems unreasonable. The recognition that there will always be individual differences among students (and aggregate differences across schools in their intake populations) should also be incorporated in setting policy targets. • SR: Safe harbor recognizes that adequate yearly progress may be met with less than 100% meeting annual and long-range goals. • JM: The safe harbor provision of NCLB is a good beginning, but does not fully account for these realities.
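For concreteness, safe harbor under NCLB lets a subgroup make AYP if its percentage of non-proficient students falls by at least 10% (relative) from the prior year, together with progress on another academic indicator. A minimal sketch of the core calculation (the secondary-indicator check is omitted; the numbers are hypothetical):

```python
def meets_safe_harbor(pct_prof_prior: float, pct_prof_current: float) -> bool:
    """Safe harbor: the percentage of non-proficient students must fall by at
    least 10% (relative) from the prior year. NCLB also requires progress on
    the other academic indicator, omitted here for brevity."""
    nonprof_prior = 100.0 - pct_prof_prior
    nonprof_current = 100.0 - pct_prof_current
    return nonprof_current <= 0.9 * nonprof_prior

# A subgroup well below the annual objective can still make AYP:
print(meets_safe_harbor(40.0, 47.0))  # 60% -> 53% non-proficient: True
print(meets_safe_harbor(40.0, 44.0))  # 60% -> 56% non-proficient: False
```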
2. The Same Standards for All Students • JM: The punitive nature of NCLB consequences can actually undermine policy objectives by adding turbulence to schools serving low-achieving students. • SR: The pressures of accountability have resulted in remarkable successes (Ed Trust), and there are multiple safeguards to prevent Type I error. • JM: The multiple safeguards are an important start, but policies encouraging more assistance in, and attraction of highly effective educators to, low-achieving schools are more likely to support the policy objectives. • SR: NCLB funds are available for recruitment and retention bonuses, and data indicate that states are beginning to use these funds in this way.
Implications for growth models • Expectation of the same growth for all maintains the achievement gap (see the sketch below) • Expectation of 12 months' growth in 1 year maintains the achievement gap • Expectation of normative growth maintains the achievement gap
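The arithmetic behind this slide is worth making explicit: if every group is expected to grow by the same amount, the gap never closes. A toy illustration with hypothetical scale scores:

```python
# Hypothetical scale scores: equal growth expectations preserve the gap.
group_a, group_b = 420, 380   # 40-point initial gap
annual_growth = 25            # the same growth target for everyone

for year in range(4):
    print(f"Year {year}: A={group_a}, B={group_b}, gap={group_a - group_b}")
    group_a += annual_growth  # both groups meet the identical target...
    group_b += annual_growth  # ...so the 40-point gap is carried forward
```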
3. Inclusion of All Student Groups • Missing data means missing students • How many missing students does it take to compromise validity? • Robustness to missing data does not imply that it is OK to leave out data where it can reasonably be obtained
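A small simulation illustrates the point: when the probability of being missing is related to achievement, aggregate results are biased no matter how many students remain. All quantities here are hypothetical.

```python
import random

random.seed(7)

# 1,000 hypothetical students; lower achievers are more likely to be missing
scores = [random.gauss(400, 40) for _ in range(1000)]
observed = [s for s in scores
            if random.random() > max(0.0, (450 - s) / 200)]  # dropout prob rises as score falls

full_mean = sum(scores) / len(scores)
obs_mean = sum(observed) / len(observed)
print(f"Full-population mean: {full_mean:.1f}")
print(f"Observed mean ({len(observed)} of 1000 students): {obs_mean:.1f}")
# The observed mean overstates achievement because missingness is non-random.
```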
4. Explicitly Tracking Achievement Gaps • Closing the achievement gap is… • A policy objective • A matter of ethics • Attainable • Tracking the achievement gap makes inequities publicly visible
4. Explicitly Tracking Achievement Gaps, continued… • Separate models from those used to track attainment of growth targets • Include in the model variables defining policy-defined subgroups • Interaction of grade with subgroup variables • Simple graphical representation of the results
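One minimal way to realize these bullets, sketched below with hypothetical data, is a regression that includes a subgroup indicator and its interaction with grade, so both the size of the gap and its change across grades are estimated explicitly. This is an illustration, not the model the presenters endorse.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical records: score, grade, and a policy-defined subgroup indicator
n = 2000
grade = rng.integers(3, 9, size=n)
subgroup = rng.integers(0, 2, size=n)  # 0 = reference group, 1 = focal group
score = (350 + 20 * grade - 15 * subgroup - 2 * grade * subgroup
         + rng.normal(0, 25, size=n))
df = pd.DataFrame({"score": score, "grade": grade, "subgroup": subgroup})

# Grade-by-subgroup interaction: does the gap widen or narrow across grades?
model = smf.ols("score ~ grade * subgroup", data=df).fit()
print(model.params)  # "grade:subgroup" estimates the per-grade change in the gap
```

Plotting the fitted gap by grade from these coefficients gives the kind of simple graphical representation the slide calls for.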
5. Appropriate Statistical and Psychometric Models • Statistical concerns • Match of model to data structure • Violations of assumptions • Do random effects models “cheat”? • How do we integrate results from alternate assessments? • What is the sample, and what is the population? • Different models needed for different purposes • Meeting growth targets • Tracking achievement gaps • Primary research
5. Appropriate Statistical and Psychometric Models • Statistical concerns • Are the models correlational or causal? The mandated data collection is correlational. • JM: The mandated policy uses are more causal. The descriptive statistics are used to label schools as in need of improvement, and if students are not achieving reasonable goals, it is hard to argue with this label. However, the distinction between schools in need of improvement and ineffective educators is unlikely to be either fathomed or appreciated by many people. The nature of NCLB consequences invites this unfounded interpretation. • SR: The statute provides substantial resources for professional development and instructional materials in order to help educators meet the extraordinary needs of the children they serve.
5. Appropriate Statistical and Psychometric Models, continued… • Unwarranted assumptions • No equating error • Vertical – Doran (2005) • Horizontal – not studied, but most assessments only have a few anchor items in common across years • Interval level scale • If using scale scores, most models assume equal interval measurement • Psychometrically suspect • Effects not well studied
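Why equating error matters for growth: unlike student-level measurement error, it is shared by every student tested in a given year, so it does not average out within a school. A simulation sketch with hypothetical magnitudes:

```python
import numpy as np

rng = np.random.default_rng(42)

true_growth = 10.0    # true average annual gain, in scale-score points
equating_sd = 3.0     # hypothetical SD of the year-to-year equating error
n_replications = 10_000

# In each replication the observed school-level gain is the true gain plus a
# single equating error shared by every student tested that year.
observed_gains = true_growth + rng.normal(0, equating_sd, n_replications)

print(f"True gain: {true_growth:.1f}")
print(f"Middle 95% of observed gains: "
      f"[{np.percentile(observed_gains, 2.5):.1f}, "
      f"{np.percentile(observed_gains, 97.5):.1f}]")
# A 2-SD swing moves an apparent 10-point gain anywhere from about 4 to 16
# points, which no within-school averaging can remove.
```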
5. Appropriate Statistical and Psychometric Models, continued… • Unwarranted assumptions, continued… • A single continuous scale on the same construct across grades (vertical or developmental scales) • Mathematical demonstrations (Martineau, 2004, in press) • We purposely build content shift into our assessments across grades • High correlations among sub-constructs do not take care of the problem • Students whose growth occurs outside the curriculum-defined range for the grade are not measured well • Effects of prior schools/grades become attributed to later schools/grades • Practically significant misattributions occur in all reasonably conceivable assessment scenarios • Empirical validation (Lockwood et al., under peer review) • Subscales of a math assessment show greater variability within teacher across subscales than across teachers within a subscale • Low correlations in “value added” across subscales • The sub-content matters tremendously
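The point that high correlations among sub-constructs do not solve the problem can be demonstrated in a few lines: simulate students whose subscale scores correlate strongly because of shared ability, but whose teachers have independent effects on each subscale. Student-level correlations look reassuring while teacher-level "value added" diverges across subscales. All variance components below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers, n_per = 100, 25

ability = rng.normal(0, 1.0, (n_teachers, n_per))  # shared student ability
t_eff_a = rng.normal(0, 0.3, (n_teachers, 1))      # teacher effect, subscale A
t_eff_b = rng.normal(0, 0.3, (n_teachers, 1))      # independent effect, subscale B
noise_a = rng.normal(0, 0.3, (n_teachers, n_per))
noise_b = rng.normal(0, 0.3, (n_teachers, n_per))

score_a = ability + t_eff_a + noise_a
score_b = ability + t_eff_b + noise_b

# Student-level correlation between subscales is high (shared ability)...
print(np.corrcoef(score_a.ravel(), score_b.ravel())[0, 1])

# ...but teacher-level "value added" (class means) correlates far less,
# because the teacher effects differ by subscale.
print(np.corrcoef(score_a.mean(axis=1), score_b.mean(axis=1))[0, 1])
```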
5. Appropriate Statistical and Psychometric Models, continued… • Unwarranted assumptions, continued… • We need to account for equating error • We need to study the effects of the interval-level measurement assumption and either • Validate the assumption, or • Not make the assumption • We need to either • Develop psychometric models that can account for change in content across grades, or • Not assume the same content across grades • Analytical models that avoid scale assumptions • Hill’s Value Table approach (this conference) • Betebenner transition matrix approach (2005) • Standards-based interpretations, can use baseline data
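A minimal sketch of the value-table idea: points are awarded for year-to-year transitions among performance levels, so no interval-scale or cross-grade vertical-scale assumption is needed. The point values below are hypothetical, not Hill's actual table.

```python
# Rows = prior-year level, columns = current-year level. Point values are
# hypothetical; an actual table would come from a standard-setting process.
value_table = {
    "below basic": {"below basic": 0, "basic": 50, "proficient": 100, "advanced": 100},
    "basic":       {"below basic": 0, "basic": 25, "proficient": 100, "advanced": 100},
    "proficient":  {"below basic": 0, "basic": 0,  "proficient": 100, "advanced": 100},
    "advanced":    {"below basic": 0, "basic": 0,  "proficient": 100, "advanced": 100},
}

# Each student contributes points based only on the level-to-level transition.
students = [("below basic", "basic"), ("basic", "proficient"),
            ("proficient", "proficient"), ("advanced", "proficient")]
school_score = sum(value_table[prior][curr] for prior, curr in students) / len(students)
print(f"School growth score: {school_score:.0f} (out of 100)")
```

Because everything is expressed in performance levels, the results carry standards-based interpretations directly, which is the appeal of this family of approaches.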
6. An Ongoing Program of Research • A turbulent field (“in its adolescence,” to quote Lissitz) • Large-scale implementation in a turbulent field requires extraordinary flexibility to keep up with the state of the art • And yet, too much flexibility can thwart useful interpretation of trend data
7. Consistency of Reports with Other Attributes • Responsive to instruction? • Understandable to stakeholders? • Grounded in policy aims? • Valid & reliable?
Setting standards for growth • What's reasonable? vs. What do we hope to accomplish? • What's fair?
Conclusions • Can we add growth? • Yes! • Should we add growth? • Yes, where there is an evaluative framework tied to policy objectives, a systemic approach, and alignment with the values of educational research • Must we add growth? • An option, not a requirement, because of the extraordinary infrastructure required
Recommendations for Policymakers • Understand the basic differences between models – Run simulations with real data • Understand the limitations • Listen to practitioners • Listen to methodologists • Anticipate cost/benefits • Lack of stability corrupts meaning • Do not over-specify the details in statute • This field moves ahead quickly • Flexibility to implement advances is key
Recommendations for Accountability Implementation Staff • State Directors: give your staff time to write it up!! • Require greater detail in the Technical Manuals that allows for comprehensive review of the procedures • Explain it (as much as you can) to your legislators and Congresspersons • Challenge assumptions • Status quo is good • Change is good • Resource assumptions • Claims of proponents
Recommendations for Technical Researchers • Validity need not conflict with transparency • Validity • Maintain sufficient complexity to produce valid results • Transparency for non-technical stakeholders • Simple, but accurate reports • Grounded interpretations • Transparency for technical stakeholders • Comprehensive documentation of the entire system, including psychometric and statistical models • Facilitation of replication • Facilitation of primary research on strengths and weaknesses
Recommendations for Technical Researchers • Pay systemic attention to… • Assumptions of psychometric models • Assumptions of content standard models • Assumptions of statistical models • Think carefully about what the models can and cannot tell us about instruction, curriculum, and student development • Develop simple graphical representations of the model and its important concepts for policymaker consumption • Become involved in public policy forums as a community lobby in order to promote appropriate interpretation of data • We cannot offer our cautions, wash our hands of how the data are used, and stand on the outside of the political process
Recommendations for All Stakeholders • Realize that, with all of the high stakes surrounding accountability uses of student achievement data, there are forces that can work against community interests: • Economic benefits, reputations, and other personal investments can cause proponents of specific systems to avoid scrupulous investigation of the shortcomings of those systems and/or the benefits of competing approaches • A willingness to be, and accountability for being, rigorously honest and open-minded about multiple approaches is an essential part of improving and evaluating growth-based accountability systems