Using Hierarchical Growth Models to Monitor School Performance: The effects of the model, metric and time on the validity of inferences
Pete Goldschmidt, Kilchan Choi, Felipe Martinez
UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing
THE 34TH ANNUAL NATIONAL CONFERENCE ON LARGE-SCALE ASSESSMENT
June 20–23, 2004, Boston, MA
Purpose: We use several formulations of multilevel models to investigate four related issues: one, whether (and when) the metric matters when longitudinal models are used to make valid inferences about school quality; two, whether different longitudinal models yield consistent inferences regarding school quality; three, the tradeoff between additional time points and missing data; and four, the stability of school quality inferences across longitudinal models using differing numbers of occasions.
We examine three types of models:
• Longitudinal Growth Panel Models
• Longitudinal School Productivity Models
• Longitudinal Program Evaluation Models
Longitudinal Growth Panel Models (LGPM)
Research Questions: Are inferences affected by the test metric (scale scores vs. NCEs)?
• Estimates of growth
• Estimates of school effects
• Estimates of program effects
LGPM
• Longitudinal Panel Design
• Keep track of students' achievement from one grade to the next
• E.g., collect achievement scores at Grades 2, 3, 4, and 5 for students in a school
• Focus on students' developmental processes
• What do students' growth trajectories look like?
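To make the panel design concrete, the following is a minimal sketch of such a growth model fit in Python with statsmodels: test occasions nested within students, with a random intercept (initial status) and random slope (growth rate) per student. The data file and column names (score, grade, student_id) are hypothetical, not part of the original analysis.

```python
# A minimal sketch of a longitudinal growth panel model: occasions nested
# within students. File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("panel_scores.csv")           # one row per student-occasion
df["time"] = df["grade"] - df["grade"].min()   # time = 0 at the first measured grade

# Random intercept (initial status) and random slope (growth rate) per student.
model = smf.mixedlm("score ~ time", data=df,
                    groups="student_id", re_formula="~time")
result = model.fit()

print(result.fe_params)   # average initial status and average yearly growth
print(result.cov_re)      # variance/covariance of student intercepts and slopes
```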
Choice of Metric
Scale Scores:
• IRT-based scale scores
• Vertically equated scores across grades and years
• Theoretically represent growth on a continuum that can measure academic progress over time
• Change from year to year is an absolute measure of academic progress
Normal Curve Equivalents (NCEs):
• Relative standing compared to a norming population
• Change represents a relative position from year to year, not absolute growth in achievement
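To make the contrast concrete, here is a small sketch of the standard NCE definition: a normalized score with mean 50 and standard deviation of roughly 21.06, chosen so that NCEs of 1, 50, and 99 coincide with the corresponding percentile ranks. The example values are hypothetical.

```python
# Convert a percentile rank to a Normal Curve Equivalent (NCE).
# NCE = 50 + 21.06 * z, where z is the normal deviate of the percentile rank.
from scipy.stats import norm

def percentile_to_nce(percentile: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    z = norm.ppf(percentile / 100.0)
    return 50.0 + 21.06 * z

# A student who stays at the 60th percentile in two consecutive years shows an
# NCE change of zero: relative standing, not absolute growth on a vertical scale.
print(percentile_to_nce(60.0))   # approximately 55.3 in both years
```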
Student Characteristics
Sampling Conditions for Monte Carlo Study
Summary Parameter Estimates Compared
Summary of Estimates Compared Using Rank Order Correlations
Summary of Results Describing SAT-9 Reading Achievement
Percent Reduction in Between-School Variation in Growth
Correlations between Estimated Coefficients – Model 4
Correlation Pattern between Sampling Condition and Model – Reading SAT-9 Growth
Comparison of Relative Bias to the Effect Size of Growth
Relationship between Relative Bias in NCEs for Initial Status
Relationship between Relative Bias in NCEs for Growth
Longitudinal School Productivity Model (LSPM)
Research Questions: Are inferences affected by the test metric (scale scores vs. NCEs)?
• Estimates of growth
• Estimates of school effects
• Estimates of "Type A" and "Type B" effects
LSPM
• Multiple-cohort design (Willms & Raudenbush, 1989; Bryk et al., 1998)
• Monitor student performance at a school for a particular grade over years
• E.g., collect achievement scores for 3rd grade students attending a school in 1999, 2000, and 2001
• Focus on schools' improvement over subsequent years
Research Question
• To what extent does the choice of the metric matter when the focus is school improvement over time (NCE vs. scale score)?
• A multiple-cohort school productivity model is used as the basis for inferences about school performance.
3-level Hierarchical Model for measuring school improvement
Model I: Unconditional School Improvement Model
Level-1 (within-cohort) model: Yijt = βjt0 + rijt
• βjt0: estimate of performance for school j (j = 1, .., J) at cohort t (t = 0, 1, 2, 3, 4)
Level-2 (between-cohort, within-school) model: βjt0 = πj0 + πj1Timejt + ujt
• πj0: status at the first year (i.e., Timejt = 0), or initial status, for school j
• πj1: yearly improvement / growth rate during the span of time for school j
Level-3 (between-school) model:
• πj0 = γ00 + Vj0, where γ00 is the grand mean initial status
• πj1 = γ10 + Vj1, where γ10 is the grand mean growth rate
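As a fitting sketch, the three levels of Model I can be approximated in Python with statsmodels by using school as the grouping factor and a cohort-within-school variance component; the data file and column names (score, year, school_id, cohort) are hypothetical, and this is only one way to operationalize the model, not the authors' implementation.

```python
# A minimal sketch of Model I (unconditional school improvement) under
# hypothetical column names. Schools carry random initial status and growth;
# cohorts within schools contribute a variance component (u_jt).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("productivity_scores.csv")   # one row per student-cohort-school
df["time"] = df["year"] - df["year"].min()    # 0, 1, 2, ... so the intercept is first-year status

model = smf.mixedlm(
    "score ~ time", data=df, groups="school_id",
    re_formula="~time",                        # school-level random status (V_j0) and growth (V_j1)
    vc_formula={"cohort": "0 + C(cohort)"},    # cohort-within-school deviations (u_jt)
)
result = model.fit()
print(result.fe_params)   # gamma_00 (grand mean initial status), gamma_10 (grand mean growth)
```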
Model II: Student characteristics
Level-1 (within-cohort) model: Yijt = βjt0 + βjt1SPEDijt + βjt2LowSESijt + βjt3LEPijt + βjt4Girlijt + βjt5Minorityijt + rijt
Level-2 (between-cohort, within-school) model: βjt0 = πj0 + πj1Timejt + ujt
Level-3 (between-school) model:
• πj0 = γ00 + Vj0
• πj1 = γ10 + Vj1
Model III: Student characteristics & School intervention indicator
Level-1 (within-cohort) model: Yijt = βjt0 + βjt1SPEDijt + βjt2LowSESijt + βjt3LEPijt + βjt4Girlijt + βjt5Minorityijt + rijt
Level-2 (between-cohort, within-school) model: βjt0 = πj0 + πj1Timejt + ujt
Level-3 (between-school) model:
• πj0 = γ00 + γ01LAAMPj + Vj0
• πj1 = γ10 + γ11LAAMPj + Vj1
Model IV: Full Model
Level-1 (within-cohort) model: Yijt = βjt0 + βjt1SPEDijt + βjt2LowSESijt + βjt3LEPijt + βjt4Girlijt + βjt5Minorityijt + rijt
Level-2 (between-cohort, within-school) model: βjt0 = πj0 + πj1Timejt + ujt
Level-3 (between-school) model:
• πj0 = γ00 + γ01(%Minorityj) + γ02(%LowSESj) + γ03(%LEPj) + γ04LAAMPj + Vj0
• πj1 = γ10 + γ11(%Minorityj) + γ12(%LowSESj) + γ13(%LEPj) + γ14LAAMPj + Vj1
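Continuing the hypothetical setup from the Model I sketch, the student characteristics and school-level predictors of Models II through IV enter the same formula interface; the interactions with time let %Minority, %LowSES, %LEP, and LAAMP predict the improvement rate as well as initial status. Column names are assumptions, not the study's variable names.

```python
# A sketch of Model IV under the same hypothetical data layout as above:
# student-level dummies adjust within cohorts; school-level percentages and the
# LAAMP indicator predict initial status (main effects) and growth (time interactions).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("productivity_scores.csv")
df["time"] = df["year"] - df["year"].min()

formula = ("score ~ time + SPED + LowSES + LEP + Girl + Minority"
           " + pct_Minority + pct_LowSES + pct_LEP + LAAMP"
           " + time:pct_Minority + time:pct_LowSES + time:pct_LEP + time:LAAMP")

model4 = smf.mixedlm(formula, data=df, groups="school_id",
                     re_formula="~time", vc_formula={"cohort": "0 + C(cohort)"})
print(model4.fit().summary())
```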
Comparison of Key Parameters: NCE vs. Scale Score
• Type A effect: includes effects of school policies and practice, educational context, and wider social influences
• Type B effect: includes the effects of tractable policies and practices, but excludes factors that lie outside the control of the school
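One way to operationalize this distinction, sketched below under the same hypothetical data layout: Type A school effects are taken as the school-level deviations from a model adjusting only for student characteristics, while Type B effects come from a model that additionally adjusts for school context. This is a sketch of the general idea, not the study's exact estimation procedure.

```python
# A sketch of Type A vs. Type B school effects (in the Willms & Raudenbush sense),
# using the same hypothetical column names as the earlier sketches.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("productivity_scores.csv")
df["time"] = df["year"] - df["year"].min()

student_terms = "time + SPED + LowSES + LEP + Girl + Minority"
context_terms = " + pct_Minority + pct_LowSES + pct_LEP"

fit_a = smf.mixedlm("score ~ " + student_terms, data=df,
                    groups="school_id", re_formula="~time").fit()
fit_b = smf.mixedlm("score ~ " + student_terms + context_terms, data=df,
                    groups="school_id", re_formula="~time").fit()

# Empirical-Bayes school deviations (intercept and time slope for each school):
# adjusting for student background only gives Type A; adding context gives Type B.
type_a = fit_a.random_effects
type_b = fit_b.random_effects
```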
NCE Results vs. Scale Score Results
• School ranking (rank-order correlation)
• School improvement / growth rate parameter (correlation between estimates)
• Effect size (parameter estimate / s.d. of outcome)
• Statistical significance of the effect of the school intervention indicator variable
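These comparison criteria are straightforward to compute once school-level estimates are in hand. The sketch below uses synthetic placeholder arrays in place of the actual NCE and scale-score estimates from the study.

```python
# A sketch of the NCE vs. scale-score comparisons. The two arrays stand in for
# school growth-rate estimates from the two metrics; the values are synthetic
# placeholders, not results from the study.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
growth_nce = rng.normal(1.5, 0.5, size=50)                    # 50 schools, NCE metric
growth_ss = 2.0 * growth_nce + rng.normal(0.0, 0.2, size=50)  # same schools, scale-score metric

rank_corr, _ = spearmanr(growth_nce, growth_ss)   # agreement of school rankings
est_corr, _ = pearsonr(growth_nce, growth_ss)     # agreement of the estimates themselves
print(rank_corr, est_corr)

# Effect size is the fixed-effect estimate divided by the outcome's standard
# deviation, e.g. result.fe_params["time"] / df["score"].std() with the models
# sketched earlier.
```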
Conclusion • NCE vs. scale score – for the purpose of measuring school improvement under the multiple-cohort design: • Differences are minimal in terms of: • school ranking • school improvement / growth rate • effect size • statistical significance of the effect of the school intervention indicator
Conclusion (cont’d) • Results are consistent across sampling conditions, models, and content area
Reduction in Standard Error (SE) for Average Growth between Schools
Reduction in Standard Error (SE) for Average Status between Schools
Program Effect on Average Growth between Schools by Sample and Number of Occasions