

The Impact of Selection of Student Achievement Measurement Instrument on Teacher Value-added Measures. James L. Woodworth, CREDO, Hoover Institution, Stanford; Wen-Juo Lo, University of Arkansas; Joshua B. McGee, Laura and John Arnold Foundation; Nathan C. Jensen, Northwest Evaluation Association.



Presentation Transcript


1. The Impact of Selection of Student Achievement Measurement Instrument on Teacher Value-added Measures. James L. Woodworth, CREDO, Hoover Institution, Stanford; Wen-Juo Lo, University of Arkansas; Joshua B. McGee, Laura and John Arnold Foundation; Nathan C. Jensen, Northwest Evaluation Association

2. Presentation Outline • Purpose • Statistical Noise: why it matters; sources • Data • Methods • Results

3. 1.Purpose 2.Statistical Noise 3.Data 4.Methods 5.Results Purpose: The purpose of this paper is to present, for a lay statistical audience, the extent to which the psychometric properties of student test instruments affect teacher value-added measures.

4. Question: What is the impact of statistical noise introduced by different test characteristics on the stability and accuracy of value-added models?

5. Why it matters: [chart comparing performance levels (Below Basic, Basic, Proficient, Advanced) across grades 5 and 6]

6. Primary Sources of Statistical Noise • Test Design • Vertical Alignment • Student Sample Size

7. Test Design
Proficiency Tests • Focused around the proficiency cut point • Designed to differentiate between proficient and not proficient • Larger variance in conditional standard errors (CSE)
Growth Tests • Questions measure across the entire ability spectrum • Designed to differentiate between all points on the distribution • Smaller variance in CSE

8. Test Design
Paper-and-Pencil Tests • Item pool limited to control test length • Focused around the proficiency cut point • Large variance in CSE
Computer Adaptive Tests • Larger item pool for question selection • Focused around the student's ability point • Smaller variance in CSE

9. Test Design: CSE heteroskedasticity due to item focusing, TAKS Reading Grade 5, 2009. CSE range: 24–74; weighted average CSE = 38.96

10. Vertical Alignment • Year-to-year alignment can impact the results of VAM • Units must be equal across test sessions • Spring-to-spring VAMs are most affected • Fall-to-spring VAMs using the same test avoid much of the problem • Item alignment on computer adaptive tests can also impact the results of VAM

11. Student Sample Size • Central Limit Theorem: a larger student n provides a more stable estimate of teacher VAM • Typical single-year student n's are 25, 50, and 100 for elementary and middle school teachers
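The sample-size point above can be illustrated with a short simulation (a sketch, not taken from the presentation): the standard error of a classroom-average score shrinks with √n, so the class sizes the slide mentions give noticeably different stability. The per-student error SD of 3.0 is an arbitrary illustrative value, roughly MAP-like in RIT units.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_e = 3.0  # illustrative per-student measurement-error SD (assumption)

# The standard error of a classroom-average shrinks with sqrt(n):
# simulate 10,000 classrooms at each size and compare to theory.
for n in (25, 50, 100):
    draws = rng.normal(0.0, sigma_e, size=(10_000, n))
    se = draws.mean(axis=1).std()
    print(f"n={n:3d}: empirical SE of class mean ~ {se:.3f} "
          f"(theory: {sigma_e / np.sqrt(n):.3f})")
```

Doubling the class size cuts the noise in the class average by a factor of √2, which is why the deck's typical n's of 25, 50, and 100 behave so differently.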

12. Question: What is the impact of statistical noise introduced by different test characteristics on the stability and accuracy of value-added models?

13. Data Sets
TAKS – Texas Assessment of Knowledge and Skills: Grade 5 Reading, 2009 population statistics • Proficiency test • Vertically aligned scale scores • Average yearly gain: 24 vertical scale points at "Met Expectations"; 34 vertical scale points at "Commended" • Standard errors: conditional standard errors reported by TEA for each vertical scale score • CSE range: 24–74 • Weighted average CSE = 38.96 • Highly skewed distribution • High variance

14. Data Sets: TAKS – Texas Assessment of Knowledge and Skills, Grade 5 Reading. N = 323,507; μ = 701.49; σ² = 10,048.30; σ = 100.24

15. [Chart slide; no transcript text]

16. Data Sets: MAP – Measures of Academic Progress • Growth measure • Computer adaptive test • Single scale • Average yearly gain: 5.06 RIT points • Standard errors: average standard errors range 2.5–3.5 RIT • Slightly skewed distribution • Small variance

17. Data Sets: MAP – Measures of Academic Progress. N = 2,663,382; μ = 208.35; σ² = 161.82; σ = 12.72

18. Simulated Data: Because it is impossible to isolate true scores and error in real data, we created simulated data points. • True scores are known for all data points • Every data point was given the same growth • All iterations have the same value-added • Any deviation from the expected value is therefore a function of measurement error only

19. Simulated Data: We simulated 10,000 z-scores ~ N(0, 1). From these we selected nested random samples of n = 100, n = 50, and n = 25.
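A minimal sketch of this sampling step in Python (the seed and the use of NumPy are assumptions; the presentation does not name its software):

```python
import numpy as np

rng = np.random.default_rng(42)

# 10,000 simulated true scores as z-scores ~ N(0, 1)
z_scores = rng.standard_normal(10_000)

# Nested random samples: the n=25 group is a subset of the n=50 group,
# which is a subset of the n=100 group, mirroring the nested design.
sample_100 = rng.choice(z_scores, size=100, replace=False)
sample_50 = sample_100[:50]
sample_25 = sample_100[:25]
```

Nesting the samples ensures that differences between the n = 25, 50, and 100 results reflect sample size alone, not the luck of drawing different students.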

20. Data Generation
Pre-scores: P1 = z-score · σ + μ
Post-scores: P2 = P1 + controlled growth
Controlled growth values: TAKS = 24 vertical scale points (34 at "Commended"); MAP = 5.06 RIT points
Simulated Growth = (P2 + (Random2 · CSE)) − (P1 + (Random1 · CSE)), where Random1, Random2 ~ N(0, 1) and CSE = conditional standard errors as reported by TEA and NWEA
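The generation scheme above can be sketched in Python. The TAKS-like parameters (μ = 701.49, σ = 100.24, growth = 24, weighted-average CSE = 38.96) come from the earlier slides; using a single constant CSE for every score is a simplification here, since the paper applies score-specific conditional standard errors.

```python
import numpy as np

rng = np.random.default_rng(7)

# TAKS Grade 5 Reading parameters from the slides
mu, sigma = 701.49, 100.24   # population mean and SD of scale scores
growth = 24.0                # controlled growth ("Met Expectations" gain)
cse = 38.96                  # weighted-average CSE (constant CSE is an
                             # assumption; the paper uses score-specific CSEs)

z = rng.standard_normal(100)   # simulated true z-scores for one classroom
p1 = z * sigma + mu            # true pre-scores
p2 = p1 + growth               # true post-scores: same growth for everyone

# Observed growth = (post + error) - (pre + error)
obs_growth = (p2 + rng.standard_normal(100) * cse) \
           - (p1 + rng.standard_normal(100) * cse)
print(f"mean observed growth: {obs_growth.mean():.2f} (true: {growth})")
```

Because every student's true growth is exactly 24 points, any spread in `obs_growth` is measurement error by construction, which is the point of the simulated-data design.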

21. Question: What is the impact of statistical noise introduced by different test characteristics on the stability and accuracy of value-added models?

22. Monte Carlo Simulation
We ran 1,000 iterations for each simulation, equivalent to the same students taking the test 1,000 times with the same true scores but different draws of error. Simulated Growth = (P2 + (Random2 · CSE)) − (P1 + (Random1 · CSE)), where Random1, Random2 ~ N(0, 1) and CSE = conditional standard errors as reported by TEA and NWEA. Values were aggregated by subgroup to determine average performance for each iteration.
False negative: Simulated Growth < 0.5 × Controlled Growth
False positive: Simulated Growth > 1.5 × Controlled Growth
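A compressed sketch of this Monte Carlo loop and the false-negative/false-positive classification, assuming TAKS-like parameters, a single classroom of n = 25, and a constant CSE (the paper uses score-specific CSEs and several sample sizes):

```python
import numpy as np

rng = np.random.default_rng(1)

n_students, n_iter = 25, 1_000
growth, cse = 24.0, 38.96     # TAKS-like values; constant CSE is an assumption

# Fixed true scores: same students "retake" the test every iteration
p1 = rng.standard_normal(n_students) * 100.24 + 701.49
p2 = p1 + growth

false_neg = false_pos = 0
for _ in range(n_iter):
    # Fresh measurement error on each administration
    obs = (p2 + rng.standard_normal(n_students) * cse) \
        - (p1 + rng.standard_normal(n_students) * cse)
    class_growth = obs.mean()           # aggregate to the classroom level
    if class_growth < 0.5 * growth:     # appears to make < half the true growth
        false_neg += 1
    elif class_growth > 1.5 * growth:   # appears to make > 1.5x the true growth
        false_pos += 1

print(f"false negatives: {false_neg / n_iter:.1%}, "
      f"false positives: {false_pos / n_iter:.1%}")
```

With these illustrative numbers the noise in the classroom mean is large relative to the 24-point true gain, so a nontrivial share of iterations misclassify a teacher whose every student grew exactly as expected.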

23.–26. Results [four chart slides; no transcript text]

27. Conclusions: The growth-to-error ratio is the critical variable in VAM stability. The student n necessary to achieve a stable VAM is sensitive to this ratio. Stable VAMs are possible even with typical classroom n's; however, careful attention must be paid to the suitability of the student assessment instrument.

28. Limitations • No differentiation between student effects, teacher effects, or school effects • No environmental effects • No interaction terms. These are all areas for additional research.
