Presentation Transcript


  1. Thinking Hard About Evidence: How Should Districts Assess the Impact of ARRA and “Race to the Top”? James J. Kemple Executive Director Research Alliance for New York City Schools October 9, 2009

  2. Organization of Talk I. Introduction II. Design Challenges in Evaluating Programs/Progress III. What works? IV. Using Data Day-to-Day V. Innovation Horizon VI. Conclusion

  3. Thinking hard about data, evidence and effectiveness • The plural of anecdote is not data. • The propagation of data is not evidence. • The presence of evidence is not necessarily proof of effectiveness. • Proof of effectiveness does not necessarily mean the benefits justify the costs.

  4. A candid assessment of the state of available evidence in education reform • The best data are on the nature of the problems. • There is good evidence on the differences between high-performing and low-performing schools. • There is much more limited proof of effective strategies for transforming low-performing schools into high-performing schools. • A focus on outcomes rather than impacts has left a track record of confusion and the wrong answer to the right question.

  5. How can you determine whether your students will be better off with a new program or policy initiative? • Insist on high-quality evidence from the vendor. • Read the literature/check the What Works Clearinghouse. • Participate in a study. • Focus on impacts (comparisons with “counterfactuals”), not just outcomes. • Compare costs and benefits.

  6. An initiative of the U.S. Department of Education’s Institute of Education Sciences. According to its website, the What Works Clearinghouse: • Produces user-friendly practice guides for educators that address instructional challenges with research-based recommendations; • Assesses the rigor of evidence on the effectiveness of interventions, giving educators the tools to make informed decisions; • Develops and implements standards for reviewing and synthesizing education research; and • Provides a public and easily accessible registry of education evaluation researchers to assist with designing and carrying out rigorous evaluations. http://ies.ed.gov/ncee/wwc

  7. Measuring a Program’s Effectiveness: An Education Production Function
  E_ist = f(P_ist, NS_it, R_ist, X_ist, e_ist)
  E_ist = output of student i in school s in year t
  P_ist = the program under consideration
  NS_it = non-school inputs (e.g., “ability,” extracurriculars, educational history)
  R_ist = school inputs under control of the school
  X_ist = school inputs NOT under control of the school
  e_ist = everything else (e.g., measurement error)
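
In standard notation (a restatement of the slide's own terms, with i indexing students, s schools, and t years), the production function reads:

```latex
E_{ist} = f\bigl(P_{ist},\; NS_{it},\; R_{ist},\; X_{ist},\; e_{ist}\bigr)
```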

  8. Recipe for an Ideal Evaluation of a Program Step 1: Obtain a PERFECT test. Step 2: Test students in fall. Step 3: Test students in spring. Step 4: Set school and non-school parameters back to original conditions. Step 5: Assign students to the program in question. Step 6: Repeat Steps 2 & 3. Step 7: Impact of Program = (Step 3 – Step 2) w/ pgm – (Step 3 – Step 2) w/o pgm
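
Written out, Step 7 is the difference between two fall-to-spring gains on the same students: one measured with the program in place and one without it.

```latex
\text{Impact} \;=\;
\bigl(E_{\text{spring}} - E_{\text{fall}}\bigr)_{\text{with program}}
\;-\;
\bigl(E_{\text{spring}} - E_{\text{fall}}\bigr)_{\text{without program}}
```

The recipe is only a thought experiment: Step 4 (resetting every school and non-school condition) cannot be carried out in practice, which is why the next slide replaces it.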

  9. The “2nd Best” Option: Replace Step 4 • Randomly assign some students, classrooms, teachers, or schools to the program in question. • The others are randomly assigned to the status quo (or another program).
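
As a minimal illustration (not part of the presentation), the sketch below simulates random assignment and estimates the impact as the difference in mean outcomes between the program group and the status-quo group; all numbers, names, and the assumed 10-point true effect are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical baseline test scores for 1,000 students (purely illustrative).
n_students = 1000
baseline = rng.normal(loc=650, scale=50, size=n_students)

# Randomly assign half of the students to the program, half to the status quo.
in_program = rng.permutation(n_students) < n_students // 2

# Simulate follow-up scores: everyone gains some points; program students
# gain an additional 10 points (the "true" effect we hope to recover).
true_effect = 10.0
followup = baseline + rng.normal(loc=20, scale=30, size=n_students)
followup[in_program] += true_effect

# Because assignment was random, the status-quo group is a credible
# counterfactual, so the impact estimate is simply the difference in means.
impact = followup[in_program].mean() - followup[~in_program].mean()
t_stat, p_value = stats.ttest_ind(followup[in_program], followup[~in_program])

print(f"Estimated impact: {impact:.1f} points (p = {p_value:.3f})")
```

With randomization, no modeling of selection is needed: the two groups differ, on average, only in their exposure to the program.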

  10. Why methods matter • Outcomes vs. Impacts • Outcome-focused studies risk getting the wrong answer to the right question • Outcome standards risk awarding programs: • based on who they serve, rather than what they do • that operate under promising conditions, rather than use promising practices • Essential to rely on the correct counterfactual

  11. Judging Program Impact: Compared to Whom [Bar chart: percent graduating on time, shown for the Evaluation Sample and for national averages for similar students in similar schools, across Academy, Non-Academy, Career/Tech., General, and Academic groups; values shown on the chart: 80.4, 72.9, 72.2, 63.3, 48.6. Note: National average estimates are adjusted to represent a sample with the same background characteristics as those in the Evaluation Sample.]

  12. Judging Program Impact: Context Matters [Bar charts: graduation rates (%) for the Academy group and the control group in Program A, Program B, and Program C; values shown on the charts: 85, 85, 84, 72, 71, 53.]

  13. Career Academy Impact on Post-Secondary Earnings: Accumulates to $16,700 over 8 years; $30,000 for young men

  14. Disadvantages of Random Assignment • Not feasible when all students, classrooms, or schools receive the program or reform in question. • Requires close collaboration between researchers, policy makers, practitioners, funders and vendors. • The larger the scope of the reform, the more difficult it is to implement. • Cost and implementation challenges often limit scale and, thus, limit lessons for a wider range of school/student situations.

  15. Other Alternatives for Counterfactuals • Use ratings and rankings to determine assignment to new programs, reforms, or resources (regression discontinuity). • Construct carefully matched comparison groups based on characteristics associated with selection for new programs, reforms, or resources (propensity scores). • Model change or growth over time (value added).
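
A minimal sketch of the second option above (propensity-score matching) on synthetic data, offered only to make the mechanics concrete: the covariates, the logistic model, and the one-to-one nearest-neighbor matching rule are illustrative assumptions, not a recommendation from the presentation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic data: prior test score and attendance predict both program
# participation and the later outcome (all values are made up).
n = 2000
prior_score = rng.normal(600, 60, size=n)
attendance = rng.uniform(0.7, 1.0, size=n)
covariates = np.column_stack([(prior_score - 600) / 60, attendance])

# Selection into the program is NOT random: it depends on the covariates.
participation_prob = 1 / (1 + np.exp(-(0.6 * covariates[:, 0] + 3 * (attendance - 0.85))))
treated = rng.uniform(size=n) < participation_prob

# Later outcome: driven by the covariates plus a true program effect of 8 points.
outcome = 0.8 * prior_score + 100 * attendance + 8 * treated + rng.normal(0, 25, size=n)

# Step 1: estimate each student's propensity score -- the probability of
# participating, given the observed characteristics.
propensity = LogisticRegression().fit(covariates, treated).predict_proba(covariates)[:, 1]

# Step 2: match each participant to the non-participant with the closest
# propensity score, then average the outcome differences.
control_idx = np.where(~treated)[0]
diffs = []
for i in np.where(treated)[0]:
    j = control_idx[np.argmin(np.abs(propensity[control_idx] - propensity[i]))]
    diffs.append(outcome[i] - outcome[j])

print(f"Matched estimate of the program's impact: {np.mean(diffs):.1f} points")
```

The matched comparison only removes bias from characteristics that are observed and modeled, which is why these designs are second-best to random assignment.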

  16. To Recap: The keys to an accurate assessment of effectiveness include… • Focus on impacts, not just outcomes: find a credible counterfactual (that is, compare to what would have happened in the absence of the intervention). • Use accurate outcome instruments (e.g., perfect tests). • Invest in learning about why and how programs work (or do not work).

  17. What do we know about what works and with what confidence? • Dropout prevention: Rigorous evidence/modest impacts • High school reform: Moderately rigorous evidence/modest impacts • Adolescent literacy: Rigorous evidence/limited impacts • Boost-up math: Almost no rigorous evidence • School-to-work transition: Rigorous evidence/large impacts

  18. What do we know about what works and with what confidence? • Early childhood/pre-school education: Rigorous evidence/strong impact • Early grade reading: Rigorous evidence/large impact • Professional development and supports for early grade reading: Rigorous evidence/limited impact • Technology in schools: Rigorous evidence/limited impact

  19. What do we know about what works and with what confidence? • Class size reduction: Rigorous evidence/strong impact • Academic content in after-school programs: Rigorous evidence/limited impact

  20. Does a positive impact mean we should adopt the program? • After establishing potential benefits, one must consider costs; • Ideally one can then compare the cost-benefit ratios of different interventions.

  21. Present Value of Per-Child Program Costs and Estimated Lifetime Compensation Across Interventions Notes: All data in 2004 dollars.

  22. Lifetime Benefit to Cost Ratios Across Interventions Notes: All data in 2004 dollars; all ratios used a 3% discount rate, except for the Job Corps evaluation, which used a rate of 4%.
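
To make the discounting concrete, here is a small sketch of how a lifetime benefit-to-cost ratio of the kind shown on this slide is computed. The cost and benefit streams below are invented for illustration; they are not taken from any of the studies compared in the presentation.

```python
def present_value(cash_flows, discount_rate):
    """Discount a stream of annual cash flows (year 0 = today) to the present."""
    return sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows))

# Hypothetical per-child program: $9,000 in costs in each of the first two
# years, then $1,500 per year in benefits (e.g., higher earnings) for 40 years.
discount_rate = 0.03  # the 3% rate used for most comparisons on the slide
costs = [9000, 9000]
benefits = [0, 0] + [1500] * 40

pv_costs = present_value(costs, discount_rate)
pv_benefits = present_value(benefits, discount_rate)

print(f"Present value of costs:    ${pv_costs:,.0f}")
print(f"Present value of benefits: ${pv_benefits:,.0f}")
print(f"Benefit-cost ratio: {pv_benefits / pv_costs:.2f}")
```

Because benefits arrive decades after costs, the choice of discount rate can move these ratios noticeably, which is why the note on the slide flags the Job Corps study's different 4% rate.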

  23. Using data day-to-day: From what works to what do I do? “In God we trust, everyone else bring evidence.” • Data-driven instruction • Multi-component accountability systems

  24. Multi-Component Accountability Systems • Progress: emphasizing value-added (incorporating counterfactuals; see the sketch below) • Performance: acknowledging levels • Learning environment • Qualitative review of school functioning and capacities • Transparency and disclosure • Measurement problems
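
As a rough illustration of the progress component, a value-added style estimate asks how a school's students scored relative to what would be predicted from their prior scores. The sketch below does this with a simple regression on synthetic data; the model, the single prior-score predictor, and all numbers are simplifying assumptions, far leaner than the models districts actually use.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 20 schools with 100 students each, plus prior-year scores.
n_schools, n_per_school = 20, 100
school = np.repeat(np.arange(n_schools), n_per_school)
prior = rng.normal(600, 50, size=school.size)
true_school_effect = rng.normal(0, 5, size=n_schools)
current = 50 + 0.9 * prior + true_school_effect[school] + rng.normal(0, 20, size=school.size)

# Predict current scores from prior scores alone (the "expected growth" benchmark).
slope, intercept = np.polyfit(prior, current, 1)
predicted = intercept + slope * prior

# A school's value-added estimate is its students' average residual: how much
# more (or less) they grew than similar students elsewhere.
residual = current - predicted
value_added = np.array([residual[school == s].mean() for s in range(n_schools)])

for s in np.argsort(value_added)[::-1][:3]:
    print(f"School {s}: value-added estimate {value_added[s]:+.1f} points")
```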

  25. Data-Driven Instruction • Regular, formative assessments of specific skills • Accessible data systems • Teams of teachers, counselors and administrators analyzing data and developing strategies • Professional development and external supports • Differentiated instruction • Alignment with standards and accountability

  26. The innovation horizon: ARRA and Race to the Top • Alternative certification/routes to teaching: Moderately rigorous evidence/mixed impact • Teacher induction/deployment: Almost no rigorous evidence one way or the other • Teacher quality assessments: Almost no rigorous evidence one way or the other • Data-driven instruction: Almost no rigorous evidence one way or the other • Extended learning time and community-connected learning time: Almost no rigorous evidence one way or the other • Charters: Rigorous evidence/mixed impact

  27. Charter schools: public schools with more autonomy • In 1992 there were 2 charter schools; • Today there are over 4,100 charter schools; • Currently approximately 1,200,000 students are enrolled.

  28. Evidence from three states: Florida, North Carolina, and Texas • These studies use longitudinal data on individual students; • All report overall negative effects of attending a charter school on student test scores; • All report that charter schools improve over time; • A newly released study finds that in Florida, charter school students are more likely to graduate from high school and more likely to attend college than students attending traditional high schools.

  29. Evidence from lotteries in two cities: Chicago and New York City • Exploit the fact that most charter schools must admit all students who apply or hold a lottery; • Results from Chicago suggest no gains for charter school students; • Results from New York suggest small to large yearly test score gains.

  30. Summary of Evidence on Charter Schools • Impacts on student test scores are mixed (at best); • Perhaps some evidence that the schools improve with age; • Parents may be more satisfied; • Evidence from one study that there may be positive impacts on longer-term outcomes such as high school graduation and college attendance.

  31. Cornerstones of Knowledge Building and Data-Driven Decision Making • Balancing rigor and relevance: • “In God we trust, everyone else bring evidence.” • “There are three kinds of lies: Lies, Damned Lies, and Statistics.” • Partnership between policy makers, practitioners, experts, funders, and researchers • Mutually reinforcing commitment to school improvement and building evidence • Setting high standards of evidence: reliance on counterfactuals and mixed methods • Focusing on benefits relative to costs
