
Challenges in Conducting Reliable Education Meta-Analysis

Explore the difficulties and pitfalls of conducting high-quality meta-analyses in education research, including issues with inappropriate comparisons, selection bias, intervention quality, and outcome measures.


Presentation Transcript


  1. Dylan Wiliam, UCL (@dylanwiliam) Why meta-analysis is really hard to do well in education www.dylanwiliamcenter.com | www.dylanwiliam.org

  2. Approaches to research synthesis (Gough, 2012) • Philosophy: Idealist vs. Realist • Relation to theory: Generate/Explore vs. Test • Approach to synthesis: Configuring vs. Aggregating • Methods: Iterative vs. A priori • Search: Theoretical vs. Exhaustive • Quality assessment: Value contribution vs. Avoid bias • Product: Emergent concepts vs. Magnitude/precision • Use: Enlightenment vs. Instrumental

  3. Systematic reviews “A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made” (p. 6) Green, Higgins, Alderson, Clarke, Mulrow, and Oxman (2008)

  4. Meta-analysis “Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies (Glass 1976). By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review […] Meta-analyses facilitate investigations of the consistency of evidence across studies, and the exploration of differences across studies” Green, Higgins, Alderson, Clarke, Mulrow, and Oxman (2008)
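To make the pooling step in that definition concrete, the following is a minimal sketch (my own, not from the presentation) of a fixed-effect, inverse-variance meta-analysis: each study's effect estimate is weighted by the inverse of its variance, so larger and more precise studies count for more. The effect sizes and standard errors are invented for illustration.

```python
# Minimal fixed-effect (inverse-variance) meta-analysis sketch.
# The effect sizes and standard errors below are invented for illustration.
effects = [0.30, 0.15, 0.45, 0.20]        # per-study standardized effect sizes (d)
std_errors = [0.12, 0.08, 0.20, 0.10]     # per-study standard errors

weights = [1 / se ** 2 for se in std_errors]             # inverse-variance weights
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5                     # SE of the pooled estimate

print(f"pooled d = {pooled:.2f} (SE = {pooled_se:.2f})")  # pooled d = 0.22 (SE = 0.05)
```

The pooled standard error here (about 0.05) is smaller than that of any individual study, which is the sense in which meta-analysis "can provide more precise estimates"; a random-effects model would additionally allow for between-study heterogeneity.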

  5. Key characteristics of systematic reviews • a clearly stated set of objectives with pre-defined eligibility criteria for studies; • an explicit, reproducible methodology; • a systematic search that attempts to identify all studies that would meet the eligibility criteria; • an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and • a systematic presentation, and synthesis, of the characteristics and findings of the included studies; Green, Higgins, Alderson, Clarke, Mulrow, and Oxman (2008)

  6. Problems with meta-analysis in education • Inappropriate comparisons • Aptitude x treatment interaction • The “file drawer” problem • Variations in intervention quality • Variation in population variability • Selection of studies • Sensitivity of outcome measures

  7. Inappropriate comparisons

  8. Inappropriate comparisons • Effects of interventions or associations? • Cross-level comparisons • Net effects versus gross effects

  9. Inappropriate comparisons • Effects of interventions or associations? • Cross-level comparisons • Net effects versus gross effects • “Business-as-usual” vs. alternative treatment

  10. Aptitude x treatment interactions

  11. Aptitude-treatment interaction • 113 non-formal education centres run by Seva Mandir • In 56 centres, teachers were paid Rs.1,000 pcm • In 57 centres, teachers were paid • Rs.500 pcm for attendance up to 10 days plus • Rs.50 for each day over the 10-day threshold • Attendance rate: • Fixed pay group 58% • Incentive group 79% • For the incentive group • Increase in instructional time: 32% • Increase in annual progress: 25% Duflo, Hanna, and Ryan (2012)
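For concreteness, here is a small sketch (my own illustration, not from the study) of the incentive pay rule as described on the slide; the function and parameter names are hypothetical, the figures are the slide's.

```python
def incentive_pay(days_attended, base=500, bonus_per_day=50, threshold=10):
    """Monthly pay in rupees under the incentive scheme described above.

    Illustrative sketch only: names are my own; the figures (Rs.500 base,
    Rs.50 per extra day, 10-day threshold) are from the slide.
    """
    extra_days = max(0, days_attended - threshold)
    return base + bonus_per_day * extra_days

print(incentive_pay(10))  # 500  -> at or below the threshold
print(incentive_pay(20))  # 1000 -> matches the fixed-pay group's Rs.1,000 pcm
```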

  12. The file-drawer problem

  13. The importance of statistical power • The statistical power of an experiment is the probability that, if a real effect of a given size exists, the experiment will detect it (i.e., yield a statistically significant result). • In single-level designs, power depends on • the significance level set • the magnitude of the effect • the size of the experiment • The power of most social science experiments is low • Psychology: 0.4 (Sedlmeier & Gigerenzer, 1989) • Neuroscience: 0.2 (Button et al., 2013) • Education: 0.4 • Only lucky experiments get published…
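As a rough illustration of why this matters, the sketch below (my own, not from the presentation) uses a normal approximation to estimate the power of a two-arm comparison of means for a given standardized effect size and group size; the numbers are hypothetical.

```python
from scipy.stats import norm

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation; illustrative sketch only)."""
    z_crit = norm.ppf(1 - alpha / 2)           # critical value for the two-sided test
    ncp = d * (n_per_group / 2) ** 0.5         # non-centrality: d * sqrt(n / 2)
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# A "typical" education effect (d = 0.4) with 25 pupils per arm:
print(round(two_sample_power(0.4, 25), 2))     # ~0.29, well below the conventional 0.8
```

With power this low, the studies that do reach significance will tend to overestimate the true effect, which is one way underpowered research feeds the file-drawer problem described on the next slide.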

  14. Statistical power and effect size

  15. Variation in intervention quality

  16. Quality • Interventions vary in their • Duration • Intensity • class size reduction by 20%, 30%, or 50% • response to intervention • Collateral effects • assignment of teachers

  17. Variation in variability

  18. “It is also known, as an empirical—not definitional—fact that the standard deviation of most achievement tests in elementary school is 1.0 grade-equivalent units; hence the effect size of one year’s instruction at the elementary school level is about +1” (Glass, McGaw, & Smith, 1981 p. 103)

  19. Annual growth in achievement, by age Bloom, Hill, Black, and Lipsey (2008)

  20. Sequential Tests of Educational Progress Educational Testing Service (1957)

  21. Annual achievement growth in Connecticut Wibowo, Hendrawan, and Deville (2009)

  22. Variation in variability • Studies with younger children will produce larger effect size estimates • Studies with restricted populations (e.g., children with special needs, gifted students) will generally produce larger effect size estimates
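Both bullets trace back to the denominator of the standardized effect size, d = gain / SD. The sketch below (my own, with invented numbers) divides the same raw gain by different population standard deviations: restricted populations have smaller spread, and, per Bloom et al. (2008), annual growth is large relative to the spread for younger children, so both yield larger values of d.

```python
def cohens_d(raw_gain, population_sd):
    """Standardized effect size: raw gain divided by the population SD."""
    return raw_gain / population_sd

raw_gain = 5.0                    # hypothetical gain in test-score points
print(cohens_d(raw_gain, 10.0))   # 0.5 -> broad, heterogeneous population
print(cohens_d(raw_gain, 5.0))    # 1.0 -> restricted population, half the spread
```

The same raw improvement doubles as a standardized effect when the comparison population is half as variable, so effect sizes from restricted or younger populations are not directly comparable with those from broad ones.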

  23. Selection of studies

  24. Feedback in STEM subjects Ruiz-Primo and Li (2013) • Review of 9000 papers on feedback in mathematics, science and technology • Only 238 papers retained • Background papers 24 • Descriptive papers 79 • Qualitative papers 24 • Quantitative papers 111 • Mathematics 60 • Science 35 • Technology 16

  25. Classification of feedback studies • Who provided the feedback (teacher, peer, self, or technology-based)? • How was the feedback delivered (individual, small group, or whole class)? • What was the role of the student in the feedback (provider or receiver)? • What was the focus of the feedback (e.g., product, process, self-regulation for cognitive feedback; or goal orientation, self-efficacy for affective feedback)? • On what was the feedback based (student product or process)? • What type of feedback was provided (evaluative, descriptive, or holistic)? • How was feedback provided or presented (written, oral, or video)? • What was the referent of feedback (self, others, or mastery criteria)? • How, and how often, was feedback given in the study (one time or multiple times; with or without pedagogical use)?

  26. Main findings

  27. Sensitivity to instruction

  28. Sensitivity of outcome measures • Distance of assessment from the curriculum • Immediate • e.g., science journals, notebooks, and classroom tests • Close • e.g., where an immediate assessment asked about number of pendulum swings in 15 seconds, a close assessment asks about the time taken for 10 swings • Proximal • e.g., if an immediate assessment asked students to construct boats out of paper cups, the proximal assessment would ask for an explanation of what makes bottles float • Distal • e.g., where the assessment task is sampled from a different domain and where the problem, procedures, materials and measurement methods differed from those used in the original activities • Remote • standardized national achievement tests. Ruiz-Primo, Shavelson, Hamilton, and Klein (2002)

  29. Impact of sensitivity to instruction [Chart: effect sizes for close vs. proximal assessments]

  30. Meta-analysis in education • Some problems are unavoidable: • Aptitude x treatment interactions • Sensitivity to instruction • Selection of studies • Some problems are avoidable: • Inappropriate comparisons • File-drawer problems • Intervention quality • Variation in variability • Unfortunately, many of the people doing meta-analysis in education: • don’t discuss the unavoidable problems, and • don’t avoid the avoidable ones

  31. Responses • The effects average out • The rank order of effects is still OK

  32. More significant challenges • “Tales not told” (Kvernbekk, 2019) • Evidence about “What worked” not “What works” • Finding conditions for the use of standardized effect size that are both justifiable and useful

  33. In the meantime… • Educators need to become “critical consumers” of educational research • Four questions • Does this solve a problem we have? • How much improvement will we get? • How much will it cost? • Will it work here?

  34. Thank You www.dylanwiliam.net
