Agenda: • Block Watch: Random Assignment, Outcomes, and Indicators • Issues in Impact and Random Assignment: Youth Transition Demonstration • Who is randomized? • Sample size, power, and effect size • Who’s in the average?
Block Watch: Random Assignment, Outcomes, and Indicators • What random assignment protocol would you use to assess the impacts of Block Watch? • What are the strengths and weaknesses of your approach? • What are the key outcomes you want to assess? What are indicators for those?
Youth Transition Demonstration Evaluation Plan: • Background on YTD evaluation plan • The basics of impact size and significance • Power and sample size • No-shows: Intent to Treat vs. Treatment on the Treated • Multiple comparisons • Regression-adjusted comparisons
Youth Transition Demonstration: • Targets youth receiving disability payments, to help in the transition to adult life and employment • Goals: increase earnings, decrease costs, facilitate transition to self-sufficiency • Six program sites with variation in programs • Services: • Waiver of benefit reductions as earnings rise • Education, job training, work placements • Case management, counseling, referral to services
YTD Evaluation: • Selected 6 sites for demonstration and evaluation • Intervention built on research from past programs and evaluations • Randomly assigned youth to treatment or control • Large sample sizes to allow identification of smaller effects and sub-group effects • Process and Impact Evaluation • Data collected from administrative files, surveys before and after program • Advisory group of experts
Sampling: • Why did they divide the list of potential participants (sampling frame) into groups of 10 for contact? • Why did they randomize 55 percent to the treatment? • Why get pre-intervention characteristics if they are randomly assigning groups?
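The mechanics behind these questions can be sketched in a few lines. This is only an illustration, assuming a hypothetical frame of 1,000 ID numbers; the function name `assign`, the seed, and the frame size are not from the YTD evaluation. It shuffles the frame, releases it in groups of 10, and assigns each youth to treatment with probability 0.55.

```python
import random

def assign(frame, p_treatment=0.55, batch_size=10, seed=0):
    """Shuffle the sampling frame, split it into contact batches, randomize each person."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    order = list(frame)
    rng.shuffle(order)                 # random order: every batch is a random subsample
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    assignment = {}
    for batch in batches:              # batches can be released for contact one at a time
        for person in batch:
            assignment[person] = "treatment" if rng.random() < p_treatment else "control"
    return batches, assignment

# Hypothetical frame of 1,000 ID numbers (an assumption for illustration)
batches, assignment = assign(range(1000))
share = sum(a == "treatment" for a in assignment.values()) / len(assignment)
print(len(batches), "batches; share assigned to treatment:", round(share, 3))
```

Because the order is random, contact can stop after any batch and the youth reached so far are still a random sample of the frame.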
Comparisons may be: • over time • across intervention groups with and without the program, or across levels of intervention (“dosage”) [Figure callout: “Impact here!”]
Statistical significance: When can we rule out seeing a difference this large by chance IF there is truly no impact? Compare 2 means from independent samples: Means: Proportions: Pooled sample variance:
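For reference, the standard textbook forms of these three quantities for two independent samples are given below. The notation is an assumption and may differ from the slide's own: subscripts T and C denote treatment and control, n is a group's sample size, S a group's sample standard deviation, and p-hat a sample proportion.

```latex
% Difference in means, using the pooled standard deviation S_p
t = \frac{\bar{X}_T - \bar{X}_C}{S_p \sqrt{\dfrac{1}{n_T} + \dfrac{1}{n_C}}}

% Difference in proportions, using the pooled proportion \hat{p}
z = \frac{\hat{p}_T - \hat{p}_C}
         {\sqrt{\hat{p}\,(1 - \hat{p}) \left( \dfrac{1}{n_T} + \dfrac{1}{n_C} \right)}},
\qquad
\hat{p} = \frac{n_T \hat{p}_T + n_C \hat{p}_C}{n_T + n_C}

% Pooled sample variance
S_p^2 = \frac{(n_T - 1)\, S_T^2 + (n_C - 1)\, S_C^2}{n_T + n_C - 2}
```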
So, it’s easier to say impact is “real” (not just randomness) if: • Size of impact is larger • Variation in outcomes (S) is smaller • Sample sizes are larger The same factors figure into deciding how big a sample we need to find the effect if it’s there! [Power, sample size, minimum detectable effects]
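A small numerical sketch of the three factors, using the pooled-variance t statistic for two independent samples. All numbers are hypothetical (a 5-point impact, S = 20, 100 per group), and `pooled_t` is an illustrative helper, not something from the evaluation.

```python
import math

def pooled_t(mean_t, mean_c, s_t, s_c, n_t, n_c):
    """Two-sample t statistic using the pooled sample variance."""
    pooled_var = ((n_t - 1) * s_t**2 + (n_c - 1) * s_c**2) / (n_t + n_c - 2)
    se = math.sqrt(pooled_var * (1 / n_t + 1 / n_c))   # standard error of the difference
    return (mean_t - mean_c) / se

# Hypothetical baseline: impact of 5, S = 20, 100 per group
base = pooled_t(105, 100, 20, 20, 100, 100)            # about 1.77, below the usual 1.96
bigger_impact = pooled_t(110, 100, 20, 20, 100, 100)   # impact doubled
less_variation = pooled_t(105, 100, 10, 10, 100, 100)  # S halved
larger_samples = pooled_t(105, 100, 20, 20, 400, 400)  # sample sizes quadrupled

for name, t in [("base", base), ("bigger impact", bigger_impact),
                ("less variation", less_variation), ("larger samples", larger_samples)]:
    print(f"{name:15s} t = {t:.2f}")
```

In this example, doubling the impact, halving S, and quadrupling n each double the t statistic. Sample size is the expensive lever because the standard error shrinks only with the square root of n.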
Power and sample size: Given randomness, what % of the time will you be able to reject the null IF it is NOT true (there IS an impact)? How big a sample do you need to rule out NO effect if the program DOES have an impact? (Rossi et al., p. 312)
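Both questions can be answered approximately with the normal distribution. The sketch below assumes a two-sided test at the 5 percent level, equal S in both groups, and a hypothetical impact of 5 with S = 20. The helper names are illustrative, and results from published tables or online calculators may differ slightly because they often use the t distribution.

```python
from statistics import NormalDist
import math

def power_two_sample(effect, s, n_t, n_c, alpha=0.05):
    """Approximate power of a two-sided z test for a difference in means."""
    z = NormalDist()
    se = s * math.sqrt(1 / n_t + 1 / n_c)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    # Chance the test statistic falls beyond either critical value when the effect is real
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def n_per_group(effect, s, alpha=0.05, power=0.80):
    """Equal group size needed to reach the target power (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum * s / effect) ** 2)

# Hypothetical inputs: impact of 5, S = 20 (an effect size of 0.25 standard deviations)
needed = n_per_group(5, 20)
print("n per group for 80% power:", needed)
print("power with 100 per group:", round(power_two_sample(5, 20, 100, 100), 2))
print("power with needed n:", round(power_two_sample(5, 20, needed, needed), 2))
```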
Online Calculators for Sample size and Power: • Sample size: • http://www.dssresearch.com/toolkit/sscalc/size_a2.asp • http://www.dssresearch.com/toolkit/sscalc/size_p2.asp • Power: • http://www.dssresearch.com/toolkit/spcalc/power_a2.asp • http://statpages.org/proppowr.html
Minimum Detectable Impacts: What are the smallest effects you will be able to detect given n and predicted S?
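One common way to compute a minimum detectable impact turns the power calculation around: fix n and S, then solve for the effect. The sketch below assumes a two-sided 5 percent test, 80 percent power, and the normal approximation. The sample of 1,000 and S = 20 are hypothetical, and `minimum_detectable_impact` is an illustrative name.

```python
from statistics import NormalDist
import math

def minimum_detectable_impact(s, n_t, n_c, alpha=0.05, power=0.80):
    """Smallest true impact detectable with the given power (normal approximation)."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)   # about 2.80
    return multiplier * s * math.sqrt(1 / n_t + 1 / n_c)

# Hypothetical sample of 1,000 with S = 20, split 55/45 versus 50/50
mdi_55_45 = minimum_detectable_impact(20, 550, 450)
mdi_50_50 = minimum_detectable_impact(20, 500, 500)
print(f"55/45 split: {mdi_55_45:.2f}")
print(f"50/50 split: {mdi_50_50:.2f}")
```

With these inputs the 55/45 split costs very little precision relative to an even split.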
Adjustments to impact assessment: • Regression-adjusted impacts decrease S and increase power by controlling for “noise” using baseline characteristics • Multiple comparisons are a problem because some differences will look significant by chance alone if you test enough outcomes! • MDRC picked “primary outcomes” • Use adjustments to account for multiple comparisons
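The slide does not say which adjustment to use. Two common choices, Bonferroni and Holm, are sketched below as an illustration; they are not necessarily the ones the evaluators applied, and the five p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm step-down adjustment: less conservative than Bonferroni, same error control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices from smallest p to largest
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values in the same order as the raw ones
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical p-values for five outcomes
p_values = [0.004, 0.03, 0.04, 0.20, 0.60]
bonf = bonferroni(p_values)
holm_adj = holm(p_values)
for raw, b, h in zip(p_values, bonf, holm_adj):
    print(f"raw {raw:.3f}   Bonferroni {b:.3f}   Holm {h:.3f}")
```

In this example three outcomes look significant at the 5 percent level before adjustment and only one does after either adjustment. Naming a short list of primary outcomes in advance keeps the number of tests, and so the size of the adjustment, small.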
Who’s in the average? • “No-shows” in the treatment group didn’t get any services • Unlikely to be similar to “shows” • If no-shows are dropped, the comparison may overstate potential impacts • “Intent to Treat” outcomes include outcomes for no-shows • “Treatment on the Treated” outcomes do not include no-shows • Non-response to follow-up surveys could bias impact assessments • Use administrative data available for all for key outcomes • Put resources into follow-up to minimize non-response • Construct weights to make survey sample estimates comparable to baseline sample
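A worked example with made-up numbers shows why dropping no-shows can overstate impacts. It assumes 100 youth per group, a 70 percent show rate, no-shows who would earn less than shows with or without the program, and a true impact of 500 on those who show up. The last step uses the Bloom no-show adjustment (ITT divided by the show rate), a different technique from simply dropping no-shows. It is shown only as one common alternative, and it assumes the program has no effect on no-shows.

```python
# Hypothetical earnings (all numbers invented for illustration)
n_show, n_noshow = 70, 30

# Treatment group: shows earn 6,000 on average, no-shows 3,000
mean_treatment_all = (n_show * 6000 + n_noshow * 3000) / (n_show + n_noshow)
mean_treatment_shows = 6000

# Control group: the same mix of would-be shows (5,500) and would-be no-shows (3,000)
mean_control = (n_show * 5500 + n_noshow * 3000) / (n_show + n_noshow)

# Intent to Treat: everyone assigned, no-shows included
itt = mean_treatment_all - mean_control

# Naive comparison: drop no-shows from the treatment group only
naive = mean_treatment_shows - mean_control

# Bloom adjustment: scale ITT by the share who actually showed up
show_rate = n_show / (n_show + n_noshow)
tot_bloom = itt / show_rate

print("ITT impact:", itt)                        # 350.0
print("Naive shows-only difference:", naive)     # 1250.0
print("Bloom TOT estimate:", round(tot_bloom))   # 500
```

The naive difference of 1,250 mixes the true impact of 500 with the pre-existing gap between shows and no-shows, while the ITT estimate and the Bloom rescaling both rest on the full randomized groups.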
Lessons from Summary: • Randomization is hard • Need to use power analysis to choose target sample sizes • Even randomization may not give comparable baseline characteristics • Regression may increase comparability and precision • Worry about who we have outcome information for (both control and treatment)