Designing an impact evaluation: Randomization, statistical power, and some more fun…
Designing a (simple) RCT in a couple of steps
• You want to evaluate the impact of something (a program, a technology, a piece of information, etc.) on an outcome. Example: Evaluate the impact of free school meals on pupils' schooling outcomes.
• You decide to do it through a randomized controlled trial. Why?
• The questions that follow:
• Type of randomization – What is most appropriate?
• Unit of randomization – What do we need to think about?
• Sample size
> These are the things we will talk about now.
I. Where to start
• You have a HYPOTHESIS. Example: Free meals => increased school attendance => increased amount of schooling => improved test scores. Or could it go the other way?
• To test your hypothesis, you want to estimate the impact of a variable T on an outcome Y for an individual i. In a simple regression framework: Yi = α + βTi + εi
• How could you do this?
• Compare schools with free meals to schools with no free meals?
• Compare test scores before the free meal program was implemented to test scores after?
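To see why those naive comparisons can mislead, here is a minimal simulation sketch (all numbers are made up for illustration): if poorer schools are more likely to adopt free meals, and poverty also lowers test scores, the with/without comparison badly misses the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved school wealth drives BOTH program take-up and test scores.
wealth = rng.normal(0, 1, n)
takes_program = (wealth + rng.normal(0, 1, n)) < 0   # poorer schools adopt meals
true_effect = 5.0
scores = 50 + 10 * wealth + true_effect * takes_program + rng.normal(0, 5, n)

# Naive comparison: schools with meals vs. schools without.
naive_diff = scores[takes_program].mean() - scores[~takes_program].mean()
print(f"true effect:      {true_effect}")
print(f"naive T-C diff:   {naive_diff:.1f}")   # biased by selection
```

Because program schools are systematically poorer, the naive difference is negative even though the true effect is positive: selection bias can even flip the sign.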
II. Randomization basics
• You decided to use a randomized design. Why?
• Randomization removes the selection bias. > Trick question: Does the sample need to be randomly sampled from the entire population?
• Randomization solves the causal inference problem by providing a counterfactual = comparison group. While we cannot observe YiT and YiC at the same time, we can measure the average treatment effect by computing the difference in mean outcomes between two a priori comparable groups. We measure: ATE = E[YT] − E[YC]
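By contrast, a small sketch with the same kind of illustrative data-generating process, but with coin-flip assignment, shows the difference in means recovering the true effect: random assignment breaks the link between treatment and the unobserved confounder.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

wealth = rng.normal(0, 1, n)        # still affects scores...
treated = rng.random(n) < 0.5       # ...but assignment ignores it
true_effect = 5.0
scores = 50 + 10 * wealth + true_effect * treated + rng.normal(0, 5, n)

# Difference in means between a priori comparable groups = estimate of the ATE.
ate_hat = scores[treated].mean() - scores[~treated].mean()
print(f"estimated ATE: {ate_hat:.2f}")   # close to the true 5.0 in large samples
```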
II. Randomization basics
• What to think of when deciding on your design?
• Types of randomization / unit of randomization:
• Block design
• Phase-in
• Encouragement design
• Stratification?
The decision should come from (1) your hypothesis, (2) your partner's implementation plans, (3) the type of intervention! Example: What would you do?
• Next step: How many units? = SAMPLE SIZE. Intuition --> Why do we need many observations?
Remember, we're interested in Mean(T) − Mean(C). We measure scores in 1 treatment school and 1 control school. > Can I say anything?
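A quick simulation sketch (illustrative numbers) of why one school per arm is not enough: when school-level noise is large relative to the effect, the observed difference between a single treatment school and a single control school frequently has the wrong sign.

```python
import numpy as np

rng = np.random.default_rng(2)
school_sd, true_effect = 10.0, 5.0   # school-level noise dwarfs the effect

# Repeat the "one school per arm" experiment many times.
diffs = []
for _ in range(1000):
    t = rng.normal(true_effect, school_sd)  # mean score, one treatment school
    c = rng.normal(0, school_sd)            # mean score, one control school
    diffs.append(t - c)
diffs = np.array(diffs)

print(f"share of comparisons with the wrong sign: {(diffs < 0).mean():.0%}")
```

With these illustrative numbers, roughly a third of such single-pair comparisons point in the wrong direction, so no, we cannot say anything from one school per arm.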
III. Sample size
• But how to pick the optimal size? -> It all depends on the minimum effect size you would want to be able to detect. Note: Standardized effect sizes.
• POWER CALCULATIONS link the minimum effect size to the design. They depend on several factors:
• The effect size you want to detect
• Your randomization choices
• The baseline characteristics of your sample
• The statistical power you want
• The significance you want for your estimates
We will look into these factors one by one, starting from the end…
III. Power calculations (1) Hypothesis testing
• When trying to test a hypothesis, one actually tests the null hypothesis H0 against the alternative hypothesis Ha, and tries to reject the null. H0: Effect size = 0. Ha: Effect size ≠ 0.
• Two types of error are to be feared:
• Type I error: concluding that there is an effect when in fact there is none (false positive).
• Type II error: failing to detect an effect that does exist (false negative).
III. Power calculations (2) Significance
• SIGNIFICANCE = Probability that you would conclude that T has an effect when in fact it doesn't (the Type I error rate). It tells you how confident you can be in your answer. (Denoted α)
• Classical values: 1, 5, 10%
• Hypothesis testing basically comes down to testing equality of means between T and C using a t-test. For the effect to be significant, the t-statistic obtained must exceed the critical value at the significance level wanted. Or again: |t| must be greater than or equal to tα/2 ≈ 1.96 for α = 5% (two-sided).
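This t-test can be sketched with `scipy.stats.ttest_ind` on made-up score data (the group means and spreads below are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
treat = rng.normal(55, 10, 500)      # illustrative test scores, treatment
control = rng.normal(50, 10, 500)    # illustrative test scores, control

t_stat, p_value = stats.ttest_ind(treat, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("significant at 5%?", abs(t_stat) >= 1.96)
```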
III. Power calculations (3) Power
• POWER = Probability that, if a true effect of a given size exists, you will detect it for a given sample size. (Denoted κ)
• Classical values: 80, 90%
• To achieve a power κ, the true effect must be large enough relative to the standard error of its estimate. In one standard formulation: β ≥ (z1−α/2 + zκ) · SE(β̂)
• Or graphically…
• In short: To have a high chance to detect an effect, one needs enough power, which depends on the standard error of the estimate of β.
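Power can also be made concrete by simulation: repeat a hypothetical experiment many times and count how often a true effect is detected at the 5% level. The effect size (0.3 SD) and sample sizes below are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def simulated_power(n_per_arm, effect_sd=0.3, alpha=0.05, reps=2000):
    """Share of simulated experiments in which the effect is detected."""
    hits = 0
    for _ in range(reps):
        t = rng.normal(effect_sd, 1, n_per_arm)  # treatment arm, effect in SD units
        c = rng.normal(0, 1, n_per_arm)          # control arm
        if stats.ttest_ind(t, c).pvalue < alpha:
            hits += 1
    return hits / reps

power_small = simulated_power(50)
power_large = simulated_power(200)
print(f"power with  50/arm: {power_small:.2f}")   # underpowered
print(f"power with 200/arm: {power_large:.2f}")   # near the classical 80% target
```

The same 0.3 SD effect that is usually missed with 50 observations per arm is detected most of the time with 200 per arm: power comes from sample size.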
III. Power calculations (4) Standard error of the estimate
• Intuition = the higher the standard error, the less precise the estimate, the trickier it is to identify an effect, the higher the need for power!
• Demonstration: How does the spread of a variable affect the precision of a mean comparison test?
• We saw that power depends on the SE of the estimate of β. But what does this standard error depend on?
• The standard deviation of the error (how heterogeneous the sample is)
• The proportion P of the population treated (randomization choices)
• The sample size N
In the simplest case: SE(β̂) = sqrt( σ² / (P(1−P)N) )
III. Power calculations (5) Calculations
• We now have all the ingredients of the equation. The minimum detectable effect (MDE) is: MDE = (z1−α/2 + zκ) · sqrt( 1 / (P(1−P)) ) · sqrt( σ² / N )
• As you can see:
• The higher the heterogeneity of the sample (σ²), the higher the MDE,
• The lower N, the higher the MDE,
• The higher the power you require, the higher the MDE.
• Power calculations, in practice, correspond to playing with all these ingredients to find the optimal design that satisfies your MDE.
• Optimal sample size?
• Optimal portion treated?
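The standard MDE formula (as in Duflo, Glennerster and Kremer's toolkit) is easy to turn into a small calculator; the sample sizes below are illustrative, with the outcome measured in standard-deviation units (σ = 1).

```python
import numpy as np
from scipy import stats

def mde(n, p_treated=0.5, sigma=1.0, alpha=0.05, power=0.80):
    """Minimum detectable effect: MDE = (z_{1-a/2} + z_power) * sqrt(1/(P(1-P))) * sigma / sqrt(N)."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)  # ~2.80 at 5% / 80%
    return z * np.sqrt(1 / (p_treated * (1 - p_treated))) * sigma / np.sqrt(n)

for n in (100, 400, 1600):
    print(f"N={n:5d}  MDE={mde(n):.3f} SD")
```

Note the square-root scaling: quadrupling the sample only halves the MDE, which is why detecting small effects gets expensive quickly.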
III. Power calculations (6) More complicated frameworks
• Several treatments?
• What happens with more than one treatment? It all depends on what you want to compare!
• Stratification?
• Reduces the standard deviation within strata, improving precision.
• Clustered (block) design?
• When using clusters, the outcomes of observations within a cluster can be correlated. What does this mean?
• The intra-cluster correlation ρ (rho), the share of the total variance explained by between-cluster variance, implies an increase in the variance of the estimate.
• Impact on MDE? In short: the higher ρ, the higher the MDE (the increase can be large).
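The cost of clustering is usually summarized by the design effect, DEFF = 1 + (m − 1)ρ for clusters of size m, which inflates the MDE by sqrt(DEFF) relative to individual-level randomization. A short sketch with an illustrative cluster size of 40 (e.g. pupils per school):

```python
import numpy as np

def design_effect(cluster_size, rho):
    """Variance inflation from clustering: DEFF = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho

for rho in (0.01, 0.05, 0.20):
    d = design_effect(cluster_size=40, rho=rho)
    print(f"rho={rho:.2f}: DEFF={d:.2f}, MDE inflated by x{np.sqrt(d):.2f}")
```

Even a modest ρ of 0.05 nearly triples the variance with 40 pupils per school, so adding more clusters usually buys far more power than adding pupils within clusters.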
Summary
• When thinking of designing an experiment:
• What is your hypothesis?
• How many treatment groups?
• What unit of randomization?
• What is the minimum effect size of interest?
• What is the optimal sample size considering power/budget? => Power calculations!