Matching Method Reading Group September 22, 2008
Take Home Message • Matching is a useful statistical method for analyzing available data • It can accurately estimate causal effects under stringent assumptions, i.e. when treatment assignment can be fully accounted for… • However, these assumptions are unlikely to be met in most observational studies… • Hence Imbens and Wooldridge (2008) forcefully argue that matching, propensity-score, and weighting methods should be used only in combination with regression techniques
Outline • Definition and Usage • Identification Assumptions • What assumptions need to be met for identification • How realistic are these assumptions • Applications: Propensity Score, Sub-classification, and Weights • Advantages • Limitations • Recipe for matching, including results from the MatchIt package • Conclusion
Matching: definition and usage • Matching is a technique to impute missing potential outcomes using only the outcomes of a “few neighbors” from the opposite treatment group, based on observed characteristics. • Matching is a method of strategic sub-sampling. Its goal is to form quasi-experimental contrasts, i.e. to non-parametrically balance the variables in X (confounding variables) across D (treatment) • Matching has most often been applied where: • Interest is in the average treatment effect for the treated • There exists a large reservoir of potential controls
Assumptions: when can matching identify causal effects Ignorability: treatment assignment (i.e. treatment status) is independent of the potential outcomes and any function of them, and can thus be ignored. Strong Ignorability (assumptions 1-S and 2-S, pp. 91): Assumption 1 (Unconfoundedness): (Yi(0), Yi(1)) ⊥ Di | Xi Assumption 2 (Overlap): 0 < Pr(Di = 1 | Xi = x) < 1 for all x
Identification Assume we want to calculate the average treatment effect for the treated (ATT); then
ATT = E[Yi(1) − Yi(0) | Di = 1] = E[Yi(1) | Di = 1] − E[Yi(0) | Di = 1]
The problem is that E[Yi(0) | Di = 1] is not observed. The big question is: under what conditions can we use observed data to estimate the ATT?
In observational studies we cannot assume that the confounding variables have the same distribution across the control and treatment groups. To identify the causal effect we must therefore condition on X:
ATT(x) = E[Yi(1) − Yi(0) | Di = 1, Xi = x] = E[Yi(1) | Di = 1, Xi = x] − E[Yi(0) | Di = 1, Xi = x]
If unconfoundedness holds, then E[Yi(0) | Di = 1, Xi = x] = E[Yi(0) | Di = 0, Xi = x]. Hence:
ATT(x) = E[Yi(1) | Di = 1, Xi = x] − E[Yi(0) | Di = 0, Xi = x] = E[Yi | Di = 1, Xi = x] − E[Yi | Di = 0, Xi = x]
Propensity Score • The PS is used regularly as a non-parametric method of adjustment for treatment assignment • The PS is the probability of receiving treatment given all covariates; more formally, e(x) = Pr(Di = 1 | Xi = x) • PS are usually estimated using logit/probit (the default in the MatchIt package is logit) • The message of chapter 4 is that if (big if) we know the TRUE propensity scores, then we can identify the causal effect even in observational studies, given strong ignorability.
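The estimation step described above can be sketched in Python. This is a minimal illustration with simulated data (the variable names and the use of scikit-learn are our assumptions, not part of the slides, which use MatchIt's logit default in R):

```python
# Sketch: estimating propensity scores with a logit model on hypothetical
# simulated data. Mirrors the slide's "PS estimated via logit" step.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                      # covariates x1, x2, x3
logits = 0.8 * X[:, 0] - 0.5 * X[:, 1]           # true assignment model
D = rng.binomial(1, 1 / (1 + np.exp(-logits)))   # treatment indicator

model = LogisticRegression().fit(X, D)           # logit model of D on X
ps = model.predict_proba(X)[:, 1]                # estimated Pr(D = 1 | X)
```

Each entry of `ps` is an estimate of e(x) for that unit, which the later slides then use for sub-classification, weighting, or matching.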
Usage I - PS as control variable Popular usage: since the PS balances the variables in X across D, researchers commonly use the PS as a single control variable alongside the treatment dummy Problems: • The PS has no substantive meaning, and there is no theory characterizing the exact bias that results from replacing all control variables with a single scalar • The linearity assumption does not necessarily hold: cases with PS around .45–.5 are similar across treatment assignment, but this is no longer true as the PS moves close to 1 or 0 • No formal asymptotic properties have been derived for the case where the propensity score is unknown (Imbens and Wooldridge 2008).
Usage II – PS as Sub-Classification • Chapter 4 demonstrates that with perfect stratification we have perfect identification • What does perfect stratification mean? Example: assume that two dummy covariates fully account for the assignment of treatment (i.e. strong ignorability holds in each stratum)... • Perfect stratification means: • we know exactly how to map covariates to treatment assignment • within each stratum, the covariates have the same distribution for both control and treated cases (i.e. balance); all we need to test is that there is sufficient overlap… • BUT, things get complicated when X includes many covariates and/or the confounders are continuous…
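The two-dummy-covariate example above can be simulated directly. This is a hypothetical sketch (coefficients, sample size, and the true effect of 2 are all invented for illustration): within each of the four strata treatment is as good as random, so averaging stratum-level mean differences, weighted by the number of treated units, recovers the ATT.

```python
# Sketch of perfect stratification: two binary covariates fully determine
# treatment assignment probabilities (hypothetical simulation).
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x1 = rng.binomial(1, 0.5, n)
x2 = rng.binomial(1, 0.5, n)
p = 0.2 + 0.3 * x1 + 0.3 * x2           # assignment depends only on (x1, x2)
D = rng.binomial(1, p)
y0 = x1 + x2 + rng.normal(size=n)       # potential outcome under control
y1 = y0 + 2.0                           # constant treatment effect of 2
y = np.where(D == 1, y1, y0)            # observed outcome

att, treated_total = 0.0, 0
for a in (0, 1):
    for b in (0, 1):
        s = (x1 == a) & (x2 == b)       # one stratum
        n_treated = int((D[s] == 1).sum())
        diff = y[s][D[s] == 1].mean() - y[s][D[s] == 0].mean()
        att += n_treated * diff         # weight by treated count for ATT
        treated_total += n_treated
att /= treated_total                    # close to the true effect of 2
```

The point of the sketch is the slide's claim: with perfect stratification and overlap, the within-stratum difference in means is unbiased, and ATT weighting just aggregates those differences.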
Sub-classification (cont.) • If we believe that assignment can be fully accounted for by the PS, then we can use the PS to divide our sample into strata, or subclasses • BUT this rarely, if ever, happens in reality, hence we need to use some judgment when constructing our strata from the PS • This process involves a trade-off between bias and efficiency • Sub-classification and weighting – suppose we do not have the same number of control and treated units within a stratum; how will that affect the way we calculate the treatment effect? Example • Weighting the treated population by the inverse of the propensity score recovers the expectation of the unconditional response under treatment. See Imbens and Wooldridge (2008: 30)
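The weighting claim on this slide can be checked numerically. The sketch below is a hypothetical simulation with the true PS known (an assumption made only for clarity): weighting treated outcomes by 1/e(X) recovers E[Y(1)] for the whole population, while the raw treated mean is biased because high-propensity units are over-represented among the treated.

```python
# Sketch of inverse-propensity weighting (hypothetical data, true PS known).
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.binomial(1, 0.5, n)
e = np.where(x == 1, 0.8, 0.2)           # true propensity score
D = rng.binomial(1, e)
y1 = 1.0 + 2.0 * x + rng.normal(size=n)  # potential outcome under treatment
y = np.where(D == 1, y1, 0.0)            # Y(1) observed only for the treated

naive = y[D == 1].mean()                 # biased: treated over-represent x == 1
ipw = np.mean(D * y / e)                 # recovers E[Y(1)] = 1 + 2 * 0.5 = 2
```

Here `naive` converges to E[Y(1) | D = 1] = 2.6, while `ipw` converges to the unconditional E[Y(1)] = 2, which is the slide's point.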
Problems • If one can obtain consistent estimates of the TRUE propensity score, she can solve the problems created by sparseness of data • But in observational studies we do not know the true value of the propensity score; we can only estimate it using regression methods • Things become especially problematic when the covariate distributions are very different across treatment groups, implying that the PS gets close to zero or one for some covariate values. Here the estimated PS is very sensitive to the specification of the regression model • Moreover, when the PS gets close to zero for units in the treatment group and close to one for units in the control group, the weights can be very large, making those units particularly influential in the estimate of the average treatment effect. This makes estimates imprecise • Imbens and Wooldridge do not recommend this method
PS as Matching • Matching estimators impute the missing potential outcomes using only the outcomes of a few “nearest neighbors” from the opposite treatment group. • Matching is judged to be successful if the distribution of the matching variables is the same for the control and treatment groups (i.e. balanced) • What does “nearest neighbor” mean? • In practice (recipe): • First order the data according to the PS from high to low • Treated units with high PS scores are the first candidates for matching. Remember – we do not match according to raw PS scores, but according to distance • Matching can be done with or without replacement (in matching with replacement, a control group member is used as many times as she is the best match) – the consensus is to match with replacement because this improves balance
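The recipe above can be sketched as 1-nearest-neighbor matching on the propensity score with replacement. This is a hypothetical simulation (the data-generating process and the true effect of 1.5 are invented, and the true PS is used for simplicity): each treated unit is matched to the control with the closest PS, and the ATT is the average treated-minus-matched-control difference.

```python
# Sketch of 1-NN propensity-score matching with replacement (hypothetical data).
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                 # propensity score (monotone in x)
D = rng.binomial(1, e)
y = 1.5 * D + x + rng.normal(size=n)     # true treatment effect 1.5

ps_t, ps_c = e[D == 1], e[D == 0]
y_t, y_c = y[D == 1], y[D == 0]
# For each treated unit, find the control with the smallest |PS difference|;
# controls may be reused (matching with replacement).
nearest = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)
att = (y_t - y_c[nearest]).mean()        # close to the true effect of 1.5
```

Matching on distance rather than exact PS values is what the slide emphasizes: the `argmin` over absolute PS differences is the distance criterion, and reusing controls is the "with replacement" choice said to improve balance.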
Recipe for matching using MatchIt • Check summary statistics of the covariates by treatment status: report, for each covariate, the difference in averages by treatment status, scaled by the square root of the sum of the variances (a scale-free measure of the difference in distributions) • Use the MatchIt R package (Ho, Imai, King, and Stuart (2004)) • Note: these functions do not estimate treatment effects; they only create matched datasets • The matchit function creates propensity scores; the match.data function creates the dataset • Example: let “A” be the dataset (the call below creates Anew as a dataset matched on covariates x1, x2, x3, with dependent variable y): • Anew <- match.data(matchit(D ~ x1 + x2 + x3, data = A, method = "genetic"))
Recipe for matching using MatchIt • Run balance diagnostic procedures: a) check the means of the covariates in the treatment and control groups; b) run QQ-plots of the covariates between the treatment and control groups • Try several different model specifications (for estimating the propensity scores) and compare the balance achieved under each specification • Once you are satisfied with the balance achieved by your model/matching, you can estimate treatment effects by • Difference in means: compare the mean outcomes across matched groups (fixed treatment effect) • with(Anew, weighted.mean(y[D==1], weight[D==1]) - weighted.mean(y[D==0], weight[D==0])) • Regression-adjusted matched estimate (heterogeneous treatment effect): run a regression of the outcome on the treatment indicator and the confounding covariates, using the matching weights to force the sample to represent the matched groups
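The regression-adjusted step above is weighted least squares. Here is a minimal Python sketch on simulated data (the data, the true effect of 2, and the all-ones stand-in for the MatchIt weight column are our assumptions; on real output one would use the `weight` column of the matched dataset):

```python
# Sketch of a regression-adjusted matched estimate: weighted regression of
# the outcome on the treatment indicator and a covariate (hypothetical data).
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x1 = rng.normal(size=n)                     # confounding covariate
D = rng.binomial(1, 1 / (1 + np.exp(-x1)))  # treatment depends on x1
y = 2.0 * D + x1 + rng.normal(size=n)       # true treatment effect 2

w = np.ones(n)                              # stand-in for matching weights
Xmat = np.column_stack([np.ones(n), D, x1]) # intercept, treatment, covariate
sw = np.sqrt(w)                             # WLS via rescaled least squares
beta, *_ = np.linalg.lstsq(Xmat * sw[:, None], y * sw, rcond=None)
att_hat = beta[1]                           # coefficient on D, close to 2
```

Because the regression also conditions on the covariate, this estimator stays consistent even when matching leaves some residual imbalance, which is the rationale for combining matching with regression.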
Monte Carlo test The top number is the bias and the number in parentheses is the standard error. Each calculation is based on only 20 iterations. The true assignment equation of the propensity score is P(T=1) = logit^-1(b0 + b1*x1 + b2*x2 + b3*x3). All data are matched using only x1 and x3. All models assume a fixed treatment effect (as opposed to a heterogeneous one).
Matching problems… • Problems: • Assuming overlap, we get unbiased estimates when the covariates are discrete, but there are biases when we match on continuous variables • Thus far we have assumed that observable characteristics fully account for treatment assignment. If covariates with large coefficients in the true assignment equation are omitted from the PS estimation equation, we get larger biases • We are currently unable to estimate the variance of most matching estimators with commonly accepted methods • Imbens and Wooldridge recommend combining regression and PS weighting
References • Diamond and Sekhon (2008), Genetic Matching for Estimating Causal Effects, http://sekhon.berkeley.edu/papers/GenMatch.pdf • Ho, Imai, King and Stuart (2007), Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference, Political Analysis 15(3): 199–236 • Imbens and Wooldridge (2008), Recent Developments in the Econometrics of Program Evaluation, NBER WP 14251 (August 2008) • Sekhon (forthcoming), The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods, http://sekhon.berkeley.edu/papers/SekhonOxfordHandbook.pdf