Strategies for Using Partially Valid Instrumental Variables. Dylan Small, Department of Statistics, Wharton School, University of Pennsylvania. Joint work with: Paul Rosenbaum, Mike Baiocchi, Marshall Joffe, Tom Ten Have.
Overview • Example of Instrumental Variables (IV) method: Effect of World War II military service on future earnings. • Sensitivity to unobserved biases for IV method. • Strength of IVs and sensitivity to unobserved biases: How do small studies with strong IVs compare to large studies with weak IVs? • Extended instrumental variables methods when exclusion restriction for IV is invalid.
WWII Veteran Status and Earnings • Does military service raise or lower earnings? • Angrist and Krueger (1994) studied this in context of WWII military service and 1980 earnings (using 5% public use sample of US Census). • Lower earnings? Military service in WWII interrupts education or career. • Higher earnings? Labor market might favor veterans, GI Bill increases education.
WWII Vets (76% of men) earned on average $4500 more in 1980 than Non-Vets. This is association, not causation: WWII Vets might not be comparable to Non-Vets in terms of health, criminal behavior…
We created matched triples: men matched on quarter of birth, race, age, education up to 8 years and location of birth. This figure provides reason to doubt military service increases earnings by $4500. From 1924 to 1926, the proportion of veterans stayed about constant and the earnings stayed about the same. From 1926 to 1928, the proportion of veterans decreased by 50% but earnings increased, suggesting military service decreases earnings.
Unmeasured Confounding • [Figure: DAG with nodes Veteran Status, Earnings, and Unobserved Variables. Graph is conditional on measured confounders (race, education up to 8 years, location of birth).]
Instrumental Variables Strategy • Y = Outcome, W = Treatment, Z = IV. • Extract variation in W from Z that is free of unobserved confounders and use this variation to estimate the causal effect of W on Y. • Key IV Assumptions: (1) Z is independent of unobserved variables; (2) Z does not have a direct effect on the outcome. • [Figure: DAG with nodes Z: Year of Birth, W: Veteran Status, Y: Earnings, and Unobserved Variables. Graph is conditional on measured confounders (race, education up to 8 years, location of birth).]
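The strategy on this slide can be illustrated with a minimal simulation sketch. It assumes a binary instrument and a constant treatment effect, and it uses the Wald (ratio) form of the IV estimate; the function names `wald_iv_estimate` and `simulate`, the coefficients, and the data are hypothetical and are not taken from the census analysis.

```python
import numpy as np

def wald_iv_estimate(y, w, z):
    """Wald (ratio) IV estimate for a binary instrument z."""
    y, w, z = map(np.asarray, (y, w, z))
    effect_on_outcome = y[z == 1].mean() - y[z == 0].mean()    # effect of Z on Y
    effect_on_treatment = w[z == 1].mean() - w[z == 0].mean()  # effect of Z on W
    return effect_on_outcome / effect_on_treatment

def simulate(n, compliance, true_effect, seed):
    """Hypothetical data: u confounds W and Y, z is independent of u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)               # unobserved confounder
    z = rng.integers(0, 2, size=n)       # instrument (assumption 1 holds)
    # Treatment depends on the instrument and on the confounder.
    w = (compliance * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
    # Outcome depends on W and u only, not on z directly (assumption 2 holds).
    y = true_effect * w + 2.0 * u + rng.normal(size=n)
    return y, w, z

y, w, z = simulate(n=200_000, compliance=1.5, true_effect=-1.0, seed=0)
naive = y[w == 1].mean() - y[w == 0].mean()  # confounded comparison
iv = wald_iv_estimate(y, w, z)               # uses only variation in W due to Z
print(f"naive difference: {naive:.2f}, IV estimate: {iv:.2f}")
```

In this sketch the naive treated-versus-untreated comparison is pulled upward by the confounder, while the ratio estimate stays near the simulated effect of -1.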
Strength of IV • An IV is strong if encouragement has a strong effect on treatment received; An IV is weak if encouragement has only a weak effect on treatment received. • Effects of Weak IVs • Increased Variance • Increased Sensitivity to Bias
Effect of Weak IVs I: Increased Variance • If Z is a weak IV, then the variance of the IV estimate will be higher because less variation in W can be extracted from Z. • [Figure: DAG with labels Y, W|X, Z|X, X, and Unobserved Variables.]
95% CI for effect of military service using 1926 vs. 1928 IV: (-$1,445, -$500). 95% CI for effect of military service using 1924 vs. 1926 IV: (-$10,130, $10,750)
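The contrast between a narrow and a very wide interval is what the ratio form of the IV estimate would lead one to expect: the same noise in the numerator is divided by a smaller first-stage difference when the IV is weak. The sketch below illustrates this with simulated data only; the function names, sample sizes, and coefficients are hypothetical and do not reproduce the intervals above.

```python
import numpy as np

def wald_estimates(n, first_stage_shift, true_effect, reps, seed):
    """Sampling distribution of the Wald IV estimate over repeated samples."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for i in range(reps):
        u = rng.normal(size=n)            # unobserved confounder
        z = rng.integers(0, 2, size=n)    # binary instrument
        # first_stage_shift controls how strongly z moves the treatment.
        w = (first_stage_shift * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
        y = true_effect * w + 2.0 * u + rng.normal(size=n)
        numerator = y[z == 1].mean() - y[z == 0].mean()
        denominator = w[z == 1].mean() - w[z == 0].mean()
        estimates[i] = numerator / denominator
    return estimates

def central_width(values):
    """Width of the central 95% of the simulated estimates."""
    low, high = np.percentile(values, [2.5, 97.5])
    return high - low

strong = wald_estimates(n=5_000, first_stage_shift=1.5, true_effect=-1.0, reps=300, seed=1)
weak = wald_estimates(n=5_000, first_stage_shift=0.3, true_effect=-1.0, reps=300, seed=1)
print(f"strong IV spread: {central_width(strong):.2f}")
print(f"weak IV spread:   {central_width(weak):.2f}")
```

Both runs use the same sample size, so the difference in spread comes only from the strength of the instrument.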
Extended IV Methods for Addressing Violation of Exclusion Restriction • Angrist, Imbens and Rubin (1996): two key conditions for a valid IV are: • IV is effectively randomly assigned conditional on measured covariates X • IV has no direct effect on Y (exclusion restriction). • We consider situations in which the random assignment is plausible but the exclusion restriction is not.
Vascular access in hemodialysis • Hemodialysis • One of main treatment options in end-stage renal disease (ESRD) • Requires access to vascular system • Three main types • Catheter • Synthetic material • Native arteriovenous fistula (AVF)
Vascular access (cont’d) • Type of VA (A) partially determines dose of dialysis (DD; S) • Native AVF allows larger doses than catheter • S may affect outcomes (e.g., mortality) • VA may have effects on outcome (Y) not mediated by dose (e.g., infection) • Incomplete directed acyclic graph (DAG) of key variables
Estimand of interest • To gauge impact of type of VA, interested in overall effect • Involves both • Direct effect (A->Y) • Indirect effect (A->S->Y) • Formulate in terms of potential outcomes:
Confounding by indication • AVFs given preferentially to healthier subjects • Results in confounding by indication • Often difficult to control using standard methods based on ignorable treatment assignment • Variety of treatments of dialysis patients in which standard approaches based on ignorability lead to implausible results • Dose of dialysis choice (S) also nonignorable
Instrumental variables • Alternative approach for estimation • Need to find instrumental variable (R) • Associated with treatment of interest (A) • Independent of unmeasured confounders, i.e., shares no unmeasured common cause with outcome Y • Has no direct effect on outcome (exclusion restriction) • Practice at which dialysis is provided is a reasonable candidate • Used for various analyses in Dialysis Outcomes and Practice Patterns Study (DOPPS) • Large, international study with hundreds of practices • Will assume that practice (R) shares no unmeasured common causes with S or Y.
Revise DAG • Need to elaborate DAG • Include • instrument/center (R) • Measured (X) and unmeasured (U) common causes of variables of interest • Is R a valid instrument for the overall effect of A on Y?
Graphical criteria for instrument • Remove effect of treatment of interest • Check whether R independent of/D-separated from Y • Directed path R->S->Y • Criterion not satisfied • R not a valid instrument for overall effect of A • In Angrist, Imbens & Rubin framework, the problem is that R has direct effect on Y through S and hence violates the exclusion restriction.
Second Example: Return to Schooling • Y = Earnings, A = Years of Education • Unmeasured confounders: Ability, Motivation. • Card (1993) proposes as an IV R = distance from where the person grew up to the nearest four-year college. • Problem: • R also affects whether the person lives in an SMSA as an adult (S), conditional on A and measured confounders X (whether lived in an SMSA growing up, region where grew up, and family background variables). • There is a wage premium to living in an SMSA as an adult.
Return to Schooling DAG • R (living near college growing up) is not a valid instrument for the overall effect of A (years of schooling) on Y (earnings) because it has direct effect on Y through S (lives in SMSA as an adult).
Estimation • For estimating overall effects of A in these two problems, can’t use • Standard methods based on ignorability • Standard instrumental variables methods • Idea: Look for interactions between R and X that can serve as instruments.
Extended Instruments • Look for component of X that interacts with R to affect A but not Y directly. • Card proposes family income as component of X that • Interacts with R to affect A: college proximity lowers the costs of higher education, so it has a bigger effect on a poorer family • Does not directly affect S or Y: the direct earnings effect of living near a college, and the direct effect on living in an SMSA, do not vary by family background. • [Figure: DAG including the interaction R*X.]
Two-step approach • Estimate joint effect of A, S on Y • Estimate effect of A on S • Combine to obtain overall effect • In systems of linear models, overall effect is sum of • Direct effect of A: $\psi_A$ • Indirect effect of A: $\psi_S \Phi_A$
Two-step approach (1st step) • $Y_{as}$: potential outcome under A = a, S = s • Model for joint effect: • $Y_{as} = Y_{00} + a\psi_A + s\psi_S$ • Rank-preserving/deterministic formulation • Model for observables • $E^*$ = best linear predictor • $E^*(Y \mid X, R, X{*}R) = E^*(Y_{AS} \mid X, R, X{*}R) = E^*(Y_{00} \mid X, R, X{*}R) + E^*(A \mid X, R, X{*}R)\,\psi_A + E^*(S \mid X, R, X{*}R)\,\psi_S$ • Identifiability requires that $E^*(Y_{00} \mid X, R, X{*}R)$, $E^*(A \mid X, R, X{*}R)$ and $E^*(S \mid X, R, X{*}R)$ not be collinear. • One way: Assume $E^*(Y_{00} \mid X, R, X{*}R)$ depends only on X. Then we need one component of X that interacts with R to affect A. • Another way: Assume $E^*(Y_{00} \mid X, R, X{*}R)$ depends on X and R but not X*R. Then we need at least two components of X that interact with R to affect • Estimation by two-stage least squares. Regress A and S on X, R and X*R. Regress Y on the fitted values $\hat{A}$, $\hat{S}$ and the included exogenous terms.
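The first step can be sketched in code under the first identifying assumption ($Y_{00}$ depends on X but not on R or X*R), so that R and X*R serve as instruments for A and S. This is a minimal illustration with simulated data; the helper `two_stage_least_squares`, the single covariate, and every coefficient are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def two_stage_least_squares(y, endog, exog, instruments):
    """2SLS point estimates for the coefficients on the endogenous columns."""
    # Stage 1: project each endogenous variable on exogenous terms and instruments.
    first_stage_design = np.column_stack([exog, instruments])
    coef, *_ = np.linalg.lstsq(first_stage_design, endog, rcond=None)
    fitted = first_stage_design @ coef
    # Stage 2: regress the outcome on the fitted values and the exogenous terms.
    second_stage_design = np.column_stack([fitted, exog])
    beta, *_ = np.linalg.lstsq(second_stage_design, y, rcond=None)
    return beta[: endog.shape[1]]

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                        # measured covariate
r = rng.integers(0, 2, size=n).astype(float)  # instrument, e.g. practice or proximity
u = rng.normal(size=n)                        # unmeasured confounder

# X interacts with R in the model for A (the -0.6 * x * r term).
a = 0.5 * x + 0.8 * r - 0.6 * x * r + u + rng.normal(size=n)
# R affects S directly, which is what invalidates R as a standard IV.
s = 0.7 * r + 0.4 * a + 0.5 * u + rng.normal(size=n)
psi_a, psi_s = 1.0, 0.5
# Y00 = 1 + 0.3 x + 2 u + noise: depends on X and u, not on R or X*R.
y = 1.0 + 0.3 * x + 2.0 * u + psi_a * a + psi_s * s + rng.normal(size=n)

exog = np.column_stack([np.ones(n), x])
instruments = np.column_stack([r, x * r])
endog = np.column_stack([a, s])
est = two_stage_least_squares(y, endog, exog, instruments)
print(f"psi_A estimate: {est[0]:.2f}, psi_S estimate: {est[1]:.2f}")
```

The sketch returns point estimates only; standard errors from a hand-rolled second stage would not be valid without the usual 2SLS correction.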
Two-step approach (2nd step) • Under assumptions • Effect of A on S confounded • R not instrument for effect of A on S • Consider alternative • Linear model for joint effect of R, A • $S_{ra} = S_{00} + r\Phi_R + a\Phi_A$ • Model for observables • $E^*(S \mid X, R, X{*}R) = E^*(S_{00} \mid X, R, X{*}R) + R\,\Phi_R + E^*(A \mid X, R, X{*}R)\,\Phi_A$ • Can estimate by 2SLS under the assumption that $E^*(S_{00} \mid X, R, X{*}R)$ does not depend on X*R (uncheckable) and that X*R affects A. • Regress A on X, R, X*R. Regress S on $\hat{A}$, X, R. • [Figure: DAG including the interaction R*X.]
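How the two steps combine can be written out as a short derivation. It assumes the two linear, rank-preserving models from the 1st and 2nd steps hold together; the comparison level $a'$ is notation introduced here for illustration.

```latex
\begin{align*}
Y_{a,\,S_{ra}} &= Y_{00} + a\,\psi_A + S_{ra}\,\psi_S \\
               &= Y_{00} + a\,\psi_A + \left(S_{00} + r\,\Phi_R + a\,\Phi_A\right)\psi_S , \\
Y_{a,\,S_{ra}} - Y_{a',\,S_{ra'}} &= (a - a')\left(\psi_A + \psi_S\,\Phi_A\right).
\end{align*}
```

Holding r fixed, a one-unit change in A therefore shifts Y by the direct effect $\psi_A$ plus the indirect effect $\psi_S \Phi_A$.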
Summary • The IV method can be a powerful strategy for observational studies when there are confounders that are hard to measure and there is a “random” encouragement to receive treatment. • When encouragement is not actually random, it is important to do a sensitivity analysis. • Strong IVs are much less sensitive to bias. • For settings where the exclusion restriction might be violated, we developed extended IV methods that use X*R as IVs.
Papers • Small, D.S. and Rosenbaum, P.R. (2008), “War and Wages: The Strength of Instrumental Variables and Their Sensitivity to Unobserved Biases,” Journal of the American Statistical Association, 103, 924-933. • Joffe, M. M., Small, D.S., Brunelli, S., Ten Have, T.R., and Feldman, H. I. (2008), "Extended Instrumental Variables Estimation for Overall Effects," International Journal of Biostatistics, 4. • Baiocchi, M., Small, D.S., Lorch, S.A. and Rosenbaum, P.R. (2010), “Building a Stronger Instrument in an Observational Study of Perinatal Care for Premature Infants,” Journal of the American Statistical Association, 105, 1285-1296 • e-mail: dsmall@wharton.upenn.edu
Alternative estimands • Assumed that interested in overall effect • Vascular Access (VA) inevitably affects Dose of Dialysis (DD) • Type of VA limits possible dose • However, may be possible to alter DD • Interested in • Effect of DD • Effect of VA if affects DD in different fashion from under current practice
Alternative estimands (cont’d) • Show altered effect, new intervention on DAG • Formulate in terms of potential outcomes • Contrast for different levels of treatment
Alternative estimands (cont’d) • Defining intervention on S • Individualize target levels of S • e.g., base on maximum tolerated DD • Insufficient information in established databases (e.g., DOPPS) • Set target level of S based on A, covariates X • Currently little information to set target levels • Available covariate information may be insufficient to determine whether a particular DD is feasible for an individual
Alternative estimands (cont’d) • Defining intervention on S • Speculate about feasible interventions on S at aggregate level • Consider effects of A on S under those interventions; i.e., propose value for $\Phi_A^*$ • Compute overall effect from component effects: $\psi_A + \psi_S \Phi_A^*$ • Perform sensitivity analysis for values of $\Phi_A^*$
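The sensitivity analysis in the last bullet amounts to recomputing the overall effect over a range of proposed values. The sketch below is only an illustration: `overall_effect` is a hypothetical helper, and the component effects and the grid are made-up numbers, not estimates from DOPPS.

```python
def overall_effect(psi_a, psi_s, phi_a_star):
    """Overall effect of A under a proposed effect of A on S: psi_A + psi_S * Phi_A*."""
    return psi_a + psi_s * phi_a_star

# Hypothetical component effects, standing in for first-step estimates.
psi_a_hat, psi_s_hat = 1.0, 0.5

# Hypothetical grid of proposed values for Phi_A*.
grid = [0.0, 0.2, 0.4, 0.6]
table = {phi: overall_effect(psi_a_hat, psi_s_hat, phi) for phi in grid}

for phi, effect in table.items():
    print(f"Phi_A* = {phi:.1f} -> overall effect = {effect:.2f}")
```

Setting the proposed value to 0 recovers the direct effect alone, which gives a natural reference row for the table.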
One-step approach • Estimator of effect of A on S does not require either standard ignorability or IV • Can we do same for overall effect of A on Y? • Remove S from graph, redraw diagram • Graph identical to original graph removing Y • Use same methods of estimation for effect of A on S • [Figures: two DAGs, each including the interaction R*X.]