Learn about instrumental variable (IV) analysis for estimating causal effects, including leveraging pseudo-randomization, addressing unmeasured confounding, and using natural experiments. Understand the assumptions, estimation methods, and implications of IV analysis in observational studies.
IV Analysis
Stefan Walter, Dept. of Epidemiology and Biostatistics, UCSF
swalter@psg.ucsf.edu
Causality from IV analysis
[Diagram: instrument Z → exposure X → outcome Y; unmeasured confounder U affects X and Y]
IV methods can consistently estimate the average causal effect of an exposure on an outcome even in the presence of unmeasured confounding, if the instrument is valid. Instrumental variable estimation uses the unconfounded component of the variance in the exposure X (the component determined by the instrument Z) to estimate the effect of X on the outcome Y. It estimates the effect of treatment among those who receive treatment because of the instrument.
IV Analyses: leverage (pseudo-)randomization
Use pseudo-randomization as an instrument to estimate the effect of a phenotype on the outcome. Example instruments:
• Randomization in an RCT
• Before/after a policy change, e.g., labeling rules or pharmacy rules, especially if not implemented universally
• Physician preference
• Distance to service provider
• Any characteristic that makes a patient ineligible for treatment but does not otherwise affect the outcome
[Diagram: natural experiment (Z) → phenotype (X) → disease (Y); unmeasured confounders affect X and Y]
Instrumental Variable Analysis
• Causal diagram representing the assumptions for genetic IV analyses to estimate the effect of BMI on cognition. The causal diagram follows the rules for directed acyclic graphs (DAGs):
• 1) the genotype affects BMI;
• 2) the genetic instrumental variables do not influence the outcome except via BMI; and
• 3) there are no common causes of genotype and cognition.
[Diagram: Gene → BMI → Cognition; unmeasured confounder affects BMI and Cognition]
Link to RCTs
• Z – Randomization
• X – Treatment
• U – Unmeasured confounding / selection
• Y – Outcome
[Diagram: Z → X → Y; U affects X and Y]
We value RCTs so much because we are relatively confident that randomization fulfills the assumptions for a valid IV.
Estimation: many options
$$\beta_{IV} = \frac{\text{relation of instrument to outcome (ITT)}}{\text{relation of instrument to treatment (adherence)}}$$
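A minimal sketch of this ratio (Wald) estimator, assuming a hypothetical randomized trial with non-adherence: Z is random assignment, X the treatment actually taken, U an unmeasured confounder, and the true effect of X on Y is set to 2. All numbers are made up for illustration.

```r
# Hypothetical simulation: a randomized trial with non-adherence
set.seed(1)
n <- 20000
U <- rnorm(n)                                  # unmeasured confounder
Z <- rbinom(n, 1, 0.5)                         # randomized assignment (the instrument)
X <- rbinom(n, 1, plogis(-1 + 2 * Z + U))      # treatment actually taken
Y <- 2 * X + 3 * U + rnorm(n)                  # true effect of X on Y is 2

# Ratio (Wald) estimator: instrument-outcome relation / instrument-treatment relation
itt       <- mean(Y[Z == 1]) - mean(Y[Z == 0])  # intention-to-treat effect
adherence <- mean(X[Z == 1]) - mean(X[Z == 0])  # effect of assignment on treatment taken
beta_iv   <- itt / adherence

# Equivalent covariance form, which also works for a non-binary instrument
beta_iv_cov <- cov(Z, Y) / cov(Z, X)

# Naive regression of Y on X for comparison; biased upward by U
beta_ols <- coef(lm(Y ~ X))[["X"]]
print(c(ITT = itt, adherence = adherence, IV = beta_iv, OLS = beta_ols))
```

For a binary instrument the two forms are algebraically identical; the covariance form is the one that generalizes to allele counts such as z1 in the practice session.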
2 Stage Least Squares (2SLS)
• Calculate the predicted values of the exposure (Z → X), e.g., linear regression of BMI on the IV.
1st stage: $E(X \mid Z) = b_0 + b_1 Z + b_k \cdot \text{(other covariates)}$
• Use the predicted value to explain the outcome ($\hat{X}(Z) \to Y$), e.g., linear regression of cognition on predicted BMI.
2nd stage: $E(Y \mid E(X \mid Z)) = g_0 + g_1 E(X \mid Z) + g_k \cdot \text{(other covariates)}$
$g_1$ is the IV 2SLS estimate (Local Average Treatment Effect, LATE) (Angrist, Imbens, Rubin, 1993, p.19)
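The two stages can be sketched in base R with matrix algebra, assuming a simulation that mirrors the practice-session data (no extra covariates). The helper name `tsls` is hypothetical, not a library function. The sketch also shows why running the two stages as separate `lm()` calls gives the right point estimate but the wrong standard error: the correct residuals use the observed exposure, not its predicted value.

```r
# Hypothetical simulation mirroring the practice-session "universe"
set.seed(2)
n  <- 10000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)                                             # unmeasured confounder
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
Y  <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)              # true effect of A is 1.5

# 2SLS in matrix form (no covariates); returns estimates and standard errors
tsls <- function(y, x, Z) {
  X  <- cbind(intercept = 1, exposure = x)                 # second-stage design
  Zm <- cbind(intercept = 1, Z)                            # instruments plus intercept
  Xhat <- Zm %*% solve(crossprod(Zm), crossprod(Zm, X))    # 1st stage fitted values
  beta <- drop(solve(crossprod(Xhat), crossprod(Xhat, y))) # 2nd stage coefficients
  res  <- y - drop(X %*% beta)                             # residuals use the OBSERVED x
  sigma2 <- sum(res^2) / (length(y) - ncol(X))
  se <- sqrt(diag(sigma2 * solve(crossprod(Xhat))))
  list(coef = beta, se = se)
}

fit <- tsls(Y, A, cbind(z1, z2, z3))

# Manual two-step for comparison: same point estimate, wrong standard error
naive <- summary(lm(Y ~ fitted(lm(A ~ z1 + z2 + z3))))$coefficients
print(rbind(tsls = c(fit$coef[2], fit$se[2]), naive = naive[2, 1:2]))
```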
Control Function Approach (Tchetgen Tchetgen and Vansteelandt, 2013)
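A minimal sketch of the control function idea, assuming the same simulated setting as the practice session: keep the observed exposure in the outcome model and add the first-stage residual as a covariate to absorb the confounded part of A. In a linear model this reproduces the 2SLS point estimate exactly; the approach becomes useful for non-linear outcome models (logit, Cox, Aalen), where plugging in predicted values is not equivalent. Reading the t-test on the residual as a regression-based (Hausman-type) check for confounding is a standard interpretation, not something stated on the slide.

```r
# Hypothetical simulation mirroring the practice-session data
set.seed(3)
n  <- 10000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
Y  <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)

first <- lm(A ~ z1 + z2 + z3)     # first stage
v_hat <- resid(first)             # first-stage residual: the confounded part of A

# Control function: keep the observed exposure, add the residual as a covariate
cf <- lm(Y ~ A + v_hat)
beta_cf <- coef(cf)[["A"]]

# 2SLS point estimate from the same first stage
beta_2sls <- coef(lm(Y ~ fitted(first)))[[2]]

# t-test on v_hat: regression-based (Hausman-type) check of whether A is confounded
print(summary(cf)$coefficients["v_hat", ])
print(c(control_function = beta_cf, tsls = beta_2sls))
```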
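Practice Session with simulated data
• Generate the universe:
set.seed(2015)  # optional: fixes the random draws so results are reproducible
z1 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)  # allele counts: P(0)=0.5, P(1)=0.3, P(2)=0.2
z2 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
z3 <- sample(c(0,0,0,0,0,1,1,1,2,2), 10000, replace = TRUE)
e1 <- rnorm(10000, sd = 3)
e2 <- rnorm(10000, sd = 3)
U <- rnorm(10000, sd = 1)                        # unmeasured confounder
A <- 27 + 0.6*z1 + 0.3*z2 + 0.1*z3 + 2*U + e1    # exposure (e.g., BMI)
Y <- 30 + 1.5*A + 4*U + e2                       # outcome; true causal effect of A is 1.5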
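2SLS
# true outcome model: Y <- 30 + 1.5*A + 4*U + e2
summary(lm(A ~ z1))            # beta approx. 0.6
summary(lm(A ~ z2))            # beta approx. 0.3
summary(lm(A ~ z3))            # beta approx. 0.1
summary(lm(A ~ z1 + z2 + z3))
summary(lm(Y ~ A))             # confounded by U: beta approx. 2.1 instead of 1.5
# two-step (2SLS by hand): correct point estimate, but the reported SE is not valid
pred1 <- predict(lm(A ~ z1))
summary(lm(Y ~ pred1))         # beta approx. 1.5
# control function IV
res1 <- resid(lm(A ~ z1))
summary(lm(Y ~ A + res1))      # beta approx. 1.5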
2 Stage Least Squares (2SLS): in SAS, Stata, R
• SAS: PROC SYSLIN
• Stata: ivreg2
• R: tsls (sem package)
(Local Average Treatment Effect, LATE) (Angrist, Imbens, Rubin, 1993, p.19)
IV assumptions
• Assumption IV.1: Z and exposure A are associated
  • either Z has a causal effect on A, or
  • Z and A share common causes
• Assumption IV.2: Z affects the outcome Y only through A
  • no direct effect of Z on Y (“exclusion restriction”)
• Assumption IV.3: Z does not share common causes with the outcome Y, or all common causes are controlled
  • no confounding for the effect of Z on Y
• Assumption IV.4 (most popular option): there are no defiers
  • This assumption is sometimes described as a “monotonicity assumption”.
  • There is no individual in the population who would be exposed (A = 1) under Z = 0 but unexposed under Z = 1.
  • In an RCT, a defier would be a person who does the exact opposite of what he/she is told to do.
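Of these assumptions, only IV.1 can be checked directly in the data. A minimal sketch, assuming the practice-session simulation: the first-stage F-statistic and R² for the instruments. The "F > 10" threshold used in the test is a common rule of thumb for weak instruments, not a guarantee; IV.2 to IV.4 cannot be verified this way. With additional covariates, the F-test should cover only the instruments, not the whole first-stage model.

```r
# Hypothetical simulation mirroring the practice-session data
set.seed(4)
n  <- 10000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)

first <- lm(A ~ z1 + z2 + z3)
fstat <- summary(first)$fstatistic        # named vector: value, numdf, dendf
F_first <- unname(fstat["value"])
p_first <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
r2_first <- summary(first)$r.squared      # share of the variance in A explained by Z
print(c(F = F_first, R2 = r2_first, p = unname(p_first)))
```

The instruments explain only about 2% of the variance in A here, yet the F-statistic is large because n is large; this mirrors the genetic-instrument situation described later for binary outcomes.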
The IV estimate is not necessarily the population average causal effect. Whose causal effect is it?
Classify people based on their treatment under either value of the IV/randomly assigned treatment (experimental treatment B vs. control treatment A):
• Takes B if assigned B and takes A if assigned A: complier
• Takes B under either assignment: always-taker
• Takes A under either assignment: never-taker
• Takes A if assigned B and takes B if assigned A: defier
Whose Causal Effect?
Classify people based on their treatment under either value of the IV/randomly assigned treatment (what the person will do if assigned to the experimental vs. the control treatment).
• Never-takers (take control under either assignment) do not contribute to any outcome differences between the IV=0 and IV=1 groups.
• Always-takers (take experimental under either assignment) do not contribute to any outcome differences between the IV=0 and IV=1 groups.
• If there are also no defiers (Assumption IV.4), the IV estimate is therefore the average causal effect among compliers (the LATE).
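A minimal simulation sketch of this argument, assuming a hypothetical population of compliers, always-takers, and never-takers (no defiers) whose treatment effects and baseline outcomes differ by type. The proportions and effect sizes are invented. The Wald estimate recovers the compliers' effect, not the population average.

```r
set.seed(5)
n <- 100000
# Hypothetical population with no defiers (Assumption IV.4)
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.2, 0.3))
Z <- rbinom(n, 1, 0.5)                                           # randomized assignment
X <- ifelse(type == "always", 1, ifelse(type == "never", 0, Z))  # treatment taken

# Treatment effects and baseline outcomes differ by type (so type also confounds X and Y)
effect   <- c(complier = 1, always = 5, never = 4)[type]
baseline <- c(complier = 0, always = 3, never = 1)[type]
Y <- baseline + effect * X + rnorm(n)

wald <- (mean(Y[Z == 1]) - mean(Y[Z == 0])) / (mean(X[Z == 1]) - mean(X[Z == 0]))
ate  <- mean(effect)                               # population average causal effect
cace <- mean(effect[type == "complier"])           # complier average causal effect
print(c(IV = wald, complier_effect = cace, population_ATE = ate))
```

Always-takers and never-takers have the same treatment in both arms, so they cancel out of the numerator; only compliers move between arms.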
Overview
• IV analysis with binary outcome
• IV analysis in case-control studies
• IV analysis with survival outcomes
• IV analysis in R
IV Analysis with binary outcome
• Traditionally: use 2SLS with a linear probability model
• Problem: nothing restricts the predicted values to valid probabilities (0 ≤ P ≤ 1)
• … this might not be a problem when genetic variants are the instruments: they explain so little of the exposure that the estimate will hardly ever be out of bounds
• Solution: use a link function: log, logit, probit
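A minimal sketch of both options, assuming the practice-session exposure and a hypothetical binary outcome generated on the logit scale (log odds ratio 0.3 per unit of A). Option (a) is 2SLS with a linear probability model, giving a risk difference; option (b) is a control function with a logit link. The logit coefficient in (b) is conditional on the first-stage residual and, because odds ratios are non-collapsible, it is not guaranteed to equal the 0.3 used in the simulation; this is one illustrative approach, not necessarily the estimator on the following slides.

```r
set.seed(6)
n  <- 20000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
# Hypothetical binary outcome generated on the logit scale
Y  <- rbinom(n, 1, plogis(-2 + 0.3 * (A - 27) + 0.8 * U))

first <- lm(A ~ z1 + z2 + z3)

# (a) 2SLS with a linear probability model: risk difference per unit of A
lpm <- lm(Y ~ fitted(first))
rd_2sls <- coef(lpm)[[2]]
print(range(fitted(lpm)))   # check whether fitted probabilities stay inside [0, 1]

# (b) control function with a logit link: add the first-stage residual
cf_logit <- glm(Y ~ A + resid(first), family = binomial)
log_or_cf <- coef(cf_logit)[["A"]]
print(c(risk_difference = rd_2sls, log_odds_ratio = log_or_cf))
```

Because the instruments explain little of A, the fitted probabilities in (a) vary only over a narrow range, which illustrates the slide's point about genetic instruments.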
IV Analysis with binary outcome: logit link
IV for survival outcome: IV for Cox Proportional Hazards Model
Two Sample IV designs
• Using published data only:
  • The effect of BMI on late-onset Alzheimer's disease
  • The effect of type 2 diabetes on late-onset Alzheimer's disease
Inverse Variance Weighted IV of separate samples (Burgess et al. 2013)
Burgess, Stephen, Adam Butterworth, and Simon G. Thompson. "Mendelian randomization analysis with multiple genetic variants using summarized data." Genetic Epidemiology 37.7 (2013): 658-665.
Genetic variant $k$, $k = 1, \dots, K$, is associated with an observed mean change $X_k$ in the risk factor per additional variant allele, with standard error $\sigma_{Xk}$, and an observed mean change $Y_k$ in the outcome per allele, with standard error $\sigma_{Yk}$.
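With this notation, the inverse variance weighted estimate and its standard error (as computed in the R code of the Burgess approach, which ignores $\sigma_{Xk}$) can be written as below. The second line shows that it equals a fixed-effect meta-analysis of the per-variant ratio estimates, using first-order standard errors.

```latex
\hat\beta_{IVW} = \frac{\sum_{k=1}^{K} X_k Y_k \sigma_{Yk}^{-2}}{\sum_{k=1}^{K} X_k^2 \sigma_{Yk}^{-2}},
\qquad
\operatorname{se}(\hat\beta_{IVW}) = \Big( \sum_{k=1}^{K} X_k^2 \sigma_{Yk}^{-2} \Big)^{-1/2}

\hat\beta_k = \frac{Y_k}{X_k}, \quad
\operatorname{se}(\hat\beta_k) \approx \frac{\sigma_{Yk}}{|X_k|}, \quad
w_k = \frac{X_k^2}{\sigma_{Yk}^2}
\;\Rightarrow\;
\frac{\sum_k w_k \hat\beta_k}{\sum_k w_k}
= \frac{\sum_k X_k Y_k \sigma_{Yk}^{-2}}{\sum_k X_k^2 \sigma_{Yk}^{-2}}
= \hat\beta_{IVW}
```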
Inverse Variance Weighted IV: Effect of BMI on Dementia
The Model: dementia-related phenotypes
BMI on Dementia Mukherjee, Shubhabrata, et al. "Genetically predicted body mass index and Alzheimer's disease–related phenotypes in three large samples: Mendelian randomization analyses." Alzheimer's & Dementia 11.12 (2015): 1439-1451.
Split Sample IV
mrozb <- data.frame(Y, A, U, z1, z2, z3)
pred1 <- predict(lm(A ~ z1))
summary(lm(Y ~ pred1))  # beta approx. 1.5
mrozvs <- mrozb[sample(1:10000, 500), ]   # random subsample of 500 observations
a <- coef(lm(A ~ z1 + z2 + z3, data = mrozb))   # GRS weights from the first stage (full data)
mrozvs$GRS <- drop(as.matrix(mrozvs[c("z1", "z2", "z3")]) %*% a[2:4])   # weighted genetic risk score
summary(lm(Y ~ GRS, data = mrozvs))  # beta = 0.77 in the original run; imprecise with n = 500
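A minimal sketch of a split-sample design with fully disjoint samples, assuming the practice-session data model with a larger n: the genetic risk score (GRS) weights come from one half, and the effect is estimated in the other half, where A itself is never used. Because the GRS is on the scale of A, the slope of Y on the GRS estimates the effect of A. When the weights are imprecise, this estimate is biased somewhat toward the null.

```r
set.seed(8)
n  <- 100000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
Y  <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)
d  <- data.frame(Y, A, z1, z2, z3)

# Split into two disjoint halves
idx <- sample(n, n / 2)
d_weights <- d[idx, ]    # sample 1: estimate the variant-exposure associations
d_outcome <- d[-idx, ]   # sample 2: apply the GRS and estimate the effect on Y

w <- coef(lm(A ~ z1 + z2 + z3, data = d_weights))[c("z1", "z2", "z3")]
d_outcome$GRS <- drop(as.matrix(d_outcome[c("z1", "z2", "z3")]) %*% w)

# GRS is on the scale of A, so the slope of Y on GRS estimates the effect of A
beta_split <- coef(lm(Y ~ GRS, data = d_outcome))[["GRS"]]
print(beta_split)
```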
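Inverse Variance Weighted IV (external data)
### Burgess approach
library(lmtest)   # provides coeftest()
coeftest(lm(A ~ z1))   # z1: estimate 0.516322, SE 0.046800
coeftest(lm(A ~ z2))   # z2: estimate 0.342791, SE 0.046807
coeftest(lm(Y ~ z1))   # z1: estimate 0.84012, SE 0.11495
coeftest(lm(Y ~ z2))   # z2: estimate 0.66891, SE 0.11469
Xk <- c(0.516, 0.343)      # variant-exposure associations
Xkse <- c(0.047, 0.047)
Yk <- c(0.840, 0.669)      # variant-outcome associations
Ykse <- c(0.115, 0.115)
sum(Xk*Yk*Ykse^-2) / sum(Xk^2*Ykse^-2)   # inverse variance weighted IV estimate
(1/sum(Xk^2*Ykse^-2))^0.5                # its standard error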
Inverse Variance Weighted IV (external data)
library(meta)   # provides metagen()
sum(Xk[1]*Yk[1]*Ykse[1]^-2) / sum(Xk[1]^2*Ykse[1]^-2)   # ratio estimate, variant 1: 1.628
(1/sum(Xk[1]^2*Ykse[1]^-2))^0.5                         # SE: 0.223
sum(Xk[2]*Yk[2]*Ykse[2]^-2) / sum(Xk[2]^2*Ykse[2]^-2)   # ratio estimate, variant 2: 1.950
(1/sum(Xk[2]^2*Ykse[2]^-2))^0.5                         # SE: 0.335
metagen(c(1.628, 1.950), c(0.223, 0.335))
# similar estimates, no evidence of heterogeneity: consistent with valid instruments (but this does not prove validity)
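The same heterogeneity check can be sketched in base R without the meta package, assuming the two-variant summary statistics above and first-order standard errors. It confirms that the IVW estimate equals the fixed-effect pooled ratio estimate and computes Cochran's Q. The values should agree with metagen()'s common-effect estimate and Q up to the rounding of its inputs. With only two variants the test has very little power.

```r
# Summary statistics from the slide
Xk   <- c(0.516, 0.343)
Yk   <- c(0.840, 0.669)
Ykse <- c(0.115, 0.115)

ratio    <- Yk / Xk              # per-variant ratio (Wald) estimates
ratio_se <- Ykse / abs(Xk)       # first-order standard errors
w        <- ratio_se^-2          # inverse-variance weights

beta_fe  <- sum(w * ratio) / sum(w)                        # fixed-effect meta-analysis
beta_ivw <- sum(Xk * Yk * Ykse^-2) / sum(Xk^2 * Ykse^-2)   # IVW formula from the slide

Q   <- sum(w * (ratio - beta_fe)^2)                        # Cochran's Q
df  <- length(Xk) - 1
p_Q <- pchisq(Q, df, lower.tail = FALSE)
print(round(c(IVW = beta_ivw, Q = Q, p = p_Q), 3))
```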
Doubting Instruments: major biases similar to critiques of RCTs
[Diagrams: variants of the IV graph Z → X → Y with confounders U, U2 and a common cause G]
• Do they have other pathways to the outcome?
  • Unblinded trials: controls become demoralized
• Is there a common cause of the instrument and the outcome?
  • Trials: unfair random assignment
• Do they actually affect anyone’s exposure?
  • Trials: nobody adheres to assignment
Evaluating the assumptions
• Constraints implied by theory
• Over-identification tests
• Stratification-based tests (similar to over-identification)
• IV inequality constraints
• Negative controls
• Independence from known confounders
• Egger tests
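A minimal sketch of one over-identification test (the Sargan test), assuming the practice-session simulation plus a second, hypothetical outcome in which z3 has a direct effect on Y and therefore violates IV.2. The statistic is n times the R² from regressing the 2SLS residuals on all instruments; with valid instruments it is approximately chi-squared with (number of instruments minus number of exposures) degrees of freedom. The helper `sargan` is hypothetical. The test only detects disagreement between instruments: if all of them are biased in the same way, it has no power.

```r
set.seed(10)
n  <- 10000
g  <- function() sample(c(0, 0, 0, 0, 0, 1, 1, 1, 2, 2), n, replace = TRUE)
z1 <- g(); z2 <- g(); z3 <- g()
U  <- rnorm(n)
A  <- 27 + 0.6 * z1 + 0.3 * z2 + 0.1 * z3 + 2 * U + rnorm(n, sd = 3)
Y_valid   <- 30 + 1.5 * A + 4 * U + rnorm(n, sd = 3)
Y_invalid <- Y_valid + 2 * z3    # hypothetical direct effect of z3 on Y (violates IV.2)

# Sargan test: n * R^2 from regressing the 2SLS residuals on all instruments
sargan <- function(y, x, Z) {
  X  <- cbind(1, x)
  Zm <- cbind(1, Z)
  Xhat <- Zm %*% solve(crossprod(Zm), crossprod(Zm, X))    # 1st stage fitted values
  beta <- solve(crossprod(Xhat), crossprod(Xhat, y))       # 2SLS coefficients
  u    <- drop(y - X %*% beta)                             # 2SLS residuals (observed x)
  stat <- length(y) * summary(lm(u ~ Z))$r.squared
  df   <- ncol(Zm) - ncol(X)                               # over-identifying restrictions
  c(stat = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

Zmat <- cbind(z1, z2, z3)
s_valid   <- sargan(Y_valid, A, Zmat)
s_invalid <- sargan(Y_invalid, A, Zmat)
print(rbind(valid = s_valid, invalid = s_invalid))
```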
The End
What to do with a binary exposure? • In genetic IV, convert the binary exposure to the probability scale by reweighting the predicted probability from a first-stage model
Two Stage Least Squares
IV Analysis with binary outcome: log link
IV Analysis with binary outcome
IV for survival outcome: IV for Aalen Additive Hazards Models
2SLS
summary(lm(A ~ z1))            # beta approx. 0.6
summary(lm(A ~ z2))            # beta approx. 0.3
summary(lm(A ~ z3))            # beta approx. 0.1
summary(lm(A ~ z1 + z2 + z3))  # coefficients approx. 0.6, 0.3, 0.1
summary(lm(Y ~ A))             # confounded by U: beta approx. 2.1
# two-step
pred1 <- predict(lm(A ~ z1))
summary(lm(Y ~ pred1))         # beta approx. 1.5
# control function IV
res1 <- resid(lm(A ~ z1))
summary(lm(Y ~ A + res1))      # beta approx. 1.5
2SLS
# ivreg2 script: http://diffuseprior.wordpress.com/tag/over-identification/
# ivreg2(form, endog, iv, data, digits) is defined in that script; source it before use
mroz <- data.frame(Y, A, z1, z2, z3)
ivreg2(form = Y ~ A, endog = "A", iv = c("z1", "z2", "z3"), data = mroz)
mroz$res1 <- resid(lm(A ~ z1, data = mroz))
summary(lm(Y ~ A + res1, data = mroz))  # beta approx. 1.5
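With another assumption (no defiers, Assumption IV.4), we can say whose causal effect it is.
Classify people based on their treatment under either value of the IV/randomly assigned treatment (experimental treatment B vs. control treatment A):
• Takes B if assigned B and takes A if assigned A: complier
• Takes B under either assignment: always-taker
• Takes A under either assignment: never-taker
• Takes A if assigned B and takes B if assigned A: defier (ruled out by the assumption)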
Egger Regression: treat each IV estimate as an element in a meta-analysis
• Under the InSIDE (instrument strength independent of direct effect) assumption, the bias of the slope estimate converges to zero as the number of variants grows, even if all variants are invalid.
• Regress the Z-Y associations on the Z-X associations (inverse-variance weighted, with each variant oriented so that its Z-X association is positive). The intercept is an estimate of the average pleiotropy, and the slope is an estimate of the true causal effect under InSIDE.
Bowden, Davey Smith, and Burgess, "Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression," International Journal of Epidemiology, 2015.
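A minimal sketch of MR-Egger on simulated summary statistics, assuming 50 hypothetical variants whose pleiotropic (direct) effects average 0.02 and are independent of instrument strength, so InSIDE holds by construction, and a true causal effect of 0.5. Alleles are randomly recoded to show why the orientation step matters. The IVW estimate is the same weighted regression forced through the origin, so here it absorbs the average pleiotropy into the slope.

```r
set.seed(11)
K <- 50                                          # number of variants
Xk_true   <- runif(K, 0.05, 0.3)                 # variant-exposure associations
alpha     <- rnorm(K, mean = 0.02, sd = 0.01)    # direct (pleiotropic) effects, independent of Xk
Ykse      <- rep(0.005, K)                       # SEs of the variant-outcome associations
beta_true <- 0.5
Yk_true   <- alpha + beta_true * Xk_true + rnorm(K, sd = Ykse)

# Reported summary statistics, with an arbitrary choice of coded allele
flip <- sample(c(-1, 1), K, replace = TRUE)
Xk <- Xk_true * flip
Yk <- Yk_true * flip

# MR-Egger: orient every variant so its Z-X association is positive,
# then regress Z-Y on Z-X associations WITH an intercept, weighted by 1/SE^2
x_o <- abs(Xk)
y_o <- Yk * sign(Xk)
egger <- lm(y_o ~ x_o, weights = Ykse^-2)
ivw   <- lm(y_o ~ 0 + x_o, weights = Ykse^-2)   # IVW: same regression through the origin

egger_intercept <- coef(egger)[["(Intercept)"]] # average pleiotropy
egger_slope     <- coef(egger)[["x_o"]]         # causal effect under InSIDE
ivw_slope       <- coef(ivw)[["x_o"]]
print(c(intercept = egger_intercept, egger = egger_slope, ivw = ivw_slope))
```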