Confounding adjustment: Ideas in Action -a case study

Confounding adjustment: Ideas in Action -a case study. Xiaochun Li, Ph.D. Associate Professor Division of Biostatistics Indiana University School of Medicine. Outline. Description of the data set Quantity to be estimated Summary of baseline characteristics Approaches to data analyses


Presentation Transcript


  1. Confounding adjustment: Ideas in Action -a case study Xiaochun Li, Ph.D. Associate Professor Division of Biostatistics Indiana University School of Medicine

  2. Outline • Description of the data set • Quantity to be estimated • Summary of baseline characteristics • Approaches to data analyses • Results • Discussion

  3. Simulation Setup Lindner Center data, described and analyzed in Kereiakes et al. (2000) • 6-month follow-up data on 996 patients who • underwent an initial Percutaneous Coronary Intervention (PCI) • were treated with “usual care” alone or usual care plus a relatively expensive blood thinner (IIB/IIIA cascade blocker) • The data set has 10 variables • Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost) • X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc

  4. Baseline characteristics

  5. The “LSIM10K” dataset The simulated data set was based on the Lindner Center data • 17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping • Same correlation among covariates, same clustering patterns • Contains the values of 10 simulated variables for 10,325 hypothetical patients • To simplify analyses, the data contain no missing values • Details and dataset available from Bob’s website

  6. What do we want to estimate? The population average treatment effect (ATE), i.e., E(Y1) - E(Y0), where Y1 and Y0 are counterfactual outcomes. In plain words, “what if” scenarios: the expected response if treatment had been assigned to the entire study population, minus the expected response if control had been assigned to the entire study population.
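The estimand on this slide can be illustrated with a minimal sketch. It assumes a hypothetical toy population in which both counterfactual outcomes are known for every patient, which is never the case in real data; the values and function names below are made up for illustration and do not come from lsim10k.

```python
from statistics import mean

# Hypothetical toy population: (y1, y0, treated) per patient.
# y1 = outcome had the patient been treated, y0 = outcome under control.
# Sicker patients (y0 = 1) are more likely to be treated here.
patients = [
    (1, 1, 1),
    (0, 1, 1),
    (1, 1, 1),
    (0, 0, 0),
    (0, 0, 0),
    (0, 1, 0),
]

def average_treatment_effect(rows):
    """E(Y1) - E(Y0), averaging both counterfactuals over everyone."""
    return mean(r[0] for r in rows) - mean(r[1] for r in rows)

def naive_difference(rows):
    """Observed treated mean minus observed control mean."""
    treated = [y1 for y1, _, t in rows if t == 1]
    control = [y0 for _, y0, t in rows if t == 0]
    return mean(treated) - mean(control)

ate = average_treatment_effect(patients)   # uses unobservable counterfactuals
naive = naive_difference(patients)         # uses only what would be observed
print(f"ATE = {ate:.3f}, naive difference = {naive:.3f}")
```

In this toy table the two quantities differ in sign, which is the kind of distortion confounding adjustment is meant to remove.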

  7. Baseline covariate balance assessment

  8. Visualizing overall imbalance (figure; deep blue = high values; panels labelled C and T)
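One common numerical companion to such a plot is the standardized mean difference between groups. The slides do not say which balance metric was used, so the sketch below is an assumption; the height values are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, control):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical heights in cm, not taken from lsim10k.
height_treated = [170.0, 172.0, 174.0, 176.0]
height_control = [168.0, 170.0, 172.0, 174.0]

smd_height = standardized_difference(height_treated, height_control)
print(f"standardized difference in height: {smd_height:.3f}")
```

Because the difference is scaled by a pooled standard deviation, covariates measured in different units can be compared on one axis.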

  9. Analytical Methods for confounding adjustment The following methods were applied to lsim10k • Outcome regression adjustment (OR) • Propensity score (PS) stratification • Inverse-probability-of-treatment weighting (IPTW) • Doubly robust estimation (DR) • Matching by • Mahalanobis distance • PS only
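As an illustration of one method in this list, here is a minimal IPTW sketch. It assumes the propensity scores are already known, and it shows two common forms of the estimator (dividing by n, or by the sum of the weights); whether these correspond to the IPTW variants reported in the slides is not stated in the source. The toy data are hypothetical, built so that the true effect is 2.

```python
def iptw_ate(y, t, ps, normalize=True):
    """Inverse-probability-of-treatment-weighted estimate of E(Y1) - E(Y0)."""
    w1 = [ti / pi for ti, pi in zip(t, ps)]              # weights for treated
    w0 = [(1 - ti) / (1 - pi) for ti, pi in zip(t, ps)]  # weights for controls
    s1 = sum(w * yi for w, yi in zip(w1, y))
    s0 = sum(w * yi for w, yi in zip(w0, y))
    if normalize:
        return s1 / sum(w1) - s0 / sum(w0)   # weights rescaled to sum to one
    n = len(y)
    return s1 / n - s0 / n                   # plain 1/n version

# Hypothetical data: one binary covariate x, outcome y = 10*x + 2*t.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 1, 0, 0, 1, 1, 1, 0]
y = [10 * xi + 2 * ti for xi, ti in zip(x, t)]
ps = [0.5 if xi == 0 else 0.75 for xi in x]   # P(T=1 | x), assumed known

treated_mean = sum(yi for yi, ti in zip(y, t) if ti) / sum(t)
control_mean = sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - sum(t))
naive = treated_mean - control_mean
iptw_normalized = iptw_ate(y, t, ps, normalize=True)
iptw_plain = iptw_ate(y, t, ps, normalize=False)
print(naive, iptw_normalized, iptw_plain)
```

The naive contrast is inflated because treated patients are concentrated where x = 1; both weighted versions recover the effect built into the toy data.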

  10. Analysis of mort6mo OR model for mort6mo: • treatment indicator (trtm) • main effect terms for all seven covariates • quadratic terms for both height and ejfract • Residual deviance: 2410.4 on 10323 degrees of freedom. PS model: • saturated model for the five categorical covariates (main effects and interaction terms up to fifth order) • main effects and quadratic terms for height and ejfract

  11. Covariate Balance Evaluation based on PS Quintiles

  12. Stent

  13. Female

  14. Diabetic

  15. Acutemi

  16. Ves1proc

  17. Height: strata 2 (0.95 cm) and 3 (-1.50 cm)

  18. Height • Residual confounding remains after adjusting for PS quintiles • Within-stratum between-group difference in height (mean, s.d., p-value) • Stratum 2: 0.949, 0.44, 0.032 • Stratum 3: -1.497, 0.43, 0.0005

  19. Ejfract: strata 1 (0.81), 2 (-1.32) and 3 (-0.72)

  20. Ejfract • Residual confounding remains after adjusting for PS quintiles • Within-stratum between-group difference in ejfract (mean, s.d., p-value) • Stratum 1: 0.812, 0.41, 0.0475 • Stratum 2: -1.322, 0.33, 7.38e-5 • Stratum 3: -0.721, 0.32, 0.025

  21. PS Stratification • Residual confounding within strata • In the PS stratification method, height and ejfract are further adjusted for within each stratum; each stratum-specific model includes • Treatment effect • Height, ejfract main effects and their quadratic terms
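The basic stratification estimator can be sketched as follows. This is a simplified version under stated assumptions: it takes the propensity scores as given, uses a plain difference in means within each stratum, and omits the further within-stratum adjustment for height and ejfract described on this slide. The toy data and function name are hypothetical.

```python
from statistics import mean

def stratified_ate(y, t, ps, n_strata=5):
    """Size-weighted average of within-stratum treated-minus-control differences."""
    n = len(y)
    order = sorted(range(n), key=lambda i: ps[i])   # patients ordered by PS
    total, used = 0.0, 0
    for s in range(n_strata):
        # Equal-sized strata by PS rank (quintiles when n_strata is 5).
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if not treated or not control:
            continue   # stratum lacks one arm; simply skipped in this sketch
        total += len(idx) * (mean(treated) - mean(control))
        used += len(idx)
    return total / used

# Hypothetical data: one binary covariate x, outcome y = 10*x + 2*t.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 1, 0, 0, 1, 1, 1, 0]
y = [10 * xi + 2 * ti for xi, ti in zip(x, t)]
ps = [0.5 if xi == 0 else 0.75 for xi in x]

ate_two_strata = stratified_ate(y, t, ps, n_strata=2)
print(ate_two_strata)
```

With continuous covariates such as height, patients inside one stratum still differ, which is why the slide adds a regression adjustment within each stratum.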

  22. Results – mort6mo True Δ = -0.036. Results of all methods are consistent, providing evidence of treatment effectiveness at preventing death at 6 months.

  23. Analysis of cardcost PS model: same as before. cardcost model of CA with PS stratification: stratum-specific • Treatment effect • Height, ejfract main effects and their quadratic terms. cardcost model: • treatment indicator (trtm) • main effect terms for all seven covariates • quadratic terms for both height and ejfract

  24. Model checking – OR. Adjusted R-squared: 0.0386

  25. Model checking – OR (log transformed). Adjusted R-squared: 0.0693

  26. Results – cardcost

  27. Discussion • All methods give consistent results on the 2 outcomes • All PS-based results have similar variance except IPTW1 • IPTW depends on an approximately correct PS model • OR depends on an approximately correct outcome model • DR is a fortuitous combination of OR and IPTW: it depends on at least one of the two models being right • Nonparametric estimation of either model may be an alternative to parametric models

  28. Double Robustness • wrong PS model: adjusts for one covariate, ‘acutemi’, only • wrong OR model for cardcost: adjusts for the treatment indicator ‘trtm’ and the ‘acutemi’ covariate. By “right”, we mean approximately right.
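The double-robustness property can be checked on a toy example. The sketch below assumes the augmented IPTW form of the doubly robust estimator, which the slides do not spell out, and uses deliberately crude "wrong" models (a constant PS, an all-zero outcome prediction) rather than the acutemi-only models named above. All data and names are hypothetical.

```python
def aipw_ate(y, t, ps, m1, m0):
    """Augmented IPTW estimate of E(Y1) - E(Y0).

    m1[i], m0[i] are outcome-model predictions for patient i under
    treatment and control; ps[i] is the modelled P(T=1 | x_i).
    """
    n = len(y)
    mu1 = sum(m1[i] + t[i] * (y[i] - m1[i]) / ps[i] for i in range(n)) / n
    mu0 = sum(m0[i] + (1 - t[i]) * (y[i] - m0[i]) / (1 - ps[i])
              for i in range(n)) / n
    return mu1 - mu0

# Hypothetical data: one binary covariate x, outcome y = 10*x + 2*t.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 1, 0, 0, 1, 1, 1, 0]
y = [10 * xi + 2 * ti for xi, ti in zip(x, t)]

ps_right = [0.5 if xi == 0 else 0.75 for xi in x]
ps_wrong = [5 / 8] * len(x)                 # ignores x entirely
m1_right = [10 * xi + 2 for xi in x]
m0_right = [10 * xi for xi in x]
m_wrong = [0.0] * len(x)                    # predicts zero for everyone

dr_right_ps_only = aipw_ate(y, t, ps_right, m_wrong, m_wrong)
dr_right_or_only = aipw_ate(y, t, ps_wrong, m1_right, m0_right)
dr_both_wrong = aipw_ate(y, t, ps_wrong, m_wrong, m_wrong)
print(dr_right_ps_only, dr_right_or_only, dr_both_wrong)
```

Either correct model alone recovers the built-in effect of 2; with both models wrong the estimate falls back to the confounded naive contrast.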

  29. Propensity score estimation • The majority of applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale • May include selected interactions and polynomial terms • Accurate PS estimation is impeded by • High-dimensional covariates: which ones should we de-confound? • Unknown functional form: how do they relate to treatment selection? • PS model misspecification can substantially bias the estimated treatment effect • A nonparametric approach, e.g., trees, is flexible enough to accommodate nonlinear/non-additive relationships of covariates to treatment assignment
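For concreteness, a parametric PS model of the kind described here can be fitted in a few lines. This is a minimal sketch, assuming a single covariate and a hand-written Newton-Raphson fit; a real analysis would use a statistical package and many covariates. The data and function names are hypothetical.

```python
from math import exp

def fit_logistic(x, t, iters=25):
    """Newton-Raphson MLE for logit P(T=1 | x) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1 / (1 + exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(ti - pi for ti, pi in zip(t, p))
        g1 = sum((ti - pi) * xi for ti, pi, xi in zip(t, p, x))
        # Information matrix entries.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def propensity(b0, b1, xi):
    """Fitted P(T=1 | x = xi)."""
    return 1 / (1 + exp(-(b0 + b1 * xi)))

# Hypothetical data: 2 of 4 treated when x = 0, 3 of 4 treated when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 1, 0, 0, 1, 1, 1, 0]

b0, b1 = fit_logistic(x, t)
ps_x0 = propensity(b0, b1, 0)
ps_x1 = propensity(b0, b1, 1)
print(ps_x0, ps_x1)
```

With one binary covariate the model is saturated, so the fitted scores equal the observed treatment proportions; the linear-and-additive assumption only starts to bite with continuous or multiple covariates.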

  30. Nonparametric regression techniques • Generalized Boosted Models (GBM) to estimate the propensity score function: Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004; R package: twang • Regression tree model to predict cardcost: Ripley, 1996; Therneau and Atkinson, 1997; R package: rpart

  31. Generalized Boosted Models (GBM) • A multivariate nonparametric regression technique • Sum of a large set of simple regression trees modelling the log-odds • gbm finds the MLE of g(x) = log(p(x)/(1-p(x))), where p(x) = P(T=1|x) • Predicts treatment assignment from a large number of pretreatment covariates, choosing among them adaptively • Nonlinear • No need to select variables • Can model complex interactions • Invariant to monotone transformations of x • E.g., same PS estimates whether using age, log(age) or age^2 • Outperforms alternative methods in prediction error
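Two points from this slide can be illustrated with a small sketch: the log-odds link g(x), and why tree-based fits are invariant to monotone transformations of a covariate. The code is not the gbm algorithm; it fits a single least-squares split (a stump) on hypothetical ages and checks that the chosen partition is the same for age, log(age) and age^2. All names and data are illustrative.

```python
from math import exp, log

def log_odds(p):
    """g = log(p / (1 - p))."""
    return log(p / (1 - p))

def inverse_log_odds(g):
    """p = 1 / (1 + exp(-g)), mapping a fitted g(x) back to a propensity."""
    return 1 / (1 + exp(-g))

def best_split_left_set(x, t):
    """Indices sent to the left child by the best single least-squares split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = None, None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue   # cannot split between tied values
        left, right = order[:k], order[k:]
        sse = 0.0
        for side in (left, right):
            m = sum(t[i] for i in side) / len(side)
            sse += sum((t[i] - m) ** 2 for i in side)
        if best_sse is None or sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Hypothetical ages and treatment indicators (older patients treated).
age = [58.0, 35.0, 71.0, 42.0, 63.0, 50.0]
t = [1, 0, 1, 0, 1, 0]

split_age = best_split_left_set(age, t)
split_log = best_split_left_set([log(a) for a in age], t)
split_sq = best_split_left_set([a ** 2 for a in age], t)
print(split_age, split_log, split_sq)
```

A split depends only on the ordering of the covariate values, and a monotone increasing transformation leaves that ordering unchanged, so every tree in the sum, and hence the fitted PS, is unaffected.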

  32. Results – cardcost, nonparametric approach

  33. Future research • People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed for choosing the number of strata • Model selection: PS model and outcome model • Nonparametric estimation of models may be intuitive, but the properties of the resulting causal estimates are not clear • Nonparametric caveat: still need to define a set of “confounders” based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that are associated with treatment and outcome
