Methods for Analyzing Data from Global Cohort Collaborations: Causal Frameworks and Efficient Estimators
Maya Petersen, MD PhD
works.bepress.com/maya_petersen
Divisions of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
Global Cohort Collaborations
• Unique opportunity to learn how to optimize HIV care delivery in practice
  • Real-world settings
  • Big samples
  • High-quality longitudinal data on many variables
• We can be ambitious!
  • Analyses can and should directly target complex policy and clinical questions
• Novel methods are needed to
  • Translate complex questions into statistical problems
  • Provide rigorous answers
Causal Models/Counterfactuals
• Tool for translating a research question into a statistical estimation problem
  • Target the analysis at the question you care about
  • Many questions cannot be translated into a coefficient in a regression model
• Tool for ensuring that assumptions are
  • Explicit
  • Interpretable to those able to evaluate their plausibility
Some examples of causal research questions that can be defined using counterfactuals…
Marginal Structural Models
• Specify the relationship between an exposure and the expectation of a counterfactual outcome
  • Ex: How would mortality differ under immediate versus delayed switch to second-line therapy following immunologic failure?
• Useful for questions about
  • Cumulative effects of longitudinal exposures/sequential decisions
  • Causal dose-response curves
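As an illustration of the first bullet, a marginal structural model for the switching example might posit a simple form for counterfactual mortality as a function of the delay before switching. The notation below (delay Δ in months, counterfactual outcome Y under that delay, logistic link) is assumed for this sketch and is not taken from the talk.

```latex
% Illustrative MSM: counterfactual probability of death as a function of the
% delay (in months) between immunologic failure and switch to second-line therapy
\operatorname{logit}\, P\!\left( Y_{\Delta} = 1 \right) \;=\; \beta_0 + \beta_1 \Delta ,
\qquad \Delta = \text{months from failure to switch}
```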
Dynamic Regimes
• Rules for assigning treatment in response to a subject's observed past
  • Ex: How would availability of routine HIV RNA testing to guide switching (as compared to CD4 monitoring only) affect mortality?
• Many key questions involve dynamic regimes
  • Good medicine and good policy require understanding how best to respond to new data
  • Helps ensure that our questions are realistic and supported by the data
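A dynamic regime can be written down as an explicit function of each subject's observed history. The sketch below is a minimal, hypothetical illustration (the variable names, thresholds, and toy data are assumptions, not from the talk) of two competing monitoring rules, and of flagging the visits at which observed care is consistent with each rule, the ingredient that IPW or g-computation estimators of regime-specific outcomes build on.

```python
# Two hypothetical switching rules, written as functions of a subject's observed past.
import pandas as pd

def rule_vl(history: pd.Series) -> int:
    """Routine HIV RNA monitoring available: switch on virologic failure."""
    return int(history["viral_load"] > 1000)

def rule_cd4_only(history: pd.Series) -> int:
    """CD4-guided monitoring only: switch on immunologic failure."""
    return int(history["cd4"] < 100)

# Toy visit-level data (illustrative values only)
visits = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "viral_load": [400, 5000, 800, 900],
    "cd4": [250, 90, 400, 380],
    "switched": [0, 1, 0, 0],
})

# Flag visits where observed treatment is consistent with each regime; such
# indicators feed into IPW or g-computation estimators of regime-specific outcomes.
visits["follows_vl_rule"] = visits.apply(rule_vl, axis=1) == visits["switched"]
visits["follows_cd4_rule"] = visits.apply(rule_cd4_only, axis=1) == visits["switched"]
print(visits)
```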
Direct and Indirect Effects
• How much of an exposure's effect is mediated by a specific pathway?
  • Ex: How does implementing a task-sharing program affect patient outcomes, and how much of this effect is mediated through individual enrollment in the program?
• Useful for investigating:
  • Why an intervention did (or didn't) work
  • Unintended consequences/spillover effects
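One common way to formalize this question uses natural direct and indirect effects. The decomposition below is standard in the mediation literature; the notation (A = program implementation, M = individual enrollment, Y with counterfactual subscripts for exposure and mediator) is chosen here for illustration.

```latex
% Total effect decomposed into natural direct and indirect effects
E\!\left[Y_{1,M_1}\right] - E\!\left[Y_{0,M_0}\right]
  \;=\; \underbrace{E\!\left[Y_{1,M_0}\right] - E\!\left[Y_{0,M_0}\right]}_{\text{natural direct effect}}
  \;+\; \underbrace{E\!\left[Y_{1,M_1}\right] - E\!\left[Y_{1,M_0}\right]}_{\text{natural indirect effect}}
```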
Statistical methods to estimate these counterfactual quantities
A growing toolbox…
The statistical challenge
• Novel statistical methods are needed to provide the best possible answers to these questions
  • Standard parametric regression not sufficient
  • Data are complex: many variables measured at potentially informative intervals over long periods
• Need estimators that are
  • Robust: Avoid introducing bias
  • Efficient: Maximize precision
Inverse probability weighting
• Estimate how exposure depends on the observed past
• Use this estimate to reweight the data
• Limitations
  • You have to do a good job estimating your weights
  • Subject to bias and high variance with strong confounding
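To make the two estimation steps concrete, here is a minimal point-treatment sketch on simulated data. The variable names, the single confounder, and the unstabilized weights are simplifying assumptions; a real longitudinal analysis would use the full covariate history and typically stabilized weights.

```python
# Minimal point-treatment IPW sketch on simulated data (hypothetical variables).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
cd4 = rng.normal(350, 100, n)                                    # confounder
switch = rng.binomial(1, 1 / (1 + np.exp(-(2 - cd4 / 200))))     # exposure depends on CD4
death = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.5 * switch - cd4 / 500))))

# Step 1: estimate how exposure depends on the observed past
g = LogisticRegression().fit(cd4.reshape(-1, 1), switch)
p_switch = g.predict_proba(cd4.reshape(-1, 1))[:, 1]

# Step 2: reweight the data by the inverse of the estimated exposure probability
w = switch / p_switch + (1 - switch) / (1 - p_switch)

# IPW estimate of the counterfactual risk difference
risk1 = np.sum(w * switch * death) / np.sum(w * switch)
risk0 = np.sum(w * (1 - switch) * death) / np.sum(w * (1 - switch))
print(risk1 - risk0)
```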
Parametric longitudinal G-formula
• Estimate everything else about the data-generating process
• Use these estimates to set up simulations
• Limitations
  • You have to model essentially the whole data-generating process correctly
  • Inference can be tricky
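The "estimate, then simulate" logic can be illustrated with a toy two-visit example. All variable names, model forms, and the residual-noise assumption below are illustrative; a real analysis would chain models for every time-varying covariate over many visits and would typically bootstrap for inference.

```python
# Two time-point parametric g-formula sketch (hypothetical variables and models).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
n = 5000
cd4_0 = rng.normal(350, 100, n)
a0 = rng.binomial(1, 0.5, n)                                         # early switch
cd4_1 = cd4_0 + 30 * a0 + rng.normal(0, 20, n)                       # time-varying covariate
a1 = rng.binomial(1, 1 / (1 + np.exp(-(a0 + cd4_1 / 400 - 1))))      # later switch
death = rng.binomial(1, 1 / (1 + np.exp(-(-1 - 0.4 * (a0 + a1) - cd4_1 / 500))))

# Step 1: model each piece of the data-generating process
m_cd4 = LinearRegression().fit(np.c_[cd4_0, a0], cd4_1)
m_y = LogisticRegression().fit(np.c_[cd4_1, a0, a1], death)

# Step 2: use these fits to simulate counterfactual mortality under a fixed regime
def simulate(a0_set, a1_set, reps=5):
    risks = []
    for _ in range(reps):
        cd4_1_sim = m_cd4.predict(np.c_[cd4_0, np.full(n, a0_set)])
        cd4_1_sim += rng.normal(0, 20, n)                 # residual noise (assumed normal)
        p_death = m_y.predict_proba(
            np.c_[cd4_1_sim, np.full(n, a0_set), np.full(n, a1_set)])[:, 1]
        risks.append(p_death.mean())
    return np.mean(risks)

print(simulate(1, 1) - simulate(0, 0))   # always switch vs. never switch
```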
“Efficient double robust” methods
• Minimize bias due to model misspecification
• Maximize precision of effect estimates
• Targeted Maximum Likelihood Estimation
  • New results for longitudinal effect estimation
  • Often reduces both bias and variance
  • Software coming soon…
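The core TMLE idea, an initial outcome fit plus a targeted fluctuation along a "clever covariate" built from the treatment mechanism, can be sketched for a single time point as below. The simulated data and simple parametric fits are used only for illustration; the longitudinal TMLE referenced on the slide is considerably more involved.

```python
# Point-treatment TMLE sketch for a risk difference (illustrative only).
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
w = rng.normal(0, 1, n)                                           # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-w)))                         # exposure
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * a + w))))        # outcome

X = np.c_[a, w]
# Step 1: initial estimate of E[Y | A, W]
Q = LogisticRegression().fit(X, y)
q_a = Q.predict_proba(X)[:, 1]
q_1 = Q.predict_proba(np.c_[np.ones(n), w])[:, 1]
q_0 = Q.predict_proba(np.c_[np.zeros(n), w])[:, 1]

# Step 2: estimate of the treatment mechanism g(A = 1 | W)
g = LogisticRegression().fit(w.reshape(-1, 1), a)
g1 = g.predict_proba(w.reshape(-1, 1))[:, 1]

# Step 3: targeting step -- fluctuate the initial fit along the clever covariate
h = a / g1 - (1 - a) / (1 - g1)
eps = sm.GLM(y, h.reshape(-1, 1), offset=np.log(q_a / (1 - q_a)),
             family=sm.families.Binomial()).fit().params[0]

# Step 4: updated counterfactual predictions and plug-in risk difference
expit = lambda x: 1 / (1 + np.exp(-x))
q_1_star = expit(np.log(q_1 / (1 - q_1)) + eps / g1)
q_0_star = expit(np.log(q_0 / (1 - q_0)) - eps / (1 - g1))
print(np.mean(q_1_star - q_0_star))
```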
Data-adaptive estimation
• Which variables to adjust for, and how?
  • Misspecified parametric model -> bias
  • If the model is built ad hoc, susceptible to "evaluation pressure"
  • Need formal tools that can handle these settings
• A priori specified algorithms for learning from data
  • Ex: Super Learner
    • Library of data-adaptive algorithms
    • Internal cross-validation to choose how to combine them
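As a rough stand-in for the Super Learner idea, the sketch below uses scikit-learn's cross-validated stacking: a pre-specified library of learners whose predictions are combined via internal cross-validation. The dataset and the two-member library are arbitrary choices for illustration; dedicated implementations such as the SuperLearner R package implement the algorithm itself.

```python
# Cross-validated stacking in the spirit of the Super Learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Library of candidate learners, specified a priori
library = [
    ("logit", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# Internal cross-validation learns how to combine the library's predictions
sl = StackingClassifier(estimators=library,
                        final_estimator=LogisticRegression(),
                        cv=10, stack_method="predict_proba")
print(cross_val_score(sl, X, y, cv=5, scoring="neg_log_loss").mean())
```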
Current Methods Research
• Hierarchical data
  • Interventions at the clinic and individual level
• Interference/Spillover
  • An individual's outcome is affected by the exposures of other individuals
• Estimation and inference with small sample sizes
  • For many implementation questions, the clinic rather than the individual may be the independent sampling unit
Acknowledgements
• Collaborations with
  • IeDEA-SA and IeDEA-EA
  • Mark van der Laan, UC Berkeley
  • Elvin Geng, UCSF
• This talk is based on a huge body of research
  • Theoretical work by Robins, Pearl, van der Laan, Hernan, others…
  • Applied work by many