
28 May 2019

TraumaMatrix seminar: Causal inference. 28 May 2019. Causal inference on observational data. Estimate the effect of tranexamic acid treatment on TBI. Imke Mayer & Teresa Alves de Sousa, Statistics and applied maths. Context / Objectives. Approach. Outputs.


Presentation Transcript


  1. TraumaMatrix • Seminar: Causal inference • 28 May 2019

  2. Causal inference on observational data: estimate the effect of tranexamic acid treatment on TBI. Imke Mayer & Teresa Alves de Sousa, Statistics and applied maths
     Context / Objectives
     • Estimate the effect of tranexamic acid (TA) on in-ICU mortality among patients with traumatic brain injury (TBI), based on the observational database TraumaBase
     • Goal 1: estimate the average treatment effect as the difference in percentage points between the mortality rates in the treatment and control groups
     • Challenge: real-world data is incomplete and missing values occur almost everywhere
     • Goal 2: estimate heterogeneous treatment effects → decision support
     Approach
     • Translate prior causal knowledge into a causal graph (confounding, potential mediators, biases)
     • Develop a treatment effect estimator that handles incomplete confounders and leverages informative missingness, based on random forests, which handle missing values and mixed data
     • From average treatment effect to heterogeneous treatment effect: cluster the observations based on similarities, or classify them by lesion type and/or severity
     • Double robust estimation augments the propensity score approach:
       • Uses more information related to traumatic brain injury
       • Robust to model misspecification
     Outputs
     • No evidence for rejecting the null hypothesis of no effect of TA on in-ICU mortality among TBI patients
     • Heterogeneity: differentiate w.r.t. pre-treatment characteristics or severity and/or type of lesion

  3. Identify the problem and relevant variables • Causal graph


  5. Handle missing values • Percentage of missing values in the selected variables

  6. Missing values in the relevant variables

  7. Causal inference: estimate the effect of a treatment/intervention on a target variable. Handling missing values
     Complete case analysis: eliminate all incomplete observations
     • In our study, we have 7487 observations and 40 variables; not a single observation is complete for these 40 variables!
     • The complete case approach often leads to biased estimates when the data is not missing completely at random
     Imputation: complete the observations with plausible values
     • To correctly estimate the level of uncertainty due to the missing values, use multiple imputation
     • Can only be used with uninformative missing values, i.e. when the fact that a variable is missing tells us nothing about the missing value (counter-example: rich people tend to keep silent about their income)
     Likelihood-based approaches
     • Expectation Maximization (see Wei Jiang’s and Manuel Pichon’s seminar on hemorrhagic shock prediction)
     Tree-based estimation exploiting missingness information directly in the models
     • Random trees with a special encoding of missingness (Missing Incorporated in Attributes, MIA)
     • Flexible method: can be used on quantitative and categorical data
     • Generic method: does not require model specifications such as “logistic propensity model” or “linear target model”
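
The MIA encoding mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual description of MIA: for one feature and one threshold, a tree considers three candidate partitions (missing values sent left, sent right, or separated from all observed values) and keeps whichever improves its splitting criterion most. The function name `mia_candidate_splits` and the lactate values are hypothetical and not taken from the presentation.

```python
import math


def mia_candidate_splits(values, threshold):
    """Return the three MIA candidate partitions for one feature and one threshold.

    Each partition is a pair (left_indices, right_indices).
    Missing values are encoded as None or NaN.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    missing = [i for i, v in enumerate(values) if is_missing(v)]
    low = [i for i, v in enumerate(values) if not is_missing(v) and v <= threshold]
    high = [i for i, v in enumerate(values) if not is_missing(v) and v > threshold]

    return {
        # missing values follow the "low" branch
        "missing_goes_left": (sorted(low + missing), high),
        # missing values follow the "high" branch
        "missing_goes_right": (low, sorted(high + missing)),
        # the split is made on missingness itself
        "missing_vs_observed": (missing, sorted(low + high)),
    }


# Hypothetical lactate measurements, two of them missing
lactate = [1.2, None, 4.5, 0.8, float("nan"), 6.1]
splits = mia_candidate_splits(lactate, threshold=2.0)
for name, (left, right) in splits.items():
    print(name, left, right)
```

Because the third candidate splits on missingness alone, a tree can use the fact that a value is missing as a predictor, which is what makes the approach suitable for informative missingness.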

  8. Causal inference: estimate the effect of a treatment/intervention on a target variable. Heads up!
     • Missing values can considerably bias the estimation of the average treatment effect (ATE)!
     • Yet, in practice, missing values are rarely examined closely (before or during the analyses)
     • Even on small synthetic examples, missing values can badly distort the estimates
     • Very careful handling of missing values is required in causal inference
     • The main working hypothesis (unconfoundedness) needs to be adjusted/extended to account for missing values
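
A tiny synthetic example makes the warning concrete. The sketch below uses made-up data in which the treatment has no effect (50% mortality in both groups), and assumes that a covariate (here called `lactate`, a hypothetical choice) was not measured for treated patients who died. A complete case analysis then reports a large spurious benefit.

```python
def mortality_rate(rows):
    """Share of patients who died among the given rows."""
    return sum(r["died"] for r in rows) / len(rows)


def difference_in_mortality(rows):
    """Mortality rate of treated patients minus mortality rate of controls."""
    treated = [r for r in rows if r["treated"] == 1]
    control = [r for r in rows if r["treated"] == 0]
    return mortality_rate(treated) - mortality_rate(control)


# Hypothetical toy data: no treatment effect, informative missingness on lactate
patients = [
    {"treated": 1, "died": 1, "lactate": None},
    {"treated": 1, "died": 1, "lactate": None},
    {"treated": 1, "died": 0, "lactate": 1.1},
    {"treated": 1, "died": 0, "lactate": 0.9},
    {"treated": 0, "died": 1, "lactate": 5.2},
    {"treated": 0, "died": 1, "lactate": 4.8},
    {"treated": 0, "died": 0, "lactate": 1.0},
    {"treated": 0, "died": 0, "lactate": 1.3},
]

full_estimate = difference_in_mortality(patients)

# Complete case analysis: drop every patient with a missing covariate
complete_cases = [p for p in patients if p["lactate"] is not None]
complete_case_estimate = difference_in_mortality(complete_cases)

print(full_estimate)           # estimate on all patients
print(complete_case_estimate)  # estimate after dropping incomplete rows
```

On all eight patients the difference is zero, while the complete case analysis yields a difference of minus 50 percentage points, purely as an artefact of which rows were dropped.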


  10. Causal inference: estimate the effect of a treatment/intervention on a target variable
      Estimate the Average Treatment Effect (ATE) on:
      • Experimental data
        • Thanks to randomization, treatment and control groups are comparable (on average) w.r.t. pre-treatment features
        • Take the difference of means of the target variable in both groups: average(Y_treated) - average(Y_control)
      • Observational data
        • Treatment bias/confounding: treatment is given conditionally on pre-treatment features
        • Emulate an experiment by adjusting for confounding
      Precursor study: helps in designing experiments (formulate the question, inclusion criteria, etc.)
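
For experimental data, the difference-of-means estimate described above is a one-liner. The following sketch uses hypothetical binary outcomes (1 = died) and treatment indicators; the function name `difference_in_means` is illustrative only.

```python
from statistics import mean


def difference_in_means(outcomes, treatments):
    """average(Y_treated) - average(Y_control) for binary treatment indicators."""
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    return mean(treated) - mean(control)


# Hypothetical randomized trial with 4 treated and 4 control patients
y = [1, 0, 0, 0, 1, 1, 0, 1]
t = [1, 1, 1, 1, 0, 0, 0, 0]
ate_hat = difference_in_means(y, t)
print(ate_hat)
```

With a binary mortality outcome, the result reads directly as a difference in mortality rates; multiplied by 100 it is a difference in percentage points. On observational data the same computation is generally biased by confounding.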

  11. Causal inference: estimate the effect of a treatment/intervention on a target variable. Average Treatment Effect
      • For an individual i, we are interested in the individual treatment effect.
      • Target variable $Y_i$.
      • Two worlds that cannot coexist; only one can be observed:
        • $Y_i(1)$ is the value of the target if the individual gets the treatment.
        • $Y_i(0)$ is the value of the target if the individual does not get the treatment.
      • The individual treatment effect, $Y_i(1) - Y_i(0)$, is never observed!
      • But we can estimate $E[Y(1)]$ and $E[Y(0)]$ by taking averages over the individuals.
      • And then estimate the Average Treatment Effect (ATE), $E[Y(1)] - E[Y(0)]$.
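
In the standard potential-outcomes notation, the quantities on this slide can be written as below. The symbols $Y_i(1)$, $Y_i(0)$, $T_i$, $\Delta_i$ and $\tau$ are the usual textbook choices and are assumed here, since the formulas of the original slide are not available in the transcript.

```latex
\begin{align*}
  Y_i      &= T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)
           && \text{(only one potential outcome is observed)} \\
  \Delta_i &= Y_i(1) - Y_i(0)
           && \text{(individual treatment effect, never observed)} \\
  \tau     &= \mathbb{E}\left[ Y(1) - Y(0) \right]
            = \mathbb{E}\left[ Y(1) \right] - \mathbb{E}\left[ Y(0) \right]
           && \text{(average treatment effect, ATE)}
\end{align*}
```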

  12. Causal inference: estimate the effect of a treatment/intervention on a target variable. Propensity scores and inverse-propensity-weighted estimation
      • Unconfoundedness: we observe enough information to capture the confounding, i.e. we can adjust for the bias due to non-random treatment assignment.
      • Propensity score e(X): probability of receiving treatment (T), given the pre-treatment variables (X)
      • Inverse propensity weighting (IPW) estimator: reweight observations by the inverse of their probability of being assigned to their group
        • Adjusts for treatment bias
        • Makes the two groups comparable/similar on the pre-treatment variables
      • In our study, X contains information that allows us to evaluate the risk of hemorrhagic shock (red nodes on the causal graph)
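
The reweighting idea can be sketched as below, assuming the propensity scores are known. The data are hypothetical: severe patients all die and are treated with probability 0.75, mild patients all survive and are treated with probability 0.25, and the treatment itself has no effect. The function name `ipw_ate` is illustrative.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Inverse-propensity-weighted estimate of the average treatment effect."""
    n = len(outcomes)
    # Treated observations are weighted by 1 / e(X)
    treated_term = sum(
        t * y / e for y, t, e in zip(outcomes, treatments, propensities)
    )
    # Control observations are weighted by 1 / (1 - e(X))
    control_term = sum(
        (1 - t) * y / (1 - e) for y, t, e in zip(outcomes, treatments, propensities)
    )
    return (treated_term - control_term) / n


# Hypothetical data: first 4 patients are severe, last 4 are mild
y = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = died
t = [1, 1, 1, 0, 1, 0, 0, 0]   # 1 = treated
e = [0.75] * 4 + [0.25] * 4    # assumed known propensity scores

naive = sum(yi for yi, ti in zip(y, t) if ti == 1) / 4 \
    - sum(yi for yi, ti in zip(y, t) if ti == 0) / 4
ipw = ipw_ate(y, t, e)
print(naive, ipw)
```

The naive comparison suggests that treatment raises mortality by 50 percentage points, simply because severe patients are treated more often; the IPW estimate recovers the absence of an effect.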

  13. Causal inference: estimate the effect of a treatment/intervention on a target variable. Propensity scores and inverse-propensity-weighted estimation
      • Propensity score: probability of receiving treatment (T), given the pre-treatment variables (X)
      • How to estimate the propensity score?
        • Assume a model and then fit it to the data, e.g. logistic regression
      [Figure: true propensity score e(X) and estimated propensity score ê(X), with treated (T=1) and control (T=0) observations]
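
As an illustration of "assume a model and fit it", the sketch below fits a logistic propensity model by plain gradient descent using only the Python standard library. It is a teaching sketch on hypothetical data, not the presenters' implementation; in practice a statistics library would be used. The names `fit_logistic_propensity` and `predict_propensity`, the learning rate and the iteration count are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_propensity(weights, x):
    """Estimated probability of treatment for covariates x (weights[0] is the intercept)."""
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))


def fit_logistic_propensity(X, T, learning_rate=0.5, n_iter=2000):
    """Fit P(T = 1 | X) with logistic regression by gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    weights = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, t in zip(X, T):
            err = predict_propensity(weights, x) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical data: one binary covariate (0 = mild, 1 = severe)
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
T = [1, 0, 0, 0, 1, 1, 1, 0]

weights = fit_logistic_propensity(X, T)
e_mild = predict_propensity(weights, [0])
e_severe = predict_propensity(weights, [1])
print(round(e_mild, 3), round(e_severe, 3))
```

With a single binary covariate the logistic model is saturated, so the fitted propensities match the observed treatment frequencies in each group (1 in 4 for mild, 3 in 4 for severe). With continuous or many covariates, the fit depends on whether the logistic form is adequate.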

  14. Causal inference: estimate the effect of a treatment/intervention on a target variable. Propensity scores and inverse-propensity-weighted estimation
      • How to estimate the propensity score?
        • Assume a model and then fit it to the data
      • But what if the model is not good, i.e. the relationship between T and X is different from the one assumed?
        • Then the resulting estimates, such as ê(X), are poor!
        • Wrong model/prior on the relationship between T and X → biased ATE estimation
      [Figure: true propensity score e(X) and estimated propensity score ê(X) plotted against X, with treated (T=1) and control (T=0) observations]

  15. Causal inference: estimate the effect of a treatment/intervention on a target variable. Double robust estimation
      • Solution to model misspecification: construct double robust estimators by using more (prior) information on the target variable Y.
      • Model the target Y as a function of confounders X (and other covariates Z), e.g. with a linear model.
      • Theory tells us: if either the propensity scores or the target Y are correctly modelled, then we can estimate the ATE without (asymptotic) bias, and typically with smaller variance than the IPW estimate.
      • In our study, Z contains predictors of the severity of the TBI (blue nodes in the causal graph)
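
The double robustness property can be checked numerically on a toy example. The sketch below implements the augmented IPW (AIPW) estimator, one common double robust estimator; whether the presenters used exactly this form is an assumption. The data are hypothetical (no treatment effect, severe patients treated more often), and `mu1`, `mu0` denote predicted outcomes under treatment and under control.

```python
def aipw_ate(outcomes, treatments, propensities, mu1, mu0):
    """Augmented IPW estimate: outcome-model difference plus weighted residual corrections."""
    n = len(outcomes)
    total = 0.0
    for y, t, e, m1, m0 in zip(outcomes, treatments, propensities, mu1, mu0):
        total += (
            (m1 - m0)                       # outcome-model prediction of the effect
            + t * (y - m1) / e              # correction from treated residuals
            - (1 - t) * (y - m0) / (1 - e)  # correction from control residuals
        )
    return total / n


# Hypothetical data: first 4 patients severe (all die), last 4 mild (all survive)
y = [1, 1, 1, 1, 0, 0, 0, 0]
t = [1, 1, 1, 0, 1, 0, 0, 0]

true_e = [0.75] * 4 + [0.25] * 4      # correct propensity scores
wrong_e = [0.5] * 8                   # misspecified: ignores severity
true_mu = [1.0] * 4 + [0.0] * 4       # correct outcome model (same under T=1 and T=0)
wrong_mu = [0.0] * 8                  # misspecified outcome model

# Correct outcome model, wrong propensity model
ate_outcome_ok = aipw_ate(y, t, wrong_e, true_mu, true_mu)
# Wrong outcome model, correct propensity model
ate_propensity_ok = aipw_ate(y, t, true_e, wrong_mu, wrong_mu)
print(ate_outcome_ok, ate_propensity_ok)
```

In both scenarios the estimate is (numerically) zero, the true effect in this toy data set, even though one of the two models is wrong each time.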

  16. Causal inference: estimate the effect of a treatment/intervention on a target variable. Double robust estimation
      [Figure: variables used for the estimation of the propensity scores ê(X) and variables used for the estimation of the target Y]

  17. Causal inference: estimate the effect of a treatment/intervention on a target variable. Double robust estimation
      • Theory tells us: if either the propensity scores or the target Y are correctly modelled, then we can estimate the ATE without (asymptotic) bias, and typically with smaller variance than the IPW estimate.
      • The double robust approach allows flexible learning of the propensity model and the target model using powerful (black-box) methods (deep networks, random forests, etc.)
      • Without harming the interpretability of the estimator
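
For reference, one widely used double robust estimator is the augmented IPW (AIPW) estimator, written below. That the slide showed this exact expression is an assumption; the notation $\hat{e}$ for the estimated propensity score and $\hat{\mu}_1$, $\hat{\mu}_0$ for the estimated target under treatment and control is also assumed.

```latex
\[
  \hat{\tau}_{\mathrm{DR}}
  = \frac{1}{n} \sum_{i=1}^{n}
    \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{T_i \left( Y_i - \hat{\mu}_1(X_i) \right)}{\hat{e}(X_i)}
      - \frac{(1 - T_i) \left( Y_i - \hat{\mu}_0(X_i) \right)}{1 - \hat{e}(X_i)}
    \right]
\]
```

Whatever learner produces $\hat{e}$, $\hat{\mu}_1$ and $\hat{\mu}_0$, the final quantity remains a single average that reads as a difference in expected outcomes, which is one way to understand the interpretability claim.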


  19. Causal inference on the effect of tranexamic acid treatment on TBI • Preliminary results for the ATE
      [Figure: preliminary ATE estimates from the double robust and inverse-propensity weighting estimators, with missing values handled by imputation, likelihood-based methods or random forests. Axis: difference in % points between mortality rate in treated and in control group]

  20. R-miss-tastic: more details and examples for analyses with missing values
      • https://rmisstastic.netlify.com
      • Theoretical and practical tutorials
      • Popular datasets
      • Bibliography
      • Workflows (in R)
      Image source: etsy.com

  21. References
      • Hernán, M. A. and Robins, J. M. (2019). Causal Inference. Chapman & Hall/CRC.
      • Imbens, G. W. and Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
      • Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., Ost, D. E., Punjabi, N. M., Schatz, M., Smyth, A. R., et al. (2019). Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1):22–28.
      • Textor, J., Hardt, J., and Knüppel, S. (2011). DAGitty: a graphical tool for analyzing causal diagrams. Epidemiology, 22(5):745.



  25. Heterogeneity • 1st step: grouping patients by their characteristics
      • Method: use hierarchical clustering to identify groups of patients
      • It performs a classical clustering operation…

  26. Heterogeneity • 1st step: grouping patients by their characteristics
      • … iteratively, dividing each cluster into two smaller clusters at each iteration
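
The top-down splitting described on these two slides can be sketched as below. This is a simplified variant under stated assumptions: each split uses 2-means with a deterministic initialisation, and only the cluster with the largest spread is split at each step. The presentation does not specify the actual algorithm, distance or features, so the function names and the two-feature data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def spread(points):
    """Sum of squared distances to the cluster centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)


def bisect(points, n_iter=20):
    """Split one cluster in two with 2-means; needs at least two distinct points."""
    c1 = points[0]
    c2 = max(points, key=lambda p: dist(p, c1))  # farthest point from the first one
    left, right = None, None
    for _ in range(n_iter):
        new_left = [p for p in points if dist(p, c1) <= dist(p, c2)]
        new_right = [p for p in points if dist(p, c1) > dist(p, c2)]
        if not new_left or not new_right or (new_left, new_right) == (left, right):
            break  # keep the previous assignment
        left, right = new_left, new_right
        c1, c2 = centroid(left), centroid(right)
    return left, right


def divisive_clustering(points, n_clusters):
    """Repeatedly split the most spread-out cluster until n_clusters are obtained."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        target = max(clusters, key=spread)
        if len(set(target)) < 2:
            break  # nothing left to split
        clusters.remove(target)
        clusters.extend(bisect(target))
    return clusters


# Hypothetical patients described by two standardized characteristics
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (10.0, 0.0), (10.2, 0.1), (9.9, -0.1),
]
groups = divisive_clustering(points, n_clusters=3)
for g in groups:
    print(g)
```

Stopping the splitting at a chosen number of clusters plays the role of cutting the hierarchy at a given level.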

  27. Heterogeneity • 1st step: grouping patients by their characteristics
      • Preliminary results: patients divided into 3 groups, with nearly 0% deaths by TBI in the 1st group, 30% in the 2nd group and 45% in the 3rd group

  28. Heterogeneity • 1st step: grouping patients by their characteristics
      • Besides the type of TBI, it is the lactate level, blood pressure, external AIS and hemorrhagic shock that mostly drive allocation to the 3rd group
