1 / 151

DAGs intro with exercises 8h (reordered 4.5+3.5) DAG=Directed Acyclic Graph

DAGs intro with exercises 8h (reordered 4.5+3.5) DAG=Directed Acyclic Graph. Hein Stigum http://folk.uio.no/heins/ courses. Nov-15. Nov-15. Nov-15. Nov-15. Nov-15. H.S. H.S. H.S. H.S. 1. 1. 1. 1. 1. Agenda. Background DAG concepts Causal thinking, Paths Analyzing DAGs

Thomas
Download Presentation

DAGs intro with exercises 8h (reordered 4.5+3.5) DAG=Directed Acyclic Graph

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DAGs intro with exercises 8h(reordered 4.5+3.5)DAG=Directed Acyclic Graph Hein Stigum http://folk.uio.no/heins/ courses Nov-15 Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. H.S. 1 1 1 1 1

  2. Agenda • Background • DAG concepts • Causal thinking, Paths • Analyzing DAGs • Examples • DAGs and stat/epi phenomena • Mediation, Matching, Mendelian randomization, Selection • More on DAGs • Limitations, problems Exercises Nov-15 Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. H.S. 2 2 2 2 2

  3. Background • Potential outcomes: Neyman, 1923 • Causal path diagrams: Wright, 1920 • Causal DAGs: Pearl, 2000 H.S.

  4. Why causal graphs? • Estimate effect of exposure on disease* • Problem • Association measures are biased • Causal graphs help: • Understanding • Confounding, mediation, selection bias • Analysis • Adjust or not • Discussion • Precise statement of prior assumptions *DAGs not applicable for prediction models H.S.

  5. Causal versus casual CONCEPTS (Rothman et al. 2008; Veieroed et al. 2012 H.S.

  6. god-DAG Causal Graph: Node = variable Arrow = cause E=exposure, D=disease DAG=Directed Acyclic Graph Read of the DAG: Causality = arrow Association = path Independency = no path Estimations: E-D association has two parts: ED causal effect keep open ECUD bias try to close Conditioning (Adjusting): E[C]UD Time  Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. 6 6 6

  7. Association and Cause 3 possible causal structure Association (reverse cause) E D + more complicated structures Nov-15 Nov-15 H.S. H.S. H.S. 7 7

  8. A confounder induces an association between its effects Conditioning on a confounder removes the association Condition = (restrict, stratify, adjust) Paths Simplest form Causal confounding, (exception: see outcome dependent selection) Confounder idea + A common cause Adjust for smoking Smoking Smoking + + + + Yellow fingers Lung cancer Yellow fingers Lung cancer Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. 8 8 8 8

  9. Conditioning on a collider induces an association between its causes “And” and “or” selection leads to different bias Paths Simplest form Collider idea • or • + and Two causes for selection to study Selected subjects Selected Selected + + + + Yellow fingers Lung cancer Yellow fingers Lung cancer Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. 9 9 9 9

  10. Mediator • Have found a cause (E) M effect indirect • How does it work? • Mediator (M) • Paths direct effect D E Use ordinary regression methods if: linear model and no E-M interaction. Otherwise,need new methods H.S.

  11. Causal thinking in analyses H.S.

  12. Regression before DAGs Use statistical criteria for variable selection Risk factors for D: Report all variables in the model as equals Association Both can be misleading! C E D H.S.

  13. Statistical criteria for variable selection - Want the effect of E on D(E precedes D) - Observe the two associations C-E and C-D - Assume statistical criteria dictates adjusting for C (likelihood ratio, Akaike (赤池 弘次) or 10% change in estimate) C E D The undirected graph above is compatible with three DAGs: C C C E D E D E D Confounder 1. Adjust Mediator 2. Direct: adjust 3. Total: not adjust Collider 4. Not adjust Conclusion: The data driven method is correct in 2 out of 4 situations Need information from outside the data to do a proper analysis DAGs variable selection: close all non-causal paths H.S.

  14. Reporting variable as equals: Association versus causation Use statistical criteria for variable selection Risk factors for D: Report all variables in the model as equals Association Causation Base adjustments on a DAG C C Report only the E-effect or use different models for each variable E D E D Symmetrical Directional C is a confounder for E-D C is a confounder for ED E is a confounder for C-D E is a mediator for CD Westreich& Greenland 2013 H.S.

  15. Exercise: report variables as equals? Risk factors for Fractures Interpret as effect of: Diabetes adjusted for all other vars. Phy. act. adjusted for all other vars. O B P D E Obesity adjusted for all other vars. fractures diabetes 2 physical activity bone density obesity Bone d. adjusted for all other vars. P is a confounder for E→D, but is E a confounder for P→D? Which effects are reported correctly in the table? 5 min H.S.

  16. Exercise: Stratify or not Want the effect of action(A, exposure/treatment) on disease (D). Have stratified on C. Make a guess at the population effect of A on D Calculate the population effect of A on D What is the correct analysis (and RR)? OBS several answers possible! Population = crude Stratified = adjusted for C C A D 10 min Hernan et al. 2011 H.S.

  17. Summing up so far Nov-15 Associations visible in data. Causation from outside the data. Data driven analyses do not work. Need causal information from outside the data. Reporting table of adjusted associations is misleading. Simpson’s paradox: causal information resolves the paradox. Nov-15 H.S. H.S. H.S. 17 17

  18. Paths The Path of the Righteous Nov-15 Nov-15 H.S. H.S. H.S. 18 18

  19. Path definitions Path: any trail from E to D(without repeating itself) Type: causal, non-causal State: open, closed Four paths: Goal: Keep causal paths of interest open Close all non-causal paths Nov-15 H.S. H.S. 19

  20. Four rules 1. Causal path: ED (all arrows in the same direction) otherwise non-causal Before conditioning: 2. Closed path: K (closed at a collider, otherwise open) Conditioning on: 3. a non-collider closes: [M] or [C] 4. a collider opens: [K] (or a descendant of a collider) H.S.

  21. ANALYZING DAGs H.S.

  22. Confounding examples Nov-15 Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. H.S. 22 22 22 22 22

  23. Vitamin and birth defects Is there a bias in the crude E-D effect? Should we adjust for C? What happens if age also has a direct effect on D? Bias This is an example of confounding Question: Is U a confounder? No bias Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. 23 23 23

  24. Exercise: Physical activity and Coronary Heart Disease (CHD) • We want the total effect of Physical Activity on CHD. • Write down the paths. • Are they causal/non-causal, open/closed? • What should we adjust for? 5 minutes Nov-15 Nov-15 H.S. H.S. 24 24

  25. Intermediate variables Direct and indirect effects Nov-15 Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. H.S. 25 25 25 25 25

  26. Exercise: Tea and depression • Write down the paths. • You want the total effect of tea on depression. What would you adjust for? • You want the direct effect of tea on depression. What would you adjust for? • Is caffeine an intermediate variable or a variable on a confounder path? 10 minutes Hintikka et al. 2005 H.S.

  27. Exercise: Statin and CHD D U C E lifestyle CHD statin cholesterol 10 minutes Nov-15 • Write down the paths. • You want the total effect of statin on CHD. What would you adjust for? • If lifestyle is unmeasured, can we estimate the direct effect of statin on CHD (not mediated through cholesterol)? • Is cholesterol an intermediate variable or a collider? H.S. H.S. 27

  28. Direct and indirect effects So far: Controlled (in)direct effect Causal interpretation: linear model and no E-M interaction New concept: Natural (in)direct effect Causal interpretation also for: non-linear model ,E-M interaction Controlled effect = Natural effect if linear model andno E-M interaction Hafemanand Schwartz 2009; Lange and Hansen 2011; Pearl 2012; Robins and Greenland 1992; VanderWeele2009, 2014 H.S.

  29. Mixed Confounder, collider and mediator Nov-15 Nov-15 Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. H.S. H.S. 29 29 29 29 29

  30. Diabetes and Fractures We want the total effect of Diabetes (type 2) on fractures Questions: Paths ←→? More paths? B a collider? V and P ind? Mediators Confounders H.S.

  31. Drawing DAGs Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. 31 31 31

  32. Technical note on drawing DAGs • Drawing tools in Word (Add>Figure) • Use Dia • Use DAGitty • Hand-drawn figure. H.S.

  33. Direction of arrow C Smoking Does physical activity reduce smoking, or does smoking reduce physical activity? ? E Phys. Act. D Diabetes 2 H Health con. C Smoking Maybe an other variable (health consciousness) is causing both? E Phys. Act. D Diabetes 2 H.S.

  34. Time C Smoking Does physical activity reduce smoking, or does smoking reduce physical activity? ? E Phys. Act. D Diabetes 2 C Smoking -5 Smoking measured 5 years ago Physical activity measured 1 year ago E Phys. Act. -1 D Diabetes 2 Nov-15 H.S. H.S. 34

  35. Drawing a causal DAG Start: E and D 1 exposure, 1 disease add: [S] variables conditioned by design add: C-s all common causes of 2 or more variables in the DAG C C must be included common cause V may be excluded exogenous M may be excluded mediator K may be excluded unless we condition V E D M K H.S.

  36. Exercise: Drawing survivor bias 10 minutes • We what to study the effect of exposure early in life (E) on disease (D) later in life. • Exposure (E) decreases survival (S) in the period before D (deaths from other causes than D). • A risk factor (R) reduces survival (S) in the period before D. • The risk factor (R) increases disease (D). • Only survivors are available for analysis (look at Collider idea). Draw and analyze the DAG Nov-15 H.S. H.S. 36

  37. Real world examples Nov-15 Nov-15 Nov-15 H.S. H.S. H.S. 37 37 37

  38. Endurance training and Atrial fibrillation Tobacco Alcohol consumption Cardiovascular factors * Socioeconomic Status ** BMI Diabetes Atrial fibrillation Endurance training Genetic disposition Hyperhyreosis Health *** consciousness Height Gender Age Long-distance racing *Hypertension, heart disease, high cholesterol ** Socioeconomic status: Education, marital status *** Unmeasured factors (Blue: Mediators, red: confounders, violet: colliders) Several arrows missing! Myrstad et al. 2014b

  39. Methods to remove confounding H.S.

  40. Methods to remove confounding Method Action DAG effect C Condition: Restrict, Stratify, Adjust Close path E D C Cohort matching, Propensity Score InverseProbabilityTreatmentWeighting Remove CE arrow E D C Case-Control matching? Other methods? Remove CD arrow E D H.S.

  41. Matching:Cohort vs Case-Control H.S.

  42. Matching in cohort, binary E Matching: For every exposed person with a value of C, find an unexposed person with the same value of C C S E D • selected based on E and C • E independent of C after matching  All open paths between C and E must C C→E C→[S]←E sum to “null” E D Unfaithful DAG Cohort matching removes confounding Cohort matching is not common, except in propensity score matching Mansourniaet al. 2013; Shahar and Shahar2012 H.S.

  43. Matching in Case Control, binary D Matching: For every case with a value of C, find a control with the same value of C C S E D • selected based on D and C • D independent of C after matching  All open paths between C and D must C S sum to “null” E D Case-Control matching does not removes confounding, unless E→D=0 (or C→E=0)  must adjust for C in all analyses Case-Control matching common, may improve precision (Mansournia et al. 2013; Shahar and Shahar2012 H.S.

  44. Inverse probability weighting H.S.

  45. Marcov decomposition • DAG implies: Joint probability can be factorized into the product of conditional distributions of each variable given its parents C E D Pearl 2000 H.S.

  46. Inverse Probability Weighting C Have observed data distribution: Factorization: E D C Want the RCT distribution: Factorization: E D Can reweight the observed data with weights: to obtain the RCT distribution IPW knocks out arrows in the DAG H.S.

  47. Odds of Treatment weighting (OT) Previous weighting targeted the Average Causal Effect Want now the effect among the exposed (treated) instead • is the probability of treatment Can reweight the observed data with weights: to obtain the RCT distribution among the exposed • =1 if E=1 • = • =OT if E=0 McCaffrey et al. 2004 H.S.

  48. Marginal Structural Model C DAG for the reweighted pseudo data E D MSM: The expected value of a counterfactual outcome D under a hypothetical exposure e: effect Veieroed, Lydersen et al. 2012. Daniel, Cousens et al. 2013. Rothman, Greenland et al. 2008 HS

  49. Outcome versus exposure modeling H.S.

  50. Methods to remove confounding How can we remove confounding? C B A What variables should be involved? C- A- B- • Close ECD by conditioning • (ordinary) regression model yes no E D maybe outcome modeling E(D| E,C) • Remove the EC arrow, binary E • Propensity score matching • Inverse probability weighting exposure modeling P(E|C) Combine exposure and outcome modeling: doubly robust models

More Related