Causal Inference



  1. Causal Inference of Intermediate Phenotypes and Biomarkers in Rheumatoid Arthritis [An Application of Machine Learning Techniques to Genetic Epidemiology] Wentian Li (李问天), Ph.D., Feinstein Institute for Medical Research, North Shore LIJ Health System

  2. Genetic Association • Association is not equivalent to a causal relationship • An association between wrinkles and cancer risk does not mean that one causes the other • Age is a confounding factor
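The wrinkle-cancer point can be illustrated with a small simulation. This is only a sketch on a hypothetical toy population: the probabilities below are invented for illustration and are not taken from the talk. Age drives both wrinkles and cancer, and wrinkles have no effect on cancer at all.

```python
import random

def simulate(n=20000, seed=0):
    """Hypothetical toy population: age drives both wrinkles and cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = rng.random() < 0.5
        # Illustrative probabilities only; wrinkles never influence cancer.
        wrinkles = rng.random() < (0.8 if old else 0.1)
        cancer = rng.random() < (0.3 if old else 0.05)
        rows.append((old, wrinkles, cancer))
    return rows

def risk_difference(rows):
    """P(cancer | wrinkles) - P(cancer | no wrinkles)."""
    with_w = [c for _, w, c in rows if w]
    without_w = [c for _, w, c in rows if not w]
    return sum(with_w) / len(with_w) - sum(without_w) / len(without_w)

rows = simulate()
overall = risk_difference(rows)
within_old = risk_difference([r for r in rows if r[0]])
within_young = risk_difference([r for r in rows if not r[0]])
print(f"overall: {overall:.3f}, old: {within_old:.3f}, young: {within_young:.3f}")
```

In this toy setting the overall risk difference is clearly positive, while the difference within each age group is close to zero: the association comes entirely from the confounder.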

  3. When do we need to know cause and effect? • Rarely discussed in genetic analysis, because genotype is always the cause and phenotype is always the effect • In epidemiology, a factor-disease association can arise in three situations: (1) the factor is a cause; (2) reverse causality; (3) a third, confounding factor • For two intermediate phenotypes (biomarkers), the causal arrow can point either way

  4. Causal Inference in Machine Learning • Large text databases (e.g. Google) • Observational data (no controlled experiment, and no other approach to determine causality) • A two-point association indeed cannot be used to claim causality • The key is a third variable, together with conditional association given that third variable

  7. Data Mining and Knowledge Discovery (2000), vol. 4, pp. 163-192

  8. An Example

  9. Cooper’s Local Causality Discovery (LCD) Rule • Six assumptions: (1) database completeness; (2) discrete variables; (3) Bayesian network model (directed acyclic graph: no loops); (4) …; (5) no selection bias; (6) valid statistical testing • Three variables: x, y, z • A hidden variable is allowed (but it is not in the dataset) • Determine three correlations: the unconditional C(x,y) and C(y,z), and the conditional C(x,z|y)

  10. Between two variables, there are only 6 (4) causal relationships (allowing a confounding variable): no relationship; causing; reverse causing; confounding; confounding plus causing; confounding plus reverse causing [Figure: diagrams of the six relationships, two of which are marked “NO”]

  11. Number of causal relationships among three variables • 6 x 6 x 6 = 216 possibilities • 4 x 4 x 6 = 96 if x is not caused by either y or z (but can receive an arrow from a hidden variable) [Cooper ’97 paper] • 2 x 2 x 6 = 24 if x does not even receive an arrow from hidden confounding variables [Li and Wang, unpublished]
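The three counts can be reproduced by enumeration. The sketch below assumes one reading of the slides: each pair of variables takes one of six relationship labels (the label names are made up here), the restriction "x is not caused by y or z" removes the two "reverse" labels for the pairs involving x, and excluding hidden confounders of x removes the "confounded" labels as well. The (y, z) pair always keeps all six.

```python
from itertools import product

# Hypothetical labels for the six pairwise relationships.
PAIR_RELATIONS = [
    "none", "causes", "reverse", "confounded",
    "confounded+causes", "confounded+reverse",
]

def count_models(x_pair_options, yz_pair_options=PAIR_RELATIONS):
    """Count combinations of labels for the pairs (x,y), (x,z), (y,z)."""
    combos = product(x_pair_options, x_pair_options, yz_pair_options)
    return sum(1 for _ in combos)

unrestricted = PAIR_RELATIONS
# x is never an effect of y or z: drop every label with a reverse arrow.
x_not_effect = [r for r in PAIR_RELATIONS if "reverse" not in r]
# x also receives no arrow from a hidden confounder.
x_exogenous = [r for r in x_not_effect if "confounded" not in r]

print(count_models(unrestricted), count_models(x_not_effect), count_models(x_exogenous))
# 216 96 24
```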

  12. Given a causal model… • Unconditional association between any two variables can be determined by whether they are connected by a path • Conditional association can be determined by the so-called “d-separation” rule
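A d-separation check can be sketched in a few lines. The version below uses the standard moralization criterion (restrict to ancestors, connect co-parents, drop arrow directions, delete the conditioning set, then test reachability); it is not taken from the talk, and the graph encoding (a dict mapping each node to its list of parents) is an assumption made for this example.

```python
from collections import deque

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, a, b, given=()):
    """True if a and b are d-separated by `given` in the DAG `parents`."""
    given = set(given)
    keep = ancestors(parents, {a, b} | given)
    # Moralize the ancestral subgraph: undirected parent-child edges,
    # plus an edge between every pair of parents sharing a child.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Breadth-first search from a to b that never enters a conditioned node.
    queue, seen = deque([a]), {a}
    while queue:
        node = queue.popleft()
        if node == b:
            return False
        for nxt in adj[node]:
            if nxt not in seen and nxt not in given:
                seen.add(nxt)
                queue.append(nxt)
    return True

chain = {"y": ["x"], "z": ["y"]}            # x => y => z
confounded = {"y": ["x", "h"], "z": ["h"]}  # x => y <= h => z
print(d_separated(chain, "x", "z"), d_separated(chain, "x", "z", ["y"]))
print(d_separated(confounded, "x", "z"), d_separated(confounded, "x", "z", ["y"]))
```

In the chain, x and z are associated but become independent given y. In the second graph, where y and z share only a hidden cause h, the pattern is reversed: x and z are independent unconditionally and become associated once y is conditioned on.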

  13. “CCC” causal inference rule • Cooper version: if C(x,y)+, C(y,z)+, but C(x,z|y)-, then there are only three possible causal models (h is a hidden variable): x => y => z; x <= h => y => z; h => x => y => z • Silverstein et al. version: if C(x,y)+, C(y,z)+, C(x,z)+, but C(x,z|y)-, C(x,y|z)+, C(y,z|x)+, then...

  14. In a three-way correlated set: IF one of the variables (x) is not an effect (only a cause) AND the correlation between x and z is lost conditional on y, THEN y causes z • x: gene; y, z: two intermediate phenotypes
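The rule can be sketched as a test on binary data. Everything below is an illustrative assumption rather than the analysis used in the talk: the function names are hypothetical, dependence is judged with a Pearson chi-square test at the 5% level, the conditional test sums the statistic over the two strata of y, and the two count tables are made up (one generated by x => y => z, one in which x causes y and z separately).

```python
from collections import Counter

CHI2_CRIT = {1: 3.841, 2: 5.991}  # 5% critical values of the chi-square distribution

def chi2_2x2(pairs):
    """Pearson chi-square statistic for a list of binary (a, b) pairs."""
    n = len(pairs)
    counts = Counter(pairs)
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            row = sum(counts[(a, k)] for k in (0, 1))
            col = sum(counts[(k, b)] for k in (0, 1))
            expected = row * col / n if n else 0.0
            if expected > 0:
                stat += (counts[(a, b)] - expected) ** 2 / expected
    return stat

def dependent(data, i, j, given=None):
    """Test columns i and j for association, optionally within strata of `given`."""
    if given is None:
        return chi2_2x2([(r[i], r[j]) for r in data]) > CHI2_CRIT[1]
    # Sum the per-stratum statistics: two strata, one degree of freedom each.
    stat = sum(chi2_2x2([(r[i], r[j]) for r in data if r[given] == v]) for v in (0, 1))
    return stat > CHI2_CRIT[2]

def lcd_rule(data):
    """C(x,y)+ and C(y,z)+ and C(x,z|y)-, with x assumed not to be an effect."""
    X, Y, Z = 0, 1, 2
    return (dependent(data, X, Y) and dependent(data, Y, Z)
            and not dependent(data, X, Z, given=Y))

def expand(table):
    """Turn {(x, y, z): count} into a list of rows."""
    return [row for row, n in table.items() for _ in range(n)]

# Made-up counts. chain: x => y => z. fork: x => y and x => z, no y-z arrow.
chain = expand({(0, 0, 0): 320, (0, 0, 1): 80, (0, 1, 0): 20, (0, 1, 1): 80,
                (1, 0, 0): 80, (1, 0, 1): 20, (1, 1, 0): 80, (1, 1, 1): 320})
fork = expand({(0, 0, 0): 320, (0, 0, 1): 80, (0, 1, 0): 80, (0, 1, 1): 20,
               (1, 0, 0): 20, (1, 0, 1): 80, (1, 1, 0): 80, (1, 1, 1): 320})
print(lcd_rule(chain), lcd_rule(fork))  # True False
```

Note that "not dependent" here only means the test failed to reject independence, so with small samples the rule can fire spuriously.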

  15. The use of a not-an-effect variable has an amazing parallel in epidemiology • Called an “instrumental variable” • Martijn Katan’s idea on the cholesterol-cancer association: he proposed using a genotype (apolipoprotein E) as the third variable (Lancet 1986, i:507-508) • Katan did not use conditional correlation • This idea is now called “Mendelian randomization”

  17. Rheumatoid Arthritis (RA) • An autoimmune disease • Chronic inflammation of the joints • Three times more likely to occur in women than in men • Age of onset: 40-60 • Twin concordance rates: 12-15% for monozygotic (MZ) twins, 5% for dizygotic (DZ) twins • Genetic and environmental (e.g. smoking) risk factors

  18. MHC/HLA: the main genetic contribution to RA • MHC (major histocompatibility complex) or HLA (human leukocyte antigen): the HLA-DRB1 gene on chromosome 6 (6p21.3) • The RA-associated alleles in Caucasians are HLA-DRB1*0401, *0404, *0408, but not *0402, *0403, *0407 • In Asian populations, different DRB1 alleles are associated with RA (e.g. *0405, *0901) • A group of DRB1 risk alleles is called the “shared epitope” (SE), or rheumatoid epitope; it encodes amino acid positions 70-74 in the third hypervariable region

  19. Two autoantibodies are strongly associated with RA: RF and anti-CCP • RF (rheumatoid factor): 80% of RA patients are RF positive • anti-CCP (anti-cyclic citrullinated peptide antibody): an even better predictor of RA at an early stage • HLA-DRB1, RF, and anti-CCP are all associated with RA, and they are associated with each other, so the CCC rule can be applied! • Reference: Zhang Lifang, Yan Yougong, Huang Qianchuan, et al., “Application of anti-cyclic citrullinated peptide antibody in the diagnosis of rheumatoid arthritis” (in Chinese), Immunological Journal (免疫学杂志), 2004, 20:52-57

  20. Q: Between RF and anti-CCP, which one is the cause and which is the effect?

  21. [Table: 1723 Caucasian RA patients, split into anti-CCP positive and anti-CCP negative]

  22. The association between RF and DRB1 genotype is lost conditional on anti-CCP

  23. By the CCC rule, anti-CCP is the cause and RF is the effect. Or: anti-CCP is upstream and RF is downstream in a pathway

  24. Discussions/Issues • There is evidence that RA patients become anti-CCP positive before becoming RF positive • The three-way correlation might be lost in normal controls (here we have a “case-only” analysis) • Between anti-CCP and RF, other factors are possible (so the cause-effect relationship may not be direct) • It is not clear where the smoking factor comes in (an analysis with smoking data could be intriguing!)

  25. Revisit Katan’s “Mendelian Randomization” (MR) by LCD [Wang, Li, unpublished] • MR: needs a not-an-effect variable (gene); conditional association is not used; only needs a counterexample (e.g. Apo E2 samples have low cholesterol, but NOT high cancer risk) • LCD: needs a variable that is not an effect; conditional association is used; needs complete information on the (G, IP, D) trio for all samples (e.g. Apo genotype, cholesterol level, cancer status)

  26. Co-Authors • Mingyi Wang (Zhejiang University, Computer Science Department; causal inference) • Patricia Irigoyen, Peter Gregersen (North Shore LIJ; RA data)
