Explore the integration of gene expression data and genotype data for identifying causal genes of Chronic Fatigue Syndrome. Use likelihood-based model selection and biological interpretation for insights.
Integration of Expression Data and Genotype Data: Application of Chronic Fatigue Syndrome Data • EunJee Lee (1), Seoae Cho (1), Taesung Park (2) • (1) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea • (2) Department of Statistics, Seoul National University, Seoul, Korea
Contents • INTRODUCTION: Need for integration of data • METHOD: Integration of gene expression data and genotype data; Likelihood-based model selection; Test for identifying causal genes of disease; Biological interpretation • RESULT: Application to Chronic Fatigue Syndrome data • SUMMARY AND CONCLUSION
Introduction • Central dogma and the data that new technology yields at each level • DNA: genotype data (SNP polymorphism), from SNP data analysis • mRNA: gene expression data, from DNA microarray analysis • Protein: protein expression data • Phenotype: disease
Introduction • Analysis of the association between gene expression data and disease in the Chronic Fatigue Syndrome data • Tests: one-way ANOVA, CyberT, SAM • Result: no significant results • Limitation • Chronic Fatigue Syndrome is a complex disease • One type of data may represent only partial information about a disease • Combining both types of data should therefore be useful
Introduction • Integration of expression and genotype data (DNA → mRNA → Protein → Phenotype) • Question: what is the causal role of the gene expression level? • Need to identify the causal relationships
Integration of Expression and Genotype Data • STEP 1 Causality Model Selection: Causal Model, Reactive Model, Independent Model • STEP 2 Test: Logistic Regression (causal), Two-Way ANOVA (reactive), Logistic Regression and ANOVA (independent) • STEP 3 Biological Interpretation: Gene Ontology Enrichment, Pathway Enrichment
STEP 1 Model Selection • Models for causality (Schadt et al. 2005) • Causal Model: SNP → mRNA → Disease • Reactive Model: SNP → Disease → mRNA • Independent Model: SNP → mRNA and SNP → Disease
STEP 1 Model Selection • Likelihood-based Causality Model Selection (LCMS) (Schadt et al. 2005) • LCMS is a test that uses conditional correlation measures to determine which relationship among traits is best supported by the data. • The likelihood associated with each model is constructed and maximized with respect to the model parameters, and the model with the smallest AIC value is identified as the model best supported by the data.
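The AIC comparison described above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names, the maximized log-likelihoods, and the parameter counts are all hypothetical values chosen only to show the selection rule.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L_max)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(fits):
    """Pick the model with the smallest AIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    Returns the best model name and the AIC of every model.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one SNP / transcript / disease triple (made-up numbers).
fits = {
    "causal": (-120.4, 5),
    "reactive": (-123.9, 5),
    "independent": (-119.8, 7),
}
best, scores = select_model(fits)
print(best, scores)
```

In this made-up example the independent model has the highest likelihood, but its two extra parameters cost more than they gain, so the causal model has the smallest AIC.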
STEP 1 Model Selection • Causal Model (SNP → mRNA → Disease) • Variables: SNP genotype, mRNA level, disease • Joint probability • Likelihood
STEP 1 Model Selection • Reactive Model (SNP → Disease → mRNA) • Variables: SNP genotype, mRNA level, disease • Joint probability • Likelihood
STEP 1 Model Selection • Independent Model (SNP → mRNA and SNP → Disease) • Variables: SNP genotype, mRNA level, disease • Joint probability • Likelihood
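The joint probabilities for the three models can be written out as below. The notation is an assumption borrowed from the usual presentation of LCMS, not necessarily the symbols on the slides: $L$ is the SNP genotype, $R$ the mRNA level, $C$ the disease status, $\theta_m$ the parameters of model $m$, and $k_m$ their number. The independent model is shown in its simplest conditional-independence form; the published formulation may additionally allow residual correlation between $R$ and $C$, which this sketch omits.

```latex
\begin{align}
\text{Causal:}      \quad P(L,R,C) &= P(L)\,P(R \mid L)\,P(C \mid R) \\
\text{Reactive:}    \quad P(L,R,C) &= P(L)\,P(C \mid L)\,P(R \mid C) \\
\text{Independent:} \quad P(L,R,C) &= P(L)\,P(R \mid L)\,P(C \mid L) \\
\text{Likelihood:}  \quad \mathcal{L}(\theta_m) &= \prod_{i=1}^{n} P(L_i, R_i, C_i \mid \theta_m),
\qquad \mathrm{AIC}_m = 2k_m - 2\log \mathcal{L}(\hat{\theta}_m)
\end{align}
```

Each model's likelihood is the product of its joint probability over the $n$ subjects, and the model with the smallest $\mathrm{AIC}_m$ is selected.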
STEP 2 Identify causal genes for a disease • Causal Model (SNP → mRNA → Disease) • Logistic Regression • Model terms: the probability of getting the disease (response); the genotype of one SNP; the gene expression value from the DNA microarray; the interaction effect between SNP genotype and gene expression
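One plausible form of this logistic model is sketched below. The symbols are assumptions, not the presenters' notation: $p_i$ is the probability that subject $i$ has the disease, $G_i$ the genotype of one SNP, $E_i$ the expression value of one gene, and $\beta_3$ the SNP-by-expression interaction effect.

```latex
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 G_i + \beta_2 E_i + \beta_3 G_i E_i
```

Under this form, a gene assigned to the causal model would be tested through the expression and interaction coefficients; how the genotype is coded (additive, dominant, or categorical) is not stated in the slides.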
STEP 2 Identify causal genes for a disease • Reactive Model (SNP → Disease → mRNA) • Two-Way ANOVA • Model terms: expression level (response); the effect of the SNP genotype; the effect of the disease group; the interaction effect
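A standard two-way ANOVA with interaction matches the terms listed here. The symbols are assumed for illustration: $y_{ijk}$ is the expression level of subject $k$ with genotype $i$ in disease group $j$, $\alpha_i$ the genotype effect, $\gamma_j$ the disease-group effect, and $(\alpha\gamma)_{ij}$ their interaction.

```latex
y_{ijk} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

Here the expression level is the response, which reflects the reactive ordering in which disease status acts on expression rather than the reverse.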
STEP 2 Identify causal genes for a disease • Independent Model (SNP → mRNA and SNP → Disease) • Each test uses one type of data • Logistic Regression • Model terms: SNP genotype (predictor); probability of getting the disease (response) • Detects SNPs in linkage with disease loci • One-Way ANOVA • Model terms: gene expression level (response); SNP genotype (factor) • Identifies SNPs that regulate gene expression level
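The one-way ANOVA part of this step can be sketched in plain Python. This is only an illustration: the function name and the toy expression values are hypothetical, and the sketch stops at the F statistic (a p-value would additionally need the F distribution, for example from a statistics library).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy expression values of one gene, grouped by SNP genotype (made-up data).
expression_by_genotype = {"AA": [1.0, 2.0, 3.0], "AG/GG": [2.0, 3.0, 4.0]}
f_stat, df_b, df_w = one_way_anova_f(list(expression_by_genotype.values()))
print(f_stat, df_b, df_w)
```

Running this per gene with the genotype as the grouping factor gives one F statistic per gene, which would then feed into the multiple-testing correction.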
STEP 3 Biological Interpretation • Enrichment Study of Gene Ontology • Enrichment Study of Pathway
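Enrichment studies of this kind are commonly based on a hypergeometric (one-sided Fisher) test. The sketch below assumes that approach; the slides do not state which test the enrichment tools applied, and the function and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_total, n_in_term, n_selected).

    n_total:    genes on the array (the background)
    n_in_term:  background genes annotated to the GO term or pathway
    n_selected: significant genes from STEP 2
    n_overlap:  significant genes that are annotated to the term
    """
    upper = min(n_in_term, n_selected)
    denom = comb(n_total, n_selected)
    tail = sum(
        comb(n_in_term, i) * comb(n_total - n_in_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / denom


# Toy example: 10 genes, 4 in the term, 3 selected, 2 of them in the term.
p = enrichment_p_value(10, 4, 3, 2)
print(p)
```

A small p-value means the term contains more significant genes than expected by chance; with many terms tested, these p-values need multiple-testing correction as well.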
Application to Chronic Fatigue Syndrome Data • Expression data for the Chronic Fatigue Syndrome study • mRNA levels in mononuclear cells • Expression levels of 20,160 genes
Application to Chronic Fatigue Syndrome Data • Pre-processing of gene expression data • Filtering • Quantile normalization • Significance level • FDR control using the Benjamini and Hochberg method (Benjamini, Y. and Hochberg, Y. 1995) for multiple-testing correction • 5% FDR
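Two of the steps named above, quantile normalization and Benjamini-Hochberg FDR control, can be sketched in plain Python. Both functions are simplified illustrations with hypothetical names: the normalization ignores ties, and the data layout (one list of expression values per array) is an assumption.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution: the mean of the sorted values.

    samples is a list of arrays, each a list of expression values for the same
    genes in the same order. Ties are not averaged (simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, fdr=0.05)
print(normalized, rejected)
```

With the 5% FDR used in the slides, only the two smallest toy p-values pass, because the third (0.039) already exceeds its threshold of 3/10 × 0.05 = 0.015 and no later p-value falls under its own threshold.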
Application to Chronic Fatigue Syndrome Data • CFS vs. Nonfatigued Groups • STEP 1. Model Selection • STEP 2. Test for Identifying Causal Genes • STEP 3. Biological Interpretation • CFS-MDDm vs. Nonfatigued Groups • STEP 1. Model Selection • STEP 2. Test for Identifying Causal Genes • STEP 3. Biological Interpretation
Application to CFS and Nonfatigued Groups • STEP 1. Model Selection • Test applied after selection: Logistic Regression (causal), Two-Way ANOVA (reactive), Independent Test (independent)
Application to CFS and Nonfatigued Groups • STEP 2. Test for identifying key driver genes • rs258750 in NR3C1 gene • The Independent Model gives the significant results
Application to CFS and Nonfatigued Groups • Gene expression level and genotype variation (SNP → mRNA) • rs258750 regulates the expression level of 166 genes • Figure: expression by genotype group, rs258750 AA vs. rs258750 AG/GG
Application to CFS and Nonfatigued Groups • Evidence of neuroendocrine regulation of immunity (Webster and Tonelli, Annu. Rev. Immunol. 2002)
Application to CFS and Nonfatigued Groups • STEP 2. Test for identifying key driver genes • rs2918419 in NR3C1 gene • The Independent Model is selected in most cases
Application to CFS and Nonfatigued Groups • SNPs in the NR3C1 gene other than rs258750 • Gene expression level and genotype variation (SNP → mRNA?): no significant results • Genotype variation and disease (SNP → Disease): six SNPs in the NR3C1 gene are significant
Application to CFS and Nonfatigued Groups • NR3C1 (Chr 5) SNPs associated with CFS: rs258750, rs6188, rs852977, rs860458, rs2918419, rs1866388 • NR3C1 encodes the glucocorticoid receptor • Regulates glucocorticoid levels in blood • The level of glucocorticoid in the hypothalamic-pituitary-adrenal (HPA) axis has a significant effect on fatigue (Chaudhuri and Behan, The Lancet, 2004)
Application to CFS-MDDm and Nonfatigued Groups • STEP 1. Model Selection • Test applied after selection: Logistic Regression (causal), Two-Way ANOVA (reactive), Logistic Regression and ANOVA (independent)
Application to CFS-MDDm and Nonfatigued Groups • STEP 2. Test for identifying key driver genes • rs933271 in COMT gene • Independent Model
Application to CFS-MDDm and Nonfatigued Groups • rs933271 and rs5993882 in COMT gene • Independent Model • Genotype variation and gene expression level (SNP → mRNA?): no significant result • Genotype variation and disease (SNP → Disease)
Application to CFS-MDDm and Nonfatigued Groups • STEP 2. Test for identifying key driver genes • rs6188 in NR3C1 gene • Genes under the Reactive Model give many significant results (234)
Application to CFS-MDDm and Nonfatigued Groups • STEP 3. Biological Interpretation • Gene Ontology enrichment study of the test results (GOstats)
Application to CFS-MDDm and Nonfatigued Groups • Pathway enrichment study (pathways from BioCarta) • Agrin in Postsynaptic Differentiation
Application to CFS-MDDm and Nonfatigued Groups • Eicosanoid metabolism - Eicosapentaenoic acid-rich essential fatty acid supplementation in chronic fatigue syndrome associated with symptom remission and structural brain changes. Int J Clin Pract. 2004 Mar;58(3):297-9. - The use of eicosapentaenoic acid in the treatment of chronic fatigue syndrome. Prostaglandins Leukot Essent Fatty Acids. 2004 Apr;70(4):399-401. Review. - Determination of fatty acid levels in erythrocyte membranes of patients with chronic fatigue syndrome. Nutr Neurosci. 2003 Dec;6(6):389-92. - Eicosanoids and essential fatty acid modulation in chronic disease and the chronic fatigue syndrome. Med Hypotheses. 1994 Jul;43(1):31-42. Review.
Application to CFS-MDDm and Nonfatigued Groups • Actin regulation - Dysregulated expression of tumor necrosis factor in chronic fatigue syndrome: interrelations with cellular sources and patterns of soluble immune mediator expression. Clin Infect Dis. 1994 Jan;18 Suppl 1:S147-53.
Application to CFS-MDDm and Nonfatigued Groups • Other significant pathways (in BioCarta) • Biosynthesis of cysteine in mammals • Biosynthesis of threonine and methionine • Inactivation of Gsk3 by AKT causes accumulation of β-catenin in alveolar macrophages • Basic mechanisms of SUMOylation • Catabolic pathways for methionine, isoleucine, threonine and valine • ALK in cardiac myocytes • Overview of telomerase RNA component gene hTerc transcriptional regulation • Biosynthesis of neurotransmitters
Application to CFS-MDDm and Nonfatigued Groups • Figure: SNPs in relation to CFS-MDDm vs. nonfatigued status • COMT (Chr 22): rs933271, rs5993882 • NR3C1 (Chr 5): rs6188, rs852977
Summary and Conclusion • CFS and CFS-MDDm each show causal relationships, but with different pathways leading to the disease
CFS and Nonfatigued Groups • NR3C1 (Chr 5) SNPs associated with CFS: rs258750, rs6188, rs852977, rs860458, rs2918419, rs1866388
CFS-MDDm and Nonfatigued Groups • COMT (Chr 22): rs933271, rs5993882 • NR3C1 (Chr 5): rs6188, rs852977
Summary and Conclusion • Advantage • Uses the causal relationships among gene expression levels, genetic variation, and disease to identify causal genes of a disease • Limitation • Complicated causal models • Future Analysis • More complicated causal models, such as feedback models • Develop a more sophisticated method for other possible models of causality • An integration method that adds protein data
Reference • Schadt, E.E., et al. Integrating Genotype and Gene Expression Data to Identify Key Drivers of Complex Traits. Nat. Genet. 37, 2005. • Adler, R.H. Chronic fatigue syndrome (CFS). Swiss Med. Wkly. 134, 2004, 268-276. • Tusher, V., Tibshirani, R., Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98, 2001, 5116-5121. • Baldi, P., Long, A.D. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17, 2001, 509-519. • Webster, J.I., Tonelli, L. Neuroendocrine regulation of immunity. Annu. Rev. Immunol. 20, 2002, 125-63.
Reference • Kandel, Schwartz and Jessell. Principles of Neural Science, Fourth edition. • http://gostat.wehi.edu.au/ • http://snubi.org • http://www.biocarta.com/ • Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B, 57(1), 289-300. • Chaudhuri, A., Behan, P.O. Fatigue in neurological disorders. The Lancet 363, 2004, 978-988.