Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. Yu Liu, Sean Maxwell, Tao Feng, Xiaofeng Zhu, Robert C Elston, Mehmet Koyutürk, Mark R Chance



  1. Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data • Yu Liu, Sean Maxwell, Tao Feng, Xiaofeng Zhu, Robert C Elston, Mehmet Koyutürk, Mark R Chance • From the International Conference on Intelligent Biology and Medicine (ICIBM), Nashville, TN, USA, 22-24 April 2012

  2. Outline • Background • Problem & Motivation • Proposed Solution • Experiment & Results • Comment

  3. Background • Terminology • Nucleotide: one of the four basic building blocks of DNA (A, C, G, T) • Single-nucleotide polymorphism (SNP): a single-nucleotide difference at the same location of the DNA sequence.

  4. Background • Typical GWAS analysis • Inputs: chromosome sequences from patient samples (cases) and healthy samples (controls) • Step 1: Compare differences across chromosome sequences between cases and controls • Step 2: Measure the additive contribution of each SNP to genetic risk, one at a time

  5. Problem & Motivation • Problem 1: Heritability • The disease-associated SNPs found are not consistent with the estimated heritability. • Gene-gene interactions may explain the inconsistency • Problem 2: Statistical significance • The p-value has to be extremely low (< 10^-13) because of multiple hypothesis testing corrections • Very few interactions can pass such a strict threshold
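The strictness of that threshold follows from the arithmetic of an exhaustive pairwise scan. A minimal sketch, assuming a Bonferroni correction at a family-wise error rate of 0.05 and a hypothetical array of 500,000 SNPs (the slides do not state the SNP count):

```python
from math import comb

def bonferroni_threshold(n_snps: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold for an exhaustive pairwise SNP scan."""
    n_tests = comb(n_snps, 2)  # every unordered SNP pair is one hypothesis test
    return alpha / n_tests

# Assumption: 500,000 SNPs is an illustrative array size, not a figure from the slides.
threshold = bonferroni_threshold(500_000)
print(f"{threshold:.1e}")  # prints 4.0e-13
```

Under these assumptions the per-test threshold lands on the order of 10^-13, which is why shrinking the number of tested pairs matters.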

  6. Problem & Motivation • Problem 3: Burden of Exhaustive Search • To reduce the burden, there are two approaches, each with limitations: • Heuristic / two-step (screen-testing) • It may miss true interactions • Gene set enrichment analysis (pathway driven) • All the genes in the pathway are considered equal • Cannot reveal the discrete structure of potential relationships of mechanistic interest.

  7. Proposed Solution • Solution: Build biological frameworks to reduce the search space (attribute selection) • SNPs for pairwise tests are selected based on 4 areas of biological knowledge: • Gene / Pathway / Disease-Specific Network / eSNP. • The selected SNPs are pairwise tested exhaustively • The approach is used to study type II diabetes

  8. Gene-SNP Assignment • Relations between SNPs and genes • If a SNP X is located: • Within gene Y, or • Within the 20 kb region upstream or downstream of gene Y • Then SNP X is related to gene Y. • Note: • SNPs and genes have a many-to-many mapping • Position datasets • Gene: UCSC Table Browser • SNP: WTCCC dataset
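The assignment rule above can be sketched as follows. This is an illustration only: the `Gene` and `Snp` record types, the gene names, and the coordinates are hypothetical, and the nested loop would be replaced by an interval index for genome-scale data.

```python
from collections import defaultdict
from typing import NamedTuple

FLANK_BP = 20_000  # 20 kb window on either side of the gene, as on the slide

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int

def assign_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Return {gene name: set of rsids} using the within-gene-or-flank rule."""
    gene_to_snps = defaultdict(set)
    for gene in genes:
        lo, hi = gene.start - flank, gene.end + flank
        for snp in snps:
            if snp.chrom == gene.chrom and lo <= snp.pos <= hi:
                gene_to_snps[gene.name].add(snp.rsid)
    return dict(gene_to_snps)

# Hypothetical coordinates, chosen only to exercise the rule.
genes = [Gene("GENE_A", "chr1", 100_000, 150_000),
         Gene("GENE_B", "chr1", 180_000, 220_000)]
snps = [Snp("rs_in_a", "chr1", 120_000),
        Snp("rs_between", "chr1", 165_000),  # within 20 kb of both genes
        Snp("rs_far", "chr1", 400_000)]
mapping = assign_snps_to_genes(snps, genes)
```

Here `rs_between` is assigned to both genes and `rs_far` to none, which is the many-to-many behaviour the slide notes.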

  9. SNP Pair Testing • Disease association of single SNPs and SNP pairs is measured using logistic regression • P-values are: • Corrected using the Bonferroni method, and • Significant if the corrected value is below the threshold of 0.05 • The p-values of single SNPs are calculated in a similar way
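The slide does not spell out the regression model. One common formulation of a pairwise interaction test, given here as an assumption rather than as the authors' exact model, codes each genotype as a minor-allele count $x \in \{0, 1, 2\}$, tests the product term, and applies the Bonferroni correction over the $m$ pairs tested within a framework:

```latex
\operatorname{logit} P(\text{case} \mid x_1, x_2)
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2,
  \qquad H_0 : \beta_3 = 0

p_{\text{adj}} = \min(1,\; m \, p),
  \qquad \text{significant if } p_{\text{adj}} < 0.05
```

Because $m$ is the number of pairs inside one framework rather than all pairs genome-wide, the unadjusted p-value needed to reach significance is far less extreme.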

  10. Gene Based Interaction Search • Procedure: • Map each SNP to its corresponding genes. • For each gene G, test every pair of SNPs mapped to G.
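A minimal sketch of the pair enumeration in this procedure, assuming the SNP-to-gene mapping is already available as a dictionary (the gene names and rsIDs are hypothetical):

```python
from itertools import combinations

def gene_based_pairs(gene_to_snps):
    """Yield (gene, snp_a, snp_b) for every unordered SNP pair within a gene."""
    for gene, snps in gene_to_snps.items():
        for snp_a, snp_b in combinations(sorted(snps), 2):
            yield gene, snp_a, snp_b

# Hypothetical mapping: GENE_B has a single SNP and therefore no pairs.
gene_to_snps = {"GENE_A": {"rs1", "rs2", "rs3"}, "GENE_B": {"rs3"}}
pairs = list(gene_based_pairs(gene_to_snps))
```

Each yielded pair would then go to the logistic regression test; the number of tests grows with the square of the SNPs per gene rather than of all SNPs.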

  11. Pathway Based Interaction Search • Procedure: • Map each SNP to its corresponding genes. • For each pathway, record all participating genes • Perform pairwise tests between SNPs mapped to different genes that appear in the same pathway
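The pathway procedure differs from the gene-based one in that the two SNPs must come from different genes of the same pathway. A sketch under the assumption that pathway membership and the SNP-to-gene mapping are plain dictionaries (all names are hypothetical):

```python
from itertools import combinations

def pathway_based_pairs(pathway_to_genes, gene_to_snps):
    """Return {pathway: set of SNP pairs whose SNPs map to different genes}."""
    result = {}
    for pathway, genes in pathway_to_genes.items():
        pairs = set()
        for gene_a, gene_b in combinations(sorted(genes), 2):
            for snp_a in gene_to_snps.get(gene_a, ()):
                for snp_b in gene_to_snps.get(gene_b, ()):
                    if snp_a != snp_b:  # a SNP shared by both genes cannot pair with itself
                        pairs.add(tuple(sorted((snp_a, snp_b))))
        result[pathway] = pairs
    return result

# Hypothetical data: rs2 lies in the flanking regions of both genes.
pathway_to_genes = {"PATHWAY_X": {"GENE_A", "GENE_B"}}
gene_to_snps = {"GENE_A": {"rs1", "rs2"}, "GENE_B": {"rs2", "rs3"}}
pairs = pathway_based_pairs(pathway_to_genes, gene_to_snps)
```

Storing each pair as a sorted tuple in a set keeps a pair from being tested twice when SNPs map to several genes.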

  12. Network Based Interaction Search • Procedure • Form a seed (disease-associated) gene set from genes that have interactions with others. • Form a gene-gene interaction set based on the interactions among all genes • Form a Steiner tree from the data above. • Steiner points: unknown genes; other points: seed genes • Perform pairwise tests on any SNPs mapped to the genes that appear in the network
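The slide does not say which Steiner tree algorithm is used. The sketch below is a simple greedy shortest-path heuristic, not an exact solver and not necessarily the authors' method: it grows a tree from one seed gene and repeatedly attaches the nearest unconnected seed. The adjacency dictionary and node names are hypothetical.

```python
from collections import deque

def bfs_path(adj, sources, targets):
    """Shortest path (by hop count) from any source node to any target node."""
    parent = {s: None for s in sources}
    queue = deque(sorted(sources))
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def approx_steiner_tree(adj, seeds):
    """Greedy heuristic: attach the seed nearest to the current tree each round."""
    seeds = sorted(seeds)
    tree_nodes, tree_edges = {seeds[0]}, set()
    remaining = set(seeds[1:])
    while remaining:
        path = bfs_path(adj, tree_nodes, remaining)
        if path is None:
            break  # leftover seeds are unreachable from the tree
        tree_edges.update(frozenset(edge) for edge in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= tree_nodes
    return tree_nodes, tree_edges

# Hypothetical undirected interaction network; S1..S3 are seed genes.
adj = {"S1": {"X", "Y"}, "X": {"S1", "S2", "S3"}, "S2": {"X"},
       "S3": {"X", "Z"}, "Y": {"S1", "Z"}, "Z": {"Y", "S3"}}
nodes, edges = approx_steiner_tree(adj, {"S1", "S2", "S3"})
steiner_genes = nodes - {"S1", "S2", "S3"}
```

In this toy network the only non-seed gene pulled into the tree is `X`, which plays the role of a Steiner point connecting the three seeds.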

  13. Network Based Interaction Search

  14. eSNP Based Interaction Search • Procedure: • Match eSNPs and genes using association data from previous studies and public databases, after p-value filtering • Perform pairwise tests between each eSNP and the SNPs in its matched genes
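A sketch of the eSNP procedure, assuming the eSNP-gene associations arrive as `(eSNP, gene, p-value)` tuples. The cutoff value `1e-5` and all identifiers are hypothetical; the slides do not give the filter threshold.

```python
def esnp_based_pairs(associations, gene_to_snps, p_cutoff):
    """Pair each eSNP that passes the filter with every SNP in its target gene."""
    pairs = set()
    for esnp, gene, p_value in associations:
        if p_value >= p_cutoff:
            continue  # drop weak expression associations
        for snp in gene_to_snps.get(gene, ()):
            if snp != esnp:  # an eSNP inside its own target gene cannot pair with itself
                pairs.add((esnp, snp))
    return pairs

# Hypothetical association records and cutoff.
associations = [("rs_e1", "GENE_A", 1e-8), ("rs_e2", "GENE_B", 0.2)]
gene_to_snps = {"GENE_A": {"rs1", "rs_e1"}, "GENE_B": {"rs2"}}
pairs = esnp_based_pairs(associations, gene_to_snps, p_cutoff=1e-5)
```

Only `rs_e1` survives the filter here, so the single candidate pair is `("rs_e1", "rs1")`.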

  15. Dataset Used • Disease studied: type II diabetes • Dataset used: WTCCC dataset • WTCCC predefined disease-associated SNPs: • rs9465871, rs4506565, and rs9939609 • SNP pairs containing these SNPs have significant p-values • Cannot tell whether such a pair reflects a true association • The predefined SNPs may affect the results • These SNPs are removed • Focus on discovering SNPs that are not individually significant

  16. Gene Based Interaction Experiment • 9 SNP pairs with significant p-values are selected • The p-value of each individual SNP is not very significant

  17. Gene Based Interaction Search • ZFAT, NDST3, and C9orf3 are not known to be related to type II diabetes in previous studies. • New discovery! • PPM1A is an important gene for insulin signaling. • In the IRS (insulin regulated signaling) pathway • Dephosphorylates and negatively regulates MAP kinases • Supporting evidence for the authors' approach

  18. Pathway Based Interaction Experiment • 655 pathways are considered • 1 statistically significant SNP pair detected: • PPARA & CDC6 are not in the same pathway • rs1130199 is located in the overlapping 20 kb regions around PPARA and CDC6. • rs1130199 is associated with both PPARA and CDC6.

  19. Pathway Based Interaction Experiment • RARA: • Present in two pathways with PPARA • Lipid metabolism Toxicity pathway • Nuclear receptor transcription pathway. • Segments of CDC6 and RARA have strong associations.

  20. Pathway Based Interaction Experiment • PPARA: • Nuclear transcription factor that affects cell proliferation, cell differentiation and immune responses. • Associated with diabetic "microvascular" complications • Interacts with PPARGC1A and the disease-associated PPARG • CDC6: • Affects the early steps of DNA replication • No disease association reported for CDC6 or RARA. • New discovery

  21. Network Based Interaction Experiment • 354 seed genes and 99 normal genes formed the network. • 1 statistically significant SNP pair detected: • rs1130199 is located in the overlapping 20 kb regions around PPARA and CDC6. • rs1130199 is associated with both PPARA and CDC6 • Two pairs of SNPs share 1 SNP: rs41433646

  22. Network Based Interaction Experiment • RBM19 and OLFML2B are not known to be related to type II diabetes in previous studies. • New discovery! • ATF6: • Activating transcription factor 6, involved in the unfolded protein response • Its polymorphisms are reported to be associated with diabetes in various populations

  23. Merging the Results • Merging the results from the pathway and network analyses for pairwise testing • 1 SNP, rs4253764, was detected in both analyses. • rs1130199 is not found in the sub-network • rs2490429 and rs41433646 are not found in the pathways • 1 more SNP pairwise interaction is found • 4 out of 6 SNP pairs are found (highly connected) • The network is a new discovery!

  24. eSNP Based Interaction Experiment • 1 SNP pair was detected with a significant p-value • Observation • rs12517663 can affect the expression of KLHDC4 • KLHDC4 is not known to be related to type II diabetes in previous studies. • New discovery!

  25. Comment • Advantages: • It can reduce the search space significantly • SNPs are selected for interaction tests using biological knowledge. • The SNPs are annotated during the process • Limitation: • The use of a Steiner tree carries an assumption: • All the disease-associated genes are interconnected with each other. • This assumption may not be true
