Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data. Yu Liu, Sean Maxwell, Tao Feng, Xiaofeng Zhu, Robert C Elston, Mehmet Koyutürk, Mark R Chance
Gene, pathway and network frameworks to identify epistatic interactions of single nucleotide polymorphisms derived from GWAS data • Yu Liu, Sean Maxwell, Tao Feng, Xiaofeng Zhu, Robert C Elston, Mehmet Koyutürk, Mark R Chance • From The International Conference on Intelligent Biology and Medicine (ICIBM), Nashville, TN, USA, 22-24 April 2012
Outline • Background • Problem & Motivation • Proposed Solution • Experiment & Results • Comment
Background • Terminology • Nucleotide: the 4 basic building blocks of DNA (A, C, G, T) • Single-nucleotide polymorphism (SNP): a single-nucleotide difference at the same location of the DNA sequence.
Background • Typical GWAS Analysis • Step 1: Compare differences across chromosome sequences between patient samples (cases) and healthy samples (controls) • Step 2: Measure the additive contribution of each SNP to genetic risk, one at a time
Problem & Motivation • Problem 1: Heritability • The disease-associated SNPs found so far are not consistent with the estimated heritability. • Gene-gene interactions may explain the inconsistency. • Problem 2: Statistical significance • The p-value has to be extremely low (< 10^-13) because of multiple hypothesis testing correction. • Very few interactions can pass such a strict threshold.
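To see where a threshold of this order comes from, the sketch below counts the pairwise tests in an exhaustive search and divides a 0.05 significance level by that count. The SNP count of 500,000 is an assumption chosen for illustration and is not stated on the slide; the function name is hypothetical.

```python
from math import comb


def exhaustive_pair_threshold(n_snps: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of SNP pairs, Bonferroni-style per-test threshold)."""
    n_tests = comb(n_snps, 2)  # every unordered pair of SNPs is one test
    return n_tests, alpha / n_tests


# ASSUMPTION: 500,000 SNPs is an illustrative array size, not a figure from the slides.
n_tests, threshold = exhaustive_pair_threshold(500_000)
print(f"{n_tests:.3e} pairwise tests, per-test threshold {threshold:.1e}")
```

Under this assumed SNP count the search involves roughly 1.25 × 10^11 tests, so the per-test threshold lands in the 10^-13 range.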
Problem & Motivation • Problem 3: Burden of exhaustive search • Two approaches are used to reduce the burden, each with limitations: • Heuristic / two-step (screen-testing) • It may miss true interactions • Gene set enrichment analysis (pathway driven) • All genes in a pathway are treated as equal • It cannot reveal the discrete structure of potential relationships of mechanistic interest.
Proposed Solution • Solution: Build biological frameworks to reduce the search space (attribute selection) • SNPs for pairwise tests are selected based on 4 areas of biological knowledge: • Gene / Pathway / Disease-Specific Network / eSNP • Selected SNPs are pairwise tested exhaustively • The approach is used to study type II diabetes
Gene-SNP Assignment • Relations between SNPs and genes • If a SNP X is located: • Within gene Y, or • Within the 20 kb region upstream or downstream of gene Y • Then SNP X is related to gene Y. • Note: • SNPs and genes have a many-to-many mapping • Position datasets • Gene: UCSC Table Browser • SNP: WTCCC dataset
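The assignment rule above can be sketched as a simple interval check. The data layout, function name, and coordinates below are hypothetical and only illustrate the 20 kb flanking rule and the many-to-many mapping; real positions would come from the UCSC and WTCCC sources.

```python
from collections import defaultdict

FLANK_BP = 20_000  # 20 kb on either side of the gene, as stated on the slide


def assign_snps_to_genes(snps, genes, flank=FLANK_BP):
    """Map each SNP to every gene whose body or flanking region contains it.

    snps:  {snp_id: (chromosome, position)}
    genes: {gene_name: (chromosome, start, end)}
    """
    snp_to_genes = defaultdict(set)
    for snp_id, (snp_chrom, pos) in snps.items():
        for gene, (gene_chrom, start, end) in genes.items():
            if snp_chrom == gene_chrom and start - flank <= pos <= end + flank:
                snp_to_genes[snp_id].add(gene)
    return dict(snp_to_genes)


# Hypothetical coordinates, for illustration only.
genes = {
    "GENE_A": ("chr1", 100_000, 150_000),
    "GENE_B": ("chr1", 165_000, 200_000),
}
snps = {
    "snp1": ("chr1", 120_000),  # inside GENE_A
    "snp2": ("chr1", 158_000),  # in the overlapping flanks of both genes
    "snp3": ("chr2", 120_000),  # different chromosome, maps to nothing
}
mapping = assign_snps_to_genes(snps, genes)
```

A SNP in overlapping flanking regions, such as `snp2` here, is assigned to both genes, which is what makes the mapping many-to-many.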
SNP Pair Testing • Disease association of single SNPs and SNP pairs is measured using logistic regression • P-values are: • Corrected using the Bonferroni method and • Significant if the corrected value is lower than the threshold 0.05 • The p-values of single SNPs are calculated in a similar way
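A minimal sketch of the Bonferroni step, assuming the raw p-values have already been produced by the logistic regression tests and that the correction factor is the number of pairs tested within one framework. The function name and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return {key: (corrected_p, is_significant)} for a dict of raw p-values.

    Each raw p-value is multiplied by the number of tests and capped at 1.0.
    """
    n_tests = len(p_values)
    corrected = {}
    for key, p in p_values.items():
        adjusted = min(1.0, p * n_tests)
        corrected[key] = (adjusted, adjusted < alpha)
    return corrected


# Hypothetical raw p-values for three SNP pairs.
raw = {
    ("snpA", "snpB"): 1e-4,
    ("snpA", "snpC"): 0.02,
    ("snpB", "snpC"): 0.5,
}
results = bonferroni(raw)
```

With three tests, a raw p-value of 0.02 no longer passes the 0.05 threshold after correction, while 1e-4 still does.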
Gene Based Interaction Search • Procedure: • Map each SNP to its corresponding genes. • For each gene G, perform tests between every pair of SNPs mapped to G.
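The gene-based procedure amounts to enumerating SNP pairs within each gene. The sketch below assumes a gene-to-SNP mapping is already available; the names and data are hypothetical.

```python
from itertools import combinations


def gene_based_pairs(gene_to_snps):
    """Collect every unordered pair of SNPs that map to the same gene."""
    pairs = set()
    for snps in gene_to_snps.values():
        # Sorting gives each pair a canonical order, so a pair shared by
        # two genes is only stored once.
        for snp_a, snp_b in combinations(sorted(snps), 2):
            pairs.add((snp_a, snp_b))
    return pairs


# Hypothetical mapping, for illustration only.
gene_to_snps = {
    "GENE_A": {"snp1", "snp2", "snp3"},
    "GENE_B": {"snp2", "snp3"},
}
pairs = gene_based_pairs(gene_to_snps)
```

Each pair returned here would then be passed to the pairwise association test.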
Pathway Based Interaction Search • Procedure: • Map each SNP to its corresponding genes. • For each pathway, record all participating genes • Perform pairwise tests between SNPs mapped to different genes that appear in the same pathway
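A sketch of the pathway-based pair selection, assuming pathway membership and the gene-to-SNP mapping are given as dictionaries. All names and data are hypothetical.

```python
from itertools import combinations, product


def pathway_based_pairs(pathways, gene_to_snps):
    """Collect SNP pairs whose SNPs map to different genes of one pathway."""
    pairs = set()
    for genes in pathways.values():
        for gene_1, gene_2 in combinations(sorted(genes), 2):
            snps_1 = gene_to_snps.get(gene_1, ())
            snps_2 = gene_to_snps.get(gene_2, ())
            for snp_a, snp_b in product(snps_1, snps_2):
                if snp_a != snp_b:  # a SNP shared by both genes is not a pair
                    pairs.add(tuple(sorted((snp_a, snp_b))))
    return pairs


# Hypothetical pathways and mapping, for illustration only.
pathways = {
    "pathway_1": {"GENE_A", "GENE_B"},
    "pathway_2": {"GENE_B", "GENE_C"},
}
gene_to_snps = {
    "GENE_A": {"snp1", "snp2"},
    "GENE_B": {"snp2", "snp3"},
    "GENE_C": {"snp4"},
}
pairs = pathway_based_pairs(pathways, gene_to_snps)
```

SNPs of `GENE_A` and `GENE_C` are never paired in this example because the two genes do not share a pathway.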
Network Based Interaction Search • Procedure • Form a seed (disease-associated) gene set whose members have interactions with other genes. • Form a gene-gene interaction set based on the interactions among all genes • Form a Steiner tree from the data above. • Steiner points: unknown genes; other points: seed genes • Perform pairwise tests on all SNPs mapped to the genes that appear in the network
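The slide does not say which Steiner tree algorithm is used. The sketch below is one common approximation, a shortest-path heuristic on an unweighted interaction graph, and should be read as an assumption rather than the authors' method. Gene names and the graph are hypothetical.

```python
from collections import deque


def path_to_tree(graph, tree_nodes, target):
    """Breadth-first search from target to the nearest node already in the tree.

    Returns the path [tree_node, ..., target], or None if unreachable.
    """
    parent = {target: None}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


def approximate_steiner_tree(graph, seeds):
    """Greedily attach the closest remaining seed gene to the growing tree."""
    ordered = sorted(seeds)
    tree_nodes = {ordered[0]}
    tree_edges = set()
    remaining = set(ordered[1:])
    while remaining:
        best = None
        for seed in sorted(remaining):
            path = path_to_tree(graph, tree_nodes, seed)
            if path is not None and (best is None or len(path) < len(best)):
                best = path
        if best is None:
            break  # the remaining seeds are not connected to the tree
        tree_edges.update(frozenset(edge) for edge in zip(best, best[1:]))
        tree_nodes.update(best)
        remaining -= set(best)
    return tree_nodes, tree_edges


# Hypothetical undirected interaction graph: S1, S2, S3 are seed genes.
graph = {
    "S1": ["X", "Y"], "S2": ["X", "Z"], "S3": ["X"],
    "X": ["S1", "S2", "S3"], "Y": ["S1", "Z"], "Z": ["Y", "S2"],
}
seeds = {"S1", "S2", "S3"}
tree_nodes, tree_edges = approximate_steiner_tree(graph, seeds)
steiner_points = tree_nodes - seeds  # non-seed genes pulled into the network
```

In this toy graph the non-seed gene `X` joins the tree because it links all three seeds, which mirrors how Steiner points bring previously unknown genes into the network.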
eSNP Based Interaction Search • Procedure: • Match eSNPs to genes using association data from previous studies and public databases, after p-value filtering • Perform pairwise tests between each eSNP and the SNPs in its matched genes
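A sketch of the eSNP-based pair selection. The record layout, the function name, and the p-value cutoff of 1e-5 are assumptions made for illustration; the slide does not state the filtering threshold.

```python
def esnp_based_pairs(esnp_associations, gene_to_snps, p_cutoff):
    """Pair each eSNP that passes the filter with the SNPs of its matched gene.

    esnp_associations: iterable of (esnp_id, gene_name, p_value) records.
    """
    pairs = set()
    for esnp, gene, p_value in esnp_associations:
        if p_value >= p_cutoff:
            continue  # drop weak eSNP-gene associations
        for snp in gene_to_snps.get(gene, ()):
            if snp != esnp:
                pairs.add((esnp, snp))
    return pairs


# Hypothetical records and cutoff, for illustration only.
esnp_associations = [
    ("esnp1", "GENE_A", 1e-8),
    ("esnp2", "GENE_B", 0.2),
]
gene_to_snps = {"GENE_A": {"snp1", "snp2"}, "GENE_B": {"snp3"}}
pairs = esnp_based_pairs(esnp_associations, gene_to_snps, p_cutoff=1e-5)
```

Only `esnp1` survives the filter here, so it is the only eSNP paired with SNPs in its matched gene.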
Dataset Used • Disease studied: Type II diabetes • Dataset used: WTCCC dataset • WTCCC predefined disease-associated SNPs: • rs9465871, rs4506565, and rs9939609 • SNP pairs containing those SNPs have significant p-values • It cannot be determined whether such a pair is a true association • Predefined SNPs may affect the results • Those SNPs are removed • Focus on discovering interactions among SNPs that are not individually significant
Gene Based Interaction Experiment • 9 SNP pairs with significant p-values are selected • The p-value of each individual SNP is not very significant
Gene Based Interaction Search • ZFAT, NDST3, and C9orf3 are not known to be related to type II diabetes in previous studies. • New discovery! • PPM1A is an important gene for insulin signaling. • In the IRS (insulin regulated signaling) pathway • Dephosphorylates and negatively regulates MAP kinases • Supports the authors' approach
Pathway Based Interaction Experiment • 655 pathways are considered • 1 statistically significant SNP pair detected: • PPARA & CDC6 are not in the same pathway • rs1130199 is located in the overlapping 20 kb regions around PPARA and CDC6. • rs1130199 is associated with both PPARA and CDC6.
Pathway Based Interaction Experiment • RARA: • Present in two pathways with PPARA • Lipid metabolism Toxicity pathway • Nuclear receptor transcription pathway. • Segments of CDC6 and RARA have strong associations.
Pathway Based Interaction Experiment • PPARA: • Nuclear transcription factor that affects cell proliferation, cell differentiation, and immune responses. • Associated with diabetic “microvascular” complications • Interacts with PPARGC1A and the disease-associated PPARG • CDC6: • Affects DNA replication, in particular its early steps • No association with CDC6 or RARA has been reported. • New discovery
Network Based Interaction Experiment • 354 seed genes and 99 normal genes formed the network. • 1 statistically significant SNPs pairs detected: • rs1130199 is located in the overlapping 20 kb regions around PPARA and CDC6. • rs1130199 is associated with both PPARA and CDC6 • Two pairs of SNPs share 1 SNP: rs41433646
Network Based Interaction Experiment • RBM19 and OLFML2B are not known to be related to type II diabetes in previous studies. • New discovery! • ATF6: • Activating transcription factor 6, involved in the unfolded protein response • Its polymorphisms are reported to be associated with diabetes in various populations
Merging the Results • Merging the results from pathway and network analysis for pairwise testing • 1 SNP rs4253764 was detected in both analyses. • rs1130199 is not found in sub-network • rs2490429, rs41433646 are not found in pathway • 1 more SNP pairwise interaction is found • 4 out of 6 SNP pairs are found (highly connected) • The network is a new discovery!
eSNP Based Interaction Experiment • 1 SNP pair was detected with a significant p-value • Observation • rs12517663 can affect the expression of KLHDC4 • KLHDC4 is not known to be related to type II diabetes in previous studies. • New discovery!
Comment • Advantages: • It can reduce the search space significantly • SNPs are selected for interaction tests using biological knowledge. • The SNPs are annotated during the process • Limitation: • The use of a Steiner tree rests on an assumption: • All disease-associated genes are interconnected with each other. • This assumption may not be true