Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data.
Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data Authors: Lisette JA Kogelman, Haja N Kadarmideen (Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen) Presented By: Kester lee
Agenda • Background • Motivation & Objective • Methodology • Results • Discussion
Nucleotide (1) • Any DNA sequence is composed of 4 nucleotides • Each nucleotide is represented by a character in {A, G, T, C}
Nucleotide (2) • Nucleotides can form hydrogen bonds among themselves: • Due to their chemical structures • Guanine has a strong tendency to bond with Cytosine • Adenine has a strong tendency to bond with Thymine [Figures: Adenine and Thymine; Guanine and Cytosine]
Deoxyribonucleic acid • Nucleotides are connected through a backbone to form a single strand of DNA • 2 single strands of DNA form a double strand through hydrogen bonds • Double-stranded DNA has a double helix structure • Knowing the nucleotide sequence of one strand automatically gives the nucleotide sequence of the other strand
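The pairing rule on the two slides above (A with T, G with C) is enough to derive one strand from the other. A minimal sketch; the function names are hypothetical and chosen only for this illustration:

```python
# Base pairing from the slides: Adenine pairs with Thymine, Guanine with Cytosine.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction.

    The two strands of a double helix run antiparallel, so the partner
    strand is conventionally written as the reversed complement.
    """
    return complement_strand(strand)[::-1]


print(complement_strand("AGTC"))   # TCAG
print(reverse_complement("AGTC"))  # GACT
```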
Single nucleotide polymorphism (1) • A nucleotide variation that: • Occurs at a specific location on the DNA sequence among the individuals of a population • Possesses only 2 possible alleles • Major Allele: High frequency • Minor Allele: Low frequency
Single nucleotide polymorphism (2) • For paired chromosomes, a SNP: • Is a pair of alleles • Has 3 possible states: • Major Allele & Major Allele • Minor Allele & Major Allele • Minor Allele & Minor Allele
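The three SNP states can be turned into numbers before any analysis. The sketch below assumes the common additive coding (number of minor-allele copies: 0, 1 or 2); the slides only say "encoded genotypes", so this coding and the function name are assumptions made for illustration.

```python
def encode_genotype(allele_pair: str, minor_allele: str) -> int:
    """Count copies of the minor allele in a two-letter genotype such as 'AG'.

    0 = major/major, 1 = minor/major, 2 = minor/minor (assumed additive coding).
    """
    if len(allele_pair) != 2:
        raise ValueError("expected exactly two alleles")
    return sum(1 for allele in allele_pair if allele == minor_allele)


# Hypothetical SNP where A is the major allele and G is the minor allele.
genotypes = ["AA", "AG", "GG", "GA"]
encoded = [encode_genotype(g, minor_allele="G") for g in genotypes]
print(encoded)  # [0, 1, 2, 1]
```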
Genome-wide Association Study • GWAS aims to find associations between genetic variations and observable traits. [Figure: genomes of about 3 billion nucleotides (characters) are obtained by next-generation DNA sequencing and compared between Case (Trait ✔) and Control (Trait ✘) groups]
Motivation • Current GWA studies identify only low-risk genes and explain a small part of the predicted heritability. • Because of the multiple-testing problem, statistical tests lose power and many potentially relevant SNPs are missed • Existing approaches fail to explain the causes of complex diseases and traits in terms of interactions between genetic variations • Curse of dimensionality: analyzing n-th order SNP-SNP interactions requires an exponential increase of the sample size
Objective • Develop a GWAS analysis framework that can • Model SNP-SNP interactions as a scale-free network • Discover high-interaction areas in the network • Associate high-interaction SNP areas with genes and pathways
Scale-free Network • It is a network model whose degree distribution follows a power law • The fraction of vertices that have k connections to other vertices is: $P(k) \sim k^{-\gamma}$ • Where γ is an empirical parameter • It is used as the SNP network model because: • It is the gene network model adopted by WGCNA • GWAS is similar to gene expression profile analysis • It may therefore also be a suitable model for SNP-SNP interaction.
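One way to check whether a network looks scale-free is to regress log frequency on log degree: a straight line with slope −γ and a high R² supports a power law. The sketch below does this for integer degrees with NumPy. It is an illustration of the idea only, not the exact fit index computed by WGCNA or by the authors, and the function name is hypothetical.

```python
import numpy as np


def fit_power_law(degrees):
    """Fit log10 P(k) = a - gamma * log10(k); return (gamma, r_squared)."""
    k, counts = np.unique(np.asarray(degrees), return_counts=True)
    keep = k > 0                                  # log10 is undefined for k = 0
    log_k = np.log10(k[keep].astype(float))
    log_p = np.log10(counts[keep] / counts[keep].sum())
    slope, intercept = np.polyfit(log_k, log_p, 1)
    predicted = slope * log_k + intercept
    ss_res = np.sum((log_p - predicted) ** 2)
    ss_tot = np.sum((log_p - log_p.mean()) ** 2)
    return -slope, 1.0 - ss_res / ss_tot


# Toy degree list whose counts fall off as k^-2.
toy_degrees = [1] * 6400 + [2] * 1600 + [4] * 400 + [8] * 100
gamma, r_squared = fit_power_law(toy_degrees)
print(round(gamma, 3), round(r_squared, 3))
```

For weighted networks, where connectivity is continuous, the degrees would first have to be binned; that step is left out here.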
Hierarchical Clustering • Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree • It can be visualized as a dendrogram [Figures: a dendrogram; a visualization of the clusters]
Approach Overview Their approach is highly influenced by a gene network construction algorithm: WGCNA
SNP Selection • Only a subset of SNPs is selected for constructing the SNP-SNP network • This reduces the time and space complexity • For example, the space needed to store the adjacency matrix • Only SNPs with the following 2 properties are selected • A genome-wide significant p-value • Sufficient variation (minor allele frequency) across the population
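The two selection criteria can be sketched as a simple filter. The thresholds (`p_threshold`, `min_maf`), the 0/1/2 genotype coding and the function name are assumptions for illustration; the slides do not give the values used in the paper.

```python
import numpy as np


def select_snps(genotypes, p_values, p_threshold=5e-8, min_maf=0.05):
    """Return column indices of SNPs passing both filters.

    genotypes: (n_individuals, n_snps) array of assumed 0/1/2 allele counts.
    p_values:  one association p-value per SNP, e.g. from a single-SNP GWAS.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    allele_freq = genotypes.mean(axis=0) / 2.0        # frequency of the counted allele
    maf = np.minimum(allele_freq, 1.0 - allele_freq)  # minor allele frequency
    keep = (p_values < p_threshold) & (maf >= min_maf)
    return np.flatnonzero(keep)


toy_genotypes = [[0, 0, 0],
                 [1, 0, 1],
                 [2, 0, 0],
                 [1, 0, 0]]
toy_p_values = [1e-9, 1e-9, 0.5]
# SNP 1 does not vary at all and SNP 2 is not significant, so only SNP 0 remains.
print(select_snps(toy_genotypes, toy_p_values))
```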
SNP-SNP Connection Strength (1) • WISH includes 2 measurements of pairwise SNP-SNP connection strength • Approach 1: Epistatic Interaction Network: $y = \mu + C_1 SNP_i + C_2 SNP_j + C_3 (SNP_i \times SNP_j) + \varepsilon$ Where: y is the trait, ε is the error term, μ, C1, C2, C3 are coefficients and SNPi, SNPj are the encoded genotypes of the SNPs • After performing the regression on every SNP pair, C3 is used as the connectivity measurement
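The epistatic measurement can be sketched as an ordinary least-squares regression of the trait on the two SNPs and their product, keeping the coefficient of the product term. This is a minimal NumPy illustration with a hypothetical function name, not the authors' implementation.

```python
import numpy as np


def interaction_coefficient(snp_i, snp_j, trait):
    """Fit trait = mu + C1*snp_i + C2*snp_j + C3*snp_i*snp_j + error; return C3."""
    snp_i = np.asarray(snp_i, dtype=float)
    snp_j = np.asarray(snp_j, dtype=float)
    trait = np.asarray(trait, dtype=float)
    # Columns: intercept, two main effects, interaction term.
    design = np.column_stack([np.ones_like(snp_i), snp_i, snp_j, snp_i * snp_j])
    coefficients, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coefficients[3]


# Noise-free toy data covering all 9 genotype combinations.
toy_i = np.repeat([0, 1, 2], 3)
toy_j = np.tile([0, 1, 2], 3)
toy_trait = 1.0 + 0.5 * toy_i - 0.2 * toy_j + 0.8 * toy_i * toy_j
print(interaction_coefficient(toy_i, toy_j, toy_trait))  # close to 0.8
```

Running this for every SNP pair costs one regression per pair, which is one reason the SNP set is filtered first.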
SNP-SNP Connection Strength (2) • Approach 2: Genomic Correlation: $r_{ij} = \frac{cov(SNP_i, SNP_j)}{\sigma_{SNP_i} \, \sigma_{SNP_j}}$ Where: SNPi and SNPj represent the encoded genotypes of SNP i and SNP j respectively, σ is the standard deviation, and cov(x, y) is the covariance between x and y, calculated as E[(x − E(x))(y − E(y))]
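The genomic correlation can be computed directly from the covariance definition. The sketch below assumes it is the Pearson correlation between two encoded genotype vectors and checks the result against NumPy's `corrcoef`; the function name is hypothetical.

```python
import numpy as np


def genomic_correlation(snp_i, snp_j):
    """Pearson correlation between two encoded genotype vectors."""
    x = np.asarray(snp_i, dtype=float)
    y = np.asarray(snp_j, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))  # E[(x-E(x))(y-E(y))]
    # A SNP with no variation has zero standard deviation, so the result
    # would be undefined; such SNPs should be filtered out beforehand.
    return covariance / (x.std() * y.std())


toy_a = [0, 1, 2, 1, 0, 2]
toy_b = [0, 1, 2, 2, 0, 1]
r_manual = genomic_correlation(toy_a, toy_b)
r_numpy = np.corrcoef(toy_a, toy_b)[0, 1]
print(r_manual, r_numpy)
```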
SNP Connectivity • WISH defines the connectivity of a SNP as the sum of the connection strengths between itself and all other SNPs • A SNP with high connectivity is considered a hub SNP • The SNPs used in the network construction can be limited by applying a threshold on SNP connectivity
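Connectivity as defined above is a row sum of the connection-strength matrix with the self-connection excluded. A small sketch with hypothetical function names; ranking by connectivity is one simple way to pick hub SNPs and is an assumption about how a threshold might be applied.

```python
import numpy as np


def snp_connectivity(adjacency):
    """Sum of each SNP's connection strengths to all other SNPs."""
    adjacency = np.asarray(adjacency, dtype=float)
    return adjacency.sum(axis=1) - np.diag(adjacency)  # drop the self-connection


def hub_snps(adjacency, top_n):
    """Indices of the top_n SNPs ranked by connectivity, highest first."""
    order = np.argsort(snp_connectivity(adjacency))[::-1]
    return order[:top_n]


toy_adjacency = [[1.0, 0.9, 0.8],
                 [0.9, 1.0, 0.1],
                 [0.8, 0.1, 1.0]]
print(snp_connectivity(toy_adjacency))  # approximately [1.7, 1.0, 0.9]
print(hub_snps(toy_adjacency, top_n=1))
```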
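Topological Overlap Measure • The Topological Overlap Measure (TOM) between SNPi and SNPj is calculated by: $TOM_{ij} = \frac{N_{ij} + A_{ij}}{\min(c_i, c_j) + 1 - A_{ij}}$ Where: Nij is the number of SNPs connected to both SNPi and SNPj (for a weighted network, $N_{ij} = \sum_{u \neq i,j} A_{iu} A_{uj}$), Aij is the adjacency between SNPi and SNPj, and ci, cj are the SNP connectivities of SNPi and SNPj respectively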
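The TOM can be computed for all SNP pairs at once with matrix operations. The sketch below follows the WGCNA-style formula TOM = (N + A) / (min(c_i, c_j) + 1 − A) and assumes the adjacency values are already scaled to [0, 1]; the function name is hypothetical.

```python
import numpy as np


def topological_overlap(adjacency):
    """Return the TOM matrix for a symmetric adjacency matrix with values in [0, 1]."""
    a = np.array(adjacency, dtype=float)   # copy, so the caller's data is untouched
    np.fill_diagonal(a, 0.0)               # ignore self-connections
    shared = a @ a                         # N_ij = sum over u of a_iu * a_uj
    connectivity = a.sum(axis=1)           # c_i
    min_connectivity = np.minimum.outer(connectivity, connectivity)
    tom = (shared + a) / (min_connectivity + 1.0 - a)
    np.fill_diagonal(tom, 1.0)             # a SNP overlaps fully with itself
    return tom


toy_adjacency = [[1.0, 0.9, 0.8],
                 [0.9, 1.0, 0.1],
                 [0.8, 0.1, 1.0]]
toy_tom = topological_overlap(toy_adjacency)
# Pair (0, 1): N = 0.8 * 0.1 = 0.08, so TOM = (0.08 + 0.9) / (1.0 + 1 - 0.9).
print(toy_tom[0, 1])
```

Two SNPs score high when they are strongly connected to each other and also share strongly connected neighbours, which makes the measure less sensitive to a single noisy pairwise value.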
Network Construction • WISH constructs a SNP-SNP network by: • Forming an adjacency (connection strength) matrix of the SNPs • Raising the adjacency matrix to the power K • Applying the Topological Overlap Measure (TOM) to measure the relation between each pair of SNPs • Using the TOM dissimilarity (1 − TOM) to build a dendrogram
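The construction steps can be chained as in the sketch below, which uses SciPy's average-linkage clustering to build the dendrogram. Several choices here are assumptions rather than details from the slides: connection strengths are taken in absolute value and assumed to lie in [0, 1], average linkage is used, and a fixed two-cluster cut (`fcluster`) stands in for the dynamic tree cut, which SciPy does not provide. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def tom_dissimilarity(strength, power):
    """Soft-threshold a connection-strength matrix, then return 1 - TOM."""
    a = np.abs(np.asarray(strength, dtype=float)) ** power  # raise to the power K
    np.fill_diagonal(a, 0.0)
    shared = a @ a
    connectivity = a.sum(axis=1)
    tom = (shared + a) / (np.minimum.outer(connectivity, connectivity) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom


def build_dendrogram(strength, power):
    """Hierarchical clustering on the TOM dissimilarity (average linkage assumed)."""
    dissimilarity = tom_dissimilarity(strength, power)
    condensed = squareform(dissimilarity, checks=False)  # upper triangle as a vector
    return linkage(condensed, method="average")


# Toy network: two blocks of 3 SNPs, strong within a block and weak between blocks.
toy_strength = np.full((6, 6), 0.1)
toy_strength[:3, :3] = 0.9
toy_strength[3:, 3:] = 0.9
np.fill_diagonal(toy_strength, 1.0)

tree = build_dendrogram(toy_strength, power=4)
labels = fcluster(tree, t=2, criterion="maxclust")  # stand-in for dynamic tree cut
print(labels)
```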
Module Identification and Selection • WISH identifies SNP modules in the network by: • Applying the dynamic tree cut algorithm to split the dendrogram into SNP modules • WISH selects SNP modules by: • GMAT • Calculate the eigenSNP (1st principal component) of each module • Choose the modules whose eigenSNP has a high correlation with other eigenSNPs or with the trait • Gene Ontology Enrichment (based on GOEAST) • SNP modules that have enriched GO terms with a p-value < 0.05 are selected
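The eigenSNP step can be sketched with a singular value decomposition: the first principal component of a module's genotype matrix summarizes the module in a single vector, which is then correlated with the trait. Centering the columns without scaling them is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def eigen_snp(module_genotypes):
    """First principal component scores of an (n_individuals, n_snps) matrix."""
    x = np.asarray(module_genotypes, dtype=float)
    x = x - x.mean(axis=0)                      # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0]                       # one score per individual


def module_trait_correlation(module_genotypes, trait):
    """Pearson correlation between a module's eigenSNP and the trait."""
    return np.corrcoef(eigen_snp(module_genotypes), trait)[0, 1]


# Toy module of 3 perfectly co-varying SNPs and a trait driven by them.
base = np.array([0, 1, 2, 1, 0, 2], dtype=float)
toy_module = np.column_stack([base, base, base])
toy_trait = 2.0 * base + 1.0
r = module_trait_correlation(toy_module, toy_trait)
print(abs(r))
```

The sign of a principal component is arbitrary, so the absolute value of the correlation is the quantity to compare across modules.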
Experiment Overview • The experiments on WISH were conducted under 3 different configurations, covered in Experiments 1 to 3.
Dataset Source • Real Dataset: • Species: F2 pig resource population • Number of SNPs: 39704 • Phenotype (trait): Carcass weight • Synthetic Dataset: • Generated by the authors themselves • Number of SNPs: 1000 • Genotype distribution: normal distribution, range (-1, 1) • Phenotype distribution: N(0,1) • Pairwise epistatic interaction distribution: follows the assumption stated in the paper “Epistasis and its contribution to genetic variance components”
Experiment 1 • Parameters: • Power of the adjacency matrix: 5 • Number of SNPs used in building the dendrogram: 1500 • After applying the dynamic tree cut algorithm: • 23 SNP modules are identified • Each SNP module contains ≥ 30 SNPs • After testing the modules with GMAT, 3 modules are identified: • Blue module • Cyan module • Turquoise module
Experiment 1 (2) • Low connectivity has a high frequency • Frequency drops rapidly as connectivity increases • The scale-free topology index is 0.92, so the network is treated as scale-free [Figures: histogram of SNP connectivity; log-log plot of the histogram]
Experiment 1 (3) • On the horizontal and vertical axes, the SNPs are assigned to modules and each module has a separate colour • SNP-SNP interactions are concentrated within the modules, so the clustering is effective [Figure: heat map of the dendrogram built in experiment 1]
Experiment 1 (5) • NCBI2R is applied to associate the SNPs in the modules with genes and pathways
Experiment 2 • Parameters: • Power of the adjacency matrix: 4 • After applying the dynamic tree cut algorithm: • 10 SNP modules are identified • The simulated SNPs cluster clearly • A small number of SNPs are highly connected, so the network follows the scale-free topology criterion.
Experiment 3 • Parameters: • Power of the adjacency matrix: 4 • Number of SNPs used in building the dendrogram: 955 • After applying the dynamic tree cut algorithm: • 5 SNP modules are identified • Each SNP module contains ≥ 25 SNPs • After testing the modules with GMAT, 1 module is identified: • Red module • PI3K-Akt signalling pathway, synaptic vesicle cycle • Cell growth and insulin resistance
[Figure: dendrogram built in experiment 3. The SNPs are assigned to modules and each module has a separate colour]
Discussion • Contribution • A WGCNA-based approach to detecting SNP-SNP interactions • Adopts new connection strength measurements • The concept of SNP modules is new • Pitfalls • No comparison with existing SNP-SNP interaction detection programs such as MDR • The number of SNPs in the F2 pig resource population is not large enough to show how this approach scales to human datasets