Agenda

Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data.

Presentation Transcript


  1. Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data • Authors: Lisette JA Kogelman, Haja N Kadarmideen (Department of Veterinary Clinical and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen) • Presented By: Kester lee

  2. Agenda • Background • Motivation & Objective • Methodology • Results • Discussion

  3. Background

  4. Nucleotide(1) • The 4 chemicals that compose any DNA sequence • A nucleotide is represented by a character in {A, G, T, C}

  5. Nucleotide(2) • Nucleotides can form hydrogen bonds among themselves: • Due to their chemical structures • Guanine has a strong tendency to bond with Cytosine • Adenine has a strong tendency to bond with Thymine • [Figure: Adenine and Thymine; Guanine and Cytosine]

  6. Deoxyribonucleic acid • 2 single-strand DNAs form a double-strand DNA through hydrogen bonds • Double-strand DNA has a double helix structure • Nucleotides are connected through a backbone to form a single-strand DNA • Knowing the nucleotide sequence of 1 strand automatically gives the nucleotide sequence of the other strand

  7. Single nucleotide polymorphism(1) • A nucleotide variation that: • Occurs at a specific location on the DNA sequence among the individuals of a population • Possesses only 2 possible alleles • Major Allele: High frequency • Minor Allele: Low frequency

  8. Single nucleotide polymorphism(2) • For a pair of chromosomes, a SNP: • Is a pair of alleles • Has 3 possible states: • Major Allele & Major Allele • Minor Allele & Major Allele • Minor Allele & Minor Allele
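
The three states above are often turned into numbers by counting copies of the minor allele (0, 1 or 2). The following is a minimal sketch under that assumption; the slides do not say which encoding the authors use, and the function names are hypothetical.

```python
from collections import Counter

def encode_genotypes(genotypes):
    """Encode two-letter genotypes (e.g. 'AG') as minor-allele counts 0, 1 or 2."""
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) > 2:
        raise ValueError("expected a biallelic SNP (at most 2 distinct alleles)")
    if len(allele_counts) < 2:
        # Only one allele observed: every individual is major/major.
        return [0] * len(genotypes)
    # The minor allele is the less frequent of the two observed alleles.
    minor = min(allele_counts, key=allele_counts.get)
    return [g.count(minor) for g in genotypes]

def minor_allele_frequency(encoded):
    """Each individual carries 2 alleles, so divide by 2 * sample size."""
    return sum(encoded) / (2 * len(encoded))

genotypes = ["AA", "AG", "GG", "AA", "AG", "AA"]
encoded = encode_genotypes(genotypes)
print(encoded)                          # [0, 1, 2, 0, 1, 0]
print(minor_allele_frequency(encoded))  # 4 minor alleles out of 12
```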

  9. Genome-wide Association Study • GWAS aims to find the associations between genetic variations and observable traits. • [Figure: Case (Trait ✔) and Control (Trait ✘); 3 billion nucleotides (characters); Next Generation DNA Sequencing; Compare]

  10. Motivation & Objective

  11. Motivation • Current GWA studies only identify low-risk genes and explain a small part of the predicted heritability. • Under the multiple-testing problem, statistical tests are not effective and many potential SNPs are missed. • Existing approaches fail to explain the cause of complex diseases and traits in terms of interactions between genetic variations. • Curse of dimensionality: analyzing nth-order SNP-SNP interactions requires an exponential increase in sample size.

  12. Objective • Develop a GWAS analysis framework that can • Model SNP-SNP interactions as a scale-free network • Discover high-interaction areas in the network • Associate high-interaction SNP areas with genes and pathways

  13. Scale-free Network • It is a network model that has a degree distribution following a power law • The number of vertices that have k connections with other vertices is proportional to: $P(k) \sim k^{-\gamma}$ • Where γ is an empirical parameter • It is used as the SNP network model because: • It is the gene network model adopted by WGCNA • GWAS is similar to gene expression profile analysis • It may also be a good model for SNP-SNP interaction.
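
Taking logarithms of a power law shows why scale-free topology is usually checked on a log-log plot: the degree distribution becomes a straight line with slope −γ. The constant c below is a symbol introduced here for illustration.

```latex
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = -\gamma \, \log k + c
```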

  14. Hierarchical Clustering • Hierarchical clustering produces a set of nested clusters organized as a hierarchical tree • It can be visualized as a dendrogram • [Figure: a dendrogram; a visualization of the clusters]
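
To make "nested clusters" concrete, here is a small sketch of agglomerative clustering with average linkage. The linkage choice, the function name and the toy distance matrix are assumptions for illustration; the slides do not specify them. The recorded merges are what a dendrogram draws.

```python
def average_linkage(dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    dist is a symmetric matrix of pairwise distances between items.
    Returns the final clusters and the list of merges (the nested hierarchy).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []  # (members_a, members_b, distance at which they merged)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Toy distances: items 0 and 1 are close, items 2 and 3 are close.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters, merges = average_linkage(dist, n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```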

  15. Methodology

  16. Approach Overview • Their approach is highly influenced by a gene network construction algorithm: WGCNA

  17. SNP Selection • Only a subset of SNPs is selected for constructing the SNP-SNP network • This reduces the time and space complexity • For example, the space needed to store the adjacency matrix • Only SNPs with the following 2 properties are selected • A genome-wide significant p-value • A significant variation (minor allele frequency) across the population
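
The two selection criteria can be sketched as a simple filter. The thresholds (5e-8 and 0.05), the record layout and the SNP ids below are illustrative assumptions; the slides do not state the cut-offs actually used.

```python
def select_snps(snps, p_threshold, maf_threshold):
    """Keep SNP ids whose p-value is significant and whose MAF is high enough."""
    return [
        snp["id"]
        for snp in snps
        if snp["p_value"] < p_threshold and snp["maf"] >= maf_threshold
    ]

# Hypothetical GWAS summary records.
snps = [
    {"id": "snp1", "p_value": 1e-9, "maf": 0.30},
    {"id": "snp2", "p_value": 0.2, "maf": 0.40},   # not significant
    {"id": "snp3", "p_value": 1e-8, "maf": 0.01},  # too little variation
    {"id": "snp4", "p_value": 3e-8, "maf": 0.12},
]
print(select_snps(snps, p_threshold=5e-8, maf_threshold=0.05))  # ['snp1', 'snp4']
```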

  18. SNP-SNP Connection Strength(1) • WISH includes 2 measurements of pair-wise SNP-SNP connection strength • Approach 1: Epistatic Interaction Network: $y = \mu + C_1 \, SNP_i + C_2 \, SNP_j + C_3 \, (SNP_i \times SNP_j) + \varepsilon$ • Where: y is the trait, ε is the error term, μ, C1, C2, C3 are coefficients and SNPi, SNPj are the encoded genotypes of the SNPs • After performing the regression on every SNP pair, C3 is used as the connectivity measurement
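
A minimal sketch of Approach 1 for a single SNP pair: fit the linear model with an interaction term by ordinary least squares and read off C3. Ordinary least squares via the normal equations is an assumption here (the slides only say "regression"), and the helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def interaction_coefficient(y, snp_i, snp_j):
    """Fit y = mu + C1*SNPi + C2*SNPj + C3*SNPi*SNPj + error and return C3.

    Assumes the design matrix has full rank (e.g. neither SNP is constant).
    """
    rows = [[1.0, a, b, a * b] for a, b in zip(snp_i, snp_j)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(4)] for p in range(4)]
    xty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(4)]
    return solve(xtx, xty)[3]

# Noise-free toy data generated with C3 = 2, so the fit should recover about 2.
snp_i = [0, 1, 2, 0, 1, 2, 0, 1, 2]
snp_j = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [1 + 0.5 * a - 0.25 * b + 2 * a * b for a, b in zip(snp_i, snp_j)]
print(round(interaction_coefficient(y, snp_i, snp_j), 6))
```

Repeating this for every SNP pair fills the connection strength matrix, which is why limiting the number of SNPs beforehand matters.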

  19. SNP-SNP Connection Strength(2) • Approach 2: Genomic Correlation: $r_{ij} = \frac{cov(SNP_i, SNP_j)}{\sqrt{cov(SNP_i, SNP_i) \, cov(SNP_j, SNP_j)}}$ • Where: SNPi, SNPj represent the encoded genotypes of SNP i and j respectively • cov(x, y) represents the covariance between x and y and can be calculated by: $E[(x - E(x))(y - E(y))]$
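
Approach 2 can be sketched directly from the covariance definition. This assumes the genomic correlation is the Pearson correlation between the two encoded genotype vectors; the function names and toy genotypes are hypothetical.

```python
from math import sqrt

def covariance(x, y):
    """cov(x, y) = E[(x - E(x)) * (y - E(y))], using population means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)

def genomic_correlation(snp_i, snp_j):
    """Covariance scaled by both standard deviations (undefined for constant SNPs)."""
    return covariance(snp_i, snp_j) / sqrt(
        covariance(snp_i, snp_i) * covariance(snp_j, snp_j)
    )

snp_i = [0, 1, 2, 1, 0]
snp_j = [0, 1, 2, 2, 0]
print(round(genomic_correlation(snp_i, snp_j), 4))  # 0.8964
```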

  20. SNP Connectivity • WISH defines the connectivity of a SNP as the sum of the connection strengths between itself and all other SNPs • A SNP with high connectivity is considered a hub SNP • The SNPs used in the network construction can be limited by applying a threshold on SNP connectivity
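
The connectivity definition amounts to a row sum that skips the diagonal. The sketch below uses a made-up 4-SNP strength matrix and an arbitrary hub threshold of 2.0; both are assumptions for illustration.

```python
def connectivity(adjacency):
    """Sum of connection strengths between each SNP and all other SNPs."""
    return [
        sum(w for j, w in enumerate(row) if j != i)
        for i, row in enumerate(adjacency)
    ]

def hub_snps(adjacency, threshold):
    """Indices of SNPs whose connectivity reaches the given threshold."""
    return [i for i, c in enumerate(connectivity(adjacency)) if c >= threshold]

adjacency = [
    [1.0, 0.8, 0.7, 0.6],
    [0.8, 1.0, 0.1, 0.2],
    [0.7, 0.1, 1.0, 0.1],
    [0.6, 0.2, 0.1, 1.0],
]
print(hub_snps(adjacency, threshold=2.0))  # [0]
```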

  21. Topological Overlap Measure • The Topological Overlap Measure (TOM) between SNPi and SNPj is calculated by: $TOM_{ij} = \frac{N_{ij} + A_{ij}}{\min(c_i, c_j) + 1 - A_{ij}}$ • Where: Nij is the number of SNPs which are connected to both SNPi and SNPj • Aij is the adjacency between SNPi and SNPj • ci, cj are the SNP connectivities of SNPi and SNPj respectively
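
A sketch of the topological overlap using the form commonly given for WGCNA, where the shared-neighbour term is the sum over other SNPs u of A[i][u] * A[u][j]. That weighted form, the zero diagonal and the toy network are assumptions; for a 0/1 adjacency the sum reduces to a count of shared neighbours.

```python
def topological_overlap(adj):
    """Return the TOM matrix for a symmetric adjacency with entries in [0, 1]."""
    n = len(adj)
    # Connectivity of each SNP, excluding the diagonal.
    conn = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Strength of the paths i -> u -> j through shared neighbours u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(conn[i], conn[j]) + 1 - adj[i][j])
    return tom

# SNPs 1 and 2 are not directly connected but share SNP 0 as a neighbour.
adj = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
tom = topological_overlap(adj)
print(tom[1][2])  # 0.5
```

The example shows the point of the measure: two SNPs with no direct connection still get a non-zero overlap when they share neighbours.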

  22. Network Construction • WISH constructs a SNP-SNP network through: • Forming an adjacency (connection strength) matrix of the SNPs • Raising each entry of the adjacency matrix to a power K • Applying the Topological Overlap Measure (TOM) to measure the relation between each pair of SNPs • Using the TOM dissimilarity (1 − TOM) to build the dendrogram
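
The power step can be sketched as element-wise soft thresholding, as in WGCNA: strong connections are kept while weak ones shrink towards zero. Taking the absolute value before raising to the power is an assumption (an unsigned network), and the function names and values are illustrative.

```python
def soft_threshold(strength, power):
    """Raise the absolute value of every connection strength to the given power."""
    return [[abs(s) ** power for s in row] for row in strength]

def dissimilarity(similarity):
    """Turn a similarity matrix (such as TOM) into the distance 1 - similarity."""
    return [[1.0 - s for s in row] for row in similarity]

strength = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, -0.5],
    [0.3, -0.5, 1.0],
]
adjacency = soft_threshold(strength, power=5)
for row in adjacency:
    print([round(a, 5) for a in row])
```

With power 5, a strength of 0.9 stays at roughly 0.59 while 0.3 drops to roughly 0.002, which is what pushes the degree distribution towards scale-free topology.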

  23. Module Identification and Selection • WISH identifies SNP modules in the network by: • Applying the dynamic tree cut algorithm to split the dendrogram into SNP modules • WISH selects SNP modules by: • GMAT • Calculate the eigenSNP (1st principal component) of each module • Choose the modules whose eigenSNP has a high correlation with other eigenSNPs or with the trait • Gene Ontology Enrichment (based on GOEAST) • SNP modules that have enriched GO terms with a p-value < 0.05 are selected
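
The eigenSNP step can be sketched as the first principal component of a module's genotypes, followed by a correlation with the trait. Power iteration is used here only to keep the example dependency-free; it is not claimed to be what the authors' software does, and the function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def eigen_snp(module, iterations=200):
    """Score each individual on the first principal component of a module.

    module holds one genotype list per SNP, all over the same individuals.
    Assumes the SNPs are not all constant and that the all-ones start vector
    is not orthogonal to the leading component.
    """
    n = len(module[0])
    centered = [[v - sum(snp) / n for v in snp] for snp in module]
    p = len(centered)
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / n
            for j in range(p)] for i in range(p)]
    weights = [1.0] * p
    for _ in range(iterations):  # power iteration towards the leading eigenvector
        nxt = [sum(cov[i][j] * weights[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(v * v for v in nxt))
        weights = [v / norm for v in nxt]
    return [sum(weights[i] * centered[i][k] for i in range(p)) for k in range(n)]

module = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 2, 0, 1],
    [1, 0, 2, 1, 0, 2],
]
trait = [1.0, 2.1, 3.9, 3.2, 0.8, 3.1]
# The sign of a principal component is arbitrary, so compare absolute values.
print(abs(pearson(eigen_snp(module), trait)))
```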

  24. Experiment

  25. Experiment Overview • The experiment on WISH was conducted under these 3 different configurations:

  26. Dataset Source • Real Dataset: • Species: F2 pig resource population • Number of SNPs: 39704 • Phenotype (trait): Carcass weight • Synthetic Dataset: • Generated by the authors themselves • Number of SNPs: 1000 • Genotype distribution: normal distribution, range(-1,1) • Phenotype distribution: N(0,1) • Pairwise epistatic interaction distribution: follows the assumption stated in the paper “Epistasis and its contribution to genetic variance components”

  27. Experiment 1 • Parameters: • Power of the adjacency matrix: 5 • Number of SNPs used in building the dendrogram: 1500 • After applying the dynamic tree cut algorithm: • 23 SNP modules are identified • Each SNP module contains ≥ 30 SNPs • After testing the modules with GMAT, 3 modules are identified: • Blue module • Cyan module • Turquoise module

  28. Experiment 1(2) • Low connectivity has a high frequency • Frequency drops rapidly as connectivity increases • Scale-free topology index is 0.92 → it is a scale-free network • [Figures: histogram of SNP connectivity; log-log plot of the histogram]
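
A scale-free topology index of this kind is typically the R² of a straight-line fit on the log-log plot of the connectivity histogram. The sketch below assumes that definition and takes already-binned (connectivity, frequency) pairs as input; the function name and the synthetic data are hypothetical.

```python
from math import log10

def log_log_fit(degrees, frequencies):
    """Fit log10(frequency) = slope * log10(degree) + c; return (slope, R^2)."""
    xs = [log10(k) for k in degrees]
    ys = [log10(p) for p in frequencies]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared

# Synthetic frequencies that follow an exact power law with gamma = 2.
degrees = [1, 2, 4, 8, 16]
frequencies = [k ** -2.0 for k in degrees]
slope, r_squared = log_log_fit(degrees, frequencies)
print(round(slope, 6), round(r_squared, 6))  # -2.0 1.0
```

An R² close to 1 together with a negative slope is read as evidence of scale-free topology; the slope estimates −γ.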

  29. Experiment 1(3) • On the horizontal and vertical axes, the SNPs are assigned to modules and each module has a separate colour. • SNP-SNP interactions are concentrated in the modules → the clustering is done effectively • [Figure: heat map of the dendrogram built in experiment 1]

  30. Experiment 1(4)

  31. Experiment 1(5) • NCBI2R is applied to associate the SNPs in the modules with genes and pathways

  32. Experiment 2 • Parameters: • Power of the adjacency matrix: 4 • After applying the dynamic tree cut algorithm: • 10 SNP modules are identified • A clear clustering of the simulated SNPs • A small number of highly connected SNPs → the network follows the scale-free topology criterion.

  34. Experiment 3 • Parameters: • Power of the adjacency matrix: 4 • Number of SNPs used in building the dendrogram: 955 • After applying the dynamic tree cut algorithm: • 5 SNP modules are identified • Each SNP module contains ≥ 25 SNPs • After testing the modules with GMAT, 1 module is identified: • Red module • PI3K-Akt signalling pathway, synaptic vesicle cycle • Cell growth and insulin resistance

  35. Dendrogram built in experiment 3. The SNPs are assigned to modules and each module has a separate colour

  36. Discussion

  37. Discussion • Contribution • A WGCNA-based approach to detecting SNP-SNP interactions • Adopts new connection strength measurements • The concept of SNP modules is new • Pitfall • No comparison with existing SNP-SNP interaction detection programs such as MDR • The number of SNPs in the F2 pig resource population is not large enough to reveal the scalability of this approach on human datasets
