
Understanding GWAS SNPs

Understanding GWAS SNPs. Xiaole Shirley Liu, Stat 115/215. GWAS SNPs. Association <> Causal. What’s the most likely causal SNP / Gene in LD with the genotyped SNP? Use functional genomics to identify the disease tissue of origin. What’s the SNP doing in non-coding regions? RSNPs.


Presentation Transcript


  1. Understanding GWAS SNPs Xiaole Shirley Liu Stat 115/215

  2. GWAS SNPs • Association <> Causal • What’s the most likely causal SNP / Gene in LD with the genotyped SNP? • Use functional genomics to identify the disease tissue of origin • What’s the SNP doing in non-coding regions? RSNPs

  3. Use Literature & Pathway Information to Identify Putative Causal SNPs / Genes

  4. Each Gene has an NCBI Page

  5. Especially Bibliography

  6. And Pathways

  7. Literature Mining Terms • Corpus: Collection of documents. E.g. all papers in PubMed • Term frequency: Number of times a word appears in a document. E.g. “polymerase” appeared 41 times in a paper • Document frequency: Number of documents a word appears in. E.g. 1234x papers have the word “transcription” • Collection frequency: Total number of times a word appears in a corpus. E.g. “transcription” appeared 6789X times in all PubMed-indexed papers • Stop words: Words in the corpus that contribute little to meaning. E.g. to, is, an • Stemming: Group together different variations of the same word. E.g. activate vs. activated vs. activating
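The counting terms above can be illustrated with a minimal Python sketch. The three-sentence corpus and the tiny stop-word list are made up for illustration, not taken from the lecture, and stemming is left out because it needs a dedicated stemmer.

```python
from collections import Counter

# Hypothetical three-document corpus (stand-ins for PubMed abstracts).
corpus = [
    "the polymerase binds the promoter",
    "transcription is activated by the polymerase",
    "transcription factors regulate transcription",
]
STOP_WORDS = {"the", "is", "by", "to", "an"}  # tiny illustrative stop list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


docs = [tokenize(d) for d in corpus]

# Term frequency: how often each word occurs within one document.
term_freq = [Counter(d) for d in docs]
# Document frequency: number of documents that contain the word.
doc_freq = Counter(w for d in docs for w in set(d))
# Collection frequency: total occurrences across the whole corpus.
coll_freq = Counter(w for d in docs for w in d)

print(term_freq[2]["transcription"])  # 2
print(doc_freq["transcription"])      # 2
print(coll_freq["transcription"])     # 3
```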

  8. Documents Represented as Vectors • A document is summarized as a vector of word counts. • Each dimension contains the number of times a word appears. • Can calculate similarity between two documents by comparing their vectors • “Our analysis includes comparison of amino acid environments with random control environments as well as with each of the other amino acid environments.” acid 2 amino 2 analysis 1 comparison 1 control 1 environments 3 […] our 1
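A word-count vector for the example sentence can be built in a few lines. This is only a sketch: the lowercasing and punctuation stripping are assumptions about preprocessing that the slide does not specify.

```python
from collections import Counter
import string

sentence = (
    "Our analysis includes comparison of amino acid environments "
    "with random control environments as well as with each of "
    "the other amino acid environments."
)


def word_vector(text):
    """Return a word -> count mapping (a sparse bag-of-words vector)."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())


vec = word_vector(sentence)
for word in sorted(vec):
    print(word, vec[word])
```

To compare two documents, their counters would first be aligned on a shared vocabulary so that each dimension refers to the same word.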

  9. Comparing Two Documents • Intuitive comparison between two papers → correlation coefficient of their word occurrence vectors • Correlation measures the strength of the linear relationship between two random variables • a = c(1, 3, 5, 1, 8, 20, 0, 0, 0, 3, 1) • b = c(2, 3, 4, 0, 10, 25, 1, 0, 2, 4, 3) • c = c(2, 0, 1, 10, 2, 4, 7, 1, 5, 0, 8) • cor(a, b) = 0.985615 (correlated) • cor(b, c) = -0.110328 (not correlated)
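The slide's R example can be reproduced in plain Python. The sketch below writes out the Pearson correlation by hand, assuming that is the coefficient R's `cor` computes by default, and reuses the three vectors from the slide.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


a = [1, 3, 5, 1, 8, 20, 0, 0, 0, 3, 1]
b = [2, 3, 4, 0, 10, 25, 1, 0, 2, 4, 3]
c = [2, 0, 1, 10, 2, 4, 7, 1, 5, 0, 8]

r_ab = pearson(a, b)  # slide reports 0.985615
r_bc = pearson(b, c)  # slide reports -0.110328
print(f"{r_ab:.6f} {r_bc:.6f}")
```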

  10. Term Weighting Considerations • Give different terms different weight • Global weight • Document frequency

  11. Term Weighting Considerations • Give different terms different weight • Global weight • Document frequency: Fewer documents, more weight: log(N / df). E.g. progesterone vs gene • Local weight • Term frequency

  12. Term Weighting Considerations • Give different terms different weight • Global weight • Document frequency: Fewer documents, more weight: log(N / df). E.g. progesterone vs gene • Local weight • Term frequency: More frequent, more weight: 1 + log(tf). E.g. progesterone: 10 times in paper1 vs 3 in paper2 • Document length

  13. Term Weighting Considerations • Give different terms different weight • Global weight • Document frequency: Fewer documents, more weight: log(N / df). E.g. progesterone vs gene • Local weight • Term frequency: More frequent, more weight: 1 + log(tf). E.g. progesterone: 10 times in paper1 vs 3 in paper2 • Document length: Less weight for longer document. E.g. paper1 200 pages vs paper2 3 pages
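The weighting formulas on this slide can be written as small functions. This is a sketch under assumptions: the slides do not state the logarithm base (natural log is used here), and they give no formula for the document-length adjustment, so dividing by relative length is just one simple choice. The counts in the usage lines are invented.

```python
import math


def global_weight(n_docs, doc_freq):
    """Global weight log(N / df): rarer terms get more weight."""
    return math.log(n_docs / doc_freq)


def local_weight(term_freq):
    """Local weight 1 + log(tf); a term absent from the document gets 0."""
    return 1 + math.log(term_freq) if term_freq > 0 else 0.0


def length_normalized(weight, doc_len, avg_doc_len):
    """ASSUMPTION: down-weight long documents by their relative length."""
    return weight / (doc_len / avg_doc_len)


# Hypothetical corpus of 1,000,000 papers: a rare term vs a common one.
print(global_weight(1_000_000, 1_000))    # "progesterone"-like term
print(global_weight(1_000_000, 500_000))  # "gene"-like term
# Term seen 10 times in paper1 vs 3 times in paper2.
print(local_weight(10), local_weight(3))
```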

  14. Evaluate Relatedness of Papers • Related Articles • Similarity between two documents: sum over all terms of (local wt1 × local wt2 × global wt) • Pre-computed related articles for each citation • Rank ordered by relevance
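A toy version of this similarity score and ranking might look as follows. It assumes the per-term products are summed over the terms two documents share, uses the `1 + log(tf)` and `log(N / df)` weights from the earlier slides, and ignores document length. All document frequencies and paper contents are hypothetical.

```python
import math
from collections import Counter


def similarity(doc1, doc2, doc_freq, n_docs):
    """Sum of local_wt1 * local_wt2 * global_wt over shared terms."""
    tf1, tf2 = Counter(doc1), Counter(doc2)
    score = 0.0
    for term in tf1.keys() & tf2.keys():
        local1 = 1 + math.log(tf1[term])
        local2 = 1 + math.log(tf2[term])
        global_wt = math.log(n_docs / doc_freq[term])
        score += local1 * local2 * global_wt
    return score


# Hypothetical corpus statistics and tokenized papers.
n_docs = 1000
doc_freq = {"progesterone": 10, "gene": 900, "receptor": 50, "binding": 200}
query = ["progesterone", "receptor", "gene", "progesterone"]
candidates = {
    "paperA": ["progesterone", "receptor", "binding"],
    "paperB": ["gene", "gene", "binding"],
}

# Rank candidate papers by decreasing similarity to the query paper.
ranked = sorted(
    candidates,
    key=lambda name: similarity(query, candidates[name], doc_freq, n_docs),
    reverse=True,
)
print(ranked)
```

Sharing the rare term "progesterone" contributes far more to the score than sharing the common term "gene", which is the point of the global weight.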

  15. GRAIL: Gene Relationships Across Implicated Loci Raychaudhuri et al PLOS Genetics 2009

  16. GRAIL: Gene Relationships Across Implicated Loci

  17. GRAIL: Gene Relationships Across Implicated Loci

  18. GRAIL: Gene Relationships Across Implicated Loci

  19. GRAIL on Height SNPs

  20. GRAIL on Crohn’s Disease • Use literature / pathways to identify potential causal gene • Find likely reproducible SNP hits, and increase statistical power

  21. GWAS SNPs • Association <> Causal • What’s the most likely causal SNP / Gene in LD with the genotyped SNP? • Use functional genomics to identify the disease tissue of origin • What’s the SNP doing in non-coding regions? RSNPs

  22. Identifying Causal Cell-type for Complex Disease • E.g. Rheumatoid Arthritis (RA) • Many cell types implicated over the years, including neutrophils, synoviocytes, and all classes of lymphocytes! • It is difficult to establish causality for complex phenotypes in humans • Use expression data: Comprehensive and unbiased, publicly available

  23. Immunological Genome Project • Start with a list of disease SNPs • Find genes near the SNP that are specifically expressed in a cell type • Identify cell types that have many such genes ... more than expected by chance

  24. Identifying Causal Cell-type for Complex Disease From Expression • Negative control: simulation from random set of SNPs • P-value: proportion of simulations exceeding the observed enrichment Hu et al, American Journal of Human Genetics, 2011
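The enrichment test described on the last two slides can be sketched as a permutation test. This is not the published method: the SNP-to-gene map, the set of cell-type-specific genes, and the uniformly random null SNP sets are all hypothetical simplifications, and ties are counted as exceeding the observed value to keep the p-value conservative.

```python
import random


def count_hits(snps, snp_to_genes, specific_genes):
    """Count SNPs with at least one nearby gene specific to the cell type."""
    return sum(
        1 for snp in snps
        if any(g in specific_genes for g in snp_to_genes.get(snp, ()))
    )


def empirical_p_value(observed, null_scores):
    """Proportion of simulated scores at least as large as the observed one."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return exceed / len(null_scores)


# Hypothetical toy data: 200 SNPs, each next to a single gene.
all_snps = [f"snp{i}" for i in range(200)]
snp_to_genes = {f"snp{i}": [f"gene{i}"] for i in range(200)}
specific_genes = {f"gene{i}" for i in range(20)}   # specific to one cell type
disease_snps = all_snps[:10] + all_snps[100:105]   # 15 "disease" SNPs

observed = count_hits(disease_snps, snp_to_genes, specific_genes)

# Negative control: random SNP sets of the same size as the disease set.
rng = random.Random(0)
null_scores = [
    count_hits(rng.sample(all_snps, len(disease_snps)), snp_to_genes, specific_genes)
    for _ in range(1000)
]
p_value = empirical_p_value(observed, null_scores)
print(observed, p_value)
```

Repeating the calculation for each cell type would point to the ones with more specifically expressed genes near disease SNPs than expected by chance.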

  25. GWAS SNPs • Association <> Causal • What’s the most likely causal SNP / Gene in LD with the genotyped SNP? • Use functional genomics to identify the disease tissue of origin • What’s the SNP doing in non-coding regions? eQTL and RSNPs

  26. eQTL • eQTL: use expression as phenotype • Are there SNPs that are associated with expression changes? • Heritable genetic variation for transcription levels
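Treating expression as the phenotype amounts to regressing expression on genotype. A minimal sketch, assuming genotypes are coded as the number of alternate alleles (0, 1, 2) and using invented expression values for nine individuals:

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: genotype dosage and expression of a nearby gene.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 0.8, 2.1, 1.9, 2.0, 3.1, 2.9, 3.0]

slope, intercept = linear_fit(genotypes, expression)
print(slope, intercept)  # slope = expression change per alternate allele
```

A real eQTL analysis would also test the slope for significance, adjust for covariates, and correct for the number of SNP-gene pairs tested.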

  27. RSNPs • A SNP influences TF binding, affecting downstream (disease-related) gene expression
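One common way to gauge whether a SNP could alter TF binding is to score both alleles against the factor's motif. The sketch below uses a made-up four-position probability matrix and a uniform background; neither the motif nor the sequences come from the lecture.

```python
import math

# Hypothetical 4-position motif: per-position base probabilities.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds_score(site):
    """Log2 odds of the site under the motif versus the background."""
    if len(site) != len(PWM):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


ref_site = "AGCT"  # reference allele matches the motif consensus
alt_site = "AGTT"  # SNP changes the third position from C to T

delta = log_odds_score(alt_site) - log_odds_score(ref_site)
print(delta)  # negative: the alternate allele weakens the predicted site
```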

  28. eQTL and RSNPs • eQTL: use expression as phenotype • Are there SNPs that are associated with expression changes? • Heritable genetic variation for transcription levels • RSNP: regulatory SNP • Much of the influential variation is located cis- to the coding locus • In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).

  29. Huang et al, Nat Genet 2014

  30. RSNPs from GWAS • Enriched in regulatory sequences (promoters and enhancers) that are identified through histone mark ChIP-seq or DNase-seq Maurano et al, Science 2012
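Enrichment of this kind can be illustrated with a simple fold-enrichment calculation: the fraction of GWAS SNPs falling in regulatory intervals divided by the same fraction for background SNPs. This is only a sketch, not the statistical procedure of the cited study; the single-chromosome coordinates and half-open intervals are hypothetical.

```python
def in_any_interval(pos, intervals):
    """True if pos lies in any half-open interval [start, end)."""
    return any(start <= pos < end for start, end in intervals)


def fold_enrichment(snp_positions, background_positions, intervals):
    """Ratio of in-interval fractions: test SNPs versus background SNPs."""
    frac_snp = sum(in_any_interval(p, intervals) for p in snp_positions) / len(snp_positions)
    frac_bg = sum(in_any_interval(p, intervals) for p in background_positions) / len(background_positions)
    if frac_bg == 0:
        raise ValueError("no background SNP overlaps the intervals")
    return frac_snp / frac_bg


# Hypothetical regulatory regions (e.g. DNase-seq peaks) and SNP positions.
regulatory = [(100, 200), (500, 600)]
gwas_snps = [120, 150, 550, 700]
background_snps = [10, 130, 300, 400, 520, 800, 900, 950]

fold = fold_enrichment(gwas_snps, background_snps, regulatory)
print(fold)  # 3.0
```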

  31. Highest Correlated Genes of Distal DHSs Harboring GWAS Variants

  32. Trans-Effect of Cis-SNPs • Three risk loci for ESR1, MYC, and KLF4 • Effect on TF expression is small, but much stronger when looking at the expression of their downstream target genes Li et al, Cell 2013

  33. Useful Tools to Understand RSNPs • Identify putative TFs whose binding might be influenced by SNPs based on ENCODE ChIP-seq / DNase-seq data

  34. Understanding GWAS SNPs • Association <> Causal • Use literature and pathways to identify the putative causal SNP / Gene in LD with the genotyped SNP • Use (cell-type specific) expression and epigenomics to: • Identify the disease tissue of origin • Identify regulatory SNPs that affect TF binding and influence the expression of important downstream disease genes

  35. Acknowledgement • Soumya Raychaudhuri • Manolis Dermitzakis
