ENCODE Journal Club, David Vandenbergh, Marta Byrska-Bishop, Jan 8, 2013



  1. Impact of functional information on understanding (genetic) variation. ENCODE thread 12 + a bit more. ENCODE Journal Club, David Vandenbergh, Marta Byrska-Bishop, Jan 8, 2013

  2. Titles of the 3 papers: 1. An integrated encyclopedia of DNA elements in the human genome. ENCODE Project Consortium. 2. Annotation of functional variation in personal genomes using RegulomeDB. Boyle AP et al. 3. Linking disease associations with regulatory information in the human genome. Schaub MA et al.

  3. Preliminary points: We will try to avoid duplicating what has already been covered, except as refresher background. We will devote a significant amount of time to Maurano et al. (Stamatoyannopoulos), “Systematic Localization of Common Disease-Associated Variation in Regulatory DNA,” Science 337:1190, 2012. Novices (undergrads and grad students) may be helped by the following 4 presentations by Mike Pazin, made at a satellite meeting to the 2012 ASHG conference: • How to display and download ENCODE data • Using UCSC genome browser sessions for ENCODE data analysis • Displaying and downloading ENCODE data • Using HaploReg and RegulomeDB to mine ENCODE data

  4. The major reason this thread is of interest: There are thousands of genetic associations that have not been followed up by functional analyses because we did not know what to propose to study. “Notably, 88% of associated SNPs are either intronic or intergenic” (ref. 74). ENCODE provides possible functions to test in future studies: • RNA transcription • Protein coding (we all knew this one already) • Chromatin structure (up to 12 different histone modifications and DHSs) • DNA methylation • Chromosome interacting regions. But it works the other way too: if you have a region of interest (e.g. a TFBS), you can find SNPs in it that have been associated with disease.

  5. Allele-specific gene expression can be inferred from ENCODE reads. [Figure: Consortium paper, Fig 8a. Panel labels: paternal bias, maternal bias; example alleles: C or T, A or G.]
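One way to make the idea on this slide concrete is to test whether reads covering a heterozygous site come from the two parental alleles in unequal numbers. The sketch below uses an exact two-sided binomial test against a 50/50 expectation. It is an illustration only, not necessarily the procedure used in the Consortium paper; the function names, the read counts, and the 0.05 threshold are assumptions.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that symmetric outcomes are counted as ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allelic_bias(paternal_reads: int, maternal_reads: int, alpha: float = 0.05):
    """Classify a heterozygous site as 'paternal', 'maternal' or 'balanced'.

    alpha is an assumed significance threshold (no multiple-testing
    correction is applied in this sketch).
    """
    n = paternal_reads + maternal_reads
    p_value = binomial_two_sided_p(paternal_reads, n)
    if p_value >= alpha:
        return "balanced", p_value
    label = "paternal" if paternal_reads > maternal_reads else "maternal"
    return label, p_value


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous SNP.
    print(allelic_bias(18, 2))
    print(allelic_bias(10, 10))
```

In practice a genome-wide analysis would also need to handle mapping bias toward the reference allele and correct for the number of sites tested; neither is shown here.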

  6. Global test of SNPs associated with disease: They are significantly enriched for being located in TFBSs and DHSs. GWAS SNPs are particularly enriched in the segmentation classes associated with enhancers and TSSs across several cell types (Supp Fig. 2, section M). Consortium paper, Fig 10a

  7. An example: SNPs associated with Crohn’s disease from a genomic desert are associated with TFBSs and DHSs. Consortium paper, Fig 10c

  8. Annotation of functional variation in personal genomes using RegulomeDB Boyle AP et al.

  9. Challenge: ~95% of known variants within sequenced human genomes and 88% of GWAS variants fall outside of coding regions and have been difficult to interpret. Solution: Use ENCODE regulatory information to identify functional variants (RegulomeDB). Boyle AP et al.

  10. Functional SNPs: Any SNP that appears in a region identified as associated with a biochemical event in at least one ENCODE cell line. • SNPs overlapping coding and non-coding transcripts • SNPs overlapping potentially regulatory regions, e.g. ChIP-seq peaks and DNase I-hypersensitive sites. Schaub MA et al.
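The definition on this slide amounts to an interval-overlap query: a SNP counts as functional if its position falls inside at least one annotated region. A minimal sketch follows, assuming BED-style 0-based half-open intervals; the function names and the example coordinates are hypothetical and not taken from the papers.

```python
from bisect import bisect_right


def build_index(regions):
    """Merge overlapping regions per chromosome and return sorted start/end lists.

    regions: iterable of (chrom, start, end) with half-open [start, end) intervals.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or adjacent: extend the current merged interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def is_functional(index, chrom, pos):
    """Return True if position pos on chrom lies inside any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and pos < ends[i]


if __name__ == "__main__":
    # Hypothetical ChIP-seq peaks / DHSs pooled across cell lines.
    peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    idx = build_index(peaks)
    print(is_functional(idx, "chr1", 250))
    print(is_functional(idx, "chr1", 300))
```

Merging the regions first keeps each lookup to a single binary search, which matters when millions of SNVs are checked against many annotation tracks.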

  11. RegulomeDB • Database which integrates a large collection of regulatory information • Approach that enables the functional assignment of regulatory information onto any set of variants. Boyle AP et al.

  12. Web-based interface at http://www.RegulomeDB.org: enter lists of SNPs. Data types shown: • Protein Binding • Motifs • Chromatin Structure • eQTLs • Histone Modifications • Related Data. Supplemental Figure 2. Boyle AP et al.

  13. RegulomeDB scoring system based on functional confidence of a variant. Boyle AP et al.

  14. Validation of the scoring system: GWAS SNPs are enriched for heuristic categories. Table S1.

  15. RegulomeDB in action: Regulatory variation across 69 individuals • 19,124,349 SNVs total in 69 individuals • 3,870,827 SNVs on average per person • ~56% of SNVs fall within at least one RegulomeDB annotation • 9% of SNVs affect protein coding sequences alone. Figure 1 (caption: “This variant directly affects binding of NFKB”). Boyle AP et al.

  16. Average frequency of SNVs in each RegulomeDB category and feature. Supplemental Table 2. Figure 2. Boyle AP et al.

  17. Regulatory annotation of an individual genome. [Figures 2 and 3: annotation by Gencode v7, PolyPhen-2 (loss of function on both alleles) and RegulomeDB; values shown in the panels: 0.01%, 2%, 0.07%, 0.69%, 98%, 1.9%.] Many variants are predicted to affect regulatory elements in noncoding regions, but they likely have smaller effect sizes than coding variants. Boyle AP et al.

  18. Conclusion: RegulomeDB is a powerful tool that scores variants to help distinguish functional variants from a large pool, and provides a small set of putative sites with testable hypotheses as to their function. Boyle AP et al.

  19. Linking disease associations with regulatory information in the human genome Schaub MA et al.

  20. Genome-wide Association Studies (GWAS). Major challenge: Most detected associations point to larger regions of correlated variants. Which SNP in the LD block has a biological link with the phenotype? Schaub MA et al.

  21. RegulomeDB: Associate GWAS annotations with functional data. Schaub MA et al.

  22. Identification of functional SNPs: using LD information to integrate GWAS with ENCODE data and eQTLs. Figure 1. Schematic overview of the functional SNP approach. Schaub MA et al.

  23. Many GWAS SNPs overlap ENCODE data. Analysis: Used GENCODE v7 and RegulomeDB to annotate 5,694 curated associations from the NHGRI GWAS catalog (a total of 4,724 distinct SNPs associated with a total of 470 different phenotypes). Figure 2. Proportions of associations for different types of functional data (values shown in the figure: 44.8%, 58%, 81%). Schaub MA et al.

  24. The associated SNP reported in a GWAS is often not the SNP most likely to play a biological role in the phenotype. ENCODE data can be used to compare multiple functional SNPs that are in LD with a lead SNP (two-step approach). Schaub MA et al.
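The two-step approach on this slide can be sketched as: (1) collect the SNPs in strong LD with the lead SNP, then (2) rank those candidates by how much functional evidence supports each one. The code below is an illustration under stated assumptions, not the authors' pipeline: the r² threshold of 0.8, the use of a simple evidence count as the ranking score, the function name, and all SNP identifiers are hypothetical.

```python
def best_functional_snp(lead, ld_r2, evidence, r2_min=0.8):
    """Pick the candidate SNP with the most functional evidence.

    lead:     identifier of the lead (reported) SNP
    ld_r2:    dict mapping lead SNP -> {other SNP: r^2 with the lead}
    evidence: dict mapping SNP -> number of supporting annotations
              (an assumed stand-in for a real functional score)
    r2_min:   assumed threshold for "strong LD"
    """
    # Step 1: the lead SNP plus every SNP in strong LD with it.
    candidates = {lead}
    candidates.update(
        snp for snp, r2 in ld_r2.get(lead, {}).items() if r2 >= r2_min
    )
    # Step 2: rank by evidence; on ties prefer the lead SNP, then the
    # alphabetically first identifier (max keeps the first maximum seen).
    return max(sorted(candidates), key=lambda s: (evidence.get(s, 0), s == lead))


if __name__ == "__main__":
    # Hypothetical LD values and evidence counts.
    ld = {"rsLEAD": {"rsA": 0.95, "rsB": 0.50, "rsC": 0.85}}
    support = {"rsLEAD": 1, "rsA": 3, "rsB": 5, "rsC": 3}
    print(best_functional_snp("rsLEAD", ld, support))
```

Note that `rsB` has the most evidence in the toy data but is excluded in step 1 because its LD with the lead SNP is below the threshold.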

  25. Associated regions are significantly enriched for functional SNPs • A subset of 2,364 lead SNPs compared to 100 random matched SNP sets. Figure 3. Schaub MA et al.
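A comparison against random matched SNP sets, as on this slide, is commonly summarized with a fold enrichment and an empirical p-value. The sketch below shows that generic calculation; it is not claimed to be the exact statistic used by Schaub et al., and the function name and the counts are made up for illustration.

```python
def empirical_enrichment(observed, null_counts):
    """Fold enrichment and empirical p-value against matched random sets.

    observed:    number of lead SNPs overlapping functional regions
    null_counts: the same count for each random matched SNP set
    """
    n = len(null_counts)
    mean_null = sum(null_counts) / n
    fold = observed / mean_null if mean_null > 0 else float("inf")
    # Add-one correction so the p-value is never reported as exactly zero.
    at_least_as_extreme = sum(1 for c in null_counts if c >= observed)
    p_value = (at_least_as_extreme + 1) / (n + 1)
    return fold, p_value


if __name__ == "__main__":
    # Hypothetical counts for 100 matched random sets.
    null = [10] * 50 + [20] * 50
    print(empirical_enrichment(30, null))
```

With 100 random sets the smallest attainable p-value under this correction is 1/101, so a finer-grained significance estimate would need more random sets.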

  26. Examining transcription factor occupancy sites overlapping lead SNPs: Associated SNPs can be grouped to search for patterns at the phenotype level. Examples: height & CTCF; prostate cancer & AR. Figure 4. Schaub MA et al.

  27. Specific example: novel functional SNP rs7163757. [Figure 5 labels: lead SNP; type 2 diabetes.] Schaub MA et al.

  28. Functional information and linkage disequilibrium patterns support the implication of rs1333047 in coronary artery disease. [Figure 6 labels: 9p21 region gene desert; lead SNP, coronary artery disease; novel association.] Schaub MA et al.

  29. Conclusions: 1. Genome-wide experimental data sets generated by ENCODE can be successfully used to provide putative functional annotations for GWAS SNPs. 2. The majority of known GWAS associations overlap a functional region or are in strong LD with a SNP overlapping a functional region. 3. In most cases there is more functional evidence supporting another SNP in strong LD with the lead SNP than the lead SNP itself. Schaub MA et al.