RNA-Seq as a Discovery Tool

RNA-Seq as a Discovery Tool. Julia Salzman. Deciphering the Genome. Power of RNA-Seq: Quantification and Discovery. RNA isoform-specific gene expression. Gene fusions. Overlooked RNA structural variants. Salzman, Gawad, Wang, Lacayo, Brown, 2012. Paired-end RNA-Seq.

Presentation Transcript


  1. RNA-Seq as a Discovery Tool Julia Salzman

  2. Deciphering the Genome

  3. Power of RNA-Seq: Quantification and Discovery • RNA isoform-specific gene expression • Gene fusions • Overlooked RNA structural variants (Salzman, Gawad, Wang, Lacayo, Brown, 2012)

  4. Paired-end RNA-Seq Matched sequences are obtained for each library molecule GGAC…..GCCT CTTC…..GAAG Data: millions of 70-150+ bp A/C/G/T sequences

  5. Part 1: Isoform Specific Expression

  6. Example: Paired-end Data Aligned Some reads are informative about isoform-specific expression

  7. Paired-end RNA-Seq for RNA Isoform-Specific Gene Expression • Since the size distribution of library molecules is known, inferred insert lengths can be used to increase the statistical power of inference • Goal: estimate the expression of each isoform • Nontrivial: we only observe fragments of sequences [Figure: Rnpep gene, with Exon 1 and Exon 4 labeled]

  8. Insert Length Distributions • Insert lengths of the entire library (pooled) can be calculated and used to precisely estimate the distribution of cDNA sizes in the library [Figure: distribution of sequenced molecule length; axis in base pairs, ticks at 100, 200, 300]
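
A minimal sketch of the pooling step described on this slide, assuming the insert length of each uniquely mapped read pair has already been computed. The function name `insert_length_pmf` and the toy lengths are hypothetical and are not taken from the presentation.

```python
from collections import Counter


def insert_length_pmf(lengths):
    """Estimate P(insert length) from pooled insert lengths across the library."""
    counts = Counter(lengths)
    total = sum(counts.values())
    # Normalise the counts so that the probabilities sum to 1.
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical pooled insert lengths in base pairs (illustration only, not real data).
pooled = [100, 150, 150, 200, 200, 200, 250, 300]
pmf = insert_length_pmf(pooled)
print(pmf[200])  # fraction of library molecules with a 200 bp insert
```

In practice the pooled lengths would come from read pairs whose insert size is unambiguous, for example pairs falling within a single exon; that choice is an assumption here rather than something the slide specifies.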

  9. Paired-end RNA-Seq Model • Compute genome-wide insert length distribution • Mapped to Isoform 1 → length 150 • Mapped to Isoform 2 → length 90 (Salzman, Jiang, Wong 2011) [Figure: distribution of sequenced molecule length; axis in base pairs, ticks at 100, 200, 300]
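
To illustrate how implied insert lengths can inform isoform estimates, here is a simplified mixture-model sketch fitted with EM. It is not the model of Salzman, Jiang, Wong 2011; it only assumes that each read pair comes with the probability of its implied insert length under each isoform. The function name and the toy likelihood values are hypothetical.

```python
def em_isoform_fractions(read_likelihoods, n_iter=200):
    """Estimate the fraction of fragments from each isoform.

    read_likelihoods: one list per read pair; entry k is
    P(implied insert length | pair came from isoform k), 0.0 if incompatible.
    """
    n_iso = len(read_likelihoods[0])
    theta = [1.0 / n_iso] * n_iso  # start from equal fractions
    for _ in range(n_iter):
        totals = [0.0] * n_iso
        for lik in read_likelihoods:
            # E-step: posterior probability that this pair came from each isoform.
            weights = [t * p for t, p in zip(theta, lik)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # pair is incompatible with every isoform
            for k, w in enumerate(weights):
                totals[k] += w / norm
        assigned = sum(totals)
        if assigned == 0.0:
            return theta  # nothing to update from
        # M-step: new fractions are the average posterior responsibilities.
        theta = [t / assigned for t in totals]
    return theta


# Toy input: six pairs fit both isoforms (likelihood 0.30 under isoform 1,
# 0.05 under isoform 2); four pairs are compatible with isoform 2 only.
reads = [[0.30, 0.05]] * 6 + [[0.0, 0.30]] * 4
fractions = em_isoform_fractions(reads)
print(fractions)  # approximately [0.52, 0.48]
```

The estimate is a fraction of fragments, not of transcripts; converting to transcript abundance would need a length correction that this sketch leaves out.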

  10. Using PE for quantification is statistically more powerful • The PE model is a statistical improvement over naïve models and has optimal information reduction • “Information” gain using PE sequencing • Overall, using “mate pair” information gives more power, but experimental artifacts can sometimes affect results

  11. Paired-end Size Distributions Are the Foundation for TopHat and Other PE RNA-Seq Algorithms • Summary and problems: • rely on a reference • assume uniformity of size distributions in the library • overlook biases [Figure: panels Rep. 1 and Rep. 2]

  12. Part 2: Gene Fusions

  13. Recurrent Gene Fusions in Cancer • A handful of recurrent fusions in solid tumors • PAX8-PPARγ fusion (thyroid cancer) • EML4-ALK fusion (non-small cell lung cancer) • TMPRSS2-ERG family fusion (prostate cancer) • Not genome-wide • More to be learned by unbiased study of RNA

  14. Fusion Discovery • Two flavors • (1) Totally “de novo” discovery: search for any RNA fragments out of order with respect to the reference genome, not necessarily coinciding with exon boundaries; noisy • (2) Discovery with a reference database: discover fusions at annotated exon boundaries (protein coding), with better statistical checks; misses some fusions

  15. Reference Approach • Search for gene fusions with exon A in gene 1 spliced to exon B of gene 2 [Figure: Exon A, Exon B]

  16. Algorithm (with respect to reference) • Remove all PE reads consistent with the reference • Identify gene pairs with PE reads where (read1, read2) map to (gene1, gene2) • Find PE reads of the form (gene A, gene A-B junction) [Figure: Exon A, Exon B]
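
The three steps on this slide can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: each mate is assumed to be pre-aligned and represented as a tuple of gene names, with one name for a within-gene alignment and two names for a read spanning a candidate junction. The function names, this representation, and the toy pairs are all hypothetical.

```python
from collections import Counter


def is_reference_consistent(pair):
    """A pair is consistent with the reference if both mates touch one gene only."""
    mate1, mate2 = pair
    return len(set(mate1) | set(mate2)) == 1


def fusion_candidates(pairs):
    # Step 1: drop read pairs that the reference already explains.
    discordant = [p for p in pairs if not is_reference_consistent(p)]
    spanning = Counter()  # step 2: mates map to two different genes
    junction = Counter()  # step 3: one mate in a gene, the other on a junction
    for mate1, mate2 in discordant:
        if len(mate1) == 1 and len(mate2) == 1:
            spanning[tuple(sorted((mate1[0], mate2[0])))] += 1
            continue
        for anchor, junc in ((mate1, mate2), (mate2, mate1)):
            # Assumption: the anchoring mate may lie in either fusion partner.
            if len(anchor) == 1 and len(junc) == 2 and anchor[0] in junc:
                junction[junc] += 1
    return spanning, junction


# Toy read pairs with placeholder gene names (illustration only, not real data).
pairs = [
    (("geneA",), ("geneA",)),          # consistent with the reference
    (("geneA",), ("geneB",)),          # mates in two different genes
    (("geneA",), ("geneA", "geneB")),  # gene A plus an A-B junction read
    (("geneA", "geneB"), ("geneB",)),  # A-B junction read plus gene B
]
spanning, junction = fusion_candidates(pairs)
print(spanning, junction)
```

Counting the two kinds of evidence separately keeps the junction-level support, which pins down the exon boundary, distinct from the weaker gene-pair support.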

  17. Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors • Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors: genome-wide search • Top hit of our algorithm: ESRRA-C11orf20 • Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong) [Figure: ESRRA, C11orf20 and the fusion; Salzman et al., 2011]

  18. Part 3: Exploratory Analysis of RNA Rearrangements

  19. Bioinformatic Analysis • Thousands of exon scrambling events in RNA from human leukocytes and cancer samples • Inconsistent with the reference genome! [Figure: wildtype genome DNA and canonical transcript]

  20. Potential Biological Mechanisms for RNA Rearrangements

  21. Analysis of Leukocyte Data • Exons in ‘scrambled’ (non-increasing) order with respect to canonical exon order • Thousands of genes with evidence of exon scrambling • Naïve estimate of fractional abundance: ratio of the scrambled read rate to the all-read rate (per transcript)
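
The definition of a scrambled junction and the naïve fraction can be written down directly. In this sketch each junction-spanning read is assumed to be summarised as a pair of canonical exon indices, (upstream exon, downstream exon) in read order; that representation, the function names, and the toy reads are hypothetical.

```python
def is_scrambled(upstream_exon, downstream_exon):
    """Scrambled means the exon order is non-increasing relative to canonical order."""
    return downstream_exon <= upstream_exon


def scrambled_fraction(junction_reads):
    """Naive per-transcript estimate: scrambled junction reads / all junction reads."""
    if not junction_reads:
        return None  # no junction reads, so the fraction is undefined
    n_scrambled = sum(is_scrambled(up, down) for up, down in junction_reads)
    return n_scrambled / len(junction_reads)


# Toy junction reads for one transcript (illustration only, not real data):
# 1->2 and 2->3 follow canonical order; 3->2 and 4->2 are scrambled.
reads = [(1, 2), (2, 3), (3, 2), (4, 2)]
print(scrambled_fraction(reads))  # 0.5
```

The estimate is naïve in the sense the slide uses: it ignores differences in mappability and junction coverage between scrambled and canonical junctions.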

  22. 100s of Transcripts with High Fractions of Scrambled Isoforms • 100s of transcripts from B cells, stem cells and neutrophils have >50% of copies from the scrambled isoform [Figure: 100s of genes; canonical isoform < 25%, scrambled isoform > 75%]

  23. What Models Can Explain Exon Scrambling in RNA?

  24. Model 1 to Explain RNA Exon Scrambling

  25. Model 1 Prediction • A subset of genes have evidence of tandem duplication in mRNA • Can be made statistically precise • Model 1 is statistically inconsistent with the vast majority of data [Figure: transcripts with evidence against Model 1 vs. for Model 1; axis ticks at 100, 1000, 2000]

  26. Alternative Model Model and data are consistent

  27. Mining RNA-Seq Data for Evidence Consistent with Circular RNA? • In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA) • In poly-A selected samples, expect to see little evidence of scrambled exons (circular RNA)
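
One simple way to check this prediction is to compare the scrambled-read rate between the two library types. The sketch below computes a fold enrichment with a small pseudocount; the function name, the pseudocount choice, and the counts are hypothetical and only illustrate the comparison.

```python
def scrambled_enrichment(depleted, selected, pseudocount=0.5):
    """Fold enrichment of the scrambled-read rate in poly-A depleted libraries.

    Each argument is a (scrambled_reads, total_junction_reads) tuple.
    The pseudocount avoids division by zero when a library has no scrambled reads.
    """
    def rate(scrambled, total):
        return (scrambled + pseudocount) / (total + 2 * pseudocount)

    return rate(*depleted) / rate(*selected)


# Hypothetical counts (illustration only, not data from the presentation).
fold = scrambled_enrichment(depleted=(50, 1000), selected=(5, 1000))
print(fold)  # values well above 1 are consistent with the circular RNA prediction
```

A fold enrichment alone says nothing about significance; a formal test on the underlying counts would be the natural next step.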

  28. Poly-A Depleted Samples Enriched for Scrambled Exons • Align all reads to a custom database

  29. Summary of RNA-Seq for NGS • RNA-Seq can be used for discovery • TopHat and other fusion/splicing algorithms give a broad picture • May have significant noise • Miss important features of RNA expression

  30. Currently, all published/downloadable algorithms will miss identifying circular RNA! (feel free to contact me for the algorithm to identify circular RNA!) • In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA) • In poly-A selected samples, expect to see little evidence of scrambled exons (circular RNA)
