Introduction to RNA-seq


Presentation Transcript


  1. Introduction to RNA-seq Joel Parker, Ph.D.

  2. Why mRNAseq?
     • Measurement of differential expression
     • There are at least four compelling reasons for choosing mRNA-seq instead of microarray-based technologies:
        • Specificity of what is being measured
        • Reduced technical (batch) bias
        • Increased dynamic range and log ratio (fold change, FC) estimates
        • More sensitive detection of genes, transcripts, and differential expression
     • Other reasons:
        • Detection of expressed SNVs
        • Detection of fusions and other structural variations
        • No transcriptome definition is needed
        • No probes need to be designed or manufactured
        • Cost (will soon be equivalent on a per-assay basis with microarray)

  3. Why mRNAseq? – Reduced Bias: cell types separate biologically. [Figure labels: CD19, CD8, CD14, CD4]

  4. Why mRNAseq? – Reduced Processing Bias: client’s miRNAseq samples were sequenced on 4 different machines at 2 different sites at different times over several months, with no apparent bias in the top principal components. [Figure labels: GAIIx, HS-01, HS-02, HS-IL]

  5. Library preparation
     • mRNA
     • RNA Capture: enrichment via hybridization
     • Total RNA: depletion of rRNA via hybridization (blood, MT, etc.)

  6. PMID: 24888378

  7. mRNA

  8. Sequencing parameters: Read Length. Trapnell et al., Nature Biotechnology 31, 46–53 (2013). Precision = PPV (positive predictive value); Recall = Sensitivity
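The two metrics named on this slide can be computed from counts of true positives, false positives, and false negatives. The sketch below is only an illustration: the function name and the counts are hypothetical and are not taken from Trapnell et al.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts.

    precision = PPV         = TP / (TP + FP)
    recall    = sensitivity = TP / (TP + FN)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f}")
```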

  9. Detection is Dependent on Depth PMID: 24888378

  10. Liu et al., Bioinformatics (2014) 30 (3): 301-304.

  11. Computational Processing
     • Technical variation (batch effects) from library preparation and sequencing is small, and the sequencing strategy, especially depth, directs the level of repeatability and detection
     • The raw results of sequencing require significant computational processing
        • Alignment: maximizing unambiguous alignments; alignment of reads that cross exon junctions. Ex: Bowtie, BWA, TopHat, MapSplice, STAR, . . .
        • Abundance estimation: gene or transcript; handling alignments that are ambiguous in the transcriptome. Ex: Sailfish, RSEM, Cufflinks, MISO, Salmon, IsoEM, IsoInfer, Rseq, . . .
        • Normalization of read counts: minimizing bias due to variation in number of clusters available. Ex: total count (RPM), upper quartile, quantile, density
     • Different algorithmic and computational strategies, and the reference genome and transcriptome definition, impact performance much more than single-end (SE) vs. paired-end (PE) or 50 bp vs. 100 bp.
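Two of the normalizations listed above, total count (RPM) and upper quartile, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gene names and counts are hypothetical, and the upper quartile is taken over nonzero counts with the standard library's inclusive quantile method, which is one common convention and not necessarily the one any particular pipeline uses.

```python
import statistics


def rpm(counts):
    """Total-count normalization: reads per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def upper_quartile_scale(counts):
    """Divide each count by the 75th percentile of the nonzero counts."""
    nonzero = sorted(c for c in counts.values() if c > 0)
    uq = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return {gene: c / uq for gene, c in counts.items()}


# Hypothetical read counts for one sample.
counts = {"geneA": 100, "geneB": 200, "geneC": 300,
          "geneD": 400, "geneE": 1000, "geneF": 0}

rpm_values = rpm(counts)
uq_values = upper_quartile_scale(counts)
print(rpm_values["geneA"], uq_values["geneE"])
```

Upper-quartile scaling is less sensitive than total count to a handful of very highly expressed genes, because those genes inflate the total but barely move the 75th percentile.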

  12. Alignment: BWA, Bowtie alignment to transcriptome; count alignments. [Figure labels: Trinity, Trans-Abyss; Transcriptome]

  13. Example Concordant. [Figure labels: Gene, V2, V1] http://www.broadinstitute.org/igv/

  14. Example Discordant 1. [Figure labels: Gene, V2, V1]

  15. Example Discordant 2. [Figure labels: Gene, V2, V1]

  16. Alignment: TopHat, MapSplice, STAR; Trinity, Trans-Abyss

  17. Alignment Comparison Engstrom et al., Nature Methods 10, 1185-1191 (2013)

  18. Alignment Comparison Splice Junction Accuracy Engstrom et al., Nature Methods 10, 1185-1191 (2013)

  19. Computational Processing
     • Technical variation (batch effects) from library preparation and sequencing is small, and the sequencing strategy, especially depth, directs the level of repeatability and detection
     • The raw results of sequencing require significant computational processing
        • Alignment: maximizing unambiguous alignments; alignment of reads that cross exon junctions. Ex: Bowtie, BWA, TopHat
        • Abundance estimation: gene or transcript; handling alignments that are ambiguous in the transcriptome. Ex: Sailfish, RSEM, Cufflinks, MISO, IsoEM, IsoInfer, Rseq, . . .
        • Normalization of read counts: minimizing bias due to variation in number of clusters available. Ex: total count (RPM), upper quartile, quantile, density
     • Different algorithmic and computational strategies, especially the transcriptome definition, impact performance much more than SE vs. PE or 50 bp vs. 100 bp.

  20. Multireads: Reads Mapping to Multiple Genes/Transcripts. HTSeq << PMID: 28784092. Wang X, Wu Z, Zhang X. Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq. J Bioinform Comput Biol. 2010 Dec;8 Suppl 1:177-92. PubMed PMID: 21155027.

  21. Multireads: Reads Mapping to Multiple Genes/Transcripts. [Figure: three genes (1 Long, 2 Medium, 3 Short) with unique reads and multireads; relative abundance for these genes, f1, f2, f3]

  22. Approach 1: Ignore Multireads. [Figure: three genes (1 Long, 2 Medium, 3 Short); relative abundance for these genes, f1, f2, f3] Nagalakshmi et al., Science 2008; Marioni et al., Genome Research 2008

  23. Approach 1: Ignore Multireads. [Figure: three genes (1 Long, 2 Medium, 3 Short)]
     • Over-estimates the abundance of genes with unique reads
     • Under-estimates the abundance of genes with multireads
     • Not an option at all if interested in isoform expression
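A minimal sketch of Approach 1 follows, assuming reads have already been grouped by the set of genes they align to. The gene names, lengths, and counts are hypothetical and are not the values from the slide's figure. Relative abundance is taken here as unique reads per base, rescaled to sum to 1.

```python
def unique_only_abundance(unique, lengths):
    """Relative abundance from uniquely mapped reads only.

    Multireads are discarded entirely, so they never enter the estimate.
    """
    density = {g: unique[g] / lengths[g] for g in lengths}
    total = sum(density.values())
    if total == 0:
        return {g: 0.0 for g in lengths}
    return {g: d / total for g, d in density.items()}


# Hypothetical data: g3 is covered only by reads shared with g2.
lengths = {"g1": 3000, "g2": 2000, "g3": 1000}
unique = {"g1": 30, "g2": 10, "g3": 0}
multi = {("g2", "g3"): 40}  # ignored by this approach

f = unique_only_abundance(unique, lengths)
print(f)
```

In this toy case g3 is reported as absent even though 40 reads are compatible with it, which is the under-estimation the slide describes.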

  24. Approach 2: Allocate Fraction of Multireads Using Estimates From Uniques. [Figure: three genes (1 Long, 2 Medium, 3 Short); relative abundance for these genes, f1, f2, f3] Mortazavi et al., Nature Methods 2008. Ex: Sailfish, RSEM, Cufflinks
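One way to picture Approach 2 is a single "rescue" pass: each group of multireads is split among its candidate genes in proportion to their unique-read density. The sketch below is written in that spirit and is not the exact procedure of any cited paper or tool. The data are hypothetical, and the equal split used when no candidate has unique reads is an assumption of this example.

```python
def rescue_allocate(unique, multi, lengths):
    """Split multireads among candidate genes by unique-read density."""
    density = {g: unique[g] / lengths[g] for g in lengths}
    allocated = {g: float(unique[g]) for g in lengths}
    for genes, n_reads in multi.items():
        weight_sum = sum(density[g] for g in genes)
        for g in genes:
            if weight_sum > 0:
                share = density[g] / weight_sum
            else:
                # Assumption: no unique evidence, so split equally.
                share = 1.0 / len(genes)
            allocated[g] += n_reads * share
    return allocated


# Hypothetical data: equal lengths keep the arithmetic easy to follow.
lengths = {"g1": 1000, "g2": 1000, "g3": 1000}
unique = {"g1": 30, "g2": 10, "g3": 0}
multi = {("g1", "g2"): 40, ("g2", "g3"): 10}

alloc = rescue_allocate(unique, multi, lengths)
print(alloc)
```

Every read is now counted, but a gene with no unique reads (g3 here) still receives nothing whenever a competitor has unique evidence. Iterating the allocation, as EM-based estimators do, refines these shares.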

  25. PMID: 20436464 Cufflinks

  26. RSEM
     • Li and Dewey, 2011
     • PMID: 21816040
     • θ_i represents the probability that a fragment is derived from transcript i
     • [Figure panels: A) PE isoform; B) PE gene; C) SE isoform; D) SE gene]
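The role of θ_i can be illustrated with a toy expectation-maximization loop. This is a heavily simplified sketch, not RSEM's actual model: it assumes each read is equally likely to start anywhere on a compatible transcript (so the per-transcript weight is θ_i / length_i) and ignores quality scores, fragment-length distributions, and strand. The transcript names and counts are hypothetical.

```python
def em_theta(read_classes, lengths, n_iter=200):
    """Estimate theta (fraction of fragments per transcript) by EM.

    read_classes maps a tuple of compatible transcripts to a read count.
    Assumes every class lists at least one transcript present in lengths.
    """
    transcripts = list(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(read_classes.values())
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: share each read class among its compatible transcripts.
        for compat, n_reads in read_classes.items():
            weights = {t: theta[t] / lengths[t] for t in compat}
            norm = sum(weights.values())
            for t, w in weights.items():
                expected[t] += n_reads * w / norm
        # M-step: theta is the expected fraction of fragments.
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Hypothetical data: t3 is supported only by reads shared with t2.
lengths = {"t1": 1000, "t2": 1000, "t3": 1000}
read_classes = {("t1",): 30, ("t2",): 10,
                ("t1", "t2"): 40, ("t2", "t3"): 10}

theta = em_theta(read_classes, lengths)
print({t: round(v, 4) for t, v in theta.items()})
```

With these inputs the estimates settle near θ = (0.6, 0.4, 0): the reads shared by t2 and t3 are all explained by t2, the transcript with independent unique evidence.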

  27. Salmon Novelties
     • Streaming variational Bayes (VB) inference combined with batched VB or EM
     • Lightweight alignment through maximal exact matches
     • Transcript / gene abundance inference is abstracted from the alignment step [RSEM also permits this; sam-xlate in https://github.com/mozack/ubu/wiki]

  28. Repeatability & Detection by Isoform Database
     • Larger reference transcriptomes result in reduced repeatability (left), but increased detection (right)
     • Detection: 73% of RefSeq, 66% of UCSC, and 52% of Ensembl

  29. Computational Processing
     • Technical variation (batch effects) from library preparation and sequencing is small, and the sequencing strategy, especially depth, directs the level of repeatability and detection
     • The raw results of sequencing require significant computational processing
        • Alignment: maximizing unambiguous alignments; alignment of reads that cross exon junctions. Ex: Bowtie, BWA, TopHat
        • Abundance estimation: gene or transcript; handling alignments that are ambiguous in the transcriptome. Ex: Sailfish, RSEM, Cufflinks, MISO, IsoEM, IsoInfer, Rseq, . . .
        • Normalization of read counts: minimizing bias due to variation in number of clusters available. Ex: total count (RPM), upper quartile, quantile, density
     • Different algorithmic and computational strategies, especially the transcriptome definition, impact performance much more than SE vs. PE or 50 bp vs. 100 bp.
