
RNA-seq library prep introduction


caelan

Presentation Transcript


  1. RNA-seq library prep introduction NESCent Academy

  2. Outline
     • Methodologies and history
     • RNA-seq challenges
     • Library preparation methods
     • Common queries
     • Validation
     • Spike-in and future-proofing your work

  3. Gene expression

  4. RNA sequencing
     • Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
     • Isolate RNAs
     • Generate cDNA, fragment, size select, add linkers
     • Sequence ends: 100s of millions of paired reads, 10s of billions of bases of sequence
     • Map to genome, transcriptome, and predicted exon junctions
     • Downstream analysis

  5. Methodologies for RNA-seq studies
     • Mapping transcription start sites
     • Strand-specific RNA-seq
     • Characterization of alternative splicing patterns
     • Gene fusion detection
     • Targeted approaches using RNA-seq
     • Small RNA profiling
     • Direct RNA sequencing
     • Profiling low-quantity RNA samples

  6. Pre-NGS transcriptomics
     • Hybridization-based approaches
        • Genomic tiling microarrays
        • Fluorescently labelled cDNA with microarrays
     • Sequence-based approaches
        • Sanger sequencing of cDNA or EST libraries
        • Serial analysis of gene expression (SAGE)
        • Cap analysis of gene expression (CAGE)
        • Massively parallel signature sequencing (MPSS)

  7. RNA-seq

  8. Challenges
     • RNAs consist of small exons that may be separated by large introns
        • Mapping reads to the genome is challenging
     • The relative abundance of RNAs varies widely
        • The range spans 10^5 to 10^7-fold (5 to 7 orders of magnitude)
        • Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may consume the majority of reads
        • Ribosomal and mitochondrial genes
     • RNAs come in a wide range of sizes
        • Small RNAs must be captured separately
        • PolyA selection of large RNAs may result in 3' end bias
     • RNA is fragile compared to DNA (easily degraded)
     • Bacterial samples may need to be depleted of rRNA
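The random-sampling point on this slide can be illustrated with a small simulation. This is only a sketch: the gene names and abundance values below are hypothetical placeholders chosen to span five orders of magnitude, not measurements from any real sample.

```python
import random


def simulate_read_share(abundances, n_reads, seed=0):
    """Draw n_reads genes in proportion to abundance; return each gene's share of reads."""
    rng = random.Random(seed)
    genes = list(abundances)
    weights = [abundances[g] for g in genes]
    counts = dict.fromkeys(genes, 0)
    for gene in rng.choices(genes, weights=weights, k=n_reads):
        counts[gene] += 1
    return {g: counts[g] / n_reads for g in genes}


# Hypothetical relative abundances (arbitrary units), 10^1 to 10^6.
abundances = {
    "rRNA_like": 1_000_000,
    "mito_like": 100_000,
    "housekeeping": 1_000,
    "moderate": 100,
    "rare": 10,
}

shares = simulate_read_share(abundances, n_reads=100_000)
for gene, share in shares.items():
    print(f"{gene:>12}: {share:.5f}")
```

Under these assumed abundances the most abundant species takes roughly nine reads in ten, and the rarest gene is expected to receive about one read in 100,000, which is why rRNA depletion or polyA selection matters before sequencing.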

  9. Rubbish in = Rubbish out

  10. RNA-seq library prep methodologies
     • Two main routes for mRNA-seq preparation
        • Illumina TruSeq prep
        • Script-seq
     • Generally Script-seq is our favourite

  11. Illumina TruSeq RNA library prep
     • 2 days for 8 samples
     • Size selection step
     • Adaptor ligation and standard library preparation
     • 5 µg of total RNA
     • ~$100 per sample
     • Not strand-specific

  12. Script-seq method
     • 2 hours for 12 samples
     • < 1 µg of RNA
     • ~$150 per sample
     • Strand-specific

  13. DNA library preparation: RNA fragmentation and DNA fragmentation compared
     • a | Fragmentation of oligo-dT primed cDNA (blue line) is more biased towards the 3' end of the transcript. RNA fragmentation (red line) provides more even coverage along the gene body, but is relatively depleted for both the 5' and 3' ends. Note that the ratio between the maximum and minimum expression level (the dynamic range) is 44 for microarrays and 9,560 for RNA-seq. The tag count is the average sequencing coverage for 5,000 yeast ORFs.
     • b | A specific yeast gene, SES1 (seryl-tRNA synthetase), is shown.

  14. Common questions: How much library depth is needed for RNA-seq?
     • My advice: don't ask this question if you want a simple answer
     • Depends on a number of factors:
        • Question being asked of the data. Gene expression? Alternative expression? Mutation calling?
        • Tissue type, RNA preparation, quality of input RNA, library construction method, etc.
        • Sequencing type: read length, paired vs. unpaired, etc.
        • Computational approach and resources
     • Identify publications with similar goals
     • Pilot experiment
     • Good news: 1/8 to 1 lane of recent Illumina HiSeq data should be enough for most purposes
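When planning how many samples to multiplex per lane, a back-of-envelope calculation can help frame the pilot experiment. The sketch below is not a recommendation: the lane yield (160 million read pairs), read length (100 bp) and transcriptome size (100 Mb) are assumed placeholder values, so substitute figures for your own instrument and organism.

```python
def reads_per_sample(lane_read_pairs, lane_fraction):
    """Read pairs available to one sample given its share of a lane."""
    return int(lane_read_pairs * lane_fraction)


def mean_fold_coverage(n_fragments, read_length, transcriptome_bases, paired=True):
    """Average sequenced bases per transcriptome base (ignores abundance skew)."""
    sequenced_bases = n_fragments * read_length * (2 if paired else 1)
    return sequenced_bases / transcriptome_bases


# Assumed values for illustration only.
LANE_READ_PAIRS = 160_000_000      # hypothetical yield of one lane
READ_LENGTH = 100                  # bp per read
TRANSCRIPTOME_BASES = 100_000_000  # hypothetical expressed transcriptome size

pairs = reads_per_sample(LANE_READ_PAIRS, 1 / 8)
coverage = mean_fold_coverage(pairs, READ_LENGTH, TRANSCRIPTOME_BASES)
print(f"{pairs:,} read pairs per sample, about {coverage:.0f}x mean coverage")
```

The mean hides the abundance skew described under Challenges: highly expressed genes will sit far above this figure and rare transcripts far below it, which is why the question being asked of the data drives the depth you need.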

  15. Coverage versus depth

  16. Common questions: What mapping strategy should I use for RNA-seq?
     • Depends on read length
     • < 50 bp reads
        • Use an aligner like BWA and a genome + junction database
        • Junction database needs to be tailored to read length
        • Or you can use a standard junction database for all read lengths and an aligner that allows substring alignments for the junctions only (e.g. BLAST, which is slow)
        • Assembly strategy may also work (e.g. Trans-ABySS)
     • > 50 bp reads
        • Spliced aligner such as TopHat, or de novo assembly with Trinity
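One way to see why a junction database has to be tailored to read length: if each junction sequence keeps `read_length - min_overhang` bases from either exon, any read that aligns entirely within it must cross the splice site by at least `min_overhang` bases on both sides. The function below is a minimal sketch of that idea; the function name, the `min_overhang` parameter and the toy exon sequences are illustrative assumptions, not part of any particular tool.

```python
def junction_sequence(donor_exon, acceptor_exon, read_length, min_overhang=4):
    """Join the end of the donor exon to the start of the acceptor exon.

    Each flank is read_length - min_overhang bases long, so a read of
    read_length placed anywhere inside the result overlaps both exons
    by at least min_overhang bases.
    """
    if read_length <= min_overhang:
        raise ValueError("read_length must exceed min_overhang")
    flank = read_length - min_overhang
    return donor_exon[-flank:] + acceptor_exon[:flank]


# Toy exon sequences (hypothetical).
donor = "A" * 30 + "CCCCC"
acceptor = "GGGGG" + "T" * 30

junction = junction_sequence(donor, acceptor, read_length=8, min_overhang=3)
print(junction)
```

Longer reads need longer flanks, so a database built for 36 bp reads would miss valid placements of 50 bp reads; exons shorter than the flank simply contribute their full length here.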

  17. Common questions: How reliable are expression predictions from RNA-seq?
     • Are novel exon-exon junctions real?
        • What proportion validate by RT-PCR and Sanger sequencing?
     • Are differential/alternative expression changes observed between tissues accurate?
        • How well do differential expression values correlate with qPCR?
     • 384 validations
        • qPCR, RT-PCR, Sanger sequencing
     • See ALEXA-Seq publication for details:
        • Also includes comparison to microarrays
        • Griffith et al. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847.
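The question of how well differential expression values correlate with qPCR is usually answered with a correlation coefficient over matched genes. Below is a minimal sketch using a hand-written Pearson correlation; the log2 fold-change values are made-up illustration data, not results from the cited study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical log2 fold changes for the same five genes.
rnaseq_log2fc = [2.1, -1.3, 0.4, 3.0, -2.2]
qpcr_log2fc = [1.8, -1.0, 0.1, 2.6, -2.5]

r = pearson_r(rnaseq_log2fc, qpcr_log2fc)
print(f"Pearson r = {r:.3f}")
```

Comparing on the log2 scale keeps up- and down-regulated genes symmetric; a rank-based (Spearman) correlation is a reasonable alternative when a few extreme fold changes dominate.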

  18. Common questions: How many replicates?
     • As many as you can afford
     • TopHat/Cufflinks statistics work best with three or more biological replicates

  19. Validation (qualitative)
     • 33 of 192 assays shown
     • Overall validation rate = 85%

  20. RNA-seq vs microarray

  21. Spike-in controls
     • How can you identify limits of detection and ensure your data can be compared to future platforms or new library prep methods? (e.g. How does Oxford Nanopore compare to Illumina sequencing?)
     • Spike RNA of known concentration into your total RNA
     • http://tools.invitrogen.com/content/sfs/manuals/4455352C.pdf
     • Cost: $20 per sample
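Because the spiked-in concentrations are known, read counts for the spike-in transcripts can be regressed against them to check linearity and to estimate the lowest concentration that is still detected. The sketch below assumes a simple least-squares fit on log10 values; the concentrations, counts and the `min_count` threshold are hypothetical placeholders, not values from the linked manual.

```python
import math


def fit_log_log(concentrations, counts):
    """Least-squares slope and intercept of log10(count) vs log10(concentration).

    Spike-ins with zero counts are skipped because log10(0) is undefined.
    """
    points = [(math.log10(c), math.log10(k))
              for c, k in zip(concentrations, counts) if k > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    ss_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = ss_xy / ss_xx
    return slope, mean_y - slope * mean_x


def lower_limit_of_detection(concentrations, counts, min_count=1):
    """Lowest spiked concentration observed with at least min_count reads."""
    detected = [c for c, k in zip(concentrations, counts) if k >= min_count]
    return min(detected) if detected else None


# Hypothetical spike-in series (arbitrary concentration units) and read counts.
spike_conc = [0.01, 0.1, 1, 10, 100]
spike_counts = [0, 2, 20, 200, 2000]

slope, intercept = fit_log_log(spike_conc, spike_counts)
limit = lower_limit_of_detection(spike_conc, spike_counts)
print(f"slope = {slope:.2f}, lower limit of detection = {limit}")
```

A slope near 1 indicates that counts scale proportionally with input; fitting the same spike-in mix on another platform or library prep gives a common yardstick for comparing the two.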

  22. RNA-seq spike-in protocol

  23. Assessing lower limit of detection

  24. Assessing fold change response

  25. Take home
     • Good quality total RNA of 1-10 µg
     • Have 3 or more biological replicates
     • Unless you have good reason, use a Script-seq type protocol
     • Use a standard spike-in as an internal control and to ensure samples can be compared across platforms
     • Don't forget to validate key findings with qPCR!
