1 / 34

RNA sequencing – a basic introduction

RNA sequencing – a basic introduction. ALLBIO – Scilife – UPPNEX – BILS course 12 -16 May 2014. Maja Molin , PhD Dept. of Medical Biochemistry and Microbiology, Uppsala University. Overview. Lecture Historical perspective – “past” and present techniques

redford
Download Presentation

RNA sequencing – a basic introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. RNA sequencing – a basic introduction ALLBIO – Scilife – UPPNEX – BILS course 12 -16 May 2014 MajaMolin, PhD Dept. of Medical Biochemistry and Microbiology, Uppsala University

  2. Overview • Lecture • Historical perspective – “past” and present techniques • An RNAseq experiment consist of many steps • Design experiment • Purify RNA • Prepare libraries • Sequence • Analysis Exercise RNA seq analysis using the de novo assembler Trinity

  3. Historical perspective – “past” and present techniques • “Past” • Sequencing -> Sanger sequencing of cDNA libraries • Limitations in the number of sequences • Redundancy due to highly expressed genes • Read length about 800bp -> poor in full-length • Prone to indelerrors • Global quantifications -> Expression microarrays • Sequences have to be known • Incomplete annotations • No discovery of novel transcripts • Hybridization-based method, problems with SNPs, Indels • Noise • Signal intensity is used to calculate the expression level of the gene

  4. Historical perspective – “past” and present techniques • Present • Sequencing -> Next-Gen Sequencing technologies • Several different platforms, Illumina, SOLiD, Ion Torrent, 454, PacBio • Short reads • Full-length transcripts • High dynamic range • Strand-specific sequencing • Sequencing errors are mostly substitutions • Applications • Global differential expression analysis • Characterization of alternative splicing, polyadenylation, transcription • Discovery of novel transcripts • SNP finding • RNA editing • Allelic gene expression

  5. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Design experiment • Is the primary aim qualitative or quantitative? • Qualitative/Annotation: • identify expressed transcripts, • exon/intron boundaries, TSS, poly-A sites. Sequence reads must cover the transcripts evenly, including both ends. Coverage depends on library prep and seq. depth • Quantitative/DGE: • meassure differences in gene expression, alternative splicing, TSS and poly-A sites between ≥2 groups Must accurately measure the counts of transcripts and the variances assoc. with the counts. Replicates are essential! • Other objectives? SNP finding, allelic gene expression, RNA editing? • Which sequencing technology, Illumina, SOLiD, Ion Torrent, 454, PacBio? http://rnaseq.uoregon.edu/

  6. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • 2. Purify RNA • A cell contains many types of RNA, e.g • rRNA (>80%) • tRNA • mRNA (1-5% of totalRNA) • miRNA • ncRNA • snoRNA • Always use high quality and high purity RNA for sequencing • OD 260/280 ratio > 1.8, 260/230 ratio close to 2.0 • RIN > 8.0 • Measure concentration using Qubit • If RNA extraction is based on phenol (e.g. TRIzol) or organic methods -> RNA clean-up is recommended using e.g. columns to remove traces of phenol • DNaseI treatment of RNA is recommended

  7. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Prepare libraries • Library preparation by the platform or by you? • Library prep. needs to match the sequencing technology. • PolyA selection or rRNA depletion for mRNA sequencing? • PolyA selection isolates mRNA very efficiently but cannot be used for non-poly RNA. • rRNA depletion preserves non-polyARNAs, but less effective of removing all rRNA. • Single-end or paired end (PE allows more accurate mapping and is useful for isoform detection) • Strand-specific library (or non-stranded?) • Barcoding and Pooling

  8. Strand-specific (or non-stranded) library • Non-stranded library • Does not contain any information about which strand was originally transcribed • Strand-specific library • Preserve the information about which strand was transcribed • Anti-sense transcripts can be identified • Identify the exact boundaries of adjacent genes transcribed from opposite strands • Correct expression pattern of coding or non-coding overlapping transcripts • Often the default method today LevinJZ, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods. 2010 Sep;7(9):709-15.

  9. Strand-specific (or non-stranded) library LevinJZ, et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods.Nat Methods. 2010 Sep;7(9):709-15.

  10. Barcoding and pooling Total RNA mRNA Fragmented mRNA/cDNA Finished library Adapter cDNA insert Adapter Index • Barcoding and pooling: • Short 6-8 nt´s (index) are introduced as part of the adapters • Index provide unique identifier for each sample • The index allows pooling of samples to avoid lane effects and to use the sequencing capacity more efficiently

  11. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Sequence • Pooling strategy • Sequence depth 1 3 2 Control: 3 biological replicates Pool and sequence in one lane on IlluminaHiseq Pool and sequence in one lane on IlluminaHiseq Treated: 3 biological replicates 1 3 2 Pool and sequence in one lane on IlluminaHiseq Pool and sequence in one lane on IlluminaHiseq

  12. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Sequence • Pooling strategy • Sequence depth • 30M reads is sufficient to detect nearly all annotated chicken genes (15742). • 30M reads generate representative assemblies, good balance between coverage and noise. • >60M reads sequencing errors accumulate in highly expressed genes and few new genes are discovered • Increasing replicates is more important than increasing sequencing depth for DE analysis. Wang et al. BMC Bioinformatics 2011, 12(Suppl10):S5 Francis et al. BMC Genomics 2013, 14:167 Rapaport et al. Genome Biology 2013, 14:R95

  13. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Analysis • Quality check of sequence reads • Preprocessing of sequencing reads • De novo transcriptome assembly or aligning RNA-seq reads to a reference? • Annotation of transcripts/differential gene expression, downstream analysis

  14. Quality check of sequence reads • Illumina sequencing runs stores data in large text files called FASTQ (extension .fq or .fastq). • FastQfiles contain both the sequence and the quality of each base call for every read in the run. • Information about each read is listed on four consecutive lines • 1. Sequence ID beginning with @ • 2. Base calls (sequence) • 3. A plus sign • 4. Sequence quality codes @61G9EAAXX100520:5:100:10000:12335/1 CGGGTTAGAATCAACAAGTGTAGGAGGAACTTGGTAACGATGATTTAAATTATCTGCACTACGGTCGT + GGGFEGGGGFGGGGGGGGEGDGGEFGGEEFGGFFCFCGGEFFDEEEEAEGDEEBDEDCDEAEBCACED 1. 2. 3. 4.

  15. Quality check of sequence reads Paired-end Sequences One FastQ file with all the left (/1) reads @61G9EAAXX100520:5:100:10000:12335/1 CGGGTTAGAATCAACAAGTGTAGGAGGAACTTGGTAACGATGATTTAAATTATCTGCACTACGGTCGT + GGGFEGGGGFGGGGGGGGEGDGGEFGGEEFGGFFCFCGGEFFDEEEEAEGDEEBDEDCDEAEBCACED @61G9EAAXX100520:5:100:10000:14468/1 ACGAGTAATCTTGGTGGGGATACCAAGAGCTTGGAAGAAAGAGGTCTTACCGGGTTCCATACCAGTGT + GGGGGGGGGDGGGGBGGGGGGGGFDFGGGGGGGFEFGEFFGDEFDDEGGEEEEECDDFDEDDACDCDE cDNA insert One FastQ file with all the right (/2) reads @61G9EAAXX100520:5:100:10000:12335/2 GGATCTTTCACATTTGAAATGTCTCTTCCTCACCGTAATCCCTCATTGTCTTCCCTTCCAACTACTGG + GGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGFFFGEFFGGGGGGGGDEEGEFGFG @61G9EAAXX100520:5:100:10000:14468/2 GTCTTCACCAACGCTGATTTGAAGGAAGTCCGTGAGACCATTATTGCTAATGTTATTGCTGCTCCTGC + GGFGGGGGDGGGGGGGGGGGFEGGFGGGEGGGFGGGGGFGGGGGGGGGGGGGDGBGFFFFFEEFEFFB

  16. Quality check of sequence reads using FastQC tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) Quality score across all reads in a file summarized by position. A good run will have quality score >28. If lower at some point, consider trimming.

  17. Quality check of sequence reads using FastQC tool (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) Shows if a subset of your sequences have overall low quality scores. If the most frequently observed mean quality is <27, a warning is raised. You can consider filtering your reads by average quality to keep only the best reads.

  18. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Analysis • Quality check of sequence reads • Preprocessing of sequencing read • De novo transcriptome assembly or aligning RNA-seq reads to a reference? • Annotation of transcripts/differential gene expression, downstream analysis

  19. Preprocessing of sequencing read using Trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic) Consider running FastQC again to check your trimming

  20. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Analysis • Quality check of sequence reads • Preprocessing of sequencing read • De novo transcriptome assembly or aligning RNA-seq reads to a reference? • Annotation of transcripts/differential gene expression, downstream analysis

  21. De novo transcriptome assembly or aligning RNA-seq reads to a reference? • Novel organism – little or no previous sequencing? • Non-model organism some sequences available (ESTs, Unigene set) • Genome-Sequenced organism– draft genome with maybe tens of chromosomes, some annotations etc. • Model organism – genome fully sequenced and annotated with multiple genomes available, well-annotated transcriptomes, genetic maps, available mutants etc.

  22. De novo transcriptome assembly or aligning RNA-seq reads to a reference? TopHat Cufflinks Haas BJ and Zody MC. Nat Biotechnol. 2010 May;28(5):421-3.

  23. De novo transcriptome assembly or aligning RNA-seq reads to a reference? Trinity Trans-ABySS Velvet-Oases SOAPdenovo-trans

  24. De novo assembly using Trinity • Trinity combines three independent software modules: • Inchworm • Chrysalis • Butterfly • Inchworm • kmer =short oligonucleotide of length k • All sequence reads are cut into overlapping kmers (25-mers). Each kmer overlap with its neighbor in all but one base. Martin and Wang, Nat. Rev. Genet. Oct 2011, vol 12:671-682

  25. De novo assembly using Trinity Inchworm algorithm Identifies seed kmer as most abundant kmer. Extend kmer at 3´end and at 5´end based on coverage For each extension, 4 possible kmers exists, each ending with one of the four nt´s. The most abundant cumulative ending wins! The assembled contig is reported and the assembled kmers are removed from the catalog and the whole process starts again.

  26. Inchworm algorithm 5 G G G G G 4 4 0 4 1 A A A A A 1 1 5 1 GATTACA GATTACA 9 9 T T T T 1 0 1 0 C C C C GATTACA 4 0 1 4 9

  27. 0 0 Inchworm algorithm 0 0 0 0 5 5 0 0 G G 4 4 A A GATTACA GATTACA C 9 9 0 T 0 A 6 0 A G 6 1 0 0 0 • Report the contig …….AGATTACAGA…... • Remove assembeldkmers from the catalog of all kmers and then repeat this step • Trinity default is set at a minimum kmer of 1 (all kmers are used) but with large datasets this parameter can be changed to min. kmer of 2

  28. De novo assembly using Trinity • Trinity combines three independent software modules: • Inchworm – linear contigs • Chrysalis – recluster/re-groups related contigs from Inchworm • Butterfly – reconstructs transcripts and alternatively spliced isoforms Trinity output – a fasta file with all the transcripts c2 is read cluster from Inchworm g0 is “gene” i1 is the isoform gene identifier

  29. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Analysis • Quality check of sequence reads • Preprocessing of sequencing read • De novo transcriptome assembly or aligning RNA-seq reads to a reference? • Annotation of transcripts/differential gene expression, downstream analysis

  30. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Design experiment • Is the primary aim qualitative/annotation or quantitative/Differential gene expression? • Qualitative/annotation

  31. An RNA seq experiment consist of many steps 1. Design experiment 2. Purify RNA 3. Prepare libraries 4. Sequence 5. Analysis • Design experiment • Is the primary aim qualitative/annotation or quantitative/Differential gene expression? • Quantitative/differential gene expression • The level of gene expression corresponds to read counts • Align reads to transcriptome assembly or reference genome • Calculate expression values/abundance estimation based on the mapped reads • Output is normalized expression values • Normalization based on both length of the transcript and total depth of the sequencing. • RPKM (Reads Per Kilobase per Million reads Mapped) • FPKM (Fragments Per Kilobase per Million reads mapped)

  32. Normalized read count/expression values 1. Low expression 2. High expression 3. Short transcript 4. Long transcript Read count Expression value (RPKM or FPKM) 1 2 3 4 1 2 3 4

  33. Summary • An RNAseq experiment consist of many steps • Design experiment • Purify RNA • Prepare libraries • Sequence • Analysis • Several different options to choose between at every step • De novo assembler Trinity

  34. Thank you! Questions? ALLBIO – Scilife – UPPNEX – BILS course 12 -16 May 2014 MajaMolin, PhD Dept. of Medical Biochemistry and Microbiology, Uppsala University

More Related