mRNA-Seq: methods and applications. Jim Noonan, GENE 760. Introduction to mRNA-seq. Technical methodology. Read mapping and normalization. Estimating isoform-level gene expression. De novo transcript reconstruction. Sensitivity and sequencing depth. Differential expression analysis.
mRNA-Seq: methods and applications Jim Noonan GENE 760
Introduction to mRNA-seq • Technical methodology • Read mapping and normalization • Estimating isoform-level gene expression • De novo transcript reconstruction • Sensitivity and sequencing depth • Differential expression analysis
mRNA-seq workflow Wang et al. Nat Rev Genet 10:57 (2009) Martin and Wang Nat Rev Genet 12:671 (2011)
Illumina RNA-seq library preparation • Capture poly-A RNA with poly-T oligo attached beads (100 ng total) (2x) • RNA quality must be high – degradation produces 3’ bias • Non-poly-A RNAs are not recovered • Workflow: Fragment mRNA → Synthesize ds cDNA → Ligate adapters → Amplify → Generate clusters and sequence
Ribosomal RNA subtraction RiboMinus
RNA-seq reads mapped to a reference genome Normalization: Reads per kilobase of feature length per million mapped reads (RPKM) • What is a “feature”? • What about genomes with poor gene annotation? • What about species with no sequenced genome? For a detailed comparison of normalization methods, see: Bullard et al. BMC Bioinformatics 11:94 (2010); Robinson and Oshlack Genome Biol 11:R25 (2010)
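The RPKM definition above can be written out directly. This is a minimal sketch; the function name `rpkm` and the example numbers are illustrative assumptions, not values from the slides.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    kilobases = feature_length_bp / 1_000          # feature length in kb
    millions = total_mapped_reads / 1_000_000      # library size in millions of mapped reads
    return read_count / kilobases / millions


# Hypothetical example: 500 reads on a 2,000 bp feature in a library of 20 million mapped reads.
example = rpkm(500, 2_000, 20_000_000)
print(example)  # 12.5
```

The result depends on what is counted as the "feature" (exon, composite gene model, or isoform), since that choice sets both the read count and the length.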
Quantifying gene expression by RNA-seq • Use existing gene annotation: • Align to genome plus annotated splices • Depends on high-quality gene annotation • Which annotation to use: RefSeq, GENCODE, UCSC? • Isoform quantification? • Identifying novel transcripts? • Reference-guided alignments: • Align to genome sequence • Infer splice events from reads • Allows transcriptome analyses of genomes with poor gene annotation • De novo transcript assembly: • Assemble transcripts directly from reads • Allows transcriptome analyses of species without reference genomes
Composite gene model approach Map reads to genome Map remaining reads to known splice junctions • Requires good gene models • Isoforms are ignored
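One way to picture a composite gene model is as the union of exons across all annotated isoforms of a gene. The sketch below assumes half-open (start, end) coordinates on a single strand; the function names and the two toy isoforms are hypothetical.

```python
def composite_exons(isoforms):
    """Merge exon intervals from all isoforms into non-overlapping union intervals.

    Each isoform is a list of (start, end) half-open intervals.
    """
    intervals = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def composite_length(isoforms):
    """Total length in bp of the composite (union) gene model."""
    return sum(end - start for start, end in composite_exons(isoforms))


# Two hypothetical isoforms that differ in their middle exon.
isoform_a = [(100, 200), (300, 400), (500, 600)]
isoform_b = [(100, 200), (350, 450), (500, 600)]
print(composite_exons([isoform_a, isoform_b]))   # [(100, 200), (300, 450), (500, 600)]
print(composite_length([isoform_a, isoform_b]))  # 350
```

Reads are then counted against the merged intervals, which is why isoform differences are invisible in this approach.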
Strategies for transcript assembly Garber et al. Nat Methods 8:469 (2011)
Splice-aware short read aligners Martin and Wang Nat Rev Genet 12:671 (2011)
Reference based transcript assembly Martin and Wang Nat Rev Genet 12:671 (2011)
Transcript assembly programs Martin and Wang Nat Rev Genet 12:671 (2011)
Cufflinks: ab initio transcript assembly Step 1: map reads to reference genome Trapnell et al. Nat Biotechnol 28:511 (2010)
Cufflinks: ab initio transcript assembly Isoform abundances estimated by maximum likelihood Trapnell et al. Nat Biotechnol 28:511 (2010)
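To illustrate what maximum-likelihood isoform abundance estimation involves, here is a simplified expectation-maximization (EM) sketch. It is a stand-in for illustration, not the Cufflinks model: it assumes each read is drawn uniformly along its source isoform and that we already know which isoforms each read is compatible with. All names and the toy data are hypothetical.

```python
def em_isoform_fractions(read_compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_compat: list of tuples of isoform indices each read is compatible with.
    lengths: isoform lengths in bp.
    """
    k = len(lengths)
    theta = [1.0 / k] * k  # start from equal read fractions
    for _ in range(n_iter):
        counts = [0.0] * k
        for compat in read_compat:
            # E-step: P(isoform | read) is proportional to theta_i / length_i
            weights = [theta[i] / lengths[i] for i in compat]
            total = sum(weights)
            for i, w in zip(compat, weights):
                counts[i] += w / total
        # M-step: new read fractions are the normalized expected counts
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta


# Toy data: two equal-length isoforms, 60 shared reads,
# 30 reads unique to isoform 0 and 10 unique to isoform 1.
reads = [(0, 1)] * 60 + [(0,)] * 30 + [(1,)] * 10
fractions = em_isoform_fractions(reads, lengths=[1000, 1000])
print([round(f, 3) for f in fractions])  # [0.75, 0.25]
```

The shared reads are split in proportion to the evidence from the isoform-specific reads, which is the intuition behind likelihood-based isoform quantification.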
Graph-based transcript assembly Martin and Wang Nat Rev Genet 12:671 (2011)
Graph-based transcript assembly Martin and Wang Nat Rev Genet 12:671 (2011)
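Graph-based assemblers typically build a de Bruijn graph from k-mers in the reads. The following is a toy sketch under strong assumptions (error-free reads, one strand, no branching); real assemblers handle errors, coverage, and alternative paths. Function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unbranched(graph, start):
    """Follow unique successors from `start` and return the spelled-out sequence."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads from the sequence ATGGCGTGCAA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = de_bruijn_graph(reads, k=4)
print(extend_unbranched(graph, "ATG"))  # ATGGCGTGCAA
```

In a transcriptome, nodes with more than one successor often correspond to alternative splicing, so the walk above would stop where isoforms diverge.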
Trinity: de novo transcript assembly Grabherr et al. Nat Biotechnol 29:644 (2011)
What depth of sequencing is required to characterize a transcriptome? Wang et al. Nat Rev Genet 10:57 (2009)
Considerations • Gene length: • Long genes are detected before short genes • Expression level: • High expressors are detected before low expressors • Complexity of the transcriptome: • Tissues with many cell types require more sequencing • Feature type • Composite gene models • Common isoforms • Rare isoforms • Detection vs. quantification • Obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage
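The effect of gene length, expression level, and depth on detection can be sketched with a simple model. The assumption here, which is not from the slides, is that the read count for a transcript is Poisson-distributed with a mean set by its RPKM, its length, and the library size. Function names and numbers are hypothetical.

```python
import math


def expected_reads(rpkm, length_bp, total_mapped_reads):
    """Expected read count implied by an RPKM value, a length, and a library size."""
    return rpkm * (length_bp / 1_000) * (total_mapped_reads / 1_000_000)


def detection_probability(rpkm, length_bp, total_mapped_reads, min_reads=1):
    """P(observing at least `min_reads` reads) under the assumed Poisson model."""
    lam = expected_reads(rpkm, length_bp, total_mapped_reads)
    p_below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(min_reads))
    return 1.0 - p_below


# Same expression level (RPKM = 1) and depth (1 million mapped reads), different lengths.
short_gene = detection_probability(1, 500, 1_000_000)
long_gene = detection_probability(1, 5_000, 1_000_000)
print(round(short_gene, 3), round(long_gene, 3))
```

Raising `min_reads` mimics the difference between detection and quantification: requiring more reads for a confident estimate demands greater depth for the same transcript.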
Transcript detection is biased in favor of long genes Tarazona et al. Genome Res 21:2213 (2011)
Applications of mRNA-seq • Characterizing transcriptome complexity • Alternative splicing • Differential expression analysis • Gene- and isoform-level expression comparisons • Novel RNA species • lincRNAs and eRNAs • Pervasive transcription • Translation • Ribosome profiling • Allele-specific expression • Effect of genetic variation on gene expression • Imprinting • RNA editing • Novel events
Alternative isoform regulation in human tissue transcriptomes Wang et al. Nature 456:470 (2008)
Diversity of alternative splicing events in human tissues Wang et al. Nature 456:470 (2008)
Differential expression Garber et al. Nat Methods 8:469 (2011)
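As a rough illustration of comparing expression between two samples, the sketch below normalizes counts to counts per million and reports a log2 fold change. It is not a statistical test: it ignores replicates and variance, which a real differential expression analysis needs. Gene names, counts, and function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts by library size (total reads across the listed genes)."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(B / A) per gene on CPM values; the pseudocount avoids division by zero."""
    cpm_a = counts_per_million(counts_a)
    cpm_b = counts_per_million(counts_b)
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
        if gene in cpm_b
    }


# Hypothetical counts for two genes in two conditions.
condition_a = {"geneA": 100, "geneB": 900}
condition_b = {"geneA": 400, "geneB": 600}
lfc = log2_fold_changes(condition_a, condition_b)
print({gene: round(value, 2) for gene, value in lfc.items()})
```

Positive values indicate higher normalized expression in the second condition, negative values lower.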
Differential expression: Characterizing transcriptome dynamics during brain development Neuronal functions synaptic transmission cell adhesion Embryonic mouse cortex RNA-seq Neuronal migration DEX “Stemness” functions Cell cycle M phase Sox2, Oct4 Ayoub et al. PNAS 108:14950 (2011)
Differential expression: Characterizing transcriptome dynamics during brain development Differential isoforms Embryonic mouse cortex RNA-seq DE isoforms Ayoub et al. PNAS 108:14950 (2011)
Novel RNA species: annotating lincRNAs Guttman et al. Nat Biotechnol 28:503 (2010)
Enhancer-associated RNAs (eRNAs) Neurons treated with KCl Kim et al. Nature 465:182 (2010)
Enhancer-associated RNAs (eRNAs) Ren B. Nature 465:173 (2010)
How much of the genome is transcribed? van Bakel et al. PLoS Biol 8:e1000371 (2010)
Exploiting sequence information in RNA-seq reads Majewski and Pastinen. Trends Genet 27:72 (2011)
Detecting variants that affect splicing Pickrell et al. Nature 464:768 (2010)
Summary: mRNA-seq applications • Quantify transcriptome complexity and compare across biological states • Determine how transcriptomes are translated in different biological contexts • Effect of genetic variation on gene expression • Imprinting and RNA editing