RNA Sequencing and transcriptome reconstruction Manfred G. Grabherr

Presentation Transcript


  1. RNA Sequencing and transcriptome reconstruction Manfred G. Grabherr

  2. A historic perspective • Traditional: sequence cDNA libraries by Sanger • Tens of thousands of pairs at most (20K genes in mammal) • Redundancy due to highly expressed genes • Not only coding genes are transcribed • Poor full-lengthness (read length about 800bp) • Indels are the dominant error mode in Sanger (frameshifts)

  3. A historic perspective • Quantification: microarrays • Sequences have to be known • Annotations are often incomplete • No novel transcripts • Hybridization bias (SNPs) • Noise

  4. Next-Gen Sequencing technologies • One HiSeq lane yields 30GB of sequence • Short reads (100nt), but: • Good depth, high dynamic range • Full-length transcripts • Novel transcripts • Allow for expression quantification • Error patterns are mostly substitutions • Strand-specific libraries

  5. Strategy: read mapping vs. de novo assembly Haas and Zody, Nature Biotechnology 28, 421–423 (2010)

  6. Strategy: read mapping vs. de novo assembly Good reference No genome Haas and Zody, Nature Biotechnology 28, 421–423 (2010)

  7. Leveraging RNA-Seq for Genome-free Transcriptome Studies Brian Haas

  8. A Paradigm for Genomic Research WGS Sequencing Assemble Draft Genome Scaffolds Methylation Tx-factor binding sites SNPs Proteins

  9. A Paradigm for Genomic Research RNA-Seq WGS Sequencing Assemble Align Draft Genome Scaffolds Transcripts Methylation Tx-factor binding sites Expression SNPs Proteins

  10. A Maturing Paradigm for Transcriptome Research RNA-Seq WGS Sequencing Assemble Align Assemble Draft Genome Scaffolds Transcripts Methylation Tx-factor binding sites Expression SNPs Proteins

  11. A Maturing Paradigm for Transcriptome Research RNA-Seq $$$$$ $$$$$ $$$$$ $$$$$ WGS Sequencing + $ Assemble Align Assemble Draft Genome Scaffolds $ Transcripts Methylation Tx-factor binding sites Expression SNPs Proteins

  15. De-novo transcriptome assembly Brian Haas, Moran Yassour, Kerstin Lindblad-Toh, Aviv Regev, Nir Friedman, David Eccles, Alexie Papanicolaou, Michael Ott, …

  16. The problem Transcript

  17. The problem Transcript Reads

  18. The problem Transcript Reads Assembly Transcript

  19. The problem Transcript Paralog A Paralog B Reads Assembly Transcript

  20. The problem Transcript Isoform A Isoform B Reads Assembly Transcript

  21. Transcriptome vs. Genome assembly • Genome: large; high coverage; long mate pairs (hard to make); linear sequences; even coverage • Transcriptome: smaller; standard paired-end Illumina (1 lane); multiple solutions (alternative splicing); uneven coverage (expression)

  22. Transcriptome vs. Genome assembly • Genome: large; high coverage; long mate pairs (hard to make); linear sequences; even coverage • Transcriptome: smaller; standard paired-end Illumina (1 lane); multiple solutions (alternative splicing); uneven coverage (expression) • In common: k-mer based approach

  23. The k-mer • K consecutive nucleotides Reads K-mers Graph
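
The step from reads to k-mers on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function names `kmers` and `count_kmers`, the toy reads, and the choice of k = 4 are assumptions made for the example and do not come from the slides.

```python
from collections import Counter


def kmers(read, k):
    """Yield every window of k consecutive nucleotides in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Toy reads (hypothetical); the second overlaps the first.
reads = ["ATGGCGTGCA", "GGCGTGCAAT"]
counts = count_kmers(reads, 4)
first_read_kmers = list(kmers(reads[0], 4))
```

A read of length L contributes L - k + 1 k-mers, and k-mers shared between overlapping reads accumulate higher counts, which is what the later coverage-guided steps rely on.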

  24. The de Bruijn Graph • Graph of overlapping sequences • Intended for cryptology • Fixed length element: k • CTTGGAA • TTGGAAC • TGGAACA • GGAACAA • GAACAAT

  25. The de Bruijn Graph • Graph has “nodes” and “edges” • G GGCAATTGACTTTT… • CTTGGAACAAT TGAATT • A GAAGGGAGTTCCACT…
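
As a rough sketch of the nodes and edges described above, the following Python builds a small de Bruijn graph from the five 7-mers listed on slide 24. It assumes the convention in which each k-mer is a node and an edge joins two k-mers that overlap by k-1 bases; other formulations use (k-1)-mers as nodes. The function name `de_bruijn` is made up for the example.

```python
def de_bruijn(kmer_set):
    """Map each k-mer (node) to the k-mers that follow it.

    An edge a -> b exists when the last k-1 bases of a equal
    the first k-1 bases of b.
    """
    graph = {kmer: [] for kmer in kmer_set}
    for kmer in kmer_set:
        for base in "ACGT":
            successor = kmer[1:] + base
            if successor in kmer_set:
                graph[kmer].append(successor)
    return graph


# The 7-mers shown on the slide.
kmer_list = ["CTTGGAA", "TTGGAAC", "TGGAACA", "GGAACAA", "GAACAAT"]
graph = de_bruijn(set(kmer_list))

# Walk the unbranched path from the first k-mer to spell out the sequence.
node = "CTTGGAA"
sequence = node
while len(graph[node]) == 1:
    node = graph[node][0]
    sequence += node[-1]
```

In this toy case every node has at most one successor, so the walk recovers a single linear sequence. Branches (a node with two or more successors) are where sequencing errors, paralogs, or alternative isoforms show up.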

  26. Iyer MK, Chinnaiyan AM (2011) Nature Biotechnology 29, 599–600

  30. Inchworm Algorithm • Decompose all reads into overlapping k-mers (25-mers) • Identify the seed k-mer as the most abundant k-mer, ignoring low-complexity k-mers • Extend the k-mer at the 3’ end, guided by coverage • G A GATTACA 9 T C

  31. Inchworm Algorithm G 4 A GATTACA 9 T C

  32. Inchworm Algorithm G 4 A 1 GATTACA 9 T C

  33. Inchworm Algorithm G 4 A 1 GATTACA 9 T 0 C

  34. Inchworm Algorithm G 4 A 1 GATTACA 9 T 0 C 4

  36. Inchworm Algorithm G A 0 5 T 1 G C 4 0 A 1 GATTACA 9 T 0 G C 1 4 A 1 T C 1 1

  38. Inchworm Algorithm A 5 G 4 GATTACA 9

  39. Inchworm Algorithm A 5 C G 0 4 T 0 GATTACA A 9 6 G 1

  40. Inchworm Algorithm A 5 G 4 GATTACA A 9 6 A 7 • Report contig: ….AAGATTACAGA…. • Remove assembled k-mers from the catalog, then repeat the entire process.
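
The greedy procedure walked through on slides 30 to 40 can be approximated by the following Python sketch. It is a simplified illustration under stated assumptions, not the Inchworm implementation: it uses k = 5 instead of 25-mers, skips the low-complexity filter, ignores reverse complements and error pruning, and breaks ties alphabetically. The names `count_kmers`, `best_extension`, and `greedy_contigs` and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Build the k-mer catalog: k-mer -> number of occurrences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def best_extension(candidates, counts, used):
    """Return the unused candidate k-mer with the highest count, or None."""
    live = [c for c in candidates if counts.get(c, 0) > 0 and c not in used]
    if not live:
        return None
    return max(live, key=lambda c: (counts[c], c))


def greedy_contigs(counts, k):
    """Seed with the most abundant unused k-mer, extend both ends, repeat."""
    used = set()
    contigs = []
    # Visiting k-mers by descending count and skipping used ones is the same
    # as repeatedly picking the most abundant k-mer left in the catalog.
    for seed in sorted(counts, key=lambda km: (-counts[km], km)):
        if seed in used:
            continue
        used.add(seed)
        contig = seed
        while True:  # extend at the 3' end, guided by coverage
            nxt = best_extension(
                [contig[-(k - 1):] + b for b in "ACGT"], counts, used)
            if nxt is None:
                break
            used.add(nxt)
            contig += nxt[-1]
        while True:  # extend at the 5' end in the same way
            prev = best_extension(
                [b + contig[:k - 1] for b in "ACGT"], counts, used)
            if prev is None:
                break
            used.add(prev)
            contig = prev[0] + contig
        contigs.append(contig)
    return contigs


# Toy data: three copies of one sequence plus one read that branches off.
reads = ["AAGATTACAGA"] * 3 + ["GATTACGT"]
contigs = greedy_contigs(count_kmers(reads, 5), 5)
```

On this toy input the well-covered path is reported first as one contig, and the low-coverage branch is left over as a second, shorter contig. Marking k-mers as used plays the role of removing assembled k-mers from the catalog.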

  41. Inchworm Contigs from Alt-Spliced Transcripts => Minimal lossless representation of data

  42. Chrysalis Integrate isoforms via k-1 overlaps

  44. Chrysalis Integrate isoforms via k-1 overlaps Verify via “welds”

  45. Chrysalis • Integrate isoforms via k-1 overlaps • Verify via “welds” • Build de Bruijn graphs (ideally, one per gene)
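
The first Chrysalis step, grouping contigs that share a k-1 overlap, might look roughly like the Python sketch below. It covers only the grouping idea: the read-supported "weld" verification and the per-cluster graph construction are left out, and the function name `cluster_by_overlap`, the union-find approach, and the toy contigs are assumptions made for illustration.

```python
def cluster_by_overlap(contigs, k):
    """Group contig indices that share at least one (k-1)-mer."""
    parent = list(range(len(contigs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (k-1)-mer -> index of the first contig containing it
    for idx, contig in enumerate(contigs):
        for i in range(len(contig) - (k - 1) + 1):
            sub = contig[i:i + k - 1]
            if sub in first_seen:
                parent[find(idx)] = find(first_seen[sub])
            else:
                first_seen[sub] = idx

    clusters = {}
    for idx in range(len(contigs)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Toy contigs: the first two share the 4-mer "TTAC", the third is unrelated.
contigs = ["AAGATTACAGA", "TTACGT", "CCCGGGTT"]
clusters = cluster_by_overlap(contigs, 5)
```

Each resulting cluster would then be the input for one de Bruijn graph, ideally corresponding to one gene.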
