NGS Transcriptomic Workflows
Hugh Shanahan & Jamie al-Nasir
Royal Holloway, University of London
Setting the scene
• Transcriptome: the total sequence and abundance of RNA generated by a cell
• RNA is transcribed from DNA
• The genome is fixed for an organism
• The transcriptome is dynamic
  - Variation between tissues
  - Variation over time
• RNA transcripts are 1,000s to 10,000 bases in length
Interested in
• How many copies of a particular transcript are there?
• What is its sequence?
  - The sequence comes from the genome, but alternative splicing means a transcript may not be a single contiguous block of DNA
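To make the two questions concrete, here is a minimal Python sketch. The toy genome, the exon coordinates, the isoform names and the list of observed transcripts are all made up for illustration. It shows how joining non-adjacent exons yields a transcript that is not a contiguous substring of the genome, and how abundance reduces to counting copies.

```python
from collections import Counter

# Hypothetical toy genome and exon coordinates (0-based, end-exclusive).
GENOME = "ATGCGTACCTTGAGGATCCA"
EXONS = [(0, 6), (10, 14), (16, 20)]


def splice(genome, exons):
    """Concatenate the chosen exons, dropping everything between them."""
    return "".join(genome[start:end] for start, end in exons)


# Two hypothetical isoforms of the same gene: one keeps every exon,
# the other skips the middle exon (alternative splicing).
isoform_full = splice(GENOME, EXONS)
isoform_skip = splice(GENOME, [EXONS[0], EXONS[2]])

# Neither isoform appears as one contiguous block in the genome.
contiguous_full = isoform_full in GENOME
contiguous_skip = isoform_skip in GENOME

# Abundance: count the copies of each transcript in a made-up sample.
observed = [isoform_full, isoform_skip, isoform_full, isoform_full]
abundance = Counter(observed)
```

In real data the transcripts are not observed whole, so both the sequence and the counts have to be inferred from short reads.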
Sequencing steps
• Fragment transcripts into shorter pieces (reads)
  - 100-300 bases long
  - Have many overlapping reads
• Amplify (make lots of copies of) the short reads
• Sequence these short reads and then assemble them to reconstruct the transcripts
• Size of the data set depends on the size of the transcriptome but also on the amount of fragmentation (sequencing depth)
• Can either assemble against a reference genome or de novo (very hard)
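The relationship between fragmentation, overlap and data size can be sketched in Python. This is a toy model under strong assumptions: the transcript is random, reads are error-free, every read has the same length, and read start positions are evenly spaced and known. The function names `fragment`, `mean_depth` and `reassemble` are hypothetical. Real assemblers have to infer read positions, which is why de novo assembly is so much harder.

```python
import random


def fragment(transcript, read_len, step):
    """Cut overlapping reads of fixed length, one every `step` bases."""
    last_start = len(transcript) - read_len
    return [transcript[i:i + read_len] for i in range(0, last_start + 1, step)]


def mean_depth(reads, transcript_len):
    """Average number of reads covering each base of the transcript."""
    return sum(len(r) for r in reads) / transcript_len


def reassemble(reads, step):
    """Rebuild the transcript, assuming reads are in order and evenly spaced."""
    return reads[0] + "".join(r[-step:] for r in reads[1:])


# Hypothetical 1,000-base transcript (seeded so the run is repeatable).
rng = random.Random(0)
transcript = "".join(rng.choice("ACGT") for _ in range(1000))

reads = fragment(transcript, read_len=100, step=25)
depth = mean_depth(reads, len(transcript))
rebuilt = reassemble(reads, step=25)
```

Halving `step` roughly doubles the number of reads, and therefore the size of the data set, without changing the transcript itself.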
Final points
• File formats have been updated to binary; they used to be flat text, so sizes were huge (Reference Genome: 39 Gbyte -> 0.8 Gbyte)
• Raw image data is actually discarded
• Discussion focuses on assembly and downstream analysis
• Much of this data is deposited in the Sequence Read Archive (SRA)
• We've papered over everything that happens before sequencing, i.e. the biochemical steps carried out
  - This is highly variable
  - These steps are not properly annotated
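As a rough illustration of why binary encodings are smaller than flat text, the Python sketch below packs four bases into one byte using two bits per base. This is an assumption-laden toy, not the format the slide refers to: it handles only A, C, G and T, and the `pack` and `unpack` helpers are hypothetical. It does not attempt to reproduce the sizes quoted above.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking is uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, n_bases):
    """Recover the first `n_bases` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:n_bases])


seq = "ACGTACGTAC"
packed = pack(seq)
text_size = len(seq.encode("ascii"))  # one byte per base as flat text
binary_size = len(packed)             # four bases per byte when packed
```

The sequence length has to be stored alongside the packed bytes, because the final byte may contain padding.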