InCoB 2009
MapView: visualization of short reads alignment on a desktop computer
Hua Bao, Sun Yat-sen University
2009-09-09
Next-generation sequencing
• Sequencing by synthesis
• High-throughput (tens of millions of reads per lane)
• Read length is short (25-50 bp)
• Sequencing error rate is higher than that of Sanger sequencing
Statement of the problem
1. Alignment results (e.g., 50M reads):
read1 TATCGCACATAGTTCGCG hhhhhhhllhhhh;hA - Chr1 126609
read2 CATACGACACTCATGTAG h,abhhhh;hAhhda, + Chr2 94
2. Reference genome (e.g., 500M bp):
>Chr1
CGATCGAGCGACAGACGAGCACACGTAGCACTGTGGGGGAA
Visualization of large-scale alignment data with super-high computational efficiency.
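To make the input concrete, here is a minimal Python sketch that parses the two example alignment lines into records and orders them by reference position. The six-column layout (read name, sequence, quality string, strand, reference name, position) is inferred from the example on the slide. The `Alignment` type and `parse_alignment` function are hypothetical names used only for illustration, not part of MapView.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    # Field names are assumptions inferred from the slide's example lines.
    name: str
    seq: str
    qual: str
    strand: str
    chrom: str
    pos: int


def parse_alignment(line: str) -> Alignment:
    """Parse one whitespace-delimited alignment line (assumed 6 columns)."""
    name, seq, qual, strand, chrom, pos = line.split()
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return Alignment(name, seq, qual, strand, chrom, int(pos))


lines = [
    "read1 TATCGCACATAGTTCGCG hhhhhhhllhhhh;hA - Chr1 126609",
    "read2 CATACGACACTCATGTAG h,abhhhh;hAhhda, + Chr2 94",
]

# Order by (reference, position) so that a viewer can scan a region sequentially.
alignments = sorted((parse_alignment(l) for l in lines),
                    key=lambda a: (a.chrom, a.pos))
```

Holding 50M such records as Python objects would not fit comfortably in desktop memory, which is the motivation for the compact, indexed file format described in the following slides.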
Computational efficiency
Memory usage:
• Data compressed
• Fractional loading
CPU time:
• Indexing
• Pre-computing
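The slides do not say how MapView compresses its data. As one common way to shrink DNA sequence in memory, the sketch below packs each base into 2 bits (four bases per byte). This is an assumed technique for illustration, not the documented MVF encoding. The names `pack` and `unpack` are hypothetical, and ambiguous bases such as N are not handled.

```python
# 2-bit codes for the four unambiguous bases (assumed encoding).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Raises KeyError on anything other than A, C, G, T.
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from 2-bit packed data."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


seq = "CGATCGAGCGACAGACGAGCACACGTAGCACTGTGGGGGAA"
packed = pack(seq)
restored = unpack(packed, len(seq))
```

At 2 bits per base, a 500M bp reference takes 125 MB instead of 500 MB as one byte per base. Combined with loading only the visible fraction of the data, this keeps the memory footprint small.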
File format design
MapView format (MVF):
• Head: basic info of reference and reads; offsets of Data, Index and Statistics
• Data: compressed sequences; ordered alignments
• Index: the offset address of data is indexed by reference position
• Statistics: coverage information of reference sites
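A file organized as a head followed by data, index and statistics sections can be sketched as below. The real MVF byte layout is not given in the slides, so the magic bytes, field widths and function names here are all assumptions. The sketch only shows the idea that the head stores the offset of each section, so a reader can jump straight to any one of them.

```python
import struct

# Hypothetical head: 4-byte magic plus three 64-bit little-endian section offsets.
HEADER = struct.Struct("<4sQQQ")


def write_sections(data: bytes, index: bytes, stats: bytes) -> bytes:
    """Concatenate head + data + index + statistics, recording offsets in the head."""
    data_off = HEADER.size
    index_off = data_off + len(data)
    stats_off = index_off + len(index)
    head = HEADER.pack(b"DEMO", data_off, index_off, stats_off)
    return head + data + index + stats


def read_section_offsets(blob: bytes) -> dict:
    """Read only the head and return where each section starts."""
    magic, data_off, index_off, stats_off = HEADER.unpack_from(blob, 0)
    if magic != b"DEMO":
        raise ValueError("not a file written by write_sections")
    return {"data": data_off, "index": index_off, "statistics": stats_off}


blob = write_sections(b"DATA", b"IDX", b"ST")
offsets = read_section_offsets(blob)
index_bytes = blob[offsets["index"]:offsets["statistics"]]
```

Because only the fixed-size head has to be read up front, the viewer can open a very large file without touching the bulk of the data section.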
Loading algorithms
[Figure: jumping to a different region. The genomic position of the MapView window is looked up using the Index to obtain an offset address, which locates the corresponding Data in the MapView file.]
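The jump from a genomic position to a file offset can be illustrated with a simple binned index. This is a sketch under stated assumptions: the fixed-size `RECORD` layout, the bin width `BIN`, and the functions `build` and `load_window` are hypothetical and are not taken from MapView. Alignments are assumed to be stored sorted by position, and reads that start before the window but overlap it are ignored for simplicity.

```python
import io
import struct

RECORD = struct.Struct("<II")  # (position, read id); hypothetical fixed-size record
BIN = 1000                     # reference positions covered by one index entry (assumed)


def build(alignments, ref_len):
    """Serialize position-sorted (pos, read_id) pairs and build a binned index.

    index[b] is the byte offset of the first record with pos >= b * BIN.
    """
    index = []
    i = 0
    for b in range(ref_len // BIN + 1):
        while i < len(alignments) and alignments[i][0] < b * BIN:
            i += 1
        index.append(i * RECORD.size)
    data = b"".join(RECORD.pack(pos, rid) for pos, rid in alignments)
    return data, index


def load_window(f, index, start, end):
    """Seek via the index and read only records with start <= pos < end."""
    f.seek(index[start // BIN])
    hits = []
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of data
        pos, rid = RECORD.unpack(chunk)
        if pos >= end:
            break  # records are sorted, so nothing further can match
        if pos >= start:
            hits.append((pos, rid))
    return hits


alignments = [(94, 1), (1500, 2), (1800, 3), (126609, 4)]
data, index = build(alignments, ref_len=200000)
window = load_window(io.BytesIO(data), index, 1600, 2000)
```

Moving the window costs one index lookup, one seek and a short sequential read, independent of the total number of reads in the file.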
Efficiency of MapView
Computational efficiency comparison
The alignment data used for the assessment consist of a reference of 43 million bp and 6 million Illumina 44-bp reads.
Summary
• Super-high computational efficiency: visualization of hundreds of millions of reads with 40M of memory in 2 seconds.
• Feature-rich and user-friendly: compact alignment view for both single-end and paired-end short reads, with multiple navigation and zoom modes.
Thank you!