Sequencing technologies and Velvet assembly. Lecturer: Du Shengyang. September 29, 2012. The Advances of DNA Sequencing Technology. Chemical degradation method. Sanger method. Automated fluorescent sequencing. 454. The second generation of sequencing technologies. Solexa. SOLiD.
Sequencing technologies and Velvet assembly Lecturer: Du Shengyang September 29, 2012
The Advances of DNA Sequencing Technology • The first generation of sequencing technologies: chemical degradation method (Maxam-Gilbert), Sanger method, automated fluorescent sequencing • The second generation of sequencing technologies: 454, Solexa, SOLiD
The third generation of sequencing 1. Helicos BioSciences single-molecule sequencing technology 2. Pacific Biosciences SMRT technology 3. Oxford Nanopore Technologies nanopore single-molecule sequencing technology
Advantages of third-generation sequencing • High throughput, low cost, long read length, and short sequencing time • No PCR amplification step, unlike second-generation sequencing, which avoids the errors that amplification introduces • True single-molecule sequencing
The key to sequencing success 1. Sample preparation 2. Choosing the right sequencing platform 3. Downstream bioinformatics analysis
Bioinformatics analysis Introduction • Some sequencing techniques are commercially available (e.g. 454 Sequencing, Solexa) • 454 Sequencing: reads of ~100-200 bp • Solexa: reads of ~30 bp
Introduction • The Euler assembler (Pevzner 2001) used k-mers as the nodes of de Bruijn graphs • Reads are mapped as paths through the de Bruijn graph • High redundancy does not affect the number of nodes • “Velvet” effectively deals with experimental errors and repeats by using de Bruijn graphs built from k-mers
De Bruijn Graphs – construction • Adjacent k-mers overlap by k-1 nucleotides • Each node is attached to a twin node • The twin node holds the reverse series of reverse complement k-mers • This captures overlaps between reads from opposite strands • The union of a node and its twin node is called a “block”
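To make the node and twin-node relationship concrete, here is a minimal Python sketch. The function names and the toy read are illustrative assumptions, not Velvet's own code. It lists the k-mers of a read, then the k-mers of its reverse complement, which appear as the reverse complements of the forward k-mers in reverse order.

```python
# Illustrative sketch only: function names and the example read are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """All k-mers of seq in order; adjacent k-mers overlap by k-1 bases."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def twin_kmers(seq, k):
    """k-mers read off the opposite strand (the twin node's series)."""
    return kmers(reverse_complement(seq), k)


read = "ATGGCGT"
k = 4
forward = kmers(read, k)
twin = twin_kmers(read, k)
print(forward)
print(twin)
```

Because the twin series is fully determined by the forward series, storing the two together as one block lets a read from either strand land on the same part of the graph.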
De Bruijn Graphs – construction • For each k-mer, a hash table records the ID of the first read that contains it and its position in that read • Each k-mer is recorded together with its reverse complement • Reads are then traced through the graph • A directed arc is created wherever one is needed
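The construction steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a k-mer and its reverse complement share one key (the lexicographically smaller of the two), and arcs are counted between oriented k-mers without full twin-node bookkeeping. The names `index_reads` and `canonical` are hypothetical.

```python
# Simplified sketch, not Velvet's implementation. Assumptions: one shared key
# per k-mer/reverse-complement pair; arcs counted on oriented k-mers only.
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """One key for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def index_reads(reads, k):
    first_seen = {}            # canonical k-mer -> (read ID, position) of first occurrence
    arcs = defaultdict(int)    # (k-mer, next k-mer) -> multiplicity
    for read_id, read in enumerate(reads):
        previous = None
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            # setdefault keeps the first occurrence and ignores later ones
            first_seen.setdefault(canonical(kmer), (read_id, pos))
            if previous is not None:
                arcs[(previous, kmer)] += 1
            previous = kmer
    return first_seen, dict(arcs)


first_seen, arcs = index_reads(["ATGGC", "TGGCA"], k=3)
print(first_seen)
print(arcs)
```

The two example reads share the k-mers TGG and GGC, so the second read adds only one new entry to the table and raises the multiplicity of an existing arc. This is the sense in which high redundancy does not increase the number of nodes.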
De Bruijn Graphs – simplification • Simplify the chains of blocks • No information is lost • If node A has only one outgoing arc, to node B, and node B has only one incoming arc → merge A and B
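The merge rule can be sketched in Python as below. The graph representation is an assumption made for illustration: each node is identified by its sequence, the sequences are assumed to be unique, consecutive nodes overlap by k-1 bases, and twin nodes are ignored.

```python
# Illustrative sketch of chain simplification. Assumptions: nodes are keyed by
# unique sequences, neighbours overlap by k-1 bases, twin nodes are omitted.
def find_mergeable(graph, predecessors):
    """Return (A, B) where A's only outgoing arc goes to B and B's only
    incoming arc comes from A, or None if no such pair exists."""
    for a, nexts in graph.items():
        if len(nexts) == 1:
            (b,) = nexts
            if b != a and len(predecessors[b]) == 1:
                return a, b
    return None


def simplify(successors, k):
    graph = {node: set(nexts) for node, nexts in successors.items()}
    for nexts in list(graph.values()):
        for node in nexts:
            graph.setdefault(node, set())   # make sure sink nodes have an entry
    while True:
        predecessors = {node: set() for node in graph}
        for node, nexts in graph.items():
            for nxt in nexts:
                predecessors[nxt].add(node)
        pair = find_mergeable(graph, predecessors)
        if pair is None:
            return graph
        a, b = pair
        merged = a + b[k - 1:]              # append only the non-overlapping part of B
        graph[merged] = graph.pop(b)        # merged node inherits B's outgoing arcs
        del graph[a]
        for p in predecessors[a]:           # redirect arcs that pointed at A
            owner = merged if p == b else p
            graph[owner].discard(a)
            graph[owner].add(merged)


chain = {"ATG": {"TGG"}, "TGG": {"GGC"}, "GGC": {"GCA", "GCT"}}
simplified = simplify(chain, k=3)
print(simplified)
```

The merged node carries the full sequence ATGGC, so the original three k-mers can still be read back from it, which is why the simplification loses no information.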
De Bruijn Graphs – error removal • Velvet focuses on “topological features” of the graph • First step: remove tips • Tip: a chain of nodes disconnected on one end • Two criteria are used: (1) length and (2) minority count • Length: remove a tip if it is shorter than 2k bp, since two nearby errors can create a tip of up to 2k bp (Figure: two errors, each labeled k)
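De Bruijn Graphs – error removal • Minority count: the arc leading into the tip has multiplicity m, lower than the multiplicity n of another arc leaving the same node (m < n) • Starting from node B, going through the tip is an alternative to a more common path (Figure: nodes A, B, C and a tip, with arc multiplicities m and n)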
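The two tip criteria can be combined in a short Python sketch. It makes several simplifying assumptions: the graph has already been simplified so that each tip is a single node, only tips that dead-end downstream are considered (tips disconnected on the upstream end are symmetric), and the node names, lengths and multiplicities in the example are invented for illustration.

```python
# Illustrative sketch of tip detection. Assumptions: each tip is one node,
# only downstream dead ends are checked, example values are made up.
from collections import Counter


def find_tips(arcs, node_length, k):
    """arcs: {(src, dst): multiplicity}; node_length: {node: length in bp}."""
    out_degree = Counter(src for src, _ in arcs)
    in_degree = Counter(dst for _, dst in arcs)
    tips = []
    for (src, dst), multiplicity in arcs.items():
        dead_end = out_degree[dst] == 0 and in_degree[dst] == 1
        if not dead_end:
            continue
        # Criterion 1: length below 2k bp
        is_short = node_length[dst] < 2 * k
        # Criterion 2: some other arc leaving the same node is more common
        is_minority = any(
            other > multiplicity
            for (s, d), other in arcs.items()
            if s == src and d != dst
        )
        if is_short and is_minority:
            tips.append(dst)
    return tips


arcs = {("A", "B"): 10, ("B", "tip"): 1, ("B", "C"): 9, ("C", "D"): 9}
node_length = {"A": 20, "B": 30, "tip": 7, "C": 40, "D": 50}
tips = find_tips(arcs, node_length, k=5)
print(tips)
```

Node D is also a dead end, but it is long and has no competing arc, so it is kept. Requiring both criteria protects genuine sequence that merely ends, such as the end of a contig.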
De Bruijn Graphs – error removal • Second step: remove bubbles using Tour Bus • Bubble: redundant paths that start and end at the same nodes • Bubbles are created by errors or by biological variants such as SNPs
De Bruijn Graphs – error removal Tour Bus 1. Detect redundant paths 2. Compare them using dynamic programming methods 3. If they are similar, merge them
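Steps 2 and 3 can be illustrated with a small Python sketch. It uses plain edit distance as the dynamic programming comparison and keeps the path with higher coverage. The threshold `max_divergence=0.2` and the rule for choosing the surviving path are assumptions made for this example. Detecting the redundant paths themselves (step 1) is not shown.

```python
# Illustrative sketch of the compare-and-merge decision. Assumptions: edit
# distance as the comparison, a 0.2 divergence threshold, keep higher coverage.
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion
                current[j - 1] + 1,                      # insertion
                previous[j - 1] + (char_a != char_b),    # match or substitution
            ))
        previous = current
    return previous[-1]


def merge_bubble(path_a, path_b, max_divergence=0.2):
    """Each path is (sequence, coverage). Return the path to keep, or None
    if the two paths are too different to be treated as one bubble."""
    (seq_a, cov_a), (seq_b, cov_b) = path_a, path_b
    longest = max(len(seq_a), len(seq_b), 1)
    if edit_distance(seq_a, seq_b) / longest > max_divergence:
        return None
    return path_a if cov_a >= cov_b else path_b


kept = merge_bubble(("ACGTACGT", 30), ("ACGAACGT", 2))
print(kept)
```

A single substitution between two eight-base paths gives a divergence of 0.125, so the low-coverage branch is folded into the well-supported one. Two unrelated paths return `None` and are left in the graph.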
De Bruijn Graphs – error removal • Third step: remove erroneous connections • Erroneous connections are removed after the Tour Bus algorithm, using a basic coverage cutoff • Genuine short nodes that cannot be simplified in the graph should have high coverage
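A basic coverage cutoff can be sketched in a few lines of Python. The data layout, node names and the cutoff value are assumptions chosen for illustration.

```python
# Illustrative sketch of a coverage cutoff. Assumptions: per-node coverage is
# already known, and the cutoff value is chosen by the user.
def apply_coverage_cutoff(coverage, arcs, cutoff):
    """coverage: {node: mean coverage}; arcs: set of (src, dst) pairs.
    Drop nodes below the cutoff and every arc that touches them."""
    kept_nodes = {node for node, cov in coverage.items() if cov >= cutoff}
    kept_arcs = {(s, d) for s, d in arcs if s in kept_nodes and d in kept_nodes}
    return kept_nodes, kept_arcs


coverage = {"A": 30.0, "B": 28.5, "X": 1.5, "C": 31.0}
arcs = {("A", "B"), ("A", "X"), ("X", "C"), ("B", "C")}
kept_nodes, kept_arcs = apply_coverage_cutoff(coverage, arcs, cutoff=4.0)
print(sorted(kept_nodes), sorted(kept_arcs))
```

Node X links A to C but is supported by very few reads, so it and both of its arcs are dropped while the well-covered route through B remains.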
Breadcrumb: resolution of repeats • Start from the unambiguous long nodes • Using read pairs, pair up the long nodes • Flag paired reads using unambiguous long nodes
Breadcrumb: resolution of repeats • Extends the nodes as far as possible using flagged paired reads • All nodes between A and B are paired up to either A or B
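The first part of Breadcrumb, pairing up long nodes with read pairs, can be sketched in Python as below. This covers only the pairing step, not the extension through the nodes in between. The length threshold `min_long`, the read-to-node mapping and all example values are hypothetical.

```python
# Illustrative sketch of pairing long nodes via read pairs. Assumptions: each
# read maps to one node, and `min_long` is a hypothetical length threshold.
from collections import Counter


def pair_long_nodes(read_pairs, read_to_node, node_length, min_long):
    """Count read pairs whose two mates fall on different long nodes."""
    links = Counter()
    for left, right in read_pairs:
        a = read_to_node.get(left)
        b = read_to_node.get(right)
        if a is None or b is None or a == b:
            continue
        if node_length[a] >= min_long and node_length[b] >= min_long:
            links[tuple(sorted((a, b)))] += 1
    return links


read_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]
read_to_node = {"r1": "A", "r2": "B", "r3": "A", "r4": "B",
                "r5": "A", "r6": "rep", "r7": "A", "r8": "A"}
node_length = {"A": 500, "B": 450, "rep": 40}
links = pair_long_nodes(read_pairs, read_to_node, node_length, min_long=200)
print(links)
```

Two read pairs connect A and B, which suggests that they are neighbours across the short repeat node `rep`. The pair that lands on `rep` itself is not counted as a link between long nodes.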
Experimental Results • The error removal pipeline was tested on simulated data • Simulated reads were generated from E. coli, S. cerevisiae, C. elegans, and H. sapiens
Experimental Results • The error removal pipeline was tested on experimental data • A 173,428 bp human BAC was sequenced using Solexa machines • Reads were 35 bp long, and k = 31 • Tour Bus increased sensitivity by correcting errors while preserving the integrity of the graph structure
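To see what 35 bp reads and k = 31 imply, the short Python sketch below counts the k-mers contributed by one read and converts base coverage to k-mer coverage with the usual relation Ck = C × (L − k + 1) / L. The base coverage used in the example is a hypothetical value, not a figure from the BAC experiment.

```python
# Illustrative arithmetic. The base coverage below is a made-up example value.
def kmers_per_read(read_length, k):
    """Number of k-mers in a read of the given length."""
    return max(read_length - k + 1, 0)


def kmer_coverage(base_coverage, read_length, k):
    """Ck = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_length, k) / read_length


per_read = kmers_per_read(35, 31)
example_ck = kmer_coverage(35.0, 35, 31)   # hypothetical 35x base coverage
print(per_read, example_ck)
```

Each 35 bp read contributes only five 31-mers, so k-mer coverage is one seventh of base coverage at these settings.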
Conclusions • Velvet is a de Bruijn graph based sequence assembly method for short reads • Errors are handled by tip removal and the Tour Bus algorithm • A large number of repeats are resolved by the Breadcrumb algorithm • Velvet was assessed on simulated and real datasets and performed well on both