Genome Assembly
Bonnie Hurwitz, Graduate student, TMPL
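Genome assembly
Genomic DNA is sheared for whole-genome shotgun (WGS) sequencing and cloned into libraries with insert sizes of 1-2, 3-4, 30-40 and 100 kb. Both ends of each clone are sequenced (forward/reverse reads), and the reads are then assembled by alignment identity. For example, the reads …ACGGCTGCGTTACATCGATCAT and ACATCGATCATTTACGATACCATTG… share the overlap ACATCGATCAT and join into …ACGGCTGCGTTACATCGATCATTTACGATACCATTG….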
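The overlap step can be sketched for the simplest case, where the overlap must match with 100% identity. This is a minimal illustration, not the assembler used in this exercise; the function names `longest_overlap` and `merge_reads` and the `min_overlap` parameter are hypothetical, and real assemblers accept a lower identity threshold so that sequencing errors do not block a join.

```python
def longest_overlap(left, right, min_overlap=5):
    """Length of the longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    # Try the longest possible overlap first, then shrink.
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left, right, min_overlap=5):
    """Join two reads into one contig string if they overlap, else return None."""
    overlap = longest_overlap(left, right, min_overlap)
    if overlap == 0:
        return None
    return left + right[overlap:]


read1 = "ACGGCTGCGTTACATCGATCAT"
read2 = "ACATCGATCATTTACGATACCATTG"
print(longest_overlap(read1, read2))  # 11
print(merge_reads(read1, read2))      # ACGGCTGCGTTACATCGATCATTTACGATACCATTG
```

Trying overlap lengths from longest to shortest means the first exact match found is the longest one, which is the natural choice for a greedy join.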
Genome scaffolding
[Figure: contigs A-H are ordered and oriented into a "composite" genome scaffold using mate pair linkage; labels include contig, break, and mate pair linkage.]
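One simplified way to picture the mate-pair step: two contigs belong to the same scaffold when enough mate pairs have one read in each contig. The sketch below only groups contigs (connected components via union-find); it deliberately ignores contig orientation, ordering and gap-size estimation, which real scaffolders also infer from mate pairs. `build_scaffolds` and its `min_links` threshold are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_scaffolds(contigs, mate_pairs, min_links=2):
    """Group contigs into scaffolds.

    `mate_pairs` is a list of (contig_of_read1, contig_of_read2) tuples.
    Two contigs are joined when at least `min_links` mate pairs connect them.
    """
    # Count links for each unordered pair of distinct contigs.
    link_counts = defaultdict(int)
    for contig_a, contig_b in mate_pairs:
        if contig_a != contig_b:
            link_counts[frozenset((contig_a, contig_b))] += 1

    # Union-find over contig names.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for pair, count in link_counts.items():
        if count >= min_links:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    # Collect contigs by their root to form scaffolds.
    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


links = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "D"), ("D", "C")]
print(build_scaffolds(["A", "B", "C", "D"], links))  # [['A', 'B'], ['C', 'D']]
```

Requiring more than one supporting link is a common way to keep a single chimeric or mis-mapped mate pair from merging unrelated contigs.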
Sanger sequencing costs
[Figure: sequence production (billions of bases/month) and cost (cents per base) by year, 1989 to 2008; labeled costs include 0.57¢, 0.46¢, 0.35¢, 0.19¢ and 0.10¢ per base; ~$1/read.]
454 Pyrosequencing: the generations
[Figure: cost per bp across 454 pyrosequencing generations, labeled 0.03¢, 0.01¢ and 0.003¢ per bp. Sanger is currently 0.1¢ per bp.]
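To put the per-base prices on these slides in perspective, raw sequencing cost scales as (fold coverage × genome size) × price per base. The sketch below uses the Sanger figure of 0.1¢/bp and the lowest 454 label, 0.003¢/bp; the 50 kb genome and 8x coverage are hypothetical inputs, and the estimate ignores library preparation, finishing and all other costs.

```python
def sequencing_cost_dollars(genome_size_bp, fold_coverage, cents_per_bp):
    """Rough raw sequencing cost: bases sequenced (coverage x genome size) times price per base."""
    bases_sequenced = fold_coverage * genome_size_bp
    return bases_sequenced * cents_per_bp / 100  # convert cents to dollars


# Hypothetical example: a 50 kb genome sequenced to 8x coverage.
sanger = sequencing_cost_dollars(50_000, 8, 0.1)   # Sanger price from the slide
pyro = sequencing_cost_dollars(50_000, 8, 0.003)   # lowest 454 label on the slide
print(f"Sanger: ${sanger:,.2f}  454: ${pyro:,.2f}")
```

With these inputs, 400,000 bases are sequenced in both cases, so the difference in cost comes entirely from the price per base.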
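When is a genome “finished”? (by Poisson calculations)

| Fold coverage | Percent of genome sequenced |
|---|---|
| 0.25x | 22% |
| 0.50x | 39% |
| 0.75x | 53% |
| 1x | 63% |
| 2x | 86% |
| 3x | 95% |
| 4x | 98% |
| 5x | 99.3% |
| 6x | 99.75% |
| 7x | 99.91% |
| 8x | 99.97% |
| 9x | 99.99% |
| 10x | 99.995% |

Coverage: coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as c = NL / G. If reads start at uniformly random positions, the number of reads covering a base is approximately Poisson-distributed with mean c, so the expected fraction of the genome covered by at least one read is 1 - e^(-c), which gives the percentages in the table.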
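The coverage formula and the Poisson estimate behind the table can be checked in a few lines. Under the standard assumption that reads start at uniformly random positions, the expected fraction of bases covered at least once at fold coverage c is 1 - e^(-c). The helper names and the 1,000-read / 400 bp / 40 kb example are hypothetical.

```python
import math


def fold_coverage(num_reads, avg_read_length, genome_size):
    """Coverage c = N * L / G (average number of reads covering each base)."""
    return num_reads * avg_read_length / genome_size


def fraction_covered(coverage):
    """Poisson estimate of the fraction of bases hit by at least one read."""
    return 1 - math.exp(-coverage)


# Hypothetical run: 1,000 reads averaging 400 bp on a 40 kb genome gives 10x coverage.
c = fold_coverage(1_000, 400, 40_000)
print(f"coverage = {c:g}x")

# Reproduce the rows of the coverage table.
for cov in (0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    print(f"{cov:>5}x  {100 * fraction_covered(cov):.3f}%")
```

Because the uncovered fraction e^(-c) shrinks exponentially, each additional fold of coverage closes fewer and fewer gaps, which is why going from 1x to 2x matters far more than going from 9x to 10x.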
Tablet: Assembly Viewer
[Screenshot: Tablet window with callouts for the current location, sequence overlap, contig info, consensus sequence, and sequence reads.]
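The consensus line in an assembly viewer summarizes the reads stacked beneath it. A minimal sketch of one way to compute such a consensus is a per-column majority vote over reads that are already aligned without gaps. This is an illustrative assumption, not necessarily how Tablet or the assembler derives its consensus; `consensus` and the `(start_offset, sequence)` read format are hypothetical, and uncovered columns are reported as `N`.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over gapless, pre-aligned reads.

    Each read is a (start_offset, sequence) pair relative to the contig start.
    Columns with no covering read are reported as 'N'.
    """
    length = max(start + len(seq) for start, seq in aligned_reads)
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns[start + i][base] += 1
    # most_common(1) returns [(base, count)] for the most frequent base in the column.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


reads = [(0, "ACGTAC"), (2, "GTTCGA"), (4, "ACGA")]
print(consensus(reads))  # ACGTACGA (the T in read 2 at position 4 is outvoted)
```

Majority voting is why coverage matters for accuracy as well as completeness: a single erroneous base is corrected only when other reads cover the same position.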
Our goal today
• Assemble a phage genome
• Assemble a phage genome with different levels of coverage
• Compute basic statistics on each genome assembly
• View the assemblies
• Compare the best assembly to the finished genome
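The slides do not say which statistics to compute, so the sketch below assumes commonly reported ones: number of contigs, total assembled length, largest contig, and N50 (the contig length L such that contigs of length at least L contain at least half of the assembled bases). `assembly_stats` is a hypothetical helper that takes a list of contig lengths.

```python
def assembly_stats(contig_lengths):
    """Basic assembly statistics from a list of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: walk from the longest contig down until half the bases are covered.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_length": total,
        "largest": lengths[0] if lengths else 0,
        "n50": n50,
    }


print(assembly_stats([1000, 800, 500, 300, 200, 100]))
```

Comparing these numbers across the different coverage levels gives a quick sense of which assembly is most contiguous before viewing it or comparing it to the finished genome.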