DNA sequence: the ULTIMATE map
• DNA sequencing methods
• Assembly/sequencing
• Genome characterization
Assigned reading: Service 2006 review paper
Assigned listening: Eric Lander genomics lecture
BIO520 Bioinformatics, Jim Lund
DNA sequence project size/type:
• 500 bases: EST, STS
• 2500 bases: whole cDNA/EST
• 10 kbp: gene, virus
• 150 kbp: BAC, big virus
• 3 Mbp: bacterial genome, YAC-size
• (simple repeats)
• 3 Gbp: human, mouse
• 31 Gbp: salamander
Eukaryotic genome sizes (animals and plants):
• Nematode (Caenorhabditis elegans): 100 Mb
• Thale cress (Arabidopsis thaliana): 160 Mb
• Fruit fly (Drosophila melanogaster): 180 Mb
• Puffer fish (Takifugu rubripes): 400 Mb
• Rice (Oryza sativa): 490 Mb
• Human (Homo sapiens): 3.5 Gb
• Leopard frog (Rana pipiens): 6.5 Gb
• Onion (Allium cepa): 16.4 Gb
• Mountain grasshopper (Podisma pedestris): 16.5 Gb
• Tiger salamander (Ambystoma tigrinum): 31 Gb
• Easter lily (Lilium longiflorum): 34 Gb
• Marbled lungfish (Protopterus aethiopicus): 130 Gb
DNA sequencing methods:
• Chain termination / dideoxy / Sanger: fluorescence-based; the paradigm instrument is from ABI; the main method
• Next-generation sequencing: polymerase-addition sequencing (454 Sequencing, Illumina)
• Chips: Affymetrix
Dideoxy / chain terminator / Sanger sequencing:
• Template
• Primer
• Extension chemistry: polymerase, termination, labeling
• Separation
• Detection
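Chain terminator basics: a primer annealed to the target template is extended by polymerase in a mix of normal nucleotides and labeled terminators, at dN:ddN = 100:1. Whenever a labeled terminator (ddA, ddC, ddG, ddT) is incorporated, extension stops. For the template TGCA this gives the products ddA, A-ddC, AC-ddG, ACG-ddT: a ladder of fragments of length n, n+1, ...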
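Here is a minimal simulation of how the 100:1 ratio produces a ladder. It assumes each terminator competes equally with its normal nucleotide at every position, so a chain stops at any given position with probability 1/101. The function names `sanger_ladder` and `read_ladder` are illustrative and come from no sequencing package.

```python
import random
from collections import Counter

def sanger_ladder(synthesized, dntp_per_ddntp=100, n_molecules=10_000, seed=1):
    """Simulate chain-termination products for the strand being synthesized.

    Assumes equal incorporation of dNTP and ddNTP, so each position
    terminates with probability 1 / (dntp_per_ddntp + 1).
    Returns {fragment length: (terminating base, count)} ordered by length.
    """
    rng = random.Random(seed)
    p_stop = 1.0 / (dntp_per_ddntp + 1)
    counts = Counter()
    for _ in range(n_molecules):
        for i in range(len(synthesized)):
            if rng.random() < p_stop:
                counts[i + 1] += 1  # fragment of length i+1 ends in a labeled ddN
                break
        # molecules that never terminate run off the end and carry no label
    # the terminating base of a length-n fragment is synthesized[n - 1]
    return {n: (synthesized[n - 1], c) for n, c in sorted(counts.items())}

def read_ladder(ladder):
    """Read bases from shortest to longest fragment (the order they reach the detector).

    Assumes every length is present; a missing rung would silently drop a base.
    """
    return "".join(base for base, _ in ladder.values())

ladder = sanger_ladder("ACGT" * 25)
print(read_ladder(ladder))
```

Shorter fragments are more abundant because a chain has to escape termination at every earlier position. This is one reason signal fades toward the end of a read.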
Electrophoresis: the sequencing reaction products are separated by polyacrylamide gel electrophoresis (PAGE).
Separation: gel electrophoresis vs. capillary electrophoresis. Capillary electrophoresis is:
• suited to automation
• rapid (2 hrs vs. 12 hrs)
• re-usable
• simple to temperature-control
• available in 96-well format
Migration ~ 1/log N
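The rule of thumb "migration ~ 1/log N" shows why read length is limited: the separation between fragments of length N and N+1 shrinks as N grows. The sketch below takes that approximation literally, in arbitrary units. The log base is an assumption, and it does not change the ratios.

```python
import math

def relative_mobility(n):
    """Approximate migration of an n-base fragment, using the ~1/log N rule (arbitrary units)."""
    return 1.0 / math.log10(n)

def adjacent_gap(n):
    """Separation between neighbouring fragments of length n and n + 1."""
    return relative_mobility(n) - relative_mobility(n + 1)

for n in (50, 200, 500, 1000):
    print(n, f"{adjacent_gap(n):.2e}")
```

Under this approximation, the gap between neighbouring peaks is about 60 times smaller at 1000 bases than at 50 bases. Past a certain length, adjacent peaks can no longer be resolved.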
Paradigm instrument: Applied Biosystems, http://www.appliedbiosystems.com/
• ABI 3730XL (2002): 96 samples, 1000-base reads, ~$350,000, higher sensitivity, lower reagent cost (~$1/reaction); 700 kbp / 24 hours
• 384-capillary sequencers: 5700 sequences / 24-hour day, 2.8 Mbp / 24 hours
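Taking the slide's throughput figures at face value, a little arithmetic relates reads, read length and daily output. The helper names are hypothetical. The runs-per-day estimate optimistically assumes every capillary returns a full-length read on every run.

```python
def implied_read_length(bases_per_day, reads_per_day):
    """Average bases per read implied by daily output figures."""
    return bases_per_day / reads_per_day

def runs_per_day(bases_per_day, capillaries, read_length):
    """Runs per day needed to reach a daily output, assuming every
    capillary yields a full-length read in every run (optimistic)."""
    return bases_per_day / (capillaries * read_length)

# 384-capillary figures from the slide: 5,700 reads and 2.8 Mbp per day
print(round(implied_read_length(2.8e6, 5_700)))
# ABI 3730XL figures from the slide: 96 capillaries, 1,000-base reads, 700 kbp per day
print(round(runs_per_day(700e3, 96, 1_000), 1))
```

The 384-capillary figures imply about 490 bases per read on average. The 3730XL figures correspond to roughly 7 full runs per day.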
384-well capillary sequencing: results are shown as an electropherogram with a peak for each base. From the peak heights and widths, a Phred score is assigned to each individual base. A high Phred score indicates high certainty about the identity of that base.
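Phred scores are logarithmic: Q = -10 log10(P), where P is the probability that the base call is wrong. So Q20 means a 1-in-100 chance of error (99% accuracy) and Q30 means 1 in 1,000. The sketch below only converts between scores and error probabilities; it does not compute scores from trace peaks. The Phred+33 decoder assumes Sanger-style FASTQ quality strings, a file format these slides do not cover.

```python
import math

def phred_to_error(q):
    """Error probability for a Phred quality score Q."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score for an error probability P."""
    return -10 * math.log10(p)

def decode_phred33(qual_string):
    """Sanger-style FASTQ stores each score as the character chr(Q + 33)."""
    return [ord(c) - 33 for c in qual_string]

print(phred_to_error(20), error_to_phred(0.001), decode_phred33("I5!"))
```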
Figure: sample output, 1 lane.
1 trace = 1000 bases or less. Read lengths: ABI, 1000 bp; Illumina, 50-100 bp; 454 Sequencing, 300-400 bp. How do we cover a genome? Divide and conquer: assemble these short sequence fragments.
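How many reads does it take to cover a genome? If coverage is defined as c = N·L/G (N reads of length L over a genome of size G), then N = c·G/L. The sketch below plugs in the read lengths above for a 3 Gbp genome. The 10X target is an illustrative assumption.

```python
import math

def reads_needed(genome_bp, read_length, coverage):
    """Smallest number of reads N with N * read_length / genome_bp >= coverage."""
    return math.ceil(coverage * genome_bp / read_length)

# Read lengths from the slide; 3 Gbp genome at 10X chosen for illustration
for platform, length in [("ABI", 1000), ("454", 400), ("Illumina", 100)]:
    print(platform, f"{reads_needed(3e9, length, 10):,}")
```

Shorter reads need proportionally more of them: 30 million ABI reads, 75 million 454 reads, or 300 million 100-bp Illumina reads.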
Assembly/trace editing tools:
• Consed: UNIX
• EBI's Phusion
• EditView (ABI PRISM): Mac
• Chromas (free/pay versions): Windows
Sequencing strategies: ordered (divide and conquer) vs. random (brute force). Steps: sequencing, assembly, finishing, annotation. The random approach now predominates for big projects.
Random method (details for Sanger sequencing):
• Shear the DNA (nebulize), finish the ends, and ligate into a vector
• Produce template
• Sequence to 8X-10X coverage
• Sequence both ends of each template
• Read length: 1,000 bp is typical
• Accuracy: 99% is good
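Why 8X-10X? Under the Lander-Waterman model (reads placed uniformly at random, no repeats), the chance that a given base is covered by no read is about e^(-c). The expected number of contigs is N·e^(-c(1-θ)), where θ is the minimum detectable overlap as a fraction of read length. The sketch below assumes θ = 0 by default. Real projects see more gaps, because repeats and cloning bias break the model's assumptions.

```python
import math

def fraction_uncovered(coverage):
    """Poisson approximation: probability a base is covered by zero reads."""
    return math.exp(-coverage)

def expected_contigs(genome_bp, read_length, coverage, min_overlap=0):
    """Lander-Waterman expected number of contigs (islands)."""
    n_reads = coverage * genome_bp / read_length
    theta = min_overlap / read_length
    return n_reads * math.exp(-coverage * (1 - theta))

for c in (4, 8, 10):
    print(c, f"{fraction_uncovered(c):.1e}", round(expected_contigs(3e9, 1000, c)))
```

At 8X, about 0.03% of a 3 Gbp genome (roughly 1 Mbp) is still expected to be missed. At 10X this drops to roughly 136 kbp, which is why finishing is a separate step.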
Assembly problem: overlapping reads are joined into a contiguous sequence (CONTIG). (Figure.)
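A toy greedy assembler illustrates the assembly problem. It repeatedly merges the pair of reads with the longest suffix/prefix overlap until no overlap of at least `min_len` bases remains; whatever is left are the contigs. This is a sketch for intuition only and is far simpler than the production assemblers listed in this deck. It assumes error-free reads, all on the same strand, and no repeats. The value `min_len=3` is an arbitrary illustrative choice.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads contained inside another read; they add no new sequence
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return reads  # no overlaps left: each string is a separate contig
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]

print(greedy_assemble(["TGCATGAC", "ATGACCGTA", "ACGTTGCA"]))
print(greedy_assemble(["TGCATGAC", "ATGACCGTA", "ACGTTGCA", "GGGGG"]))
```

The first call rebuilds a single contig. In the second, the read "GGGGG" overlaps nothing and stays a separate island.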
Figure: contigs and islands.
Assembling random sequences (figure). Problem regions: no coverage; disagreement between reads (e.g., T, T, C at one position); coverage on only 1 strand.
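Once reads are laid out, a consensus can be called column by column, flagging the three problem types above. This sketch makes several assumptions. It uses a simple majority vote, with ties going to the first base seen, whereas real tools weigh base qualities such as Phred scores. Reads are given as hypothetical `(offset, sequence, strand)` tuples already in the contig's forward orientation. Uncovered positions become `N`.

```python
from collections import Counter

def consensus(aligned_reads, length):
    """Majority-vote consensus plus a list of (position, problem) flags."""
    columns = [[] for _ in range(length)]
    for offset, seq, strand in aligned_reads:
        for i, base in enumerate(seq):
            columns[offset + i].append((base, strand))
    cons, flags = [], []
    for pos, col in enumerate(columns):
        if not col:
            cons.append("N")
            flags.append((pos, "no coverage"))
            continue
        counts = Counter(base for base, _ in col)
        cons.append(counts.most_common(1)[0][0])  # ties: first base encountered
        if len(counts) > 1:
            flags.append((pos, "disagreement"))
        if len({strand for _, strand in col}) == 1:
            flags.append((pos, "only 1 strand"))
    return "".join(cons), flags

reads = [(0, "ACGTT", "+"), (0, "ACG", "-"), (3, "TT", "+"), (2, "GTCAC", "-")]
cons, flags = consensus(reads, 8)
print(cons, flags)
```

Position 4 reproduces the T, T, C disagreement: the majority calls T, and the column is flagged for review.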
Assembly programs • Celera Assembler (Eugene Myers et al.) • Arachne (Serafim Batzoglou et al.) • PCAP (Xiaoqiu Huang, Iowa State University) • Phusion (EBI)
• 1990s: human genome (3 Gbp), $300 million (sequencing alone)
• Current: mammalian genome (3 Gbp), $1 million
• Goal: the $100,000 genome, 10X cheaper (and faster), likely by 2012
• New goal: the $1,000 genome. UK's sequencing center has one: http://www.uky.edu/Centers/AGTC/
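Dividing each milestone by genome size puts the cost trend on a per-base scale. This sketch uses the slide's 3 Gbp figure and folds sequencing coverage into the total cost. The numbers are therefore dollars per finished genome base, not per raw read base.

```python
GENOME_BP = 3e9  # mammalian genome size used on the slide

def cost_per_genome_base(total_cost_usd, genome_bp=GENOME_BP):
    """Dollars per base of the final genome (coverage is folded into the total)."""
    return total_cost_usd / genome_bp

milestones = {
    "1990s human genome": 300e6,
    "current mammalian genome": 1e6,
    "$100,000 genome goal": 1e5,
    "$1,000 genome goal": 1e3,
}
for label, cost in milestones.items():
    print(f"{label}: ${cost_per_genome_base(cost):.2e} per genome base")
```

The 1990s figure works out to $0.10 per genome base. The $1,000 goal is 300,000-fold cheaper.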
454 Sequencing's Genome Sequencer FLX: pyrosequencing (sequencing by detecting the nucleotides added during DNA synthesis). 350-400 million bases per run (10 hrs). 400 bp sequence reads. 1,000,000 reads per run. $6,600 per run: about 60 kb per $1, or $0.0000165/bp.
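The per-base figures follow directly from the run cost and yield. The sketch below uses the upper yield of 400 million bases per run. It also estimates the raw reagent cost of 10X coverage of a 3 Gbp genome, an illustrative target taken from the 8X-10X range quoted for Sanger projects. It ignores failed runs, library preparation and finishing.

```python
import math

def run_economics(cost_per_run, bases_per_run):
    """Return (dollars per base, bases per dollar) for one sequencing run."""
    dollars_per_base = cost_per_run / bases_per_run
    return dollars_per_base, 1 / dollars_per_base

def runs_for_coverage(genome_bp, coverage, bases_per_run):
    """Whole runs needed to reach the requested raw coverage."""
    return math.ceil(coverage * genome_bp / bases_per_run)

per_bp, bp_per_dollar = run_economics(6_600, 400e6)
runs = runs_for_coverage(3e9, 10, 400e6)
print(f"${per_bp:.7f}/bp, {bp_per_dollar:,.0f} bp per $1, {runs} runs = ${runs * 6_600:,}")
```

At these prices, 10X raw coverage of a 3 Gbp genome takes 75 runs, or about $495,000 in run costs.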