DNA Sequencing: Basic Techniques, Project Design, Process Improvements. Chuck Staben
Project Size/Type (from simple projects to repeat-rich ones) • 500 bases: 1 locus, EST, STS • 2500 bases: whole cDNA/EST • 10 kbp: gene, virus • 150 kbp: BAC, big virus • 3 Mbp: bacterial genome • BIG: YAC-size, HUMAN, etc.
DNA Sequencing Methods • Chain termination/Dideoxy/Sanger • fluorescence paradigm: ABI, Hood • Sequencing by hybridization • chips: Affymetrix (Lander, et al.) • other formats: Hyseq (Church, et al.), Lark
Dideoxy/Chain Terminator/Sanger • Template • Primer • Extension Chemistry • polymerase • termination • labeling • Separation • Detection
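Chain Terminator Basics • Extend the template-primer with dNTPs plus the four terminators (ddA, ddC, ddG, ddT) at dN : ddN ≈ 100 : 1 • Each incorporated terminator stops extension, giving a ladder of products n, n+1, ... • Example: target TGCA yields A (ddA), AC (ddC), ACG (ddG), ACGT (ddT)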
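The ladder logic can be sketched as a toy Python model. It assumes the template is written 3'→5' so the product is its plain complement, and it treats dN and ddN as equally likely to be incorporated at each position (real polymerases discriminate between them). All function names are hypothetical.

```python
"""Toy model of a Sanger chain-termination ladder (illustrative only)."""

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Return the strand the polymerase builds, read 5'->3'.

    Assumes `template` is written 3'->5', so the product is simply the
    base-by-base complement (template TGCA -> product ACGT).
    """
    return "".join(COMPLEMENT[b] for b in template)


def termination_ladder(template: str) -> list[tuple[int, str, str]]:
    """List every terminated fragment as (length, terminator, fragment).

    Any position can be capped by the ddNTP matching the base incorporated
    there, so a mixed reaction yields fragments of length n, n+1, ...
    """
    product = synthesized_strand(template)
    return [(i + 1, "dd" + base, product[: i + 1])
            for i, base in enumerate(product)]


def read_ladder(ladder) -> str:
    """Order fragments shortest to longest and read each terminator's base."""
    return "".join(term[-1] for _, term, _ in sorted(ladder))


def termination_probability(dn_to_ddn: float = 100.0) -> float:
    """Chance a given position is capped, assuming equal incorporation."""
    return 1.0 / (dn_to_ddn + 1.0)


if __name__ == "__main__":
    for length, term, frag in termination_ladder("TGCA"):
        print(length, term, frag)
    print(read_ladder(termination_ladder("TGCA")))  # ACGT
```

Sorting by length is exactly what electrophoresis does physically; the label on each length then spells out the sequence.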
Electrophoresis
Template Preparation • ssDNA vectors • M13 • pUC • PCR • dsDNA (+/- PCR)
Primers • Universal primers • cheap, reliable, easy, fast, parallel • BULK sequencing • Custom primers • expensive, slow, one-at-a-time • ADAPTABLE • Labeling: primer label or dye terminator
Extension Chemistry (goals: 100% termination, accurate, even signal) • Polymerase • Sequenase • Thermostable (Cycle Sequencing) • Terminators • Dye labels (“Big Dye”) • spectrally different, high fluorescence • (mass labels??) • ddA, C, G, T with primer labels
Separation • Gel Electrophoresis • Capillary Electrophoresis • suited to automation • rapid (2 hrs vs 12 hrs) • re-usable • simple temperature control • 96-well format • migration ~1/log N (N = fragment length)
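In practice the "migration ~1/log N" rule is used through a standard curve: over a working range, migration distance falls roughly linearly with the logarithm of fragment length, so a size ladder can be fitted and then inverted. A minimal sketch, assuming a semi-log linear model; the ladder values are synthetic and the function names hypothetical.

```python
import math


def fit_standard_curve(sizes, distances):
    """Least-squares fit of distance = a + b * log10(size) to a size ladder."""
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, distances))
    b = sxy / sxx            # slope: negative, longer fragments move less
    a = mean_y - b * mean_x  # intercept
    return a, b


def estimate_size(distance, a, b):
    """Invert the fitted curve to estimate a fragment's length in bases."""
    return 10 ** ((distance - a) / b)


if __name__ == "__main__":
    # Synthetic ladder generated from distance = 100 - 30 * log10(size).
    ladder_sizes = [100, 200, 500, 1000]
    ladder_distances = [100 - 30 * math.log10(s) for s in ladder_sizes]
    a, b = fit_standard_curve(ladder_sizes, ladder_distances)
    print(round(estimate_size(25.0, a, b)))  # prints 316
```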
Paradigm Instrument • Applied Biosystems • ABI3700 (early 1999) • 1500 samples/day! • http://www2.perkin-elmer.com/ga/3700/features.html • ABI377 (gel) and ABI310 (capillary)
Alternate Instruments (not a complete list) • Molecular Dynamics, Beckman Coulter… • ALF, LiCor • infrared detection
Sample Output (1 lane)
Trace Editing • EditView (Mac) • Chromas (WinNT) • Consed (UNIX)
Project Goals • de novo sequence • chain terminators • repetitive sequencing • sequencing by hybridization • chip technology, e.g.
Sequencing Strategies • Random Sequence • Brute Force • Ordered • Divide and Conquer • Workflow: Sequencing → Assembly → Finishing → Annotation • Mix to Suit
Random Method • Shear DNA (nebulize) • finish ends, ligate into vector • Produce template • Sequence to target coverage • read length (500 bases typical) • accuracy (99% is good) • Assemble Contigs
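The random method can be simulated to see how coverage, rather than read count alone, governs the result. A minimal sketch, assuming uniformly random read starts on a circular genome (real shearing and cloning are not perfectly uniform); the genome size and read count are arbitrary, and the function name is hypothetical.

```python
import math
import random


def simulate_shotgun(genome_size, read_length, n_reads, seed=0):
    """Per-base coverage depth after placing reads at random starts.

    Simplifying assumptions: uniform random start positions, a circular
    genome (so the ends are not under-sampled), read_length < genome_size.
    """
    rng = random.Random(seed)
    diff = [0] * (genome_size + 1)  # +1 at each read start, -1 past its end
    for _ in range(n_reads):
        start = rng.randrange(genome_size)
        end = start + read_length
        diff[start] += 1
        if end <= genome_size:
            diff[end] -= 1
        else:  # the read wraps past the end of the circular genome
            diff[genome_size] -= 1
            diff[0] += 1
            diff[end - genome_size] -= 1
    depth, running = [], 0
    for i in range(genome_size):
        running += diff[i]
        depth.append(running)
    return depth


if __name__ == "__main__":
    G, L, N = 200_000, 500, 1_200  # 3x coverage
    depth = simulate_shotgun(G, L, N)
    print(f"simulated uncovered fraction: {depth.count(0) / G:.4f}")
    print(f"Poisson prediction e^(-LN/G): {math.exp(-L * N / G):.4f}")
```

The simulated uncovered fraction should land near the Poisson estimate used in the statistics slides.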
Random Method Problems • No coverage • DISAGREEMENT between reads (e.g. T, T, C at one position) • Only 1 strand covered
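Poisson Statistics • L = read length • N = number of reads • G = genome size • probability that a base is not covered: $P_0 = e^{-LN/G}$ (LN/G is the fold coverage)
Poisson-2 • expected total gap length = $P_0 G$
Poisson-3 • expected number of gaps = $P_0 N$ (assume L = 500 bases)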
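The three formulas translate directly into code. A minimal sketch using the slides' symbols (L, N, G); the function names are hypothetical. For the 4 Mbp example (L = 500, 10x coverage) the formulas give about 3.6 gaps and about 180 uncovered bases; the 4 Mbp slide quotes 4 gaps and 400 bases in gaps.

```python
import math


def p_uncovered(L, N, G):
    """P0 = exp(-L*N/G): probability a given base lies in no read."""
    return math.exp(-L * N / G)


def expected_gap_length(L, N, G):
    """Expected total uncovered length, P0 * G."""
    return p_uncovered(L, N, G) * G


def expected_gap_number(L, N, G):
    """Expected number of gaps, P0 * N."""
    return p_uncovered(L, N, G) * N


if __name__ == "__main__":
    L, G = 500, 4_000_000
    N = 10 * G // L  # reads needed for 10x coverage
    print(N, round(expected_gap_number(L, N, G), 1),
          round(expected_gap_length(L, N, G)))  # prints: 80000 3.6 182
```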
4 Mbp Genome • 10x Coverage • 80,000 reads at 500 bases/read • 4 gaps • 400 bases in gaps • 55 instrument days on ABI3700
3000 Mbp Genome (HUMAN) • 50,000 instrument days on ABI3700 • 300 machines, 300 days • 3 years: plenty
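The throughput arithmetic behind these two slides can be sketched as follows, assuming one read per sample, 500-base reads, 10x coverage, and the 1500 samples/day quoted for the ABI3700; the function names are hypothetical. The raw quotients (about 53 instrument days for 4 Mbp and 40,000 for 3000 Mbp) fall below the slides' 55 and 50,000, which presumably include some allowance not modeled here.

```python
import math

SAMPLES_PER_DAY = 1500  # ABI3700 throughput quoted on the instrument slide


def reads_needed(genome_bp, coverage, read_length=500):
    """Reads required to reach the target fold coverage."""
    return math.ceil(coverage * genome_bp / read_length)


def instrument_days(genome_bp, coverage, read_length=500,
                    per_day=SAMPLES_PER_DAY):
    """Machine-days of sequencing, assuming one read per sample."""
    return reads_needed(genome_bp, coverage, read_length) / per_day


if __name__ == "__main__":
    for name, size in [("4 Mbp", 4_000_000), ("3000 Mbp", 3_000_000_000)]:
        days = instrument_days(size, 10)
        print(f"{name}: {reads_needed(size, 10):,} reads, "
              f"{days:,.1f} instrument days, "
              f"{days / 300:,.1f} days on 300 machines")
```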
Automation
Costs • Raw cost ~$0.01/base • “Semi-finished” $0.10 per base • “finished” $0.30 per base • High-quality Genome Project • $0.50/base
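Scaling these per-base rates by genome size shows how strongly the quality grade drives the total. A sketch that simply multiplies size by rate, assuming each rate applies uniformly across a project: a 4 Mbp genome costs about $40,000 at the raw rate versus $2,000,000 at the high-quality rate, and 3000 Mbp at $0.50/base comes to $1.5 billion. The dictionary keys and function name are hypothetical.

```python
# Per-base rates from the Costs slide, in US dollars.
RATES = {
    "raw": 0.01,
    "semi-finished": 0.10,
    "finished": 0.30,
    "high-quality genome project": 0.50,
}


def project_cost(genome_bp, grade):
    """Total cost for a project of genome_bp bases at the given grade."""
    return genome_bp * RATES[grade]


if __name__ == "__main__":
    for grade in RATES:
        print(f"4 Mbp, {grade}: ${project_cost(4_000_000, grade):,.0f}")
    print(f"3000 Mbp, high-quality: "
          f"${project_cost(3_000_000_000, 'high-quality genome project'):,.0f}")
```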
Ordered Methods • Primer Walking • Nested Deletion
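In primer walking, each new primer is chosen from the far end of the latest read. A minimal sketch under hypothetical selection rules: a 20-mer, a Wallace-rule Tm (2 °C per A/T, 4 °C per G/C) of 56 to 64 °C, a G or C at the 3' end, and exactly one match in the known sequence. The uniqueness check shows why repeats trouble primer walking. Reverse-complement matches and primer secondary structure are ignored here.

```python
def wallace_tm(primer):
    """Wallace-rule melting temperature (deg C): 2 per A/T, 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def occurrences(seq, sub):
    """Count possibly overlapping occurrences of sub in seq."""
    count, i = 0, seq.find(sub)
    while i != -1:
        count += 1
        i = seq.find(sub, i + 1)
    return count


def pick_walking_primer(read, known_sequence, length=20, window=100,
                        tm_range=(56, 64)):
    """Choose the next walking primer from the 3' end of a read.

    Hypothetical criteria: Wallace Tm inside tm_range, a G or C at the 3'
    end, and exactly one occurrence in known_sequence (repeats rejected).
    Returns (offset, primer), or None if no candidate qualifies.
    """
    lowest = max(0, len(read) - window)
    # Scan from the end inward so the next read extends as far as possible.
    for offset in range(len(read) - length, lowest - 1, -1):
        primer = read[offset:offset + length]
        if not tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            continue
        if primer[-1] not in "GC":
            continue
        if occurrences(known_sequence, primer) != 1:
            continue
        return offset, primer
    return None
```

If the same read is duplicated in the known sequence (a crude stand-in for a repeat), every candidate occurs twice and the function returns None.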
Limitations • Slow, Expensive • Expertise Needed • especially nested deletion • Repeat Problems • especially primer walking
Finishing • GOALS • >95% coverage on BOTH strands • every base covered 3X • resolve ambiguities • Switch to finishing when random sequencing is no longer productive (3-10X coverage)
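These goals can be checked mechanically once reads are placed on a contig. A sketch, assuming reads arrive as hypothetical (start, end, strand) tuples and reading "every base covered 3X" as a combined depth of at least 3 across both strands.

```python
def finishing_report(contig_length, reads, min_depth=3, min_both=0.95):
    """Check a contig against the finishing goals.

    reads: (start, end, strand) tuples on contig coordinates, end exclusive,
    strand "+" or "-". This read format is a hypothetical simplification.
    """
    fwd = [0] * contig_length
    rev = [0] * contig_length
    for start, end, strand in reads:
        track = fwd if strand == "+" else rev
        for i in range(max(0, start), min(end, contig_length)):
            track[i] += 1
    # Fraction of bases seen on both strands, and fraction at adequate depth.
    both = sum(1 for f, r in zip(fwd, rev) if f and r) / contig_length
    deep = sum(1 for f, r in zip(fwd, rev) if f + r >= min_depth) / contig_length
    return {
        "both_strands": both,
        "depth_ok": deep,
        "meets_goals": both > min_both and deep == 1.0,
    }
```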
Finish-How • Identify gaps, ambiguities • Extend from end of contigs • specific primers • subclones, etc. • Resolve ambiguities • consensus or resequence • specific primers, different chemistry
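Resolving ambiguities starts with locating them. A minimal sketch of a majority-vote consensus, assuming reads are already aligned without insertions or deletions and padded with "." where they have no data; it flags columns with no coverage, disagreement, or only one supporting read. Tools such as Phrap also weigh per-base quality values, which this sketch ignores. The function name and thresholds are hypothetical.

```python
from collections import Counter


def call_consensus(aligned_reads, min_agreement=0.75):
    """Majority-vote consensus over gap-free aligned reads.

    Returns the consensus string and a list of (column, reason) flags for
    columns that need finishing work.
    """
    width = max(len(r) for r in aligned_reads)
    consensus, flags = [], []
    for col in range(width):
        bases = [r[col] for r in aligned_reads
                 if col < len(r) and r[col] != "."]
        if not bases:
            consensus.append("N")
            flags.append((col, "no coverage"))
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base)
        if count / len(bases) < min_agreement:
            flags.append((col, "disagreement"))
        elif len(bases) == 1:
            flags.append((col, "single read"))
    return "".join(consensus), flags
```

For example, reads "ACGTA.", "ACTTA." and "..GT.." give the consensus "ACGTAN", with column 2 flagged for disagreement and column 5 for no coverage.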
Assembly Methods • Strip out vector • Mask known repeats • Trim off unreliable data • Find Matches (500 x 500 x many!!) • how long (and what ktuple) • how perfect (reliability index) • where to look? (ends only vs entire)
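The "find matches" step is where k-tuple seeding pays off: rather than aligning every read against every other, only pairs that share a short exact word are examined. A minimal sketch, assuming error-free reads and exact suffix-prefix overlaps on one strand; real assemblers also tolerate mismatches and check reverse complements. Masking repeats matters here because a repeated k-tuple would pull many unrelated reads into the candidate list. Function names are hypothetical.

```python
from collections import defaultdict


def kmer_index(reads, k):
    """Map each k-tuple to the set of reads containing it."""
    index = defaultdict(set)
    for rid, seq in enumerate(reads):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(rid)
    return index


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest exact suffix of a equal to a prefix of b, or 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def find_overlaps(reads, k=8, min_len=20):
    """Compare only read pairs sharing a k-tuple, instead of all pairs."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    index = kmer_index(reads, k)
    candidates = {(i, j) for ids in index.values()
                  for i in ids for j in ids if i != j}
    overlaps = []
    for i, j in sorted(candidates):
        length = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if length:
            overlaps.append((i, j, length))
    return overlaps
```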
Assembly Programs • PHRAP FAMILY • phrap, kangaroo, phrapo • GAP4, TIGR Assembler, ... • GCG • gelstart, gelenter, gelmerge, gelassemble, geldisassemble • thinly veiled vi editor • SeqWeb…
Assembly Improvements: Repeat Problems • Multiple fragment sizes in 1 project • Use length/distance info
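One common source of length/distance information is sequencing both ends of a clone of known size: if the two reads land in different contigs, the clone length implies the gap between them. A minimal sketch, assuming insert sizes are known exactly (in practice they vary around a library mean); argument and function names are hypothetical. Mixing clone sizes, such as 2 kb and 10 kb, gives spans of more than one length across a repeat.

```python
from statistics import mean


def gap_estimate(insert_size, dist_to_end_of_a, dist_from_start_of_b):
    """Gap between two contigs implied by one end-sequenced clone.

    dist_to_end_of_a: from the first read's 5' end to the end of contig A.
    dist_from_start_of_b: from the start of contig B to the second read's 5' end.
    A negative result suggests the contigs overlap (e.g. a collapsed repeat).
    """
    return insert_size - dist_to_end_of_a - dist_from_start_of_b


def combined_gap(pairs):
    """Average the estimates from several clones, possibly of different sizes."""
    estimates = [gap_estimate(*p) for p in pairs]
    return mean(estimates), estimates
```

Three clones, (2000, 800, 700), (2000, 1100, 500) and (10000, 4000, 5400), give gap estimates of 500, 400 and 600 bases, averaging 500.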
Project Management • Editing and Assembly • RepeatMasker • Phred/Phrap • Consed • Databases • ACeDB • A C. elegans database • Oracle
Annotation • ORFs • GRAIL, PowerBLAST • Repeats • Other Regions • Submit to GenBank • HTGS (levels 1, 2, 3) • nr
Sequencing by Hybridization • Hybridize labeled query DNA to a CHIP carrying OLIGOS (20-mers) • each site is interrogated by four oligos that differ only at that base (A, C, G, T) • site 1: ...gaactAatact..., ...gaactCatact..., ...gaactGatact..., ...gaactTatact... • site 2: ...gaactaAtact..., ...gaactaCtact..., ...gaactaGtact..., ...gaactaTtact... • read-out: GAACTATGTACT
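The read-out logic for a four-probes-per-site chip can be sketched as a toy model. It assumes the query is aligned to a known reference and that hybridization signal simply counts matching bases; real intensities depend on mismatch position and sequence context. With the default flank of 5 the probes are 11-mers rather than the slide's 20-mers. All names are hypothetical.

```python
BASES = "ACGT"


def signal(probe, target):
    """Toy hybridization signal: number of matching positions."""
    return sum(p == t for p, t in zip(probe, target))


def call_bases(reference, query, flank=5):
    """Resequence `query` against `reference` with four probes per site.

    Each site's probes copy the reference around the site and vary only the
    centre base; the probe with the strongest signal calls the base.
    Probes are written on the query strand for simplicity (a real chip
    hybridizes to the complementary strand). Ties are reported as "N".
    """
    calls = []
    for site in range(len(reference)):
        lo, hi = max(0, site - flank), min(len(reference), site + flank + 1)
        target = query[lo:hi]
        scores = {b: signal(reference[lo:site] + b + reference[site + 1:hi],
                            target)
                  for b in BASES}
        best = max(scores.values())
        winners = [b for b, s in scores.items() if s == best]
        calls.append(winners[0] if len(winners) == 1 else "N")
    return "".join(calls)
```

Against the reference GAACTATGTACT, a query carrying an A→C change at position 5 is read back as GAACTCTGTACT: every probe near the change shares the same mismatch, so only the centre base decides the call.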
Modern Sequencing Challenges • Heterozygous DNAs • germline differences • somatic variation • Massive sequencing • population studies • genome scans • Minimal sample preparation • “Doctor’s Office” • Approaches: chips, quantitative sequencing, automation, miniaturization
Physical Mapping: Genome Characterization • Genome fragmentation and cloning • vectors, etc. • Physical map assembly • hybridization • fingerprinting