Matthew 13:17 17 For verily I say unto you, That many prophets and righteous men have desired to see those things which ye see, and have not seen them; and to hear those things which ye hear, and have not heard them.
DNA Sequencing Timothy G. Standish, Ph. D.
Sequenced Genomes • Over the past three years large-scale sequencing of eukaryotic genomes has become a reality • Currently the sequencing of at least 5 multicellular eukaryotic genomes has been completed: • 1998 Caenorhabditis elegans - 8 x 10^7 bp - A nematode worm • 2000 Homo sapiens - 3 x 10^9 bp - Humans • 2000 Arabidopsis thaliana - 1.15 x 10^8 bp - A plant related to mustard • 2000 Drosophila melanogaster - 1.65 x 10^8 bp - Fruit flies • 2002 Anopheles gambiae - 2.78 x 10^8 bp - Mosquito vector of malaria
New Technology • Rapid sequencing of large complex genomes has been made possible by: • Foundational work done over many years and… • Dramatic improvement in DNA sequencing technology over the past few years • In this presentation we will look at both the basic principles of DNA sequencing and how techniques have been refined to yield the dramatic results we now see
A Sequencing Timeline
Samples/person/week x Average read length = Total/week
4 x 50 bp = 200 bp
20 x 100 bp = 2,000 bp
60 x 300 bp = 18,000 bp
180 x 500 bp = 90,000 bp
500 x 650 bp = 325,000 bp
5,000 x 600 bp = 3,000,000 bp
• 1977 Sanger and Maxam-Gilbert sequencing techniques developed
• 1980 M13 vector developed for cloning, many refinements and application of computer technology
• 1990 Improved sequencing enzymes, fluorescent dyes developed, robotics used for high throughput
• 1997 Saccharomyces cerevisiae genome sequenced
• 1999 Caenorhabditis elegans, human chromosome 22 and about 20 bacterial genomes
• 2000 Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana
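Each weekly total in the timeline is the number of samples one person could run per week multiplied by the average read length. A minimal Python sketch of that arithmetic, using only the figures from the slide (the names `ROWS` and `weekly_total` are illustrative):

```python
# Weekly output per person = samples per week x average read length (bp).
# Rows are copied from the timeline slide, in the order shown there.
ROWS = [
    (4, 50),
    (20, 100),
    (60, 300),
    (180, 500),
    (500, 650),
    (5000, 600),
]


def weekly_total(samples: int, read_length_bp: int) -> int:
    """Total bases read per person per week."""
    return samples * read_length_bp


totals = [weekly_total(samples, read_len) for samples, read_len in ROWS]

for (samples, read_len), total in zip(ROWS, totals):
    print(f"{samples:>5} samples x {read_len:>3} bp = {total:>9,} bp/week")
```

The last row is 15,000 times the first, which is the scale of improvement the rest of the presentation sets out to explain.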
Basic Principles • All current practical DNA sequencing techniques can be divided into four major steps: • Labeling of DNA so that small quantities can be easily detected, traditionally done by labeling with either P32 or S35 • Generation of fragments for which the specific bases at the 3’ end are known • Separation of fragments using gel electrophoresis sensitive enough to resolve differences in size of one nucleotide • Fragment detection
Outline • In this presentation we will look at: • The Maxam-Gilbert and Sanger methods of DNA fragment generation • Then methods for separation of fragments • And finally examine how these techniques have been refined and automated to allow for rapid cheap sequencing of large quantities of DNA
The Maxam-Gilbert Chemical Method • Three major steps: • DNA to be sequenced is typically labeled at the 5’ end using P32 • Fragments are generated using chemicals that break DNA at specific bases • These fragments are then separated and detected using autoradiography • Polyacrylamide gel electrophoresis is typically used to separate fragments on the basis of single nucleotide differences
2 Fragment Generation • A number of chemicals will specifically modify the bases in DNA • Modified bases can then be removed from the deoxyribose sugar to which they are attached on the sugar-phosphate DNA backbone • Piperidine, a volatile secondary amine, is used to cleave the sugar-phosphate backbone of DNA at sites where bases were modified
Cleavage at Specific Bases • Typically 5 reactions are run: • Dimethylsulfate at pH 8.0 results in modification of guanine (G) • Piperidine formate at pH 2.0 breaks glycosidic bonds between deoxyribose and both purines, guanine (G) and adenine (A), by protonation of nitrogen atoms • Hydrazine (rocket fuel!) opens pyrimidine rings on both pyrimidines, cytosine (C) and thymine (T) • Hydrazine in the presence of 1.5 M NaCl only reacts with C • 1.2 N NaOH at 90 °C strongly cleaves at A and may also weakly cleave at C
Cleavage at Specific Bases • The trick in chemical sequencing is to not allow the reactions to go to completion • Partial reactions run using the following conditions will result in a series of labeled DNA fragments whose final base is known:
Dimethylsulfate at pH 8.0 -> G
Piperidine formate at pH 2.0 -> G and A
Hydrazine -> C and T
Hydrazine in 1.5 M NaCl -> C
1.2 N NaOH at 90 °C -> A and some C
Partial Reactions: Dimethylsulfate pH 8.0 [Figure: seven copies of the strand 5’*NNGACGTACTTA3’, where * marks the P32 label at the 5’ end]
Partial Reactions: Dimethylsulfate pH 8.0 • Modification of some, but not all, of the G bases as the reaction is not allowed to go to completion [Figure: the same seven copies of 5’*NNGACGTACTTA3’, with G bases modified on some strands]
Partial Reactions: Dimethylsulfate pH 8.0 • Following breaking of the DNA strand at positions where G was chemically modified, two sets of fragments result: 1) A labeled set all ending where a G once was and 2) An unlabeled set which cannot be detected using autoradiography • Labeled fragments, all of which represent a place where G used to be: 5’*NN3’ and 5’*NNGAC3’ • Unlabeled fragments, undetectable using autoradiography: 5’ACGTACTTA3’ and 5’TACTTA3’
Partial Reactions: Hydrazine • Some, but not all, C and T bases are modified as the reaction is not allowed to go to completion [Figure: seven copies of the strand 5’*NNGACGTACTTA3’]
Partial Reactions: Hydrazine • Following breaking of the DNA strand at positions where C or T was chemically modified, two sets of fragments result: 1) A labeled set all ending where a C or T once was and 2) An unlabeled set which cannot be detected using autoradiography • Labeled T + C set: 5’*NNGA3’, 5’*NNGACG3’, 5’*NNGACGTAC3’ and 5’*NNGACGTACT3’ • Unlabeled fragments: 5’G3’, 5’ACTTA3’, 5’GTACTTA3’, 5’TA3’ and 5’A3’
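The partial-reaction slides can be sketched in a few lines of Python. This is a simplified model, not a description of the chemistry. It assumes one cut per labeled strand, assumes the modified base is lost so the labeled fragment is everything 5’ of the cut, and ignores the weak C cleavage in the NaOH reaction. The names `REACTIONS` and `labeled_fragments` are illustrative.

```python
# Bases attacked by each partial reaction, as listed on the slides.
# Assumption: the weak cleavage at C in the NaOH reaction is ignored.
REACTIONS = {
    "G": "G",      # dimethylsulfate, pH 8.0
    "G+A": "GA",   # piperidine formate, pH 2.0
    "T+C": "TC",   # hydrazine
    "C": "C",      # hydrazine in 1.5 M NaCl
    "A>C": "A",    # 1.2 N NaOH at 90 C
}


def labeled_fragments(seq: str, targets: str) -> list[str]:
    """Return the 5'-labeled fragments produced by a single cut at each target base.

    The modified base is destroyed, so each labeled fragment is the
    stretch of sequence on the 5' side of the cut position.
    """
    return [seq[:i] for i, base in enumerate(seq) if base in targets]


# The strand used on the slides (the 5' label is implied; N = unknown base).
SEQ = "NNGACGTACTTA"

ladder = {name: labeled_fragments(SEQ, targets) for name, targets in REACTIONS.items()}

for name, fragments in ladder.items():
    print(f"{name:>4}: {fragments}")
```

A strand cut more than once still yields only one labeled fragment, the one ending at the cut nearest the 5’ end, so single cuts are enough to enumerate every band that can appear on the autorad.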
Disadvantages • Toxic chemicals • Large amounts of radioactivity • Sometimes ambiguous and frequently ugly sequencing gels • Tricky to read autorads • Lack of automated methods
Sanger Sequencing • The Sanger sequencing method takes advantage of the way that normal DNA replication occurs • For DNA to be extended using normal DNA polymerases, a hydroxyl group must be present at the 3’ carbon on deoxyribose • Fragments are generated by spiking reactions with small quantities of 2’,3’-dideoxynucleotides, which terminate polymerization whenever they are incorporated into DNA • Polymerases used must lack 3’ to 5’ exonuclease proofreading activity for this method to work
Dideoxynucleotides • DNA sequencing using the Sanger method involves the use of 2’,3’-dideoxynucleotide triphosphates in addition to regular 2’-deoxynucleotide triphosphates • Because 2’,3’-dideoxynucleotide triphosphates lack a 3’ hydroxyl group, and DNA polymerization occurs only in the 5’ to 3’ direction, once a 2’,3’-dideoxynucleotide is incorporated, primer extension stops [Figure: nucleotide monophosphate structure with the phosphate, sugar (carbons 1’ to 5’) and base labeled, comparing a 2’-deoxynucleotide monophosphate (3’ OH) with a 2’,3’-dideoxynucleotide monophosphate (3’ H)]
2’,3’-Dideoxynucleotides Terminate DNA Replication [Figure: chemical structure of a growing DNA strand, with the bases and the sugar-phosphate backbone labeled, ending in a 2’,3’-dideoxynucleotide]
Making DNA Fragments • In Sanger DNA sequencing reactions all the basic components needed to replicate DNA are used • 4 reactions are set up, each containing: • DNA Polymerase • Primer • Template to be sequenced • dNTPs • A small amount of one ddNTP • ddATP, ddCTP, ddGTP, ddTTP • As incorporation of ddNTPs terminates DNA replication, a series of fragments is produced all terminating with the ddNTP that was added to each reaction
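The four-reaction setup can be illustrated with a short Python sketch. It is a toy model that assumes every possible termination product is present in each tube. The template and primer are hypothetical, chosen so that the newly made strand continues with the GACGTACTTA sequence used in the Maxam-Gilbert slides. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3to5: str) -> str:
    """Strand synthesized 5'->3' against a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def sanger_fragments(template_3to5: str, primer: str, ddntp: str) -> list[str]:
    """All extension products that can end in the given dideoxy base.

    ddntp is one of 'A', 'C', 'G', 'T' (standing for ddATP, ddCTP, ...).
    """
    full = new_strand(template_3to5)
    if not full.startswith(primer):
        raise ValueError("primer does not match the 3' end of the template")
    return [
        full[: i + 1]
        for i in range(len(primer), len(full))
        if full[i] == ddntp
    ]


# Hypothetical template (written 3'->5') and primer (written 5'->3').
TEMPLATE = "AATAGCCTGCATGAAT"
PRIMER = "TTATCG"

reactions = {base: sanger_fragments(TEMPLATE, PRIMER, base) for base in "ACGT"}

for base, fragments in reactions.items():
    print(f"dd{base}TP: {fragments}")
```

Each tube ends up with fragments of different lengths that all stop at the same kind of base, which is what makes the four-lane gel readable.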
DNA Sequencing [Figure: plasmid (or phage) with cloned DNA fragment, showing the cloned fragment, the primer and the primer binding sites]
The ddATP Reaction [Figure: cartoon polymerases (Pol.) extending 5’TTATCG along the template 3’AATAGCATGGTACTGATCTTACGCTAT5’ and stalling wherever ddATP is incorporated. Products shown: 5’TTATCGTA, 5’TTATCGTACCA, 5’TTATCGTACCATGA, 5’TTATCGTACCATGACTA, 5’TTATCGTACCATGACTAGA, 5’TTATCGTACCATGACTAGATGCGA and 5’TTATCGTACCATGACTAGATGCGATA]
Separation of DNA Fragments • All current practical sequencing methods rely on separation of DNA fragments in such a way that differences in length of a single base can be resolved • This is typically done using polyacrylamide gel electrophoresis
Polyacrylamide Gels • Polyacrylamide is a polymer made of acrylamide (C3H5NO) and bis-acrylamide (N,N’-methylene-bis-acrylamide, C7H10N2O2) [Figure: structures of the acrylamide and bis-acrylamide monomers]
Polyacrylamide Gels • Acrylamide polymerizes in the presence of free radicals typically supplied by ammonium persulfate [Figure: acrylamide monomers and a sulfate radical]
Polyacrylamide Gels • Acrylamide polymerizes in the presence of free radicals typically supplied by ammonium persulfate • TEMED (N,N,N’,N’-tetramethylethylenediamine) serves as a catalyst in the reaction [Figure: acrylamide monomers joining into a chain]
Polyacrylamide Gels • bis-Acrylamide polymerizes along with acrylamide forming cross-links between acrylamide chains [Figure: acrylamide chains cross-linked by bis-acrylamide]
Polyacrylamide Gels • Pore size in gels can be varied by varying the ratio of acrylamide to bis-acrylamide • DNA sequencing separations typically use a 19:1 acrylamide to bis ratio [Figure: gel networks made with lots of bis-acrylamide and with little bis-acrylamide]
Denaturation of DNA • For gel electrophoresis to accurately separate on the basis of size and not shape or other considerations it is important that the DNA be denatured • This is typically achieved by using a high urea concentration (8 M) in the gel [Figure: self-annealing DNA and double-stranded DNA each converted to denatured single-stranded DNA by 8 M urea]
Separation of Fragments: Maxam-Gilbert [Figure: gel with five lanes, G (dimethyl sulfate pH 8), G+A (piperidine formate pH 2), T+C (hydrazine), C (hydrazine in 1.5 M NaCl) and A>C (1.2 N NaOH at 90 °C); bands marked X spell out 5’GACGTACTTA3’, read 5’ to 3’]
Separation of Sanger Fragments • Products from 4 reactions each containing a small amount of a dideoxynucleotide are loaded onto a gel • Because polymerization goes 5’ to 3’, the shortest fragments end nearest the 5’ end of the new strand and longer fragments end progressively further in the 3’ direction [Figure: gel with lanes ddATP, ddCTP, ddGTP and ddTTP; read 5’ to 3’ from bottom to top]
DNA Sequencing: What a Sequencing Autorad Actually Looks Like • To read the autorad it is important to start at the bottom and work up so that it is read in the 5’ to 3’ direction [Figure: autorad with lanes A, C, G and T; sequence read: 5’CTAGAGGATCCCCGGGTACCGAGCT...3’]
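Reading a four-lane gel from the bottom up amounts to sorting the bands by fragment length and noting which lane each band sits in. The Python sketch below shows the idea. The band positions are made-up illustrative values, given as fragment lengths in nucleotides, and `read_gel` is a hypothetical helper rather than any real base-calling software.

```python
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Call bases from per-lane fragment lengths, shortest fragment first.

    The shortest fragments run furthest, so sorting by length is the same
    as reading the gel from bottom to top, which gives the 5'->3' sequence.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    positions = [length for length, _ in bands]
    if len(set(positions)) != len(positions):
        raise ValueError("two lanes show a band at the same position")
    return "".join(base for _, base in bands)


# Hypothetical band positions (fragment lengths) in each of the four lanes.
lanes = {
    "A": [8, 12, 16],
    "C": [9, 13],
    "G": [7, 10],
    "T": [11, 14, 15],
}

sequence = read_gel(lanes)
print(sequence)
```

The check for two bands at the same position mirrors the ambiguity a person faces on a smeared autorad: if two lanes claim the same length, the base cannot be called from position alone.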
Sequencing Method Refinements • Because of difficulties intrinsic to the Maxam-Gilbert chemical sequencing strategy, efforts at improvement have been concentrated on the Sanger method • Major improvements in the following areas have been achieved • Labeling and detection • Fragment separation • DNA Polymerases used in sequencing and resulting strategies for generation of fragments • Automation
Pros and Cons of the Sanger Method • It is more amenable to automation than Maxam-Gilbert • Fewer dangerous chemicals are used, but acrylamide and P32 or S35 are still a problem • Gels or autorads are generally cleaner looking and the reading of bases is a lot easier than Maxam-Gilbert data • The bottom line: Without improvements in automation, detection and separation technologies Sanger sequencing is still very labor intensive
Labeling and Detection • Labeling using radioactive isotopes is difficult, dangerous and expensive • Using biotin labeled primers has allowed conjugation of enzymes to fragments and their subsequent detection using substrates that change color in the presence of the enzyme • This technique is clumsy, expensive, time consuming and unreliable • It also may require transfer of fragments to membranes thus increasing labor and generally has not caught on
Labeling and Detection • Another approach has involved development of very sensitive silver-staining technologies • I have tried this one, it is miserable and unreliable • Read length on gels is typically short and creation of a permanent copy of the gel requires expensive additional equipment and supplies • It may not involve isotopes, but it is such a hassle and the data is of such low quality that it is not worth the effort
Labeling and Detection • The most significant advance in labeling has been the production of electrophoretically neutral dyes that fluoresce at specific wavelengths when excited by laser-produced light over a very narrow range of wavelengths • These dyes, when attached to primers, allow detection down to 15 attomoles (1 attomole = 10^-18 moles) • That’s less than 10^7 molecules!
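The claim that 15 attomoles is fewer than ten million molecules follows from Avogadro's number. A quick check in Python (the function name is illustrative):

```python
AVOGADRO = 6.02214076e23  # molecules per mole
ATTO = 1e-18              # one attomole expressed in moles


def molecules(attomoles: float) -> float:
    """Number of molecules in the given amount of substance."""
    return attomoles * ATTO * AVOGADRO


n = molecules(15)
print(f"15 attomoles is about {n:.2e} molecules")
```

The result is roughly 9 million molecules, just under the 10^7 quoted on the slide.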
The Li-Cor System • Li-Cor of Lincoln, Nebraska was one of the first to implement fluorescent dyes as part of an automated sequencing system • The Li-Cor system uses infrared lasers scanning a fixed line toward the bottom of an acrylamide slab gel • Fluorescence of dyes attached to DNA fragments is detected as the fragments pass the lasers and detectors • Data in digital form is fed directly into a computer system where automated base calling is done • A graphic representation of the data resembles a traditional autorad with bands appearing in 4 lanes
The Li-Cor System [Figure: polyacrylamide gel in which dye-labeled fragments pass a laser ("Big Hairy Zappo") and a CCD detector connected to a computer, which displays lanes A, T, G and C]
Pros and Cons • The Li-Cor system’s major advantage is the length of its DNA reads • Because all fragments travel through the entire gel, resolution is sufficient to read over 1,000 bases in a single run with over 99% accuracy • This is better than just about any single-run manual sequencing method • Elimination of manual reading of autorads also eliminates human error and removes a labor intensive step • P32 or S35 is not used, another major advantage • Tricky acrylamide gels still must be cast and loaded manually
Applied Biosystems • Applied Biosystems (ABI) has developed fluorescent dye systems further and improved methods for loading and electrophoresis • Four dyes, each of which fluoresces at a different wavelength but has about the same impact on electrophoretic mobility, can be used to label either primers or the nucleotides that terminate a reaction • If terminator dyes are used, the entire sequencing reaction is reduced to one tube from 4 in conventional Sanger sequencing • Instead of polyacrylamide slab gels, a single capillary can be used with a liquid polymer that is replaced after each individual run
Replication Using Dye Terminators • As the base at the end of each fragment is clearly marked with a unique fluorescent dye, the entire reaction can be done in a single tube [Figure: cartoon polymerases (Pol.) extending 5’TTATCG along the template 3’AATAGCATAACGTTAACGTTACGCTAT5’. Dye-terminated products shown: 5’TTATCGTA, 5’TTATCGTAT, 5’TTATCGTATT, 5’TTATCGTATTG, 5’TTATCGTATTGC, 5’TTATCGTATTGCA, 5’TTATCGTATTGCAA, 5’TTATCGTATTGCAAT, 5’TTATCGTATTGCAATT, 5’TTATCGTATTGCAATTG, 5’TTATCGTATTGCAATTGC and 5’TTATCGTATTGCAATTGCA]
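The single-tube idea can be sketched in Python: every fragment carries a dye that identifies its last base, so sorting fragments by length and reading the dye colors recovers the sequence from one lane or capillary. The template and primer come from the slide. The color assignments in `DYE` are arbitrary placeholders, not the actual ABI dye set, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: arbitrary color names standing in for four distinguishable dyes.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def dye_terminator_products(template_3to5: str, primer: str) -> list[tuple[int, str]]:
    """Return (fragment length, dye color) for every possible termination site."""
    full = "".join(COMPLEMENT[base] for base in template_3to5)
    if not full.startswith(primer):
        raise ValueError("primer does not match the 3' end of the template")
    return [(i + 1, DYE[full[i]]) for i in range(len(primer), len(full))]


def call_bases(products: list[tuple[int, str]]) -> str:
    """Read the sequence from dye colors in order of increasing fragment length."""
    base_for_dye = {dye: base for base, dye in DYE.items()}
    return "".join(base_for_dye[dye] for _, dye in sorted(products))


# Template (written 3'->5') and primer (5'->3') as shown on the slide.
TEMPLATE = "AATAGCATAACGTTAACGTTACGCTAT"
PRIMER = "TTATCG"

products = dye_terminator_products(TEMPLATE, PRIMER)
called = call_bases(products)
print(called)
```

Because the color, rather than the lane, identifies the base, the four separate reactions of conventional Sanger sequencing collapse into one.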
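ABI Prism 310 System [Figure: a sequencing reaction is drawn into a capillary filled with liquid polymer, mounted on a heat plate between the - and + electrodes; a laser ("Big Hairy Zappo") illuminates a window in the capillary and a beam splitter directs the fluorescence to detectors connected to a computer, which displays the sequence ATTGCA]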
The State of the Art • The ABI Prism 310 (1 capillary), 3100 (16 capillaries) and 3700 (96 capillaries) represent the current state of the art in automated sequencing machines • A single ABI Prism 377 slab gel sequencer can run 115,000 bases per day! • The 3100 can run up to 184,000 bases per day • The 3700 can run up to 1,104,000 bases per day • Large sequencing facilities, like Celera, have factories full of these machines which can run 24 hours a day with very little down time for routine maintenance
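The throughput figures can be put in perspective with a little arithmetic. The Python sketch below uses only numbers quoted in this presentation (capillary counts, bases per day, and the 3 x 10^9 bp human genome size). It assumes a single pass over the genome with no redundancy, which is far less than a real project needs, so the day count is a lower bound for illustration only.

```python
# Figures quoted on the slide: (number of capillaries, bases per day).
SEQUENCERS = {
    "ABI Prism 3100": (16, 184_000),
    "ABI Prism 3700": (96, 1_104_000),
}


def bases_per_capillary_per_day(capillaries: int, bases_per_day: int) -> float:
    """Daily output of a single capillary."""
    return bases_per_day / capillaries


per_capillary = {
    name: bases_per_capillary_per_day(capillaries, bases)
    for name, (capillaries, bases) in SEQUENCERS.items()
}

# Human genome size from the Sequenced Genomes slide.
HUMAN_GENOME_BP = 3e9

# Assumption: one machine, one pass over the genome, no repeated coverage.
days_single_pass = HUMAN_GENOME_BP / SEQUENCERS["ABI Prism 3700"][1]

for name, rate in per_capillary.items():
    print(f"{name}: {rate:,.0f} bases per capillary per day")
print(f"One 3700, single pass over the human genome: {days_single_pass:,.0f} days")
```

Both machines work out to 11,500 bases per capillary per day, so the gain from the 3100 to the 3700 comes from running more capillaries in parallel. A single 3700 would still need more than seven years for one pass over the human genome, which is why large facilities run many machines around the clock.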
The State of the Art [Figure: ABI Prism 3700]
The End