
Lisa Crossman & the Pathogen Sequencing Unit

The Nuts and Bolts of Bacterial Genome Sequencing. Lisa Crossman & the Pathogen Sequencing Unit. Dr. Fred Sanger: double Nobel laureate and developer of the dideoxy sequencing method, first published in December 1977. [Credit: Wellcome Images]



Presentation Transcript


  1. The Nuts and Bolts of Bacterial Genome Sequencing Lisa Crossman & the Pathogen Sequencing Unit

  2. Dr. Fred Sanger: double Nobel laureate and developer of the dideoxy sequencing method, first published in December 1977. [Credit: Wellcome Images] "Fred Sanger is a quiet giant, whose discoveries and inventions transformed our research world." (A. Bradley, WTSI)

  3. Sequence centre contributions to the finished human genome sequence

  4. Human sequences • The Human Genome Project (2000) • Celera (2000) • First working drafts published 2001 • Race for the ‘$1,000 genome’ (~£510), 2004 • James Watson ($2 million, 2007) • J. Craig Venter (2007)

  5. Francis Crick, 1958 Watson & Crick

  6. Sanger sequencing • DNA extraction • Sequencing reactions • Automated DNA sequencing • Finishing • Annotation and analysis • Publication in journal • Publication on the internet

  7. Two-stage strategy for producing long DNA sequences • Stage 1, random shotgun: target DNA molecule (BAC) → prepare multiple copies → purified BAC DNA → physically fragment DNA (randomly produced DNA fragments) → subclone random fragments → generate reads from random subclones (many overlapping sequences) → assemble sequence (assembled reads form the draft, prefinished sequence) • Stage 2, sequence finishing: directed finishing → final assembly → finished sequence

  8. How to purify DNA • Methods for DNA purification: use a kit, e.g. Millipore's Montage Plasmid Miniprep 96 kit or Promega's Wizard SV 96 Plasmid DNA Purification System • There are many sources of DNA: bacteria, animal cells, blood, soil, plant cells • DNA's properties (solubility, charge) are not sequence dependent, so “generic” purification methods are possible • DNA is chemically stable, particularly at alkaline pH • It is intracellular and associated with stabilizing proteins • It is fragile and subject to mechanical shear

  9. Automated DNA purification • Beckman Coulter Biomek FX Laboratory Automation Workstation: 2 x 96-well plates, i.e. 192 plasmid preps, in 70 minutes (using the Promega Wizard SV96 kit) • PerkinElmer MiniTrak liquid handling system: 12 x 384-well boxes, i.e. 4,608 plasmid preps, in less than 1 h
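
The two robot throughputs can be put on a common per-hour footing with simple arithmetic. This is a minimal Python sketch using the slide's figures; `preps_per_hour` is a hypothetical helper, and treating the MiniTrak's "less than 1 h" as exactly 60 minutes makes its rate a lower bound.

```python
def preps_per_hour(plates: int, wells_per_plate: int, minutes: float) -> float:
    """Plasmid preps per hour for one robot run (hypothetical helper)."""
    preps = plates * wells_per_plate
    return preps * 60 / minutes


# Figures from the slide.
biomek = preps_per_hour(plates=2, wells_per_plate=96, minutes=70)
# "Less than 1 h": using 60 minutes makes this a lower bound on the rate.
minitrak = preps_per_hour(plates=12, wells_per_plate=384, minutes=60)

print(f"Biomek FX: about {biomek:.0f} preps per hour")
print(f"MiniTrak: at least {minitrak:.0f} preps per hour")
```

On these figures the 384-well MiniTrak set-up handles at least 28 times as many preps per hour as the 96-well Biomek set-up.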

  10. Levels of automation Colony picking robots

  11. DNA sequencing from: Sanger, F., Nicklen, S. & Coulson, A. R., Proc. Natl. Acad. Sci. USA 74, 5463 (1977)

  12. DNA sequencing using the dideoxy-mediated chain termination method of Sanger et al. • The DNA to be sequenced acts as a template for the enzymatic synthesis of new DNA starting at a defined primer site • Incorporation of a dideoxynucleotide blocks further chain elongation, because it lacks the 3'-hydroxyl group needed to attach the next nucleotide

  13. Dideoxy sequencing • Denature the double-stranded (ds) DNA • Anneal the primer (5’→3’) to the template DNA (3’→5’) • Adding enzyme (DNA polymerase), dNTPs and buffer at the optimum temperature initiates chain elongation • Adding dideoxynucleotides terminates elongation
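
The orientation rules here (the primer anneals antiparallel to the template and is extended from its 3' end) can be illustrated with string operations. A minimal Python sketch, assuming both sequences are written 5'->3', perfect Watson-Crick pairing and a single exact primer site; `reverse_complement`, `extension_product` and the template sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extension_product(template: str, primer: str) -> str:
    """Full-length strand made by extending `primer` along `template`.

    Both arguments are written 5'->3'. Because the primer pairs antiparallel,
    it matches a stretch of the template's reverse complement; polymerase then
    adds bases to the primer's 3' end until it runs off the template's 5' end.
    Only the first exact match is used (no mismatches, no secondary sites).
    """
    rc = reverse_complement(template)
    start = rc.find(primer)
    if start == -1:
        raise ValueError("primer does not anneal to this template")
    return rc[start:]


template = "TTGACCGTAGCATGCA"  # hypothetical template, 5'->3'
print(extension_product(template, "CTACGG"))  # CTACGGTCAA
```

The primer CTACGG pairs with the template bases CCGTAG, so the polymerase copies only the template bases on that site's 5' side (TTGA), adding TCAA after the primer.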

  14. Sanger sequencing method

  15. Ratios of deoxy- and dideoxynucleotides • These are set so that, at each nucleotide position on the growing chain, there is a finite probability that a dideoxynucleotide is incorporated in place of the usual deoxynucleotide, resulting in a population of truncated fragments
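
The effect of the deoxy:dideoxy ratio can be modelled as a per-position termination probability. A minimal Python sketch, assuming a single fixed probability `p` that the dideoxy form is incorporated wherever its base is called for (a simplification; real incorporation efficiencies vary with enzyme and sequence context). Function names are hypothetical. Sorting the terminated fragments from all four reactions by length spells out the synthesised strand, which is the principle behind reading a gel or capillary trace.

```python
from __future__ import annotations


def termination_ladder(synthesised: str, ddntp: str, p: float) -> dict[int, float]:
    """Expected fraction of chains terminating at each length in one ddNTP reaction.

    synthesised: new strand (5'->3') the polymerase would make beyond the primer.
    ddntp: the base whose dideoxy analogue is present ('A', 'C', 'G' or 'T').
    p: chance a dideoxy (rather than deoxy) nucleotide is added at a matching position.
    """
    ladder = {}
    still_growing = 1.0  # fraction of chains not yet terminated
    for length, base in enumerate(synthesised, start=1):
        if base == ddntp:
            ladder[length] = still_growing * p  # chains stopping here
            still_growing *= 1 - p              # chains carrying on past this base
    return ladder


def read_ladders(synthesised: str, p: float) -> str:
    """Combine the four reactions and call one base per fragment length."""
    calls = {}
    for base in "ACGT":
        for length in termination_ladder(synthesised, base, p):
            calls[length] = base
    return "".join(calls[length] for length in sorted(calls))


ladder = termination_ladder("GATTACA", "A", p=0.1)
print({length: round(frac, 3) for length, frac in ladder.items()})  # {2: 0.1, 5: 0.09, 7: 0.081}
print(read_ladders("GATTACA", p=0.1))  # GATTACA
```

Lowering `p` (more deoxy- relative to dideoxynucleotide) shifts the population toward longer fragments, which is why the ratio matters for read length.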

  16. Label location • The label can be incorporated into: • The oligonucleotide primer used to initiate the sequencing reaction • The deoxynucleotides in chain elongation • The dideoxynucleotides used in chain termination

  17. Radio (manual) and fluorescent (automated) label sequencing

  18. Automated DNA sequence analyzer: specification for the Applied Biosystems 3730xl DNA Analyzer (96 samples per run), sequencing production capacity
      • Rapid run module: capillary length to detector 36 cm; 40 runs/day; LOR* 550; 500 Phred Q20 bases/read; 1,920,000 Phred Q20 bases/day
      • Standard run module: 36 cm; 24 runs/day; LOR* 700; 650 Phred Q20 bases/read; 1,497,600 Phred Q20 bases/day
      • Long-Read run module: 50 cm; 12 runs/day; LOR* > 1,000; > 800 Phred Q20 bases/read; > 921,600 Phred Q20 bases/day
      * LOR: length of read with 98.5% basecalling accuracy and less than 2% N's, using pGEM-3Zf(+) as template.
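
The daily figures follow from runs per day × 96 capillaries × Q20 bases per read. A short Python check using only the numbers on the slide; the Long-Read result is a lower bound, since the slide gives "> 800" bases per read.

```python
CAPILLARIES_PER_RUN = 96  # 3730xl: 96 samples per run

# run module: (runs per day, Phred Q20 bases per read), from the slide
RUN_MODULES = {
    "Rapid": (40, 500),
    "Standard": (24, 650),
    "Long-Read": (12, 800),  # slide gives "> 800", so the result is a lower bound
}


def q20_bases_per_day(module: str) -> int:
    """Phred Q20 bases per instrument per day for one run module."""
    runs_per_day, q20_per_read = RUN_MODULES[module]
    return runs_per_day * CAPILLARIES_PER_RUN * q20_per_read


for module in RUN_MODULES:
    print(f"{module}: {q20_bases_per_day(module):,} Q20 bases/day")
```

The trade-off is visible: longer reads need longer runs, so fewer runs fit in a day and total daily output falls even as each read gets longer.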

  19. Original Sanger method (1977) vs DNA sequencing in capillary analysers (e.g. ABI 3730xl), 1999 to present
      • Chemistry: chain termination with dideoxynucleotides in both
      • Label: DNA radiolabelled vs DNA labelled with fluorescent dyes
      • Detection: autoradiography vs fluorescent detection of DNA fragments
      • Data: not digital vs digital form
      • Reaction: single sequencing reaction vs cycle sequencing with a thermostable DNA polymerase
      • Separation: electrophoresis in polyacrylamide slab gels vs electrophoresis in a liquid matrix in capillaries
      • Preparation: manual gel pouring vs automated filling of capillaries
      • Loading: manual sample loading vs automated sample loading

  20. Large-scale DNA sequencing facility • Every day: 120,000 DNA sequences, 60,000 plasmid preps

  21. Two-stage strategy for producing long DNA sequences (as in slide 7)

  22. Redundancy in genome sequencing
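
Redundancy (coverage) is commonly quantified with the Lander-Waterman model, which is not on the slide and is swapped in here as a standard illustration: with N reads of length L on a genome of size G, coverage is c = NL/G, and if reads land uniformly at random the fraction of bases left unsequenced is about e^(-c). A minimal Python sketch; the genome size, read count and read length are hypothetical, and real projects fall short of the model because of cloning bias and repeats.

```python
import math


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average redundancy c = N * L / G."""
    return n_reads * read_len / genome_size


def fraction_unsequenced(c: float) -> float:
    """Lander-Waterman: expected fraction of bases covered by no read."""
    return math.exp(-c)


def expected_contigs(n_reads: int, c: float) -> float:
    """Lander-Waterman expected contig count, assuming zero minimum overlap."""
    return n_reads * math.exp(-c)


# Hypothetical project: 2 Mb genome, 32,000 reads of 500 bp.
genome, reads, read_len = 2_000_000, 32_000, 500
c = coverage(reads, read_len, genome)
print(f"coverage: {c:.0f}x")
print(f"unsequenced bases: ~{genome * fraction_unsequenced(c):.0f}")
print(f"expected contigs: ~{expected_contigs(reads, c):.0f}")
```

Even at 8x the model predicts a handful of gaps, which is the work the directed finishing stage has to close.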

  23. What do we mean by finished sequence? • A closed consensus sequence without gaps that meets our finishing criteria and therefore has an overall accuracy of at least 99.99%. • The sequence may contain (small) regions that do not meet our finishing criteria. These are likely to be of lower quality, but they will have been characterized and should be identified in the annotation. • The sequence has been checked by an experienced finisher. • Once finishing is complete, no further finishing work is done.
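
The 99.99% accuracy target can be related to the Phred quality scale used on the 3730xl slide, where Q = -10 log10(P_error). A minimal Python sketch; the 5 Mb genome is a hypothetical, roughly E. coli-sized example, and the function names are illustrative.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Phred score Q = -10 * log10(error probability)."""
    return -10 * math.log10(1 - accuracy)


def accuracy_from_phred(q: float) -> float:
    """Per-base accuracy implied by a Phred score."""
    return 1 - 10 ** (-q / 10)


def max_expected_errors(length_bp: int, accuracy: float) -> float:
    """Expected errors in a sequence of this length at the given overall accuracy."""
    return length_bp * (1 - accuracy)


print(round(phred_from_accuracy(0.9999)))            # 40
print(f"Q20 accuracy: {accuracy_from_phred(20):.2%}")  # Q20 accuracy: 99.00%
print(round(max_expected_errors(5_000_000, 0.9999)))  # 500
```

So "at least 99.99%" is equivalent to an overall Q40 standard, or no more than about 500 errors across a 5 Mb finished genome.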

  24. DNA extraction • Sequencing reactions • Automated DNA sequencing • Finishing • Annotation and analysis • Publication in journal • Publication on the internet

  25. What do we mean by a finished and annotated sequence? • A closed consensus sequence in which coding sequences have been identified, systematically numbered and analysed. • Vital metabolic genes and previously sequenced genes have been identified. • The sequence and annotation have been checked by an experienced annotator. • A full analysis has been carried out and the genome sequence has been deposited in the sequence databases.

  26. Next (New) Generation Sequencing Technologies • Technological breakthroughs driven by the race for the $1,000 (human) genome • 454 (Roche) • Solexa (Illumina) • And others…

  27. Next Generation Sequencing Technologies • Pyrosequencing • 454 sequencing: clonal amplification on beads; PicoTiter plate (1.6 M wells); sequencing-by-synthesis; chemiluminescent detection; no cloning required • Increased performance: 20,000,000 bp per run (4.5 hours), i.e. a 2 Mb genome at 10x coverage • Current performance (ABI 3730): 48,000 bp per run
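
The "2 Mb genome, 10x coverage" figure is the run yield divided by the genome size. A minimal Python sketch comparing the runs needed on each platform at the slide's per-run yields; `runs_needed` is a hypothetical helper, and the comparison ignores read length, read pairs and cost.

```python
import math


def runs_needed(genome_bp: int, target_coverage: float, bp_per_run: int) -> int:
    """Whole runs required to reach the target average coverage."""
    return math.ceil(genome_bp * target_coverage / bp_per_run)


GENOME_BP = 2_000_000  # 2 Mb genome, as on the slide
print(runs_needed(GENOME_BP, 10, 20_000_000))  # 454: 1 run
print(runs_needed(GENOME_BP, 10, 48_000))      # ABI 3730: 417 runs
```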

  28. The GS20 sequencing machine (labelled parts: reagent drawer; CCD camera and sequencing plate housing; computer)

  29. Developments in Technology Pyrosequencing

  30. 454 • Genome fragmented into 300-500 bp pieces • Ends are polished and adapters ligated; each adapter consists of a 4-nucleotide “key”, a sequencing primer and a PCR primer • Fragments immobilised onto magnetic, streptavidin-coated beads • Only fragments carrying one A and one B adapter are then isolated as a single-stranded template DNA (sstDNA) library
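
Why the A/B selection step is needed can be seen from a simple model that the slide does not state: if each fragment end independently receives adapter A with probability p_A, ligation yields a mixture of AA, AB and BB fragments, of which only AB is kept. A minimal Python sketch using exact fractions; the independence model and the 50:50 default are assumptions.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction
from itertools import product


def adapter_pair_fractions(p_a: Fraction = Fraction(1, 2)) -> dict[str, Fraction]:
    """Expected fraction of fragments carrying each adapter combination.

    Assumes each of the two fragment ends independently gets adapter A with
    probability p_a and adapter B otherwise (a hypothetical model).
    """
    prob = {"A": p_a, "B": 1 - p_a}
    totals = Counter()
    for left, right in product("AB", repeat=2):
        pair = "".join(sorted(left + right))  # A...B and B...A are the same class
        totals[pair] += prob[left] * prob[right]
    return dict(totals)


for pair, frac in adapter_pair_fractions().items():
    print(pair, frac)  # AA 1/4, AB 1/2, BB 1/4
```

Under this model only about half of the ligated fragments are usable, so a selection step is needed to discard the AA and BB fragments.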

  31. emPCR (emulsion PCR) • A) Anneal single-stranded template to an excess of DNA capture beads • B) Emulsify beads and PCR reagents in water-in-oil microreactors • C) Break microreactors and enrich for DNA-positive beads
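
Annealing template to an excess of capture beads keeps most loaded beads clonal. Under a Poisson loading model (an assumption here, not stated on the slide), with λ templates per bead on average, the fractions of empty, single-template and mixed beads follow directly; empty beads are what the enrichment in step C removes. A minimal Python sketch with hypothetical λ values.

```python
from __future__ import annotations

import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k templates on a bead with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def bead_occupancy(lam: float) -> tuple[float, float, float]:
    """Fractions of beads with 0, exactly 1, and 2+ templates."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    mixed = 1 - empty - clonal
    return empty, clonal, mixed


for lam in (0.1, 0.5, 1.0):
    empty, clonal, mixed = bead_occupancy(lam)
    # After enrichment only non-empty beads remain; report how many of those are clonal.
    print(f"lambda={lam}: empty {empty:.1%}, clonal {clonal:.1%}, mixed {mixed:.1%}, "
          f"clonal among enriched {clonal / (1 - empty):.1%}")
```

Lower λ gives purer beads at the cost of more empty ones; under this model, an excess of beads followed by enrichment is one way to get both purity and efficiency.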

  32. Depositing DNA beads into the PicoTiter™ Plate • Load enzyme beads • Load beads into the PicoTiter™ Plate • Centrifugation (figure scale: 44 μm)

  33. Reagent flow across the PicoTiter™ Plate • Components: reagent cassette, peristaltic pump, sequencing plate in front of the CCD • The four nucleotides are washed in series over the plate

  34. Pyrosequencing signal generation • Repeated dNTP flow sequence: G T C A • Each of the hundreds of thousands of beads, each carrying millions of copies of a single clonal fragment, is sequenced in parallel. • If a complementary nucleotide is flowed into a well, the polymerase extends the strand by adding one or more nucleotides. • Each addition releases pyrophosphate (PPi); ATP sulfurylase converts PPi and APS into ATP, and luciferase uses the ATP to convert luciferin into oxyluciferin and light, which is recorded. • The process continues until a defined number of nucleotide flow cycles has been completed. (Figure: primer annealed to a DNA capture bead containing millions of copies of a single clonal fragment.)
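
The flow-by-flow logic can be sketched as a flowgram: for each nucleotide flow, the signal is the number of bases incorporated, i.e. the length of the matching homopolymer run (zero if that nucleotide is not next). A minimal, idealised Python sketch using the slide's flow order G T C A; it assumes noise-free signals exactly proportional to run length, which real instruments only approximate (hence the homopolymer limitation in slide 39), and it simply stops once the strand is fully extended rather than running a fixed number of cycles. Function names are hypothetical.

```python
from __future__ import annotations


def flowgram(strand: str, flow_order: str = "GTCA") -> list[tuple[str, int]]:
    """Ideal signal for each flow: how many bases of `strand` are added.

    strand: the bases incorporated during synthesis, in order.
    Flows repeat in cycles until the whole strand has been extended.
    """
    if set(strand) - set(flow_order):
        raise ValueError("strand contains a base that is never flowed")
    signals = []
    pos = 0
    while pos < len(strand):
        for nucleotide in flow_order:
            run = 0
            while pos < len(strand) and strand[pos] == nucleotide:
                run += 1  # polymerase adds every consecutive matching base
                pos += 1
            signals.append((nucleotide, run))
    return signals


def call_bases(signals: list[tuple[str, int]]) -> str:
    """Turn ideal signals back into sequence."""
    return "".join(nucleotide * run for nucleotide, run in signals)


flows = flowgram("GGTAAAC")
print(flows)
print(call_bases(flows))  # GGTAAAC
```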

  35. Illumina (Solexa) machine (from http://www.gatc-biotech.com)

  36. Illumina (Solexa) sequencing: dense lawn of primers

  37. Bridge amplification

  38. Illumina (Solexa) sequencing

  39. Next generation sequencing technologies compared (454 | Solexa | Sanger)
      • Data generation: 25 Mbp/run | 3,3000 Mbp/run | 0.25 Mbp/run
      • Read length: 240 bp | 35 bp | 800 bp
      • Read pair information: no | no | yes
      • Homopolymeric runs: accurate if <5 | accurate | accurate
      • Cloning bias: no | no | yes
      • De novo genomes: hybrid? | hybrid? | yes
      • Current cost: $100/Mb (~£50) | $5/Mb (~£2.55) | $500/Mb (~£255)
      Sources: www.454.com; Margulies et al. (2005) Nature 437:376-380; www.solexa.com
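
The per-megabase prices translate into a rough raw sequencing cost per project. A minimal Python sketch, assuming a hypothetical 5 Mb bacterial genome sequenced to 10x and using only the slide's "Current cost" row; it ignores the read pairs or hybrid assembly that the table suggests de novo projects may still need.

```python
USD_PER_MB = {"454": 100, "Solexa": 5, "Sanger": 500}  # "Current cost" row


def sequencing_cost(genome_mb: float, coverage: float, usd_per_mb: float) -> float:
    """Raw sequencing cost: megabases generated times price per megabase."""
    return genome_mb * coverage * usd_per_mb


for tech, price in USD_PER_MB.items():
    print(f"{tech}: ${sequencing_cost(5, 10, price):,.0f}")
```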

  40. Even newer sequencing technologies • ABI SOLiD (bead/light, interrogates every 3,4 base) • Roche GS FLX (bead/light, longer reads)

  41. New challenges: genome sequence finishing (figure labels: Sanger sequence; new generation sequencing technologies)

  42. New challenges: data handling • WTSI will generate ~100 terabytes of processed sequence data per year; the global repository is currently only 75 TB. • Each machine of the newer generation sequencing technologies can generate 1,000,000,000,000 bytes (~1 TB) of raw data per day, equivalent to approximately 10 laptops.
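
These volumes can be put on one scale using decimal units (1 TB = 10^12 bytes). A minimal Python sketch using the slide's figures; it assumes, hypothetically, that a machine runs at that rate every day of the year.

```python
BYTES_PER_TB = 10**12

raw_bytes_per_machine_per_day = 10**12  # slide: 1,000,000,000,000 bytes/day
global_repository_tb = 75               # slide: current global repository
wtsi_processed_tb_per_year = 100        # slide: WTSI processed output

# Assumes continuous operation, 365 days a year.
raw_tb_per_machine_per_year = raw_bytes_per_machine_per_day * 365 / BYTES_PER_TB
days_to_match_repository = global_repository_tb * BYTES_PER_TB / raw_bytes_per_machine_per_day

print(f"One machine, one year of raw data: {raw_tb_per_machine_per_year:.0f} TB")
print(f"Days for one machine to produce {global_repository_tb} TB raw: {days_to_match_repository:.0f}")
print(f"WTSI processed data vs repository: {wtsi_processed_tb_per_year / global_repository_tb:.2f}x")
```

On these assumptions a single machine produces as much raw data in about 75 days as the entire global repository holds.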

  43. New Challenges - Annotation De novo annotation Deep resequencing Metagenomics Comparative genomics

  44. Artemis free genome viewer & analysis tool www.sanger.ac.uk/Software/Artemis

  45. Escherichia coli (scanning electron micrograph)

  46. Escherichia coli • Workhorse of modern molecular biology • Human commensal organism found in the gut • Gram-negative, optimum growth temperature 37 °C, motile • Indicator of faecal contamination in the environment • Some strains can cause severe infections, e.g. E. coli O157:H7

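  47. You sequenced one E. coli, you’ve done ‘em all? (Figure: Venn diagram comparing the CDS content of E. coli O157:H7 (EDL933), K12, CFT073 and EAEC, with region counts 3,166, 240, 226, 190, 152, 114 and 66; EAEC has 748 unique CDSs, ~15% of its 4,902 CDSs in total.)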
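
A comparison like this rests on set operations over each genome's CDS (gene family) content: the core is the intersection, and a strain's unique genes are those found in no other genome. A minimal Python sketch with hypothetical gene-family labels; in practice the families would first have to be inferred by sequence comparison (for example reciprocal similarity searches), which is not shown here.

```python
from __future__ import annotations


def core_and_unique(genomes: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return genes shared by all genomes and, per genome, genes found in no other."""
    core = set.intersection(*genomes.values())
    unique = {}
    for name, genes in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        unique[name] = genes - others
    return core, unique


def percent_unique(n_unique: int, total_cds: int) -> float:
    """Unique CDSs as a percentage of a genome's total CDSs."""
    return 100 * n_unique / total_cds


# Hypothetical gene-family labels, for illustration only.
toy = {
    "strain_1": {"fam1", "fam2", "fam3", "fam4"},
    "strain_2": {"fam1", "fam2", "fam3", "fam5"},
    "strain_3": {"fam1", "fam2", "fam6"},
}
core, unique = core_and_unique(toy)
print(sorted(core))  # ['fam1', 'fam2']
print({name: sorted(genes) for name, genes in unique.items()})
print(f"{percent_unique(748, 4902):.0f}%")  # slide figure: 748 of 4,902 CDSs -> 15%
```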

  48. Yersinia pestis (Cole, 2001) • Primarily a pathogen of rodents • Evolved from the gastrointestinal pathogen Y. pseudotuberculosis: 16S rRNA identical, DNA-DNA hybridisations show them to be highly related; diverged 1,500-20,000 years ago • Employs an insect vector • Infects multiple hosts • Has become a blood-borne intracellular pathogen

  49. Bacterial diversity is large: enteric genome content correlates with pathogenicity and host range. (Figure: inter-species, inter-genus and inter-strain comparisons of shared and unique gene content for Yersinia pestis (plague), Yersinia enterocolitica (gastroenteritis), Escherichia coli O157:H7 (gastroenteritis), Escherichia coli K12 (non-pathogen), Salmonella enterica Typhi (typhoid fever) and Salmonella enterica Typhimurium (gastroenteritis), with a 100 Mya timescale and a legend running from shared to unique gene differences.)
