Next Generation Sequencing
Itai Sharon
November 11th, 2009
Introduction to Bioinformatics
Sequencing the Human Genome
• 2001: Human Genome Project, $2.7 billion, 11 years
• 2001: Celera, $100 million, 3 years
• 2007: 454, $1 million, 3 months
• 2008: ABI SOLiD, $60,000, 2 weeks
• 2009: Illumina, Helicos, $40,000-50,000
• 2010: $5,000, a few days?
• 2012: $100, <24 hrs?
[Figure: log10(price) versus year, 2000-2010]
In this Talk:
• Sequencing 1.0: Sanger
• Assembly
• Next generation sequencing (NGS)
• NGS applications
• Future directions
Genome Sequencing
• Goal
  • Determine the order of nucleotides across a genome
• Problem
  • Current DNA sequencing methods can handle only short stretches of DNA at once (<1-2 Kbp)
• Solution
  • Sequence short pieces, then use computers to assemble them
[Figure: Genome Sequencing. The genome is cut into short fragments of DNA, the fragments are read as short DNA sequences, and the short sequences are assembled into the sequenced genome.]
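The "sequence, then assemble" idea can be sketched with a toy greedy overlap assembler. This is only an illustration, not the method used by any particular assembler: the function names, the 12-base read length, and the minimum overlap of 5 are assumptions, and the example genome is the short string shown on the slide. Real data adds sequencing errors, repeats, and reverse-complement reads, none of which are handled here.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy genome taken from the slide; reads are error-free 12-mers sampled every 3 bases.
genome = "ACGTGGTAATGGCGTATACACCCTTAGGCCATA"
reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 3)]
contigs = greedy_assemble(reads, min_len=5)
print(contigs)
```

With evenly spaced, error-free reads and no repeats the sketch returns a single contig; lowering the sampling density so that neighbouring reads overlap by less than `min_len` leaves several contigs with gaps between them.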
Sanger Sequencing
• Mix DNA with dNTPs and ddNTPs
• Amplify
• Run in gel
• Fragments migrate a distance that depends on their size: shorter fragments travel farther
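The read-out logic of the steps above can be sketched in a few lines. This is an idealized model, assuming exactly one terminated fragment per position and a perfectly resolved gel; the function names are hypothetical. Each ddNTP stop yields a fragment whose length marks a position and whose last base is known, so ordering the fragments by size spells out the synthesized strand.

```python
import random


def sanger_fragments(synthesized: str) -> list[tuple[int, str]]:
    """One (fragment length, terminating base) pair per possible ddNTP stop."""
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Read terminating bases from the shortest fragment to the longest."""
    return "".join(base for _, base in sorted(fragments))


strand = "ACGTGGTAACGTATACAC"
fragments = sanger_fragments(strand)
random.Random(0).shuffle(fragments)  # in the tube, fragments are in no particular order
recovered = read_gel(fragments)
print(recovered)
```

Sorting by length plays the role of the gel: separation by size is what restores the order that was lost in the reaction mix.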
Sanger Sequencing
• Advantages
  • Long reads (~900 bp)
  • Suitable for small projects
• Disadvantages
  • Low throughput
  • Expensive
Assembly
• Cut DNA into larger pieces (2 Kbp, 15 Kbp) and sequence both ends of each piece (Fleischmann et al., 1994)
• The resulting mates (2 Kbp mates, 15 Kbp mates) help with:
  • Resolving repeats
  • Better assembly of contigs, gap length estimation
[Figure: a mate pair of two ~500 bp end reads separated by ~(length - 1,000) bp, linking contig 1 and contig 2]
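The gap arithmetic behind mate pairs is simple enough to write down. The sketch below assumes the piece length is known exactly and that both end reads are ~500 bp, as in the figure; the function names and the example offsets are illustrative, not taken from the talk.

```python
def unsequenced_span(piece_len: int, read_len: int = 500) -> int:
    """Bases between the two end reads of one piece: length - 2 * read length."""
    return piece_len - 2 * read_len


def contig_gap(piece_len: int, left_tail: int, right_head: int) -> int:
    """Estimated gap between two contigs linked by one mate pair.

    left_tail:  bases from the start of the first read to the end of contig 1
    right_head: bases from the start of contig 2 to the end of the second read
    """
    return piece_len - left_tail - right_head


print(unsequenced_span(2_000))    # 2 Kbp mates
print(unsequenced_span(15_000))   # 15 Kbp mates
print(contig_gap(2_000, left_tail=800, right_head=700))
```

In practice piece lengths vary around their nominal size, so a gap estimate is usually averaged over all mate pairs that link the same two contigs.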
Assembly: How Much DNA? (Lander and Waterman, 1988)
• Low coverage
  • Input: a few pieces to assemble
  • Output: many contigs, many gaps
• High coverage
  • Input: many pieces to assemble
  • Output: a few contigs, a few gaps
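The coverage trade-off can be made quantitative with the Lander-Waterman expectations. The sketch below uses coverage c = N·L/G, an expected uncovered fraction of e^(-c), and an expected number of islands (contigs, including single-read ones) of N·e^(-c(1-θ)), where θ is the minimum overlap as a fraction of read length. The 500 bp read length and the read counts are assumptions chosen for illustration; the 1.8 Mbp genome size is the H. influenzae figure quoted in the talk.

```python
import math


def lander_waterman(genome_len: int, read_len: int, n_reads: int,
                    min_overlap_frac: float = 0.0) -> dict[str, float]:
    """Expected coverage statistics for reads placed uniformly at random."""
    coverage = n_reads * read_len / genome_len
    return {
        "coverage": coverage,
        "uncovered_fraction": math.exp(-coverage),
        "expected_contigs": n_reads * math.exp(-coverage * (1.0 - min_overlap_frac)),
    }


GENOME_LEN = 1_800_000  # H. influenzae genome size quoted in the talk
READ_LEN = 500          # assumed read length

low = lander_waterman(GENOME_LEN, READ_LEN, n_reads=3_600)    # 1x coverage
high = lander_waterman(GENOME_LEN, READ_LEN, n_reads=28_800)  # 8x coverage
for name, stats in (("low", low), ("high", high)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Going from 1x to 8x coverage multiplies the input by eight while the expected number of contigs and the uncovered fraction both drop sharply, which is the low-coverage versus high-coverage contrast on the slide.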
Sanger Sequencing
• 1982: lambda virus, DNA stretches up to 30-40 Kbp (Sanger et al.)
• 1994: H. influenzae, 1.8 Mbp (Fleischmann et al.)
• 2001: H. sapiens, D. melanogaster, 3 Gbp (Venter et al.)
• 2007: Global Ocean Sampling Expedition, ~3,000 organisms, 7 Gbp (Venter et al.)
[Figure: timeline with axis marks at 1980, 1990 and 2000]
Next Generation Sequencing: Why Now?
• Motivation: the Human Genome Project (HGP) and its derivatives, personalized medicine
• Short-read applications: (re-)sequencing, other methods (e.g. gene expression)
• Advancements in technology
High Parallelism is Achieved in Polony Sequencing
[Figure: Sanger versus polony sequencing]
Generation of Polony Array: DNA Beads (454, SOLiD)
• DNA beads are generated using emulsion PCR
• DNA beads are placed in wells
Generation of Polony Array: Bridge PCR (Solexa)
• DNA fragments are attached to the array and used as PCR templates
Sequencing: Pyrosequencing (454)
• Complementary strand elongation: DNA polymerase
Sequencing: Fluorescently Labeled Nucleotides (Solexa)
• Complementary strand elongation: DNA polymerase
Sequencing: Fluorescently Labeled Nucleotides (ABI SOLiD)
• Complementary strand elongation: DNA ligase
• 5 reading frames, each position is read twice
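One way to see why each position is read twice is the two-base ("color space") encoding associated with SOLiD, in which every color reports on a pair of adjacent bases. The sketch below assumes the commonly described scheme where bases are numbered A=0, C=1, G=2, T=3 and the color of a pair is the XOR of the two numbers; the slides do not spell out the encoding, so treat the table and the function names as assumptions.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per adjacent base pair: XOR of the two base codes."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild the bases from a known first base and the color sequence."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


seq = "ATGGA"
colors = encode_colors(seq)
print(colors)
print(decode_colors(seq[0], colors))
```

Every interior base contributes to two neighbouring colors, so a true single-base variant changes two adjacent colors, while an isolated single-color change is more likely a measurement error.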
Single Molecule Sequencing: HeliScope
• Direct sequencing of DNA molecules: no amplification stage
• DNA fragments are attached to the array
• Potential benefits: higher throughput, fewer errors
Technology Summary *Source: Shendure & Ji, Nat Biotech, 2008
What, When and Why
• Sanger: small projects (less than 1 Mbp)
• 454: de novo sequencing, metagenomics
• Solexa, SOLiD, HeliScope:
  • Gene expression, protein-DNA interactions
  • Resequencing
Where Do We Go from Here?
• Higher throughput, longer reads (Pacific Biosciences)
• Computational bottleneck
• Shift to sequencing-based technologies
• Will it help to cure cancer?