small beads, big reads Jay Shendure Church Lab 12-08-03
DNA SEQUENCING • Sanger Sequencing (state-of-the-art): • ~ 0.1 cents per unfinished base • ~ $50,000,000 per 16x human genome coverage • ~ 1 million bases per machine per day (~ 12 bases per second) • But Personalized Genomics Requires… • ~ 0.000002 cents per unfinished base • ~ $1,000 per 16x human genome coverage • ~ 10 billion bases per instrument per day (~ 120,000 bases per second)
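The arithmetic on this slide can be checked with a minimal Python sketch. The haploid genome size of ~3 billion bases is an assumption (it is not stated on the slide); the per-base costs and daily throughputs are taken from the bullets above.

```python
# Assumption (not on the slide): a haploid human genome of ~3e9 bases.
GENOME_BASES = 3e9
COVERAGE = 16
SECONDS_PER_DAY = 24 * 60 * 60


def cost_per_genome(cents_per_base: float) -> float:
    """Dollar cost of sequencing the genome at 16x coverage."""
    return GENOME_BASES * COVERAGE * cents_per_base / 100.0


def bases_per_second(bases_per_day: float) -> float:
    """Convert a daily throughput into a per-second rate."""
    return bases_per_day / SECONDS_PER_DAY


sanger_cost = cost_per_genome(0.1)        # slide quotes ~$50,000,000
target_cost = cost_per_genome(0.000002)   # slide quotes ~$1,000
sanger_rate = bases_per_second(1e6)       # slide quotes ~12 bases per second
target_rate = bases_per_second(10e9)      # slide quotes ~120,000 bases per second

print(f"Sanger: ${sanger_cost:,.0f} per genome, {sanger_rate:.1f} bases/s")
print(f"Target: ${target_cost:,.0f} per genome, {target_rate:,.0f} bases/s")
```

Under that assumption the quoted figures are rounded versions of roughly $48 million and $960 per genome.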
Short Term Aims • Sequence several E. coli strains • 4.6 Mbp genome • Resequencing, not de novo!! • 11x coverage = ~50 million bases per slide • Sequence several mammalian mRNA-tag libraries • 2 million tags per library • 25 bp per tag = ~50 million bases per slide
Strategy Overview (for short-term aims) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)
(1) In Vitro Library Construction Strategy (Nick) • Shear & size-select genomic DNA to 200 bp fragments • Ligate 100-bp “forward” linker & 200-bp “reverse” linker • Size-select 500 bp fragments [Figure labels: unique segment; universal primer sequences]
(2) Clonal PCR Amplification with Emulsions (Greg) • Protocol adapted from Dressman et al. (PNAS 2003) • Sub-picoliter aqueous compartments => isolated reaction chambers • Compartments also contain paramagnetic beads to which PCR products end up immobilized (via biotin-streptavidin interaction). • All copies of amplified DNA on the same bead are the same, but amplified DNA on different beads is different (AMPLIFIED CLONALITY, just like polonies!) • ~100 million beads per PCR tube!!!
(3) FACS Sorting of Amplified vs. Empty Beads (Jun) • Poisson statistics limit the fraction of compartments that have a single template amplified and immobilized on a single bead. • FACS sorting permits rapid enrichment for beads bearing PCR products of clonally amplified template.
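The Poisson limit mentioned on this slide can be illustrated with a short sketch. It assumes templates land in compartments independently with a mean of `lam` templates per compartment; `lam` is a hypothetical parameter, not a value from the slides, and the distribution of beads across compartments is ignored.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a compartment receives exactly k templates."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy(lam: float) -> dict:
    """Fractions of compartments that are empty, clonal (1 template) or mixed (2+)."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    return {"empty": empty, "clonal": clonal, "mixed": 1.0 - empty - clonal}


for lam in (0.01, 0.1, 1.0):
    o = occupancy(lam)
    print(f"lam={lam}: empty={o['empty']:.3f} "
          f"clonal={o['clonal']:.3f} mixed={o['mixed']:.4f}")
```

In this model the clonal fraction peaks at `lam = 1` (about 37%), but roughly a quarter of compartments are then mixed. Diluting the template keeps mixed compartments rare at the price of mostly empty ones, which is where an enrichment step such as FACS helps.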
(4) Prepare for Sequencing… • Immobilize millions (to billions?) of beads in 6% acrylamide gel. • Denature second strand and hybridize Cy3-labeled universal sequencing primer adjacent to unique region of amplified templates. • Sequence in parallel as per a modified version of the standard FISSEQ protocol (with Cy5-SS-dNTPs)
Strategy Overview (for short-term goals) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)
Clonal Amplification via Emulsion Based PCR • Library template = ~112-mer with 24 unique bases. • Generated via oligonucleotide “wobble” in synthesis. • Unique sequence alternates between C/A and G/T. • Library complexity of ~1.7 million. • Tube PCR in oil-aqueous emulsion. • Primer-loaded 1-micron beads (~100 million per tube) • Dilute template concentration. • Beads recovered and poured into polyacrylamide gel. • Sequence of 3rd base determined • Cy5-SS-dATP vs. Cy3-SS-dCTP
Clonal Amplification via Emulsion Based PCR • ~11,235 beads present on this frame. • ~96 have strong Cy3 signal. • ~46 have strong Cy5 signal. • ~3 have both strong Cy5 and Cy3 signal. • ~1% of beads are clonal. 99% are “empty”. • ~100 million beads => ~1 million useable beads (per PCR tube)
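The counts on this slide can be turned into fractions with a few lines of Python. The slide does not say whether the ~3 doubly labeled beads are included in the Cy3 and Cy5 counts; this sketch assumes they are counted separately, and it treats them as non-clonal.

```python
# Counts read off the slide for one image frame.
total_beads = 11_235
cy3_beads = 96
cy5_beads = 46
both_beads = 3  # assumption: not already included in the two counts above

# Beads showing any amplified signal.
amplified = cy3_beads + cy5_beads + both_beads
amplified_fraction = amplified / total_beads

# Beads with both dyes presumably carry more than one template.
mixed_fraction_of_amplified = both_beads / amplified

# Scale the single-dye (apparently clonal) beads up to one PCR tube.
beads_per_tube = 100e6
usable_per_tube = beads_per_tube * (cy3_beads + cy5_beads) / total_beads

print(f"amplified beads: {amplified_fraction:.2%}")
print(f"mixed among amplified: {mixed_fraction_of_amplified:.1%}")
print(f"usable beads per tube: {usable_per_tube:,.0f}")
```

The result is in line with the slide's rounded figures of ~1% clonal beads and ~1 million usable beads per tube.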
Emulsion PCR Optimization • Amplified Bead => ~31% of signal of “loaded” bead. • 2x template => 46% signal drop • 4x dNTPs, MgCl2 => 72% signal boost • 30 cycles -> 50 cycles => 72% signal boost
Strategy Overview (for short-term goals) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)
Best Sequencing Experiment To Date • 3-micron beads, each loaded with 1 of 5 templates • Beads are immobilized in 6% acrylamide (no trapping!) • Hybridize universal sequencing primer and begin cycles. • Sequencing primer is Cy3 labeled • 28 FISSEQ quarter-cycles (C-A-G-T-C-A-G-T…) • Image Analysis of ~250 brightest beads in MATLAB
Sequencing Quarter-Cycle (30 minutes + scan-time) • Klenow Buffer Equilibration 1 minute • Single Base Extension 4 minutes • Wash 5 minutes • Scan (variable) • TCEP Cleavage 5 minutes • Wash 5 minutes • Wash 5 minutes • Wash 5 minutes (lots of room for temporal optimization, process automation)
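As a quick check of the timing budget, the step durations listed on this slide can be summed in Python. Scan time is variable and is left out; the 28 quarter-cycles are the number used in the experiment described in these slides.

```python
# Step durations in minutes, taken from the slide (scan time excluded).
steps = [
    ("Klenow buffer equilibration", 1),
    ("Single base extension", 4),
    ("Wash", 5),
    ("TCEP cleavage", 5),
    ("Wash", 5),
    ("Wash", 5),
    ("Wash", 5),
]

quarter_cycle_min = sum(minutes for _, minutes in steps)
wash_min = sum(minutes for name, minutes in steps if name == "Wash")

n_quarter_cycles = 28
total_hours = n_quarter_cycles * quarter_cycle_min / 60

print(f"quarter-cycle: {quarter_cycle_min} min ({wash_min} min of washes)")
print(f"{n_quarter_cycles} quarter-cycles: {total_hours:.0f} h plus scan time")
```

Washes account for two thirds of each quarter-cycle, which is consistent with the slide's remark that there is room for temporal optimization.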
FISSEQ(C..A..G..T) x 7 = 28 quarter-cycles
T1 CACACACACACACACTCCACCA
T2 GTGTGTGTGTGTGTGTCCACCA
T3 AGTGCTCACACACGTGATCCAC
T4 CAGCCGAACGACCGATCCACCA
T5 ATGTGAGAGCTGTCGTCCACCA
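To see what add/no-add pattern each template should give under the C-A-G-T nucleotide order, here is an idealized simulation in Python. It is a sketch, not the analysis used for these slides. It assumes the sequences are written in the order in which bases are added to the growing strand, that a homopolymer run is extended completely within one quarter-cycle, and that there are no misincorporations or dephasing. The function name `fisseq_signature` is hypothetical.

```python
from itertools import cycle, islice

# Template sequences from the slide, assumed to be listed in the order
# in which bases are added to the growing strand.
TEMPLATES = {
    "T1": "CACACACACACACACTCCACCA",
    "T2": "GTGTGTGTGTGTGTGTCCACCA",
    "T3": "AGTGCTCACACACGTGATCCAC",
    "T4": "CAGCCGAACGACCGATCCACCA",
    "T5": "ATGTGAGAGCTGTCGTCCACCA",
}


def fisseq_signature(seq: str, order: str = "CAGT",
                     n_quarter_cycles: int = 28) -> list:
    """Return the number of bases added in each quarter-cycle.

    Idealized model: the offered nucleotide extends through a whole
    homopolymer run in one quarter-cycle, with no errors.
    """
    pos = 0
    added = []
    for base in islice(cycle(order), n_quarter_cycles):
        count = 0
        while pos < len(seq) and seq[pos] == base:
            pos += 1
            count += 1
        added.append(count)
    return added


for name, seq in TEMPLATES.items():
    sig = fisseq_signature(seq)
    pattern = "".join("1" if n else "0" for n in sig)  # add vs. no-add
    print(name, pattern, "bases read:", sum(sig))
```

Under these assumptions T1 incorporates on the C and A quarter-cycles and T2 on the G and T quarter-cycles, so the two are distinguishable from the first full cycle onward.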
Scanning Setup • Only a single frame captured per cycle (~0.5 mm²) • 10x Objective (~0.5 microns per pixel) • 1600 x 1200 pixels (8-bit), ~1 to 2 second exposures • 21 pixels per 3-micron bead used in image analysis • Installation of XY motor stage system (happening today) will instantly increase throughput 1000-fold. • Plan is to switch to 20x objective (for 1 micron beads)
Image Analysis (MATLAB) • Align images • Select beads to process • Sum Cy5 and Cy3 bead-intensities over 21-pixel areas • Normalize Cy5 values by corresponding Cy3 values • Apply manually-set thresholds • Derive sequences for each bead from signature patterns of “add” vs. “no-add” at each cycle • Compare sequence signatures to known template sequences to determine accuracy.
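The last four steps of this pipeline (normalize, threshold, derive a signature, compare to known templates) can be sketched as follows. This is written in Python for illustration rather than MATLAB, and it is not the original analysis code: the function names, the 0.5 threshold, the intensities, and the 8-cycle expected signatures are all hypothetical. Image alignment and bead selection are omitted.

```python
def call_signature(cy5, cy3, threshold):
    """Turn per-cycle Cy5/Cy3 bead intensities into add (1) / no-add (0) calls."""
    calls = []
    for base_signal, primer_signal in zip(cy5, cy3):
        ratio = base_signal / primer_signal  # normalize Cy5 by Cy3
        calls.append(1 if ratio >= threshold else 0)
    return calls


def best_match(calls, expected):
    """Return (template name, mismatched cycles) for the closest signature."""
    def mismatches(name):
        return sum(a != b for a, b in zip(calls, expected[name]))
    name = min(expected, key=mismatches)
    return name, mismatches(name)


# Hypothetical expected signatures for two templates over 8 quarter-cycles.
expected = {
    "T1": [1, 1, 0, 0, 1, 1, 0, 0],
    "T2": [0, 0, 1, 1, 0, 0, 1, 1],
}

# Made-up intensities for one bead (illustration only, not data from the slides).
cy5 = [900, 850, 120, 100, 800, 760, 90, 110]
cy3 = [1000, 990, 980, 970, 960, 950, 940, 930]

calls = call_signature(cy5, cy3, threshold=0.5)
print(calls, best_match(calls, expected))
```

Dividing by the Cy3 primer signal corrects for bead-to-bead differences in loading and for gradual signal loss, so a single threshold can be applied across beads.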
Image Alignment [Figures omitted; axes: x, y]
Sum Signal for Cy5 and Cy3 for Each Bead at Each Cycle • Normalize Cy5 (on base) by Cy3 (on primer)
0 1 1 1 0
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
0 1 1 1 0
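The 25 zero/one values on this slide can be read as a 5 x 5 pixel mask with the corners removed, which gives the 21 pixels per bead quoted elsewhere in the deck. A minimal Python sketch of summing under such a mask follows; the function names and the toy images are hypothetical, and beads are assumed to sit at least two pixels from the image edge.

```python
# 5 x 5 mask with the corners removed: 21 pixels per bead.
MASK = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]


def bead_sum(image, row, col):
    """Sum pixel values under MASK centered on (row, col).

    `image` is a list of rows; the bead center is assumed to be at
    least two pixels away from every image edge.
    """
    total = 0
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            if MASK[dr + 2][dc + 2]:
                total += image[row + dr][col + dc]
    return total


def normalized_signal(cy5_image, cy3_image, row, col):
    """Cy5 (on base) bead sum divided by Cy3 (on primer) bead sum."""
    return bead_sum(cy5_image, row, col) / bead_sum(cy3_image, row, col)


# Toy 7 x 7 frames with uniform intensity (illustration only).
cy5 = [[10] * 7 for _ in range(7)]
cy3 = [[40] * 7 for _ in range(7)]
print(bead_sum(cy5, 3, 3), normalized_signal(cy5, cy3, 3, 3))
```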
[Figures omitted: Cy5 and Cy3 panels; axes: x, y]
[Figure omitted; labels: Templates, Beads; axis: Cycle Number ->]
Summary of Results • 252 sequencing reads • 3,495 bases sequenced • Read lengths of 13-16 bases • Library complexity = 5 • 72 unique bases @ mean-fold-coverage of ~48
Normalized Sequencing Error Analysis [Figure omitted; labels: Templates; axis: Cycle Number ->]
Speculation on Sources of Error • (1) Signal-to-noise decay • (2) Progressive dephasing • (3) Cumulative misincorporation events • (4) Local “slippage” mispriming • All of the above? • Error is probably “local” in nature. [Figure labels: Cy5, Cy3]
T1 CACACACACACACACTCCACCA
T2 GTGTGTGTGTGTGTGTCCACCA
T3 AGTGCTCACACACGTGATCCAC
T4 CAGCCGAACGACCGATCCACCA
T5 ATGTGAGAGCTGTCGTCCACCA
How much room for rapid improvement? • 50e6 / 3495 => 14306-fold improvement is required. • (21*252)/(1200*1600) => 0.2% of area in frame is utilized. • One frame = 0.5 mm² => 0.02% of full slide surface. • 10,000-fold improvement will be straightforward. • Frame utilization -> 2% (higher bead density, better software) • Slide utilization -> 10% (XY motor on stage) • Boosting read-length by requisite 50% might be trickier.
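The numbers on this slide can be reproduced with a short Python calculation. All inputs come from the slides; the only step taken on trust is the 10,000-fold density gain, which is used as quoted rather than derived.

```python
target_bases = 50e6      # bases per slide needed for the short-term aims
bases_sequenced = 3495   # bases obtained in the experiment summarized earlier
reads = 252
pixels_per_bead = 21
frame_pixels = 1200 * 1600

# Overall improvement needed to reach the target.
fold_needed = target_bases / bases_sequenced

# Fraction of the frame's pixels covered by analyzed beads.
frame_used = pixels_per_bead * reads / frame_pixels

# Remaining factor if bead density and slide coverage deliver the
# 10,000-fold gain quoted on the slide.
density_gain = 10_000
read_length_gain = fold_needed / density_gain

print(f"fold improvement needed: {fold_needed:,.0f}")
print(f"frame area used: {frame_used:.2%}")
print(f"read-length factor still needed: {read_length_gain:.2f}x")
```

The frame-utilization expression evaluates to about 0.28%, which the slide quotes as 0.2%. The leftover factor of about 1.4x corresponds to the read-length boost of roughly 50% mentioned in the last bullet.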
The Homopolymer Issue • Resequencing and tag-sequencing DO NOT require that this problem be perfectly solved, provided: (a) Reads can be confidently matched to unique locations or identities. (b) Small differences cause reproducible changes to sequencing traces. [Figure labels: Cy5, Cy3]
How many beads can we sequence on one slide? (present-day) 30,000 beads per frame => 30 million beads per slide
How many beads can we sequence on one slide? (at the extreme) Mitra Lab Self-organizing-monolayer 1 micron beads MAX = ~2 billion per slide
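A rough geometric check of the two capacity figures, as a Python sketch. The slide dimensions (25 mm x 75 mm, fully usable) and perfect hexagonal close packing are assumptions made here for illustration; they are not stated in the slides.

```python
import math

SLIDE_MM = (25.0, 75.0)   # assumption: standard microscope slide, fully usable
BEAD_DIAMETER_UM = 1.0

slide_area_um2 = SLIDE_MM[0] * SLIDE_MM[1] * 1e6  # 1 mm^2 = 1e6 um^2

# Hexagonal close packing: each bead occupies a cell of area (sqrt(3)/2) * d^2.
cell_area_um2 = (math.sqrt(3) / 2) * BEAD_DIAMETER_UM ** 2
max_beads = slide_area_um2 / cell_area_um2

# Present-day figures from the slides: 30,000 beads per frame, 30 million per slide.
frames_per_slide = 30e6 / 30_000

print(f"close-packed 1-micron beads per slide: {max_beads:.2e}")
print(f"frames per slide implied by present-day numbers: {frames_per_slide:.0f}")
```

Under these assumptions a perfect monolayer holds a little over 2 billion beads, in line with the ~2 billion maximum quoted on the slide.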
AUTOMATION [GP & Jim Horn] GRUNT I GRUNT II