small beads, big reads

Presentation Transcript


  1. small beads, big reads • Jay Shendure • Church Lab • 12-08-03

  2. DNA SEQUENCING • Sanger Sequencing (state-of-the-art): • ~ 0.1 cents per unfinished base • ~ $50,000,000 per 16x human genome coverage • ~ 1 million bases per machine per day (~ 12 bases per second) • But Personalized Genomics Requires… • ~ 0.000002 cents per unfinished base • ~ $1,000 per 16x human genome coverage • ~ 10 billion bases per instrument per day (~ 120,000 bases per second)
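
The cost and throughput figures on this slide can be checked with simple arithmetic. The following is a minimal sketch, assuming a haploid human genome of about 3 billion bases (a figure not stated on the slide); the function names are illustrative only.

```python
# Back-of-envelope check of the cost and throughput figures on the slide.
# ASSUMPTION: haploid human genome of ~3e9 bases (not stated in the source).
GENOME_BASES = 3e9
COVERAGE = 16
SECONDS_PER_DAY = 24 * 60 * 60


def genome_cost_dollars(cents_per_base):
    """Cost in dollars of sequencing one genome at COVERAGE-fold depth."""
    return GENOME_BASES * COVERAGE * cents_per_base / 100.0


def bases_per_second(bases_per_day):
    """Convert a daily throughput into a per-second rate."""
    return bases_per_day / SECONDS_PER_DAY


sanger_cost = genome_cost_dollars(0.1)        # roughly 48 million dollars
target_cost = genome_cost_dollars(0.000002)   # roughly 960 dollars
sanger_rate = bases_per_second(1e6)           # roughly 11.6 bases per second
target_rate = bases_per_second(1e10)          # roughly 115,700 bases per second
```

Under that genome-size assumption, the results land close to the rounded values quoted on the slide (~$50,000,000, ~$1,000, ~12 and ~120,000 bases per second).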

  3. Short Term Aims • Sequence several E. coli strains • 4.6 Mbp genome • Resequencing, not de novo !! • 11x coverage = ~50 million bases per slide • Sequence several mammalian mRNA-tag libraries • 2 million tags per library • 25 bp per tag = ~50 million bases per slide

  4. POLONY FISSEQ (Mitra et al. 2003)

  5. Strategy Overview (for short-term aims) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)

  6. (1) In Vitro Library Construction Strategy (Nick) • Shear & size-select genomic DNA to 200 bp fragments • Ligate 100-bp “forward” linker & 200-bp “reverse” linker • Size-select 500 bp fragments (Figure labels: unique segment; universal primer sequences)

  7. (2) Clonal PCR Amplification with Emulsions (Greg) • Protocol adopted from Dressman et al. (PNAS 2003) • Sub-picoliter aqueous compartments => isolated reaction chambers • Compartments also contain paramagnetic beads to which PCR products end up immobilized (via biotin-streptavidin interaction). • All copies of amplified DNA on the same bead are the same, but amplified DNA on different beads is different (AMPLIFIED CLONALITY, just like polonies!) • ~100 million beads per PCR tube!!!

  8. (3) FACS Sorting of Amplified vs. Empty Beads (Jun) • Poisson statistics limit the fraction of compartments that have a single template amplified and immobilized on a single bead. • FACS sorting permits rapid enrichment for beads bearing PCR products of clonally amplified template.
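
The Poisson limit mentioned above can be illustrated numerically. This is a minimal sketch, assuming that templates are distributed over compartments as a Poisson process with mean `lam` templates per compartment and that each compartment holds one bead (both simplifications, not statements from the slides); all names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k templates in a compartment (Poisson, mean lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compartment_fractions(lam):
    """Fractions of compartments holding zero, exactly one, or several templates."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}


def clonal_purity(lam):
    """Among compartments with at least one template, the share with exactly one."""
    f = compartment_fractions(lam)
    return f["single"] / (f["single"] + f["multiple"])


# ASSUMPTION: these mean loadings are illustrative, not measured values.
dilute = compartment_fractions(0.01)
dense = compartment_fractions(1.0)
```

In this model the single-template fraction never exceeds about 37% (reached at `lam = 1`), and at that loading many occupied compartments are mixed. Diluting the template raises purity but leaves most beads empty, which is the trade-off that enrichment by sorting addresses.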

  9. (4) Prepare for Sequencing… • Immobilize millions (to billions?) of beads in 6% acrylamide gel. • Denature second strand and hybridize Cy3-labeled universal sequencing primer adjacent to unique region of amplified templates. • Sequence in parallel as per a modified version of the standard FISSEQ protocol (with Cy5-SS-dNTPs)

  10. Fluorescent In Situ Sequencing (FISSEQ)

  11. Strategy Overview (for short-term goals) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)

  12. Clonal Amplification via Emulsion Based PCR • Library template = ~112-mer with 24 unique bases. • Generated via oligonucleotide “wobble” in synthesis. • Unique sequence alternates between C/A and G/T. • Library complexity of ~1.7 million. • Tube PCR in oil-aqueous emulsion. • Primer-loaded 1-micron beads (~100 million per tube) • Dilute template concentration. • Beads recovered and poured into polyacrylamide gel. • Sequence of 3rd base determined • Cy5-SS-dATP vs. Cy3-SS-dCTP

  13. Clonal Amplification via Emulsion Based PCR • ~11,235 beads present on this frame. • ~96 have strong Cy3 signal. • ~46 have strong Cy5 signal. • ~3 have both strong Cy5 and Cy3 signal. • ~1% of beads are clonal. 99% are “empty”. • ~100 million beads => ~1 million useable beads (per PCR tube)
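
The bead counts above translate into the quoted ~1% yield as follows. This is a rough sketch, assuming the three signal categories are disjoint (the slide does not say whether the dual-signal beads are also counted in the single-colour totals); variable names are illustrative.

```python
# Counts read off the slide for a single frame.
beads_in_frame = 11235
cy3_only = 96      # ASSUMPTION: excludes the dual-signal beads
cy5_only = 46      # ASSUMPTION: excludes the dual-signal beads
both_signals = 3   # beads showing both Cy3 and Cy5 (mixed template)

amplified = cy3_only + cy5_only + both_signals
clonal = cy3_only + cy5_only

clonal_fraction = clonal / beads_in_frame       # about 0.0126
mixed_among_amplified = both_signals / amplified

# Scale the per-frame fraction to a full PCR tube of ~100 million beads.
beads_per_tube = 100e6
usable_per_tube = clonal_fraction * beads_per_tube
```

With these counts the clonal fraction comes out near 1.3%, or on the order of one million usable beads per tube, in line with the rounded figures on the slide.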

  14. Emulsion PCR Optimization • Amplified Bead => ~31% of signal of “loaded” bead. • 2x template => 46% signal drop • 4x dNTPs, MgCl2 => 72% signal boost • 30 cycles -> 50 cycles => 72% signal boost

  15. Strategy Overview (for short-term goals) • Generate flanked library with purely in vitro methods. (Nick) • Generate 1-micron “clonal beads” via emulsion PCR. (Greg) • FACS-sort amplified beads from “empty” beads. (Jun) • Sequence acrylamide-immobilized beads in parallel via FISSEQ protocol. (Jay)

  16. Best Sequencing Experiment To Date • 3-micron beads, each loaded with 1 of 5 templates • Beads are immobilized in 6% acrylamide (no trapping!) • Hybridize universal sequencing primer and begin cycles. • Sequencing primer is Cy3 labeled • 28 FISSEQ quarter-cycles (C-A-G-T-C-A-G-T…) • Image Analysis of ~250 brightest beads in MATLAB

  17. Sequencing Quarter-Cycle (30 minutes + scan-time) • Klenow Buffer Equilibration 1 minute • Single Base Extension 4 minutes • Wash 5 minutes • Scan (variable) • TCEP Cleavage 5 minutes • Wash 5 minutes • Wash 5 minutes • Wash 5 minutes (lots of room for temporal optimization, process automation)
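
The step durations listed above add up to the quoted 30 minutes per quarter-cycle. A small sketch of the timing budget follows; the `scan_minutes` parameter is a placeholder, since the slide gives scan time only as "variable".

```python
# Step durations (minutes) for one quarter-cycle, as listed on the slide.
QUARTER_CYCLE_STEPS = [
    ("Klenow buffer equilibration", 1),
    ("Single base extension", 4),
    ("Wash", 5),
    ("TCEP cleavage", 5),
    ("Wash", 5),
    ("Wash", 5),
    ("Wash", 5),
]

chemistry_minutes = sum(minutes for _, minutes in QUARTER_CYCLE_STEPS)
wash_minutes = sum(m for name, m in QUARTER_CYCLE_STEPS if name == "Wash")


def run_time_hours(quarter_cycles, scan_minutes=0.0):
    """Total run time; scan_minutes per quarter-cycle is a hypothetical input."""
    return quarter_cycles * (chemistry_minutes + scan_minutes) / 60.0


hours_for_28 = run_time_hours(28)   # chemistry only, scanning excluded
```

For the 28 quarter-cycles used in the experiment this gives 14 hours of chemistry before any scan time, with washes accounting for 20 of every 30 minutes.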

  18. FISSEQ (C..A..G..T) x 7 = 28 quarter-cycles • T1 CACACACACACACACTCCACCA • T2 GTGTGTGTGTGTGTGTCCACCA • T3 AGTGCTCACACACGTGATCCAC • T4 CAGCCGAACGACCGATCCACCA • T5 ATGTGAGAGCTGTCGTCCACCA
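
The expected "add" versus "no-add" pattern for each template can be simulated from the nucleotide order. This is a sketch under two assumptions that the slide does not state: the sequences are written as the strand being synthesized (5' to 3'), and a homopolymer run is fully extended within a single quarter-cycle. Function and variable names are illustrative.

```python
# Templates as listed on the slide.
TEMPLATES = {
    "T1": "CACACACACACACACTCCACCA",
    "T2": "GTGTGTGTGTGTGTGTCCACCA",
    "T3": "AGTGCTCACACACGTGATCCAC",
    "T4": "CAGCCGAACGACCGATCCACCA",
    "T5": "ATGTGAGAGCTGTCGTCCACCA",
}
NUCLEOTIDE_ORDER = "CAGT"  # one nucleotide offered per quarter-cycle


def simulate_fisseq(extended_strand, quarter_cycles=28, order=NUCLEOTIDE_ORDER):
    """Return the number of bases added at each quarter-cycle.

    ASSUMPTIONS: extended_strand is the synthesized strand read 5'->3', and a
    run of identical bases is completed in one quarter-cycle.
    """
    position = 0
    signature = []
    for step in range(quarter_cycles):
        nucleotide = order[step % len(order)]
        added = 0
        while (position < len(extended_strand)
               and extended_strand[position] == nucleotide):
            position += 1
            added += 1
        signature.append(added)
    return signature


signatures = {name: simulate_fisseq(seq) for name, seq in TEMPLATES.items()}
read_lengths = {name: sum(sig) for name, sig in signatures.items()}
total_unique_bases = sum(read_lengths.values())
```

Under these assumptions the simulated read lengths are 14, 14, 15, 16 and 13 bases for T1 to T5, 72 bases in total, which agrees with the 13-16 base read lengths and 72 unique bases reported in the results summary.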

  19. Scanning Setup • Only captured single frame per cycle (~0.5 mm²) • 10x Objective (~0.5 microns per pixel) • 1600 x 1200 pixels (8-bit), ~1 to 2 second exposures • 21 pixels per 3-micron bead used in image analysis • Installation of XY motor stage system (happening today) will instantly increase throughput 1000-fold. • Plan is to switch to 20x objective (for 1 micron beads)

  20. Image Analysis (MATLAB) • Align images • Select beads to process • Sum Cy5 and Cy3 bead-intensities over 21-pixel areas • Normalize Cy5 values by corresponding Cy3 values • Apply manually-set thresholds • Derive sequences for each bead from signature patterns of “add” vs. “no-add” at each cycle • Compare sequence signatures to known template sequences to determine accuracy.
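
The per-bead steps listed above (sum, normalize, threshold, compare) can be sketched in a few functions. The original analysis was done in MATLAB and its code is not shown, so the Python below is only an illustrative outline: the mask shape, the function names, and the mismatch-count comparison are all assumptions, and the sketch assumes nonzero Cy3 sums.

```python
# 5 x 5 mask with the corners excluded (21 pixels), chosen to match the
# "21-pixel areas" in the list above. The exact shape is an ASSUMPTION.
MASK = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]


def bead_sum(image, row, col, mask=MASK):
    """Sum pixel values under the mask centred on (row, col)."""
    half = len(mask) // 2
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            if mask[dr + half][dc + half]:
                total += image[row + dr][col + dc]
    return total


def call_additions(cy5_sums, cy3_sums, threshold):
    """Return 1 ("add") or 0 ("no-add") per cycle from Cy5/Cy3 ratios."""
    return [1 if cy5 / cy3 > threshold else 0
            for cy5, cy3 in zip(cy5_sums, cy3_sums)]


def best_template(calls, known_signatures):
    """Pick the known signature with the fewest mismatches to the calls."""
    def mismatches(name):
        return sum(a != b for a, b in zip(calls, known_signatures[name]))
    name = min(known_signatures, key=mismatches)
    return name, mismatches(name)
```

Dividing the Cy5 (incorporated base) signal by the Cy3 (primer) signal corrects for bead-to-bead differences in template loading, so that a single threshold can be applied across beads.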

  21-23. Image Alignment (three figure slides; axes: x, y)

  24. Sum Signal for Cy5 and Cy3 for Each Bead at Each Cycle • Normalize Cy5 (on base) by Cy3 (on primer) • Pixel mask (5 x 5, 21 pixels set):
      0 1 1 1 0
      1 1 1 1 1
      1 1 1 1 1
      1 1 1 1 1
      0 1 1 1 0

  25-34. (figure slides; panel labels: Cy5, Cy3; axes: x, y)

  35. (figure; axis labels: Templates, Beads, Cycle Number ->)

  36. Summary of Results • 252 sequencing reads • 3,495 bases sequenced • Read lengths of 13-16 bases • Library complexity = 5 • 72 unique bases @ mean-fold-coverage of ~48
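
The summary figures above are mutually consistent, as a quick calculation shows. This sketch uses only numbers from the slide; the variable names are illustrative.

```python
# Figures taken from the results summary.
reads = 252
bases_sequenced = 3495
unique_bases = 72   # distinct template positions covered across 5 templates

mean_read_length = bases_sequenced / reads          # about 13.9 bases
mean_fold_coverage = bases_sequenced / unique_bases  # about 48.5-fold
```

The mean read length of about 13.9 bases falls inside the reported 13-16 base range, and the coverage of about 48.5 matches the quoted ~48.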

  37. Normalized Sequencing Error Analysis

  38. Normalized Sequencing Error Analysis (figure; axis labels: Templates, Cycle Number ->)

  39. Speculation on Sources of Error • (1) Signal-to-noise decay • (2) Progressive dephasing • (3) Cumulative misincorporation events • (4) Local “slippage” mispriming • All of the above? • Error is probably “local” in nature. (Figure panels: Cy5, Cy3; templates: T1 CACACACACACACACTCCACCA, T2 GTGTGTGTGTGTGTGTCCACCA, T3 AGTGCTCACACACGTGATCCAC, T4 CAGCCGAACGACCGATCCACCA, T5 ATGTGAGAGCTGTCGTCCACCA)

  40. How much room for rapid improvement? • 50e6 / 3495 => 14306-fold improvement is required. • (21*252)/(1200*1600) => ~0.3% of area in frame is utilized. • One frame = 0.5 mm² => 0.02% of full slide surface. • 10,000-fold improvement will be straightforward. • Frame utilization -> 2% (higher bead density, better software) • Slide utilization -> 10% (XY motor on stage) • Boosting read-length by requisite 50% might be trickier.
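
The utilization arithmetic above can be reproduced directly. This is a sketch using only the figures quoted on the slide; the gain calculations simply divide each stated target by the corresponding current value, and the names are illustrative.

```python
# Required improvement: target bases per slide over bases actually sequenced.
target_bases_per_slide = 50e6
bases_sequenced = 3495
fold_needed = target_bases_per_slide / bases_sequenced   # about 14,306

# Fraction of the frame's pixels covered by the analysed beads.
pixels_per_bead = 21
beads_analysed = 252
frame_pixels = 1200 * 1600
frame_utilization = pixels_per_bead * beads_analysed / frame_pixels  # ~0.0028

# Fraction of the slide covered by one frame, as quoted (0.02%).
slide_utilization = 0.0002

# Gains implied by the stated targets (2% of frame, 10% of slide).
frame_gain = 0.02 / frame_utilization
slide_gain = 0.10 / slide_utilization
combined_gain = frame_gain * slide_gain
```

With these inputs the frame utilization is about 0.28%, the slide-coverage gain is 500-fold, and the two listed factors together give roughly 3,600-fold; any further gain would have to come from factors not itemized on this slide, such as higher bead density per frame or longer reads.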

  41. The Homopolymer Issue • Resequencing and tag-sequencing DO NOT require that this problem be perfectly solved, provided: (a) Reads can be confidently matched to unique locations or identities. (b) Small differences cause reproducible changes to sequencing traces. (Figure panels: Cy5, Cy3)

  42. How much read-length is needed? (E. coli)

  43. How much read-length is needed? (Agencourt LongSage)

  44. How many beads can we sequence on one slide? (present-day) • 30,000 beads per frame => 30 million beads per slide

  45. How many beads can we sequence on one slide? (at the extreme) • Mitra Lab • Self-organizing monolayer • 1 micron beads • MAX = ~2 billion per slide
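
The two capacity estimates can be sanity-checked with a packing calculation. This is a rough sketch, assuming a standard 25 mm x 75 mm microscope slide and ideal hexagonal close packing of beads in a monolayer; neither assumption is stated on the slides, and the names are illustrative.

```python
import math

# ASSUMPTION: standard microscope slide, 25 mm x 75 mm, fully usable.
SLIDE_AREA_UM2 = 25_000 * 75_000   # area in square microns


def max_monolayer_beads(bead_diameter_um, area_um2=SLIDE_AREA_UM2):
    """Beads in an ideal hexagonally packed monolayer (an idealized upper bound).

    Each bead occupies a hexagonal cell of area d^2 * sqrt(3) / 2.
    """
    cell_area = bead_diameter_um ** 2 * math.sqrt(3) / 2
    return area_um2 / cell_area


extreme_capacity = max_monolayer_beads(1.0)   # on the order of 2 billion

# Present-day estimate: frames per slide implied by the quoted numbers.
beads_per_frame = 30_000
beads_per_slide = 30e6
frames_per_slide = beads_per_slide / beads_per_frame
```

Under these assumptions a perfectly packed monolayer of 1-micron beads holds about 2.2 billion beads, close to the ~2 billion maximum quoted, and the present-day figure corresponds to 1,000 frames per slide.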

  46. AUTOMATION [GP & Jim Horn] GRUNT I GRUNT II
