Polymerase colonies & Fisseq. 1-Apr-2003 Santa Fe. Thanks to: DOE HGP & GtL & DARPA BioComp Wash U: Rob Mitra HMS: Jay Shendure, Jun Zhu, Vincent Butty, Ben Williams. U. Del: Jeremy Edwards, Josh Merritt Ambergen: Jerzy Olejnik.
gggatttagctcagttgggagagcgccagactgaa gat ttg gag gtcctgtgttcgatccacagaattcgcacca Modeling successes: 3D & sequence alignment
Biosystems Integrating Measures & Models Environment Metabolites RNAi Insertions SNPs DNA Proteins RNA Replication rate interactions Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms
Improving Models & Measures Why model? “Killer Applications”: Share, Search, Merge, Check, Design
Why improve measurements?
• Human genomes: (6 billion)^2 ≈ 10^19 bp
• Immune & cancer genome changes: >10^10 bp per time point
• RNA ends & splicing: in situ, 10^12 bits/mm^3
• Biodiversity: environmental & lab evolution
• Compact storage: 10^5 bits/mm^3 now to 10^17 bits/mm^3 eventually
How? ($1K per genome, 10^8 to 10^13 bits/$)
• The issue is not speed, but integration.
• Cost per 99.99% bp: including reagents, personnel, equipment/5 yr, overhead/sq. m
• Sub-µm scale: 1 µm^3 = 1 femtoliter (10^-15 L)
• Instruments should match GHz / $2K CPU
Examples of cost bottlenecks Affymetrix $30M? microfabricator limited by chemical reaction rate to one set of chips per day. Electrophoresis limited to 4000 bp/capillary/day. Fix cost ratio of capillaries to CPUs.
Projected costs determine when biosystems data overdetermination is feasible. In 1984, pre-HGP (φX, pBR322, etc.): 0.1 bp/$, which would have been $30B per human genome. In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M, i.e. 10^3 bp/$ (4-log improvement). Other data I/O (e.g. video): 10^13 bits/$.
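The cost figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a haploid human genome of about 3 x 10^9 bp (a value not stated on the slide); the function names are illustrative only.

```python
import math

# Assumption: ~3e9 bp per haploid human genome (not given on the slide).
GENOME_BP = 3e9

def cost_per_genome(bp_per_dollar, genome_bp=GENOME_BP):
    """Dollars needed to sequence one genome at a given throughput per dollar."""
    return genome_bp / bp_per_dollar

def log_improvement(old_bp_per_dollar, new_bp_per_dollar):
    """Improvement expressed in factors of ten ("logs")."""
    return math.log10(new_bp_per_dollar / old_bp_per_dollar)

def bp_per_dollar_needed(target_cost, genome_bp=GENOME_BP):
    """Throughput per dollar required to hit a target cost per genome."""
    return genome_bp / target_cost

cost_1984 = cost_per_genome(0.1)        # about $30B at 0.1 bp/$
cost_2002 = cost_per_genome(1e3)        # about $3M at 10^3 bp/$
logs = log_improvement(0.1, 1e3)        # about 4 logs
needed = bp_per_dollar_needed(1_000)    # bp/$ implied by a $1K genome

print(f"1984: ${cost_1984:.3g}  2002: ${cost_2002:.3g}  "
      f"improvement: {logs:.1f} logs  $1K genome needs {needed:.3g} bp/$")
```

Under the same assumption, a $1K genome corresponds to roughly 3 x 10^6 bp/$, a further 3 to 4 logs beyond the 2002 resequencing figure.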
Steeper than exponential growth Kurzweil/Moore's law of ICs 1965 http://www.faughnan.com/poverty.html http://www.kurzweilai.net/meme/frame.html?main=/articles/art0184.html
New sequencing approaches in commercial R&D

Method                  liter/bp  Length  Error   Test-set  $/device  bp/hr
Capil. µfluidics        e-6       600     <0.1%   1e11      350k      80k
    ABI, Amersham, GenoMEMS, Caliper*, RTS*
SeqByHyb                e-12      1       <5%     1e9       200k      1M
    Perlegen-Affymetrix*, Xeotron*
Mass Spectrometry
    Sequenom, Bruker*
Single molecule         >e-24     >>40    ?       >80       30k-1M    180k
    Pore (Agilent*), Fluor (USGenomics, Solexa), FRET (VisiGen, Mobious, Caltech)
In vitro DNA-Amplification (e.g. Polonies) -- Multiplex cycles:
  Lynx*                 e-15      20      <3%     1e7       ?         1M
  Pyroseq.*             e-6       >40     <1%     1e6       100k      5k
  HMS*                  e-13      >35     <1%     40        90k       >1M?
    ParAllele, 454, RTS*

*Church lab involvement
Why single molecules? (1) Integration from cells/genomes/RNAs to data (2) Geometry, “cis-ness” on a molecule, complex, or cell. e.g. DNA Haplotypes & RNA splice-forms (3) Asynchronous dNTP incorporation
Polymerase colony (polony) PCR in a gel [Figure: 1st round of PCR. A single molecule from the library carries sites A’ and B. Primer A has a 5’ immobilizing Acrydite. The primer is extended by polymerase.] Mitra & Church Nucleic Acids Res. 27: e34
Sequence polonies by sequential, fluorescent single-base extensions • Hybridize Universal Primer • Add Red (Cy3) dTTP. Wash. • Add Green (FITC) dCTP • Wash; Scan [Figure: universal primer annealed at site B’ of the immobilized templates, 3’ to 5’, with template bases C G A T C G C G T . . .]
Inexpensive, off-the-shelf equipment • Automated slide fluidics: $4K • MJR in situ Cycler: $10K • Microarray Scanner: $26K-100K
Human Haplotype: CFTR gene, 45 kbp Rob Mitra Vincent Butty Jay Shendure Ben Williams
Quantitative removal of Fluorophores Rob Mitra
Sequencing multiple polonies
Template ST30: 3' TCACGAGT
Extended strand:  AGTGCTCA
Base added: (C) A G T (C) (A) G (T) C (A) (G) T C A
Rob Mitra
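The ST30 example can be replayed in a short simulation. This is a sketch under two assumptions that the slide does not spell out: parentheses mark cycles in which the added base is not incorporated, and an unterminated dNTP extends through any run of identical bases. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def simulate_extension(template_3to5, bases_added):
    """Replay sequential single-base additions against one template.

    template_3to5 is written 3'->5', so its base-by-base complement is the
    growing strand written 5'->3'. Returns the extended strand and, for each
    cycle, the base added and how many positions it extended.
    """
    target = "".join(COMPLEMENT[b] for b in template_3to5)
    pos = 0
    calls = []
    for base in bases_added:
        n = 0
        # Assumption: without terminators, a dNTP fills a whole homopolymer run.
        while pos < len(target) and target[pos] == base:
            pos += 1
            n += 1
        calls.append((base, n))
    return target[:pos], calls

def format_calls(calls):
    """Render cycles in the slide's notation: (X) means no incorporation."""
    return " ".join(b if n else f"({b})" for b, n in calls)

extended, calls = simulate_extension("TCACGAGT", "CAGTCAGTCAGTCA")
pattern = format_calls(calls)
print(extended)  # AGTGCTCA
print(pattern)   # (C) A G T (C) (A) G (T) C (A) (G) T C A
```

Fourteen additions in a fixed C, A, G, T order read all eight bases of this template, and the incorporation pattern matches the one shown on the slide.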
Multiple Image Alignment • Metric based on optimal coincidence of high-intensity noise pixels over a matrix of local offsets • (0.4 pixel precision)
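One way to picture the coincidence metric is an exhaustive search over candidate offsets, scoring each by how many bright pixels land on top of each other. The sketch below is an illustration only, not the lab's implementation: it searches whole-pixel offsets for one image pair, whereas the 0.4 pixel precision quoted on the slide would need sub-pixel interpolation or per-region offsets, which are not shown. The threshold, image size, and function names are assumptions.

```python
def bright_pixels(image, threshold):
    """Coordinates (row, col) of pixels at or above the intensity threshold."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if value >= threshold}

def best_offset(reference, moving, threshold, max_shift):
    """Find the integer (d_row, d_col) shift of `moving` that maximizes the
    number of bright pixels coinciding with bright pixels in `reference`."""
    ref_set = bright_pixels(reference, threshold)
    mov_set = bright_pixels(moving, threshold)
    best, best_score = (0, 0), -1
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            score = sum((r + dr, c + dc) in ref_set for r, c in mov_set)
            if score > best_score:
                best, best_score = (dr, dc), score
    return best, best_score

def make_image(size, points, value=255):
    """Helper for the demo: a blank square image with a few bright pixels."""
    image = [[0] * size for _ in range(size)]
    for r, c in points:
        image[r][c] = value
    return image

# Demo: the second scan is the first one displaced by (+1, +1) pixels.
reference = make_image(8, [(1, 1), (2, 4), (4, 2)])
moving = make_image(8, [(2, 2), (3, 5), (5, 3)])
offset, score = best_offset(reference, moving, threshold=128, max_shift=2)
print(offset, score)  # (-1, -1) 3
```

Applying the returned shift to the second scan brings all three bright pixels back into register with the first.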
Polony exclusion principle & single pixel sequences Mitra & Shendure
Biosystems Integrating Measures & Models Environment Metabolites RNAi Insertions SNPs DNA Proteins RNA Replication rate interactions Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively spliced cell adhesion molecule
• Specific variable exons are up- or down-regulated in various cancers
• Controversial prospective diagnostic / prognostic marker (>1000 papers)
• Can full isoforms resolve the controversy and/or act as superior markers?
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
Trial & Error Derived Algorithm for Polony Finding
1. Search Signature Image for qualified ‘objects’:
   a. > 50 connected pixels with same signature value
   b. ‘solidity’ of > 0.50
   c. long axis / short axis ratio < 3
   OR
   a. > 25 connected pixels with same signature value
   b. ‘solidity’ of > 0.80
   c. long axis / short axis ratio < 1.5
2. Search for internal regional maxima within each object (lest two adjacent polonies with same signature get counted as one)
3. Assign centroid locations as qualified individual ‘polonies’
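The qualification rule in step 1 can be written as a simple predicate. This sketch assumes each candidate object has already been measured, and that 'solidity' means area divided by convex-hull area (the usual image-analysis definition; the slide does not define it). The `Region` type and function name are hypothetical, and steps 2 and 3 are not covered.

```python
from dataclasses import dataclass

@dataclass
class Region:
    area: int          # connected pixels sharing one signature value
    solidity: float    # assumed: area / convex-hull area
    axis_ratio: float  # long axis / short axis

def is_qualified(region):
    """Step 1 of the slide: accept large, loosely shaped objects OR
    smaller objects that are compact and nearly round."""
    large = (region.area > 50
             and region.solidity > 0.50
             and region.axis_ratio < 3)
    compact = (region.area > 25
               and region.solidity > 0.80
               and region.axis_ratio < 1.5)
    return large or compact

candidates = [
    Region(area=60, solidity=0.60, axis_ratio=2.0),   # passes the first rule
    Region(area=30, solidity=0.90, axis_ratio=1.2),   # passes the second rule
    Region(area=30, solidity=0.60, axis_ratio=1.2),   # too ragged for its size
    Region(area=200, solidity=0.90, axis_ratio=4.0),  # too elongated
]
results = [is_qualified(r) for r in candidates]
print(results)  # [True, True, False, False]
```

The two-tier thresholds trade size against shape: a small object is accepted only if it is both solid and close to circular.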
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
Summary of Counts (isoforms) Eph4 = murine mammary epithelial cell line Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic) Jun Zhu
Polony Flavors • Replica Plating of DNA images [Mitra et al. NAR 1999] • Long Range Haplotyping [Mitra et al. PNAS 2003] • Allelic mRNA Quantitation (HEP) [Mitra et al. in prep] • Alternative Splicing Combinatorics [Zhu et al. 2003] • Precise SNP-mutant & mRNA ratios [Merritt et al. 2003] • Fluorescent in situ Sequencing (FISSEQ) [Mitra et al. 2003] • Multiplex Genotyping [ApoE, Hyman, Shendure & Williams] • In situ / single-cell extensions of the above [Zhu & Williams]
Next steps • Scale up slide making • Anchor points in long DNA (mini-Tn vs tagged-random primers) • Runs • a. Signature • b. Quantitate • c. Terminators
Long-range continuity inspired by DNA-Fiber Fluorescent In Situ Hybridization 129 bp mini Tn5 300 kb = 100 microns http://allserv.rug.ac.be/~fspelema/neubla/content/images_r.htm