BioSystems Synthesis: New optima demand new technologies. 17-Sep-2003 Virtual Conference on Genomics & Bioinformatics. Thanks to: DOE GtL DARPA BioComp PhRMA NHLBI. Harvard MIT DOE GtL Center. C.Ting. Collaborating PIs: Chisholm, Polz, Church, Kolter, Ausubel, Lory, Kucherlapati.
Improving Models & Measures Why model? “Killer Applications”: Share, Search, Merge, Check, Design (e.g. sequence & 3D alignment)
Biosystems: Integrating Measures & Models. [Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; DNA; Proteins; RNA; Replication rate; interactions; Microbes; Cancer & stem cells; Darwinian optima; In vitro replication; Small multicellular organisms]
Why improve measurements?
• Human genomes: (6 billion)^2 ≈ 10^19 bp
• Immune & cancer genome changes: >10^10 bp per time point
• RNA ends & splicing: in situ 10^12 bits/mm^3
• Biodiversity: environmental & lab evolution
• Compact storage: 10^5 now to 10^17 bits/mm^3 eventually
& How? ($1K per genome, 10^8-10^13 bits/$)
• The issue is not speed, but integration.
• Cost per 99.99% bp: including reagents, personnel, equipment/5 yr, overhead/sq. m
• Sub-mm scale: 1 µm^3 = 1 femtoliter (10^-15 L)
• Instruments should match GHz / $2K CPU
Examples of cost bottlenecks
• Affymetrix: $30M(?) microfabricator limited by chemical reaction rate to one set of chips per day (~10,000X CPU cost).
• Electrophoresis: limited to 4000 bp/capillary/day; fixed cost ratio of capillaries to CPUs (~1e9X CPU cost).
Projected costs determine when biosystems data overdetermination is feasible.
• In 1984, pre-HGP (φX, pBR322, etc.): 0.1 bp/$, which would have been $30B per human genome.
• In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M; 10^3 bp/$ (4-log improvement).
• Other data I/O (e.g. video): 10^13 bits/$
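The per-genome figures on this slide follow from dividing genome size by throughput per dollar. A minimal sketch of that arithmetic, assuming a genome of about 3 billion bp (the slide does not state the genome size it used; `GENOME_BP` and both function names are illustrative):

```python
import math

# Assumption: ~3e9 bp per human genome (not stated on the slide).
GENOME_BP = 3e9


def cost_per_genome(bp_per_dollar, genome_bp=GENOME_BP):
    """Dollars needed to sequence one genome at a given bp-per-dollar rate."""
    return genome_bp / bp_per_dollar


def log_improvement(old_bp_per_dollar, new_bp_per_dollar):
    """Improvement in throughput per dollar, in factors of ten."""
    return math.log10(new_bp_per_dollar / old_bp_per_dollar)


if __name__ == "__main__":
    print(f"1984: ${cost_per_genome(0.1):.1e} per genome")  # 0.1 bp/$
    print(f"2002: ${cost_per_genome(1e3):.1e} per genome")  # 10^3 bp/$
    print(f"improvement: {log_improvement(0.1, 1e3):.1f} logs")
```

Under that assumption, 0.1 bp/$ gives about $30B and 10^3 bp/$ gives about $3M, which matches the slide's 1984 and 2002 resequencing figures.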
Steeper than exponential growth. [Figure: Instructions Per Second; 1965 Moore's law of integrated circuits; 1999 Kurzweil’s law] http://www.faughnan.com/poverty.html http://www.kurzweilai.net/meme/frame.html?main=/articles/art0184.html
Why single molecules? (1) Integrate from cells/genomes/RNAs to data (2) Geometry, “cis-ness” on a molecule, complex, or cell. e.g. DNA Haplotypes & RNA splice-forms (3) Asynchronous dNTP incorporation
Polymerase colonies (Polonies) along a DNA or RNA molecule. HMS: Shendure, Zhu, Butty, Williams; Wash U: Mitra; Ambergen: Olejnik; U. Del: Edwards, Merritt
Polymerase colony (polony) PCR in a gel. [Figure: 1st round of PCR. Labels: single molecule from library; primer is extended by polymerase; primer A has a 5’ immobilizing Acrydite; primers A, A’, B] Mitra & Church, Nucleic Acids Res. 27: e34
Sequence polonies by sequential, fluorescent single-base extensions
• Hybridize universal primer
• Add red (Cy3) dTTP. Wash.
• Add green (FITC) dCTP
• Wash; scan
[Figure: strands with 3’ and 5’ ends labeled, primer sites B and B’; bases shown: C G A T C G C G T . . .]
Inexpensive, off-the-shelf equipment
• Automated slide fluidics: $4K
• MJR in situ cycler: $10K
• Microarray scanner: $26K-100K
Human Haplotype: CFTR gene, 45 kbp. Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
Quantitative removal of Fluorophores Rob Mitra
Sequencing multiple polonies. Template ST30: 3' TCACGAGT; extended strand: AGTGCTCA. Base added: (C) A G T (C) (A) G (T) C (A) (G) T C A. Rob Mitra
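The base-addition series on this slide can be reproduced with a small simulation. This is a sketch, not the authors' software. It assumes the nucleotides are offered in the repeating order C, A, G, T, and that parentheses mark cycles with no incorporation; both are inferred from the slide. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_extension(template_3to5, cycle_order="CAGT"):
    """Simulate sequential single-nucleotide additions on one template.

    template_3to5: template bases read 3'->5' (as written on the slide).
    Returns a list of (base_added, n_incorporated), one entry per cycle.
    """
    # The growing strand is the complement of the template, read 5'->3'.
    target = [COMPLEMENT[b] for b in template_3to5]
    pos, cycle, history = 0, 0, []
    while pos < len(target):
        base = cycle_order[cycle % len(cycle_order)]
        n = 0
        # An offered base extends through any run of matching positions.
        while pos < len(target) and target[pos] == base:
            pos += 1
            n += 1
        history.append((base, n))
        cycle += 1
    return history


def format_history(history):
    """Write non-incorporating cycles in parentheses, as on the slide."""
    return " ".join(b if n else f"({b})" for b, n in history)


if __name__ == "__main__":
    print(format_history(simulate_extension("TCACGAGT")))
```

For template ST30 this yields the same pattern of extending and non-extending cycles as the slide, and the incorporated bases read AGTGCTCA.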
Multiple Image Alignment
• Metric based on optimal coincidence of high-intensity noise pixels over a matrix of local offsets (0.4 pixel precision)
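One way to picture a coincidence metric is to count how many bright pixels of one image land on bright pixels of another at each trial offset. The sketch below is an assumption-laden simplification: it searches whole-pixel offsets globally, whereas the slide describes a matrix of local offsets and 0.4 pixel precision. `best_offset` and `max_shift` are illustrative names, not part of the original pipeline.

```python
def best_offset(bright_a, bright_b, max_shift=3):
    """Find the integer (d_row, d_col) shift of image B that best matches A.

    bright_a, bright_b: sets of (row, col) coordinates of pixels above an
    intensity threshold in each image (thresholding is assumed done already).
    Returns ((d_row, d_col), n_coincident_pixels).
    """
    best, best_score = (0, 0), -1
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            # Score = number of B's bright pixels that coincide with A's.
            score = sum((r + dr, c + dc) in bright_a for r, c in bright_b)
            if score > best_score:
                best_score, best = score, (dr, dc)
    return best, best_score


if __name__ == "__main__":
    a = {(5, 5), (7, 2), (1, 9), (3, 3)}
    b = {(r - 2, c + 1) for r, c in a}  # same pixels, displaced
    print(best_offset(a, b))
```

Sub-pixel precision would need interpolation or a fit around the best integer offset, which this sketch leaves out.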
1 micron bead sequences. Correct signatures are pseudocolored red, white, yellow; noise signatures blue; and “guide” beads green.
Polony exclusion principle & single pixel sequences. Mitra & Shendure
CD44 Exon Combinatorics (Zhu & Shendure). Alternatively spliced cell adhesion molecule. Specific variable exons are up- or down-regulated in various cancers. Controversial prospective diagnostic/prognostic marker (>1000 papers). Can full isoforms resolve the controversy and/or act as superior markers? Eph4 = murine mammary epithelial cell line. Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic).
Algorithm for RNA Polony Finding
1. Search the signature image for qualified ‘objects’:
   a. > 50 connected pixels with the same signature value
   b. ‘solidity’ of > 0.50
   c. long axis / short axis ratio < 3
   OR
   a. > 25 connected pixels with the same signature value
   b. ‘solidity’ of > 0.80
   c. long axis / short axis ratio < 1.5
2. Search for internal regional maxima within each object (lest two adjacent polonies with the same signature get counted as one)
3. Assign centroid locations as qualified individual ‘polonies’
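Step 1 of this algorithm can be sketched in two parts: grouping connected pixels that share a signature value, then applying the two alternative sets of thresholds. The sketch assumes 4-connectivity and treats signature value 0 as background, neither of which the slide specifies. Solidity and axis ratio are taken as precomputed inputs rather than derived here. All function names are illustrative.

```python
from collections import deque


def connected_objects(signature):
    """Group 4-connected pixels that share the same nonzero signature value.

    signature: 2D list of ints; 0 is treated as background (assumption).
    Returns a list of (value, [(row, col), ...]) in scan order.
    """
    rows, cols = len(signature), len(signature[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            value = signature[r0][c0]
            if value == 0 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and signature[nr][nc] == value):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            objects.append((value, pixels))
    return objects


def qualifies(n_pixels, solidity, axis_ratio):
    """Apply the slide's two alternative criteria sets for an object."""
    large = n_pixels > 50 and solidity > 0.50 and axis_ratio < 3
    compact = n_pixels > 25 and solidity > 0.80 and axis_ratio < 1.5
    return large or compact


if __name__ == "__main__":
    grid = [[1, 1, 0], [0, 2, 2], [1, 0, 2]]
    print([(v, len(p)) for v, p in connected_objects(grid)])
    print(qualifies(60, 0.6, 2.0), qualifies(30, 0.6, 1.2))
```

Steps 2 and 3 (regional maxima and centroids) are not covered by this sketch.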
RNA exon polony examples
RNA exon examples, auto-regridded & quantitated. Variable exons: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
Summary of Counts (RNA isoforms). Eph4 = murine mammary epithelial cell line. Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic). Jun Zhu
Polony Flavors
• Replica plating of DNA images [Mitra et al. NAR 1999]
• Alternative RNA splicing combinatorics [Zhu et al. Science 2003]
• Long range haplotyping [Mitra et al. PNAS 2003]
• Precise SNP-mutant & mRNA ratios [Merritt et al. NAR 2003]
• Fluorescent in situ Sequencing (FISSEQ) [Mitra et al. Anal. Biochem. 2003]
• Tumor LOH [Butz et al. BMC Biotech. 2003]
• Polony models [Aach & Church, submitted to JTB 2003]
• http://arep.med.harvard.edu/Polonator/
Comparison of predicted with observed protein properties (abundance, localization, postsynthetic modifications), E. coli. Link et al. 1997 Electrophoresis 18:1259-313
Multidimensional peptide measures (optionally protein separation steps). [Figure labels: 2nd, 3rd]
Prochlorococcus Proteogenomic Map. Numbers on top in base pairs. 1700 ORFs are predicted. Proteomic Model is based on mass spectrometry of peptides at 24 h time points. Difference Map indicates new peptide regions. The 6 colors represent ORFs in the 6 reading frames. (Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
Circadian time-series (Prochlorococcus) RNA & protein quantitation. [Figure: scatter plots with linear regression; axis label: RNA (3 AM); values shown: R^2 = .992, R^2 = .635, R^2 = .1] (Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
RNAs & Proteomics Integration: Next steps
• Detect a higher fraction of peptides (currently ~80% proteins, 87% peptides max, 19% average)
• Comparative proteomics (e.g. high- vs. low-light adapted)
• Smoother time-series
• Degradation
Synthetic Biology
• Test or manipulate optimality
• Program minimal cells (100 kbp)
• Nanobiotechnology: new polymers
• Manage complex systems, e.g. stem cells & ocean ecology
Suboptimality of mutants: integrating growth rate & flux data. Minimization of Metabolic Adjustment (MoMA) for the analysis of non-optimal metabolic phenotypes. Daniel Segre, Dennis Vitkup
MoMA/FBA REFERENCES
- Haemophilus influenzae metabolism (Schilling and Palsson, J. Theor. Biol. 2000)
- Escherichia coli metabolic network and gene deletions (Edwards and Palsson, PNAS 2000, BMC Bioinf. 2000)
- Helicobacter pylori (Edwards, Schilling, Covert, Church, Palsson, J. Bact. 2002)
- Escherichia coli MOMA (Segre, Vitkup, & Church, PNAS 2003)
Fluxes include transport & a growth flux. [Figure: metabolite X_i inside the membrane, with fluxes V_trans, V_syn, V_deg, V_growth; X_i = const., v_j = 0] Growth: c_1 X_1 + c_2 X_2 + ... + c_m X_m → Biomass
Biomass Composition. [Figure: axes: coeff. in growth reaction, metabolites; labeled metabolites: ATP, GLY, LEU, ACCOA, NADH, FAD, SUCCOA, COA]
Flux Balance Analysis core. Null(S) = {v : Sv = 0}. Find max{Growth} using simplex.
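Written out as a linear program, the slide's statement takes roughly the following form. The flux bounds \alpha_j and \beta_j are an assumption: they are standard in flux balance analysis but do not appear on the slide. v_growth denotes the growth flux defined on the earlier slide.

```latex
\begin{aligned}
\max_{v}\quad & v_{\mathrm{growth}} \\
\text{subject to}\quad & S v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for each flux } j .
\end{aligned}
```

Because the objective and all constraints are linear, the optimum lies on a vertex of the feasible region, which is why a simplex solver applies.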
Can we use flux analysis to say something about suboptimal states ?
Flux ratios at each branch point yield the optimal polymer composition for replication. x, y are two of the 100s of flux dimensions.
Projection can leave the mutant feasible space, so quadratic programming (QP) is used to find the nearest point.
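The nearest-point search can be stated as a quadratic program. This is a sketch of the usual MOMA-style formulation, not a transcription of the slide. The symbol v^{WT} (the wild-type optimum from the linear program), the bounds \alpha_j and \beta_j, and the index set K of fluxes removed by the mutation are notation introduced here.

```latex
\begin{aligned}
\min_{v}\quad & \lVert v - v^{\mathrm{WT}} \rVert^{2}
  = \sum_{j} \bigl( v_j - v_j^{\mathrm{WT}} \bigr)^{2} \\
\text{subject to}\quad & S v = 0, \\
& \alpha_j \le v_j \le \beta_j \quad \text{for each flux } j, \\
& v_k = 0 \quad \text{for each } k \in K .
\end{aligned}
```

The objective is quadratic while the constraints stay linear, so the minimizer is the point of the mutant feasible space closest to the wild-type flux vector in Euclidean distance.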
Flux Data, C009-limited. [Figure: predicted fluxes vs. experimental fluxes; panels: WT (LP), Δpyk (LP), Δpyk (QP); numbered points 1-18; correlations shown: r=0.91, p=8e-8; r=0.56, p=7e-3; r=-0.06, p=6e-1]
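The r values in this figure are correlations between predicted and experimental fluxes. A minimal sketch of a Pearson correlation follows, assuming that is the statistic behind the reported r (the slide does not name it). The numbers in the usage lines are placeholders, not the fluxes from the figure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values only; NOT data from the slide.
    experimental = [10.0, 50.0, 120.0, 200.0]
    predicted = [12.0, 45.0, 130.0, 190.0]
    print(round(pearson_r(experimental, predicted), 3))
```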
Competitive growth data. On minimal media; selection effect: negative, small; χ^2 p-values: 4x10^-3, 1x10^-5. Novel redundancies. Position effects.
Replication rate of a whole-genome set of mutants. Badarinarayana, et al. (2001) Nature Biotech. 19: 1060
Replication rate challenge met: multiple homologous domains. [Figure: selective disadvantage in minimal media, by probe; lysC (1, 2): 10.4; thrA (1, 2, 3): 1.1, 6.7; metL (1, 2, 3): 1.8, 1.8]