gggatttagctcagttgggagagcgccagactgaa gat ttg gag gtcctgtgttcgatccacagaattcgcacca
Share, Search, Merge, Check, Design: e.g. 3D & Sequence alignment
Harvard-MIT GtL Center Goals
1 Protein Complexes: Mass Spectrometry
• multi-species-time-series & crosslinking
2 Regulatory Networks: RNA array quantitation
3 Microbial Communities, Biofilms: Polonies*
• Tagged-strain-competition, Single Cell Activities
4 Computational Modeling: Metabolic Optimization
• & 4D Cell modeling* (Workshop B*)
CO2 100 ppmv increase http://jan.ucc.nau.edu/~doetqp-p/courses/env470/Lectures/lec41/Lec41.htm
Energy & CO2 Fluxes
4 x 10^13 kW of sunlight hits earth per year. We consume 2 kW per person * 6 x 10^9 = 10^10 kW.
CO2 >370 ppm = 730 x 10^15 g globally, increase ~3 x 10^15 /yr. Ocean productivity = ~100 x 10^15 g/yr.
Autotrophs: 10^25 Prochlorococcus cells globally (10^8 per liter)
Undone by Cyanophages & Heterotrophs: 2 x 10^28 SAR11 cells in the oceans; Pseudomonas & Caulobacter in a variety of soils & aquatic environments
http://www.gsfc.nasa.gov/gsfc/service/gallery/fact_sheets/earthsci/terra/earths_energy_balance.htm
http://clear.eawag.ch/models/optionenE.html
Morris et al. Nature 2002 Dec 19-26;420(6917):806-10.
http://hosting.uaa.alaska.edu/mhines/biol468/pages/carbon.html
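The arithmetic on this slide can be checked in a few lines. This is only a sketch that re-derives ratios from the slide's own figures; the variable names are illustrative and no new data is introduced.

```python
# Figures as quoted on the slide (units as given there).
sunlight_kw = 4e13            # sunlight hitting earth
per_person_kw = 2.0           # consumption per person
population = 6e9              # people
co2_total_g = 730e15          # atmospheric CO2 at >370 ppm
co2_increase_g_per_yr = 3e15  # yearly increase

# Human consumption: 2 kW * 6e9 people, which the slide rounds to 10^10 kW.
consumption_kw = per_person_kw * population

# Consumption as a fraction of incoming sunlight.
fraction_of_sunlight = consumption_kw / sunlight_kw

# Yearly CO2 increase relative to the atmospheric pool.
co2_growth_fraction = co2_increase_g_per_yr / co2_total_g

print(f"consumption = {consumption_kw:.1e} kW")
print(f"fraction of sunlight = {fraction_of_sunlight:.1e}")
print(f"CO2 growth = {100 * co2_growth_fraction:.2f} % per year")
```

The product comes to 1.2 x 10^10 kW, so the slide's "10^10 kW" is an order-of-magnitude rounding.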
Harvard-MIT DOE GtL Center C. Ting Collaborating PIs: Chisholm, Polz, Church, Kolter, Ausubel, Lory, Laub, Kucherlapati
Biosystems Integrating Measures & Models Environment Metabolites RNAi Insertions SNPs DNA Proteins RNA Replication rate interactions Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms
Comparison of predicted with observed protein properties (abundance, localization, postsynthetic modifications) E. coli. Link et al. 1997 Electrophoresis 18:1259-313 (Pub)
Multidimensional peptide measures (Optionally protein separation steps) 3rd 2nd
Prochlorococcus Proteogenomic Map. Numbers on top in basepairs. 1700 ORFs are predicted. Proteomic Model is based on mass spectrometry of peptides at 24h time points. Difference Map indicates new peptide regions. The 6 colors represent ORFs in the 6 reading frames. (Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
Circadian time-series (Prochlorococcus) RNA & protein quantitation. [Scatter plots, axes labeled RNA (3 AM); reported fits: R^2 = .992, R^2 = .635; linear regression R^2 = .1] (Harvard-MIT GtL: Jaffe, Church, Lindell, Chisholm, et al.)
Goals 1 & 2: RNAs & Proteins. Next steps
1 Detect a higher fraction of peptides
• (currently ~80% proteins, 87% peptides max, 19% average)
2 Comparison of two Prochlorococcus isolates
• (1700 vs 2500 genes, high vs low light adapted)
3 Move from two time points to smooth series.
Biosystems Integrating Measures & Models Environment Metabolites RNAi Insertions SNPs DNA Proteins RNA Replication rate interactions Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms
Why model cells?
• Tests of understanding
• Program minimal cells (100 kbp)
• Nanobiotechnology: new polymers
• Manage complex systems, e.g. stem cells & ocean ecology
Suboptimality of mutants: integrating growth rate & flux data. Minimization of Metabolic Adjustment (MoMA) for the analysis of non-optimal metabolic phenotypes. Daniel Segre, Dennis Vitkup
MoMA/FBA REFERENCES
- Haemophilus influenzae metabolism (Schilling and Palsson, J. Theor. Biol. 2000)
- Escherichia coli metabolic network and gene deletions (Edwards and Palsson, PNAS 2000, BMC Bioinf. 2000)
- Helicobacter pylori (Edwards, Schilling, Covert, Church, Palsson, J. Bact 2002)
- Escherichia coli MOMA (Segre, Vitkup, & Church, PNAS 2003)
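Fluxes include transport & a growth flux. [Diagram: V_trans across the membrane; V_syn, V_deg and V_growth acting on metabolite X_i.] Steady state: X_i = const., so the fluxes v_j through X_i sum to 0. Growth: c_1 X_1 + c_2 X_2 + ... + c_m X_m -> Biomass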
Biomass Composition [Bar chart: coefficient in growth reaction for each metabolite, e.g. ATP, GLY, LEU, ACCOA, NADH, FAD, SUCCOA, COA]
Flux Balance Analysis core. Find max{Growth} using simplex. Null(S) = {v : Sv = 0}
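A minimal sketch of the idea, assuming a hypothetical one-metabolite toy network that is not from the slides. Steady state (Sv = 0) is imposed by parametrizing the null space with two branch fluxes x and y, and growth is then maximized over the feasible polygon. For brevity the sketch enumerates polygon vertices by brute force instead of running simplex; both return an optimal vertex of the same linear program.

```python
from itertools import combinations

# Hypothetical toy network: one metabolite A.
#   uptake:  -> A                (at most 10 units)
#   route x: A -> biomass, yield 1.0, capacity 6
#   route y: A -> biomass, yield 0.5
# Steady state S v = 0 gives uptake = x + y, so (x, y) parametrize Null(S).
# Each constraint (a1, a2, b) means a1*x + a2*y <= b.
CONSTRAINTS = [
    (1.0, 1.0, 10.0),   # x + y <= 10 (uptake limit)
    (1.0, 0.0, 6.0),    # x <= 6      (route capacity)
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
GROWTH = (1.0, 0.5)     # growth = 1.0*x + 0.5*y


def intersect(c1, c2):
    """Intersection of two constraint boundary lines, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(p, constraints, tol=1e-9):
    """True if point p satisfies every inequality."""
    return all(a * p[0] + b * p[1] <= r + tol for a, b, r in constraints)


def max_growth(constraints, objective):
    """Maximize the linear objective over the feasible polygon's vertices."""
    points = (intersect(c1, c2) for c1, c2 in combinations(constraints, 2))
    verts = [p for p in points if p is not None and feasible(p, constraints)]
    best = max(verts, key=lambda p: objective[0] * p[0] + objective[1] * p[1])
    return best, objective[0] * best[0] + objective[1] * best[1]


flux, growth = max_growth(CONSTRAINTS, GROWTH)
print("optimal (x, y):", flux, "growth:", growth)
```

In this toy case the optimum saturates the high-yield route (x = 6) and sends the remaining uptake through y.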
Can we use flux analysis to say something about suboptimal states ?
Flux ratios at each branch point yield optimal polymer composition for replication. x, y are two of the 100s of flux dimensions
Projection can leave the mutant feasible space…so Quadratic programming (QP) to find the nearest point
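A sketch of the "nearest point" step, again on a hypothetical two-flux toy problem rather than the slide's network. The wild-type optimum (6, 4) and the knockout constraint x <= 0 are assumed for illustration. In two dimensions the quadratic program can be solved exactly by checking projections onto each constraint line plus the polygon vertices, so no QP solver is needed here; a genome-scale model would use one.

```python
from itertools import combinations

# Hypothetical mutant flux space: route x is knocked out (x <= 0).
# Each constraint (a1, a2, b) means a1*x + a2*y <= b.
MUTANT = [
    (1.0, 1.0, 10.0),   # x + y <= 10 (uptake limit)
    (1.0, 0.0, 0.0),    # x <= 0      (knockout)
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
WT_OPTIMUM = (6.0, 4.0)  # assumed wild-type optimum of the toy problem
GROWTH = (1.0, 0.5)      # growth = 1.0*x + 0.5*y


def intersect(c1, c2):
    """Intersection of two constraint boundary lines, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(p, constraints, tol=1e-9):
    """True if point p satisfies every inequality."""
    return all(a * p[0] + b * p[1] <= r + tol for a, b, r in constraints)


def nearest_feasible(target, constraints):
    """Feasible point with minimal Euclidean distance to target (2-D only)."""
    if feasible(target, constraints):
        return target
    tx, ty = target
    candidates = []
    for a, b, r in constraints:              # projections onto each boundary
        t = (a * tx + b * ty - r) / (a * a + b * b)
        candidates.append((tx - t * a, ty - t * b))
    for c1, c2 in combinations(constraints, 2):  # polygon vertices
        p = intersect(c1, c2)
        if p is not None:
            candidates.append(p)
    ok = [p for p in candidates if feasible(p, constraints)]
    return min(ok, key=lambda p: (p[0] - tx) ** 2 + (p[1] - ty) ** 2)


moma_flux = nearest_feasible(WT_OPTIMUM, MUTANT)
moma_growth = GROWTH[0] * moma_flux[0] + GROWTH[1] * moma_flux[1]
print("MoMA-style mutant flux:", moma_flux, "growth:", moma_growth)
```

Here the nearest point keeps y at its wild-type value of 4, whereas re-maximizing growth in the mutant space would push y to 10. That difference is the contrast between the QP and LP predictions.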
Flux Data, C009-limited [Three scatter plots of Predicted Fluxes vs. Experimental Fluxes for fluxes 1-18. WT (LP): r=0.91, p=8e-8. Δpyk (LP): r=-0.06, p=6e-1. Δpyk (QP): r=0.56, p=7e-3.]
Competitive growth data. On minimal media: negative / small selection effect. χ^2 p-values: 4 x 10^-3, 1 x 10^-5. Novel redundancies. Position effects.
Replication rate of a whole-genome set of mutants Badarinarayana, et al. (2001) Nature Biotech.19: 1060
Replication rate challenge met: multiple homologous domains. [Figure: selective disadvantage in minimal media, by probe. lysC (domains 1, 2): 10.4. thrA (domains 1, 2, 3): 1.1, 6.7. metL (domains 1, 2, 3): 1.8, 1.8.]
Multiple mutations per gene Correlation between two selection experiments Badarinarayana, et al. (2001) Nature Biotech.19: 1060
Goals 3 & 4: Populations and models. Next steps
1 Generate MOMA models for autotrophs
2 Comparison of models for multiple Prochlorococcus & Pseudomonas genomes
3 Insertion & point mutant competitions for hard-to-grow species (e.g. Prochlorococcus, 24 hr doubling).
Harvard-MIT GtL Center Goals
1 Protein Complexes: Mass Spectrometry
• multi-species-time-series & crosslinking
2 Regulatory Networks: RNA array quantitation
3 Microbial Communities, Biofilms: Polonies*
• Tagged-strain-competition, Single Cell Activities
4 Computational Modeling: Metabolic Optimization
• & 4D Cell modeling
Biosystems Integrating Measures & Models MOMA Darwinian (sub)optima Polonies (CD44 & cancer) Arrays&Mass-spec (circadian & cell cycle) Environment Metabolites DNA Proteins RNA interactions Microbes Cancer & stem cells In vitro replication multicellular organisms
GtL Workshop B: Experimental Technology Development and Integration. Tue at 2 PM
Co-Chairs: George Church, Harvard Medical School; Ham Smith, Institute for Biological Energy Alternatives
As we attempt to understand, protect, and/or engineer environmental microbial communities, we need to ask what sorts of data would most benefit our models and how to obtain these cost-effectively. For this session let us answer what small (or large) technological step we are taking toward these specific challenges: (1) microscopic methods capable of tracing the chain of a small genome, (2) quantitation of “all” peptide states (either in single cells or populations), (3) sequencing at Mbp per $, and (4) automated designed genome engineering.
The framework for the discussions will be the following questions:
· What are the most useful technologies for our tasks/goals now and for the future? What are the major technological gaps that will need to be addressed to reach the GTL goals? To what extent will the technologies be developed by others?
· How can technologies best be used to complement each other and strengthen the resulting research/insights? How do we promote the kind of synergistic interactions among the practitioners?
Presentations by Joachim Frank (Wadsworth Center, New York State Department of Health) on Cryo-Electron Microscopy; Bob Hettich or Greg Hurst (ORNL) and Dick Smith (PNNL) on Mass spectrometry; Hoi-Ying Holman (Berkeley Lab) on FTIR imaging; Steve Colson (PNNL) on optical imaging.
We would like to invite you to bring one viewgraph to share with the participants on your views about technologies needed to meet these challenges.
Biosystems Integrating Measures & Models Environment Metabolites RNAi Insertions SNPs DNA Proteins RNA Replication rate interactions Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms
Improving Models & Measures Why model? “Killer Applications”: Share, Search, Merge, Check, Design
Why improve measurements?
Human genomes: (6 billion)^2 = 10^19 bp
Immune & cancer genome changes: >10^10 bp per time point
RNA ends & splicing: in situ 10^12 bits/mm^3
Biodiversity: Environmental & lab evolution
Compact storage: 10^5 now to 10^17 bits/mm^3 eventually
& How? ($1K per genome, 10^8-10^13 bits/$)
• The issue is not speed, but integration.
• Cost per 99.99% bp: including Reagents, Personnel, Equipment/5yr, Overhead/sq.m
• Sub-µm scale: 1 µm = femtoliter (10^-15)
• Instruments $2-50K per CPU
Projected costs determine when biosystems data overdetermination is feasible. In 1984, pre-HGP (φX, pBR322, etc.): 0.1 bp/$, would have been $30B per human genome. In 2002 (de novo full vs. resequencing), ABI/Perlegen/Lynx: $300M vs. $3M, 10^3 bp/$ (4-log improvement). Other data I/O (e.g. video): 10^13 bits/$
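The cost figures follow from dividing genome size by throughput per dollar. The sketch below assumes a 3 x 10^9 bp haploid human genome, the size implied by the slide's $30B figure at 0.1 bp/$; that number and the variable names are assumptions, not stated on the slide.

```python
import math

GENOME_BP = 3e9          # assumed haploid human genome size

bp_per_dollar_1984 = 0.1
bp_per_dollar_2002 = 1e3  # resequencing rate quoted on the slide

cost_1984 = GENOME_BP / bp_per_dollar_1984   # about $30B
cost_2002 = GENOME_BP / bp_per_dollar_2002   # about $3M
log_improvement = math.log10(bp_per_dollar_2002 / bp_per_dollar_1984)

# Throughput needed to reach $1K per genome under the same assumption.
bp_per_dollar_for_1k = GENOME_BP / 1e3

print(f"1984: ${cost_1984:.1e}, 2002: ${cost_2002:.1e}")
print(f"improvement: {log_improvement:.1f} logs")
print(f"needed for $1K genome: {bp_per_dollar_for_1k:.1e} bp/$")
```

Under this assumption the $1K target needs roughly another 3.5 logs beyond the 2002 resequencing rate.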
Why single molecules? Integration from cells/genomes/RNAs to data Geometric constraints : Who’s “in cis” on a molecule, complex, or cell. e.g. DNA Haplotypes & RNA splice-forms
Polymerase colony (polony) PCR in a gel. [Diagram of A' and B primer sites in the gel. Labels: Single molecule from library; Primer A has 5’ immobilizing Acrydite; Primer is extended by polymerase; 1st round of PCR.] Mitra & Church Nucleic Acids Res. 27: e34
Sequence polonies by sequential, fluorescent single-base extensions
• Hybridize Universal Primer
• Add Red (Cy3) dTTP. Wash.
• Add Green (FITC) dCTP
• Wash; Scan
[Diagram: primer (B') annealed to template 3’ ... 5’; sequence shown: C G A T C G C G T . . .]
$1K per diploid human sequence
Input: Buccal cells, blood, or forensic samples.
Output: Prioritized list of deviant bps (e.g. non-conservative).
Raw data rate: 16 pixels/bp, 1 Mpixel per 6 sec/CPU = 24 CPU days.
Amortization: 5 yr for camera/CPU/transport @ $50K total = $200 per 10^11 bp
Overhead: $200/sq ft/yr * 40 sq.ft (400 cu.ft) = $40
Reagents: At 20 µm per (5 µm) polony and 40 bp reads means 10000 cm^2 area; 800 ml of fluor dNTP, $100/mg = $40; 5 ml PCR reactions = $200
Disposables: 500 slides = $50
Electricity: 2 kW * 24 hr * 24 days * $0.13/kWh = $150
Labor for repair: 10% of instrument cost = $10
Labor for operation: Slide PCR, slide dips, scans, etc. = $20
R&D: Initially NIH grants (roughly 10%).
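A quick tally of the line items, as a sketch. The dollar amounts are copied from the slide; treating R&D as a flat 10% surcharge on the subtotal is an assumption made here for illustration.

```python
# Per-genome line items in dollars, as quoted on the slide.
items = {
    "amortization": 200,
    "overhead": 40,
    "fluor dNTP": 40,
    "PCR reactions": 200,
    "disposables": 50,
    "electricity": 150,
    "labor (repair)": 10,
    "labor (operation)": 20,
}

subtotal = sum(items.values())

# Assumption: R&D modeled as a 10% surcharge on the subtotal.
with_rd = subtotal * 1.10

# Re-derive the electricity item: 2 kW * 24 hr * 24 days * $0.13 per kWh.
electricity = 2 * 24 * 24 * 0.13

print(f"subtotal = ${subtotal}")
print(f"with 10% R&D = ${with_rd:.0f}")
print(f"electricity = ${electricity:.2f}")
```

The items sum to $710, which leaves headroom under the $1K target even with the assumed R&D surcharge.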
Inexpensive, off-the-shelf equipment
• Automated slide fluidics: $4K
• MJR in situ Cycler: $10K
• Microarray Scanner: $26K+
Human Haplotype: CFTR gene, 45 kbp. Rob Mitra, Vincent Butty, Jay Shendure, Ben Williams
Quantitative removal of Fluorophores Rob Mitra
Sequencing multiple polonies. Template ST30: 3' TCACGAGT (complementary strand: AGTGCTCA). Base added: (C) A G T (C) (A) G (T) C (A) (G) T C A. Rob Mitra
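The base-addition series for template ST30 can be reproduced with a short simulation. This is a sketch under two assumptions: bases are offered in the repeating order C, A, G, T, and a base in parentheses means it was added but not incorporated. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_extension(template_3to5, cycle="CAGT"):
    """Offer one base per step; the primer extends only on a complementary match.

    Returns a list of (base, n_incorporated) tuples, one per step.
    Assumes the template contains only A, C, G, T.
    """
    needed = [COMPLEMENT[b] for b in template_3to5]  # strand to synthesize, 5'->3'
    pos = 0
    step = 0
    log = []
    while pos < len(needed):
        base = cycle[step % len(cycle)]
        n = 0
        # A run of identical bases would extend in a single step.
        while pos < len(needed) and needed[pos] == base:
            pos += 1
            n += 1
        log.append((base, n))
        step += 1
    return log


log = simulate_extension("TCACGAGT")
series = " ".join(b if n else f"({b})" for b, n in log)
read = "".join(b * n for b, n in log)
print(series)
print(read)
```

Under these assumptions the simulated series matches the one on the slide, and the incorporated bases spell AGTGCTCA.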
Multiple Image Alignment
• Metric based on optimal coincidence of high-intensity noise pixels over a matrix of local offsets
• (0.4 pixel precision) Shendure
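A minimal sketch of an offset search of this kind, assuming each image has already been reduced to a list of (row, column) coordinates of its high-intensity pixels. It tests integer offsets only, so it does not reproduce the 0.4 pixel precision quoted above; the function name and the sample coordinates are hypothetical.

```python
def best_offset(ref_pixels, img_pixels, max_shift):
    """Return ((dy, dx), score) maximizing coincident bright pixels.

    Scans every integer offset in [-max_shift, max_shift]^2 and counts how
    many shifted img_pixels land on a ref_pixels coordinate.
    """
    ref = set(ref_pixels)
    best = (0, 0)
    best_score = -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = sum((y + dy, x + dx) in ref for y, x in img_pixels)
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score


# Hypothetical bright-pixel coordinates in a reference scan.
reference = [(2, 3), (5, 7), (8, 1), (4, 4)]
# The same pixels in a second scan, displaced by (-1, +2).
second_scan = [(y - 1, x + 2) for y, x in reference]

offset, score = best_offset(reference, second_scan, max_shift=3)
print("offset to apply to second scan:", offset, "coincident pixels:", score)
```

Sub-pixel precision would need interpolation or a fit around the best integer offset, which this sketch leaves out.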
Polony exclusion principle & Single pixel sequences. Mitra & Shendure