Project CyberCell. David Wishart, University of Alberta. Computational Cell Biology, MITRE Inc., McLean, Virginia, USA, Sept. 21-22, 2004
Why Simulate a Cell? • Physicists • Chemists • Biologists
Why Choose E. coli? • Fully sequenced (re-sequenced and re-annotated in June 2004) • Studied for 60+ years, widely used, easily manipulated and grown • Simple cellular structure, no organelles, simple genetic structure • Almost all genes, pathways, proteins & metabolites are known or characterized • Lots of data…
International E. coli Alliance http://www.uni-giessen.de/~gx1052/IECA/ieca.html
Why E. coli? -- Potential Applications • Engineering Bugs to be Better Chemical or Protein Factories (Synthetic biology) • Engineering Bugs to Produce Novel Chemicals (wholesale metabolic transfer) • Bioremediation, Energy Production • Engineering Bugs to be “Intelligent” Drug Delivery Vehicles (nanobots) • Finding New Approaches to Treat Infectious Diseases
How to do it? • Genomics → Genometrics • Proteomics → Proteometrics • Metabolomics → Metabometrics • Phenomics → Phenometrics • Bioinformatics → Biosimulation • Quantify, quantify, quantify
Three-Pronged Process (Project CyberCell) • Data Mining / Backfilling: what you know • Exp. Data Collection: what you don’t know • Computer Simulation: what you want to know
Backfilling: The CCDB http://redpoll.pharmacy.ualberta.ca/CCDB Google Search: CCDB coli
The CyberCell Database (CCDB) • Most complete, current, quantitative collection of molecular data on E. coli • Integrates data from hundreds of articles and ~30 databases; updates automatically from ~5 DBs • Web accessible, Web browsable • Supports many kinds of query, viewing and browsing options • Structured using “ColiCards”, as in the GeneCards database (includes MetaboCards, RNACards, StructCards, ChemCards, etc.)
ColiCard Contents • Functional info (predicted or known) • Sequence information (sites, modifications, pI, MW, cleavage) • Location information (in chromosome & cell) • Interacting partners (known & predicted) • Structure (2°, 3°, 4°, predicted) • Enzymatic rate and binding constants • Abundance, copy number, concentration • Links to other sites & viewing tools • Integrated version of all major DBs • 70+ fields for each entry
E. coli Statistics • Diffusion rates • Copy numbers • Transcription rates • Translation rates • Synthetic rates • Volumes • Dimensions • Energetics • Velocities • Etc. etc.
Searching Capabilities • Text search, BLAST search, SQL search • “Show all membrane proteins that are essential and have more than 6 membrane spanning regions” • Chemical Structure search • “Find all metabolites similar to this prospective drug structure:”
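The SQL-style query quoted above can be sketched as follows. This is a minimal illustration only: the slides do not give the CCDB schema, so the table name `colicard`, its columns, and the four toy rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and toy rows: the real CCDB field names and
# contents are not given in the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE colicard (
           gene       TEXT PRIMARY KEY,
           location   TEXT,     -- e.g. 'inner membrane', 'cytoplasm'
           essential  INTEGER,  -- 1 = essential, 0 = not essential
           tm_regions INTEGER   -- number of membrane-spanning regions
       )"""
)
rows = [
    ("geneA", "inner membrane", 1, 12),
    ("geneB", "inner membrane", 0, 8),
    ("geneC", "cytoplasm", 1, 0),
    ("geneD", "outer membrane", 1, 4),
]
conn.executemany("INSERT INTO colicard VALUES (?, ?, ?, ?)", rows)


def essential_multispan(conn, min_regions=6):
    """Essential membrane proteins with more than `min_regions` spanning regions."""
    cur = conn.execute(
        "SELECT gene FROM colicard "
        "WHERE location LIKE '%membrane%' AND essential = 1 "
        "AND tm_regions > ? ORDER BY gene",
        (min_regions,),
    )
    return [row[0] for row in cur.fetchall()]


hits = essential_multispan(conn)
print(hits)
```

Passing the threshold as a bound parameter keeps the query reusable for other cut-offs.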
Three-Pronged Process • Data Mining / Backfilling • Exp. Data Collection • Computer Simulation
E. coli’s Pyramid of Life • Metabolomics: 811 Chemicals • Proteomics: 1152 Enzymes • Genomics: 4269 Genes
Global Expt. Efforts • Knockouts/minimal genomes: Blattner-Wisconsin, Wanner-Purdue, Tomita-Keio • Expression/promoter analysis: Weiner-Alberta, Church-Harvard, Surette-Calgary, Emili-Toronto, Mori-Nara • Unknown function ID: Brown-McMaster, Edwards-Toronto, Thomas-York • Structural Proteomics: Cygler-Montreal, Kunishima-RIKEN, Joachimiak-MCSG • Metabolomics: Wishart-Alberta, Nishioka-Kyoto, Wanner-Purdue
Annotations to Date
Location | Spot ID | Unique proteins
Cytoplasm | 1142 | 650
Periplasm | 170 | 120
Inner membrane | 711 | 350
Outer Membrane | 381 | 40
Total | 2,404 | 1,150
M9-Glucose MOPS
Spectral Deconvolution of a Mixture Containing Compounds A, B and C [Figure: spectra of the Mixture, Compound A, Compound B and Compound C]
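One common way to think about this deconvolution is as a linear fit: the mixture spectrum is modelled as a weighted sum of the reference spectra of Compounds A, B and C, and the weights estimate relative amounts. The slides do not say which fitting method was actually used, so the least-squares approach, the function names and the six-point toy "spectra" below are all assumptions for illustration.

```python
def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def deconvolve(references, mixture):
    """Least-squares weights w minimising ||sum_i w_i * reference_i - mixture||."""
    gram = [[dot(ri, rj) for rj in references] for ri in references]
    rhs = [dot(ri, mixture) for ri in references]
    return solve(gram, rhs)  # normal equations


# Toy reference "spectra" (intensity at six positions); purely illustrative.
ref_a = [1, 0, 0, 2, 0, 0]
ref_b = [0, 3, 0, 0, 1, 0]
ref_c = [0, 0, 1, 1, 0, 2]
true_w = [2.0, 0.5, 1.5]
mixture = [
    sum(w * ref[i] for w, ref in zip(true_w, (ref_a, ref_b, ref_c)))
    for i in range(6)
]

weights = deconvolve([ref_a, ref_b, ref_c], mixture)
print(weights)
```

Real spectra are noisy and concentrations cannot be negative, so a practical fit would likely add a non-negativity constraint; that is left out here to keep the sketch short.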
Current Compound List • (+)-(-)-Methylsuccinic Acid • 2,5-Dihydroxyphenylacetic Acid • 2-hydroxy-3-methylbutyric acid • 2-Oxoglutaric acid • 3-Hydroxy-3-methylglutaric acid • 3-Indoxyl Sulfate • 5-Hydroxyindole-3-acetic Acid • Acetamide • Acetic Acid • Acetoacetic Acid • Acetone • Acetyl-L-carnitine • Alpha-Glucose • Alpha-ketoisocaproic acid • Benzoic Acid • Betaine • Beta-Lactose • Citric Acid • Creatine • Creatinine • D(-)Fructose • D-(+)-Glyceric Acid • D(+)-Xylose • Dimethylamine • DL-B-Aminoisobutyric Acid • DL-Carnitine • DL-Citrulline • DL-Malic Acid • Ethanol • Formic Acid • Fumaric Acid • Gamma-Amino-N-Butyric Acid • Gamma-Hydroxybutyric Acid • Gentisic Acid • Glutaric acid • Glycerol • Glycine • Glycolic Acid • Hippuric acid • Homovanillic acid • Hypoxanthine • Imidazole • Inositol • isovaleric acid • L(-) Fucose • L-alanine • L-asparagine • L-aspartic acid • L-Histidine • L-homocitrulline • L-Isoleucine • L-Lactic Acid • L-Lysine • L-Methionine • L-phenylalanine • L-Serine • L-Threonine • L-Valine • Malonic Acid • Methylamine • Mono-methylmalonate • N,N-dimethylglycine • N-Butyric Acid • Pimelic Acid • Propionic Acid • Pyruvic Acid • Salicylic acid • Sarcosine • Succinic Acid • Sucrose • Taurine • trans-4-hydroxy-L-Proline • Trimethylamine • Trimethylamine-N-Oxide • Urea
The TCA Cycle [Figure: pathway diagram with Acetate, Glycerol, Pyruvate, Acetyl-CoA, Oxaloacetate, Citrate, Isocitrate, α-Ketoglutarate, Succinyl-CoA, Succinate, Fumarate and L-Malate; enzyme labels Fumarate Reductase and Succinate dehydrogenase; step labels 1 and 2]
Metabolic Responses [Figure: two panels, each labelled Acetate, Glycerol, Pyruvate and Succinate]
Three-Pronged Process • Data Mining / Backfilling • Exp. Data Collection • Computer Simulation
Three Types of Simulation • Atomic Scale: 0.1 - 1.0 nm; coordinate data, dynamic data; 0.1 - 10 ns; molecular dynamics • Meso Scale: 1.0 - 10 nm; interaction data (Kon, Koff, Kd); 10 ns - 10 ms; mesodynamics • Continuum Model: 10 - 100 nm; concentrations, diffusion rates; 10 ms - 1000 s; fluid dynamics
Meso & Continuum Dynamics • Meso-scale dynamics also requires solving MD equations (stochastic DE’s) • Continuum dynamics requires solving fluid dynamics and flux equations (more differential equations) • 3 different methods to simulate at 3 different scales • Isn’t there a better way?
Yes! Cellular Automata • Computer modelling method that uses lattices and discrete state “rules” to model time dependent processes: a way to animate things • No differential equations to solve, easy to calculate, more phenomenological • Simple unit behavior -> complex group behavior • Used to model fluid flow, percolation, reaction + diffusion, traffic flow, pheromone tracking, predator-prey models, ecology, social nets • Scales from 10^-12 to 10^+12
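To make the "lattice plus discrete rules" idea concrete, here is a minimal sketch of diffusion on a 2D lattice with a single rule: a particle hops to a randomly chosen neighbouring site if that site is empty. The lattice size, particle count, periodic boundaries and function names are illustrative assumptions, not taken from the talk.

```python
import random

NEIGHBOURS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(grid, rng):
    """One lattice update: every particle attempts one hop to a random neighbour."""
    n = len(grid)
    particles = [(r, c) for r in range(n) for c in range(n) if grid[r][c]]
    rng.shuffle(particles)  # random update order avoids directional bias
    for r, c in particles:
        dr, dc = rng.choice(NEIGHBOURS)
        nr, nc = (r + dr) % n, (c + dc) % n  # periodic boundaries
        if not grid[nr][nc]:  # rule: move only into an empty site
            grid[r][c], grid[nr][nc] = 0, 1
    return grid


def count(grid):
    return sum(sum(row) for row in grid)


rng = random.Random(0)
size = 20
grid = [[0] * size for _ in range(size)]
for r in range(8, 12):  # start with a 4 x 4 block of particles in the centre
    for c in range(8, 12):
        grid[r][c] = 1

initial = count(grid)
for _ in range(200):
    step(grid, rng)

print("particles before:", initial, "after:", count(grid))
```

No differential equation is solved anywhere; the spreading of the initial block emerges from the hop rule alone, and the particle count is conserved by construction.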
Cellular Automata Can be extended to 3D lattice
CA Methods in Games • SimCity 2000 • The Sims
Cell-Sim • CA or Agent-based simulation system • Designed to permit easy set-up (4-step set-up Wizard) • Allows for general dynamic, stochastic modelling of almost all cellular processes (enzyme kinetics, diffusion, metabolism, operon activity) • Allows real time monitoring (graphing) and animation of the system
Cell-Sim • Four types of molecules: Proteins, Small Molecules, DNA Molecules, Membrane Molecules • Two types of rules: • Molecule interaction rules: protein-protein, protein-small molecule, protein-DNA interactions • Membrane interaction rules: protein-membrane, small molecule-membrane interactions
Simple Enzyme-Substrate Reaction: E + S → E + P [Plot: # molecules (P) vs. time]
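The enzyme-substrate reaction above can be mimicked with a small lattice simulation that combines a diffusion rule with one molecule interaction rule. This is only a sketch in the spirit of a CA/agent-based approach, not Cell-Sim's actual implementation; the lattice size, molecule counts, reaction probability `p_react` and all function names are assumptions.

```python
import random

EMPTY, ENZYME, SUBSTRATE, PRODUCT = ".", "E", "S", "P"
NEIGHBOURS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def make_lattice(n, n_enzyme, n_substrate, rng):
    """Scatter enzyme and substrate molecules at random on an n x n lattice."""
    sites = [(r, c) for r in range(n) for c in range(n)]
    rng.shuffle(sites)
    grid = [[EMPTY] * n for _ in range(n)]
    for r, c in sites[:n_enzyme]:
        grid[r][c] = ENZYME
    for r, c in sites[n_enzyme:n_enzyme + n_substrate]:
        grid[r][c] = SUBSTRATE
    return grid


def step(grid, rng, p_react=0.5):
    """Each molecule picks a random neighbour: hop if empty, react if E meets S."""
    n = len(grid)
    movers = [(r, c) for r in range(n) for c in range(n) if grid[r][c] != EMPTY]
    rng.shuffle(movers)
    for r, c in movers:
        dr, dc = rng.choice(NEIGHBOURS)
        nr, nc = (r + dr) % n, (c + dc) % n  # periodic boundaries
        here, there = grid[r][c], grid[nr][nc]
        if there == EMPTY:
            grid[r][c], grid[nr][nc] = EMPTY, here  # diffusion rule
        elif {here, there} == {ENZYME, SUBSTRATE} and rng.random() < p_react:
            # interaction rule: E + S -> E + P (the enzyme is unchanged)
            if here == SUBSTRATE:
                grid[r][c] = PRODUCT
            else:
                grid[nr][nc] = PRODUCT


def count(grid, kind):
    return sum(row.count(kind) for row in grid)


rng = random.Random(1)
grid = make_lattice(n=20, n_enzyme=5, n_substrate=60, rng=rng)
history = []  # number of product molecules after each step
for _ in range(300):
    step(grid, rng)
    history.append(count(grid, PRODUCT))

print("E:", count(grid, ENZYME), "S:", count(grid, SUBSTRATE), "P:", history[-1])
```

Plotting `history` against the step index gives a curve of the same kind as the slide's "# molecules (P) vs. time" plot; the total of S and P stays fixed because each reaction converts exactly one substrate into one product.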
[Plot: # molecules (P) vs. time; labels: More Trp Repressor, Bolus Trp addition, No trp repressor, No Trp]
Repressilator Nature, 403: 335-338 (2000)