A tremendous modeling feat • Whole-cell, molecular-level computational model of the life cycle of a prokaryote: Mycoplasma genitalium • Integrates 28 different sub-modules, each employing its own modeling technique (ODEs, constraint-based modeling (CBM), stochastic processes, etc.)
Some food for thought • Model: research, understanding “biological reality” • Simulation: engineering, understanding “biology as it could be” • What does it mean “to validate” a construct of such complexity? • How could such a tool be used?
The modeling approach • Each of the 28 sub-models was built, parametrized and tested independently • The processes were assumed to be independent on sub-1s time scales • Synchronous simulation was performed at 1s intervals, with 16 cellular variable types passed between the sub-models at each step
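To make the synchronous scheme concrete, here is a minimal sketch of one way independently built sub-models could be advanced in 1 s steps over a shared state. It is not the authors' implementation: the two sub-models, the variable name `mrna`, and the rate constants are all invented for illustration, and the real model passes 16 variable types between 28 sub-models.

```python
import math
import random
from typing import Callable, Dict, List

State = Dict[str, float]
SubModel = Callable[[State, float, random.Random], State]


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw a Poisson-distributed count with Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def transcription(state: State, dt: float, rng: random.Random) -> State:
    # Stochastic sub-model: new transcripts arrive as a Poisson process.
    # The rate of 0.5 transcripts per second is a made-up illustration value.
    return {"mrna": float(sample_poisson(0.5 * dt, rng))}


def rna_decay(state: State, dt: float, rng: random.Random) -> State:
    # ODE-style sub-model: first-order decay, one explicit Euler step.
    # The decay constant 0.1 per second is also hypothetical.
    return {"mrna": -0.1 * state["mrna"] * dt}


def step(state: State, submodels: List[SubModel], dt: float,
         rng: random.Random) -> State:
    """Advance every sub-model by dt from the same snapshot, then merge the changes."""
    snapshot = dict(state)  # every sub-model sees the state at the start of the step
    deltas = [model(snapshot, dt, rng) for model in submodels]
    new_state = dict(state)
    for delta in deltas:
        for name, change in delta.items():
            # Molecule counts cannot go negative.
            new_state[name] = max(0.0, new_state[name] + change)
    return new_state


def simulate(state: State, submodels: List[SubModel], seconds: int,
             seed: int = 0) -> State:
    """Run the synchronous loop at 1 s intervals."""
    rng = random.Random(seed)
    for _ in range(seconds):
        state = step(state, submodels, dt=1.0, rng=rng)
    return state
```

Because every sub-model reads the same snapshot, the order in which they are called does not matter within a step, which is what the independence assumption on sub-1s time scales buys.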
[Figure: sub-models and the cell-state variables they exchange. Labels: number of genes associated with each process; RNA weight, length, composition, etc.; translation progress; status (active/stalled), RNA species, codon position; flux vector; molecule counts for three compartments (cytosol, membrane, extracellular space). Modeling techniques shown: Boolean functions, Poisson process, ODE system, FBA + MOMA.]
Model reconstruction • Chromosome reconstruction: • Genes • Transcription units • Promoters • Protein binding sites • Functional annotation (new annotations for 237 out of 525 genes!)
Model reconstruction (cont.) • Structural reconstruction of each gene product: • Protein sequence • Post-transcriptional and post-translational processing and modification of RNA and protein • Signal sequence and localization • DNA footprint of each DNA-binding protein • Chaperones and prosthetic groups required to fold each protein • Subunit composition of each protein and ribonucleoprotein complex • Disulfide bonds of each protein and complex • Curation and complementation of chemical reactions for all processes • The source metabolic model of M. genitalium by Maranas (iPS189; Suthers et al., PLoS CompBio 2009) had 262 reactions and 274 metabolites • The current model has 645 metabolic reactions (out of 1857 reactions overall) and 722 metabolites!
Parameter reconciliation • Constraints between parameters in multiple modules were identified • Model parameters were then tuned to • Satisfy these constraints • Deviate minimally from empirical observations
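The slide does not say what form the constraints take or how deviation is measured. As a minimal sketch, assume a single linear equality constraint and a least-squares notion of "minimal deviation"; the tuned parameters are then the orthogonal projection of the observed values onto the constraint. The function name and the numbers are hypothetical.

```python
from typing import List


def reconcile(observed: List[float], coeffs: List[float], target: float) -> List[float]:
    """Return the parameter vector closest (in least squares) to `observed`
    that satisfies the single linear constraint sum(coeffs[i] * p[i]) == target."""
    norm_sq = sum(c * c for c in coeffs)
    if norm_sq == 0.0:
        raise ValueError("constraint has no non-zero coefficient")
    # How far the observed values are from satisfying the constraint.
    residual = sum(c * p for c, p in zip(coeffs, observed)) - target
    # Move each parameter along the constraint normal just far enough to close the gap.
    return [p - c * residual / norm_sq for c, p in zip(coeffs, observed)]


# Two hypothetical rates measured as 2.0 and 3.0 that another module requires to sum to 6.0.
tuned = reconcile([2.0, 3.0], coeffs=[1.0, 1.0], target=6.0)
```

In this toy case the shortfall of 1.0 is split evenly, giving 2.5 and 3.5. With many coupled constraints and bounds, a general constrained optimizer would be needed instead of this closed form.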
Determining initial conditions • Cell properties right after division were assumed to be statistically identical in consecutive generations • Initial conditions for all cell state variables were iteratively modified until the post-division variable distributions converged to a steady state
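The iteration can be pictured as a fixed-point search: simulate one generation, take the daughters' state as the next initial conditions, and stop when the distribution no longer changes. The sketch below assumes a deliberately trivial, deterministic life cycle (gain a fixed mass, then halve) and compares only mean and standard deviation; none of this comes from the model itself.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def next_generation(birth_masses: List[float], growth: float) -> List[float]:
    """Toy life cycle: each cell gains `growth` mass, then splits in half."""
    return [(m + growth) / 2.0 for m in birth_masses]


def converge_initial_conditions(birth_masses: List[float], growth: float,
                                tol: float = 1e-6,
                                max_generations: int = 1000) -> Tuple[List[float], int]:
    """Feed post-division masses back in as initial conditions until the
    mean and spread of the distribution stop changing."""
    for generation in range(1, max_generations + 1):
        daughters = next_generation(birth_masses, growth)
        mean_shift = abs(mean(daughters) - mean(birth_masses))
        spread_shift = abs(pstdev(daughters) - pstdev(birth_masses))
        birth_masses = daughters
        if mean_shift < tol and spread_shift < tol:
            return birth_masses, generation
    raise RuntimeError("initial conditions did not converge")


# Arbitrary starting guesses; in this toy the steady-state birth mass equals `growth`.
masses, generations = converge_initial_conditions([0.2, 0.5, 3.0], growth=1.0)
```

In a stochastic simulation the stopping test would have to compare sampled distributions statistically rather than by exact differences.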
Example: metabolic modeling • Derivation of optimal flux distribution (maximizing biomass) through a “modified FBA” (flux balance analysis): • Network expansion to include the metabolic requirements of the 27 other processes • Internal exchange reactions added to recycle the metabolic byproducts of the other processes • Optimization objective expanded to include the recycling and export of the metabolic byproducts of the other processes • Flux bounds taking into account enzyme copy numbers and catalytic rates • MOMA (minimization of metabolic adjustment) to then fit the growth rate to the observation (apparently with no distance limitation) • Updating metabolite counts due to transfer reactions between the three compartments
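One way to read "flux bounds taking into account enzyme copy numbers and catalytic rates" is that each reaction's upper bound is the enzyme count times its turnover number. The sketch below assumes that reading and then solves the simplest possible FBA case, an unbranched pathway, where steady state forces the same flux through every step and the optimum is just the tightest bound. A real FBA solves a linear program over the full stoichiometric matrix; no LP solver is used here, and all numbers are invented.

```python
from typing import List


def capacity_bound(enzyme_copies: int, kcat_per_s: float, dt_s: float = 1.0) -> float:
    """Most reaction events an enzyme pool can catalyse in one time step
    (assumed form: copies * turnover number * step length)."""
    return enzyme_copies * kcat_per_s * dt_s


def max_flux_linear_pathway(upper_bounds: List[float]) -> float:
    """FBA optimum for an unbranched pathway: mass balance makes every step
    carry the same flux, so the maximum is set by the tightest bound."""
    if not upper_bounds:
        raise ValueError("pathway has no reactions")
    return min(upper_bounds)


# Hypothetical three-step pathway: (enzyme copies, kcat in 1/s) per step.
enzymes = [(10, 5.0), (4, 2.0), (20, 1.0)]
bounds = [capacity_bound(copies, kcat) for copies, kcat in enzymes]
biomass_flux = max_flux_linear_pathway(bounds)
```

Here the second enzyme, with only 4 copies, limits the whole pathway. This is how molecule counts from the other sub-models can feed back into metabolism at every step.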
Using the simulation • “Wildtype” simulations: • Initialize the cell state from random initial conditions • “Knocked-out strains”: • Set the half-life of the RNA and protein products of the deleted gene to zero • Delete all RNA and protein products of the deleted gene
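The two knock-out steps translate directly into an operation on the cell state. The sketch below assumes a hypothetical state layout (four tables keyed by gene name); the real model's data structures are not described in the slides.

```python
from copy import deepcopy
from typing import Dict

# Hypothetical layout: table name -> {gene name -> value}.
CellState = Dict[str, Dict[str, float]]

PRODUCT_TABLES = ("rna_half_life", "protein_half_life", "rna_count", "protein_count")


def knock_out(state: CellState, gene: str) -> CellState:
    """Return a copy of `state` in which the gene's products decay instantly
    (half-life zero) and every existing copy has been removed (count zero)."""
    mutant = deepcopy(state)  # leave the wild-type state untouched
    for table in PRODUCT_TABLES:
        if gene in mutant[table]:
            mutant[table][gene] = 0.0
    return mutant


# Illustrative wild-type state with two made-up genes.
wild_type = {
    "rna_half_life": {"gene_a": 180.0, "gene_b": 240.0},
    "protein_half_life": {"gene_a": 72000.0, "gene_b": 72000.0},
    "rna_count": {"gene_a": 2.0, "gene_b": 1.0},
    "protein_count": {"gene_a": 50.0, "gene_b": 30.0},
}
mutant = knock_out(wild_type, "gene_a")
```

Setting the half-life to zero means any product synthesized later in the simulation is removed at once, so the gene stays silent without changing the chromosome itself.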
Verification on training data • Simulated 128 wild-type cells in a typical Mycoplasma culture environment, to predict • Cellular properties: cell mass; growth rate • Molecular properties: count; localization; activity • Consistency with observations on: • Doubling time • Cellular chemical composition • Major cell mass fractions • Gene expression • Gene essentiality
Validation on ‘external’ data sets • The flux through glycolysis is >100-fold higher than that through the pentose phosphate and lipid biosynthesis pathways (E) • Predicted metabolite concentrations are within an order of magnitude of those measured in E. coli for all the metabolites in one study and for 70% of them in a more recent study (F) • ‘‘Burst-like’’ protein synthesis due to the local effect of intermittent messenger RNA (mRNA) expression and the global effect of stochastic protein degradation on the availability of free amino acids for translation, which is comparable to recent reports (G) • The mRNA and protein level distributions are consistent with recently reported single-cell measurements (H)
Novel predictions (1): Interactions of DNA-binding proteins • Predicting chromosomal protein occupancy (averages over 128 wildtype simulations): • Overall average • DNA polymerase • RNA polymerase • Replication Initiator DnaA • Temporal dynamics of chromosome exploration • Temporal dynamics of gene expression
Interactions of DNA-binding proteins (cont.) • Spatio-temporal dynamics of DNA and RNA polymerases • Collision and displacement frequencies for pairs of DNA-binding proteins • Correlation between protein density and frequency of collision across the chromosome
Novel predictions (2): Metabolism as an emergent cell cycle regulator • More cell-to-cell variation in the durations of the replication initiation (64.3%) and replication (38.5%) stages than in cytokinesis (4.4%) or the overall cell cycle (9.4%) • DNA replication proceeds at two distinct rates: • Initially, replication proceeds quickly due to the availability of free dNTPs in the cell • When the dNTP pool is exhausted, the rate-limiting factor becomes dNTP synthesis
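The two-rate behavior falls out of a very small bookkeeping model: each second the polymerase incorporates as many nucleotides as it can, limited by its own maximum speed and by what is in the pool. The sketch below is a toy with made-up integer quantities, not the model's replication sub-module.

```python
from typing import List


def replication_rates(genome_length: int, free_dntps: int,
                      synthesis_per_s: int, max_rate_per_s: int) -> List[int]:
    """Nucleotides incorporated in each 1 s step until the genome is copied."""
    if synthesis_per_s <= 0 or max_rate_per_s <= 0:
        raise ValueError("rates must be positive")
    rates = []
    remaining = genome_length
    pool = free_dntps
    while remaining > 0:
        pool += synthesis_per_s  # metabolism tops up the dNTP pool
        incorporated = min(max_rate_per_s, pool, remaining)
        pool -= incorporated
        remaining -= incorporated
        rates.append(incorporated)
    return rates


# Hypothetical numbers: the polymerase could do 100 nt/s, metabolism supplies only 20 nt/s.
rates = replication_rates(genome_length=1000, free_dntps=240,
                          synthesis_per_s=20, max_rate_per_s=100)
```

With these numbers the first three steps run at the polymerase limit of 100 nt/s while the initial pool lasts, and every later step runs at the 20 nt/s supplied by synthesis. Total replication time is then dominated by metabolism, which is the sense in which metabolism acts as a cell cycle regulator.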
More predictions • Global energy distribution • Molecular pathologies of single-gene disruptions • Model-driven discovery: putative alternative NAD electron transfer pathway for • Who knows what more…
Great results, but… “Are they real?” (Seinfeld, season 4, episode 19: “The Implant”)
Here are some of the problems • A lot of the model was based on empirical results from other organisms, some of which are pretty distant from M. genitalium • The MOMA stage of the metabolic model wasn’t controlled for distance • Due to the complexity of the system, the “external validation” is prone to multiple-hypothesis bias • The validation results don’t necessarily require a whole-cell model • No rigorous analysis of the robustness of the results to perturbations in the model parameters (1900 of them…) [Figure labels: M. arthritidis, M. capricolum]
But nevertheless…