Model-based investigation of bacterial metabolism using gene essentiality data
PhD defense – Maxime Durot
PhD prepared in the Computational Systems Biology Group at Genoscope under the supervision of Vincent Schachter & Jean Weissenbach
Motivation & goals of the thesis Maxime Durot – PhD defense – October 12, 2009
Metabolism
[Picture: Roche Applied Science: http://www.expasy.org/tools/pathways/]
Information from two scales
[Figure labels: genome, metabolism, phenotype; molecular scale, cellular scale]
Mutant phenotyping experiments
• Mutant phenotype:
  • No growth = gene is essential in the tested environment
  • Growth = gene is dispensable in the tested environment
• Experiments are performed genome-wide for a growing number of organisms (Gerdes et al, Curr Opin Biotechnol 2006)
[Figure: wild-type bacterium (gene, genome) with its wild-type growth phenotype; knock-out mutant (deleted gene) with its mutant growth phenotype]
Confronting the two scales is complex
Modeling metabolism can help (Stelling, Curr Opin Microbiol. 2004)
The constraint-based modeling framework
Key concepts:
• Variable of interest = reaction fluxes
• Constraint-based approach: applying constraints to the model reduces the set of possible flux distributions
• Explore the space of admissible flux distributions
Classical constraints:
• Metabolism is in steady state: metabolite concentrations remain constant
• Some reactions are irreversible
• Flux values are bounded by a maximal value
Applicable at genome scale
[Figure: toy network with external metabolites A(ext), B(ext), P(ext), internal metabolites A, B, C, D, P and reactions R1 to R9; one build of the slide annotates the reactions with an example flux distribution (values between 0 and 1.5); the admissible flux distributions are drawn as a region in the space of fluxes v1, v2, v3]
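The three classical constraints can be written compactly as below. This is the usual textbook formulation, given here as a sketch; the notation (stoichiometric matrix S, flux vector v, irreversible set I, bounds v^max) is an assumption and does not appear on the slides.

```latex
\begin{aligned}
S\,v &= 0 && \text{(steady state: metabolite concentrations remain constant)}\\
v_i &\ge 0 \quad \forall i \in \mathcal{I} && \text{(irreversible reactions)}\\
|v_i| &\le v_i^{\max} \quad \forall i && \text{(flux values bounded by a maximal value)}
\end{aligned}
```

The set of vectors v satisfying all three conditions is the space of admissible flux distributions.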
Models and gene essentiality datasets
• Constraint-based models can predict growth phenotypes for genetic and environmental perturbations (Price et al, Nat Rev Microbiol 2004)(Durot et al, FEMS Microbiol Rev 2009)
• Gene essentiality datasets have been used to provide rough assessments of metabolic models (Covert et al, Nature 2004)(Joyce et al, J Bacteriol 2006)
  • Compute predictive accuracy for gene essentiality prediction
  • List of inconsistencies, used as a starting point for curation
• Can gene essentiality datasets be used more systematically for metabolic model assessment & refinement?
Objectives of the thesis
• Develop a framework for the refinement of metabolic models using gene essentiality data
Context: the Metabolic Thesaurus project
Acinetobacter baylyi ADP1
• γ-proteobacteria, Pseudomonadales group
• Nutritionally versatile, strictly aerobic
• Non-pathogenic
• Evidence of xenobiotic degradation capabilities
Experimental context:
• Reliable genome annotation (Barbe et al, Nucleic Acids Res 2004)
• Comprehensive knock-out mutant collection (de Berardinis et al, Mol Syst Biol 2008)
• Phenotyping capability: complete conditional essentiality datasets on several media (de Berardinis et al, Mol Syst Biol 2008)
Objectives of the thesis
• Develop a framework for the refinement of metabolic models using gene essentiality data
• Application to Acinetobacter baylyi metabolism
  • reconstruct a global metabolic model from its genome annotation
  • assess and refine the model using mutant phenotypes
  • point out poorly understood metabolic events requiring further experimental investigation
Outline
• A/ A formal framework for comparing predicted and experimental gene essentialities
• B/ Reconstruction and refinement of A. baylyi metabolic model using mutant phenotypes
• C/ Automated reasoning with metabolic models and essentiality data
A/ A formal framework for comparing predicted and experimental gene essentialities
Model refinement using experimental data
[Figure: iterative refinement loop. An initial metabolic reconstruction yields model predictions, which are confronted with experimental results from (large-scale) experiments; model assessment & refinement gives an improved metabolic reconstruction. Refinement step 2 repeats the cycle with (large-scale) experiments 2 and gives improved metabolic reconstruction (2).]
Formal representation of a metabolic model
• Model refinement using large-scale genetics data requires:
  • Computer generation of variants of models
  • Understanding the impact of model variations on phenotype predictions
• Problem:
  • Constraint-based models appear to be complex mathematical objects
  • An appropriate representation of metabolic models is required to perform automated reasoning with essentiality data
Formal representation of a metabolic model
• Boolean gene-reaction associations (GPR)
• Set of metabolic reactions (NETWORK)
• List of essential biomass precursors (BIOMASS)
[Figure: GPR example. Genes g1 and g2 encode proteins p1 and p2; p1 and p2 form complex c1. Boolean rules: r1: g1; r2: g1 and g2]
[Figure: the genetic background is mapped through the GPR to the set of reactions fulfilling the modeling constraints; these reactions convert the metabolites of the medium into the producible metabolites, with the essential biomass precursors marked among them]
Predicting mutant phenotypes
[Figure: a genetic perturbation (gene deletion) applied to the genetic background inactivates reactions through the GPR; the smaller set of reactions fulfilling the modeling constraints reduces the space of metabolites producible from the metabolites of the medium, shown against the essential biomass precursors]
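The GPR / NETWORK / BIOMASS chain above can be illustrated with a small sketch. Everything in it is hypothetical: the toy genes, reactions and metabolites are invented for illustration, and producibility is approximated by simple reachability from the medium rather than by the steady-state flux computation used in constraint-based models.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy model (not taken from the thesis).
GENES: Set[str] = {"g1", "g2", "g3", "g4"}

# GPR: reaction -> Boolean rule over the set of genes still present.
GPR: Dict[str, Callable[[Set[str]], bool]] = {
    "r1": lambda g: "g1" in g,
    "r2": lambda g: "g1" in g and "g2" in g,  # enzymatic complex
    "r3": lambda g: "g3" in g,
    "r4": lambda g: "g4" in g,
}

# NETWORK: reaction -> (substrates, products).
NETWORK: Dict[str, Tuple[Set[str], Set[str]]] = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"C"}),
    "r3": ({"A"}, {"C"}),  # alternate route to C
    "r4": ({"C"}, {"P"}),
}

MEDIUM: Set[str] = {"A"}        # metabolites of the medium
BIOMASS: Set[str] = {"C", "P"}  # essential biomass precursors


def active_reactions(present_genes: Set[str]) -> Set[str]:
    """Reactions whose GPR rule is satisfied by the remaining genes."""
    return {r for r, rule in GPR.items() if rule(present_genes)}


def producible(reactions: Set[str], medium: Set[str]) -> Set[str]:
    """Metabolites reachable from the medium (assumed simplification)."""
    produced = set(medium)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            substrates, products = NETWORK[r]
            if substrates <= produced and not products <= produced:
                produced |= products
                changed = True
    return produced


def grows(deleted: Set[str]) -> bool:
    """Growth is predicted if every biomass precursor stays producible."""
    remaining = GENES - deleted
    return BIOMASS <= producible(active_reactions(remaining), MEDIUM)


def essential_genes() -> list:
    """Genes whose single deletion abolishes predicted growth."""
    return sorted(g for g in GENES if not grows({g}))
```

In this toy model only g4 is predicted essential, because r3 provides an alternate route to C when g1 or g2 is deleted; deleting g2 and g3 together abolishes growth.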
Confronting model predictions with experiments
• Comparison of predictions with experiments reveals inconsistencies
Classifying inconsistencies according to likely cause & correction type
GPR
• False essential: decrease impact of gene deletion on reaction set
  - add an alternate enzyme
  - gene is a non-essential subunit of a complex
  - reaction may occur spontaneously
• False dispensable: increase impact of gene deletion on reaction set
  - remove an isozyme
  - form a complex instead of isozyme
  - gene has an additional essential role
NETWORK
• False essential: augment reaction set
  - add an alternate pathway
• False dispensable: reduce reaction set
  - remove or block an alternate pathway
BIOMASS
• False essential: reduce biomass requirements
  - remove a biomass precursor
• False dispensable: augment biomass requirements
  - add a biomass precursor
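The classification above amounts to a lookup from (type of inconsistency, model part) to candidate corrections. A minimal sketch follows; the function and variable names are hypothetical, and "false essential" is taken to mean predicted essential but observed dispensable.

```python
from typing import Dict, List, Optional, Tuple

# Candidate corrections, keyed by (inconsistency type, model part).
CORRECTIONS: Dict[Tuple[str, str], List[str]] = {
    ("false essential", "GPR"): [
        "add an alternate enzyme",
        "gene is a non-essential subunit of a complex",
        "reaction may occur spontaneously",
    ],
    ("false dispensable", "GPR"): [
        "remove an isozyme",
        "form a complex instead of isozyme",
        "gene has an additional essential role",
    ],
    ("false essential", "NETWORK"): ["add an alternate pathway"],
    ("false dispensable", "NETWORK"): ["remove or block an alternate pathway"],
    ("false essential", "BIOMASS"): ["remove a biomass precursor"],
    ("false dispensable", "BIOMASS"): ["add a biomass precursor"],
}


def inconsistency_type(predicted_essential: bool,
                       observed_essential: bool) -> Optional[str]:
    """Return the inconsistency type, or None if model and experiment agree."""
    if predicted_essential == observed_essential:
        return None
    return "false essential" if predicted_essential else "false dispensable"


def candidate_corrections(predicted_essential: bool,
                          observed_essential: bool) -> Dict[str, List[str]]:
    """List candidate corrections per model part for one gene."""
    kind = inconsistency_type(predicted_essential, observed_essential)
    if kind is None:
        return {}
    return {part: CORRECTIONS[(kind, part)]
            for part in ("GPR", "NETWORK", "BIOMASS")}
```

For a gene predicted dispensable but observed essential, `candidate_corrections(False, True)` lists the three GPR options, blocking an alternate pathway, and adding a biomass precursor.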
B/ Reconstruction and refinement of A. baylyi metabolic model using mutant phenotypes
A. baylyi model reconstruction
• Two-step process
  • Identify all metabolic reactions occurring in the cell
  • Adapt representation to modeling requirements
1/ Metabolic network reconstruction
2/ Adapt to modeling requirements
• Specific developments made for A. baylyi model
  • Automated expansion of generic pathways
  • Inference of enzyme complexes by homology to E. coli
Initial model reconstruction
• 859 reactions using 697 metabolites, linked with 787 genes
• 109 metabolites that are exchangeable with the environment
[Figure: two charts breaking the model down by functional category: transport, central metabolism, nucleotides synthesis, amino acids synthesis, lipid metabolism, cofactor synthesis, degradation pathways]
Evidence supporting the enzymatic function of model genes
Experimental datasets
Dataset 1
• Growth phenotypes of wild-type strain on 190 carbon sources
• Results:
  • Growth on 45 carbon sources
  • No growth on remaining 145 carbon sources
Dataset 2
• Genome-wide gene essentialities from A. baylyi mutant collection construction
• Selection on succinate minimal medium
• Gene essentiality results (de Berardinis et al, Mol Syst Biol 2008)
Dataset 3
• Growth phenotypes of A. baylyi mutants on 8 defined environments
• 7 alternate C sources, 1 alternate N source
[Figure: histogram, axis label: Frequency]
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media)
Model refinement using dataset 1
                                          iAbaylyi v1        iAbaylyi v2
Overall prediction accuracy               86%                91%
Correctly predicted carbon sources        24 / 45 (53%)      33 / 45 (73%)
Correctly predicted non carbon sources    140 / 145 (97%)    140 / 145 (97%)
Corrected inconsistencies: GPR 0, NETWORK 9, BIOMASS 0
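The overall accuracies follow directly from the two per-class counts on this slide. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def overall_accuracy(correct_growth: int, total_growth: int,
                     correct_no_growth: int, total_no_growth: int) -> float:
    """Fraction of all tested conditions that the model predicts correctly."""
    correct = correct_growth + correct_no_growth
    total = total_growth + total_no_growth
    return correct / total


# Counts from the dataset 1 slide: 45 carbon sources, 145 non carbon sources.
v1 = overall_accuracy(24, 45, 140, 145)  # 164 / 190
v2 = overall_accuracy(33, 45, 140, 145)  # 173 / 190

print(f"iAbaylyi v1: {v1:.0%}")  # iAbaylyi v1: 86%
print(f"iAbaylyi v2: {v2:.0%}")  # iAbaylyi v2: 91%
```

The gain from v1 to v2 comes entirely from the growth class (24 to 33 correctly predicted carbon sources); the no-growth class is unchanged.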
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
• Model accuracy: 86% on dataset 1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media) → iAbaylyi v2
• Model accuracy: 91% on dataset 1
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
• Model accuracy: 86% on dataset 1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media) → iAbaylyi v2
• Model accuracy: 91% on dataset 1
Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium)
Gene        Status
ACIAD0001   NA
ACIAD0002   Essential
ACIAD0003   Dispensable
ACIAD0004   Essential
ACIAD0005   Dispensable
ACIAD0006   Dispensable
Model refinement using dataset 2
                                          iAbaylyi v2        iAbaylyi v3
Overall prediction accuracy               88%                94%
Correctly predicted essential genes       187 / 251 (75%)    217 / 251 (86%)
Correctly predicted dispensable genes     489 / 516 (95%)    495 / 505 (98%)
Corrected inconsistencies: GPR 26, NETWORK 11, BIOMASS 10
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
• Model accuracy: 86% on dataset 1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media) → iAbaylyi v2
• Model accuracy: 91% on dataset 1, 88% on dataset 2
Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium) → iAbaylyi v3
• Model accuracy: 91% on dataset 1, 94% on dataset 2
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
• Model accuracy: 86% on dataset 1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media) → iAbaylyi v2
• Model accuracy: 91% on dataset 1, 88% on dataset 2
Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium) → iAbaylyi v3
• Model accuracy: 91% on dataset 1, 94% on dataset 2
Dataset 3: growth phenotypes of A. baylyi mutant collection on 8 minimal media (2350 strains x 8 media), quantitative growth measure
Model refinement using dataset 3
                                                             iAbaylyi v3        iAbaylyi v4
Overall prediction accuracy                                  93%                94%
Correctly predicted gene phenotypes with ≥ 1 essentiality    16 / 36 (44%)      18 / 36 (50%)
Correctly predicted gene phenotypes with no essentiality     406 / 419 (97%)    408 / 416 (98%)
Corrected inconsistencies: GPR 8, NETWORK 1, BIOMASS 0
Iterative refinement of A. baylyi model
Initial reconstruction (genome annotation, pathway databases, literature) → iAbaylyi v1
• Model accuracy: 86% on dataset 1
Dataset 1: growth phenotypes of wild-type strain on 190 carbon sources (1 strain x 190 media) → iAbaylyi v2
• Model accuracy: 91% on dataset 1, 88% on dataset 2
Dataset 2: genome-wide gene essentialities from A. baylyi mutant collection construction (3093 strains x 1 medium) → iAbaylyi v3
• Model accuracy: 91% on dataset 1, 94% on dataset 2, 93% on dataset 3
Dataset 3: growth phenotypes of A. baylyi mutant collection on 8 minimal media (2350 strains x 8 media), quantitative growth measure → iAbaylyi v4
• Model accuracy: 91% on dataset 1, 94% on dataset 2, 94% on dataset 3
GPR correction example
• ACIAD0661 (hisG) and ACIAD1257 (hisZ) were initially assigned as isozymes of the ATP phosphoribosyltransferase reaction.
• The observed essentiality of both genes suggests that both are necessary for the activity.
• Further examination of the literature confirms that both proteins form an enzymatic complex (Sissler et al, PNAS 1999)
[Figure: pathway from PRPP through phosphoribosyl-ATP to histidine and protein; the ATP phosphoribosyltransferase step carries the GPR "ACIAD0661 OR ACIAD1257". Legend: essential gene or reaction, dispensable gene or reaction, biomass precursor]
GPR correction example
[Figure: the same pathway (PRPP, phosphoribosyl-ATP, histidine, protein) before and after correction; the GPR of ATP phosphoribosyltransferase changes from "ACIAD0661 OR ACIAD1257" to "ACIAD0661 AND ACIAD1257". Legend: essential gene or reaction, dispensable gene or reaction, biomass precursor]
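The effect of this correction on predictions can be checked with a few lines. The sketch assumes, for illustration only, that the ATP phosphoribosyltransferase reaction is the sole route to histidine, so a gene is predicted essential exactly when its deletion inactivates the reaction; the function names are hypothetical.

```python
from typing import Callable, Dict, Set

GENES = ("ACIAD0661", "ACIAD1257")  # hisG, hisZ


def isozyme_rule(present: Set[str]) -> bool:
    """Initial GPR: either gene product alone catalyzes the reaction."""
    return "ACIAD0661" in present or "ACIAD1257" in present


def complex_rule(present: Set[str]) -> bool:
    """Corrected GPR: both gene products are needed (enzymatic complex)."""
    return "ACIAD0661" in present and "ACIAD1257" in present


def predicted_essentiality(rule: Callable[[Set[str]], bool]) -> Dict[str, bool]:
    """Assumes the reaction is required for growth (only route to histidine)."""
    all_genes = set(GENES)
    return {g: not rule(all_genes - {g}) for g in GENES}


before = predicted_essentiality(isozyme_rule)
after = predicted_essentiality(complex_rule)
```

With the OR rule both genes are predicted dispensable, which contradicts the observed essentiality of both; with the AND rule both are predicted essential, matching the experiment.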
Network correction example
• ACIAD0822-0824 (gatABC) are annotated as an aspartyl/glutamyl-tRNA amidotransferase
• gatABC are essential: they are the only way to produce asparagine.
• ACIAD1920 (glnS) catalyzes direct charging of glutamine on its tRNA
• Essentiality of ACIAD1920 suggests that the gatABC pathway is not effective for glutamine
[Figure: three routes to protein. aspartate → aspartate-tRNA(asn) (ACIAD0609) → asparagine-tRNA(asn) (ACIAD0822 AND ACIAD0823 AND ACIAD0824); glutamate → glutamate-tRNA(gln) (ACIAD3371 OR ACIAD0272) → glutamine-tRNA(gln) (ACIAD0822 AND ACIAD0823 AND ACIAD0824); glutamine → glutamine-tRNA(gln) (ACIAD1920). Legend: essential gene or reaction, dispensable gene or reaction, biomass precursor]
Network correction example
[Figure: the same network before and after correction. After correction the route glutamate → glutamate-tRNA(gln) → glutamine-tRNA(gln) (ACIAD3371 OR ACIAD0272, then ACIAD0822 AND ACIAD0823 AND ACIAD0824) is removed; the routes aspartate → aspartate-tRNA(asn) → asparagine-tRNA(asn) (ACIAD0609, then ACIAD0822 AND ACIAD0823 AND ACIAD0824) and glutamine → glutamine-tRNA(gln) (ACIAD1920) remain. Legend: essential gene or reaction, dispensable gene or reaction, biomass precursor]
A. baylyi model refinement
Online prediction of mutant phenotypes (Le Fèvre et al, Bioinformatics 2009)
C/ Automated reasoning with metabolic models and essentiality data
Automated reasoning on gene-reaction associations (GPR)
• Use phenotypes as specifications for gene-reaction associations
• Assume NETWORK and BIOMASS parts of the model are correct
• For each inconsistency:
  • search all GPRs compatible with experimental data
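One way to picture "search all GPRs compatible with experimental data" is a brute-force enumeration over a small gene set. The sketch below is not the method of the thesis; it is an illustration under stated assumptions: the reaction is required for growth, a GPR is represented as a truth table over gene presence, only monotone rules are kept (adding a gene never switches a reaction off), and the wild type must have the reaction active. All names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

State = Tuple[bool, ...]          # presence/absence of each gene
TruthTable = Dict[State, bool]    # gene state -> reaction active?


def is_monotone(table: TruthTable) -> bool:
    """A rule is monotone if adding genes never inactivates the reaction."""
    return all(
        table[s] <= table[t]
        for s in table for t in table
        if all(a <= b for a, b in zip(s, t))
    )


def compatible_gprs(genes: Sequence[str],
                    essential: Dict[str, bool]) -> List[TruthTable]:
    """Enumerate monotone GPR truth tables matching single-deletion phenotypes.

    Assumption: the reaction is required for growth, so a gene is essential
    exactly when deleting it inactivates the reaction.
    """
    n = len(genes)
    states = list(product([False, True], repeat=n))
    wild_type = tuple([True] * n)
    results = []
    for outputs in product([False, True], repeat=len(states)):
        table = dict(zip(states, outputs))
        if not table[wild_type] or not is_monotone(table):
            continue
        consistent = True
        for i, gene in enumerate(genes):
            mutant = tuple(j != i for j in range(n))  # gene i deleted
            if table[mutant] == essential[gene]:
                consistent = False
                break
        if consistent:
            results.append(table)
    return results


both_essential = compatible_gprs(["gA", "gB"], {"gA": True, "gB": True})
both_dispensable = compatible_gprs(["gA", "gB"], {"gA": False, "gB": False})
```

With two genes both observed essential, the only compatible rule is "gA AND gB". With both dispensable, two rules remain: "gA OR gB" and the constant rule in which the reaction is active without either gene. The enumeration is exponential in the number of genes, so it only suits small gene sets.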