Using genomic-based information for the modelling of bacterial environments and lifestyle

Using genomic-based information for the modelling of bacterial environments and lifestyle ShiriFreilich EytanRuppin, RodedSharan School of Computer Sciences Tel Aviv University May 2009

Species evolve to adapt to their environment.

Phenotype Genotype Environment/lifestyle Can we use the genotype to predict the lifestyle of a species?

From genomic information to phenotypic (metabolic) information Gene Enzyme Genomic information Metabolic information

Genomic era: systematic construction of hundreds of metabolic networks Genomic information Metabolic information Hundreds of fully sequenced bacterial species

From metabolic information to environmental information Internal metabolite External metabolite Metabolic information Environmental information Predicted natural metabolic environments

D-GLUCOSE IS AN EXAMPLE OF AN EXTERNALMETABOLITE

New approaches allow reconstruction of species’ metabolic-environments Based on the network topology, identifying the set of compounds that are exogenously acquired External metabolite Internal metabolite From Borenstein et al, PNAS 2008

Constructing predicted environments across hundreds of species Metabolic information Environmental information Predicted metabolic environments

So what do we have and what is it good for? Species • Can we characterize the • lifestyle of a species based on • Genomic attributes? • How does the structure of the • metabolic network reflect • adaptation to species’ lifestyle? • Can we characterize ecological • strategies based on genomic • attributes? • Can we characterize ecological • communities based on genomic attributes? • Why should we do it? Metabolic networks Genomes ? Environments

First question Species • Can we characterize the • lifestyle of the species based on • Genomic attributes? • Can we predict, based on genomic • knowledge, whether a species • is a specialist or generalist? • Can we estimate the range of • environments it can inhabit? Metabolic networks Genomes ? Environments

Predicted metabolic environments And to be more specific – we should count in how many environments a species lives External/input metabolites Internal/essential metabolites

Genomic-base predicted diversity corresponds with ecological knowledge Available systematic estimates/information for environmental variability Genomic- based predicted environments NCBI annotations Fraction of reg. genes √ √ √ Specific examples : Pseudomonas aeruginosa x √ x Multiple High Desulfotalea psychrophila {Low} Specialized Beyond specific examples: strong correlation between metabolic-environment variability and established measures of environment variability

We now have an environmental model Environmental viability matrix Env 1 Env 2 Env N Spc 1 Spc 2 Information on species Spc N Viable Information on environment Not viable

Second question Species • How does the structure of the • metabolic network reflects • adaptation to species’ lifestyle? • Studying essentiality • of reactions across • hundreds of bacterial-species • across many simulated • growth-environments. Metabolic networks Genomes ? Environments

Essentiality of enzymes across species and environments Environment I: Environment II: Environment III: Predicted environments α β α β γ γ γ δ δ δ Enzymes √ √ √ ε ε ε ζ ζ ζ x x √ Predicted environments η η η x √ √ Essential biomass product √ √ √ External metabolite x x √ Intermediate product x √ √ Essential reaction Conditional-dependent reaction Accuracy: 0.86 (E. coli ) and 0.85 (B. subtilis) Backed-up reaction

Taking an enzymes’ point of view pathogens Non pathogenic-bacteria High-throughput identification of essential reactions across species  species-specific (or group-specific) essentiality looking for drug targets against reaction with a wide phylogenetic coverage. This approach can be applied for highlighting essentiality in groups of medical, ecological or agricultural interest, e.g., human pathogens versus human commensals. Essential reaction Enzymes Conditional-dependent reaction Backed-up reaction

One example: MCH: Methenyltetrahydrofolate cyclohydrolase FTL: Formyltetrahydrofolate synthetase Potential drug target Initiation of protein synthesis fMet-tRNA Most commensals Few pathogens Most commensals Most pathogens THF 5,10-MethenylTHF 10-Formyl-THF MCH FTL Purine synthesis

Taking a species point of view: estimating robustness of metabolic network The fraction of reaction across species Mean ~0.75 E.coli: 0.78 (0.83 backed-up genes) M. genitalium: 0.35 (0.2-0.45 backed-up genes) α β γ Fraction δ ε ζ η 1/7 Essential reaction 2/7Conditional-dependent reaction Description of network-robustness Environmental diversity 4/7 Backed-up reaction

genetic robustness • What is it? The ability of a biological system to continue functioning following mutations • How robust are biological systems? Under laboratory conditions most genes are dispensable; dispensability depends on the experimental setting. • How can we explain robustness in evolutionary terms? The origin of robustness is under debate: • Direct selection in favor of resistance to mutations • By product of the selection for other traits (e.g., increasing steady-state metabolic fluxes) • Genetic robustness reflects environmental robustness

Network robustness: environmental-dependent and independent components The fraction of reaction across species How can we explain the environmentally-independent component? Correlation: 0.1 Fraction environmentally-dependent component component is strongly associated with environmental diversity (rho=0.8) and responsible for the robustness of no more than 20% of metabolic reactions over all species and environments modeled. Correlation: 0.8 Environmental diversity

environmentally-independent robustness is associated with the metabolic capacities How can we explain the environmentally-independent component? The environmentally-independent component is associated (correlation=~0.6) with the metabolic capacities of a species -- higher robustness is observed in fast-growers or in organisms with an extensive production of secondary-metabolites. Observed growth rate (log) Prediction for growth rate (log), based on network robustness

Second question Species • How does the structure of the • metabolic network reflect • adaptation to species’ lifestyle? • The design of metabolic networks • represents a species-specific • adaptation to both its needs and its • environment. Metabolic networks Genomes ? Environments

Third question Species • The structure of the • metabolic network reflects • adaptation to species’ lifestyle. • Can we characterize complex • ecological attributes based • on genomic attributes? • Can we predict the level of • competition a species encounters • in its natural environments and • its rate of growth? Metabolic networks Genomes ? Environments

Once again – our environmental model Environmental viability matrix Env 1 Env 2 Env N Spc 1 Spc 2 Information on species Spc N Viable Information on environment Not viable

Applying the environmental-model for the characterization of ecological attributes – competition Environmental Viability Matrix Env 1 Env 2 Env 3 Env 4 Spc 1 Spc 2 Spc 3 Spc 4 Co-Habitation vector Max-CHS Viable 3 {1,3,2} Spc 1 Mean level of population Not viable Population of environments are in agreement with ecological knowledge Spc 2 {3} 3 Spc 3 {3,2} 3 Spc 4 {1} 1 Environments populated by bacteria of an annotated lifestyle

Delineating ecological strategies for rate of growth: Ecological diversity with intense co-inhabitation, associated with a typical fast rate of growth. Maximal co-habitation A specialized niche with little co-inhabitation, associated with a typical slow rate of growth Environments diversity

Third question Species • Can we characterize ecological • attributes based on genomic • attributes? • The patterns observed suggests • a universal principle where • metabolic flexibility is associated • with a need to grow fast, possibly • in the face of competition. • Beyond specific examples, • the interplay between the environmental • diversity – and maximal co habitation • allows training a predictor for growth • rate (ROC score of 0.75 ) Metabolic networks Genomes ? Environments

Fourth question Species • Can we characterize ecological • Communities based on genomic • attributes? • Characterization of pair-wise • relationship between bacterial • species to identify competition • and cooperation Metabolic networks Genomes ? Environments

Species do not live in a vacuum

Why should we model communities? • The composition of bacterial communities is a major factor in human health. • Variations in the identity and abundance of species affect its metabolic potential and hence have important medical implications. • Computational approaches can now be applied for the modeling of bacterial interactions. • The ultimate goal is to be able to manipulate bacterial communities to our advantage.

First step: modeling pair-wise interactions Environmental viability matrix New type of data: Pairwise interactions data Env 1 Env 2 Env N Spc1 Spc2 Spc N Spc 1 Spc 2 Spc 1 Spc 2 Spc N Spc N Interaction (competition/ cooperation) Viable Not viable No interaction

Lack of systematic knowledge of pair-wise interactions Modeling pairwise interactions Modelling lifestyle Spc1 Spc2 Spc N Env 1 Env 2 Env N Spc 1 3 Spc 1 Spc 2 Spc 2 1 Spc N Spc N 2 Systematic knowledge for species-specific lifestyle Systematic knowledge for pairwise interactions Metagenomic data? Mean level of population Environments populated by bacteria of an annotated lifestyle

What are metagenomic data?

Identifying “our” species in environmental samples Species represented by 16s rRNA Environmental samples Spc 1 Spc 3 Gut BLAST Marine Spc 2 Soil

Constructing experimentally-driven pairwise interactions data Environmentally-driven Database of interactions Environmental samples Gut Marine PM3 Spc 1 Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6 Spc 2 Spc 1 Spc 3 Spc 2 Spc 4 Spc 3 Spc 5 Spc 4 Spc 6 Spc 5 Spc 6 -134 species (including 47 species in the gut and 81 marine species) -~1200 interactions (limited to interactions within the gut and between marine species)

Pubmed is a large and comprehensive data source Müller & Mancuso, Plos ONE, 2008 Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. Co-occurrence between genes and proteins was shown to reveal functional interactions.

Constructing literature-driven pairwise interactions data Co-occurrence based Database of interactions Papers in Pubmed PM1 PM2 PM3 Spc 1 Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6 Spc 2 Spc 1 Spc 3 Spc 2 Spc 4 Spc 3 Spc 5 Spc 4 Spc 6 Spc 5 Spc 6 -All species are covered by the database (~400 species) -~6000 interactions (cut-offs vary in dependence with the statistical approach taken)

Co-occurrence based interactions data are in agreement with ecological properties Spc1 Spc2 Spc 3 2 Spc 1 Number of co-associated partners Spc 2 1 0 Spec. Obli. Aqua. Hoas. Mult. Soil Spc 3 Strong correlation with systematic annotations of ecological diversity Pairwise interaction data Significant enrichment in experimentally-based interactions

Pairwise interactions of gut bacteria Typical gut bacteria Potential gut bacteria Potential human-associated bacteria Other

Thanks Anat Kreimer Uri Gophna Roded Sharan Eytan Ruppin Nir Yosef Isaac Meilijson Elhanan Borenstein Moshe Mevarech

Using genomic-based information for the modelling of bacterial environments and lifestyle

Using genomic-based information for the modelling of bacterial environments and lifestyle

Presentation Transcript

How to access genomic information using Ensembl

Modelling Clinical Information Using UML

Model-based investigation of bacterial metabolism using gene essentiality data.

Potential for using agent based modelling in government

Using Flowchart-based Programming Environments for Simplifying Programming and Software Engineering Processes

Constraint-based modelling of bacterial metabolic networks – where are we in 2011?

Characterizing SNPs Using Genomic Information

Using BLAST for Genomic Sequence Annotation

Model-based investigation of bacterial metabolism using gene essentiality data.

Using genomic information to understand Leishmania biology

How to access genomic information using Ensembl

Using the SHE Method for UML-based Performance Modelling

Data-Based Modelling for Control

e-Science Tools For The Genomic Scale Characterisation Of Bacterial Secreted Proteins

Managing Economic and Demographic Information for Modelling

Using SuperVISE for process modelling

Activity 3.5—Bacterial environments

Plant based lifestyle

Using BLAST for Genomic Sequence Annotation

Data-Based Modelling for Control

Biological (genomic) information

How to access genomic information using Ensembl