410 likes | 518 Views
Using genomic-based information for the modelling of bacterial environments and lifestyle . Shiri Freilich Eytan Ruppin , Roded Sharan School of Computer Sciences Tel Aviv University May 2009. Species evolve to adapt to their environment . . Phenotype. Genotype.
E N D
Using genomic-based information for the modelling of bacterial environments and lifestyle ShiriFreilich EytanRuppin, RodedSharan School of Computer Sciences Tel Aviv University May 2009
Species evolve to adapt to their environment.
Phenotype Genotype Environment/lifestyle Can we use the genotype to predict the lifestyle of a species?
From genomic information to phenotypic (metabolic) information Gene Enzyme Genomic information Metabolic information
Genomic era: systematic construction of hundreds of metabolic networks Genomic information Metabolic information Hundreds of fully sequenced bacterial species
From metabolic information to environmental information Internal metabolite External metabolite Metabolic information Environmental information Predicted natural metabolic environments
New approaches allow reconstruction of species’ metabolic-environments Based on the network topology, identifying the set of compounds that are exogenously acquired External metabolite Internal metabolite From Borenstein et al, PNAS 2008
Constructing predicted environments across hundreds of species Metabolic information Environmental information Predicted metabolic environments
So what do we have and what is it good for? Species • Can we characterize the • lifestyle of a species based on • Genomic attributes? • How does the structure of the • metabolic network reflect • adaptation to species’ lifestyle? • Can we characterize ecological • strategies based on genomic • attributes? • Can we characterize ecological • communities based on genomic attributes? • Why should we do it? Metabolic networks Genomes ? Environments
First question Species • Can we characterize the • lifestyle of the species based on • Genomic attributes? • Can we predict, based on genomic • knowledge, whether a species • is a specialist or generalist? • Can we estimate the range of • environments it can inhabit? Metabolic networks Genomes ? Environments
Predicted metabolic environments And to be more specific – we should count in how many environments a species lives External/input metabolites Internal/essential metabolites
Genomic-base predicted diversity corresponds with ecological knowledge Available systematic estimates/information for environmental variability Genomic- based predicted environments NCBI annotations Fraction of reg. genes √ √ √ Specific examples : Pseudomonas aeruginosa x √ x Multiple High Desulfotalea psychrophila {Low} Specialized Beyond specific examples: strong correlation between metabolic-environment variability and established measures of environment variability
We now have an environmental model Environmental viability matrix Env 1 Env 2 Env N Spc 1 Spc 2 Information on species Spc N Viable Information on environment Not viable
Second question Species • How does the structure of the • metabolic network reflects • adaptation to species’ lifestyle? • Studying essentiality • of reactions across • hundreds of bacterial-species • across many simulated • growth-environments. Metabolic networks Genomes ? Environments
Essentiality of enzymes across species and environments Environment I: Environment II: Environment III: Predicted environments α β α β γ γ γ δ δ δ Enzymes √ √ √ ε ε ε ζ ζ ζ x x √ Predicted environments η η η x √ √ Essential biomass product √ √ √ External metabolite x x √ Intermediate product x √ √ Essential reaction Conditional-dependent reaction Accuracy: 0.86 (E. coli ) and 0.85 (B. subtilis) Backed-up reaction
Taking an enzymes’ point of view pathogens Non pathogenic-bacteria High-throughput identification of essential reactions across species species-specific (or group-specific) essentiality looking for drug targets against reaction with a wide phylogenetic coverage. This approach can be applied for highlighting essentiality in groups of medical, ecological or agricultural interest, e.g., human pathogens versus human commensals. Essential reaction Enzymes Conditional-dependent reaction Backed-up reaction
One example: MCH: Methenyltetrahydrofolate cyclohydrolase FTL: Formyltetrahydrofolate synthetase Potential drug target Initiation of protein synthesis fMet-tRNA Most commensals Few pathogens Most commensals Most pathogens THF 5,10-MethenylTHF 10-Formyl-THF MCH FTL Purine synthesis
Taking a species point of view: estimating robustness of metabolic network The fraction of reaction across species Mean ~0.75 E.coli: 0.78 (0.83 backed-up genes) M. genitalium: 0.35 (0.2-0.45 backed-up genes) α β γ Fraction δ ε ζ η 1/7 Essential reaction 2/7Conditional-dependent reaction Description of network-robustness Environmental diversity 4/7 Backed-up reaction
genetic robustness • What is it? The ability of a biological system to continue functioning following mutations • How robust are biological systems? Under laboratory conditions most genes are dispensable; dispensability depends on the experimental setting. • How can we explain robustness in evolutionary terms? The origin of robustness is under debate: • Direct selection in favor of resistance to mutations • By product of the selection for other traits (e.g., increasing steady-state metabolic fluxes) • Genetic robustness reflects environmental robustness
Network robustness: environmental-dependent and independent components The fraction of reaction across species How can we explain the environmentally-independent component? Correlation: 0.1 Fraction environmentally-dependent component component is strongly associated with environmental diversity (rho=0.8) and responsible for the robustness of no more than 20% of metabolic reactions over all species and environments modeled. Correlation: 0.8 Environmental diversity
environmentally-independent robustness is associated with the metabolic capacities How can we explain the environmentally-independent component? The environmentally-independent component is associated (correlation=~0.6) with the metabolic capacities of a species -- higher robustness is observed in fast-growers or in organisms with an extensive production of secondary-metabolites. Observed growth rate (log) Prediction for growth rate (log), based on network robustness
Second question Species • How does the structure of the • metabolic network reflect • adaptation to species’ lifestyle? • The design of metabolic networks • represents a species-specific • adaptation to both its needs and its • environment. Metabolic networks Genomes ? Environments
Third question Species • The structure of the • metabolic network reflects • adaptation to species’ lifestyle. • Can we characterize complex • ecological attributes based • on genomic attributes? • Can we predict the level of • competition a species encounters • in its natural environments and • its rate of growth? Metabolic networks Genomes ? Environments
Once again – our environmental model Environmental viability matrix Env 1 Env 2 Env N Spc 1 Spc 2 Information on species Spc N Viable Information on environment Not viable
Applying the environmental-model for the characterization of ecological attributes – competition Environmental Viability Matrix Env 1 Env 2 Env 3 Env 4 Spc 1 Spc 2 Spc 3 Spc 4 Co-Habitation vector Max-CHS Viable 3 {1,3,2} Spc 1 Mean level of population Not viable Population of environments are in agreement with ecological knowledge Spc 2 {3} 3 Spc 3 {3,2} 3 Spc 4 {1} 1 Environments populated by bacteria of an annotated lifestyle
Delineating ecological strategies for rate of growth: Ecological diversity with intense co-inhabitation, associated with a typical fast rate of growth. Maximal co-habitation A specialized niche with little co-inhabitation, associated with a typical slow rate of growth Environments diversity
Third question Species • Can we characterize ecological • attributes based on genomic • attributes? • The patterns observed suggests • a universal principle where • metabolic flexibility is associated • with a need to grow fast, possibly • in the face of competition. • Beyond specific examples, • the interplay between the environmental • diversity – and maximal co habitation • allows training a predictor for growth • rate (ROC score of 0.75 ) Metabolic networks Genomes ? Environments
Fourth question Species • Can we characterize ecological • Communities based on genomic • attributes? • Characterization of pair-wise • relationship between bacterial • species to identify competition • and cooperation Metabolic networks Genomes ? Environments
Species do not live in a vacuum
Why should we model communities? • The composition of bacterial communities is a major factor in human health. • Variations in the identity and abundance of species affect its metabolic potential and hence have important medical implications. • Computational approaches can now be applied for the modeling of bacterial interactions. • The ultimate goal is to be able to manipulate bacterial communities to our advantage.
First step: modeling pair-wise interactions Environmental viability matrix New type of data: Pairwise interactions data Env 1 Env 2 Env N Spc1 Spc2 Spc N Spc 1 Spc 2 Spc 1 Spc 2 Spc N Spc N Interaction (competition/ cooperation) Viable Not viable No interaction
Lack of systematic knowledge of pair-wise interactions Modeling pairwise interactions Modelling lifestyle Spc1 Spc2 Spc N Env 1 Env 2 Env N Spc 1 3 Spc 1 Spc 2 Spc 2 1 Spc N Spc N 2 Systematic knowledge for species-specific lifestyle Systematic knowledge for pairwise interactions Metagenomic data? Mean level of population Environments populated by bacteria of an annotated lifestyle
Identifying “our” species in environmental samples Species represented by 16s rRNA Environmental samples Spc 1 Spc 3 Gut BLAST Marine Spc 2 Soil
Constructing experimentally-driven pairwise interactions data Environmentally-driven Database of interactions Environmental samples Gut Marine PM3 Spc 1 Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6 Spc 2 Spc 1 Spc 3 Spc 2 Spc 4 Spc 3 Spc 5 Spc 4 Spc 6 Spc 5 Spc 6 -134 species (including 47 species in the gut and 81 marine species) -~1200 interactions (limited to interactions within the gut and between marine species)
Pubmed is a large and comprehensive data source Müller & Mancuso, Plos ONE, 2008 Co-occurrence analysis is a technique often applied in text mining, comparative genomics, and promoter analysis. Co-occurrence between genes and proteins was shown to reveal functional interactions.
Constructing literature-driven pairwise interactions data Co-occurrence based Database of interactions Papers in Pubmed PM1 PM2 PM3 Spc 1 Spc 1 Spc 2 Spc 3 Spc 4 Spc 5 Spc 6 Spc 2 Spc 1 Spc 3 Spc 2 Spc 4 Spc 3 Spc 5 Spc 4 Spc 6 Spc 5 Spc 6 -All species are covered by the database (~400 species) -~6000 interactions (cut-offs vary in dependence with the statistical approach taken)
Co-occurrence based interactions data are in agreement with ecological properties Spc1 Spc2 Spc 3 2 Spc 1 Number of co-associated partners Spc 2 1 0 Spec. Obli. Aqua. Hoas. Mult. Soil Spc 3 Strong correlation with systematic annotations of ecological diversity Pairwise interaction data Significant enrichment in experimentally-based interactions
Pairwise interactions of gut bacteria Typical gut bacteria Potential gut bacteria Potential human-associated bacteria Other
Thanks Anat Kreimer Uri Gophna Roded Sharan Eytan Ruppin Nir Yosef Isaac Meilijson Elhanan Borenstein Moshe Mevarech