CS 502, Vimalkumar Patel, April 10th, 2014.

Hierarchical Organization of Modularity in Metabolic Networks • Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L.

Fun facts • Chocolate has an anti-bacterial effect on the mouth and protects against tooth decay. • The "smell of rain" is caused by bacteria called actinomycetes. • before we were and after we are gone. • You can't block Mark Zuckerberg on Facebook.

Organization of this talk • Overview • Research, experiments and results • Conclusion • Questions

Overview • Problem definition • Why? • Terminology

The Problem • We want to give quantitative support to the intuition that a microorganism's large network of metabolic reactions has • a generic structure • modularity • and that this modularity overlaps with known metabolic functions

Motivation • Cellular organisms consist of multiple cellular components. • Several cellular components together make up functional modules that carry out discrete functions. • Functional modules are considered fundamental building blocks of cellular organization. • The presence of functional modules in highly integrated biochemical networks lacks quantitative support.

Terminology • Average path length • Find the shortest path between every pair of nodes, add the lengths up, and divide by the total number of pairs. • Diameter of a network • The longest of all the calculated shortest paths in a network. • A measure of the linear size of a network.
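The two definitions above can be sketched as follows. This is a minimal illustration, assuming an unweighted, undirected, connected graph stored as an adjacency dictionary; the helper names `bfs_distances` and `path_stats` are made up for this example and do not come from the talk.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(adj):
    """Return (average path length, diameter) of a connected undirected graph."""
    nodes = sorted(adj)
    lengths = []
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        # Count each unordered pair once.
        lengths.extend(dist[v] for v in nodes[i + 1:])
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical toy graph: a chain 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
average, diameter = path_stats(chain)
print(average, diameter)  # six pairs with lengths 1, 2, 3, 1, 2, 1
```

For the chain the pair lengths sum to 10 over 6 pairs, so the average path length is 10/6 and the diameter is 3.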

Terminology (Cont.) • Clustering coefficient • A measure of the degree to which nodes in a graph tend to cluster together • The clustering coefficient of node i is given by C_i = 2 e_i / (k_i (k_i − 1)) • e_i: number of connections between the neighbors of node i • k_i: number of neighbors of node i • Modules • A module is a discrete entity made of several elementary components that performs an identifiable task, separable from the functions of other modules. • Examples: ribosomes and flagella
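A small sketch of the node-level clustering coefficient, assuming the standard definition C_i = 2 e_i / (k_i (k_i − 1)) and an undirected graph stored as a dictionary of neighbor sets. The function name and the toy graph are illustrative only.

```python
def clustering_coefficient(adj, i):
    """C_i = 2 * e_i / (k_i * (k_i - 1)); defined as 0.0 when k_i < 2."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # e_i: number of links that run between two neighbors of i.
    e = sum(1 for a in neighbors for b in neighbors if a < b and b in adj[a])
    return 2.0 * e / (k * (k - 1))

# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
for node in graph:
    print(node, clustering_coefficient(graph, node))
```

Node 0 has three neighbors with one link among them, so C_0 = 2·1 / (3·2) = 1/3, while nodes 1 and 2 sit in a closed triangle and get C = 1.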

Terminology (Cont.) • Average Linkage Clustering • A method of calculating the distance between clusters in hierarchical cluster analysis • The distance between two clusters is the average distance between objects from the first cluster and objects from the second cluster: d(X, Y) = (1 / (N_x N_y)) Σ_{x∈X} Σ_{y∈Y} d(x, y) • X, Y: clusters • d(x, y): distance between objects x and y • N_x and N_y: number of objects in clusters X and Y
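The average-linkage distance can be sketched in a few lines, assuming the usual form d(X, Y) = (1 / (N_x N_y)) Σ Σ d(x, y). The function name, the one-dimensional points, and the absolute-difference distance are assumptions chosen for illustration.

```python
def average_linkage(cluster_x, cluster_y, dist):
    """Mean of dist(x, y) over all pairs with x in cluster_x and y in cluster_y."""
    total = sum(dist(x, y) for x in cluster_x for y in cluster_y)
    return total / (len(cluster_x) * len(cluster_y))

# Hypothetical 1-D example with absolute difference as the object distance.
X = [0, 2]
Y = [5, 7]
print(average_linkage(X, Y, lambda a, b: abs(a - b)))  # (5 + 7 + 3 + 5) / 4 = 5.0
```

In hierarchical clustering this quantity is recomputed after every merge, and the two clusters with the smallest average distance are joined next.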

Terminology (Cont.) • Connectivity distribution P(k) • P(k): probability that a substrate has k links • A substrate generated by a biochemical reaction is a product and has incoming links • k_in: number of incoming links on a substrate • Substrates that participate as educts in a reaction have outgoing links • k_out: number of outgoing links on a substrate • P(k_in) = (number of substrates with k_in incoming links) / (total number of substrates)
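As a hedged sketch, the connectivity distribution can be estimated by counting how many substrates have each degree and dividing by the total number of substrates. The function name, the substrate labels, and the edge list below are invented for the example and are not data from the talk.

```python
from collections import Counter

def degree_distribution(nodes, edges, direction="in"):
    """Fraction of nodes with k incoming (or outgoing) links, keyed by k."""
    degree = {n: 0 for n in nodes}
    for src, dst in edges:
        degree[dst if direction == "in" else src] += 1
    histogram = Counter(degree.values())
    return {k: histogram[k] / len(nodes) for k in sorted(histogram)}

# Hypothetical directed graph: an edge (educt, product) points at the product.
substrates = ["A", "B", "C", "D"]
links = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
print(degree_distribution(substrates, links, "in"))   # {0: 0.5, 1: 0.25, 3: 0.25}
print(degree_distribution(substrates, links, "out"))  # {0: 0.25, 1: 0.5, 2: 0.25}
```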

Terminology (Cont.) • Random Networks • Erdős–Rényi (ER) model • N nodes; each pair of nodes is connected with probability p, giving randomly placed links • Node degrees follow a Poisson distribution: most nodes have almost the same number of links • The clustering coefficient C(k) is independent of a node's degree • The mean path length is proportional to log N
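A minimal sketch of the ER construction, assuming the G(N, p) variant in which every pair of nodes is linked independently with probability p. The function name and parameter values are illustrative.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Undirected random graph: each of the n*(n-1)/2 pairs is linked with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

# Hypothetical parameters; the expected mean degree is p * (n - 1).
graph = erdos_renyi(200, 0.05, seed=1)
mean_degree = sum(len(nbrs) for nbrs in graph.values()) / len(graph)
print(mean_degree)
```

Because every link is placed with the same probability, degrees cluster around the mean and no hubs emerge.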

Terminology (Cont.) • Scale-Free Networks • Barabási–Albert (BA) model • Characterized by a power-law degree distribution: P(k) ~ k^(−γ) • At each step a node with M links is added to the network; each link connects to an already existing node I with probability Π_I = k_I / Σ_J k_J • The probability that a node is highly connected is statistically more significant than in a random network; such highly connected nodes are called hubs • Average path length follows ℓ ~ log log N
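Preferential attachment can be sketched as below. This is one simple way to sample with probability proportional to degree, assuming a fully linked starting core of m + 1 nodes; the core choice, the function name, and the parameters are assumptions for illustration, not details given in the talk.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a fully linked core of m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Each node appears in `stubs` once per link it has, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj

graph = barabasi_albert(1000, 2, seed=42)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few large hubs versus many low-degree nodes
```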

Terminology (Cont.), last one • Hierarchical Networks • Clusters combine in an iterative manner, generating a hierarchical network • Integrates a scale-free topology with an inherent modular structure • Power-law degree distribution with degree exponent γ = 1 + ln 4 / ln 3 ≈ 2.26 • System-size independent average clustering coefficient ⟨C⟩ ≈ 0.6 • Scaling of the clustering coefficient, which follows C(k) ~ k^(−1)
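The quoted degree exponent is simple arithmetic and can be checked directly. The variable name `gamma` is just a label for this example.

```python
import math

# Degree exponent quoted for the hierarchical model: 1 + ln 4 / ln 3.
gamma = 1 + math.log(4) / math.log(3)
print(round(gamma, 2))  # 2.26
```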

Research: Let's move forward! • Uncover the fundamental design underlying the functions and processes in all 43 microorganisms • The robustness of cellular processes is rooted in the dynamic interactions among their constituents • The network of constituents is complex and provides limited insight into the organization • Understanding cellular networks → understanding of the dynamical processes responsible for these networks

Preparing the data • 43 different organisms, based on the WIT database • This integrated pathway-genome database predicts the existence of a given metabolic pathway • Metabolic networks were processed to generate a substrate participation graph • Both the incoming and the outgoing links of the network were processed • 18 of the 43 genomes in the database are not yet fully sequenced, so the database is used as an approximation of the metabolic pathways.

What do the networks look like? • A portion of the WIT database for E. coli. • Nodes of the graph: substrates • Links: temporary complexes (black boxes) from which the products emerge as new nodes (substrates). • The enzymes, which provide the catalytic scaffolds for the reactions, are shown by their EC numbers.

Construction of metabolic network matrices • For each organism with • N substrates • E enzymes • R intermediate complexes • the full stoichiometric interactions were compiled into an (N+E+R) × (N+E+R) matrix, generated separately for each of the 43 organisms.

Connectivity distributions P(k) for substrates • Figure b. • Connectivity distribution for E. coli (incoming and outgoing) • Figure d. • The connectivity distribution averaged over all 43 organisms.

Metabolic network & scale-free network • P(k) for these organisms follows a power law • Just like scale-free networks • So it means that metabolic networks are similar to scale-free networks, right?

But doesn't E. coli have a modular topology? • The E. coli metabolic network is more clustered than a random graph of similar size, suggesting modularity in the graph • The high clustering coefficient of the graph is due to local interactions within metabolic pathways

The rest of the organisms • The average clustering coefficient C(N) for the 43 organisms. • Archaea (purple), bacteria (green), eukaryotes (blue). • Dashed line: dependence of the clustering coefficient on the network size for a module-free scale-free network • Diamonds denote C for a scale-free network with the same parameters (N and number of links) as observed in the 43 organisms

Conflict! • Fundamental conflict between • Predictions of the current models of metabolic organization • Power-law degree distribution of all metabolic networks. • Knowledge of the existence of modularity in networks • High, size-independent clustering coefficient.

Heuristic model of metabolic organization • "Hierarchical" network • Start with a hypothetical module: a small cluster of four densely linked nodes. • Generate three replicas of this hypothetical module. • Connect the three external nodes of each replicated cluster to the central node of the old cluster, obtaining a larger 16-node module. • Repeat the above step with the new 16-node module: generate three replicas and connect their peripheral nodes to the central node of the old module.
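The replication steps above can be sketched as follows. This is an illustrative reading of the construction, assuming that the "peripheral" nodes of a module are the external nodes inside its replicas (3 for the base module, 9 for the 16-node module); the function name and return layout are made up for the example.

```python
def hierarchical_network(iterations):
    """Return (node count, edge set, central node, peripheral nodes)."""
    # Base module: four fully linked nodes; node 0 is central, 1 to 3 are external.
    n = 4
    edges = {(a, b) for a in range(4) for b in range(a + 1, 4)}
    center, peripheral = 0, [1, 2, 3]
    for _ in range(iterations):
        new_edges = set(edges)
        new_peripheral = []
        for replica in (1, 2, 3):
            offset = replica * n
            # Copy the current module by shifting all node labels.
            new_edges |= {(a + offset, b + offset) for a, b in edges}
            # Link the replica's peripheral nodes to the original central node.
            for p in peripheral:
                new_edges.add((center, p + offset))
                new_peripheral.append(p + offset)
        edges, peripheral, n = new_edges, new_peripheral, 4 * n
    return n, edges, center, peripheral

for step in range(3):
    size, links, hub, outer = hierarchical_network(step)
    hub_degree = sum(1 for a, b in links if hub in (a, b))
    print(size, len(links), hub_degree)  # 4 6 3, then 16 33 12, then 64 159 39
```

Each iteration quadruples the number of nodes, and the central node gains one link per peripheral node in the new replicas, which is how a hub emerges.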

Hierarchical model • Combination of • hierarchy and • scale-free topology. • Follows a power-law degree distribution • Degree exponent • Hypothetical model: γ = 2.26 • Metabolic network: γ = 2.2 • Clustering coefficient • C = 0.6, comparable with metabolic networks. • Independent of the size of the network.

Unique feature of the hierarchical model • Hierarchical architecture • Evident from the repeated quadrupling of the system. • The clustering coefficient scales with the inverse of the degree. • Low-degree nodes are embedded in tight small clusters. • Larger hubs connect modules of increasing size. • C(k) ~ k^(−1)

Key issues • From a biological perspective • Does the identified hierarchical architecture reflect the true functional organization of cellular metabolism? • We turn to E. coli, our answer to all the problems! • Its metabolic network is extensively studied, both biochemically and genetically.

Working with E. coli's metabolic organization • Three-step reduction process • Step 1: • Replace non-branching pathways with equivalent links, reducing complexity without altering network topology. • Step 2: • Calculate the topological overlap matrix O_T(i, j). • Step 3: • Apply the average-linkage hierarchical clustering algorithm.

Topological overlap matrix • A topological overlap of 1 between substrates i and j implies that they are connected to the same substrates • A value of 0 indicates that i and j do not share links to common substrates. • Larger overlap between two substrates within the E. coli metabolic network → more likely they belong to the same functional class • The topological overlap matrix encodes the comprehensive functional relatedness of the substrates.
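The slide does not spell out the overlap formula. The sketch below assumes a commonly used form, O_T(i, j) = J(i, j) / min(k_i, k_j), where J(i, j) counts the neighbors shared by i and j, plus 1 if i and j are directly linked. Treat both the formula and the names as assumptions for illustration; the toy graph is not E. coli data.

```python
def topological_overlap(adj, i, j):
    """Assumed form: (shared neighbors + 1 if i-j linked) / min(k_i, k_j)."""
    smallest = min(len(adj[i]), len(adj[j]))
    if smallest == 0:
        return 0.0
    shared = len(adj[i] & adj[j])
    direct = 1 if j in adj[i] else 0
    return (shared + direct) / smallest

# Hypothetical undirected toy graph stored as neighbor sets.
graph = {
    "i": {"a", "b"},
    "j": {"a", "b"},
    "a": {"i", "j", "c"},
    "b": {"i", "j"},
    "c": {"a"},
}
print(topological_overlap(graph, "i", "j"))  # 1.0: i and j link to the same substrates
print(topological_overlap(graph, "b", "c"))  # 0.0: no shared neighbors, no direct link
```

Evaluating this for every pair of substrates gives the overlap matrix, which can then be turned into distances (for example 1 − O_T) and passed to average-linkage clustering.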

Proof of concept: testing the approach • A small test network was chosen • 3 modules, color-coded in the adjoining network (A) • Applying the above steps easily identifies the distinct modules in the resulting topological overlap matrix (B) • Rows and columns of the matrix were reordered • The color code denotes the degree of topological overlap between the nodes. • The tree reflects the modules of the test network in (A)

Application to E. coli • Overlap matrix

Results • Based on small-molecule biochemistry, the branches of the tree were color-coded according to the predominant biochemical class of the substrates they produce • Strong correlation between • the global topological organization • and the biochemical classification of metabolites.

Shortcoming in results • Focused study of pyrimidine metabolism in E. coli • These pathways were divided into four well-studied and well-delimited areas of E. coli metabolism • Observed: predicted module boundaries do not always overlap with intuitive "biochemistry-based" boundaries. • The synthesis of uridine 5'-monophosphate (UMP) from L-glutamine is expected to fall within a single module based on the biochemical reactions involved. • The synthesis of uridine 5'-diphosphate from UMP, however, leaps across predicted module boundaries.

Conclusion • The system-level structure of cellular metabolism is best approximated by a hierarchical network with embedded modularity. • The metabolic network has an inherent self-similar property • Highly integrated small modules group into a few larger modules, which further group into even larger modules • Non-biological networks are also candidates for hierarchical modularity. • Ex.: the Internet → combines scale-free topology with community (modular) structure

Future work/open questions • How can we apply the modularity of metabolic networks to medical applications? • Think about diabetes. • Manipulate a well-known module in human cells, mainly pancreatic/liver/kidney, to produce the required insulin • Can sub-parts of these networks be used to recognize the existence of other networks, or of the same networks in other organisms? • Think about flagella and pili • Figure out evolutionary paths based on metabolic network hierarchy

Questions?

References • Network biology: understanding the cell's functional organization • H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, A.-L. Barabási, Nature 407, 651 (2000). • A. Wagner, D. A. Fell, Proc. R. Soc. London Ser. B 268, 1803 (2001).
