Analysis of biological networks Part III Shalev Itzkovitz Uri Alon’s group

Analysis of biological networks Part III Shalev Itzkovitz Uri Alon’s group July 2005

What is a suitable random ensemble? Reminder, network motifs definition: subgraphs which occur many times in the network, significantly more often than in a suitable random ensemble.

Types of random ensembles. Erdos networks: for a given network with N nodes and E edges, define p = E/N^2, the probability that any one of the N^2 possible directed edges exists. Erdos & Renyi, 1960
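A minimal sketch of how such an Erdos network could be generated, using the slide's definition p = E/N^2. The function name `erdos_network` and the `seed` parameter are illustrative assumptions; self-loops are allowed here only because the slide counts all N^2 ordered pairs.

```python
import random


def erdos_network(n_nodes, n_edges, seed=None):
    """Directed Erdos network with edge probability p = E / N^2.

    Assumption: every ordered pair (i, j), self-loops included, is one of
    the N^2 possible directed edges, as in the definition above.
    """
    rng = random.Random(seed)
    p = n_edges / n_nodes ** 2
    edges = {
        (i, j)
        for i in range(n_nodes)
        for j in range(n_nodes)
        if rng.random() < p
    }
    return edges, p


if __name__ == "__main__":
    edges, p = erdos_network(8, 8, seed=0)
    print("p =", p, "realized edges:", len(edges))
```

The realized number of edges fluctuates around E from sample to sample; only its mean is fixed.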

UMAN ensemble: a canonical version. All networks have the same numbers of Mutual, Antisymmetric and Null edges as the real network, Uniformly distributed. Used in sociology; analytically solvable for subgraph distributions. [Figure: an antisymmetric edge and a mutual edge] Holland & Leinhardt, American Journal of Sociology 1970
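The three quantities that the UMAN ensemble holds fixed can be counted directly from an edge list. This is a sketch only; `dyad_census` is a hypothetical helper name, and self-loops are ignored by assumption.

```python
def dyad_census(n_nodes, edges):
    """Return (mutual, antisymmetric, null) pair counts of a directed network.

    Assumption: self-loops are ignored and each unordered node pair is
    classified exactly once.
    """
    edge_set = {(i, j) for i, j in edges if i != j}
    # A mutual pair has edges in both directions; count it once via i < j.
    mutual = sum(1 for (i, j) in edge_set if i < j and (j, i) in edge_set)
    # An antisymmetric pair has an edge in one direction only.
    antisymmetric = sum(1 for (i, j) in edge_set if (j, i) not in edge_set)
    total_pairs = n_nodes * (n_nodes - 1) // 2
    null = total_pairs - mutual - antisymmetric
    return mutual, antisymmetric, null


if __name__ == "__main__":
    print(dyad_census(4, [(0, 1), (1, 0), (1, 2)]))
```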

The configuration model: all networks preserve the degree sequence of the real network. The configuration model + no multiple edges: all networks preserve the degree sequence of the real network, and multiple edges between two nodes are not allowed. Bollobas, Random Graphs 1985; Molloy & Reed, Random Structures and Algorithms 1995; Chung et al., PNAS 1999; Maslov & Sneppen, Science 2002; Newman, Phys. Rev. Lett. 2002; Milo, Science 2002

Stubs method for generating random networks. Problem: multiple edges between nodes. Solution: “Go with the winner” algorithm
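A sketch of the stubs method for a directed network. To keep it short, multiple edges are handled here by plain rejection (discard the whole sample and restart), which is a simpler stand-in and not the "go with the winner" algorithm named on the slide. The name `stub_matching` and the `max_tries` parameter are illustrative assumptions.

```python
import random


def stub_matching(out_degrees, in_degrees, seed=None, max_tries=1000):
    """Pair out-stubs with in-stubs at random, restarting on multiple edges."""
    rng = random.Random(seed)
    # Node i contributes K_i outgoing stubs and R_i incoming stubs.
    out_stubs = [i for i, k in enumerate(out_degrees) for _ in range(k)]
    in_stubs = [j for j, r in enumerate(in_degrees) for _ in range(r)]
    if len(out_stubs) != len(in_stubs):
        raise ValueError("total out-degree must equal total in-degree")
    for _ in range(max_tries):
        rng.shuffle(in_stubs)
        edges = list(zip(out_stubs, in_stubs))
        # Accept only samples without multiple edges (self-loops are kept).
        if len(set(edges)) == len(edges):
            return edges
    raise RuntimeError("no sample without multiple edges was found")


if __name__ == "__main__":
    print(stub_matching([2, 2], [2, 2], seed=1))
```

Plain rejection becomes very slow for broad degree sequences, because almost every sample then contains a multiple edge around the hubs.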

Markov chain Monte-Carlo algorithm [Figure: edge switch among nodes A, B, C, D, before and after] Uniform sampling issues: ergodicity, detailed balance, mixing time
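A sketch of a switching Markov chain, assuming the pictured move replaces two edges A->B and C->D by A->D and C->B. Switches that would create a multiple edge or a self-loop are rejected, and a rejected proposal still counts as a step of the chain. The function name `switch_randomize` and the self-loop rule are assumptions.

```python
import random


def switch_randomize(edges, n_switches, seed=None):
    """Randomize a directed network by edge switches, preserving all degrees.

    Assumes the input has at least two edges and no duplicate edges.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_switches):
        a_idx, c_idx = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[a_idx]
        c, d = edge_list[c_idx]
        # Proposed move: A->B, C->D  becomes  A->D, C->B.
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a multiple edge
        edge_set.difference_update({(a, b), (c, d)})
        edge_set.update({(a, d), (c, b)})
        edge_list[a_idx] = (a, d)
        edge_list[c_idx] = (c, b)
    return edge_list


if __name__ == "__main__":
    real = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(switch_randomize(real, 100, seed=0))
```

Each node keeps its in-degree and out-degree because every switch only exchanges the targets of two existing edges.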

[Figure: network hub] This V-shaped subgraph appears many times around a hub and would be a network motif when compared with Erdos networks. Random networks which do not preserve the degree sequence are not suitable. It is important to filter out subgraphs which appear in high numbers only due to the degree sequence.

[Figure: a larger subgraph will appear many times if a smaller subgraph it contains is a motif] More stringent ensembles • Preserve the number of all subgraphs of sizes 3, 4, ..., n-1 when counting n-node subgraphs [Milo 2002] • Can be combined with the Markov chain algorithm by using simulated annealing • Filters out subgraphs which appear many times only because they contain significant smaller subgraphs

Simulated annealing algorithm [Figure: edge switch among nodes A, B, C, D, before and after] • Randomize the network by making X switches • Make switches with a Metropolis probability exp(-E/T) • E is the deviation of any characteristic of the real network you want to preserve (number of 3-node subgraphs, clustering sequence, etc.)
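The two phases listed above can be sketched as follows. Several choices here are assumptions made for illustration: the preserved characteristic is the number of mutual edges (`count_mutual`), the Metropolis rule is applied to the change in E, and the temperature follows a geometric cooling schedule from `t_start` to `t_end`. None of these are taken from the slides.

```python
import math
import random


def count_mutual(edge_set):
    """Example characteristic: number of node pairs linked in both directions."""
    return sum(1 for (i, j) in edge_set if i < j and (j, i) in edge_set)


def anneal_randomize(edges, characteristic, n_random, n_anneal,
                     t_start=1.0, t_end=0.01, seed=None):
    """Switch freely for n_random steps, then anneal for n_anneal steps."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    target = characteristic(edge_set)
    for step in range(n_random + n_anneal):
        a_idx, c_idx = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[a_idx], edge_list[c_idx]
        # Reject switches that create self-loops or multiple edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        trial = (edge_set - {(a, b), (c, d)}) | {(a, d), (c, b)}
        if step >= n_random:
            # Annealing phase: E is the deviation from the real network.
            frac = (step - n_random) / max(1, n_anneal - 1)
            temperature = t_start * (t_end / t_start) ** frac
            delta = (abs(characteristic(trial) - target)
                     - abs(characteristic(edge_set) - target))
            if delta > 0 and rng.random() >= math.exp(-delta / temperature):
                continue  # Metropolis rejection of an uphill move
        edge_set = trial
        edge_list[a_idx], edge_list[c_idx] = (a, d), (c, b)
    return edge_list, abs(characteristic(edge_set) - target)


if __name__ == "__main__":
    real = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 1), (0, 3), (2, 0)]
    randomized, deviation = anneal_randomize(real, count_mutual, 50, 200, seed=0)
    print(randomized, "remaining deviation:", deviation)
```

The returned deviation is not guaranteed to reach zero; in practice one would check it and anneal longer or restart if the constraint is not met.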

Subgraphs in Erdos networks: exact solution. Example: N = 8 nodes, E = 8 edges, mean degree <k> = 1

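Subgraphs in Erdos networks: exact solution. Expected number of feed-forward loops (FFLs) ≈ (possible triplets) × (probability of forming an FFL given 3 specific nodes) ≈ N^3 · (E/N^2)^3 = (E/N)^3 = <k>^3, where N is the number of nodes and E the number of edges. The number of FFLs does not change with network size!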
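A small numerical check of this statement, as a sketch. It assumes the FFL is the pattern X->Y, X->Z, Y->Z, counts ordered triples of distinct nodes, and uses p = E/N^2; the helper names `count_ffl` and `expected_ffl` are hypothetical.

```python
from itertools import permutations


def count_ffl(n_nodes, edges):
    """Count feed-forward loops X->Y, X->Z, Y->Z by brute force.

    Extra edges among the three nodes are allowed. The FFL has no
    symmetry, so each ordered triple is a distinct occurrence.
    """
    edge_set = set(edges)
    return sum(
        1
        for x, y, z in permutations(range(n_nodes), 3)
        if (x, y) in edge_set and (x, z) in edge_set and (y, z) in edge_set
    )


def expected_ffl(n_nodes, n_edges):
    """Expected FFL count in an Erdos network with p = E / N^2."""
    p = n_edges / n_nodes ** 2
    return n_nodes * (n_nodes - 1) * (n_nodes - 2) * p ** 3


if __name__ == "__main__":
    # Mean degree <k> = 1 at every size: the expectation stays close to 1.
    for n in (10, 100, 1000, 10000):
        print(n, expected_ffl(n, n))
```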

Subgraphs on Erdos networks • The expectancy of a subgraph with n nodes and g edges is analytically solvable. For fixed mean degree it scales as N^(n-g): select n nodes (~N^n ways), place g edges (each with probability p ~ 1/N). Example: n = 3, g = 3

Subgraph scaling families • n=3, g=3: G ~ N^(3-3) = O(1) • n=3, g=2: G ~ N^(3-2) = O(N) • n=3, g=4: G ~ N^(3-4) = O(N^-1) • n=3, g=6: G ~ N^(3-6) = O(N^-3)
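The scaling families can be reproduced with a one-line estimate, sketched here under the assumption of a fixed mean degree <k>, so that p = <k>/N. Combinatorial prefactors are dropped and the function name `erdos_expected_count` is illustrative.

```python
def erdos_expected_count(n_total, mean_degree, n, g):
    """Leading-order expected count of an n-node, g-edge subgraph.

    Select n nodes (~N^n ways) and place g edges, each with p = <k>/N.
    """
    p = mean_degree / n_total
    return n_total ** n * p ** g


if __name__ == "__main__":
    for g in (2, 3, 4, 6):
        small = erdos_expected_count(1000, 2.0, 3, g)
        large = erdos_expected_count(2000, 2.0, 3, g)
        # Doubling N multiplies the count by 2^(n-g).
        print(f"n=3, g={g}: ratio after doubling N = {large / small:.4f}")
```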

Natural networks often have scale-free outdegree: P(K) ~ K^(-γ)

Erdos network vs. scale-free network: P(K) ~ K^(-γ), 2 < γ < 3 [Figure: degree distributions P(k)]

Scale-free networks have hubs: P(K) ~ K^(-γ) [Figure: example networks with γ = 2 and γ = 3]

Edge probability in the configuration model [Figure: node pairs with high edge probability and low edge probability; nodes labeled 1 and 2]

Subgraphs in networks that preserve degree sequence: approximate solution • Networks with E (~N) edges, and arbitrary indegree (R_i) and outdegree (K_i) sequences. [Figure: three nodes with degrees (K_1, R_1), (K_2, R_2), (K_3, R_3)]
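A sketch of what such an approximate solution can look like for the feed-forward loop. It assumes the common sparse-network approximation that an edge from node i to node j exists with probability K_i R_j / E, which leads to the mean-field estimate <K(K-1)> <KR> <R(R-1)> / <K>^3. Both the approximation and the helper names are assumptions made for illustration.

```python
def edge_probability(out_degrees, in_degrees, i, j):
    """Approximate probability of an edge i -> j: K_i * R_j / E.

    Assumption: sparse network. The value can exceed 1 between two hubs,
    which is where the approximation breaks down.
    """
    n_edges = sum(out_degrees)
    return out_degrees[i] * in_degrees[j] / n_edges


def expected_ffl_degree_preserving(out_degrees, in_degrees):
    """Mean-field estimate of the FFL count X->Y, X->Z, Y->Z.

    X uses two outgoing edges, Y one incoming and one outgoing edge,
    and Z two incoming edges.
    """
    n = len(out_degrees)
    mean_k = sum(out_degrees) / n
    two_out = sum(k * (k - 1) for k in out_degrees) / n
    in_and_out = sum(k * r for k, r in zip(out_degrees, in_degrees)) / n
    two_in = sum(r * (r - 1) for r in in_degrees) / n
    return two_out * in_and_out * two_in / mean_k ** 3


if __name__ == "__main__":
    uniform = expected_ffl_degree_preserving([2] * 10, [2] * 10)
    hub = expected_ffl_degree_preserving([11] + [1] * 9, [2] * 10)
    print("uniform out-degree:", uniform, "one out-hub:", hub)
```

Both example sequences have the same number of edges, yet the one with an out-degree hub gives a larger estimate, which is the degree-sequence effect the ensemble is meant to filter out.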

Subgraph scaling depends on exact topology: subgraph topology affects its expected numbers

Subgraph scaling depends on exact topology, as opposed to Erdos networks. Directed networks with power-law out-degree and compact in-degree: P(K) ~ K^(-γ). [Table: subgraph scaling with N in Erdos networks, scale-free networks (γ = 2.5) and real networks, with an example γ; entries include O(N) and O(1)] Itzkovitz et al., PRE 2003

Network motifs: a new extensive variable. Milo et al., Science 2002

Global constraints on network structure can create network motifs • Subgraphs which appear many times in a network (more than random) • Might stem from evolutionary constraints of selection for some function, or be a result of other global constraints • Degree sequence is a global constraint with a profound effect on subgraph content • Are there other global constraints which might result in network motifs?

How do geometrical constraints influence the local structure?

Examples of geometrically constrained systems • Transportation networks (highways, trains) • Internet layout • Neuronal networks, brain layout • Abstract spaces (www, social, gene-array data)

The neuron network of C. elegans: "The abundance of triangular connections in the nervous system of C. elegans may thus simply be a consequence of the high levels of connectivity that are present within neighbourhoods" (White et al.)

The geometric model • N nodes arranged on a d-dimensional lattice • Connections made only to neighbors within range R
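A minimal sketch of the geometric model for d = 1. Several details are assumptions not stated on the slide: the lattice is a ring (periodic boundaries), each ordered pair within range R is connected independently with probability `p`, and the function name `geometric_network` is illustrative.

```python
import random


def geometric_network(n_nodes, radius, p, seed=None):
    """Directed geometric network on a 1-D ring lattice.

    Node i may connect only to nodes at lattice distance <= radius.
    """
    if 2 * radius >= n_nodes:
        raise ValueError("range R must be smaller than half the ring")
    rng = random.Random(seed)
    edges = set()
    for i in range(n_nodes):
        for offset in range(1, radius + 1):
            j = (i + offset) % n_nodes
            # Each direction of an in-range pair is drawn independently.
            if rng.random() < p:
                edges.add((i, j))
            if rng.random() < p:
                edges.add((j, i))
    return edges


if __name__ == "__main__":
    net = geometric_network(50, 3, 0.5, seed=0)
    print(len(net), "edges, all between lattice neighbors within range 3")
```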

Erdos networks – every node can connect to every other node Probability of closing triangles - small

Geometric networks – every node can connect only to its neighborhood Probability of closing triangles - large

All subgraphs in geometric networks scale as network size: N/R^d ‘sub-networks’, each one an Erdos network of size R^d [Figure: an Erdos sub-network]
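The argument can be written out in one line. This is a sketch that assumes each sub-network behaves as an independent Erdos network of size R^d with fixed mean degree, so that an n-node, g-edge subgraph appears about (R^d)^(n-g) times in it; prefactors are dropped.

```latex
\langle G \rangle_{\text{geometric}}
  \;\sim\; \underbrace{\frac{N}{R^{d}}}_{\text{number of sub-networks}}
  \times \underbrace{\left(R^{d}\right)^{\,n-g}}_{\text{count per Erdos sub-network}}
  \;=\; N \, R^{\,d\,(n-g-1)}
  \;\propto\; N \qquad (\text{fixed } R,\ d)
```

For fixed R and d the factor R^{d(n-g-1)} is a constant, so every subgraph count grows linearly with N whatever its n and g.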

All subgraphs scale as N

The Erdos scaling laws determine the network motifs

All subgraphs with more edges than nodes are motifs. Not motifs: scale as N in both random and real networks. Motifs: scale as N in geometric networks, constant number in random networks.

Feedback loops in the neuronal network are much rarer than expected from geometry. [Figure: ratio of two subgraph counts] Geometric model = 1 : 3; C. elegans neuronal network = 0 : 40

Imposing a field changes subgraph ratios [Figure: network with inputs and outputs] Itzkovitz et al., PRE 2005

A simple model of geometry + directional bias is not enough. Mutual edges: abundant in C. elegans, rare in geometric networks + directional bias

The mapping between network models and resulting network motifs is not a 1-1 mapping [Figure: Models 1 to 4 mapped onto Motif sets 1 to 3]

Conclusions • Biological networks are highly optimized systems aimed at information processing computations. • These networks contain network motifs: subgraphs that appear significantly more often than in suitable random networks. • Network motifs may be selected modules of information processing, or results of global network constraints. • The hypothesized functional advantage of each network motif can be tested experimentally. • The network motif approach can be used to reverse-engineer complex biological networks, and unravel their basic computational building blocks.

More information: http://www.weizmann.ac.il/mcb/UriAlon/ • Papers • mfinder: network motif detection software • Collection of complex networks. Acknowledgments: Uri Alon, Ron Milo, Nadav Kashtan
