CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014

Network models
Tamer Kahveci

Graphs
• Useful for describing networks.
• G = (V, E) with
  • V = set of nodes
  • E = set of edges
• Topological models
  • Directed/Undirected
  • Weighted/Unweighted
  • Deterministic/Probabilistic (G = (V, E, P))
• Concepts
  • Degree (indegree/outdegree), path

Topological properties
• Degree distribution, P(k) of G = (V, E)
  • Deg(k) = number of nodes in G with degree = k.
  • P(k) = Deg(k)/|V| = probability that a random node in G has degree = k.
[Figure: H. pylori (Todor et al. TCBB. 10:4. 2013)]
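The definition of P(k) can be sketched in a few lines of Python. This is only an illustration for an undirected graph; the function name `degree_distribution` and the toy graph are assumptions made for the example, not part of the slides.

```python
from collections import Counter

def degree_distribution(nodes, edges):
    """Return {k: P(k)} for an undirected graph given as a node list and an edge list."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    deg_count = Counter(degree.values())   # Deg(k): number of nodes with degree k
    return {k: count / len(nodes) for k, count in deg_count.items()}

# Hypothetical toy graph: a path 1-2-3 plus an isolated node 4.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3)]
p = degree_distribution(nodes, edges)
print(p)
```

Because every node has exactly one degree, the values of P(k) always sum to 1.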

Topological properties
• Neighbors of node v, N(v) = set of nodes adjacent to v.
• Clustering coefficient of node v, Cv, shows the connectivity of N(v).
  • Cv = (# edges among N(v)) / (max # edges possible among N(v))
  • Slightly different denominator for directed vs. undirected graphs: with k = |N(v)|, the maximum is k(k-1)/2 for an undirected graph and k(k-1) for a directed graph.
  • Example: Cv = 2/6
• C(k) = average clustering coefficient of all nodes with k edges.
• Network's clustering coefficient = average clustering coefficient of all nodes in G = (∑v Cv) / |V|
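A minimal sketch of the clustering coefficient for an undirected graph follows. The adjacency-set representation, the function names, and the convention that Cv = 0 for nodes with fewer than two neighbors are assumptions of this example; the toy graph is hypothetical and chosen so that node "v" gives 2/6.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Cv for an undirected graph; adj maps each node to the set of its neighbors."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: Cv is taken as 0 when no neighbor pair exists
    # Count the edges that connect two neighbors of v.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)  # undirected denominator

def network_clustering(adj):
    """Average of Cv over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical graph: v has four neighbors, with edges a-b and c-d among them.
adj = {
    "v": {"a", "b", "c", "d"},
    "a": {"v", "b"},
    "b": {"v", "a"},
    "c": {"v", "d"},
    "d": {"v", "c"},
}
cv = clustering_coefficient(adj, "v")   # 2 edges out of 6 possible
print(cv, network_clustering(adj))
```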

Centrality of a node
• Centrality of a node v in graph G = (V, E) indicates the relative importance of v in G with respect to the rest of the nodes in G. Let's denote it with f(v | G) or simply f(v).
• Many centrality measures exist
  • Degree centrality
    • How popular am I?
    • fDeg(v) = Deg(v)
  • Closeness centrality
  • Betweenness centrality

Closeness Centrality
• How close am I to everyone else?
• Given G = (V, E)
  • Dist(u, v) = shortest path length from u to v in G
  • fClose(u) = 1 / ∑v in V Dist(u, v)
• Alternative (for disconnected networks)
  • fClose(u) = ∑v in V-{u} 1/Dist(u, v)
  • 1/inf = 0
• How do I find the shortest path?
  • Floyd-Warshall algorithm
  • Johnson's algorithm
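Both closeness variants can be sketched as below. The example assumes an unweighted graph, so breadth-first search stands in for Floyd-Warshall or Johnson's algorithm; the function names and toy graphs are hypothetical. The first variant is taken here as the reciprocal of the summed distances, and it returns 0 when some node is unreachable (the sum is infinite and 1/inf = 0).

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source; unreachable nodes are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, u):
    """1 / sum of distances from u; 0 if any node is unreachable (1/inf = 0)."""
    dist = bfs_distances(adj, u)
    if len(dist) < len(adj):
        return 0.0
    total = sum(dist.values())
    return 1 / total if total > 0 else 0.0

def harmonic_closeness(adj, u):
    """Sum of 1/Dist(u, v) over reachable v != u; unreachable nodes contribute 0."""
    dist = bfs_distances(adj, u)
    return sum(1 / d for v, d in dist.items() if v != u)

path = {1: [2], 2: [1, 3], 3: [2]}             # connected path 1-2-3
split = {1: [2], 2: [1, 3], 3: [2], 4: []}     # same path plus isolated node 4
print(closeness(path, 2), closeness(split, 1), harmonic_closeness(split, 1))
```

On the disconnected graph only the alternative definition still separates the nodes, which is the reason the slide gives it.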

Betweenness Centrality
• How many pairs of nodes use me on the cheapest route to communicate?
• gst = number of shortest paths between s and t.
• gst(v) = number of shortest paths between s and t that contain v.
• fBetween(v) = (∑s,t gst(v)/gst) / (number of s,t pairs in V-{v}).
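The formula can be evaluated directly on small graphs. The sketch below assumes an undirected, unweighted graph and counts unordered pairs {s, t}. It relies on the fact that v lies on a shortest s-t path exactly when Dist(s, v) + Dist(v, t) = Dist(s, t), in which case gst(v) = gsv · gvt. Function names are hypothetical, and this brute-force version is meant for illustration, not for large networks.

```python
from collections import deque
from itertools import combinations

def count_shortest_paths(adj, source):
    """BFS from source; returns (dist, sigma), sigma[x] = number of shortest source-x paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    info = {s: count_shortest_paths(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    others = [x for x in adj if x != v]
    total = 0.0
    for s, t in combinations(others, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue                     # no s-t path through v
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    num_pairs = len(others) * (len(others) - 1) / 2
    return total / num_pairs if num_pairs else 0.0

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # cycle 1-2-3-4-1
print(betweenness(square, 2))
```

In the 4-cycle, only the pair (1, 3) can route through node 2, and it does so on one of its two shortest paths, so the result is 0.5 divided by the 3 pairs.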

Floyd-Warshall: shortest path
• Given G = (V, E, w)
• Distance(i, j, k) = length of the shortest path from i to j whose intermediate nodes all belong to V' = {1, 2, …, k}
• Distance(i, j, 0) = w(i, j)
• Distance(i, j, k+1) = min{Distance(i, j, k), Distance(i, k+1, k) + Distance(k+1, j, k)}

// initially d[i,j] = w(i, j)
for k = 1 to n do                      // use node k on path
  for i = 1 to n do                    // origin i
    for j = 1 to n do                  // destination j
      if (d[i,k] + d[k,j] < d[i,j]) {
        d[i,j] = d[i,k] + d[k,j]       // shorter path length
        visit[i,j] = k                 // new path goes through k
      }
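The pseudocode translates almost line for line into Python. In this sketch the nodes are numbered 0..n-1, missing edges get weight infinity, and the `path` helper (a hypothetical addition, not on the slide) shows how the `visit` table can be used to recover the route. It assumes j is reachable from i and that the graph has no negative cycles.

```python
import math

def floyd_warshall(n, weights):
    """weights: {(i, j): w} for directed edges over nodes 0..n-1. Returns (d, visit)."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    visit = [[None] * n for _ in range(n)]
    for (i, j), w in weights.items():
        d[i][j] = min(d[i][j], w)
    for k in range(n):                 # allow node k as an intermediate node
        for i in range(n):             # origin i
            for j in range(n):         # destination j
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]   # shorter path length
                    visit[i][j] = k               # new path goes through k
    return d, visit

def path(visit, i, j):
    """Node sequence of the shortest i-j path (assumes j is reachable from i)."""
    k = visit[i][j]
    if k is None:
        return [i, j]
    return path(visit, i, k)[:-1] + path(visit, k, j)

# Hypothetical weighted directed graph.
weights = {(0, 1): 4, (0, 2): 1, (2, 1): 2, (1, 3): 5}
d, visit = floyd_warshall(4, weights)
print(d[0][3], path(visit, 0, 3))
```

The triple loop runs in O(n³) time and yields all pairwise distances at once, which is what the closeness and betweenness measures need.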

Key network models
• Erdos-Renyi
• Small world
• Scale free

Erdos-Renyi
• Edges are distributed uniformly at random
• Construction
  • Given two parameters (n = # of nodes, p = probability that an edge exists)
  • For every pair of nodes (u, v)
    • Create an edge (u, v) with probability p.
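The construction fits in a single comprehension. The sketch below assumes an undirected graph on nodes 0..n-1; the function name and the optional `seed` parameter (for reproducibility) are additions made for this example.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random undirected graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each of the n(n-1)/2 node pairs becomes an edge independently with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = erdos_renyi(100, 0.05, seed=42)
print(len(edges), "edges; expected about", 0.05 * 100 * 99 / 2)
```

The expected number of edges is p · n(n-1)/2, so the expected degree of each node is p(n-1).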

Small World (Watts-Strogatz)
• Everyone tends to be close to each other.
• As the number of nodes (N) in the network grows, the distance between two random nodes grows with the logarithm of N.
• Construction
  • Given three parameters:
    • N = # of nodes
    • K = average degree
    • p = rewiring probability
  • Construct a ring lattice
    • Connect each ith node to nodes {i-1, i-2, …, i-K/2} and {i+1, i+2, …, i+K/2} with an edge
  • For each node u
    • For each edge (u, v)
      • Randomly pick a node v' in V-{u}
      • Replace (u, v) with (u, v') with probability p
  • …
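A minimal sketch of the ring-lattice construction and rewiring step follows. The slide leaves some details open, so the example makes several assumptions: K is even and smaller than N, each lattice edge is considered for rewiring once (from its lower-index endpoint around the ring), and the new endpoint v' is drawn only from nodes not already adjacent to u, so no duplicate edges arise. The function name and `seed` parameter are hypothetical.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Return {node: set of neighbors}. Assumes k is even and k < n."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # Ring lattice: connect each node to its k/2 nearest neighbors on each side.
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    # Rewiring: replace lattice edge (u, v) with (u, v') with probability p.
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            if rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    new_v = rng.choice(candidates)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(new_v)
                    adj[new_v].add(u)
    return adj

lattice = watts_strogatz(10, 4, 0.0, seed=0)    # p = 0 keeps the pure ring lattice
rewired = watts_strogatz(20, 4, 1.0, seed=1)    # p = 1 rewires every edge
print(sorted(lattice[0]))
```

Rewiring moves edges rather than adding or deleting them, so the total number of edges, and therefore the average degree K, is preserved for any p.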

Scale-Free
• A lot of poor work for a few super rich
• Probability that a node has degree k follows a power law: it drops polynomially (not exponentially) with k.
  • P(k) ~ k^(-γ)
• Construction (preferential attachment, or "rich get richer")
  • Given two parameters (n = # of nodes, k = average degree)
  • Build a small network (e.g. two nodes and one edge)
  • Repeat
    • Insert a new node v
    • Insert k edges from v to existing nodes. Existing node u gets an edge with probability pu = Deg(u) / ∑i Deg(i)
  • Until we have n nodes
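The preferential attachment loop can be sketched as below, starting from two nodes and one edge as on the slide. Two details are assumptions of this example: a new node links to min(k, number of existing nodes) distinct targets, and repeated draws of the same target are simply redrawn, which only approximates sampling proportional to degree. It also assumes n ≥ 2 and k ≥ 1. The function name and `seed` parameter are hypothetical.

```python
import random

def preferential_attachment(n, k, seed=None):
    """Return the edge list of a graph grown by preferential attachment (n >= 2, k >= 1)."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # small initial network: two nodes, one edge
    # Node u appears Deg(u) times in stubs, so a uniform draw from stubs
    # selects u with probability Deg(u) / sum_i Deg(i).
    stubs = [0, 1]
    for v in range(2, n):
        wanted = min(k, v)    # v existing nodes: 0..v-1
        targets = set()
        while len(targets) < wanted:
            targets.add(rng.choice(stubs))   # duplicates are redrawn (assumption)
        for u in sorted(targets):
            edges.append((v, u))
            stubs.extend([v, u])
    return edges

edges = preferential_attachment(50, 2, seed=1)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(len(edges), max(degree.values()))
```

Keeping one list entry per edge endpoint avoids recomputing the degree sum at every step, which is a common trick for this kind of sampling.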

Hierarchical
• Similar to fractals
• Scale-free networks with high clustering.
• Construction
  • Create an initial network (seed) with t peripheral nodes
  • Create t copies of this network and connect each of them to the central node.
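The slide does not say how the seed is wired or which nodes of each copy attach to the central node, so the sketch below fills those gaps with assumptions borrowed from the Ravasz-Barabási hierarchical model: the seed is a fully connected cluster of one central node plus t peripheral nodes, and the peripheral nodes of each copy are linked to the central node of the original. The function name and the `levels` parameter (number of replication rounds) are hypothetical.

```python
def hierarchical(t, levels):
    """Return (edges, number_of_nodes); node 0 is the central node."""
    size = t + 1
    # Assumption: the seed is a clique on the central node 0 and peripheral nodes 1..t.
    edges = [(a, b) for a in range(size) for b in range(a + 1, size)]
    peripheral = list(range(1, size))
    for _ in range(levels):
        new_edges = list(edges)
        new_peripheral = []
        for copy in range(1, t + 1):
            offset = copy * size
            # Copy the current network with shifted node ids.
            new_edges += [(a + offset, b + offset) for a, b in edges]
            # Assumption: peripheral nodes of the copy attach to the central node.
            for q in peripheral:
                new_edges.append((0, q + offset))
                new_peripheral.append(q + offset)
        edges, peripheral, size = new_edges, new_peripheral, size * (t + 1)
    return edges, size

edges, size = hierarchical(4, 1)
center_degree = sum(1 for a, b in edges if 0 in (a, b))
print(size, len(edges), center_degree)
```

With t = 4 and one round, the network grows from 5 to 25 nodes and the central node becomes a hub, while the copied clusters keep their high local clustering.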

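Probabilistic
• G = (V, E, P)
• P: E -> (0, 1]
• Example: nodes a, b, c with two edges of probability 0.3 and 0.6 yield four possible deterministic networks
  • Both edges present: 0.3 x 0.6 = 0.18
  • Only the 0.3 edge present: 0.3 x (1-0.6) = 0.12
  • Only the 0.6 edge present: (1-0.3) x 0.6 = 0.42
  • No edge present: (1-0.6) x (1-0.3) = 0.28
  • 0.28 + 0.12 + 0.42 + 0.18 = 1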
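The example can be reproduced by enumerating every deterministic network that the probabilistic graph can yield. The sketch assumes edges exist independently of each other, as the product (1-0.6) x (1-0.3) implies. The function name is hypothetical, and the edge labels (a, b) and (a, c) are placeholders, since the text does not say which node pairs carry the two probabilities.

```python
from itertools import product

def possible_worlds(edge_probs):
    """edge_probs: {edge: probability}. Returns a list of (present_edges, probability)."""
    edges = list(edge_probs)
    worlds = []
    for mask in product([False, True], repeat=len(edges)):
        prob = 1.0
        for e, present in zip(edges, mask):
            # Independent edges: multiply p if present, (1 - p) if absent.
            prob *= edge_probs[e] if present else 1 - edge_probs[e]
        worlds.append(([e for e, present in zip(edges, mask) if present], prob))
    return worlds

# Edge labels are placeholders; the probabilities come from the slide.
worlds = possible_worlds({("a", "b"): 0.3, ("a", "c"): 0.6})
for present, prob in worlds:
    print(present, round(prob, 2))
```

A graph with m probabilistic edges has 2^m possible deterministic networks, so exhaustive enumeration like this is only practical for very small examples.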
