Network Analysis Max Hinne mhinne@sci.ru.nl
Social Networks
Networks & Digital Security
• Interdisciplinary
• Combination of formal & ‘soft’ interpretation
• Security in the sense of a detective
Overview
• Primer on graph theory
• Centrality: who is important?
• Clustering: who belongs together?
• Detecting & predicting changes: LIGA project
Central theme: global vs. local approaches
Graph primer
Graph primer - basics
• Directed graph G = (V, A)
• V = vertices, N = |V|
• A = arcs, M = |A|; an arc (x, y) ∈ A means that x points to y
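Graph primer - concepts
• Neighborhood: the vertices a vertex is directly connected to
• Degree: the size of the neighborhood (in-degree and out-degree when arcs are directed)
• Path: a sequence of vertices in which each one points to the next
Similar concepts for undirected graphs G = (V, E)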
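The concepts on this slide can be made concrete with a small sketch. The toy graph, the dictionary representation and all function names below are assumptions for illustration, not notation from the slides.

```python
from collections import deque

# Hypothetical toy directed graph: arcs[x] is the set of y with (x, y) in A.
arcs = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"d"},
    "d": set(),
}

def out_neighborhood(arcs, v):
    """Vertices that v points to."""
    return set(arcs[v])

def in_neighborhood(arcs, v):
    """Vertices that point to v."""
    return {x for x, ys in arcs.items() if v in ys}

def out_degree(arcs, v):
    return len(out_neighborhood(arcs, v))

def in_degree(arcs, v):
    return len(in_neighborhood(arcs, v))

def shortest_path(arcs, source, target):
    """Return one shortest directed path as a list of vertices, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if x == target:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in sorted(arcs[x]):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None
```

For an undirected graph G = (V, E) the same code applies if every edge is stored in both directions; in-degree and out-degree then coincide.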
Graph primer – graph types
Three graph types (figures 1, 2 and 3 not preserved). Models for these graphs by:
• Erdős-Rényi (1959)
• Tsvetovat-Carley (2005)
• Barabási-Albert (1999)
Graph primer – degree distributions
Degree distribution: what is the chance that a node has degree k?
• Erdős-Rényi: N vertices, each edge occurs independently with probability p. Degree distribution: Poisson
• Barabási-Albert: start with a small set of vertices and add new ones. Each new vertex is connected to existing vertices with a probability proportional to their degree. Degree distribution: power law (scale-free)
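A minimal sketch of both models, to show how the two degree distributions differ in practice. The function names, the seed graph (a complete graph on m+1 vertices) and the parameter values are assumptions chosen for illustration.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Undirected G(n, p): each possible edge occurs independently with probability p."""
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.add((i, j))
    return edges

def barabasi_albert(n, m, rng):
    """Assumed seed: complete graph on m+1 vertices. Each new vertex attaches to
    m distinct existing vertices, chosen with probability proportional to degree."""
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Every vertex appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over vertices.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs.extend((t, new))
    return edges

def degree_histogram(n, edges):
    """Map each degree k to the number of vertices that have degree k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.get(v, 0) for v in range(n))

rng = random.Random(0)
n = 1000
er = erdos_renyi(n, 4 / (n - 1), rng)   # expected mean degree 4
ba = barabasi_albert(n, 2, rng)         # mean degree close to 4
er_hist = degree_histogram(n, er)
ba_hist = degree_histogram(n, ba)
print("max degree ER:", max(er_hist), " BA:", max(ba_hist))
```

With comparable mean degree, the Erdős-Rényi degrees stay close to the mean, while the Barabási-Albert graph grows a few hubs with much higher degree, which is the heavy tail of the power law.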
Graph primer – small world effect
• Famous experiment by Milgram (1967)
• Popular reading: everyone in the world is connected to everyone else in about 6 steps (‘six degrees of separation’); the experiment measured an average chain length, not a maximum
• Social graphs exhibit the ‘small world effect’: distances in a social graph scale logarithmically with N
Centrality
Centrality
• Importance, control of flow
• Ranking of nodes from most important (most control) to least important (least control)
Node centrality measures 1/4
• Degree
• Immediate effect
Node centrality measures 2/4
• Closeness
• ETA of flow to v
• cC inverted for visualization
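A small sketch of closeness, assuming the common definition in which the sum of shortest-path distances from v is inverted and scaled by n - 1, so that a higher score means flow arrives sooner. The slide's own formula is not preserved, so this normalization and the path graph used as an example are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance from source to every reachable vertex (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def closeness(adj, v):
    """(n - 1) / sum of distances from v; assumes a connected undirected graph."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total > 0 else 0.0

# Hypothetical path graph a - b - c - d - e
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = {v: closeness(path, v) for v in path}
```

On the path graph the middle vertex c gets the highest score and the two end points the lowest, which matches the reading of closeness as the expected time of arrival of flow.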
Node centrality measures 3/4
• Eigenvector
• Influence or risk
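Eigenvector centrality scores a node by the scores of its neighbors. A minimal sketch using power iteration follows; it assumes a connected undirected graph, scales the result so that the largest score is 1, and iterates with A + I instead of A (same eigenvectors, but it avoids oscillation on bipartite graphs). These choices and the star graph example are assumptions, not taken from the slides.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Principal eigenvector of the adjacency matrix by power iteration on A + I."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the scores of all neighbors.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        scale = max(new.values())
        new = {v: s / scale for v, s in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

# Hypothetical star graph: one hub connected to three leaves.
star = {"hub": ["l1", "l2", "l3"], "l1": ["hub"], "l2": ["hub"], "l3": ["hub"]}
centrality = eigenvector_centrality(star)
```

In the star the hub scores 1 and every leaf scores 1/√3: a leaf is still influential to some degree because its only neighbor is the hub.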
Node centrality measures 4/4
• Betweenness
• Volume of flow/traffic
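Obtaining cB
• Fastest current algorithm: Brandes (2001), O(nm) on unweighted graphs
• Solves all shortest paths from one source vertex in a single breadth-first pass
• For each source vertex, consider all nearest neighbors at d=1, then d=2 and so on
• For each vertex reached, record the number of shortest paths to it and its predecessors on those paths
• Derive cB by accumulating these counts backwards from the farthest vertices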
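The steps above can be sketched as follows for unweighted graphs. This is a compact rendering of the scheme described by Brandes (2001), not the author's implementation; the function name, the `undirected` flag and the example graph are assumptions.

```python
from collections import deque

def brandes_betweenness(adj, undirected=True):
    """Unnormalized betweenness cB for every vertex of an unweighted graph."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        order = []                        # vertices in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first pass: d=1, then d=2, ...
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}     # dependency of s on each vertex
        for w in reversed(order):         # accumulate from the farthest vertices back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if undirected:
        # Every pair (s, t) was counted once from s and once from t.
        cb = {v: c / 2 for v, c in cb.items()}
    return cb

# Hypothetical path graph a - b - c - d - e
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
cb = brandes_betweenness(path)
```

On the path graph, c lies between 4 pairs of other vertices and b between 3, while the end points lie between none.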
Local approach
• No known algorithm calculates cB(v) for a single v faster than cB(v) for all v!
• We only want to rank the nodes of interest, not all nodes
• Local approach:
  • Find cB for some specific nodes
  • If we can estimate cB, we can rank the relevant nodes
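Ego betweenness
• Ego-net: ego(v) = {v} ∪ N(v) and the corresponding edges
• Calculate cB considering only ego(v); the result is the ego betweenness cEB(v)
• Let A be the adjacency matrix of ego(v): cEB(v) is the sum of 1/(A²)ij over all pairs i < j of non-adjacent nodes in ego(v)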
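A minimal sketch of ego betweenness for an undirected graph. Inside the ego-net every pair of non-adjacent neighbors is at distance 2, and the number of shortest paths between them equals their number of common neighbors, which is the entry (A²)ij of the squared ego-net adjacency matrix. The ego lies on exactly one of these paths, so each such pair contributes 1/(A²)ij. The function name and example graphs are assumptions.

```python
def ego_betweenness(adj, ego):
    """Betweenness of `ego` computed only inside its ego-net.
    `adj` maps each vertex to the set of its neighbors (undirected graph)."""
    alters = sorted(adj[ego])
    members = set(alters) | {ego}
    total = 0.0
    for idx, i in enumerate(alters):
        for j in alters[idx + 1:]:
            if j in adj[i]:
                continue   # adjacent neighbors: the ego is not on their shortest path
            # Common neighbors inside the ego-net = number of length-2 paths i .. j.
            common = sum(1 for k in members if k in adj[i] and k in adj[j])
            total += 1.0 / common
    return total

# Hypothetical examples: a star, and an ego "e" whose neighbors a and c share neighbor b.
star = {"e": {"a", "b", "c"}, "a": {"e"}, "b": {"e"}, "c": {"e"}}
shared = {"e": {"a", "b", "c"}, "a": {"e", "b"}, "b": {"e", "a", "c"}, "c": {"e", "b"}}
```

Only the neighbors of the ego and the edges among them are inspected, so the cost depends on the degree of the ego rather than on the size of the graph.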
No direct link between cB and cEB
• First example: red circles + ego form an (n+1)-node star; green triangles form a p-node complete graph Kp
• Second example: red circles + ego form a (p+1)-node star; green triangles + ego form an n-node complete graph Kn
Correlation between cB and cEB
• Very strong positive correlation!
Graph Clustering
Types of clustering
• What is a cluster?
• Supervised vs. unsupervised
• Partitional vs. hierarchical
Clustering quality – modularity
• Cluster adjacency matrix E: eij is the fraction of edges that connect cluster i to cluster j
• Modularity: Q = Σi (eii − ai²), with ai = Σj eij
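A short sketch of the modularity of a given clustering, assuming the Newman & Girvan definition Q = Σi (eii − ai²), where eii is the fraction of edges inside cluster i and ai is the fraction of edge end points that fall in cluster i. The function name, the input format and the example graph (two triangles joined by one edge) are assumptions.

```python
def modularity(edges, cluster_of):
    """Q for an undirected graph given as an edge list and a vertex -> cluster map."""
    m = len(edges)
    internal = {}   # cluster -> number of edges with both end points inside it
    ends = {}       # cluster -> number of edge end points inside it
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_cluster = {v: "x" for v in range(6)}
```

Splitting the graph into its two triangles gives Q = 5/14, while putting everything in one cluster gives Q = 0: modularity rewards clusters that contain more internal edges than expected from their degrees alone.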
Newman & Girvan clustering algorithm
• Edges that are the most ‘between’ connect large parts of the graph
1. Calculate the edge betweenness Aij of every edge in an n x n matrix A
2. Remove the edge with the highest score
3. Recalculate the edge betweenness for the affected edges
4. Go to 2 until no edges remain
• O(m²n), may be smaller on graphs with strong clustering
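A simplified sketch of the first split produced by this procedure. It recomputes the betweenness of every edge after each removal instead of only the affected ones, and it stops as soon as the graph falls apart into more components rather than continuing until no edges remain. All names and the example graph are assumptions.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness via one BFS per source (each pair is counted from both ends)."""
    eb = {}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                key = (v, w) if v < w else (w, v)
                eb[key] = eb.get(key, 0.0) + c
                delta[v] += c
    return eb

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def first_split(adj):
    """Remove highest-betweenness edges until the number of components grows."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    start = len(components(adj))
    removed = []
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        if not eb:
            break                                  # no edges left to remove
        u, v = max(eb, key=lambda e: (eb[e], e))
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return components(adj), removed

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters, removed = first_split(graph)
```

Repeating the split on each resulting component yields the full top-down hierarchy.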
Greedy clustering algorithm
• Maximize Q to find a clustering
• Greedy approach: repeatedly merge the two clusters whose merger increases Q the most
• Creates a bottom-up dendrogram
• The cut corresponding to maximum Q is taken as the best clustering
• Still a costly process, O(n²)
C := V;
repeat
  (i,j) := argmax{∆Q | Ci, Cj ∈ C};
  C := C - Cj;
  Ci := Ci + Cj;
until |C| = 1
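The pseudocode above can be sketched in Python as follows. The sketch assumes a simple undirected graph without self-loops and comparable vertex labels, uses ∆Q = 2(eij − ai·aj) for merging clusters i and j, and only considers pairs of clusters that share an edge (merging unconnected clusters never increases Q). It is a naive version without the data structures needed to reach the stated complexity; all names are assumptions.

```python
def greedy_modularity(nodes, edges):
    """Agglomerative clustering; returns (best Q, clustering at that cut)."""
    m = len(edges)
    members = {v: {v} for v in nodes}   # cluster id -> vertices in the cluster
    a = {v: 0.0 for v in nodes}         # fraction of edge end points per cluster
    e = {v: {} for v in nodes}          # e[i][j]: half the fraction of edges between i and j
    for u, v in edges:
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
        e[u][v] = e[u].get(v, 0.0) + 1 / (2 * m)
        e[v][u] = e[v].get(u, 0.0) + 1 / (2 * m)
    q = -sum(x * x for x in a.values())  # Q when every vertex is its own cluster
    best_q, best = q, [set(c) for c in members.values()]
    while len(members) > 1:
        candidates = [(2 * (w - a[i] * a[j]), i, j)
                      for i in e for j, w in e[i].items() if i < j]
        if not candidates:
            break                        # remaining clusters are not connected
        dq, i, j = max(candidates)
        members[i] |= members.pop(j)     # C := C - Cj; Ci := Ci + Cj
        for k, w in e.pop(j).items():
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[k].get(i, 0.0) + w
        a[i] += a.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best_q, best

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best_q, best = greedy_modularity(range(6), edges)
```

The loop keeps merging until one cluster remains, as in the pseudocode, but remembers the clustering with the highest Q seen on the way; that is the cut through the dendrogram.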
Practical applications of social clusters
• Find people related to someone
• Find out if people belong to the same cluster
• This does not require a partitioning of the entire network!
Local modularity
• C: cluster, U: universe, B: boundary
• C = collection of nodes v ∈ V with known link structure
• U(C) = all nodes outside C to which nodes from C point: U(C) = {u ∈ V-C | A(C,u) ≠ ∅}
• B(C) = all nodes in C with at least one neighbor outside C: B(C) = {b ∈ C | A(b,U) ≠ ∅}
Local cluster algorithm
∆R(C,u) = R(C+u) – R(C)
C := Ø; v := v0;
repeat
  C := C+v;
  v := argmax{R(C+u) | u ∈ U(C)}
until |C| = k or R(C) ≥ d
Figure legend:
• Arcs removed from arcs(B(C),V)
• Arcs newly added to arcs(B(C),V)
• Arcs removed from arcs(B(C),C)
• Arcs newly added to arcs(B(C),C)
Example: ∆R(C,v4) = 1/3 – 1/4 = 1/12
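A minimal sketch of the loop above for an undirected graph. The definition of R is not preserved on the slides; the sketch assumes the local modularity of Clauset (2005): of all edges with at least one end point in the boundary B(C), R(C) is the fraction that stays inside C. The function names, the tie-breaking rule and the example graph are assumptions.

```python
def local_modularity(adj, cluster):
    """Assumed definition: R(C) = (boundary edges inside C) / (all boundary edges)."""
    boundary = {b for b in cluster if any(n not in cluster for n in adj[b])}
    if not boundary:
        return 1.0                      # no edge leaves the cluster
    total = inside = 0
    seen = set()
    for b in boundary:
        for n in adj[b]:
            edge = frozenset((b, n))
            if edge in seen:
                continue                # edge between two boundary nodes: count once
            seen.add(edge)
            total += 1
            if n in cluster:
                inside += 1
    return inside / total

def local_cluster(adj, seed, k, d):
    """Grow a cluster from `seed` until it has k nodes or R(C) >= d."""
    cluster = {seed}
    while len(cluster) < k and local_modularity(adj, cluster) < d:
        universe = {u for v in cluster for u in adj[v]} - cluster   # U(C)
        if not universe:
            break
        # Ties are broken in favor of the smallest label (an arbitrary choice).
        best = max(sorted(universe),
                   key=lambda u: local_modularity(adj, cluster | {u}))
        cluster.add(best)
    return cluster

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
found = local_cluster(graph, seed=0, k=6, d=0.65)
```

Starting from node 0 the sketch adds node 1 (R = 1/3) in preference to node 2 (R = 1/4), then node 2, and stops at R = 2/3 ≥ d with the first triangle as the cluster. Only the neighborhood of the growing cluster is ever inspected.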
Example 1 on Zachary’s Karate Club (d=0.65)
Example 2 on Zachary’s Karate Club (d=0.65)
Local cluster quality vs. global clusters
• For each node v in each global cluster i:
  • Find the local cluster of the same size
• Average
Preliminary results on real graphs
• Experiment too small for real conclusions, but:
  • Edge vertices ruin the fun
  • Edge betweenness?
• Usefulness of the local approach depends on the seed node
LIGA: Local intelligence in global applications
Web graph
• ‘Social’ network of blogs and news sites
• Most graph models are static, but the Web is highly dynamic
• Storing a full copy is infeasible, continuous crawling is intractable
• Change in relevance -> change in link structure
Node roles
• Frequently recurring subgraphs: motifs
• Nodes share a role iff there is a permutation of nodes and edges that preserves the motif structure
• On the Web:
  • Feedback with two mutual dyads (2 roles)
  • Uplinked mutual dyad (2 roles)
  • Fully connected triad (1 role)
Dynamic graphs
• Changes in relevance cause changes in link structure
• Changes in specific roles imply changes in other node roles
  • A fanbase links to itself and to its authorities
  • Learning relevant links through affiliated sites
  • etc.
• Relevance decays (half-life λ)
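A half-life implies exponential decay: after every period of length λ the relevance is halved. A one-function sketch, assuming relevance is a non-negative number and time is measured in the same unit as λ (the slides do not specify either; the function and parameter names are hypothetical):

```python
def decayed_relevance(r0, elapsed, half_life):
    """Relevance left after `elapsed` time, starting from r0, with the given half-life."""
    return r0 * 0.5 ** (elapsed / half_life)
```

For example, with a half-life of 1 time unit, a relevance of 8 drops to 1 after 3 units.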
LIGA research questions
• How to model (Web) node relevance?
• How does acquired or lost relevance change linkage?
• How can we predict consequential changes?
• How can such prediction models be approximated by local incremental algorithms?
• A. m. o. ...
Putting it together
• Networks can be analyzed using an array of tools
• Network analysis is useful in various disciplines:
  • Information Retrieval
  • Security
• But also in:
  • Sociology
  • (Statistical) physics
  • Bioinformatics
  • AI
Most cited literature
Centrality:
• Borgatti S. P.: Centrality and Network Flow. Social Networks 27 (2005) 55-71
• Brandes U.: A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology 25(2) (2001) 163-177
• Freeman L. C.: A Set of Measures of Centrality Based on Betweenness. Sociometry 40 (1977) 35-41
Clustering:
• Clauset A.: Finding local community structure in networks. Physical Review E 72 (2005) 026132
• Girvan M., Newman M. E. J.: Community structure in social and biological networks. PNAS 99(12) (2002) 7821-7826
• Newman M. E. J.: Fast algorithm for detecting community structure in networks. Physical Review E 69 (2004) 066133