VoG : Summarizing and Understanding Large Graphs

VoG: Summarizing and Understanding Large Graphs Danai Koutra JillesVreeken U Kang Christos Faloutsos SDM, 24-26 April 2014, Philadelphia, USA

Problem Definition:Graph Summarization Given: a graph a succinct summary with possibly overlappingsubgraphs Find: ≈ important graph structures. Danai Koutra – SDM’14

Why graph summarization? Visualization Guiding attention Danai Koutra – SDM’14

Why graph summarization? Graph Understanding Danai Koutra – SDM’14

Application: Wikipediacontroversy I don’t see anything!  Nodes: wiki editors Edges: co-edited Danai Koutra – SDM’14

Application: Wikipediacontroversy Stars: admins, bots, heavy users Bipartite cores: edit wars vandals Kiev vs. Kyiv Nodes: wiki editors Edges: co-edited Danai Koutra – SDM’14

Roadmap • Main Idea • Proposed Algorithm: VoG • VoG: Step-by-Step • Experiments • Conclusions Danai Koutra (CMU)

Main Idea • Use a graph vocabulary: • Best graph summary •  optimal compression (MDL) Shortest lossless description Danai Koutra – SDM’14

Background Minimum Description Length Principle M • Given a set of models M, • the best model MεM is argminL(M) + L(D|M) M M # bits for the data using M # bits for M Danai Koutra – SDM’14

Example Minimum Description Length Principle L(M) + L(D|M) errors a1 x + a0 VS. { } a10x10+ a9x9+ … + a0 [Example adapted from Akoglu’12] Danai Koutra – SDM’14

Formally:Minimum Graph Description Given: - a graph G with adjacency matrix A - vocabulary Ω Find: model M s.t. min L(G,M) = min L(M) + L(E) Error E Adjacency A Model M Danai Koutra – SDM’14

VoG: Overview ≈? MDL cost argmin ≈ Danai Koutra – SDM’14

VoG: Overview Summary some criterion Danai Koutra – SDM’14

We need candidatestructures… … How can we get them? Danai Koutra – SDM’14

Step 1: Graph Decomposition Coulduse: ANY graph decomposition method Weadapted a node reordering method: SlashBurn Danai Koutra – SDM’14

SlashBurn-based Graph Decomposition • Slash top-k hubs, burn edges • Repeat on the remaining GCC [SlashBurn: U Kang and Christos Faloutsos. ICDM’11] candidate structures Notice that the structures can overlap! GCC Before After Danai Koutra – SDM’14

We got candidatestructures. Now, how can we ‘label’ them? Danai Koutra – SDM’14

Step 2: Graph Labeling ≈? 1 MDL cost 2 argmin ≈ Danai Koutra – SDM’14

Graph Representation DETAILS 1 . . . hub? 1 n “best” node split? 45 n missing edges? “best” node ordering? 80 Danai Koutra – SDM’14

Graph Representation DETAILS Hub: top-deg node Spokes: the rest n=7 hub LN(|st|−1) +logn+ log( ) +L(E+ ) + L(E− ) Star structure Errors # of spokes n−1 spokes IDs missing hub ID extra |st|−1 6 Danai Koutra – SDM’14

DETAILS Graph Representation Max bipartite graph: NP-hard Heuristic: Belief Propagation with heterophily for node classification (blue/red) Bipartite graph structure Errors + logn+ log( ) +L(E+ ) + L(E− ) # of rednodes # of blue nodes n−1 their IDs missing extra |st|−1 Danai Koutra – SDM’14

Graph Representation DETAILS 1 . . . 1 Longest path: NP-hard Heuristic: BFS + local search n 45 n 80 + Chain structure Errors missing extra Danai Koutra – SDM’14

Step 2: Graph Labeling ≈? MDL Cost L(m) + L(e) argmin ≈ Danai Koutra – SDM’14

Step 3: Summary Assembly P L A I N Summary Danai Koutra – SDM’14

Concepts DETAILS = # bits as structure - # bits as noise compression gain Savings Danai Koutra – SDM’14

Step 3: Summary Assembly TOP-k Summary Danai Koutra – SDM’14

Step 3: Summary Assembly Greedy&Forget DETAILS L(M) … structures Danai Koutra – SDM’14

Roadmap • Main Idea • Encoding Schema • Proposed Algorithm: VoG • Experiments • Conclusions Danai Koutra (CMU)

Application: Enron Top-1 NBC Top-3 Stars klay kenneth.lay @enron.com Ski excursion CEO jeff.skilling@enron.com Danai Koutra – SDM’14

Runtime VOG is near-linear on the number of edges of the input graph. Danai Koutra – SDM’14

Roadmap • Main Idea • Encoding Schema • Proposed Algorithm: VoG • Experiments • Conclusions Danai Koutra (CMU)

Conclusions • Formulation: info-theoretic graph summarization approach • Algorithm: VoG is near-linear on the edges • Experiments on real graphs Danai Koutra – SDM’14

Code www.cs.cmu.edu/~dkoutra/SRC/vog.tar Danai Koutra – SDM’14

Thank you! Questions? www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu Danai Koutra

Backup Slides: Future Work • Our vocabulary was: • Extend to more complicated structures, e.g.: “jellyfish” [Tauro’01] Danai Koutra

Backup Slides: Future Work • Combine graph decomposition methods • Community detection • Spectral method • METIS • SUBDUE • Cross-Associations • Eigenspokes • … Danai Koutra – SDM’14

Backup slides:Quantitative Analysis 4292729 bits as noise Real graphs have structure (we save bits by encoding structures). Danai Koutra – SDM’14

Backup slides:Quantitative Analysis Main structures: Danai Koutra – SDM’14

VoG : Summarizing and Understanding Large Graphs