VoG: Summarizing and Understanding Large Graphs. Danai Koutra, Jilles Vreeken, U Kang, Christos Faloutsos. SDM, 24-26 April 2014, Philadelphia, USA
Problem Definition: Graph Summarization. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures. Danai Koutra – SDM’14
Why graph summarization? • Visualization • Guiding attention
Why graph summarization? • Graph understanding
Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. Looking at the raw graph: "I don’t see anything!"
Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. Stars: admins, bots, heavy users. Bipartite cores: edit wars, vandals (Kiev vs. Kyiv).
Roadmap • Main Idea • Proposed Algorithm: VoG • VoG: Step-by-Step • Experiments • Conclusions Danai Koutra (CMU)
Main Idea • Use a graph vocabulary (e.g., stars, bipartite cores, chains) • Best graph summary ⇔ optimal compression (MDL): the shortest lossless description
Background: Minimum Description Length Principle • Given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is $\arg\min_{M \in \mathcal{M}} \; L(M) + L(D \mid M)$, where $L(M)$ is the # of bits for $M$ and $L(D \mid M)$ is the # of bits for the data using $M$.
Example: Minimum Description Length Principle. Minimize $L(M) + L(D \mid M)$, where $L(D \mid M)$ pays for the errors: a line $a_1 x + a_0$ vs. a degree-10 polynomial $a_{10} x^{10} + a_9 x^9 + \dots + a_0$. [Example adapted from Akoglu’12]
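The two-part MDL comparison on this slide can be sketched in a few lines. This is only an illustration: the function name `mdl_best` and the bit counts below are hypothetical and are not taken from the talk.

```python
def mdl_best(models):
    """Pick the model with the smallest two-part cost L(M) + L(D|M).

    `models` maps a model name to a pair (model_bits, data_bits), where
    model_bits stands for L(M) and data_bits for L(D|M), i.e. the bits
    needed to describe the errors of that model on the data.
    """
    return min(models, key=lambda name: sum(models[name]))


# Hypothetical bit counts, for illustration only.
models = {
    "line": (20.0, 150.0),                  # cheap model, more error bits
    "degree-10 polynomial": (110.0, 90.0),  # costly model, fewer error bits
}
best = mdl_best(models)
```

With these made-up numbers the line wins, because the bits saved on the errors by the polynomial do not pay for its extra coefficients.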
Formally: Minimum Graph Description. Given: a graph $G$ with adjacency matrix $A$, and a vocabulary $\Omega$. Find: the model $M$ that minimizes the total cost, $\min_M L(G, M) = \min_M \{ L(M) + L(E) \}$, where $E$ is the error of the model $M$ with respect to the adjacency matrix $A$.
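VoG: Overview. [Figure: each candidate subgraph is compared (≈?) with the vocabulary structures; the one with the minimum MDL cost (argmin) is kept.]
VoG: Overview. [Figure: the labeled structures are assembled into the summary according to some criterion.]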
We need candidate structures… How can we get them?
Step 1: Graph Decomposition. Could use: ANY graph decomposition method. We adapted a node reordering method: SlashBurn.
SlashBurn-based Graph Decomposition • Slash top-k hubs, burn edges • Repeat on the remaining GCC (giant connected component) [SlashBurn: U Kang and Christos Faloutsos. ICDM’11] [Figure: the graph before and after slashing, showing the GCC and the candidate structures.] Notice that the structures can overlap!
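A minimal sketch of the slash-and-burn loop, assuming a connected, undirected input graph stored as a dict of neighbor sets. It is a simplification and not the SlashBurn or VoG code: the function names, the `min_gcc` stopping threshold, and the choice to emit each hub's neighborhood plus every small leftover component as candidates are assumptions made for illustration.

```python
from collections import deque


def connected_components(adj, nodes):
    """Connected components (as sets) of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def slashburn_candidates(adj, k=1, min_gcc=3):
    """Return candidate structures (node sets) found by repeated slashing."""
    gcc = set(adj)  # assumption: the input graph is connected
    candidates = []
    while len(gcc) >= min_gcc:
        # Slash: take the k highest-degree nodes inside the current GCC.
        hubs = sorted(gcc, key=lambda u: (-len(adj[u] & gcc), u))[:k]
        for h in hubs:
            candidates.append({h} | (adj[h] & gcc))  # hub plus its neighbors
        # Burn: drop the hubs (and their edges), then split what is left.
        comps = connected_components(adj, gcc - set(hubs))
        if not comps:
            return candidates
        comps.sort(key=len, reverse=True)
        gcc = comps[0]  # repeat on the largest remaining component
        candidates.extend(c for c in comps[1:] if len(c) > 1)
    if len(gcc) > 1:
        candidates.append(gcc)
    return candidates
```

On a star whose last spoke starts a short chain, the two candidates share a node, which matches the remark on the slide that structures can overlap.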
We got candidate structures. Now, how can we ‘label’ them?
Step 2: Graph Labeling. (1) Compare each candidate with every vocabulary structure (≈?) and compute its MDL cost. (2) Keep the structure type with the minimum cost (argmin).
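Graph Representation (details). Questions when representing a candidate as a vocabulary structure: which node is the hub? What is the “best” node split? Which edges are missing? What is the “best” node ordering? [Figure: adjacency matrices of the candidate structures.]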
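Graph Representation (details): star structure. Hub: top-degree node. Spokes: the rest. Encoding cost: $L_{\mathbb{N}}(|st|-1) + \log n + \log\binom{n-1}{|st|-1} + L(E^+) + L(E^-)$, where the first three terms encode the # of spokes, the hub ID, and the spoke IDs, and the last two encode the errors (missing and extra edges). Example in the figure: n = 7, with |st|−1 = 6 spokes.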
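The structural part of the star cost, $L_{\mathbb{N}}(|st|-1) + \log n + \log\binom{n-1}{|st|-1}$, can be evaluated directly. The sketch below assumes that $L_{\mathbb{N}}$ is Rissanen's universal code for integers and that logarithms are base 2; the function names are hypothetical, and the error terms $L(E^+)$ and $L(E^-)$ are deliberately left out.

```python
import math


def universal_int_bits(n):
    """Rissanen's universal code length for an integer n >= 1, in bits.

    Computes log2(c0) + log2(n) + log2(log2(n)) + ..., keeping only the
    positive terms, with c0 ~ 2.865064.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = math.log2(2.865064)
    term = math.log2(n)
    while term > 0:
        bits += term
        term = math.log2(term)
    return bits


def star_structure_bits(n, star_size):
    """Bits for a star with `star_size` nodes in a graph with n nodes.

    Error terms are NOT included; this covers only the # of spokes,
    the hub ID, and the spoke IDs.
    """
    spokes = star_size - 1
    return (
        universal_int_bits(spokes)            # how many spokes
        + math.log2(n)                        # which node is the hub
        + math.log2(math.comb(n - 1, spokes)) # which nodes are the spokes
    )
```

For a star that spans the whole graph (n = 7 nodes, 6 spokes) the last term vanishes, since there is only one way to choose 6 spokes out of the 6 remaining nodes.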
Graph Representation (details): bipartite graph structure. Finding the max bipartite graph is NP-hard. Heuristic: Belief Propagation with heterophily for node classification (blue/red). Encoding cost: # of red nodes, # of blue nodes, and their IDs, plus the errors $L(E^+) + L(E^-)$ (missing and extra edges).
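Graph Representation (details): chain structure. Finding the longest path is NP-hard. Heuristic: BFS + local search. Encoding cost: the chain structure plus the errors (missing and extra edges). [Figure: adjacency matrix of a chain under the chosen node ordering.]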
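One common way to use BFS for finding a long path is the double sweep: run BFS from any node to reach a far-away node, then run BFS again from there. The sketch below shows only that BFS part, on an adjacency-list dict; the function names are hypothetical, the local search step mentioned on the slide is omitted, and this is not claimed to be the exact procedure used in VoG.

```python
from collections import deque


def bfs_farthest(adj, source):
    """BFS from `source`; return a farthest node and the BFS parent map."""
    parent = {source: None}
    queue = deque([source])
    last = source
    while queue:
        last = queue.popleft()  # the last node dequeued is at maximum depth
        for v in adj[last]:
            if v not in parent:
                parent[v] = last
                queue.append(v)
    return last, parent


def long_path_by_double_bfs(adj, start):
    """Return a path between two far-apart nodes found by two BFS sweeps."""
    end_a, _ = bfs_farthest(adj, start)
    end_b, parent = bfs_farthest(adj, end_a)
    path = []
    node = end_b
    while node is not None:  # walk back from end_b to end_a
        path.append(node)
        node = parent[node]
    return path[::-1]  # starts at end_a, ends at end_b
```

The result is a shortest path between two distant nodes, so in general it is only a lower bound on the longest path; on a tree it is exact.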
Step 2: Graph Labeling. For each candidate, compute the MDL cost L(m) + L(e) under every structure type, and label the candidate with the argmin.
Step 3: Summary Assembly. Heuristic: PLAIN (the summary contains all labeled candidate structures).
Concepts (details). Savings (compression gain) = # bits as noise − # bits as structure.
Step 3: Summary Assembly. Heuristic: TOP-k (the summary contains the k structures with the highest savings).
Concepts (details): summary encoding cost. $L(M) = L_{\mathbb{N}}(|M|+1) + \log\binom{\cdot}{\cdot} + \sum_{s \in M} \big( -\log P(x(s) \mid M) + L(s) \big)$, where the first term encodes the # of structures, the second the # of structures per type (over the vocabulary $\Omega$), and the sum encodes, for each structure $s$, its type $x(s)$ and its encoding length $L(s)$ (its connectivity).
Step 3: Summary Assembly. Heuristic: Greedy&Forget (details). [Figure: encoding cost L(M) as structures are considered one by one.]
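The TOP-k and greedy assembly heuristics can be sketched as follows. This is a hedged illustration and not the released VoG code: the function names, the caller-supplied `quality` and `total_cost` callbacks, and the rule "keep a structure only if the total cost strictly decreases" are assumptions.

```python
def top_k(candidates, quality, k):
    """TOP-k: keep the k candidates with the highest quality (savings)."""
    return sorted(candidates, key=quality, reverse=True)[:k]


def greedy_and_forget(candidates, quality, total_cost):
    """Greedy assembly: try candidates in decreasing quality order.

    `total_cost(summary)` must return the total description length (bits)
    of the graph under `summary`. A candidate is kept only if adding it
    lowers that cost; otherwise it is forgotten.
    """
    summary = []
    best = total_cost(summary)
    for s in sorted(candidates, key=quality, reverse=True):
        cost = total_cost(summary + [s])
        if cost < best:
            summary.append(s)
            best = cost
    return summary, best


# Toy setting (hypothetical numbers): 10 edges, 2 bits per uncovered edge.
# Each candidate is (name, covered edge IDs, bits to describe the structure).
A = ("A", frozenset({1, 2, 3, 4, 5}), 4)
B = ("B", frozenset({4, 5, 6}), 3)
C = ("C", frozenset({5}), 5)


def quality(s):
    return 2 * len(s[1]) - s[2]  # bits saved by s when taken on its own


def total_cost(summary):
    covered = set().union(*(s[1] for s in summary)) if summary else set()
    return sum(s[2] for s in summary) + 2 * (10 - len(covered))


summary, cost = greedy_and_forget([C, B, A], quality, total_cost)
```

In this toy setting TOP-2 returns A and B, while the greedy pass keeps only A: B overlaps A on two edges, so adding it costs more bits than it saves.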
Application: Enron. Top-3 stars and top-1 NBC (near-bipartite core). [Figure labels: klay; kenneth.lay@enron.com; jeff.skilling@enron.com (CEO); ski excursion.]
Runtime: VoG is near-linear in the number of edges of the input graph.
Conclusions • Formulation: info-theoretic graph summarization approach • Algorithm: VoG is near-linear in the number of edges • Experiments on real graphs
Code: www.cs.cmu.edu/~dkoutra/SRC/vog.tar
Thank you! Questions? www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu
Backup Slides: Future Work • Our vocabulary was: [figure: the structure types used in this work] • Extend to more complicated structures, e.g., “jellyfish” [Tauro’01]
Backup Slides: Future Work • Combine graph decomposition methods: • Community detection • Spectral methods • METIS • SUBDUE • Cross-Associations • Eigenspokes • …
Backup Slides: Quantitative Analysis. 4292729 bits as noise. Real graphs have structure (we save bits by encoding structures).
Backup Slides: Quantitative Analysis. Main structures: [figure]