RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos, Carnegie Mellon University
Outline • Motivation • Problem Definition • Related Work • A Little History • Proposed Model • Experimental Results • Conclusion Akoglu, Faloutsos ECML PKDD 2009
Motivation - 1 Complex graphs (WWW, computer, biological, and social networks, etc.) exhibit many common properties: - power laws - small and shrinking diameter - community structure - … How can we produce synthetic but realistic graphs? Image: http://www.aharef.info/static/htmlgraph/
Motivation - 2 Why do we need synthetic graphs? • Simulation • Sampling/Extrapolation • Summarization/Compression • Motivation to understand pattern-generating processes
Problem Definition Discover a graph generator that is: G1. simple: the more intuitive the better! G2. realistic: outputs graphs that obey all “laws” G3. parsimonious: requires few parameters G4. flexible: able to produce any combination of weighted/unweighted, directed/undirected, and unipartite/bipartite graphs G5. fast: generation should take time linear in the size of the output graph
Related Work 1. Graph Properties: what we want to match 2. Graph Generators: what has been proposed earlier
Related Work 1: Graph Properties
Related Work 2: Graph Generators • Erdős-Rényi (ER) model [Erdős, Rényi `60] • Small-world model [Watts, Strogatz `98] • Preferential Attachment [Barabási, Albert `99] • Winners don’t take all [Pennock et al. `02] • Forest Fire model [Leskovec, Faloutsos `05] • Butterfly model [McGlohon et al. `08] Limitations across these models: • Model some static graph property • Neglect dynamic properties • Cannot produce weighted graphs
Related Work 2: Graph Generators • Random dot-product graphs [Kraetzl, Nickel `05] [Young, Scheinerman `07]: produce only undirected graphs; cannot produce weighted graphs; require quadratic time • Utility-based models [Fabrikant et al. `02] [Even-Dar et al. `07] [Laoutaris `08]: hard to analyze • Kronecker graphs [Leskovec et al. `07] [Akoglu et al. `08]: multinomial/lognormal distributions; fixed number of nodes
A Little History - 1 [Zipf, 1932] In many natural languages, the rank $r$ and the frequency $f_r$ of words follow a power law: $f_r \propto 1/r$ [Figure: count vs. rank]
A Little History - 2 [Mandelbrot, 1953] “Humans optimize avg. information per unit transmission cost.”
A Little History - 2 [Miller, 1957] “A monkey types randomly on a keyboard: the distribution of words follows a power law.” [Figure: keyboard with k equiprobable keys and a Space key]
A Little History - 2 [Conrad and Mitzenmacher, 2004] “Same relation still holds when keys have unequal probabilities.” [Figure: keyboard with keys of unequal probability and a Space key]
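The random-typing experiment on these two slides can be tried out directly. The following is a minimal sketch, not taken from the slides: the function name `random_typing`, the two-letter keyboard, and the probabilities are illustrative assumptions, and empty words produced by consecutive spaces are simply dropped.

```python
import random
from collections import Counter

def random_typing(num_chars, key_probs, q, seed=0):
    """Type num_chars random characters and return the completed words.

    key_probs maps each letter key to its probability and q is the
    probability of the space key; they are assumed to sum to 1.
    """
    rng = random.Random(seed)
    keys = list(key_probs) + [" "]
    weights = list(key_probs.values()) + [q]
    typed = "".join(rng.choices(keys, weights=weights, k=num_chars))
    pieces = typed.split(" ")
    pieces.pop()  # text after the last space is an unfinished word (or empty)
    return [w for w in pieces if w]  # drop empty words (consecutive spaces)

# Hypothetical keyboard with unequal key probabilities.
words = random_typing(200_000, {"a": 0.5, "b": 0.3}, q=0.2)
counts = Counter(words)
ranked = [c for _, c in counts.most_common()]  # frequencies by rank
```

Plotting `ranked` against rank on log-log axes is one way to inspect the rank-frequency relation the slides refer to.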
Preliminary Model 1 RTG-IE: RTG with Independent Equiprobable keys [Figure: keyboard with a Space key]
Preliminary Model 1 RTG-IE: RTG with Independent Equiprobable keys Lemma 1. W is super-linear on N (power law). (L05. Densification PL) Lemma 2. W is super-linear on E (power law). (L11. Weight PL) Lemma 3. In(out)-weight $W_n$ of node n is super-linear on in(out)-degree $d_n$ (power law). (L10. Snapshot PL) Please find the proofs in the paper.
Advantages of the Preliminary Model 1 G1 - Intuitive G1 - Easy to implement G2 - Realistic: provably follows several rules G3 - Handful of parameters: k, q, W G5 - Fast: generating a random sequence of characters
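To illustrate how little machinery the preliminary model needs, here is a minimal sketch of RTG-IE. It assumes, as the later "2D keyboard" slides suggest, that each pair of consecutively typed words supplies the source and destination labels of one multi-edge. The function name `rtg_ie`, the handling of empty words, and the parameter values are assumptions made for illustration.

```python
import random
from collections import Counter

def rtg_ie(W, k, q, seed=0):
    """Sketch of RTG-IE: k equiprobable letter keys plus a space key.

    q is the probability of the space key, so each letter has probability
    (1 - q) / k. Returns a Counter mapping (source, destination) to weight.
    """
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]  # assumes k <= 26

    def type_word():
        # Type letters until the space key is hit. An empty word is kept
        # as the label of one node; this is an assumption of the sketch.
        chars = []
        while rng.random() >= q:
            chars.append(rng.choice(letters))
        return "".join(chars)

    edges = Counter()
    for _ in range(W):
        edges[(type_word(), type_word())] += 1  # repeated pairs add weight
    return edges

edges = rtg_ie(W=20_000, k=5, q=0.2)
nodes = {label for edge in edges for label in edge}
N, E, total_weight = len(nodes), len(edges), sum(edges.values())
```

Running the generator for increasing W and recording N, E, and the total weight is one way to look at the relations stated in the lemmas empirically.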
Problems of the Preliminary Model 1 1- Multinomial degree distributions [Figure: count vs. rank; count vs. in-degree]
Problems of the Preliminary Model 1 2- No homophily, no community structure Node i connects to any node j independently with prob. $d_i \cdot d_j$, rather than connecting to ‘similar’ nodes.
Preliminary Model 2 RTG-IU: RTG with Independent Un-equiprobable keys Solution to Problem 1: [Conrad and Mitzenmacher, 2004] [Figure: count vs. rank and count vs. in-degree plots for two keyboards]
Proposed Model RTG: Random Typing Graphs Solution to Problem 2: “2D keyboard” • Generate source and destination labels in one shot. • Pick one of the nine keys randomly.
Proposed Model RTG: Random Typing Graphs Solution to Problem 2: “2D keyboard” • Repeat recursively. • Terminate each label when the space key is typed on its dimension (dark blue in the figure).
Proposed Model RTG: Random Typing Graphs Solution to Problem 2: “2D keyboard” How do we choose the keys? The independent model does not yield community structure! Under independence, each 2D key has the product of its two coordinate probabilities:
pa*pa  pa*pb  pa*q
pb*pa  pb*pb  pb*q
q*pa   q*pb   q*q
Proposed Model RTG: Random Typing Graphs Solution to Problem 2: “2D keyboard” • Boost the probability of diagonal keys and decrease the probability of off-diagonal ones (0<β<1: imbalance factor). • Favoring diagonal keys creates homophily.
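The slide does not spell out how the diagonal is boosted, so the following sketch picks one scheme as an assumption: every off-diagonal key keeps β times its independent probability, and each diagonal key absorbs whatever its row lost, which leaves the per-dimension key probabilities unchanged. The function name `keyboard_2d` and the symbol "S" for the space key are illustrative.

```python
def keyboard_2d(key_probs, q, beta):
    """Return {(src_key, dst_key): probability} for a 2D keyboard.

    Off-diagonal keys get beta times their independent probability; each
    diagonal key absorbs what its row lost, so every row still sums to
    the 1D probability of its source key. This scheme is an assumption.
    """
    probs = dict(key_probs)
    probs["S"] = q  # "S" stands for the space key; letters must differ from it
    table = {}
    for a, pa in probs.items():
        off_diagonal = 0.0
        for b, pb in probs.items():
            if a != b:
                table[(a, b)] = beta * pa * pb
                off_diagonal += table[(a, b)]
        table[(a, a)] = pa - off_diagonal
    return table

# Hypothetical nine-key example: letters a, b and the space key.
table = keyboard_2d({"a": 0.5, "b": 0.3}, q=0.2, beta=0.5)
```

With β close to 1 the table approaches the independent model; smaller β moves more probability mass onto the diagonal.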
Proposed Model Parameters • k: number of keys • q: probability of hitting the space key S • W: number of multi-edges in the output graph G • β: imbalance factor
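Putting the four parameters together, a generator along these lines might look as follows. This is a sketch under stated assumptions, not the authors' implementation: the diagonal-boost formula, the choice to ignore keystrokes on a dimension whose label has already terminated, and all names (`rtg`, `SPACE`, `type_edge`) are illustrative. The number of keys k is implied by the size of `key_probs`.

```python
import random
from collections import Counter
from itertools import accumulate

SPACE = "S"  # stand-in symbol for the space key; letters must differ from it

def rtg(W, key_probs, q, beta, seed=0):
    """Sketch of RTG: type W (source, destination) label pairs on a 2D keyboard."""
    rng = random.Random(seed)
    probs = dict(key_probs)
    probs[SPACE] = q
    cells, weights = [], []
    for a, pa in probs.items():
        # Assumed scheme: off-diagonal keys are damped by beta and the
        # diagonal key absorbs the probability its row lost.
        off = sum(beta * pa * pb for b, pb in probs.items() if b != a)
        for b, pb in probs.items():
            cells.append((a, b))
            weights.append(pa - off if a == b else beta * pa * pb)
    cum_weights = list(accumulate(weights))

    def type_edge():
        src, dst = [], []
        src_open = dst_open = True
        while src_open or dst_open:
            a, b = rng.choices(cells, cum_weights=cum_weights)[0]
            if src_open:
                if a == SPACE:
                    src_open = False
                else:
                    src.append(a)
            if dst_open:
                if b == SPACE:
                    dst_open = False
                else:
                    dst.append(b)
        return "".join(src), "".join(dst)

    edges = Counter()
    for _ in range(W):
        edges[type_edge()] += 1  # duplicate pairs become edge weight
    return edges

edges = rtg(W=5_000, key_probs={"a": 0.5, "b": 0.3}, q=0.2, beta=0.1)
```

Each keystroke costs constant time, so the running time grows with the total number of characters typed.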
Proposed Model Up to this point, we discussed directed, weighted, and unipartite graphs. Generalizations - Undirected graphs: Ignore edge directions; edge generation is symmetric. - Unweighted graphs: Ignore duplicate edges. - Bipartite graphs: Use different key sets on source and destination, so the two sides have different labels.
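The generalizations above amount to simple post-processing of a weighted, directed edge list. The sketch below assumes edges are stored as a `Counter` keyed by (source, destination); the helper names are hypothetical, and tagging labels by side is only a stand-in for using different key sets on the two dimensions.

```python
from collections import Counter

def to_undirected(edges):
    """Ignore direction: merge (u, v) and (v, u) into one edge."""
    merged = Counter()
    for (u, v), w in edges.items():
        merged[(u, v) if u <= v else (v, u)] += w
    return merged

def to_unweighted(edges):
    """Ignore duplicate edges: keep each distinct pair once."""
    return set(edges)

def to_bipartite(edges):
    """Tag labels by side so that equal strings on the two sides differ."""
    return Counter({(("src", u), ("dst", v)): w for (u, v), w in edges.items()})

# Tiny made-up example, not data from the slides.
example = Counter({("a", "b"): 2, ("b", "a"): 1, ("a", "a"): 3})
undirected = to_undirected(example)
unweighted = to_unweighted(example)
bipartite = to_bipartite(example)
```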
Experimental Results How does RTG model real graphs? • Blognet: a social network of blogs based on citations; undirected, unweighted, and unipartite; N = 27,726; E = 126,227; over 80 time ticks. • Com2Cand: the U.S. electoral campaign donations network from organizations to candidates; directed, weighted ($ amounts), and bipartite; N = 23,191; E = 877,721; W = 4,383,105,580; over 29 time ticks.
Experimental Results 1 L01. Power-law degree distribution [Faloutsos et al. `99, Kleinberg et al. `99, Chakrabarti et al. `04, Newman `04] [Figure: count vs. degree, Blognet and RTG]
Experimental Results 1 L02. Triangle Power Law (TPL) [Tsourakakis `08] [Figure: count vs. triangles, Blognet and RTG]
Experimental Results 1 L03. Eigenvalue Power Law (EPL) [Siganos et al. `03] [Figure: $\lambda_{rank}$ vs. rank, Blognet and RTG]
Experimental Results 1 L05. Densification Power Law (DPL) [Leskovec et al. `05] [Figure: #edges vs. #nodes, Blognet and RTG]
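A densification plot like this one is usually summarized by the slope of a straight line fitted on log-log axes. The sketch below shows such a fit with ordinary least squares; the helper name `loglog_slope` is hypothetical, and the node and edge counts are synthetic values chosen so that the true exponent is 1.2. They are not measurements from Blognet or RTG.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Synthetic snapshots following E = N^1.2 exactly (illustration only).
num_nodes = [100, 1_000, 10_000, 100_000]
num_edges = [n ** 1.2 for n in num_nodes]
exponent = loglog_slope(num_nodes, num_edges)
```

A fitted exponent above 1 means edges grow faster than nodes over time, which is what densification refers to.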
Experimental Results 1 L06. Small and shrinking diameter [Albert and Barabási `99, Leskovec et al. `05] [Figure: diameter vs. time, Blognet and RTG]
Experimental Results 1 L07. Constant size 2nd and 3rd connected components [McGlohon et al. `08] [Figure: size vs. time, Blognet and RTG]
Experimental Results 1 L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. `08] [Figure: $\lambda_1$ vs. #edges, Blognet and RTG]
Experimental Results 1 L09. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et al. `98, Crovella and Bestavros `99, McGlohon et al. `08] [Figure: entropy vs. resolution, Blognet and RTG]
Experimental Results 2 [Figures: diameter vs. time; size vs. time; Com2Cand and RTG]
Experimental Results 2 [Figures: $\lambda_1$ vs. #edges; $\lambda_{rank}$ vs. rank; Com2Cand and RTG]
Experimental Results 2 [Figures: count vs. in-degree; entropy vs. resolution; Com2Cand and RTG]