This paper proposes a graph generator that produces synthetic but realistic graphs by obeying common graph properties and power laws. The proposed model is simple, parsimonious, and able to generate different types of graphs efficiently.
RTG: A Recursive Realistic Graph Generator using Random Typing
Leman Akoglu and Christos Faloutsos
Carnegie Mellon University
Outline
• Motivation
• Problem Definition
• Related Work
• A Little History
• Proposed Model
• Experimental Results
• Conclusion
Akoglu, Faloutsos ECML PKDD 2009
Motivation - 1
Complex graphs (WWW, computer, biological, and social networks, etc.) exhibit many common properties:
- power laws
- small and shrinking diameter
- community structure
- …
How can we produce synthetic but realistic graphs?
http://www.aharef.info/static/htmlgraph/
Motivation - 2
Why do we need synthetic graphs?
• Simulation
• Sampling/Extrapolation
• Summarization/Compression
• Motivation to understand pattern-generating processes
Problem Definition
Discover a graph generator that is:
G1. simple: the more intuitive the better!
G2. realistic: outputs graphs that obey all “laws”
G3. parsimonious: requires few parameters
G4. flexible: able to produce the cross-product of un/weighted, un/directed, uni/bipartite graphs
G5. fast: generation time should be linear in the size of the output graph
Related Work
1. Graph Properties: what we want to match
2. Graph Generators: what has been proposed earlier
Related Work 1: Graph Properties
Related Work 2: Graph Generators
• Erdős-Rényi (ER) model [Erdős, Rényi `60]
• Small-world model [Watts, Strogatz `98]
• Preferential Attachment [Barabási, Albert `99]
• Winners don’t take all [Pennock et al. `02]
• Forest Fire model [Leskovec, Faloutsos `05]
• Butterfly model [McGlohon et al. `08]
Limitations of these models:
• Model some static graph property
• Neglect dynamic properties
• Cannot produce weighted graphs
Related Work 2: Graph Generators
• Random dot-product graphs [Kraetzl, Nickel `05] [Young, Scheinerman `07]
  - Produce only undirected graphs
  - Cannot produce weighted graphs
  - Require quadratic time
• Utility-based models [Fabrikant et al. `02] [Even-Bar et al. `07] [Laoutaris `08]
  - Hard to analyze
• Kronecker graphs [Leskovec et al. `07] [Akoglu et al. `08]
  - Multinomial/lognormal distributions
  - Fixed number of nodes
A Little History - 1
[Zipf, 1932] In many natural languages, the rank r and the frequency f_r of words follow a power law: f_r ∝ 1/r
(Plot: count versus rank.)
A Little History - 2
[Mandelbrot, 1953] “Humans optimize avg. information per unit transmission cost.”
A Little History - 2
[Miller, 1957] “A monkey types randomly on a keyboard: the distribution of words follows a power law.”
(Figure: a keyboard with k equiprobable keys such as a, b, λ, $, +, and a Space key.)
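Miller's random-typing process can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names, the default values of `k` and `q`, and the choice to skip empty words (space typed first) are all assumptions made here.

```python
import random
from collections import Counter


def type_words(num_words, k=5, q=0.2, seed=0):
    """Simulate typing num_words words on k letter keys plus a space key.

    The space key is hit with probability q and each letter key with
    probability (1 - q) / k. Empty words are skipped (an assumption).
    Letters are taken from 'a'..'z', so k should be at most 26.
    """
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    words = []
    while len(words) < num_words:
        chars = []
        while rng.random() >= q:  # keep typing letters until space is hit
            chars.append(rng.choice(letters))
        if chars:
            words.append("".join(chars))
    return words


def rank_frequency(words):
    """Return word counts sorted in decreasing order (index 0 is rank 1)."""
    return sorted(Counter(words).values(), reverse=True)


counts = rank_frequency(type_words(50_000))
```

Plotting `counts` against rank on log-log axes is the usual way to inspect whether the typed words follow a power law.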
A Little History - 2
[Conrad and Mitzenmacher, 2004] “Same relation still holds when keys have unequal probabilities.”
(Figure: a keyboard whose keys a, b, λ, $, +, and Space have unequal probabilities.)
Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
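The slide's figure does not survive in this transcript, so the following is only a sketch of how RTG-IE could work, under the assumption that each multi-edge is one pair of independently typed words (source label, then destination label). The names `type_label` and `rtg_ie`, the default parameter values, and the retyping of empty labels are choices made for this illustration.

```python
import random
from collections import Counter


def type_label(rng, letters, q):
    """Type letters until the space key is hit; retype if the label is empty."""
    while True:
        chars = []
        while rng.random() >= q:
            chars.append(rng.choice(letters))
        if chars:
            return "".join(chars)


def rtg_ie(W, k=5, q=0.2, seed=0):
    """Return a Counter mapping (source, destination) to edge weight.

    W is the number of multi-edges to generate, k the number of
    equiprobable letter keys (at most 26 here), q the space probability.
    """
    rng = random.Random(seed)
    letters = [chr(ord("a") + i) for i in range(k)]
    edges = Counter()
    for _ in range(W):
        src = type_label(rng, letters, q)
        dst = type_label(rng, letters, q)
        edges[(src, dst)] += 1  # repeated pairs accumulate as edge weight
    return edges


edges = rtg_ie(W=10_000)
nodes = {label for pair in edges for label in pair}
N, E, W = len(nodes), len(edges), sum(edges.values())
```

Here `N` counts distinct labels, `E` distinct (source, destination) pairs, and `W` the total weight, which are the three quantities the lemmas for this model relate to each other.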
Preliminary Model 1
RTG-IE: RTG with Independent Equiprobable keys
Lemma 1. W is super-linear on N (power law).
Lemma 2. W is super-linear on E (power law).
Lemma 3. In(out)-weight W_n of node n is super-linear on in(out)-degree d_n (power law).
Related laws: L05. Densification PL, L11. Weight PL, L10. Snapshot PL.
Please find the proofs in the paper.
Advantages of the Preliminary Model 1
• G1 - Intuitive
• G1 - Easy to implement
• G2 - Realistic: provably follows several laws
• G3 - Handful of parameters: k, q, W
• G5 - Fast: generates a random sequence of characters
Problems of the Preliminary Model 1
1 - Multinomial degree distributions
(Plots: count versus rank; count versus in-degree.)
Problems of the Preliminary Model 1
2 - No homophily, no community structure
Node i connects to any node j with prob. d_i*d_j independently, rather than connecting to ‘similar’ nodes.
Preliminary Model 2
RTG-IU: RTG with Independent Un-equiprobable keys
Solution to Problem 1: [Conrad and Mitzenmacher, 2004]
(Figure: keyboards with equiprobable and un-equiprobable keys, each with plots of count versus rank and count versus in-degree.)
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
• Generate source-destination labels in one shot.
• Pick one of the nine keys randomly.
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
• Repeat recursively.
• Terminate each label when the space key is typed on each dimension (dark blue).
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
How do we choose the keys? The independent model does not yield community structure!
        a        b        S
a    pa*pa    pa*pb    pa*q
b    pb*pa    pb*pb    pb*q
S    q*pa     q*pb     q*q
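A short sketch shows why the independent 2D keyboard cannot create communities: every cell is a product of the two 1D key probabilities, so the destination key is distributed the same way whatever the source key is. The probability values and the function name below are illustrative assumptions only.

```python
def independent_keyboard(probs):
    """Build the 2D keyboard as the outer product of 1D key probabilities.

    probs lists the probability of each key; the last entry is the space key.
    """
    return [[p_src * p_dst for p_dst in probs] for p_src in probs]


probs = [0.5, 0.3, 0.2]  # illustrative values for p_a, p_b, q
grid = independent_keyboard(probs)

# P(destination key = j | source key = i): each row normalised by its sum.
conditional = [[cell / sum(row) for cell in row] for row in grid]
```

Every row of `conditional` equals `probs`, so knowing the source character says nothing about the destination character.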
Proposed Model
RTG: Random Typing Graphs
Solution to Problem 2: “2D keyboard”
• Boost the probability of diagonal keys and decrease the probability of off-diagonal ones (0<β<1: imbalance factor).
• Favoring diagonal keys creates homophily.
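The slide does not spell out the cell probabilities. One way to make the boosting concrete, under the assumption that off-diagonal cells are scaled by β and each diagonal cell absorbs the removed mass so that row and column sums keep their 1D values p_i, is:

```latex
P(i,j) =
\begin{cases}
\beta\, p_i\, p_j, & i \neq j,\\[4pt]
p_i - \beta\, p_i \sum_{l \neq i} p_l \;=\; p_i\bigl(1 - \beta\,(1 - p_i)\bigr), & i = j,
\end{cases}
\qquad 0 < \beta < 1 .
```

Under this assumption each row sums to \(\beta p_i (1 - p_i) + p_i - \beta p_i (1 - p_i) = p_i\), every diagonal entry stays positive because β < 1, and a smaller β pushes more mass onto the diagonal.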
Proposed Model
Parameters
• k: number of keys
• q: probability of hitting the space key S
• W: number of multi-edges in the output graph G
• β: imbalance factor
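The four parameters are enough to drive a small generator. The sketch below is an illustration under several assumptions that the slides do not confirm: letter keys are equiprobable, off-diagonal cells are scaled by β while diagonal cells absorb the removed mass, a label that has already ended ignores further keystrokes on its dimension, and edges with an empty label are retyped. All function names are hypothetical.

```python
import random
from collections import Counter


def keyboard_2d(probs, beta):
    """Cell probabilities of the 2D keyboard; probs[-1] is the space key.

    Assumption: off-diagonal cells are scaled by beta and each diagonal
    cell absorbs the removed mass, so row and column sums equal probs.
    """
    n = len(probs)
    grid = [[beta * probs[i] * probs[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        grid[i][i] = probs[i] - beta * probs[i] * (1.0 - probs[i])
    return grid


def type_edge(rng, cells, weights, letters):
    """Type one (source, destination) label pair on the 2D keyboard."""
    space = len(letters)
    src, dst = [], []
    src_done = dst_done = False
    while not (src_done and dst_done):
        i, j = rng.choices(cells, weights=weights)[0]
        if not src_done:
            if i == space:
                src_done = True
            else:
                src.append(letters[i])
        if not dst_done:
            if j == space:
                dst_done = True
            else:
                dst.append(letters[j])
    return "".join(src), "".join(dst)


def rtg(W, k=3, q=0.2, beta=0.3, seed=0):
    """Generate W multi-edges; returns Counter of (source, destination)."""
    rng = random.Random(seed)
    probs = [(1.0 - q) / k] * k + [q]  # equiprobable letters (assumption)
    grid = keyboard_2d(probs, beta)
    cells = [(i, j) for i in range(k + 1) for j in range(k + 1)]
    weights = [grid[i][j] for i, j in cells]
    letters = [chr(ord("a") + i) for i in range(k)]
    edges = Counter()
    made = 0
    while made < W:
        src, dst = type_edge(rng, cells, weights, letters)
        if src and dst:  # retype edges with an empty label (assumption)
            edges[(src, dst)] += 1
            made += 1
    return edges


edges = rtg(W=2000)
```

Because diagonal cells are favored, source and destination labels tend to share prefixes, which is the mechanism the model relies on for homophily.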
Proposed Model
Up to this point, we discussed directed, weighted and unipartite graphs.
Generalizations
- Undirected graphs: Ignore edge directions; edge generation is symmetric.
- Unweighted graphs: Ignore duplicate edges.
- Bipartite graphs: Use different key sets for source and destination, so the two label sets are different.
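The three generalizations can be read as post-processing steps on a weighted, directed multi-edge list. The helpers below are a hedged sketch with hypothetical names; in particular, tagging labels with their side is only a stand-in for typing source and destination labels on different key sets.

```python
from collections import Counter


def to_undirected(edges):
    """Ignore direction: merge (u, v) and (v, u) into one weighted edge."""
    merged = Counter()
    for (u, v), w in edges.items():
        merged[tuple(sorted((u, v)))] += w
    return merged


def to_unweighted(edges):
    """Ignore duplicate edges: keep each (u, v) pair once."""
    return set(edges)


def to_bipartite(edges):
    """Keep source and destination labels in separate namespaces.

    Assumption: tagging each label with its side emulates using
    different key sets for the two dimensions.
    """
    return Counter({(("src", u), ("dst", v)): w for (u, v), w in edges.items()})


example = Counter({("a", "b"): 2, ("b", "a"): 1, ("a", "a"): 3})
undirected = to_undirected(example)
unweighted = to_unweighted(example)
bipartite = to_bipartite(example)
```

In the example, the two directed edges between `a` and `b` collapse into a single undirected edge of weight 3, while the bipartite view treats source `a` and destination `a` as different nodes.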
Experimental Results
How does RTG model real graphs?
• Blognet: a social network of blogs based on citations
  undirected, unweighted and unipartite
  N = 27,726; E = 126,227; over 80 time ticks.
• Com2Cand: the U.S. electoral campaign donations network from organizations to candidates
  directed, weighted ($ amounts) and bipartite
  N = 23,191; E = 877,721; W = 4,383,105,580; over 29 time ticks.
Experimental Results
L01. Power-law degree distribution [Faloutsos et al. `99, Kleinberg et al. `99, Chakrabarti et al. `04, Newman `04]
(Plots for Blognet and RTG: count versus degree.)
Experimental Results
L02. Triangle Power Law (TPL) [Tsourakakis `08]
(Plots for Blognet and RTG: count versus triangles.)
Experimental Results 1
L03. Eigenvalue Power Law (EPL) [Siganos et al. `03]
(Plots for Blognet and RTG: λ_rank versus rank.)
Experimental Results 1
L05. Densification Power Law (DPL) [Leskovec et al. `05]
(Plots for Blognet and RTG: #edges versus #nodes.)
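The densification power law states that the number of edges grows as a power of the number of nodes, E(t) ∝ N(t)^α. A common way to estimate α from snapshots is a least-squares line on log-log axes; the sketch below uses made-up synthetic data (not the Blognet or RTG measurements) and a hypothetical helper name.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


# Synthetic snapshots that follow E = 2 * N^1.5 exactly (illustrative only).
num_nodes = [100, 200, 400, 800, 1600]
num_edges = [2.0 * n ** 1.5 for n in num_nodes]
alpha = fit_power_law_exponent(num_nodes, num_edges)
```

For the synthetic data the fitted slope recovers the exponent 1.5 up to floating-point error; on real snapshots the same fit gives the densification exponent.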
Experimental Results
L06. Small and shrinking diameter [Albert and Barabási `99, Leskovec et al. `05]
(Plots for Blognet and RTG: diameter versus time.)
Experimental Results
L07. Constant size 2nd and 3rd connected components [McGlohon et al. `08]
(Plots for Blognet and RTG: size versus time.)
Experimental Results 1
L08. Principal Eigenvalue Power Law (λ1PL) [Akoglu et al. `08]
(Plots for Blognet and RTG: λ1 versus #edges.)
Experimental Results 1
L09. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et al. `98, Crovella and Bestavros `99, McGlohon et al. `08]
(Plots for Blognet and RTG: entropy versus resolution.)
Experimental Results 2
(Plots for Com2Cand and RTG: diameter versus time; size versus time.)
Experimental Results 2
(Plots for Com2Cand and RTG: λ1 versus #edges; λ_rank versus rank.)
Experimental Results 2
(Plots for Com2Cand and RTG: count versus in-degree; entropy versus resolution.)