Sampling from Large Graphs (poster #305)
Jurij (Jure) Leskovec, Christos Faloutsos
Carnegie Mellon University
Problems and recommendations
• Q: How to sample from a large graph?
• A: Forest Fire (FF) or random node (RN) sampling.
• Q: Which properties should the sample preserve?
• A: At least the 13 properties we list.
• Q: How to measure success/similarity?
• A: The Kolmogorov-Smirnov (K-S) D-statistic, measured against the 'back-in-time' version of the graph.
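The K-S D-statistic scores a sample by the largest vertical gap between two empirical cumulative distributions: D = max over x of |F_sample(x) - F_reference(x)|. The Python sketch below computes it for two lists of values, for example the degree sequences of a sample and of the reference graph. Feeding it raw degree sequences is an assumption of this sketch, and `ks_d_statistic` is a hypothetical helper name.

```python
from bisect import bisect_right


def ks_d_statistic(a, b):
    """Two-sample K-S D-statistic: the largest gap between the empirical CDFs of a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so the maximum gap is attained at one of those values.
    for v in set(xs) | set(ys):
        f_a = bisect_right(xs, v) / len(xs)  # fraction of a that is <= v
        f_b = bisect_right(ys, v) / len(ys)  # fraction of b that is <= v
        d = max(d, abs(f_a - f_b))
    return d
```

For instance, `ks_d_statistic([1, 1], [2, 2])` returns 1.0, since the two samples do not overlap at all, while identical samples give 0.0.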
Criteria
STATIC
• in-degree distribution; out-degree distribution
• distribution of weakly connected component (WCC) sizes; distribution of strongly connected component (SCC) sizes
• hop-plot; hop-plot for the WCC
• distribution of first left singular vector values
• scree plot
• distribution of clustering coefficient
TEMPORAL
• densification power law (DPL)
• shrinking diameter
• normalized size of the largest connected component
• first eigenvalue
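The densification power law states that the number of edges E(t) grows as a power of the number of nodes N(t): E(t) ∝ N(t)^a, where a > 1 means the graph gets denser over time. A minimal sketch of estimating a is an ordinary least-squares line through the (log N, log E) points of successive snapshots. The helper name and the use of plain least squares are assumptions, not necessarily how the exponent was fit for these plots.

```python
import math


def densification_exponent(snapshots):
    """Fit a in E(t) ~ N(t)^a by least squares on (log N, log E).

    snapshots: list of (num_nodes, num_edges) pairs, one per time step.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("node counts must vary across snapshots")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # slope of the log-log regression line
```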
Targets
• scale-down: fewer nodes, but the same diameter, degree distribution, etc.
• back-in-time: match an earlier, real, smaller version of the graph
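Evaluating the back-in-time target requires the real graph as it looked when it had as many nodes as the sample. A minimal sketch, assuming every edge carries an arrival timestamp and that a node exists from its first edge onward: replay edges in time order and stop before the node count would exceed the target. `back_in_time_snapshot` is a hypothetical helper, not part of the original work.

```python
def back_in_time_snapshot(timed_edges, n_target):
    """Replay timestamped edges in order; stop before the graph would exceed n_target nodes.

    timed_edges: iterable of (timestamp, u, v) covering the graph's whole history.
    Returns the node set and edge list of that earlier, smaller graph.
    """
    nodes, prefix = set(), []
    for _, u, v in sorted(timed_edges, key=lambda e: e[0]):
        new_nodes = {u, v} - nodes
        if len(nodes) + len(new_nodes) > n_target:
            break  # this edge would introduce too many nodes: the snapshot ends here
        nodes |= new_nodes
        prefix.append((u, v))
    return nodes, prefix
```

With edges `(1, 'a', 'b'), (2, 'b', 'c'), (3, 'a', 'c'), (4, 'c', 'd')` and a target of 3 nodes, it keeps the first three edges and stops before node `d` appears.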
Sampling Methods
• RN: random nodes
• RPN: random nodes, PageRank-biased
• RDN: random nodes, degree-biased
• RE: random edges
• RNE: random node-edge
• HYB: hybrid of RNE and RE
• RNN: random node neighbor
• RJ: random jump
• RW: random walk
• FF: Forest Fire
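The two simplest methods in this list can be sketched in Python as follows, assuming an undirected graph stored as a dict that maps each node to a set of neighbours. RN keeps the subgraph induced by uniformly chosen nodes; RE keeps only the uniformly chosen edges and their endpoints, so its samples tend to be sparser than an induced subgraph on the same nodes. The function names, the graph representation, and the rule of skipping edges that would exceed the node budget are assumptions of this sketch.

```python
import random


def induced_edges(adj, nodes):
    """Edges of adj with both endpoints in nodes, each listed once as (u, v) with u < v."""
    return {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}


def random_node_sample(adj, n, rng):
    """RN: choose n nodes uniformly at random and keep the subgraph they induce."""
    nodes = set(rng.sample(sorted(adj), n))
    return nodes, induced_edges(adj, nodes)


def random_edge_sample(adj, n, rng):
    """RE: visit edges in random order, keeping each one whose endpoints fit in n nodes."""
    all_edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    rng.shuffle(all_edges)
    nodes, edges = set(), set()
    for u, v in all_edges:
        if len(nodes | {u, v}) <= n:  # skip edges that would overshoot the node budget
            nodes |= {u, v}
            edges.add((u, v))
        if len(nodes) == n:
            break
    return nodes, edges
```

Passing an explicit `random.Random(seed)` as `rng` keeps runs reproducible, which helps when comparing methods at the same sample size.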
4 Datasets
• Arxiv (author-paper)
• Citation (HEP-TH, HEP-PH)
• A.S. (Autonomous Systems)
• epinions.com
Sizes range from 26K to 500K edges.
Figure: diameter vs. N (number of nodes); clustering coefficient (CC) vs. degree
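A "CC vs. degree" plot averages the local clustering coefficient over all nodes of the same degree. For a node u of degree k ≥ 2, C(u) = 2·t(u) / (k(k - 1)), where t(u) is the number of links among u's neighbours. The sketch below works on an undirected adjacency dict; assigning C(u) = 0 to nodes of degree below 2 is one common convention and an assumption here, and both helper names are hypothetical.

```python
from collections import defaultdict


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbours that are themselves linked (0.0 if degree < 2)."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each linked neighbour pair once.
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def clustering_vs_degree(adj):
    """Average local clustering coefficient for each degree value, sorted by degree."""
    by_degree = defaultdict(list)
    for u in adj:
        by_degree[len(adj[u])].append(local_clustering(adj, u))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}
```

On a triangle {0, 1, 2} with an extra leaf 3 attached to node 0, it returns {1: 0.0, 2: 1.0, 3: 1/3}.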
Figure: degree distribution; average clustering coefficient (CC) vs. N
Figure: diameter; densification power law (DPL)
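Diameter plots of this kind rest on the hop-plot: the number of node pairs reachable within h hops, as a function of h. The sketch below builds it with one breadth-first search per node (exact, so only practical for small graphs) and reads off an effective diameter as the smallest h that covers 90% of reachable pairs. Reporting the integer h without interpolating between hops is a simplification assumed here, and the helper names are hypothetical.

```python
from collections import deque


def hop_plot(adj):
    """List of (h, P(h)): ordered pairs (u, v), u != v, with v reachable from u in <= h hops."""
    dist_counts = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != s:
                dist_counts[d] = dist_counts.get(d, 0) + 1
    plot, total = [], 0
    for h in range(1, max(dist_counts, default=0) + 1):
        total += dist_counts.get(h, 0)  # cumulative count of pairs within h hops
        plot.append((h, total))
    return plot


def effective_diameter(adj, q=0.9):
    """Smallest h such that at least a fraction q of reachable pairs lie within h hops."""
    plot = hop_plot(adj)
    if not plot:
        return 0
    reachable = plot[-1][1]
    for h, pairs in plot:
        if pairs >= q * reachable:
            return h
```

For the path 0-1-2-3, `hop_plot` gives [(1, 6), (2, 10), (3, 12)] and the 90% effective diameter is 3.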
Figure: K-S D-statistic vs. sample size, for the scale-down and back-in-time targets (lower D is better)
Conclusions
• Random nodes plus a little exploration, i.e. Forest Fire (FF), works best
• RN and RJ come close
• A 15% sample seems to be enough
• The back-in-time concept
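Forest Fire, the method the conclusions favour, can be sketched as follows for an undirected graph: start a fire at a uniformly random node; each burning node ignites x of its not-yet-burned neighbours, with x drawn from a geometric distribution of mean p/(1 - p); if the fire dies out, restart at a new random node. The default burning probability p = 0.7, ignoring edge direction (no separate backward burning), and keeping the subgraph induced by the burned nodes are all assumptions of this sketch, and `forest_fire_sample` is a hypothetical name.

```python
import random
from collections import deque


def forest_fire_sample(adj, n, rng, p_forward=0.7):
    """Forest Fire sampling sketch on an undirected adjacency dict; returns (nodes, edges)."""
    if n > len(adj):
        raise ValueError("sample larger than graph")
    burned = set()
    all_nodes = sorted(adj)
    while len(burned) < n:
        # (Re)start the fire at a uniformly random unburned node.
        seed = rng.choice([u for u in all_nodes if u not in burned])
        burned.add(seed)
        queue = deque([seed])
        while queue and len(burned) < n:
            u = queue.popleft()
            # Geometric draw: P(x = k) = p^k (1 - p), so the mean is p / (1 - p).
            x = 0
            while rng.random() < p_forward:
                x += 1
            candidates = [v for v in sorted(adj[u]) if v not in burned]
            for v in rng.sample(candidates, min(x, len(candidates))):
                if len(burned) >= n:
                    break
                burned.add(v)
                queue.append(v)
    # Keep the subgraph induced by the burned nodes.
    edges = {(u, v) for u in burned for v in adj[u] if v in burned and u < v}
    return burned, edges
```

Setting `n` to roughly 15% of the node count matches the sample size the conclusions suggest is enough; the restart rule keeps the sketch from stalling when a fire burns out inside a small component.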