Estimating PageRank on Graph Streams • Atish Das Sarma (Georgia Tech) • Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
PageRank • Determines a ranking of the nodes in a graph • Typically run on large graphs: the WWW, social networks • Run daily by commercial search engines
PageRank Computation • [Figure: example graph with nodes a, b, c, and u] • Our approach: no matrix-vector multiplication!
Our Result • Many random walk samples, computed efficiently • The samples yield approximate PageRank
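A minimal sketch of why random walk samples can stand in for matrix-vector multiplication, assuming the standard random-surfer formulation of PageRank: a walk that starts at a uniformly random node and stops with a fixed reset probability at every step ends at a node distributed according to PageRank. The function name `estimate_pagerank`, the reset probability of 0.15, and the treatment of dangling nodes are assumptions made for illustration and do not come from the slides. The sketch works in memory and ignores the streaming constraints.

```python
import random
from collections import Counter


def estimate_pagerank(out_edges, num_walks, reset_prob=0.15, seed=0):
    """Estimate PageRank from the end points of random walks with reset.

    out_edges maps every node to the list of its out-neighbors.
    """
    rng = random.Random(seed)
    nodes = list(out_edges)
    ends = Counter()
    for _ in range(num_walks):
        node = rng.choice(nodes)  # start at a uniformly random node
        # Continue with probability 1 - reset_prob, so the walk length
        # is geometrically distributed.
        while rng.random() > reset_prob:
            nbrs = out_edges[node]
            # Assumption: a dangling node jumps to a uniformly random node.
            node = rng.choice(nbrs) if nbrs else rng.choice(nodes)
        ends[node] += 1
    # The fraction of walks ending at v approximates the PageRank of v.
    return {v: ends[v] / num_walks for v in nodes}


if __name__ == "__main__":
    graph = {"a": ["u"], "b": ["u"], "c": ["u"], "u": ["a", "b", "c"]}
    for node, score in sorted(estimate_pagerank(graph, 20000).items()):
        print(node, round(score, 3))
```

More walks reduce the sampling error of each estimated score, which is why producing many walks cheaply matters.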
Other Results from Random Walks • Using streams, we can also estimate: • Mixing time • Conductance
Streaming • Input is a “stream” e1, e2, e3, e4, e5, e6, e7, … • Few passes • Small RAM working memory • Typical problems: frequency moments, quantiles • Graphs: the stream consists of edges, in arbitrary order
Related Work • Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08) • Given an undirected graph, produce a sparse one • Approximately preserve x’Lx • Can be used to compute sparse cuts • Streaming version of BK96 (Ahn, Guha 09) • Sparse cuts in 1 pass and O(n) space • Accelerated PageRank (McSherry 08) • Heuristics
Key Idea • Perform one walk of length l from u efficiently • Later, extend to many walks • [Figure: a walk of length l from u to v]
Single Random Walk: Naive Algorithm • One step with every pass! • Constant space • l passes for a walk of length l
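The naive algorithm can be sketched as below: every pass over the edge stream advances the walk by one step, choosing a uniformly random out-edge of the current node with a size-one reservoir sample. Here `edge_stream` is a hypothetical zero-argument callable that returns a fresh iterator over `(src, dst)` pairs, so one call models one pass; the name and the return convention are assumptions for illustration.

```python
import random


def naive_streaming_walk(edge_stream, start, length, seed=0):
    """Random walk that takes exactly one step per pass over the stream.

    Returns the walk and the number of passes used. Apart from the
    returned walk itself, only O(1) state is kept between edges.
    """
    rng = random.Random(seed)
    walk = [start]
    passes = 0
    for _ in range(length):
        current = walk[-1]
        chosen, seen = None, 0
        for src, dst in edge_stream():  # one full pass over the edges
            if src == current:
                seen += 1
                # Reservoir sampling: keep the i-th out-edge with prob. 1/i.
                if rng.randrange(seen) == 0:
                    chosen = dst
        passes += 1
        if chosen is None:
            break  # the current node has no out-edges
        walk.append(chosen)
    return walk, passes


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
    print(naive_streaming_walk(lambda: iter(edges), start=0, length=5))
```

The pass count equals the walk length, which is the cost the rest of the talk sets out to reduce.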
Second Naive Algorithm • Single pass • Sample sufficient edges! • If […], then sample 2 out-edges from each node (and store their order)
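One way to read this slide, stated as an assumption because the slide's condition is not recoverable: if every node stores k independently sampled out-edges in a fixed order, a walk of length at most k can be replayed after a single pass, since no node is left more than k times. The sketch below follows that reading. The names `sample_out_edges` and `walk_from_samples` are hypothetical, and the storage cost is k samples per node.

```python
import random
from collections import defaultdict


def sample_out_edges(edges, k, seed=0):
    """Single pass: keep k independent uniform out-edge samples per node."""
    rng = random.Random(seed)
    count = defaultdict(int)
    samples = {}
    for src, dst in edges:
        count[src] += 1
        # The first out-edge of a node initializes all k slots.
        slots = samples.setdefault(src, [dst] * k)
        if count[src] > 1:
            for i in range(k):
                # Independent size-one reservoir per slot.
                if rng.randrange(count[src]) == 0:
                    slots[i] = dst
    return samples


def walk_from_samples(samples, start, length):
    """Replay a walk: the i-th departure from a node uses its i-th sample."""
    used = defaultdict(int)
    walk = [start]
    for _ in range(length):
        node = walk[-1]
        if node not in samples:
            break  # dead end: the node has no out-edges
        if used[node] >= len(samples[node]):
            raise ValueError("not enough stored samples for this walk")
        walk.append(samples[node][used[node]])
        used[node] += 1
    return walk


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 0), (2, 0), (2, 1)]
    stored = sample_out_edges(edges, k=6)
    print(walk_from_samples(stored, start=0, length=6))
```

Using the samples of a node in their stored order keeps each step independent of the earlier ones, which is presumably why the slide notes that the order must be stored.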
Comparison • Naive (single walk): l • Our result: […] • In fact, […] walks! • Automatically: […]
Insight: Merge Short Walks • Sample a fraction of the nodes (centers) • w passes: length-w walks from every center • Merge and extend short walks! • Two problems: • Ending up at a node a second time • Ending up at a non-sampled node
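The merging idea can be sketched as follows, under simplifying assumptions. Each node becomes a center with probability `alpha`, w passes build one short walk per center, and the long walk is stitched from short walks, each used at most once. When the walk is stuck (at a non-center, or at a center whose short walk is already used), this sketch simply takes one naive step at the cost of one pass; that fallback is a stand-in chosen for brevity, not the edge-sampling scheme the talk develops for stuck nodes. All function and parameter names are hypothetical.

```python
import random


def step_all(edge_stream, endpoints, rng):
    """One pass: draw an independent random out-edge for every walk.

    endpoints maps a walk id to the node where that walk currently ends.
    Returns walk id -> next node; walks at dead ends are absent.
    """
    at = {}
    for wid, node in endpoints.items():
        at.setdefault(node, []).append(wid)
    seen = {wid: 0 for wid in endpoints}
    nxt = {}
    for src, dst in edge_stream():
        for wid in at.get(src, ()):
            seen[wid] += 1
            if rng.randrange(seen[wid]) == 0:  # size-one reservoir
                nxt[wid] = dst
    return nxt


def merged_walk(edge_stream, nodes, start, length, w, alpha, seed=0):
    """Build a walk of the given length by merging short walks."""
    rng = random.Random(seed)
    centers = {v for v in nodes if rng.random() < alpha} | {start}
    # Phase 1: w passes extend a short walk from every center.
    short = {c: [c] for c in centers}
    passes = 0
    for _ in range(w):
        nxt = step_all(edge_stream, {c: p[-1] for c, p in short.items()}, rng)
        passes += 1
        for c, v in nxt.items():
            short[c].append(v)
    # Phase 2: stitch short walks together, using each at most once.
    unused = set(centers)
    walk = [start]
    while len(walk) - 1 < length:
        node = walk[-1]
        if node in unused and len(short[node]) > 1:
            unused.discard(node)
            walk.extend(short[node][1:])
        else:
            # Stuck: fall back to one naive step (one extra pass).
            nxt = step_all(edge_stream, {0: node}, rng)
            passes += 1
            if 0 not in nxt:
                break  # dead end
            walk.append(nxt[0])
    return walk[:length + 1], passes


if __name__ == "__main__":
    edges = [(i, (i + 1) % 8) for i in range(8)] + [(i, (i + 3) % 8) for i in range(8)]
    print(merged_walk(lambda: iter(edges), range(8), 0, 20, w=4, alpha=0.5))
```

Each short walk is consumed only once, so the stitched result is still a genuine random walk; the pass count drops below the naive one whenever unused centers are hit often enough.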
Stuck Nodes • Sample an edge from each stuck node • Again • And again... • Slow? • If new nodes, good in […] passes!
Stuck Nodes • Stuck on the same nodes? • Sample s edges from each • The stuck set must include the previously seen centers • Outcome: s steps of progress OR a new node!
Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck nodes • Make local progress until a new node is reached • Local progress = s • New node: a center with prob. […] • Amortized progress […] every pass
Summary • Total number of passes: […] • Total space: […] • Set […] • Number of passes = […] • Space = […]
Many Walks • Naive space bound: […] • We show: […] • Observation: many short walks are not used in a single random walk
Many Random Walks • Consider, for each node, the probability that its short walk is used in a single random walk • If these probabilities were known: saves a lot of space! • Perform K random walks • Total number of short walks required is about […] • The probabilities are not known, but they can be estimated
Estimating […] • Run K = (log n) walks of length […] • This gives a crude estimate of […] • The estimate is sufficient to double K • Continue doubling K • Gives K walks in […] space • Passes: […]
Distributions • Samples: […] • Space: […] • Passes: […] • Distribution: […]
Mixing Time, Conductance • Undirected graphs: compare the distribution with the steady state • Estimating the difference: […] samples [Batu et al. ’01] • […] approximate mixing time • Directed graphs, until the distribution “stabilizes”: […] samples • Conductance: […] • Recall the space for […] walks: […]
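A rough sketch of the comparison for undirected graphs, where the steady state of the walk is proportional to node degree. It estimates the L1 distance between the empirical end-point distribution of length-l walks and the degree distribution, and doubles l until the distance falls below a threshold. Several things here are assumptions for illustration: the plain empirical L1 estimate is a simple substitute for the more sample-efficient tester of Batu et al., the walks run in memory rather than over a stream, and the names, threshold, and sample count are hypothetical.

```python
import random
from collections import Counter


def l1_to_stationary(adj, start, length, num_walks, rng):
    """Empirical L1 distance between length-step walk end points and the
    degree-proportional stationary distribution of an undirected graph."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    ends = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(length):
            node = rng.choice(adj[node])
        ends[node] += 1
    return sum(abs(ends[v] / num_walks - len(adj[v]) / total_degree)
               for v in adj)


def estimate_mixing_time(adj, start, threshold=0.1, num_walks=20000,
                         max_length=1024, seed=0):
    """Double the walk length until the end points look stationary."""
    rng = random.Random(seed)
    length = 1
    while length <= max_length:
        if l1_to_stationary(adj, start, length, num_walks, rng) < threshold:
            return length
        length *= 2
    return None  # did not mix within max_length steps


if __name__ == "__main__":
    k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
    print(estimate_mixing_time(k4, start=0))
```

Because the length is doubled, the answer is only accurate up to a factor of two, and the sampling noise of the empirical distance limits how small the threshold can usefully be for a given number of walks.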
Results Recap • Mixing time for undirected graphs: […] • Quadratic approximation to conductance • PageRank to accuracy […]
Open Questions? • Reduce the number of passes needed for random walks; in particular, achieve sub-linear space with a constant number of passes • Graph cuts and graph sparsification for directed graphs • Better (streaming) algorithms for computing eigenvectors
Summary • Perform short walks from sampled centers • Concatenate walks until stuck • Sample edges from stuck nodes • Make local progress until a new node is reached • Local progress = s • New node: […] nodes give a center • Amortized: […] every pass