Fast Shortest Path Distance Estimation in Large Networks

Fast Shortest Path Distance Estimation in Large Networks Michalis Potamias Francesco BonchiCarlos Castillo Aristides Gionis

Context-aware Search …use shortest-path distance in wikipedia links-graph! Shortest Paths in Large Networks @ CIKM 2009

Social Search • John searches Mary • Ranking: • Mary A • Mary B • Mary C …use shortest-path distance in friendship graph! Shortest Paths in Large Networks @ CIKM 2009

Problem and Solutions • DB: Graph G = (V,E) • Query: Nodes s and t in V • Goal: Compute fast shortest path d(s,t) • Exact Solution • BFS - Dijkstra • Bidirectional- Dijkstra with A* (aka ALT methods) • [Ikeda, 1994] [Pohl, 1971] [Goldberg and Harrelson, SODA 2005] • Heuristic Solution • Random Landmarks • [Kleinberg et al, FOCS 2004] [Vieira et al, CIKM 2007] • Better Landmarks! Shortest Paths in Large Networks @ CIKM 2009

The Landmarks’ Method • Offline • Precompute distance of all nodes to a small set of nodes (landmarks) • Each node is associated with a vector with its SP-distance from each landmark (embedding) • Query-time • d(s,t) = ? • Combine the embeddings of s and t to get an estimate of the query Shortest Paths in Large Networks @ CIKM 2009

Contribution • Proved that covering the network w. landmarks is NP-hard. • Devised heuristics for good landmarks. • Experiments with 5 large real-world networks and more than 30 heuristics. Comparison with state of the art. • Application to Social Search. Shortest Paths in Large Networks @ CIKM 2009

Algorithmic Framework • Triangle Inequality • Observation: the case of equality Shortest Paths in Large Networks @ CIKM 2009

The Landmarks’ Method • Selection: Select klandmarks • Offline: Run kBFS/Dijkstra and store the embeddings of each node: Φ(s)= <dG(u1, s), dG(u2, s), … , dG(uk, s)> = <s1, s2, …, sd> • Query-time: dG(s,t) = ? • Fetch Φ(s)and Φ(t) • Compute mini{si + ti} (i.e. inf of UB) ... in time O(k) Shortest Paths in Large Networks @ CIKM 2009

Example query: d(s,t) Shortest Paths in Large Networks @ CIKM 2009

Coverage Using Upper Bounds • A landmark ucovers a pair (s, t), if ulies on a shortest path from s to t • Problem Definition : find a set of k landmarks that cover as many pairs (s,t) in V x V • NP-hard • k = 1 : node with the highest betweenness centrality • k > 1 : greedy set-cover (too expensive) Shortest Paths in Large Networks @ CIKM 2009

Basic Heuristics • Random (baseline) • Choose central nodes! • Degree • Closeness centrality • Closeness of u is the average distance of u to any vertex in G • Caveat: The selected landmarks may cover the same pairs: we need to make sure that landmarks cover different pairs!! Shortest Paths in Large Networks @ CIKM 2009

Constrained Heuristics • Spread the landmarks in the graph! • Rank all nodes according to Degree or Centrality • Iteratively choose the highest ranking nodes. Remove h-neighbors of each selected node from candidate set • Denote as • Degree/h • Closeness/h • Best results for h = 1 Shortest Paths in Large Networks @ CIKM 2009

Partitioning-based Heuristics • Use partitioning to spread nodes! • Utilize any partitioning scheme and… • Degree/P • Pick the node with the highest degree in each partition • Closeness/P • Pick the node with the highest closeness in each partition • Border/P • Pick the nodes close to the border in each partition. Maximize the border-value that is given from the following formula: Shortest Paths in Large Networks @ CIKM 2009

Border/P d1(u) = 3 d2(u) = 3 d3(u) = 2 b(u) = d1(u)*(d2(u) + d3(u))= 3(3 + 2) Shortest Paths in Large Networks @ CIKM 2009

Versus Random - error Shortest Paths in Large Networks @ CIKM 2009

Versus Random - triangulation Shortest Paths in Large Networks @ CIKM 2009

Versus ALT - efficiency Shortest Paths in Large Networks @ CIKM 2009

Social Search Task Shortest Paths in Large Networks @ CIKM 2009

Conclusion • Heuristic landmarks yield remarkable tradeoffs for SP-distance estimation in huge graphs • Hard to find the optimal landmarks • Border/P and Centrality heuristics outperform Random even by a factor of 250. • For a 10% error, thousand times faster than state of the art exact algorithms (ALT) • Novel search paradigms need distance as primitive • Approximations should be computed in milliseconds • Future Work • Provide fast estimation for more graph primitives! Shortest Paths in Large Networks @ CIKM 2009

Thank you! ? Shortest Paths in Large Networks @ CIKM 2009

Fast Shortest Path Distance Estimation in Large Networks