Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs
Tom Crecelius, Ralf Schenkel
MPI Informatik, Saarland University
Motivation: User-based Recommendations
[Figure: user graph with nodes labeled "harry potter" and one node marked "?"]
• Predict the rating for a book based on the ratings of transitive neighbors
Efficient Computation of Neighbors
Requirement: access the neighbors of a node in ascending order of distance in a large, disk-resident graph
• Run a shortest-path algorithm at query time, but: many disk accesses to gather edges
• Precompute the neighbor list and store it on disk, but: storage, maintenance under edge updates
Usually it is enough to use a prefix of all neighbors (based on a distance threshold or rank)
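The first option above (shortest paths at query time) can be sketched as a generator that yields neighbors in ascending order of distance, so that a caller can stop after a prefix. This is only an illustration: the in-memory dictionary layout, the function name `neighbors_by_distance`, and the small example graph are assumptions, not part of the talk, and the real setting is a disk-resident graph.

```python
import heapq
from itertools import islice

def neighbors_by_distance(graph, source):
    """Yield (node, distance) pairs in ascending order of distance from source.

    graph: dict mapping node -> list of (successor, weight) pairs
           (hypothetical in-memory layout chosen for this sketch).
    """
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already settled node
        done.add(node)
        if node != source:
            yield node, d
        for succ, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(succ, float("inf")):
                dist[succ] = nd
                heapq.heappush(heap, (nd, succ))

# Hypothetical example graph with unit edge weights.
graph = {"D": [("E", 1)], "E": [("G", 1)], "G": [("J", 1)], "J": []}

# Rank-based prefix: only the two closest neighbors are computed.
top2 = list(islice(neighbors_by_distance(graph, "D"), 2))
print(top2)
```

Because the generator is lazy, a distance threshold can be applied the same way, for example with `itertools.takewhile`.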
Main Data Structure: Neighbor List
• Neighbor list of D (increasing distance): E:1, G:2, J:3
[Figure: example graph with nodes A, B, C, D, E, F, G, H, J]
This talk: maintaining these lists under
• Edge insertion
• Reduction of edge weight
Problem Statement
Sequence of directed graphs in which consecutive graphs differ in exactly one
• added edge (focus of this talk; edge weights = 1), or
• edge with reduced weight
[Figure: graph snapshots at time=0, time=1, time=2, time=3]
Determine the neighbor list for a node at time t, based on
• the neighbor list for the node at time t' ≤ t
• the neighbor lists of other nodes at times ≤ t
• added edges not yet included in neighbor lists
Impact of a Single New Edge
[Figure: example graph with nodes A, B, C, D, E, F, G, H, J; legend:]
• newly reachable nodes
• nodes with a possible change in distance
• nodes with unchanged distance but possibly changed rank
• unaffected nodes
A single new edge affects many entries of a neighbor list
Impact on Other Nodes' Lists
[Figure: example graph with nodes A, B, C, D, E, F, G, H, J; nodes with an affected list are highlighted]
A single new edge affects many different neighbor lists
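To make the effect of a single new edge concrete, the following brute-force sketch recomputes all neighbor lists before and after the insertion and reports the nodes whose list changed. It assumes unit edge weights (as in the talk's focus case); the function names and the example graph are made up for illustration, and recomputing everything is exactly the cost the talk tries to avoid.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return {node: distance} for all nodes reachable from source (unit weights)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    del dist[source]  # a node is not its own neighbor
    return dist

def affected_lists(graph, new_edge):
    """Return the set of nodes whose neighbor list changes when new_edge is added."""
    u, v = new_edge
    nodes = set(graph) | {s for succs in graph.values() for s in succs} | {u, v}
    before = {n: bfs_distances(graph, n) for n in nodes}
    updated = {n: list(graph.get(n, ())) for n in nodes}
    updated[u].append(v)
    after = {n: bfs_distances(updated, n) for n in nodes}
    return {n for n in nodes if before[n] != after[n]}

# Hypothetical graph: H -> A -> D -> E and G -> J; the new edge is D -> G.
graph = {"H": ["A"], "A": ["D"], "D": ["E"], "E": [], "G": ["J"], "J": []}
print(sorted(affected_lists(graph, ("D", "G"))))
```

In this example every node that can reach D gets new entries (G and J) in its list, while the lists of E, G, and J stay unchanged.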
Assumptions & Principles of our Approach
• Most neighbor lists are rarely queried
• The update load on lists is much higher than the query load (transitive effects!)
• A list is often stale again after maintenance, before the next query arrives
• Delay list maintenance as much as possible
• Maintain a list during query execution
Extended Data Structures
• Neighbor list of D (increasing distance), on disk: E:1, G:2, J:3
• Timestamp ts(D): last maintenance of D's list
• Pointer tp(D): last valid entry as of ts(D)
• Set OP(D), in memory: edge updates not yet in the list
[Figure: example graph with nodes A, B, C, D, E, F, G, H, J]
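One possible way to picture these per-node structures is a small record type. The sketch below is an assumption-laden illustration: the class name, the field types, and the choice to model tp as a count of valid leading entries are not taken from the talk, and everything is kept in memory here although the list itself is disk-resident in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class NeighborList:
    owner: str
    # (node, distance) pairs in increasing distance; disk-resident in the talk.
    entries: list = field(default_factory=list)
    ts: int = 0  # time of the last maintenance of this list
    tp: int = 0  # assumed encoding: number of leading entries valid as of ts
    # Pending edge updates (target, weight) not yet reflected in entries.
    op: list = field(default_factory=list)

    def add_pending_edge(self, target, weight=1):
        """Record a new outgoing edge without touching the stored list."""
        self.op.append((target, weight))

    def has_pending_updates(self):
        """True if this node's own new edges still need to be applied.

        This does not detect staleness caused by updates at other nodes.
        """
        return bool(self.op)

d = NeighborList("D", entries=[("E", 1), ("G", 2), ("J", 3)], ts=0, tp=3)
d.add_pending_edge("H")
print(d.has_pending_updates(), d.op)
```

Recording an edge only appends to `op`, which reflects the idea of delaying list maintenance until the list is actually read.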
Algorithmic Framework
2 core functions:
• update(D): add edges from OP(D) to D's list
• merge(D, A): add the transitive effect of new edges of A to D's list
Reading list entries of D:
• Initially update(D) if OP(D) is not empty
• Read next list entry A: perform update(A) and merge(D, A) if needed
Two implementations: Eager Propagation (EAP) and Lazy Propagation (LAP)
Eager Propagation (EAP)
Principle: merge always combines complete lists
Example: OP(D) contains the new edge D→G
1) update(D)
2) merge(D, G)
[Figure: neighbor lists of D and G before and after the two steps]
More efficient if the maximal length of lists can be limited
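The two eager steps can be sketched as plain list operations: update inserts pending edges, and merge combines two complete lists while shifting the distances of the second list by the distance to its owner. The function signatures, the tie-breaking by node name, and the concrete list values (loosely read off the slide's figure) are assumptions made for this illustration.

```python
def update(entries, pending_edges):
    """Insert pending edges (target, weight) into a list of (node, distance)."""
    best = dict(entries)
    for target, weight in pending_edges:
        if weight < best.get(target, float("inf")):
            best[target] = weight
    # Sort by distance, then node name, to get a deterministic order.
    return sorted(best.items(), key=lambda e: (e[1], e[0]))

def merge(entries, other_entries, offset, owner):
    """Eagerly merge another node's complete list into this list.

    offset is the distance from the list owner to the other node; for nodes
    present in both lists, the smaller distance is kept.
    """
    best = dict(entries)
    for node, d in other_entries:
        if node == owner:
            continue  # never list the owner as its own neighbor
        nd = d + offset
        if nd < best.get(node, float("inf")):
            best[node] = nd
    return sorted(best.items(), key=lambda e: (e[1], e[0]))

# Illustrative values: D's stale list, G's list, and the pending edge D -> G.
d_list = [("E", 1), ("M", 2), ("J", 3), ("K", 4)]
g_list = [("F", 1), ("K", 2), ("L", 3), ("E", 4)]

d_list = update(d_list, [("G", 1)])                # step 1: update(D)
d_list = merge(d_list, g_list, offset=1, owner="D")  # step 2: merge(D, G)
print(d_list)
```

Here K improves from distance 4 to 3 via G, E keeps its shorter distance 1, and F and L become newly reachable. The whole of G's list is read even if the query only needs a short prefix, which is the overhead a length limit reduces.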
Eager Propagation (EAP)
[Figure: example with nodes M, D, and G showing the steps 1) update, 2) merge, 3) merge and the resulting neighbor lists]
Only a merge with the read node is needed if OP(node) is empty but the timestamp of the list being maintained is older than timestamp(node)
Potential disadvantage: many full list merges
Lazy Propagation (LAP): On-Demand Merge
Main principle: do not materialize merges, but perform them incrementally on demand while entries are read
• Queue with the current closest entries of merged lists, e.g. (E,2), (M,3)
• Read the next entry with the shortest distance from the list or the queue (and remove duplicates)
• After the query finishes, write the read list prefix and the current queue state
[Figure: list of D (E:1, M:2, J:3, K:4) merged on demand with the lists of E (L:1, P:2, Q:3, R:4; offset +1) and M (J:1; offset +2)]
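The on-demand merge can be sketched as a generator over a priority queue that holds, for each participating list, only its current closest entry. The function name `lazy_merge`, its parameters, and the in-memory lists are assumptions for illustration; persisting the read prefix and the queue state after the query is omitted here.

```python
import heapq
from itertools import islice

def lazy_merge(owner, base_entries, merged):
    """Yield (node, distance) in ascending distance without materializing merges.

    base_entries: the owner's list, sorted by distance.
    merged: list of (entries, offset) pairs, where entries is another node's
            sorted list and offset is the owner's distance to that node.
    """
    sources = [(base_entries, 0)] + list(merged)
    iters = []
    heap = []  # holds (shifted distance, source index, node), one per source
    for idx, (entries, offset) in enumerate(sources):
        it = iter(entries)
        iters.append((it, offset))
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first[1] + offset, idx, first[0]))
    seen = {owner}
    while heap:
        d, idx, node = heapq.heappop(heap)
        it, offset = iters[idx]
        nxt = next(it, None)  # advance only the source that was just consumed
        if nxt is not None:
            heapq.heappush(heap, (nxt[1] + offset, idx, nxt[0]))
        if node not in seen:  # duplicates arrive later with larger distance
            seen.add(node)
            yield node, d

d_list = [("E", 1), ("M", 2), ("J", 3), ("K", 4)]
e_list = [("L", 1), ("P", 2), ("Q", 3), ("R", 4)]
m_list = [("J", 1)]

# A top-3 query reads only a few entries of each list.
top3 = list(islice(lazy_merge("D", d_list, [(e_list, 1), (m_list, 2)]), 3))
print(top3)
```

In contrast to a full list merge, the amount of work here depends on how many entries the query actually reads, at the cost of maintaining the queue.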
Lazy Propagation (LAP)
Example: OP(D) contains the new edge D→G
1) update(D)
2) merge(D, E)
[Figure: lists of D, G, and E, and the queue with entries (G,1) and (E,2)]
Summary: EAP vs. LAP
EAP:
• Update immediately inserts all available edges
• Merge eagerly propagates all available information
• Next reads the current list entry
• Overhead: reads complete lists
LAP:
• Update inserts new edges into the queue
• Merge lazily propagates only single entries
• Next determines the next entry and maintains the queue
• Overhead: queue maintenance
Evaluation
Two real datasets:
• Excerpt of the LibraryThing user graph (7,793 nodes, 28,853 edges)
• Excerpt of the Twitter user graph (2 million nodes)
Setup:
• Removed 30% of the edges and built neighbor lists
• Reinsert 1 edge after 100 random top-200 queries
Implemented on a slow relational storage backend
Runtime for EAP on LibraryThing
[Plot; label: number of read list entries]
Runtime of around 20 s/query is too high, caused by full merges of long lists
Runtime of EAP for LibraryThing, limit entries
[Plots for max=500 and max=200]
Runtime improves a lot with a length limit and does not increase over time
Runtime of LAP for LibraryThing
[Plot; label: number of read list entries]
• Runtime comparable to EAP, but with flexible list length
• Far fewer entries read than with EAP
Runtime of LAP for Twitter
[Plot; label: number of read list entries]
Absolute runtime slower (about 1 s) than for LibraryThing despite fewer read entries: backend speed!
Conclusions and Future Work
• Pay-as-you-go maintenance of neighbor lists for directed graphs
• Eager propagation: merges full lists
• Lazy propagation: merges incrementally
• Large-scale experiments with two large graphs
Future Work:
• More efficient storage backend
• Removal of edges, increased edge weight
• Hybrid methods for update-heavy cases