Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs

Tom Crecelius, Ralf Schenkel (MPI Informatik, Saarland University)


Presentation Transcript


  1. Tom Crecelius, Ralf Schenkel (MPI Informatik, Saarland University): Pay-as-you-go Maintenance of Precomputed Nearest Neighbors in Large Graphs

  2. Motivation: User-based Recommendations
     [Figure: graph of users, several labeled "harry potter" and one marked "?"]
     Predict the rating for a book based on the ratings of transitive neighbors.

  3. Efficient Computation of Neighbors
     Requirement: access the neighbors of a node in ascending order of distance in a large, disk-resident graph.
     • Run a shortest-path algorithm at query time, but: many disk accesses to gather edges
     • Precompute the neighbor list and store it on disk, but: storage, maintenance under edge updates
     Usually it is enough to use a prefix of all neighbors (based on a distance threshold or rank).
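
The first option on this slide (a shortest-path search at query time) can be sketched as a generator that yields neighbors nearest first, so that a caller can stop after any prefix. This is only an illustration: the in-memory adjacency dictionary and the chain D→E→G→J are assumptions, chosen so that the result matches the list E:1, G:2, J:3 shown for D on the next slide. In the disk-resident setting of the talk, every lookup of a node's out-edges would cost a disk access.

```python
import heapq
from itertools import islice


def neighbors_by_distance(graph, source):
    """Yield (node, distance) for nodes reachable from source, nearest first.

    graph: dict mapping node -> list of (successor, weight) pairs
    (an assumed in-memory layout, not the storage format of the talk).
    """
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry for an already settled node
        done.add(u)
        if u != source:
            yield u, d
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))


# Hypothetical toy graph with unit edge weights.
graph = {"D": [("E", 1)], "E": [("G", 1)], "G": [("J", 1)]}
all_neighbors = list(neighbors_by_distance(graph, "D"))
top2 = list(islice(neighbors_by_distance(graph, "D"), 2))  # prefix by rank
```

Because the generator is lazy, a rank-based prefix (as with `islice`) or a distance threshold stops the search early instead of exploring the whole graph.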

  4. Main Data Structure: Neighbor List
     [Figure: example graph with nodes A, B, C, D, E, F, G, H, J]
     Neighbor list of D (increasing distance): E:1, G:2, J:3
     This talk: maintaining these lists under
     • Edge insertion
     • Reduction of edge weight

  5. Problem Statement
     Sequence of directed graphs (time=0, time=1, time=2, time=3, ...) that differ in exactly one
     • added edge (focus of this talk; edge weights = 1)
     • edge with reduced weight
     Determine the neighbor list for a node at time t, based on
     • Neighbor list for the node at time t′ ≤ t
     • Neighbor lists of other nodes at times ≤ t
     • Added edges not yet included in neighbor lists

  6. Impact of a Single New Edge
     [Figure: example graph (nodes A, B, C, D, E, F, G, H, J); legend: newly reachable nodes / maybe change in distance / unchanged distance, but maybe changed rank / unaffected nodes]
     A single new edge affects many entries of a neighbor list.

  7. Impact on Other Nodes' Lists
     [Figure: example graph (nodes A, B, C, D, E, F, G, H, J); legend: node with affected list]
     A single new edge affects many different neighbor lists.

  8. Assumptions & Principles of our Approach
     • Most neighbor lists are rarely queried
     • Update load on lists is much higher than query load (transitive effects!)
     • A list is often stale again after maintenance, before the next query arrives
     • Delay list maintenance as much as possible
     • Maintain a list during query execution

  9. Extended Data Structures
     [Figure: example graph with nodes A, B, C, D, E, F, G, H, J]
     • Neighbor list of D (increasing distance), on disk: E:1, G:2, J:3
     • Timestamp ts(D): last maintenance of D's list
     • Pointer tp(D): last valid entry as of ts(D)
     • Set OP(D), in memory: edge updates not yet in the list
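
One possible in-memory model of these per-node structures is sketched below. All names (`NeighborList`, `add_edge`, the field names) are hypothetical, and `tp` is modeled as a count of leading entries that were valid at time `ts`, which is only one reading of the slide. The point of the sketch is that inserting an edge only appends to OP of the source node and leaves every stored list untouched.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborList:
    # (node, distance) pairs in ascending distance; kept on disk in the talk.
    entries: list = field(default_factory=list)
    ts: int = 0  # time of the last maintenance of this list
    tp: int = 0  # number of leading entries known to be valid as of ts (assumed encoding)
    # Pending edge updates (target, weight, time) not yet reflected in entries.
    op: list = field(default_factory=list)


def add_edge(lists, u, v, time, weight=1):
    """Record a new edge u->v without maintaining any list (pay-as-you-go)."""
    lists.setdefault(u, NeighborList()).op.append((v, weight, time))


lists = {"D": NeighborList(entries=[("E", 1), ("G", 2), ("J", 3)], ts=0, tp=3)}
add_edge(lists, "D", "G", time=1)
```

After the call, `lists["D"].entries` is unchanged and the new edge waits in `lists["D"].op` until D's list is next read.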

  10. Algorithmic Framework
     2 core functions:
     • update(D): add edges from OP(D) to D's list
     • merge(D, A): add the transitive effect of new edges of A to D's list
     Reading list entries of D:
     • Initially update(D) if OP(D) is not empty
     • Read next list entry A: perform update(A) and merge(D, A) if needed
     Two implementations: Eager Propagation (EAP) and Lazy Propagation (LAP)

  11. Eager Propagation (EAP)
     Example: OP(D) = {D→G}
     1) update(D)
     2) merge(D, G)
     [Figure: D's list E:1, M:2, J:3, K:4; update(D) inserts G:1; merge(D, G) adds G's list F:1, K:2, L:3, E:4 shifted by +1, giving F:2, K:3, L:4, E:5]
     Principle: merge always combines complete lists.
     More efficient if the maximal length of lists can be limited.

  12. Eager Propagation (EAP)
     Example with nodes M, D, and G: 1) update, 2) merge, 3) merge
     [Figure: neighbor lists before and after the three steps; the node arguments and list columns are not recoverable from the transcript]
     Only a merge with the read node is needed if OP(node) is empty but timestamp(D) < timestamp(node).
     Potential disadvantage: many full list merges.
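
The eager variant can be sketched with plain Python lists. The code below is a minimal illustration, not the authors' implementation: `merge_lists` and `update` are hypothetical names, the lists live in memory rather than on disk, and the input lists for D and G are the ones read off the garbled figure of slide 11, so they should be treated as an assumption.

```python
def merge_lists(base, other, offset, owner=None):
    """Eagerly merge every entry of `other`, shifted by `offset`, into `base`.

    Both lists hold (node, distance) pairs. The result keeps the smallest
    distance per node and is sorted by ascending distance.
    """
    best = dict(base)
    for node, dist in other:
        if node == owner:
            continue  # a node never appears in its own neighbor list
        shifted = dist + offset
        if shifted < best.get(node, float("inf")):
            best[node] = shifted
    # sorted() is stable, so equal distances keep first-seen order.
    return sorted(best.items(), key=lambda item: item[1])


def update(base, pending, owner=None):
    """Insert pending out-edges, given as (target, weight), as direct neighbors."""
    return merge_lists(base, pending, 0, owner)


d_list = [("E", 1), ("M", 2), ("J", 3), ("K", 4)]
g_list = [("F", 1), ("K", 2), ("L", 3), ("E", 4)]

after_update = update(d_list, [("G", 1)], owner="D")             # OP(D) = {D->G}
after_merge = merge_lists(after_update, g_list, 1, owner="D")     # merge(D, G), G at distance 1
```

In this sketch the merge touches all of G's entries even if the query needs only the first few of D's, which is the overhead named on the slide.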

  13. Lazy Propagation (LAP): On-Demand Merge
     Main principle: do not materialize merges, but perform them incrementally on demand while entries are read.
     [Figure: D's list E:1, M:2, J:3, K:4; E's list L:1, P:2, Q:3, R:4 (offset +1); M's list starting with J:1 (offset +2); queue with the current closest entries of the merged lists: (E,2), (M,3)]
     Read the next entry with the shortest distance from the list or the queue (and remove duplicates).
     After the query finishes, write
     • the read list prefix
     • the current queue state

  14. Lazy Propagation (LAP)
     Example: OP(D) = {D→G}
     1) update(D): queue entry (G,1)
     2) merge(D, E): queue entry (E,2)
     [Figure: D's list E:1, M:2, J:3, K:4; E's list L:1, P:2, Q:3, R:4 (offset +1); G's list F:1, K:2, L:3, E:4 (offset +1)]
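
The on-demand merge can be sketched as a generator over a priority queue of list cursors. This is a simplified illustration under stated assumptions: `lazy_neighbors`, `pending`, and `changed` are hypothetical names, `changed` stands in for the timestamp test that decides whether a read node's list must be merged, and writing back the read prefix and the queue state after the query is not modeled. The example reuses D's and G's lists with OP(D) = {D→G}.

```python
import heapq
from itertools import count, islice


def lazy_neighbors(lists, source, pending=(), changed=()):
    """Yield (node, distance) for `source`, merging other lists entry by entry.

    lists:   dict node -> list of (neighbor, distance), ascending distance
    pending: (target, weight) edges from OP(source), not yet in its list
    changed: nodes whose lists hold information missing from source's list
    """
    changed = set(changed)
    seen = {source}
    heap = []      # (distance, tie-breaker, node, list owner, next position, offset)
    seq = count()  # tie-breaker so that heap entries never compare node names

    def push(owner, pos, offset):
        entries = lists.get(owner, [])
        if pos < len(entries):
            node, dist = entries[pos]
            heapq.heappush(heap, (dist + offset, next(seq), node, owner, pos + 1, offset))

    push(source, 0, 0)
    for target, weight in pending:
        # update: a new edge becomes a single queue entry with no list behind it
        heapq.heappush(heap, (weight, next(seq), target, None, 0, 0))
        changed.add(target)
    while heap:
        dist, _, node, owner, next_pos, offset = heapq.heappop(heap)
        push(owner, next_pos, offset)  # advance the cursor that produced this entry
        if node in seen:
            continue  # duplicate, already returned at a distance that is not larger
        seen.add(node)
        yield node, dist
        if node in changed:
            push(node, 0, dist)  # merge: start reading node's list, shifted by dist


lists = {
    "D": [("E", 1), ("M", 2), ("J", 3), ("K", 4)],
    "G": [("F", 1), ("K", 2), ("L", 3), ("E", 4)],
}
full = list(lazy_neighbors(lists, "D", pending=[("G", 1)]))
top3 = list(islice(lazy_neighbors(lists, "D", pending=[("G", 1)]), 3))
```

In this sketch a top-3 query pops only three queue entries and reads a single entry of G's list, whereas the eager merge would process G's list completely.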

  15. Summary: EAP vs. LAP
     EAP:
     • Update immediately inserts all available edges
     • Merge eagerly propagates all available information
     • Next reads the current list entry
     • Overhead: reads complete lists
     LAP:
     • Update inserts new edges into the queue
     • Merge lazily propagates only single entries
     • Next determines the next entry and maintains the queue
     • Overhead: queue maintenance

  16. Evaluation
     Two real datasets:
     • Excerpt of the LibraryThing user graph (7,793 nodes, 28,853 edges)
     • Excerpt of the Twitter user graph (2 million nodes)
     Setup:
     • Removed 30% of the edges and built neighbor lists
     • Reinsert 1 edge after 100 random top-200 queries
     Implemented on a slow relational storage backend.

  17. Runtime for EAP on LibraryThing
     [Plot: runtime and number of read list entries]
     Runtime of around 20 s/query is too high, caused by full merges of long lists.

  18. Runtime of EAP for LibraryThing, Limited Entries
     [Plot: runtime for max=500 and max=200]
     Runtime improves a lot with a length limit and does not increase over time.

  19. Runtime of LAP for LibraryThing
     [Plot: runtime and number of read list entries]
     Runtime is comparable to EAP, but with flexible list length.
     Far fewer entries are read than with EAP.

  20. Runtime of LAP for Twitter
     [Plot: runtime and number of read list entries]
     Absolute runtime is slower (about 1 s) than for LibraryThing despite fewer read entries → backend speed!

  21. Conclusions and Future Work
     • Pay-as-you-go maintenance of neighbor lists for directed graphs
     • Eager propagation: merges full lists
     • Lazy propagation: merges incrementally
     • Large-scale experiments with two large graphs
     Future Work:
     • More efficient storage backend
     • Removal of edges, increased edge weight
     • Hybrid methods for update-heavy cases
