Evaluating Reachability Queries over Path Collections* • P. Bouros 1, S. Skiadopoulos 2, T. Dalamagas 3, D. Sacharidis 3, T. Sellis 1,3 • 1 National Technical University of Athens • 2 University of Peloponnese • 3 Institute for Management of Information Systems – R.C. Athena • * Long version of SSDBM’09 paper • HDMS'09
Introduction (I) • Several applications store and query large collections of data sequences • Recent advances in GIS and geoservices have resulted in large volumes of routes (e.g., sequences of Points of Interest (POIs)) • Route collections • Points => nodes • Sequences => routes
Introduction (II) • Web sites retain huge collections of routes • ShareMyRoutes.com • TravelByGPS.com • People visiting Athens • Track their sightseeing • Create routes of interesting places • Frequent updates • Users upload new routes
Problem • Route collections • Too large to fit in main memory • Frequently updated by adding new routes • Reachability queries • Q: path from Academy to Zappeion • A: Academy -> University of Athens (change to route p2) -> Parliament -> Zappeion
Why not a graph-based solution? • Transform route collection P into graph GP • Searching: depth-first or breadth-first search • Low storage and maintenance cost • Slow query evaluation • Encoding transitive closure: • Fast query evaluation • Expensive precomputation, not suited to frequently updated graphs • 2-hop [CH+02], HOPI [STW05] • DAGs: Geometric-based & partitioning 2-hop [CY+06,08], interval LB [AB+89] • GRIPP [TL07]
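A minimal sketch of the graph-based baseline, assuming a route collection is held as a dict from route id to node list. The function names `routes_to_graph` and `reachable_dfs` and the sample routes are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def routes_to_graph(routes):
    """Turn a route collection into adjacency sets: one edge per consecutive node pair."""
    graph = defaultdict(set)
    for route in routes.values():
        for u, v in zip(route, route[1:]):
            graph[u].add(v)
    return graph


def reachable_dfs(graph, source, target):
    """Plain depth-first search over single edges; True if target is reachable."""
    seen = {source}
    stack = [source]
    while stack:
        v = stack.pop()
        if v == target:
            return True
        for u in graph.get(v, ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return False


# Hypothetical two-route collection, loosely based on the Athens example.
routes = {
    "p1": ["Academy", "University of Athens", "National Library"],
    "p2": ["University of Athens", "Parliament", "Zappeion"],
}
graph = routes_to_graph(routes)
print(reachable_dfs(graph, "Academy", "Zappeion"))
```

Storage stays small because only edges are kept, but every query walks the graph one edge at a time, which is the slow part this slide points to.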
Outline • The pfs algorithm • Indexing route collections • Indexing route transitions • Index maintenance • Experimental evaluation • Conclusions and Further work
The pfs algorithm (I) • Path-first search, basic idea: • Examine parts of routes at once, not single nodes • Extend depth-first search • Work with routes instead of graph edges • For each route p containing current node v • Visit each node after v (successor) in p • Push the set of successors onto the DFS stack at once
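The idea above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: routes are a dict from route id to node list, the node-to-route lookup is built in memory, and the target is tested when a node is popped from the stack. The name `pfs` follows the slides; everything else is hypothetical.

```python
from collections import defaultdict


def pfs(routes, source, target):
    """Path-first search sketch: return a node path from source to target, or None."""
    # Assumption: an in-memory lookup node -> [(route id, position)] stands in
    # for whatever access method the real system uses.
    occurrences = defaultdict(list)
    for rid, route in routes.items():
        for pos, node in enumerate(route):
            occurrences[node].append((rid, pos))

    parent = {source: None}  # doubles as the visited set
    stack = [source]
    while stack:
        v = stack.pop()
        if v == target:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for rid, pos in occurrences[v]:
            prev = v
            # Push every successor of v on this route in one pass.
            for u in routes[rid][pos + 1:]:
                if u not in parent:
                    parent[u] = prev  # prev is the node just before u on the route
                    stack.append(u)
                prev = u
    return None


# Hypothetical two-route collection, loosely based on the Athens example.
routes = {
    "p1": ["Academy", "University of Athens", "National Library"],
    "p2": ["University of Athens", "Parliament", "Zappeion"],
}
print(pfs(routes, "Academy", "Zappeion"))
```

Each recorded parent is the preceding node on the same route, so the returned path only uses consecutive route nodes.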
The pfs algorithm (II) • Find a path from node F to C • Answer: (F, D, N, B, C)
P-Index • Inverted index on route collections • For each node, store the routes containing it • Access routes containing current node • Better termination condition => pfsP • Identify a route containing current node before target
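A minimal sketch of such an inverted index and of the termination test, assuming each entry is a (route id, position) pair with 1-based positions. The names `build_p_index` and `route_reaching` are hypothetical, and the sample routes are made up for illustration.

```python
from collections import defaultdict


def build_p_index(routes):
    """Inverted index: node -> list of (route id, 1-based position) entries."""
    p_index = defaultdict(list)
    for rid in sorted(routes):
        for pos, node in enumerate(routes[rid], start=1):
            p_index[node].append((rid, pos))
    return dict(p_index)


def route_reaching(p_index, v, target):
    """Return (route id, pos of v, pos of target) if some route has v before target."""
    # Latest position of the target on each route that contains it.
    last_target = {}
    for rid, pos in p_index.get(target, []):
        last_target[rid] = max(pos, last_target.get(rid, pos))
    # Join the two entry lists on route id and compare positions.
    for rid, pos in p_index.get(v, []):
        if rid in last_target and pos < last_target[rid]:
            return rid, pos, last_target[rid]
    return None


# Hypothetical routes.
routes = {"p1": ["A", "B", "C"], "p2": ["B", "D", "E"]}
p_index = build_p_index(routes)
print(p_index["B"])
print(route_reaching(p_index, "B", "E"))
```

When the test succeeds, the search can stop and follow that single route from the current node to the target.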
The pfsP algorithm • Find a path from F to T • Answer: (F, D, N, B, T)
H-graph (I) • Graph representation of collection • Nodes • Routes of the collection • Edges (pi, pj, v) • All possible transitions among routes • Edge label v => shared node (link) • Better termination condition => pfsH • Identify an edge on H-graph
H-graph (II) • Find a path from node F to J • Answer: (F, D, J)
H-Index • In practice: H-Index, the adjacency lists of the H-graph
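A rough sketch of H-graph adjacency lists, assuming that any node shared by two routes yields an edge in both directions labelled with that node. The slides do not spell out the exact edge condition, so this rule, the name `build_h_index`, and the sample routes are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def build_h_index(routes):
    """Adjacency lists of the H-graph: route -> {neighbour route: sorted link nodes}."""
    # Which routes contain each node.
    containing = defaultdict(set)
    for rid, route in routes.items():
        for node in route:
            containing[node].add(rid)

    # Assumed rule: every ordered pair of distinct routes sharing a node gets an edge.
    h_index = defaultdict(lambda: defaultdict(set))
    for node, rids in containing.items():
        for pi, pj in permutations(sorted(rids), 2):
            h_index[pi][pj].add(node)

    return {
        pi: {pj: sorted(links) for pj, links in adj.items()}
        for pi, adj in h_index.items()
    }


# Hypothetical routes: p1 and p2 share B and D, p3 is isolated.
routes = {"p1": ["A", "B", "C", "D"], "p2": ["B", "D", "E"], "p3": ["F", "G"]}
print(build_h_index(routes))
```

Routes that share no node with any other route have no entry, since they offer no transitions.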
The pfsH algorithm • Find a path from F to J • routes[F] = {<p2,2>, <p4,4>, <p5,2>} • routes[J] = {<p1,5>} • Answer: (F, D, J)
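One way the H-graph termination test could look, sketched under assumptions: each node occurs at most once per route, positions are 1-based, and the check succeeds when a link node lies at or after the current node on one route and at or before the target on another. The function name `link_between` and the hand-built index contents are hypothetical and do not reproduce the slide's figure.

```python
def link_between(p_index, h_index, v, target):
    """Return (pi, link, pj) if v reaches target by switching from route pi to pj at link."""
    # Position lookup; assumes each node occurs at most once per route.
    pos = {(rid, node): p for node, entries in p_index.items() for rid, p in entries}
    for pi, pos_v in p_index.get(v, []):
        for pj, pos_t in p_index.get(target, []):
            for link in h_index.get(pi, {}).get(pj, []):
                # The link must not precede v on pi, nor follow the target on pj.
                if pos_v <= pos[(pi, link)] and pos[(pj, link)] <= pos_t:
                    return pi, link, pj
    return None


# Hypothetical indices for routes p1 = (A, D, J) and p2 = (F, D).
p_index = {
    "A": [("p1", 1)],
    "D": [("p1", 2), ("p2", 2)],
    "J": [("p1", 3)],
    "F": [("p2", 1)],
}
h_index = {"p1": {"p2": ["D"]}, "p2": {"p1": ["D"]}}
print(link_between(p_index, h_index, "F", "J"))
```

In this made-up collection, F reaches J by following p2 to D and then p1 to J, which gives the node path (F, D, J).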
Index maintenance • P-Index, H-Index stored as inverted files on disk • Updates -> adding new routes • Do not process each new route separately • Batch updates: process a set of new routes together • Basic idea: • Build memory-resident P-Index, H-Index for new routes • Merge disk-based indices with memory-resident ones
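The batch idea can be illustrated for P-Index with plain dicts standing in for the disk-based inverted file. This is a simplification: a real merge would stream the on-disk lists. The names `build_p_index` and `merge_batch` are hypothetical.

```python
def build_p_index(routes):
    """Memory-resident P-Index for a set of routes: node -> [(route id, position)]."""
    index = {}
    for rid, route in routes.items():
        for pos, node in enumerate(route, start=1):
            index.setdefault(node, []).append((rid, pos))
    return index


def merge_batch(disk_index, new_routes):
    """Merge the existing index with an index built once for the whole batch."""
    mem_index = build_p_index(new_routes)
    # Copy the existing lists so the input (the stand-in for the disk file) is untouched.
    merged = {node: list(entries) for node, entries in disk_index.items()}
    for node, entries in mem_index.items():
        merged.setdefault(node, []).extend(entries)
    return merged


# Hypothetical existing index and one batch of new routes.
disk_index = {"A": [("p1", 1)], "B": [("p1", 2)]}
batch = {"p2": ["B", "C"]}
print(merge_batch(disk_index, batch))
```

Each inverted list is touched once per batch rather than once per new route, which is the point of batching.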
Setup • Synthetic route collections • Parameters: |P|, lavg, |V|, zipf, U • Compare • Collection converted to graph: dfs & adjacency lists • pfsP & P-Index • pfsH & P-Index, H-Index • Construction cost and query evaluation: vary one of |P|, lavg, |V|, zipf • Maintenance cost: vary U
Index construction • [Plot] Varying |P| (x 10^3); lavg = 10, |V| = 100000, zipf = 0.8 • [Plot] Varying |V| (x 10^3); |P| = 100000, lavg = 10, zipf = 0.8
Query evaluation (I) • [Plot] Varying |P| (x 10^3); lavg = 10, |V| = 100000, zipf = 0.8 • [Plot] Varying lavg; |P| = 100000, |V| = 100000, zipf = 0.8
Query evaluation (II) • [Plot] Varying |V| (x 10^3); |P| = 100000, lavg = 10, zipf = 0.8 • [Plot] Varying zipf; |P| = 100000, lavg = 10, |V| = 100000
Index maintenance • [Plots] Varying U (%); |P| = 100000, lavg = 10, |V| = 100000, zipf = 0.8
Conclusions • Reachability queries over frequently updated route collections • The path-first search (pfs) algorithm • Indexing route collections: P-Index & pfsP • Indexing route transitions: H-Index & pfsH • Handling frequent updates, adding new routes • Experimental evaluation • P-Index & pfsP: low construction & maintenance cost • H-Index, P-Index & pfsH: fast query evaluation
Further work • Ongoing • New index that combines the advantages of P-Index & H-Index • Low construction and maintenance cost • Fast query evaluation • Future work • Other types of queries • Considering constraints
Thank you!