
Efficient Processing of Top-k Spatial Preference Queries

This presentation describes an approach for efficiently processing top-k spatial preference queries, which rank data objects by combining spatial and non-spatial scores. The framework maps pairs of data objects and feature objects to a distance-score space and materializes the skyline of those pairs to speed up query processing. The approach is compared against existing algorithms, and an experimental evaluation on synthetic and real datasets supports its efficiency for location-based queries.

Presentation Transcript


  1. Efficient Processing of Top-k Spatial Preference Queries João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg VLDB’ 2011 - Seattle, USA

  2. Outline
     • Top-k spatial preference queries
     • Current approaches
     • Our approach
       • Mapping to distance-score space
       • Query processing
       • Materialization (index construction)
     • Experimental evaluation
     • Conclusion

  3. Motivation
     • Increasing number of Web information systems specialized in location-based queries
     • Systems are limited to simple spatial queries
       • Example: return objects in a given spatial location
     • Top-k spatial preference query
       • Ranks data objects based on the score of feature objects in their spatial neighborhood
       • Combines spatial and non-spatial scores

  4. Top-k spatial preference queries
     • Given a set of data objects and scored feature objects
     • Query
       • Spatial neighborhood
       • Features of interest (e.g., bars)
     • Score of a data object
       • Obtained from feature objects in its spatial neighborhood
     • Returns
       • Ranked set of k best data objects
     [Figure: x-y map of hotels p1, p2, p3 with scored cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8) and scored bars b1(0.9), b2(0.6), b3(0.3); three "Top-1" labels mark the best hotel in different settings.]

  5. Score function
     • Aggregation of partial scores
       • Any monotone function: sum, max, and min
     • Partial score
       • Score of a data object for a set of feature objects
       • Defined by the score of a single feature object
         • Highest score
         • Satisfies the spatial constraint
     • Spatial constraint
       • Range, nearest neighbor, and influence
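
The score definition on this slide can be illustrated with a minimal Python sketch. It covers only the range and nearest-neighbor constraints, because the slides do not give the influence formula. The function names, the `(x, y, score)` tuple layout, and the choice of 0.0 for an empty neighborhood are assumptions made for illustration and are not taken from the paper.

```python
import math

def partial_score(p, features, constraint, r=None):
    """Partial score of data object p for one feature set.

    p is an (x, y) tuple; features is a list of (x, y, score) tuples.
    Returning 0.0 when no feature qualifies is an assumption of this sketch.
    """
    if constraint == "range":
        # Highest score among the features within distance r of p.
        return max((s for (x, y, s) in features
                    if math.dist(p, (x, y)) <= r), default=0.0)
    if constraint == "nn":
        if not features:
            return 0.0
        # Score of the nearest feature; ties go to the higher score.
        nearest = min(features, key=lambda f: (math.dist(p, f[:2]), -f[2]))
        return nearest[2]
    raise ValueError("unsupported constraint: " + constraint)

def score(p, feature_sets, constraint, r=None, agg=sum):
    """Aggregate the partial scores with a monotone function (sum, max, min)."""
    return agg(partial_score(p, fs, constraint, r) for fs in feature_sets)

# Hypothetical data: one hotel, two cafés, two bars.
hotel = (0.0, 0.0)
cafes = [(1.0, 0.0, 0.6), (3.0, 0.0, 0.9)]
bars = [(0.0, 2.0, 0.9), (0.0, 0.5, 0.3)]
print(score(hotel, [cafes, bars], "range", r=2.0))  # best café and bar within r
print(score(hotel, [cafes, bars], "nn"))            # nearest café and bar
```

Passing `agg=max` or `agg=min` swaps the aggregation without touching the partial-score logic, which is the point of requiring only monotonicity.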

  6. Example (agg=sum)
     [Figure: three panels labeled Range, Nearest neighbor, and Influence, showing score(p)=1.5, score(p)=1.0, and score(p)=0.6.]

  7. Current approaches
     • Naïve
       • Compute the score of all objects, select the top-k
       • Very costly
     • State-of-the-art [1,2]
       • Data objects and feature objects are indexed by multi-dimensional indices
     [1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
     [2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.

  8. Current approaches
     • Probing algorithms (SP and GP)
       • Require computing the score for all objects
     • Branch and bound algorithms (BB and BB*)
       • Compute an upper-bound score for the entries in the data objects R-tree
       • Prune entries whose upper-bound score is smaller than the score of the k-th object found
     • Feature join algorithm (FJ)
       • Create combinations of feature sets with high score
       • Combinations whose score is smaller than the score of the k-th object found are pruned

  9. Motivation behind our idea…
     • Few feature objects are necessary to compute the score of a data object
       • Features not dominated by any other feature in terms of both distance and score
     • Nice properties
       • Small size in practice
       • Sufficient to support any neighborhood condition and query parameter
     [Figure: x-y map of a hotel p1, a "?" mark, and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8).]

  10. Our framework
      • Mapping to distance-score space
        • Pairs of objects (p, t) with t ∈ Fi to be examined
      • Identify SKY(p, Fi)
        • Minimum set of pairs required to compute the score of p according to Fi for any query
      • Materialize SKY(p, Fi)
        • Stored in an R-tree, one R-tree Ri per feature set Fi
        • Efficient query processing and maintenance
      • Query processing algorithm

  11. Mapping to the distance-score space
      • Mapping
        • Pairs (object, feature)
        • Space [distance × score]
      • Skyline
        • Minimize: distance
        • Maximize: score
      [Figure: hotels p1 and p2 with cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); the pairs (p1,c1) to (p1,c4) and (p2,c1) to (p2,c4) are plotted in the distance-score space, one series per hotel.]
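
As a rough illustration of the mapping and the skyline on this slide, the sketch below maps each (object, feature) pair to a (distance, score) point and keeps the points that no other point dominates (smaller or equal distance and larger or equal score, with at least one strict). The function name `skyline` and the `(x, y, score)` tuple layout are assumptions of this sketch, and the coordinates are hypothetical.

```python
import math

def skyline(p, features):
    """Return SKY(p, Fi) as (distance, score) pairs sorted by distance.

    p is an (x, y) tuple; features is a list of (x, y, score) tuples.
    """
    # Map every pair (p, t) to the distance-score space.
    pairs = sorted(((math.dist(p, (x, y)), s) for (x, y, s) in features),
                   key=lambda ds: (ds[0], -ds[1]))
    sky, best = [], -math.inf
    for d, s in pairs:
        # A pair survives only if it beats the score of every nearer pair.
        if s > best:
            sky.append((d, s))
            best = s
    return sky

# Hypothetical cafés at increasing distance from a hotel at the origin.
cafes = [(1.0, 0.0, 0.5), (2.0, 0.0, 0.6), (3.0, 0.0, 0.2),
         (4.0, 0.0, 0.4), (5.0, 0.0, 0.8)]
print(skyline((0.0, 0.0), cafes))
```

In this example the cafés at distance 3 and 4 drop out because the café at distance 2 is both nearer and better scored, so they can never define a partial score.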

  12. Theoretical properties
      • SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
      • Maintaining SKY(p, Fi) (stored in an R-tree) is sufficient to answer any spatial preference query
      • SKY(p, Fi) is the minimum set required
      • The data required to process range queries also permits processing nearest neighbor and influence queries
      • The proofs of the theorems can be found in the paper
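
To see why the skyline is enough, consider a skyline stored as (distance, score) pairs sorted by distance. Scores then also increase along the list, so a range query only needs the last pair within r, and a nearest-neighbor query only needs the first pair. The sketch below assumes that representation; the function names and the 0.0 default for an empty neighborhood are illustrative choices, and the influence case is left out because the slides do not give its formula.

```python
from bisect import bisect_right

def range_score(sky, r):
    """Highest score among skyline pairs with distance <= r (0.0 if none)."""
    # Position just after the last pair whose distance is at most r.
    i = bisect_right(sky, (r, float("inf")))
    # Scores grow with distance inside a skyline, so the last pair is the best.
    return sky[i - 1][1] if i else 0.0

def nn_score(sky):
    """Score of the nearest feature, which is always the first skyline pair."""
    return sky[0][1] if sky else 0.0

# Hypothetical skyline of one hotel for one feature set.
sky = [(1.0, 0.5), (2.0, 0.6), (5.0, 0.8)]
print(range_score(sky, 3.0), range_score(sky, 5.0), nn_score(sky))
```

Any radius r gives the same answer as scanning the full feature set, because every dropped feature has a nearer skyline pair with an equal or higher score.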

  13. Access to partial scores
      • Only node entries that satisfy the spatial constraint are accessed
      • Items are retrieved in decreasing order of score
      • Minor modifications to support nearest neighbor and influence
      [Figure: R-tree traversal for r=3. root: e1, e2; e1: (p3,t4), (p2,t1), (p1,t3); e2: (p3,t4), (p2,t4), (p3,t4). Max-heap states shown: <e1(0.8)> and <p3(0.8), p2(0.6)>.]
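
One way to picture this access pattern is a best-first traversal driven by a max-heap, sketched below. The node layout is hypothetical: each inner entry is a dict holding the minimum distance and maximum score found in its subtree, and each leaf holds one (object, distance, score) pair. This is not the paper's R-tree structure, only a stand-in that reproduces the two properties listed on the slide for a range constraint.

```python
import heapq
from itertools import count

def objects_by_score(root, r):
    """Yield (object, partial score) in decreasing score order for range r."""
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = []
    if root["dist"] <= r:
        heap.append((-root["score"], next(tie), root))
    seen = set()
    while heap:
        _, _, entry = heapq.heappop(heap)
        if "children" in entry:
            for child in entry["children"]:
                # Entries that cannot satisfy the range constraint are skipped.
                if child["dist"] <= r:
                    heapq.heappush(heap, (-child["score"], next(tie), child))
        elif entry["obj"] not in seen:
            # The first hit for an object carries its highest valid score.
            seen.add(entry["obj"])
            yield entry["obj"], entry["score"]

def leaf(obj, dist, score):
    return {"obj": obj, "dist": dist, "score": score}

# Hypothetical two-node index over pairs (object, feature).
e1 = {"dist": 1.0, "score": 0.9, "children": [
    leaf("p3", 1.0, 0.8), leaf("p2", 2.0, 0.6), leaf("p1", 5.0, 0.9)]}
e2 = {"dist": 2.0, "score": 0.7, "children": [
    leaf("p1", 2.0, 0.2), leaf("p2", 4.0, 0.7)]}
root = {"dist": 1.0, "score": 0.9, "children": [e1, e2]}
print(list(objects_by_score(root, 3.0)))
```

Because an entry's key is an upper bound on every score below it, a leaf popped from the heap cannot be beaten by anything still waiting, which is what makes the output order decreasing.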

  14. Query processing
      • Compute top-k data objects progressively aggregating partial scores retrieved from Ri
        • Similar to Fagin’s algorithm (NRA)
      • Algorithm
        • Each time an object p is retrieved from Ri, any unseen object p’ in Ri has score(p’) ≤ score(p)
        • Keep track of the lower-bound and upper-bound scores of the seen objects
        • Terminates when the lower bound of the k-th object is better than the upper bound of the remaining objects
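
The bookkeeping described above can be sketched as an NRA-style loop over sorted lists. In the sketch, each list stands in for one index Ri that returns (object, partial score) pairs in decreasing score order, the aggregate is a sum, and an object missing from a list is assumed to contribute 0. The function name `top_k` and these conventions are assumptions of the sketch, not the paper's algorithm. The sample lists are the values readable from the range example on the slides that follow (r=4.5).

```python
def top_k(lists, k):
    """NRA-style top-k over score-sorted lists, aggregating by sum."""
    m = len(lists)
    pos = [0] * m
    # bound[i]: upper bound on the score of anything not yet read from list i.
    bound = [lst[0][1] if lst else 0.0 for lst in lists]
    seen = {}  # object -> {list index: partial score}
    while True:
        progressed = False
        for i, lst in enumerate(lists):
            if pos[i] < len(lst):
                obj, s = lst[pos[i]]
                pos[i] += 1
                seen.setdefault(obj, {})[i] = s
                bound[i] = s
                progressed = True
            else:
                bound[i] = 0.0  # list exhausted: nothing unseen remains
        lower = {o: sum(ps.values()) for o, ps in seen.items()}
        upper = {o: lower[o] + sum(bound[i] for i in range(m) if i not in ps)
                 for o, ps in seen.items()}
        ranked = sorted(seen, key=lambda o: lower[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if not progressed:
            return [(o, lower[o]) for o in top]
        if len(top) == k:
            # Best possible score of any other seen or unseen object.
            others = [upper[o] for o in rest] + [sum(bound)]
            # >= is used here; a tie cannot displace the k-th object.
            if lower[top[-1]] >= max(others):
                return [(o, lower[o]) for o in top]

r1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
r2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
print(top_k([r1, r2], 1))
```

With these lists the loop needs three rounds: after the first two, the upper bounds of p1 and p3 (1.7, then 1.5 and 1.4) still exceed the best lower bound, and only in the third round do both drop to 1.1, below p2's 1.2. The returned scores are lower bounds, which may be below the exact score when the loop stops early.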

  15. Example (range, r=4.5)
      [Figure: first access. R1 (hotel × restaurant) returns p3(0.8) and R2 (hotel × bar) returns p1(0.9); 0.8 + 0.9 = 1.7. Table values for p1 and p3: 0.8, -, 1.7, 1.7 and -, 0.9, 0.9, 0.8.]

  16. Example (range, r=4.5)
      [Figure: second access. R1 returns p2(0.6) and R2 returns p2(0.6); 0.6 + 0.6 = 1.2. Table values: 1.4, 1.5, and for p2: 0.6, 1.2, 0.6, 1.2.]

  17. Example (range, r=4.5)
      [Figure: third access. R1 returns p1(0.2) and R2 returns p3(0.3); 0.2 + 0.3 = 0.5. Table values: 0.2, 1.1, 1.1 and 0.3, 1.1, 1.1; the Top-1 object is marked.]

  18. Materialization
      • Objects are partitioned into regions
        • The distance among objects in the same region is small
        • The skyline sets of objects in the same region are similar with high probability
      • Compute SKY(R, Fi) for the region R
        • SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
      • Advantage
        • The feature set is accessed only once to compute the dynamic skyline of all objects in the region
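
The containment SKY(p, Fi) ⊆ SKY(R, Fi) suggests computing one candidate set per region and refining it per object. The sketch below shows one conservative way to obtain such a candidate set for a rectangular region: a feature is dropped only if some other feature is at least as good for every point of the rectangle. This pruning rule, the rectangle representation `(x1, y1, x2, y2)`, and the function names are assumptions for illustration; the paper's actual definition of SKY(R, Fi) may differ.

```python
import math

def min_max_dist(rect, pt):
    """Minimum and maximum distance from any point of rect to pt."""
    (x1, y1, x2, y2), (x, y) = rect, pt
    dx_min = max(x1 - x, 0.0, x - x2)
    dy_min = max(y1 - y, 0.0, y - y2)
    dx_max = max(abs(x - x1), abs(x - x2))
    dy_max = max(abs(y - y1), abs(y - y2))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def region_candidates(rect, features):
    """Features that may belong to SKY(p, Fi) for some p inside rect."""
    info = [(min_max_dist(rect, (x, y)), s, (x, y, s)) for (x, y, s) in features]
    keep = []
    for (lo, _), s, f in info:
        # f is dominated everywhere in rect if another feature is never
        # farther (its max distance <= f's min distance) and never worse
        # scored, with at least one of the two comparisons strict.
        dominated = any(hi2 <= lo and s2 >= s and (hi2 < lo or s2 > s)
                        for (_, hi2), s2, _ in info)
        if not dominated:
            keep.append(f)
    return keep

# Hypothetical region and feature set.
region = (0.0, 0.0, 1.0, 1.0)
feats = [(2.0, 0.5, 0.9), (10.0, 0.5, 0.5), (10.0, 0.5, 0.95), (0.5, 0.5, 0.1)]
print(region_candidates(region, feats))
```

The feature set is scanned once per region here, and each object's own skyline can then be derived from the (usually much smaller) candidate list.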

  19. Experimental evaluation
      • We compare our approach (SFA) against the SP, GP, BB, BB*, and FJ algorithms [1,2]
      • All approaches are implemented in Java
      • Measures: response time, I/O, update time, index construction time, and index size
      [1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
      [2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.

  20. Variables studied
      • Data distribution
        • Uniform (UN), Synthetic (CN), Real (RL)
      • Cardinality (objects and features)
        • 50K, 100K, 200K, 400K, 800K, 1600K
      • Number of results (k)
        • 10, 20, 30, 40, 50
      • Number of feature sets
        • 1, 2, 3, 4, 5
      • Query range (r), for range and influence queries
        • 10, 40, 160, 640, 2560

  21. Datasets

  22. Number of features
      [Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets.]

  23. Scalability
      [Figure: a) response time varying |Fi|; b) response time varying |O|.]

  24. Real datasets
      [Figure: a) range; b) influence; c) nearest neighbor.]

  25. Conclusion
      • Top-k spatial preference queries are a useful tool for novel location-based applications
      • We propose a new approach for processing top-k spatial preference queries efficiently
        • We find and materialize SKY(p, Fi)
        • We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
        • The size of SKY(p, Fi) is small in practice
        • We propose algorithms to process queries using our index
      • The efficiency of our approach is verified through experiments on synthetic and real datasets

  26. Thanks!
      More information: João B. Rocha-Junior, joao@idi.ntnu.no, http://www.idi.ntnu.no/~joao
