Efficient Processing of Top-k Spatial Preference Queries

João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg
VLDB 2011 - Seattle, USA

Outline
• Top-k spatial preference queries
• Current approaches
• Our approach
  • Mapping to distance-score space
  • Query processing
  • Materialization (index construction)
• Experimental evaluation
• Conclusion

Motivation
• Increasing number of Web information systems specialized in location-based queries
• Systems are limited to simple spatial queries
  • Example: return objects in a given spatial location
• Top-k spatial preference query
  • Ranks data objects based on the score of feature objects in their spatial neighborhood
  • Combines spatial and non-spatial scores

Top-k spatial preference queries
• Given a set of data objects and scored feature objects
• Query
  • Spatial neighborhood
  • Features of interest (e.g., bars)
• Score of a data object
  • Obtained from feature objects in its spatial neighborhood
• Returns
  • Ranked set of k best data objects
[Figure: hotels p1, p2, p3, cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8), and bars b1(0.9), b2(0.6), b3(0.3) on an x-y map, with the Top-1 hotel highlighted]

Score function
• Aggregation of partial scores
  • Any monotone function: sum, max, and min
• Partial score
  • Score of a data object for a set of feature objects
  • Defined by the score of a single feature object
    • Highest score
    • Satisfies the spatial constraint
• Spatial constraint
  • Range, nearest neighbor, and influence

Example (agg=sum)
• Range: score(p)=1.5
• Nearest neighbor: score(p)=1.0
• Influence: score(p)=0.6
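The three spatial constraints and the aggregation step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the tuple layout, and the sample coordinates are hypothetical. The slides do not spell out the influence formula, so the sketch assumes the score decays as score · 2^(−dist/r), the form used for influence scores in [1].

```python
import math

def partial_score(p, features, mode, r):
    """Partial score of data object p for one feature set.

    features is a list of ((x, y), score) tuples (hypothetical layout).
    """
    if mode == "range":
        # Highest score among features within distance r of p (0 if none).
        return max((s for loc, s in features if math.dist(p, loc) <= r),
                   default=0.0)
    if mode == "nn":
        # Score of the nearest feature; ties are broken by higher score.
        if not features:
            return 0.0
        return min(features, key=lambda t: (math.dist(p, t[0]), -t[1]))[1]
    if mode == "influence":
        # Assumed definition: the score halves every r units of distance.
        return max((s * 2 ** (-math.dist(p, loc) / r) for loc, s in features),
                   default=0.0)
    raise ValueError(f"unknown spatial constraint: {mode}")

def score(p, feature_sets, mode, r, agg=sum):
    """Aggregate the partial scores with a monotone function (sum, max, min)."""
    return agg([partial_score(p, fs, mode, r) for fs in feature_sets])

# Hypothetical data: one hotel, two cafés, two bars.
cafes = [((1.0, 0.0), 0.6), ((5.0, 0.0), 0.9)]
bars = [((0.0, 2.0), 0.3), ((3.0, 0.0), 0.8)]
hotel = (0.0, 0.0)
for mode in ("range", "nn", "influence"):
    print(mode, round(score(hotel, [cafes, bars], mode, r=3.0), 3))
```

Passing `agg=max` or `agg=min` swaps the aggregation without touching the partial-score logic, which is all that monotonicity requires here.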

Current approaches
• Naïve
  • Compute the score of all objects, select the top-k
  • Very costly
• State-of-the-art [1,2]
  • Data objects and feature objects are indexed by multi-dimensional indices
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.
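For reference, the naïve approach can be sketched as below for range queries with sum aggregation. The names and sample data are hypothetical. Every data object is compared against every feature object, which is why the approach is very costly on large inputs.

```python
import heapq
import math

def range_score(p, feature_sets, r):
    """Sum, over the feature sets, of the best score within distance r of p."""
    return sum(
        max((s for loc, s in fs if math.dist(p, loc) <= r), default=0.0)
        for fs in feature_sets
    )

def naive_top_k(objects, feature_sets, r, k):
    """Score every data object, then keep the k best (score, name) pairs."""
    scored = ((range_score(loc, feature_sets, r), name)
              for name, loc in objects.items())
    return heapq.nlargest(k, scored)

# Hypothetical data: two hotels, one café, one bar.
hotels = {"p1": (0.0, 0.0), "p2": (10.0, 0.0)}
cafes = [((1.0, 0.0), 0.6)]
bars = [((9.0, 0.0), 0.9)]
print(naive_top_k(hotels, [cafes, bars], r=2.0, k=1))
```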

Current approaches
• Probing algorithms (SP and GP)
  • Require computing the score for all objects
• Branch and bound algorithms (BB and BB*)
  • Compute an upper-bound score for the entries in the data objects R-tree
  • Prune entries whose upper-bound score is smaller than the score of the k-th object found
• Feature join algorithm (FJ)
  • Create combinations of feature sets with high score
  • Combinations whose score is smaller than the score of the k-th object found are pruned

Motivation behind our idea…
• Few feature objects are necessary to compute the score of a data object
  • Features not dominated by any other feature in terms of both distance and score
• Nice properties
  • Small size in practice
  • Sufficient to support any neighborhood condition and query parameter
[Figure: hotel p1 and cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8) in the x-y plane]

Our framework
• Mapping to distance-score space
  • Pairs of objects (p, t) with t ∈ Fi to be examined
• Identify SKY(p, Fi)
  • Minimum set of pairs required to compute the score of p according to Fi for any query
• Materialize SKY(p, Fi)
  • Stored in an R-tree, one R-tree Ri per feature set Fi
  • Efficient query processing and maintenance
• Query processing algorithm

Mapping to the distance-score space
• Mapping
  • Pairs (object, feature)
  • Space [distance × score]
• Skyline
  • Minimize: distance
  • Maximize: score
[Figure: hotels p1 and p2 and cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3); each pair (p1,c) and (p2,c) is plotted as a point in the distance-score space]
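A minimal sketch of the mapping and of the skyline computation for one data object follows. The café scores are taken from the slide, but their coordinates, the tuple layout, and the function name are assumptions made for illustration.

```python
import math

def skyline(p, features):
    """Return SKY(p, Fi) as (distance, score, name) tuples.

    A pair is kept when no other pair is at least as close and at least as
    well scored (and strictly better in one of the two). Exact duplicates
    in distance-score space are collapsed into one entry.
    features is a list of (name, (x, y), score) tuples (hypothetical layout).
    """
    # Map every feature to distance-score space, closest first;
    # among equal distances the higher score comes first.
    pairs = sorted(
        ((math.dist(p, loc), s, name) for name, loc, s in features),
        key=lambda t: (t[0], -t[1]),
    )
    sky, best = [], -math.inf
    for d, s, name in pairs:
        # Everything seen so far is at least as close, so the pair
        # survives only if it beats the best score seen so far.
        if s > best:
            sky.append((d, s, name))
            best = s
    return sky

# Scores from the slide; coordinates are made up.
cafes = [("c1", (4.0, 0.0), 0.9), ("c2", (2.0, 0.0), 0.7),
         ("c3", (1.0, 0.0), 0.5), ("c4", (3.0, 0.0), 0.3)]
p1 = (0.0, 0.0)
print([name for _, _, name in skyline(p1, cafes)])
```

With these coordinates c4 is dropped because c2 is both closer to p1 and better scored.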

Theoretical properties
• SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
• Maintaining SKY(p, Fi), stored in an R-tree, is sufficient to answer any spatial preference query
• SKY(p, Fi) is the minimum set required
• The data required to process range queries also permits processing nn and influence queries
• The proofs of the theorems can be found in the paper
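The sufficiency property can be checked empirically with a small sketch: compute partial scores once from all (distance, score) pairs and once from the skyline only, and compare. This is an illustration on random data, not a proof. The helper names are hypothetical and the influence formula score · 2^(−dist/r) is an assumption taken from [1].

```python
import math
import random

def skyline(pairs):
    """Skyline of (distance, score) pairs: minimize distance, maximize score."""
    sky, best = [], -math.inf
    for d, s in sorted(pairs, key=lambda t: (t[0], -t[1])):
        if s > best:
            sky.append((d, s))
            best = s
    return sky

def partial_score(pairs, mode, r):
    """Partial score computed from (distance, score) pairs."""
    if not pairs:
        return 0.0
    if mode == "range":
        return max((s for d, s in pairs if d <= r), default=0.0)
    if mode == "nn":
        return min(pairs, key=lambda t: (t[0], -t[1]))[1]
    if mode == "influence":
        return max(s * 2 ** (-d / r) for d, s in pairs)  # assumed formula
    raise ValueError(mode)

def agree(pairs, radii=(0.5, 2.0, 8.0)):
    """True if the skyline alone reproduces every partial score."""
    sky = skyline(pairs)
    return all(
        math.isclose(partial_score(pairs, mode, r), partial_score(sky, mode, r))
        for mode in ("range", "nn", "influence")
        for r in radii
    )

random.seed(0)
pairs = [(random.uniform(0, 10), random.random()) for _ in range(200)]
print(len(pairs), "pairs,", len(skyline(pairs)), "skyline pairs, agree:", agree(pairs))
```

The same skyline answers all three query types for any radius, which is what makes it worth materializing once per data object and feature set.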

Access to partial scores
• Only node entries that satisfy the spatial constraint are accessed
• Items are retrieved in decreasing order of score
• Minor modifications to support nn and influence
[Figure: range query with r=3 on an R-tree with root entries e1 and e2; e1: (p3,t4), (p2,t1), (p1,t3); e2: (p3,t4), (p2,t4), (p3,t4); the max-heap goes from <e1(0.8)> to <p3(0.8), p2(0.6)>]
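The access pattern for range queries can be sketched as a best-first traversal driven by a max-heap. The sketch uses a toy in-memory tree instead of a real R-tree: the `Pair` and `Node` classes, the distances, and the feature names are hypothetical, and each node keeps only the two bounds the traversal needs (minimum distance and maximum score).

```python
import heapq
from itertools import count

class Pair:
    """Leaf entry: one (object, feature) pair in distance-score space."""
    def __init__(self, obj, feature, dist, score):
        self.obj, self.feature = obj, feature
        self.min_dist, self.max_score = dist, score

class Node:
    """Internal entry summarizing its (non-empty) list of children."""
    def __init__(self, children):
        self.children = children
        self.min_dist = min(c.min_dist for c in children)
        self.max_score = max(c.max_score for c in children)

def partial_scores(root, r):
    """Yield (object, partial score) in decreasing score order for range r."""
    tie = count()  # tie-breaker so heap entries never compare nodes directly
    heap = []
    if root.min_dist <= r:
        heap.append((-root.max_score, next(tie), root))
    reported = set()
    while heap:
        _, _, entry = heapq.heappop(heap)
        if isinstance(entry, Pair):
            # The first pair popped for an object carries its best score.
            if entry.obj not in reported:
                reported.add(entry.obj)
                yield entry.obj, entry.max_score
            continue
        for child in entry.children:
            if child.min_dist <= r:  # skip entries outside the range
                heapq.heappush(heap, (-child.max_score, next(tie), child))

e1 = Node([Pair("p3", "t4", 2.0, 0.8), Pair("p2", "t1", 1.0, 0.6),
           Pair("p1", "t3", 2.5, 0.2)])
e2 = Node([Pair("p1", "t5", 5.0, 0.9), Pair("p2", "t6", 4.0, 0.7)])
root = Node([e1, e2])
print(list(partial_scores(root, r=3.0)))
```

Because a node's key is the maximum score below it, a pair is only popped once nothing unexplored can beat it, so the output order is decreasing in score without sorting the whole index.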

Query processing
• Compute the top-k data objects by progressively aggregating partial scores retrieved from Ri
  • Similar to Fagin’s algorithm (NRA)
• Algorithm
  • Each time an object p is retrieved from Ri, any unseen object p’ in Ri has score(p’) ≤ score(p)
  • Keep track of the lower-bound and upper-bound scores of the seen objects
  • Terminates when the lower bound of the k-th object is better than the upper bound of the remaining objects

Example (range, r=4.5)
• R1 (hotel × restaurant) returns p3(0.8); R2 (hotel × bar) returns p1(0.9); 0.8 + 0.9 = 1.7
• p3: lower bound 0.8, upper bound 1.7
• p1: lower bound 0.9, upper bound 1.7

Example (range, r=4.5)
• R1 returns p2(0.6); R2 returns p2(0.6); 0.6 + 0.6 = 1.2
• p2: lower bound 1.2, upper bound 1.2
• The upper bounds of p3 and p1 drop to 1.4 and 1.5

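Example (range, r=4.5)
• R1 returns p1(0.2); R2 returns p3(0.3); 0.2 + 0.3 = 0.5
• p1: lower bound 1.1, upper bound 1.1
• p3: lower bound 1.1, upper bound 1.1
• Top-1: p2, whose score of 1.2 exceeds every remaining upper bound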
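The three steps of the example can be replayed with a small NRA-style sketch. It reads one entry from each sorted stream per round and stops once the k-th lower bound is at least every other upper bound. The function name and the list-based streams are assumptions. The real algorithm pulls entries from the R-trees Ri on demand.

```python
def top_k(streams, k):
    """NRA-style aggregation with sum, for k >= 1.

    streams: one list per feature set of (object, partial score) tuples,
    sorted by decreasing score. Returns [(object, lower-bound score)].
    """
    m = len(streams)
    iters = [iter(s) for s in streams]
    last = [float("inf")] * m   # bound on any unread score, per stream
    seen = {}                   # object -> partial scores (None = unknown)
    while True:
        progressed = False
        for i, it in enumerate(iters):
            item = next(it, None)
            if item is None:
                last[i] = 0.0   # exhausted: unread objects contribute 0
                continue
            progressed = True
            obj, s = item
            last[i] = s
            seen.setdefault(obj, [None] * m)[i] = s
        lower = {o: sum(v for v in ps if v is not None)
                 for o, ps in seen.items()}
        upper = {o: sum(last[i] if v is None else v for i, v in enumerate(ps))
                 for o, ps in seen.items()}
        ranked = sorted(seen, key=lambda o: lower[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if not progressed:      # all streams exhausted
            return [(o, lower[o]) for o in top]
        if len(top) == k:
            # Best score any other object, seen or unseen, could still reach.
            threshold = max([upper[o] for o in rest] + [sum(last)])
            if lower[top[-1]] >= threshold:
                return [(o, lower[o]) for o in top]

# Partial scores from the example: R1 = hotel x restaurant, R2 = hotel x bar.
R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
print(top_k([R1, R2], k=1))
```

On this input the loop cannot stop after the second round, because p1 and p3 could still reach 1.5 and 1.4, so it needs the third round before reporting p2.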

Materialization
• Objects are partitioned into regions
  • The distance among objects in the same region is small
  • The skyline sets of the objects in the same region are similar with high probability
• Compute SKY(R, Fi) for the region R
  • SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
• Advantage
  • The feature set is accessed only once to compute the dynamic skyline of all objects in the region
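One conservative way to obtain a region-level superset is sketched below. The slides do not describe how SKY(R, Fi) is computed, so the pruning rule is an assumption, not necessarily the paper's construction: a feature t is discarded only if some other feature t' has a score at least as high and a maximum distance to the region no larger than the minimum distance of t (strictly better in one of the two). Such a t is dominated for every p in R, so the surviving set contains SKY(p, Fi) for all p in R. Rectangular regions and all names are hypothetical.

```python
import math

def min_dist(rect, q):
    """Smallest distance from point q to rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    dx = max(x1 - q[0], 0.0, q[0] - x2)
    dy = max(y1 - q[1], 0.0, q[1] - y2)
    return math.hypot(dx, dy)

def max_dist(rect, q):
    """Largest distance from point q to any point of the rectangle."""
    x1, y1, x2, y2 = rect
    dx = max(abs(q[0] - x1), abs(q[0] - x2))
    dy = max(abs(q[1] - y1), abs(q[1] - y2))
    return math.hypot(dx, dy)

def region_skyline(rect, features):
    """Names of features that may be in SKY(p, Fi) for some p in rect.

    features is a list of (name, (x, y), score) tuples (hypothetical layout).
    """
    bounds = [(min_dist(rect, loc), max_dist(rect, loc), s, name)
              for name, loc, s in features]
    keep = []
    for lo, _, s, name in bounds:
        # Dominated for every p in rect: another feature is never farther
        # and never worse scored, and strictly better in one criterion.
        dominated = any(
            hi2 <= lo and s2 >= s and (hi2 < lo or s2 > s)
            for _, hi2, s2, other in bounds if other != name
        )
        if not dominated:
            keep.append(name)
    return keep

region = (0.0, 0.0, 1.0, 1.0)
features = [("a", (2.0, 0.5), 0.5), ("b", (10.0, 0.5), 0.4),
            ("c", (10.0, 0.5), 0.9)]
print(region_skyline(region, features))
```

The feature set is scanned once per region, and the per-object skylines can then be derived from the much smaller surviving set.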

Experimental evaluation
• We compare our approach (SFA) against the SP, GP, BB, BB*, and FJ algorithms [1,2]
• All approaches are implemented in Java
• Measures: response time, I/O, update time, index construction time, and index size

Variables studied
• Data distribution
  • Uniform (UN), Synthetic (CN), Real (RL)
• Cardinality (objects and features)
  • 50K, 100K, 200K, 400K, 800K, 1600K
• Number of results (k)
  • 10, 20, 30, 40, 50
• Number of feature sets
  • 1, 2, 3, 4, 5
• Query range (r), for range and influence queries
  • 10, 40, 160, 640, 2560

Datasets

Number of features
[Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets]

Scalability
[Figure: a) response time varying |Fi|; b) response time varying |O|]

Real datasets
[Figure: a) range; b) influence; c) nearest neighbor]

Conclusion
• Top-k spatial preference queries are a useful tool for novel location-based applications
• We propose a new approach for processing top-k spatial preference queries efficiently
  • We find and materialize SKY(p, Fi)
  • We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  • The size of SKY(p, Fi) is small in practice
  • We propose algorithms to process queries using our index
• The efficiency of our approach is verified through experiments on synthetic and real datasets

Thanks!
More information: João B. Rocha-Junior, joao@idi.ntnu.no, http://www.idi.ntnu.no/~joao
