SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries

Shiyu Yang (1), Muhammad Aamir Cheema (2,1), Xuemin Lin (1,3), Ying Zhang (4,1). (1) The University of New South Wales, Australia; (2) Monash University, Australia; (3) East China Normal University, China


Presentation Transcript


  1. SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries
     Shiyu Yang (1), Muhammad Aamir Cheema (2,1), Xuemin Lin (1,3), Ying Zhang (4,1)
     (1) The University of New South Wales, Australia
     (2) Monash University, Australia
     (3) East China Normal University, China
     (4) University of Technology, Sydney, Australia

  2. Introduction
     • k Nearest Neighbor (kNN) Query
       - Find the k facilities closest to the query user.
     • Reverse k Nearest Neighbor (RkNN) Query
       - Find every user for which the query facility is one of the k closest facilities.
     • RkNNs are the potential customers of a facility.
     [Figure: users u1, u2, u3 and facilities f1, f2, f3; k = 1]
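The RkNN definition on this slide can be made concrete with a brute-force sketch. This is only an illustration of the definition, not any of the algorithms discussed in the talk; the function name `rknn` and the convention that `facilities` excludes the query facility are assumptions made for the example.

```python
from math import dist

def rknn(query, facilities, users, k):
    """Return every user for which `query` is one of its k closest facilities.

    Assumption: `facilities` lists the competing facilities and does not
    contain `query` itself. Points are (x, y) tuples.
    """
    result = []
    for u in users:
        d_q = dist(u, query)
        # Count the facilities that are strictly closer to u than the query.
        closer = sum(1 for f in facilities if dist(u, f) < d_q)
        if closer < k:
            result.append(u)
    return result

query = (0, 0)
facilities = [(10, 0), (0, 10)]
users = [(1, 0), (9, 0), (0, 9), (4, 4)]
r1 = rknn(query, facilities, users, k=1)
r2 = rknn(query, facilities, users, k=2)
```

With k = 1 only the users near the query survive; with k = 2 every user in this toy data set is a reverse neighbor, because no user has two facilities closer than the query.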

  3. Related Work
     • Timeline: Six-regions (SIGMOD 2000), TPL (VLDB 2004), FINCH (VLDB 2008), Boost (SIGMOD 2010), InfZone (ICDE 2011)
     • Regions-based: Six-regions (SIGMOD 2000)
     • Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)

  4. Related Work
     • Regions-based pruning: Six-regions (SIGMOD 2000)
       - Divide the whole space centred at the query q into six equal regions.
       - Find the k-th nearest facility of q in each region.
       - The k-th nearest facility of q in each region defines the area that can be pruned.
     • The user points that cannot be pruned must be verified by a range query.
     [Figure: query q with points a, b, c, d and users u1, u2; k = 2]
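The six-regions filter described above can be sketched as follows. This is a minimal in-memory illustration under assumptions: the helper names are hypothetical, the regions are taken as six 60-degree wedges starting at the positive x-axis, and the range-query verification step is left out.

```python
from math import atan2, dist, inf, pi

def region_of(point, q, num_regions=6):
    """Index of the equal-angle wedge around q that contains `point`."""
    angle = atan2(point[1] - q[1], point[0] - q[0]) % (2 * pi)
    # min() guards against angle rounding up to exactly 2*pi.
    return min(int(angle / (2 * pi / num_regions)), num_regions - 1)

def six_regions_candidates(q, facilities, users, k):
    """Users that survive six-regions pruning (they still need verification)."""
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region_of(f, q)].append(dist(f, q))
    # Distance from q to the k-th nearest facility in each region;
    # inf means the region holds fewer than k facilities and prunes nothing.
    bounds = []
    for dists in per_region:
        dists.sort()
        bounds.append(dists[k - 1] if len(dists) >= k else inf)
    # A user farther from q than the bound of its region is pruned.
    return [u for u in users if dist(u, q) <= bounds[region_of(u, q)]]

q = (0.0, 0.0)
facilities = [(1.0, 0.1), (2.0, 0.1)]
users = [(0.5, 0.1), (3.0, 0.5), (-3.0, 0.0)]
candidates = six_regions_candidates(q, facilities, users, k=1)
```

The user at (3.0, 0.5) is dropped because it shares a region with a facility that is closer to q, while the user at (-3.0, 0.0) stays a candidate because its region contains no facility at all.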

  5. Related Work
     • Half-space pruning: the space that is contained in k half-spaces can be pruned.
     • TPL (VLDB 2004)
       - Find the nearest facility f in the unpruned area.
       - Draw the bisector between q and f and prune using the resulting half-space.
       - Iteratively access the nearest facility in the unpruned area.
     [Figure: query q with points a, b, c, d; k = 2]
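The half-space test itself is easy to state in code. The sketch below only shows the geometric check (is a point on the facility's side of the bisector, and is it inside at least k such half-spaces); it does not reproduce TPL's index traversal, and the function names are hypothetical.

```python
def on_facility_side(p, f, q):
    """True if p lies strictly on f's side of the perpendicular bisector of q and f.

    Uses the sign of (p - m) . (f - q), where m is the midpoint of q and f;
    this is positive exactly when p is closer to f than to q.
    """
    mx, my = (q[0] + f[0]) / 2, (q[1] + f[1]) / 2
    return (p[0] - mx) * (f[0] - q[0]) + (p[1] - my) * (f[1] - q[1]) > 0

def pruned_by_half_spaces(p, q, facilities, k):
    """True if p is contained in at least k pruning half-spaces."""
    count = 0
    for f in facilities:
        if on_facility_side(p, f, q):
            count += 1
            if count >= k:
                return True
    return False

q = (0, 0)
facilities = [(2, 0), (0, 2)]
far_corner = pruned_by_half_spaces((3, 3), q, facilities, k=2)
along_axis = pruned_by_half_spaces((3, 0), q, facilities, k=2)
```

The point (3, 3) lies beyond both bisectors and is pruned for k = 2, whereas (3, 0) lies beyond only one of them and is not.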

  6. Related Work
     • Half-space pruning: InfZone (ICDE 2011)
       - The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
       - A point p is an RkNN of q if and only if p lies inside the unpruned area.
       - No verification phase.
     • Half-space pruning is expensive, especially when k is large.
     [Figure: query q with points a, b, c, d; k = 2]

  7. Related Work
                          Regions-based   SLICE        Half-space
     Pruning cost         O(m log k)      O(m log m)   O(km²)
     Pruning power        Low             High         High
     Verification cost    Range query     O(k)         O(log m)
     (m is the number of facilities considered for pruning)
     • Can regions-based pruning do better?

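  8. Notations
     • Partition: P
     • Subtended angle: ∠a
     • Maximum (minimum) subtended angle w.r.t. P: θmax (θmin)
     • Upper (lower) arc of f w.r.t. P
       - Center: q
       - Radius: dist(f,q) / (2·cos θmax) for the upper arc (θmin in place of θmax for the lower arc)
     [Figure: partition P with query q, facility f, points p and a, angles θmax and θmin, and the upper and lower arcs]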

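  9. Observation -- Pruning
     • A facility f prunes every point p ∈ P for which dist(p,q) > dist(f,q) / (2·cos θmax), the radius of the upper arc, provided θmax < 90°.
     • Let a = dist(p,f), b = dist(p,q), c = dist(f,q), and let θ ≤ θmax be the angle between qp and qf. We can prove a < b.
       - a² = b² + c² − 2bc·cos(θ)
       - b > c / (2·cos θmax)
       - c² − 2bc·cos(θ) < c² − 2c·(c / (2·cos θmax))·cos(θ) = c²(1 − cos θ / cos θmax) ≤ 0
     • A facility f prunes the area outside its upper arc for every partition P for which θmax < 90°.
     [Figure: partition P with query q, facility f, point p, sides a, b, c, angles θ and θmax, and the upper arc]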
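The observation can be checked numerically. The sketch below assumes the upper-arc radius is dist(f,q) / (2·cos θmax), which is the value that makes the law-of-cosines argument on this slide go through; the helper names `upper_arc_radius` and `point_at` are hypothetical.

```python
from math import atan2, cos, dist, inf, pi, radians, sin

def upper_arc_radius(f, q, theta_max):
    """Assumed upper-arc radius: dist(f, q) / (2 * cos(theta_max)).

    Only meaningful for theta_max < 90 degrees; otherwise f prunes nothing
    in the partition and inf is returned.
    """
    if theta_max >= pi / 2:
        return inf
    return dist(f, q) / (2 * cos(theta_max))

def point_at(q, f, theta, b):
    """Point at distance b from q whose direction makes angle theta with q->f."""
    base = atan2(f[1] - q[1], f[0] - q[0])
    return (q[0] + b * cos(base + theta), q[1] + b * sin(base + theta))

q, f = (0.0, 0.0), (2.0, 0.0)
theta_max = radians(60)
r = upper_arc_radius(f, q, theta_max)

# Sample points slightly beyond the upper arc for angles 0..60 degrees:
# each one should be closer to f than to q (a < b in the slide's notation).
checks = []
for deg in range(0, 61, 5):
    p = point_at(q, f, radians(deg), 1.01 * r)
    checks.append(dist(p, f) < dist(p, q))
all_pruned = all(checks)
```

For this configuration the radius comes out at about 2, and a point just inside the arc at the 60-degree boundary is no longer closer to f, which is why the arc cannot be shrunk for this partition.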

  10. Comparison with Six-regions
                            Six-regions                SLICE
      Area pruned           beyond dist(f,q) from q    outside the upper arc of f
      Partitions pruned     one                        every partition with θmax < 90°
      No. of partitions     6                          any

  11. Pruning Algorithm
      • Divide the space into t partitions.
      • Compute the upper arc of each partition for the facilities.
      • The area outside the k-th smallest upper arc (rB) in each partition can be pruned.
      • Users in the pruned area can be pruned.
      • Users in the unpruned area are verified by accessing the significant facilities.
      [Figure: query q, facilities f1, f2 and users u1, u2; k = 2]
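The pruning steps above can be sketched in memory as follows. This is an illustration under stated assumptions rather than the authors' implementation: partitions are t equal wedges starting at the positive x-axis with t ≥ 3, the upper-arc radius is taken as dist(f,q) / (2·cos θmax), θmax is computed from the two boundary directions of a wedge, and every function name is hypothetical. No index, I/O handling or verification step is modelled.

```python
from math import atan2, cos, dist, inf, pi

def angular_diff(a, b):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * pi)
    return min(d, 2 * pi - d)

def bounding_radii(q, facilities, k, t=12):
    """r_B per partition: the k-th smallest upper-arc radius, or inf."""
    width = 2 * pi / t
    radii = []
    for i in range(t):
        lo, hi = i * width, (i + 1) * width
        upper = []
        for f in facilities:
            phi = atan2(f[1] - q[1], f[0] - q[0])
            # Largest angle between q->f and any direction in the wedge.
            theta_max = max(angular_diff(phi, lo), angular_diff(phi, hi))
            if theta_max < pi / 2:  # f can prune part of this partition
                upper.append(dist(f, q) / (2 * cos(theta_max)))
        upper.sort()
        radii.append(upper[k - 1] if len(upper) >= k else inf)
    return radii

def candidate_users(q, facilities, users, k, t=12):
    """Users not pruned by the bounding arc of their partition."""
    radii = bounding_radii(q, facilities, k, t)
    width = 2 * pi / t
    kept = []
    for u in users:
        phi = atan2(u[1] - q[1], u[0] - q[0]) % (2 * pi)
        i = min(int(phi / width), t - 1)
        if dist(u, q) <= radii[i]:
            kept.append(u)
    return kept

q = (0.0, 0.0)
facilities = [(1.0, 0.0)]
radii = bounding_radii(q, facilities, k=1)
users = [(2.0, 0.1), (0.3, 0.05), (-1.0, 0.0)]
kept = candidate_users(q, facilities, users, k=1)
```

With a single facility at (1, 0) and 12 partitions, the wedge containing the facility gets a bounding radius of roughly 0.577 and the next wedge a radius of roughly 1, while wedges facing away from the facility are not pruned at all. The users that remain in `kept` are only candidates and would still need verification.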

  12. Significant Facility
      • Significant facility: a facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.
      • Verification for a candidate
        - Regions-based: issue a range query for each candidate (high I/O cost).
        - SLICE: access the significant facilities, O(k) (no additional I/O cost).
      [Figure: partition P with query q and points M, N; a significant facility cannot be in the red area]

  13. Theoretical Analyses
      • Number of significant facilities
        - 2.34k (θ → 0)
        - 9k (θ = 60°)
        - More analyses can be found in the paper.
      • I/O cost
        - Pruning phase: same as a circular range query centered at q with radius 2rB.
        - Verification phase: same as a circular range query centered at q with radius rB.

  14. Experiments
      • Data sets
        - Synthetic data: size 50,000, 100,000, 150,000 or 200,000; distribution uniform or normal
        - Real data: 175,812 points in North America
      • Algorithms: Six-regions, InfZone and SLICE
      • The page size is 4 KB and the number of buffers for Six-regions is 10.
      • The number of partitions for SLICE is 12.

  15. Experiments
      • Effect of different values of k
      [Figure: two panels, CPU cost and I/O cost]

  16. Experiments
      • Effect of data distribution
      • Effect of the percentage of users

  17. Experiments
      • Effect of the number of partitions
      • Number of significant facilities
      [Figure axes: number of partitions; value of k]

  18. Thanks! Q&A
