Efficient Method for Maximizing Bichromatic Reverse Nearest Neighbor

Raymond Chi-Wing Wong (Hong Kong University of Science and Technology)
M. Tamer Ozsu (University of Waterloo)
Philip S. Yu (University of Illinois at Chicago)
Ada Wai-Chee Fu (Chinese University of Hong Kong)
Lian Liu (Hong Kong University of Science and Technology)
Presented by Raymond Chi-Wing Wong

Outline
• Introduction
• Related work
– Bichromatic Reverse Nearest Neighbor
• Problem - MaxBRNN
• Algorithm - MaxOverlap
• Empirical Study
• Conclusion

1. Introduction
• Bichromatic Reverse Nearest Neighbor (BRNN or RNN)
• Given: P and O are two sets of points in the same data space
• Problem: Given a point p ∈ P, a BRNN query finds all the points o ∈ O whose nearest neighbor (NN) in P is p.
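The BRNN query defined above can be sketched as a brute-force scan over O. This is only an illustration under stated assumptions: the function name `brnn` and the coordinates are hypothetical (they are not the layout of the slides), Euclidean distance is assumed, and ties between equally near points of P are broken by list order.

```python
from math import dist

def brnn(p, P, O):
    """Return every o in O whose nearest neighbor in P is p (brute force)."""
    # min() returns the first nearest point of P, so ties go to the earlier entry.
    return [o for o in O if min(P, key=lambda s: dist(o, s)) == p]

# Hypothetical coordinates, chosen only for illustration.
P = [(0.0, 0.0), (10.0, 0.0)]                       # convenience stores
O = [(1.0, 1.0), (2.0, -1.0), (8.0, 1.0),
     (9.0, -2.0), (11.0, 0.5)]                      # customers

rnn_first = brnn(P[0], P, O)    # customers served by the first store
rnn_second = brnn(P[1], P, O)   # customers served by the second store
```

Every customer ends up in exactly one RNN set, because each customer has exactly one nearest store under the tie rule assumed here.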

1. Introduction
• NN: Nearest neighbor; RNN: Reverse nearest neighbor
• Convenience stores: P = {p1, p2}
• Customers: O = {o1, o2, o3, o4, o5}
• o1 and o2: NN in P = p1
• o3, o4 and o5: NN in P = p2
• RNN of p1 = {o1, o2}; RNN of p2 = {o3, o4, o5}

1. Introduction
• Convenience stores: P = {p1, p2}; Customers: O = {o1, o2, o3, o4, o5}
• Suppose that we want to set up a new convenience store p. Where should we set it up?
• Placement 1: RNN = {o1, o2}, influence value = 2
• Placement 2: RNN = {o1, o2, o3, o4, o5}, influence value = 5
• Placement 3: RNN = {o1, o2, o3, o4, o5}, influence value = 5
• Which placement is better, Placement 1 or Placement 2? Placement 2
• Different placements of p may have different RNN sets (Placements 1 and 2) or the same RNN set (Placements 2 and 3)
• Problem: We want to find a region R (or area) such that when p is placed in R, the influence value of p is maximized.
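The influence value of a candidate placement can be sketched as a count of the customers that the new store would attract. This is a rough illustration, not the authors' code: the name `influence` and the coordinates are hypothetical, Euclidean distance is assumed, and a customer exactly as far from the new store as from its current nearest store is counted for the new store (an assumption about tie handling).

```python
from math import dist

def influence(p, P, O):
    """Count customers in O for which a new store at p is at least as near
    as their current nearest store in P (ties counted for p by assumption)."""
    return sum(1 for o in O if dist(o, p) <= min(dist(o, s) for s in P))

# Hypothetical coordinates, chosen only for illustration.
P = [(0.0, 0.0), (10.0, 0.0)]
O = [(1.0, 1.0), (2.0, -1.0), (8.0, 1.0), (9.0, -2.0), (11.0, 0.5)]

near_first_group = influence((1.5, 0.0), P, O)  # close to two customers
in_the_middle = influence((5.0, 0.0), P, O)     # far from every customer
```

Different placements give different influence values, which is why the problem asks for the region where this count is largest.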

1. Introduction
• Related Work: Arrangement
– Running Time = O(|O| log |P| + |O|^2 + 2(|O|)) where (|O|) is a function on |O| and is (|O|)
• Our Proposed Algorithm: MaxOverlap
– Running Time = O(|O| log |P| + k^2 |O| + k |O| log |O|) where k << |O|
• Significant improvement on Running Time

2. Problem
• Convenience stores: P = {p1, p2}; Customers: O = {o1, o2, o3, o4, o5}
• Problem: We want to find a region R (or area) such that when p is placed in R, the influence value of p is maximized.
• Consistent region: for any two possible placements in this region, their RNN sets are the same (in the example, RNN = {o1, o2, o3, o4, o5} and influence value = 5)
• Non-consistent region: a region that contains placements with different RNN sets (in the example, one placement with RNN = {o1, o2} and another with RNN = {o1, o2, o3, o4, o5})
• Many consistent regions!
• Maximal consistent region R: there does not exist another consistent region R’ where (1) R’ covers R and (2) the RNN sets of R and R’ are equal

2. Problem
• Problem: We want to find a maximal consistent region R such that the influence value of R is maximized.
• We call this problem Maximizing Bichromatic Reverse Nearest Neighbor (MaxBRNN)

2. Problem
• Two challenges:
– Challenge 1: It is difficult to find a maximal consistent region
– Challenge 2: We need to return the maximal consistent region with the greatest influence value

2. Problem
• Nearest location circle (NLC)
• Convenience stores: P = {p1, p2}; Customers: O = {o1, o2}
• o1: NN in P = p1. Construct a circle centered at o1 with radius |p1, o1|
• o2: NN in P = p2. Construct a circle centered at o2 with radius |p2, o2|
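The NLC construction can be sketched in a few lines: each customer becomes the center of a circle whose radius is the distance to its nearest store. The name `nearest_location_circles`, the (center, radius) tuple layout and the coordinates are assumptions made for this illustration, and Euclidean distance is assumed.

```python
from math import dist

def nearest_location_circles(P, O):
    """Return one (center, radius) pair per customer: the center is the
    customer, the radius is the distance to its nearest neighbor in P."""
    return [(o, min(dist(o, s) for s in P)) for o in O]

# Hypothetical coordinates: two stores and two customers on a line.
P = [(0.0, 0.0), (4.0, 0.0)]
O = [(1.0, 0.0), (3.0, 0.0)]

circles = nearest_location_circles(P, O)
```

A new store placed inside a customer's NLC is nearer to that customer than any existing store, which is what makes the circles useful for counting influence.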

2. Problem
• Nearest location circle (NLC)
• Convenience stores: P = {p1, p2}; Customers: O = {o1, o2}
• Four maximal consistent regions: A, B, C and D
• Region A (intersection between the two NLCs): RNN set = {o1, o2}, influence value = 2
• Of the regions B, C and D, one has RNN set = {o1} (influence value = 1), one has RNN set = {o2} (influence value = 1), and one has RNN set = {} (influence value = 0)
• Solution: Region A

2. Problem
• Lemma: The solution of MaxBRNN can be represented by an intersection of multiple nearest location circles.
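One way to read the lemma is that the RNN set of a placement is the set of customers whose NLC covers it, so the best placements lie where the most circles overlap. The sketch below illustrates this on two overlapping NLCs. The name `covering_customers` and the coordinates are hypothetical, Euclidean distance is assumed, and circle boundaries are treated as covered (an assumption about tie handling).

```python
from math import dist

def covering_customers(x, P, O):
    """Indices of the customers whose NLC covers point x (boundary included)."""
    return {i for i, o in enumerate(O)
            if dist(o, x) <= min(dist(o, s) for s in P)}

# Hypothetical layout: both NLCs have radius 2 and overlap around (1, 0).
P = [(-2.0, 0.0), (4.0, 0.0)]
O = [(0.0, 0.0), (2.0, 0.0)]

inside_both = covering_customers((1.0, 0.0), P, O)     # in the intersection
only_first = covering_customers((-1.5, 0.0), P, O)     # in the first NLC only
only_second = covering_customers((3.5, 0.0), P, O)     # in the second NLC only
outside = covering_customers((1.0, 5.0), P, O)         # in neither NLC
```

The four sample points fall into four regions with different RNN sets, and the point in the intersection has the largest one.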

2. Problem
• Two challenges:
– Challenge 1: It is difficult to find a maximal consistent region
– Challenge 2: We need to return the maximal consistent region with the greatest influence value
• We propose an algorithm called MaxOverlap

3. Algorithm
• Make use of the principle of region-to-point transformation: the Optimal Region Search Problem is transformed into an Optimal Point Search Problem
• Search a limited number of points
• Find the optimal point
• This optimal point can be mapped to the optimal region in the Optimal Region Search Problem

3. Algorithm
• Convenience stores: P = {p1, p2, p3, p4, p5}
• Customers: O = {o1, o2, o3, o4, o5, o6}


3. Algorithm
• NLCs c1, c2 and c3
• Solution: the intersection of c1, c2 and c3
• This intersection is the maximal consistent region which maximizes the RNN set

3. Algorithm • Algorithm MaxOverlap • Three-Step Algorithm

3. Algorithm
Step 1 (Finding Intersection Point)
• Customers: o1, o2, o3, o4, o5, o6
• Intersection points between the NLCs: q1, q2, q3, q4, q5, q6, q7, q8, q9
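Step 1 relies on computing where two circle boundaries cross. The sketch below uses the standard circle-circle intersection formula. It is an illustration only: the name `circle_intersections` and the (center, radius) layout are assumptions, and concentric circles are reported as having no intersection points.

```python
from math import dist, sqrt

def circle_intersections(c1, c2):
    """Return the 0, 1 or 2 points where two circle boundaries meet.
    Each circle is a ((x, y), radius) pair."""
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = dist((x1, y1), (x2, y2))
    # Concentric, too far apart, or one strictly inside the other.
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # distance from c1 to the chord
    h = sqrt(max(r1 * r1 - a * a, 0.0))        # half of the chord length
    mx = x1 + a * (x2 - x1) / d                # foot of the chord on the
    my = y1 + a * (y2 - y1) / d                # line joining the centers
    ox = h * (y2 - y1) / d
    oy = h * (x2 - x1) / d
    points = [(mx + ox, my - oy), (mx - ox, my + oy)]
    return points[:1] if h == 0 else points    # tangent circles touch once

crossing = circle_intersections(((0.0, 0.0), 2.0), ((2.0, 0.0), 2.0))
tangent = circle_intersections(((0.0, 0.0), 1.0), ((2.0, 0.0), 1.0))
disjoint = circle_intersections(((0.0, 0.0), 1.0), ((5.0, 0.0), 1.0))
```

Applying this to every pair of NLCs yields the candidate points q that the later steps examine.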

3. Algorithm
Step 2 (Point Query)
• For each intersection point q, a point query finds the NLCs that cover q
• Result for q4 = {c1, c3}
• Result for q3 = {c1, c2, c3}
• Result for q1 = {c1, c2, c3}
• Result for q5 = {c1, c2, c3}
• …
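A point query can be sketched as a linear scan, assuming it returns the set of NLCs that contain the query point. The name `point_query`, the tolerance `eps` and the three circles below are hypothetical. The tolerance is there because an intersection point lies exactly on the boundary of the two circles that produced it, and floating-point rounding could otherwise exclude them. Indices are 0-based, so index 0 plays the role of c1.

```python
from math import dist

def point_query(q, circles, eps=1e-9):
    """Return the indices of the circles ((x, y), radius) that cover q,
    counting points on the boundary (within eps) as covered."""
    return {i for i, (center, r) in enumerate(circles)
            if dist(center, q) <= r + eps}

# Three hypothetical circles of radius 2.
circles = [((0.0, 0.0), 2.0), ((2.0, 0.0), 2.0), ((1.0, 1.0), 2.0)]

# The first two circles cross at (1, sqrt(3)) and (1, -sqrt(3)).
upper = point_query((1.0, 3 ** 0.5), circles)
lower = point_query((1.0, -(3 ** 0.5)), circles)
```

The upper crossing point is covered by all three circles while the lower one is covered by two, so the two candidates receive different scores in the next step. A spatial index could replace the linear scan; the slides do not say which structure is used.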

3. Algorithm
Step 3 (Finding Maximum Size)
• Choose the point-query result with the maximum size: {c1, c2, c3} (found for q3, q1 and q5)
• The intersection of c1, c2 and c3 corresponds to the solution.
• The optimal point of the Optimal Point Search Problem is mapped back to the optimal region of the Optimal Region Search Problem
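The three steps can be tied together in a naive end-to-end sketch. This is not the authors' MaxOverlap implementation: it checks every pair of circles and every candidate point without any index, ordering or pruning, and the names `max_overlap_naive`, `nlcs` and `intersections` are introduced here for illustration. As a further assumption, circle centers are added as fallback candidates so that an NLC crossing no other NLC is still considered.

```python
from itertools import combinations
from math import dist, sqrt

def nlcs(P, O):
    """One (center, radius) circle per customer; radius = distance to NN in P."""
    return [(o, min(dist(o, s) for s in P)) for o in O]

def intersections(c1, c2):
    """Boundary intersection points of two ((x, y), radius) circles."""
    (x1, y1), r1 = c1
    (x2, y2), r2 = c2
    d = dist((x1, y1), (x2, y2))
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    h = sqrt(max(r1 * r1 - a * a, 0.0))
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

def max_overlap_naive(P, O, eps=1e-9):
    """Return (point, indices of covering NLCs) with the largest cover."""
    circles = nlcs(P, O)
    # Step 1: candidate points (centers are a fallback added in this sketch).
    candidates = [center for center, _ in circles]
    for c1, c2 in combinations(circles, 2):
        candidates.extend(intersections(c1, c2))
    best_q, best_cover = None, set()
    for q in candidates:
        # Step 2: point query by linear scan.
        cover = {i for i, (center, r) in enumerate(circles)
                 if dist(center, q) <= r + eps}
        # Step 3: keep the result of maximum size.
        if len(cover) > len(best_cover):
            best_q, best_cover = q, cover
    return best_q, best_cover

# Hypothetical data: three customers whose NLCs all overlap near the origin.
P = [(0.0, 3.0), (0.0, -3.0)]
O = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
best_point, best_cover = max_overlap_naive(P, O)
```

The returned index set identifies the NLCs whose intersection is the answer region, and the returned point is one location inside it.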

3. Algorithm
• Theorem: The running time of algorithm MaxOverlap is O(|O| log |P| + k^2 |O| + k |O| log |O|), where k is typically much smaller than |O|

3. Algorithm
• Enhancement 1: We process the intersection points q in a pre-defined order
• Enhancement 2:
– Step 2 and Step 3 can be combined
– We introduce a pruning technique such that some intersection points will not be processed

4. Empirical Study
• Synthetic Dataset
– P: Gaussian distribution
– O: Zipfian distribution
• Real Dataset: Rtree Portal (http://www.rtreeportal.org/spatial.html)
– CA (62,556)
– LB (53,145)
– GR (23,268)
– GM (36,334)
– P: one of the above datasets
– O: one of the above datasets

4. Empirical Study
• Measurements
– Execution Time
– Storage
• Our proposed algorithms
– MaxOverlap-P: MaxOverlap with pruning
– MaxOverlap-NP: MaxOverlap without pruning
• Comparison with adapted algorithms
– Arrangement
– Buffer-Adapt

4. Empirical Study • Small dataset
