On Spatial-Range Closest Pair Query Jing Shan, Donghui Zhang and Betty Salzberg College of Computer and Information Science Northeastern University
Outline • Problem Definition • Straightforward Approach • Existing Technique • Our Method • Performance SSTD03 --- Santorini, Greece
Problem Definition • Given a spatial data set S, the Range Closest Pair query for a spatial range R finds a pair of objects (s1, s2), with s1 ∈ R and s2 ∈ R, such that the distance between s1 and s2 is the smallest distance between any two objects inside R. [Figure: example data set with query range R; the query result is (e, f).]
Straightforward Approach • Use an R-tree to select the objects in the query range. • Find the closest pair by checking the objects in the selection result. • We could use a nested loop; • or a better approach, e.g. plane sweep with the Voronoi diagram method, which is O(n log n). • Problems: all data pages of the R-tree that intersect the query range have to be accessed, and the data in the query range may not fit in memory.
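A minimal Python sketch of the straightforward approach, assuming 2-D point objects and an axis-aligned range given as (xmin, ymin, xmax, ymax). The linear scan in `select_in_range` only stands in for the R-tree range selection, and all names are illustrative rather than taken from the slides.

```python
import math
from itertools import combinations

def select_in_range(points, rng):
    """Return the points inside rng = (xmin, ymin, xmax, ymax).
    A linear scan stands in for the R-tree range selection (assumption)."""
    xmin, ymin, xmax, ymax = rng
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def closest_pair_nested_loop(points):
    """Nested-loop closest pair: O(n^2) distance computations."""
    best, best_d = None, math.inf
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        if d < best_d:
            best, best_d = (p, q), d
    return best, best_d

points = [(0, 0), (1, 1), (4, 4), (4.5, 4), (9, 9), (9.1, 9)]
in_range = select_in_range(points, (0, 0, 5, 5))
pair, d = closest_pair_nested_loop(in_range)
print(pair, d)  # ((4, 4), (4.5, 4)) 0.5
```

The globally closest pair, (9, 9) and (9.1, 9), lies outside the range and is correctly ignored. Every object in the range is still materialised and compared, which is the cost the slide points out.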
Note on Existing Techniques • [Hjaltason and Samet 98]: incremental join. • [Corral, Manolopoulos, Theodoridis and Vassilakopoulos 00]: an improved version, using pruning. • They addressed a slightly different problem: • No query range. • Joining two different R-trees. • Existing techniques do not perform well if there is overlap between the two R-trees. If the two R-trees are identical, the overlap is extensive.
MinDist • Given two MBRs A, B of R-tree nodes, MinDist(A, B) is the smallest distance between the boundaries of A and B. • ∀ objects o1 ∈ A and o2 ∈ B, distance(o1, o2) ≥ MinDist(A, B). [Figure: MBRs A and B with MinDist marked between them.]
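As a small illustration, MinDist between two axis-aligned MBRs can be computed per axis. The tuple layout (xmin, ymin, xmax, ymax) and the function name are assumptions made for this sketch.

```python
import math

def min_dist(a, b):
    """MinDist between MBRs a, b = (xmin, ymin, xmax, ymax); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap, if any
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap, if any
    return math.hypot(dx, dy)

A = (0, 0, 1, 1)
B = (4, 5, 6, 7)
print(min_dist(A, B))  # 5.0
print(min_dist(A, A))  # 0.0
```

The second call shows that the MinDist of an MBR with itself is always 0, so this bound alone can never prune a self-pair.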
Existing Technique
• T = ∞; closestpair = NULL.
• Push the pair of root entries into priority queue Q.
• While Q is not empty:
  • Pop the pair (e1, e2) with the smallest MinDist from Q.
  • If e1 points to an index node: for every child entry se1 in Node(e1) and child entry se2 in Node(e2), if MinDist(se1, se2) < T, push (se1, se2) into Q.
  • Else /* e1 points to a leaf node */: for every object o1 in Node(e1) and object o2 in Node(e2), if distance(o1, o2) < T, update T = distance(o1, o2) and closestpair = (o1, o2), and remove from Q all pairs with MinDist no smaller than T.
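The loop above can be sketched in Python on a toy in-memory tree. The `Node` class, the balanced-tree assumption (so e1 and e2 are always at the same level), the skipping of mirrored and (o, o) pairs in the self-join, and the lazy "stop when the popped MinDist is no smaller than T" pruning are all assumptions of this sketch, not details given on the slide.

```python
import heapq
import itertools
import math

class Node:
    """Toy R-tree node: a leaf holds points, an index node holds child nodes."""
    def __init__(self, points=(), children=()):
        self.points = list(points)
        self.children = list(children)
        corners = self.points or [c for ch in self.children
                                  for c in (ch.mbr[:2], ch.mbr[2:])]
        xs = [c[0] for c in corners]
        ys = [c[1] for c in corners]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))

def min_dist(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def closest_pair_self_join(root):
    T, closest = math.inf, None
    tie = itertools.count()            # tie-breaker so the heap never compares nodes
    Q = [(0.0, next(tie), root, root)]
    while Q:
        d, _, n1, n2 = heapq.heappop(Q)
        if d >= T:                     # lazy form of "remove pairs with MinDist >= T"
            break
        if n1.children:                # index nodes: expand pairs of child entries
            for i, c1 in enumerate(n1.children):
                for j, c2 in enumerate(n2.children):
                    if n1 is n2 and j < i:
                        continue       # (B, A) mirrors (A, B) in a self-join
                    md = min_dist(c1.mbr, c2.mbr)
                    if md < T:
                        heapq.heappush(Q, (md, next(tie), c1, c2))
        else:                          # leaf nodes: compare objects
            for i, o1 in enumerate(n1.points):
                for j, o2 in enumerate(n2.points):
                    if n1 is n2 and j <= i:
                        continue       # skip (o, o) and mirrored object pairs
                    dist = math.dist(o1, o2)
                    if dist < T:
                        T, closest = dist, (o1, o2)
    return closest, T

A = Node(points=[(0, 0), (2, 0)])
B = Node(points=[(2.5, 0.2), (6, 6)])
C = Node(points=[(8, 1), (9, 3)])
closest, T = closest_pair_self_join(Node(children=[A, B, C]))
print(closest, T)
```

In this example the three self-pairs (A,A), (B,B) and (C,C) are all popped before the pair (A,B) that actually contains the answer, because each self-pair enters the queue with MinDist 0.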
Example • [Figure: R-tree with root R and entries A, B, C, D; leaf contents listed as a,b | f,i | c,e,g | d,h.] • T = ∞; closestpair = NULL. • Pairs: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)
Example • [Figure: same R-tree.] • T = distance(a, b); closestpair = (a, b). • Pairs: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)
Example • [Figure: same R-tree.] • T = distance(f, e); closestpair = (f, e). • Pairs: (R,R) (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)
MinExistDist • Given two MBRs A, B of R-tree nodes, MinExistDist(A, B) is the smallest distance bound that guarantees the existence of a pair of objects, one in A and the other in B, whose distance is no greater than the bound. • ∃ objects o1 ∈ A and o2 ∈ B such that distance(o1, o2) ≤ MinExistDist(A, B). • Usage [CMT+00]: if MinExistDist(A, B) is smaller than T, update T. This increases the chance of eliminating pairs from Q early. [Figure: MBRs A and B with MinDist and MinExistDist marked.]
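The slide gives no formula for MinExistDist. One way to obtain such a bound, sketched here as an assumption rather than as the exact definition used in [CMT+00], relies on the fact that every edge of a minimum bounding rectangle touches at least one object: for any edge of A and any edge of B, some pair of objects is no farther apart than the largest distance between those two edges, so the minimum of that quantity over all edge pairs is a valid bound. The function names are hypothetical.

```python
import math
from itertools import product

def edges(m):
    """The four edges of MBR m = (xmin, ymin, xmax, ymax) as endpoint pairs."""
    x0, y0, x1, y1 = m
    return [((x0, y0), (x1, y0)), ((x0, y1), (x1, y1)),
            ((x0, y0), (x0, y1)), ((x1, y0), (x1, y1))]

def min_exist_dist(a, b):
    """Upper bound: some o1 in a and o2 in b are at most this far apart.
    Assumes each MBR edge touches at least one object (true for minimal MBRs)."""
    return min(
        # the largest distance between two segments is attained at endpoints
        max(math.dist(p, q) for p, q in product(e1, e2))
        for e1, e2 in product(edges(a), edges(b))
    )

A = (0, 0, 1, 1)
B = (3, 0, 4, 1)
print(min_exist_dist(A, B))
```

Here the bound comes from the right edge of A and the left edge of B, whose farthest endpoints are 2 apart horizontally and 1 apart vertically, while MinDist(A, B) is 2.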
Involving a Query Range • We extend the MinExistDist… [Figure: MinDist and MinExistDist shown with a query range; one case is labelled MinExistDist = ∞.]
Motivation for Our Method • The existing technique joins all self-pairs, e.g. (A,A), (B,B), … • Reason: the MinDist of any self-pair is 0. • Challenge: is it possible to make it non-zero? If MinDist(A,A) ≥ T, there is no need to process (A,A)! • We propose two ways to augment the R-tree with additional information. We call the augmented structure the Self-Range Closest-Pair Tree, or SRCP-tree for short.
SRCP-tree (version 1) • Along with each index entry, store the closest pair of objects in its sub-tree. • At query time, first check the closest pair stored with the root entry. If both objects are inside the query range R, return that pair. • For each self-pair to be pushed into Q, use the distance of the local closest pair (rather than 0) as the MinDist. • If we encounter an index entry whose stored closest pair has both objects inside R, compare their distance with T; this may decrease T.
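A hedged Python sketch of a version 1 query on a toy in-memory tree follows. The `Node` class, the brute-force computation of each stored pair, the skipping of entries whose MBR misses R, and the lazy pruning are assumptions of this sketch; only the three rules in the bullets above come from the slide.

```python
import heapq
import itertools
import math

def inside(p, R):
    return R[0] <= p[0] <= R[2] and R[1] <= p[1] <= R[3]

def intersects(m, R):
    return m[0] <= R[2] and R[0] <= m[2] and m[1] <= R[3] and R[1] <= m[3]

def min_dist(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

class Node:
    """Toy SRCP-tree (version 1) node; cp is the closest pair in its subtree."""
    def __init__(self, points=(), children=()):
        self.points, self.children = list(points), list(children)
        self.objects = self.points + [o for c in self.children for o in c.objects]
        xs = [o[0] for o in self.objects]
        ys = [o[1] for o in self.objects]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))
        # Brute force keeps the sketch short; a real index maintains cp on insert.
        self.cp = min(itertools.combinations(self.objects, 2),
                      key=lambda pq: math.dist(*pq), default=None)
        self.cp_dist = math.dist(*self.cp) if self.cp else math.inf

def range_closest_pair(root, R):
    if root.cp and inside(root.cp[0], R) and inside(root.cp[1], R):
        return root.cp, root.cp_dist           # the root's stored pair answers the query
    T, best = math.inf, None
    tie = itertools.count()                    # tie-breaker for the heap
    Q = [(root.cp_dist, next(tie), root, root)]
    while Q:
        key, _, n1, n2 = heapq.heappop(Q)
        if key >= T:                           # nothing left in Q can beat T
            break
        if n1.children:
            for i, c1 in enumerate(n1.children):
                for c2 in (n2.children[i:] if n1 is n2 else n2.children):
                    if not (intersects(c1.mbr, R) and intersects(c2.mbr, R)):
                        continue
                    if c1 is c2:
                        k = c1.cp_dist         # local closest-pair distance, not 0
                        if c1.cp and all(inside(o, R) for o in c1.cp) and k < T:
                            T, best = k, c1.cp # stored pair lies in R: may decrease T
                    else:
                        k = min_dist(c1.mbr, c2.mbr)
                    if k < T:
                        heapq.heappush(Q, (k, next(tie), c1, c2))
        else:
            for i, o1 in enumerate(n1.points):
                for o2 in (n1.points[i + 1:] if n1 is n2 else n2.points):
                    if inside(o1, R) and inside(o2, R) and math.dist(o1, o2) < T:
                        T, best = math.dist(o1, o2), (o1, o2)
    return best, T

leaf_a = Node(points=[(0, 0), (0.1, 0)])
leaf_b = Node(points=[(5, 5), (6, 5)])
leaf_c = Node(points=[(5.4, 5.3), (9, 9)])
root = Node(children=[leaf_a, leaf_b, leaf_c])
print(range_closest_pair(root, (4, 4, 10, 10)))
```

In this example the self-pairs of `leaf_b` and `leaf_c` are never pushed into the queue: their stored pairs lie inside R, so they only tighten T, and the single leaf-level join is the cross pair that holds the answer.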
Insertion • When a new object o is inserted, only the augmented information along the insertion path needs to be updated. (But sub-trees need to be visited.) • At each entry on the path, let the original local closest pair be (a, b). It needs to be updated only if distance(o, o') < distance(a, b) for some object o' in the sub-tree. [Figure: new object o and the stored pair (a, b) with distance(a, b).]
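The update rule can be sketched as follows. The `Entry` class is a hypothetical stand-in for an index entry, and the linear scan over its objects stands in for the bounded search of the sub-tree that a real implementation would perform.

```python
import math

class Entry:
    """Toy index entry: the objects in its sub-tree plus the stored closest pair."""
    def __init__(self, objects, cp=None):
        self.objects = list(objects)
        self.cp = cp  # (a, b), or None when the sub-tree has fewer than two objects

def insert_and_update(path, o):
    """Insert o under every entry on the insertion path, refreshing stored pairs."""
    for entry in path:
        bound = math.dist(*entry.cp) if entry.cp else math.inf
        # Only an o' closer to o than distance(a, b) can change the stored pair.
        for other in entry.objects:
            d = math.dist(o, other)
            if d < bound:
                bound, entry.cp = d, (o, other)
        entry.objects.append(o)

leaf = Entry([(0, 0), (3, 0)], cp=((0, 0), (3, 0)))
root = Entry([(0, 0), (3, 0), (10, 10), (10, 11)], cp=((10, 10), (10, 11)))
insert_and_update([root, leaf], (1, 0))
print(leaf.cp)  # ((1, 0), (0, 0))
print(root.cp)  # ((10, 10), (10, 11))
```

The leaf's stored pair changes because the new object is closer to (0, 0) than the old pair distance of 3, while the root's pair stays as it was because no object is strictly closer than 1 to the new object.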
SRCP-tree (version 2) • Idea: while version 1 tries to avoid processing self-pairs, version 2 of the structure tries to avoid processing sibling pairs. • E.g. if R has children A, B, C, D, version 1 cannot avoid the pair (A,B) unless MinDist(A,B) ≥ T. Similarly, it has to process (A,C), (A,D), (B,C), (B,D), (C,D). • In version 2, every index entry e stores the “local-parent closest pair”: the closest pair between an object in the sub-tree pointed to by e and an object in the sub-tree pointed to by Parent(e). • E.g. along with A, we store the closest pair of objects (o1, o2), where o1 is in subtree(A) and o2 is in subtree(R). • Now, if the distance of the object pair stored at A is no smaller than T, there is no need to process any pair involving A, namely (A,A), (A,B), (A,C), (A,D).
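To make the pruning rule concrete, here is a small Python sketch for a single parent whose children are given as plain lists of points. The dictionary layout, the function names, the brute-force computation of the stored pairs, and the assumption that all points are distinct are illustrative choices, not part of the slides.

```python
import math
from itertools import combinations_with_replacement

def local_parent_pairs(children):
    """For each child entry, the closest pair (o1, o2) with o1 in that child's
    sub-tree and o2 anywhere in the parent's sub-tree (assumes distinct points)."""
    everything = [o for objs in children.values() for o in objs]
    stored = {}
    for name, objs in children.items():
        candidates = [(o1, o2) for o1 in objs for o2 in everything if o1 != o2]
        stored[name] = min(candidates, key=lambda pq: math.dist(*pq))
    return stored

def pairs_to_process(children, stored, T):
    """Drop every pair involving an entry whose stored distance is >= T."""
    alive = [n for n in children if math.dist(*stored[n]) < T]
    return list(combinations_with_replacement(alive, 2))

children = {
    "A": [(0, 0), (1, 0)],
    "B": [(5, 0), (5, 2)],
    "C": [(20, 20), (20, 26)],
    "D": [(40, 0), (47, 0)],
}
stored = local_parent_pairs(children)
print(pairs_to_process(children, stored, T=3))
# [('A', 'A'), ('A', 'B'), ('B', 'B')]
```

With T = 3, the stored distances at C and D (6 and 7) are no smaller than T, so all seven pairs involving C or D are skipped without computing any MinDist between sibling MBRs.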
Performance • Dell Pentium 4, 2.66 GHz CPU • XXL library, Java • Both synthetic and real data: • uniform data (80,000 objects) • US National Mapping Information (26,700 Massachusetts sites), URL = http://mappings.usgs.gov/www/gnis/ • Focus on query time.
Small Query Range
Large Query Range
Conclusions • We have addressed the spatial closest pair query with a query range. • We have proposed two versions of an index structure called the SRCP-tree. • Our approaches have much better query performance than the existing techniques, especially when the query range is large. • In particular, version 2 of the SRCP-tree is consistently the best.