Spatial Query Processing using the R-tree Donghui Zhang CCIS, Northeastern University Feb 8, 2005
Content • The R-tree • Range Query • Aggregation Query • NN Query • RNN Query • Closest Pair Query • Close Pair Query • Skyline Query
R-Tree Motivation • Range query: find the objects in a given range. • E.g. find all hotels in Boston. • No index: scan through all objects. NOT EFFICIENT! [Figure: objects a to m plotted on x and y axes ranging from 0 to 10]
Range Query [Figure: objects a to m on the same axes, grouped into leaf MBRs E3 to E7, which are grouped under index entries E1 and E2. The R-tree drawn below has three levels: Root (E1, E2), then E3 to E7, then the objects.]
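The range query on an R-tree can be sketched as follows. This is a minimal illustration, not the deck's own code: the `Node` class, the `(xmin, ymin, xmax, ymax)` rectangle layout, and the sample points are all assumptions made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # child Nodes (index node)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def intersects(a, b):
    """True if rectangles a and b overlap (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_query(node, rng):
    """Return the names of all points under `node` that fall inside `rng`."""
    if not intersects(node.mbr, rng):
        return []                      # the whole subtree lies outside the range
    found = [name for name, x, y in node.points
             if rng[0] <= x <= rng[2] and rng[1] <= y <= rng[3]]
    for child in node.children:
        found.extend(range_query(child, rng))
    return found

# Hypothetical data: two leaves under one root.
leaf1 = Node((1, 1, 2, 3), points=[("a", 1, 1), ("b", 2, 3)])
leaf2 = Node((7, 6, 9, 8), points=[("c", 7, 8), ("d", 9, 6)])
root = Node((1, 1, 9, 8), children=[leaf1, leaf2])

print(range_query(root, (0, 0, 3, 4)))   # leaf2 is skipped without being opened
```

Only subtrees whose MBR intersects the query range are visited, which is what the index buys over a full scan.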
Aggregation Query • Given a range, find some aggregate value of objects in this range. • COUNT, SUM, AVG, MIN, MAX • E.g. find the total number of hotels in Massachusetts. • Straightforward approach: reduce to a range query. • Better approach: along with each index entry, store aggregate of the sub-tree.
Aggregation Query [Figure: the same R-tree with a COUNT stored next to each index entry: E1:8 and E2:5 in the root; E3:3, E4:2, E5:3, E6:3, E7:2 one level below. In the second frame one subtree is marked "Subtree pruned!".]
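A COUNT query over stored sub-tree aggregates might look like the sketch below. It assumes the usual reading of the "better approach": when an entry's MBR lies entirely inside the query range, its stored count is used and the sub-tree is not visited. The `Node` layout, the `count` field, and the data are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    count: int                                     # number of points in this subtree
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer, inner):
    """True if rectangle `inner` lies entirely inside rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def count_query(node, rng):
    if not intersects(node.mbr, rng):
        return 0                       # nothing in this subtree qualifies
    if contains(rng, node.mbr):
        return node.count              # use the stored aggregate, skip the subtree
    inside = sum(1 for _, x, y in node.points
                 if rng[0] <= x <= rng[2] and rng[1] <= y <= rng[3])
    return inside + sum(count_query(c, rng) for c in node.children)

# Hypothetical data: counts are maintained bottom-up when the tree is built.
leaf1 = Node((1, 1, 2, 3), 2, points=[("a", 1, 1), ("b", 2, 3)])
leaf2 = Node((7, 6, 9, 8), 2, points=[("c", 7, 8), ("d", 9, 6)])
root = Node((1, 1, 9, 8), 4, children=[leaf1, leaf2])

print(count_query(root, (0, 0, 3, 4)))   # leaf1 is answered from its stored count
```

SUM, MIN and MAX can be stored the same way; AVG can be derived from SUM and COUNT.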
Nearest Neighbor (NN) Query • Given a query location q, find the nearest object. • E.g.: given a hotel, find its nearest bar. [Figure: query point q and object a]
A Useful Metric: MINDIST • Minimum distance between q and an MBR. • It is a lower bound of d(o, q) for every object o in E1. • For an object o, MINDIST(o, q) = d(o, q). [Figure: MBR E1 and query point q]
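MINDIST for a 2-D point and an axis-aligned MBR can be computed per axis, as in this small sketch. The rectangle layout `(xmin, ymin, xmax, ymax)` and the function name are assumptions for illustration.

```python
import math

def mindist(q, mbr):
    """Minimum Euclidean distance from point q = (x, y) to rectangle mbr."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)   # 0 when q.x lies within [xmin, xmax]
    dy = max(ymin - q[1], 0.0, q[1] - ymax)   # 0 when q.y lies within [ymin, ymax]
    return math.hypot(dx, dy)

print(mindist((0, 0), (3, 4, 5, 6)))   # distance to the nearest corner (3, 4)
print(mindist((4, 5), (3, 4, 5, 6)))   # q is inside the MBR
print(mindist((0, 0), (3, 4, 3, 4)))   # an object is a degenerate MBR
```

Treating an object as a zero-area MBR gives MINDIST(o, q) = d(o, q), so one function serves both index entries and objects.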
NN Basic Algorithm • Keep a heap H of index entries and objects, ordered by MINDIST. • Initially, H contains the root. • While H ≠ ∅ • Extract the element with minimum MINDIST. • If it is an index entry, insert its children into H. • If it is an object, return it as NN. • End while
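The steps above can be sketched with Python's `heapq`. The `Node` class, the tie-breaking counter, and the sample tree are assumptions for this example; objects are represented by their names.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def mindist(q, mbr):
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def nearest_neighbor(root, q):
    tie = itertools.count()            # tie-breaker so the heap never compares Nodes
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if not isinstance(item, Node):
            return item, dist          # first object popped is the NN
        for child in item.children:
            heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
        for name, x, y in item.points:
            heapq.heappush(heap, (math.hypot(x - q[0], y - q[1]), next(tie), name))
    return None, math.inf              # empty tree

# Hypothetical data.
leaf1 = Node((1, 1, 2, 3), points=[("a", 1, 1), ("b", 2, 3)])
leaf2 = Node((7, 6, 9, 8), points=[("c", 7, 8), ("d", 9, 6)])
root = Node((1, 1, 9, 8), children=[leaf1, leaf2])

print(nearest_neighbor(root, (6, 6)))
```

Because MINDIST is a lower bound, no entry still in the heap can contain an object closer than the first object popped.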
NN Query Example [Figure: objects a to m, MBRs E1 to E7, and the query point, with the R-tree drawn below; the number beside each entry is its MINDIST to the query.] • Visit Root: heap = E1 (1), E2 (2). • Follow E1: heap = E2 (2), then the three children of E1 (5, 5, 9). • Follow E2: heap = E6 (2), the children of E1 (5, 5, 9), E7 (13). • Follow E6: heap = i (2), the children of E1 (5, 5, 9), j (10), E7 (13), k (13). • Report i and terminate.
Pruning 1 in NN Query • If we see an object o, prune every MBR whose MINDIST > d(o, q). • Side note: at most one object in H! [Figure: MBRs E1 to E4, object o, and query point q]
Pruning 2 using MINMAXDIST • Prune even before we see an object! • There exists an object o in the sub-tree of E2 s.t. d(o, q) ≤ MINMAXDIST(q, E2). • Prune E1 if there exists E2 s.t. MINDIST(q, E1) > MINMAXDIST(q, E2). • MINMAXDIST: compute the max dist between q and each edge of E2, then take the min. [Figure: MBRs E1 and E2 and query point q]
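The edge-based definition of MINMAXDIST translates directly into code for the 2-D case. This is a sketch under the assumption that MBRs are `(xmin, ymin, xmax, ymax)` tuples; the farthest point of an edge from q is always one of its two endpoints, so only corners need to be measured.

```python
import math

def minmaxdist(q, mbr):
    """Min over the four edges of the max distance from q to that edge (2-D)."""
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    edges = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]

    def d(p):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # The farthest point of a segment from q is one of its endpoints.
    return min(max(d(a), d(b)) for a, b in edges)

print(minmaxdist((0, 0), (3, 4, 5, 6)))   # bottom edge wins: farthest end is (5, 4)
```

The bound relies on the MBR being minimal: every edge touches at least one object, so some object lies no farther than the farthest point of the best edge.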
NN Full-Blown Algorithm • Keep a heap H of index entries and objects, ordered by MINDIST. • Initially, H contains the root. • Set δ = +∞. • While H ≠ ∅ • Extract the element e with minimum MINDIST. • If it is an object, return it as NN. • For every entry se in PAGE(e) whose MINDIST ≤ δ • Insert se into H. • Decrease δ to MINMAXDIST(q, se) if possible. • End while
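A possible rendering of the full-blown algorithm follows. It assumes every MBR is tight (each edge touches an object), which the MINMAXDIST bound requires; the `Node` class, the helper names, and the data are hypothetical. Points in a leaf are treated as degenerate MBRs so that one loop handles both kinds of entry.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax), tight
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def mindist(q, mbr):
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def minmaxdist(q, mbr):
    xmin, ymin, xmax, ymax = mbr
    c = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    d = [math.hypot(p[0] - q[0], p[1] - q[1]) for p in c]
    return min(max(d[i], d[(i + 1) % 4]) for i in range(4))

def page_entries(node):
    """Entries of a page: child nodes, or points as zero-area MBRs."""
    if node.children:
        return [(child.mbr, child) for child in node.children]
    return [((x, y, x, y), name) for name, x, y in node.points]

def nearest_neighbor_pruned(root, q):
    tie = itertools.count()
    heap = [(mindist(q, root.mbr), next(tie), root)]
    delta = math.inf                   # best upper bound on the NN distance so far
    while heap:
        dist, _, item = heapq.heappop(heap)
        if not isinstance(item, Node):
            return item, dist
        for mbr, entry in page_entries(item):
            d = mindist(q, mbr)
            if d <= delta:             # otherwise the entry cannot hold the NN
                heapq.heappush(heap, (d, next(tie), entry))
                delta = min(delta, minmaxdist(q, mbr))
    return None, math.inf

# Hypothetical data with tight MBRs.
leaf1 = Node((1, 1, 2, 3), points=[("a", 1, 1), ("b", 2, 3)])
leaf2 = Node((7, 6, 9, 8), points=[("c", 7, 8), ("d", 9, 6)])
root = Node((1, 1, 9, 8), children=[leaf1, leaf2])

print(nearest_neighbor_pruned(root, (6, 6)))
```

In this run point d is never inserted into the heap, since its distance exceeds the bound already obtained from its own leaf's MBR.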
Reverse NN: Definition • Given a set of points, and a query location q. • Find the points whose NN is q. • RNN(q) = {p1, p2}, NN(q) = p3. [Figure: points p1 to p4 around the query location q]
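Half-plane pruning • ⊥(p, q) is the perpendicular bisector of p and q. A point on p's side of ⊥(p, q) is closer to p than to q, so it cannot have q as its NN. [Figure: query point q, its first NN p1 and second NN p2, the bisectors ⊥(p1, q) and ⊥(p2, q), a point p', and the labels N1 and N2]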
(TPL) Algorithm • Two logical steps: • Filter step: find the set Scnd of candidate points • Find NN; • Prune space; • Find NN in unpruned space; • … • Till no more objects are left. • Refinement step: eliminate false positives • For each point p in Scnd, check whether its NN is q; discard p if it is not. • The two steps are combined in a single tree traversal, by keeping all pruned MBRs/objects in Srfn.
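The two logical steps can be illustrated on a flat set of points. Note that this is a brute-force sketch of the filter/refine idea with half-plane pruning, not the TPL tree traversal itself: it uses no R-tree and no MBR pruning, and the function names and coordinates are made up (chosen so that RNN(q) = {p1, p2} and NN(q) = p3, as in the definition).

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rnn(points, q):
    """points: dict name -> (x, y). Returns names of points whose NN is q."""
    # Filter step: repeatedly take the NN of q in the unpruned space, then
    # prune every point lying on that NN's side of the bisector.
    candidates = []
    remaining = dict(points)
    while remaining:
        name = min(remaining, key=lambda n: dist(remaining[n], q))
        p = remaining.pop(name)
        candidates.append(name)
        remaining = {n: x for n, x in remaining.items()
                     if dist(x, p) >= dist(x, q)}      # keep points not closer to p
    # Refinement step: drop candidates that have some other point nearer than q.
    result = []
    for name in candidates:
        p = points[name]
        if all(dist(p, q) <= dist(p, other)
               for o, other in points.items() if o != name):
            result.append(name)
    return result

# Hypothetical coordinates.
pts = {"p1": (-2.5, 1), "p2": (0, -3), "p3": (2, 0), "p4": (3, 0)}
print(rnn(pts, (0, 0)))
```

Here p4 is pruned by the bisector of p3 and q during filtering, and p3 is removed in refinement because p4 is nearer to it than q.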
Closest Pair (CP) Query • Given two sets of objects R and S, • Find the pair of objects (r ∈ R, s ∈ S) with minimum distance. • CP = (r2, s1) • E.g. find the closest pair of hotel-bar. [Figure: the points of R and S]
CP Solution Idea • Assume R and S are indexed by R-trees with the same height. • Similar to the NN query algorithm. • MINDIST, MINMAXDIST for a pair of MBRs. [Figure: MBRs Ri and Sj with the MINDIST and MINMAXDIST between them marked]
CP Basic Algorithm • Keep a heap H of pairs of index entries and pairs of objects, ordered by MINDIST. • Initially, H contains the pair of roots. • While H ≠ ∅ • Extract the pair (eR, eS) with minimum MINDIST. • If it is a pair of objects, return it as CP. • For every entry seR in PAGE(eR) and every entry seS in PAGE(eS) • Insert (seR, seS) into H. • End while
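The basic CP algorithm can be sketched as below, assuming both trees have the same height so that a popped pair is either two nodes or two objects. The `Node` class, the `leaf`/`index` builders, and the sample points are hypothetical; the deck does not spell out MINDIST between two MBRs, so the per-axis gap used here is an assumption based on the standard definition.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def leaf(points):
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    return Node((min(xs), min(ys), max(xs), max(ys)), points=points)

def index(children):
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=children)

def mbr_mindist(a, b):
    """Minimum distance between two rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)

def entries(node):
    if node.children:
        return [(c.mbr, c) for c in node.children]
    return [((x, y, x, y), name) for name, x, y in node.points]

def closest_pair(root_r, root_s):
    tie = itertools.count()
    heap = [(mbr_mindist(root_r.mbr, root_s.mbr), next(tie), root_r, root_s)]
    while heap:
        d, _, er, es = heapq.heappop(heap)
        if not isinstance(er, Node):   # same height: both are objects here
            return er, es, d
        for mbr_r, se_r in entries(er):
            for mbr_s, se_s in entries(es):
                heapq.heappush(heap, (mbr_mindist(mbr_r, mbr_s), next(tie), se_r, se_s))
    return None

# Hypothetical data: two trees of height 2.
R = index([leaf([("r1", 0, 0), ("r3", 1, 2)]), leaf([("r2", 4, 4), ("r4", 5, 6)])])
S = index([leaf([("s1", 5, 4), ("s3", 7, 5)]), leaf([("s2", 10, 10), ("s4", 9, 12)])])
print(closest_pair(R, S))
```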
CP Full-Blown Algorithm • Keep a priority queue H of pairs of index entries and pairs of objects, ordered by MINDIST. • Initially, H contains the pair of roots. • Set δ = +∞. • While H ≠ ∅ • Extract the pair (eR, eS) with minimum MINDIST. • If it is a pair of objects, return it as CP. • For every entry seR in PAGE(eR) and every entry seS in PAGE(eS) whose MINDIST ≤ δ • Insert (seR, seS) into H. • Decrease δ to MINMAXDIST(seR, seS) if possible. • End while
Close Pair Query • Given two sets of objects R and S, plus a threshold α, • Find every pair of objects (r ∈ R, s ∈ S) with distance < α. • Close pairs = (r1, s1), (r2, s1), and (r3, s3). [Figure: the points of R and S, with the threshold α marked]
Close Pair Solution Idea • Observation: if d(r, s) < α, then for all MBRs mbrR and mbrS that contain r and s, respectively, we have MINDIST(mbrR, mbrS) < α. • Solution idea: • Start with the pair of root nodes, • Join pairs of index entries whose MINDIST < α, • Till we reach the leaf level.
Close Pair Algorithm • Push the pair of root nodes into stack. • While stack ≠ ∅ • Pop a pair (eR, eS) from stack. • For every entry seR in PAGE(eR) and seS in PAGE(eS) where MINDIST(seR, seS) < α • Push (seR, seS) into stack if seR is an index entry; • Otherwise report (seR, seS) as one close pair. • End while
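The stack-based join above might be written as follows. As with the closest pair query, the sketch assumes two trees of equal height; the `Node` class, builders, and data are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (name, x, y) tuples (leaf node)

def leaf(points):
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    return Node((min(xs), min(ys), max(xs), max(ys)), points=points)

def index(children):
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=children)

def mbr_mindist(a, b):
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)

def entries(node):
    """Child nodes of an index page, or points as zero-area MBRs for a leaf."""
    if node.children:
        return [(c.mbr, c) for c in node.children]
    return [((x, y, x, y), name) for name, x, y in node.points]

def close_pairs(root_r, root_s, alpha):
    result = []
    stack = [(root_r, root_s)]
    while stack:
        er, es = stack.pop()
        for mbr_r, se_r in entries(er):
            for mbr_s, se_s in entries(es):
                if mbr_mindist(mbr_r, mbr_s) < alpha:
                    if isinstance(se_r, Node):
                        stack.append((se_r, se_s))     # descend one level in both trees
                    else:
                        result.append((se_r, se_s))    # a pair of objects within alpha
    return result

# Hypothetical data: two trees of height 2.
R = index([leaf([("r1", 0, 0), ("r3", 1, 2)]), leaf([("r2", 4, 4), ("r4", 5, 6)])])
S = index([leaf([("s1", 5, 4), ("s3", 7, 5)]), leaf([("s2", 10, 10), ("s4", 9, 12)])])
print(close_pairs(R, S, 2.1))
```

Only one of the four leaf pairs passes the MINDIST test in this example, so three quarters of the object comparisons are avoided.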
Skyline of Manhattan • Which buildings can we see? • Higher or nearer
Finding A Hotel Close to the Beach • Which one is better? • i or h? (i, because its price and distance dominate those of h) • i or k?
Skyline Queries • Retrieve points not dominated by any other point.
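Dominance and the skyline can be illustrated with a brute-force check. This sketch assumes smaller is better in every dimension (for hotels, distance to the beach and price) and uses made-up coordinates; it does not use an R-tree.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """points: dict name -> tuple of values, smaller is better. Brute force, O(n^2)."""
    return [n for n, p in points.items()
            if not any(dominates(o, p) for m, o in points.items() if m != n)]

# Hypothetical hotels as (distance, price).
hotels = {"a": (1, 6), "i": (2, 3), "h": (4, 5), "k": (6, 1)}
print(skyline(hotels))   # h is dominated by i; i and k are incomparable
```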