The Optimal-Location Query
Donghui Zhang, Northeastern University
Coauthors: Yang Du, Tian Xia
Motivation
• “What is the optimal location in the Boston area to build a new McDonald’s store?”
• Optimality: maximize the number of customers for whom the new store would be closer than any existing store.
Formal Definition
• Given a set S of sites, a set O of weighted objects, and a query range Q,
• find a location l ∈ Q which maximizes Σ o.weight over all objects o ∈ O s.t. ∀s ∈ S, d(o,l) ≤ d(o,s).
• We consider the L1 distance: d(p1,p2) = |x1 - x2| + |y1 - y2|.
Example
• [Figure: query range Q; sites s1, s2; objects (label:weight) o1:4, o2:3, o3:5, o4:6; candidate locations l1 and l2.]
• The “influence” of l1 is 5+6=11.
• The influence of l2 is 5.
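
The influence of a candidate location can be computed directly from the definition. The following is a minimal sketch, not the authors' code. The function names and all coordinates are hypothetical (the figure gives none); only the weights are taken from the slide, and the positions were chosen so that the two candidates reproduce the influences 11 and 5.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def influence(l, sites, objects):
    """Total weight of the objects for which l is at least as close as every site."""
    return sum(w for pos, w in objects
               if all(l1(pos, l) <= l1(pos, s) for s in sites))

# Hypothetical coordinates; weights as on the slide.
sites = [(0, 0), (12, 0)]                      # s1, s2
objects = [((1, 2), 4), ((11, 3), 3),          # o1:4, o2:3
           ((5, 6), 5), ((7, 6), 6)]           # o3:5, o4:6

print(influence((6, 5), sites, objects))       # candidate l1 -> 11 (o3 and o4)
print(influence((-3, 8), sites, objects))      # candidate l2 -> 5  (o3 only)
```

Evaluating this function at every location in Q is not possible, since Q contains infinitely many points.
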
Content
• Problem Definition
• Straightforward Solution
• Problem Transformation
• The R-tree-based Solution
• The OL-tree
• The VOL-tree
• Performance
Using the RNN Algorithm…
• [Figure: the example with candidate location l1.]
• The RNNs (reverse nearest neighbors) of l1 are o3 and o4.
Straightforward Solution
• Compute the influence of every location in Q.
• Problematic: infinite number of candidates!
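nn_buffer of an Object
• [Figure: objects O1:4, O2:3, O3:5, O4:6 and sites S1, S2, with the nn_buffer of O4 drawn.]
• A new site built at any location within an object’s nn_buffer would be closer to the object than its nearest existing site.
• Under the L1 distance, the nn_buffer is a diamond.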
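
As a minimal sketch of this idea, an nn_buffer can be represented by its center (the object) and its L1 radius (the distance to the nearest existing site). The names `nn_buffer` and `in_nn_buffer` and the coordinates are hypothetical, and boundary points are counted as inside, which is an assumption.

```python
def l1(p, q):
    """L1 (Manhattan) distance between two points (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def nn_buffer(obj, sites):
    """Return (center, radius): the L1 diamond around obj reaching its nearest site."""
    return obj, min(l1(obj, s) for s in sites)

def in_nn_buffer(l, buffer):
    """True if a site built at l would be at least as close as the nearest site."""
    center, radius = buffer
    return l1(l, center) <= radius

sites = [(0, 0), (12, 0)]            # hypothetical site positions
buf = nn_buffer((7, 6), sites)       # hypothetical object position
print(buf)                           # ((7, 6), 11)
print(in_nn_buffer((6, 5), buf))     # True
print(in_nn_buffer((-3, 8), buf))    # False
```
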
Problem Transformation
• Find a location with maximum overlap among the objects’ nn_buffers.
• [Figure: Q with the nn_buffers of O1:4, O2:3, O3:5, O4:6 and sites S1, S2; the region of maximum overlap is marked “Any location here is an optimal location!”]
The Rotated Coordinates
• Rotate the coordinates by 45°.
• All nn_buffers become axis-parallel squares.
• Focus on the rotated coordinates.
• [Figure: an object o shown in the original axes (X, Y) and in the axes (X', Y') rotated by 45°.]
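
The effect of the rotation can be checked numerically. The sketch below uses x' = x + y and y' = y - x, which is a 45° rotation scaled by √2; the scaling is a convenience assumed here to keep integer coordinates, and it does not change which regions overlap. All names are hypothetical.

```python
def rotate(p):
    """Rotate by 45 degrees (scaled by sqrt(2), so integers stay integers)."""
    x, y = p
    return (x + y, y - x)

def to_square(center, radius):
    """Axis-parallel square (x1, y1, x2, y2) that the L1 diamond becomes."""
    cx, cy = rotate(center)
    return (cx - radius, cy - radius, cx + radius, cy + radius)

def in_diamond(p, center, radius):
    return abs(p[0] - center[0]) + abs(p[1] - center[1]) <= radius

def in_square(p, sq):
    return sq[0] <= p[0] <= sq[2] and sq[1] <= p[1] <= sq[3]

center, radius = (5, 6), 3
sq = to_square(center, radius)
print(sq)                                        # (8, -2, 14, 4)

# A point is in the diamond exactly when its rotated image is in the square.
agree = all(in_diamond((x, y), center, radius) == in_square(rotate((x, y)), sq)
            for x in range(12) for y in range(12))
print(agree)                                     # True
```

The equivalence holds because max(|u + v|, |v - u|) = |u| + |v| for any offsets u, v from the center.
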
The R-tree-based Solution
• Store the objects in an R-tree.
• Retrieve the objects whose nn_buffers intersect Q.
• Plane sweep to find a region with maximum overlap.
Two Contributions
• Object retrieval: store point objects, but retrieve nn_buffers in increasing order of lower X.
• Plane sweep: straightforwardly O(n²); our method O(n log n).
Best-first Retrieval
• Keep a heap of index entries + objects,
• sorted in increasing order of the nn_buffer’s lower X.
• While the heap is not empty, pop an entry.
• If an object is popped, send it to the plane sweep.
• If an index entry is popped, push its children (those intersecting Q).
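
The retrieval loop can be sketched with a binary heap. This is an illustration under a simplifying assumption, not the authors' implementation: each index entry here stores the bounding box of the nn_buffers below it, so its lower X is a valid lower bound for all its descendants. The classes `Obj` and `Node` are hypothetical stand-ins for R-tree entries.

```python
import heapq
from dataclasses import dataclass
from itertools import count

def intersects(a, b):
    """Closed rectangles (x1, y1, x2, y2) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Obj:
    square: tuple      # nn_buffer in rotated coordinates
    weight: int

@dataclass
class Node:
    children: list
    square: tuple = None

    def __post_init__(self):
        # Assumption: an index entry knows the bounding box of its nn_buffers.
        boxes = [c.square for c in self.children]
        self.square = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                       max(b[2] for b in boxes), max(b[3] for b in boxes))

def best_first(root, q):
    """Yield objects whose nn_buffer intersects q, in increasing lower X."""
    tie = count()      # tie-breaker so the heap never compares entries
    heap = [(root.square[0], next(tie), root)]
    while heap:
        _, _, item = heapq.heappop(heap)
        if isinstance(item, Obj):
            yield item                      # would be sent to the plane sweep
        else:
            for c in item.children:
                if intersects(c.square, q):
                    heapq.heappush(heap, (c.square[0], next(tie), c))

a, b = Obj((0, 0, 2, 2), 4), Obj((5, 5, 7, 7), 3)
c, d = Obj((1, 1, 4, 4), 5), Obj((3, 0, 6, 3), 6)
root = Node([Node([a, b]), Node([c, d])])
print([o.square[0] for o in best_first(root, (0, 0, 10, 10))])   # [0, 1, 3, 5]
```

Because a parent's key never exceeds the key of any descendant, objects leave the heap in sorted order without the whole tree being read first.
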
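Naïve Plane Sweep
• [Figure: nn_buffers of O1:4, O2:3, O3:5, O4:6 in the X-Y plane, with Y boundaries at 2, 5, 8, 9, 12.]
• Sweep-line status, total weight per Y-interval: -∞..2: 0 | 2..5: 5 | 5..8: 12 | 8..9: 7 | 9..12: 3 | 12..+∞: 0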
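Naïve Plane Sweep: Not Efficient!
• Suppose the next insertion adds 2 to the Y-range [2,11].
• Before: -∞..2: 0 | 2..5: 5 | 5..8: 12 | 8..9: 7 | 9..12: 3 | 12..+∞: 0
• After: -∞..2: 0 | 2..5: 7 | 5..8: 14 | 8..9: 9 | 9..11: 5 | 11..12: 3 | 12..+∞: 0
• Every interval inside the range is updated, so one insertion can cost O(n) and the whole sweep O(n²).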
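
The flat sweep-line status can be sketched as two parallel lists. The class name `IntervalCounts` is hypothetical; the starting state and the insertion are the ones shown on the slide.

```python
import bisect

class IntervalCounts:
    """Flat list of Y-intervals; values[i] covers bounds[i-1]..bounds[i]."""

    def __init__(self):
        self.bounds = []     # sorted split points
        self.values = [0]    # one more value than split points

    def _split(self, y):
        """Make y a split point and return its index in bounds."""
        i = bisect.bisect_left(self.bounds, y)
        if i == len(self.bounds) or self.bounds[i] != y:
            self.bounds.insert(i, y)
            self.values.insert(i + 1, self.values[i])   # copy the split interval
        return i

    def add(self, lo, hi, w):
        """Add w to every interval between lo and hi: O(n) per insertion."""
        i = self._split(lo)
        j = self._split(hi)
        for k in range(i + 1, j + 1):
            self.values[k] += w

s = IntervalCounts()
s.bounds = [2, 5, 8, 9, 12]          # state shown on the slide
s.values = [0, 5, 12, 7, 3, 0]
s.add(2, 11, 2)                      # add 2 to the Y-range [2,11]
print(s.bounds)                      # [2, 5, 8, 9, 11, 12]
print(s.values)                      # [0, 7, 14, 9, 5, 3, 0]
```

The loop in `add` touches every interval in the range, which is the source of the quadratic total cost.
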
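The aSB-tree
• Extended from the SB-tree [YW01]:
• keeps max overlap information at index entries;
• handles a query range Q.
• [Figure: two-level tree. Root entries: -∞..5: 0 | 5..9: 0 | 9..+∞: 0. Leaf level: -∞..2: 0 | 2..5: 5 | 5..8: 12 | 8..9: 7 | 9..12: 3 | 12..+∞: 0.]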
The aSB-tree: Insertion
• Suppose the next insertion adds 2 to the Y-range [2,11].
• At the root, the entry 5..9 is fully covered, so the +2 is recorded there and not passed down. Root entries: -∞..5: 0 | 5..9: 2 | 9..+∞: 0.
• The partially covered entries -∞..5 and 9..+∞ pass the +2 down to their children.
• Leaf level after the insertion: -∞..2: 0 | 2..5: 7 | 5..8: 12 | 8..9: 7 | 9..11: 5 | 11..12: 3 | 12..+∞: 0.
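
The same principle, recording an update at the highest fully covered node and keeping the maximum at every node, can be sketched with an in-memory segment tree over compressed Y coordinates. This is a stand-in chosen for brevity, not the aSB-tree itself: it is not disk-based and does not handle the query range Q. Rectangles are treated as half-open, which is an assumption, and all names are hypothetical.

```python
class MaxAddTree:
    """Segment tree over n slots supporting range add and global max."""

    def __init__(self, n):
        self.n = n
        self.mx = [0] * (4 * n)      # max in the subtree, pending adds included
        self.pending = [0] * (4 * n) # adds recorded at this node, never pushed down

    def add(self, l, r, w, node=1, lo=0, hi=None):
        """Add w to slots l..r-1 in O(log n)."""
        if hi is None:
            hi = self.n
        if r <= lo or hi <= l:
            return
        if l <= lo and hi <= r:      # fully covered: record here and stop
            self.pending[node] += w
            self.mx[node] += w
            return
        mid = (lo + hi) // 2
        self.add(l, r, w, 2 * node, lo, mid)
        self.add(l, r, w, 2 * node + 1, mid, hi)
        self.mx[node] = self.pending[node] + max(self.mx[2 * node],
                                                 self.mx[2 * node + 1])

    def top(self):
        return self.mx[1]

def max_overlap(rects):
    """Max total weight at any point; rects are (x1, y1, x2, y2, w), half-open."""
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    idx = {y: i for i, y in enumerate(ys)}
    events = []
    for x1, y1, x2, y2, w in rects:
        events.append((x1, 1, idx[y1], idx[y2], w))    # rectangle enters
        events.append((x2, 0, idx[y1], idx[y2], -w))   # rectangle leaves
    events.sort()                    # at equal x, leaving (0) comes before entering (1)
    tree = MaxAddTree(max(1, len(ys) - 1))
    best = 0
    for _, _, a, b, w in events:
        tree.add(a, b, w)
        best = max(best, tree.top())
    return best

rects = [(0, 0, 4, 4, 4), (2, 2, 6, 6, 5), (3, 1, 5, 3, 6), (10, 10, 12, 12, 3)]
print(max_overlap(rects))            # 15: the first three overlap
```

With n rectangles there are 2n events and each costs O(log n), which gives the O(n log n) bound.
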
The OL-tree
• Idea: partition the space, and keep the max-overlapped region for each partition!
• Like a k-d-B-tree.
• Stores nn_buffers.
• An nn_buffer may have multiple copies.
• [Figure: an nn_buffer overlapping partitions 1, 2, 3, 4.] Partition 1 (fully covered): add to fullcover. Partitions 2, 3, 4: recursively insert.
Stored Information
• An index entry has, besides its range:
• fullcover: total weight of the nn_buffers fully covering the whole area;
• localmax: the max overlap among the nn_buffers inserted into the sub-tree;
• maxrange: the region where localmax occurs.
• A leaf entry has a rectangle and its weight.
Stored Information: Example
• [Figure: OL-tree whose index entries are labeled (maxrange, fullcover, localmax), with maxrange drawn as a picture. root: (·, 0, 9); its children r1: (·, 0, 4), r2: (·, 1, 4), r3: (·, 2, 7); children of r3: r31: (·, 4, 3), r32: (·, 2, 3), r33: (·, 1, 2); other sub-trees omitted.]
• Reading r3: maxrange is where localmax occurred; fullcover: 2 nn_buffers fully cover r3; localmax: among those inserted, the max overlap is 7.
Query Processing
• Start with the root; insert index entries into a heap.
• Sorting key: upper bound of the real max overlap in the sub-tree,
• i.e., localmax + the fullcovers of the entry and its ancestor entries.
• The bound is accurate if Q intersects maxrange.
• [Figure: the same OL-tree, highlighting r33.] For r33: real max overlap = 0+2+1+localmax = 5, i.e., the fullcovers of root, r3, and r33 plus r33’s localmax of 2.
Query Processing (continued)
• Keep a running value: max overlap M.
• Pruning 1: Q intersects maxrange, so the bound is exact and the sub-tree need not be searched.
• Pruning 2: the upper bound of max overlap < M.
• [Figure: the same OL-tree with a query range Q.]
• r2 is pruned since Q intersects r2.maxrange: M = 0+1+4 = 5.
• r1 is pruned since its upper bound of overlap = 4 < M.
• [Figure: the same OL-tree.] Sometimes, we need to examine a leaf node. Plane sweep it!
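
The query procedure can be sketched as a best-first search with the two pruning rules. This is an illustration only: the `Entry` class, the geometry, and the brute-force `sweep_leaf` (used in place of a real plane sweep) are assumptions. The fullcover and localmax values of root, r1, r2, r3 are those from the example, with sub-trees omitted as on the slide.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

def intersects(a, b):
    """Closed rectangles (x1, y1, x2, y2) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Entry:
    rng: tuple          # region covered by this entry
    fullcover: int      # weight of nn_buffers covering all of rng
    localmax: int       # max overlap among nn_buffers inserted below
    maxrange: tuple     # region where localmax occurs
    children: list = field(default_factory=list)
    rects: list = field(default_factory=list)    # leaf only: (x1, y1, x2, y2, w)

def sweep_leaf(rects, q):
    """Brute-force max overlap inside q (stand-in for a plane sweep)."""
    xs = {q[0]} | {r[0] for r in rects}
    ys = {q[1]} | {r[1] for r in rects}
    best = 0
    for x in xs:
        for y in ys:
            if q[0] <= x <= q[2] and q[1] <= y <= q[3]:
                best = max(best, sum(r[4] for r in rects
                                     if r[0] <= x <= r[2] and r[1] <= y <= r[3]))
    return best

def max_overlap(root, q):
    if not intersects(root.rng, q):
        return 0
    tie = count()
    best = 0                                     # running max overlap M
    # Heap items: (-upper bound, tie-breaker, fullcovers down to entry, entry).
    heap = [(-(root.fullcover + root.localmax), next(tie), root.fullcover, root)]
    while heap:
        neg_ub, _, base, e = heapq.heappop(heap)
        if -neg_ub < best:                       # pruning 2: nothing left can win
            break
        if intersects(e.maxrange, q):            # pruning 1: the bound is exact
            best = max(best, -neg_ub)
        elif e.children:
            for c in e.children:
                if intersects(c.rng, q):
                    b = base + c.fullcover
                    heapq.heappush(heap, (-(b + c.localmax), next(tie), b, c))
        else:                                    # leaf: examine its rectangles
            clip = (max(q[0], e.rng[0]), max(q[1], e.rng[1]),
                    min(q[2], e.rng[2]), min(q[3], e.rng[3]))
            best = max(best, base + sweep_leaf(e.rects, clip))
    return best

r1 = Entry((0, 0, 10, 10), 0, 4, (1, 8, 2, 9))
r2 = Entry((10, 0, 20, 10), 1, 4, (12, 2, 14, 4))
r3 = Entry((20, 0, 30, 10), 2, 7, (25, 5, 26, 6))
root = Entry((0, 0, 30, 10), 0, 9, (25, 5, 26, 6), [r1, r2, r3])
print(max_overlap(root, (5, 1, 15, 5)))          # 5, as in the example
```

In this run r3 is never visited because its range misses Q, r2 settles M = 5 through pruning 1, and r1 is discarded by pruning 2.
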
The OL-tree Is Not Practical
• Worst-case space complexity O(n²).
• Complex re-organization.
• How to improve? Only keep a few top levels of the OL-tree ==> Virtual OL-tree (VOL-tree)!
• [Figure: OL-tree vs. VOL-tree.]
Comparison with the R-tree Approach
• The R-tree approach examines all nn_buffers intersecting Q.
• By using a small, in-memory VOL-tree, the new approach can prune the search space.
Challenge
• With dynamic updates, keeping localmax and maxrange is expensive.
• [Figure callout: “To insert an nn_buffer here, recompute!”]
Solution
• Index entry: (range, fullcover, maxrange, lowermax, uppermax), where lowermax and uppermax take the place of localmax.
• lowermax ≤ localmax ≤ uppermax.
• Any location in maxrange has overlap = lowermax.
• At a location outside maxrange, the overlap can be more than lowermax, but not more than uppermax.
Update
• Case 1: the new nn_buffer does not intersect maxrange: increase uppermax.
• Case 2: the new nn_buffer intersects maxrange: increase uppermax and lowermax.
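
The two update cases can be sketched as follows. The class `VEntry` is hypothetical, and shrinking maxrange to the intersection in case 2 is an assumption made here so that every location in maxrange keeps overlap = lowermax; the slides do not spell this step out. nn_buffers that cover the entry's whole range (which would go to fullcover) are not handled.

```python
from dataclasses import dataclass

def intersects(a, b):
    """Closed rectangles (x1, y1, x2, y2) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class VEntry:
    maxrange: tuple     # every location here has overlap == lowermax
    lowermax: int = 0
    uppermax: int = 0

def insert(e, rect, w):
    """Insert an nn_buffer of weight w without recomputing localmax."""
    e.uppermax += w                    # both cases: no location gains more than w
    if intersects(rect, e.maxrange):   # case 2
        e.lowermax += w
        m = e.maxrange
        # Assumption: keep only the part of maxrange covered by the new buffer.
        e.maxrange = (max(m[0], rect[0]), max(m[1], rect[1]),
                      min(m[2], rect[2]), min(m[3], rect[3]))

e = VEntry((0, 0, 10, 10))
insert(e, (2, 2, 6, 6), 3)     # case 2
insert(e, (8, 8, 9, 9), 2)     # case 1
insert(e, (5, 5, 9, 9), 4)     # case 2
print(e)   # VEntry(maxrange=(5, 5, 6, 6), lowermax=7, uppermax=9)
```

In this run the true localmax is 7 (the first and third buffers overlap), which lies between lowermax = 7 and uppermax = 9.
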
Query
• Similar to the OL-tree.
• To compute the upper bound of max overlap, use uppermax.
• When Q intersects maxrange, the entry may or may not be pruned.
Setup
• Digital Chart from the R-tree Portal.
• O: 24,493 populated places.
• S: 9,203 cultural landmarks.
• Page size: 1 KB. Buffer size: 256 pages.
• Object R-tree: 753 pages.
• Pentium IV Dell PC, 3.2 GHz.
• Java.
• Measure the total I/O of 100 random queries.