Progressive Computation of the Min-Dist Optimal-Location Query. Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*. Northeastern University; * Chinese University of Hong Kong. VLDB’06, Seoul, Korea.
Motivation
• “What is the optimal location in the Boston area to build a new McDonald’s store?”
• Suppose a customer drives to the closest McDonald’s.
• Optimality: minimize the average (AVG) driving distance.
Who will be interested?
• Corporations
  • Chain restaurants (e.g. McDonald’s, Burger King, Starbucks)
  • Supermarkets (e.g. Wal-Mart, Costco, Stop & Shop)
  • Location-based service providers (e.g. Verizon, AT&T)
• Computer scientists, especially in
  • Databases
  • Computational Geometry
  • Algorithms
min-dist OL
• Without any new site: AD = (200+200+600+600)/4 = 400.
• With new site l1: AD(l1) = (30+30+600+600)/4 = 315.
• With new site l2: AD(l2) = (200+200+30+30)/4 = 115.
Formal Definition
• Given a set S of sites, a set O of objects, and a query range Q,
• min-dist OL is a location l ∈ Q which minimizes $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$, where $d_{NN}(o, S \cup \{l\})$ is the distance between o and its nearest site.
• “Solution”: compute all AD(l). But…
Challenges
• There are infinitely many locations in Q! How to produce a finite set of candidates while keeping optimality?
• How to avoid computing AD(l) for all candidates?
Solution Highlights
• Algorithm to compute AD(l).
• Theorems to limit #candidates.
• Lower bound of AD(l) for all locations l in a cell C.
• Progressive algorithm.
L1 Distance
• d(o, s) = |o.x − s.x| + |o.y − s.y|
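A minimal sketch of the definitions so far: the L1 distance and AD computed directly from the formal definition. The coordinates below are made up for illustration (the slides do not give any); they were chosen so that the results match the 400 / 315 / 115 figures of the min-dist OL example. The function names are hypothetical.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(objects, sites, new_site=None):
    """AD when new_site is None, otherwise AD(new_site).

    Each object contributes the distance to its nearest site,
    where the candidate new site (if any) counts as a site.
    """
    all_sites = list(sites) + ([new_site] if new_site is not None else [])
    total = sum(min(l1_distance(o, s) for s in all_sites) for o in objects)
    return total / len(objects)


# Hypothetical layout: one existing site, four objects at L1 distances
# 200, 200, 600 and 600 from it.
sites = [(0, 0)]
objects = [(170, 30), (200, 0), (-570, -30), (-600, 0)]
loc1 = (200, 30)     # 30 away from each of the first two objects
loc2 = (-600, -30)   # 30 away from each of the last two objects

print(average_distance(objects, sites))        # 400.0
print(average_distance(objects, sites, loc1))  # 315.0
print(average_distance(objects, sites, loc2))  # 115.0
```

Evaluating the definition this way costs one pass over all objects and all sites per location, which is what the following slides try to avoid.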
1. Compute AD(l)
• Remember: $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$
• Define: $AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$
• Let RNN(l) be the objects “attracted” by l.
• AD(l) = AD if RNN(l) = ∅.
• Example: RNN(l) = {o7, o8}, so AD(l) < AD.
• In general, AD(l) = AD − (average savings for customers in RNN(l)).
1. Compute AD(l)
• Theorem: $AD(l) = AD - \frac{1}{|O|} \sum_{o \in RNN(l)} \left( d_{NN}(o, S) - d(o, l) \right)$
• S and O are “static” versus l.
• AD can be pre-computed.
• So can dNN(o, S).
• To compute AD(l):
  • Find RNN(l).
  • ∀o ∈ RNN(l), compute d(o, l).
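The steps on this slide can be sketched as follows, assuming the theorem has the form AD(l) = AD − (1/|O|) Σ over o ∈ RNN(l) of (dNN(o, S) − d(o, l)), which follows directly from the definition of AD(l). RNN(l) is found here by a naive scan rather than through a Voronoi cell and an R-tree; all function names and the sample coordinates are hypothetical.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def precompute(objects, sites):
    """Return (d_nn, ad): per-object nearest-site distances and their average.

    Both depend only on O and S, so they are computed once.
    """
    d_nn = [min(l1_distance(o, s) for s in sites) for o in objects]
    return d_nn, sum(d_nn) / len(objects)


def rnn(objects, d_nn, l):
    """Naive RNN(l): indices of objects strictly closer to l than to any site."""
    return [i for i, o in enumerate(objects) if l1_distance(o, l) < d_nn[i]]


def ad_with_new_site(objects, d_nn, ad, l):
    """AD(l) = AD minus the total savings of the objects in RNN(l), over |O|."""
    savings = sum(d_nn[i] - l1_distance(objects[i], l)
                  for i in rnn(objects, d_nn, l))
    return ad - savings / len(objects)


# Made-up data: one site and four objects at distances 200, 200, 600, 600.
sites = [(0, 0)]
objects = [(170, 30), (200, 0), (-570, -30), (-600, 0)]
d_nn, ad = precompute(objects, sites)

print(rnn(objects, d_nn, (200, 30)))                   # [0, 1]
print(ad_with_new_site(objects, d_nn, ad, (200, 30)))  # 315.0
```

Only the objects in RNN(l) are touched per location; every other object keeps its pre-computed contribution.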
How to compute RNN(l)?
• This is an implementation detail, dealing with computational geometry and spatial databases.
• Naïve solution: ∀o ∈ O, compare with all sites and l.
• More efficient:
  • Compute the Voronoi cell of l.
  • Retrieve the objects inside the Voronoi cell using a range search on an R-tree.
How to compute RNN(l)? (1) Compute Voronoi cell
• Remember: RNN(l) is the set of objects closer to l than to any existing site in S.
• Consider all sites. Draw the spatial region that is closer to l than to any site.
How to compute RNN(l)? (2) Retrieve objects
• Standard range search.
• Any spatial access method, e.g. R-tree.
Range query: find the objects in a given range.
• E.g. find all hotels in Boston.
• No index: scan through all objects. NOT EFFICIENT!
[Figure: points a to m plotted on x and y axes from 0 to 10]
[Figure: R-tree over the points a to m. The root holds entries E1 and E2, which cover the leaf nodes E3 to E7; the leaves hold the points.]
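To illustrate why a hierarchy of bounding rectangles speeds up range search, here is a small sketch. It is not a real R-tree (no insertion, no node splitting, no balancing heuristics): it is a static tree packed bottom-up from points sorted by x, with a hypothetical fanout of 4. It assumes a non-empty point set, and all names are made up.

```python
def mbr(points):
    """Minimum bounding rectangle (x_min, y_min, x_max, y_max) of some points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes):
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if rectangles a and b overlap (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build(points, fanout=4):
    """Pack points into leaves, then pack nodes into parents until one root is left.

    A node is a tuple (box, entries, is_leaf).
    """
    pts = sorted(points)
    level = [(mbr(pts[i:i + fanout]), pts[i:i + fanout], True)
             for i in range(0, len(pts), fanout)]
    while len(level) > 1:
        level = [(union([n[0] for n in level[i:i + fanout]]),
                  level[i:i + fanout], False)
                 for i in range(0, len(level), fanout)]
    return level[0]


def range_search(node, q):
    """Return the points inside rectangle q, skipping subtrees that miss q."""
    box, entries, is_leaf = node
    if not intersects(box, q):
        return []
    if is_leaf:
        return [p for p in entries
                if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]]
    return [p for child in entries for p in range_search(child, q)]


points = [((i * 37) % 101, (i * 53) % 97) for i in range(50)]
tree = build(points)
print(sorted(range_search(tree, (20, 20, 70, 70))))
```

Subtrees whose bounding rectangle misses the query are never visited, which is the saving over a full scan.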
2. Limit #candidates
• Theorem: within the X/Y range of Q, draw grid lines crossing objects. Only need to consider intersections!
• In the example: 5x6=30 candidates.
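A sketch of how the candidate set from this theorem could be generated. The reading assumed here: an object contributes a vertical line if its x lies in Q's X range, and a horizontal line if its y lies in Q's Y range. How Q's own boundary is treated is not stated on the slide and is left out. The function name and data are hypothetical.

```python
from itertools import product


def candidate_locations(objects, q):
    """Return (xs, ys, candidates) for query rectangle q = (x_min, y_min, x_max, y_max).

    xs / ys are the sorted coordinates of the grid lines through objects
    that fall within Q's X / Y range; candidates are their intersections.
    """
    x_min, y_min, x_max, y_max = q
    xs = sorted({x for x, _ in objects if x_min <= x <= x_max})
    ys = sorted({y for _, y in objects if y_min <= y <= y_max})
    return xs, ys, list(product(xs, ys))


# Made-up data: the last object lies outside Q, but its y coordinate is
# within Q's Y range, so it still contributes a horizontal line.
q = (0, 0, 10, 10)
objects = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 1), (20, 3)]
xs, ys, candidates = candidate_locations(objects, q)
print(len(xs), len(ys), len(candidates))  # 5 6 30
```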
2. Limit #candidates
• Proof idea: suppose the OL l is not at an intersection; moving it by some distance δ onto a grid line produces a better (or equal) result.
• Consider RNN(l).
• In the example, moving l to the right saves total distance.
2. VCU(Q)
• A spatial region enclosing the objects closer to Q than to sites in S.
• It’s the Voronoi cell of Q versus sites in S.
2. Further Limit #candidates
• Only consider objects in VCU(Q).
• In the example, this reduces 5x6=30 candidates to 4x4=16 candidates.
Naïve Algorithm
• Derive candidates.
• Compute AD(l) for each.
• Pick the smallest.
• Not efficient! Too many candidates! To compute AD(l) for each one, need to:
  • compute RNN(l)
  • retrieve all these objects…
Progressive Idea
• Treat Q as a cell and consider its corners.
• Divide the cell.
• Recursively divide a sub-cell.
• Able to check all candidates.
Progressive Idea
• Q: What do you save?
• A: Cell pruning. A cell C can be pruned if its lower bound is ≥ AD(l0) of some candidate l0.
• Example: AD(l0) = 50. Suppose 60 is a lower bound for AD(l), ∀l ∈ C. Then C can be pruned.
3. LB(C): lower bound for AD(l), ∀l ∈ C
• Example cell C with corners: AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500.
• Theorem: $LB(C) = \max\left\{ \frac{AD(c_1)+AD(c_4)}{2}, \frac{AD(c_2)+AD(c_3)}{2} \right\} - \frac{p}{4}$ is a lower bound, where p is the perimeter of C and (c1, c4), (c2, c3) are the pairs of diagonally opposite corners.
• e.g. LB(C)=3500-p/4
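The formula behind this theorem is not legible in the slide text, so the sketch below rests on an assumption: that the bound is the larger of the two diagonal-corner averages minus p/4. This form reproduces the slide's example (3500 − p/4) and can be justified under the L1 distance: AD(l) changes by at most d(l, c) when moving from l to a corner c, and for two diagonally opposite corners the two distances from any l in C add up to p/2. The function name and the cell size are made up.

```python
def cell_lower_bound(ad_c1, ad_c2, ad_c3, ad_c4, width, height):
    """Assumed diagonal-corner lower bound for AD(l) over a rectangular cell.

    (c1, c4) and (c2, c3) are taken to be the two pairs of diagonally
    opposite corners; width and height are the side lengths of the cell.
    """
    perimeter = 2 * (width + height)
    best_diagonal = max((ad_c1 + ad_c4) / 2, (ad_c2 + ad_c3) / 2)
    return best_diagonal - perimeter / 4


# Corner values from the slide, with a hypothetical 100 x 100 cell (p = 400).
print(cell_lower_bound(1000, 3000, 4000, 2500, 100, 100))  # 3400.0
```

With these corner values the (c2, c3) diagonal dominates, giving 3500 − 400/4 = 3400.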
3. LB(C): lower bound for AD(l), ∀l ∈ C
• A better lower bound (theorem; formula not preserved in this transcript).
• Comparing with the previous lower bound:
  • Higher quality since the lower bound is larger.
  • More computation.
4. The Progressive Algorithm
1. Maintain a heap of cells ordered by LB(). Initially one cell: Q.
2. Maintain the best candidate lopt.
3. Pick the cell with minimum LB() and partition it.
4. Compute AD() for the corners of sub-cells.
5. Compute LB() for the sub-cells.
6. Insert sub-cell ci to heap if LB(ci) < AD(lopt).
7. Goto 3.
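The loop above can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: cells are index ranges over the sorted candidate grid lines `xs` and `ys`, each cell is split at the middle grid line, AD() is evaluated by brute force instead of via RNN and an R*-tree, and LB() is the assumed diagonal-corner bound (larger diagonal average minus p/4). All names are hypothetical.

```python
import heapq
from functools import lru_cache


def l1_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def progressive_ol(objects, sites, xs, ys):
    """Return (location, AD) of the best candidate on the grid xs x ys."""
    d_nn = [min(l1_distance(o, s) for s in sites) for o in objects]

    @lru_cache(maxsize=None)
    def ad(l):
        # Brute-force AD(l); cached because corners are shared between cells.
        return sum(min(d, l1_distance(o, l))
                   for o, d in zip(objects, d_nn)) / len(objects)

    def corners(cell):
        i0, i1, j0, j1 = cell
        # Order c1, c2, c3, c4 so that (c1, c4) and (c2, c3) are diagonals.
        return [(xs[i0], ys[j0]), (xs[i1], ys[j0]),
                (xs[i0], ys[j1]), (xs[i1], ys[j1])]

    def lb(cell):
        i0, i1, j0, j1 = cell
        a1, a2, a3, a4 = (ad(c) for c in corners(cell))
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        return max((a1 + a4) / 2, (a2 + a3) / 2) - p / 4

    def split(lo, hi):
        # Halve an index range; ranges with no interior grid line stay whole.
        if hi - lo <= 1:
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)]

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    best = min(corners(root), key=ad)            # steps 1 and 2
    heap = [(lb(root), root)]
    while heap:
        bound, cell = heapq.heappop(heap)        # step 3
        if bound >= ad(best):
            break                                # no remaining cell can win
        i0, i1, j0, j1 = cell
        for a0, a1 in split(i0, i1):
            for b0, b1 in split(j0, j1):
                sub = (a0, a1, b0, b1)
                if sub == cell:
                    continue                     # cell has no interior candidates
                best = min(corners(sub) + [best], key=ad)   # step 4
                if lb(sub) < ad(best):                      # steps 5 and 6
                    heapq.heappush(heap, (lb(sub), sub))
    return best, ad(best)


# Made-up data; the candidate grid uses every object coordinate.
objects = [((i * 37) % 101, (i * 53) % 97) for i in range(40)]
sites = [(10, 10), (80, 60), (50, 90)]
xs = sorted({x for x, _ in objects})
ys = sorted({y for _, y in objects})
print(progressive_ol(objects, sites, xs, ys))
```

Because a cell is discarded only when its lower bound is no better than the current best candidate, the result matches an exhaustive scan of all grid intersections while typically evaluating AD() at far fewer of them.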
Progressiveness
• The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining.
• Initially the interval is [LB(Q), AD(best corner of Q)]; over time it narrows to [min{ LB(C) | C in heap }, AD(best candidate)].
• AD(real OL) is inside the interval.
• User may choose to terminate at any time.
Batch Partitioning
• To partition a cell, partition it into multiple sub-cells.
• Reason: computing AD(l) requires accessing the R*-tree of objects. When accessing the R*-tree, we want to compute multiple AD(l) values at once.
• Tradeoff: partitioning too much is wasteful, since some candidates could have been pruned.
Performance Setup
• O: 123,593 postal addresses in the Northeastern part of the US, stored using an R*-tree.
• S: 100 sites randomly selected from O.
• Buffer: 128 pages.
• Dell Pentium IV 3.2GHz.
• Query size: 1% in each dimension.
Review slide: 2. Further Limit #candidates
• Only consider objects in VCU(Q).
• In the example: 4x4=16 candidates.
Effect of VCU Computation
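Review slide: 3. LB(C): lower bound for AD(l), ∀l ∈ C
• Example cell C with corners: AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500.
• Theorem: $LB(C) = \max\left\{ \frac{AD(c_1)+AD(c_4)}{2}, \frac{AD(c_2)+AD(c_3)}{2} \right\} - \frac{p}{4}$ is a lower bound, where p is the perimeter of C and (c1, c4), (c2, c3) are the pairs of diagonally opposite corners.
• e.g. LB(C)=3500-p/4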