On Computing Top-t Most Influential Spatial Sites • Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du • Northeastern University, Boston, USA • VLDB 2005, Trondheim, Norway
Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions
Problem Definition • Given: • a set of sites S • a set of weighted objects O • a spatial region Q • an integer t • Top-t most influential sites query: • find the t sites in Q with the largest influences. • Influence of a site s = total weight of the objects that consider s their nearest site.
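The definition can be made concrete with a naive sketch that assigns every object to its nearest site by linear scan and then ranks the sites inside Q. It assumes 2-D points, Euclidean distance, and an axis-aligned Q given as (xmin, ymin, xmax, ymax); the names `influences` and `top_t_influential` are hypothetical. This is only a brute-force baseline, not the R-tree algorithm of these slides. Note that the nearest site is taken over all of S, not just the sites in Q, because an object's nearest site may lie outside Q.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def influences(sites: Dict[str, Point],
               objects: List[Tuple[Point, float]]) -> Dict[str, float]:
    """Influence of each site: total weight of the objects whose nearest site it is."""
    infl = {name: 0.0 for name in sites}
    for loc, weight in objects:
        # Nearest site over ALL sites; ties are broken by dict iteration order.
        nearest = min(sites, key=lambda name: math.dist(loc, sites[name]))
        infl[nearest] += weight
    return infl


def top_t_influential(sites: Dict[str, Point],
                      objects: List[Tuple[Point, float]],
                      q: Rect, t: int) -> List[Tuple[str, float]]:
    """Return up to t (site, influence) pairs for sites inside q, largest first."""
    infl = influences(sites, objects)
    xmin, ymin, xmax, ymax = q
    inside = [name for name, (x, y) in sites.items()
              if xmin <= x <= xmax and ymin <= y <= ymax]
    return heapq.nlargest(t, ((name, infl[name]) for name in inside),
                          key=lambda pair: pair[1])


if __name__ == "__main__":
    sites = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (20.0, 0.0)}
    objects = [((1.0, 0.0), 1.0), ((2.0, 0.0), 2.0), ((9.0, 0.0), 1.0),
               ((11.0, 0.0), 1.0), ((19.0, 0.0), 5.0)]
    # Q covers b and c only, so a (influence 3) is not eligible.
    print(top_t_influential(sites, objects, (5.0, -1.0, 25.0, 1.0), t=2))
    # prints [('c', 5.0), ('b', 2.0)]
```

The cost is |S| x |O| distance computations per query, which is what the indexed solutions below try to avoid.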
Motivation • Which supermarket in Boston is the most influential among residential buildings? • Sites: supermarkets; • Objects: residential buildings; • Weight: # people in a building; • Query region: Boston; • Which wireless station in Boston is the most influential among mobile users?
Example • [Figure: sites s1 to s4 and objects o1 to o6] • Suppose all objects have weight 1, Q is the whole space, and t = 1. • The most influential site is s1, with influence 3.
Example • Now suppose Q is the shaded rectangle and t = 2. • Top-2 most influential sites: s4 and s2. • [Figure: the same sites and objects, with Q shaded]
Related Work • Bichromatic RNN query: considers two datasets, sites and objects. • The RNNs of a site s ∈ S are the objects that consider s their nearest site. • [Figure: the running example with sites s1 to s4 and objects o1 to o6]
Related Work • Solutions to the RNN query based on pre-computation [KM00, YL01]. • [Figure: running example]
Related Work • Solution to the RNN query based on the Voronoi diagram [SRAE01]. • Compute the Voronoi cell of s: the region of locations closer to s than to any other site. • Query the object R-tree using the Voronoi cell.
Related Work [SRAE01] • [Figure: the Voronoi-based approach on the running example, sites s1 to s4 and objects o1 to o6]
Our Problem vs. RNN Query • RNN query: • Input: a single site. • Interested in the actual set of RNNs. • Top-t most influential sites query: • Input: a spatial region. • Interested in the aggregate weight of the RNNs.
Straightforward Solution 1 • For each site, pre-compute its influence. • At query time, find the sites in Q and return the t sites with the largest influences. • Drawback 1: costly maintenance upon updates. • Drawback 2: it binds a set of sites tightly to one particular set of objects.
Straightforward Solution 2 • An extension of the Voronoi-diagram-based solution to the RNN query. • Find all sites in Q. • For each such site, find its RNNs using its Voronoi cell, and compute its influence. • Return the t sites with the largest influences.
Straightforward Solution 2 • Drawback 1: all sites in Q need to be retrieved from the leaf nodes. • Drawback 2: the object R-tree and the site R-tree are browsed multiple times. • For each site in Q, browse the site R-tree to compute its Voronoi cell. • For each such Voronoi cell, browse the object R-tree to compute the influence.
Features of Our Solution • Systematically browses both trees once. • Pruning techniques based on a new metric, minExistDNN. • No need to compute the influences of all sites in Q, or even to locate all sites in Q.
Motivation • Intuitively, if some object in Oi may consider some site in Sj its NN, Oi affects Sj. • To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi affects Sj. • [Figure: O1 only affects S1, while O2 affects both S1 and S2.]
maxDist: A Loose Estimation • If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2. • Why is this not good enough? • [Figure: maxDist(O1, S1) = 10 and minDist(O1, S2) = 8, so the test fails here.]
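A minimal sketch of the two rectangle distances and the maxDist test, assuming 2-D MBRs given as (xmin, ymin, xmax, ymax); the function names are hypothetical. The test is sound because every site of S1 lies within maxDist(O1, S1) of every location in O1, but it is conservative: maxDist grows with the extents of both MBRs, so it often fails, as in the figure where 10 is not below 8.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest distance between any point of a and any point of b (0 if they overlap)."""
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return math.hypot(dx, dy)


def max_dist(a: Rect, b: Rect) -> float:
    """Largest distance between any point of a and any point of b."""
    dx = max(abs(a[2] - b[0]), abs(b[2] - a[0]))
    dy = max(abs(a[3] - b[1]), abs(b[3] - a[1]))
    return math.hypot(dx, dy)


def loose_prune(o: Rect, s1: Rect, s2: Rect) -> bool:
    """True if the maxDist test proves that O does not affect S2.

    Assumes S1 contains at least one site, as any non-empty R-tree entry does.
    """
    return max_dist(o, s1) < min_dist(o, s2)
```

For example, with hypothetical MBRs O = (0, 0, 2, 2), S1 = (3, 0, 4, 1) and S2 = (10, 0, 11, 1), maxDist is √20 ≈ 4.47 and minDist is 8, so O cannot affect S2.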
minMaxDist: A Tight Estimation? • An object o does not affect S2 if there exists S1 such that minMaxDist(o, S1) < minDist(o, S2). • [Figure: minMaxDist(o1, S1) = 5 < minDist(o1, S2) = 6, so o1 does not affect S2.]
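For a single object location o, the minMaxDist metric from R-tree nearest-neighbor search (Roussopoulos et al.) bounds the distance from o to its nearest site in S1 from above, because every face of an R-tree MBR touches at least one entry. A minimal 2-D sketch of that formula and the point test, assuming (xmin, ymin, xmax, ymax) rectangles; the function names are hypothetical:

```python
import math
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def min_dist_point(p: Sequence[float], r: Rect) -> float:
    """Distance from point p to the closest point of rectangle r (0 if inside)."""
    dx = max(0.0, r[0] - p[0], p[0] - r[2])
    dy = max(0.0, r[1] - p[1], p[1] - r[3])
    return math.hypot(dx, dy)


def min_max_dist(p: Sequence[float], r: Rect) -> float:
    """minMaxDist(p, r): nearer face in one dimension, farther face in the
    other dimensions, minimized over the choice of that one dimension."""
    lo, hi = (r[0], r[1]), (r[2], r[3])
    near = [lo[k] if p[k] <= (lo[k] + hi[k]) / 2 else hi[k] for k in range(2)]
    far = [lo[k] if p[k] >= (lo[k] + hi[k]) / 2 else hi[k] for k in range(2)]
    far_sq = [(p[k] - far[k]) ** 2 for k in range(2)]
    best_sq = min(sum(far_sq) - far_sq[k] + (p[k] - near[k]) ** 2
                  for k in range(2))
    return math.sqrt(best_sq)


def point_does_not_affect(o: Sequence[float], s1: Rect, s2: Rect) -> bool:
    """True if some site in S1 is provably closer to o than every site in S2."""
    return min_max_dist(o, s1) < min_dist_point(o, s2)
```

With hypothetical coordinates o1 = (0, 0), S1 = (1, 1, 3, 2) and S2 = (5, 0, 6, 1), minMaxDist is √5 ≈ 2.24 and minDist is 5, so o1 cannot affect S2.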
minMaxDist: A Tight Estimation? • Not true for an object MBR O1. • [Figure: minMaxDist(O1, S1) = 5 and minDist(O1, S2) = 6; an object o1 in O1 with distances 7 and 6 marked to a site s1 in S1 and a site s2 in S2]
A Tight Estimation? • A metric m(O1, S1) should: • guarantee that each location in O1 is within distance m(O1, S1) of some site in S1, • and be the smallest distance with this property.
New Metric: minExistDNN_S1(O1) • Definition: $\mathrm{minExistDNN}_{S_1}(O_1) = \max\{\mathrm{minMaxDist}(l, S_1) \mid l \in O_1\}$ • O1 does not affect S2 if there exists S1 such that minExistDNN_S1(O1) < minDist(O1, S2).
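A short argument for why the pruning rule is sound, assuming (as for any R-tree MBR) that every face of S1 touches at least one site, so that minMaxDist(l, S1) bounds the distance from l to some site in S1:

```latex
\begin{aligned}
&\text{For every } l \in O_1 \text{ there is a site } s \in S_1 \text{ with}\\
&\quad \mathrm{dist}(l, s) \le \mathrm{minMaxDist}(l, S_1) \le \mathrm{minExistDNN}_{S_1}(O_1),\\
&\text{and for every site } s' \in S_2:\\
&\quad \mathrm{minExistDNN}_{S_1}(O_1) < \mathrm{minDist}(O_1, S_2) \le \mathrm{dist}(l, s').\\
&\text{Hence } \mathrm{dist}(l, s) < \mathrm{dist}(l, s'), \text{ so no location in } O_1 \text{ has its NN in } S_2.
\end{aligned}
```

Taking the maximum over l is what makes the bound hold for the whole MBR, which is exactly where a single minMaxDist(O1, S1) value breaks down.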
Examples of minExistDNN_S1(O1) • [Figure: two example placements of object MBR O1 relative to site MBR S1] • How do we calculate it?
Calculating minExistDNN_S1(O1) • Step 1: Space partitioning. • Every location l in the same partition is associated with the same corner of S1, its second-closest corner, and the distance to that corner is minMaxDist(l, S1)! • [Figure: S1 with corners a, b, c, d; the surrounding space split into partitions P1 to P8, labeled with their corners P1:b, P2:c, P3:a, P4:d, P5:c, P6:b, P7:d, P8:a]
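The claim that minMaxDist(l, S1) is the distance from l to the second-closest corner of S1 can be checked numerically in 2-D. A minimal sketch, assuming (xmin, ymin, xmax, ymax) rectangles, that compares the corner formulation with the face-based minMaxDist formula on random points; function names are hypothetical:

```python
import math
import random
from itertools import product
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def second_closest_corner(p: Point, r: Rect) -> Point:
    """Corner of r with the second-smallest distance to p."""
    corners = list(product((r[0], r[2]), (r[1], r[3])))
    return sorted(corners, key=lambda c: math.dist(p, c))[1]


def min_max_dist_2d(p: Point, r: Rect) -> float:
    """minMaxDist via faces: nearer face in one dimension, farther in the other."""
    x0, y0, x1, y1 = r
    near_x, far_x = (x0, x1) if p[0] <= (x0 + x1) / 2 else (x1, x0)
    near_y, far_y = (y0, y1) if p[1] <= (y0 + y1) / 2 else (y1, y0)
    return min(math.dist(p, (near_x, far_y)), math.dist(p, (far_x, near_y)))


if __name__ == "__main__":
    random.seed(0)
    s1 = (0.0, 0.0, 4.0, 2.0)
    for _ in range(10_000):
        p = (random.uniform(-10, 10), random.uniform(-10, 10))
        corner_d = math.dist(p, second_closest_corner(p, s1))
        assert math.isclose(corner_d, min_max_dist_2d(p, s1))
    print("second-closest corner matches minMaxDist on all samples")
```

Why this holds: the closest corner combines the nearer faces in both dimensions and the farthest corner combines the farther faces, so the second-closest is the better of the two mixed corners, which are exactly minMaxDist's candidates in 2-D. The ranking of corner distances only changes on the perpendicular bisectors of corner pairs, which is why each partition keeps one associated corner.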
Space Partitioning • O1 is divided into multiple sub-regions, one in each partition it overlaps. • [Figure: O1 overlapping partitions P1 (corner b) and P2 (corner c) of S1]
Calculating minExistDNN_S1(O1) • Step 2: Choose up to 8 locations on O1's border and compute their minMaxDist to S1. • minExistDNN_S1(O1) is the largest of these values! • [Figure: O1 spanning partitions P1:b and P2:c, with minExistDNN_S1(O1) marked]
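The slide does not spell out which 8 border locations to choose, so the sketch below does not reproduce that rule. Instead it enumerates a superset of candidates, an assumption made for illustration: O1's corners, every crossing of O1's border with the four partition lines through S1's centre (the perpendicular bisectors of pairs of S1's corners), and, as an extra safeguard, S1's centre when it lies inside O1. Within each sub-region minMaxDist is the distance to one fixed corner, a convex function, so its maximum is attained at a vertex of that sub-region, and every such vertex is a candidate. Names are hypothetical.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def min_max_dist(p: Point, s: Rect) -> float:
    """minMaxDist(p, S) in 2-D: distance from p to the second-closest corner of S."""
    corners = product((s[0], s[2]), (s[1], s[3]))
    return sorted(math.dist(p, c) for c in corners)[1]


def candidate_points(o: Rect, s: Rect) -> List[Point]:
    """Every vertex of the sub-regions that S's partitioning cuts O into."""
    ox0, oy0, ox1, oy1 = o
    cx, cy = (s[0] + s[2]) / 2, (s[1] + s[3]) / 2
    w, h = s[2] - s[0], s[3] - s[1]
    pts: List[Point] = list(product((ox0, ox1), (oy0, oy1)))  # O's corners
    if ox0 <= cx <= ox1 and oy0 <= cy <= oy1:
        pts.append((cx, cy))  # all partition lines meet at S's centre
    # Partition lines through the centre, given by direction (dx, dy): the
    # perpendicular bisectors of S's corner pairs.
    for dx, dy in ((0.0, 1.0), (1.0, 0.0), (h, w), (h, -w)):
        if dx != 0:  # crossings with O's vertical edges
            for x in (ox0, ox1):
                y = cy + (x - cx) * dy / dx
                if oy0 <= y <= oy1:
                    pts.append((x, y))
        if dy != 0:  # crossings with O's horizontal edges
            for y in (oy0, oy1):
                x = cx + (y - cy) * dx / dy
                if ox0 <= x <= ox1:
                    pts.append((x, y))
    return pts


def min_exist_dnn(o: Rect, s: Rect) -> float:
    """max of minMaxDist(l, S) over l in O, evaluated at the candidate vertices."""
    return max(min_max_dist(p, s) for p in candidate_points(o, s))
```

With S1 = (0, 0, 4, 2) and a degenerate O1 = (10, 0, 10, 0), the only candidate is (10, 0), whose second-closest corner of S1 is (4, 2), giving √40 ≈ 6.32. The pruning test is then min_exist_dnn(O1, S1) < minDist(O1, S2).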
Data Structure • Two R-trees: S of sites, O of objects. • Three queues: • queueSIN: entries of S inside Q. • queueSOUT: entries of S outside Q. • queueO: entries of O.
Data Structure • [Figure: query region Q, site MBRs S1 to S4, object MBRs O1 to O4] • queueSIN: S1, S2 • queueO: O1 • queueSOUT: S3
maxInfluence and minInfluence • For each entry Sj in queueSIN: • maxInfluence: total weight of the entries in queueO that affect Sj. • minInfluence: total weight of the entries in queueO that affect ONLY Sj, divided by the number of sites in Sj. • queueSIN is sorted in decreasing order of maxInfluence.
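A minimal sketch of how the two bounds could be recomputed, assuming a precomputed mapping `affected_by` from each object entry to the names of all site entries (inside or outside Q) it may affect, for example derived from minExistDNN tests; the classes and names are hypothetical. For a single site, minInfluence is a lower bound on its influence; for an entry with n sites, the weight that can only go to that entry is split among its n sites, so at least one of them receives a 1/n share.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SiteEntry:
    name: str
    num_sites: int              # sites under this entry (1 for a single site)
    max_influence: float = 0.0
    min_influence: float = 0.0


@dataclass
class ObjectEntry:
    name: str
    weight: float               # total weight of the objects under this entry


def update_influences(queue_s_in: List[SiteEntry],
                      queue_o: List[ObjectEntry],
                      affected_by: Dict[str, Set[str]]) -> None:
    """Recompute maxInfluence/minInfluence and re-sort queueSIN in place."""
    for s in queue_s_in:
        # Upper bound: every object entry that may send weight to s.
        s.max_influence = sum(o.weight for o in queue_o
                              if s.name in affected_by[o.name])
        # Weight that can go nowhere but s, shared among its sites.
        exclusive = sum(o.weight for o in queue_o
                        if affected_by[o.name] == {s.name})
        s.min_influence = exclusive / s.num_sites
    queue_s_in.sort(key=lambda s: s.max_influence, reverse=True)
```

For instance, if O1 (weight 10) affects only S1 (2 sites) and O2 (weight 4) affects S1 and S2, then S1 gets maxInfluence 14 and minInfluence 10 / 2 = 5.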
Algorithm Overview • Expand an entry from one of the three queues: • remove the entry from its queue; • retrieve the referenced node, and insert its (unpruned) entries into the same queue; • update maxInfluence and minInfluence if necessary. • If the top-t entries in queueSIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return them.
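The stopping condition can be sketched as a check over queueSIN, assuming the queue is already sorted by maxInfluence in decreasing order and that Q holds at least t sites; the `Entry` class and `can_stop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    is_site: bool               # True for a single site, False for an MBR
    min_influence: float
    max_influence: float


def can_stop(queue_s_in: List[Entry], t: int) -> bool:
    """True if the top-t entries are sites whose lower bounds beat every
    remaining entry's upper bound (queue sorted by max_influence, descending)."""
    top, rest = queue_s_in[:t], queue_s_in[t:]
    if len(top) < t or not all(e.is_site for e in top):
        return False
    bound = max((e.max_influence for e in rest), default=0.0)
    return all(e.min_influence >= bound for e in top)
```

When the check fails, the algorithm keeps expanding entries, which tightens the bounds until the check succeeds.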
Example • [Figure: query region Q, site MBRs S1, S3, S5 to S9 and object MBRs O1, O5, O6] • S6 is not affected by O1: prune S6. • O5 does not affect S5 and S7: prune O5. • Before: queueSIN: S1; queueO: O1; queueSOUT: S3. • After: queueSIN: S5, S7; queueO: O6; queueSOUT: S9.
A Pruning Case • [Figure: Expand S1. Site MBRs S2, S3, S4 and object MBR O1, with minExistDNN_S1(O1) = 7, minExistDNN_S3(O1) = 4, minDist(S2, O1) = 5] • S2 is pruned because minExistDNN_S3(O1) = 4 < minDist(S2, O1) = 5.
Choosing an Entry to Expand • Expand top entries in queueSIN. • Expand the most important Oi. • Importance: |Oi| * #affected entries * area(Oi) • Expand Sj that contains the most important Oi.
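A minimal sketch of the importance score and the choice of Oi, under stated assumptions: |Oi| is taken as the number of objects under Oi, "#affected entries" as a precomputed count of site entries Oi may affect, and area(Oi) as its MBR area; all names are hypothetical. As we read it, a large, heavily populated Oi that affects many site entries loosens many influence bounds at once, so expanding it is likely to tighten the most bounds.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class ObjectEntry:
    name: str
    count: int                  # |Oi|, assumed here to be the number of objects
    mbr: Rect
    num_affected: int           # number of site entries Oi currently affects


def importance(o: ObjectEntry) -> float:
    """|Oi| * #affected entries * area(Oi), as listed on the slide."""
    xmin, ymin, xmax, ymax = o.mbr
    return o.count * o.num_affected * (xmax - xmin) * (ymax - ymin)


def most_important(queue_o: List[ObjectEntry]) -> ObjectEntry:
    """Pick the object entry with the largest importance score."""
    return max(queue_o, key=importance)
```

For example, an entry with 100 objects, a 2 x 2 MBR and 3 affected site entries scores 1200, and loses to one with 50 objects, a 5 x 4 MBR and 2 affected entries, which scores 2000.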
Choosing an Entry to Expand • Estimate the probability of pruning Oi using some Sj in queueSOUT. • After expanding S2, O1 is likely not to affect S1. • [Figure: two panels with Q, S1, O1 and S2, where minDist(S1, O1) = 5 and minExistDNN_S2(O1) = 6; the second panel shows S'2 after expanding S2]
Experimental Setup • Data sets: • 24,493 populated places in North America • 9,203 cultural landmarks in North America • R-tree page size: 1 KB • LRU buffer: 128 disk pages • t = 4 • Baseline: the Voronoi-diagram-based solution.
Selected Experimental Results • #sites : #objects = 1 : 2.5
Selected Experimental Results • #sites : #objects = 2.5 : 1
Conclusions • We addressed a new problem: the top-t most influential sites query. • We proposed a new metric, minExistDNN, which can be used to prune the search space in NN/RNN-related problems. • We carefully designed an algorithm that systematically browses both R-trees once. • Experiments showed more than an order of magnitude improvement over the Voronoi-diagram-based solution.
Thank you! Q & A