1 / 43

On Computing Top- t Most Influential Spatial Sites

On Computing Top- t Most Influential Spatial Sites. Tian Xia , Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern University Boston, USA. Outline. Problem Definition Related Work The New Metric: minExistDNN Data Structures and Algorithm Experimental Results Conclusions.

Download Presentation

On Computing Top- t Most Influential Spatial Sites

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern University Boston, USA VLDB 2005, Trondheim, Norway

  2. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  3. Problem Definition • Given: • a set of sites S • a set of weighted objects O • a spatial region Q • an integer t. • Top-t most influential sites query: • find t sites in Q with the largest influences. • influence of a site s = total weight of objects that consider s as the nearest site. VLDB 2005, Trondheim, Norway

  4. Motivation • Which supermarket in Boston is the most influential among residential buildings? • Sites: supermarkets; • Objects: residential buildings; • Weight: # people in a building; • Query region: Boston; • Which wireless station in Boston is the most influential among mobile users? VLDB 2005, Trondheim, Norway

  5. o2 o4 s2 s3 o5 o1 s4 s1 o3 o6 Example • Suppose all objects have weight = 1, Q is the whole space, and t = 1. • The most influential site is s1, with influence = 3. VLDB 2005, Trondheim, Norway

  6. Example • Now that Q is the shadowed rectangle and t = 2. • Top-2 most influential sites: s4 and s2. o2 o4 s2 s3 o5 o1 s4 s1 o3 o6 VLDB 2005, Trondheim, Norway

  7. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  8. o2 o4 s2 s3 o5 o1 s4 s1 o3 o6 Related Work • Bi-chromatic RNN query: considers two datasets, sites and objects. • The RNNs of a site s  S are the objects that consider s as the nearest site. VLDB 2005, Trondheim, Norway

  9. o2 o4 s2 s3 o5 o1 s4 s1 o3 o6 Related Work • Solutions to the RNN query based on pre-computation [KM00, YL01]. VLDB 2005, Trondheim, Norway

  10. Related Work • Solution to RNN query based on Voronoi diagram [SRAE01]. • Compute the Voronoi cell of s: a region enclosing the locations closer to s than to any other sites. • Querying the object R-tree using the Voronoi cell. VLDB 2005, Trondheim, Norway

  11. Related Work [SRAE01] o2 o4 s2 s3 o5 o1 s4 s1 o3 o6 VLDB 2005, Trondheim, Norway

  12. Our Problem vs. RNN Query • RNN query: • A single site as an input. • Interested in the actual set of the RNNs. • Top-t most influential sites query: • A spatial region as an input. • Interested in the aggregate weight of RNNs. VLDB 2005, Trondheim, Norway

  13. Straightforward Solution 1 • For each site, pre-compute its influence. • At query time, find the sites in Q and return the t sites with max influences. • Drawback 1: Costly maintenance upon updates. • Drawback 2: binding a set of sites closely with a set of objects. VLDB 2005, Trondheim, Norway

  14. Straightforward Solution 2 • An extension of the Voronoi diagram based solution to the RNN query. • Find all sites in Q. • For each such site, find its RNNs by using the Voronoi cell, and compute its influence. • Return the t sites with max influences. VLDB 2005, Trondheim, Norway

  15. Straightforward Solution 2 • Drawback 1: All sites in Q need to be retrieved from the leaf nodes. • Drawback 2: The object R-tree and the site R-tree are browsed multiple times. • For each site in Q, browse the site R-tree to compute the Voronoi Cell. • For each such Voronoi Cell, browse the object R-tree to compute the influence. VLDB 2005, Trondheim, Norway

  16. Features of Our Solution • Systematically browse both trees once. • Pruning techniques are provided based on a new metric, minExistDNN. • No need to compute the influences for all sites in Q, or even to locate all sites in Q. VLDB 2005, Trondheim, Norway

  17. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  18. O2 O1 S1 S2 O1 only affects S1, while O2 affects both S1 and S2. Motivation • Intuitively, if some object in Oi may consider some site in Sj as an NN, OiaffectsSj. • To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi will affectSj. VLDB 2005, Trondheim, Norway

  19. minDist(O1,S2)=8 S2 O1 S1 maxDist(O1,S1)=10 maxDist – A Loose Estimation • If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2. • Why not good enough? VLDB 2005, Trondheim, Norway

  20. minDist(o1,S2) = 6 S2 o1 S1 minMaxDist(o1, S1) = 5 minMaxDist – A Tight Estimation? • An object o does not affect S2, if there exists S1 such that minMaxDist(o1, S1) < minDist(o1, S2) VLDB 2005, Trondheim, Norway

  21. minDist(O1,S2) = 6 s1 S2 O1 7 S1 6 s2 o1 minMaxDist(O1, S1) = 5 minMaxDist – A Tight Estimation? • Not true for an object MBR O1. VLDB 2005, Trondheim, Norway

  22. A Tight Estimation? • A metric m(O1, S1) should: • guarantee that, each location in O1 is within m(O1, S1) of a site in S1, • and be the smallest distance with this property. VLDB 2005, Trondheim, Norway

  23. New Metric – minExistDNNS1(O1) • Definition: minExistDNNS1(O1) = max {minMaxDist(l, S1) |  location l O1} • O1 does not affect S2, if there exists S1, s.t. minExistDNNS1(O1) < minDist(O1, S2). VLDB 2005, Trondheim, Norway

  24. O1 O1 S1 S1 Examples of minExistDNNS1(O1) • How to calculate it? VLDB 2005, Trondheim, Norway

  25. P1:b P2:c P3:a P4:d a c S1 b d P8:a P7:d P6:b P5:c Calculating minExistDNNS1(O1) • Step 1: Space partitioning Every location l in the same partition is associated with the second closest corner of S1 – the distance is minMaxDist(l, S1)! VLDB 2005, Trondheim, Norway

  26. P1:b P2:c O1 a c S1 b d Space Partitioning • O1 is divided into multiple sub-regions, one in each partition. VLDB 2005, Trondheim, Norway

  27. P1:b P2:c O1 minExistDNNS1(O1) a c S1 b d Calculating minExistDNNS1(O1) • Step 2: Choose up-to 8 locations on O1’ border and compute the minMaxDist’s to S1. • minExistDNN is the largest one! VLDB 2005, Trondheim, Norway

  28. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  29. Data Structure • Two R-trees: S of sites, O of objects. • Three queues: • queueSIN: entries of S inside Q. • queueSOUT: entries of S outside Q. • queueO: entries of O. VLDB 2005, Trondheim, Norway

  30. O3 S3 S4 O2 O1 Q S1 O4 S2 Data Structure • queueSIN: • queueO: • queueSOUT: S1 S2 O1 S3 VLDB 2005, Trondheim, Norway

  31. maxInfluence and minInfluence • For each entry Sj in queueSIN, • maxInfluence: total weight of entries in queueO that affect Sj. • minInfluence: total weight of entries in queueO that ONLY affect Sj, divided by the number of objects in Sj. • queueSIN is sorted in decreasing order of maxInfluence. VLDB 2005, Trondheim, Norway

  32. Algorithm Overview • Expand an entry from one of the three queues. • Remove the entry from the queue. • Retrieve the referenced node, and insert the (unpruned) entries into the same queue. • Update maxInfluence and minInfluence if necessary. • If top-t entries in queueSIN are sites, with minInfluences ≥ maxInfluences of all remaining entries, return. VLDB 2005, Trondheim, Norway

  33. S3 S8 O5 S9 O6 S1 S5 O1 S6 S7 Example • S6 is not affected by O1, prune S6. • O5 does not affect S5 and S7, prune O5. • queueSIN: S1 • queueO: O1 • queueSOUT: S3 • queueSIN: S5, S7 • queueO: O6 • queueSOUT: S9 Q VLDB 2005, Trondheim, Norway

  34. minExistDNNS3(O1)=4 minDist(S2, O1)=5 S3 S2 O1 S4 minExistDNNS1(O1)=7 A Pruning Case • S2 is pruned because of minExistDNNS3(O1) < minDist(S2, O1) S1 Expand S1 VLDB 2005, Trondheim, Norway

  35. Choosing an Entry to Expand • Expand top entries in queueSIN. • Expand the most important Oi. • Importance: |Oi| * #affected entries * area(Oi) • Expand Sj that contains the most important Oi. VLDB 2005, Trondheim, Norway

  36. Q Q S1 S1 minDist(S1, O1)=5 minDist(S1, O1)=5 O1 O1 minExistDNNS2(O1)=6 minExistDNNS2(O1)=6 S2 S’2 Choosing an Entry to Expand • Estimate the probability of pruning Oi using some Sj in queueSOUT. • After expanding S2, O1 is likely not to affect S1. VLDB 2005, Trondheim, Norway

  37. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  38. Experimental Setup • Data sets: • 24,493 populated places in North America • 9,203 cultural landmarks in North America • R-tree page size: 1 KB • LRU buffer: 128 disk pages. • t = 4. • Comparing to the solution using Voronoi diagram. VLDB 2005, Trondheim, Norway

  39. Selected Experimental Results #sites : #objects = 1 : 2.5 VLDB 2005, Trondheim, Norway

  40. Selected Experimental Results #sites : #objects = 2.5 : 1 VLDB 2005, Trondheim, Norway

  41. Outline • Problem Definition • Related Work • The New Metric: minExistDNN • Data Structures and Algorithm • Experimental Results • Conclusions VLDB 2005, Trondheim, Norway

  42. Conclusions • We addressed a new problem: Top-t most influential sites query. • We proposed a new metric: minExistDNN. It can be used to prune search space in NN/RNN related problems. • We carefully designed an algorithm which systematically browses both R-trees once. • Experiments showed more than an order of magnitude improvement. VLDB 2005, Trondheim, Norway

  43. Thank you! Q & A VLDB 2005, Trondheim, Norway

More Related