This research paper presents algorithms to find top-k accessible sites with the best accessibilities to amenities such as restaurants, bus stops, and zoos. The proposed algorithms outperform the baseline algorithm and reduce I/O costs. Real datasets and experiments are used to validate the effectiveness of the algorithms.
Finding the Sites with Best Accessibilities to Amenities
Qianlu Lin, Chuan Xiao, Muhammad Aamir Cheema and Wei Wang
University of New South Wales, Australia
Application
• Find an apartment that is closest to a restaurant, a bus stop and a zoo
• ‘Closeness’ is measured by a monotonic scoring function
[Figure: map of candidate apartments and nearby amenities. Legend: Apartment, Restaurant, Bus Stop, Zoo]
Problem Definition
• Given a set of query points S = {s1, s2, …, sm}
• Given n sets of data points T1, T2, …, Tn
• Find the k query points in S whose aggregated distances to T1, T2, …, Tn are smallest:
Distance(sj, {T1, T2, …, Tn}) = f(d(sj, NN(sj, T1)), d(sj, NN(sj, T2)), …, d(sj, NN(sj, Tn)))
where NN(sj, Ti) is the nearest neighbour of sj in Ti, and d(sj, NN(sj, Ti)) is the distance from sj to its nearest neighbour in Ti
* For simplicity, we use:
• d(x, y) is the Euclidean distance
• f(x1, x2, …, xn) = sum(x1, x2, …, xn)
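The definition above can be sketched directly as a brute-force computation. This is only an illustration of the scoring function, not one of the proposed algorithms; the function names and the coordinates are hypothetical.

```python
import heapq
import math

def nn_distance(site, points):
    """d(s, NN(s, T)): Euclidean distance from `site` to its nearest point in T."""
    return min(math.dist(site, p) for p in points)

def aggregated_distance(site, amenity_sets, f=sum):
    """Distance(s, {T1..Tn}) = f(d(s, NN(s, T1)), ..., d(s, NN(s, Tn)))."""
    return f(nn_distance(site, t) for t in amenity_sets)

def top_k_sites(sites, amenity_sets, k):
    """Return the k sites with the smallest aggregated distance."""
    return heapq.nsmallest(
        k, sites, key=lambda s: aggregated_distance(s, amenity_sets)
    )

# Hypothetical coordinates, chosen so the distances are easy to check by hand.
apartments = [(0.0, 0.0), (10.0, 0.0)]
restaurants = [(3.0, 4.0)]   # 5 away from (0, 0)
bus_stops = [(0.0, 2.0)]     # 2 away from (0, 0)
zoos = [(6.0, 8.0)]          # 10 away from (0, 0)

amenities = [restaurants, bus_stops, zoos]
score_first = aggregated_distance(apartments[0], amenities)  # 5 + 2 + 10
best = top_k_sites(apartments, amenities, k=1)
```

Any monotonic `f` can be passed in place of `sum`, for example `max`.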
Related Literature
• KNN – K Nearest Neighbour
  • Given a query point q and a set of data points I, find the k data points in I that are nearest to q
• RNN – Reverse Nearest Neighbour
  • Given a query point q and a set of data points I, find the data points in I that have q as their nearest neighbour
• ANN – All Nearest Neighbour
  • Given a set of query points Q and a set of data points I, find the nearest neighbour in I for each query point in Q
  • (Y. Chen, ICDE 2007) Efficient evaluation of all-nearest-neighbor queries
• To solve our problem, we can run an ANN query against each type and then select the top-k query points
Our Contribution
• We introduced the problem of finding the sites with best accessibilities to amenities
• We proposed two algorithms to find top-k accessible sites among a set of possible locations
• We performed experiments on several real datasets
Baseline
• ANN is used to retrieve the nearest neighbour of each query point for each type.
[Figure legend: Apartment, Restaurant, Bus Stop, Zoo]
Baseline - Disadvantage
• I/O time
  • Query data will be accessed n times, where n is the number of types of index objects
• Memory usage
  • Need to find the NN for all the query points
  • Need to maintain a list of nearest neighbours of each type for each query point
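The two costs listed above show up even in a toy version of the baseline. The sketch below substitutes a brute-force scan for the index-based ANN evaluation, so it only illustrates the access pattern: one pass over the query set per type, and a table of m × n nearest-neighbour distances held until the final ranking. All names are hypothetical.

```python
import heapq
import math

def all_nearest_neighbours(queries, points):
    """Brute-force stand-in for ANN: NN distance in `points` for every query."""
    return [min(math.dist(q, p) for p in points) for q in queries]

def baseline_top_k(queries, amenity_sets, k):
    # One ANN pass per type, so the query set is scanned n times.
    nn_table = [all_nearest_neighbours(queries, t) for t in amenity_sets]
    passes_over_queries = len(nn_table)
    # The table holds one NN distance per (type, query) pair: n * m entries.
    table_entries = sum(len(row) for row in nn_table)
    # Sum the per-type NN distances of each query point and rank.
    scores = [sum(column) for column in zip(*nn_table)]
    best = heapq.nsmallest(k, range(len(queries)), key=scores.__getitem__)
    return [queries[i] for i in best], passes_over_queries, table_entries

# Hypothetical data: two apartments, three amenity types.
queries = [(0.0, 0.0), (10.0, 0.0)]
amenity_sets = [[(3.0, 4.0)], [(0.0, 2.0)], [(6.0, 8.0)]]
top, passes, entries = baseline_top_k(queries, amenity_sets, k=1)
```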
Separate Tree (Index Construction)
[Figure: Index Tree with entries R1-R4, B1-B4 and Z1; Query Tree with entries Q1-Q4. Legend: Apartment, Restaurant, Bus Stop, Zoo]
Separate Tree (Query Processing)
[Figure: query node Q1 with its candidate lists Z1, R1, B1. Legend: Apartment, Restaurant, Bus Stop, Zoo]
• MIND: minimum distance from Q1 to all the nodes in the list
• MAXD: maximum distance from Q1 to all the nodes in the list
• LBD: lower bound of the summed distance
• UBD: upper bound of the summed distance
Example for Q1: MIND = {30, 0, 0}, LBD = 30; MAXD = {30, 305, 309}, UBD = 644; current_k_best = 644
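A minimal sketch of how such bounds could be derived from bounding rectangles follows. It assumes that each tree node is summarised by an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, that each type has a list of candidate nodes, and that the per-type upper bound is the smallest rectangle-to-rectangle maximum distance in that list; the slides do not spell out these details, so the helper names and the choice of bound are assumptions.

```python
import math

# A rectangle (MBR) is a tuple (xmin, ymin, xmax, ymax).

def min_dist(a, b):
    """Smallest possible distance between a point in a and a point in b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def max_dist(a, b):
    """Largest possible distance between a point in a and a point in b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)

def bounds(query_mbr, nodes_by_type):
    """Return (MIND, MAXD, LBD, UBD) for one query node, with f = sum."""
    mind = [min(min_dist(query_mbr, n) for n in nodes) for nodes in nodes_by_type]
    # Every non-empty node holds at least one point, so the nearest neighbour
    # of that type is no farther than the smallest max_dist in the list.
    maxd = [min(max_dist(query_mbr, n) for n in nodes) for nodes in nodes_by_type]
    return mind, maxd, sum(mind), sum(maxd)

# Hypothetical rectangles: a degenerate query node and one node per type.
q = (0.0, 0.0, 0.0, 0.0)
restaurant_nodes = [(3.0, 4.0, 3.0, 4.0)]
bus_stop_nodes = [(0.0, 0.0, 6.0, 8.0)]
zoo_nodes = [(-3.0, -4.0, 0.0, 0.0)]
mind, maxd, lbd, ubd = bounds(q, [restaurant_nodes, bus_stop_nodes, zoo_nodes])
```

As in the slide's example, LBD and UBD are simply the sums of the MIND and MAXD entries (30 + 0 + 0 = 30 and 30 + 305 + 309 = 644).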
Separate Tree (cont’d)
[Figure: query nodes Q1-Q4 expanded against index entries R1-R4, B1-B4 and Z1. Legend: Apartment, Restaurant, Bus Stop, Zoo]
• First entry: MIND = {30, 0, 0}, LBD = 30; MAXD = {30, 100, 60}, UBD = 190; current_k_best = 190
• Second entry: MIND = {300, 60, 30}, LBD = 360; MAXD = {300, 150, 60}, UBD = 510
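The bounds suggest a branch-and-bound style of pruning. The sketch below is one plausible reading, not a rule stated on the slides: it assumes that `current_k_best` is the k-th smallest UBD seen so far and that an entry whose LBD exceeds it can be skipped. The entry names are hypothetical; the numbers are the LBD and UBD values from the example above.

```python
import math

def prune_entries(entries, k=1):
    """entries: list of (name, LBD, UBD). Returns (current_k_best, survivors)."""
    # Each entry contains at least one query point scoring no worse than its
    # UBD, so the k-th smallest UBD is a safe threshold for the top-k result.
    ubds = sorted(ubd for _, _, ubd in entries)
    current_k_best = ubds[k - 1] if len(ubds) >= k else math.inf
    # An entry whose lower bound already exceeds the threshold cannot
    # contribute to the top-k result.
    survivors = [name for name, lbd, _ in entries if lbd <= current_k_best]
    return current_k_best, survivors

entries = [("entry A", 30, 190), ("entry B", 360, 510)]
current_k_best, survivors = prune_entries(entries, k=1)
```

With k = 1 the threshold is 190, and the second entry (LBD = 360) is discarded without being expanded.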
More Improvement?
• Data points of different types can be put into one bounding box
• This reduces I/O cost
One Tree (Index Construction)
• Each node has a bitmap that indicates what types are contained in the node
[Figure: a single Index Tree with nodes I1-I18 and a Query Tree with nodes Q1-Q4. Legend: Apartment, Restaurant, Bus Stop, Zoo]
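A per-node type bitmap can be sketched with plain integers, one bit per amenity type. The `Node` class and the bit assignments below are hypothetical and leave out the geometry; the point is only that a parent's bitmap is the bitwise OR of what lies beneath it.

```python
# One bit per amenity type (hypothetical assignment).
RESTAURANT, BUS_STOP, ZOO = 0b001, 0b010, 0b100

class Node:
    """Tree node carrying a bitmap of the point types stored beneath it."""

    def __init__(self, children=(), point_types=()):
        self.children = list(children)
        self.bitmap = 0
        for type_bit in point_types:      # leaf: types of the points it stores
            self.bitmap |= type_bit
        for child in self.children:       # inner node: union of child bitmaps
            self.bitmap |= child.bitmap

    def contains(self, type_bit):
        return bool(self.bitmap & type_bit)

leaf_a = Node(point_types=[RESTAURANT, BUS_STOP])
leaf_b = Node(point_types=[BUS_STOP])
root = Node(children=[leaf_a, leaf_b])
```

Here `root.bitmap` is `0b011`, so a search for the nearest zoo can skip this subtree without reading its children.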
One Tree (Query Processing)
[Figure: query node Q1 processed against the root index node I1. Legend: Apartment, Restaurant, Bus Stop, Zoo]
Example for Q1: MIND = {0, 0, 0}, LBD = 0; MAXD = {309, 309, 309}, UBD = 309 × 3 = 927; current_k_best = 927
One Tree (cont’d)
[Figure: query nodes Q1-Q4 expanded against index nodes I1-I6. Legend: Apartment, Restaurant, Bus Stop, Zoo]
• First entry: MIND = {0, 0, 30}, LBD = 30; MAXD = {50, 50, 30}, UBD = 130; current_k_best = 130
• Second entry: MIND = {30, 30, 140}, LBD = 100; MAXD = {50, 50, 140}, UBD = 240
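In the one-tree setting, per-type bounds can be read off a single candidate list by consulting each node's bitmap. The sketch below assumes every candidate is given as a hypothetical triple `(min_dist, max_dist, bitmap)` relative to the query node, with bit t set when the node contains points of type t; it reproduces the root-level step of the walk-through, where I1 covers all three types.

```python
import math

def one_tree_bounds(nodes, n_types):
    """nodes: list of (min_dist, max_dist, bitmap) for one query node."""
    mind, maxd = [], []
    for t in range(n_types):
        bit = 1 << t
        # Only nodes whose bitmap flags type t can hold its nearest neighbour.
        relevant = [(lo, hi) for lo, hi, bitmap in nodes if bitmap & bit]
        mind.append(min((lo for lo, _ in relevant), default=math.inf))
        # A flagged node holds at least one point of type t, so its max_dist
        # is an upper bound on the nearest-neighbour distance for that type.
        maxd.append(min((hi for _, hi in relevant), default=math.inf))
    return mind, maxd, sum(mind), sum(maxd)

# Root step: one node (I1) containing all three types, at most 309 away.
root_only = [(0, 309, 0b111)]
mind, maxd, lbd, ubd = one_tree_bounds(root_only, n_types=3)
```

Because one node can serve several types at once, the same node access contributes to several entries of MIND and MAXD.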
Experiments
• Dataset:
  • San Francisco Road Network (SF) & Road Network of North America (NA)
  • Spatial query dataset, 2 dimensions
  • Index: ~174k points in total
  • Query: ~17k points
• Algorithm:
  • Baseline
  • Separate Tree
  • One Tree
• Measurement:
  • CPU time
  • Number of leaf node accesses (I/O time)
Conclusion
• We proposed two algorithms:
  • Separate tree: creates indexes for different types of points in separate R-trees
  • One tree: indexes all the points in a single R-tree
• Both algorithms outperform the baseline algorithm, with a speed-up of up to 5.7 times
• Both algorithms also need to access the query tree only once, which reduces the I/O cost of accessing the query tree
Thank you! Questions?