
Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations

Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han. University of Illinois at Urbana-Champaign, Microsoft Research Asia. Motivation: trajectory query by locations. Huge volume of spatial trajectories.


Presentation Transcript


  1. Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations • Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han • University of Illinois at Urbana-Champaign, Microsoft Research Asia

  2. Motivation: trajectory query by locations • Huge volume of spatial trajectories • Need to search trajectories by a set of point locations

  3. k-Nearest Neighboring trajectory query • The trajectories may not pass exactly through the query locations (q1, q2, q3 in the figure) • Query the top k trajectories with the minimum aggregated distance to the given locations

  4. k-NNT query • Task definition: Given the trajectory dataset D and a set of query points Q, the k-NNT query retrieves k trajectories K from D, K = {R1, R2, …, Rk}, such that ∀ Ri ∈ K, ∀ Rj ∈ D - K, dist(Ri, Q) ≤ dist(Rj, Q). • Challenges • Huge trajectory dataset: high I/O cost to scan all the trajectories • Aggregated distance computation • Non-uniform distribution: • the trajectories are sparse/dense in different regions • the user-given query locations may be far from all the trajectories

  5. The aggregate distance in k-NNT query • Find the closest point on a trajectory to each query point (i.e., the shortest matching pairs) • Sum up the lengths of all matching pairs • R1: dist(R1, q1) = dist(p1,2, q1) = 20 m; dist(R1, q2) = dist(p1,3, q2) = 50 m; dist(R1, q3) = dist(p1,5, q3) = 15 m; dist(R1, Q) = ∑ dist(R1, qi) = 85 m • R2: dist(R2, q1) = dist(p2,3, q1) = 30 m; dist(R2, q2) = dist(p2,4, q2) = 5 m; dist(R2, q3) = dist(p2,6, q3) = 40 m; dist(R2, Q) = ∑ dist(R2, qi) = 75 m
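The aggregate distance and the k-NNT definition can be sketched as a brute-force baseline that scans every trajectory, which is exactly the high-I/O approach the framework tries to avoid. This is a minimal sketch, assuming trajectories are lists of (x, y) sample points and Euclidean distance; the function names and the toy data are illustrative and do not come from the slides.

```python
import math

def point_dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def traj_point_dist(traj, q):
    """dist(R, q): distance from q to the closest sample point of trajectory R."""
    return min(point_dist(p, q) for p in traj)

def aggregate_dist(traj, queries):
    """dist(R, Q): sum of the shortest matching-pair lengths over all query points."""
    return sum(traj_point_dist(traj, q) for q in queries)

def knnt_brute_force(dataset, queries, k):
    """Scan every trajectory and return the ids of the k with smallest dist(R, Q)."""
    ranked = sorted(dataset, key=lambda tid: aggregate_dist(dataset[tid], queries))
    return ranked[:k]

# Hypothetical toy data, not taken from the slides.
queries = [(0, 0), (10, 0), (20, 0)]
dataset = {
    "R1": [(0, 3), (10, 4), (20, 0)],
    "R2": [(0, 1), (10, 1), (20, 1)],
    "R3": [(0, 10), (20, 10)],
}
print(knnt_brute_force(dataset, queries, 2))
```

The scan touches every point of every trajectory once per query point, which is why the slides move to an index-driven candidate generation step.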

  6. Related Work: k-BCT query • The k-Best Connected Trajectory (k-BCT) query [SIGMOD2010] defines a similarity function Sim(R, Q) between a trajectory R and the query locations Q • Problem: the ranking produced by this function changes with the distance unit (inconsistent) • Example: query Q has two points, q1 and q2; dist(R1, q1) = dist(R1, q2) = 2.4 km = 1.48 miles, dist(R2, q1) = 1.5 km = 0.93 miles, dist(R2, q2) = 5 km = 3.1 miles • With unit “mile”, Sim(R1, Q) = 0.45 > Sim(R2, Q) = 0.43 • With unit “km”, Sim(R1, Q) = 0.18 < Sim(R2, Q) = 0.22
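The unit inconsistency in the example above can be checked numerically. The transcript does not preserve the k-BCT similarity formula, so the sketch below assumes the exponential form Sim(R, Q) = Σ exp(-dist(R, qi)); that assumption reproduces the orderings quoted on the slide. The constant and function names are illustrative.

```python
import math

KM_PER_MILE = 1.609344

def kbct_similarity(dists):
    """Assumed k-BCT similarity: sum of exp(-distance) over the query points."""
    return sum(math.exp(-d) for d in dists)

def knnt_distance(dists):
    """k-NNT aggregate distance: plain sum of the per-query distances."""
    return sum(dists)

# Per-query distances from the slide's example, in kilometres.
r1_km = [2.4, 2.4]
r2_km = [1.5, 5.0]
# The same distances converted to miles.
r1_mi = [d / KM_PER_MILE for d in r1_km]
r2_mi = [d / KM_PER_MILE for d in r2_km]

# Under the assumed similarity, the winner depends on the unit.
print("km:   ", kbct_similarity(r1_km), kbct_similarity(r2_km))
print("miles:", kbct_similarity(r1_mi), kbct_similarity(r2_mi))

# A sum of distances scales linearly, so its ranking is the same in any unit.
print("k-NNT:", knnt_distance(r1_km) < knnt_distance(r2_km),
      knnt_distance(r1_mi) < knnt_distance(r2_mi))
```

Because exp(-d) is not scale-invariant, multiplying every distance by a constant can reorder the results, whereas a sum of distances is only rescaled.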

  7. Advantages of k-NNT over k-BCT • The distance function of k-BCT changes over units (inconsistent) • The distance function of k-BCT is sensitive to a query • Figure: trajectories returned by k-BCT only, by both k-BCT and k-NNT, and by k-NNT only for query points q1, q2, q3

  8. Query framework: candidate-generation-and-verification • Candidate generation • Best-first search based individual heaps • Coordination by a global heap • Candidate verification • Lower-bound estimation • Efficient pruning with the global heap • Qualifier expectation-based method

  9. Candidate Generation • Given a query Q = {q1, q2, …, qm}, generate a trajectory candidate set that includes all the k-NNTs (i.e., a complete set) • Step 1: search the k-NN points of each query point using a best-first-search-based individual heap • Step 2: generate the candidate trajectories with the global heap

  10. Step 2: generating candidate trajectories • Global heap • A minimum heap that sorts matching pairs by distance • Retrieves new matching pairs from the individual heaps • Pops the matching pairs to the candidate set

  11. Example: Search based on the global heap • Figure: query points q1, q2, q3 with individual heaps h1, h2, h3, a global heap, and a candidate set • Pairs from the individual heaps: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>

  12. Example: Search based on the global heap • Candidate set: R1 (partial match) • Global heap: <p1,4, q2>, <p1,6, q3>, <p1,2, q1> • Next pair from the individual heaps: <p5,5, q2>

  13. Example: Search based on the global heap • Candidate set: R1 (partial match), with <p1,4, q2> • Global heap: <p1,6, q3>, <p1,2, q1>, <p5,5, q2> • Next pair from the individual heaps: <p4,5, q3>

  14. Example: Search based on the global heap • Candidate set: R1 (partial match), R5 (partial match), with <p1,4, q2>, <p1,6, q3> • Global heap: <p1,2, q1>, <p5,5, q2>, <p4,5, q3> • Next pair from the individual heaps: <p4,4, q2>

  15. Example: Search based on the global heap • Advantages • guarantees that all k-NNTs are included in the candidate set • generates compact candidate sets • Candidate set: R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3> (full match); R4: <p4,5, q3> (partial match); R5: <p5,5, q2> (partial match) • Global heap: <p1,2, q1>, <p4,4, q2>, <p1,5, q3> • Stop criterion: stop when there are k full-matching candidates (in this example k = 1) • Property 1: The candidate set is complete if G has popped out k full-matching candidates
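The candidate generation loop walked through in this example can be sketched as follows. This is a simplified illustration, not the authors' implementation: each individual heap is built by sorting all points in memory, standing in for the best-first search over a spatial index, and `generate_candidates` and the toy data are hypothetical names. The global heap holds one pair per query point, pops the globally shortest pair into the candidate set, refills from the same individual heap, and stops after k full matches (Property 1).

```python
import heapq
import math

def point_dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def generate_candidates(dataset, queries, k):
    """Return ({traj_id: {query_idx: dist}}, remaining global heap)."""
    # One individual heap per query point, ordered by point-to-query distance.
    individual = []
    for q in queries:
        heap = [(point_dist(p, q), tid, i)
                for tid, traj in dataset.items()
                for i, p in enumerate(traj)]
        heapq.heapify(heap)
        individual.append(heap)

    # The global heap starts with the closest pair of every individual heap.
    global_heap = []
    for qi, heap in enumerate(individual):
        if heap:
            d, tid, pi = heapq.heappop(heap)
            heapq.heappush(global_heap, (d, qi, tid, pi))

    candidates = {}
    full_matches = 0
    while global_heap and full_matches < k:
        d, qi, tid, pi = heapq.heappop(global_heap)
        matched = candidates.setdefault(tid, {})
        # The first pair popped for (trajectory, query point) is the shortest one.
        if qi not in matched:
            matched[qi] = d
            if len(matched) == len(queries):
                full_matches += 1
        # Refill the global heap from the individual heap that was just consumed.
        if individual[qi]:
            nd, ntid, npi = heapq.heappop(individual[qi])
            heapq.heappush(global_heap, (nd, qi, ntid, npi))
    return candidates, global_heap

# Hypothetical toy data, not taken from the slides.
queries = [(0, 0), (10, 0), (20, 0)]
dataset = {
    "R1": [(0, 3), (10, 4), (20, 0)],
    "R2": [(0, 1), (10, 1), (20, 1)],
    "R3": [(0, 10), (20, 10)],
}
candidates, remaining = generate_candidates(dataset, queries, k=1)
for tid, matched in candidates.items():
    kind = "full" if len(matched) == len(queries) else "partial"
    print(tid, kind, matched)
```

Pairs leave the global heap in non-decreasing distance order, so a trajectory that has not appeared yet cannot be closer to any query point than a candidate that is already fully matched.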

  16. Candidate verification • Candidate set: R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3> (full match); R4: <p4,5, q3> (partial match); R5: <p5,5, q2> (partial match) • The full-matching candidate may not be the final k-NNT • The system has to retrieve the partial-matching trajectories (R4 and R5) to compute their aggregate distances (I/O cost) • Question: can we compute a lower bound for R4 and R5 without retrieving their details? • If LB(R4) or LB(R5) > dist(R1, Q), that trajectory can be pruned directly

  17. Candidate verification • A lower bound LB(R) is computed for each partial-matching trajectory R • If LB(R) is larger than the distance of the full-matching candidate, R can be pruned directly • Candidate set: R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>, dist(R1) = 95; R4: <p4,5, q3>, LB(R4) = 114 (pruned); R5: <p5,5, q2>, LB(R5) = 90 (passed) • Global heap: <p1,2, q1>, <p4,4, q2>, <p1,5, q3>
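The pruning step can be sketched as below. The lower-bound formula itself is not preserved in the transcript, so this sketch assumes one natural construction: for each query point, use the known matched distance if the trajectory already has a pair for it, and otherwise use the distance of the pair currently in the global heap for that query point (no unseen point can be closer). The per-pair distances are made up so that the bounds equal the slide's values LB(R4) = 114 and LB(R5) = 90; the function names are hypothetical.

```python
def lower_bound(matched, global_heads):
    """Assumed lower bound on dist(R, Q) for a partial-matching trajectory.

    matched:      {query_idx: dist} for pairs already popped for this trajectory
    global_heads: {query_idx: dist} of the pair currently in the global heap
    """
    return sum(matched.get(qi, head) for qi, head in global_heads.items())

def prune_partial(partials, global_heads, threshold):
    """Split partial-matching candidates into pruned ids and ids needing retrieval.

    threshold is the aggregate distance of the k-th best full-matching candidate.
    """
    pruned, survivors = [], []
    for tid, matched in partials.items():
        if lower_bound(matched, global_heads) > threshold:
            pruned.append(tid)      # cannot beat the full match, no I/O needed
        else:
            survivors.append(tid)   # must be retrieved and verified exactly
    return pruned, survivors

# Hypothetical distances chosen to reproduce LB(R4) = 114 and LB(R5) = 90.
global_heads = {0: 38, 1: 40, 2: 39}        # heads for q1, q2, q3
partials = {"R4": {2: 36}, "R5": {1: 13}}   # R4 matched q3, R5 matched q2
pruned, survivors = prune_partial(partials, global_heads, threshold=95)
print(pruned, survivors)
```

Only the surviving trajectories have to be read from disk to compute their exact aggregate distances.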

  18. Problem of Outlier Query Location • A query location is an outlier if it is far from all the trajectories • Too many partial-matching candidates will be generated before a full-matching candidate is found

  19. Qualifier expectation based method • The system can make up the missing pairs of a partial-matching trajectory by retrieving all its points • Two key issues: • Guaranteeing the completeness of the candidate set. Property 2: If there are k made-up candidates (qualifiers) with distance smaller than the sum of the pairs in the global heap, the candidate set is complete • Which candidate should be selected to make up? The qualifier expectation measure

  20. Example of Qualifier Expectation • Candidate set: R1: <p1,1, q1>, <p1,4, q2>; R2: <p2,1, q1>, <p2,5, q2>; R4: <p4,4, q2> • Qualifier expectation: R1: 40m; R2: 30m; R4: 15m • Global heap: <p2,1, q1>, <p4,4, q2>, <p1,7, q3>, total distance sum(G) = 200m • R1 after making up its missing pair: <p1,1, q1>, <p1,4, q2>, <p1,7, q3>; dist(R1) = 160m < sum(G), so R1 is a qualifier
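The completeness test of Property 2 reduces to a comparison against sum(G), sketched below with the numbers from this example. The split of the 200m total into three per-query distances is hypothetical, as is the function name; the qualifier expectation measure itself is not reproduced here because the transcript does not define it.

```python
def find_qualifiers(made_up_dists, global_heap_dists, k):
    """Apply Property 2 to the made-up (fully retrieved) candidates.

    made_up_dists:     {traj_id: exact aggregate distance after making up pairs}
    global_heap_dists: distances of the pairs currently in the global heap
    Returns (candidate_set_is_complete, qualifier_ids).
    """
    bound = sum(global_heap_dists)  # sum(G)
    qualifiers = [tid for tid, d in made_up_dists.items() if d < bound]
    return len(qualifiers) >= k, qualifiers

# sum(G) = 200m as on the slide; the individual values are an assumption.
global_heap_dists = [60, 70, 70]
# dist(R1) = 160m after retrieving R1's points, as on the slide.
made_up_dists = {"R1": 160}

complete, qualifiers = find_qualifiers(made_up_dists, global_heap_dists, k=1)
print(complete, qualifiers)
```

With k = 1, a single qualifier is enough to stop generating candidates, even though the outlier query point has not yet produced a full match through the heaps alone.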

  21. Experiment Setup • Real dataset: collected from the Microsoft GeoLife and T-Drive projects, with over 20,000 real trajectories • Synthetic datasets with both uniform and biased distributions • Randomly generated query Q • The proposed methods are compared with Fagin’s Algorithm (FA) and the Threshold Algorithm (TA) (used in k-BCT)

  22. Evaluations on synthetic dataset (biased distribution) • GH (global heap) is faster than the baselines, with lower I/O cost • QE (global heap + qualifier expectation) is an order of magnitude faster than the others

  23. Evaluations on real dataset • When |Q| is small, the probability of an outlier location is low and GH achieves the best performance • When |Q| is larger, the probability of an outlier location is high and QE is more efficient

  24. Conclusion • k-Nearest Neighboring Trajectory (k-NNT) query • retrieve trajectories by a set of locations • Candidate-generation-and-verification framework • Generate candidate trajectories with global heap • Efficient lower-bound computation • Outlier query location: qualifier expectation-based method

  25. Released Datasets: • T-Drive taxi trajectories • GeoLife GPS trajectories • Thanks! Yu Zheng, yuzheng@microsoft.com
