
Reverse Spatial and Textual k Nearest Neighbor Search



Presentation Transcript


  1. Reverse Spatial and Textual k Nearest Neighbor Search

  2. Outline • Motivation & Problem Statement • Related Work • RSTkNN Search Strategy • Experiments • Conclusion

  3. Motivation • If we add a new shop at location Q, which existing shops will be influenced? • Influence factors: • Spatial distance: results D, F • Textual similarity (services/products offered): results F, C [Slide figure: a map of shops labeled with categories such as clothes, food, and sports]

  4. The Problem of Finding Influential Sets • Traditional query: reverse k nearest neighbor query (RkNN) • Our new query: reverse spatial and textual k nearest neighbor query (RSTkNN)

  5. Problem Statement • Spatial-textual similarity: describes how similar two objects are, based on both spatial proximity and textual similarity. • Spatial-textual similarity function: see the sketch below.
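The slide does not reproduce the function itself. A plausible form, assuming the weighting parameter α ∈ [0, 1] that later slides pass to the algorithm, is:

```latex
% Plausible shape only; the exact definition is given in the paper.
\mathrm{SimST}(o_1, o_2) \;=\; \alpha \cdot \mathrm{SimS}(o_1.loc,\, o_2.loc)
                        \;+\; (1-\alpha) \cdot \mathrm{SimT}(o_1.vct,\, o_2.vct)
```

where SimS is a normalized spatial proximity (e.g., 1 − dist/maxDist) and SimT a normalized similarity over term vectors (e.g., extended Jaccard); both concrete choices here are assumptions, not slide content.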

  6. Problem Statement (cont'd) • RSTkNN query: find the objects that have the query object as one of their k most spatial-textually similar objects.
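In set notation (the symbol kNN_ST(o), denoting the k objects most spatial-textually similar to o, is ours, not from the slides):

```latex
\mathrm{RSTkNN}(q) \;=\; \{\, o \in D \;\mid\; q \in \mathrm{kNN}_{ST}(o) \,\}
```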

  7. Outline • Motivation & Problem Statement • Related Work • RSTkNN Search Strategy • Experiments • Conclusion

  8. Related Work • Pre-computing the kNN for each object (Korn et al., SIGMOD 2000; Yang et al., ICDE 2001) • (Hyper-)Voronoi cell/plane pruning strategies (Tao et al., VLDB 2004; Wu et al., PVLDB 2008; Kriegel et al., ICDE 2009) • 60-degree-pruning method (Stanoi et al., SIGMOD 2000) • Branch and bound, based on Lp-norm metric spaces (Achtert et al., SIGMOD 2006; Achtert et al., EDBT 2009) Challenging features of RSTkNN: • Spatial-textual objects lose Euclidean geometric properties. • The text space is high-dimensional. • k and α differ from query to query.

  9. Baseline Method For each object o in the database, precompute its spatial NNs and textual NNs, then merge the two lists with the Threshold Algorithm to obtain o's spatial-textual kNNs. Given a query q with parameters k and α, let o′ be o's k-th spatial-textual NN: if q is more similar to o than o′ is, then o is a result; if q is no more similar than o′, it is not. Inefficient, since it lacks a data structure tailored to the query.
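A minimal sketch of this scan, assuming the k-th NN similarity for the query's k and α (field kthSim) has already been obtained per object via the Threshold Algorithm; all names are illustrative, not from the paper:

```cpp
#include <vector>

struct Object {
    int id;
    double kthSim;  // similarity of this object's k-th spatial-textual NN,
                    // assumed precomputed for the query's k and alpha
};

// Baseline: scan every object; o is a reverse spatial-textual kNN of q
// iff q is more similar to o than o's current k-th nearest neighbor.
std::vector<int> rstknnBaseline(const std::vector<Object>& db, const Object& q,
                                double (*simST)(const Object&, const Object&)) {
    std::vector<int> result;
    for (const Object& o : db)
        if (simST(o, q) > o.kthSim)  // q would displace o's k-th NN
            result.push_back(o.id);
    return result;
}
```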

  10. Outline • Motivation & Problem Statement • Related Work • RSTkNN Search Strategy • Experiments • Conclusion

  11. Intersection and Union R-tree (IUR-tree)
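The slide shows the index only pictorially. A rough sketch of what each node plausibly stores (field names are ours): an ordinary R-tree node augmented with an intersection text vector (per-term minimum weights over the subtree, yielding textual lower bounds) and a union text vector (per-term maximum weights, yielding textual upper bounds).

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of an IUR-tree node; field names are illustrative.
struct IURNode {
    double mbr[4];                           // bounding rectangle (x1, y1, x2, y2)
    std::map<std::string, double> interVct;  // per-term MIN weight in subtree
                                             //   -> lower-bounds text similarity
    std::map<std::string, double> unionVct;  // per-term MAX weight in subtree
                                             //   -> upper-bounds text similarity
    std::vector<IURNode*> children;          // empty for leaf entries (objects)
};
```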

  12. Main Idea of the Search Strategy • Prune an entry E of the IUR-tree when query q is no more similar than kNNL(E), a lower bound on the similarity between objects in E and their k-th most similar object. • Report an entry E as results when query q is more similar than kNNU(E), the corresponding upper bound.
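In symbols, using the entry-to-query similarity bounds MaxST and MinST from the next slide (this rendering is our reading of the rule):

```latex
\mathrm{MaxST}(E, q) \le \mathrm{kNNL}(E) \;\Longrightarrow\; \text{prune } E
\qquad\qquad
\mathrm{MinST}(E, q) > \mathrm{kNNU}(E) \;\Longrightarrow\; \text{report } E
```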

  13. How to Compute the Bounds Similarity approximations between entries E and E′: • MinST(E, E′): lower bound on the spatial-textual similarity • TightMinST(E, E′): a tighter lower bound • MaxST(E, E′): upper bound on the spatial-textual similarity [The defining formulas appear only as figures on the slide.]
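A hedged reconstruction of their general shape, assuming each bound combines a spatial term from the MBR distances with a textual term from the intersection/union vectors, weighted by α:

```latex
% Plausible shape only; exact definitions are in the paper.
\mathrm{MinST}(E, E') \;\approx\; \alpha\,\mathrm{SimS}_{\min}(E, E')
    \;+\; (1-\alpha)\,\mathrm{SimT}\!\big(E.\mathrm{interVct},\, E'.\mathrm{interVct}\big)
\qquad
\mathrm{MaxST}(E, E') \;\approx\; \alpha\,\mathrm{SimS}_{\max}(E, E')
    \;+\; (1-\alpha)\,\mathrm{SimT}\!\big(E.\mathrm{unionVct},\, E'.\mathrm{unionVct}\big)
```

with SimS_min/SimS_max induced by the farthest/closest distance between the two MBRs; TightMinST refines MinST, and its exact construction is given in the paper.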

  14. Example for Computing Bounds Entries traveled so far: N1, N2, N3. Given k = 2, compute kNNL(N1) and kNNU(N1). • Pairwise bounds: TightMinST(N1, N3) = 0.564, MinST(N1, N3) = 0.370; TightMinST(N1, N2) = 0.179, MinST(N1, N2) = 0.095; MaxST(N1, N3) = 0.432, MaxST(N1, N2) = 0.150. • Resulting bounds: kNNL(N1) = 0.370, kNNU(N1) = 0.432.

  15. Overview of the Search Algorithm • RSTkNN algorithm (skeleton sketched below): • Traverse from the IUR-tree root. • Progressively update lower and upper bounds. • Apply the search strategy: prune unrelated entries into Pruned; report entries that are certainly results into Ans; add undecided candidate objects to Cnd. • Final verification: for each object in Cnd, decide whether it is a result by tightening the candidates' bounds while expanding entries from Pruned.
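A compact, compilable skeleton of that loop. The bound computations (maxST, minST) and the progressive refinement of kNNL/kNNU (updateCLs) are passed in as black boxes, and the queue order is simplified to FIFO; all names are illustrative:

```cpp
#include <functional>
#include <queue>
#include <vector>

// IUR-tree entry: an inner node or a single object.
struct Entry {
    bool isObject = false;
    std::vector<Entry*> children;
    double knnl = 0.0;  // lower bound on the k-th NN similarity, kNNL(E)
    double knnu = 0.0;  // upper bound on the k-th NN similarity, kNNU(E)
};

// Skeleton of the RSTkNN traversal as outlined on the slide.
void rstknnSearch(Entry* root,
                  std::function<double(const Entry&)> maxST,
                  std::function<double(const Entry&)> minST,
                  std::function<void(Entry&)> updateCLs,
                  std::vector<Entry*>& ans,
                  std::vector<Entry*>& cnd,
                  std::vector<Entry*>& pruned) {
    std::queue<Entry*> U;  // the paper orders this queue; FIFO keeps the sketch short
    U.push(root);
    while (!U.empty()) {
        Entry* e = U.front(); U.pop();
        updateCLs(*e);                      // refine kNNL/kNNU of e (and siblings)
        if (maxST(*e) <= e->knnl) {         // q can never be a top-k neighbor
            pruned.push_back(e);
        } else if (minST(*e) > e->knnu) {   // q is certainly a top-k neighbor
            ans.push_back(e);               // report the whole subtree
        } else if (e->isObject) {
            cnd.push_back(e);               // undecided: keep as candidate
        } else {
            for (Entry* c : e->children) U.push(c);
        }
    }
    // Final verification (omitted): tighten candidates' bounds by
    // expanding entries kept in `pruned`, then accept or reject each.
}
```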

  16. Example: Execution of the RSTkNN Algorithm on the IUR-tree, given k = 2, α = 0.6 Tree: root N4 with child entries N1, N2, N3 and objects p1–p5 at the leaves. Step 1: initialize N4.CLs; EnQueue(U, N4). Queue U: N4 (0, 0).

  17. Step 2: DeQueue(U, N4); EnQueue(U, N2); EnQueue(U, N3); Pruned.add(N1). (Mutual effect: N1, N2, N3 update each other's bounds.) Queue U: N2 (0.21, 0.619), N3 (0.323, 0.619). Pruned: N1 (0.37, 0.432).

  18. Step 3: DeQueue(U, N3); Answer.add(p4); Candidate.add(p5). (Mutual effect between p4, p5 and N2.) Queue U: N2 (0.21, 0.619). Pruned: N1 (0.37, 0.432). Answer: p4 (0.21, 0.619). Candidate: p5 (0.374, 0.374).

  19. Step 4: DeQueue(U, N2); Answer.add(p2, p3); Pruned.add(p5). (Mutual effect among p2, p3, p4, p5.) Since U and Candidate are now empty, the algorithm ends. Pruned: N1 (0.37, 0.432), p5. Answer: p2, p3, p4. Results: p2, p3, p4.

  20. Cluster IUR-tree: CIUR-tree • IUR-tree: the texts within one index node can be very different. • CIUR-tree: an enhanced IUR-tree that incorporates textual clusters.
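A sketch of the extra per-node information this plausibly implies (the field name is ours): each node summarizes which textual clusters occur in its subtree.

```cpp
#include <map>

// Sketch: a CIUR-tree node extends the IUR-tree node with a summary of
// the textual clusters in its subtree (field name illustrative).
struct ClusterSummary {
    std::map<int, int> clusterFreq;  // cluster id -> #objects in subtree
};
```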

  21. Optimizations • Motivation: obtain tighter bounds during CIUR-tree traversal, and purify the textual description kept in index nodes. • Outlier Detection and Extraction (ODE-CIUR): extract subtrees with outlier clusters, treat the outliers specially, and compute their bounds separately. • Text-entropy based optimization (TE-CIUR): define TextEntropy to describe the distribution of text clusters in a CIUR-tree entry, and visit entries with higher TextEntropy (i.e., more textually diverse ones) first; a sketch follows.
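TextEntropy is named but not defined here; a standard Shannon-entropy reading over a node's cluster distribution (our assumption) would be:

```cpp
#include <cmath>
#include <map>

// Shannon entropy of a node's cluster distribution: higher values mean
// a more textually diverse subtree, which TE-CIUR visits first.
double textEntropy(const std::map<int, int>& clusterFreq) {
    double total = 0.0;
    for (const auto& [cluster, count] : clusterFreq) total += count;
    double h = 0.0;
    for (const auto& [cluster, count] : clusterFreq) {
        const double p = count / total;  // relative frequency of this cluster
        if (p > 0.0) h -= p * std::log2(p);
    }
    return h;
}
```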

  22. Experimental Study • Setup: Windows XP; 2.0 GHz CPU; 4 GB memory; 4 KB page size; implemented in C/C++. • Compared methods: baseline, IUR-tree, ODE-CIUR, TE-CIUR, and ODE-TE. • Datasets: ShopBranches (Shop), extended from a small real dataset; GeographicNames (GN), real data; CaliforniaDBpedia (CD), generated by combining locations in California with documents from DBpedia. • Metrics: total query time and number of page accesses.

  23. Scalability [Figures: total query time as the data size grows, in (1) log-scale and (2) linear-scale versions]

  24. Effect of k [Figures: (a) query time and (b) page accesses as k varies]

  25. Conclusion • Proposed a new query type, RSTkNN. • Presented a hybrid index, the IUR-tree. • Presented an efficient search algorithm to answer RSTkNN queries. • Presented the enhanced variant CIUR-tree and two optimizations, ODE-CIUR and TE-CIUR, to further speed up search. • Extensive experiments confirm the efficiency and scalability of the algorithms.

  26. Reverse Spatial and Textual k Nearest Neighbor Search Thanks! Q & A

  27. A Straightforward Method • Compute the RSkNN and RTkNN results separately. • Combine the two result sets to obtain the RSTkNN results. Infeasible: there is no sensible way to do the combination.
