
Nearest Neighbor Queries using R-trees


jrangel


  1. Nearest Neighbor Queries using R-trees
     Based on notes by Yufei Tao

  2. Nearest Neighbor Search
     • Find the object nearest to a query point q.
     • E.g., find the gas station nearest to the red point.
     • k nearest neighbors (kNN): find the k objects nearest to q.
     • E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}.
     CS4482 CityU of HK

  3. Nearest Neighbor Processing
     • The R-tree can accelerate NN search, too.
     • Concept: mindist(q, E), the minimum distance between a point q and a rectangle E (0 if q lies inside E).
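
Below is a minimal Python sketch of mindist for the 2-D case. It assumes Euclidean distance and stores a rectangle as a hypothetical tuple (xmin, ymin, xmax, ymax); neither choice comes from the notes. On each axis it measures how far q lies outside the rectangle's extent, and then it combines the two gaps.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def mindist(q: Point, rect: Rect) -> float:
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    Returns 0.0 when q lies inside the rectangle or on its boundary.
    """
    qx, qy = q
    xmin, ymin, xmax, ymax = rect
    # Gap along each axis: positive only when q is outside that axis's range
    dx = max(xmin - qx, 0.0, qx - xmax)
    dy = max(ymin - qy, 0.0, qy - ymax)
    return math.hypot(dx, dy)


# q lies to the lower left: the closest point of the rectangle is its corner (1, 1)
print(mindist((0.0, 0.0), (1.0, 1.0, 2.0, 2.0)))  # ≈ 1.414
# q lies inside the rectangle
print(mindist((1.5, 1.5), (1.0, 1.0, 2.0, 2.0)))  # 0.0
# q lies to the right: only the x-gap counts
print(mindist((5.0, 1.5), (1.0, 1.0, 2.0, 2.0)))  # 3.0
```

A data point can be stored as the degenerate rectangle (x, y, x, y). In that case mindist reduces to the ordinary point-to-point distance, so one function can serve both leaf and non-leaf entries.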

  4. Depth-first NN Algorithm
     • First, load the root and compute the mindist from each entry to the query.
     • Visit the child node of the entry with the smallest mindist.
     • In this case: E6.

  5. Depth-first NN Algorithm (cont.)
     • Do this recursively at the next level: in the child node of E6, compute the mindist from every entry to the query.
     • Visit the child node of the entry having the smallest mindist.
     • In this case, E1 and E2 have the same mindist, so the tie is broken arbitrarily; say, E1 first.
     • Among all the points in the child node of E1, find the closest one, a (our current result).

  6. Depth-first NN Algorithm (cont.)
     • Then backtrack to the child node of E6, where the entry with the next smallest mindist is E2.
     • However, its mindist √5 equals the distance from q to a.
     • So no point in E2 can be closer to q than a.
     • E3 is pruned for the same reason.
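
The pruning test above relies on one fact: mindist(q, E) is a lower bound on the distance from q to every point inside E. The sketch below assumes the same hypothetical (xmin, ymin, xmax, ymax) tuples and Euclidean distance. It states the test as a helper named can_prune (our own name) and spot-checks the lower bound on random points. It also prunes on ties (>=), which is how E2 is treated here.

```python
import math
import random


def mindist(q, rect):
    """Minimum Euclidean distance from q to rect = (xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def can_prune(q, rect, best_dist):
    """True if no point inside rect can be strictly closer to q than best_dist.

    Ties (mindist == best_dist) are pruned too: a point at exactly
    best_dist cannot improve on the current result.
    """
    return mindist(q, rect) >= best_dist


# Spot-check the lower-bound property on random points inside one rectangle
q = (0.0, 0.0)
rect = (1.0, 2.0, 4.0, 5.0)
lower_bound = mindist(q, rect)  # distance to the corner (1, 2)
for _ in range(10_000):
    p = (random.uniform(rect[0], rect[2]), random.uniform(rect[1], rect[3]))
    assert math.dist(q, p) >= lower_bound - 1e-12

print(can_prune(q, rect, 2.0))  # True: every point in rect is at least about 2.236 away
print(can_prune(q, rect, 3.0))  # False: rect may hold a point closer than 3.0
```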

  7. Depth-first NN Algorithm (cont.)
     • We now backtrack to the root, where the entry with the next smallest mindist is E7.
     • Its mindist √2 is smaller than the distance √5 from q to a.
     • Thus its subtree may contain a point closer to q than a, so we have to visit it.
     • At the child node of E7, compute the mindist of all entries to q.
     • E4 will be descended next.

  8. Depth-first NN Algorithm (cont.)
     • In the child node of E4, we find a point h that is closer to q than a.
     • So h becomes our new nearest neighbor.
     • We backtrack to the child node of E7, where the entry with the next smallest mindist is E5.
     • E5's mindist √13 is larger than the distance √2 from q to h, so we prune its subtree.
     • The algorithm backtracks to the root and terminates.
     • Nodes visited (in this order): the root, and the child nodes of E6, E1, E7, E4.
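
The depth-first walk in slides 4 to 8 can be sketched as a recursive branch-and-bound search. This is an illustrative sketch, not the notes' code. Entry, Node, point, branch and df_nn are hypothetical names, each node is an in-memory list instead of a disk page, and the tree's coordinates are invented. The entry names echo the slides, but the geometry does not reproduce the slides' figure. Each node's entries are scanned in ascending mindist order, so the first entry that fails the test ends the scan of that node.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    name: str
    rect: Rect                      # MBR; a data point is a degenerate rectangle
    child: Optional["Node"] = None  # None for leaf (data point) entries


@dataclass
class Node:
    entries: List[Entry]


def mindist(q, rect):
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def point(name, x, y):
    return Entry(name, (x, y, x, y))


def branch(name, entries):
    # The parent entry's MBR is the bounding box of the child node's entries
    rs = [e.rect for e in entries]
    mbr = (min(r[0] for r in rs), min(r[1] for r in rs),
           max(r[2] for r in rs), max(r[3] for r in rs))
    return Entry(name, mbr, Node(entries))


def df_nn(node, q, best, visited, label):
    """Depth-first 1NN search below node; best is (distance, name)."""
    visited.append(label)  # record that this node was read
    # Scan this node's entries in ascending mindist order
    for e in sorted(node.entries, key=lambda x: mindist(q, x.rect)):
        d = mindist(q, e.rect)
        if d >= best[0]:
            break  # this entry and all later ones cannot beat the current NN
        if e.child is None:
            best = (d, e.name)  # leaf entry: d is its exact distance to q
        else:
            best = df_nn(e.child, q, best, visited, e.name)  # recurse, then backtrack
    return best


# Hypothetical tree: names echo the slides, coordinates are invented
E1 = branch("E1", [point("a", 3, 0), point("b", 4, 1)])
E2 = branch("E2", [point("f", 0, 5), point("g", 1, 6)])
E4 = branch("E4", [point("h", -2, -1), point("i", -3, -2)])
E5 = branch("E5", [point("j", -6, -6), point("k", -7, -5)])
root = Node([branch("E6", [E1, E2]), branch("E7", [E4, E5])])

visited = []
dist, name = df_nn(root, (0.0, 0.0), (math.inf, None), visited, "root")
print(name, round(dist, 3))  # h 2.236
print(visited)               # ['root', 'E6', 'E1', 'E7', 'E4']
```

On this made-up tree, the search reads the root and the child nodes of E6, E1, E7 and E4, and it returns h. Reading E1's node only yields a temporary result (a) that is later beaten.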

  9. Another Depth-first Example: 2NN
     • Difference: an entry is pruned by comparing its mindist with the distance from q to our current 2nd NN.
     • Root => child node of E6 => child node of E1 => find {a, b} here.
     • Backtrack to the child node of E6 => child node of E2 (its mindist < dist(q, b)) => update our result to {a, f}.
     • Backtrack to the child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}.
     • Backtrack to the child node of E7 => prune E5 => backtrack to the root => end.
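
For k > 1, the same recursion works if the pruning threshold becomes the distance to the current k-th candidate. The sketch below assumes a bounded max-heap of the k best candidates. Python's heapq is a min-heap, so distances are stored negated. The tree is a small hypothetical one with invented coordinates (entry names echo the slides, geometry does not), and df_knn, knn and the other helpers are our own names.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    name: str
    rect: Rect                      # MBR; a data point is a degenerate rectangle
    child: Optional["Node"] = None  # None for leaf (data point) entries


@dataclass
class Node:
    entries: List[Entry]


def mindist(q, rect):
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def point(name, x, y):
    return Entry(name, (x, y, x, y))


def branch(name, entries):
    rs = [e.rect for e in entries]
    mbr = (min(r[0] for r in rs), min(r[1] for r in rs),
           max(r[2] for r in rs), max(r[3] for r in rs))
    return Entry(name, mbr, Node(entries))


def df_knn(node, q, k, heap, visited, label):
    """Depth-first kNN; heap holds (-distance, name) for the k best so far."""
    visited.append(label)
    for e in sorted(node.entries, key=lambda x: mindist(q, x.rect)):
        d = mindist(q, e.rect)
        # Once k candidates exist, prune against the current k-th distance
        if len(heap) == k and d >= -heap[0][0]:
            break
        if e.child is None:
            heapq.heappush(heap, (-d, e.name))
            if len(heap) > k:
                heapq.heappop(heap)  # evict the farthest candidate
        else:
            df_knn(e.child, q, k, heap, visited, e.name)


def knn(root, q, k):
    heap, visited = [], []
    df_knn(root, q, k, heap, visited, "root")
    return sorted((-nd, name) for nd, name in heap), visited


# Hypothetical tree: names echo the slides, coordinates are invented
E1 = branch("E1", [point("a", 3, 0), point("b", 4, 1)])
E2 = branch("E2", [point("f", 0, 5), point("g", 1, 6)])
E4 = branch("E4", [point("h", -2, -1), point("i", -3, -2)])
E5 = branch("E5", [point("j", -6, -6), point("k", -7, -5)])
root = Node([branch("E6", [E1, E2]), branch("E7", [E4, E5])])

result, visited = knn(root, (0.0, 0.0), 2)
print([(name, round(d, 3)) for d, name in result])  # [('h', 2.236), ('a', 3.0)]
print(visited)  # ['root', 'E6', 'E1', 'E7', 'E4']
```

In this example, the 2NN search skips E2's node. Its mindist 5 already exceeds the running 2nd-NN distance √17 ≈ 4.12 (point b). This differs from the slide's trace only because the coordinates differ.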

  10. Optimal Performance of kNN Search
     • What is the best performance any algorithm can achieve for a kNN query?
     • Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN.
     • Every node whose MBR intersects the vicinity circle must be visited.
     • E.g., for 1NN, the child node of E6 must be accessed by any algorithm.
     • Although its subtree contains no result, this cannot be verified without visiting it!
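
For a small example, the optimal node set can be computed directly: find r_k, then collect every node whose MBR meets the disk of radius r_k around q. This is a sketch for illustration only, and it makes three assumptions. It finds r_k by brute force over all points, which a real query cannot do in advance. It treats a rectangle that touches the circle's boundary as intersecting it (closed disk, <=). It uses a hypothetical tree with invented coordinates and helper names of our own. Descending only into intersecting entries is enough, because a parent's MBR contains its children's MBRs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    name: str
    rect: Rect
    child: Optional["Node"] = None  # None for leaf (data point) entries


@dataclass
class Node:
    entries: List[Entry]


def mindist(q, rect):
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def point(name, x, y):
    return Entry(name, (x, y, x, y))


def branch(name, entries):
    rs = [e.rect for e in entries]
    mbr = (min(r[0] for r in rs), min(r[1] for r in rs),
           max(r[2] for r in rs), max(r[3] for r in rs))
    return Entry(name, mbr, Node(entries))


def all_points(node):
    """Yield every leaf (data point) entry below node."""
    for e in node.entries:
        if e.child is None:
            yield e
        else:
            yield from all_points(e.child)


def must_visit(root, q, k):
    """Return r_k and the nodes whose MBR meets the closed vicinity circle."""
    # Brute-force r_k (illustration only; a real query does not know it upfront)
    r_k = sorted(mindist(q, e.rect) for e in all_points(root))[k - 1]
    names = ["root"]  # every algorithm reads the root

    def walk(node):
        for e in node.entries:
            # A child node must be read if its MBR touches the circle
            if e.child is not None and mindist(q, e.rect) <= r_k:
                names.append(e.name)
                walk(e.child)

    walk(root)
    return r_k, names


# Hypothetical tree: names echo the slides, coordinates are invented
E1 = branch("E1", [point("a", 3, 0), point("b", 4, 1)])
E2 = branch("E2", [point("f", 0, 5), point("g", 1, 6)])
E4 = branch("E4", [point("h", -2, -1), point("i", -3, -2)])
E5 = branch("E5", [point("j", -6, -6), point("k", -7, -5)])
root = Node([branch("E6", [E1, E2]), branch("E7", [E4, E5])])

for k in (1, 2):
    r_k, nodes = must_visit(root, (0.0, 0.0), k)
    print(k, round(r_k, 3), nodes)
# 1 2.236 ['root', 'E6', 'E7', 'E4']
# 2 3.0 ['root', 'E6', 'E1', 'E7', 'E4']
```

For k = 1, the set omits E1's node. On this tree, any search that reads that node for a 1NN query does more work than necessary.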

  11. Best-first Algorithm (optimal algorithm)
     • BF keeps in memory all the (leaf and non-leaf) entries seen so far, sorted in ascending order of their mindist (in practice, a min-heap keyed on mindist).
     • Each step processes the entry in memory with the smallest mindist.

  12. Best-first Algorithm (cont.)
     • Insert all the entries in the child node of E6 into the sorted list.
     • E7 is the next entry to be processed.

  13. Best-first Algorithm (cont.)
     • Insert all the entries in the child node of E7 into the sorted list.
     • The next entry to be processed is E4.

  14. Best-first Algorithm (cont.)
     • Insert all the entries in the child node of E4 into the sorted list.
     • The next entry to be processed is h, a leaf entry.
     • This is the first NN of q.

  15. Best-first Algorithm: 2NN
     • Suppose we want 2 NNs; then the algorithm continues.
     • Report h as the 1st NN and remove it from the heap (the sorted list).
     • The next entry to be processed is E1.

  16. Best-first Algorithm: 2NN (cont.)
     • Visit the child node of E1 and insert all its entries into the sorted list.
     • The next entry is a, which is a leaf entry.
     • a is the 2nd NN, and the algorithm terminates.
     • Whenever we process a leaf entry in memory, it is guaranteed to be the next NN: every entry still in memory has a mindist at least as large, and mindist lower-bounds the distance to every point beneath that entry.
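
Best-first search maps naturally onto a priority queue. The sketch below uses Python's heapq as the sorted list and writes the search as a generator, so each next() call returns the next NN. It assumes hypothetical Entry/Node helpers, invented coordinates and in-memory nodes, and best_first is our own name. A running counter breaks ties between equal mindist values, so the entries themselves are never compared.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    name: str
    rect: Rect
    child: Optional["Node"] = None  # None for leaf (data point) entries


@dataclass
class Node:
    entries: List[Entry]


def mindist(q, rect):
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def point(name, x, y):
    return Entry(name, (x, y, x, y))


def branch(name, entries):
    rs = [e.rect for e in entries]
    mbr = (min(r[0] for r in rs), min(r[1] for r in rs),
           max(r[2] for r in rs), max(r[3] for r in rs))
    return Entry(name, mbr, Node(entries))


def best_first(root, q, visited):
    """Yield (distance, name) of data points in ascending distance from q."""
    tie = itertools.count()  # tie-breaker so Entry objects are never compared
    visited.append("root")
    heap = [(mindist(q, e.rect), next(tie), e) for e in root.entries]
    heapq.heapify(heap)
    while heap:
        d, _, e = heapq.heappop(heap)  # entry with the smallest mindist
        if e.child is None:
            yield d, e.name  # a leaf entry at the top is the next NN
        else:
            visited.append(e.name)  # read the child node, enqueue its entries
            for c in e.child.entries:
                heapq.heappush(heap, (mindist(q, c.rect), next(tie), c))


# Hypothetical tree: names echo the slides, coordinates are invented
E1 = branch("E1", [point("a", 3, 0), point("b", 4, 1)])
E2 = branch("E2", [point("f", 0, 5), point("g", 1, 6)])
E4 = branch("E4", [point("h", -2, -1), point("i", -3, -2)])
E5 = branch("E5", [point("j", -6, -6), point("k", -7, -5)])
root = Node([branch("E6", [E1, E2]), branch("E7", [E4, E5])])

visited = []
search = best_first(root, (0.0, 0.0), visited)
d, name = next(search)
print(name, visited)  # h ['root', 'E6', 'E7', 'E4']
d, name = next(search)
print(name, visited)  # a ['root', 'E6', 'E7', 'E4', 'E1']
```

The generator resumes where it stopped, so asking for a 2nd NN reads only one more node (E1's), mirroring slides 15 and 16. `itertools.islice(best_first(root, q, []), k)` yields the k nearest neighbors.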

  17. Best-first = Best Performance
     • To find the 1st NN, we visited the root and the child nodes of E6, E7, E4.
     • To find the 2nd, in addition to the nodes above, we also visited the child node of E1.
     • Both cases are optimal.
     • It can be proved that BF visits the nodes of the tree in ascending order of their mindist to the query point.

  18. Retrospect: The Rationale Behind
     • What is the main reasoning behind the depth-first and best-first algorithms?
     • Use mindist to bound the quality of the best point in a subtree: no point in the subtree can be closer to q than its mindist.
     • If a node's mindist is already no smaller than the distance from q to our current (k-th) result, prune it.
