Nearest Neighbours Search using the PM-tree

Tomáš Skopal (1), Jaroslav Pokorný (1), Václav Snášel (2). (1) Charles University in Prague, Department of Software Engineering, Czech Republic. (2) VSB - Technical University of Ostrava, Department of Computer Science, Czech Republic.


Presentation Transcript


  1. Nearest Neighbours Search using the PM-tree
     Tomáš Skopal (1), Jaroslav Pokorný (1), Václav Snášel (2)
     (1) Charles University in Prague, Department of Software Engineering, Czech Republic
     (2) VSB - Technical University of Ostrava, Department of Computer Science, Czech Republic
     DASFAA 2005, Beijing

  2. Presentation Outline
     • Similarity search in Metric Spaces
     • M-tree
       • the structure
       • k-NN search
     • PM-tree (an extension of M-tree)
       • motivation
       • the structure
       • k-NN search
     • Experimental Results

  3. Similarity search in Metric Spaces
     Similarity search
     • methods for content-based retrieval in multimedia databases
     • the similarity measure is often modelled by a metric $d$ (satisfying the triangle inequality, symmetry, reflexivity, and non-negativity)
     • similarity queries (query by example) are realized as metric queries
       • range query $(Q, r_Q)$, specified by a query object $Q$ and a query radius $r_Q$
       • k-NN query $(Q, k)$, specified by a query object $Q$ and the number of nearest neighbours $k$
     Metric Access Methods (MAMs)
     • designed to search metric datasets while keeping the search costs minimal
     • search costs = number of distance computations + I/O costs
     • only distances between objects are used for indexing (the structure of the object representation is not used)
     • many MAMs are not suitable for similarity search in large datasets
       • they are either static methods or incur high I/O search costs
     • M-tree and (recently) D-index are the only suitable candidates so far
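The two query types can be illustrated with a plain sequential scan, which is the baseline a MAM tries to beat. This is only a sketch: the function names, the use of the Euclidean metric via `math.dist`, and the toy dataset are assumptions made for illustration and do not come from the slides.

```python
import heapq
import math

def range_query(data, d, q, r_q):
    """Return all objects within distance r_q of the query object q."""
    return [o for o in data if d(q, o) <= r_q]

def knn_query(data, d, q, k):
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, data, key=lambda o: d(q, o))

# Toy 2D dataset under the Euclidean metric (an assumed example).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (0.0, 0.5)

in_range = range_query(data, math.dist, q, 1.2)
nearest = knn_query(data, math.dist, q, 3)
```

Both functions compute one distance per indexed object, so their cost grows linearly with the dataset; the index structures on the following slides aim to avoid most of those computations.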

  4. M-tree (metric tree)
     • dynamic, balanced, and paged tree structure (like e.g. the B+-tree or R-tree)
     • the leaves are clusters of indexed objects $O_j$ (ground objects)
     • routing entries in the inner nodes represent hyper-spherical metric regions $(O_i, r_{O_i})$, recursively bounding the object clusters in the leaves
     • the triangle inequality allows irrelevant M-tree branches (i.e. metric regions) to be discarded during query evaluation
     [Figure: range query Q in a 2D Euclidean space]
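The discarding step follows from the triangle inequality: for any object $O$ inside the region $(O_i, r_{O_i})$, $d(Q, O) \ge d(Q, O_i) - r_{O_i}$, so the whole branch can be skipped when that lower bound exceeds $r_Q$. A minimal sketch of the test, with a hypothetical function name and example values:

```python
import math

def sphere_can_be_discarded(d_q_oi, r_oi, r_q):
    """True if no object in the region (O_i, r_Oi) can lie within r_q of Q.

    d_q_oi is the already computed distance d(Q, O_i).
    """
    return d_q_oi - r_oi > r_q

# Assumed example: query at the origin, routing object at (5, 0) with radius 2.
d_q_oi = math.dist((0.0, 0.0), (5.0, 0.0))
skip_small_query = sphere_can_be_discarded(d_q_oi, 2.0, 2.0)  # lower bound 3 > 2
skip_large_query = sphere_can_be_discarded(d_q_oi, 2.0, 3.0)  # lower bound 3 is not > 3
```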

  5. k-NN search in the M-tree
     • branch-and-bound algorithm (similar to that of the R-tree)
     • a modification of the range query algorithm in which the query radius $r_Q$ is dynamic
       • $r_Q$ decreases from infinity to the distance to the k-th neighbour
     • two structures are used: a priority queue PR and a sorted array NN
     • PR: stores requests for nodes not yet filtered from the search
       • a request has the form [routing entry to a node $N$, $d_{\min}(N)$], where $d_{\min}(N)$ is the lower-bound distance from $Q$ to all possible objects in $N$, i.e.
         $d_{\min}(N) = \max\{0,\ d(Q, O_i) - r_{O_i}\}$
         where $(O_i, r_{O_i})$ is the region of $N$'s routing entry
       • requests in PR are sorted by $d_{\min}(N)$
     • NN: stores k candidate objects (or distance upper bounds)
       • at the end of the algorithm run, NN contains the result, i.e. the k nearest neighbours
       • an entry has the form [candidate object $O_i$, $d(Q, O_i)$] or [-, $d_{\max}(N)$], where $d_{\max}(N)$ is the upper-bound distance from $Q$ to all possible objects in $N$, i.e.
         $d_{\max}(N) = d(Q, O_i) + r_{O_i}$
     • PR stores only requests with $d_{\min}(\cdot) < d_{\max}(\cdot)$; other requests are removed from PR
       • i.e. requests that do not overlap the dynamic query region $(Q, r_Q)$ are removed
     Query processing
     • requests are taken from PR in order of increasing $d_{\min}$ → a node $N$ is retrieved, and the PR and NN structures are updated by the routing/ground entries of $N$
     • PR is initialized to ([root, ∞]); NN is initialized to k entries ([-, ∞], [-, ∞], ...)
     • optimal in I/O costs (the same I/O costs as the range query $(Q, d(Q, NN[k]))$, whose radius is the distance to the k-th neighbour)
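The branch-and-bound idea can be sketched on a small in-memory tree. This is a simplified illustration, not the authors' implementation: the node layout (tuples tagged "leaf" or "inner") is hypothetical, the root request starts with a lower bound of 0, and NN holds only real candidate objects, so the [-, dmax(N)] placeholder entries described above are deliberately omitted.

```python
import heapq
import math
from itertools import count

# Hypothetical node layout: a leaf is ("leaf", [objects]);
# an inner node is ("inner", [(O_i, r_Oi, child_node), ...]).

def knn_search(root, q, k, d=math.dist):
    """Return (list of (distance, object), number of nodes accessed)."""
    nn = []                         # candidates sorted by distance, at most k
    r_q = math.inf                  # dynamic query radius
    tie = count()                   # tie-breaker so the heap never compares nodes
    pr = [(0.0, next(tie), root)]   # priority queue of (dmin, tie, node)
    accessed = 0
    while pr:
        dmin, _, (kind, entries) = heapq.heappop(pr)
        if dmin > r_q:
            break                   # all remaining requests are at least this far
        accessed += 1
        if kind == "leaf":
            nn.extend((d(q, obj), obj) for obj in entries)
            nn.sort(key=lambda e: e[0])
            del nn[k:]
            if len(nn) == k:
                r_q = nn[-1][0]     # shrink the radius to the k-th candidate
        else:
            for center, radius, child in entries:
                lower = max(0.0, d(q, center) - radius)
                if lower <= r_q:    # region may still overlap (Q, r_Q)
                    heapq.heappush(pr, (lower, next(tie), child))
    return nn, accessed

# Assumed toy tree with three leaves under one root.
leaf_a = ("leaf", [(0.0, 0.0), (1.0, 0.0)])
leaf_b = ("leaf", [(5.0, 5.0), (6.0, 5.0)])
leaf_c = ("leaf", [(0.0, 2.0), (0.0, 3.0)])
root = ("inner", [((0.5, 0.0), 0.5, leaf_a),
                  ((5.5, 5.0), 0.5, leaf_b),
                  ((0.0, 2.5), 0.5, leaf_c)])

q = (0.0, 0.4)
result, accessed = knn_search(root, q, 2)
```

In this toy run only the root and `leaf_a` are read: once two candidates are known, the lower bounds of the other two leaves already exceed the current radius.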

  6. k-NN search in M-tree: example (k=2)
     [Figure: initial state with $r_Q = \infty$; steps "read root" and "read node(II.)"; labelled bounds dmin(I.), dmax(I.), dmin(II.) = 0, dmax(II.)]

  7. k-NN search in M-tree: example (k=2)
     [Figure: step "read node(D)"; labelled bounds dmin(C), dmax(C), dmin(D), dmax(D), dmax(O5), dmax(O6)]

  8. k-NN search in M-tree: example (k=2)
     [Figure: steps "read node(I.)" and "read node(B)"; labelled bounds dmin(B), dmax(O4)]
     • 5 nodes accessed, the same nodes as accessed by the range query $(Q, d(Q, O_5))$

  9. PM-tree motivation
     • metric regions in the M-tree are unnecessarily large
       → indexing of large portions of empty space (the "dead" space)
       → higher probability of intersection with the query region
       → less efficient search
     • reducing the "volume" of a metric region should lead to more effective discarding of irrelevant subtrees
     • the question is how to specify a compact metric region that bounds all the objects more "tightly"
       → generalization of the M-tree to other metric region shapes

  10. PM-tree region
      • utilization of global pivots (inspired by LAESA-like methods)
      • given a fixed set of $p$ global pivots $P_i$ (selected from (a part of) the dataset)
      • $p$ hyper-ring regions $(P_i, HR[i])$ are defined for each routing entry
        • HR is an array of $p$ intervals $\langle HR[i].\min,\ HR[i].\max \rangle$
        • each interval $HR[i]$ bounds the distances of the objects to the respective pivot $P_i$
      • PM-tree region = M-tree region + HR array (the pivots $P_i$ are shared by all PM-tree regions)
      • the intersection of the hyper-sphere and the hyper-rings forms a smaller region bounding all the objects in the leaves
      • the more pivots, the more tightly bounded the region
      • the PM-tree is built the same way as the M-tree, i.e. the hyper-rings only "cut off" parts of the M-tree sphere
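The HR array of a region can be sketched as one (min, max) interval per pivot over the objects the region covers. The function name `build_hr`, the tuple representation of an interval, and the sample points are assumptions for illustration.

```python
import math

def build_hr(objects, pivots, d=math.dist):
    """Return one (min, max) interval per pivot, bounding d(P_i, O) over the objects."""
    hr = []
    for pivot in pivots:
        dists = [d(pivot, obj) for obj in objects]
        hr.append((min(dists), max(dists)))
    return hr

# Assumed example: two objects and two global pivots in the plane.
objects = [(3.0, 0.0), (4.0, 0.0)]
pivots = [(0.0, 0.0), (0.0, 3.0)]
hr = build_hr(objects, pivots)
```

Each interval describes a ring around its pivot; together with the hyper-sphere they delimit the smaller region described on this slide.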

  11. PM-tree, query processing
      • the distances $d(Q, P_i)$ for all $i \le p$ must be computed prior to processing a query
      • a metric region $(O_i, r_{O_i}, HR)$ is relevant to (intersected by) a range query $(Q, r_Q)$ exactly when all the hyper-rings and the hyper-sphere overlap the range query region
        → the more hyper-rings, the lower the probability of intersection with the query
        → no additional distance computations are needed for the intersection test
      [Figure: the same query Q against an M-tree region and against a PM-tree region]
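A sketch of the relevance test for a range query, assuming the distances from the query to the pivots and to the routing object are already known. The function name and argument layout are hypothetical; intervals are (min, max) tuples.

```python
def region_is_relevant(d_q_oi, r_oi, hr, d_q_pivots, r_q):
    """True if the hyper-sphere and every hyper-ring overlap the query ball (Q, r_q).

    d_q_oi     -- distance d(Q, O_i) to the routing object
    hr         -- list of (min, max) intervals, one per pivot
    d_q_pivots -- precomputed distances d(Q, P_t), one per pivot
    """
    if d_q_oi > r_oi + r_q:
        return False  # the hyper-sphere itself misses the query
    return all(d_qp - r_q <= hi and d_qp + r_q >= lo
               for d_qp, (lo, hi) in zip(d_q_pivots, hr))

# Assumed example: the sphere overlaps the query, but the ring <6, 8> does not,
# because every object is at least 6 from the pivot while Q reaches only to 5.
pruned = region_is_relevant(3.0, 2.5, [(6.0, 8.0)], [4.0], 1.0)
kept = region_is_relevant(3.0, 2.5, [(4.5, 8.0)], [4.0], 1.0)
```

The test uses only stored values and the precomputed $d(Q, P_t)$, which matches the remark that no additional distance computations are needed.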

  12. k-NN search in the PM-tree
      Three modifications of the M-tree's k-NN algorithm:
      • a different intersection test between the query region $(Q, r_Q)$ and a PM-tree region $(O_i, r_{O_i}, HR)$:
        $\bigwedge_{t=1}^{p} \big( d(P_t, Q) - r_Q \le HR[t].\max \ \wedge\ d(P_t, Q) + r_Q \ge HR[t].\min \big)$
      • a different $d_{\min}$ construction (a possible distance increase to the farthest hyper-ring):
        $d_{\min}(N) = \max\{0,\ d(Q, O_i) - r_{O_i},\ HR_{farthest}\}$
        $HR_{farthest} = \max_{t=1..p} \{ d(P_t, Q) - HR[t].\max,\ HR[t].\min - d(P_t, Q) \}$
      • a different $d_{\max}$ construction (a possible distance decrease to the farthest object in the nearest hyper-ring):
        $d_{\max}(N) = \min\{ d(Q, O_i) + r_{O_i},\ HR_{nearest} \}$
        $HR_{nearest} = \min_{t=1..p} \{ d(P_t, Q) + HR[t].\max \}$
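The two bounds can be sketched as follows. The code combines the upper bounds with a minimum and uses $d(P_t, Q) + HR[t].\max$ for the ring term; this reading is an assumption derived from the triangle inequality ($d(Q, O) \le d(Q, P_t) + d(P_t, O)$) and from the slide's remark that the hyper-rings can only decrease $d_{\max}$. Function names and sample values are hypothetical.

```python
def pm_dmin(d_q_oi, r_oi, hr, d_q_pivots):
    """Lower bound on the distance from Q to any object in the PM-tree region."""
    hr_farthest = max(max(d_qp - hi, lo - d_qp)
                      for d_qp, (lo, hi) in zip(d_q_pivots, hr))
    return max(0.0, d_q_oi - r_oi, hr_farthest)

def pm_dmax(d_q_oi, r_oi, hr, d_q_pivots):
    """Upper bound on the distance from Q to any object in the PM-tree region."""
    hr_nearest = min(d_qp + hi for d_qp, (lo, hi) in zip(d_q_pivots, hr))
    return min(d_q_oi + r_oi, hr_nearest)

# Assumed example: d(Q, O_i) = 3, r_Oi = 2.5, one pivot with d(Q, P_1) = 4.
lower_far_ring = pm_dmin(3.0, 2.5, [(6.0, 8.0)], [4.0])   # ring raises 0.5 to 2.0
upper_far_ring = pm_dmax(3.0, 2.5, [(6.0, 8.0)], [4.0])   # sphere bound 5.5 wins
upper_near_ring = pm_dmax(3.0, 2.5, [(0.5, 1.0)], [1.0])  # ring lowers 5.5 to 2.0
```

With an empty pivot set the ring terms would be undefined, so this sketch assumes at least one pivot.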

  13. k-NN search in PM-tree: example (k=2)
      [Figure: steps "read root" and "read node(I.)"; labelled bounds dmin(I.), dmax(I.), dmin(II.), dmax(II.)]

  14. k-NN search in PM-tree: example (k=2)
      [Figure: steps "read node(II.)" and "read node(B)"]

  15. k-NN search in PM-tree: example (k=2)
      [Figure: step "read node(D)"]
      • 5 nodes accessed, the same nodes as accessed by the range query $(Q, d(Q, O_5))$

  16. Experimental Results (synthetic datasets)
      • synthetic vector datasets (4D – 60D); 100,000 tuples; 1000 clusters
      • disk page sizes: 1 KB – 4 KB; index sizes: 4.5 MB – 55 MB

  17. Experimental Results (image database)
      • WBIIS image database; approx. 10,000 256D vectors (gray histograms)
      • disk page size: 32 KB; index sizes: 16 MB – 20 MB

  18. References
      [1] Skopal T., Pokorný J., Snášel V.: PM-tree: Pivoting Metric Tree for Similarity Search in Multimedia Databases. ADBIS 2004, Budapest, Hungary
      [2] Skopal T.: Pivoting M-tree: A Metric Access Method for Efficient Similarity Search. DATESO 2004, Desná, Czech Republic
      [3] Skopal T., Pokorný J., Krátký M., Snášel V.: Revisiting M-tree Building Principles. ADBIS 2003, Dresden, Germany, LNCS 2798, Springer
      [4] Skopal T.: Metric Indexing in Information Retrieval. PhD thesis, VSB-Technical University of Ostrava, http://urtax.ms.mff.cuni.cz/~skopal/phd/thesis.pdf
