Improving the Performance of M-tree Family by Nearest-Neighbor Graphs
Tomáš Skopal, David Hoksza
Charles University in Prague, Department of Software Engineering, Czech Republic

Presentation Outline
• Metric Access Methods (MAMs)
  • M-tree, PM-tree
  • Query processing and filtering
• Nearest-neighbor graphs → M*-tree, PM*-tree
  • filtering
  • pivot selection strategies
• Experiments
ADBIS 2007

Metric Access Methods
• Indexing methods designed for searching metric datasets
• Similarities among objects are modeled by a distance function that fulfills the metric properties
• MAMs focus on minimizing the number of distance computations by storing distances in the index, thus filtering out non-relevant objects when querying
• Methods
  • GNAT, (m)vp-tree, D-index, (L)AESA, …
  • M-tree, PM-tree

M-tree (Metric tree)
• dynamic, hierarchical index structure
• data space divided into ball-shaped data regions (hyper-spheres)
  • the root node represents a data region covering all data
  • child nodes represent regions covering parts of the space, …
• built bottom-up, like a B-tree
  • when a node is full, a new node is created and the objects are split between the two nodes
• data regions form a balanced hierarchical structure
  • inner nodes → routing entries
  • leaf nodes → ground items

Query Processing + Filtering
• range and k-nearest-neighbor (kNN) queries
• traversal starts at the root node
• for kNN queries, the query radius decreases dynamically
• basic filtering → filter out nodes whose parent data region does not intersect the query region
• parent filtering → uses the precomputed distance of an object to its parent together with the distance of the parent to the query

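The two filters above can be sketched as triangle-inequality tests. This is a minimal sketch, not the authors' code: the function and parameter names are hypothetical, and a covering radius of 0 is assumed for ground (leaf) entries.

```python
def basic_filter(d_q_center: float, r_query: float, r_region: float) -> bool:
    """Return True if the ball region cannot intersect the query ball.

    d_q_center is the (computed) distance from the query to the region center.
    """
    return d_q_center > r_query + r_region


def parent_filter(d_q_parent: float, d_obj_parent: float,
                  r_query: float, r_region: float = 0.0) -> bool:
    """Return True if the entry can be discarded without computing d(Q, obj).

    |d(Q, parent) - d(obj, parent)| is a lower bound on d(Q, obj),
    so no new distance computation is needed for this test.
    """
    return abs(d_q_parent - d_obj_parent) > r_query + r_region
```

Parent filtering is the cheaper of the two because both of its inputs are already known when a node is processed; basic filtering needs one real distance computation per entry.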
PM-tree (Pivoting Metric tree)
• PM-tree = M-tree enhanced by p static global pivots; each hyper-sphere region is further enhanced by p hyper-ring regions (rings that restrict its volume)
• the ith ring is defined by the nearest and furthest objects in the node with respect to the ith pivot
• a query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
[Figure: M-tree region vs. PM-tree region for a query Q; in the PM-tree case Q does not overlap the 2nd ring]

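A minimal sketch of the PM-tree overlap test described above, assuming each ring is stored as a (min, max) pair of distances to its pivot and that the query-to-pivot distances are computed once per query. All names are hypothetical.

```python
from typing import Sequence, Tuple


def pm_region_overlaps(d_q_center: float, r_region: float,
                       rings: Sequence[Tuple[float, float]],
                       d_q_pivots: Sequence[float],
                       r_query: float) -> bool:
    """Return True if the query ball may overlap the PM-tree region.

    rings[i] = (nearest, furthest) distance of the node's objects to pivot i.
    d_q_pivots[i] = distance from the query object to pivot i.
    """
    # Hyper-sphere test (the M-tree part of the region).
    if d_q_center > r_query + r_region:
        return False
    # Hyper-ring tests: the query interval around d(Q, p_i) must hit ring i.
    for (lo, hi), d_qp in zip(rings, d_q_pivots):
        if d_qp + r_query < lo or d_qp - r_query > hi:
            return False
    return True
```

A single non-overlapping ring is enough to discard the region, which is the situation the figure on this slide illustrates.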
Pivot space
• global pivots map regions/data into a pivot space of dimensionality p (ith coordinate → distance to the ith pivot)
• the distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
• the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)

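The L∞ view can be illustrated with a short sketch. It assumes the query is mapped to the point of its p pivot distances and the region to a list of (min, max) intervals; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def linf_distance_to_mbr(q_coords: Sequence[float],
                         mbr: Sequence[Tuple[float, float]]) -> float:
    """L-infinity distance from a query point to a rectangle in pivot space.

    q_coords[i] = d(Q, pivot_i); mbr[i] = (min, max) distance of the
    region's objects to pivot_i. The distance is 0.0 inside the rectangle.
    """
    return max(max(lo - q, q - hi, 0.0) for q, (lo, hi) in zip(q_coords, mbr))


def linf_filter(q_coords, mbr, r_query: float) -> bool:
    """Return True if the region can be filtered out."""
    return linf_distance_to_mbr(q_coords, mbr) > r_query
```

This test is equivalent to checking every ring separately: the maximum over the coordinates exceeds the query radius exactly when at least one ring is missed.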
M*-tree, PM*-tree
• M*-tree = M-tree + nearest-neighbor (NN) graphs
  • an NN-graph is present in every node
  • each object knows its NN (within its node)
  • example (figure): O6 = NN(O4)
• PM*-tree = PM-tree + nearest-neighbor (NN) graphs

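A per-node NN-graph could be built as in the sketch below. It is a brute-force illustration under assumed names (`build_nn_graph`, `reverse_nn`); how the authors maintain the graph during insertions and splits is not stated on the slide.

```python
from math import inf
from typing import Callable, Dict, List, Sequence, Tuple


def build_nn_graph(objects: Sequence, dist: Callable) -> Dict[int, Tuple[int, float]]:
    """For each object index, store (index of its NN in the node, distance)."""
    nn = {}
    for i, a in enumerate(objects):
        best_j, best_d = None, inf
        for j, b in enumerate(objects):
            if i != j:
                d = dist(a, b)
                if d < best_d:
                    best_j, best_d = j, d
        nn[i] = (best_j, best_d)
    return nn


def reverse_nn(nn: Dict[int, Tuple[int, float]]) -> Dict[int, List[int]]:
    """Invert the graph: rnn[j] lists the objects whose NN is j."""
    rnn = {i: [] for i in nn}
    for i, (j, _) in nn.items():
        if j is not None:
            rnn[j].append(i)
    return rnn
```

Because every object has exactly one NN, each object appears in exactly one reverse-NN list, so the lists are disjoint.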
NN-graph Filtering
• objects (NN-graph nodes) play the role of mutual local pivots
• sacrifice
  • a local pivot
  • an object whose distance to the query is actually computed during query evaluation
  • used for possible filtering of its reverse nearest neighbors (rNNs)
• filtering with the NN-graph (one step of node processing)
  • fetch the first record (Si) from the sacrifice queue (SQ)
  • apply parent filtering to Si
  • if Si is not filtered → sacrifice it (compute the Q-Si distance)
  • try to filter out rNNs(Si) (NN-graph filtering)
  • move non-filtered rNNs(Si) to the beginning of SQ (rNN sets are disjoint → non-filtered objects become sacrifices)
  • apply basic filtering to Si

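The node-processing steps above can be sketched for a range query as follows. This is an illustrative reading, not the authors' implementation: the NN-graph test is assumed to be the triangle-inequality bound |d(Q, Si) - d(Si, R)| > r_query + radius(R) for each reverse neighbor R, objects are assumed to be points under Euclidean distance, and all names are hypothetical.

```python
from collections import deque
from math import dist


def nearest_neighbors(objects):
    """Brute-force NN-graph: nn[i] = (index of NN of i, distance to it)."""
    nn = {}
    for i, a in enumerate(objects):
        j = min((k for k in range(len(objects)) if k != i),
                key=lambda k: dist(a, objects[k]))
        nn[i] = (j, dist(a, objects[j]))
    return nn


def process_node(objects, to_parent, radius, nn, query, r_query,
                 d_q_parent, order):
    """Process one node; return (surviving indices, distance computations).

    to_parent[i]: precomputed d(objects[i], parent object)
    radius[i]:    covering radius of entry i (0.0 for leaf entries)
    order:        initial content of the sacrifice queue SQ
    """
    rnn = {i: [] for i in range(len(objects))}
    for i, (j, _) in nn.items():
        rnn[j].append(i)

    queue, done, filtered = deque(order), set(), set()
    survivors, computations = [], 0
    while queue:
        s = queue.popleft()
        if s in done or s in filtered:
            continue
        done.add(s)
        # Parent filtering: no distance computation needed.
        if abs(d_q_parent - to_parent[s]) > r_query + radius[s]:
            continue
        # Sacrifice: the distance to the query is actually computed.
        d_qs = dist(query, objects[s])
        computations += 1
        # NN-graph filtering of the reverse nearest neighbors of s.
        keep = []
        for r in rnn[s]:
            if r in done or r in filtered:
                continue
            if abs(d_qs - nn[r][1]) > r_query + radius[r]:
                filtered.add(r)
            else:
                keep.append(r)
        # Non-filtered rNNs move to the front of SQ.
        queue.extendleft(reversed(keep))
        # Basic filtering of s itself.
        if d_qs <= r_query + radius[s]:
            survivors.append(s)
    return survivors, computations
```

In a node with four entries where two reverse neighbors are filtered through the graph, only two of the four distances to the query have to be computed, which is the saving the technique aims for.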
Sacrifice selection
• the selection of sacrifices is important
  • a good pivot filters many objects out
  • a poor pivot filters out potentially good pivots (future sacrifices)
• Heuristics
  • M*-tree
    • hMaxRNNCount: first in SQ is the object with the highest number of rNNs
    • hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
    • hMinToParentDistance: first in SQ is the object closest to the parent object
  • PM*-tree
    • hMinLmaxDistance: first in SQ is the object with the minimum L∞ distance
    • hMaxLmaxDistance: first in SQ is the object with the maximum L∞ distance

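Two of the M*-tree heuristics can be sketched as orderings of the sacrifice queue. The slide only states which object comes first; sorting the whole queue by the same key, and breaking ties by index, are assumptions made here for illustration, as are the function names.

```python
from typing import Dict, List, Sequence


def order_h_max_rnn_count(rnn: Dict[int, List[int]]) -> List[int]:
    """hMaxRNNCount: objects with more reverse nearest neighbors come first."""
    return sorted(rnn, key=lambda i: (-len(rnn[i]), i))


def order_h_min_to_parent_distance(to_parent: Sequence[float]) -> List[int]:
    """hMinToParentDistance: objects closer to the parent object come first."""
    return sorted(range(len(to_parent)), key=lambda i: (to_parent[i], i))
```

The first ordering favors sacrifices that can filter many objects at once; the second relies only on distances already stored in the node.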
Experimental Results
• Corel dataset
  • 65,615 feature vectors of images
  • L1 distance function
  • 8 dimensions
• Polygons dataset
  • synthetic
  • 1,000,000 randomly generated 2D polygons (5-10 vertices)
  • Hausdorff set distance function
• GenBank dataset
  • 250,000 protein strings (lengths 50-100)
  • edit distance function
• Testing of
  • computation costs (number of distance computations)

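For reference, the three distance functions named above can be sketched as follows. These are textbook definitions, not the code used in the experiments; in particular, computing the Hausdorff distance over polygon vertices with Euclidean ground distance and using unit-cost edit operations are assumptions.

```python
from math import dist


def l1(a, b) -> float:
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hausdorff(A, B) -> float:
    """Hausdorff distance between two finite point sets."""
    def directed(X, Y):
        return max(min(dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]
```

The Hausdorff and edit distances are considerably more expensive to evaluate than L1, which is why the number of distance computations is the cost measured.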
Experiments – Corel Dataset (figure)
Experiments – Polygons Dataset (figure)
Experiments – GenBank Dataset (figure)

Conclusion
• We have proposed
  • enhancing the nodes of M-tree-like structures with nearest-neighbor graphs
  • a filtering technique based on NN-graphs → NN-graph filtering
• We have implemented
  • M*-tree (enhancement of M-tree by NN-graphs)
  • PM*-tree (enhancement of PM-tree by NN-graphs)
• Experimental results
  • we have shown up to 45% speed-up