Bulk Loading the M-tree to Enhance Query Performance • Alan P. Sexton & Richard Swinbank, University of Birmingham • Richard Swinbank, 9th July 2004
Bulk Loading the M-tree • The M-tree • Hasn’t this been done already?! • Our approach and motivation • Outlier effects • Symmetry and Deletion • Conclusions
The M-tree • Like a B+ tree: multiway, paged, post-and-grow • ‘Discriminators’ are metric balls, not intervals • No concept of position, only distance • Query performance depends critically on overlap • [Figure: diagram labelled A–E and a–e]
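Why overlap matters for queries can be illustrated with a small sketch. It assumes Euclidean points and a hypothetical helper `balls_overlap`; neither is taken from the slides. A subtree can be skipped only when its metric ball cannot intersect the query ball, so the more the balls overlap, the fewer subtrees a query can prune.

```python
from math import dist  # Euclidean distance, Python 3.8+


def balls_overlap(centre_a, radius_a, centre_b, radius_b, d=dist):
    """Return True if two metric balls can share a point.

    By the triangle inequality, the balls are disjoint whenever the
    distance between their centres exceeds the sum of their radii.
    Only distances are used, never coordinates, so any metric `d` works.
    """
    return d(centre_a, centre_b) <= radius_a + radius_b


# A range query (centre q, radius r_q) must descend into a subtree
# only if its covering ball overlaps the query ball.
query_centre, query_radius = (0.0, 0.0), 1.0
must_visit = balls_overlap(query_centre, query_radius, (1.5, 0.0), 1.0)
can_prune = not balls_overlap(query_centre, query_radius, (3.0, 0.0), 1.0)
```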
Hasn’t this been done already?! • Ciaccia et al., 1998: seeded trees with top-down growth; cheaper to build than insertion-built trees; comparable query performance • B+ tree: sort data, build bottom-up • M-tree: cluster data, build bottom-up?
Bulk Loading the M-tree • 25%–40% query performance gain • Top: 1-NN query results • Bottom: leaf radii for related trees
Closest-pair clustering • Requirements: upper (CMAX) and lower (CMAX/2) bounds on cluster cardinality; minimise overlap of the metric representation • Algorithm: (1) take the closest pair of clusters (c1, c2); (2) if |c1| + |c2| <= CMAX, merge them, otherwise remove the larger cluster from the working set; (3) repeat until the working set is empty • Outlier effects
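The algorithm steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it makes several assumptions the slide does not state: points are Euclidean tuples, the distance between two clusters is the distance between their medoids, and a medoid is the member that minimises the maximum distance to the other members. The lower bound (CMAX/2) is not enforced here, and the search for the closest pair is brute force.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def medoid(cluster):
    # Assumption: the medoid is the member with the smallest covering radius.
    return min(cluster, key=lambda p: max(dist(p, q) for q in cluster))


def closest_pair_clustering(points, cmax):
    """Group points into clusters of at most `cmax` members."""
    working = [[p] for p in points]  # every point starts as its own cluster
    finished = []
    while len(working) > 1:
        # Take the closest pair of clusters (assumed: medoid-to-medoid distance).
        i, j = min(
            combinations(range(len(working)), 2),
            key=lambda ij: dist(medoid(working[ij[0]]), medoid(working[ij[1]])),
        )
        c1, c2 = working[i], working[j]
        if len(c1) + len(c2) <= cmax:
            # Merge c2 into c1; i < j, so deleting index j keeps index i valid.
            working[i] = c1 + c2
            del working[j]
        else:
            # Merging would exceed CMAX: emit the larger cluster as finished.
            larger = i if len(c1) >= len(c2) else j
            finished.append(working.pop(larger))
    finished.extend(working)  # the last remaining cluster, if any
    return finished


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
clusters = closest_pair_clustering(points, cmax=3)
```

Each iteration either merges two clusters or emits one, so the working set shrinks by one per step and the loop terminates.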
Outlier effects • [Figure: M-tree insertion vs. closest-pair clustering]
Bulk Loading • Use closest-pair clustering to prepare a full level • Accumulate primary medoids to populate the next level up • Algorithm: (1) cluster points; (2) on the fly, write output clusters to disk as M-tree nodes and generate parent entries as points for the next level up; (3) repeat until the next level is a single page • Bottom-up growth • Subtree containment
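The level-by-level loop above might look roughly like the following sketch. It is illustrative only and rests on assumptions not given in the slides: nodes are in-memory lists rather than disk pages, points are distinct Euclidean tuples, each entry is a hypothetical `(point, covering_radius, child)` triple, and the compact `cluster` function stands in for closest-pair clustering using medoid-to-medoid distance.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def medoid(pts):
    # Assumption: medoid = member minimising the maximum distance to the rest.
    return min(pts, key=lambda p: max(dist(p, q) for q in pts))


def cluster(pts, cmax):
    # Compact stand-in for closest-pair clustering.
    work, out = [[p] for p in pts], []
    while len(work) > 1:
        i, j = min(combinations(range(len(work)), 2),
                   key=lambda ij: dist(medoid(work[ij[0]]), medoid(work[ij[1]])))
        if len(work[i]) + len(work[j]) <= cmax:
            merged = work.pop(j)  # i < j, so index i is unaffected
            work[i].extend(merged)
        else:
            out.append(work.pop(i if len(work[i]) >= len(work[j]) else j))
    return out + work


def bulk_load(points, cmax):
    """Build the tree bottom-up and return the root node (a list of entries)."""
    assert cmax >= 2, "a node must hold at least two entries to make progress"
    entries = [(p, 0.0, None) for p in points]  # leaf entries have no child
    while len(entries) > cmax:  # repeat until the level fits in one page
        lookup = {e[0]: e for e in entries}  # assumes distinct points
        next_level = []
        for group in cluster(list(lookup), cmax):
            m = medoid(group)
            node = [lookup[p] for p in group]  # the "written" M-tree node
            # Subtree containment: the parent ball covers every child ball.
            radius = max(dist(m, p) + lookup[p][1] for p in group)
            next_level.append((m, radius, node))  # parent entry for next level
        entries = next_level
    return entries


def collect(node, depth=0):
    # Yield (point, depth) for every leaf entry; used to check balance.
    for point, _radius, child in node:
        if child is None:
            yield point, depth
        else:
            yield from collect(child, depth + 1)


pts = [(x, y) for x in range(5) for y in range(4)]
root = bulk_load(pts, cmax=4)
```

Because every level is built completely before the next one starts, all leaf entries end up at the same depth, which is the balanced, bottom-up growth the slide refers to.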
Conclusions • Closest-pair clustering algorithm • Mitigates outlier effects • Improves query performance • Bulk loading algorithm • Bottom-up, balanced growth • Insert/Delete symmetry: SM-tree • Further work • Questions?