Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases. Serkan Kiranyaz and Moncef Gabbouj.
Objective • To present the technique of using a Hierarchical Cellular Tree (HCT) as an indexing scheme for content-based retrieval on multimedia databases.
Why is this technique important? • Improvements in hardware and network technology • Daily use of the Internet • The technique reduces costly I/O operations
HCT Overview • Is a MAM (Metric Access Method) technique • Based on the M-tree • Is a dynamic, cell-based, hierarchically structured indexing method • Items are partitioned based on distances and stored within cells according to their similarity proximity • Self-organized tree implemented via genetic programming principles
Indexing Technique Categories • SAM (spatial access method): (dis-)similarity distance is measured only through Euclidean distance; not suited for deep spanning trees • MAM (metric access method): supports a black-box approach to (dis-)similarity distance; allows for deep trees; generally does not support dynamic changes*
*M-tree Similarities • Is a dynamic MAM • Has a hierarchical structure based on the mitosis of a cell • Tree grows one level upwards whenever a split occurs at the top level • Each cell is represented by a nucleus (except the topmost cell)
M-tree Problems • The M-tree achieves a balanced tree with low I/O cost in large datasets • Problem: multimedia databases are seldom balanced • HCT: cells are unbalanced and can vary in size • The M-tree must know the cell capacity (M) before the tree is built • Problem: every M-tree cell can hit this upper limit (cell size is not dynamic) • HCT: removes the limit on cell size as long as cells keep a definite "compactness" measure
M-tree Problems • M-tree compactness is measured only by the distance from the nucleus to the furthest object (the covering radius) • Problem: determining compactness this way does not allow for dynamic sizing of cells • HCT: uses all cell items and their minimum distances to the cell (instead of a single nucleus item alone); compactness is constantly updated
Related Work in Multimedia Databases (SAM trees) • KD-trees: hierarchical tree structure; use space-partitioning methods that divide the feature space with predefined hyperplanes • R-trees: feature space is divided according to the distribution of database items; region overlapping may occur
Related Work in Multimedia Databases (SAM trees) • R*-trees: improve the node splitting of the R-tree by taking overlapping areas into consideration • TV-tree: uses telescope vectors; the authors refer only to "so-called telescope vectors" without further explanation; a Google search does not come up with anything meaningful for telescope vectors
Related Work in Multimedia Databases (SAM trees) • X-tree: avoids overlapping of region bounding boxes by using a new organization of the directory; boxes can still intersect at higher levels in the tree; the paper does not go into detail on what a bounding box is (assumption: bounding box = cell) • SS-tree: uses minimum bounding spheres instead of boxes; fewer intersections at higher levels
Related Work in Multimedia Databases (MAM trees) • vp-tree (vantage point): organizes feature vectors (data points) into two groups according to their similarity distances with respect to a single point (the vantage point) • mvp-tree (multiple vantage point): assigns multiple vantage points instead of one
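The vp-tree partitioning step can be illustrated with a small sketch. It assumes the common choice of splitting at the median distance to the vantage point; the function name `vp_split` and the Euclidean default are illustrative and not taken from the slides.

```python
import math
from statistics import median

def vp_split(points, vantage, dist=math.dist):
    """Split points into (near, far) groups around the median distance
    to the vantage point. `dist` can be any black-box metric."""
    others = [p for p in points if p != vantage]
    if not others:
        return 0.0, [], []
    dists = [dist(vantage, p) for p in others]
    mu = median(dists)  # assumed split radius: median distance
    near = [p for p, d in zip(others, dists) if d <= mu]
    far = [p for p, d in zip(others, dists) if d > mu]
    return mu, near, far

points = [(0, 0), (1, 0), (2, 0), (5, 0), (9, 0)]
mu, near, far = vp_split(points, vantage=(0, 0))
print(mu, near, far)
```

An mvp-tree would repeat this kind of split with more than one vantage point per node.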
HCT Structure - Cell Structure • Basic container in which similar database items are stored • Ground-level cells together contain all database items • Each cell carries an MST (Minimum Spanning Tree): it holds the minimum (dis-)similarity distance of each item to the other items within the cell; it is used to determine when mitosis should occur; splits occur at the longest branch • This is very similar to the mvp-tree, except every cell is treated as a vantage point, which gives a better idea of the similarity proximity of an item
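As a minimal sketch of the per-cell MST, the code below builds a minimum spanning tree over a cell's items with Prim's algorithm and reports the longest branch. The choice of Prim's algorithm, the `(weight, i, j)` edge format, and the function names are assumptions for illustration; the slides do not say how the MST is built or stored.

```python
import math

def build_mst(items, dist):
    """Prim's algorithm over a black-box distance.
    Returns MST edges as (weight, i, j) tuples over item indices."""
    n = len(items)
    if n < 2:
        return []
    # best[j] = (distance from j to the tree, index of its nearest tree item)
    best = {j: (dist(items[0], items[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])  # closest item outside the tree
        w, i = best.pop(j)
        edges.append((w, i, j))
        for k in best:  # relax remaining items against the newly added one
            d = dist(items[j], items[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges

def longest_branch(edges):
    """The MST edge with the largest (dis-)similarity distance."""
    return max(edges, key=lambda e: e[0])

cell_items = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
mst = build_mst(cell_items, math.dist)
print(longest_branch(mst))
```

Here the longest branch joins the two visible groups of points, which is where a split would separate them.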
HCT Structure - Cell Structure • Cells cannot undergo mitosis before reaching a specific level of maturity: this resembles biological cells, although the reason for it differs • Nucleus: represents its owner cell at the higher level • Nucleus is found through the MST: it is the item with the maximum number of branches • Nucleus is updated with every operation performed; the M-tree does not do this
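The nucleus rule (the item with the maximum number of MST branches) can be sketched as a degree count over the MST edges. The `(weight, i, j)` edge format and the tie-breaking rule (lowest index wins) are assumptions; the slides do not specify either.

```python
from collections import Counter

def pick_nucleus(n_items, edges):
    """Return the index of the item with the most MST branches.
    Ties go to the lowest index (an assumption, not from the slides)."""
    degree = Counter()
    for _, i, j in edges:
        degree[i] += 1
        degree[j] += 1
    if not degree:
        return 0  # single-item cell: that item is the nucleus
    return max(range(n_items), key=lambda i: degree[i])

# Hypothetical MST of a 4-item cell in which item 2 touches every other item.
star_edges = [(1.0, 0, 2), (1.0, 1, 2), (1.0, 2, 3)]
print(pick_nucleus(4, star_edges))
```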
HCT Structure - Cell Structure • Cell Compactness • Describes how tightly the items within the cell are clustered • High variations are eliminated by using more than a single item (vantage point)
HCT Structure - Cell Structure • Cell Mitosis • Two conditions for mitosis • Maturity (Nc > Nm): Nc = number of items in the cell, Nm = minimum maturity size • Cell compactness (CFc > CThrL): CFc = compactness feature of the cell, CThrL = compactness threshold of the current level • Mitosis itself adds no cost, as the cell is simply split by breaking the longest MST branch
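A minimal sketch of the mitosis step follows. It assumes both conditions must hold together, treats the compactness feature as a given number (the slides do not give its formula), and represents the MST as `(weight, i, j)` edges; all names are illustrative.

```python
def should_split(n_items, maturity_size, compactness, threshold):
    """Both slide conditions: Nc > Nm and CFc > CThrL (assumed to be ANDed)."""
    return n_items > maturity_size and compactness > threshold

def split_at_longest_branch(n_items, edges):
    """Remove the longest MST edge and return the two index groups
    that remain connected on either side of it."""
    longest = max(edges, key=lambda e: e[0])
    adjacency = {i: [] for i in range(n_items)}
    for edge in edges:
        if edge is longest:
            continue  # this is the branch being broken
        _, i, j = edge
        adjacency[i].append(j)
        adjacency[j].append(i)
    # Flood fill from one endpoint of the removed edge.
    seen = {longest[1]}
    stack = [longest[1]]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen), sorted(set(range(n_items)) - seen)

# Hypothetical MST of a 5-item cell with one long branch between items 2 and 3.
edges = [(1.0, 0, 1), (1.0, 1, 2), (8.0, 2, 3), (1.0, 3, 4)]
if should_split(n_items=5, maturity_size=4, compactness=8.0, threshold=3.0):
    left, right = split_at_longest_branch(5, edges)
    print(left, right)
```

Because removing one edge from a tree always leaves exactly two connected parts, no re-clustering is needed to form the two child cells.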
HCT Structure - Level Structure • Top level always single cell • If mitosis occurs on top level, new top level is created to preserve single cell top level. • Each level attempts to dynamically maximize compactness of cells
HCT Structure - HCT Operations • Three operations • Cell mitosis • Item insertion • Item removal • As stated before, all three operations trigger a recalculation of compactness
HCT Structure - HCT Operations • Insert • First performs the Pre-Emptive cell search: recursively descends the HCT from the top level to the target level • Once the target cell is located, inserts the item into it • Performs a post-processing check: checks for mitosis; recalculates compactness for a single cell or multiple cells • If mitosis was performed: removes the old nucleus item from the higher level, then consecutively calls Insert for the new nucleus items
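The descent-then-insert flow can be sketched as follows. This is a simplified stand-in: the slides do not describe how the Pre-Emptive cell search chooses among cells, so the sketch assumes a greedy descent to the child whose nucleus is closest, and it omits the post-processing check. The `Cell` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Cell:
    nucleus: tuple                                 # representative item of the cell
    items: list = field(default_factory=list)      # stored items (ground level)
    children: list = field(default_factory=list)   # child cells (empty at ground level)

def find_target_cell(top, item, dist=math.dist):
    """Greedy descent (assumed): follow the closest nucleus at each level."""
    cell = top
    while cell.children:
        cell = min(cell.children, key=lambda c: dist(c.nucleus, item))
    return cell

def insert(top, item, dist=math.dist):
    """Locate the target ground-level cell and append the item.
    Mitosis and compactness checks are left out of this sketch."""
    target = find_target_cell(top, item, dist)
    target.items.append(item)
    return target

a = Cell(nucleus=(0, 0), items=[(0, 0), (1, 1)])
b = Cell(nucleus=(10, 10), items=[(10, 10)])
top = Cell(nucleus=(0, 0), children=[a, b])
target = insert(top, (9, 9))
print(target.nucleus, target.items)
```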
HCT Structure - HCT Indexing • HCT can index using any set of available features • Must have fusion mechanism • Must have similarity measure • Consists of two operations • Incremental construction • Optional periodic fitness check
HCT Structure - HCT Indexing • HCT Incremental Construction • Takes a database D and appends all new items contained in an array • If no HCT exists yet for database D: all current items of D are inserted into the array, and a new HCT body is constructed from D • Otherwise, if an HCT already exists for database D: the HCT body is loaded first, then updated with the contents of the array
HCT Structure - HCT Indexing • HCT Fitness Check • Aims to minimize the corruption that can arise during construction of the HCT body • Corruption happens because the order in which items are inserted is not controlled • Outliers Check • Reduces the "crowd effect" by removing redundant minority cells • Minority cells are cells holding only one or a few items • The items of all minority cells are reintroduced into the system to see whether they fit into another cell
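One possible reading of the outliers check is sketched below. Cells are plain lists of points, a cell is a "minority" cell when it holds at most `minority_size` items, and an item "fits" another cell when its closest member lies within `max_dist`. These parameters and the fit rule are assumptions for illustration; the slides do not define them.

```python
import math

def cell_distance(item, cell, dist):
    """Distance from an item to a cell: distance to its closest member."""
    return min(dist(item, member) for member in cell)

def outliers_check(cells, minority_size=1, max_dist=2.0, dist=math.dist):
    """Dissolve minority cells and try to re-home their items."""
    kept = [list(c) for c in cells if len(c) > minority_size]
    orphans = [item for c in cells if len(c) <= minority_size for item in c]
    for item in orphans:
        best = min(kept, key=lambda c: cell_distance(item, c, dist), default=None)
        if best is not None and cell_distance(item, best, dist) <= max_dist:
            best.append(item)    # the item fits an existing cell
        else:
            kept.append([item])  # still an outlier: it keeps its own cell
    return kept

cells = [[(0, 0), (1, 0), (0, 1)], [(1, 1)], [(50, 50)]]
result = outliers_check(cells)
print(result)
```

In this example the singleton near the first cell is absorbed, while the distant point remains a cell of its own.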
HCT Structure - HCT Indexing • Cell Merging • If a merged cell is later deemed not to meet the cell compactness requirement, it can be split again.