3D Shape Histograms for Similarity Search and Classification in Spatial Databases. Mihael Ankerst, Gabi Kastenmüller, Hans-Peter Kriegel, Thomas Seidl, University of Munich, Germany.
Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion
Introduction • Classification: the problem of assigning an appropriate class to a query object • Applications: molecular biology, medical imaging, mechanical engineering, astronomy • Objects of the same class have some characteristic properties in common • These may be geometric or thematic properties
Classification in Molecular Databases • Classification schemata are already available • What is needed is a fast classification algorithm that can serve as a filter • Dali system: a sophisticated classification algorithm for proteins • CATH: a hierarchical classification of protein domain structures with four levels: class, architecture, topology, and homologous superfamily
Nearest Neighbor Classification • In general, classification follows a training phase • An object is assigned to a class if it matches the description of that class • Nearest neighbor classifiers: find the nearest neighbor of the query object and return its class • k-nearest neighbor classifiers: parameters are the number k of neighbors and the weights of the neighbors
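A minimal sketch of k-nearest neighbor classification, assuming feature vectors compared by Euclidean distance and simple majority voting. The names `knn_classify` and `euclidean`, the toy training data, and the optional 1/distance weighting are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(query, training, k=1, dist=euclidean, weighted=False):
    """Return the majority class among the k nearest training objects.

    training: list of (feature_vector, class_label) pairs.
    weighted=True lets each neighbor vote with 1 / distance (an assumed,
    commonly used weighting; other schemes are possible).
    """
    neighbors = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter()
    for vec, label in neighbors:
        # Small epsilon avoids division by zero for exact matches.
        votes[label] += 1.0 / (dist(query, vec) + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]


# Toy training data (hypothetical 2D feature vectors).
training = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([1.0, 1.0], "B"), ([0.9, 1.1], "B")]
print(knn_classify([0.05, 0.1], training, k=1))  # A
print(knn_classify([0.95, 1.0], training, k=3))  # B
```

With k = 1 this is the plain nearest neighbor classifier; larger k and weighting trade sensitivity to single outliers against locality.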
Geometry-Based Similarity Search • Spatial objects are transformed into a high-dimensional vector space • In 2D, shapes can be represented as ordered sets of surface points, approximate rectangular coverings, etc. • Section coding technique: each polygon's circumcircle is decomposed into a number of sectors, and the value computed for each sector is normalized • Similarity is defined as the Euclidean distance between the resulting feature vectors
Invariance Properties • Similarity models need to be invariant against translation, rotation, scaling, etc. • Most methods include a normalization step, such as rotating objects to a normalized orientation or translating the center of mass to the origin • Robustness against errors is not considered in most of these models
3D Shape Similarity Model • We extend the section coding technique to 3D • Shape histograms as feature vectors • Quadratic form distance function
Shape Histograms • A feature transform maps a complex object onto a feature vector in a multidimensional space • 3D shape histograms are such feature vectors • They are based on partitioning the space into complete and disjoint cells, called the bins of the histogram • Any space can be partitioned in this way (geometric, thematic, etc.)
Shell Model • The 3D space is decomposed into concentric shells around the center point • The model is independent of rotations around the center point • The radii of the shells are determined from the extension of the objects • The shells have uniform thickness
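A minimal sketch of a shell-model histogram, assuming the object is given as 3D surface points already translated so that the center point is the origin, and assuming each bin simply counts points. The function name `shell_histogram` and the `max_radius` parameter are hypothetical.

```python
import math


def shell_histogram(points, n_shells, max_radius=None):
    """Shell model: count the points that fall into each of n_shells concentric
    shells of uniform thickness around the origin.

    points: (x, y, z) tuples, assumed already centered at the model's center point.
    max_radius: outer radius of the last shell. It defaults to the object's own
    extension; a fixed value shared by all objects keeps bins comparable.
    Points beyond max_radius are counted in the outermost shell.
    """
    radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in points]
    if max_radius is None:
        max_radius = max(radii)
    scale = n_shells / max_radius if max_radius > 0 else 0.0
    hist = [0] * n_shells
    for r in radii:
        # Uniform thickness: the shell index is proportional to the radius.
        hist[min(int(r * scale), n_shells - 1)] += 1
    return hist


pts = [(0.1, 0.0, 0.0), (0.0, 0.6, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
print(shell_histogram(pts, 2))  # [1, 3]
```

Rotating the points around the origin leaves every radius, and therefore the histogram, unchanged, which is the rotation invariance of the shell model.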
Sector Model • The 3D space is decomposed into sectors that emerge from the center point of the model • Points are distributed uniformly on the surface of a sphere around the center point • The Voronoi diagram of these points gives an appropriate decomposition of the space into sectors
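A minimal sketch of a sector-model histogram. It assumes a Fibonacci lattice for the uniformly distributed sphere points (an illustrative choice; the slide does not say how the points are placed) and uses the fact that the spherical Voronoi cell of a center direction is the set of directions with the largest dot product to that center. The names `fibonacci_sphere` and `sector_histogram` are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """n approximately uniformly distributed unit vectors (Fibonacci lattice).
    This construction is an illustrative choice; the sector model only
    requires points spread uniformly over the sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    centers = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        centers.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return centers


def sector_histogram(points, centers):
    """Sector model: assign each point to the center with the largest dot product,
    i.e. the closest center direction. This is the Voronoi cell of that center
    on the sphere, extended to a cone from the origin. Points exactly at the
    origin have no direction and are skipped."""
    hist = [0] * len(centers)
    for p in points:
        if not any(p):
            continue
        best = max(range(len(centers)),
                   key=lambda i: sum(a * b for a, b in zip(p, centers[i])))
        hist[best] += 1
    return hist


centers = fibonacci_sphere(12)
pts = [(0.3, -0.2, 0.9), (-1.0, 0.1, 0.0), (0.0, 0.0, -2.0), (0.0, 0.0, 0.0)]
print(sector_histogram(pts, centers))
```

Because only the direction of a point matters, the sector histogram is invariant to scaling about the center, but not to rotation, which is why a normalization step is needed.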
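Combined Model • Combination of the shell and sector models: each bin is the intersection of a shell and a sector • Results in a higher dimensionality (number of shells × number of sectors) • Different combinations of shells and sectors can yield the same dimensionality, e.g., 20 shells × 6 sectors and 6 shells × 20 sectors both give 120 bins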
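A minimal sketch of the combined model, assuming each point is assigned a shell index and a sector index and the bin index is `shell * n_sectors + sector`. The six octahedron directions used as sector centers and the function name `combined_histogram` are illustrative assumptions.

```python
import math

# Six sector centers at the vertices of an octahedron: a simple, exactly
# uniform set of directions, used here purely for illustration.
OCTAHEDRON = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def combined_histogram(points, n_shells, max_radius, centers=OCTAHEDRON):
    """Combined model: one bin per (shell, sector) pair, n_shells * len(centers) bins.

    Bin index = shell_index * n_sectors + sector_index. Points are assumed to be
    centered at the origin; points beyond max_radius go to the outermost shell,
    and a point exactly at the origin falls into sector 0 by convention.
    """
    n_sectors = len(centers)
    hist = [0] * (n_shells * n_sectors)
    for p in points:
        r = math.sqrt(sum(c * c for c in p))
        shell = min(int(r / max_radius * n_shells), n_shells - 1)
        sector = max(range(n_sectors),
                     key=lambda i: sum(a * b for a, b in zip(p, centers[i])))
        hist[shell * n_sectors + sector] += 1
    return hist


pts = [(0.2, 0.0, 0.0), (0.0, -0.9, 0.1), (0.0, 0.0, 0.7)]
print(combined_histogram(pts, n_shells=2, max_radius=1.0))
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
```

With 20 shells and these 6 sectors the feature vector has 120 dimensions, matching the arithmetic above.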
Euclidean Distance • The Euclidean distance between two N-dimensional vectors p and q is given by $d_{Euclid}(p,q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2} = \sqrt{(p-q)\,(p-q)^T}$ • The individual components of the feature vectors are assumed to be independent • Relationships between components, such as substitutability and compensability, cannot be taken into account
Euclidean Distance • Consider three objects a, b, and c • Clearly, a and b are more closely related than a and c or b and c • However, due to rotation, the peaks of a and b are mapped into different bins, so the Euclidean distance does not reflect their similarity
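The effect can be reproduced with three hypothetical 8-bin histograms: a and b peak in neighboring bins (as after a slight rotation across a bin border), while c peaks far away. This is a constructed toy example, not data from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance: every bin is compared only with the same bin."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Hypothetical histograms: a and b differ only by a one-bin shift of the peak.
a = [0, 0, 10, 0, 0, 0, 0, 0]
b = [0, 0, 0, 10, 0, 0, 0, 0]
c = [0, 0, 0, 0, 0, 0, 0, 10]
print(euclidean(a, b), euclidean(a, c), euclidean(b, c))
```

All three distances are equal, because the Euclidean distance has no notion of neighboring bins.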
Quadratic Form Distance Function • The quadratic form distance function is defined in terms of a similarity matrix A: $d_A(p,q) = \sqrt{(p-q)\,A\,(p-q)^T}$ • The components $a_{ij}$ of A represent the similarity of components i and j in the underlying space • The Euclidean distance is a special case of the quadratic form distance with A = I, the identity matrix
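A minimal sketch of the quadratic form distance $d_A(p,q) = \sqrt{(p-q)\,A\,(p-q)^T}$ in plain Python, with A given as a list of rows. The function names are hypothetical; the example checks that A = I reproduces the Euclidean distance.

```python
import math


def quadratic_form_distance(p, q, A):
    """d_A(p, q) = sqrt((p - q) A (p - q)^T) for an N x N similarity matrix A
    given as a list of rows. A should be positive (semi-)definite so that the
    value under the square root is non-negative; evaluation costs O(N^2)."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(value, 0.0))  # clamp tiny negative rounding errors


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


p, q = [1.0, 2.0, 3.0], [2.0, 2.0, 5.0]
print(quadratic_form_distance(p, q, identity(3)))  # sqrt(5), the Euclidean distance
```

The double sum over all index pairs (i, j) is what lets off-diagonal entries $a_{ij}$ relate different bins, and it is also why the cost grows quadratically with the dimension.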
Quadratic Form Distance Functions • The Euclidean distance of two vectors is fully determined by the vectors themselves; it has no parameters • The weighted Euclidean distance is more flexible: a weight per dimension controls the effect of each individual vector component on the overall distance • On top of this, the general quadratic form distance function also specifies cross-dependencies between the dimensions
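Written with the quadratic form $d_A(p,q) = \sqrt{(p-q)\,A\,(p-q)^T}$, the three distance functions differ only in the matrix A: the identity, a diagonal weight matrix, or a full similarity matrix. This is a standard identity, restated here in the slides' notation.

```latex
\begin{aligned}
d_{Euclid}(p,q) &= \sqrt{(p-q)\,I\,(p-q)^T} = \sqrt{\textstyle\sum_{i=1}^{N} (p_i-q_i)^2} \\
d_{W}(p,q) &= \sqrt{(p-q)\,\mathrm{diag}(w_1,\dots,w_N)\,(p-q)^T} = \sqrt{\textstyle\sum_{i=1}^{N} w_i\,(p_i-q_i)^2} \\
d_A(p,q) &= \sqrt{(p-q)\,A\,(p-q)^T} = \sqrt{\textstyle\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\,(p_i-q_i)(p_j-q_j)}
\end{aligned}
```

Only the last form contains the mixed terms $(p_i-q_i)(p_j-q_j)$ with $i \neq j$, which encode the cross-dependencies between dimensions.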
Quadratic Form Distance Functions • The neighborhood of bins can be encoded in the similarity weights • Let d(i,j) denote the distance between the cells that correspond to bins i and j • For shells, the bin distance is the difference of the corresponding radii • For sectors, the bin distance is the angle between the sector centers
Quadratic Form Distance Functions • Given an appropriate bin distance function d, the similarity matrix can be computed as $a_{ij} = e^{-\sigma \cdot d(i,j)}$, where the parameter σ controls the global shape of the similarity matrix
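A minimal sketch of building the similarity matrix $a_{ij} = e^{-\sigma \cdot d(i,j)}$ for a shell model and applying it to the shifted-peak example. It assumes 8 shells of uniform thickness on normalized radii in [0, 1], so d(i,j) = |i − j| / 8, and borrows σ = 10 from the experimental setup; these choices and all function names are illustrative.

```python
import math


def similarity_matrix(bin_distance, n, sigma):
    """a_ij = exp(-sigma * d(i, j)) for a given bin distance function d."""
    return [[math.exp(-sigma * bin_distance(i, j)) for j in range(n)] for i in range(n)]


def quadratic_form_distance(p, q, A):
    """sqrt((p - q) A (p - q)^T)."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(value, 0.0))


n_bins = 8


def shell_bin_distance(i, j):
    # Difference of the (normalized) shell radii for uniform thickness 1 / n_bins.
    return abs(i - j) / n_bins


A = similarity_matrix(shell_bin_distance, n_bins, sigma=10.0)

# Hypothetical histograms: a and b peak in neighboring shells, c far away.
a = [0, 0, 10, 0, 0, 0, 0, 0]
b = [0, 0, 0, 10, 0, 0, 0, 0]
c = [0, 0, 0, 0, 0, 0, 0, 10]
d_ab = quadratic_form_distance(a, b, A)
d_ac = quadratic_form_distance(a, c, A)
print(d_ab, d_ac)  # d_ab is now clearly smaller than d_ac
```

A larger σ makes the matrix closer to the identity (and the distance closer to Euclidean); a smaller σ lets more distant bins compensate for each other.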
Invariance Properties • During normalization, all objects are translated and rotated • Translation moves the center of mass onto the origin • Then a principal axes transform (PAT) is applied • This generally leads to a unique orientation of the object
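Principal Axes Transform • Compute the covariance matrix of a given set of n 3D points $(x_i, y_i, z_i)$, already translated so that the center of mass is the origin: $Cov = \frac{1}{n} \sum_{i=1}^{n} \begin{pmatrix} x_i^2 & x_i y_i & x_i z_i \\ x_i y_i & y_i^2 & y_i z_i \\ x_i z_i & y_i z_i & z_i^2 \end{pmatrix}$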
Principal Axes Transform • The eigenvectors of this matrix represent the principal axes of the original 3D point set • The eigenvalues indicate the variance of the points in the respective directions • After the PAT, all covariances of the transformed points vanish, i.e., their covariance matrix is diagonal
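A minimal NumPy sketch of the normalization: translate the center of mass (all points weighted equally) to the origin, compute the covariance matrix, and rotate onto its eigenvectors. The function name `principal_axes_transform` and the synthetic point cloud are assumptions; the sign ambiguity of eigenvectors (mirrored axes) is deliberately not resolved here.

```python
import numpy as np


def principal_axes_transform(points):
    """Translate the center of mass to the origin and rotate onto the principal axes.

    points: (n, 3) array-like. Returns the transformed points and the eigenvalues
    (variances along the principal axes), sorted in decreasing order.
    Eigenvectors are only defined up to sign, so a complete normalization still
    needs a convention for mirrored axes (not handled in this sketch).
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)          # center of mass -> origin
    cov = centered.T @ centered / len(pts)     # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric: real eigenvalues, ascending
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs, eigvals         # coordinates in the principal axes frame


# Synthetic elongated point cloud, rotated about z and shifted.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3)) * [5.0, 2.0, 0.5]
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
transformed, variances = principal_axes_transform(pts @ rot.T + [3.0, -1.0, 2.0])
cov_after = transformed.T @ transformed / len(transformed)
print(np.round(cov_after, 6))  # off-diagonal covariances vanish (up to rounding)
```

`numpy.linalg.eigh` is used because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.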
Extensibility of Histogram Models • Along with spatial properties, thematic properties can also be considered • A general approach to managing both thematic and spatial properties is to use combined histograms • The bins of a combined histogram are the Cartesian product of the bins of the individual histograms
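A minimal sketch of a combined spatial and thematic histogram using `numpy.histogramdd`: one axis partitions the radius into shells, the other partitions a hypothetical integer thematic label per surface point (for example, a coarse chemical property class). The attribute, the function name, and the flattening order are illustrative assumptions.

```python
import numpy as np


def spatial_thematic_histogram(radii, categories, n_shells, max_radius, n_categories):
    """Combined histogram over (shell, thematic category): its bins form the
    Cartesian product of the shell bins and the category bins.

    radii: distance of each surface point from the center point;
    categories: an integer thematic label per point (hypothetical attribute).
    """
    shell_edges = np.linspace(0.0, max_radius, n_shells + 1)
    cat_edges = np.arange(n_categories + 1) - 0.5   # one bin per integer label
    hist, _ = np.histogramdd(np.column_stack([radii, categories]),
                             bins=(shell_edges, cat_edges))
    return hist.ravel()  # feature vector of length n_shells * n_categories


radii = [0.1, 0.2, 0.5, 0.9, 0.95]
labels = [0, 1, 1, 0, 0]
print(spatial_thematic_histogram(radii, labels, n_shells=3, max_radius=1.0, n_categories=2))
```

The dimensionality multiplies (here 3 shells × 2 categories = 6 bins), and a similarity matrix for the combined histogram can be defined over pairs of (shell, category) bins.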
Query Processing • For the quadratic form distance function, evaluating the distance to a single database object takes time quadratic in the dimension (N² products for N-dimensional histograms)
Optimal Multistep k-Nearest Neighbor Search • To achieve good performance, the paradigm of multistep query processing is used • An index-based filter step produces a set of candidates • A refinement step performs the expensive exact evaluation of the candidates • The filter step is responsible for completeness, the refinement step for correctness
Optimal Multistep k-Nearest Neighbor Search • Based on a multidimensional index structure, the filter step performs an incremental ranking: objects are reported in order of increasing filter distance to the query • To guarantee that the filter step causes no false dismissals, the filter distance must be a lower bound of the object distance: $d_f(p,q) \le d_o(p,q)$, where $d_f$ is the filter distance and $d_o$ is the object distance
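A minimal sketch of multistep k-nearest neighbor search with the lower-bounding property. The incremental ranking of a multidimensional index is simulated by sorting on the filter distance, and the filter distance is the Euclidean distance on the first two coordinates, which lower-bounds the full Euclidean distance. All names and data are illustrative assumptions; a real system would rank via the index (e.g., the X-tree) and use the reduced quadratic form as filter.

```python
import heapq
import math
import random


def multistep_knn(query, objects, k, filter_dist, object_dist):
    """Filter-and-refine k-nearest neighbor search (k >= 1).

    Requires filter_dist(q, o) <= object_dist(q, o) for all objects (lower bound).
    The incremental index ranking is simulated by sorting on the filter distance.
    Returns (neighbors, refinements), neighbors being (distance, index) pairs.
    """
    ranking = sorted((filter_dist(query, o), i) for i, o in enumerate(objects))
    heap = []  # max-heap of the k best exact distances so far, stored as (-distance, index)
    refinements = 0
    for fd, i in ranking:
        # Stop once the next filter distance exceeds the current k-th exact distance:
        # by the lower-bound property, no remaining object can be closer.
        if len(heap) == k and fd > -heap[0][0]:
            break
        d = object_dist(query, objects[i])  # expensive exact evaluation
        refinements += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap), refinements


def object_dist(p, q):
    return math.dist(p, q)  # exact distance on all dimensions


def filter_dist(p, q):
    return math.dist(p[:2], q[:2])  # first two dimensions only: a lower bound


rng = random.Random(42)
objects = [tuple(rng.random() for _ in range(6)) for _ in range(500)]
query = tuple(rng.random() for _ in range(6))
neighbors, refinements = multistep_knn(query, objects, 5, filter_dist, object_dist)
print(neighbors, refinements)
```

The stopping rule is what makes the number of exact evaluations small while still returning exactly the same neighbors as a full scan.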
Reduction in Dimensionality of Quadratic Forms • Objects in high-dimensional spaces are commonly managed by reducing their dimensionality • Typical techniques are principal component analysis, the discrete Fourier transform, similarity matrix decomposition, feature subselection, etc. • These approaches can also be used with the quadratic form distance
Reduction in Dimensionality of Quadratic Forms • An algorithm that reduces the similarity matrix from a high-dimensional space down to a low-dimensional space was developed in the context of multimedia databases • The method guarantees three properties: • the reduced distance function is a lower bound of the given high-dimensional distance function • the reduced distance function is again a quadratic form • the reduced distance function is the greatest of all lower-bounding distance functions in the reduced space
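A minimal NumPy sketch of the three properties for one simple case, assuming the reduction keeps the first r coordinates and drops the rest (for example after a PCA-style rotation). Minimizing the quadratic form over the dropped coordinates gives the Schur complement $A_{11} - A_{12} A_{22}^{-1} A_{21}$, which is again a quadratic form, a lower bound, and the greatest such bound for this projection. This illustrates the idea only; it is not presented as the published algorithm, which handles more general reductions. Function names and the example matrix are assumptions.

```python
import numpy as np


def reduce_quadratic_form(A, r):
    """Greatest lower-bounding quadratic form on the first r coordinates.

    Sketch assumption: the reduction keeps the first r coordinates and drops the
    rest. Minimizing x A x^T over the dropped coordinates yields the Schur
    complement A11 - A12 A22^{-1} A21. A must be symmetric positive definite.
    """
    A11, A12 = A[:r, :r], A[:r, r:]
    A21, A22 = A[r:, :r], A[r:, r:]
    return A11 - A12 @ np.linalg.solve(A22, A21)


def qf_dist(x, y, A):
    """Quadratic form distance sqrt((x - y) A (x - y)^T)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(max(d @ A @ d, 0.0)))


# Example matrix: a_ij = exp(-sigma * |i - j| / n) with n = 12 bins, sigma = 10.
n, r, sigma = 12, 4, 10.0
idx = np.arange(n)
A = np.exp(-sigma * np.abs(idx[:, None] - idx[None, :]) / n)
A_red = reduce_quadratic_form(A, r)

rng = np.random.default_rng(1)
p, q = rng.random(n), rng.random(n)
print(qf_dist(p[:r], q[:r], A_red), "<=", qf_dist(p, q, A))
```

The reduced distance is cheap to evaluate (r × r instead of n × n) and is therefore suitable as the filter distance in the multistep search.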
Experimental Evaluation • Data is taken from the Brookhaven Protein Data Bank • Molecules are represented as surface points for the computation of shape histograms • The reduced feature vectors for the filter step are managed by an X-tree of dimension 10
Experimental Evaluation • Similarity matrices are computed by an adapted formula in which the similarity weight $a_{ij}$ of bins i and j is defined as $a_{ij} = e^{-\sigma \cdot d(i,j)}$ • σ = 10
Classification by Shape Similarity • Every class contains at least two molecules • After preprocessing, the data set contains 3422 proteins classified into 281 classes • Three models were considered: the pure shell model, the pure sector model, and the combined model • The combined model achieves the best classification accuracy