3D Shape Histograms for Similarity Search and Classification in Spatial Databases. Mihael Ankerst, Gabi Kastenmuller, Hans-Peter Kriegel, Thomas Seidl. Univ. of Munich, Germany.

Presentation Transcript


  1. 3D Shape Histograms for Similarity Search and Classification in Spatial Databases. Mihael Ankerst, Gabi Kastenmuller, Hans-Peter Kriegel, Thomas Seidl. Univ. of Munich, Germany

  2. Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion

  4. Introduction • Classification • The problem of assigning an appropriate class to the query object • Applications: molecular biology, medical imaging, mechanical engineering, astronomy • Objects of the same class have some characteristic properties in common. • These can be geometric or thematic properties.

  5. Classification in Molecular Databases • Classification schemata are already available • We need a fast filter classification algorithm • Dali system: a sophisticated classification algorithm for proteins • CATH: a hierarchical classification of protein domain structures • Four levels: class, architecture, topology and homologous superfamily.

  6. Nearest Neighbor Classification • In general, classification is done after training • An object is assigned to a class if it matches the description of that class • Nearest neighbor classifiers find the nearest neighbor of the query object and return its class • k-nearest neighbor classifiers: parameters are the number k of neighbors and the weights of the neighbors
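
A minimal sketch of the k-nearest neighbor idea from this slide, assuming objects are already given as feature vectors and using the plain Euclidean distance. The function name `knn_classify`, the inverse-distance weighting, and the toy training data are illustrative assumptions, not taken from the presentation.

```python
import math
from collections import defaultdict

def knn_classify(query, labeled, k=3, weighted=False):
    """Return the majority class among the k nearest labeled vectors.

    labeled: list of (vector, class_label) pairs.
    weighted: if True, each neighbor votes with weight 1/distance
              (one possible weighting scheme, assumed here).
    """
    neighbors = sorted(labeled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = defaultdict(float)
    for vec, label in neighbors:
        d = math.dist(query, vec)
        votes[label] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

# Hypothetical toy data: two classes in a 2D feature space.
training = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
            ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]

print(knn_classify((0.2, 0.1), training, k=1))  # plain nearest neighbor
print(knn_classify((0.8, 0.9), training, k=3))  # majority vote among 3 neighbors
```

With k = 1 this reduces to the plain nearest neighbor classifier.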

  7. Geometry Based Similarity Search • Spatial objects are transformed into a high-dimensional vector space • In 2D, shapes can be represented as ordered sets of surface points, approximate rectangular coverings, etc. • Section coding technique: each polygon's circumcircle is decomposed into a number of sectors, and each of these sectors is normalized. • Similarity is defined in terms of the Euclidean distance between the resulting feature vectors.

  8. Invariance Properties • Similarity models need to incorporate invariance against translation, rotation, scaling etc. • Most of the methods include a preprocessing step such as rotation of objects to a normalized orientation, translation of center of mass to origin etc. • Robustness against errors is not considered in most of these models

  9. Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion

  10. 3D Shape Similarity Model • We extend the concept of the section coding technique to 3D. • Shape histograms as feature vectors • Quadratic form distance function

  11. Shape Histograms • A feature transform maps a complex object onto a feature vector in a multidimensional space. • 3D shape histograms are also feature vectors • They are based on partitioning the space into complete and disjoint cells, called the bins of the histogram • We can use any space (geometric, thematic, etc.)

  12. Shell Model • 3D space is decomposed into concentric shells around the center point • Independent of rotation around the center • Radii of the shells are determined from the extension of the objects • Shells of uniform thickness
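
The shell decomposition can be sketched as follows, assuming the object is a set of 3D surface points already translated so that its center is at the origin. The names `shell_histogram`, `num_shells`, and `max_radius` are hypothetical, and clamping points beyond `max_radius` into the outermost shell is a design choice of this sketch.

```python
import math

def shell_histogram(points, num_shells, max_radius):
    """Count points per concentric shell of uniform thickness around the origin."""
    hist = [0] * num_shells
    thickness = max_radius / num_shells
    for p in points:
        r = math.sqrt(sum(c * c for c in p))
        # Points on or beyond max_radius are clamped into the outermost shell.
        idx = min(int(r / thickness), num_shells - 1)
        hist[idx] += 1
    return hist

# Hypothetical point set and the same set rotated by 90 degrees around the z-axis.
points = [(0.5, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 2.5), (3.0, 0.0, 0.0)]
rotated = [(-y, x, z) for (x, y, z) in points]

print(shell_histogram(points, num_shells=3, max_radius=3.0))
print(shell_histogram(rotated, num_shells=3, max_radius=3.0))
```

Because only the distance from the center enters the bin index, both calls produce the same histogram, which illustrates the rotation independence stated on the slide.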

  13. Sector Model • 3D space is decomposed into sectors that emerge from the center point of the model • Distribute points uniformly on the surface of the sphere. • The Voronoi diagram gives an appropriate decomposition of the space.
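
A rough sketch of the sector assignment: the Voronoi cell of a sector center on the sphere contains exactly those directions that are closer to this center than to any other, so a point can be assigned by picking the unit center with the largest dot product. The six axis directions used as `SECTOR_CENTERS` are an assumption for illustration only; the slide does not say which point distribution is used.

```python
# Hypothetical sector centers: the six axis directions (unit vectors).
SECTOR_CENTERS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def sector_index(point, centers=SECTOR_CENTERS):
    """Index of the sector whose (unit) center direction is angularly closest."""
    def dot(i):
        return sum(a * b for a, b in zip(point, centers[i]))
    # For unit-length centers, the largest dot product means the smallest angle.
    return max(range(len(centers)), key=dot)

def sector_histogram(points, centers=SECTOR_CENTERS):
    """Count points per sector; a point at the origin falls into sector 0."""
    hist = [0] * len(centers)
    for p in points:
        hist[sector_index(p, centers)] += 1
    return hist

points = [(2.0, 0.1, 0.0), (0.0, -3.0, 1.0), (0.2, 0.1, 5.0)]
print(sector_histogram(points))
```

Unlike the shell model, this histogram changes when the object is rotated, which is why a normalized orientation matters for sector-based models.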

  14. Combined Model • Combination of the shell and sector models • Results in a higher dimensionality • We can use different combinations of shells and sectors for the same dimensionality

  15. Euclidean Distance • The Euclidean distance between two N-dimensional vectors p and q is given by $d(p, q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}$ • The individual components of the feature vectors are assumed to be independent • Relationships between components, such as substitutability and compensability, cannot be taken into account

  16. Euclidean Distance • Consider 3 objects a, b and c • We can clearly see that 'a and b' are more closely related than 'a and c' or 'b and c' • However, due to rotation, the peaks of 'a' and 'b' are mapped into different bins, and hence the Euclidean distance does not reflect similarity in this case
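
The effect can be reproduced with a small numeric example. The three histograms below are hypothetical stand-ins for the objects on the slide: `a` and `b` have their peak in neighboring bins, `c` has its peak far away.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical 6-bin histograms with a single peak each.
a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # peak shifted to the neighboring bin
c = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # peak in a distant bin

print(euclidean(a, b), euclidean(a, c))
```

Both distances are equal, because the Euclidean distance treats every pair of bins as equally unrelated, no matter how close the bins are in the underlying space.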

  17. Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion

  18. Quadratic Form Distance Function • The quadratic form distance function is defined in terms of a similarity matrix A: $d_A(p, q) = \sqrt{(p - q) \cdot A \cdot (p - q)^T}$ • The components $a_{ij}$ of A represent the similarity of the components i and j in the underlying space • The Euclidean distance is the special case of the quadratic form distance where A = I, the identity matrix

  19. Quadratic Form Distance Functions • The Euclidean distance of two vectors is fully determined; it has no adjustable parameters • The weighted Euclidean distance is a little more flexible, since it controls the effect of each individual vector component on the overall distance • On top of this, the general quadratic form distance function also specifies cross-dependencies between the dimensions

  20. Quadratic Form Distance Functions • The neighborhood of the bins can be represented as similarity weights • Let d(i,j) represent the distance between the cells that correspond to bins i and j • For shells, the bin distance is the difference of the corresponding radii • For sectors, the bin distance is the difference of the angles of the sector centers

  21. Quadratic Form Distance Functions • When provided with an appropriate bin distance function, the similarity matrix can be computed as $a_{ij} = e^{-\sigma \cdot d(i,j)}$, where the parameter σ controls the global shape of the similarity matrix.
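
A minimal sketch of a similarity matrix built from a bin distance and of the resulting quadratic form distance. The bin distance |i - j| (bins arranged along a line, as for shell radii), the value σ = 1, and the example histograms are assumptions chosen for illustration.

```python
import math

def similarity_matrix(bin_distance, n, sigma):
    """a_ij = exp(-sigma * d(i, j)) for all pairs of the n bins."""
    return [[math.exp(-sigma * bin_distance(i, j)) for j in range(n)]
            for i in range(n)]

def quadratic_form_distance(p, q, A):
    """sqrt((p - q) * A * (p - q)^T) for a symmetric similarity matrix A."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    total = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding errors

# Hypothetical histograms: a and b peak in neighboring bins, c in a distant bin.
a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
c = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

A = similarity_matrix(lambda i, j: abs(i - j), n=6, sigma=1.0)
identity = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]

print(quadratic_form_distance(a, b, A), quadratic_form_distance(a, c, A))
print(quadratic_form_distance(a, b, identity))  # equals the Euclidean distance
```

With this matrix, `a` is closer to `b` than to `c`, since mass in neighboring bins partially compensates. A larger σ makes the matrix approach the identity, so the distance approaches the Euclidean one.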

  22. Invariance Properties • During normalization, we perform a translation and a rotation of all objects • The translation is done such that the center of mass maps onto the origin • A Principal Axes Transform is then applied • This generally leads to a unique orientation of the object

  23. Principal Axes Transform • Compute the Covariance matrix for a given 3D set of points (x,y,z)
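
The formula itself is not preserved in this transcript. As a sketch, assuming the n points have already been translated so that their center of mass lies at the origin, a standard form of the covariance matrix is shown below. The normalization factor 1/n is an assumption; it scales the eigenvalues but does not change the eigenvectors.

```latex
C = \frac{1}{n} \sum_{i=1}^{n}
\begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix}
\begin{pmatrix} x_i & y_i & z_i \end{pmatrix}
= \frac{1}{n}
\begin{pmatrix}
\sum x_i^2    & \sum x_i y_i & \sum x_i z_i \\
\sum x_i y_i  & \sum y_i^2   & \sum y_i z_i \\
\sum x_i z_i  & \sum y_i z_i & \sum z_i^2
\end{pmatrix}
```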

  24. Principal Axes Transform • The eigenvectors of this matrix represent the principal axes of the original 3D point set • The eigenvalues indicate the variance of the points in the respective direction • As a result of the PAT, all the covariances of the transformed points vanish
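
A small standard-library sketch of the first step of such a transform: computing the covariance matrix of a point set and estimating its dominant principal axis. Power iteration is used here only to keep the example dependency-free; it is a stand-in, not necessarily how the authors compute the eigenvectors, and it can fail if the start vector happens to be orthogonal to the dominant axis. All names and the example points are hypothetical.

```python
import math

def covariance_matrix(points):
    """3x3 covariance matrix of 3D points (mean-centered, normalized by n)."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in points]
    return [[sum(c[r] * c[s] for c in centered) / n for s in range(3)]
            for r in range(3)]

def dominant_axis(cov, iterations=200):
    """Estimate the eigenvector with the largest eigenvalue by power iteration."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(cov[r][s] * v[s] for s in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate point set: no variance at all
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue (variance).
    eigenvalue = sum(v[r] * sum(cov[r][s] * v[s] for s in range(3))
                     for r in range(3))
    return v, eigenvalue

# Hypothetical point set that is most extended along the x-axis.
points = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, -1.0, 0.0),
          (0.0, 1.0, 0.0), (0.0, 0.0, -0.5), (0.0, 0.0, 0.5)]

cov = covariance_matrix(points)
axis, variance = dominant_axis(cov)
print(axis, variance)
```

For this point set the estimated axis is the x-direction, and the eigenvalue equals the variance of the x-coordinates. A full transform would compute all three eigenvectors and rotate the object so that they align with the coordinate axes.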

  25. Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion

  26. Extensibility of Histogram Models • Along with spatial properties, we can also consider thematic properties • A general approach to managing both thematic and spatial properties is to use combined histograms • A combined histogram is the Cartesian product of the individual histograms
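
One way to picture the Cartesian product: every pair (spatial bin, thematic bin) becomes its own bin in the combined histogram. The sketch below assumes each point has already been assigned a spatial bin and a thematic bin; the row-major index layout and all names are illustrative assumptions.

```python
def combined_bin_index(spatial_bin, thematic_bin, num_thematic_bins):
    """Flatten a (spatial, thematic) bin pair into a single index (row-major)."""
    return spatial_bin * num_thematic_bins + thematic_bin

def combined_histogram(items, num_spatial_bins, num_thematic_bins):
    """items: iterable of (spatial_bin, thematic_bin) pairs, one per point."""
    hist = [0] * (num_spatial_bins * num_thematic_bins)
    for spatial_bin, thematic_bin in items:
        hist[combined_bin_index(spatial_bin, thematic_bin, num_thematic_bins)] += 1
    return hist

# Hypothetical example: 3 spatial bins and 2 thematic bins give 6 combined bins.
items = [(0, 1), (0, 1), (2, 0)]
print(combined_histogram(items, num_spatial_bins=3, num_thematic_bins=2))
```

The dimensionality of the combined histogram is the product of the individual dimensionalities, so it grows quickly as more properties are added.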

  27. Outline • Introduction • 3D Shape Similarity Model • Quadratic Form Distance Functions • Extensibility of Histogram Models • Query Processing • Experimental Results and Conclusion

  28. Query Processing • For quadratic form distance functions, the evaluation time for a single database object increases quadratically with the dimension

  29. Optimal Multistep k-Nearest Neighbor Search • In order to achieve good performance, the paradigm of multistep query processing is used • An index-based filter step produces a set of candidates • A refinement step performs the expensive exact evaluation of the candidates • The filter is responsible for completeness and the refinement for correctness

  30. Optimal Multistep k-Nearest Neighbor Search • Based on a multi-dimensional index structure, the filter step performs an incremental ranking • Objects are reported in order of increasing filter distance to the query • In order to guarantee no false dismissals caused by the filter step, the filter distance must lower-bound the object distance: $d_f(p, q) \le d_o(p, q)$, where $d_f$ is the filter distance and $d_o$ is the object distance
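
The filter-and-refine loop can be sketched as follows. Sorting the whole database by filter distance stands in for the index-based incremental ranking, and the example filter (distance in the first coordinate only) is a hypothetical lower bound of the Euclidean object distance. The function name and return values are assumptions of this sketch.

```python
import heapq
import math

def multistep_knn(query, database, k, filter_dist, object_dist):
    """k-NN search that refines candidates in order of increasing filter distance.

    Requires filter_dist(q, o) <= object_dist(q, o) for all objects o.
    Returns the k nearest objects and the number of exact evaluations.
    """
    # Stand-in for an incremental ranking delivered by an index structure.
    ranking = sorted(database, key=lambda o: filter_dist(query, o))
    result = []  # max-heap via negated distances: (-distance, tie_breaker, object)
    refinements = 0
    for idx, obj in enumerate(ranking):
        # Stop once the filter distance exceeds the current k-th object distance:
        # by the lower-bounding property, no later object can be closer.
        if len(result) == k and filter_dist(query, obj) > -result[0][0]:
            break
        d = object_dist(query, obj)  # expensive exact evaluation
        refinements += 1
        if len(result) < k:
            heapq.heappush(result, (-d, idx, obj))
        elif d < -result[0][0]:
            heapq.heapreplace(result, (-d, idx, obj))
    neighbors = [obj for _, _, obj in sorted(result, key=lambda e: -e[0])]
    return neighbors, refinements

# Hypothetical data: 3D points, filter uses only the first coordinate.
database = [(0, 0, 0), (1, 0, 0), (1, 5, 5), (2, 0, 0), (5, 0, 0), (6, 1, 1)]
first_coordinate = lambda q, o: abs(q[0] - o[0])

neighbors, refinements = multistep_knn((0, 0, 0), database, 2,
                                       first_coordinate, math.dist)
print(neighbors, refinements)
```

In this example only three of the six objects are evaluated exactly, and the result matches a full scan, which is the point of the lower-bounding condition on the slide.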

  31. Reduction in Dimensionality of Quadratic Forms • Objects in high dimensional spaces are managed by reducing their dimensionality • Typically this is done by Principal Component Analysis, Discrete Fourier transform, Similarity Matrix decomposition, Feature Subselection etc. • These approaches can also be used in case of Quadratic Form Distance

  32. Reduction in Dimensionality of Quadratic Forms • An algorithm to reduce the similarity matrix from a high-dimensional space down to a low-dimensional space was developed in the context of multimedia databases. • The method guarantees three things: (1) the reduced distance function is a lower bound of the given high-dimensional distance function; (2) the reduced distance function is again a quadratic form; (3) the reduced distance function is the greatest of all lower-bounding distance functions in the reduced space.

  33. Experimental Evaluation • Data is taken from the Brookhaven Protein Data Bank. • Molecules are represented as surface points for the computation of shape histograms • Reduced feature vectors for the filter step are managed by an X-tree of dimension 10.

  34. Experimental Evaluation • Similarity matrices are computed by an adapted formula, where the similarity weights $a_{ij}$ of bins i and j are defined as $a_{ij} = e^{-\sigma \cdot d(i,j)}$ • σ = 10

  35. Basic Similarity Search

  36. Classification by Shape Similarity • Every class has at least two molecules • After preprocessing, 3422 proteins have been classified into 281 classes • Three models have been considered: the pure shell model, the pure sector model and the combined model. • The accuracy of the combined model is the best

  37. Classification by Shape Similarity
