Feature Based Similarity

Optimal Dimension Order: A Generic Technique for the Similarity Join
Christian Böhm¹, Florian Krebs², and Hans-Peter Kriegel²
¹ University for Health Informatics and Technology, Innsbruck
² University of Munich

Simple Similarity Queries • Specify a query object and: • Find similar objects – range query • Find the k most similar objects – nearest neighbor query

Join Applications: Catalogue Matching • Catalogue matching (join of two sets R and S) • E.g. astronomy catalogues

Join Applications: Clustering • Clustering (e.g. DBSCAN) • Similarity self-join

R-Tree Similarity Join • Depth-first traversal of two trees, one over R and one over S, with join distance ε [Brinkhoff, Kriegel, Seeger: Efficient Processing of Spatial Joins Using R-trees, SIGMOD Conf. 1993]

The ε-kdB-Tree [Shim, Srikant, Agrawal: High-dimensional Similarity Joins, ICDE 1997] • Assumption: 2 adjacent ε-stripes fit in main memory • Unrealistic for large data sets which are clustered, skewed, and high-dimensional

Epsilon Grid Order [Böhm, Braunmüller, Krebs, Kriegel: Epsilon Grid Order. SIGMOD Conf. 2001]

Common Properties
• Decomposition of data/space into regions
• Regions described by hyper-rectangles
    for each pair (P,Q) of partitions having dist(P,Q) ≤ ε
        for each pair of points (p,q) on (P,Q)
            test dist(p,q) ≤ ε ;
• Most CPU effort goes into the distance test between vectors
⇒ Idea: speed up the distance test
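The partition-level filter dist(P,Q) ≤ ε in the loop above can be sketched as a minimum-distance test between two hyper-rectangles. This is only an illustration: the box representation (a list of (low, high) intervals per dimension), the function names, and the naive loop over all partition pairs are assumptions of this sketch; the algorithms cited on the previous slides enumerate partition pairs through their own index or ordering structures.

```python
import math

def box_mindist(box_p, box_q):
    """Minimum Euclidean distance between two axis-parallel hyper-rectangles.

    Each box is a list of (low, high) intervals, one per dimension
    (a representation assumed for this sketch).
    """
    total = 0.0
    for (p_lo, p_hi), (q_lo, q_hi) in zip(box_p, box_q):
        # Gap between the two intervals in this dimension; 0 if they overlap.
        gap = max(q_lo - p_hi, p_lo - q_hi, 0.0)
        total += gap * gap
    return math.sqrt(total)

def candidate_pairs(boxes_r, boxes_s, eps):
    """Yield index pairs (i, j) of partitions that may contain mating points."""
    for i, box_p in enumerate(boxes_r):
        for j, box_q in enumerate(boxes_s):
            if box_mindist(box_p, box_q) <= eps:
                yield i, j

# Hypothetical 2-dimensional partitions.
boxes_r = [[(0.0, 1.0), (0.0, 1.0)], [(5.0, 6.0), (5.0, 6.0)]]
boxes_s = [[(1.5, 2.0), (0.0, 1.0)], [(9.0, 9.5), (9.0, 9.5)]]
pairs = list(candidate_pairs(boxes_r, boxes_s, eps=0.5))
print(pairs)
```

Only partition pairs that pass this filter reach the point-level distance test, which is where the slide locates most of the CPU effort.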

Related Work: Plane Sweep for Polygons [Shin, Moon, Lee: Adaptive Multi-Stage Distance Join Processing, SIGMOD Conf. 2000] • Observations: • More efficient to use the x-axis as sweep direction • Projection of the polygons onto the y-axis yields high overlap • Decide by the projections of the bounding boxes (integrate a pdf)

Feature Vectors in the Similarity Join
• Distance computation between feature vectors p, q:
    dist2 = 0 ;
    for (i = 0 ; i < d ; i++) {
        dist2 = dist2 + (p[i] - q[i])² ;
        if (dist2 > ε²) break ;
    }
• Order dimensions by mating probability (increasing)
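A minimal Python sketch of this early-terminating distance test follows. The function name `within_eps`, the optional `order` argument, and the returned counter of evaluated dimensions are assumptions made for illustration; the counter is there only to make the effect of the dimension order visible.

```python
def within_eps(p, q, eps, order=None):
    """Return (is_mate, dims_evaluated) for the test dist(p, q) <= eps.

    `order` is an optional sequence of dimension indices; if omitted,
    the dimensions are visited in their natural order.
    """
    eps_sq = eps * eps
    dist_sq = 0.0
    evaluated = 0
    dims = order if order is not None else range(len(p))
    for i in dims:
        diff = p[i] - q[i]
        dist_sq += diff * diff
        evaluated += 1
        if dist_sq > eps_sq:
            # Partial sum already exceeds eps^2: the pair cannot mate.
            return False, evaluated
    return True, evaluated

# Hypothetical vectors that differ strongly only in the last dimension.
p = [0.10, 0.20, 0.90]
q = [0.12, 0.21, 0.10]
natural = within_eps(p, q, eps=0.5)
reordered = within_eps(p, q, eps=0.5, order=[2, 0, 1])
print(natural, reordered)
```

With the natural order the test needs all three dimensions before it can reject the pair; visiting the most selective dimension first rejects it after one.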

Computation of the Mating Probability
To determine the mating probability for dimension di (shown for d0):
• Project the bounding boxes onto the di-axis
• Consider the two projections in a 2-dimensional space with axes d0[P] and d0[Q]
• The d0-projection of each point pair is located in this event space
• Mating point pairs lie on the ε-stripe: y ≤ x + ε and y ≥ x − ε
• Mating probability for d0: the part of the event space that is covered by the ε-stripe
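Under the assumption that point pairs are spread uniformly over the event space, the mating probability for one dimension is the area of the rectangle that lies on the ε-stripe divided by the area of the whole rectangle. The sketch below computes that ratio in closed form; the function names and the (low, high) interval representation are assumptions of this sketch, and projections with zero extent are deliberately not handled.

```python
def _clamped_integral(z, h):
    """Integral of clamp(u, 0, h) for u from 0 to z (0 for z <= 0)."""
    if z <= 0.0:
        return 0.0
    if z <= h:
        return 0.5 * z * z
    return h * z - 0.5 * h * h

def _area_below(p_iv, q_iv, t):
    """Area of the part of the event space where y <= x + t."""
    (a1, b1), (a2, b2) = p_iv, q_iv
    h = b2 - a2
    return _clamped_integral(b1 + t - a2, h) - _clamped_integral(a1 + t - a2, h)

def mating_probability(p_iv, q_iv, eps):
    """Fraction of the event space p_iv x q_iv covered by the eps-stripe.

    p_iv and q_iv are the (low, high) projections of the two bounding
    boxes onto one dimension; a uniform distribution is assumed.
    """
    (a1, b1), (a2, b2) = p_iv, q_iv
    area = (b1 - a1) * (b2 - a2)
    if area <= 0.0:
        raise ValueError("both projections need a positive extent")
    # Stripe = region below y = x + eps minus region below y = x - eps.
    stripe = _area_below(p_iv, q_iv, eps) - _area_below(p_iv, q_iv, -eps)
    return stripe / area

print(mating_probability((0.0, 1.0), (0.0, 1.0), 0.5))
```

For two identical unit projections and ε = 0.5, the two corner triangles outside the stripe each have area 0.125, so the result is 0.75.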

Optimal Dimension Order
• For a given pair (P,Q) of partitions, the optimal dimension order (ODO) is the sequence of dimensions with increasing mating probability
• Algorithm:
    for each pair (P,Q) of partitions having dist(P,Q) ≤ ε
        determine ODO ;
        for each pair of points (p,q) on (P,Q)
            test dist(p,q) ≤ ε using ODO ;
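The inner part of this algorithm for a single partition pair might look as follows in Python. This is a sketch under stated assumptions: the per-dimension mating probabilities are passed in as plain numbers (the values below are hypothetical), and `optimal_dimension_order` and `join_partition_pair` are illustrative names, not taken from the slides.

```python
def optimal_dimension_order(mating_probs):
    """Dimension indices sorted by increasing mating probability."""
    return sorted(range(len(mating_probs)), key=lambda i: mating_probs[i])

def join_partition_pair(points_p, points_q, eps, order):
    """Return all pairs (p, q) with Euclidean distance <= eps.

    The dimensions are tested in the given order so that dimensions
    in which pairs rarely mate can terminate the test early.
    """
    eps_sq = eps * eps
    result = []
    for p in points_p:
        for q in points_q:
            dist_sq = 0.0
            for i in order:
                dist_sq += (p[i] - q[i]) ** 2
                if dist_sq > eps_sq:
                    break  # early termination: not a mating pair
            else:
                # Loop finished without break: distance is within eps.
                result.append((p, q))
    return result

# Hypothetical mating probabilities for a 3-dimensional partition pair.
mating_probs = [0.9, 0.2, 0.6]
order = optimal_dimension_order(mating_probs)
points_p = [(0.0, 0.0, 0.0), (0.1, 0.9, 0.1)]
points_q = [(0.1, 0.1, 0.0), (0.0, 0.0, 0.9)]
mates = join_partition_pair(points_p, points_q, eps=0.25, order=order)
print(order, mates)
```

The order is determined once per partition pair and reused for every point pair in it, so the sorting cost is spread over all distance tests of that pair.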

Shape of the Intersection Area • 20 different shapes are possible, e.g. 1223, 2233, 2223 • Easy proof of completeness and efficient case distinction by assigning codes to the corners: • 1: corner is left of or above the ε-stripe • 2: corner is on the ε-stripe • 3: corner is right of or below the ε-stripe • Easy formulas (only 45° and 90° angles)
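The corner codes can be illustrated with a short Python sketch. The corner order used to build the four-digit code (upper-left, lower-left, upper-right, lower-right) is an assumption, since the slide does not state it, and the counting argument at the end is one possible way to arrive at 20 codes, not necessarily the proof the authors have in mind.

```python
from itertools import product

def corner_code(x, y, eps):
    """Code of one corner (x, y) of the event space relative to the eps-stripe."""
    t = y - x
    if t > eps:
        return 1  # left of or above the stripe
    if t < -eps:
        return 3  # right of or below the stripe
    return 2      # on the stripe

def shape_code(p_iv, q_iv, eps):
    """Four-digit code of the rectangle p_iv x q_iv.

    Assumed corner order: upper-left, lower-left, upper-right, lower-right.
    """
    (a1, b1), (a2, b2) = p_iv, q_iv
    corners = [(a1, b2), (a1, a2), (b1, b2), (b1, a2)]
    return "".join(str(corner_code(x, y, eps)) for x, y in corners)

# y - x is largest at the upper-left corner and smallest at the lower-right
# corner, so the code cannot decrease from the upper-left corner to the two
# middle corners, nor from those to the lower-right corner.
consistent = [c for c in product((1, 2, 3), repeat=4)
              if c[0] <= min(c[1], c[2]) and max(c[1], c[2]) <= c[3]]

print(shape_code((0.0, 4.0), (1.0, 3.0), 1.0), len(consistent))
```

Under this assumed ordering exactly 20 code combinations satisfy the constraint, which agrees with the number of shapes stated on the slide.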

Experimental Evaluation: R-tree Sim. Join • 8-dimensional data, uniformly distributed

Experimental Evaluation: R-tree Sim. Join • 16-dimensional data, from CAD-similarity search

Experimental Evaluation: Scalability • MuX, uniform data • Z-RSJ, uniform data

Experimental Evaluation: Scalability • EGO, CAD data

Conclusion • Conclusion: • The similarity join is an important database primitive for knowledge discovery in databases • Many different basic algorithms exist • Most of them can be accelerated by our optimal dimension order • Future Work: • New applications of the similarity join • Further optimization (multi-parameter) of the similarity join • Parallel and distributed environments
