Learning Embeddings for Similarity-Based Retrieval
Vassilis Athitsos
Computer Science Department, Boston University
Overview • Background on similarity-based retrieval and embeddings. • BoostMap. • Embedding optimization using machine learning. • Query-sensitive embeddings. • Ability to preserve non-metric structure. • Cascades of embeddings. • Speeding up nearest neighbor classification.
Problem Definition • Database of n objects: x1, x2, x3, …, xn. • Goal: find the k nearest neighbors of a query object q. • Brute-force search time is linear in: • n (the size of the database). • the time it takes to measure a single distance.
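To make the brute-force cost concrete, here is a minimal k-NN sketch (the function and variable names are my own, not from the slides); it performs exactly n distance computations, so the running time grows with both n and the cost of a single distance.

    # Minimal sketch (assumed helper, not the authors' code): brute-force k-NN.
    import heapq

    def brute_force_knn(query, database, distance, k=1):
        """Return the k database objects closest to `query` under `distance`."""
        # One distance computation per database object: O(n * cost of one distance).
        scored = ((distance(query, x), i) for i, x in enumerate(database))
        return [database[i] for _, i in heapq.nsmallest(k, scored)]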
Applications • Nearest neighbor classification. • Similarity-based retrieval: image/video databases, biological databases, time series, web pages, browsing music or movie catalogs. • Example image domains: faces, letters/digits, handshapes.
Expensive Distance Measures • Comparing d-dimensional vectors is efficient: O(d) time (one component-wise comparison of x1…xd with y1…yd). • Comparing strings of length d with the edit distance is more expensive: O(d²) time. • Reason: the two strings must be aligned (e.g., “immigration” vs. “imitation”).
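For concreteness, a standard dynamic-programming edit distance is sketched below (an assumed textbook implementation, not code from the talk); the d-by-d table of alignment costs is exactly what makes one comparison O(d²) rather than O(d).

    # Illustrative sketch: Levenshtein edit distance via dynamic programming.
    # The nested loops fill a (len(s)+1) x (len(t)+1) table of alignment costs,
    # which is why a single string comparison costs O(d^2).
    def edit_distance(s, t):
        prev = list(range(len(t) + 1))           # cost of aligning "" with t[:j]
        for i, a in enumerate(s, 1):
            curr = [i]                           # cost of aligning s[:i] with ""
            for j, b in enumerate(t, 1):
                curr.append(min(prev[j] + 1,              # deletion
                                curr[j - 1] + 1,          # insertion
                                prev[j - 1] + (a != b)))  # substitution or match
            prev = curr
        return prev[-1]

    edit_distance("immigration", "imitation")    # the example strings from the slide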
Shape Context Distance • Proposed by Belongie et al. (2001). • Error rate: 0.63%, with database of 20,000 images. • Uses bipartite matching (cubic complexity!). • 22 minutes/object, heavily optimized. • Result preview: 5.2 seconds, 0.61% error rate.
More Examples • DNA and protein sequences: Smith-Waterman. • Time series: Dynamic Time Warping. • Probability distributions: Kullback-Leibler (KL) divergence. • These measures are non-Euclidean, and sometimes non-metric.
Indexing Problem • Vector indexing methods NOT applicable. • PCA. • R-trees, X-trees, SS-trees. • VA-files. • Locality Sensitive Hashing.
Metric Methods • Pruning-based methods. • VP-trees, MVP-trees, M-trees, Slim-trees,… • Use triangle inequality for tree-based search. • Filtering methods. • AESA, LAESA… • Use the triangle inequality to compute upper/lower bounds of distances. • Suffer from curse of dimensionality. • Heuristic in non-metric spaces. • In many datasets, bad empirical performance.
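The filtering idea can be sketched roughly as follows (a LAESA-flavored sketch under the assumption that distances from every database object to a small set of pivots were precomputed; this is not the actual AESA/LAESA code). The lower bound relies on the triangle inequality, which is why such methods become heuristic in non-metric spaces.

    # Rough sketch (assumed names): triangle-inequality filtering for 1-NN search.
    # For a metric D, |D(q, p) - D(x, p)| <= D(q, x), so the left side is a cheap
    # lower bound that lets us skip objects that cannot beat the best match so far.
    def filtered_nn(query, database, pivots, pivot_dists, distance):
        """pivot_dists[i][j] holds the precomputed D(database[i], pivots[j])."""
        q_to_pivots = [distance(query, p) for p in pivots]
        best_obj, best_dist = None, float("inf")
        for i, x in enumerate(database):
            lower = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, pivot_dists[i]))
            if lower >= best_dist:
                continue                      # pruned without an exact distance call
            d = distance(query, x)
            if d < best_dist:
                best_obj, best_dist = x, d
        return best_obj, best_dist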
Embeddings • An embedding F maps each database object x1, x2, …, xn from the original space to a vector F(x1), …, F(xn) in Rd. • The query q is also mapped to a vector F(q) in Rd. • Measure distances between vectors (typically much faster than the original distance). • Caveat: the embedding must preserve similarity structure.
Reference Object Embeddings • Choose a few reference objects r1, r2, r3 from the database. • Embed each object x using its distances to the reference objects: F(x) = (D(x, r1), D(x, r2), D(x, r3)).
F(x) = (D(x, LA), D(x, Lincoln), D(x, Orlando)) F(Sacramento)....= ( 386, 1543, 2920) F(Las Vegas).....= ( 262, 1232, 2405) F(Oklahoma City).= (1345, 437, 1291) F(Washington DC).= (2657, 1207, 853) F(Jacksonville)..= (2422, 1344, 141)
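A minimal sketch of such a reference-object embedding (assuming `distance` is the expensive measure D and `references` is the chosen list of reference objects; the names are mine):

    # Reference-object embedding: F(x) = (D(x, r1), ..., D(x, rd)).
    def embed(x, references, distance):
        return [distance(x, r) for r in references]

    # With references = [LA, Lincoln, Orlando] and distance = road distance,
    # embed(Sacramento, references, distance) would give the (386, 1543, 2920)
    # vector shown on the slide.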
Existing Embedding Methods • FastMap, MetricMap, SparseMap, Lipschitz embeddings. • Use distances to reference objects (prototypes). • Question: how do we directly optimize an embedding for nearest neighbor retrieval? • FastMap & MetricMap assume Euclidean properties. • SparseMap optimizes stress. • Large stress may be inevitable when embedding non-metric spaces into a metric space. • In practice often worse than random construction.
BoostMap • BoostMap: A Method for Efficient Approximate Similarity Rankings. Athitsos, Alon, Sclaroff, and Kollios, CVPR 2004. • BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval. Athitsos, Alon, Sclaroff, and Kollios, PAMI 2007 (to appear).
Key Features of BoostMap • Maximizes amount of nearest neighbor structure preserved by the embedding. • Based on machine learning, not on geometric assumptions. • Principled optimization, even in non-metric spaces. • Can capture non-metric structure. • Query-sensitive version of BoostMap. • Better results in practice, in all datasets we have tried.
Ideal Embedding Behavior • The embedding F maps the original space X to Rd. • For any query q: we want F(NN(q)) = NN(F(q)). • Equivalently, for any database object b besides a = NN(q), we want F(q) to be closer to F(a) than to F(b).
Embeddings Seen As Classifiers • Consider triples (q, a, b) such that: q is a query object, a = NN(q), and b is another database object. • Classification task: is q closer to a or to b? • Any embedding F defines a classifier F’(q, a, b): it checks whether F(q) is closer to F(a) or to F(b).
Classifier Definition • Given an embedding F: X → Rd, define F’(q, a, b) = ||F(q) – F(b)|| – ||F(q) – F(a)||. • F’(q, a, b) > 0 means “q is closer to a.” • F’(q, a, b) < 0 means “q is closer to b.”
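A small sketch of this classifier (plain Euclidean distance in Rd is assumed here; the weighted distance that BoostMap actually uses appears later in the talk):

    # Sketch of F'(q, a, b) = ||F(q) - F(b)|| - ||F(q) - F(a)|| for an embedding F.
    import math

    def euclidean(u, v):
        return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

    def F_prime(F, q, a, b):
        """Positive output: the embedding says q is closer to a.
        Negative output: the embedding says q is closer to b."""
        Fq = F(q)
        return euclidean(Fq, F(b)) - euclidean(Fq, F(a))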
Key Observation • If the classifier F’ is perfect, then for every query q, F(NN(q)) = NN(F(q)). • If F(q) is closer to F(b) than to F(NN(q)), then the triple (q, NN(q), b) is misclassified. • Therefore, the classification error of F’ on triples (q, NN(q), b) measures how well F preserves nearest neighbor structure.
Optimization Criterion • Goal: construct an embedding F optimized for k-nearest neighbor retrieval. • Method: maximize the accuracy of F’ on triples (q, a, b) of the following type: • q is any object. • a is a k-nearest neighbor of q in the database. • b is in the database, but NOT a k-nearest neighbor of q. • If F’ is perfect on those triples, then F perfectly preserves k-nearest neighbors.
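One possible way to generate such training triples, sketched below (the sampling strategy and all names are my own assumptions, not the exact procedure from the paper):

    # Sketch: sample triples (q, a, b) where a is a k-NN of q and b is not.
    import random

    def training_triples(objects, distance, k, num_triples):
        triples = []
        for _ in range(num_triples):
            q = random.choice(objects)
            # Rank the remaining objects by the exact (expensive) distance to q.
            ranked = sorted((x for x in objects if x is not q),
                            key=lambda x: distance(q, x))
            a = random.choice(ranked[:k])    # a k-nearest neighbor of q
            b = random.choice(ranked[k:])    # in the database, but not a k-NN of q
            triples.append((q, a, b))
        return triples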
1D Embeddings as Weak Classifiers • 1D embeddings define weak classifiers: better than a random classifier (50% error rate). • (Figure: cities such as LA, Chicago, Detroit, Cleveland, New York, and Lincoln placed on the real line by a 1D embedding.) • We can define lots of different classifiers: every object in the database can be a reference object. • Question: how do we combine many such classifiers into a single strong classifier? • Answer: use AdaBoost, a machine learning method designed for exactly this problem.
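One such family of weak classifiers, sketched under the assumption that a 1D embedding is simply the distance to a single reference object r (the BoostMap papers also consider other 1D embeddings, e.g., line projections):

    # A 1D embedding F_r(x) = D(x, r), and the weak classifier it defines on triples.
    def make_1d_embedding(r, distance):
        return lambda x: distance(x, r)

    def weak_classifier(F1d, q, a, b):
        """On the real line: positive if F1d puts q closer to a than to b."""
        return abs(F1d(q) - F1d(b)) - abs(F1d(q) - F1d(a))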
Using AdaBoost • The weak classifiers come from 1D embeddings F1, F2, …, Fn, each mapping the original space X to the real line. • AdaBoost chooses 1D embeddings and assigns weights to them. • Output: H = w1F’1 + w2F’2 + … + wdF’d. • Goal: achieve low classification error on triples. • AdaBoost trains on triples chosen from the database.
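A hedged sketch of what this boosting step could look like, using standard discrete AdaBoost over triples (the real BoostMap training procedure includes refinements not shown on this slide; all names below are assumptions):

    # Discrete AdaBoost over triples: repeatedly pick the 1D embedding with the lowest
    # weighted error, give it a weight alpha, and upweight the misclassified triples.
    import math

    def margin(F1d, q, a, b):
        # Same quantity as weak_classifier in the previous sketch.
        return abs(F1d(q) - F1d(b)) - abs(F1d(q) - F1d(a))

    def adaboost_on_triples(candidate_embeddings, triples, distance, rounds):
        # Ground-truth label of each triple: +1 if q really is closer to a than to b.
        labels = [1.0 if distance(q, a) < distance(q, b) else -1.0 for q, a, b in triples]
        weights = [1.0 / len(triples)] * len(triples)
        model = []                            # list of (w_i, F_i): H = sum_i w_i * F'_i
        for _ in range(rounds):
            def weighted_error(F1d):
                return sum(w for w, y, (q, a, b) in zip(weights, labels, triples)
                           if margin(F1d, q, a, b) * y <= 0)
            F_best = min(candidate_embeddings, key=weighted_error)
            err = weighted_error(F_best)
            if err >= 0.5:                    # no weak classifier beats chance: stop
                break
            alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
            model.append((alpha, F_best))
            # Misclassified triples get more weight, correctly classified ones less.
            weights = [w * math.exp(alpha if margin(F_best, q, a, b) * y <= 0 else -alpha)
                       for w, y, (q, a, b) in zip(weights, labels, triples)]
            total = sum(weights)
            weights = [w / total for w in weights]
        return model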
From Classifier to Embedding • AdaBoost output: H = w1F’1 + w2F’2 + … + wdF’d. • What embedding and what distance measure should we use? • BoostMap embedding: F(x) = (F1(x), …, Fd(x)). • Distance measure (a weighted Manhattan/L1 distance): D((u1, …, ud), (v1, …, vd)) = Σi=1..d wi |ui – vi|. • Claim: let q be closer to a than to b. H misclassifies the triple (q, a, b) if and only if, under distance measure D, F maps q closer to b than to a.
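Putting these two definitions together, a small sketch of the resulting embedding and weighted L1 distance, assuming the learned model is a list of (wi, Fi) pairs as in the AdaBoost sketch above:

    # BoostMap-style embedding and its weighted Manhattan (L1) distance.
    def boostmap_embed(x, model):
        """F(x) = (F_1(x), ..., F_d(x))."""
        return [F1d(x) for _, F1d in model]

    def weighted_l1(u, v, model):
        """D(u, v) = sum_i w_i * |u_i - v_i|."""
        return sum(w * abs(ui - vi) for (w, _), ui, vi in zip(model, u, v))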
Proof: H(q, a, b)
= Σi=1..d wi F’i(q, a, b)
= Σi=1..d wi (|Fi(q) – Fi(b)| – |Fi(q) – Fi(a)|)
= Σi=1..d (wi |Fi(q) – Fi(b)| – wi |Fi(q) – Fi(a)|)
= D(F(q), F(b)) – D(F(q), F(a))
= F’(q, a, b).
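A tiny numerical sanity check of this identity (toy numbers of my own, not from the slides): for any weights and any embedded vectors, the two sides agree.

    # H(q, a, b) computed term by term equals D(F(q), F(b)) - D(F(q), F(a)).
    weights = [0.5, 1.5, 2.0]
    Fq, Fa, Fb = [1.0, 4.0, 0.0], [2.0, 1.0, 1.0], [0.0, 5.0, 3.0]

    H = sum(w * (abs(q - b) - abs(q - a)) for w, q, a, b in zip(weights, Fq, Fa, Fb))
    D = lambda u, v: sum(w * abs(ui - vi) for w, ui, vi in zip(weights, u, v))
    assert abs(H - (D(Fq, Fb) - D(Fq, Fa))) < 1e-12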