A decision-theoretic view of image retrieval Nuno Vasconcelos Compaq Computer Corporation Cambridge Research Lab http://www.media.mit.edu/~nuno
[Figure: query image "horses" and its texture-, color-, and shape-similarity matches]
Content-based retrieval
• allow users to express queries directly in the visual domain
• the user provides a query image
• the system extracts low-level features (texture, color, shape)
• this feature signature is compared with those extracted from the database images
• the top matches are returned
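The pipeline above can be sketched in a few lines. This is a minimal illustration, not the system from the talk: the function names (`color_histogram`, `retrieve`) are made up here, and the signature used is a joint color histogram compared by histogram intersection, the simplest of the similarity functions discussed later.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels and count joint
    occurrences. `image` is an (H, W, 3) uint8 array; the result is a
    normalized signature (a crude low-level feature)."""
    q = (image.astype(np.int64) * bins) // 256          # per-channel bin index
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def retrieve(query, database, k=3):
    """Return indices of the k database images whose signatures best match
    the query under histogram intersection (larger = more similar)."""
    q = color_histogram(query)
    scores = [np.minimum(q, color_histogram(img)).sum() for img in database]
    return np.argsort(scores)[::-1][:k]
```

An exact copy of a database image scores 1.0 (the histograms are identical), so it is always returned first; everything else scores strictly less.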
Retrieval architecture
• three main components:
• feature transformation
• feature representation
• similarity function
• previous solutions have concentrated on only some of these components
• two main strategies:
• texture: emphasis on the features
• color: emphasis on the representation
• need: criteria to guide the design of all components
Decision-theoretic formulation
• given: a feature space X and a set Y = {1,…,C} of image classes
• goal: design a map g: X → Y that minimizes the probability of retrieval error
• the Bayes classifier is optimal
• this establishes an optimal criterion for image similarity
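Written out, the optimal decision rule referenced above is the standard Bayes classifier (notation following the slide's X and Y; the slide's own equation did not survive extraction):

```latex
g^*(x) \;=\; \arg\max_{i \in \{1,\dots,C\}} P_{Y|X}(i \mid x)
       \;=\; \arg\max_{i \in \{1,\dots,C\}} p_{X|Y}(x \mid i)\, P_Y(i)
```

Its error, $L^* = 1 - E_x\bigl[\max_i P_{Y|X}(i \mid x)\bigr]$, is the Bayes error: a lower bound on the error of any map g: X → Y.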
A unified view of image similarity
• a hierarchy of similarity functions follows from the Bayes criterion under successive approximations:
• Bayes → (2-way bound) → Bhattacharyya → (linearization) → χ²
• Bayes → (equal priors) → ML → (large, i.i.d. query) → Kullback-Leibler → (Gaussian) → Quadratic → (Σq = Σi) → Mahalanobis → (Σi = I) → Euclidean
[Figure: diagram of the similarity hierarchy with the defining equation of each measure]
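The bottom of this hierarchy is easy to verify numerically. The sketch below (my own illustration; `gaussian_loglik` is a hypothetical name) implements the ML similarity for a Gaussian class model and shows that with identity covariance, ranking classes by likelihood is the same as ranking by Euclidean distance to the class mean.

```python
import numpy as np

def gaussian_loglik(x, mu, cov):
    """Average log-likelihood of query feature vectors x (n, d) under the
    class model N(mu, cov) -- the ML similarity function."""
    n, d = x.shape
    diff = x - mu
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi)).mean()
```

With a shared covariance Σ the log-determinant term is constant across classes and only the Mahalanobis term matters; with Σ = I that term is the squared Euclidean distance, recovering the last link of the chain.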
Feature transformation
• the probability of error is lower bounded by the Bayes error
• Theorem: for a retrieval system with observation space Z and a feature transformation T: Z → X, the Bayes error on X can never be smaller than that on Z; equality is achieved if and only if T is invertible
• suggests that an emphasis on (non-invertible) features is a bad idea
Feature representation
• Theorem: for a retrieval system with class probabilities p(y=i), class-likelihood functions p(x|y=i), and a decision function g(x), the difference between the actual error and the Bayes error is upper bounded by the L1 distance between the true and estimated probabilities
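The slide's own equation did not survive extraction; the standard plug-in bound of this form (which matches the statement) reads:

```latex
P\bigl(g(x) \neq y\bigr) - P\bigl(g^*(x) \neq y\bigr)
\;\le\; \sum_{i=1}^{C} \int_{\mathcal{X}}
\bigl|\, P_Y(i)\, p_{X|Y}(x \mid i) \;-\; \hat{P}_Y(i)\, \hat{p}_{X|Y}(x \mid i) \,\bigr| \, dx
```

where g is the rule obtained by plugging the estimates $\hat{P}_Y, \hat{p}_{X|Y}$ into the Bayes rule $g^*$: good density estimates drive the right-hand side, and hence the excess retrieval error, to zero.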
Feature representation
• the distance between the actual and ideal probability of error (the estimation Δ) is upper bounded by a function of the quality of the density estimates
• this means:
• good density estimation is a sufficient condition for accurate retrieval
• from the theoretical viewpoint, there is no reason for features
• caveat: density estimation is difficult in high dimensions
Color (estimation)-based retrieval
• no features, emphasis on the representation (histograms)
• problem: low-order statistics are not sufficient
• spatial neighborhoods ⇒ high dimensionality
Summary
• low Bayes error: avoid (non-invertible) features
• good image discrimination:
• requires high-dimensional spaces
• estimation is difficult in high dimensions
• can lead to large estimation error
• fundamental trade-off of image retrieval:
• a feature transformation will increase the Bayes error but can also reduce the estimation error
• the two components have to be considered simultaneously!
• Bayes error ≤ error ≤ Bayes error + estimation Δ
Example: texture recognition
• emphasis: discriminant features
• simple representation (μ, Σ) and similarity function (Mahalanobis distance)
• years of research on "good" features, e.g. MRSAR
• problem: discriminant for texture but not generic
• can we get similar performance with a generic transform?
• for Bayesian retrieval the features are not so important
Designing retrieval systems
• the retrieval trade-off:
• low Bayes error: invertible feature transformation
• low estimation Δ: expressive feature representation and a low-dimensional feature space
• directive 1: get the most expressive representation you can afford!
• directive 2: the role of the feature transform is dimensionality reduction
• images live on a low-dimensional manifold embedded in a high-dimensional space
• the feature transformation should eliminate the unnecessary dimensions
• while staying as close to invertible as possible
Feature representation
• among the expressive models (kernel density estimators)
• we like Gaussian mixtures because they are:
• compact (computational efficiency)
• able to capture the details of multi-modal densities (like the histogram)
• computationally tractable in high dimensions (like the Gaussian)
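A Gaussian mixture is typically fit by expectation-maximization. The sketch below (my own minimal illustration, diagonal covariances only, quantile-based initialization; `fit_gmm` is a hypothetical name) is not the talk's implementation, but shows the E-step/M-step structure such a representation relies on.

```python
import numpy as np

def fit_gmm(x, k, iters=100):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.
    x: (n, d) data. Returns (weights, means, variances)."""
    n, d = x.shape
    mu = np.quantile(x, (np.arange(k) + 0.5) / k, axis=0)  # spread initial means
    var = np.tile(x.var(axis=0), (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to w_k N(x_n | mu_k, var_k)
        log_r = (-0.5 * (((x[:, None, :] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r.T @ x) / nk[:, None]
        var = (r.T @ x ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

On well-separated data the recovered component means converge to the cluster centers; with a single component the fit collapses to the plain Gaussian (μ, Σ), which is exactly the sense in which the mixture generalizes it.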
[Figure: compression pipeline with transform T, quantizer Q, and inverse transform T⁻¹]
Feature transformation
• dimensionality reduction has been thoroughly studied in the compression literature
• "close to invertible" = minimum reconstruction error
Optimal transformation
• optimal solution (in the squared-error sense): principal component analysis
• for T(x) = Φx, the optimal k-dimensional transform is Φ*k = [v1,…,vk], where vi is the i-th eigenvector of the covariance Σx, with eigenvalues ordered λ1 ≥ … ≥ λn
• problems:
• squared error is not Bayes error
• PCA does not mimic early human vision well
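PCA as described here is a few lines of numpy. This is a generic sketch of the technique (the name `pca_transform` is made up): project onto the top-k eigenvectors of the data covariance, which among all rank-k linear maps minimizes squared reconstruction error.

```python
import numpy as np

def pca_transform(x, k):
    """Project x (n, d) onto its k principal components.
    Returns (projections, basis, mean) so that x ~ z @ basis.T + mean."""
    mu = x.mean(axis=0)
    xc = x - mu
    cov = xc.T @ xc / (len(x) - 1)
    evals, evecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    phi = evecs[:, ::-1][:, :k]          # reverse to take the top-k eigenvectors
    return xc @ phi, phi, mu
```

When the data really lies near a k-dimensional subspace plus noise, the reconstruction error left over is on the order of the noise variance, which is why PCA is the compression-style answer to "close to invertible".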
Alternative transformations
• define a sparse representation as one where the coefficients are close to zero most of the time (high kurtosis)
• Olshausen and Field have shown that if we add a sparseness constraint to PCA, the resulting basis functions are remarkably similar to the receptive fields of the cells found in V1
Basis functions
[Figure: learned basis functions]
In practice
• early stages of vision: dimensionality reduction, but subject to "efficiency" constraints
• sparse representations are computationally intensive
• they can be reasonably approximated by wavelets
• we have obtained good results even with the DCT
• in summary, this indicates it is possible to have feature transformations that:
• achieve a good balance between invertibility and dimensionality reduction
• capture the most important aspects of early human vision
• have reduced complexity
• work is needed to find the best transformation
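The DCT-based transform mentioned above amounts to taking block-wise DCT coefficients as features. A minimal sketch (my own; the helper names are made up, and tiling non-overlapping 8×8 blocks is an assumption about the setup):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (the transform JPEG uses)."""
    k = np.arange(n)[:, None]
    c = np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
    c[0] *= 1 / np.sqrt(2)
    return c * np.sqrt(2 / n)

def block_dct_features(img, b=8):
    """Tile b x b blocks over a grayscale image and return one DCT
    coefficient vector per block -- a generic, nearly invertible feature
    transformation in place of hand-crafted texture features."""
    C = dct_matrix(b)
    h, w = img.shape[0] // b * b, img.shape[1] // b * b
    blocks = img[:h, :w].reshape(h // b, b, w // b, b).swapaxes(1, 2)
    coeffs = C @ blocks @ C.T            # separable 2-D DCT per block
    return coeffs.reshape(-1, b * b)
```

Because the basis is orthonormal, the transform is exactly invertible per block; dimensionality reduction comes only from discarding high-frequency coefficients afterwards, which is the invertibility/reduction balance the slide refers to.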
Invariance properties
• Lemma: the restriction of a Gaussian mixture to a linear subspace is still a Gaussian mixture
• a Gaussian mixture on a multi-resolution feature space gives:
• a family of embedded densities over multiple image scales
• each added dimension contributes higher-resolution information
• DC coefficient only = histogram
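The lemma is concrete for axis-aligned subspaces (marginalization): dropping coordinates of a Gaussian mixture just slices the means and covariances, leaving the weights untouched. A small illustration (the name `marginalize_gmm` is my own):

```python
import numpy as np

def marginalize_gmm(weights, means, covs, dims):
    """Marginal of a Gaussian mixture onto the coordinates in `dims`:
    keep the weights, slice each mean and covariance. This yields the
    'embedded' coarser-resolution density."""
    dims = np.asarray(dims)
    return (weights,
            [m[dims] for m in means],
            [c[np.ix_(dims, dims)] for c in covs])
```

Consistency check: the mixture mean of the marginal equals the corresponding slice of the full mixture mean, so the embedded family of densities agrees across resolutions.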
Embedded multi-resolution mixture
• explicit control over the trade-off between "invariant" and "invertible" (low Bayes error)
[Figure: spectrum of embedded models ranging from invariant (coarse) to invertible (fine)]
Impact on retrieval accuracy
• overall, the EMM representation:
• extends the histogram: accounts for spatial dependencies
• extends the Gaussian: expressive power to capture density details
• combines the good properties of the color- and texture-based approaches
• precision: % of retrieved images that are relevant to the query
• recall: % of relevant images that are retrieved
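The two evaluation measures defined on this slide are simple set ratios; a direct transcription (my own helper name):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)
```

For example, retrieving {1, 2, 3, 4} when the relevant set is {2, 4, 6, 8, 10} gives precision 2/4 and recall 2/5.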
Retrieval results
• comparison:
• Corel DB: 1500 images, 15 classes
• methods:
• MRSAR + Mahalanobis distance (texture)
• histogram intersection (color)
• color correlograms (both)
• DCT + Gaussian mixtures + ML (proposed)
• Bayesian retrieval with embedded mixtures is clearly superior: up to 10% better than the next best method (correlogram)
Conclusions
• probabilistic architecture for image similarity:
• decision-theoretic formulation
• unifying view of similarity
• optimal guidelines for feature transformation and representation
• DCT + Gaussian mixtures
• works well across various types of databases
Object recognition
• Bayesian retrieval + embedded multi-resolution mixture: [results figure]
• color histograms + histogram intersection (Swain & Ballard): [results figure]
Texture recognition
• Bayesian retrieval + embedded multi-resolution mixture: [results figure]
• MRSAR model + Mahalanobis distance (Mao & Jain): [results figure]