Spectral Hashing Y. Weiss (Hebrew U.) A. Torralba (MIT) Rob Fergus (NYU)
What does the world look like? Motivation • High-level image statistics • Object recognition for large-scale search
Semantic Hashing [Salakhutdinov & Hinton, 2007] Query Image → Semantic Hash Function → Binary code → Query address in Address Space → Semantically similar images in database. Quite different to a (conventional) randomizing hash
1. Locality Sensitive Hashing • Gionis, A. & Indyk, P. & Motwani, R. (1999) • Take random projections of the data (e.g. a Gist descriptor) • Quantize each projection with few bits • No learning involved
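A minimal sketch of this scheme (function name hypothetical): each bit is obtained by thresholding one random projection, with no training step at all.

```python
import numpy as np

def lsh_codes(X, n_bits, rng):
    """Random-projection LSH: draw one random hyperplane per bit and
    quantize each projection to a single bit; nothing is learned."""
    R = rng.standard_normal((X.shape[1], n_bits))  # random projection directions
    return (X @ R > 0).astype(np.uint8)            # 1 bit per projection

# e.g. hash five 512-D Gist descriptors into 8-bit codes
rng = np.random.default_rng(0)
codes = lsh_codes(rng.standard_normal((5, 512)), n_bits=8, rng=rng)
```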
Toy Example • 2D uniform distribution
2. Boosting • Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003] • Positive examples are pairs of similar images • Negative examples are pairs of unrelated images • Learn a threshold & dimension for each bit (weak classifier)
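A crude stand-in for one round of this idea (not the actual BoostSSC procedure, and without the boosting reweighting): pick the (dimension, threshold) pair whose binary split keeps positive pairs on the same side and negative pairs on opposite sides.

```python
import numpy as np

def learn_bit(X, pos_pairs, neg_pairs):
    """Pick the (dimension, threshold) weak classifier whose split keeps
    similar pairs together and separates dissimilar pairs."""
    best_d, best_t, best_score = 0, 0.0, -np.inf
    for d in range(X.shape[1]):
        for t in np.quantile(X[:, d], [0.25, 0.5, 0.75]):  # candidate thresholds
            bits = X[:, d] > t
            score = sum(bits[i] == bits[j] for i, j in pos_pairs) \
                  - sum(bits[i] == bits[j] for i, j in neg_pairs)
            if score > best_score:
                best_d, best_t, best_score = d, t, score
    return best_d, best_t

# tiny example: dimension 0 separates the two similar pairs cleanly
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
d, t = learn_bit(X, pos_pairs=[(0, 1), (2, 3)], neg_pairs=[(0, 2), (1, 3)])
```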
Toy Example • 2D uniform distribution
3. Restricted Boltzmann Machine (RBM) • Type of Deep Belief Network • Hinton & Salakhutdinov, Science 2006 • Single RBM layer: visible units connected to hidden units by symmetric weights W; units are binary & stochastic • Attempts to reconstruct the input at the visible layer from the activation of the hidden layer
Multi-Layer RBM: non-linear dimensionality reduction • Input: Gist vector (512 dimensions), linear units at the first layer • Layer 1 (w1): 512 → 512 • Layer 2 (w2): 512 → 256 • Layer 3 (w3): 256 → N • Output: binary code (N dimensional)
Toy Example • 2D uniform distribution
2-D Toy example: codes of 3, 7 and 15 bits. Hamming distance from query point: Red – 0 bits, Green – 1 bit, Blue – 2 bits, Black – >2 bits
Toy Results. Hamming distance from query: Red – 0 bits, Green – 1 bit, Blue – 2 bits
Semantic Hashing [Salakhutdinov & Hinton, 2007] Query Image → Semantic Hash Function → Binary code → Query address in Address Space → Semantically similar images in database. Quite different to a (conventional) randomizing hash
Spectral Hash Query Image (real-valued vectors) → Spectral Hash (non-linear dimensionality reduction) → Binary code → Query address in Address Space → Semantically similar images in database. Quite different to a (conventional) randomizing hash
Spectral Hashing (NIPS ’08) • Assume points are embedded in Euclidean space • How to binarize so Hamming distance approximates Euclidean distance? Ham_Dist(10001010,11101110)=3
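The slide's example can be checked directly; for codes stored as integers, Hamming distance is the popcount of the XOR:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance = number of differing bit positions = popcount(a XOR b)."""
    return bin(a ^ b).count("1")

d = hamming(0b10001010, 0b11101110)  # the slide's example: 3 differing bits
```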
Spectral Hashing theory • Want to minimize Yᵀ(D−W)Y subject to: each bit on 50% of the time; bits are independent • Sadly, this is NP-complete • Relax the problem by letting Y be continuous • It then becomes an eigenvector problem on the graph Laplacian D−W
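A sketch of the relaxation on a small affinity graph (function name hypothetical): the continuous solutions are the eigenvectors of L = D − W with the smallest nonzero eigenvalues, which can then be thresholded at zero to recover bits.

```python
import numpy as np

def relaxed_codes(W, k):
    """Relaxed spectral hashing: take the k eigenvectors of the graph
    Laplacian L = D - W with smallest nonzero eigenvalues, then threshold
    at zero to get binary codes."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    Y = vecs[:, 1:k + 1]                  # skip the trivial constant eigenvector
    return (Y > 0).astype(np.uint8)

# two tight clusters {0,1} and {2,3}, weakly connected to each other
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
codes = relaxed_codes(W, 1)  # one bit should split the two clusters
```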
Nyström Approximation • Method for approximating eigenfunctions by interpolating between existing data points • Requires evaluating the distance to existing data points, so cost grows linearly with # points • Also overfits badly in practice
What about a novel data point? • Need a function to map new points into the space • Take the limit as n → ∞: eigenvectors converge to eigenfunctions • Need to carefully normalize the graph Laplacian • An analytical form of the eigenfunctions exists for certain distributions (uniform, Gaussian) • Constant time to compute/evaluate for a new point • For uniform: depends only on the extent of the distribution (b − a)
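For a 1-D uniform distribution on [a, b], the closed form looks like the following sketch (written here with x shifted so x − a ∈ [0, b − a]; ε is an assumed kernel-width parameter), which makes evaluating a novel point O(1):

```python
import numpy as np

def phi(x, k, a, b):
    """k-th analytic eigenfunction for a uniform distribution on [a, b]."""
    return np.sin(np.pi / 2 + k * np.pi * (x - a) / (b - a))

def lam(k, a, b, eps=1.0):
    """Matching eigenvalue: depends only on the extent (b - a), mode k,
    and the kernel width eps; higher modes have larger eigenvalues."""
    return 1.0 - np.exp(-(eps ** 2 / 2.0) * (k * np.pi / (b - a)) ** 2)
```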
The Algorithm. Input: data {xi} of dimensionality d; desired # bits k • Fit a multidimensional rectangle to the data: run PCA to align the axes, then bound a uniform distribution • For each dimension, calculate the k smallest eigenfunctions; this gives dk eigenfunctions in total • Pick the k with the smallest eigenvalues • Threshold the eigenfunctions at zero to give the binary codes
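The whole recipe can be sketched end to end as below (a simplified reading: the mode frequency kπ/(b−a) is used as a monotone proxy for the eigenvalue, which preserves the ordering):

```python
import numpy as np

def spectral_hash(X, n_bits):
    """Train spectral hashing codes: PCA-align, fit a bounding rectangle,
    rank per-dimension eigenfunctions by eigenvalue, threshold at zero."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # 1. PCA to align axes
    P = Xc @ Vt.T                                       # data in PCA coordinates
    a, b = P.min(axis=0), P.max(axis=0)                 # 2. bounding rectangle
    # 3. candidate modes k = 1..n_bits on every dimension (d*n_bits total),
    #    sorted by frequency (same ordering as by eigenvalue)
    modes = sorted((k * np.pi / (b[d] - a[d]), d, k)
                   for d in range(P.shape[1]) for k in range(1, n_bits + 1))
    # 4./5. keep the n_bits smallest, threshold the eigenfunctions at zero
    codes = np.empty((len(X), n_bits), dtype=np.uint8)
    for j, (w, d, k) in enumerate(modes[:n_bits]):
        codes[:, j] = np.sin(np.pi / 2 + w * (P[:, d] - a[d])) > 0
    return codes

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2)) * np.array([10.0, 1.0])  # elongated box
codes = spectral_hash(X, n_bits=3)
```

The lowest-frequency mode lands on the longest PCA axis, so the first bit cuts the data roughly in half along that direction.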
1. Fit Multidimensional Rectangle • Run PCA to align axes • Bound uniform distribution
3. Pick the k smallest eigenfunctions, i.e. those with the k smallest eigenvalues (e.g. k = 3)
Back to the 2-D Toy example: codes of 3, 7 and 15 bits. Hamming distance from query: Red – 0 bits, Green – 1 bit, Blue – 2 bits
Input Image representation: Gist vectors • Pixels are not a convenient representation • Use the Gist descriptor instead (Oliva & Torralba, IJCV 2001) • 512 real-valued dimensions/image (16,384 bits at 32 bits per value); no color information • L2 distance between Gist vectors is not a bad substitute for human perceptual distance
LabelMe images • 22,000 images (20,000 train | 2,000 test) • Ground truth segmentations for all • Assume L2 Gist distance is true distance
Bit allocation between dimensions • Compare the value of the cuts in the original space, i.e. before the pointwise nonlinearity.
Summary • Spectral Hashing: a simple way of computing good binary codes • Forced to make a big assumption about the data distribution • Use point-wise non-linearities to map the distribution to uniform • Need more experiments on real data
Overview • Assume points are embedded in a Euclidean space (e.g. the output from an RBM) • How to binarize the space so that Hamming distance between points approximates L2 distance?
Strategies for Binarization • Deliberately add noise during backprop: forces extreme values that overcome the noise
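One way to read that trick (the noise scale sigma is an assumption here, not a value from the talk): inject Gaussian noise into the pre-activation during training, so a unit can only produce a stable 0/1 output by saturating.

```python
import numpy as np

def noisy_sigmoid(z, sigma=4.0, rng=None):
    """Sigmoid with Gaussian noise added to the pre-activation; training
    must drive |z| well past sigma (extreme values) for the output to be
    reliably near 0 or 1."""
    rng = rng or np.random.default_rng(0)
    noise = sigma * rng.standard_normal(np.shape(z))
    return 1.0 / (1.0 + np.exp(-(z + noise)))

# strongly saturated pre-activations survive the noise
y = noisy_sigmoid(np.array([50.0, -50.0]))
```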