CS598 Visual Information Retrieval Lecture VIII: LSH Recap and Case Study on Face Retrieval
Lecture VIII Part I: Locality Sensitive Hashing Recap Slides credit: Y. Gong
The problem • Large scale image search: • Given a query image • Want to search a large database to find similar images • E.g., search the internet to find similar images • Need fast turn-around time • Need accurate search results
Large Scale Image Search • Find similar images in a large database Kristen Grauman et al
Large scale image search on the Internet • The Internet contains billions of images; we want to search all of them • The challenge: • Need a way of measuring similarity between images (distance metric learning) • Needs to scale to the Internet (how?)
Large scale image search • Representation must fit in memory (disk too slow) • Facebook has ~10 billion images (10^10) • A PC has ~10 GB of memory (~10^11 bits) • Budget of ~10^1 bits per image Fergus et al
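A quick back-of-the-envelope check of the bit budget on the slide above (a minimal sketch; the exact RAM figure is the slide's round number, not a measurement):

```python
# Memory budget: how many bits per image fit in one PC's RAM?
num_images = 10**10            # ~10 billion images (Facebook-scale, per the slide)
ram_bytes  = 10 * 2**30        # ~10 GB of RAM on a single PC
ram_bits   = ram_bytes * 8     # ~8.6e10 bits, i.e. on the order of 10^11

bits_per_image = ram_bits / num_images
print(f"{bits_per_image:.1f} bits per image")   # roughly 10 bits/image
```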
Requirements for image search • Search must be fast, accurate, and scalable • Fast • kd-trees: tree data structure to improve search speed • Locality Sensitive Hashing: hash tables to improve search speed • Small codes: short binary codes (010101101) • Scalable • Requires very little memory • Accurate • Learned distance metric
Summary of existing algorithms • Tree-Based Indexing • Spatial partitions (e.g., kd-trees) and recursive hyperplane decomposition provide an efficient means to search low-dimensional vector data exactly. • Hashing • Locality-sensitive hashing offers sub-linear time search by hashing highly similar examples together. • Binary Small Codes • Compact binary codes, with a few hundred bits per image
Tree-Based Structure • kd-tree • The kd-tree is a binary tree in which every node is a k-dimensional point • kd-trees are known to break down in practice for high-dimensional data, and cannot provide better than a worst-case linear query time guarantee.
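A minimal sketch of exact nearest-neighbor search with a kd-tree, using SciPy (the dataset and dimensions are made up for illustration; in low dimensions this is fast, while in high dimensions it degrades toward a linear scan, as noted above):

```python
# Exact nearest-neighbor search with a kd-tree (SciPy).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.standard_normal((10000, 8))    # 10k points in 8-D (low-dimensional)
tree = cKDTree(data)

query = rng.standard_normal(8)
dist, idx = tree.query(query, k=5)        # 5 exact nearest neighbors
print(idx, dist)
```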
Locality Sensitive Hashing • Hashing methods for fast Nearest Neighbor (NN) search • Sub-linear time search by hashing highly similar examples together in a hash table • Take random projections of the data • Quantize each projection with a few bits • Strong theoretical guarantees • More detail later
Binary Small Code • 1110101010101010 • Binary? • 0101010010101010101 • Only binary values (0/1) are used • Small? • A small number of bits codes each image • e.g., 32 bits or 256 bits • How could this kind of small code improve image search speed? More detail later.
Details of these algorithms 1. Locality sensitive hashing • Basic LSH • LSH for learned metrics 2. Small binary codes • Basic small code idea • Spectral hashing
1. Locality Sensitive Hashing • The basic idea behind LSH is to project the data into a low-dimensional binary (Hamming) space • i.e., each data point is mapped to a b-bit vector, called the hash key. • Each hash function h must satisfy the locality sensitive hashing property: Pr[h(x_i) = h(x_j)] = sim(x_i, x_j) • where sim(x_i, x_j) ∈ [0, 1] is the similarity function of interest M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. In SOCG, 2004.
LSH functions for dot products • Each LSH hash function that produces a hash bit is a random hyperplane separating the space: h_r(x) = 1 if r^T x ≥ 0, and 0 otherwise, with r drawn from a zero-mean Gaussian
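A minimal sketch of this random-hyperplane hash family (dimensions, code length, and data are illustrative assumptions): each of the b bits records which side of a random hyperplane the feature vector falls on.

```python
# Random-hyperplane LSH for dot-product / cosine similarity.
import numpy as np

def hyperplane_hash(X, R):
    """Return b-bit hash codes: bit j = 1 iff r_j . x >= 0."""
    return (X @ R.T >= 0).astype(np.uint8)   # shape (n, b)

rng = np.random.default_rng(0)
d, b = 128, 32                     # feature dimension, code length (assumed)
R = rng.standard_normal((b, d))    # b random hyperplane normals
X = rng.standard_normal((1000, d))
codes = hyperplane_hash(X, R)
print(codes.shape, codes[0][:8])
```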
1. Locality Sensitive Hashing • Take random projections of the data • Quantize each projection with a few bits • No learning involved • (Figure: a feature vector is mapped to a short binary code such as 101.) Slide credit: Fergus et al.
How to search from a hash table? • A set of N data points X_i is hashed by functions r_1 … r_k into a hash table • The query Q is hashed with the same functions and only the images in its bucket are retrieved, a small set << N • (Figure: query code 110111 retrieves nearby stored codes such as 110101 and 111101.)
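A minimal sketch of this lookup with several hash tables (the number of tables, key length, and data are assumptions; real systems also probe neighboring buckets): the query only touches the buckets it hashes to rather than all N points.

```python
# Hash-table lookup with multiple LSH tables.
import numpy as np
from collections import defaultdict

def to_key(bits):
    return bits.tobytes()              # turn a bit vector into a dict key

rng = np.random.default_rng(1)
d, b, n_tables = 128, 12, 4            # short keys, several tables (assumed)
X = rng.standard_normal((100000, d))   # the database

tables, projections = [], []
for _ in range(n_tables):
    R = rng.standard_normal((b, d))
    keys = (X @ R.T >= 0).astype(np.uint8)
    table = defaultdict(list)
    for i, k in enumerate(keys):
        table[to_key(k)].append(i)     # bucket -> list of image indices
    tables.append(table)
    projections.append(R)

q = X[42] + 0.05 * rng.standard_normal(d)      # a query near point 42
candidates = set()
for R, table in zip(projections, tables):
    key = to_key((q @ R.T >= 0).astype(np.uint8))
    candidates.update(table.get(key, []))
print(len(candidates), "candidates out of", len(X))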
Could we improve LSH? • Could we utilize a learned metric to improve LSH? • How do we improve LSH with a learned metric? • Assume we have already learned a distance metric A from domain knowledge • The Mahalanobis distance (x − y)^T A (x − y) measures similarity better than simple metrics such as Euclidean distance
How to learn a distance metric? • First assume we have a set of domain knowledge (constraints) • Use the methods described in the last lecture to learn the distance metric A • As discussed before, a positive semidefinite A factors as A = G^T G • Thus x ↦ Gx is a linear embedding function that embeds the data into a new (possibly lower-dimensional) space • Define G = A^{1/2} (or any factor with G^T G = A)
LSH functions for learned metrics • Given the learned metric A = G^T G, Gx can be viewed as a linear parametric function, or a linear embedding, of the data x • Thus the LSH function becomes h_{r,A}(x) = 1 if r^T G x ≥ 0, and 0 otherwise • The key idea is to first embed the data into the new space by G (data embedding) and then do LSH in that space P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. In CVPR, 2008
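A minimal sketch of the idea above, with a made-up PSD metric A (not a metric learned from real constraints): factor A = G^T G via Cholesky, embed x → Gx, then apply the same random-hyperplane hash in the embedded space.

```python
# LSH under a learned Mahalanobis metric: embed by G, then hash.
import numpy as np

rng = np.random.default_rng(0)
d = 64
M = rng.standard_normal((d, d))
A = M.T @ M + 1e-3 * np.eye(d)     # a made-up PSD "learned" metric

L = np.linalg.cholesky(A)          # A = L L^T
G = L.T                            # so that x^T A x = ||G x||^2

b = 32
R = rng.standard_normal((b, d))    # random hyperplanes in the embedded space

def lsh_learned(x):
    return (R @ (G @ x) >= 0).astype(np.uint8)

x = rng.standard_normal(d)
print(lsh_learned(x))
```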
Some results for LSH • Caltech-101 data set • Goal: Exemplar-based Object Categorization • Some exemplars • Want to categorize the whole data set
Results: object categorization • Caltech-101 database • ML = metric learning Kristen Grauman et al
Question: is hashing fast enough? • Is sub-linear search time fast enough? • Retrieving (1 + ε)-approximate near neighbors is bounded by O(n^{1/(1+ε)}) • Is it fast enough? • Is it scalable enough? (Does it fit in the memory of a PC?)
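To get a feel for that bound, a quick illustrative calculation (constants ignored; the 80-million figure comes from the next slide):

```python
# Rough feel for the O(n^(1/(1+eps))) LSH query cost.
n = 80_000_000                      # database size from the next slide
for eps in (0.5, 1.0, 2.0):
    print(eps, round(n ** (1 / (1 + eps))))
# eps = 1.0 gives ~8,944 candidate probes: sub-linear, but each probe still
# touches a full feature vector, which is why compact codes help further.
```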
NO! • Small binary codes can do better. • Cast each image to a compact binary code, with a few hundred bits per image. • Small codes make it possible to perform real-time searches over millions of Internet images using a single large PC. • Within 1 second! (0.146 sec for 80 million images) • 80 million images: ~300 GB of raw data vs. ~120 MB of codes
Binary Small Codes • First introduced in text search/retrieval • Salakhutdinov and Hinton [3] introduced them for text document retrieval • Introduced to computer vision by Torralba et al. [4] [3] Ruslan Salakhutdinov and Geoffrey Hinton. "Semantic Hashing." International Journal of Approximate Reasoning, 2009. [4] A. Torralba, R. Fergus, and Y. Weiss. "Small Codes and Large Databases for Recognition." In CVPR, 2008.
Semantic Hashing (Ruslan Salakhutdinov and Geoffrey Hinton. Semantic Hashing. International Journal of Approximate Reasoning, 2009) • (Figure: a query is mapped by the semantic hash function to a binary code, i.e. an address in the address space; semantically similar database images live at nearby addresses.) • Quite different to a (conventional) randomizing hash Fergus et al
Semantic Hashing • Similar points are mapped to similar small codes • These codes are then stored in memory and compared by Hamming distance (very fast, supported directly by hardware)
Overall Query Scheme • (Figure: query image → feature vector, ~1 ms in Matlab → generate small binary code, <10 μs → codes stored in memory, hardware computes Hamming distances → retrieved images, <1 ms.) Fergus et al
Search Framework • Produce a binary code (e.g., 01010011010) • Store these binary codes in memory • Use hardware to compute the Hamming distances (very fast) • Sort the Hamming distances to get the final ranking (see the sketch below)
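A minimal sketch of this framework with made-up codes: pack the bits into bytes, XOR against the query, count the differing bits, and sort. On real hardware the XOR-plus-popcount step maps to the POPCNT instruction, which is what makes this so fast.

```python
# Hamming-distance ranking over packed binary codes.
import numpy as np

rng = np.random.default_rng(0)
n, b = 100_000, 256                           # database size, bits per code
codes = rng.integers(0, 2, size=(n, b), dtype=np.uint8)
packed = np.packbits(codes, axis=1)           # (n, b/8) bytes

# A noisy copy of item #123 plays the role of the query code.
query = np.packbits(codes[123] ^ (rng.random(b) < 0.02))

# XOR then popcount, emulated here with unpackbits + sum.
popcount = np.unpackbits(packed ^ query, axis=1).sum(axis=1)
ranking = np.argsort(popcount)
print(ranking[:5], popcount[ranking[:5]])     # item 123 should rank first
```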
How to learn small binary codes • Simplest method: threshold at the median • LSH is already able to produce binary codes • Restricted Boltzmann Machines (RBM) • Optimal small binary codes by spectral hashing
1. Simple Binarization Strategy • Set a threshold per dimension (unsupervised), e.g., use the median, and assign 0/1 on either side Fergus et al
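A minimal sketch of median thresholding on made-up projected features: one threshold per dimension, giving balanced bits.

```python
# Simplest binarization: threshold each dimension at its training median.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 32))          # e.g., 32-D projected features
medians = np.median(X, axis=0)               # one threshold per dimension
B = (X > medians).astype(np.uint8)           # roughly balanced 0/1 bits
print(B[:2])
```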
2. Locality Sensitive Hashing • LSH readily generates binary codes (unsupervised) • Take random projections of the data • Quantize each projection with a few bits • No learning involved Fergus et al
3. RBM [3] to generate codes • Not going into detail; see Salakhutdinov and Hinton for details • Use a deep neural network to train small codes • Supervised method R. R. Salakhutdinov and G. E. Hinton. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. In AISTATS, 2007.
LabelMe retrieval • LabelMe is a large database of human-annotated images • The goal of this experiment is to • First generate small codes • Use Hamming distance to search for similar images • Sort the results to produce the final ranking • Ground truth: neighbors under the Gist descriptor
Examples of LabelMe retrieval • 12 closest neighbors under different distance metrics
Test set 2: Web images • 12.9 million images from the Tiny Images dataset • Collected from the Internet • No labels, so the Euclidean distance between Gist vectors is used as the ground-truth distance • Note: the Gist descriptor is a global feature widely used in computer vision for scene representation
Examples of Web retrieval • 12 neighbors using different distance metrics Fergus et al
Web image retrieval Observation: more code bits give better performance
Retrieval Timings Fergus et al
Summary • Image search should be • Fast • Accurate • Scalable • Tree based methods • Locality Sensitive Hashing • Binary Small Code (state of the art)
Lecture VIII Part II: Face Retrieval
Challenges • Pose, lighting, and facial-expression variations confound recognition • Efficiently matching against a large gallery dataset is nontrivial • A large number of subjects matters • (Figure: a query face matched against gallery faces.)
Contextual face recognition • Design robust face similarity metric • Leverage local feature context to deal with visual variations • Enable large-scale face recognition tasks • Design efficient and scalable matching metric • Employ image level context and active learning • Utilize social network context to scope recognition • Design social network priors to improve recognition
Preprocessing (input to our algorithm) • Face detection: boosted cascade [Viola-Jones '01] • Eye detection: neural network • Face alignment: similarity transform to a canonical frame • Illumination normalization: self-quotient image [Wang et al. '04]
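A rough sketch of a comparable preprocessing chain (not the authors' code): OpenCV's Viola-Jones-style cascade for detection, a simple crop/resize standing in for the eye-based alignment, and a self-quotient image for illumination normalization. The input path and canonical frame size are assumptions.

```python
# Approximate preprocessing chain: detect, align (crudely), normalize lighting.
import cv2
import numpy as np

def preprocess(gray):
    # 1. Face detection with a boosted cascade (Viola-Jones).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]

    # 2. Alignment: here just a crop + resize to a canonical frame; the
    #    original pipeline fits a similarity transform from detected eyes.
    face = cv2.resize(gray[y:y+h, x:x+w], (64, 64)).astype(np.float32) + 1.0

    # 3. Self-quotient image: divide by a smoothed version of the face.
    smooth = cv2.GaussianBlur(face, (0, 0), sigmaX=3.0) + 1.0
    return face / smooth

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input path
if img is not None:
    normalized = preprocess(img)
```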
Feature extraction • Gaussian pyramid: dense sampling in scale • Patches: 8×8, extracted on a regular grid at each scale • Filtering: convolution with 4 oriented fourth-derivative-of-Gaussian quadrature pairs • Spatial aggregation: log-polar arrangement of 25 Gaussian-weighted regions (DAISY shape) • One feature descriptor per patch: { f_1 … f_n }, f_i ∈ R^400, n ≈ 500
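A minimal sketch of just the dense-sampling step (pyramid depth, stride, and image size are assumptions; the oriented filtering and log-polar aggregation that produce the 400-D descriptors are omitted):

```python
# Dense 8x8 patch sampling on a small Gaussian pyramid.
import numpy as np
from scipy.ndimage import gaussian_filter

def dense_patches(img, levels=3, patch=8, stride=4):
    feats = []
    cur = img.astype(np.float32)
    for _ in range(levels):
        for y in range(0, cur.shape[0] - patch + 1, stride):
            for x in range(0, cur.shape[1] - patch + 1, stride):
                feats.append(cur[y:y+patch, x:x+patch].ravel())
        cur = gaussian_filter(cur, sigma=1.0)[::2, ::2]   # next pyramid level
    return np.array(feats)

patches = dense_patches(np.random.rand(64, 64))
print(patches.shape)   # on the order of a few hundred patches per face
```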
Face representation & matching • Adjoin spatial information: g_i = (f_i, x_i, y_i), turning { f_1 … f_n } into { g_1 … g_n } • Quantize by a forest of randomized trees T_1 … T_k in Feature Space × Image Space • Each feature g_i contributes to k bins of the combined histogram vector h • IDF-weighted L1 distance: w_i = log( #{ training h : h(i) > 0 } / #training ), d(h, h') = Σ_i w_i |h(i) − h'(i)|
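A minimal sketch of the histogram comparison on toy histograms (note: the sketch uses the conventional IDF form log(N / df), whereas the slide writes the ratio the other way around; the histograms themselves are made up):

```python
# IDF-style bin weights plus a weighted L1 distance between histograms.
import numpy as np

def idf_weights(train_hists):
    # A bin's weight grows when few training histograms use that bin.
    df = (train_hists > 0).sum(axis=0)               # "document frequency" per bin
    return np.log(train_hists.shape[0] / np.maximum(df, 1))

def weighted_l1(h, h2, w):
    return np.sum(w * np.abs(h - h2))

rng = np.random.default_rng(0)
train = rng.poisson(0.3, size=(200, 1024)).astype(float)   # toy bin counts
w = idf_weights(train)
print(weighted_l1(train[0], train[1], w))
```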
Randomized projection trees • Linear decision at each node: ⟨w, [f x y]'⟩ ≥ τ, where w is a random projection, w ~ N(0, Σ); Σ normalizes the spatial and feature parts • τ = median (can also be randomized) • Why random projections? • Simple • Interact well with high-dimensional sparse data (feature descriptors!) • Generalize trees previously used for vision tasks (kd-trees, Extremely Randomized Forests) [Dasgupta & Freund, Wakin et al., ...] [Geurts, Lepetit & Fua, ...] • Additional data-dependence can be introduced through multiple trials: select a (w, τ) pair that minimizes a cost function (e.g., MSE, conditional entropy)
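A minimal sketch of a single split of such a tree on made-up [feature, x, y] vectors (here the identity covariance stands in for the Σ that rebalances feature versus spatial parts):

```python
# One randomized-projection split in feature x image space.
import numpy as np

def make_split(G, rng):
    """G: (n, d) array of concatenated [feature, x, y] vectors."""
    w = rng.standard_normal(G.shape[1])   # random direction; a scaled covariance
    w /= np.linalg.norm(w)                # could rebalance feature vs. spatial parts
    proj = G @ w
    tau = np.median(proj)                 # tau = median (could also be randomized)
    return w, tau, proj <= tau            # boolean mask: which points go left

rng = np.random.default_rng(0)
G = np.hstack([rng.standard_normal((500, 400)),      # descriptors f_i
               rng.uniform(0, 64, size=(500, 2))])   # patch locations (x, y)
w, tau, left = make_split(G, rng)
print(left.sum(), "of", len(G), "features go left")
```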
Implicit elastic matching • Even with good descriptors, dense correspondence between { g_1 … g_n } and { g'_1 … g'_n } is difficult • Instead, quantize features such that corresponding features g_i and g'_j likely map to the same bin; this also enables efficient inverted-file indexing • The number of "soft matches" is given by comparing histograms of joint quantizations: Similarity(I, I') = 1 − ||h − h'||_{w,1}
(Figure: an example query face ("Ross") matched against the gallery faces.)
Exploring the optimal settings • A subset of PIE for exploration (11,554 faces / 68 subjects) • 30 faces per person are used for inducing the trees • Three settings to explore • Histogram distance metric • Tree depth • Number of trees