Indexing Techniques Mei-Chen Yeh
Last week • Matching two sets of features • Strategy 1: • Convert to a fixed-length feature vector (Bag-of-words) • Use a conventional proximity measure • Strategy 2: • Build point correspondences
Last week: bag-of-words. An image is represented as a histogram of codeword frequencies over a visual vocabulary.
Matching local features: building patch correspondences (Image 1 ↔ Image 2). To generate candidate matches, find patches that have the most similar appearance (e.g., lowest SSD). Slide credits: Prof. Kristen Grauman
Matching local features: building patch correspondences (Image 1 ↔ Image 2). Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance). Slide credits: Prof. Kristen Grauman
Indexing local features • Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT). (Figure: database images mapped into the descriptor’s feature space.)
Indexing local features • When we see close points in feature space, we have similar descriptors, which indicates similar local content. (Figure: a query image and database images in the descriptor’s feature space.)
Problem statement • With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find the images that are relevant to a new image?
(Figure: a collage of 50 thousand database images.) Slide credit: Nistér and Stewénius
The Nearest-Neighbor Search Problem • Given • A set S of n points in d dimensions • A query point q • Which point in S is closest to q? • Time complexity of a linear scan: O(dn)
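A minimal NumPy sketch of the O(dn) linear-scan baseline; the array names and sizes are illustrative, not from the slides.

```python
import numpy as np

def linear_scan_nn(database, query):
    """database: (n, d) array of points; query: (d,) vector.
    Returns the index of the closest database point (Euclidean distance)."""
    diffs = database - query                      # broadcast to (n, d)
    dists = np.einsum('nd,nd->n', diffs, diffs)   # squared distances, O(n*d) work
    return int(np.argmin(dists))

# toy usage: 10,000 points in 128 dimensions (SIFT-sized descriptors)
rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 128))
q = rng.standard_normal(128)
print(linear_scan_nn(db, q))
```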
The Nearest-Neighbor Search Problem • r-nearest neighbor • for any query q, returns a point p ∈ S s.t. d(p, q) ≤ r • c-approximate r-nearest neighbor • for any query q, returns a point p′ ∈ S s.t. d(p′, q) ≤ c·r whenever some p ∈ S satisfies d(p, q) ≤ r
Today • Indexing local features • Inverted file • Vocabulary tree • Locality-sensitive hashing
Indexing local features: inverted file • For text documents, an efficient way to find all pages on which a word occurs is to use an index. • We want to find all images in which a feature occurs. • page ~ image • word ~ feature • To use this idea, we’ll need to map our features to “visual words”.
Text retrieval vs. image search • What makes the two problems similar, and what makes them different?
Visual words • Extract some local features from a number of images … • e.g., the SIFT descriptor space: each point is 128-dimensional. Slide credit: D. Nister, CVPR 2006
Visual words • Map high-dimensional descriptors to tokens/“words” by quantizing the feature space • Quantize via clustering; let the cluster centers be the prototype “words” • Determine which word to assign to each new image region by finding the closest cluster center.
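A minimal sketch of the quantization step, assuming the cluster centers were already obtained offline (e.g., by k-means on a training set of descriptors); all names and sizes are illustrative.

```python
import numpy as np

def assign_visual_words(descriptors, centroids):
    """Quantize each descriptor to the id of its nearest cluster center."""
    # squared distances between descriptors (m, d) and centroids (k, d),
    # computed as ||a||^2 - 2 a.b + ||b||^2 to keep memory at (m, k)
    d2 = ((descriptors ** 2).sum(1)[:, None]
          - 2.0 * descriptors @ centroids.T
          + (centroids ** 2).sum(1)[None, :])
    return d2.argmin(axis=1)          # one visual-word id per descriptor

# toy usage: 500 SIFT-like descriptors quantized against a 1000-word vocabulary
rng = np.random.default_rng(0)
vocab = rng.standard_normal((1000, 128))   # stands in for k-means centroids
descs = rng.standard_normal((500, 128))
words = assign_visual_words(descs, vocab)  # array of 500 word ids
```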
Visual words • Each group of patches belongs to the same visual word! Figure from Sivic & Zisserman, ICCV 2003
Visual vocabulary formation Issues: • Sampling strategy: where to extract features? Fixed locations or interest points? • Clustering / quantization algorithm • What corpus provides features (universal vocabulary?) • Vocabulary size, number of words • Weight of each word?
Inverted file index The index maps each visual word to the ids of the images in which it occurs. Why does the index give us a significant gain in efficiency?
Inverted file index A query image is matched to database images that share visual words.
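A minimal sketch of such an inverted file; the dictionary layout and the shared-word voting score are illustrative choices, not the exact scheme of any particular system.

```python
from collections import defaultdict, Counter

def build_inverted_index(image_words):
    """image_words: {image_id: list of visual-word ids occurring in that image}."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index                          # word id -> set of image ids

def query_index(index, query_words):
    """Rank database images by the number of visual words shared with the query."""
    votes = Counter()
    for w in set(query_words):            # touch only the entries of the query's words
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return votes.most_common()

# toy usage
db = {"img1": [3, 7, 7, 42], "img2": [7, 9], "img3": [1, 2, 42]}
idx = build_inverted_index(db)
print(query_index(idx, [7, 42, 100]))     # img1 shares 2 words; img2 and img3 share 1
```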
tf-idf weighting • Term frequency – inverse document frequency • Describes the frequency of each word within an image, and decreases the weights of words that appear often in the database • Discriminative words (e.g., economic, trade, …) get a higher weight (w↗); common words (e.g., the, most, we, …) get a lower weight (w↘)
tf-idf weighting • Term frequency – inverse document frequency • Weight of word i in document (image) d: t_id = (n_id / n_d) · log(N / n_i), where n_id = number of occurrences of word i in document d, n_d = number of words in document d, n_i = number of documents in the whole database in which word i occurs, and N = total number of documents in the database
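A small sketch of this weighting, computed directly from the formula above; `doc_freq` and the other names are illustrative.

```python
import math
from collections import Counter

def tfidf_weights(word_ids, doc_freq, num_images):
    """word_ids: visual words of one image; doc_freq[i]: number of database
    images containing word i; num_images: total images in the database.
    Returns {word id: t_id} with t_id = (n_id / n_d) * log(N / n_i)."""
    counts = Counter(word_ids)            # n_id for each word i in this image
    n_d = len(word_ids)                   # total number of words in the image
    return {i: (n_id / n_d) * math.log(num_images / doc_freq[i])
            for i, n_id in counts.items()}

# toy usage: word 7 occurs in most database images, so it is strongly down-weighted
print(tfidf_weights([3, 7, 7, 42], doc_freq={3: 4, 7: 900, 42: 10}, num_images=1000))
```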
Bag-of-Words + Inverted file Bag-of-words representation http://people.cs.ubc.ca/~lowe/keypoints/ Inverted file http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
D. Nistér and H. Stewenius. Scalable Recognition with a Vocabulary Tree, CVPR 2006.
Vocabulary Tree Training: filling the tree by pushing the database descriptors down a tree built with hierarchical k-means (the original slides show this over several steps) [Nister & Stewenius, CVPR’06] Slide credit: David Nister
Vocabulary Tree Recognition: query descriptors are pushed down the tree and the candidate database images are retrieved; optionally perform geometric verification on the top candidates [Nister & Stewenius, CVPR’06] Slide credit: David Nister
Think about the computational advantage of the hierarchical tree vs. a flat vocabulary!
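A rough sketch of the look-up step to make that advantage concrete: with branching factor k and depth L, descending the tree costs about k·L distance computations per descriptor, versus k^L for a flat vocabulary of the same size. The `Node` structure below is a hypothetical stand-in, assumed to have been built offline by hierarchical k-means.

```python
import numpy as np

class Node:
    """A vocabulary-tree node: interior nodes hold their children's centroids,
    leaves carry a visual-word id."""
    def __init__(self, centers=None, children=None, word_id=None):
        self.centers = centers        # (k, d) centroids of the children, or None at a leaf
        self.children = children      # list of k child Nodes, or None at a leaf
        self.word_id = word_id        # set only at leaves

def quantize(root, descriptor):
    """Descend the tree, comparing against only k centroids per level
    (k*L comparisons in total instead of k**L for a flat vocabulary)."""
    node = root
    while node.children is not None:
        d2 = ((node.centers - descriptor) ** 2).sum(axis=1)
        node = node.children[int(d2.argmin())]
    return node.word_id
```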
Direct addressing • Create a direct-address table T with m slots: each key k drawn from the universe U indexes its own slot T[k], which stores the key and its satellite data; only the actual keys K occupy slots.
Direct addressing • Search operation: O(1) • Problem: the range of keys (the universe U) can be far larger than the set of actual keys K! • 64-bit numbers ⇒ 18,446,744,073,709,551,616 different keys • SIFT: 128 × 8 bits
Hashing • O(1) average-case time • Use a hash function h to compute the slot from the key k: key k1 is stored in slot h(k1) of the hash table T (slots 0 … m−1), so its slot index may not be k1 anymore! • Different keys may share a bucket (a collision), e.g., h(k5) = h(k3)
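A toy chained hash table illustrating these ideas; the hash function h(k) = k mod m and the bucket layout are illustrative choices.

```python
class ChainedHashTable:
    """Minimal hash table with chaining: colliding keys share a bucket list."""
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one bucket per slot 0 .. m-1

    def _h(self, key):
        return key % self.m                   # toy hash function h(k) = k mod m

    def insert(self, key, value):
        self.slots[self._h(key)].append((key, value))

    def search(self, key):                    # O(1) on average under simple uniform hashing
        for k, v in self.slots[self._h(key)]: # scan only this key's bucket
            if k == key:
                return v
        return None

# toy usage: keys 3 and 11 collide (both hash to slot 3) but are both found
t = ChainedHashTable(m=8)
t.insert(3, "a"); t.insert(11, "b")
print(t.search(3), t.search(11))              # -> a b
```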
Hashing • A good hash function • Satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots. • How to design a hash function for indexing high-dimensional data?
(Figure: how do we map a 128-d descriptor to a slot of the hash table T?)
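One standard answer, previewing locality-sensitive hashing: use random projections so that nearby descriptors are likely to fall into the same bucket. A minimal random-hyperplane sketch; the number of bits and the table layout are illustrative choices, not the exact scheme from the slides.

```python
import numpy as np

class RandomHyperplaneHash:
    """Hash a high-dimensional vector to a bucket id via the sign pattern of its
    projections onto a few random directions: descriptors separated by a small
    angle tend to get the same code."""
    def __init__(self, dim=128, num_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((num_bits, dim))   # random hyperplanes

    def __call__(self, x):
        bits = (self.planes @ x) >= 0                        # one bit per hyperplane
        return sum(int(b) << i for i, b in enumerate(bits))  # integer bucket id

# toy usage: bucket SIFT-sized descriptors, then search only the query's bucket
h = RandomHyperplaneHash()
rng = np.random.default_rng(1)
table = {}
for i, desc in enumerate(rng.standard_normal((1000, 128))):
    table.setdefault(h(desc), []).append(i)
query_desc = rng.standard_normal(128)
candidates = table.get(h(query_desc), [])   # typically far fewer than 1000 points
```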