Packing bag-of-features ICCV 2009 Hervé Jégou Matthijs Douze Cordelia Schmid INRIA
Introduction • Introduction • Proposed method • Experiments • Conclusion
Bag-of-features • Extracting local image descriptors • Clustering of the descriptors with a k-means quantizer (visual words) • Producing a frequency vector fi of length k • The histogram of visual words is weighted using the tf-idf weighting scheme of [12] & subsequently normalized with its L2 norm
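A minimal sketch of this BOF construction, assuming precomputed visual-word centroids and idf weights; the function name bof_vector and the brute-force nearest-word assignment are illustrative choices, not the paper's implementation:

```python
import numpy as np

def bof_vector(descriptors, centroids, idf):
    """Build a tf-idf weighted, L2-normalized bag-of-features vector.

    descriptors: (n, 128) local descriptors (e.g. SIFT) of one image
    centroids:   (k, 128) visual-word centroids from k-means
    idf:         (k,) inverse-document-frequency weights (precomputed)
    """
    # Assign each descriptor to its nearest visual word
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)

    # Frequency vector f_i of length k
    k = centroids.shape[0]
    f = np.bincount(words, minlength=k).astype(float)

    # tf-idf weighting, then L2 normalization
    f *= idf
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f
```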
TF–IDF weighting • tf • the word ‘a’ occurs 3 times in a document of 100 words • tf = 0.03 (3/100) • idf • 1,000 documents contain ‘a’, out of 10,000,000 documents in total • idf = 9.21 ( ln(10,000,000 / 1,000) ) • tf-idf = 0.28 (0.03 × 9.21)
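The worked example above, reproduced as a few lines of Python (the natural logarithm and the rounding follow the slide):

```python
import math

# term frequency: the word 'a' occurs 3 times in a 100-word document
tf = 3 / 100                          # 0.03

# inverse document frequency: 1,000 of 10,000,000 documents contain 'a'
idf = math.log(10_000_000 / 1_000)    # ln(10000) ≈ 9.21

print(tf * idf)                       # ≈ 0.28
```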
Binary BOF [12] • Discards the information about the exact number of occurrences of a given visual word in the image • Binary BOF vector components only indicate the presence or absence of a particular visual word in the image • With a sequential coding using 1 bit per component, i.e. ⌈k/8⌉ bytes per image, the memory usage is typically 10 kB per image [12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
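A possible encoding of such a binary BOF with 1 bit per component, assuming NumPy; binary_bof is an illustrative helper, not code from [12]:

```python
import numpy as np

def binary_bof(f):
    """Encode a BOF frequency vector as 1 bit per visual word (presence only)."""
    presence = (np.asarray(f) > 0)            # discard exact counts
    return np.packbits(presence)              # ceil(k/8) bytes

f = np.array([0, 3, 0, 1, 0, 0, 2, 0, 5])     # toy BOF with k = 9
code = binary_bof(f)
print(code.nbytes)                            # 2 bytes = ceil(9/8)
```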
Inverted-file index (sparsity) • Documents • T0 = "it is what it is" • T1 = "what is it" • T2 = "it is a banana" • Index • "a": {2} • "banana": {2} • "is": {0, 1, 2} • "it": {0, 1, 2} • "what": {0, 1}
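The toy index above can be reproduced with a few lines of Python, e.g.:

```python
from collections import defaultdict

docs = {
    0: "it is what it is",
    1: "what is it",
    2: "it is a banana",
}

# Map each term to the set of documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

print(dict(index))
# {'it': {0, 1, 2}, 'is': {0, 1, 2}, 'what': {0, 1}, 'a': {2}, 'banana': {2}}
```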
Compressed inverted file • Compression can approach the vector entropy • Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory • This may compensate for the decoding cost of the decompression algorithm [16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
Introduction • Introduction • Proposed method • Experiments • Conclusion
Projection of a BOF • Sparse projection matrices • d: dimension of the output descriptor • k: dimension of the input BOF • For each matrix row, the number of non-zero components is nz = k/d; typically nz = 8 for k = 1000, resulting in d = 125
Projection of a BOF • The other matrices are defined by random permutations • For k = 12 and d = 3, the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3) defines which BOF components each of the 3 output components aggregates • Image i is thus represented by m mini-BOFs, obtained by applying the m projection matrices to its BOF vector
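A sketch of this projection, assuming each row of the sparse matrix simply sums the BOF components selected by its group of the permutation (unit non-zero entries); mini_bof and this interpretation of the grouping are assumptions for illustration, not the authors' code:

```python
import numpy as np

def mini_bof(f, permutation, d):
    """Project a BOF vector f (length k) to a mini-BOF of length d.

    permutation: a random permutation of 1..k; consecutive groups of
    k/d indices give the non-zero columns of each row of the sparse
    projection matrix (each output component sums the selected
    BOF components, assuming unit weights).
    """
    k = len(f)
    nz = k // d
    groups = np.asarray(permutation).reshape(d, nz) - 1   # to 0-based indices
    return np.array([f[g].sum() for g in groups])

# Toy example from the slide: k = 12, d = 3
f = np.arange(1, 13, dtype=float)
perm = [11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3]
print(mini_bof(f, perm, d=3))    # three aggregated components
```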
Indexing structure • Quantization • Each miniBOF is quantized by a k-means quantizer associated with its projection matrix, yielding an index in the codebook of the indexing structure • The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset • The dictionary size associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, hence the two may differ; we typically use 20000 codebook entries
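A rough sketch of the off-line codebook learning and on-line quantization, assuming SciPy's k-means; the small codebook size (256) and the random training data are stand-ins for speed, whereas the paper uses roughly 20000 entries learned on Flickr1M*:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq   # assumes SciPy is available

# Off-line: learn a codebook from many training mini-BOFs
rng = np.random.default_rng(0)
train_minibofs = rng.random((5000, 125))   # stand-in for Flickr1M* mini-BOFs
codebook, _ = kmeans(train_minibofs, 256)  # 256 here; ~20000 in the paper

# On-line: quantize a mini-BOF to its nearest codebook entry
query_minibof = rng.random((1, 125))
index, _ = vq(query_minibof, codebook)
print(index[0])        # inverted-list identifier for this mini-BOF
```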
Indexing structure • Binary signature generation • The miniBOF is projected using a random rotation matrix R, producing d components • Each bit of the signature is obtained by comparing the value projected by R to the median value of the elements having the same quantized index • The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset
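A hedged sketch of this signature generation, with assumed shapes for the rotation matrix and the per-cell medians (both learned off-line); it mirrors the Hamming-embedding idea of [4] but is not the authors' code:

```python
import numpy as np

def binary_signature(minibof, R, cell_medians, cell_index):
    """Hamming-embedding style signature for one quantized mini-BOF.

    R:             (d_b, d) random rotation/projection matrix (learned off-line)
    cell_medians:  (k', d_b) per-cell medians of the projected components
    cell_index:    quantization index of this mini-BOF
    """
    projected = R @ minibof                          # d_b projected values
    bits = projected > cell_medians[cell_index]      # compare to learned medians
    return np.packbits(bits)                         # compact binary signature
```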
Quantizing cells [4] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
Indexing structure • miniBOF associated with image i is represented by the tuple • total memory usage per image is bytes
Multi-probe strategy • Retrieving not only the inverted list associated with the quantized index, but also the set of inverted lists associated with the t closest centroids of the quantizer codebook • This multiplies the number of image hits by roughly t
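A minimal illustration of multi-probing, assuming an exhaustive distance computation to the codebook centroids; multi_probe_lists is an illustrative helper, not the paper's data structure:

```python
import numpy as np

def multi_probe_lists(query_minibof, codebook, t):
    """Return the indices of the t closest codebook centroids,
    i.e. the inverted lists visited for this mini-BOF."""
    d2 = ((codebook - query_minibof) ** 2).sum(axis=1)
    return np.argsort(d2)[:t]

# Probing t lists per mini-BOF scans roughly t times more
# inverted-list entries (image hits) than a single probe.
```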
Fusion • Query signature • Database signature
Fusion • The score is equal to 0 for images having no observed binary signatures • It reaches its maximum value if the database image i is the query image itself
Introduction • Introduction • Proposed method • Experiments • Conclusion
Dataset • Two annotated datasets • INRIA Holidays dataset [4] • University of Kentucky recognition benchmark [9] • Distractor dataset • one million images downloaded from Flickr, Flickr1M • Learning parameters • Flickr1M∗
Details • Descriptor extraction • Images are resized to a maximum of 786432 pixels • A slight intensity normalization is performed • SIFT descriptors • Evaluation • Recall@N • mAP • Memory • Image hits • Parameters # Using a value of nz between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k.
mAP • Mean average precision • Example: • two images A & B • A has 4 duplicate images • B has 5 duplicate images • Retrieval ranks for A: 1, 2, 4, 7 • Retrieval ranks for B: 1, 3, 5 • Average precision A = (1/1+2/2+3/4+4/7)/4 = 0.83 • Average precision B = (1/1+2/3+3/5+0+0)/5 = 0.45 • mAP = (0.83+0.45)/2 = 0.64
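The example can be checked with a short average-precision routine (an illustrative sketch, not the official evaluation code):

```python
def average_precision(ranks, num_relevant):
    """AP given the (1-based) ranks of the retrieved relevant images."""
    hits = sorted(ranks)
    precisions = [(i + 1) / r for i, r in enumerate(hits)]
    return sum(precisions) / num_relevant   # missed images contribute 0

ap_a = average_precision([1, 2, 4, 7], 4)   # ≈ 0.83
ap_b = average_precision([1, 3, 5], 5)      # ≈ 0.45
print((ap_a + ap_b) / 2)                    # mAP ≈ 0.64
```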
Table 1 (Holidays) # The number of bytes used per inverted-list entry is 4 bytes for binary BOF & 5 bytes for BOF
Figure (Holidays + Flickr1M) # Our approach requires 160 MB for m = 8 and the query is performed in 132 ms, to be compared, respectively, with 8 GB and 3 s for BOF.
Introduction • Introduction • Proposed method • Experiments • Conclusion
Conclusion • This paper has introduced a way of packing BOFs: miniBOFs • An efficient indexing structure for rapid access and an expected distance criterion for the fusion of the scores • Reduces memory usage • Reduces the quantity of memory scanned (hits) • Reduces query time