
Packing bag-of-features



Presentation Transcript


  1. Packing bag-of-features ICCV 2009 Hervé Jégou, Matthijs Douze, Cordelia Schmid, INRIA

  2. Introduction • Introduction • Proposed method • Experiments • Conclusion

  3. Introduction • Introduction • Proposed method • Experiments • Conclusion

  4. Bag-of-features • Extracting local image descriptors • Clustering of the descriptors; the k-means quantizer defines the visual words • Producing a frequency vector fi of length k • The histogram of visual words is weighted using the tf-idf weighting scheme of [12] and subsequently normalized with the L2 norm
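Below is a minimal Python sketch of this pipeline (not the authors' code; the function and argument names are illustrative, and the codebook and idf weights are assumed to be learned offline):

    import numpy as np

    def bof_vector(descriptors, codebook, idf):
        """descriptors: (n, 128) local SIFT descriptors of one image;
        codebook: (k, 128) k-means centroids (visual words);
        idf: (k,) inverse document frequencies learned offline."""
        # Assign each descriptor to its nearest visual word (k-means quantizer).
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        words = d2.argmin(axis=1)
        # Frequency vector f_i of length k.
        f = np.bincount(words, minlength=codebook.shape[0]).astype(float)
        # tf-idf weighting, then L2 normalization.
        f = (f / max(len(descriptors), 1)) * idf
        n = np.linalg.norm(f)
        return f / n if n > 0 else f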

  5. TF–IDF weighting

  6. TF–IDF weighting • tf: a document contains 100 words and ‘a’ occurs 3 times → tf = 0.03 (3/100) • idf: 1,000 documents contain ‘a’ out of 10,000,000 documents in total → idf = 9.21 ( ln(10,000,000 / 1,000) ) • tf-idf = 0.28 (0.03 × 9.21)
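The arithmetic of this toy example can be checked directly (a sketch; the corpus figures are the slide's, not a real dataset):

    import math

    tf = 3 / 100                         # 'a' occurs 3 times in a 100-word document
    idf = math.log(10_000_000 / 1_000)   # ~9.21
    print(tf * idf)                      # ~0.28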

  7. Binary BOF [12] • Discards the information about the exact number of occurrences of a given visual word in the image. • Binary BOF vector components only indicate the presence or absence of a particular visual word in the image. • With a sequential coding using 1 bit per component, i.e., ⌈k/8⌉ bytes per image, the memory usage would typically be 10 kB per image [12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
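A short sketch of the 1-bit-per-component coding (illustrative, using numpy's packbits; for k visual words this yields ⌈k/8⌉ bytes):

    import numpy as np

    def binary_bof(f):
        bits = (f > 0).astype(np.uint8)   # presence/absence of each visual word
        return np.packbits(bits)          # packs k bits into ceil(k/8) bytes

    f = np.array([0., 2., 0., 1., 1., 0., 0., 3., 1.])  # toy BOF, k = 9
    print(len(binary_bof(f)))                           # ceil(9/8) = 2 bytes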

  8. Binary BOF (Holidays dataset)

  9. Inverted-file index(Sparsity) • Documents • T0 = "it is what it is" • T1 = "what is it" • T2 = "it is a banana" • Index • "a": {2} • "banana": {2} • "is": {0, 1, 2} • "it": {0, 1, 2} • "what": {0, 1}
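The slide's toy index can be reproduced in a few lines (a sketch of the data structure, not of any particular search engine):

    from collections import defaultdict

    docs = ["it is what it is", "what is it", "it is a banana"]
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.split():
            index[term].add(doc_id)
    print(dict(index))
    # {'it': {0, 1, 2}, 'is': {0, 1, 2}, 'what': {0, 1}, 'a': {2}, 'banana': {2}}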

  10. Binary BOF

  11. Compressed inverted file • Compression can get close to the vector entropy • Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory • This may compensate for the decoding cost of the decompression algorithm [16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
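As an illustration of why sorted inverted lists compress well, here is a sketch of a standard delta + variable-byte coding as surveyed in [16]; the codec actually benchmarked in the paper may differ:

    def compress_postings(doc_ids):
        out = bytearray()
        prev = 0
        for d in sorted(doc_ids):
            gap = d - prev          # consecutive ids give small gaps
            prev = d
            while gap >= 128:
                out.append(gap & 0x7F)
                gap >>= 7
            out.append(gap | 0x80)  # high bit marks the last byte of a gap
        return bytes(out)

    print(len(compress_postings([3, 7, 11, 1000004])))  # 6 bytes vs. 16 uncompressed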

  12. Introduction • Introduction • Proposed method • Experiments • Conclusion

  13. MiniBOFs

  14. Projection of a BOF • Sparse projection matrices Aj • d: dimension of the output descriptor • k: dimension of the input BOF • For each matrix row, the number of non-zero components is nz = k/d; we typically set nz = 8 for k = 1000, resulting in d = 125

  15. Projection of a BOF • The other matrices are defined by random permutations. • For k = 12 and d = 3, the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3) groups the input components that are summed into each output component • Image i is represented by m mini-BOFs: fij = Aj fi ( j = 1, …, m )
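A sketch of this projection, reproducing the slide's k = 12, d = 3 example (assuming the non-zero entries are ones and that the permutation lists 1-based input indices grouped by output component):

    import numpy as np

    def mini_bof(f, perm, d):
        groups = np.array(perm).reshape(d, -1) - 1    # convert to 0-based indices
        return np.array([f[g].sum() for g in groups])

    perm = [11, 2, 12, 8,  9, 4, 10, 1,  7, 5, 6, 3]  # the slide's permutation
    f = np.arange(12, dtype=float)                    # a toy BOF with k = 12
    print(mini_bof(f, perm, d=3))                     # one mini-BOF of the image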

  16. Indexing structure • Quantization • The miniBOF fij is quantized by a k-means quantizer qj associated with matrix Aj, producing qj(fij) ∈ {1, …, k′}, where k′ is the number of codebook entries of the indexing structure. • The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset. The dictionary size associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, hence we may choose k′ ≠ k. We typically set k′ = 20000.

  17. Indexing structure • Binary signature generation • The miniBOF is projected using a random rotation matrix R, producing d components • Each bit of the signature is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on our independent dataset
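A sketch of the signature generation in the spirit of the Hamming embedding of [4] (illustrative names; R and the per-cell medians are assumed to be learned offline):

    import numpy as np

    def signature(mini_bof, R, medians, cell):
        """R: (d, d) random rotation; medians[cell]: (d,) median projected
        values of the training vectors falling in this quantization cell."""
        z = R @ mini_bof                         # project the miniBOF
        bits = (z > medians[cell]).astype(np.uint8)
        return np.packbits(bits)                 # d bits -> ceil(d/8) bytes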

  18. Quantizing cells [4] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.

  19. Indexing structure • The miniBOF associated with image i is represented by the tuple ( i, qj(fij), bij ), where bij is its binary signature • The total memory usage per image is m ( 4 + ⌈d/8⌉ ) bytes
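As a consistency check of this formula (assuming a 4-byte image identifier and a ⌈d/8⌉-byte signature per entry): with m = 8 and d = 125, each entry takes 4 + 16 = 20 bytes, hence 8 × 20 = 160 bytes per image and 160 MB for one million images, matching the figures quoted on slide 31.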

  20. Multi-probe strategy • Retrieving not only the inverted list associated with the quantized index qj(fij), but the set of inverted lists associated with the t closest centroids of the quantizer codebook • About t times as many image hits
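A sketch of the multi-probe lookup (illustrative names; inverted_lists maps each codebook entry to its list of postings):

    import numpy as np

    def probe(mini_bof, centroids, inverted_lists, t):
        d2 = ((centroids - mini_bof) ** 2).sum(axis=1)
        for c in np.argsort(d2)[:t]:         # the t closest codebook entries
            yield from inverted_lists[c]     # ~t times as many image hits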

  21. Fusion • Query signature • Database signature

  22. Fusion • equal to 0 for images having no observed binary signatures • equal to if the database image i is the query image itself

  23. Fusion

  24. Introduction • Introduction • Proposed method • Experiments • Conclusion

  25. Dataset • Two annotated datasets • INRIA Holidays dataset [4] • University of Kentucky recognition benchmark [9] • Distractor dataset • One million images downloaded from Flickr, Flickr1M • Learning parameters • Flickr1M∗

  26. Detail • Descriptor extraction • Resize to a maximum of 786432 pixels • Performed a slight intensity normalization • SIFT • Evaluation • Recall@N • mAP • Memory • Image hits • Parameters # Using a value of nz between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k.

  27. mAP • Mean average precision • Example: two images A & B • A has 4 duplicate images • B has 5 duplicate images • Retrieval ranks for A: 1, 2, 4, 7 • Retrieval ranks for B: 1, 3, 5 • Average precision A = (1/1 + 2/2 + 3/4 + 4/7)/4 = 0.83 • Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0)/5 = 0.45 • mAP = (0.83 + 0.45)/2 = 0.64
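The example can be verified with a short script (AP divides by the number of relevant images, which is why B's two missed duplicates count as zeros):

    def average_precision(ranks, n_relevant):
        return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n_relevant

    ap_a = average_precision([1, 2, 4, 7], 4)   # 0.83
    ap_b = average_precision([1, 3, 5], 5)      # 0.45
    print((ap_a + ap_b) / 2)                    # mAP = 0.64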

  28. Table 1 (Holidays) # The number of bytes used per inverted list entry is 4 bytes for binary BOF & 5 bytes for BOF

  29. Table 2 (Kentucky)

  30. Table 3 (Holidays + Flickr1M)

  31. Figure (Holidays + Flickr1M) # Our approach requires 160 MB for m = 8 and the query is performed in 132 ms, to be compared, respectively, with 8 GB and 3 s for BOF.

  32. Sample

  33. Introduction • Introduction • Proposed method • Experiments • Conclusion

  34. Conclusion • This paper has introduced a way of packing BOFs: miniBOFs • An efficient indexing structure for rapid access and an expected distance criterion for the fusion of the scores • Reduces memory usage • Reduces the quantity of memory scanned (hits) • Reduces query time
