
Packing bag-of-features (ICCV 2009)


Presentation Transcript


  1. Packing bag-of-features (ICCV 2009). Hervé Jégou, Matthijs Douze, Cordelia Schmid (INRIA). herve.jegou@inria.fr, matthijs.douze@inria.fr, cordelia.schmid@inria.fr. LEAR (Learning and Recognition in Vision)

  2. OUTLINE • Introduction • Background • Structure & Approach • Experiments • Conclusion

  3. Introduction • One of the main limitations of image search based on bag-of-features is the memory usage per image • This paper provides a method that reduces the memory usage and is faster than the standard bag-of-features approach

  4. Background: BOF (bag-of-features)

  5. BOF outline • 1. Extract local image descriptors (features) (Hessian-Affine detector [8] & SIFT descriptor [6]) • 2. Learn a “visual vocabulary” (clustering of the descriptors) • 3. Quantize features using the visual vocabulary (k-means quantizer) • 4. Represent images by frequencies of “visual words”: the histogram of visual-word occurrences is weighted using tf-idf and normalized with the L2 norm, producing a frequency vector fi of length k (see the sketch below)
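
A minimal NumPy sketch of steps 3 and 4, assuming a learned k-means vocabulary `centroids` (k × D) and a precomputed `idf` vector of length k (these names are illustrative, not from the paper):

```python
import numpy as np

def bof_vector(descriptors, centroids, idf):
    """Quantize local descriptors and build a tf-idf weighted,
    L2-normalized BOF frequency vector f_i of length k."""
    # 3. assign each descriptor to its nearest visual word
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    # 4. histogram of visual-word occurrences, tf-idf weighted
    k = centroids.shape[0]
    hist = np.bincount(words, minlength=k).astype(float)
    f = hist * idf
    return f / (np.linalg.norm(f) + 1e-12)  # L2 normalization
```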

  6. tf-idf weighting • tf = 0.030 (3/100): 'a' occurs 3 times in a 100-word document • idf = 13.287 (log2(10,000,000/1,000)): 1,000 documents contain 'a', out of 10,000,000 documents in total • tf-idf = 0.398 (0.030 × 13.287) • Analogy: visual word ↔ text word (feature), image ↔ document
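
A quick check of the slide's arithmetic (the idf value 13.287 implies a base-2 logarithm):

```python
from math import log2

tf  = 3 / 100                   # 'a' occurs 3 times in a 100-word document
idf = log2(10_000_000 / 1_000)  # 1,000 of 10,000,000 documents contain 'a'
print(tf, idf, tf * idf)        # 0.03  13.2877...  0.3986...
```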

  7. Datasets • INRIA Holidays dataset [4] • University of Kentucky recognition benchmark [9] • Flickr1M & Flickr1M*

  8. Binary BOF [12] • Discard the information about the exact number of occurrences of a given visual word in the image • Binary BOF components only indicate the presence or absence of a particular visual word in the image • A sequential coding uses 1 bit per component; the memory usage is then ⌈k/8⌉ bytes per image, typically about 10 kB per image • [12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
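
A minimal sketch of this 1-bit-per-component coding (the function name is illustrative; note that the 10 kB figure corresponds to a vocabulary of roughly k ≈ 80,000 components, since ⌈80,000/8⌉ = 10,000 bytes):

```python
import numpy as np

def binary_bof(f):
    """Binarize a BOF vector to presence/absence and pack it
    sequentially at 1 bit per component: ceil(k/8) bytes per image."""
    bits = (np.asarray(f) > 0).astype(np.uint8)
    return np.packbits(bits).tobytes()
```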

  9. Binary BOF [12]: results on the INRIA Holidays dataset (figure)

  10. Compressed inverted file [16] • Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory • The amount of memory to be read is proportionally reduced at query time • This may compensate for the decoding cost of the decompression algorithm (see the sketch below) • [16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
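
[16] surveys several index-compression codes; a common one is variable-byte coding of the gaps between sorted IDs. This is an illustrative sketch of the general technique, not necessarily the exact code used in the paper:

```python
def vbyte_encode(image_ids):
    """Compress an inverted list: store sorted image IDs as gaps
    (d-gaps), each gap in variable-byte code (7 data bits per byte,
    high bit set on a gap's last byte)."""
    out, prev = bytearray(), 0
    for i in sorted(image_ids):
        gap, chunk = i - prev, []
        prev = i
        while True:
            chunk.append(gap & 0x7F)  # low-order 7-bit group
            gap >>= 7
            if gap == 0:
                break
        chunk[0] |= 0x80              # terminator bit on the last group
        out.extend(reversed(chunk))   # most-significant byte first
    return bytes(out)
```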

  11. Binary BOF [12]: results on the INRIA Holidays dataset (figure)

  12. MiniBOFs • Processing a query image involves three steps: 1) producing the miniBOFs, 2) indexing, 3) fusing the scores

  13. Projection of a BOF: vocabulary aggregators • Sparse projection matrices A = {A1, ..., Am} of size d × k, where d = dimension of the output descriptor and k = dimension of the initial BOF • For each projection vector (a matrix row), the number of non-zero components is nz = k/d; typically nz = 8 for k = 1000, resulting in d = 125

  14. Projection of a BOF • The other aggregators are defined by shuffling the input BOF vector components using a random permutation • For k = 12, d = 3, e.g. the random permutation (11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3) • Image i yields m miniBOFs ωi,j, 1 ≤ j ≤ m, where fi is the BOF frequency vector (see the sketch below)
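
One plausible reading of these two slides in NumPy: each aggregator permutes the k BOF components and sums consecutive groups of nz components, giving d = k/nz output dimensions. A sketch under that assumption:

```python
import numpy as np

def minibofs(f, perms, nz):
    """Produce m miniBOFs omega_{i,j} from a BOF frequency vector f:
    permute the components (one random permutation per aggregator),
    then sum consecutive groups of nz components (d = k/nz sums)."""
    return [f[p].reshape(-1, nz).sum(axis=1) for p in perms]

# Toy example matching the slide's sizes: k = 12, d = 3, so nz = k/d = 4
rng = np.random.default_rng(0)
f = rng.random(12)                                 # a BOF vector f_i
perms = [rng.permutation(12) for _ in range(2)]    # m = 2 aggregators
print([w.shape for w in minibofs(f, perms, nz=4)]) # [(3,), (3,)]
```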

  15. Indexing structure [4] • Quantization: k' = number of codebook entries of the indexing structure • The set of k-means codebooks qj(.), 1 ≤ j ≤ m, is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset • The miniBOF quantizer is not related to the one associated with the initial SIFT descriptors, hence we may choose k ≠ k' • Typically k' = 20000
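
A minimal nearest-centroid sketch of applying a learned quantizer qj (the names `quantize` and `codebook` are illustrative):

```python
import numpy as np

def quantize(minibof, codebook):
    """Assign a miniBOF to its nearest entry of the k'-entry codebook;
    the quantized index c_{i,j} selects one of the k' inverted lists."""
    return int(((codebook - minibof) ** 2).sum(axis=1).argmin())
```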

  16. Indexing structure • Binary signature generation bi,j • Of length d, it refines the localization of the miniBOF within the cell, using the method of [4] • The miniBOF is projected using a random rotation matrix R, producing d components • Each bit of the vector bi,j is obtained by comparing the value projected by R to the median value of the elements having the same quantized index • The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset (see the sketch below)
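
A sketch of the signature generation, assuming `medians` is a k' × d array of the off-line-learned per-cell, per-direction medians (names illustrative):

```python
import numpy as np

def binary_signature(minibof, R, medians, cell):
    """Hamming-Embedding-style signature b_{i,j}: project the miniBOF
    with a random rotation R (d x d), then compare each of the d
    projected values with the median of its quantizing cell."""
    z = R @ minibof                               # d projected components
    return (z > medians[cell]).astype(np.uint8)   # d bits
```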

  17. Quantizing (Voronoi) cell (figure)

  18. Indexing structure • The j-th miniBOF associated with image i is represented by the tuple (i, bi,j) • 4 bytes to store the image identifier i • ⌈d/8⌉ bytes to store the binary vector bi,j • Total memory usage per image is Ci = m × (4 + ⌈d/8⌉) bytes
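
As a worked example: with the typical d = 125 from the projection slides, each binary signature takes ⌈125/8⌉ = 16 bytes, so a hypothetical setting of m = 16 miniBOFs would cost Ci = 16 × (4 + 16) = 320 bytes per image.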

  19. Indexing structure • Multi-probe strategy [7] • Retrieve not only the inverted list associated with the quantized index ci,j, but the set of inverted lists associated with the t closest centroids of the quantizer codebook • This increases the number of image hits because t times more inverted lists are visited (see the sketch below)
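
A minimal sketch of the multi-probe list selection (function name illustrative):

```python
import numpy as np

def multi_probe(minibof, codebook, t):
    """Return the indices of the t centroids closest to the query
    miniBOF, i.e. the t inverted lists to scan at query time."""
    d2 = ((codebook - minibof) ** 2).sum(axis=1)
    return np.argsort(d2)[:t]
```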

  20. Fusion: expected distance criterion • bq,j: the signature associated with the query image q • bi,j: the signature of the database image i • bq = [bq,1, ..., bq,m] • bi = [bi,1, ..., bi,m] • h(x, y) denotes the Hamming distance

  21. Fusion • The score is equal to 0 for images having no observed binary signature • It is equal to d × m/2 if the database image i is the query image itself • The query speed is improved by a threshold on the Hamming distance; we use τ = d/2 (see the sketch below)
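
A sketch of one plausible reading of this scoring rule, reconstructed from the slide's two boundary cases (signatures represented here as Python integers; `hits[j]` stands for the (image id, signature) pairs read from the j-th miniBOF's inverted lists, a hypothetical layout):

```python
def fuse_scores(query_sigs, hits, d, tau=None):
    """Expected-distance fusion: every observed match contributes
    d/2 - h(b_q,j, b_i,j), so an image with no observed signature
    scores 0 and the query image itself scores d*m/2.  Matches with
    Hamming distance above tau = d/2 are discarded for speed."""
    tau = d // 2 if tau is None else tau
    scores = {}
    for j, b_q in enumerate(query_sigs):      # one signature per miniBOF
        for image_id, b_i in hits[j]:
            h = bin(b_q ^ b_i).count("1")     # Hamming distance
            if h <= tau:
                scores[image_id] = scores.get(image_id, 0.0) + d / 2 - h
    return scores
```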

  22. Fusion

  23. Experiments • Used the following parameters in all the miniBOF experiments • Results on the University of Kentucky object recognition benchmark and on Holidays + Flickr1M (tables/figures)

  24. Experiments • On Holidays + Flickr1M (figure)

  25. Experiments • Sample queries and results (figure)

  26. Conclusion • This paper introduced a way of packing BOFs: miniBOFs • An efficient indexing structure based on Hamming Embedding allows for rapid access, and an expected distance criterion is used for the fusion of the scores • It reduces memory usage, the quantity of memory scanned (hits), and query time
