Packing bag-of-features
ICCV 2009
Hervé Jégou, Matthijs Douze, Cordelia Schmid
INRIA
herve.jegou@inria.fr, matthijs.douze@inria.fr, cordelia.schmid@inria.fr
LEAR (Learning and Recognition in Vision)
OUTLINE • Introduction • Background • Structure & Approach • Experiments • Conclusion
Introduction • One of the main limitations of image search based on bag-of-features (BOF) is the memory usage per image • This paper provides a method that reduces the memory usage and is faster than the standard bag-of-features approach
BOF outline • 1. Extract local image descriptors (features) (Hessian-Affine detector [8] & SIFT descriptor [6]) • 2. Learn a "visual vocabulary" (clustering of the descriptors) • 3. Quantize features using the visual vocabulary (k-means quantizer) • 4. Represent images by frequencies of "visual words": the histogram of visual-word occurrences is weighted using tf-idf and normalized with the L2 norm, producing a frequency vector fi of length k
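Steps 3–4 above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: the function name, toy codebook, and uniform idf vector are all invented for the example.

```python
import numpy as np

def bof_vector(descriptors, codebook, idf):
    """Quantize local descriptors to their nearest visual word, then build
    a tf-idf weighted, L2-normalized frequency vector (steps 3-4)."""
    # nearest-centroid assignment (k-means quantizer)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    # histogram of visual-word occurrences
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    # tf-idf weighting, then L2 normalization
    f = (hist / hist.sum()) * idf
    return f / np.linalg.norm(f)

# toy example with k = 4 visual words of dimension 2
rng = np.random.default_rng(0)
codebook = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
descriptors = rng.normal(loc=codebook[[0, 0, 1, 3]], scale=0.1)
idf = np.ones(4)  # uniform idf, just for the toy example
f = bof_vector(descriptors, codebook, idf)
print(f)
```

In a real system the codebook would be learned by k-means on a large descriptor set (step 2) and the idf weights from the corpus statistics.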
tf-idf weighting (toy example, treating a visual word like a text term and an image like a document) • tf = 3/100 = 0.030 (the word occurs 3 times in a document of 100 words) • idf = log2(10,000,000 / 1,000) ≈ 13.287 (1,000 of the 10,000,000 documents contain the word) • tf-idf = 0.030 × 13.287 ≈ 0.399
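The slide's numbers are reproduced by a base-2 logarithm (log2(10,000,000/1,000) ≈ 13.287), which can be checked directly:

```python
import math

tf = 3 / 100                          # term occurs 3 times in a 100-word document
idf = math.log2(10_000_000 / 1_000)   # base-2 log reproduces the slide's 13.287
tfidf = tf * idf
print(tf, idf, tfidf)
```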
Datasets • INRIA Holidays dataset [4] • University of Kentucky recognition benchmark [9] • Flickr1M & Flickr1M*
Binary BOF [12] • Discards the information about the exact number of occurrences of a given visual word in the image • Each binary BOF component only indicates the presence or absence of a particular visual word in the image • With a sequential coding using 1 bit per component, the memory usage is ⌈k/8⌉ bytes per image, typically about 10 kB per image
[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
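The ⌈k/8⌉-byte coding is just a presence bitmap. A small sketch (the function name and toy vector are illustrative):

```python
import numpy as np

def binarize_bof(freq_vector):
    """Pack a BOF vector into a presence/absence bitmap of ceil(k/8) bytes."""
    present = np.asarray(freq_vector) > 0
    return np.packbits(present)  # 1 bit per visual word, zero-padded to a byte boundary

packed = binarize_bof([0, 3, 0, 1, 2, 0, 0, 0, 5])  # k = 9 visual words
print(len(packed))  # ceil(9/8) = 2 bytes
```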
Compressed inverted file [16] • Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory • The amount of memory to be read is proportionally reduced at query time • This may compensate for the decoding cost of the decompression algorithm
[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
MiniBOFs Processing a query image involves three steps: 1) producing the miniBOFs, 2) indexing, 3) fusing the scores
Projection of a BOF: vocabulary aggregators • Sparse projection matrices A = {A1, ..., Am} of size d × k, where d = dimension of the output descriptor and k = dimension of the initial BOF • For each projection vector (a matrix row), the number of non-zero components is nz = k/d. Typically nz = 8 for k = 1000, resulting in d = 125
Projection of a BOF: • The other aggregators are defined by shuffling the input BOF vector components using a random permutation • For k = 12, d = 3, an example random permutation is (11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3) • Image i yields m miniBOFs ωi,j, 1 ≤ j ≤ m, computed from the BOF frequency vector fi
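A minimal sketch of one such aggregator, using the slide's toy setting (k = 12, d = 3, so nz = 4). That the permuted components are summed in consecutive groups of nz is an assumption consistent with the sparse d × k matrix having nz non-zeros per row; the function name and toy vector are invented:

```python
import numpy as np

def mini_bof(f, perm, d):
    """Aggregate a k-dim BOF into a d-dim miniBOF: shuffle the components
    with a random permutation, then sum consecutive groups of nz = k/d."""
    shuffled = np.asarray(f)[perm]
    return shuffled.reshape(d, -1).sum(axis=1)

# slide's toy setting: k = 12, d = 3, nz = 4
f = np.arange(1, 13, dtype=float)                              # toy BOF frequency vector
perm = np.array([11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3]) - 1   # 1-based -> 0-based
mb = mini_bof(f, perm, d=3)
print(mb)  # [33. 24. 21.]
```

Using m different permutations yields the m miniBOFs ωi,j for an image.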
Indexing structure [4] • Quantization: k' = number of codebook entries of the indexing structure • The set of k-means codebooks qj(.), 1 ≤ j ≤ m, is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset • The miniBOF codebook is not related to the one associated with the initial SIFT descriptors, hence we may choose k ≠ k'. Typically k' = 20000
Indexing structure • Binary signature generation bi,j • Of length d; refines the localization of the miniBOF within its quantization cell • Uses the method of [4] • The miniBOF is projected using a random rotation matrix R, producing d components • Each bit of the vector bi,j is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantization cells and all projection directions are learned off-line on an independent dataset
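The signature generation can be sketched as follows. This is an illustrative toy (the function name, dimensions, and random medians are invented; in the method the medians are learned off-line per cell and per projection direction):

```python
import numpy as np

def he_signature(v, R, medians, cell):
    """Hamming-Embedding-style signature (sketch): project the miniBOF with
    a random rotation R, then compare each component to the median learned
    off-line for the quantization cell the vector falls in."""
    projected = R @ v
    return (projected > medians[cell]).astype(np.uint8)  # d bits

# toy setup: d = 4 projection directions, 2 quantization cells
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal (rotation) matrix
medians = rng.normal(size=(2, 4))             # per-cell, per-direction medians
bits = he_signature(rng.normal(size=4), R, medians, cell=0)
print(bits)
```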
Indexing structure • The j-th miniBOF associated with image i is represented by a tuple containing: • 4 bytes to store the image identifier i • ⌈d/8⌉ bytes to store the binary vector bi,j • Total memory usage per image is Ci = m × (4 + ⌈d/8⌉)
Indexing structure • Multi-probe strategy [7] • Retrieves not only the inverted list associated with the quantized index ci,j, but the set of inverted lists associated with the t closest centroids of the quantizer codebook • This increases the number of image hits, because t times more inverted lists are visited
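A minimal multi-probe sketch (the function name and the toy 1-D index are illustrative, not the paper's implementation):

```python
import numpy as np

def multi_probe(query_minibof, centroids, inverted_lists, t):
    """Multi-probe: visit the inverted lists of the t closest codebook
    centroids instead of only the single nearest one."""
    d2 = ((centroids - query_minibof) ** 2).sum(axis=1)
    probes = np.argsort(d2)[:t]        # indices of the t closest centroids
    hits = []
    for c in probes:
        hits.extend(inverted_lists[c])
    return hits

# toy index: 3 quantization cells, each holding some image identifiers
centroids = np.array([[0.0], [5.0], [9.0]])
inverted_lists = {0: [1, 4], 1: [2], 2: [3, 5]}
print(multi_probe(np.array([4.0]), centroids, inverted_lists, t=2))  # [2, 1, 4]
```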
Fusion: expected distance criterion • bq,j: the signature associated with the query image q • bi,j: the signature of the database image i • bq = [bq,1, ..., bq,m], bi = [bi,1, ..., bi,m] • h(x, y) denotes the Hamming distance
Fusion • The fused score is equal to 0 for images having no observed binary signature, and equal to d × m/2 if the database image i is the query itself • Query speed is improved by thresholding the Hamming distance; we use τ = d/2
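One scoring rule consistent with both boundary values on this slide is to credit each observed miniBOF match with d/2 minus its Hamming distance, so that unobserved signatures (expected distance d/2) contribute nothing. This is a hedged reconstruction, not the paper's exact formula; the function names and toy signatures are invented:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

def fusion_score(query_sigs, db_sigs, d):
    """Expected-distance fusion (sketch): each observed miniBOF match
    contributes d/2 minus its Hamming distance; unobserved ones (None)
    contribute 0, i.e. their expected distance d/2. Matches with a
    distance above tau = d/2 are discarded for speed."""
    score = 0.0
    for bq, bi in zip(query_sigs, db_sigs):
        if bi is None:            # image not hit for this miniBOF
            continue
        h = hamming(bq, bi)
        if h <= d / 2:            # Hamming-distance threshold tau = d/2
            score += d / 2 - h
    return score

q = [[1, 0, 1, 1], [0, 0, 1, 0]]            # m = 2 signatures of d = 4 bits
print(fusion_score(q, q, d=4))              # 4.0 = d*m/2: image is the query itself
print(fusion_score(q, [None, None], d=4))   # 0.0: no observed signature
```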
Experiments • The same parameters are used in all the miniBOF experiments • On the University of Kentucky object recognition benchmark • On Holidays + Flickr1M
Conclusion • This paper introduced a way of packing BOFs: miniBOFs • An efficient indexing structure based on Hamming Embedding allows for rapid access, and an expected distance criterion is used for the fusion of the scores • It reduces memory usage, the quantity of memory scanned (hits), and query time