Image Retrieval with Geometry-Preserving Visual Phrases. Yimeng Zhang, Zhaoyin Jia, and Tsuhan Chen. Cornell University
Similar Image Retrieval. Given a query image, search the image database and return a ranked list of relevant images.
Bag-of-Visual-Words (BoW). Images are represented as histograms of visual words. Similarity of two images: cosine similarity of their histograms. Histogram length: dictionary size.
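The BoW representation and similarity above can be sketched in a few lines. A minimal sketch, assuming features have already been quantized to word ids (a real system would obtain them via k-means visual-word quantization):

```python
import numpy as np

def bow_histogram(word_ids, dict_size):
    """Represent an image as a histogram over its quantized visual words."""
    hist = np.zeros(dict_size)
    for w in word_ids:
        hist[w] += 1
    return hist

def cosine_similarity(h1, h2):
    """BoW similarity: cosine of the angle between the two histograms."""
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))

# Two toy images sharing most of their words score close to 1.
h_a = bow_histogram([0, 1, 1, 3], dict_size=5)
h_b = bow_histogram([0, 1, 3, 3], dict_size=5)
print(cosine_similarity(h_a, h_b))  # 5/6 ≈ 0.833
```

Note the histogram discards all spatial layout, which is exactly the limitation the phrases below address.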
Geometry-Preserving Visual Phrases (GVP). A length-k phrase: k visual words in a certain spatial layout. Bag of Phrases (BoP): a histogram over phrases (e.g., all length-2 phrases).
Phrases vs. Words. [Figure: example matches on relevant and irrelevant image pairs, using single words, length-2 phrases, and length-3 phrases.]
Encoding Spatial Information. The standard pipeline runs a searching step with BoW, then post-processing (geometry verification) only on the top-ranked images. Our goal: encode spatial information directly into the searching step.
Modeling Relationships Between Words. Previous works reduce the number of phrases, because the feature dimension is exponential in the number of words per phrase:
• Co-occurrences in the entire image [L. Torresani et al., CVPR 2009]: no spatial information.
• Phrases in local neighborhoods [J. Yuan et al., CVPR 07] [Z. Wu et al., CVPR 10] [C. L. Zitnick, Tech. Report 07]: no long-range interactions, weak geometry.
• Select a subset of phrases [J. Yuan et al., CVPR 07]: discards a large portion of the phrases.
Our work: all phrases, with linear computation time.
Overview. 1. Similarity measure: BoW vs. BoP [Zhang and Chen, 09]. 2. Large-scale retrieval (this paper): inverted files and min-hash, for both BoW and BoP.
Co-occurring Phrases. Only the translation difference between word locations in the two images is considered. [Figure: two images with matched words A–F.] [Zhang and Chen, 09]
Co-occurring Phrase Algorithm. Each pair of corresponding words votes at its translation offset in a quantized offset space; words that fall into the same offset bin share a consistent spatial layout and therefore form co-occurring phrases. [Figure: matched words A–F voting in the offset space; the votes in each bin determine the number of co-occurring length-2 phrases.] [Zhang and Chen, 09]
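The offset-space counting above can be sketched as follows. A minimal version for translation offsets quantized to integer bins, assuming each word occurs at most once per image (as the later inverted-index slide also assumes); the word ids and coordinates are made up for illustration:

```python
from collections import defaultdict
from math import comb

def count_cooccurring_phrases(words1, words2, k=2):
    """Count co-occurring length-k phrases between two images.

    words1, words2: dicts mapping word id -> (x, y) location.
    Each matching word pair votes at its translation offset;
    a bin collecting m votes contributes C(m, k) length-k phrases.
    """
    offset_votes = defaultdict(int)
    for w, (x1, y1) in words1.items():
        if w in words2:
            x2, y2 = words2[w]
            offset_votes[(x2 - x1, y2 - y1)] += 1
    return sum(comb(m, k) for m in offset_votes.values())

# A, B, C all shift by (1, 0) -> C(3, 2) = 3 co-occurring length-2
# phrases; D lands in its own offset bin and contributes nothing.
img1 = {"A": (0, 0), "B": (2, 1), "C": (4, 0), "D": (5, 5)}
img2 = {"A": (1, 0), "B": (3, 1), "C": (5, 0), "D": (9, 9)}
print(count_cooccurring_phrases(img1, img2))  # 3
```

The cost is one offset computation per corresponding word pair, which is what makes the scheme linear rather than exponential.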
Relation with the Feature Vector. The number of co-occurring length-k phrases equals the inner product of the two BoP feature vectors, the same form as BoW. Computation is O(M), where M is the number of corresponding word pairs, which in practice is linear in the number of local features.
Inverted Index with BoW. Avoids comparing the query with every image: each word's inverted list gives the images containing it, and every matching entry adds +1 to that image's entry in a score table.
Inverted Index with Word Locations. Each inverted-list entry additionally stores the word's location in the image. Assuming the same word occurs only once in an image, memory usage is the same as BoW.
Score Table. BoW: one score per image, incremented directly. BoP: to compute the number of co-occurring phrases, an offset space is maintained per image.
Inverted Files with Phrases. For each word w_i in the query, traverse its inverted list; for each image entry, compute the offset between the query location and the stored location and add +1 to the corresponding bin of that image's offset space. [Figure: inverted list for w_i pointing into per-image offset spaces.]
Final Score. After all query words are processed, each image's offset space is collapsed into its final similarity score (the number of co-occurring phrases). [Figure: offset spaces for I1, I2, …, In and the resulting scores.]
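The inverted-file pipeline on the preceding slides (build index with locations, vote per image in an offset space, collapse to a final score) might be sketched like this. The index layout, the C(m, 2) collapse per bin, and all ids are illustrative assumptions:

```python
from collections import defaultdict
from math import comb

def build_index(database):
    """database: {image_id: {word_id: (x, y)}}.
    Returns inverted index: word_id -> list of (image_id, location)."""
    index = defaultdict(list)
    for img_id, words in database.items():
        for w, loc in words.items():
            index[w].append((img_id, loc))
    return index

def search(query_words, index, k=2):
    """Traverse inverted lists, vote +1 in a per-image offset space,
    then collapse each offset space into a phrase-count score."""
    offset_spaces = defaultdict(lambda: defaultdict(int))
    for w, (qx, qy) in query_words.items():
        for img_id, (x, y) in index.get(w, []):
            offset_spaces[img_id][(x - qx, y - qy)] += 1
    scores = {img: sum(comb(m, k) for m in space.values())
              for img, space in offset_spaces.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

database = {"I1": {"A": (1, 0), "B": (3, 1), "C": (5, 0)},
            "I2": {"A": (0, 0), "C": (9, 9)}}
query = {"A": (0, 0), "B": (2, 1), "C": (4, 0)}
print(search(query, build_index(database)))  # [('I1', 3), ('I2', 0)]
```

Only images sharing words with the query ever allocate an offset space, which is what keeps the search cost comparable to plain BoW.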
Overview (continued). Min-hash for BoW and BoP: less storage and lower time complexity than inverted files.
Min-hash with BoW. The probability of a min-hash collision (both images I and I′ yielding the same word) equals the image similarity, i.e., the set overlap of their word sets.
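Standard min-hash estimation of that set overlap can be sketched as follows. A minimal sketch: the seeded built-in hash stands in for a proper family of hash functions, an illustrative choice only:

```python
def minhash_sketch(word_set, seeds):
    """One min-hash per seed: the word minimizing a seeded hash."""
    return [min(word_set, key=lambda w: hash((seed, w))) for seed in seeds]

def estimated_similarity(words1, words2, num_hashes=200):
    """The fraction of colliding min-hashes estimates the set overlap
    (Jaccard similarity) of the two images' word sets."""
    seeds = range(num_hashes)
    s1 = minhash_sketch(words1, seeds)
    s2 = minhash_sketch(words2, seeds)
    return sum(a == b for a, b in zip(s1, s2)) / num_hashes

words_i = {"A", "B", "C", "D"}
words_j = {"A", "B", "C", "E"}  # true Jaccard similarity = 3/5
print(estimated_similarity(words_i, words_j))  # close to 0.6 on average
```

Because each image is reduced to a short sketch, comparisons need far less storage and time than full histograms, which is the point of the min-hash branch above.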
Min-hash with Phrases. Uses the probability that k min-hashes collide with consistent geometry (details are in the paper). [Figure: offset space built from min-hash collisions between images I and I′.]
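One way to read this slide, as a toy sketch and not the paper's exact estimator: a min-hash collision contributes only when the colliding words also agree in the offset space. The helper name and `bin_size` are hypothetical, for illustration:

```python
from math import comb

def geometry_consistent_collisions(sketch1, loc1, sketch2, loc2, bin_size=16):
    """Count pairs of min-hash collisions whose quantized translation
    offsets agree, i.e., collisions that are also geometrically
    consistent length-2 phrases."""
    offsets = {}
    for h1, h2 in zip(sketch1, sketch2):
        if h1 != h2:          # no collision on this hash function
            continue
        (x1, y1), (x2, y2) = loc1[h1], loc2[h2]
        off = ((x2 - x1) // bin_size, (y2 - y1) // bin_size)
        offsets[off] = offsets.get(off, 0) + 1
    # a bin holding m consistent collisions yields C(m, 2) phrase pairs
    return sum(comb(m, 2) for m in offsets.values())

# Two of the three min-hashes collide ("A", "B"), and their words sit
# at the same offset, giving one geometrically consistent pair.
s1, l1 = ["A", "B", "C"], {"A": (0, 0), "B": (2, 0), "C": (8, 8)}
s2, l2 = ["A", "B", "D"], {"A": (1, 0), "B": (3, 0), "D": (5, 5)}
print(geometry_consistent_collisions(s1, l1, s2, l2))  # 1
```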
Other Invariances. Handled by adding dimensions to the offset space, which increases the memory usage. [Figure: images I and I′.] [Zhang and Chen, 10]
Variant Matching. Local histogram matching.
Evaluation. BoW + Inverted Index vs. BoP + Inverted Index; BoW + Min-hash vs. BoP + Min-hash. Post-processing methods (geometry verification) are complementary to our work.
Experiments – Inverted Index. Oxford 5K dataset (55 queries) [Philbin, J. et al., 07]; 1M Flickr distractors.
Example Precision-Recall Curves. [Figure: PR curves for BoW and BoP on example queries.] BoP achieves higher precision at lower recall.
Comparison. Mean average precision (mAP): mean of the AP over the 55 queries, comparing BoW, BoW+RANSAC, BoP, and BoP+RANSAC.
• BoP outperforms BoW at similar computation cost.
• BoP outperforms BoW+RANSAC, which is 10 times slower (re-ranking the top 150 images).
• The improvement is larger for smaller vocabulary sizes.
+Flickr 1M Dataset. Computational complexity: RANSAC takes 4s on the top 300 images.
Experiments – Min-hash. University of Kentucky dataset; min-hash with BoW baseline: [O. Chum et al., BMVC 08].
Conclusion • Encodes more spatial information into the BoW framework • Can be applied to all images in the database at the searching step • Same computational complexity as BoW • Better retrieval precision than BoW+RANSAC