Accounting for the relative importance of objects in image retrieval


Presentation Transcript


  1. Accounting for the relative importance of objects in image retrieval. Sung Ju Hwang and Kristen Grauman, University of Texas at Austin.

  2. Image retrieval: content-based retrieval from an image database. [Diagram: a query image is matched against images 1, 2, …, k in the database.]

  3. Relative importance of objects. Which image is more relevant to the query? [Diagram: a query image and candidate matches from the image database.]

  4. Relative importance of objects. Which image is more relevant to the query? [Diagram: the query and candidate database images annotated with object labels: cow, fence, mud, water, sky, bird.]

  5. Relative importance of objects. An image can contain many different objects, but some are more "important" than others. [Image labeled: architecture, sky, mountain, bird, cow, water.]

  6. Relative importance of objects. Some objects are background. [Same labeled image.]

  7. Relative importance of objects. Some objects are less salient. [Same labeled image.]

  8. Relative importance of objects. Some objects are more prominent or perceptually define the scene. [Same labeled image.]

  9. Our goal: retrieve those images that share important objects with the query image. [Figure: two candidate retrievals compared.] How can we learn a representation that accounts for this?

  10. Idea: image tags as an importance cue. The order in which a person assigns tags provides implicit cues about the importance of objects in the scene. TAGS: Cow, Birds, Architecture, Water, Sky.

  11. Idea: image tags as an importance cue. The order in which a person assigns tags provides implicit cues about the importance of objects in the scene. TAGS: Cow, Birds, Architecture, Water, Sky. We learn this connection to improve cross-modal retrieval and CBIR, then query with untagged images to retrieve the most relevant images or tags.

  12. Related work. Previous work using tagged images focuses on the noun ↔ object correspondence: Duygulu et al. 02, Berg et al. 04, Fergus et al. 05, Li et al. 09; also Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, … Other related work builds richer image representations from "two-view" text+image data (e.g., a sports photo paired with text such as "height: 6-11, weight: 235 lbs, position: forward, croatia, college:"): Hardoon et al. 04, Gupta et al. 08, Blaschko & Lampert 08; also Bekkerman & Jeon 07, Qi et al. 09, Quack et al. 08, Quattoni et al. 07, Yakhnenko & Honavar 09, …

  13. Approach overview: building the image database. From tagged training images (with tag-lists such as "Cow, Grass", "Horse, Grass", …, "Car, House, Grass, Sky"), extract visual and tag-based features, then learn projections from each feature space into a common "semantic space".

  14. Approach overview: retrieval from the database. An untagged query image or a tag-list query (e.g., "Cow, Tree, Grass") is projected into the semantic space and matched against the image database, supporting: • Image-to-image retrieval • Tag-to-image retrieval • Image-to-tag auto annotation.

  15. Dual-view semantic space. Visual features and tag-lists are two views generated by the same underlying concept. [Diagram: both views map into a shared semantic space.]

  16. Learning mappings to the semantic space. Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation of views projected from the same instance. [Diagram: view 1 and view 2 project into the semantic space, a new common feature space.]

  17. Kernel Canonical Correlation Analysis. Linear CCA: given paired data $\{(x_i, y_i)\}_{i=1}^n$, select directions $w_x, w_y$ so as to maximize

  $\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)(w_y^\top C_{yy} w_y)}}$

Kernel CCA: given a pair of kernel functions $k_x, k_y$, the objective is the same, but the projections live in kernel space, $w_x = \sum_i \alpha_i \phi_x(x_i)$ and $w_y = \sum_i \beta_i \phi_y(y_i)$, giving

  $\rho = \max_{\alpha, \beta} \frac{\alpha^\top K_x K_y \beta}{\sqrt{(\alpha^\top K_x^2 \alpha)(\beta^\top K_y^2 \beta)}}$

[Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]
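The slide gives only the objectives, so for concreteness here is a minimal numpy/scipy sketch of regularized kernel CCA in the dual form of Hardoon et al. 2004. The function and parameter names (`kcca`, `reg`, `n_components`) are illustrative, and the regularization is one common choice, not necessarily the paper's exact setting:

```python
import numpy as np
from scipy.linalg import eigh

def kcca(Kx, Ky, reg=1e-3, n_components=10):
    """Regularized kernel CCA.

    Kx, Ky: (n, n) kernel matrices for the two views (visual and tag),
    computed on the same n training images. Returns dual coefficients
    (alpha, beta) whose kernel-space projections are maximally correlated.
    """
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    Kx, Ky = H @ Kx @ H, H @ Ky @ H
    Z = np.zeros((n, n))
    # Generalized eigenproblem  A u = rho B u  with u = [alpha; beta].
    A = np.block([[Z, Kx @ Ky],
                  [Ky @ Kx, Z]])
    Rx, Ry = Kx + reg * np.eye(n), Ky + reg * np.eye(n)
    B = np.block([[Rx @ Rx, Z],
                  [Z, Ry @ Ry]])
    vals, vecs = eigh(A, B)                       # eigenvalues ascending
    top = vecs[:, ::-1][:, :n_components]         # strongest correlations
    return top[:n], top[n:]                       # alpha, beta

# A new image whose kernel row against the training set is k_new (1 x n)
# lands in the semantic space at  z = k_new @ alpha.
```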

  18. Building the kernels for each view. [Diagram: word frequency and rank kernels on the tag view, visual kernels on the image view, both feeding the semantic space.]

  19. Visual features. Visual Words (k-means on DoG+SIFT) capture local appearance; Gist [Torralba et al.] captures the total scene structure; a Color Histogram captures the HSV color distribution. Average the component χ2 kernels to build a single visual kernel.
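As a sketch of the kernel-averaging step above, assuming each channel (color histogram, visual-word histogram, gist) is stored as a nonnegative feature row per image; `chi2_kernel` is scikit-learn's exponentiated χ2 kernel:

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

def visual_kernel(channels, gamma=1.0):
    """Average per-channel chi-squared kernels into one visual kernel.

    channels: list of (n_images, dim) nonnegative feature arrays, e.g.
    [color_hists, visual_word_hists, gists]. Each kernel entry is
    exp(-gamma * sum_d (x_d - y_d)^2 / (x_d + y_d)).
    """
    return np.mean([chi2_kernel(F, gamma=gamma) for F in channels], axis=0)
```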

  20. Tag features: traditional bag-of-(text)words. For the tag-list "Cow, Bird, Water, Architecture, Mountain, Sky", the word-frequency vector is: Cow 1, Bird 1, Water 1, Architecture 1, Mountain 1, Sky 1, Car 0, Person 0.

  21. Tag features: absolute rank in this image's tag-list. For the same tag-list: Cow 1, Bird 0.63, Water 0.50, Architecture 0.43, Mountain 0.39, Sky 0.36, Car 0, Person 0.

  22. Tag features: relative (percentile) rank, compared to the word's typical rank over all tag-lists. For the same tag-list: Cow 0.9, Bird 0.6, Water 0.8, Architecture 0.5, Mountain 0.8, Sky 0.8, Car 0, Person 0.
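Slides 20-22 amount to three vector encodings of one ordered tag-list; a sketch follows. The 1/log2(1+rank) decay reproduces the absolute-rank values on slide 21 (rank 2 → 0.63, rank 3 → 0.50, …), while the percentile rule for relative rank is a plausible reading of the slide rather than the paper's exact formula:

```python
import numpy as np

def tag_features(tag_list, vocab, rank_history):
    """Encode one ordered tag-list three ways.

    tag_list: tags in the order the annotator gave them.
    vocab: dict mapping word -> vector index.
    rank_history: dict mapping word -> list of that word's ranks across
    all training tag-lists (used only for the relative-rank encoding).
    """
    V = len(vocab)
    freq = np.zeros(V)       # bag-of-words: 1 if tagged, else 0
    abs_rank = np.zeros(V)   # decays with position in this tag-list
    rel_rank = np.zeros(V)   # percentile vs. the word's typical rank
    for r, word in enumerate(tag_list, start=1):
        i = vocab[word]
        freq[i] = 1.0
        abs_rank[i] = 1.0 / np.log2(1 + r)   # rank 1 -> 1.0, rank 2 -> 0.63
        past = np.asarray(rank_history[word])
        rel_rank[i] = np.mean(past >= r)     # fraction of tag-lists where
                                             # the word ranked no higher
    return freq, abs_rank, rel_rank
```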

  23. Building the kernels for each view (recap). [Same diagram as slide 18: word frequency and rank kernels on the tag view, visual kernels on the image view.]

  24. Experiments. We compare the retrieval performance of our method (the KCCA semantic space) with two baselines: a Words+Visual baseline [Hardoon et al. 2004, Yakhnenko et al. 2009] and a Visual-Only baseline. [Figure: a query image and its 1st retrieved image under each method.]

  25. Evaluation. We use Normalized Discounted Cumulative Gain at top K (NDCG@K) to evaluate retrieval performance:

  $\mathrm{NDCG@K} = \frac{1}{Z} \sum_{p=1}^{K} \frac{s(p)}{\log_2(1+p)}$

where the reward term $s(p)$ scores the $p$th-ranked example and the normalizer $Z$ is the discounted sum of scores for the perfect ranking, so that doing well in the top ranks matters most. [Kekäläinen & Järvelin, 2002]
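Matching the formula above, single-query NDCG@K can be computed as below; the reward vector `scores` would hold the paper's object+scale or ordered-tag similarity values for the examples in ranked order:

```python
import numpy as np

def ndcg_at_k(scores, k):
    """NDCG@K for one query.

    scores[p]: reward for the example the system ranked at position p+1
    (higher = more relevant to the query).
    """
    scores = np.asarray(scores, dtype=float)
    k = min(k, len(scores))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))        # positions 1..k
    dcg = np.sum(scores[:k] * discounts)
    idcg = np.sum(np.sort(scores)[::-1][:k] * discounts)  # perfect ranking
    return dcg / idcg if idcg > 0 else 0.0
```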

  26. Evaluation. We report NDCG@K scores using two different reward terms: (1) object presence/scale, which rewards similarity between the query's objects and their scales and those in the retrieved image(s); and (2) ordered tag similarity, which rewards similarity between the query's ground-truth tag ranks (absolute and relative) and those in the retrieved image(s). [Example tag-lists: "Cow, Tree, Grass, Person" and "Cow, Tree, Fence, Grass".]

  27. Datasets.
  LabelMe: 6352 images (database: 3799, query: 2553). Scene-oriented. Ordered tag lists come from the order in which labels were added. 56 unique taggers, ~23 tags/image.
  PASCAL: 9963 images (database: 5011, query: 4952). Object-centric. Tag lists obtained on Mechanical Turk. 758 unique taggers, ~5.5 tags/image.

  28. Image-to-image retrieval. We want to retrieve the images most similar to a given query image in terms of object importance. [Diagram: an untagged query image is projected via the visual kernel space into the semantic space, where it is matched against the image database to produce the retrieved images.]
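Once query and database are projected into the semantic space (e.g., with the KCCA sketch after slide 17), each retrieval task reduces to nearest-neighbor search there. A minimal sketch, with cosine similarity as an assumed (not paper-specified) choice:

```python
import numpy as np

def retrieve(query_proj, db_proj, k=10):
    """Rank database items by cosine similarity in the semantic space.

    query_proj: (d,) projection of the query (from either view).
    db_proj: (n, d) projections of the database items.
    Returns the indices of the top-k matches. Because both views map to
    the same space, the same routine serves image-to-image, tag-to-image,
    and image-to-tag retrieval.
    """
    q = query_proj / np.linalg.norm(query_proj)
    D = db_proj / np.linalg.norm(db_proj, axis=1, keepdims=True)
    return np.argsort(-(D @ q))[:k]
```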

  29. Image-to-image retrieval results. [Qualitative examples: each query image shown with the top retrieval from the visual-only baseline, the words+visual baseline, and our method.]

  30. Image-to-image retrieval results (continued). [Further qualitative examples in the same layout.]

  31. Image-to-image retrieval results. Our method better retrieves images that share the query's important objects, by both measures (39% improvement). [Plots: retrieval accuracy measured by object+scale similarity and by ordered tag-list similarity.]

  32. Tag-to-image retrieval. We want to retrieve the images that are best described by a given tag list. [Diagram: the query tags "Cow, Person, Tree, Grass" are projected via the tag-list kernel space into the semantic space and matched against the image database.]

  33. Tag-to-image retrieval results. Our method better respects the importance cues implied by the user's keyword query (31% improvement). [Plot: retrieval accuracy.]

  34. Image-to-tag auto annotation. We want to annotate a query image with ordered tags that best describe the scene. [Diagram: an untagged query image is projected through the visual kernel space into the semantic space and matched against database tag-lists such as "Cow, Tree, Grass, Field", "Cow, Fence", "Cow, Grass" to produce an output tag-list.]
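A hypothetical k-NN realization of this step (slide 35 plots accuracy against k, the number of nearest neighbors): project the untagged query into the semantic space, take the tag-lists of its k nearest database images, and merge them with a rank-discounted vote. The aggregation rule here is illustrative, not the paper's:

```python
import numpy as np

def annotate(query_proj, db_proj, db_tag_lists, k=5):
    """Predict an ordered tag-list for an untagged query image.

    query_proj: (d,) semantic-space projection of the query.
    db_proj: (n, d) projections of the tagged database images.
    db_tag_lists: their ordered tag-lists.
    """
    diffs = db_proj - query_proj
    nn = np.argsort(np.einsum('ij,ij->i', diffs, diffs))[:k]  # k nearest (L2)
    votes = {}
    for idx in nn:
        for r, tag in enumerate(db_tag_lists[idx], start=1):
            # Earlier (more important) tags contribute larger votes.
            votes[tag] = votes.get(tag, 0.0) + 1.0 / np.log2(1 + r)
    return sorted(votes, key=votes.get, reverse=True)  # ordered tag-list
```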

  35. Image-to-tag auto annotation results. [Examples of predicted ordered tag-lists, e.g., "Tree, Boat, Grass, Water, Person", "Boat, Person, Water, Sky, Rock", "Person, Tree, Car, Chair, Window", "Bottle, Knife, Napkin, Light, Fork"; accuracy is plotted against k, the number of nearest neighbors used.]

  36. Implicit tag cues as a localization prior [Hwang & Grauman, CVPR 2010]. Training: learn an object-specific connection between localization parameters and implicit tag features, i.e. P(location, scale | tags). Testing: given a novel image, localize objects based on both tags and appearance, combining an object detector with the implicit tag features. [Figure: example tag-lists such as "Desk, Mug, Office, Computer, Poster", "Poster, Desk, Screen, Mug", "Mug, Eiffel", "Mug, Coffee, Woman, Table", "Mug, Ladder", "Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it".]

  37. Conclusion. • We want to learn what is implied (beyond which objects are present) by how a human provides tags for an image. • The approach requires minimal supervision to learn the connection between the importance conveyed by tags and visual features. • It gives consistent gains over both content-based visual search and a tag+visual approach that disregards importance.
