Accounting for the relative importance of objects in image retrieval. BMVC 2010. Sung Ju Hwang and Kristen Grauman, University of Texas at Austin.
Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search. International Journal of Computer Vision, 2011. Sung Ju Hwang and Kristen Grauman.
Relative importance of objects. An image can contain many different objects, but some are more "important" than others. Example tags: architecture, sky, mountain, bird, cow, water.
Relative importance of objects. Some objects are background.
Relative importance of objects. Some objects are less salient.
Relative importance of objects. Some objects are more prominent, or perceptually define the scene.
Our goal. Retrieve those images that share important objects with the query image, versus images that merely contain some of the same objects. How can we learn a representation that accounts for this?
Idea: image tags as an importance cue. The order in which a person assigns tags provides implicit cues about the objects' importance to the scene. Example tag list: Cow, Birds, Architecture, Water, Sky.
Approach overview: Building the image database. Starting from tagged training images (e.g., "Cow, Grass", "Horse, Grass, …", "Car, House, Grass, Sky"), we extract visual and tag-based features, then learn projections from each feature space into a common "semantic space".
Approach overview: Retrieval from the database. The learned semantic space supports three query modes:
• Image-to-image retrieval: an untagged query image retrieves images from the database
• Image-to-tag auto annotation: an untagged query image retrieves an ordered tag list
• Tag-to-image retrieval: a tag-list query retrieves images from the database
Visual features. Visual words (k-means on DoG+SIFT) capture local appearance; Gist [Torralba et al.] captures the total scene structure; a color histogram captures the HSV color distribution.
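A rough sketch of two of these descriptors (assuming OpenCV and scikit-learn are available; Gist is omitted, and the paper's exact parameters and vocabularies are not reproduced here).

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def hsv_color_histogram(bgr_img, bins=(8, 8, 8)):
    """HSV color distribution, L1-normalized."""
    hsv = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-8)

def bag_of_visual_words(images, k=200):
    """k-means vocabulary on SIFT (DoG) descriptors, then per-image word histograms."""
    sift = cv2.SIFT_create()
    descs = [sift.detectAndCompute(cv2.cvtColor(im, cv2.COLOR_BGR2GRAY), None)[1]
             for im in images]
    vocab = KMeans(n_clusters=k, n_init=4).fit(
        np.vstack([d for d in descs if d is not None]))
    hists = []
    for d in descs:
        h = np.zeros(k)
        if d is not None:
            words, counts = np.unique(vocab.predict(d), return_counts=True)
            h[words] = counts
        hists.append(h / (h.sum() + 1e-8))
    return np.array(hists)
```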
Tag features: word frequency, the traditional bag-of-(text)words. For the example tag list (Cow, Bird, Water, Architecture, Mountain, Sky), each word's value is its count: Cow 1, Bird 1, Water 1, Architecture 1, Mountain 1, Sky 1, Car 0, Person 0.
Tag features: absolute rank in this image's tag list. Each word's value decreases with its position: Cow 1, Bird 0.63, Water 0.50, Architecture 0.43, Mountain 0.39, Sky 0.36, Car 0, Person 0.
Tag features: relative rank, the percentile rank obtained from the rank distribution of that word across all tag lists: Cow 0.9, Bird 0.6, Water 0.8, Architecture 0.5, Mountain 0.8, Sky 0.8, Car 0, Person 0.
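A minimal sketch of the three encodings for one image's ordered tag list. The vocabulary handling and the relative-rank percentile estimate are assumptions; the absolute-rank decay 1/log2(1+rank) matches the example values on the slide above.

```python
import numpy as np

def tag_features(tag_list, vocab, all_tag_lists):
    """Word-frequency, absolute-rank, and relative-rank vectors for one ordered tag list."""
    V = {w: i for i, w in enumerate(vocab)}
    freq = np.zeros(len(vocab))
    abs_rank = np.zeros(len(vocab))
    rel_rank = np.zeros(len(vocab))
    for r, w in enumerate(tag_list, start=1):
        freq[V[w]] = 1.0                           # bag-of-words presence/count
        abs_rank[V[w]] = 1.0 / np.log2(1 + r)      # decays with position in this tag list
        # percentile of this rank within the word's rank distribution over all tag lists
        ranks = [tl.index(w) + 1 for tl in all_tag_lists if w in tl]
        rel_rank[V[w]] = np.mean([r <= q for q in ranks]) if ranks else 0.0
    return freq, abs_rank, rel_rank
```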
Learning mappings to semantic space. Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation between the two views (here, visual and tag features) projected from the same instance. The result is the semantic space, a new common feature space.
Kernel Canonical Correlation Analysis [Akaho 2001; Fyfe et al. 2001; Hardoon et al. 2004]. Linear CCA: given paired data {(x_i, y_i)}, select projection directions w_x, w_y so as to maximize the correlation rho = (w_x' C_xy w_y) / sqrt((w_x' C_xx w_x)(w_y' C_yy w_y)), where C_xx, C_yy, C_xy are the covariance and cross-covariance matrices of the two views. Kernel CCA: given a pair of kernel functions k_x and k_y, the objective is the same, but the projections live in the kernel-induced feature spaces, expressed as weighted combinations of the training examples, so the correlation is written in terms of the kernel matrices K_x and K_y.
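A linear stand-in for this projection step, using scikit-learn's CCA. This is hedged: the paper uses regularized kernel CCA, so treat this only as a sketch of the same objective (maximally correlated projections of paired views) in linear form, with randomly generated placeholder features.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# X_visual: n x d_v visual features, Y_tags: n x d_t tag features (paired rows)
rng = np.random.default_rng(0)
X_visual = rng.normal(size=(200, 60))
Y_tags = rng.normal(size=(200, 30))

cca = CCA(n_components=20)
cca.fit(X_visual, Y_tags)                      # learn the two view projections
Xs, Ys = cca.transform(X_visual, Y_tags)       # both views mapped into the semantic space

x_query_sem = cca.transform(X_visual[:1])      # an untagged query uses the visual view only
```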
Recap: Building the image database. Both the visual feature space and the tag feature space are projected into the common semantic space.
Experiments. We compare the retrieval performance of our method with two baselines: a Visual-Only baseline, and a Words+Visual baseline that learns a KCCA semantic space from standard bag-of-words tag features [Hardoon et al. 2004; Yakhnenko et al. 2009].
Evaluation. We use Normalized Discounted Cumulative Gain at top K (NDCG@K) [Kekalainen & Jarvelin, 2002] to evaluate retrieval performance: each retrieved example contributes a reward (score) that is discounted by its rank p, and the discounted sum is normalized by the total achievable score, so doing well in the top ranks is more important.
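A minimal sketch of NDCG@K, assuming the common exponential gain and log2 rank discount and normalizing by an ideal reordering of the same candidate set; the paper's exact reward term and normalization constant may differ.

```python
import numpy as np

def ndcg_at_k(scores, k):
    """NDCG@K, where scores[p] is the reward of the (p+1)-th retrieved example."""
    scores = np.asarray(scores, dtype=float)
    k = min(k, len(scores))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))       # rank p gets 1 / log2(p + 1)
    dcg = np.sum((2.0 ** scores[:k] - 1.0) * discounts)  # discounted gain of this ranking
    ideal = np.sort(scores)[::-1]                        # best possible ordering
    idcg = np.sum((2.0 ** ideal[:k] - 1.0) * discounts)
    return dcg / idcg if idcg > 0 else 0.0
```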
Evaluation. We present the NDCG@K score using two different reward terms: (1) object presence/scale, which rewards similarity between the query's objects and scales and those in the retrieved image(s); and (2) ordered tag similarity, which rewards similarity between the query's ground-truth tag ranks (absolute and relative) and those in the retrieved image(s).
Datasets
• LabelMe: 6352 images; database: 3799 images; query: 2553 images; ~23 tags/image
• PASCAL: 9963 images; database: 5011 images; query: 4952 images; ~5.5 tags/image
Image-to-image retrieval. We want to retrieve the images most similar to the given query image in terms of object importance: the untagged query is projected from the visual kernel space into the semantic space and matched against the image database.
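A sketch of this retrieval step, reusing the fitted cca from the projection sketch above; X_visual_db (database visual features) and x_query_visual (the query's visual features) are assumed placeholder arrays, and cosine similarity stands in for the similarity measure.

```python
import numpy as np

db_sem = cca.transform(X_visual_db)                     # database images in the semantic space
q_sem = cca.transform(x_query_visual.reshape(1, -1))    # untagged query: visual view only

# rank database images by cosine similarity to the query (highest first)
sims = (db_sem @ q_sem.T).ravel() / (
    np.linalg.norm(db_sem, axis=1) * np.linalg.norm(q_sem) + 1e-8)
ranking = np.argsort(-sims)
```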
Image-to-image retrieval results (qualitative): each example shows the query image alongside the first retrievals of the Visual-only baseline, the Words+Visual baseline, and our method.
Image-to-image retrieval results. Our method better retrieves images that share the query's important objects by both measures (retrieval accuracy measured by object+scale similarity and by ordered tag-list similarity), with a 39% improvement.
Tag-to-image retrieval. We want to retrieve the images that are best described by the given tag list (e.g., "Cow, Person, Tree, Grass"): the query tags are projected from the tag-list kernel space into the semantic space and matched against the image database.
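The same ranking idea applies in the tag-to-image direction. In this sketch, project_tags is a hypothetical helper standing in for the learned tag-side projection, and tag_features, vocab, all_tag_lists, and db_sem come from the earlier sketches.

```python
import numpy as np

query_tags = ["cow", "person", "tree", "grass"]                 # ordered keyword query
_, y_query, _ = tag_features(query_tags, vocab, all_tag_lists)  # e.g. the absolute-rank encoding
q_sem = project_tags(y_query.reshape(1, -1))                    # hypothetical tag-view projection

sims = (db_sem @ q_sem.T).ravel()                               # score every database image
ranking = np.argsort(-sims)                                     # most relevant images first
```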
Tag-to-image retrieval results. Our method better respects the importance cues implied by the user's keyword query, with a 31% improvement.
Image-to-tag auto annotation. We want to annotate a query image with ordered tags that best describe the scene: the untagged query is projected from the visual kernel space into the semantic space, and ordered tag lists (e.g., "Cow, Tree, Grass"; "Field, Cow, Fence"; "Cow, Grass") are output from the nearest database examples.
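A sketch of the annotation step, assuming tags are transferred from the k nearest database neighbors in the semantic space; the 1/log2(1+rank) vote weighting is an assumption, not necessarily the paper's exact rule.

```python
import numpy as np
from collections import defaultdict

def annotate(q_sem, db_sem, db_tag_lists, k=5):
    """Return an ordered tag list for a query image already projected into the semantic space."""
    dists = np.linalg.norm(db_sem - q_sem, axis=1)        # distance to every database image
    votes = defaultdict(float)
    for i in np.argsort(dists)[:k]:                       # k nearest tagged neighbors
        for rank, tag in enumerate(db_tag_lists[i], start=1):
            votes[tag] += 1.0 / np.log2(1 + rank)         # earlier (more important) tags count more
    return sorted(votes, key=votes.get, reverse=True)     # ordered output tag list
```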
Image-to-tag auto annotation results. Example output tag lists: "Tree, Boat, Grass, Water, Person"; "Boat, Person, Water, Sky, Rock"; "Person, Tree, Car, Chair, Window"; "Bottle, Knife, Napkin, Light, Fork". k = number of nearest neighbors used.