
Object Retrieval Using Visual Query Context


Presentation Transcript


  1. Object Retrieval Using Visual Query Context • Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua • Presented by: Shimon Berger

  2. What is a Visual Query? • TinEye • Google Image Search • Google Goggles

  3. Current Shortcomings • Bounding box • Complex shapes • User inaccuracy • Issues with the image itself • Too small • Lacks texture

  4. Bad Query Image vs. Good Query Image

  5. How Can We Improve a Visual Query? Objects in real life aren’t bound by a box

  6. Proposal • Introduce a contextual object retrieval (COR) model • Evaluate experimentally using 3 image datasets • Demonstrate the benefit of introducing contextual data into the query

  7. Existing Methods • Relevance feedback • “Bag of visual words” • Scale-invariant feature transform (SIFT) • Cosine retrieval model • Language modeling
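
For reference, a minimal sketch of the cosine baseline over a bag-of-visual-words representation, assuming tf-idf weighted word histograms; the variable names and numbers are illustrative, not from the paper:

```python
import numpy as np

def cosine_score(query_hist, doc_hist, idf):
    """Cosine similarity between tf-idf weighted visual-word histograms."""
    q = query_hist * idf
    d = doc_hist * idf
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d / denom) if denom > 0 else 0.0

# Illustrative 5-word vocabulary: document frequencies -> idf weights.
idf = np.log(1000 / (1 + np.array([500, 40, 3, 120, 10])))
query = np.array([2, 0, 1, 0, 0], dtype=float)  # word counts inside the query box
doc = np.array([5, 1, 2, 0, 3], dtype=float)    # word counts in a database image
print(cosine_score(query, doc, idf))
```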

  8. Proposed COR Model • Based on the Kullback-Leibler retrieval model • Detect interest points • Extract SIFT descriptors • Convert into visual words • Match words to documents in a database • Uses the Jelinek-Mercer smoothing method • Captures important patterns while removing noise
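
A minimal sketch of the language-modeling side: documents are ranked by the query's cross-entropy with a Jelinek-Mercer smoothed document model, p(w|d) = (1 - λ)·p_ml(w|d) + λ·p(w|C), which is rank-equivalent to the KL-divergence retrieval score for a fixed query model. The λ value and names below are illustrative:

```python
import numpy as np

def jm_score(query_counts, doc_counts, collection_probs, lam=0.5):
    """Rank score sum_w q(w) * log p(w|d) with Jelinek-Mercer smoothing.

    query_counts, doc_counts: visual-word count vectors over the same vocabulary.
    collection_probs: background distribution p(w|C), assumed nonzero for every
    word that can appear in a query.
    """
    doc_probs = doc_counts / max(doc_counts.sum(), 1)
    smoothed = (1 - lam) * doc_probs + lam * collection_probs
    q = query_counts / max(query_counts.sum(), 1)
    mask = q > 0  # only query words contribute; smoothing keeps log() finite
    return float(np.sum(q[mask] * np.log(smoothed[mask])))
```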

  9. COR Model • Begins with contrast-based saliency detection • Produces a saliency score, weighted by a control parameter • Estimate a search intent score for each visual word • Indicates the probability that a given visual word reflects the user’s search intent
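
As an illustration of contrast-based saliency, here is a rough sketch in the spirit of frequency-tuned saliency (distance of each pixel's Lab color from the image mean); the paper's actual detector and weighting may differ:

```python
import cv2
import numpy as np

def contrast_saliency(bgr_image):
    """Per-pixel saliency as distance from the mean Lab color of a lightly
    blurred image; returns a map normalized to [0, 1]."""
    blurred = cv2.GaussianBlur(bgr_image, (5, 5), 0)
    lab = cv2.cvtColor(blurred, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean_color = lab.reshape(-1, 3).mean(axis=0)
    sal = np.linalg.norm(lab - mean_color, axis=2)
    return sal / (sal.max() + 1e-8)
```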

  10. COR Search Intent Score • Standard LM approach uses a binary search intent score • Two proposed algorithms compute the search intent score from the bounding box and its context: • Based on pixel distance from the bounding box (spatial propagation) • Based on color coherence of the pixels (appearance propagation)

  11. Spatial Propagation (CORa) • Bounding box is usually rough and inaccurate • Lack of user effort • Limiting rectangular shape • Use a smoothed approximation of the bounding box • Dual-sigmoid function • A control parameter sets how softly the weight falls off at the box boundary (see the sketch below)
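
One plausible form of the dual-sigmoid soft box is sketched below: the score along each axis is the product of two sigmoids rising at the box edges, so the weight decays smoothly outside the rectangle. The parameter sigma and the exact functional form are assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_box_score(x, y, box, sigma=20.0):
    """Smoothed bounding-box membership in [0, 1] for pixel (x, y).

    box = (x_min, y_min, x_max, y_max); larger sigma gives a softer boundary.
    """
    x_min, y_min, x_max, y_max = box
    score_x = sigmoid((x - x_min) / sigma) * sigmoid((x_max - x) / sigma)
    score_y = sigmoid((y - y_min) / sigma) * sigmoid((y_max - y) / sigma)
    return score_x * score_y

# A pixel inside the box scores high; one far outside scores near zero.
print(soft_box_score(150, 120, box=(100, 80, 300, 260)))
print(soft_box_score(500, 500, box=(100, 80, 300, 260)))
```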

  12. Spatial Propagation (CORa)

  13. Appearance Propagation (CORm) • Assign high scores to the object of interest, normally in the foreground • Assign low scores to background objects, or objects of no interest • Similar to image matting • Separate foreground and background using alpha values • Separate relevant objects from irrelevant ones within the bounding box

  14. Appearance Propagation (CORm) Three-step approach: • Estimate foreground and background models guided by the bounding box • GrabCut algorithm (see the sketch below) • Use the models to select foreground and background pixels • Search intent score estimated from the pixel information • Use pseudo-foreground and -background pixels to account for spatial smoothness • Top 10% of foreground pixels from inside the box and top 20% of background pixels from outside the box
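
A hedged sketch of the first step using OpenCV's GrabCut: the bounding box initializes cv2.grabCut, and the resulting mask can then weight pixels or visual words instead of the binary in-box test. The probability-ranked pseudo-pixel selection (top 10% / 20%) needs per-pixel GMM scores that OpenCV does not expose, so this sketch stops at the hard foreground mask:

```python
import cv2
import numpy as np

def grabcut_foreground_mask(bgr_image, box, iterations=5):
    """Estimate a foreground mask from a rough bounding box with GrabCut.

    box = (x, y, width, height) in pixel coordinates.
    Returns a float mask: 1.0 for (probable) foreground, 0.0 for background.
    """
    mask = np.zeros(bgr_image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GrabCut's internal GMM buffers
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr_image, mask, box, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    foreground = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
    return foreground.astype(np.float32)

# Usage idea: a visual word detected at pixel (px, py) could be weighted by
# fg[py, px], where fg = grabcut_foreground_mask(img, box), rather than by
# whether it falls inside the rectangle.
```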

  15. CORm in Experiments • CORm is broken down into 2 variations: • CORg • Only uses the GrabCut algorithm, not all 3 steps • CORw • Uses alpha values based on a weighted foreground probability

  16. Experiments • Experiments performed using 3 image datasets: • Oxford5K • Oxford5K+ImageNet500K • Web1M • Datasets 1 and 2 use 11 landmarks (55 query images in total) • Dataset 3 adds 45 additional query images, randomly selected from various categories

  17. Experiments • COR models compared to 2 baseline retrieval models: • Cosine • General language modeling (context-unaware) • Baseline models only use visual words from inside the bounding box • All models evaluated in terms of average precision (AP) • The AP values over all queries are averaged to obtain the mean average precision (MAP), as in the sketch below
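
For clarity, a minimal sketch of how AP and MAP are typically computed from a ranked result list (standard definitions, not code from the paper):

```python
def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k over ranks k holding a relevant result."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(per_query_aps):
    """MAP: mean of the per-query AP values."""
    return sum(per_query_aps) / len(per_query_aps)

# Relevant docs at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2 ≈ 0.83
print(average_precision(["a", "x", "b", "y"], {"a", "b"}))
```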

  18. Experiments

  19. AP for different landmarks on Oxford5K dataset.

  20. AP for different landmarks on Oxford5K+ImageNet500K dataset.

  21. AP for different queries on Web1M dataset.

  22. Web1M Dataset: best performance enhancement on landmark queries

  23. Control Parameters • One parameter controls the weight of the saliency score • A second parameter controls the reliability of the bounding box

  24. Future Work • Context-aware multimedia retrieval • Using the contextual information shown here • Text surrounding query image • User logs and history
