Object Retrieval Using Visual Query Context. Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua. Presented by: Shimon Berger
What is a Visual Query? • TinEye • Google Image Search • Google Goggles
Current Shortcomings • Bounding box: cannot capture complex shapes; prone to user inaccuracy • Issues with the image itself: too small, or lacks texture
How Can We Improve a Visual Query? Objects in real life aren't bounded by a box
Proposal • Introduce a contextual object retrieval (COR) model • Evaluate it experimentally using 3 image datasets • Demonstrate the benefit of introducing contextual data into the query
Existing Methods • Relevance feedback • “Bag of visual words” • Scale-invariant feature transform (SIFT) • Cosine retrieval model • Language modeling
Proposed COR Model • Based on the Kullback-Leibler (KL) divergence retrieval model • Detect interest points • Extract SIFT descriptors • Convert into visual words • Match words to documents in a database • Uses the Jelinek-Mercer smoothing method • Captures important patterns while removing noise
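To make the retrieval step concrete, here is a minimal sketch of document scoring under a KL-divergence ranking with Jelinek-Mercer smoothing, over bag-of-visual-words counts. The function names and the mixing weight `lam` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def jm_smoothed_doc_model(doc_counts, collection_probs, lam=0.8):
    """Jelinek-Mercer smoothing: mix the document's maximum-likelihood
    visual-word distribution with the collection-wide distribution.
    lam is an assumed mixing weight, not the paper's setting."""
    doc_probs = doc_counts / max(doc_counts.sum(), 1)
    return lam * doc_probs + (1.0 - lam) * collection_probs

def kl_score(query_probs, doc_counts, collection_probs, lam=0.8):
    """Rank documents by negative KL(query || doc). Since the query
    entropy is constant across documents, this reduces to the
    cross-entropy term over the query's nonzero words."""
    doc_probs = jm_smoothed_doc_model(doc_counts, collection_probs, lam)
    mask = query_probs > 0
    return float(np.sum(query_probs[mask] * np.log(doc_probs[mask])))
```

With Jelinek-Mercer smoothing, visual words unseen in a document fall back on collection statistics, which is what lets the model retain important patterns while suppressing noise instead of zeroing out rare matches.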
COR Model • Begins with contrast-based saliency detection • Produces a saliency score for each pixel • A tunable parameter controls how strongly saliency is weighted • Estimate a search intent score for each visual word • Indicates the probability that a given visual word reflects the user's search intent
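The slide does not specify the exact saliency detector, so the toy sketch below only illustrates the contrast-based idea: score each pixel by its deviation from the local neighborhood mean. The window size and normalization are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contrast_saliency(gray, window=15):
    """Toy contrast-based saliency: a pixel is salient if it deviates
    from its neighborhood mean. window is an assumed parameter; the
    paper's actual detector may differ."""
    local_mean = uniform_filter(gray.astype(float), size=window)
    sal = np.abs(gray - local_mean)
    return sal / (sal.max() + 1e-8)  # normalize to [0, 1]
```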
COR Search Intent Score • The standard LM approach uses a binary search intent score (in effect, 1 inside the bounding box, 0 outside) • Two proposed algorithms compute the search intent score from the bounding box plus its context: • Based on pixel distance from the bounding box (spatial propagation) • Based on color coherence of the pixels (appearance propagation)
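However the pixel-level intent map is produced, it must still be transferred to the visual words. A simple, assumed aggregation is shown below: each keypoint-anchored visual word inherits the intent score at its pixel location. This mapping is illustrative only; the paper's exact weighting may differ.

```python
import numpy as np

def word_intent_scores(keypoints, intent_map):
    """Map a pixel-level search intent map to per-visual-word scores.
    keypoints: iterable of (x, y) interest-point coordinates.
    intent_map: h-by-w array of pixel intent scores."""
    h, w = intent_map.shape
    scores = []
    for x, y in keypoints:
        xi = int(np.clip(round(x), 0, w - 1))
        yi = int(np.clip(round(y), 0, h - 1))
        scores.append(float(intent_map[yi, xi]))
    return scores
```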
Spatial Propagation (CORa) • The bounding box is usually rough and inaccurate • Lack of user effort • Limiting rectangular shape • Use a smoothed approximation of the bounding box • Dual-sigmoid function (sketched below) • A tunable parameter controls how sharply the weight decays outside the box
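A minimal sketch of a dual-sigmoid soft box: along each axis, a rising and a falling sigmoid are multiplied, giving a weight near 1 inside the box that decays smoothly with pixel distance outside it. The steepness parameter `s` stands in for the control variable mentioned above; the paper's exact functional form may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_box_weight(h, w, box, s=20.0):
    """Soft (dual-sigmoid) approximation of a bounding box.
    box = (x0, y0, x1, y1); s (assumed) controls how quickly the
    weight falls off outside the box. Returns an h-by-w map in (0, 1)."""
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:h, 0:w]
    wx = sigmoid((xs - x0) / s) * sigmoid((x1 - xs) / s)
    wy = sigmoid((ys - y0) / s) * sigmoid((y1 - ys) / s)
    return wx * wy
```

Unlike the hard binary box, this weighting lets context pixels just outside an inaccurately drawn rectangle still contribute to the query.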
Appearance Propagation (CORm) • Assign high scores to the object of interest, normally in the foreground • Assign low scores to background objects, or objects of no interest • Similar to image matting • Separate foreground and background using alpha values • Separate relevant objects from irrelevant ones within the bounding box
Appearance Propagation (CORm) Three-step approach (a GrabCut sketch follows): • Step 1: Estimate foreground and background models guided by the bounding box, using the GrabCut algorithm • Step 2: Use the models to select pseudo-foreground and pseudo-background pixels, which account for spatial smoothness (the top 10% of foreground pixels from inside the box and the top 20% of background pixels from outside it) • Step 3: Estimate the search intent score from the pixel information
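As a minimal sketch of step 1, assuming OpenCV is available: seed GrabCut with the bounding box and read back a foreground map. Returning a binary mask (rather than a weighted alpha map, as the CORw variant uses) and the iteration count are simplifications.

```python
import cv2
import numpy as np

def grabcut_foreground(img_bgr, box, iters=5):
    """Run GrabCut seeded by the user's bounding box and return a
    foreground map (1.0 for definite/probable foreground).
    box = (x, y, width, height) in OpenCV's rectangle convention."""
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GMM parameter buffers
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, box, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)
    fg = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.float32)
    return fg
```

From this map, the pseudo-foreground and pseudo-background pixels of step 2 would be selected by thresholding the most confident pixels inside and outside the box.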
CORm in Experiments • CORm is broken down into 2 variations: • CORg • Only uses the GrabCut algorithm, not all 3 steps • CORw • Uses alpha values based on weighted foreground probability
Experiments • Experiments performed using 3 image datasets: • Oxford5K • Oxford5K+ImageNet500K • Web1M • Datasets 1 and 2 use 11 landmarks (55 query images) as queries • Dataset 3 adds 45 more query images • Randomly selected • From various categories
Experiments • COR models are compared to 2 baseline retrieval models: • Cosine • General language modeling (context-unaware) • Baseline models use only the visual words from inside the bounding box • All models are evaluated in terms of average precision (AP) • The AP values over all queries are averaged to obtain mean average precision (MAP)
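For reference, a minimal sketch of the standard AP and MAP computation over binary relevance lists (the helper names are mine, not from the paper):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k taken at each rank k
    where a relevant result appears. ranked_relevance is a 0/1
    sequence over the ranked result list."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(all_rankings):
    """MAP: the mean of per-query AP values."""
    return float(np.mean([average_precision(r) for r in all_rankings]))
```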
(Chart: AP for different landmarks on the Oxford5K+ImageNet500K dataset.)
Web1M Dataset (Chart: landmarks with the best performance improvement.)
Control Parameters • One parameter controls the contribution of saliency • Another parameter controls the assumed reliability of the bounding box
Future Work • Context-aware multimedia retrieval • Using the kind of contextual information shown here • Text surrounding the query image • User logs and history