Object Retrieval Using Visual Query Context • Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua • Presented by: Shimon Berger
What is a Visual Query? • TinEye • Google Image Search • Google Goggles
Current Shortcomings • Bounding box • Complex shapes • User inaccuracy • Issues with the image itself • Too small • Lacks texture
How Can We Improve a Visual Query? • Objects in real life aren’t bound by a box
Proposal • Introduce a contextual object retrieval (COR) model • Evaluate experimentally using 3 image datasets • Demonstrate the benefit of introducing contextual data into the query
Existing Methods • Relevance feedback • “Bag of visual words” • Scale-invariant feature transform (SIFT) • Cosine retrieval model • Language modeling
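Of these, the cosine retrieval model ranks database images by the cosine similarity of their bag-of-visual-words histograms against the query's. A minimal sketch, assuming plain term-frequency histograms (the slides don't specify the weighting scheme):

```python
# Minimal sketch of the cosine baseline over bag-of-visual-words
# histograms. The 5-word vocabulary and raw term-frequency weighting
# are illustrative assumptions.
import numpy as np

def cosine_score(query_hist: np.ndarray, doc_hist: np.ndarray) -> float:
    """Cosine similarity between two visual-word histograms."""
    num = float(np.dot(query_hist, doc_hist))
    denom = np.linalg.norm(query_hist) * np.linalg.norm(doc_hist)
    return num / denom if denom > 0 else 0.0

query = np.array([2.0, 0.0, 1.0, 0.0, 3.0])  # visual-word counts in query
doc   = np.array([1.0, 1.0, 0.0, 0.0, 2.0])  # visual-word counts in a doc
print(cosine_score(query, doc))
```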
Proposed COR Model • Based on the Kullback-Leibler (KL) divergence retrieval model • Detect interest points • Extract SIFT descriptors • Convert into visual words • Match words against documents in a database • Uses the Jelinek-Mercer smoothing method • Captures important patterns while suppressing noise (sketch below)
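A minimal sketch of KL-divergence ranking with Jelinek-Mercer smoothing over visual words. The toy vocabulary and the smoothing weight are illustrative assumptions, not values from the paper; in the COR model the query-side distribution would additionally be weighted by the search intent scores described on the following slides:

```python
# KL-divergence ranking with Jelinek-Mercer smoothing.
# Ranking by -KL(query || doc) is equivalent, up to a query-only
# constant, to the cross-entropy score computed here.
import numpy as np

LAMBDA = 0.7  # illustrative weight on the document model vs. collection

def jm_document_model(doc_tf, collection_p, lam=LAMBDA):
    """p(w|d) = lam * p_ml(w|d) + (1 - lam) * p(w|C)."""
    p_ml = doc_tf / max(doc_tf.sum(), 1)
    return lam * p_ml + (1.0 - lam) * collection_p

def kl_score(query_p, doc_tf, collection_p):
    """Higher is better; sum of query probabilities times log doc model."""
    p_d = jm_document_model(doc_tf, collection_p)
    mask = query_p > 0
    return float(np.sum(query_p[mask] * np.log(p_d[mask])))

# Toy example with a 4-word visual vocabulary.
collection_p = np.array([0.4, 0.3, 0.2, 0.1])  # collection language model
query_p = np.array([0.5, 0.0, 0.5, 0.0])       # query language model
doc_tf  = np.array([3.0, 1.0, 2.0, 0.0])       # visual-word counts in a doc
print(kl_score(query_p, doc_tf, collection_p))
```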
COR Model • Begins with contrast-based saliency detection • Produces a saliency score, used as a control variable • Estimates a search intent score for each visual word • Indicates the probability that a given visual word reflects the user’s search intent (sketch below)
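A hedged sketch of computing a contrast-based saliency map and sampling it per visual word. The slides don't specify the exact contrast measure, so the frequency-tuned-style formulation below (distance from the mean image color after light blurring) is an assumption:

```python
# Assumed contrast-based saliency: per-pixel distance from the mean
# image color, on a lightly blurred image, normalized to [0, 1].
import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_saliency(image: np.ndarray) -> np.ndarray:
    """image: HxWx3 float array in [0,1]; returns an HxW saliency map."""
    blurred = gaussian_filter(image, sigma=(2, 2, 0))  # no channel blur
    mean_color = image.reshape(-1, 3).mean(axis=0)
    sal = np.linalg.norm(blurred - mean_color, axis=2)
    return sal / (sal.max() + 1e-8)

def word_saliency(sal_map, keypoints):
    """Saliency score per visual word, sampled at its keypoint (x, y)."""
    return [float(sal_map[int(y), int(x)]) for (x, y) in keypoints]
```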
COR Search Intent Score • The standard LM approach uses a binary search intent score (sketch below) • Two proposed algorithms compute the search intent (SI) score from the bounding box plus its context: • Based on pixel distance from the bounding box (spatial propagation) • Based on color coherence of the pixels (appearance propagation)
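For comparison, the context-unaware baseline's binary score reduces to an indicator on the bounding box:

```python
# Binary search intent score: 1 if the visual word's keypoint falls
# inside the bounding box, 0 otherwise.
def binary_intent(keypoint, box):
    x, y = keypoint
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0
```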
Spatial Propagation (CORa) • The bounding box is usually rough and inaccurate • Lack of user effort • Limiting rectangular shape • Use a smoothed approximation of the bounding box • Dual-sigmoid function, with a parameter controlling the reliability of the box (sketch below)
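A sketch of a dual-sigmoid box relaxation, assuming a per-axis product of a rising and a falling sigmoid so that words just outside the box keep a nonzero score. The steepness parameter `sigma` stands in for the box-reliability control mentioned on the slides; its value here is illustrative:

```python
# Smooth relaxation of the rectangular bounding box.
import numpy as np

def dual_sigmoid(t, lo, hi, sigma=0.05):
    """Smooth indicator of lo <= t <= hi; larger sigma -> sharper box."""
    rise = 1.0 / (1.0 + np.exp(-sigma * (t - lo)))
    fall = 1.0 / (1.0 + np.exp(-sigma * (hi - t)))
    return rise * fall

def spatial_intent(keypoint, box, sigma=0.05):
    """2-D search intent score: product of per-axis dual sigmoids."""
    x, y = keypoint
    x0, y0, x1, y1 = box
    return dual_sigmoid(x, x0, x1, sigma) * dual_sigmoid(y, y0, y1, sigma)

# A keypoint 10 px outside a (100,100)-(300,300) box still gets weight.
print(spatial_intent((310, 200), (100, 100, 300, 300)))
```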
Appearance Propagation (CORm) • Assign high scores to the object of interest, normally in the foreground • Assign low scores to background objects, or objects of no interest • Similar to image matting • Separate foreground and background using alpha values • Separates relevant objects from irrelevant ones within the bounding box
Appearance Propagation (CORm) Three-step approach (GrabCut sketch below): • Estimate foreground and background models guided by the bounding box • GrabCut algorithm • Use the models to select foreground and background pixels • The search intent score is estimated from this pixel information • Use pseudo-foreground and pseudo-background pixels to account for spatial smoothness • Top 10% of foreground pixels from inside the box and top 20% of background pixels from outside the box
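A hedged sketch of the GrabCut step using OpenCV's cv2.grabCut, initialized from the user's bounding box. Reducing the result to a hard foreground mask, rather than implementing the paper's full pseudo-pixel scheme with the 10%/20% selection, is a simplification:

```python
# GrabCut-based foreground estimation, initialized with the box.
import cv2
import numpy as np

def grabcut_foreground(image_bgr, box, iters=5):
    """box = (x, y, w, h); returns a binary HxW foreground mask."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # internal GMM state
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, box, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)
    # Definite or probable foreground -> 1, everything else -> 0.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)

def appearance_intent(keypoints, fg_mask):
    """Search intent score per visual word from the foreground mask."""
    return [float(fg_mask[int(y), int(x)]) for (x, y) in keypoints]
```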
CORm in Experiments • CORm is broken down into 2 variations: • CORg: only uses the GrabCut algorithm, not all 3 steps • CORw: uses alpha values based on weighted foreground probability
Experiments • Experiments performed using 3 image datasets: • Oxford5K • Oxford5K+ImageNet500K • Web1M • Datasets 1 and 2 use 11 landmarks (55 query images in total) • Dataset 3 adds 45 additional query images • Randomly selected • Various categories
Experiments • COR models compared to 2 baseline retrieval models: • Cosine • General language modeling (context-unaware) • Baseline models only use visual words from inside the bounding box • All models evaluated in terms of average precision (AP) • The AP over all queries is averaged to obtain the mean average precision (MAP) (sketch below)
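A compact sketch of the two evaluation metrics: AP computed from a single ranked relevance list, then MAP as the mean over all queries:

```python
# Average precision for one ranked result list, and MAP over queries.
import numpy as np

def average_precision(ranked_relevance):
    """ranked_relevance: 1/0 per returned image, in rank order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(runs):
    """runs: one ranked relevance list per query."""
    return float(np.mean([average_precision(r) for r in runs]))

# Example: two queries' ranked results (1 = relevant hit).
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 1]]))
```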
[Figure: AP for different landmarks on the Oxford5K+ImageNet500K dataset.]
Web1M Dataset • [Figure: landmarks with the best performance enhancement.]
Control Parameters • One parameter controls the influence of the saliency score • A second parameter controls the reliability attributed to the bounding box
Future Work • Context-aware multimedia retrieval • Using the contextual information shown here • Text surrounding query image • User logs and history