Object Retrieval Using Visual Query Context. Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua. Presented by: Shimon Berger
What is a Visual Query? • TinEye • Google Image Search • Google Goggles
Current Shortcomings • Bounding box: cannot capture complex shapes; prone to user inaccuracy • Issues with the image itself: too small, or lacks texture
How Can We Improve a Visual Query? Objects in real life aren't bounded by a box
Proposal • Introduce a contextual object retrieval (COR) model • Evaluate it experimentally using 3 image datasets • Demonstrate the benefit of introducing contextual data into the query
Existing Methods • Relevance feedback • “Bag of visual words” • Scale-invariant feature transform (SIFT) • Cosine retrieval model • Language modeling
Proposed COR Model • Based on the Kullback-Leibler (KL) divergence retrieval model • Detect interest points • Extract SIFT descriptors • Convert into visual words • Match words to documents in a database • Uses the Jelinek-Mercer smoothing method • Captures important patterns while removing noise
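To make the retrieval step concrete, here is a minimal sketch of document scoring under a KL-divergence ranking with Jelinek-Mercer smoothing, over bag-of-visual-words counts. The function names and the mixing weight `lam` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def jm_smoothed_doc_model(doc_counts, collection_probs, lam=0.8):
    """Jelinek-Mercer smoothing: mix the document's maximum-likelihood
    visual-word distribution with the collection-wide distribution.
    lam is an assumed mixing weight, not the paper's setting."""
    doc_probs = doc_counts / max(doc_counts.sum(), 1)
    return lam * doc_probs + (1.0 - lam) * collection_probs

def kl_score(query_probs, doc_counts, collection_probs, lam=0.8):
    """Rank documents by negative KL(query || doc). Since the query
    entropy is constant across documents, this reduces to the
    cross-entropy term over the query's nonzero words."""
    doc_probs = jm_smoothed_doc_model(doc_counts, collection_probs, lam)
    mask = query_probs > 0
    return float(np.sum(query_probs[mask] * np.log(doc_probs[mask])))
```

With Jelinek-Mercer smoothing, visual words unseen in a document fall back on collection statistics, which is what lets the model retain important patterns while suppressing noise instead of zeroing out rare matches.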
COR Model • Begins with contrast-based saliency detection • Produces a saliency score for each pixel • A tunable parameter controls how strongly saliency is weighted • Estimate a search intent score for each visual word • Indicates the probability that a given visual word reflects the user's search intent
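The slide does not specify the exact saliency detector, so the toy sketch below only illustrates the contrast-based idea: score each pixel by its deviation from the local neighborhood mean. The window size and normalization are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def contrast_saliency(gray, window=15):
    """Toy contrast-based saliency: a pixel is salient if it deviates
    from its neighborhood mean. window is an assumed parameter; the
    paper's actual detector may differ."""
    local_mean = uniform_filter(gray.astype(float), size=window)
    sal = np.abs(gray - local_mean)
    return sal / (sal.max() + 1e-8)  # normalize to [0, 1]
```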
COR Search Intent Score • The standard LM approach uses a binary search intent score (in effect, 1 inside the bounding box, 0 outside) • Two proposed algorithms compute the search intent score from the bounding box plus its context: • Based on pixel distance from the bounding box (spatial propagation) • Based on color coherence of the pixels (appearance propagation)
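However the pixel-level intent map is produced, it must still be transferred to the visual words. A simple, assumed aggregation is shown below: each keypoint-anchored visual word inherits the intent score at its pixel location. This mapping is illustrative only; the paper's exact weighting may differ.

```python
import numpy as np

def word_intent_scores(keypoints, intent_map):
    """Map a pixel-level search intent map to per-visual-word scores.
    keypoints: iterable of (x, y) interest-point coordinates.
    intent_map: h-by-w array of pixel intent scores."""
    h, w = intent_map.shape
    scores = []
    for x, y in keypoints:
        xi = int(np.clip(round(x), 0, w - 1))
        yi = int(np.clip(round(y), 0, h - 1))
        scores.append(float(intent_map[yi, xi]))
    return scores
```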
Spatial Propagation (CORa) • The bounding box is usually rough and inaccurate • Lack of user effort • Limiting rectangular shape • Use a smoothed approximation of the bounding box • Dual-sigmoid function (sketched below) • A tunable parameter controls how sharply the weight decays outside the box
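A minimal sketch of a dual-sigmoid soft box: along each axis, a rising and a falling sigmoid are multiplied, giving a weight near 1 inside the box that decays smoothly with pixel distance outside it. The steepness parameter `s` stands in for the control variable mentioned above; the paper's exact functional form may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_box_weight(h, w, box, s=20.0):
    """Soft (dual-sigmoid) approximation of a bounding box.
    box = (x0, y0, x1, y1); s (assumed) controls how quickly the
    weight falls off outside the box. Returns an h-by-w map in (0, 1)."""
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:h, 0:w]
    wx = sigmoid((xs - x0) / s) * sigmoid((x1 - xs) / s)
    wy = sigmoid((ys - y0) / s) * sigmoid((y1 - ys) / s)
    return wx * wy
```

Unlike the hard binary box, this weighting lets context pixels just outside an inaccurately drawn rectangle still contribute to the query.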
Appearance Propagation (CORm) • Assign high scores to the object of interest, normally in the foreground • Assign low scores to background objects, or objects of no interest • Similar to image matting • Separate foreground and background using alpha values • Separate relevant objects from irrelevant ones within the bounding box
Appearance Propagation (CORm) Three-step approach (a GrabCut sketch follows): • Step 1: Estimate foreground and background models guided by the bounding box, using the GrabCut algorithm • Step 2: Use the models to select pseudo-foreground and pseudo-background pixels, which account for spatial smoothness (the top 10% of foreground pixels from inside the box and the top 20% of background pixels from outside it) • Step 3: Estimate the search intent score from the pixel information
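As a minimal sketch of step 1, assuming OpenCV is available: seed GrabCut with the bounding box and read back a foreground map. Returning a binary mask (rather than a weighted alpha map, as the CORw variant uses) and the iteration count are simplifications.

```python
import cv2
import numpy as np

def grabcut_foreground(img_bgr, box, iters=5):
    """Run GrabCut seeded by the user's bounding box and return a
    foreground map (1.0 for definite/probable foreground).
    box = (x, y, width, height) in OpenCV's rectangle convention."""
    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GMM parameter buffers
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, box, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)
    fg = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.float32)
    return fg
```

From this map, the pseudo-foreground and pseudo-background pixels of step 2 would be selected by thresholding the most confident pixels inside and outside the box.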
CORm in Experiments • CORm is broken down into 2 variations: • CORg • Only uses the GrabCut algorithm, not all 3 steps • CORw • Uses alpha values based on weighted foreground probability
Experiments • Experiments performed using 3 image datasets: • Oxford5K • Oxford5K+ImageNet500K • Web1M • Datasets 1 and 2 use 11 landmarks (55 query images) as queries • Dataset 3 adds 45 more query images • Randomly selected • From various categories
Experiments • COR models are compared to 2 baseline retrieval models: • Cosine • General language modeling (context-unaware) • Baseline models use only the visual words from inside the bounding box • All models are evaluated in terms of average precision (AP) • The AP values over all queries are averaged to obtain mean average precision (MAP)
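For reference, a minimal sketch of the standard AP and MAP computation over binary relevance lists (the helper names are mine, not from the paper):

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k taken at each rank k
    where a relevant result appears. ranked_relevance is a 0/1
    sequence over the ranked result list."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(all_rankings):
    """MAP: the mean of per-query AP values."""
    return float(np.mean([average_precision(r) for r in all_rankings]))
```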
(Chart: AP for different landmarks on the Oxford5K+ImageNet500K dataset.)
Web1M Dataset (Chart: landmarks with the best performance improvement.)
Control Parameters • One parameter controls the contribution of saliency • Another parameter controls the assumed reliability of the bounding box
Future Work • Context-aware multimedia retrieval • Using the kind of contextual information shown here • Text surrounding the query image • User logs and history