Data-driven Visual Similarity for Cross-domain Image Matching
Abhinav Shrivastava  *Tomasz Malisiewicz  Abhinav Gupta  Alexei A. Efros
Carnegie Mellon University  *MIT
To appear in SIGGRAPH Asia, 2011
Outline • Introduction • Approach • Data-driven Uniqueness • Algorithm Description • Experimental Validation • Sketch-to-Image Matching • Painting-to-Image Matching • Applications • Limitations
Visual matching approaches • Exact matching: these methods usually fail when tasked with finding similar, but not identical, objects (e.g., try using the Google Goggles app to find a cup or a chair).
Approximate matching: • Most approaches focus on image representations that aim to capture the important, salient parts of the image (GIST, HOG). • In Content-Based Image Retrieval (CBIR), the aim is to retrieve semantically relevant images, even if they do not appear visually similar.
Cross-domain matching: • Particular domains • sketches to photographs [Chen et al. 2009; Eitz et al. 2010] • photos under different illuminants [Chong et al. 2008] • Across multiple domains • matching local self-similarities across images and videos (Shechtman and Irani [2007])
Key idea: each query image decides the best way to weight its own constituent parts.
Approach • There are two requirements for a good visual similarity function: • It has to focus on the content of the image (the "what"), rather than the style (the "how"). • It should be scene-dependent.
Data-driven Uniqueness: • Re-weight the different elements of an image based on how unique they are; the resulting similarity function then emphasizes the image's distinctive content. • Compute uniqueness in a data-driven way, against a very large dataset of randomly selected images. • Find the features that best discriminate this image (the positive sample) from the rest of the data (the negative samples).
Given the learned, query-dependent weight vector wq, the visual similarity between a query image Iq and any other image/sub-image Ii can be defined simply as S(Iq, Ii) = wqᵀ xi, where xi is Ii's extracted feature vector. • We employ a linear Support Vector Machine (SVM) to learn the feature weight vector. • Image feature: Histogram of Oriented Gradients (HOG) template descriptor.
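As a minimal sketch of this scoring step (toy 3-D vectors stand in for the real rigid-grid HOG features; the names here are hypothetical):

```python
import numpy as np

def visual_similarity(w_q, x_i):
    """Query-dependent similarity: the dot product between the learned
    weight vector w_q and a candidate image's feature vector x_i."""
    return float(np.dot(w_q, x_i))

# Toy "features"; the real descriptors are rigid-grid HOG templates.
w_q = np.array([0.5, -0.2, 1.0])        # weights learned for some query
x_similar = np.array([0.6, -0.1, 0.9])
x_dissimilar = np.array([-0.5, 0.4, -1.0])

# A candidate aligned with the query's weighted structure scores higher.
assert visual_similarity(w_q, x_similar) > visual_similarity(w_q, x_dissimilar)
```

Because the score is linear in xi, it can be evaluated densely over all sub-windows of a retrieval image with a single template convolution, which is what makes the HOG-template formulation practical at scale.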
To visualize how the SVM captures the notion of data-driven uniqueness, the learned weight vector wq can be displayed over the HOG grid: high-weight cells mark the parts of the query that best discriminate it from the negative set.
Algorithm Description: • Learning the weight vector wq amounts to minimizing the following convex objective: Ω(wq) = ||wq||² + λ Σ_{x∈P} h(wqᵀx) + λ Σ_{x∈N} h(−wqᵀx). • Each query image (Iq) is represented with a rigid grid-like HOG feature template (xq). • To compensate for image misalignment, we create a set of extra positive data points, P, by applying small transformations (shift, scale, and aspect ratio) to the query image Iq and generating xi for each sample. • The SVM classifier is learned using Iq and P as positive samples, and a set N containing millions of sub-images (extracted from 10,000 randomly selected Flickr images) as negatives. • We use LIBSVM [Chang and Lin 2011] to learn wq, with a common regularization parameter λ = 100 and the standard hinge loss h(x) = max(0, 1−x).
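The learning step can be sketched as subgradient descent on the objective above. This is a toy NumPy stand-in for LIBSVM (the hinge terms are averaged per set for numeric stability, synthetic perturbations stand in for the shift/scale/aspect-ratio jitter, and random vectors stand in for HOG features):

```python
import numpy as np

def learn_weights(x_q, negatives, lam=100.0, n_jitter=10,
                  steps=500, lr=1e-3, seed=0):
    """Minimize ||w||^2 + lam * (mean positive hinge + mean negative hinge)
    by subgradient descent.  P = query + jittered copies, N = negatives."""
    rng = np.random.default_rng(seed)
    # Extra positives: slightly perturbed copies of the query feature,
    # standing in for the shift/scale/aspect-ratio transformations.
    P = np.vstack([x_q] + [x_q + 0.01 * rng.standard_normal(x_q.shape)
                           for _ in range(n_jitter)])
    N = negatives
    w = np.zeros_like(x_q)
    for _ in range(steps):
        g = 2.0 * w                          # gradient of ||w||^2
        viol_p = (P @ w) < 1.0               # positives with active hinge
        g -= (lam / len(P)) * P[viol_p].sum(axis=0)
        viol_n = (-(N @ w)) < 1.0            # negatives with active hinge
        g += (lam / len(N)) * N[viol_n].sum(axis=0)
        w -= lr * g
    return w
```

After training, the query itself should score well above a typical negative, which is exactly the "uniqueness" the weights are meant to encode.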
Experimental Validation • To demonstrate our approach, we performed a number of image matching experiments on different image datasets, comparing against the following popular baseline methods: • Tiny Images • GIST • BoW • Spatial Pyramid • Normalized-HoG (N-HoG)
Sketch-to-Image Matching • We collected a dataset of 50 sketches (25 cars and 25 bicycles) to be used as queries. • The sketches were used to query into the PASCAL VOC dataset [Everingham et al. 2007].
For quantitative evaluation, we compared how many car and bicycle images were retrieved in the top-K images for car and bicycle sketches, respectively. • We used the bounded mean Average Precision (mAP) metric used by [Jégou et al. 2008].
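One common way to compute average precision bounded to the top-K retrievals is shown below (a hypothetical sketch; the exact variant of Jégou et al. may differ in its normalization):

```python
def average_precision_at_k(retrieved_labels, target, k):
    """AP restricted to the top-k ranked items: average of precision@i
    over the ranks i at which the target class appears."""
    hits = 0
    precisions = []
    for i, label in enumerate(retrieved_labels[:k], start=1):
        if label == target:
            hits += 1
            precisions.append(hits / i)   # precision at rank i
    return sum(precisions) / len(precisions) if precisions else 0.0

# e.g. a car-sketch query whose top-4 retrievals are labeled:
ranked = ["car", "bicycle", "car", "car"]
ap = average_precision_at_k(ranked, "car", k=4)   # (1 + 2/3 + 3/4) / 3
```

The mAP reported per method is then simply this quantity averaged over all 50 sketch queries.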
Painting-to-Image Matching • We collected a dataset of 50 paintings of outdoor scenes in a diverse set of painting styles and geographical locations. • The retrieval set was sub-sampled from the 6.4M GPS-tagged Flickr images of [Hays and Efros 2008]. • For each query, we created a set of 5,000 images randomly sampled within a 50-mile radius of each painting's location.
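The per-query geographic sub-sampling can be sketched with a haversine distance filter (standard library only; the image tuples and function names are hypothetical stand-ins for the GPS-tagged Flickr collection):

```python
import math
import random

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    R = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def sample_nearby(images, query_latlon, radius_miles=50.0, k=5000, seed=0):
    """images: list of (image_id, lat, lon) tuples.  Keep those within
    the radius of the painting's location, then take a random subset."""
    near = [im for im in images
            if haversine_miles(im[1], im[2], *query_latlon) <= radius_miles]
    random.Random(seed).shuffle(near)
    return near[:k]
```

For example, a query at (40.0, −75.0) keeps an image 0.4° of latitude away (about 28 miles) but discards one several degrees away.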
Applications • Internet Re-photography
Limitations • Two main failure modes: • We fail to find a good match due to the relatively small size of our dataset (10,000 images) compared to Google's billions of indexed images. • The query scene is so cluttered that it is difficult for any algorithm to decide which parts of the scene – the car, the people on the sidewalk, the building in the background – it should focus on.