Data-driven Visual Similarity for Cross-domain Image Matching
Abhinav Shrivastava  *Tomasz Malisiewicz  Abhinav Gupta  Alexei A. Efros
Carnegie Mellon University  *MIT
To appear in SIGGRAPH Asia, 2011
Outline • Introduction • Approach • Data-driven Uniqueness • Algorithm Description • Experimental Validation • Sketch-to-Image Matching • Painting-to-Image Matching • Applications • Limitations
Visual matching approaches • Exact matching: these methods usually fail when tasked with finding similar, but not identical, objects (e.g., try using the Google Goggles app to find a cup or a chair).
Approximate matching: • Most approaches focus on image representations that aim to capture the important, salient parts of the image (GIST, HOG). • In Content-Based Image Retrieval (CBIR), the aim is to retrieve semantically relevant images, even if they do not appear visually similar.
Cross-domain matching: • Particular domains • sketches to photographs [Chen et al. 2009; Eitz et al. 2010] • photos under different illuminants [Chong et al. 2008] • Across multiple domains • matching local self-similarities across images and videos (Shechtman and Irani [2007])
Key idea: each query image decides the best way to weight its own constituent parts.
Approach • There are two requirements for a good visual similarity function: • It has to focus on the content of the image (the "what"), rather than the style (the "how"). • It should be scene-dependent.
Data-driven Uniqueness: • Re-weight the different elements of an image based on how unique they are; the resulting similarity function then emphasizes the image's distinctive content. • Compute uniqueness in a data-driven way, against a very large dataset of randomly selected images. • Find the features that best discriminate this image (the positive sample) from the rest of the data (the negative samples).
Given the learned, query-dependent weight vector wq, the visual similarity between a query image Iq and any other image/sub-image Ii can be defined simply as S(Iq, Ii) = wqᵀ xi, where xi is Ii's extracted feature vector. • We employ a linear Support Vector Machine (SVM) to learn the feature weight vector. • Image feature: Histogram of Oriented Gradients (HOG) template descriptor.
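As a minimal sketch of this scoring step (toy 3-D vectors stand in for the real rigid-grid HOG features; the names here are hypothetical):

```python
import numpy as np

def visual_similarity(w_q, x_i):
    """Query-dependent similarity: the dot product between the learned
    weight vector w_q and a candidate image's feature vector x_i."""
    return float(np.dot(w_q, x_i))

# Toy "features"; the real descriptors are rigid-grid HOG templates.
w_q = np.array([0.5, -0.2, 1.0])        # weights learned for some query
x_similar = np.array([0.6, -0.1, 0.9])
x_dissimilar = np.array([-0.5, 0.4, -1.0])

# A candidate aligned with the query's weighted structure scores higher.
assert visual_similarity(w_q, x_similar) > visual_similarity(w_q, x_dissimilar)
```

Because the score is linear in xi, it can be evaluated densely over all sub-windows of a retrieval image with a single template convolution, which is what makes the HOG-template formulation practical at scale.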
To visualize how the SVM captures the notion of data-driven uniqueness, the learned weight vector wq can be displayed over the HOG grid: high-weight cells mark the parts of the query that best discriminate it from the negative set.
Algorithm Description: • Learning the weight vector wq amounts to minimizing the following convex objective: Ω(wq) = ||wq||² + λ Σ_{x∈P} h(wqᵀx) + λ Σ_{x∈N} h(−wqᵀx). • Each query image (Iq) is represented with a rigid grid-like HOG feature template (xq). • To compensate for image misalignment, we create a set of extra positive data points, P, by applying small transformations (shift, scale, and aspect ratio) to the query image Iq and generating xi for each sample. • The SVM classifier is learned using Iq and P as positive samples, and a set N containing millions of sub-images (extracted from 10,000 randomly selected Flickr images) as negatives. • We use LIBSVM [Chang and Lin 2011] to learn wq, with a common regularization parameter λ = 100 and the standard hinge loss h(x) = max(0, 1−x).
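The learning step can be sketched as subgradient descent on the objective above. This is a toy NumPy stand-in for LIBSVM (the hinge terms are averaged per set for numeric stability, synthetic perturbations stand in for the shift/scale/aspect-ratio jitter, and random vectors stand in for HOG features):

```python
import numpy as np

def learn_weights(x_q, negatives, lam=100.0, n_jitter=10,
                  steps=500, lr=1e-3, seed=0):
    """Minimize ||w||^2 + lam * (mean positive hinge + mean negative hinge)
    by subgradient descent.  P = query + jittered copies, N = negatives."""
    rng = np.random.default_rng(seed)
    # Extra positives: slightly perturbed copies of the query feature,
    # standing in for the shift/scale/aspect-ratio transformations.
    P = np.vstack([x_q] + [x_q + 0.01 * rng.standard_normal(x_q.shape)
                           for _ in range(n_jitter)])
    N = negatives
    w = np.zeros_like(x_q)
    for _ in range(steps):
        g = 2.0 * w                          # gradient of ||w||^2
        viol_p = (P @ w) < 1.0               # positives with active hinge
        g -= (lam / len(P)) * P[viol_p].sum(axis=0)
        viol_n = (-(N @ w)) < 1.0            # negatives with active hinge
        g += (lam / len(N)) * N[viol_n].sum(axis=0)
        w -= lr * g
    return w
```

After training, the query itself should score well above a typical negative, which is exactly the "uniqueness" the weights are meant to encode.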
Experimental Validation • To demonstrate our approach, we performed a number of image matching experiments on different image datasets, comparing against the following popular baseline methods: • Tiny Images • GIST • BoW • Spatial Pyramid • Normalized-HoG (N-HoG)
Sketch-to-Image Matching • We collected a dataset of 50 sketches (25 cars and 25 bicycles) to be used as queries. • The sketches were used to query into the PASCAL VOC dataset [Everingham et al. 2007].
For quantitative evaluation, we compared how many car and bicycle images were retrieved in the top-K images for car and bicycle sketches, respectively. • We used the bounded mean Average Precision (mAP) metric used by [Jégou et al. 2008].
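One common way to compute average precision bounded to the top-K retrievals is shown below (a hypothetical sketch; the exact variant of Jégou et al. may differ in its normalization):

```python
def average_precision_at_k(retrieved_labels, target, k):
    """AP restricted to the top-k ranked items: average of precision@i
    over the ranks i at which the target class appears."""
    hits = 0
    precisions = []
    for i, label in enumerate(retrieved_labels[:k], start=1):
        if label == target:
            hits += 1
            precisions.append(hits / i)   # precision at rank i
    return sum(precisions) / len(precisions) if precisions else 0.0

# e.g. a car-sketch query whose top-4 retrievals are labeled:
ranked = ["car", "bicycle", "car", "car"]
ap = average_precision_at_k(ranked, "car", k=4)   # (1 + 2/3 + 3/4) / 3
```

The mAP reported per method is then simply this quantity averaged over all 50 sketch queries.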
Painting-to-Image Matching • We collected a dataset of 50 paintings of outdoor scenes in a diverse set of painting styles and geographical locations. • The retrieval set was sub-sampled from the 6.4M GPS-tagged Flickr images of [Hays and Efros 2008]. • For each query, we created a set of 5,000 images randomly sampled within a 50-mile radius of each painting's location.
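The per-query geographic sub-sampling can be sketched with a haversine distance filter (standard library only; the image tuples and function names are hypothetical stand-ins for the GPS-tagged Flickr collection):

```python
import math
import random

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    R = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def sample_nearby(images, query_latlon, radius_miles=50.0, k=5000, seed=0):
    """images: list of (image_id, lat, lon) tuples.  Keep those within
    the radius of the painting's location, then take a random subset."""
    near = [im for im in images
            if haversine_miles(im[1], im[2], *query_latlon) <= radius_miles]
    random.Random(seed).shuffle(near)
    return near[:k]
```

For example, a query at (40.0, −75.0) keeps an image 0.4° of latitude away (about 28 miles) but discards one several degrees away.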
Applications • Internet Re-photography
Limitations • Two main failure modes: • We fail to find a good match due to the relatively small size of our dataset (10,000 images) compared to Google's billions of indexed images. • The query scene is so cluttered that it is difficult for any algorithm to decide which parts of the scene – the car, the people on the sidewalk, the building in the background – it should focus on.