Image Context, Efficient Indexing, and Sense-Specific Category Models Trevor Darrell Kristen Grauman(*) Tom Yeh Kate Saenko MIT CSAIL UC Berkeley EECS & ICSI (*) UT Austin CS
Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko
Photo-based Question Answering Tom Yeh John Lee Trevor Darrell MIT CSAIL UC Berkeley EECS & ICSI
Text-based QA Systems • Example: Yahoo! Answers
An easier example • Current image-matching and question-matching technologies let us handle simpler photo-based QA automatically.
System architecture • Example queries: "How many floors?"; "What labs are here?" → "CSAIL" • Layer 1: Template-based QA over structured sources (books, buildings, the WWW), e.g., "Who is the architect?" → "Frank Gehry" • Layer 2: IR-based QA over resolved questions, e.g., "How many stories?" → "9 floors" • Layer 3: Human-based QA via the community, e.g., "Is there any problem?" → "People are getting lost a lot."
Prototype 1: Adding photos to a text-based QA system
Prototype 3: Applying photo-based QA to mobile devices.
Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko
Efficient Image Indexing Methods for Scene and Object Recognition Trevor Darrell UC-Berkeley EECS & ICSI Kristen Grauman University of Texas at Austin Dept. of Computer Sciences
Fast image indexing • Goal: recognize locations and objects by matching queries against a database based on image content.
Fast image indexing Large and evolving image repository • Key technical challenges: • Robustness to variable viewing conditions • Queries are time-sensitive, but database is huge • Approach: develop sub-linear time search methods for “good” image representations and metrics.
Local Features • Local features provide invariance to geometric and photometric variation • Want fast correspondence-based search with local features
Local image features • Must cope with: intra-class appearance variation, illumination, object pose, clutter, occlusions, viewpoint
Local image features • Describe component regions or patches separately • Examples: SIFT [Lowe]; Shape context [Belongie et al.]; Maximally Stable Extremal Regions [Matas et al.]; Superpixels [Ren et al.]; Spin images [Johnson and Hebert]; Geometric Blur [Berg et al.]; Salient regions [Kadir et al.]; Harris-Affine [Schmid et al.]
Partially matching sets of features • Optimal match: O(m³) • Greedy match: O(m² log m) • Pyramid match: O(m) • The optimal match maximizes the total similarity of matched points; the pyramid match approximation makes large sets of features practical (m = number of points). [Grauman & Darrell, ICCV 2005]
Counting matches with intersection • The histogram intersection (the sum of bin-wise minima) counts the matches implied by two histograms.
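As a minimal sketch (the function name and sample histograms are illustrative), histogram intersection just sums the bin-wise minima:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Count the matches implied by two histograms: sum of bin-wise minima."""
    return int(np.minimum(h1, h2).sum())

h1 = np.array([3, 0, 2, 1])
h2 = np.array([1, 2, 2, 0])
print(histogram_intersection(h1, h2))  # 3 matches: min per bin is [1, 0, 2, 0]
```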
Example pyramid match • The number of "new" matches is counted at each pyramid level.
Example pyramid match • Comparing the pyramid match score to the optimal match.
How to index efficiently over correspondences? • Given a query image, find the most similar images in a large database according to local feature correspondences, via approximate matching.
Image search with matching-sensitive hash functions • Main idea: • Map point sets to a vector space in such a way that a dot product reflects partial match similarity (normalized pyramid match value). • Exploit random hyperplane properties to construct matching-sensitive hash functions. • Perform approximate similarity search on hashed examples. [Grauman & Darrell, CVPR 2007]
Locality Sensitive Hashing (LSH) • Database items Xi and the query Q are hashed with functions r1…rk; only items whose hash keys collide with the query's key (e.g., 110101 vs. 110111) are examined, << N candidates. • Guarantees "approximate" nearest neighbors in sub-linear time, given appropriate hash functions. [Indyk and Motwani 1998, Charikar 2002]
LSH functions for dot products • The probability that a random hyperplane separates two unit vectors is related to the angle between them: a high dot product makes a split unlikely; a lower dot product makes it likely. [Goemans and Williamson 1995, Charikar 2004]
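This property can be sketched empirically; the setup below (2-D unit vectors, 10,000 sampled hyperplanes) is an illustrative choice, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperplane_hash_bit(x, r):
    """One hash bit: the side of the random hyperplane r that vector x falls on."""
    return int(np.dot(r, x) >= 0)

# For unit vectors u, v at angle theta, two hash bits agree with
# probability 1 - theta / pi, so high dot products rarely get split.
u = np.array([1.0, 0.0])
v = np.array([1.0, 0.1])
v = v / np.linalg.norm(v)

n_trials = 10000
agree = sum(
    hyperplane_hash_bit(u, r) == hyperplane_hash_bit(v, r)
    for r in rng.standard_normal((n_trials, 2))
)
theta = np.arccos(np.clip(u @ v, -1.0, 1.0))
print(agree / n_trials, 1.0 - theta / np.pi)  # empirical vs. theoretical
```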
A useful property of histogram intersection • With a padded unary encoding, intersection becomes a dot product: • [1, 3, 5] → [1 0 0 0 0 | 1 1 1 0 0 | 1 1 1 1 1] • [2, 0, 3] → [1 1 0 0 0 | 0 0 0 0 0 | 1 1 1 0 0] • Dot product = 1+0+0+0+0 + 0+0+0+0+0 + 1+1+1+0+0 = 4 = sum of the bin-wise minima [1, 0, 3]
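The identity on this slide can be checked directly (a quick sketch; the helper name is illustrative):

```python
import numpy as np

def unary_encode(h, max_count):
    """Each bin count c becomes c ones padded with zeros to length max_count."""
    return np.concatenate([[1] * c + [0] * (max_count - c) for c in h])

x = np.array([1, 3, 5])
y = np.array([2, 0, 3])

dot = unary_encode(x, 5) @ unary_encode(y, 5)
inter = np.minimum(x, y).sum()
print(dot, inter)  # both 4: dot product of unary encodings = intersection
```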
Pyramid match definition • The intersection difference between consecutive levels counts the number of new matches at each level • Un-normalized pyramid match, expressed as a sum of weighted intersections: P(X, Y) = Σᵢ wᵢ · [ I(Hᵢ(X), Hᵢ(Y)) − I(Hᵢ₋₁(X), Hᵢ₋₁(Y)) ]
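A 1-D sketch of this definition; the bin widths (doubling per level), weights 1/2ⁱ, and sample points are illustrative choices, not from the slides:

```python
import numpy as np

def histogram(points, level, dmax):
    """Histogram of 1-D points in [0, dmax) with bin width 2**level."""
    nbins = int(np.ceil(dmax / 2 ** level))
    return np.bincount((points // 2 ** level).astype(int), minlength=nbins)

def pyramid_match(x, y, dmax=16):
    """Un-normalized pyramid match: weighted count of new matches per level."""
    levels = int(np.log2(dmax)) + 1
    score, prev = 0.0, 0
    for i in range(levels):
        inter = np.minimum(histogram(x, i, dmax), histogram(y, i, dmax)).sum()
        score += (1.0 / 2 ** i) * (inter - prev)  # new matches at level i
        prev = inter
    return score

x = np.array([0.0, 5.0, 9.0])
y = np.array([1.0, 6.0, 14.0])
print(pyramid_match(x, y))  # 0.875
```

Coarser levels contribute new matches with smaller weight, reflecting the looser correspondence they imply.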
Vector encoding of pyramids • Point set → multi-resolution histogram → sparse count vector → implicit unary encoding → weighted sparse count vector • Level blocks are weighted by w0-w1, w1-w2, w2-w3, and w3 for the coarsest level
Vector encoding of pyramids • The dot product between embedded point sets yields the pyramid match kernel value • The length of an embedded point set equals its self-similarity
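The equivalence can be checked in a small self-contained 1-D sketch (bin widths, weights, and sample points are illustrative choices):

```python
import numpy as np

def multires_hists(pts, dmax=8):
    """Multi-resolution histograms of 1-D points; bin width doubles per level."""
    levels = int(np.log2(dmax)) + 1
    return [np.bincount((pts // 2 ** i).astype(int),
                        minlength=int(np.ceil(dmax / 2 ** i)))
            for i in range(levels)]

def embed(pts, m, dmax=8):
    """Weighted, implicitly-unary pyramid encoding of a point set of size m."""
    hs = multires_hists(pts, dmax)
    w = [1.0 / 2 ** i for i in range(len(hs))]            # level weights w_i
    # level i gets coefficient w_i - w_{i+1}; the coarsest level keeps w_L
    coeffs = [w[i] - w[i + 1] for i in range(len(hs) - 1)] + [w[-1]]
    blocks = []
    for h, c in zip(hs, coeffs):
        unary = np.concatenate([[1] * n + [0] * (m - n) for n in h])
        blocks.append(np.sqrt(c) * unary)
    return np.concatenate(blocks)

x = np.array([0.0, 3.0, 6.0])
y = np.array([1.0, 3.0, 7.0])
ex, ey = embed(x, 3), embed(y, 3)
print(ex @ ey)  # ≈ 2.0, the pyramid match value for these two sets
```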
Matching-sensitive hash functions • The probability of collision (equal hash bits) equals the normalized partial match similarity, i.e., the normalized pyramid match kernel value
Pyramid match hashing • Embed point sets as pyramids, then hash with randomized functions r1…rk; the probability of collision equals the normalized partial match similarity • Only colliding candidates (<< N) are examined; guarantees retrieval of ε-approximate nearest neighbors in sub-linear time
Indexing object images • Caltech101 data set: 101 categories, 40-800 images per class • Features: densely sampled SIFT descriptors + spatial position, average m = 1140 per set • Data provided by Fei-Fei, Fergus, and Perona
Results: indexing object images • Query time is controlled by the required accuracy • e.g., searching less than 2% of database examples gives accuracy close to a linear scan • (plot: k-NN error rate vs. epsilon ε; smaller ε means slower search, larger ε faster search)
Summary • Content-based queries for location recognition demand fast search algorithms for useful image metrics. • Contributions: • Scalable matching for local representations • Sub-linear time search with matching • Recently extended to semi-supervised hash functions for learned metrics • (See Jain, Kulis, & Grauman, CVPR 2008)
Trevor Darrell trevor@eecs.berkeley.edu Kristen Grauman grauman@cs.utexas.edu • Relevant papers: • P. Jain, B. Kulis, and K. Grauman. Fast Image Search for Learned Metrics. To appear, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, June 2008. • K. Grauman and T. Darrell. Pyramid Match Hashing: Sub-Linear Time Indexing Over Partial Correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, June 2007. • K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Beijing, China, October 2005.
Outline • Photo-based Question Answering • Tom Yeh • Efficient indexing with local image features • Kristen Grauman • Multimodal Sense Disambiguation for Visual Category Models • Kate Saenko
Multimodal Sense Disambiguation for Semi-Supervised Learning of Object Categories from the Web Kate Saenko Trevor Darrell MIT CSAIL UC Berkeley EECS & ICSI
Clutter and Sense Ambiguity • Tag-based retrieval returns a lot of clutter • One approach: bootstrap from a seed image set • E.g., Fei-Fei et al., OPTIMOL • But how to get unusual appearances of the category?
Topic models for image clustering • Latent Dirichlet Allocation (LDA) • Unsupervised learning of a latent topic space • Distance in topic space groups similar images together
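A hedged sketch of this idea using scikit-learn's LatentDirichletAllocation on toy bag-of-(visual-)word counts; the vocabulary, counts, and number of topics are all illustrative, not from the slides:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Toy bag-of-visual-words counts: two groups of "images" drawn from
# two different word distributions over a 6-word vocabulary.
group_a = rng.multinomial(50, [0.3, 0.3, 0.3, 0.03, 0.03, 0.04], size=10)
group_b = rng.multinomial(50, [0.03, 0.03, 0.04, 0.3, 0.3, 0.3], size=10)
counts = np.vstack([group_a, group_b])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # per-image topic mixtures (rows sum to 1)

# Distances between rows of theta group images from the same underlying theme.
print(theta.shape)  # (20, 2)
```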
Mouse? A multimodal similarity measure can discover unusual appearances
Multiple senses • Bass: fish? musical instrument? • Mouse: computer? animal? • A topic model allows segregation of distinct senses: • use seed data to identify inlier multimodal topics • two possible approaches: 1) select the single best inlier topic, or 2) threshold to keep multiple topics • compute distance based on the selected latent dimensions
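The selection steps above can be sketched as follows; the topic mixtures, seed indices, threshold value, and function names are hypothetical:

```python
import numpy as np

def inlier_topics(theta, seed_idx, thresh=0.2):
    """Keep topics whose average weight in the seed images exceeds the threshold."""
    scores = theta[seed_idx].mean(axis=0)
    return np.flatnonzero(scores >= thresh)

def sense_distance(theta, query, topics):
    """Distances in topic space restricted to the selected sense dimensions."""
    return np.linalg.norm(theta[:, topics] - query[topics], axis=1)

# Hypothetical per-image topic mixtures (4 images, 3 latent topics).
theta = np.array([[0.8, 0.1, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.2, 0.1, 0.7]])
topics = inlier_topics(theta, seed_idx=[0, 1])  # seeds emphasize topic 0
dists = sense_distance(theta, theta[0], topics)
print(topics, dists)  # images 2 and 3 are far from the seed sense
```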