Visual Object Recognition

Visual Object Recognition Bastian Leibe & Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 Kristen Grauman Department of Computer Sciences University of Texas in Austin

Outline Detection with Global Appearance & Sliding Windows Local Invariant Features: Detection & Description Specific Object Recognition with Local Features ― Coffee Break ― Visual Words: Indexing, Bags of Words Categorization Matching Local Features Part-Based Models for Categorization Current Challenges and Research Directions 2 K. Grauman, B. Leibe

Recognition with Local Features • Image content is transformed into local features that are invariant to translation, rotation, and scale • Goal: Verify if they belong to a consistent configuration Local Features, e.g. SIFT K. Grauman, B. Leibe Slide credit: David Lowe

Finding Consistent Configurations • Global spatial models • Generalized Hough Transform [Lowe99] • RANSAC [Obdrzalek02, Chum05, Nister06] • Basic assumption: object is planar • Assumption is often justified in practice • Valid for many structures on buildings • Sufficient for small viewpoint variations on 3D objects K. Grauman, B. Leibe

y ρ x θ Hough Transform • Origin: Detection of straight lines in clutter • Basic idea: each candidate point votes for all lines that it is consistent with. • Votes are accumulated in quantized array • Local maxima correspond to candidate lines • Representation of a line • Usual form y = a x + b has a singularity around 90º. • Better parameterization: x cos() + y sin() = K. Grauman, B. Leibe

Hough Transform: Noisy Line • Problem: Finding the true maximum ρ θ Tokens Votes K. Grauman, B. Leibe Slide credit: David Lowe

Hough Transform: Noisy Input • Problem: Lots of spurious maxima ρ θ Tokens Votes K. Grauman, B. Leibe Slide credit: David Lowe

Generalized Hough Transform [Ballard81] • Generalization for an arbitrary contour or shape • Choose reference point for the contour (e.g. center) • For each point on the contour remember where it is located w.r.t. to the reference point • Remember radius r and angle relative to the contour tangent • Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points • Instead of reference point, can also vote for transformation  The same idea can be used with local features! K. Grauman, B. Leibe Slide credit: Bernt Schiele

Gen. Hough Transform with Local Features • For every feature, store possible “occurrences” • For new image, let the matched features vote for possible object positions • Object identity • Pose • Relative position K. Grauman, B. Leibe

3D Object Recognition • Gen. HT for Recognition • Typically only 3 feature matches needed for recognition • Extra matches provide robustness • Affine model can be used for planar objects [Lowe99] K. Grauman, B. Leibe Slide credit: David Lowe

View Interpolation • Training • Training views from similar viewpoints are clusteredbased on feature matches. • Matching features between adjacent views are linked. • Recognition • Feature matches may bespread over several training viewpoints.  Use the known links to “transfer votes” to other viewpoints. [Lowe01] K. Grauman, B. Leibe Slide credit: David Lowe

Recognition Using View Interpolation [Lowe01] K. Grauman, B. Leibe Slide credit: David Lowe

Location Recognition Training [Lowe04] K. Grauman, B. Leibe Slide credit: David Lowe

Applications • Sony Aibo(Evolution Robotics) • SIFT usage • Recognize docking station • Communicate with visual cards • Other uses • Place recognition • Loop closure in SLAM K. Grauman, B. Leibe Slide credit: David Lowe

RANSAC (RANdom SAmple Consensus) [Fischler81] • Randomly choose a minimal subset of data points necessary to fit a model (a sample) • Points within some distance threshold t of model are a consensus set. Size of consensus set is model’s support. • Repeat for N samples; model with biggest support is most robust fit • Points within distance t of best model are inliers • Fit final model to all inliers K. Grauman, B. Leibe Slide credit: David Lowe

RANSAC: How many samples? • How many samples are needed? • Suppose wis fraction of inliers (points from line). • n points needed to define hypothesis (2 for lines) • ksamples chosen. • Prob. that a single sample of n points is correct: • Prob. that all samples fail is:  Choose k high enough to keep this below desired failure rate. K. Grauman, B. Leibe Slide credit: David Lowe

After RANSAC • RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers • Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization) • But this may change inliers, so alternate fitting with re-classification as inlier/outlier K. Grauman, B. Leibe Slide credit: David Lowe

Example: Finding Feature Matches • Find best stereo match within a square search window (here 300 pixels2) • Global transformation model: epipolar geometry from Hartley & Zisserman K. Grauman, B. Leibe Slide credit: David Lowe

Example: Finding Feature Matches • Find best stereo match within a square search window (here 300 pixels2) • Global transformation model: epipolar geometry before RANSAC after RANSAC from Hartley & Zisserman K. Grauman, B. Leibe Slide credit: David Lowe

Gen. Hough Transform Advantages Very effective for recognizing arbitrary shapes or objects Can handle high percentage of outliers (>95%) Extracts groupings from clutter in linear time Disadvantages Quantization issues Only practical for small number of dimensions (up to 4) Improvements available Probabilistic Extensions Continuous Voting Space RANSAC Advantages General method suited to large range of problems Easy to implement Independent of number of dimensions Disadvantages Only handles moderate number of outliers (<50%) Many variants available, e.g. PROSAC: Progressive RANSAC [Chum05] Preemptive RANSAC [Nister05] Comparison [Leibe08] K. Grauman, B. Leibe

Aachen Cathedral Example Applications • Mobile tourist guide • Self-localization • Object/building recognition • Photo/video augmentation [Quack, Leibe, Van Gool, CIVR’08] B. Leibe

Web Demo: Movie Poster Recognition 50’000 movieposters indexed Query-by-imagefrom mobile phoneavailable in Switzer-land http://www.kooaba.com/en/products_engine.html# K. Grauman, B. Leibe

Application: Large-Scale Retrieval Query Results from 5k Flickr images (demo available for 100k set) [Philbin CVPR’07] K. Grauman, B. Leibe

Application: Image Auto-Annotation Moulin Rouge Old Town Square (Prague) Tour Montparnasse Colosseum ViktualienmarktMaypole Left: Wikipedia imageRight: closest match from Flickr [Quack CIVR’08] K. Grauman, B. Leibe

Outline Detection with Global Appearance & Sliding Windows Local Invariant Features: Detection & Description Specific Object Recognition with Local Features ― Coffee Break ― Visual Words: Indexing, Bags of Words Categorization Matching Local Features Part-Based Models for Categorization Current Challenges and Research Directions 29 K. Grauman, B. Leibe

Visual Object Recognition