
Object Recognition


Presentation Transcript


  1. Object Recognition CS485/685 Computer Vision Dr. George Bebis

  2. Object Recognition • Model-based Object Recognition • Generic Object Recognition

  3. Model-Based Object Recognition • Recognition relies upon the existence of a set of predefined objects.

  4. Generic Object Recognition (or Object Categorization)

  5. Main Steps • Preprocessing • A model database is built by establishing associations between features and models. • Recognition • Scene features are used to retrieve appropriate associations stored in the model database.

  6. Challenges • The appearance of an object can have a large range of variation due to: • viewpoint changes • shape changes (e.g., non-rigid objects) • photometric effects • scene clutter • Different views of the same object can give rise to widely different images!

  7. Requirements • Invariance • Geometric transformations (translation, rotation, scale) • Caused by viewpoint changes due to camera/object motion • Robustness • Noise (i.e., sensor noise) • Detection errors (e.g., edge or corner detection) • Illumination/Shadows • Partial occlusion (i.e., self and from other objects) • Intrinsic shape distortions (i.e., non-rigid objects)

  8. Performance Criteria (1) Scope • What kind of objects can be recognized, and in what kind of scenes? (2) Robustness • Does the method tolerate reasonable amounts of noise and occlusion in the scene? • Does it degrade gracefully as those tolerances are exceeded?

  9. Performance Criteria (cont’d) (3) Efficiency • How much time and memory are required to search the solution space? (4) Accuracy • Correct recognition • False positives (wrong recognitions) • False negatives (missed recognitions)

  10. Approaches Differ According To: • Restrictions on the form of the objects • 2D or 3D objects • Simple vs complex objects • Rigid vs deforming objects • Representation schemes • Object-centered • Viewer-centered

  11. Approaches Differ According To: (cont’d) • Matching scheme • Geometry-based (i.e., shape) • Appearance-based (e.g., eigenfaces) • Type of features • Local (e.g., SIFT) • Global (e.g., eigen-coefficients)

  12. Approaches Differ According To: (cont’d) • Image formation model • Perspective projection • Orthographic projection + scale • Affine transformation (e.g., for planar objects)

  13. Representation Schemes • (1) Object-centered • (2) Viewer-centered R. Chin and C. Dyer, “Model-based recognition in robot vision,” Computing Surveys, vol. 18, no. 1, pp. 67–108, 1986.

  14. Object-centered Representation • A 3D model of the object is available. • Advantage: every view of the object is available. • Disadvantage: might not be easy to build.

  15. Object-centered Representation (cont’d) • Two different matching approaches: (1) (3D/3D): reconstruct the object from the scene and match it with the models, e.g., using “shape from X” methods. (2) (3D/2D): back-project the candidate model onto the scene and match it with the objects in the scene (requires camera calibration).

  16. Viewer-centered Representation • Objects are described by a set of characteristic views or aspects • Advantages: • - Easier to build compared to object-centered. • - Matching is easier since it involves 2D information. • Disadvantages: • - Requires a large number of views.

  17. Matching Schemes (1) Geometry-based - Employ geometric features (2) Appearance-based - Represent objects from many possible viewpoints and illumination directions using dimensionality reduction.

  18. Geometry-based Recognition (1) Identify a group of features from an unknown scene which approximately match a group of features from a model object (i.e., correspondences). (2) Recover the geometric transformation that the model object has undergone and look for additional matches.

  19. 2D Transformation Spaces • Rigid transformations (3 parameters) • Similarity transformations (4 parameters) • Affine transformations (6 parameters)
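The three families nest: rigid ⊂ similarity ⊂ affine. A minimal NumPy sketch of their parameterizations as 3×3 homogeneous matrices (function and parameter names are illustrative, not from the slides):

```python
import numpy as np

def rigid(theta, tx, ty):
    """Rotation + translation: 3 parameters (theta, tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

def similarity(scale, theta, tx, ty):
    """Uniform scale + rotation + translation: 4 parameters."""
    T = rigid(theta, tx, ty)
    T[:2, :2] *= scale
    return T

def affine(a11, a12, a21, a22, tx, ty):
    """General affine map: 6 parameters."""
    return np.array([[a11, a12, tx],
                     [a21, a22, ty],
                     [0.0, 0.0, 1.0]])
```

Points are transformed as homogeneous column vectors, e.g. `rigid(0.1, 5, 2) @ np.array([x, y, 1])`.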

  20. Matching – Main Steps • Hypothesis generation: the identities of one or more models are hypothesized. • Hypothesis verification: tests are performed to check if a given hypothesis is correct or not.

  21. Hypothesis Verification - Example (two hypotheses) Features might correspond to: (1) curvature extrema or zero-crossings along the boundary of an object. (2) interest points.

  22. Hypothesis Generation • How to choose the scene groups? • Do we need to consider every possible group? • How to find groups of features that are likely to belong to the same object? • “Grouping” schemes might be helpful. • “Local descriptors” might be helpful.

  23. Grouping • Employs non-accidental properties based on perceptual organization rules to extract groups of features that are likely to come from the same object. • e.g., orientation, collinearity, parallelism, proximity, convexity D. Jacobs, “Robust and Efficient Detection of Convex Groups,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 23-37, 1996.

  24. Hypothesis Generation (cont’d) • How to organize and search the model database? • Do we need to search the whole database of models? • How should we organize the model database to allow for fast and efficient storage and retrieval? • “Indexing” schemes are helpful.

  25. We will discuss … • Alignment • Pose Clustering • Geometric Hashing (i.e., indexing-based)

  26. Recognition by Alignment The alignment approach seeks to recover the geometric transformation between the model and the scene using a minimum number of correspondences. Assuming affine transformations, three correspondences are enough. D. Huttenlocher and S. Ullman, “Object Recognition Using Alignment,” International Conference on Computer Vision, pp. 102-111, 1987.

  27. Recognition Using Alignment (cont’d) Under the assumption that objects are “flat” and the camera is not very close to the objects, different 2D views of a 3D object can be related by an affine transformation.

  28. Recognition Using Alignment (cont’d) • Need at least three correspondences, e.g. model points (p1, p2, p3) matched to scene points (p’1, p’2, p’3), to recover the affine transformation.
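With exactly three non-collinear correspondences, the six affine parameters follow from a 6×6 linear system. A minimal sketch (NumPy; the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def affine_from_three(model_pts, scene_pts):
    """Recover A (2x2) and t (2,) from three correspondences
    (p1, p2, p3) -> (p1', p2', p3'): six equations, six unknowns."""
    M, b = [], []
    for (x, y), (xp, yp) in zip(model_pts, scene_pts):
        M.append([x, y, 1, 0, 0, 0]); b.append(xp)   # x' = a11*x + a12*y + tx
        M.append([0, 0, 0, x, y, 1]); b.append(yp)   # y' = a21*x + a22*y + ty
    a11, a12, tx, a21, a22, ty = np.linalg.solve(
        np.array(M, float), np.array(b, float))
    return np.array([[a11, a12], [a21, a22]]), np.array([tx, ty])
```

Once A and t are known, every remaining model point p is projected as `A @ p + t` and checked against the scene for additional matches.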

  29. Recognition Using Pose Clustering • Performs “voting” in transformation space: - Correct transformations bring into alignment a large number of features. - Incorrect transformations bring into alignment a small number of features. • Transformations that can bring into alignment a large number of features will receive a large number of votes (i.e., support). C. Olson, “Efficient Pose Clustering Using a Randomized Algorithm,” International Journal of Computer Vision, vol. 23, no. 2, pp. 131-147, 1997.

  30. Pose Clustering (cont’d) • Main Steps (1) Quantize the space of possible transformations (usually 4D - 6D). (2) For each hypothetical match, solve for the transformation that aligns the matched features. (3) Cast a “vote” in the corresponding transformation space bin. (4) Find "peak" in transformation space.
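Steps (1)-(4) amount to an accumulator over quantized pose parameters. A minimal sketch, assuming each hypothetical match has already been turned into a transformation-parameter vector (the interface and binning scheme below are illustrative):

```python
import numpy as np
from collections import Counter

def pose_clustering(param_vectors, bin_sizes):
    """Vote in a quantized transformation space (typically 4D-6D)
    and return the peak bin and its vote count."""
    votes = Counter()
    for params in param_vectors:
        # Step (3): quantize the recovered transformation into a bin.
        key = tuple(np.floor(np.asarray(params) / bin_sizes).astype(int))
        votes[key] += 1
    # Step (4): the peak bin is the pose with the most support.
    return votes.most_common(1)[0]
```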

  31. Pose Clustering (cont’d) • Example (figure omitted).

  32. Indexing-Based Recognition • Preprocessing step • (i) Groups of model features are used to index the database. • (ii) The indexed locations are filled with entries containing references to the model objects and information that later can be used for recognition.

  33. Indexing-Based Recognition (cont’d) • Recognition step • (i) Groups of scene features are used to index the database. • (ii) The model objects listed in the indexed locations are collected into a list of candidate models (i.e., hypotheses).

  34. Indexing-Based Recognition (cont’d) • Ideally, we would like the index computed from a group of model features to be invariant. • i.e., its properties will not change with object transformations or viewpoint changes. • Only one entry per group needs to be stored!

  35. Geometric Hashing 1. Y. Lamdan, J. Schwartz, and H. Wolfson, “Affine invariant model-based object recognition,” IEEE Transactions on Robotics and Automation, vol. 6, no. 5, 1990. 2. H. Wolfson and I. Rigoutsos, “Geometric Hashing: An Overview,” IEEE Computational Science and Engineering, pp. 10-21, October-December 1997. • Indexing using affine invariants: the index remains invariant under affine transformations.

  36. Geometric Hashing (cont’d) • Each triplet of non-collinear model points forms a basis of a coordinate system that is invariant under affine transformations. • Represent model points in an affine invariant way by rewriting them in terms of this coordinate system. (u,v) are affine invariant!
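Concretely: any fourth point p can be written as p = p1 + u·(p2 − p1) + v·(p3 − p1), and the pair (u, v) is unchanged when the same affine map is applied to p and to the basis points. A small sketch (NumPy; the names are illustrative):

```python
import numpy as np

def affine_coords(p, basis):
    """(u, v) of p in the frame of the ordered, non-collinear basis
    (p1, p2, p3): solves p - p1 = u*(p2 - p1) + v*(p3 - p1)."""
    p1, p2, p3 = (np.asarray(q, float) for q in basis)
    B = np.column_stack((p2 - p1, p3 - p1))   # 2x2; invertible iff non-collinear
    u, v = np.linalg.solve(B, np.asarray(p, float) - p1)
    return u, v
```

The invariance follows because an affine map T = (A, t) gives T(p) − T(p1) = A(p − p1) = u·(T(p2) − T(p1)) + v·(T(p3) − T(p1)), so the same (u, v) comes out before or after the transformation.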

  37. Geometric Hashing (cont’d) (figure: a hash-table bin storing the entry (model, (p1, p2, p3)))

  38. Geometric Hashing: Preprocessing Step For each model do: (1) Extract a set of interest points. (2) For each ordered triplet of non-collinear points (p1, p2, p3): (2a) Compute the coordinates (u,v) of the remaining features in the coordinate frame defined by the model basis (p1, p2, p3). (2b) After proper scaling and quantization, use the computed coordinates (u,v) as an index to a two-dimensional hash table, and record in the corresponding hash-table bin the information (model, (p1, p2, p3)).
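A compact sketch of this preprocessing loop, reusing affine_coords from the slide-36 sketch above; `models` and `quantize` (the h(u,v) of the next slide) are illustrative names:

```python
from itertools import permutations

def collinear(p1, p2, p3, eps=1e-9):
    """Degenerate-basis test via the (doubled) triangle area."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs((x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)) < eps

def build_hash_table(models, quantize):
    """models: {name: [interest points]}; quantize: (u, v) -> bin key."""
    table = {}
    for name, points in models.items():
        for basis in permutations(range(len(points)), 3):        # step (2)
            p1, p2, p3 = (points[i] for i in basis)
            if collinear(p1, p2, p3):
                continue
            for i, p in enumerate(points):
                if i in basis:
                    continue
                key = quantize(*affine_coords(p, (p1, p2, p3)))  # steps (2a)-(2b)
                table.setdefault(key, []).append((name, basis))
    return table
```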

  39. Geometric Hashing: Preprocessing Step • Hash function h(u,v): scaling followed by quantization • u’ ← su · u, v’ ← sv · v • u’’ ← Q(u’), v’’ ← Q(v’) • h(u,v) = (u’’, v’’) • (figure: hash table spanning [minu, maxu] × [minv, maxv])

  40. Geometric Hashing: Preprocessing Step (figure: hash-table bin entries such as (model1, (p3, p2, p4)), (model1, (p1, p2, p3)), (model2, (p1, p2, p3)), etc.)

  41. Geometric Hashing: Recognition Step (1) Extract a set of interest points from the scene. (2) Choose an arbitrary ordered triplet (p’1, p’2, p’3). (3) Compute the coordinates (u’,v’) of the remaining feature points in the coordinate frame defined by the image basis (p’1, p’2, p’3). (4) After proper scaling and quantization, use the computed coordinates as an index to the hash table. For every entry (model, (p1, p2, p3)) found in the corresponding bin, cast a vote.
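Steps (1)-(4) as code, probing the table built above (again reusing affine_coords and the illustrative quantize):

```python
from collections import Counter

def probe(scene_points, table, quantize):
    """Vote for (model, basis) entries using one ordered scene basis."""
    basis = scene_points[:3]        # step (2): an arbitrary ordered triplet
    votes = Counter()
    for p in scene_points[3:]:
        key = quantize(*affine_coords(p, basis))   # steps (3)-(4)
        for entry in table.get(key, []):
            votes[entry] += 1       # one vote per stored (model, basis)
    return votes   # high-vote entries become hypotheses (see slide 44)
```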

  42. Geometric Hashing (cont’d) (figure: a hash-table bin storing the entry (model, (p1, p2, p3)))

  43. Geometric Hashing: Preprocessing Step (figure: hash-table bin entries such as (model1, (p3, p2, p4)), (model1, (p1, p2, p3)), (model2, (p1, p2, p3)), etc.)

  44. Geometric Hashing: Recognition Step (cont’d) (5) Histogram all hash-table entries that received one or more votes. Find those entries that received more than a certain number of votes; each such entry corresponds to a potential match (hypothesis generation). e.g., for the scene triplet (p’1, p’2, p’3), entries such as (model1, (pi, pj, pk)) and (model2, (pl, pm, pn)) accumulate votes.

  45. Geometric Hashing: Recognition Step (cont’d) (6) For each potential match (i.e., hypothesis), recover the affine transformation A (e.g., using the SVD) that aligns the corresponding model-scene interest points.

  46. Geometric Hashing: Recognition Step (cont’d) (7) Back-project the model onto the scene using the computed transform and compare the model with the scene to find additional matches (verification step). (8) If verification fails for all hypotheses from step (5), go back to step (2) and repeat the procedure using a different group of scene features.
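A sketch of steps (6)-(7): a least-squares fit of A and t from the matched points (np.linalg.lstsq solves this via the SVD), followed by a back-projection score. The 3-pixel tolerance is an illustrative choice, not from the slides:

```python
import numpy as np

def recover_affine(model_pts, scene_pts):
    """Step (6): least-squares A (2x2), t (2,) from >= 3 matches."""
    X = np.hstack([np.asarray(model_pts, float),
                   np.ones((len(model_pts), 1))])        # rows [x, y, 1]
    P, *_ = np.linalg.lstsq(X, np.asarray(scene_pts, float), rcond=None)
    return P[:2].T, P[2]

def verification_score(A, t, model_pts, scene_pts, tol=3.0):
    """Step (7): back-project the model and count points landing
    within tol pixels of some scene point."""
    proj = np.asarray(model_pts, float) @ A.T + t
    scene = np.asarray(scene_pts, float)
    dists = np.linalg.norm(proj[:, None, :] - scene[None, :, :], axis=2)
    return int((dists.min(axis=1) < tol).sum())
```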

  47. Geometric Hashing (cont’d) (figure: a hash-table bin storing the entry (model, (p1, p2, p3)))

  48. Geometric Hashing: Example (figures: a scene image and a model image)

  49. Geometric Hashing: Example (cont’d) (figures: a bad hypothesis vs. a good hypothesis)

  50. Geometric Hashing: Complexity • Preprocessing Step: O(Mm^4) • Recognition Step: O(i^4 Mm^4) (M: #models, m: #model points, i: #scene points)
