Invariants (concluded); Lowe and Biederman
Announcements • No class Thursday. Attend Rao lecture. • Double-check your paper assignments.
Key Points • Rigid rotation is a 3x3 orthonormal matrix. • 3-D translation is a 3x4 matrix. • 3-D translation + rotation is a 3x4 matrix. • Scaled orthographic projection: remove row three and allow scaling. • Planar object: remove column three. • Projective Transformations • Rigid rotation of a planar object is represented by a 3x3 matrix. • When we write in homogeneous coordinates, projection is implicit. • When we drop rigidity, the 3x3 matrix is arbitrary.
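As a reference, here is a sketch of the matrix forms listed above (the symbols R = (r_ij), t, and the scale s are the usual assumed notation, not taken from the slides):

% Rigid rotation + translation applied to a 3-D point in homogeneous coordinates:
% R is 3x3 orthonormal, t is the translation, giving a 3x4 matrix overall.
\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}
=
\begin{pmatrix}
  r_{11} & r_{12} & r_{13} & t_1 \\
  r_{21} & r_{22} & r_{23} & t_2 \\
  r_{31} & r_{32} & r_{33} & t_3
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

% Scaled orthographic projection: remove row three and allow a scale factor s.
\begin{pmatrix} x \\ y \end{pmatrix}
=
s \begin{pmatrix}
  r_{11} & r_{12} & r_{13} & t_1 \\
  r_{21} & r_{22} & r_{23} & t_2
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

% Planar object (Z = 0): remove column three as well.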
Projective rigid rotation and translation: the notation suggests that the first two columns are orthonormal, and the transformation has 6 degrees of freedom. Projective transformation: the notation suggests an unconstrained linear transformation. Points in homogeneous coordinates that differ only by scale are equivalent, so the transformation has 8 degrees of freedom, because its overall scale is arbitrary.
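In symbols (a sketch, using the usual homography notation H and projective scale λ):

% Rigid rotation and translation of a planar object, seen projectively:
% the first two columns r_1, r_2 are orthonormal, so there are 6 degrees of
% freedom (3 for rotation, 3 for translation).
\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= \begin{pmatrix} r_1 & r_2 & t \end{pmatrix}
  \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}

% General projective transformation: H is an arbitrary 3x3 matrix, defined only
% up to scale, hence 9 - 1 = 8 degrees of freedom.
\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= H \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}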
Lines: Parameterization • Equation for line: ax+by+c=0. • Parameterize line as l = (a,b,c)T. • p=(x,y,1)T is on line if <p,l>=0.
Line Intersection • The intersection of l and l’ is l x l’ (where x denotes the cross product). • This follows because the cross product is orthogonal to both l and l’, so it satisfies both incidence equations.
Intersection of Parallel Lines • Suppose l and l’ are parallel. We can write l=(a,b,c), l’=(a,b,c’). Then l x l’ = (c’-c)(b,-a,0), which is equivalent to (b,-a,0). • This point corresponds to a line through the focal point that doesn’t intersect the image plane. • We can think of the real plane as the points whose third homogeneous coordinate isn’t 0. The points with third coordinate 0 are ideal points; they lie on the line at infinity. • Note that a projective transformation can map this line to another line, the horizon, which we do see.
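A minimal numerical sketch of these two facts, using numpy (the specific lines chosen here are arbitrary examples):

import numpy as np

# Lines in homogeneous form l = (a, b, c), representing a x + b y + c = 0.
l1 = np.array([1.0, -1.0, 0.0])   # y = x
l2 = np.array([1.0,  1.0, -2.0])  # x + y = 2

# The intersection of two lines is their cross product (a homogeneous point).
p = np.cross(l1, l2)
print(p / p[2])                       # -> [1. 1. 1.], i.e. the point (1, 1)

# Both incidence relations <p, l> = 0 hold:
print(np.dot(p, l1), np.dot(p, l2))   # -> 0.0 0.0

# Parallel lines: same (a, b), different c.
l3 = np.array([1.0, 1.0, -5.0])       # x + y = 5, parallel to l2
q = np.cross(l2, l3)
print(q)   # -> [-3.  3.  0.]: last coordinate 0, an ideal point proportional to (b, -a, 0)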
Invariants of Lines • Notice that affine transformations are the subgroup of projective transformations in which the last row is (0, 0, 1). • These map the line at infinity to itself. • So parallelism is an affine invariant, since parallel lines continue to intersect at infinity.
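The reason in one step: apply an affine matrix (last row (0, 0, 1)) to an ideal point (x, y, 0):

\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 0 \end{pmatrix}
=
\begin{pmatrix} a_{11} x + a_{12} y \\ a_{21} x + a_{22} y \\ 0 \end{pmatrix}

The third coordinate stays 0, so ideal points map to ideal points and the line at infinity maps to itself.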
Invariance in 3D to 2D • 3D to 2D “invariance” isn’t captured by the mathematical definition of invariance, because 3D to 2D transformations don’t form a group: you can’t compose or invert them. • Definition: Let f be a function on images. We say f is an invariant iff for every object O, if I1 and I2 are images of O, then f(I1)=f(I2). • This means we can define f(O) as f(I) for any image I of O. O and I match only if f(O)=f(I). • f is a non-trivial invariant if there exist two images I1 and I2 such that f(I1) ≠ f(I2).
Non-Invariance in 3D to 2D • Theorem: Assume valid objects are any 3D point sets of size k, for some k. Then there are no non-trivial invariants of the images of these objects under perspective projection.
Proof Strategy • Let f be an invariant. • Suppose two objects A and B have a common image. Then f(I)=f(J) whenever I and J are images of either A or B. • Given any two objects O0 and Ok, we construct a series of objects O1, …, O(k-1), so that Oi and O(i+1) have a common image for every i from 0 to k-1. • So, chaining these equalities, f(I) = f(J) for any pair of images I, J of any two objects, and f is trivial.
Constructing O1 … O(k-1) • Oi has its first i points identical to the first i points of Ok, and its remaining points identical to the remaining points of O0. • If two objects are identical except for one point, they produce the same image when viewed along the line joining those two points. • Along that line, those two points look the same. • The remaining points are identical, so they always look the same.
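A small numerical check of the key step (the point sets and camera below are arbitrary choices, not from the lecture): two objects that differ in a single point project to the same image when the center of projection lies on the line joining the two differing points.

import numpy as np

def project(points):
    """Perspective projection with center of projection at the origin
    and image plane z = 1: (X, Y, Z) -> (X/Z, Y/Z)."""
    pts = np.asarray(points, dtype=float)
    return pts[:, :2] / pts[:, 2:3]

# Object A: four arbitrary 3-D points.
A = np.array([[ 1.0,  2.0, 4.0],
              [ 0.5, -1.0, 3.0],
              [ 2.0,  0.0, 5.0],
              [-1.0,  1.0, 6.0]])

# Object B: identical to A except for the last point, which is moved along
# the line joining it to the center of projection (here, scaled by 2).
B = A.copy()
B[3] = 2.0 * A[3]

# Both objects produce exactly the same image from this viewpoint.
print(np.allclose(project(A), project(B)))   # -> True

Chaining k such single-point changes connects any two k-point objects, which is the content of the proof.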
Summary • Planar objects give rise to a rich set of invariants. • General 3-D objects have no invariants. • We can deal with this by focusing on planar portions of objects, or on special restricted classes of objects, or by relaxing the notion of invariants. • However, invariants have become less popular in computer vision due to these limitations.
Lowe and Biederman • Background • Viewpoint Invariant Non-Accidental Properties. • Lowe sees these as probabilistic. • Biederman drops this. • Primitive properties • Composing them into units/geons. • Use in Recognition. • Speed search. • Geons: analogy to speech. • Evidence for Value. • Computational speed. • Human psychology: parts; qualitative descriptions; view invariance.
Background • Computational • 2D approach to recognition. • Lowe is reacting to Marr. • Partly due to Lowe, recognition rarely involves reconstruction now (but also because 3D models are rarer). • State of the art: • Little recognition of 3D objects; grouping is implicit. • Speed and robustness are big concerns. • 2D recognition through search. • Psychology • Much more ambitious and specific than any prior theory of recognition (I believe). • Perceptual organization widely studied, but rarely related to other tasks. • Contrast: • CS must account for low-level processing. • Psychology must account for categorization.
Viewpoint Invariant NAPs • Non-Accidental Property: • Happens rarely by chance. • More frequently due to scene structure. • Notation: p = property, c = chance, s = structure. The inference weighs the probability that the property arose from scene structure against the probability that it arose by chance; the probability of the property given structure is high due to viewpoint invariance. Jepson and Richards consider this inference formally; Lowe focuses on its probabilistic reading. • Biederman downplays probabilistic inference. • Not concerned with background or feature detection.
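One way to write this inference, as a standard Bayes' rule sketch (assuming structure and chance are the only two hypotheses; the exact formula on the original slide is not recoverable here):

% Posterior that the property p arose from scene structure s rather than chance c:
P(s \mid p) = \frac{P(p \mid s)\, P(s)}{P(p \mid s)\, P(s) + P(p \mid c)\, P(c)}
% P(p | s) is high because the property is viewpoint invariant;
% P(p | c) is low because the property rarely arises by accident.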
Examples (Copied from Lowe)
Issues with Non-Accidental Properties • Is it “just” Bayesian inference? • If so, why not model all information? • This may fit Lowe's view. • Biederman relies more on inference treated as certain. • See also Feldman, Jepson, Richards.
Viewpoint Invariance • Match properties that are invariant to viewing conditions: parallelism, symmetry, collinearity, cotermination, straightness. • Lowe picks one side of each property; Biederman stresses the contrast. Why? • How are they used? • Lowe: correspondence of geometric features, to speed up search. • Description of parts for indexing.
Geons • Biederman's description of geons: are the properties still view invariant when used to describe a geon? • A 3D shape's occluding contour depends on viewpoint: it may be straight from one view and curved from another. • Metric properties are not truly invariant. • Maybe more like quasi-invariants.
Geons for Recognition • Analogy to speech. • 36 different geons. • Different relations between them. • Millions of ways of putting a few geons together.
Empirical Support for Geons • First, separate the predictions of geon theory: • Part structure is important in recognition. • Perceptual grouping can be used for filling in. • NAPs are used for indexing. • View-invariant descriptions. • Qualitative descriptions. • Second, what is the alternative? • View-based recognition with many examples.
Empirical Support • Recognition is fast; fine metric judgments are slow. • Does this disqualify other approaches? • Recognition is view-invariant. • Does this disqualify other approaches? • The number of possible geon descriptions is sufficient for the number of categories we recognize. • This argues for plausibility, but no more.
Empirical Support (2) • 2-4 geons are needed for recognition; complex objects are no harder than simple ones. • Line drawings vs. color images: color gives similar recognition speed.
Empirical Support (3): Degraded Objects • Deleting contours in a way that disrupts geon structure interferes more with recognition. • Deleting components is worse than deleting midsections. • This argues for perceptual organization for interpolation/reconstruction. But does it argue for geons? • Should we measure the information deleted rather than contour length?
Conclusions • Maybe it is helpful to separate: • Perceptual organization/completion. • View invariance. • Part structure. • All three are widely used in computer vision. • Biederman's paper probably addresses view invariance least. • This became the subject of much research.