Recognition by Linear Combinations of Models By Shimon Ullman & Ronen Basri Presented by: Piotr Dollar Nov. 19, 2002
Key Ideas • A 2D image of an object that has undergone a linear transformation can be expressed as a linear combination of a few other 2D images of it! • This is very, very cool! • Note: not a linear combination of pixels, but rather a linear combination of point coordinates. • Given a new 2D image of an object, we can test whether it is a transformed version of a model object by checking whether it is a linear combination of 2D model images. This is the basis for object recognition.
Aside: Orthographic Projection • Given an object, we take its orthographic projection (drop the z coordinate). This approximates the perspective projection and makes the math more tractable. • So long as the object is small compared to its distance from the camera, the approximation is good.
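To make the "small object, far camera" condition concrete, here is a minimal NumPy sketch comparing the orthographic image of a unit-sized object with its perspective image, rescaled by the camera distance so both share a scale. The function name `projection_error` and this rescaling (a weak-perspective style normalization) are assumptions for illustration, not part of the paper.

```python
import numpy as np

def projection_error(points, distance):
    """Max gap between orthographic and rescaled perspective projections.

    Assumes the camera sits `distance` away along the z-axis and that the
    perspective image is multiplied by `distance` so both images share a scale.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = distance + z                                  # depth of each point from the camera
    persp = np.stack([distance * x / depth, distance * y / depth], axis=1)
    ortho = points[:, :2]                                 # orthographic: just drop z
    return np.abs(persp - ortho).max()

rng = np.random.default_rng(0)
obj = rng.uniform(-1.0, 1.0, size=(50, 3))               # object of roughly unit size
for d in (5.0, 50.0, 500.0):
    print(d, projection_error(obj, d))                    # error shrinks as d grows
```

The gap per coordinate is |x·z / (d + z)|, so it falls roughly like (object size)² / distance.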
Producing the 2D image • Take the orthographic projection of an object to get a 2D edge map • The rim is the “set of points on object’s surface whose normal is perpendicular to the viewing direction”, and the orthographic projection of the rim generates the silhouette.
Objects with Smooth Boundaries • More difficult, since as the viewpoint changes, so do the points on the rim. • We can estimate the curvature at each point and thus how the boundaries would change. • This is a generalization, since objects with sharp edges can be treated as smooth objects whose radius of curvature at the edges is zero. • Leads to ugly math, but the same high-level idea. • Serge said I can skip this, so I will! From now on we will deal only with objects with sharp edges.
Rest of Presentation • For most of the talk, we focus only on the case of rotation about the vertical axis. Going through the details will demonstrate many of the key ideas of Ullman’s approach. • Next we will show how to find the coefficients of the linear combination. • Finally, we show how to modify the algorithm if general linear and rigid transformations are allowed. • This is a different format from the paper.
OK, here we go
Rotation around the Vertical Axis • z is the viewing direction, x is the horizontal axis, y is the vertical axis. • For now, assume no occlusion. • Given an object O, let P1 be an (orthographic) image of O, P2 another image of O after it has been rotated by α (where $\alpha \neq k\pi$), and P an image of O after a rotation by θ. • The projection of a point $p = (x, y, z)$ of O is: • $p_1 = (x_1, y_1) = (x, y)$ in P1 • $p_2 = (x_2, y_2) = (x\cos\alpha + z\sin\alpha,\ y)$ in P2 • $\hat p = (\hat x, \hat y) = (x\cos\theta + z\sin\theta,\ y)$ in P
The Cool Part • For any θ there exist a and b such that, for every point p of O, its x-coordinate $\hat x$ in P satisfies $\hat x = a x_1 + b x_2$. • That is, every point in the third image is a linear combination of the corresponding points in the first and second images. So, if the first image had k points with coordinates $(x_{11}, y_{11}), \dots, (x_{1k}, y_{1k})$, and after rotating by α the points had coordinates $(x_{21}, y_{21}), \dots, (x_{2k}, y_{2k})$, then after a rotation by θ the points would have x-coordinates $[\hat x_1, \dots, \hat x_k] = a[x_{11}, \dots, x_{1k}] + b[x_{21}, \dots, x_{2k}]$
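Proof • Let: • $a = \sin(\alpha-\theta) / \sin\alpha$ • $b = \sin\theta / \sin\alpha$ • Then, using $x_1 = x$ and $\sin(\alpha-\theta) = \sin\alpha\cos\theta - \cos\alpha\sin\theta$: $a x_1 + b x_2 = \frac{\sin(\alpha-\theta)}{\sin\alpha}\,x + \frac{\sin\theta}{\sin\alpha}(x\cos\alpha + z\sin\alpha) = x\cos\theta - \frac{\cos\alpha\sin\theta}{\sin\alpha}\,x + \frac{\sin\theta\cos\alpha}{\sin\alpha}\,x + z\sin\theta = x\cos\theta + z\sin\theta$ • That’s it.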
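A quick numerical sanity check of the proof, as a sketch: project random 3D points using the slide's convention x' = x cos(angle) + z sin(angle), y' = y, then confirm that the a and b defined above reproduce the x-coordinates of the third view. The helper names are hypothetical.

```python
import numpy as np

def project_after_rotation(points, angle):
    """Orthographic image after rotating about the vertical (y) axis by `angle`."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([x * np.cos(angle) + z * np.sin(angle), y], axis=1)

def combination_coefficients(alpha, theta):
    """a and b from the proof (requires sin(alpha) != 0)."""
    a = np.sin(alpha - theta) / np.sin(alpha)
    b = np.sin(theta) / np.sin(alpha)
    return a, b

rng = np.random.default_rng(1)
obj = rng.normal(size=(20, 3))
alpha, theta = 0.4, 1.1
p1 = project_after_rotation(obj, 0.0)     # P1: no rotation, so x1 = x
p2 = project_after_rotation(obj, alpha)   # P2: rotated by alpha
p = project_after_rotation(obj, theta)    # P: rotated by theta
a, b = combination_coefficients(alpha, theta)
print(np.allclose(p[:, 0], a * p1[:, 0] + b * p2[:, 0]))  # True
```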
Application • Suppose we have the images P1 and P2 of the object O. Now we are given some new image P (with labeled points) and asked whether P could be an image of O after a rotation about the vertical axis. • Intuition: if we can show there are a and b such that $[\hat x_1, \dots, \hat x_k] = a[x_{11}, \dots, x_{1k}] + b[x_{21}, \dots, x_{2k}]$, where $\hat x_i$ are the x-coordinates of the points in P, then it is possible that P is an image of O (but we can never be sure). If no such a, b exist, then P cannot be an image of O.
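A minimal sketch of this test, assuming labeled correspondences are already available: fit a and b by least squares and accept P only if the fit is (numerically) exact. It also checks that the y-coordinates are unchanged, since rotation about the vertical axis does not move them. `could_be_view`, `view_x` and the tolerance `tol` are hypothetical, and the constraint linking a and b is deliberately not checked here.

```python
import numpy as np

def could_be_view(x1, x2, y1, x_new, y_new, tol=1e-6):
    """Hypothetical helper: could (x_new, y_new) be an image of O after a
    rotation about the vertical axis, given model images P1 and P2?

    x1, x2: x-coordinates of the k labeled points in P1 and P2.
    y1:     their y-coordinates (rotation about the y-axis leaves these fixed).
    Returns (ok, a, b). The constraint linking a and b is NOT checked.
    """
    basis = np.column_stack([x1, x2])                        # k x 2
    (a, b), *_ = np.linalg.lstsq(basis, x_new, rcond=None)   # best-fitting a, b
    x_residual = np.linalg.norm(a * x1 + b * x2 - x_new)
    y_residual = np.linalg.norm(y_new - y1)
    ok = bool(x_residual < tol and y_residual < tol)
    return ok, a, b

# Usage on synthetic data: P1 at 0 rad, P2 at 0.5 rad.
rng = np.random.default_rng(2)
obj = rng.normal(size=(15, 3))

def view_x(angle):
    """x-coordinates after rotating obj by `angle` about the y-axis."""
    return obj[:, 0] * np.cos(angle) + obj[:, 2] * np.sin(angle)

x1, x2, y1 = view_x(0.0), view_x(0.5), obj[:, 1]
print(could_be_view(x1, x2, y1, view_x(1.3), y1)[0])           # True
print(could_be_view(x1, x2, y1, rng.normal(size=15), y1)[0])   # False
```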
But wait! (constraints on a and b) • However, recall that a and b are related: • $a = \sin(\alpha-\theta) / \sin\alpha$ • $b = \sin\theta / \sin\alpha$ • One can show (just plug and chug) that the following relation must hold between a and b: $a^2 + b^2 + 2ab\cos\alpha = 1$ • Thus, for P to possibly be an image of O, a and b must also satisfy this condition.
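The "plug and chug" step can be spot-checked numerically. This sketch (the helper name is hypothetical) evaluates $a^2 + b^2 + 2ab\cos\alpha$ for many random α and θ, keeping α away from multiples of π so that $\sin\alpha \neq 0$.

```python
import numpy as np

def constraint_value(alpha, theta):
    """a^2 + b^2 + 2ab cos(alpha) for the a, b of a rotation by theta."""
    a = np.sin(alpha - theta) / np.sin(alpha)
    b = np.sin(theta) / np.sin(alpha)
    return a**2 + b**2 + 2 * a * b * np.cos(alpha)

rng = np.random.default_rng(3)
alphas = rng.uniform(0.1, 3.0, size=1000)        # stay away from alpha = k*pi
thetas = rng.uniform(-np.pi, np.pi, size=1000)
print(np.allclose(constraint_value(alphas, thetas), 1.0))  # True
```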
Testing the constraint • Note that to test $a^2 + b^2 + 2ab\cos\alpha = 1$ we would need to know α. This poses a serious problem, since all we have are two images P1 and P2. So how do we proceed?
Approach 1: Recover α From 3D Structure • This requires first recovering the 3D structure of the object O, which defeats the purpose! • One of the nicest things about Ullman’s method is that it does not require us to know the 3D structure of the object! If we had the 3D structure of the object, then other methods could be used. • (If we wanted the 3D structure, we could use the “structure from motion” (SFM) theorem, which says that given three distinct orthographic projections of four non-coplanar points we can recover the structure. Note that we would need an additional image of the model.)
Approach 2: Recover α Directly • We can use the constraint itself to recover α. That is, if we had a and b that we were sure satisfied $a^2 + b^2 + 2ab\cos\alpha = 1$, then we could solve this equation for α. • If we know that a third image of O, call it P3, was taken after a rotation about the y-axis, and we express P3 as a linear combination of P1 and P2, then we get a and b that we know satisfy the constraint, and can thus calculate α. • Note that this again requires three model images, just like an application of SFM.
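Solving the constraint for α gives $\cos\alpha = (1 - a^2 - b^2)/(2ab)$. The sketch below assumes $ab \neq 0$ and returns α in $[0, \pi]$; the equation only fixes $\cos\alpha$, so the sign of α is not recovered. `recover_alpha` is a hypothetical name, and the a, b in the example are generated from a known α rather than fitted from a real P3.

```python
import numpy as np

def recover_alpha(a, b):
    """Solve a^2 + b^2 + 2ab cos(alpha) = 1 for alpha in [0, pi].

    Assumes a, b come from a view known to be a rotation about the y-axis
    and that a * b != 0 (otherwise alpha is not determined by this equation).
    """
    cos_alpha = (1.0 - a**2 - b**2) / (2.0 * a * b)
    return np.arccos(np.clip(cos_alpha, -1.0, 1.0))  # clip guards against rounding

# Coefficients that a third model view P3, rotated by theta3, would produce:
alpha_true, theta3 = 0.7, 0.3
a = np.sin(alpha_true - theta3) / np.sin(alpha_true)
b = np.sin(theta3) / np.sin(alpha_true)
print(recover_alpha(a, b))  # approximately 0.7
```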
Approach 3: Ignore the constraint • That is, do not test whether a and b satisfy the constraint. This will increase the chance of “false positives”: cases where P is a linear combination of P1 and P2 even though it is not an image of O. • Note that false positives are already possible (an image P of some other object O2 could look just like an image of O after some rotation). • As the number of points increases, the likelihood of a false positive falls drastically anyway (according to Ullman). • This is the approach he uses: although he tells us what constraints must be satisfied by the coefficients in different cases, he never uses these constraints.
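To illustrate why more points help, here is a small sketch with hypothetical helper names and synthetic random data: x-coordinates of an unrelated "object" are fit against two views of a model. With only 2 points the fit is always exact, so every image is a false positive; with 20 points an unrelated image leaves a clearly nonzero residual. This shows the trend only; it is not Ullman's analysis.

```python
import numpy as np

def fit_residual(x1, x2, x_new):
    """Least-squares residual of x_new against span{x1, x2}."""
    basis = np.column_stack([x1, x2])
    coeffs, *_ = np.linalg.lstsq(basis, x_new, rcond=None)
    return np.linalg.norm(basis @ coeffs - x_new)

def unrelated_image_residual(n_points, seed=4):
    """Fit random x-coordinates (an unrelated 'object') against two model views."""
    rng = np.random.default_rng(seed)
    model = rng.normal(size=(n_points, 3))                       # 3D points of O
    x1 = model[:, 0]                                             # P1: no rotation
    x2 = model[:, 0] * np.cos(0.5) + model[:, 2] * np.sin(0.5)   # P2: rotated 0.5 rad
    unrelated = rng.normal(size=n_points)                        # x-coords not from O
    return fit_residual(x1, x2, unrelated)

print(unrelated_image_residual(2))   # ~0: with 2 points any image "matches"
print(unrelated_image_residual(20))  # clearly nonzero: the mismatch is detected
```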
Step back, take a deep breath
Finding a and b • But how do we actually find a and b? • We are given k model images Mi. Each model image is simply two vectors: a vector of the x coordinates Mi_x and a vector of the y coordinates Mi_y. We are also given an image P as two vectors Px and Py. • We can choose k. For the case of rotation about the y-axis, k must be at least 2. • Now we want to find a series of coefficients such that: • Px = c1 * M1_x + … + ck * Mk_x • Py = d1 * M1_y + … + dk * Mk_y (note that in the case of rotation about the vertical axis the y-coordinates of points do not change)
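Minimal Alignment • Using k corresponding points gives 2k equations (k for x, k for y) in 2k unknowns, which we can solve explicitly for the coefficients ci and di. The remaining points can then be used to check the prediction. • Let X = [M1_x … Mk_x], restricted to those k points, so X is k × k. Then $c = X^{-1} P_x$. • If we use an over-determined system (more points than model images), then we take the pseudo-inverse of X.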
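A sketch of minimal alignment for the x-coordinates (the y-coordinates work the same way with Mi_y and Py), assuming the rows of the model matrix are already in correspondence with Px. It solves the square k × k system on the first k points, then checks the prediction on all points; the over-determined variant uses the pseudo-inverse instead. Function names and the tolerance are hypothetical.

```python
import numpy as np

def minimal_alignment(model_x, p_x, tol=1e-6):
    """model_x: n x k array whose columns are M1_x ... Mk_x (n >= k points).
    p_x: length-n vector Px. Returns (c, prediction_ok)."""
    k = model_x.shape[1]
    X = model_x[:k]                          # k x k: one row per chosen point
    c = np.linalg.solve(X, p_x[:k])          # c = X^-1 Px on the chosen points
    predicted = model_x @ c                  # predicted Px for every point
    return c, bool(np.abs(predicted - p_x).max() < tol)

def least_squares_alignment(model_x, p_x):
    """Over-determined case (n > k): c = pinv(X) Px."""
    return np.linalg.pinv(model_x) @ p_x

# Usage: k = 2 model images of a synthetic object rotated about the y-axis.
rng = np.random.default_rng(5)
obj = rng.normal(size=(12, 3))

def view_x(angle):
    return obj[:, 0] * np.cos(angle) + obj[:, 2] * np.sin(angle)

M_x = np.column_stack([view_x(0.0), view_x(0.6)])
P_x = view_x(1.2)
c, ok = minimal_alignment(M_x, P_x)
print(c, ok)                                 # c matches (a, b) from the proof; ok is True
print(least_squares_alignment(M_x, P_x))     # same coefficients via the pseudo-inverse
```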
Getting the Models and P • Select a set of features that appear in all the model images as well as in the image P, and find the correspondences. This gives you Mi_x, Mi_y and Px, Py. • Occlusion is not a problem whatsoever, so long as we have good segmentation and correspondence algorithms. • Then, given Mi_x, Mi_y and P…
Other approaches • Brute force: search for a, b. Why do this? • Linear mappings: find an L such that for any V that is a linear combination of the Mi_x, L·V = c·q, where c is a scalar and q is some fixed vector. • Just linear algebra, nothing particularly more interesting than minimal alignment.
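One simple way to build such an L, offered as an assumption rather than as the construction in the paper, is to take q = 0 and let L be the projector onto the orthogonal complement of the span of the Mi_x. Then L·V = 0 for every linear combination V, and the norm of L·Px measures how far P is from being one. The helper names are hypothetical.

```python
import numpy as np

def build_linear_mapping(model_x):
    """Assumed construction with q = 0: L = I - X pinv(X), the projector onto
    the orthogonal complement of span{M1_x, ..., Mk_x} (columns of model_x)."""
    n = model_x.shape[0]
    return np.eye(n) - model_x @ np.linalg.pinv(model_x)

def mapping_score(L, p_x):
    """Norm of L Px: near zero when Px is a combination of the model images."""
    return np.linalg.norm(L @ p_x)

rng = np.random.default_rng(6)
obj = rng.normal(size=(12, 3))

def view_x(angle):
    return obj[:, 0] * np.cos(angle) + obj[:, 2] * np.sin(angle)

L = build_linear_mapping(np.column_stack([view_x(0.0), view_x(0.6)]))
print(mapping_score(L, view_x(1.2)))          # essentially 0
print(mapping_score(L, rng.normal(size=12)))  # clearly nonzero
```

L depends only on the model images, so it can be computed once offline; testing a new image is then a single matrix-vector product.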
OK, on to general linear transformations
General Linear Transformations • General rotation in 3D space • Rigid transformations & scaling in 3D space • Using two views only (on board to avoid matrix in PPT)
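The two-view case is worked on the board in the talk. As a hedged sketch of the statement usually given for this scheme: under orthographic projection, if the object undergoes a general 3D linear transformation plus a translation, each image coordinate of a new view is a linear combination of x1, y1, x2 and a constant (the constant absorbs translation). The code checks this numerically on synthetic data. Helper names are hypothetical, it assumes x2 actually depends on depth z (e.g. the second view is not just an image-plane rotation of the first), and it does not check the extra constraints that rigid transformations impose on the coefficients.

```python
import numpy as np

def linear_view(points, A, t):
    """Orthographic image of points after p -> A p + t (A is 3 x 3)."""
    return (points @ A.T + t)[:, :2]

def two_view_fit_residual(view1, view2, new_view):
    """Residual of writing new_view's x and y as combinations of x1, y1, x2, 1.

    Each view is an n x 2 array of image coordinates of the same points.
    """
    n = len(view1)
    basis = np.column_stack([view1[:, 0], view1[:, 1], view2[:, 0], np.ones(n)])
    coeffs, *_ = np.linalg.lstsq(basis, new_view, rcond=None)  # 4 x 2: x and y columns
    return np.linalg.norm(basis @ coeffs - new_view)

rng = np.random.default_rng(8)
obj = rng.normal(size=(10, 3))
v1 = linear_view(obj, np.eye(3), np.zeros(3))                      # model view 1
v2 = linear_view(obj, rng.normal(size=(3, 3)), rng.normal(size=3)) # model view 2
v_new = linear_view(obj, rng.normal(size=(3, 3)), rng.normal(size=3))
print(two_view_fit_residual(v1, v2, v_new))                       # essentially 0
print(two_view_fit_residual(v1, v2, rng.normal(size=(10, 2))))    # clearly nonzero
```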
Other References • High-level Vision by Ullman, MIT Press 1997 (especially chapter 5).