Lecture 7 • Moving past vision, to understand motion and video. Computer Vision, Robert Pless
Grounding image.
Optic flow is a 2D vector (u,v) on the image • Assuming: • intensity only changes due to the motion • the derivatives are smooth • Then we get a constraint: Ix u + Iy v + It = 0 • Defines a line in velocity space • Requires an additional constraint to define the optic flow.
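To make the constraint concrete, here is a minimal numpy sketch of one way to estimate the three derivatives from a pair of grayscale frames (the names frame0 and frame1, and the choice of central differences plus a frame difference, are illustrative assumptions, not the only option):

```python
import numpy as np

def image_derivatives(frame0, frame1):
    """Estimate the derivatives in the optic flow constraint
    Ix*u + Iy*v + It = 0 from two consecutive grayscale frames."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    Iy, Ix = np.gradient(f0)
    # Temporal derivative: simple difference between the two frames.
    It = f1 - f0
    return Ix, Iy, It
```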
Solving the aperture problem • How to get more equations for a pixel? • Basic idea: impose additional constraints • most common is to assume that the flow field is smooth locally • one method: pretend the pixel’s neighbors have the same (u,v) • If we use a 5x5 window, that gives us 25 equations per pixel!
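A minimal sketch of that idea: stack the 25 copies of the constraint from a 5x5 window into one least-squares problem for a single (u,v). The function name and the derivative arrays are assumed from the sketch above.

```python
import numpy as np

def lucas_kanade_at(Ix, Iy, It, r, c, half=2):
    """Solve for one (u, v) at pixel (r, c), assuming constant flow
    over a (2*half+1) x (2*half+1) window: 25 equations, 2 unknowns."""
    win = np.s_[r - half:r + half + 1, c - half:c + half + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # 25 x 2
    b = -It[win].ravel()                                      # 25 values
    # Least-squares solution of A [u, v]^T = b.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```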
More Optic Flow… then less optic flow.
Optical flow result
Better than “Assume 5x5 block is same” • Additional constraints are necessary to estimate optical flow: for example, constraints on the size of the derivatives, or parametric models of the velocity field. • Horn and Schunck (1981): global smoothness term • This approach is called regularization. • Solved by means of the calculus of variations.
Calculus • (init) Solve for blockwise optic flow. • For each pixel, update the optic flow to be similar to its neighbors, and (mostly) fit the optic flow constraint equation.
The optic flow constraint defines a line of solutions; the average of the neighboring optic flows gives one more constraint. Solve for the flow, within the range of solutions, that minimizes the combined error.
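A sketch of this iteration, in the spirit of Horn and Schunck (a plain box filter stands in for their exact neighbor-averaging kernel, and alpha weights the smoothness term; both are assumptions of this sketch):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(Ix, Iy, It, alpha=1.0, n_iters=100):
    """Iteratively pull each pixel's flow toward its neighborhood
    average while (mostly) fitting Ix*u + Iy*v + It = 0."""
    u = np.zeros_like(Ix, dtype=float)
    v = np.zeros_like(Iy, dtype=float)
    for _ in range(n_iters):
        u_bar = uniform_filter(u, size=3)  # average of neighboring flows
        v_bar = uniform_filter(v, size=3)
        # How badly the averaged flow violates the constraint, normalized.
        t = (Ix * u_bar + Iy * v_bar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        # Move from the average back toward the constraint line.
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v
```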
Filling in blank areas.
Illusions… and really strange problems… • When a camera moves, a 3D scene point projects to different places on the image. This motion is called the optic flow. In the presence of noise, all methods that compute optic flow from images give a biased estimate. Hajime Ouchi, 1977; Spillman, 1993; Hine, Cook & Rogers, 1995, 97; Khang & Essock, 1997; Pless, Fermuller, Aloimonos, 1999
Least squares solution • One patch gives a system Mu = b, with M = [ΣIx² ΣIxIy; ΣIxIy ΣIy²] and b = -[ΣIxIt; ΣIyIt], summing over the patch.
Bias
y = ax + b; solve for a, b using least squares. If only y is messed up, you’re golden (or blue). If the x coordinates of your input are messed up, you’re hosed, because least squares minimizes the vertical distance between the points and the line.
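You can see this bias in a few lines of numpy; this toy simulation (all numbers are made up for illustration) recovers the slope correctly with noise in y, and underestimates it with noise in x:

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 2.0, 1.0
x = rng.uniform(0, 10, 10_000)
y = a_true * x + b_true

# Noise only in y: least squares is fine ("golden").
A = np.stack([x, np.ones_like(x)], axis=1)
y_noisy = y + rng.normal(0, 1, x.shape)
print(np.linalg.lstsq(A, y_noisy, rcond=None)[0])  # close to [2.0, 1.0]

# Noise in x: the slope is biased toward zero ("hosed"),
# because only the vertical distance to the line is minimized.
x_noisy = x + rng.normal(0, 1, x.shape)
A = np.stack([x_noisy, np.ones_like(x)], axis=1)
print(np.linalg.lstsq(A, y, rcond=None)[0])        # slope noticeably < 2.0
```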
I care that you remember that solving Mu = b may be biased. • Assuming Gaussian noise and small higher-order terms, the expected bias can be explained in terms of the eigenvalues of M. • Asymptotically true for any symmetric noise distribution (Stewart, 97). • But crazy electrical engineering/applied math types might like: Taylor expansion around zero noise.
If the gradient distribution is uniform, M will be a multiple of the identity matrix, and the bias will only affect the magnitude of the flow. • If there is only a single gradient direction, the inverse of M is not defined; this is the aperture problem.
In the Ouchi Pattern • The change in gradient distribution leads to different biases in the computed optic flow.
Hard to fix this bias… but can you avoid it? • Avoid computing optic flow as an intermediate representation. • See if you can directly estimate the parameters you are interested in as a function of the image derivatives. • There may still be a bias, but if you are using all the image data to solve for a single set of unknowns, the effect of the bias is much smaller (inversely proportional to the number of data points).
Small motions • Ix u + Iy v + It = 0 • Constraint! Maybe we don’t care about u,v?
Small motions and image homographies… • u = (x' - x), v = (y' - y) • Ix (x' - x) + Iy (y' - y) + It = 0 • Ix (Hx - x) + Iy (Hy - y) + It = 0 • (should be satisfied everywhere in the image). • Solve for the H that satisfies the linear equation: Ix (Hx - x) + Iy (Hy - y) + It = 0 • Hx = (ax + by + c)/(gx + hy + 1), Hy = (dx + ey + f)/(gx + hy + 1) • Multiply through by the denominator: (Ix (Hx - x) + Iy (Hy - y) + It) (gx + hy + 1) = 0 • Ix ((ax+by+c) - x(gx+hy+1)) + Iy ((dx+ey+f) - y(gx+hy+1)) + It (gx+hy+1) = 0 • Pretty cool. It’s linear! So you can write it as one big matrix and solve for a, b, c, d, …, h. • How many equations do you get per pixel?
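One equation per pixel, so the whole image gives one tall linear system in (a, b, c, d, e, f, g, h). A sketch of building and solving it, with the coefficients of the expansion above collected per unknown (pixel coordinates and plain least squares are the assumed choices here):

```python
import numpy as np

def homography_from_derivatives(Ix, Iy, It):
    """One linear equation per pixel in the 8 homography parameters:
    Ix((ax+by+c) - x(gx+hy+1)) + Iy((dx+ey+f) - y(gx+hy+1)) + It(gx+hy+1) = 0."""
    rows, cols = Ix.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    Ix, Iy, It, x, y = (arr.ravel() for arr in (Ix, Iy, It, x, y))
    s = It - Ix * x - Iy * y              # shared factor in the g, h columns
    A = np.stack([Ix * x, Ix * y, Ix,     # coefficients of a, b, c
                  Iy * x, Iy * y, Iy,     # coefficients of d, e, f
                  x * s, y * s], axis=1)  # coefficients of g, h
    b = Ix * x + Iy * y - It              # constant term, moved to the right
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                         # [a, b, c, d, e, f, g, h]
```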
Larger Scale Motion Estimation • Let’s do the motion model…
Camera Motion and Image Derivatives • Relationship between image motion and intensity derivatives: Ix u + Iy v + It = 0 • We can assume the whole image motion is a homography, so that a point (x,y) goes to a point (x',y'), with u = (x' - x), v = (y' - y), giving: Ix ((ax+by+c) - x(gx+hy+1)) + Iy ((dx+ey+f) - y(gx+hy+1)) + It (gx+hy+1) = 0
Camera Motion and Image Derivatives • Relationship between image motion and intensity derivatives: Ix u + Iy v + It = 0 • If the motion is due to a camera moving in a static environment, then the optic flow is related to the camera translation (U,V,W), rotation (a,b,g), and the depth Z of each scene point. In the standard small-motion model (Longuet-Higgins and Prazdny, with unit focal length): u = (-U + xW)/Z + a xy - b(1 + x²) + g y, v = (-V + yW)/Z + a(1 + y²) - b xy - g x
Rearrange terms, and remember that everything except the unknowns is something that you measure: Ix, Iy, It and the pixel coordinates (x,y) are data; U, V, W, a, b, g, and the depths Z are unknowns. So, at how many pixels do we need to measure the derivatives in order to estimate U, V, W, a, b, g? How many constraints do we get per pixel? Do we introduce any new unknowns? (Each pixel gives one constraint, but also introduces one new unknown: its depth Z.)
Before we get all carried away, are there any simple tricks? We are working to find U, V, W, a, b, g, and all the Z’s. Is there a cheap trick (i.e., a special case) to get some of these? What happens when Z = infinity? Then you get an equation that is linear in a, b, g. Where might you find locations in an image where Z is very, very large? What happens when U, V are 0? Then near the center of the image (where x, y are small), you get constraints that are linear in a, b, g.
OK, let’s get all carried away. What assumptions can you make about Z? Problem: 6 global unknowns about camera motion, plus one new unknown (Z) for each pixel. • Solutions: • Locally, pixels are often of similar depths: the “depth smoothness constraint”. • For some (many?) choices of U, V, W, a, b, g, solving for Z gives many negative values, which are not physically realizable: the “depth positivity constraint”. • There is no linear way to optimize either constraint. Depth positivity can be set up as a large linear program, but if there is noise in the estimates of Ix, Iy, It, then depth positivity doesn’t hold.
OK, let’s get all carried away. What assumptions can you make about Z? • Assumption 1: locally constant depth. Assume, for a small block of pixels, that the depth is constant. That is, instead of a new unknown (1/Z) at each pixel, use the same unknown for each ? x ? sized block (5x5? 10x10?). • Assumption 2: locally linear depth (assume the scene is a plane): 1 = aX + bY + cZ (equation of a 3D plane); 1 = axZ + byZ + cZ (writing X, Y in terms of image coordinates, X = xZ, Y = yZ); 1/Z = ax + by + c (divide by Z); 1/Z = (x, y, 1) · (a, b, c) (express as a dot product). • How does this get incorporated into the equation, and what are the unknowns?
(Figure: residual error magnitude, colored from low to high.) Both assumptions must be solved with “non-linear” optimization(!). Let’s first consider locally constant depth (each 10x10 block has the same 1/Z). The easiest “non-linear” optimization is brute force! (But some brute force is better than others…) Guess all possible translations. For each translation, solve (using linear least squares) for the best possible rotation and depth. The residual (error) of this solution is displayed as a color. The smallest error may be the best motion.
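A sketch of the inner step, under the small-motion model above (the sign conventions, unit focal length, and dense least-squares solve are assumptions of this sketch; in practice pixels would be subsampled): fix one candidate translation, solve linearly for the rotation plus one inverse depth per block, and report the residual. Evaluating this over a grid of unit translation directions is the brute force.

```python
import numpy as np

def residual_for_translation(Ix, Iy, It, x, y, t, block_id, n_blocks):
    """Error of the best rotation + blockwise inverse depth for a fixed
    translation direction t = (U, V, W). All pixel arrays are flattened;
    block_id[i] says which depth block pixel i belongs to."""
    U, V, W = t
    rho = Ix * (-U + x * W) + Iy * (-V + y * W)  # multiplies 1/Z
    ca = Ix * x * y + Iy * (1 + y ** 2)          # coefficient of rotation a
    cb = -Ix * (1 + x ** 2) - Iy * x * y         # coefficient of rotation b
    cg = Ix * y - Iy * x                         # coefficient of rotation g
    n = Ix.size
    A = np.zeros((n, 3 + n_blocks))
    A[:, 0], A[:, 1], A[:, 2] = ca, cb, cg
    A[np.arange(n), 3 + block_id] = rho          # shared 1/Z within a block
    sol, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return float(np.sum((A @ sol + It) ** 2))    # residual, shown as a color
```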
Ambiguous Error Surface (figure: error magnitude, colored from low to high) • The sphere represents the set of all possible translations • The colors code the residual error • Note the lowest errors (red) are not well-localized • Result: all methods of finding the solution for camera motion amount to minimizing an error function. For a conventional camera with a restricted field of view, this function has bad topography; that is, the minimum lies along valleys instead of at the bottom of a well. This is an illustration of the “translation/rotation ambiguity”.
Another approach, alternating minimization: (1) guess some rotation and depth, (2) solve for the best-fitting translation (given the guesses), (3) re-solve for the best rotation and depth (using the solution for translation), (4) re-solve for the best-fitting translation (using the solution for rotation), (5) re-solve for the best rotation and depth (using the solution for translation), … until the solution doesn’t change anymore. Can also do the same thing assuming locally planar (instead of constant) patches.
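A skeleton of that loop (the two solver callbacks are hypothetical stand-ins for the linear least-squares steps; each one is linear once the other set of unknowns is held fixed):

```python
import numpy as np

def alternating_minimization(solve_rotation_depth, solve_translation,
                             t0, max_iters=50, tol=1e-8):
    """Alternate two linear solves until the error stops changing."""
    t, prev_err = t0, np.inf
    for _ in range(max_iters):
        w, inv_depth = solve_rotation_depth(t)    # linear, given translation
        t, err = solve_translation(w, inv_depth)  # linear, given the rest
        if abs(prev_err - err) < tol:             # solution stopped changing
            break
        prev_err = err
    return t, w, inv_depth
```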
But maybe the objects don’t fit along the patch boundaries? We’re going to do crazier optimization, so let us first simplify the writing of the equation: r, f, and g are all things that we can measure at pixel i.
A quick matching game… match each symbol with its meaning: • rotational velocity of the camera • translational velocity of the camera • homogeneous image coordinate of pixel i • depth plane parameters • intensity derivative at pixel i • image measurements at pixel i related to the translational velocity of the camera • image measurements at pixel i related to the rotational velocity of the camera • Let’s assume the depth is “locally planar”. And let’s try to do better than assuming “small chunks” of the image fit a plane. What could be better? • Perhaps we can discover which chunks of the scene are co-planar? Before we discover, we need a way to represent which parts of the scene are co-planar. This can be done with a labeling function:
We are going to assume the scene fits “a couple” of depth planes. • Let A(ri) assign pixel i to one of the scene regions. • Now, how can we solve for both the depth/motion parameters and the scene segmentation? • More alternating minimization… • Guess t, w, A (assignment). • Iterate: • Find the best-fitting set of planes qj for the assignment. • Reassign points. • Solve for t, w.
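A sketch of that loop (the plane-fitting, residual, and motion solvers are hypothetical callbacks; each inner step reuses the linear machinery from before):

```python
import numpy as np

def segment_and_solve(pixels, assign, n_regions, t, w,
                      fit_plane, residual, solve_motion, n_iters=20):
    """Alternate: fit a plane q_j to each region, reassign each pixel to
    the plane that explains it best, then re-solve the camera motion."""
    for _ in range(n_iters):
        planes = [fit_plane([i for i in pixels if assign[i] == j], t, w)
                  for j in range(n_regions)]
        for i in pixels:  # reassign points to their best-fitting plane
            assign[i] = int(np.argmin([residual(i, q, t, w) for q in planes]))
        t, w = solve_motion(planes, assign)  # motion, given segmentation
    return t, w, assign, planes
```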
When it works, it is very very nice
Segmentation, dirty tricks: On some objects, the segmentation algorithm does not converge, or converges to a solution that does not correspond to depth discontinuities. The boundaries of the badly segmented regions still often indicate likely region boundaries. Allowing only boundary pixels to change regions is a trick sometimes used to get better segmentation.
Full solution: solve for the best motion in each frame. “Chaining” them together requires a solution for the translation magnitude. Why?
Ambiguities… The ambiguity is minimized if the camera is moving towards a point in the field of view, when there are sharp depth discontinuities in the scene, and when the camera field of view is large.
Other options (besides a collection of planes): could have connected triangular patches (a mesh). Vertices are mesh control points; each pixel in a triangle has a (1/Z) value which is a weighted average (weights are barycentric coordinates) of the triangle corners.
(Figure: a triangular mesh over the image, with control-point depths z1, z2, z3, z4.) Vertices are mesh control points; each pixel in a triangle has a (1/Z) value which is a weighted average (weights are barycentric coordinates) of the triangle corners. For a point in the triangle (z1, z2, z3): (1/Z) = 0.3 z1 + 0.3 z2 + 0.4 z3. For a (new) point in the triangle (z1, z2, z4): (1/Z) = 0.6 z1 + 0.2 z2 + 0.2 z4. It is still linear to solve for all the mesh control-point depths simultaneously. (Bad notation… the weights are the barycentric coordinates of pixel i.)
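A sketch of the barycentric bookkeeping (the triangle and query point are made-up numbers; in the full system each pixel’s rho coefficient would simply be spread across its three vertex columns with these weights, keeping everything linear in the mesh depths):

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates (w1, w2, w3) of 2D point p in triangle
    (a, b, c), so that (1/Z at p) = w1*z_a + w2*z_b + w3*z_c."""
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]], dtype=float)
    w1, w2 = np.linalg.solve(T, np.asarray(p, dtype=float) -
                                np.asarray(c, dtype=float))
    return w1, w2, 1.0 - w1 - w2

print(barycentric((0.4, 0.3), (0, 0), (1, 0), (0, 1)))  # (0.3, 0.4, 0.3)
```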
Recap: We solved the complete “structure from motion” problem using only image derivatives (never having to compute optic flow). The problem is non-linear in the depth and motion parameters (strictly speaking it is “multi-linear”), so we considered “alternating minimization” approaches. We also considered representations of depth: • constant/planar patches, • learned arbitrary-shaped patches, and • meshes. This is the end of so much geometry; the next couple of classes will be more video analysis and less geometry.