Computer Vision – Lecture 18: Motion and Optical Flow. 24.01.2012. Bastian Leibe, RWTH Aachen. http://www.mmp.rwth-aachen.de, leibe@umic.rwth-aachen.de. Many slides adapted from K. Grauman, S. Seitz, R. Szeliski, M. Pollefeys, S. Lazebnik.
Announcements (2) • Exercise sheet 7 out! • Optical flow • Tracking with a Kalman filter • Application: Tracking billiard balls • Exercise will be on Thu, 02.02. • Submit your results by the night of 01.02.
Course Outline • Image Processing Basics • Segmentation & Grouping • Object Recognition • Local Features & Matching • Object Categorization • 3D Reconstruction • Motion and Tracking • Motion and Optical Flow • Tracking with Linear Dynamic Models • Repetition
Recap: Structure from Motion • Given: m images of n fixed 3D points, xij = Pi Xj, i = 1, …, m, j = 1, …, n • Problem: estimate m projection matrices Pi and n 3D points Xj from the mn correspondences xij. [Figure: point Xj observed by cameras P1, P2, P3 at image locations x1j, x2j, x3j] B. Leibe Slide credit: Svetlana Lazebnik
Recap: Structure from Motion Ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. • More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change B. Leibe Slide credit: Svetlana Lazebnik
Recap: Hierarchy of 3D Transformations • With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. • Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. • Projective (15 dof): preserves intersection and tangency. • Affine (12 dof): preserves parallelism and volume ratios. • Similarity (7 dof): preserves angles and ratios of lengths. • Euclidean (6 dof): preserves angles and lengths. B. Leibe Slide credit: Svetlana Lazebnik
Recap: Affine Structure from Motion • Create a 2m×n data (measurement) matrix D by stacking the image coordinates of all n points in all m views. • The measurement matrix D = MS, with cameras M (2m×3) and points S (3×n), must have rank 3! C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. B. Leibe Slide credit: Svetlana Lazebnik
Recap: Affine Factorization • Obtaining a factorization from SVD: decompose D = UWVᵀ and keep only the three largest singular values, giving the rank-3 approximation D ≈ MS. • This decomposition minimizes |D − MS|². Slide credit: Martial Hebert
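The factorization step can be sketched in a few lines of NumPy. This is an illustrative toy example on synthetic noise-free data, not code from the lecture; the variable names are my own.

```python
import numpy as np

# Affine factorization of a synthetic measurement matrix D via SVD,
# in the spirit of Tomasi-Kanade. With m cameras and n points, D is 2m x n.
rng = np.random.default_rng(0)

m, n = 4, 10
S_true = rng.standard_normal((3, n))        # 3D shape (3 x n)
M_true = rng.standard_normal((2 * m, 3))    # stacked affine cameras (2m x 3)
D = M_true @ S_true                         # ideal measurement matrix, rank 3

# SVD and rank-3 truncation; split the singular values between the factors
U, w, Vt = np.linalg.svd(D, full_matrices=False)
M = U[:, :3] * np.sqrt(w[:3])               # camera estimate (2m x 3)
S = np.sqrt(w[:3])[:, None] * Vt[:3, :]     # shape estimate (3 x n)

# D = MS is recovered, but only up to an affine ambiguity A,
# since (M A)(A^-1 S) = M S for any invertible 3x3 matrix A.
print(np.allclose(M @ S, D))  # True
```

Note that M and S are not equal to M_true and S_true; resolving the affine ambiguity requires the metric-upgrade constraints discussed on the previous slides.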
Projective Structure from Motion • Given: m images of n fixed 3D points, zij xij = Pi Xj, i = 1, …, m, j = 1, …, n • Problem: estimate m projection matrices Pi and n 3D points Xj from the mn correspondences xij. • With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation Q: X → QX, P → PQ⁻¹. • We can solve for structure and motion when 2mn ≥ 11m + 3n − 15. • For two cameras, at least 7 points are needed. B. Leibe Slide credit: Svetlana Lazebnik
Projective Factorization • D = MS has rank 4, with cameras M (3m×4) and points S (4×n). • If we knew the depths z, we could factorize D to estimate M and S. • If we knew M and S, we could solve for z. • Solution: iterative approach (alternate between the two steps above). B. Leibe Slide credit: Svetlana Lazebnik
Sequential Structure from Motion • Initialize motion from two images using the fundamental matrix • Initialize structure • For each additional view: • Determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration • Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation • Refine structure and motion: bundle adjustment B. Leibe Slide credit: Svetlana Lazebnik
Bundle Adjustment • Non-linear method for refining structure and motion • Minimizes the mean squared reprojection error [Figure: point Xj, its reprojections P1Xj, P2Xj, P3Xj, and the measured image points x1j, x2j, x3j] B. Leibe Slide credit: Svetlana Lazebnik
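The objective on this slide was shown as an image; written out in the notation of the earlier slides (standard formulation, with D(·,·) denoting image distance), it is:

```latex
E\bigl(\{P_i\},\{\mathbf{X}_j\}\bigr)
  \;=\; \sum_{i=1}^{m}\sum_{j=1}^{n}
        D\!\left(\mathbf{x}_{ij},\, P_i \mathbf{X}_j\right)^{2}
```

Minimizing E jointly over all cameras Pi and points Xj is a large non-linear least-squares problem, typically solved with Levenberg-Marquardt exploiting the sparse structure of the Jacobian.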
Projective Ambiguity • If we don’t know anything about the camera or the scene, the best we can get with this is a reconstruction up to a projective ambiguity Q. • This can already be useful. • E.g. we can answer questions like “at what point does a line intersect a plane?” • If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade. • Need to put in additional knowledge about the camera (calibration) or about the scene (e.g. from markers). • Several methods available (see F&P Chapter 13.5 or H&Z Chapter 19) B. Leibe Images from Hartley & Zisserman
Practical Considerations (1) • Role of the baseline • Small baseline: large depth error • Large baseline: difficult search problem • Solution • Track features between frames until baseline is sufficient. Small Baseline Large Baseline B. Leibe Slide adapted from Steve Seitz
Practical Considerations (2) • There will still be many outliers • Incorrect feature matches • Moving objects • Apply RANSAC to get robust estimates based on the inlier points. • Estimation quality depends on the point configuration • Points that are close together in the image produce less stable solutions. • Subdivide the image into a grid and try to extract about the same number of features per grid cell. B. Leibe
Some Commercial Software Packages • boujou (http://www.2d3.com/) • PFTrack (http://www.thepixelfarm.co.uk/) • MatchMover (http://www.realviz.com/) • SynthEyes (http://www.ssontech.com/) • Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/) • Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/) B. Leibe
Applications: Matchmoving • Putting virtual objects into real-world videos • Original sequence • Tracked features • SfM results • Final video B. Leibe Videos from Stefan Hafeneger
Applications: Large-Scale SfM from Flickr S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, R. Szeliski, Building Rome in a Day, ICCV’09, 2009. (Video from http://grail.cs.washington.edu/rome/) B. Leibe
Topics of This Lecture • Introduction to Motion • Applications, uses • Motion Field • Derivation • Optical Flow • Brightness constancy constraint • Aperture problem • Lucas-Kanade flow • Iterative refinement • Global parametric motion • Coarse-to-fine estimation • Motion segmentation • KLT Feature Tracking B. Leibe
Video • A video is a sequence of frames captured over time • Now our image data is a function of space (x, y) and time (t) B. Leibe Slide credit: Svetlana Lazebnik
Applications of Segmentation to Video • Background subtraction • A static camera is observing a scene. • Goal: separate the static background from the moving foreground. • How can we estimate the background frame without access to an “empty” scene? B. Leibe Slide credit: Svetlana Lazebnik, Kristen Grauman
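One common answer to the question above is a temporal median: if each pixel shows the static background most of the time, the per-pixel median over many frames ignores transient foreground. A minimal sketch on synthetic data (my own illustration, not from the slides):

```python
import numpy as np

# Background estimation via the per-pixel temporal median of a clip.
T, H, W = 30, 8, 8
background = np.full((H, W), 100.0)
frames = np.tile(background, (T, 1, 1))

# Simulate a small bright object passing through the first few frames
for t in range(5):
    frames[t, 2:4, t:t + 2] = 255.0

bg_est = np.median(frames, axis=0)           # background estimate
fg_mask = np.abs(frames[0] - bg_est) > 50    # foreground mask for frame 0

print(np.allclose(bg_est, background))       # True: object is ignored
print(fg_mask[2, 0], fg_mask[0, 0])          # True False
```

Real systems refine this with running averages, per-pixel Gaussian mixtures, and shadow handling, but the median already works when the foreground occupies each pixel for a minority of frames.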
Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Commercial video is usually composed of shots or sequences showing the same objects or scene. • Goal: segment video into shots for summarization and browsing (each shot can be represented by a single keyframe in a user interface). • Difference from background subtraction: the camera is not necessarily stationary. B. Leibe Slide credit: Svetlana Lazebnik
Applications of Segmentation to Video • Background subtraction • Shot boundary detection • For each frame, compute the distance between the current frame and the previous one: • Pixel-by-pixel differences • Differences of color histograms • Block comparison • If the distance is greater than some threshold, classify the frame as a shot boundary. B. Leibe Slide credit: Svetlana Lazebnik
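The histogram-difference variant above can be sketched in a few lines. This is an illustrative toy detector on synthetic frames (names and threshold are my own choices, not from the lecture):

```python
import numpy as np

# Shot boundary detection: threshold the L1 distance between gray-level
# histograms of consecutive frames.
def shot_boundaries(frames, bins=16, threshold=0.5):
    """Return indices t where frame t starts a new shot."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize: distances then lie in [0, 2]
    return [t for t in range(1, len(frames))
            if np.abs(hists[t] - hists[t - 1]).sum() > threshold]

rng = np.random.default_rng(2)
dark = rng.integers(0, 64, size=(8, 8))       # "shot 1": dark frames
bright = rng.integers(192, 256, size=(8, 8))  # "shot 2": bright frames
frames = [dark] * 3 + [bright] * 3

print(shot_boundaries(frames))  # [3]
```

Histograms are robust to motion within a shot (unlike pixel-wise differences), which is exactly why the slide lists them alongside block comparison.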
Applications of Segmentation to Video • Background subtraction • Shot boundary detection • Motion segmentation • Segment the video into multiple coherently moving objects B. Leibe Slide credit: Svetlana Lazebnik
Motion and Perceptual Organization • Sometimes, motion is the only cue… B. Leibe Slide credit: Svetlana Lazebnik
Motion and Perceptual Organization • Sometimes, motion is the foremost cue B. Leibe Slide credit: Kristen Grauman
Motion and Perceptual Organization • Even “impoverished” motion data can evoke a strong percept B. Leibe Slide credit: Svetlana Lazebnik
Uses of Motion • Estimating 3D structure • Directly from optic flow • Indirectly to create correspondences for SfM • Segmenting objects based on motion cues • Learning dynamical models • Recognizing events and activities • Improving video quality (motion stabilization) B. Leibe Slide adapted from Svetlana Lazebnik
Motion Estimation Techniques • Direct methods • Directly recover image motion at each pixel from spatio-temporal image brightness variations • Dense motion fields, but sensitive to appearance variations • Suitable for video and when image motion is small • Feature-based methods • Extract visual features (corners, textured areas) and track them over multiple frames • Sparse motion fields, but more robust tracking • Suitable when image motion is large (10s of pixels) B. Leibe Slide credit: Steve Seitz
Motion Field • The motion field is the projection of the 3D scene motion into the image B. Leibe Slide credit: Svetlana Lazebnik
Motion Field and Parallax • P(t) is a moving 3D point • Velocity of the scene point: V = dP/dt • p(t) = (x(t), y(t)) is the projection of P in the image. • Apparent velocity v in the image: given by the components vx = dx/dt and vy = dy/dt. • These components are known as the motion field of the image. [Figure: P moves from P(t) to P(t+dt) with velocity V; its projection moves from p(t) to p(t+dt) with velocity v] B. Leibe Slide credit: Svetlana Lazebnik
Motion Field and Parallax • To find the image velocity v, differentiate p with respect to t, using the quotient rule D(f/g) = (g f′ − g′ f)/g². • Image motion is a function of both the 3D motion (V) and the depth of the 3D point (Z). B. Leibe Slide credit: Svetlana Lazebnik
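The equations on this slide were shown as images; a standard reconstruction under perspective projection with focal length f is:

```latex
\mathbf{p} = f\,\frac{\mathbf{P}}{Z}, \qquad
\mathbf{v} = \frac{d\mathbf{p}}{dt}
           = f\,\frac{Z\,\mathbf{V} - V_z\,\mathbf{P}}{Z^{2}}
           = \frac{f\,\mathbf{V} - V_z\,\mathbf{p}}{Z}
```

The final form makes the slide's point explicit: v depends on both the 3D velocity V and the depth Z, with closer points (small Z) moving faster in the image.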
Motion Field and Parallax • Pure translation: V is constant everywhere • Vz is nonzero: • Every motion vector points toward (or away from) v0, the vanishing point of the translation direction. • Vz is zero: • Motion is parallel to the image plane; all the motion vectors are parallel. • The length of the motion vectors is inversely proportional to the depth Z. B. Leibe Slide credit: Svetlana Lazebnik
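For reference, the pure-translation case can be written out (a standard result, reconstructing the slide's equations). Substituting a constant V into the motion-field formula and defining the vanishing point v0:

```latex
\mathbf{v}_0 = \frac{f}{V_z}\begin{pmatrix}V_x\\[2pt] V_y\end{pmatrix},
\qquad
\mathbf{v} = \frac{f\,\mathbf{V} - V_z\,\mathbf{p}}{Z}
           = \frac{V_z}{Z}\,\bigl(\mathbf{v}_0 - \mathbf{p}\bigr)
```

Every vector is parallel to (v0 − p), hence points toward or away from v0; and when Vz = 0 the first term f V/Z remains, giving parallel vectors with length proportional to 1/Z.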
Optical Flow • Definition: optical flow is the apparent motion of brightness patterns in the image. • Ideally, optical flow would be the same as the motion field. • Have to be careful: apparent motion can be caused by lighting changes without any actual motion. • Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination. B. Leibe Slide credit: Svetlana Lazebnik
Apparent Motion vs. Motion Field [Figures from the Horn book] B. Leibe Slide credit: Kristen Grauman
Estimating Optical Flow • Given two subsequent frames, estimate the apparent motion field u(x,y) and v(x,y) between them. • Key assumptions • Brightness constancy: projection of the same point looks the same in every frame. • Small motion: points do not move very far. • Spatial coherence: points move like their neighbors. I(x,y,t–1) I(x,y,t) B. Leibe Slide credit: Svetlana Lazebnik
The Brightness Constancy Constraint • Brightness Constancy Equation: I(x, y, t−1) = I(x + u(x,y), y + v(x,y), t) • Linearizing the right-hand side using a Taylor expansion: I(x+u, y+v, t) ≈ I(x, y, t) + Ix·u + Iy·v • Hence, Ix·u + Iy·v + It ≈ 0, where Ix, Iy are the spatial derivatives and It is the temporal derivative. B. Leibe Slide credit: Svetlana Lazebnik
The Brightness Constancy Constraint • How many equations and unknowns per pixel? • One equation, two unknowns • Intuitively, what does this constraint mean? • The component of the flow perpendicular to the gradient (i.e., parallel to the edge) is unknown. • If (u, v) satisfies the equation, so does (u+u′, v+v′) whenever ∇I · (u′, v′)ᵀ = 0. B. Leibe Slide credit: Svetlana Lazebnik
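A small sketch of this underdetermination (my own illustration with made-up derivative values, not from the slides): from one brightness-constancy equation we can only recover the normal flow, the component along the image gradient, while any motion along the edge direction is invisible.

```python
import numpy as np

# One brightness-constancy equation Ix*u + Iy*v + It = 0 at a single pixel
# cannot determine (u, v); only the component along the gradient is fixed.
Ix, Iy, It = 3.0, 4.0, -10.0            # hypothetical derivatives at one pixel

g = np.array([Ix, Iy])                  # image gradient
normal_mag = -It / np.linalg.norm(g)    # flow magnitude along the gradient
u_n = normal_mag * g / np.linalg.norm(g)  # the normal-flow vector

# Any flow u_n + t * (-Iy, Ix), i.e. extra motion along the edge,
# satisfies the constraint equally well -- the aperture problem.
edge_dir = np.array([-Iy, Ix])
for t in (0.0, 1.0, -2.5):
    u, v = u_n + t * edge_dir
    print(np.isclose(Ix * u + Iy * v + It, 0.0))  # True each time
```

This is exactly why Lucas-Kanade, introduced next, pools the constraint over a neighborhood of pixels to obtain a solvable 2×2 system.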
The Aperture Problem Perceived motion B. Leibe Slide credit: Svetlana Lazebnik
The Aperture Problem Actual motion B. Leibe Slide credit: Svetlana Lazebnik
The Barber Pole Illusion http://en.wikipedia.org/wiki/Barberpole_illusion B. Leibe Slide credit: Svetlana Lazebnik