Formation et Analyse d’Images, Session 11. Daniela Hall, 12 December 2005
Course overview • Session 1 (19/09/05) • Overview • Human vision • Homogeneous coordinates • Camera models • Session 2 (26/09/05) • Tensor notation • Image transformations • Homography computation • Session 3 (3/10/05) • Camera calibration • Reflection models • Color spaces • Session 4 (10/10/05) • Pixel based image analysis • 17/10/05: course replaced by Modelisation surfacique
Course overview • Session 5 + 6 (24/10/05) 9:45 – 12:45 • Contrast description • Hough transform • Session 7 (7/11/05) • Kalman filter • Session 8 (14/11/05) • Tracking of regions, pixels, and lines • Session 9 (21/11/05) • Gaussian filter operators • Session 10 (5/12/05) • Scale Space • Session 11 (12/12/05) • Stereo vision • Epipolar geometry • Session 12 (16/01/06): exercises and questions
Session overview • Stereo vision • Epipolar geometry • 3d point position from two views using epipolar geometry • 3d point position from two views when camera models are known.
Human stereo vision • Two Eyes = Three Dimensions (3D)! Each eye captures its own view and the two separate images are sent on to the brain for processing. • When the two images arrive simultaneously in the back of the brain, they are united into one picture. The mind combines the two images by matching up the similarities and adding in the small differences. • The small differences between the two images add up to a big difference in the final picture! The combined image is more than the sum of its parts. It is a three-dimensional stereo picture. • The word "stereo" comes from the Greek word "stereos" which means firm or solid. With stereo vision you see an object as solid in three spatial dimensions--width, height and depth--or x, y and z.
Computer stereo vision • Stereo vision makes it possible to estimate the 3D position of a scene point X from its positions x, x’ in 2 images taken from different camera positions. • The two views can be acquired simultaneously with two cameras or sequentially with one camera in motion. • Each view has an associated camera matrix P, P’. • The 3d point X is imaged as x=PX in the first view and x’=P’X in the second view. • x and x’ correspond because they are the images of the same point in 3d. Source: Hartley, Zisserman: Multiple view geometry in computer vision, Cambridge, 2000. http://www.robots.ox.ac.uk/~vgg/hzbook/
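The relation x=PX can be tried out numerically. The following is a minimal sketch, not taken from the slides: the two camera matrices and the scene point are hypothetical values chosen for illustration, and `project` is an assumed helper name.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (3-vector) with a 3x4 camera matrix P; return (x, y)."""
    Xh = np.append(np.asarray(X, dtype=float), 1.0)  # homogeneous 4-vector
    xh = P @ Xh                                      # homogeneous image point
    return xh[:2] / xh[2]                            # divide by the third coordinate

# Hypothetical cameras: the first sits at the origin, the second is shifted by 1 along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X = np.array([2.0, 1.0, 4.0])   # hypothetical scene point
x1 = project(P1, X)             # image of X in the first view
x2 = project(P2, X)             # image of X in the second view
```

The two image points differ only in their first coordinate here, because the assumed second camera is displaced purely along x.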
Topics in stereo vision • Correspondence geometry (epipolar geometry): • given an image point x in the first view, how does it constrain the corresponding point x’ in the second view? • Camera geometry (motion): • Given a set of corresponding points {xi, x’i}, what are the cameras P, P’ of the two views? • Scene geometry: • Given corresponding image points {xi,x’i} and cameras P, P’, what is the position of X in 3d?
Epipolar geometry • A point in one view defines an epipolar line in the other view on which the corresponding point lies. • The epipolar geometry depends only on the cameras: their relative position and their internal parameters. • The epipolar geometry is represented by a 3x3 matrix called the fundamental matrix F.
Epipolar geometry (figure). Thanks to Andrew Zisserman and Richard Hartley for all figures.
Notations • X: 3d point • C, C’: 3d positions of the camera centers • I, I’: image planes • x, x’: 2d positions of the 3d point X in the images I, I’ of the cameras C, C’ • pi: epipolar plane. C, x, e, e’, x’, C’ and X all lie on pi. • e, e’: epipoles. e is the 2d position of the camera center C’ in image I, and e’ is the 2d position of the camera center C in image I’. C, e, e’, C’ lie on the baseline. • l, l’: epipolar lines. l’ is the intersection of the image plane I’ with the epipolar plane pi, which is spanned by the baseline CC’ and the ray Cx. The corresponding point x’ must lie on l’.
Epipolar geometry • For any two fixed cameras we have one baseline. • For any 3d point X we have a different epipolar plane pi. • All epipolar planes intersect at the baseline.
Epipolar line • Suppose we know only x and the baseline. • How is the corresponding point x’ in the other image constrained? • pi is defined by the baseline and the ray Cx. • The epipolar line l’ is the image of this ray in the other image. x’ must lie on l’. • The benefit of the epipolar line is that the correspondence search can be restricted to l’ instead of searching the entire image.
Epipolar terminology • Epipole: • intersection of the line joining the camera centers (baseline) and the image plane. • the image of the other camera center in the image plane. • intersection of the epipolar lines. • Epipolar plane: • a plane containing the baseline. There is a one-parameter family of epipolar planes for a fixed camera pair. • Epipolar line: • intersection of the epipolar plane with the image plane. • all epipolar lines intersect in the epipole. • an epipolar plane intersects both image planes and defines correspondences between the lines.
The fundamental matrix • The fundamental matrix is the algebraic representation of the epipolar geometry. • Derivation of F: • map the point x to some point x’ in the other image • l’ is obtained as the line joining x’ and the epipole e’ • F can be computed from these elements • Relation of x and epipolar line: $l' = F x$ • Equation for F: $F = [e']_\times P' P^+$, where e’ is the epipole and $[e']_\times$ its skew-symmetric matrix • Relation between cross product and skew-symmetric matrix: $a \times b = [a]_\times b$
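The elements named on this slide can be put together in a few lines. This is a sketch under assumptions: it uses the formula F = [e’]x P’ P+ that appears later in these slides, the function names are invented for illustration, and the camera pair and image points are hypothetical.

```python
import numpy as np

def skew(v):
    """Return the skew-symmetric matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F = [e']_x P' P^+, with e' = P' C and C the null vector of P."""
    _, _, Vt = np.linalg.svd(P1)
    C = Vt[-1]                      # camera center of view 1: P1 @ C = 0
    e2 = P2 @ C                     # epipole in view 2 (image of C)
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical camera pair: second camera shifted by 1 along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
F = fundamental_from_cameras(P1, P2)

# Images of the hypothetical scene point (2, 1, 4) in both views, homogeneous.
x1 = np.array([0.5, 0.25, 1.0])
x2 = np.array([0.25, 0.25, 1.0])
residual = x2 @ F @ x1              # should vanish for corresponding points
```

F is only defined up to scale, so its overall sign and magnitude carry no meaning; what matters is that the residual vanishes for a true correspondence.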
Correspondence condition • The fundamental matrix satisfies the condition that for any pair of corresponding points x, x’ in the two images $x'^T F x = 0$. • This is true because if x and x’ correspond, then x’ lies on the epipolar line l’, that is $x'^T l' = 0$. Since we know l’=Fx, we can write $x'^T F x = 0$. • The importance of this relation is that we can compute the fundamental matrix from point correspondences alone. We need at least 7 point correspondences (details in chap. 10 of the Hartley, Zisserman book).
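Computing the fundamental matrix • Given sufficiently many point matches xi, xi’, the equation x’TFx=0 can be used to compute F. • Writing x=(x,y,1)T and x’=(x’,y’,1)T, each point match gives rise to one linear equation in the unknown entries of F: $x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0$. • Writing the 9 unknowns of F as a vector f, this becomes $(x'x, x'y, x', y'x, y'y, y', x, y, 1) f = 0$, and stacking one such row per match gives $A f = 0$. • Using the last equation, SVD provides a direct solution for F: f is the right singular vector of A with the smallest singular value.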
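The linear solution by SVD can be sketched as follows. Assumptions to note: this is the plain linear method, which needs at least 8 matches (reaching the 7-point minimum requires an extra nonlinear step not shown here), it omits the coordinate normalisation usually recommended for noisy data, and the cameras and scene points are synthetic values invented for the test.

```python
import numpy as np

def estimate_fundamental(pts1, pts2):
    """Linear estimate of F from n >= 8 matches; pts1, pts2 are (n, 2) arrays."""
    x, y = pts1[:, 0], pts1[:, 1]
    xp, yp = pts2[:, 0], pts2[:, 1]
    # One row (x'x, x'y, x', y'x, y'y, y', x, y, 1) per match.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones(len(x))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # singular vector of the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                      # enforce rank 2
    return U @ np.diag(S) @ Vt

def project_all(P, X):
    """Project an (n, 3) array of points with a 3x4 camera matrix."""
    Xh = np.column_stack([X, np.ones(len(X))])
    xh = Xh @ P.T
    return xh[:, :2] / xh[:, 2:3]

# Synthetic, noise-free test data (hypothetical cameras and points).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 12), rng.uniform(-1, 1, 12),
                     rng.uniform(4, 8, 12)])
theta = 0.1
Rot = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0, 1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([Rot, np.array([[-1.0], [0.1], [0.2]])])
pts1, pts2 = project_all(P1, X), project_all(P2, X)

F_est = estimate_fundamental(pts1, pts2)
h1 = np.column_stack([pts1, np.ones(len(pts1))])
h2 = np.column_stack([pts2, np.ones(len(pts2))])
residuals = np.einsum('ni,ij,nj->n', h2, F_est, h1)   # x'^T F x for every match
```

With exact data the residuals are zero up to rounding; with measured points they are not, which is why many more than the minimum number of matches is normally used.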
The fundamental matrix • F allows computing the epipolar line l’ in I’ for a point x in I. x’ lies on l’. • F also allows computing the epipolar line l in I for a point x’ in I’. We can verify the point correspondence x, x’, because x must lie on l. • In the course 3d vision, you will see that F is used to compute the camera projection model P for each camera. With these camera models you can estimate the 3d position of a point without calibrating the cameras (self-calibration).
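Both uses of F listed here can be illustrated with a small sketch. The matrix F below is a hypothetical example (it corresponds to two identical cameras displaced along x), and the helper names are assumptions made for this illustration.

```python
import numpy as np

def epipolar_line(F, x):
    """Epipolar line l' = F x in the second image for a point x = (x, y) of the first."""
    return F @ np.array([x[0], x[1], 1.0])

def point_line_distance(line, x):
    """Distance of the point x = (x, y) from the line (a, b, c) with ax + by + c = 0."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / np.hypot(a, b)

# Hypothetical fundamental matrix of a camera pair displaced along x.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])

l2 = epipolar_line(F, (0.5, 0.25))                  # line in I' for x in I
d_good = point_line_distance(l2, (0.25, 0.25))      # a true match lies on l'
d_bad = point_line_distance(l2, (0.25, 0.75))       # a wrong candidate does not

l1 = epipolar_line(F.T, (0.25, 0.25))               # line in I for x' in I'
d_back = point_line_distance(l1, (0.5, 0.25))       # verification in the first image
```

In a correspondence search, candidates whose distance to the epipolar line exceeds a small threshold can be rejected without comparing image content.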
Computing the 3d position of a point • Compute the fundamental matrix from at least 7 point correspondences. • Determine two camera projection matrices. • Determine a point correspondence x, x’ in the two views. • The 3d position X of the image points x and x’ can be computed directly as intersection of the two rays defined by Cx and C’x’.
Computing the 3d position of a point (figure: the two rays L and U through the camera centers C and C’)
Camera projection matrices P, P’ • We set the origin of the world to the camera center C of the first camera. Then the projection matrix P of the first camera is, in canonical form, $P = [I \mid 0]$. • The projection matrix P’ of the second camera is expressed relative to this frame. • It can be computed by solving $F = [e']_\times P' P^+$ for P’. • C is the null vector of P: $PC = 0$.
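One known solution of F = [e’]x P’ P+ with P = [I | 0] is the canonical pair from the Hartley and Zisserman book, P’ = [[e’]x F | e’]. The sketch below uses that choice as an assumption; it may not be the exact form shown on the original slide, the function names are invented, and the matrix F is a hypothetical example. The recovered cameras are only determined up to a projective transformation.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_fundamental(F):
    """Canonical camera pair P = [I | 0], P' = [[e']_x F | e'] (assumed choice)."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]                                   # epipole e': F.T @ e2 = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2.reshape(3, 1)])
    return P1, P2, e2

# Hypothetical rank-2 fundamental matrix.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])
P1, P2, e2 = cameras_from_fundamental(F)

C = np.array([0.0, 0.0, 0.0, 1.0])                  # center of the first camera
F_back = skew(e2) @ P2 @ np.linalg.pinv(P1)         # F rebuilt from the cameras
```

Because e2 has unit length, the rebuilt matrix equals -F, which is the same fundamental matrix up to scale.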
Defining the rays • The ray backprojected from x by P is obtained • by solving PX=x • by using 2 points on the ray and computing the tensor (see Session 1) • Two points on the ray are known: the camera center C and the point $P^+ x$, where $P^+$ is the pseudo-inverse of P. • Line equations using tensor notation: one line is defined by x and C, the other by x’ and C’. The 3d point is the intersection of the two lines L and U.
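The two points that define the ray can be computed directly. This sketch parametrises the ray as X(lambda) = P+ x + lambda C, a standard form that is assumed here rather than the tensor form of the slide; the camera matrix, the image point and the function names are hypothetical.

```python
import numpy as np

def backproject(P, x):
    """Return two homogeneous points on the ray of x = (x, y): the center C and P^+ x."""
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]                                       # null vector of P: P @ C = 0
    X0 = np.linalg.pinv(P) @ np.array([x[0], x[1], 1.0])
    return C, X0

def ray_point(C, X0, lam):
    """Homogeneous point X(lam) = X0 + lam * C on the backprojected ray."""
    return X0 + lam * C

# Hypothetical camera shifted by 1 along x, and a hypothetical image point.
P = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
x = np.array([0.25, 0.25])
C, X0 = backproject(P, x)

# Every point of the ray projects back to the same image point x.
reprojected = []
for lam in (0.0, 2.0, -3.0):
    h = P @ ray_point(C, X0, lam)
    reprojected.append(h[:2] / h[2])
```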
Intersecting the rays • In real-world applications x’TFx = 0 may not hold exactly because of imprecise measurements. • This means that the rays L and U may not intersect (they are skew). • In these cases you need to find the point with minimum distance to both L and U. You can solve this by SVD.
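One common way to handle skew rays is to take the midpoint of the shortest segment between them. The sketch below assumes Euclidean coordinates, with each ray given by a point and a direction, and uses `np.linalg.lstsq`, which solves the least-squares problem with an SVD-based routine. The function name and the example rays are hypothetical.

```python
import numpy as np

def closest_point_between_rays(c1, d1, c2, d2):
    """Midpoint of the shortest segment between c1 + s*d1 and c2 + t*d2, and its length."""
    A = np.column_stack([d1, -d2])                   # s*d1 - t*d2 = c2 - c1
    (s, t), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    p1 = c1 + s * d1                                 # closest point on the first ray
    p2 = c2 + t * d2                                 # closest point on the second ray
    return 0.5 * (p1 + p2), np.linalg.norm(p1 - p2)

# Hypothetical skew rays: the z axis, and a line parallel to x at y = 1, z = 2.
c1, d1 = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
c2, d2 = np.array([-1.0, 1.0, 2.0]), np.array([1.0, 0.0, 0.0])
X_mid, gap = closest_point_between_rays(c1, d1, c2, d2)
```

The returned gap is a useful sanity check: a large value suggests a wrong correspondence rather than measurement noise.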
Direct computation of 3d point position • Choose a calibration object, whose 3d position is known • Calibrate the cameras (compute the camera model MIS and NIS from at least 5 ½ points) • Then from a correspondence P, Q the 3d position of R can be computed directly.
Direct computation of 3d point position • R is at the intersection of 3 planes
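Camera model • Equation: $P_I = M_{IS} R_S$, that is $(w i, w j, w)^T = M_{IS} (x, y, z, 1)^T$ • Image coordinates: $i = \frac{m_1 \cdot R_S}{m_3 \cdot R_S}$, $j = \frac{m_2 \cdot R_S}{m_3 \cdot R_S}$, where $m_1, m_2, m_3$ are the rows of MIS.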
Transformation image-scene • Problem: we need to know the depth zs for each image position. Since zs can change, a general form of MSI cannot exist. • Any point in I is the image of all the points on a ray.
Calibration • Construct a calibration object whose 3D position is known. • Measure the image coordinates. • Determine correspondences between 3D points RSk and image points PIk. • MIS has 11 DoF and each correspondence gives 2 equations, so we need at least 5 ½ correspondences.
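Calibration • For each correspondence of scene point RSk and image point PIk = (ik, jk) we have $P_{Ik} = M_{IS} R_{Sk}$ • which gives the following equations for k=1, ..., 6: $(m_1 - i_k m_3) \cdot R_{Sk} = 0$ and $(m_2 - j_k m_3) \cdot R_{Sk} = 0$, where $m_1, m_2, m_3$ are the rows of MIS • from which MIS can be computed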
Properties of MIS • The first equation defines a plane that goes through the camera center and the image plane in x direction • The second equation defines a plane that goes through the camera center and the image plane in y direction.
Calibration using many points • For k=5 ½, M has one solution. • The solution depends on precise measurements of the 3D and 2D points. • If you use another 5 ½ points you will get a different solution. • A more stable solution is found by using a large number of points and optimising over all of them.
Calibration using many points • For each point correspondence we know (i,j) and R. • We want to know MIS. Solve the equation with your favorite algorithm (least squares, Levenberg-Marquardt, SVD, ...)
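As one possible choice among the algorithms listed, the camera model can be estimated by a linear least-squares solution with SVD. The sketch assumes that each correspondence contributes the two equations (m1 - i m3) . R = 0 and (m2 - j m3) . R = 0, with m1, m2, m3 the rows of the camera matrix; the function name, the ground-truth camera and the scene points are hypothetical test values.

```python
import numpy as np

def calibrate(points_3d, points_2d):
    """Estimate the 3x4 camera matrix from n >= 6 scene/image correspondences."""
    rows = []
    for (x, y, z), (i, j) in zip(points_3d, points_2d):
        R = np.array([x, y, z, 1.0])
        rows.append(np.concatenate([R, np.zeros(4), -i * R]))   # (m1 - i*m3) . R = 0
        rows.append(np.concatenate([np.zeros(4), R, -j * R]))   # (m2 - j*m3) . R = 0
    A = np.array(rows)                               # 2n x 12 system A m = 0
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)                      # defined up to scale and sign

# Hypothetical ground-truth camera and calibration points (noise-free).
M_true = np.array([[2.0, 0.0, 0.5, 1.0],
                   [0.0, 2.0, 0.3, 0.5],
                   [0.0, 0.0, 1.0, 5.0]])
rng = np.random.default_rng(1)
scene = rng.uniform(-1.0, 1.0, size=(10, 3))
homog = np.column_stack([scene, np.ones(10)]) @ M_true.T
image = homog[:, :2] / homog[:, 2:3]

M_est = calibrate(scene, image)
M_est = M_est * (M_true[2, 3] / M_est[2, 3])         # fix the free scale for comparison
```

With noisy measurements the linear estimate is often used only as the starting point of a nonlinear refinement such as Levenberg-Marquardt.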
Computation of 3d point position • We have the camera model MIS of camera 1 and the camera model NIS of camera 2. • We have a point PI in camera 1 and a point QI in camera 2 which correspond (that means they are the image of the same scene point RS). • The position of RS can be computed by the intersection of 3 planes.
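Computation of 3d point position • PI = MISRS, PI=(i,j), QI=NISRS, QI=(u,v) • We have the following equations: $(m_1 - i\, m_3) \cdot R_S = 0$, $(m_2 - j\, m_3) \cdot R_S = 0$, $(n_1 - u\, n_3) \cdot R_S = 0$, $(n_2 - v\, n_3) \cdot R_S = 0$, where $m_k$ and $n_k$ are the rows of MIS and NIS. • RS can be found by using 3 of those 4 equations.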
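Computation of 3d point position • The point RS=(x,y,z,1) is computed as follows: each plane equation is linear in (x, y, z), so three of them form a 3x3 linear system, which is solved for (x, y, z).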
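The intersection of three planes amounts to solving a 3x3 linear system. The sketch below assumes the plane equations (m1 - i m3) . R = 0, (m2 - j m3) . R = 0, (n1 - u n3) . R = 0 and (n2 - v n3) . R = 0, where m and n denote the rows of the two camera models; the function names, camera matrices and image points are hypothetical.

```python
import numpy as np

def plane_rows(M, N, p, q):
    """The four plane equations a . (x, y, z, 1) = 0 given by P = (i, j) and Q = (u, v)."""
    (i, j), (u, v) = p, q
    return np.array([M[0] - i * M[2],
                     M[1] - j * M[2],
                     N[0] - u * N[2],
                     N[1] - v * N[2]])

def triangulate(M, N, p, q, use=(0, 1, 2)):
    """Intersect three of the four planes (chosen by `use`)."""
    rows = plane_rows(M, N, p, q)[list(use)]
    return np.linalg.solve(rows[:, :3], -rows[:, 3])

def triangulate_lstsq(M, N, p, q):
    """Least-squares variant that uses all four planes."""
    rows = plane_rows(M, N, p, q)
    sol, *_ = np.linalg.lstsq(rows[:, :3], -rows[:, 3], rcond=None)
    return sol

# Hypothetical camera models: second camera shifted by 1 along x.
M = np.hstack([np.eye(3), np.zeros((3, 1))])
N = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
p, q = (0.5, 0.25), (0.25, 0.25)      # images of the scene point (2, 1, 4)

R3 = triangulate(M, N, p, q)          # planes 1, 2 and 3
R4 = triangulate_lstsq(M, N, p, q)    # all four planes
```

The choice of the three equations matters: for this horizontally displaced pair the second and fourth planes coincide, so a selection containing both is singular. Using all four equations in a least-squares sense avoids having to choose.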