4054 Machine Vision: Two or more cameras
Dr. Simon Prince
Dept. Computer Science, University College London
http://www.cs.ucl.ac.uk/s.prince/4054.htm
Two or More Cameras
• Introduction to 3D vision
• Stereo vision
  • Geometry of two cameras
  • Finding image keypoints
  • Finding correspondences between keypoints
  • Sparse stereo reconstruction
  • Dense stereo reconstruction
• Shape from silhouette
Two or More Cameras 1. Introduction to 3D vision
Seeing the world
Perspective projection converts from 3-D to 2-D.
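A minimal numpy sketch of this projection, with hypothetical intrinsics K and an identity camera pose; note how two scene points at different depths can land on the same pixel, which is exactly the depth information a single image discards:

```python
import numpy as np

def project(X, K, R, t):
    """Pinhole projection of Nx3 world points X to pixel coordinates."""
    Xc = X @ R.T + t              # world -> camera coordinates
    w = Xc @ K.T                  # apply the intrinsic matrix
    return w[:, :2] / w[:, 2:3]   # perspective division discards depth

# Hypothetical intrinsics: 800-pixel focal length, 640x480 image
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Two scene points at different depths project to the SAME pixel
X = np.array([[0.1, 0.0, 2.0],
              [0.2, 0.0, 4.0]])
print(project(X, K, np.eye(3), np.zeros(3)))   # both ~ (360, 240)
```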
3D shape from 2D images
• Single image cues
  • Perspective
  • Contour
  • Texture
  • Aerial perspective
  • Shading
3D shape from 2D images
• Multiple image cues
  • space (stereo)
  • time (motion)
  • focus (depth from focus)
  • silhouette
Applications of 3D Models
• Building models for computer graphics
• Helping segmentation (breaking camouflage)
• Cartography (satellite imagery)
• Preserving ancient/cultural monuments
• Rendering new views of objects
• Measuring 3D face shape for biometrics
[Figure: gaze-corrected re-rendering; panels labelled wrong gaze / right gaze / wrong gaze.]
Two or More Cameras 2. Stereo Vision
Stereo Vision
• Stereo vision is the ability to infer the 3D structure of a scene from two or more images taken from different viewpoints.
• This is achieved by observing the difference in position of image points that correspond to the same scene point.
Binocular Stereo
Common in biological vision… [Figure: a 3D scene imaged by two cameras, and a PC reconstruction in which brighter means closer] …can we duplicate this ability with machine vision?
The principle of stereo vision is TRIANGULATION. [Figure: rays from the two optical centres O and O’ through the matched image points intersect at the scene point.]
Stereo Vision Problems
CALIBRATION: Establish the geometric relationship between the cameras.
Stereo Vision Problems
CORRESPONDENCE: Find pairs of matching points in the two images.
Stereo Vision Problems
RECONSTRUCTION: Calculate the three-dimensional position of a point in the scene.
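A minimal sketch of the reconstruction step via linear (DLT) triangulation, assuming calibration has already supplied the two 3x4 projection matrices; the intrinsics, baseline and test point below are hypothetical values:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one scene point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: matched pixel positions.
    Each view contributes two rows of A from the cross-product form of
    the projection equation; the point is A's null vector.
    """
    A = np.stack([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenise

# Hypothetical rectified pair: identical intrinsics, 0.2 m baseline
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.3, 0.1, 2.5, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate(P1, P2, x1, x2))   # ~ [0.3, 0.1, 2.5]
```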
Two or More Cameras 2.1 Geometry of Two Cameras
Correspondence
THE BAD NEWS: for a given point in the first image (the centre of the square), we aim to find the same point in the second image. Numerous regions look like plausible matches, and a two-dimensional search over the whole image is very expensive.
Epipolar Geometry
THE GOOD NEWS: we do not need to perform a 2D search for correspondence if we know the geometry of the stereo cameras: the match for a point in one image must lie on a single epipolar line in the other.
Epipoles
The epipole is the image of the optical centre of the other camera. It is guaranteed to lie on every epipolar line.
The Essential Matrix
Working in the coordinate system of the second camera, three vectors are coplanar: the ray x’ from the second optical centre to the scene point, the translation t between the two optical centres, and the ray from the first camera rotated into this frame, Rx. The condition for coplanarity of vectors a, b, c is a . (b x c) = 0.
The Essential Matrix
The cross product can be expressed in matrix form as t x a = [t]x a, where [t]x is the skew-symmetric matrix
  [  0  -t3   t2 ]
  [  t3   0  -t1 ]
  [ -t2   t1   0 ].
Substituting into the coplanarity condition x’ . (t x Rx) = 0 gives x’^T [t]x R x = 0, which can also be expressed as x’^T E x = 0, where E = [t]x R is known as the essential matrix.
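A short numpy sketch of this construction, assuming the convention that a point X in camera-1 coordinates maps to RX + t in camera-2 coordinates; R, t and the scene point are hypothetical:

```python
import numpy as np

def skew(t):
    """Matrix form of the cross product: skew(t) @ a == np.cross(t, a)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

R = np.eye(3)                     # hypothetical relative rotation
t = np.array([1.0, 0.0, 0.0])     # hypothetical translation (baseline)
E = skew(t) @ R                   # essential matrix E = [t]x R

# Check the constraint x'^T E x = 0 on a matched pair of camera rays
X = np.array([0.5, 0.2, 3.0])     # scene point in camera-1 coordinates
x = X / X[2]                      # its ray in camera 1
Xp = R @ X + t                    # the same point in camera-2 coordinates
xp = Xp / Xp[2]                   # its ray in camera 2
print(xp @ E @ x)                 # ~ 0
```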
The Essential Matrix
Relation: x’^T E x = 0. Define l = E^T x’; substituting in gives l . x = 0, i.e. l is the epipolar line in the 1st camera. The epipole lies on every epipolar line, so for the epipole e in the first camera the essential matrix relation is satisfied for every x’. This implies that e lies in the nullspace of E.
Properties of the Essential Matrix
• 3x3 matrix
• Relates camera coordinates in the 1st and 2nd images
• 6 parameters (3 rotation, 3 translation); since E is only defined up to scale it has 5 degrees of freedom
• Rank 2
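These properties are easy to check numerically. The sketch below uses a hypothetical essential matrix for pure translation along x: the SVD exposes the rank-2 structure, the right null vector is the epipole, and l = E^T x’ gives the epipolar line in the first camera:

```python
import numpy as np

# Hypothetical E for pure translation along x: E = [t]x with t = (1, 0, 0)
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

U, S, Vt = np.linalg.svd(E)
print(S)                      # [1. 1. 0.]: rank 2
e = Vt[-1]                    # right null vector = epipole in the 1st camera
print(E @ e)                  # ~ 0, so e lies on every epipolar line

# Epipolar line in the 1st camera for a point x' in the 2nd camera
xp = np.array([0.5, 0.2, 1.0])
l = E.T @ xp                  # any match x for x' satisfies l . x = 0
print(l)                      # [0, 1, -0.2]: the horizontal line y = 0.2,
                              # as expected for purely horizontal translation
```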
The Fundamental Matrix
Up until now we have been working in camera coordinates. However, if we do not know the intrinsic matrices of the cameras, we have only image (pixel) coordinates to play with. The relation between them is w = Kx, i.e. x = K^{-1} w. The essential matrix relationship was x’^T E x = 0. Substituting in: w’^T K’^{-T} E K^{-1} w = 0, or w’^T F w = 0, where F = K’^{-T} E K^{-1} is known as the fundamental matrix.
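Continuing the numerical sketch (the same hypothetical E, plus a made-up intrinsic matrix shared by both cameras), the fundamental matrix follows directly and the constraint transfers to pixel coordinates:

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],   # hypothetical intrinsics,
              [0.0, 800.0, 240.0],   # shared by both cameras here
              [0.0, 0.0, 1.0]])
E = np.array([[0.0, 0.0, 0.0],       # the pure-translation E from above
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

Kinv = np.linalg.inv(K)
F = Kinv.T @ E @ Kinv                # F = K'^{-T} E K^{-1}

# A matched pair of camera rays (same height, consistent with this E),
# converted to pixel coordinates via w = Kx
w = K @ np.array([0.1, 0.2, 1.0])
wp = K @ np.array([0.7, 0.2, 1.0])
print(wp @ F @ w)                    # ~ 0: the constraint in pixels
```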
Computing the Fundamental Matrix
The fundamental matrix relation written out in full: (x’, y’, 1) F (x, y, 1)^T = 0. Expanding:
x’x f11 + x’y f12 + x’ f13 + y’x f21 + y’y f22 + y’ f23 + x f31 + y f32 + f33 = 0.
This can be written as a dot product between the vectorized entries f of F and the coordinate products: (x’x, x’y, x’, y’x, y’y, y’, x, y, 1) . f = 0. Given a series of n matching points, one such row per match forms the linear system A f = 0 (sketched below).
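A sketch of building this system and solving it with the SVD, the standard least-squares choice for A f = 0; the function name and array layout are my own:

```python
import numpy as np

def fundamental_from_matches(w1, w2):
    """Linear (eight-point) estimate of F from n >= 8 pixel matches.

    w1, w2: Nx2 arrays of matched (x, y) and (x', y') pixel coordinates.
    Each match contributes one row of A; the solution f is the right
    singular vector of A with the smallest singular value.
    """
    x, y = w1[:, 0], w1[:, 1]
    xp, yp = w2[:, 0], w2[:, 1]
    A = np.stack([xp * x, xp * y, xp,
                  yp * x, yp * y, yp,
                  x, y, np.ones(len(w1))], axis=1)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```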
Some complications
• Must force the singularity (rank-2) constraint on F.
• Scaling makes this numerically unpleasant: the entries of each row span orders of magnitude (coordinate products ~10000, raw coordinates ~100, and the constant 1), so the points should be normalised first.
• Algebraic minimization does not minimize the correct cost function; we should minimize the image distance to the epipolar line.
(Sketches of the first two fixes follow below.)
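Sketches of those two fixes, under the same conventions as above: the rank-2 projection, and the usual zero-mean, sqrt(2)-average-distance normalisation. Minimizing the true image distance to the epipolar line would need a nonlinear refinement on top of this.

```python
import numpy as np

def enforce_rank2(F):
    """Project F to the nearest rank-2 matrix by zeroing the smallest
    singular value (the singularity constraint)."""
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def normalise(w):
    """Translate Nx2 points to zero mean and scale so the average
    distance from the origin is sqrt(2); returns the transformed
    points and the 3x3 homogeneous transform T."""
    mean = w.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(w - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    return (w - mean) * scale, T
```

One would estimate F from the normalised matches and then denormalise with F = T2^T F_hat T1 before enforcing rank 2.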
Degenerate Cases
• Planar scene
• Pure rotation
In these cases there is no unique solution; the remaining degrees of freedom are filled by noise.
• Use a simpler model (e.g. a homography)
• Model selection (Torr et al., ICCV'98; Kanatani; Akaike): compare H and F according to expected residual error, compensating for model complexity
A chicken and egg problem...
• Given a set of n >= 8 matching points, we can calculate the fundamental matrix and hence determine the epipolar geometry.
• But matching points are hard to find without already knowing the epipolar geometry.
Correspondence
• GOAL: to identify the locations in the right-hand image that correspond to points in the left-hand image
• ASSUMPTIONS:
  • Most scene points are visible from both viewpoints
  • Matching image points have similar pixel neighbourhoods
• DECISIONS:
  • Which elements to match?
  • How to measure similarity?
• THREE STAGES:
  • Finding robust image keypoints
  • Initial matching of these keypoints
  • Robust matching of keypoints
Two or More Cameras 2.2 Finding image keypoints
Keypoints / Corners / Feature Points
Select interest points in each image. What makes a point interesting?
Desirable Properties of Keypoints
• DISTINCTIVENESS: distinct from other points in the image, to minimize the chance of a false match
• EASE OF EXTRACTION: fast and simple to extract
• INVARIANCE: tolerant to
  • image noise
  • changes in illumination
  • uniform scaling
  • rotation
  • minor changes in viewing direction
Scale Invariant Feature Transform (SIFT)
• AIMS:
  • To find ~1000 keypoints per image
  • To have the desirable properties listed above
• THREE STAGES:
  • Identify candidate points and localise them in position and scale
  • Reject unstable points
  • Associate an orientation with each point
Keypoint Criterion: Scale-Space Extrema
Repeatedly convolve (blur) the image I(x, y) with a Gaussian:
L(x, y, s) = G(x, y, s) * I(x, y), where G(x, y, s) = (1 / (2 pi s^2)) exp(-(x^2 + y^2) / (2 s^2)).
This produces a stack of images, with the sharpest in the bottom layer and the most blurred at the top; the stack's axes are x, y and scale s. A sketch of building such a stack follows below.
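A sketch of the stack in Python (scipy), including the difference-of-Gaussian (DoG) layers used on the following slides; the base scale sigma0 = 1.6 and the factor k = sqrt(2) are typical choices, not values prescribed here:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Blur the image with a geometric series of Gaussian scales and
    return the stack plus its difference-of-Gaussian (DoG) layers."""
    blurred = np.stack([gaussian_filter(image, sigma0 * k ** i)
                        for i in range(levels)])
    dog = blurred[1:] - blurred[:-1]   # adjacent scales subtracted
    return blurred, dog

image = np.random.rand(64, 64)         # stand-in for I(x, y)
L, D = gaussian_stack(image)
print(L.shape, D.shape)                # (5, 64, 64) (4, 64, 64)
```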
Scale Space Extrema
A candidate keypoint must be larger (or smaller) than each of its 26 neighbours in the image stack: its 8 neighbours at the same scale, plus 9 at the scale above and 9 at the scale below.
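A compact way to sketch this test over a whole DoG stack D with shape (scale, y, x) is with 3x3x3 max/min filters; note that this version accepts ties, whereas the criterion above is strict:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def scale_space_extrema(D):
    """Candidate keypoints: voxels of the DoG stack D (scale, y, x)
    that are maxima or minima of their 3x3x3 (26-neighbour) cube."""
    is_max = D == maximum_filter(D, size=3)
    is_min = D == minimum_filter(D, size=3)
    # ignore boundary voxels, which lack the full 26-neighbourhood
    interior = np.zeros_like(D, dtype=bool)
    interior[1:-1, 1:-1, 1:-1] = True
    return np.argwhere((is_max | is_min) & interior)
```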
Localising Keypoints
GOAL: To localize keypoints in position and scale to sub-pixel accuracy.
METHOD:
1. Take a Taylor expansion around the current point: D(X) ≈ D + (dD/dX)^T X + (1/2) X^T (d^2D/dX^2) X, where D is the 3D DoG scale function and X = (x, y, s)^T is the offset from the estimated position.
2. Take the derivative with respect to X and equate it to zero.
3. Solve the resulting 3x3 linear system, X = -(d^2D/dX^2)^{-1} dD/dX, for the offset X.
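A sketch of step 3 with central finite differences standing in for the derivatives of D; the (scale, y, x) ordering and the helper name are my own:

```python
import numpy as np

def refine_offset(D, s, y, x):
    """Sub-pixel/sub-scale refinement at voxel (s, y, x) of a DoG stack D.

    Returns the offset -(d^2D/dX^2)^{-1} dD/dX with X = (s, y, x), using
    central finite differences for the gradient g and Hessian H.
    """
    g = np.array([(D[s + 1, y, x] - D[s - 1, y, x]) / 2,
                  (D[s, y + 1, x] - D[s, y - 1, x]) / 2,
                  (D[s, y, x + 1] - D[s, y, x - 1]) / 2])
    c = D[s, y, x]
    H = np.empty((3, 3))
    H[0, 0] = D[s + 1, y, x] - 2 * c + D[s - 1, y, x]
    H[1, 1] = D[s, y + 1, x] - 2 * c + D[s, y - 1, x]
    H[2, 2] = D[s, y, x + 1] - 2 * c + D[s, y, x - 1]
    H[0, 1] = H[1, 0] = (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                         - D[s - 1, y + 1, x] + D[s - 1, y - 1, x]) / 4
    H[0, 2] = H[2, 0] = (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                         - D[s - 1, y, x + 1] + D[s - 1, y, x - 1]) / 4
    H[1, 2] = H[2, 1] = (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                         - D[s, y - 1, x + 1] + D[s, y - 1, x - 1]) / 4
    return np.linalg.solve(H, -g)   # offset from the sampled position
```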
Suppressing Edges
What we really want is corner points (e.g. Harris & Stephens '88; Shi & Tomasi '94), as edges are ambiguous along one direction. [Figure: homogeneous, edge and corner image patches.] This information is captured by the eigenvalues of the image structure tensor H: corner = both eigenvalues large; edge = one large, one small; homogeneous = both small.
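A sketch of the Harris & Stephens corner response built from this structure tensor; the trace and determinant give the sum and product of the eigenvalues, so neither eigenvalue needs computing explicitly (k = 0.04 is the conventional constant):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.5, k=0.04):
    """Corner measure from the smoothed structure tensor
    H = [[<Ix^2>, <IxIy>], [<IxIy>, <Iy^2>]].

    det(H) is the product of the eigenvalues and trace(H) their sum, so
    R = det - k * trace^2 is large only when BOTH are large (corner),
    near zero when both are small (homogeneous), and negative on edges.
    """
    Ix = sobel(image, axis=1)              # horizontal gradient
    Iy = sobel(image, axis=0)              # vertical gradient
    Ixx = gaussian_filter(Ix * Ix, sigma)  # local averaging of products
    Ixy = gaussian_filter(Ix * Iy, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2
```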