ROBOT VISION Lesson 6: Shape from Stereo Matthias Rüther Slides partial courtesy of Marc Pollefeys Department of Computer Science University of North Carolina, Chapel Hill. Content. Two View Geometry Epipolar Geometry 3D reconstruction Computing F Point Correspondences Interest Points
Content • Two View Geometry • Epipolar Geometry • 3D reconstruction • Computing F • Point Correspondences • Interest Points • Matching
Epipolar Geometry C, C’, x, x’ and X are coplanar
Three Questions • (i) Correspondence geometry: Given an image point x in the first view, how does this constrain the position of the corresponding point x’ in the second image? • (ii) Camera geometry (motion): Given a set of corresponding image points {xi ↔ x’i}, i=1,…,n, what are the cameras P and P’ for the two views? • (iii) Scene geometry (structure): Given corresponding image points xi ↔ x’i and cameras P, P’, what is the position of (their pre-image) X in space?
Epipolar Geometry What if only C, C’, x are known?
Epipolar Geometry • All points on the epipolar plane π project onto l and l’
Epipolar Geometry • Family of planes π and lines l and l’ • Intersection in e and e’
Epipolar Geometry • epipoles e, e’ = intersection of baseline with image plane = projection of projection center in other image = vanishing point of camera motion direction • an epipolar plane = plane containing baseline (1-D family) • an epipolar line = intersection of epipolar plane with image (always come in corresponding pairs)
The Fundamental Matrix (F) • algebraic representation of epipolar geometry: x ↦ l’ • we will see that this mapping is a (singular) correlation (i.e. a projective mapping from points to lines) represented by the fundamental matrix F
The Fundamental Matrix (F) • geometric derivation: x’ = Hπ x, l’ = e’ × x’ = [e’]× Hπ x = Fx • mapping from 2-D to 1-D family (rank 2)
The Fundamental Matrix (F) • algebraic derivation: X(λ) = P⁺x + λC, l’ = P’C × P’P⁺x, F = [e’]× P’P⁺ • (note: doesn’t work for C = C’ ⇒ F = 0)
The Fundamental Matrix (F) • correspondence condition: the fundamental matrix satisfies the condition that for any pair of corresponding points x ↔ x’ in the two images, x’ᵀFx = 0 (x’ lies on l’ = Fx, i.e. x’ᵀl’ = 0)
The Fundamental Matrix (F) • F is the unique 3x3 rank 2 matrix that satisfies x’ᵀFx = 0 for all x ↔ x’ • Transpose: if F is the fundamental matrix for (P, P’), then Fᵀ is the fundamental matrix for (P’, P) • Epipolar lines: l’ = Fx and l = Fᵀx’ • Epipoles: on all epipolar lines, thus e’ᵀFx = 0 ∀x ⇒ e’ᵀF = 0, similarly Fe = 0 • F has 7 d.o.f., i.e. 3x3 − 1 (homogeneous) − 1 (rank 2) • F is a correlation, a projective mapping from a point x to a line l’ = Fx (not a proper correlation, i.e. not invertible)
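The properties listed on this slide can be checked numerically. The sketch below is only an illustration: it assumes the standard textbook construction F = [e’]× P’P⁺ with e’ = P’C, and the cameras K, R, t and the scene point X are made-up example values, not data from the lecture.

```python
import numpy as np

def skew(v):
    """Return [v]x, the 3x3 matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def camera_centre(P):
    """Homogeneous camera centre C: the right null vector of P (P @ C = 0)."""
    return np.linalg.svd(P)[2][-1]

def fundamental_from_cameras(P1, P2):
    """F = [e']x P' P^+ with e' = P' C (assumed textbook formula)."""
    e2 = P2 @ camera_centre(P1)
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

# Hypothetical example cameras (assumed values, for illustration only)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])

F = fundamental_from_cameras(P1, P2)
F /= np.linalg.norm(F)                     # F is only defined up to scale

X = np.array([0.3, -0.2, 5.0, 1.0])        # hypothetical scene point
x1 = P1 @ X
x1 /= x1[2]
x2 = P2 @ X
x2 /= x2[2]
e1 = P1 @ camera_centre(P2)                # epipole in image 1
e2 = P2 @ camera_centre(P1)                # epipole in image 2

print("x'^T F x      =", x2 @ F @ x1)                        # close to 0
print("sing. values  =", np.linalg.svd(F, compute_uv=False)) # last one close to 0
print("|F e|, |e'^T F| =", np.linalg.norm(F @ e1), np.linalg.norm(e2 @ F))
```

All three printed quantities should be zero up to rounding, which mirrors the correspondence condition, the rank-2 property and the epipole conditions Fe = 0 and e’ᵀF = 0.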
Epipolar Line Geometry • l, l’ epipolar lines, k line not through e • l’ = F[k]×l and symmetrically l = Fᵀ[k’]×l’ • (pick k = e, since eᵀe ≠ 0)
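Invariance under projective transformation • Derivation based purely on projective concepts • F invariant to transformations of projective 3-space: (P, P’) and (PH, P’H) give the same F • (P, P’) ↦ F is unique, F ↦ (P, P’) is not unique • canonical form: P = [I | 0], P’ = [M | m], F = [m]×M
Canonical cameras given F • F matrix corresponds to P, P’ iff P’ᵀFP is skew-symmetric • Possible choice: P = [I | 0], P’ = [[e’]×F | e’] • Canonical representation: P = [I | 0], P’ = [[e’]×F + e’vᵀ | λe’]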
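A short numerical sketch of the skew-symmetry criterion, assuming the common textbook choice P = [I | 0], P’ = [[e’]×F | e’] for the canonical pair. The matrix F used here is an arbitrary rank-2 matrix built for the example, not an F estimated from images.

```python
import numpy as np

def skew(v):
    """Return [v]x, the 3x3 matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def canonical_cameras(F):
    """Assumed choice: P = [I | 0], P' = [[e']x F | e'], with e'^T F = 0."""
    e2 = np.linalg.svd(F)[0][:, -1]        # left null vector of F
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2[:, None]])
    return P1, P2

# Hypothetical rank-2 matrix standing in for an estimated F
F = skew(np.array([1.0, 2.0, 3.0])) @ np.array([[1.0, 0.2, 0.1],
                                                 [0.0, 1.0, 0.3],
                                                 [0.1, 0.0, 1.0]])
P1, P2 = canonical_cameras(F)

S = P2.T @ F @ P1                          # 4x4, should be skew-symmetric
X = np.array([0.5, -1.0, 2.0, 1.0])        # arbitrary homogeneous 3D point
print("max |S + S^T| =", np.abs(S + S.T).max())
print("x'^T F x      =", (P2 @ X) @ F @ (P1 @ X))
```

Both numbers should vanish up to rounding: any 3D point projected by this camera pair satisfies the epipolar constraint of the given F.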
The Essential Matrix • ~ fundamental matrix for calibrated cameras (remove K): E = [t]×R, E = K’ᵀFK • 5 d.o.f. (3 for R; 2 for t up to scale) • E is an essential matrix if and only if two singular values are equal (and the third = 0)
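The singular-value condition is easy to verify on an example. The sketch assumes the usual relations E = [t]×R and E = K’ᵀFK; R, t and K are invented example values.

```python
import numpy as np

def skew(v):
    """Return [v]x, the 3x3 matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical relative pose and calibration (assumed values)
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.5, -0.2])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])

E = skew(t) @ R                                   # essential matrix
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)     # add the calibration back in
E_from_F = K.T @ F @ K                            # and remove it again

s = np.linalg.svd(E, compute_uv=False)
print("singular values of E:", s)                 # two equal values, then 0
print("|t| =", np.linalg.norm(t))                 # equals the two non-zero values
```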
Epipolar Geometry • [Figure: two cameras C1, C2 viewing scene point M in an epipolar plane; image points m1, m2, epipoles e1, e2, epipolar lines l1, l2] • Fundamental matrix (3x3 rank 2 matrix) • Underlying structure in set of matches for rigid scenes • Computable from corresponding points • Simplifies matching • Allows detection of wrong matches • Related to calibration
3D reconstruction of cameras and structure • reconstruction problem: given xi ↔ x’i, compute P, P’ and Xi such that xi = PXi and x’i = P’Xi for all i • without additional information possible up to projective ambiguity
Outline of reconstruction • (i) Compute F from correspondences • (ii) Compute camera matrices from F • (iii) Compute 3D point for each pair of corresponding points • computation of F: use x’iᵀFxi = 0 equations, linear in the coefficients of F; 8 points (linear), 7 points (non-linear), 8+ (least-squares) • computation of camera matrices: use P = [I | 0], P’ = [[e’]×F | e’] • triangulation: compute intersection of two backprojected rays
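For the triangulation step, one common way to approximate the intersection of the two backprojected rays is homogeneous linear (DLT) triangulation. This is a swapped-in standard technique, not necessarily the method used in the lecture; the cameras and the test point are assumed example values.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    x1, x2 are inhomogeneous pixel coordinates (x, y) in the two images.
    Each view contributes two rows derived from x cross (P X) = 0.
    """
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]            # null vector of A
    return X / X[3]                        # scale so the last coordinate is 1

# Hypothetical cameras and scene point (assumed values)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])

X_true = np.array([0.3, -0.2, 5.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]

X_est = triangulate(P1, P2, x1, x2)
print(X_est)                               # reproduces X_true for noise-free data
```

With noisy measurements the two rays no longer meet, and this linear solution only minimizes an algebraic error; it is typically used as an initial value.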
Terminology • xi ↔ x’i • Original scene Xi • Projective, affine, similarity reconstruction = reconstruction that is identical to original up to projective, affine, similarity transformation • Literature: Metric and Euclidean reconstruction = similarity reconstruction
The projective reconstruction theorem • If a set of point correspondences in two views determines the fundamental matrix uniquely, then the scene and cameras may be reconstructed from these correspondences alone, and any two such reconstructions from these correspondences are projectively equivalent • HX1i and X2i lie along the same ray of P2, idem for P’2 • two possibilities: X2i = HX1i, or points along baseline • key result: allows reconstruction from pair of uncalibrated images
Stratified reconstruction • Projective reconstruction • Affine reconstruction • Metric reconstruction
Direct metric reconstruction using ground truth • use control points XEi with known coordinates to go from projective to metric • (2 lin. eq. in H⁻¹ per view, 3 for two views)
Epipolar Geometry: computation of F • [Figure: two cameras C1, C2 viewing scene point M in an epipolar plane; image points m1, m2, epipoles e1, e2, epipolar lines l1, l2] • Fundamental matrix (3x3 rank 2 matrix) • Underlying structure in set of matches for rigid scenes • Computable from corresponding points • Simplifies matching • Allows detection of wrong matches • Related to calibration
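Computation of F: basic equation • x’ᵀFx = 0 • x’x f11 + x’y f12 + x’ f13 + y’x f21 + y’y f22 + y’ f23 + x f31 + y f32 + f33 = 0 • separate known from unknown: [x’x, x’y, x’, y’x, y’y, y’, x, y, 1] (data) · [f11, f12, f13, f21, f22, f23, f31, f32, f33]ᵀ (unknowns) = 0 (linear) • one row per correspondence: Af = 0
Imposing the singularity constraint • SVD from linearly computed F matrix (rank 3): F = U diag(σ1, σ2, σ3) Vᵀ • Compute closest rank-2 approximation (in Frobenius norm): F’ = U diag(σ1, σ2, 0) Vᵀ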
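The rank-2 projection takes only a few lines. This is a minimal sketch; `F_lin` is a made-up full-rank matrix standing in for the output of the linear solve.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 matrix to F in Frobenius norm: zero the smallest singular value."""
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt

# Hypothetical full-rank result of a linear estimate (assumed values)
F_lin = np.array([[0.2, -1.0, 0.4],
                  [1.1, 0.1, -0.7],
                  [-0.5, 0.6, 1.0]])
F_rank2 = enforce_rank2(F_lin)

sigma = np.linalg.svd(F_lin, compute_uv=False)
print("det before/after:", np.linalg.det(F_lin), np.linalg.det(F_rank2))
# The Frobenius distance to the original equals the discarded singular value
print(np.linalg.norm(F_lin - F_rank2), sigma[2])
```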
The NOT normalized 8-point algorithm • columns of the data matrix [x’x, x’y, x’, y’x, y’y, y’, x, y, 1] have magnitudes ~10000, ~10000, ~100, ~10000, ~10000, ~100, ~100, ~100, 1 • Orders of magnitude difference between columns of data matrix → least-squares yields poor results!
The normalized 8-point algorithm • Transform image to ~[-1,1]x[-1,1], e.g. for a 700x500 image the corners (0,0), (700,0), (0,500), (700,500) map to (-1,-1), (1,-1), (-1,1), (1,1) • Least squares yields good results (Hartley, PAMI ’97)
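Putting the pieces together, a normalized 8-point estimate could look like the sketch below. It uses the image-size based scaling to [-1,1]x[-1,1] described on this slide (Hartley's paper instead scales by the point distribution). The function names, the 640x480 image size and the synthetic cameras are assumptions made for the example.

```python
import numpy as np

def image_normalization(width, height):
    """Map pixel coordinates in [0, width] x [0, height] to [-1, 1] x [-1, 1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

def normalized_eight_point(x1, x2, T1, T2):
    """Estimate F with x2^T F x1 = 0 from n >= 8 homogeneous points (rows, w = 1)."""
    n1, n2 = x1 @ T1.T, x2 @ T2.T                      # normalized coordinates
    # Row i of A is [x'x, x'y, x', y'x, y'y, y', x, y, 1] for correspondence i
    A = np.einsum('ni,nj->nij', n2, n1).reshape(len(n1), 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)          # least-squares null vector
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt            # impose rank 2
    F = T2.T @ F @ T1                                  # back to pixel coordinates
    return F / np.linalg.norm(F)

# Synthetic, noise-free test data (hypothetical cameras and points)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 8, 20), np.ones(20)])
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])
x1 = X @ P1.T
x1 /= x1[:, 2:]
x2 = X @ P2.T
x2 /= x2[:, 2:]

T = image_normalization(640, 480)
F = normalized_eight_point(x1, x2, T, T)
residuals = np.abs(np.sum((x2 @ F) * x1, axis=1))      # |x'^T F x| per match
print("largest residual:", residuals.max())
```

Skipping the two `T` transformations (passing identity matrices) gives the unnormalized variant, whose data matrix has the badly scaled columns described above.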
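The Gold Standard Algorithm • Maximum Likelihood Estimation (= least-squares for Gaussian noise): minimize Σi d(xi, x̂i)² + d(x’i, x̂’i)² subject to x̂’iᵀFx̂i = 0 • Initialize: normalized 8-point, (P, P’) from F, reconstruct Xi • Parameterize: P = [I | 0], P’ = [M | t], Xi, with x̂i = PXi, x̂’i = P’Xi (overparametrized) • Minimize cost using Levenberg-Marquardt (preferably sparse LM, see book)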
Recommendations • Do not use unnormalized algorithms • Quick and easy to implement: 8-point normalized • Better: enforce rank-2 constraint during minimization • Best: Maximum Likelihood Estimation (minimal parameterization, sparse implementation)
The correspondence problem: feature points • Extract feature points to relate images • Required properties: • Well-defined (i.e. neighboring points should all be different) • Stable across views (i.e. same 3D point should be extracted as feature for neighboring viewpoints)
Feature points (e.g. Harris & Stephens ’88; Shi & Tomasi ’94) • Find points that differ as much as possible from all neighboring points • [Figure: three example patches: homogeneous, edge, corner] • M (the matrix of image gradient products summed over a window) should have large eigenvalues • Feature = local maxima (subpixel) of F(λ1, λ2)
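As an illustration of the eigenvalue criterion, the sketch below computes the smaller eigenvalue of the 2x2 gradient matrix M at every pixel, i.e. the Shi-Tomasi choice for F(λ1, λ2). It is a simplification: it uses an unweighted box window instead of a Gaussian and omits non-maximum suppression and subpixel refinement. The function names and the test image are made up for the example.

```python
import numpy as np

def box_sum(a, r):
    """Sum over a (2r+1) x (2r+1) window (np.roll wraps around at the image border)."""
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += np.roll(a, (dy, dx), axis=(0, 1))
    return out

def corner_response(img, r=2):
    """Smaller eigenvalue of M = sum over window of [[Ix^2, IxIy], [IxIy, Iy^2]]."""
    Iy, Ix = np.gradient(img.astype(float))      # derivatives along rows, columns
    Sxx, Syy, Sxy = box_sum(Ix * Ix, r), box_sum(Iy * Iy, r), box_sum(Ix * Iy, r)
    half_trace = 0.5 * (Sxx + Syy)
    root = np.sqrt((0.5 * (Sxx - Syy)) ** 2 + Sxy ** 2)
    return half_trace - root

# Hypothetical test image: a bright square on a dark background
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
response = corner_response(img)

print("corner      :", response[10, 10])   # clearly positive
print("edge        :", response[10, 20])   # zero: gradient in one direction only
print("homogeneous :", response[5, 5])     # zero: no gradient at all
```

The three printed cases correspond to the corner, edge and homogeneous patches: only at the corner are both eigenvalues of M large.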
Feature points Select strongest features (e.g. 1000/image)
Feature matching • Evaluate NCC (normalized cross-correlation) for all features with similar coordinates • Keep mutual best matches • Still many wrong matches!
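A minimal sketch of NCC scoring with a mutual-best-match check. For brevity it compares every patch in image 1 with every patch in image 2, so the restriction to features with similar coordinates is left out; the function names and the random test patches are assumptions for the example.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def mutual_best_matches(patches1, patches2):
    """Keep (i, j) only if j is the best match for i and i is the best match for j."""
    scores = np.array([[ncc(p, q) for q in patches2] for p in patches1])
    best12 = scores.argmax(axis=1)          # best partner in image 2 for each i
    best21 = scores.argmax(axis=0)          # best partner in image 1 for each j
    return [(i, int(j)) for i, j in enumerate(best12) if best21[j] == i]

# Hypothetical data: image 2 patches are reordered copies with a brightness and
# contrast change, which NCC is designed to ignore
rng = np.random.default_rng(0)
patches1 = [rng.uniform(0, 1, (7, 7)) for _ in range(3)]
perm = [2, 0, 1]
patches2 = [2.0 * patches1[k] + 5.0 for k in perm]

matches = mutual_best_matches(patches1, patches2)
print(matches)
```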
Similarity Example • [Figure: features numbered 1 to 5 in both images] • Gives satisfying results for small image motions
RANSAC • Step 1. Extract features • Step 2. Compute a set of potential matches • Step 3. do • Step 3.1 select minimal sample (i.e. 8 matches) and Step 3.2 compute F (generate hypothesis) • Step 3.3 determine inliers (verify hypothesis) • until Γ(#inliers, #samples) ≥ 95%, where Γ = 1 − (1 − (#inliers/#matches)^8)^#samples is the probability that at least one sample so far was free of outliers • Step 4. Compute F based on all inliers • Step 5. Look for additional matches • Step 6. Refine F based on all correct matches
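Steps 3 and 4 can be sketched as follows. This is an illustrative implementation under several assumptions: coordinates are already normalized (here K = I), inliers are selected by distance to the epipolar line in the second image, and the stopping rule uses the confidence 1 − (1 − w^8)^#samples with w the current inlier ratio. The threshold, the function names and the synthetic data are made up for the example.

```python
import numpy as np

def eight_point(x1, x2):
    """Linear F from >= 8 homogeneous matches (rows), rank 2 enforced.
    Assumes the coordinates are already scaled to roughly [-1, 1]."""
    A = np.einsum('ni,nj->nij', x2, x1).reshape(len(x1), 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt

def epipolar_distance(F, x1, x2):
    """Distance of each x2 (w = 1) from its epipolar line F x1."""
    lines = x1 @ F.T                                   # row i is F @ x1[i]
    return np.abs(np.sum(lines * x2, axis=1)) / np.hypot(lines[:, 0], lines[:, 1])

def ransac_F(x1, x2, threshold, confidence=0.95, max_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x1)
    best = np.zeros(n, dtype=bool)
    for samples in range(1, max_samples + 1):
        idx = rng.choice(n, size=8, replace=False)             # Step 3.1
        F = eight_point(x1[idx], x2[idx])                      # Step 3.2
        inliers = epipolar_distance(F, x1, x2) < threshold     # Step 3.3
        if inliers.sum() > best.sum():
            best = inliers
        w = best.sum() / n                                     # inlier ratio so far
        if 1.0 - (1.0 - w ** 8) ** samples >= confidence:      # stopping rule
            break
    return eight_point(x1[best], x2[best]), best               # Step 4

# Synthetic data: 45 exact matches followed by 15 corrupted ones (hypothetical)
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, (60, 2)), rng.uniform(4, 8, 60), np.ones(60)])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t[:, None]])
x1 = X @ P1.T
x1 /= x1[:, 2:]
x2 = X @ P2.T
x2 /= x2[:, 2:]
x2[45:, :2] = rng.uniform(-0.3, 0.3, (15, 2))          # wrong matches

F, inliers = ransac_F(x1, x2, threshold=1e-3)
print("true matches kept :", inliers[:45].sum(), "of 45")
print("wrong matches kept:", inliers[45:].sum(), "of 15")
```

Steps 5 and 6 (guided search for more matches and final refinement) are not covered by this sketch.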
Finding more matches • restrict search range to neighborhood of epipolar line (1.5 pixels) • relax disparity restriction (along epipolar line)
Degenerate Cases • Planar scene • Pure rotation • No unique solution • Remaining DOF filled by noise • Use simpler model (e.g. homography) • Model selection (Torr et al., ICCV ’98; Kanatani; Akaike) • Compare H and F according to expected residual error (compensate for model complexity)
More problems • Absence of sufficient features (no texture) • Repeated structure ambiguity • Robust matcher also finds support for wrong hypothesis • solution: detect repetition (Schaffalitzky and Zisserman, BMVC ’98)