3D Vision

3D Vision Lecture 6 Stereo Vision

Stereo Vision

Main Points • Stereo allows depth by triangulation • Two parts: • Finding corresponding points. • Computing depth (easy part). • Constraints: • Geometry, epipolar constraint. • Photometric: Brightness constancy, only partly true. • Ordering: only partly true. • Smoothness of objects: only partly true.

Main Points (continued) • Algorithms: • What you compare: points, regions, features. • How you optimize. • Local greedy matches. • 1D search. • 2D search.

P Q P’=Q’ O Why Stereo Vision? • 2D images project 3D points into 2D: • 3D Points on the same viewing line have the same 2D image: • 2D imaging results in depth information loss

Stereo • Assumes (two) cameras. • Known positions. • Recover depth.

Recovering Depth Information: P Q P’1 P’2=Q’2 Q’1 O2 O1 Depth can be recovered with two images and triangulation.

So Stereo has two steps • Finding matching points in the images • Then using them to compute depth.

Epipolar Constraint • Most powerful correspondence constraint. • Simplifies discussion of depth recovery.

epipolar line epipolar line epipolar plane Stereo correspondence • Determine Pixel Correspondence • Pairs of points that correspond to same scene point • Epipolar Constraint • Reduces correspondence problem to 1D search along conjugateepipolar lines

Simplest Case • Image planes of cameras are parallel. • Focal points are at same height. • Focal lengths same. • Then, epipolar lines are horizontal scan lines.

We can always achieve this geometry with image rectification • Image Reprojection • reproject image planes onto common plane parallel to line between optical centers • Notice, only focal point of camera really matters

C. Loop and Z. Zhang, “Computing Rectifying Homographies for Stereo Vision,” IEEE Conf. Computer Vision and Pattern Recognition, 1999. Image Reprojection • Rectification (perspective transformation) • Warping the input images so that epipolar lines are horizontal • Procedure • Reproject image planes onto common plane, which is parallel to line between optical centers (to the epipole) • A homography (3x3 transform) applied to both input images • Resample lines (and shear/stretch) to place lines in correspondence, and minimize distortion 14

Rectification 15

Rectification 16

Let’s discuss reconstruction with this geometry before correspondence, because it’s much easier. P Z Disparity: xl xr f pl pr Ol Or T Then given Z, we can compute X and Y. T is the stereo baseline d measures the difference in retinal position between corresponding points

Correspondence: What should we match? • Objects? • Edges? • Pixels? • Collections of pixels?

Correspondence: Epipolar constraint.

Correspondence: Photometric constraint • Same world point has same intensity in both images. • Lambertian fronto-parallel • Issues: • Noise • Specularity • Foreshortening

For each epipolar line For each pixel in the left image Improvement: match windows Using these constraints we can use matching for stereo • compare with every pixel on same epipolar line in right image • pick pixel with minimum match cost • This will never work, so:

? = g f Most popular Comparing Windows: For each window, match to closest window on epipolar line in other image.

Comparing Windows: Left Right scanline SSD error disparity Left Right

W = 3 W = 20 Window size • Effect of window size • Better results with adaptive window • T. Kanade and M. Okutomi,A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment,, Proc. International Conference on Robotics and Automation, 1991. • D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, July 1998

Dv(p) p Large window • Good for global information • Poor at localizing • Less sensitive to noise • Has wide-bottomed valley • Dv(p)=∑(f(i,j)-g(i,j+p))sup2/sigma(f)

Dv(p) p Small window • Sensitive to noise • Dv(p) at p0 to other distinct points may not be large • May have multiple valleys • Has narrow-bottomed valley

Adaptive strategy for window size 1. let n=n0 (ex. n0=9=initial size) 2. compute Dv(p) for candidate points 3. if Dv(p) has only one valley such that Dv(p) < d1, choose the point and exit 4. if min{Dv(p)} > d2, no corresponding points and exit 5. if n = max (ex. Max=19), no corresponding points and exit

d3 d3 d3 d2 d2 d2 d1 d1 d1 found not found 6. let n=n+2 and consider only candidate points such that Dv(p) < d3 7. goto step2 retry

Stereo results • Data from University of Tsukuba Scene Ground truth

Results with window correlation Window-based matching (best window size) Ground truth

Ordering constraint • Usually, order of points in two images is same.

Ordering constraint • In usual case, order of feature points is same • In case of occlusion, order is not same Ordering constraint… …and its failure

Odering Constraint enables dynamic programming. • Match set of pixels of horizontal scan line to set of pixels of horizontal scan line of other image • Equal to optimal path finding problem

… … Match Match Match Occlusion Disocclusion Stereo Correspondences Left scanline Right scanline

Occluded Pixels Search Over Correspondences Three cases: • Sequential – add cost of match (small if intensities agree) • Occluded – add cost of no match (large cost) • Disoccluded – add cost of no match (large cost) Left scanline Right scanline Disoccluded Pixels

Occluded Pixels Stereo Matching with Dynamic Programming Dynamic programming yields the optimal path through grid. This is the best set of matches that satisfy the ordering constraint Left scanline Start Dis-occluded Pixels Right scanline End

DF with continuous path • Assume that every point in one scan line has one corresponding point in other scan line • Assume monotonic ordering, i.e., if z(l1) matches z(r1), then z(l1+1) may only match z(r1+j), j>0.

DF with discontinuous path • Scene can include discontinuous surface – occlusion can happen • If parts observed in left are not observed in right, corresponding becomes vertical • In the other case, corresponding must be horizontal

DP algorithm • D(i,j)=difference between light intensity profile around I in right image and that around j in left image • F(i,j)=minimum cost associated with optimum path from start to (i,j). • Occ =occlusion cost for no-match

How to calculate optimal match 1. for 0 ≤ i ≤ n F(i,0)=i∙Occ, F(0,i)= i∙Occ 2. for 0 ≤ i,j ≤ n min1 = F(i-1,j-1) + D(i,j) min2 = F(i-1,j) + Occ min3 = F(i,j-1) + Occ F(i,j) = cmin = Min(min1,min2,min3) if (cmin = min1) M(i,j) = 1 if (cmin = min2) M(i,j) = 2 if (cmin = min3) M(i,j) = 3

How to reconstruct optimal match 1. p = n, q = n 2. while (p != 0 && q != 0) switch(M(p,q)) case 1: p matches q; p--; q--; break; case 2: p is unmatched; p--; break; case 3: q is unmatched; q--; break;

Other constraints • Smoothness: disparity usually doesn’t change too quickly. • Unfortunately, this makes the problem 2D again, since we need to look at surrounding area • Solved with a host of graph algorithms, Markov Random Fields, Belief Propagation, ….

Summary • First, we understand constraints that make the problem solvable. • Some are hard, like epipolar constraint. • Ordering isn’t a hard constraint, but most useful when treated properly • Some are soft, like pixel intensities are similar, disparities usually change slowly. • Then we find optimization method. • Which ones we can use depend on which constraints we pick.

3D Vision

3D Vision

Presentation Transcript

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision

3D Vision