Summary: A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms Matthew Wilhelm CS5331 Mobile Robotics
Goal / Motivation • Provide a means of quantitatively gauging progress in the field of stereo correspondence and of judging the value of new approaches • Novel publications will have to improve in some way on the performance of existing algorithms • Provide an update on the state of the art of the field
Background / Theory • All vision algorithms make assumptions about the physical world and the camera • Stereo algorithms commonly make the following assumptions: • Lambertian surfaces – appearance does not vary with viewpoint • Piecewise-smooth surfaces • Known camera calibration and epipolar geometry
Disparity • The difference in location between matching pixels in the two images • Approximately proportional to inverse depth • Various computation methods • Displayed as a grayscale disparity map: close items appear brighter and far-away items appear darker
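For rectified cameras the inverse-depth relationship is the standard one (a textbook relation, with focal length f, baseline B, and depth Z, not notation from this paper):

$$ d \;=\; x_L - x_R \;=\; \frac{f\,B}{Z} \qquad\Longrightarrow\qquad Z \;=\; \frac{f\,B}{d} $$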
Taxonomy • A classification system for items based on their relationship to one another • Allows dissection and comparison of individual algorithm components and design decisions • Matching Cost Computation • Cost Aggregation • Disparity Computation / Optimization • Disparity Refinement • Existing algorithms are built from different combinations of the above building blocks
Matching Cost Computation • Form initial Disparity Space Image • Many Methods including: • Squared Intensity Differences • Absolute Intensity Differences • Truncated Quadratics • Contaminated Gaussians • Normalized Cross-Correlation • Binary Features
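A minimal sketch of this stage using absolute intensity differences on grayscale images; the container types and function name below are illustrative assumptions, not the authors' actual C++ code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<float>>;                    // grayscale, image[y][x]
using CostVolume = std::vector<std::vector<std::vector<float>>>;  // cost[d][y][x]

// Stage 1: per-pixel matching cost as the absolute intensity difference
// between a left pixel and the right pixel shifted by candidate disparity d.
CostVolume absoluteDifferenceCosts(const Image& left, const Image& right, int maxDisp) {
    int h = left.size(), w = left[0].size();
    CostVolume cost(maxDisp + 1,
                    std::vector<std::vector<float>>(h, std::vector<float>(w, 0.f)));
    for (int d = 0; d <= maxDisp; ++d)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                int xr = std::max(0, x - d);              // clamp at the image border
                float diff = left[y][x] - right[y][xr];
                cost[d][y][x] = std::fabs(diff);          // use diff * diff for squared differences
            }
    return cost;
}
```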
Cost Aggregation • Aggregate (sum or average) the matching costs over a support region of the disparity space image • Again, many different methods, including: • Square Windows • Gaussian Convolution • Shiftable Windows • Adaptive-Size Windows
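A sketch of square-window aggregation over a single disparity plane of the DSI; the authors' box filter is a separable moving average, whereas this direct (slower) version just illustrates the idea, with illustrative names:

```cpp
#include <algorithm>
#include <vector>

using CostPlane = std::vector<std::vector<float>>;  // raw cost[y][x] at one disparity

// Stage 2: sum the raw costs over a (2r+1) x (2r+1) square window
// centred on each pixel, replicating values at the image borders.
CostPlane boxAggregate(const CostPlane& raw, int r) {
    int h = raw.size(), w = raw[0].size();
    CostPlane out(h, std::vector<float>(w, 0.f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.f;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, h - 1);
                    int xx = std::clamp(x + dx, 0, w - 1);
                    sum += raw[yy][xx];
                }
            out[y][x] = sum;
        }
    return out;
}
```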
Disparity Computation / Optimization • Local Methods – majority of the work is done in the previous two steps • For optimization, simply choose at each pixel the disparity with the minimum cost value • Uniqueness is only enforced for one image • Global Methods – majority of the work is done in this stage • Energy Minimization – continuation, simulated annealing, highest confidence first, and mean-field annealing • Max-Flow and Graph-Cut for special cases • Dynamic Programming – computes the minimum-cost path through the pairwise matching costs along each scanline pair, with adjustable parameters • Cooperative Algorithms – model human stereo vision
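A sketch of the local winner-take-all step, picking at each pixel the disparity whose aggregated cost is minimal (names are illustrative):

```cpp
#include <limits>
#include <vector>

using CostVolume = std::vector<std::vector<std::vector<float>>>;  // aggregated cost[d][y][x]
using DisparityMap = std::vector<std::vector<int>>;

// Stage 3 (local): winner-take-all — the disparity with the lowest
// aggregated cost wins at each pixel independently.
DisparityMap winnerTakeAll(const CostVolume& cost) {
    int maxDisp = cost.size() - 1;
    int h = cost[0].size(), w = cost[0][0].size();
    DisparityMap disp(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = std::numeric_limits<float>::max();
            for (int d = 0; d <= maxDisp; ++d)
                if (cost[d][y][x] < best) { best = cost[d][y][x]; disp[y][x] = d; }
        }
    return disp;
}
```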
Disparity Refinement • Sub-pixel disparity estimates are used when rendering images to give more visually appealing results • Clean up mismatches via various methods • Not usually done in fast implementations such as robot navigation or tracking
Implementation • Closely tied to the taxonomy given above • The authors developed a modular and portable C++ implementation of several stereo algorithms • Post-processing steps that improve results were not implemented, in order to compare methods directly • Easily extensible to include other algorithms
Implementation Details • Matching Cost Computation • Squared or absolute difference in color • Sub-pixel interpolation • Aggregation • Box Filter: separable moving average filter • Binomial Filter: separable finite impulse response filters • Optimization • Winner-take-all, dynamic programming, scanline optimization, simulated annealing, and graph cut • Refinement • Three aggregated matching cost values around the winning disparity are examined to compute the sub-pixel disparity estimate
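The three-sample refinement mentioned above is commonly a parabola (quadratic) fit through the costs at d-1, d, d+1; a small sketch using that standard formula, not copied from the authors' code:

```cpp
#include <algorithm>

// Stage 4: fit a parabola through the aggregated costs at (d-1, d, d+1)
// and return the sub-pixel disparity at its minimum.
// Falls back to the integer estimate d when the fit degenerates.
float subPixelDisparity(int d, float cPrev, float cHere, float cNext) {
    float denom = cPrev - 2.f * cHere + cNext;        // curvature of the parabola
    if (denom <= 0.f) return static_cast<float>(d);   // flat or inverted: keep integer disparity
    float offset = 0.5f * (cPrev - cNext) / denom;    // minimum lies at d + offset
    return static_cast<float>(d) + std::clamp(offset, -0.5f, 0.5f);
}
```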
Evaluation • Allows for quantitative evaluation of stereo algorithms • Provides test bed for new and existing algorithms along with test data and results on the Web at http://vision.middlebury.edu/stereo/ • Allows for testing of individual components as divided in taxonomy
Quality Metrics • RMS error – root-mean-squared difference between the computed disparity map and the ground-truth map • Percentage of bad matching pixels – fraction of pixels whose disparity error exceeds a tolerance • Computed over the whole image as well as over three areas that usually cause problems: • Textureless regions – average intensity gradient is too low • Occluded regions – the mapped disparity lands at a location covered by a closer object • Depth-discontinuity regions – neighboring disparities differ by too much
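With d_C the computed disparity map, d_T the ground truth, N the number of pixels, and δ_d the disparity error tolerance, the two metrics take the following form (paraphrasing the paper's definitions):

$$ R \;=\; \left( \frac{1}{N} \sum_{(x,y)} \left| d_C(x,y) - d_T(x,y) \right|^2 \right)^{1/2} $$

$$ B \;=\; \frac{1}{N} \sum_{(x,y)} \big( \left| d_C(x,y) - d_T(x,y) \right| > \delta_d \big) $$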
Experiments • Authors perform several experiments to compare various algorithm components, again as divided in the taxonomy • Focus on common problem areas for stereo algorithms
Experiments / Results • Matching Costs • Experiment 1: ran many tests with different matching cost truncation values and found that values of roughly 5-20 give good results (see the truncation formula below) • Experiment 2: ran the same tests as above but applied a 9x9 min filter before truncation, and found that no truncation performed best • Experiment 3: tested the effects of matching cost and truncation on global algorithms, found that some truncation helped, and suggested SNR-based parameter setting
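For reference, truncating the matching cost means clipping the per-pixel error at a threshold τ, which limits the influence of outliers and occluded pixels; with absolute differences this can be written as:

$$ e(x,y,d) \;=\; \min\!\big( \lvert I_L(x,y) - I_R(x-d,\,y) \rvert,\; \tau \big) $$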
Experiments / Results • Aggregation • Experiment 4: analyze the effects of various aggregation techniques on local methods • Large amounts of aggregation are necessary in textureless regions • Shiftable windows perform best (a min-filter sketch follows below)
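Shiftable windows can be realized by following the box-filter aggregation with a min filter of the same window size, so each pixel effectively picks the best of all windows that cover it; a minimal, non-separable sketch of that idea (names illustrative):

```cpp
#include <algorithm>
#include <vector>

using CostPlane = std::vector<std::vector<float>>;  // box-aggregated cost[y][x] at one disparity

// Shiftable windows: after box aggregation, take the minimum over a
// (2r+1) x (2r+1) neighbourhood, letting each pixel "borrow" the best
// shifted window that still covers it.
CostPlane shiftableWindowMinFilter(const CostPlane& agg, int r) {
    int h = agg.size(), w = agg[0].size();
    CostPlane out(h, std::vector<float>(w, 0.f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = agg[y][x];
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, h - 1);
                    int xx = std::clamp(x + dx, 0, w - 1);
                    best = std::min(best, agg[yy][xx]);
                }
            out[y][x] = best;
        }
    return out;
}
```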
Experiments / Results • Disparity Computation / Optimization • Experiment 5: analyze the smoothness parameter • Found that the optimal smoothness parameter varies greatly from one image pair to another • Future work includes parameter-calculation techniques • Experiment 6: focus on graph-cut optimization • Birchfield-Tomasi matching costs and a gradient-based smoothness cost improve the performance of graph-cut algorithms • However, choosing the right threshold and penalty parameters is difficult and image-specific
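The smoothness parameter of Experiment 5 is the weight λ in the global energy these methods minimize, a data term plus a weighted smoothness term:

$$ E(d) \;=\; E_{\text{data}}(d) \;+\; \lambda\, E_{\text{smooth}}(d) $$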
Experiments / Results • Sub-Pixel Estimates • Experiment 7: refine disparity maps via sub-pixel interpolation • As expected, the unrefined disparity map contains staircase errors, while the refined map is considerably better • Again, this step is often skipped in fast implementations
Conclusion • The authors provide a comparison of 20 stereo algorithms, all of which are documented in detail on the website • Found that most algorithms perform about the same in the so-called easy areas and that the differences arise in known problematic areas • One evaluation that I think would have been helpful is a runtime comparison; however, the authors were not concerned with this
Questions? • Can you clarify what is being referenced in Figure 1 (f) regarding the disparity levels as a slice? • A slice simply means that the DSI is 3D and they are keeping one of the three variables constant to produce a 2D image • Can you find references to using illumination alongside stereo depth analysis to further define the depth of objects? • I have searched and did not find any papers; however, this does not mean that it is impossible • It would probably be very helpful to have an illumination estimate prior to stereo evaluation • And of course, can you simplify the differences between each algorithm? • I think I have done a brief simplification; to go into more detail I would have to read each of the 132 referenced papers
Questions?? • How were the stereo algorithms chosen in the paper? • The paper focuses on dense two-frame stereo correspondence algorithms • Common algorithms that needed to be compared were chosen for implementation; however, the framework allows novel algorithms to be implemented as well • What is a stereo algorithm? • A stereo algorithm utilizes images from two cameras, similar to human vision (two eyes)
Questions??? • On page 2, section 2.2, it was indicated that an unvalued disparity map is produced as output. What is an "unvalued disparity map"? • The paper actually says univalued, not unvalued • I think this means that there is a single disparity value at each pixel • On page 11, under the evaluation section, they discuss using three different regions to check the algorithms over (textureless, occluded, and depth-discontinuity) – how did they come up with these? • These are common problem areas for several different stereo algorithms
Questions???? • Why did they downsample the images for testing (page 13)? • To normalize the motion of background objects to a few pixels per frame, which allows better matching results and a fair comparison of the quality of the various algorithms • Why do they only evaluate bad_pixels_nonocc, bad_pixels_textureless, and bad_pixels_discont? • They also evaluate the whole image, but they report these areas separately as well to get an idea of how different algorithms perform in these known problem areas
Additional Resources • References throughout the paper provide resources for various algorithms • Hartley and Zisserman: Multiple View Geometry in Computer Vision. • Middlebury website, an excellent source of papers and code related to stereo vision algorithms. • Sebastian Thrun, Wolfram Burgard and Dieter Fox: Probabilistic Robotics, MIT Press, 2005.