Summary: A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms Matthew Wilhelm CS5331 Mobile Robotics
Goal / Motivation • Provide a means of quantitatively gauging progress in the field of stereo correspondence and of judging the value of new approaches • Novel publications will have to improve in some way on the performance of existing algorithms • Provide an update on the state of the art of the field
Background / Theory • All vision algorithms make assumptions about the physical world and the camera • Stereo algorithms commonly make the following assumptions: • Lambertian surfaces – appearance does not vary with viewpoint • Piecewise-smooth surfaces • Known camera calibration and epipolar geometry
Disparity • The difference in location between matching pixels in the two images • Approximately proportional to inverse depth • Various computation methods • Displayed as a grayscale disparity map: close items appear brighter and far-away items appear darker
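For rectified cameras the inverse-depth relationship is the standard one (a textbook relation, with focal length f, baseline B, and depth Z, not notation from this paper):

$$ d \;=\; x_L - x_R \;=\; \frac{f\,B}{Z} \qquad\Longrightarrow\qquad Z \;=\; \frac{f\,B}{d} $$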
Taxonomy • A classification system for items based on their relationship to one another • Allows dissection and comparison of individual algorithm components and design decisions • Matching Cost Computation • Cost Aggregation • Disparity Computation / Optimization • Disparity Refinement • Existing algorithms are built from different combinations of the above building blocks
Matching Cost Computation • Form initial Disparity Space Image • Many Methods including: • Squared Intensity Differences • Absolute Intensity Differences • Truncated Quadratics • Contaminated Gaussians • Normalized Cross-Correlation • Binary Features
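A minimal sketch of this stage using absolute intensity differences on grayscale images; the container types and function name below are illustrative assumptions, not the authors' actual C++ code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<float>>;                    // grayscale, image[y][x]
using CostVolume = std::vector<std::vector<std::vector<float>>>;  // cost[d][y][x]

// Stage 1: per-pixel matching cost as the absolute intensity difference
// between a left pixel and the right pixel shifted by candidate disparity d.
CostVolume absoluteDifferenceCosts(const Image& left, const Image& right, int maxDisp) {
    int h = left.size(), w = left[0].size();
    CostVolume cost(maxDisp + 1,
                    std::vector<std::vector<float>>(h, std::vector<float>(w, 0.f)));
    for (int d = 0; d <= maxDisp; ++d)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                int xr = std::max(0, x - d);              // clamp at the image border
                float diff = left[y][x] - right[y][xr];
                cost[d][y][x] = std::fabs(diff);          // use diff * diff for squared differences
            }
    return cost;
}
```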
Cost Aggregation • Aggregate (sum or average) the matching costs over a support region of the disparity space image • Again, many different methods, including: • Square Windows • Gaussian Convolution • Shiftable Windows • Adaptive-Size Windows
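A sketch of square-window aggregation over a single disparity plane of the DSI; the authors' box filter is a separable moving average, whereas this direct (slower) version just illustrates the idea, with illustrative names:

```cpp
#include <algorithm>
#include <vector>

using CostPlane = std::vector<std::vector<float>>;  // raw cost[y][x] at one disparity

// Stage 2: sum the raw costs over a (2r+1) x (2r+1) square window
// centred on each pixel, replicating values at the image borders.
CostPlane boxAggregate(const CostPlane& raw, int r) {
    int h = raw.size(), w = raw[0].size();
    CostPlane out(h, std::vector<float>(w, 0.f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.f;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, h - 1);
                    int xx = std::clamp(x + dx, 0, w - 1);
                    sum += raw[yy][xx];
                }
            out[y][x] = sum;
        }
    return out;
}
```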
Disparity Computation / Optimization • Local Methods – majority of the work is done in the previous two steps • For optimization, simply choose at each pixel the disparity with the minimum cost value • Uniqueness is only enforced for one image • Global Methods – majority of the work is done in this stage • Energy Minimization – continuation, simulated annealing, highest confidence first, and mean-field annealing • Max-Flow and Graph-Cut for special cases • Dynamic Programming – computes the minimum-cost path through the pairwise matching costs along each scanline pair, with adjustable parameters • Cooperative Algorithms – model human stereo vision
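A sketch of the local winner-take-all step, picking at each pixel the disparity whose aggregated cost is minimal (names are illustrative):

```cpp
#include <limits>
#include <vector>

using CostVolume = std::vector<std::vector<std::vector<float>>>;  // aggregated cost[d][y][x]
using DisparityMap = std::vector<std::vector<int>>;

// Stage 3 (local): winner-take-all — the disparity with the lowest
// aggregated cost wins at each pixel independently.
DisparityMap winnerTakeAll(const CostVolume& cost) {
    int maxDisp = cost.size() - 1;
    int h = cost[0].size(), w = cost[0][0].size();
    DisparityMap disp(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = std::numeric_limits<float>::max();
            for (int d = 0; d <= maxDisp; ++d)
                if (cost[d][y][x] < best) { best = cost[d][y][x]; disp[y][x] = d; }
        }
    return disp;
}
```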
Disparity Refinement • Sub-pixel disparity estimates are used when rendering images to give more visually appealing results • Clean up mismatches via various methods • Not usually done in fast implementations such as robot navigation or tracking
Implementation • Closely tied to the taxonomy given above • The authors developed a modular and portable C++ implementation of several stereo algorithms • Post-processing steps that improve results were not implemented, in order to compare methods directly • Easily extensible to include other algorithms
Implementation Details • Matching Cost Computation • Squared or absolute difference in color • Sub-pixel interpolation • Aggregation • Box Filter: separable moving average filter • Binomial Filter: separable finite impulse response filters • Optimization • Winner-take-all, dynamic programming, scanline optimization, simulated annealing, and graph cut • Refinement • Three aggregated matching cost values around the winning disparity are examined to compute the sub-pixel disparity estimate
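The three-sample refinement mentioned above is commonly a parabola (quadratic) fit through the costs at d-1, d, d+1; a small sketch using that standard formula, not copied from the authors' code:

```cpp
#include <algorithm>

// Stage 4: fit a parabola through the aggregated costs at (d-1, d, d+1)
// and return the sub-pixel disparity at its minimum.
// Falls back to the integer estimate d when the fit degenerates.
float subPixelDisparity(int d, float cPrev, float cHere, float cNext) {
    float denom = cPrev - 2.f * cHere + cNext;        // curvature of the parabola
    if (denom <= 0.f) return static_cast<float>(d);   // flat or inverted: keep integer disparity
    float offset = 0.5f * (cPrev - cNext) / denom;    // minimum lies at d + offset
    return static_cast<float>(d) + std::clamp(offset, -0.5f, 0.5f);
}
```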
Evaluation • Allows for quantitative evaluation of stereo algorithms • Provides test bed for new and existing algorithms along with test data and results on the Web at http://vision.middlebury.edu/stereo/ • Allows for testing of individual components as divided in taxonomy
Quality Metrics • RMS error – root-mean-squared difference between the computed disparity map and the ground-truth map • Percentage of bad matching pixels – fraction of pixels whose disparity error exceeds a tolerance • Computed over the whole image as well as over three areas that usually cause problems: • Textureless regions – average intensity gradient is too low • Occluded regions – the mapped disparity lands at a location covered by a closer object • Depth-discontinuity regions – neighboring disparities differ by too much
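With d_C the computed disparity map, d_T the ground truth, N the number of pixels, and δ_d the disparity error tolerance, the two metrics take the following form (paraphrasing the paper's definitions):

$$ R \;=\; \left( \frac{1}{N} \sum_{(x,y)} \left| d_C(x,y) - d_T(x,y) \right|^2 \right)^{1/2} $$

$$ B \;=\; \frac{1}{N} \sum_{(x,y)} \big( \left| d_C(x,y) - d_T(x,y) \right| > \delta_d \big) $$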
Experiments • Authors perform several experiments to compare various algorithm components, again as divided in the taxonomy • Focus on common problem areas for stereo algorithms
Experiments / Results • Matching Costs • Experiment 1: ran many tests with different matching cost truncation values and found that values of roughly 5-20 give good results (see the truncation formula below) • Experiment 2: ran the same tests as above but applied a 9x9 min filter before truncation, and found that no truncation performed best • Experiment 3: tested the effects of matching cost and truncation on global algorithms, found that some truncation helped, and suggested SNR-based parameter setting
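For reference, truncating the matching cost means clipping the per-pixel error at a threshold τ, which limits the influence of outliers and occluded pixels; with absolute differences this can be written as:

$$ e(x,y,d) \;=\; \min\!\big( \lvert I_L(x,y) - I_R(x-d,\,y) \rvert,\; \tau \big) $$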
Experiments / Results • Aggregation • Experiment 4: analyze the effects of various aggregation techniques on local methods • Large amounts of aggregation are necessary in textureless regions • Shiftable windows perform best (a min-filter sketch follows below)
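Shiftable windows can be realized by following the box-filter aggregation with a min filter of the same window size, so each pixel effectively picks the best of all windows that cover it; a minimal, non-separable sketch of that idea (names illustrative):

```cpp
#include <algorithm>
#include <vector>

using CostPlane = std::vector<std::vector<float>>;  // box-aggregated cost[y][x] at one disparity

// Shiftable windows: after box aggregation, take the minimum over a
// (2r+1) x (2r+1) neighbourhood, letting each pixel "borrow" the best
// shifted window that still covers it.
CostPlane shiftableWindowMinFilter(const CostPlane& agg, int r) {
    int h = agg.size(), w = agg[0].size();
    CostPlane out(h, std::vector<float>(w, 0.f));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = agg[y][x];
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int yy = std::clamp(y + dy, 0, h - 1);
                    int xx = std::clamp(x + dx, 0, w - 1);
                    best = std::min(best, agg[yy][xx]);
                }
            out[y][x] = best;
        }
    return out;
}
```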
Experiments / Results • Disparity Computation / Optimization • Experiment 5: analyze the smoothness parameter • Found that the optimal smoothness parameter varies greatly from one image pair to another • Future work includes parameter-calculation techniques • Experiment 6: focus on graph-cut optimization • Birchfield-Tomasi matching costs and a gradient-based smoothness cost improve the performance of graph-cut algorithms • However, choosing the right threshold and penalty parameters is difficult and image-specific
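The smoothness parameter of Experiment 5 is the weight λ in the global energy these methods minimize, a data term plus a weighted smoothness term:

$$ E(d) \;=\; E_{\text{data}}(d) \;+\; \lambda\, E_{\text{smooth}}(d) $$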
Experiments / Results • Sub-Pixel Estimates • Experiment 7: refine disparity maps via sub-pixel interpolation • As expected, the unrefined disparity map contains staircase errors, while the refined map is considerably better • Again, this step is often skipped in fast implementations
Conclusion • The authors provide a comparison of 20 stereo algorithms, all of which are documented in detail on the website • Found that most algorithms perform about the same in the so-called easy areas and that the differences arise in known problematic areas • One evaluation that I think would have been helpful is a runtime comparison; however, the authors were not concerned with this
Questions? • Can you clarify what is being referenced in Figure 1 (f) regarding the disparity levels as a slice? • A slice simply means that the DSI is 3D and they are keeping one of the three variables constant to produce a 2D image • Can you find references to using illumination alongside stereo depth analysis to further define the depth of objects? • I have searched and did not find any papers; however, this does not mean that it is impossible • It would probably be very helpful to have an illumination estimate prior to stereo evaluation • And of course, can you simplify the differences between each algorithm? • I think I have done a brief simplification; to go into more detail I would have to read each of the 132 referenced papers
Questions?? • How were the stereo algorithms chosen in the paper? • The paper focuses on dense two-frame stereo correspondence algorithms • Common algorithms that needed to be compared were chosen for implementation; however, the framework allows novel algorithms to be implemented as well • What is a stereo algorithm? • A stereo algorithm utilizes images from two cameras, similar to human vision (two eyes)
Questions??? • On page 2, section 2.2, it was indicated that an unvalued disparity map is produced as output. What is an "unvalued disparity map"? • The paper actually says univalued, not unvalued • I think this means that there is a single disparity value at each pixel • On page 11, under the evaluation section, they discuss using three different regions to check the algorithms over (textureless, occluded, and depth-discontinuity) – how did they come up with these? • These are common problem areas for several different stereo algorithms
Questions???? • Why did they downsample the images for testing (page 13)? • To normalize the motion of background objects to a few pixels per frame, which allows better matching results and a fair comparison of the quality of the various algorithms • Why do they only evaluate bad_pixels_nonocc, bad_pixels_textureless, and bad_pixels_discont? • They also evaluate the whole image, but they report these areas separately as well to get an idea of how different algorithms perform in these known problem areas
Additional Resources • References throughout the paper provide resources for various algorithms • Hartley and Zisserman: Multiple View Geometry in Computer Vision. • Middlebury website, an excellent source of papers and code related to stereo vision algorithms. • Sebastian Thrun, Wolfram Burgard and Dieter Fox: Probabilistic Robotics, MIT Press, 2005.