IEEE Transactions on Circuits and Systems for Video Technology, 2011

Real-time Stereo Matching on CUDA using an Iterative Refinement Method for Adaptive Support-Weight Correspondences • Jedrzej Kowalczuk, Eric T. Psota, Lance C. Pérez • University of Nebraska-Lincoln

Outline • Introduction • Related work • Iterative model • Implementation on parallel hardware • Results • Conclusion

Introduction • A novel real-time stereo matching method is presented, using • a two-pass approximation of adaptive support-weight aggregation. • a low-complexity iterative disparity refinement technique. • The refinement technique is constructed using a probabilistic framework.

Introduction • The two-pass method • produces an accurate approximation of the support weights. • reduces the complexity of aggregation. • The method has been implemented on massively parallel hardware using the CUDA computing engine.

Introduction • In this paper, a real-time stereo matching method is introduced, using • window-based cost aggregation. • a low-complexity iterative technique implemented on CUDA.

Introduction • Many real-time methods focus on reducing the complexity, at the expense of reduced accuracy. • The proposed approach takes full advantage of the GTX 580’s computing capabilities to produce a highly accurate stereo matching method.

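Related work • Adaptive support-weight (ASW) aggregation • mimics the process of visual grouping in the human visual system (HVS). • assigns pixel q a support weight that decreases as the geometric distance between p and q increases. • relies on the observation that typical scene surfaces have locally consistent color.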

Adaptive Support-Weight • [equations not preserved]
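
A minimal sketch of a support weight with the properties listed for ASW: it decreases with geometric distance and with color difference. It assumes the exponential form commonly used for adaptive support weights, w(p, q) = exp(-(Δc/γc + Δg/γg)), with Euclidean distances for both terms. The function name, the color tuples, and the choice of color space are assumptions for illustration, not taken from the slides.

```python
import math

def support_weight(color_p, color_q, pos_p, pos_q, gamma_c, gamma_g):
    """Assumed ASW form: exp(-(color_diff / gamma_c + geometric_dist / gamma_g))."""
    delta_c = math.dist(color_p, color_q)  # color difference between p and q
    delta_g = math.dist(pos_p, pos_q)      # geometric distance between p and q
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_g))

# Example with the aggregation parameters quoted on the results slide.
same = support_weight((10, 10, 10), (10, 10, 10), (0, 0), (0, 0), 30.91, 28.21)
far = support_weight((10, 10, 10), (10, 10, 10), (0, 0), (0, 5), 30.91, 28.21)
edge = support_weight((10, 10, 10), (200, 10, 10), (0, 0), (0, 5), 30.91, 28.21)
```

A pixel compared with itself gets weight 1; the weight shrinks as q moves away from p, and shrinks further when q lies across a color edge.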

Adaptive Support-Weight • The complexity of ASW makes it unsuitable for cost aggregation in real-time applications. • It is therefore necessary to reduce the complexity of raw adaptive support-weight cost aggregation. Proposed approximations: • two-pass adaptive support weights [21] • approximated joint bilateral filtering [22] • exponential step-size adaptive weights [9] • cross-based support weight [11]

Two-pass Adaptive Support-Weight • Instead of aggregating over a square window, the two-pass approach approximates the ASW by performing cost aggregation along the vertical and then the horizontal direction. • For a window of width n, complexity per pixel is reduced from O(n²) to O(n).
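
The two-pass idea can be sketched as two 1-D weighted averages: a vertical pass over the raw costs, then a horizontal pass over the vertical results. This is a simplified illustration, not the presented implementation: it assumes a grayscale image, weights computed from one image only, an exponential weight form, and per-pass normalization. All names are hypothetical.

```python
import math

def weight(c_p, c_q, dist, gamma_c, gamma_g):
    # Assumed exponential support weight on grayscale intensities.
    return math.exp(-(abs(c_p - c_q) / gamma_c + dist / gamma_g))

def two_pass_aggregate(cost, image, radius, gamma_c, gamma_g):
    """Aggregate a 2-D cost slice (one disparity) vertically, then horizontally."""
    h, w = len(cost), len(cost[0])

    def one_pass(src, dy, dx):
        out = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                num = den = 0.0
                # Only 2 * radius + 1 samples per pixel and pass: O(n), not O(n^2).
                for k in range(-radius, radius + 1):
                    qy, qx = y + k * dy, x + k * dx
                    if 0 <= qy < h and 0 <= qx < w:
                        wt = weight(image[y][x], image[qy][qx], abs(k), gamma_c, gamma_g)
                        num += wt * src[qy][qx]
                        den += wt
                out[y][x] = num / den  # den >= 1 because k = 0 has weight 1
        return out

    vertical = one_pass(cost, 1, 0)      # first pass: along columns
    return one_pass(vertical, 0, 1)      # second pass: along rows

# A cost spike at x = 2 leaks far less into x = 1 when a color edge separates them.
spike = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
flat_img = [[10, 10, 10, 10] for _ in range(3)]
edge_img = [[10, 10, 200, 200] for _ in range(3)]
flat_out = two_pass_aggregate(spike, flat_img, 1, 30.91, 28.21)
edge_out = two_pass_aggregate(spike, edge_img, 1, 30.91, 28.21)
```

Because the second pass reuses the first pass's output rather than the true 2-D weights, the result only approximates full-window ASW aggregation.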

Two-pass Adaptive Support-Weight • The two-pass approach fails to accurately approximate the support weights under certain conditions.

Comparison of the Four Modifications • Two-pass • Bilateral filtering • ESAW • Cross-based

Flow Diagram

Iterative model • Goal: improve the accuracy of adaptive support-weight stereo matching. • Let [symbol missing] denote a probabilistic event.

Iterative model • Bayes’ theorem

Iterative model • Stereo matching is performed using an additive distance metric, arbitrarily denoted by δ(q, q̄).

Iterative Disparity Refinement • Let D_p^i be the disparity estimate for pixel p obtained in the i-th iteration of matching. • Let F_p^i express the confidence level associated with the disparity estimate of pixel p.

Iterative Disparity Refinement • Penalty function
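
The slide names a penalty function without giving its form. As a hedged sketch, assume a penalty that grows with the confidence-weighted disagreement between a candidate disparity and the neighbors' estimates from the previous iteration, scaled by a disparity penalty α. The exact form, the function names, and the neighbor tuples are assumptions for illustration, not the authors' definition.

```python
def disparity_penalty(candidate_d, neighbors, alpha):
    """neighbors: (support_weight, confidence, previous_disparity) per neighbor.

    Assumed form: alpha * sum(w * F * |D_prev - d|).
    """
    return alpha * sum(w * f * abs(d_prev - candidate_d) for w, f, d_prev in neighbors)

def refine_pixel(base_costs, neighbors, alpha):
    """Return the disparity minimizing base cost plus the assumed penalty."""
    totals = [c + disparity_penalty(d, neighbors, alpha) for d, c in enumerate(base_costs)]
    return min(range(len(totals)), key=totals.__getitem__)

# Base costs slightly favor d = 1, but confident neighbors agree on d = 2.
base_costs = [0.50, 0.45, 0.48]
confident = [(1.0, 1.0, 2), (0.8, 1.0, 2)]
unreliable = [(1.0, 0.0, 2), (0.8, 0.0, 2)]  # zero confidence: no influence
with_support = refine_pixel(base_costs, confident, alpha=0.085)
without_support = refine_pixel(base_costs, unreliable, alpha=0.085)
```

With confident neighbors the ambiguous pixel is pulled to their disparity; neighbors whose confidence was set to zero contribute nothing, so the raw minimum is kept.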

Iterative Disparity Refinement • After the matching costs are computed, the minimum-cost matches are found for both the reference and target images using the winner-take-all (WTA) decision criterion.

Iterative Disparity Refinement • If p̄ = m(p) and p' = m(p̄), then • disparity d(p, p̄) is assigned to the reference disparity map. • disparity d(p', p̄) is assigned to the target disparity map. • If |d(p, p̄) - d(p', p̄)| > 1, then the confidence F_p^i is set to zero.
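
The WTA selection and the consistency rule above can be sketched on a single scanline. Assumptions for illustration: reference pixel x at disparity d matches target pixel x - d, `cost[x][d]` holds the aggregated cost of that pair, and consistent pixels get a placeholder confidence of 1.0 (the slides do not state how nonzero confidences are computed). All names are hypothetical.

```python
def wta_with_consistency(cost, max_disp):
    """cost[x][d]: cost of matching reference pixel x to target pixel x - d."""
    width = len(cost)
    inf = float("inf")

    def c(x, d):
        # Pairs that fall outside the scanline are never selected.
        return cost[x][d] if 0 <= x < width and x - d >= 0 else inf

    disps = range(max_disp + 1)
    # p_bar = m(p): best target match for each reference pixel.
    ref_disp = [min(disps, key=lambda d: c(x, d)) for x in range(width)]
    # p' = m(p_bar): best reference match for each target pixel.
    tgt_disp = [min(disps, key=lambda d: c(xt + d, d)) for xt in range(width)]

    confidence = []
    for x in range(width):
        xt = x - ref_disp[x]  # target pixel matched by reference pixel x
        consistent = abs(ref_disp[x] - tgt_disp[xt]) <= 1
        confidence.append(1.0 if consistent else 0.0)
    return ref_disp, tgt_disp, confidence

# Toy scanline whose true disparity is 2 everywhere.
cost = [[0.0 if d == 2 else 1.0 for d in range(3)] for _ in range(8)]
ref_disp, tgt_disp, confidence = wta_with_consistency(cost, max_disp=2)
```

In the toy scanline, the two leftmost reference pixels have no valid match at disparity 2, so they disagree with the target map and receive zero confidence.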

Outline • Introduction • Related work • Iterative model • Implementation on parallel hardware • CUDA execution model • Stereo matching on CUDA • Complexity and runtime distribution • Results • Conclusion

Flow Diagram

CUDA execution model • A block of threads is an abstraction of the work given to a multiprocessor; its threads are capable of performing operations in parallel. • The threads are executed on the graphics device (the GPU). • At runtime, each block of threads is mapped to a single multiprocessor on the device.

CUDA execution model • The implementation of the proposed method utilizes the NVIDIA GeForce GTX 580 GPU computing processor, equipped with 512 CUDA cores. • The device code is encapsulated in special functions called kernels that are invoked by the host, and executed in parallel by multiple threads.

Stereo Matching on CUDA • The kernels are designed such that each thread within a block is responsible for computing the matching cost for a single pair of pixels. • This granularity of computations allows the threads in each warp to take advantage of memory coalescing.

Stereo Matching on CUDA

Complexity and Runtime Distribution • The complexity of computing the matching cost volume is O(mnwr/s). • The complexity of iterative refinement is O(mnwk/s).

Percentages of the total execution time

Results • γc = 30.91 and γg = 28.21 for matching cost aggregation. • γc = 10.94 and γg = 118.78 for iterative disparity refinement; the disparity penalty was set to α = 0.085.

Results

Conclusion • The refinement technique iteratively improves the accuracy of the disparity map and typically converges after only six iterations. • The added complexity associated with iterative refinement is shown both analytically and experimentally to be relatively small.
