
A fast local descriptor for dense matching

Engin Tola, Vincent Lepetit, Pascal Fua, Ecole Polytechnique Federale de Lausanne, Switzerland.

Presentation Transcript


  1. A fast local descriptor for dense matching. Engin Tola, Vincent Lepetit, Pascal Fua, Ecole Polytechnique Federale de Lausanne, Switzerland

  2. Wide baseline stereo • Stereo is called wide baseline if there are significant: • rotations of the camera • translations between camera centers • changes of internal camera parameters

  3. Depth map of short-baseline stereo: easy solution • Input images

  4. Depth map of short-baseline stereo: easy solution • Some sparse features are identified and tracked across both images

  5. Image Rectification • Transforms both image planes so that corresponding epipolar lines become collinear and parallel to one of the image axes

  6. Depth map of short-baseline stereo: easy solution • Stereo matching algorithms assume that corresponding pixels in the two images lie on the same horizontal scanline. In real data, this requirement is unlikely to be exactly fulfilled, so the images must be rectified

  7. Depth map of short-baseline stereo: easy solution • Corresponding pixels between the two images are identified and the disparity map is computed
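
A minimal sketch of the last two steps, assuming the pair is already rectified so that matches lie on the same row. The slides do not specify a matching cost; the sum of absolute differences (SAD) over a square window with winner-take-all selection used here is an assumption, and `block_matching_disparity`, `box_sum`, `radius` and `penalty` are hypothetical names.

```python
import numpy as np


def box_sum(img, r):
    """Sum of each (2r+1) x (2r+1) window, using edge padding and a 2-D cumulative sum."""
    k = 2 * r + 1
    p = np.pad(img, r, mode="edge")
    c = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    return c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]


def block_matching_disparity(left, right, max_disp, radius=2, penalty=255.0):
    """Winner-take-all disparity for a rectified pair using SAD window costs.

    Assumes a point at column x in `left` appears at column x - d in `right`,
    on the same row (which is what rectification provides).
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    costs = np.empty((max_disp + 1, h, w))
    for d in range(max_disp + 1):
        # Per-pixel absolute difference along the scanline; columns with no
        # partner in `right` (x < d) get a fixed penalty.
        diff = np.full((h, w), penalty)
        diff[:, d:] = np.abs(left[:, d:] - right[:, : w - d])
        # Aggregate over the window, then keep the cheapest disparity per pixel.
        costs[d] = box_sum(diff, radius)
    return np.argmin(costs, axis=0)
```

For example, if `left` is a random image `right` shifted three columns to the right (`np.roll(right, 3, axis=1)`), every pixel far enough from the left border gets disparity 3. This fixed-window comparison is also what breaks down in the wide baseline case, where perspective distortion changes a window's content between views.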

  8. Not applicable to wide baseline stereo … • The solutions for short baseline stereo won’t work for wide baseline stereo: • The scene is not locally planar, so a single homography transformation won’t work • Too many occlusions • Significant perspective distortions

  9. Solutions • Find SIFT feature points in the input images, find SIFT point matches between the images, triangulate the matched feature points, then rectify every triangle separately. Problems: it is still not certain that the scene is locally planar inside the triangles, and wrong point matches generate large errors in depth estimation • The most recent approach (used in the paper): use graph cuts within an EM framework to find the optimal solution for dense point matching between images. It uses point descriptors for all points in the image (dense matching); SIFT descriptors, GLOH, or NCC windows are used
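
NCC windows, one of the dense point descriptors listed above, can be sketched as follows. This is a minimal version; `ncc` and `eps` are illustrative names, not from the paper. NCC compares two equally sized patches after removing their means and normalizing their energy, so it tolerates gain and offset changes in intensity but not the perspective distortion of wide baseline pairs.

```python
import numpy as np


def ncc(patch_a, patch_b, eps=1e-12):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = np.asarray(patch_a, dtype=np.float64).ravel()
    b = np.asarray(patch_b, dtype=np.float64).ravel()
    # Remove the mean so a constant intensity offset has no effect.
    a = a - a.mean()
    b = b - b.mean()
    # Divide by both energies so a global gain has no effect; eps avoids 0/0 on flat patches.
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / (denom + eps))
```

For any non-constant patch `p`, `ncc(p, 2 * p + 10)` is 1 and `ncc(p, -p)` is -1, up to rounding.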

  10. Paper novelty • introduces the DAISY local image descriptor • much faster to compute than SIFT for dense point matching • performs on par with or better than SIFT • DAISY descriptors are fed into an expectation-maximization (EM) algorithm that uses graph cuts to estimate the scene’s depth

  11. SIFT local image descriptor • The SIFT descriptor is a 3-D histogram in which two dimensions correspond to the image spatial dimensions and the third to the image gradient direction (normally discretized into 8 bins)

  12. SIFT local image descriptor • Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center
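
A simplified sketch of the 3-D histogram described in the two slides above, assuming a square patch split into 4 × 4 spatial cells with 8 orientation bins (a 128-dimensional vector). Each pixel's gradient norm is split between neighboring cells with bilinear weights that fall with distance to the cell center. The full SIFT descriptor also applies a Gaussian window, interpolates between orientation bins, rotates to a dominant orientation and clips large values; none of that is done here. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(patch, n_cells=4, n_ori=8):
    """Simplified SIFT-style 3-D histogram: n_cells x n_cells spatial bins x n_ori orientation bins."""
    patch = np.asarray(patch, dtype=np.float64)
    size = patch.shape[0]  # a square patch is assumed
    gy, gx = np.gradient(patch)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)
    o_bin = np.floor(ori * n_ori / (2 * np.pi)).astype(int) % n_ori
    # Pixel centers in "cell units": cell centers sit at 0, 1, ..., n_cells - 1.
    pos = (np.arange(size) + 0.5) * n_cells / size - 0.5
    # Bilinear spatial weight: falls linearly with distance to each cell center.
    w = np.clip(1.0 - np.abs(pos[:, None] - np.arange(n_cells)[None, :]), 0.0, None)
    # Gradient norm routed to its orientation bin (hard orientation assignment).
    votes = mag[:, :, None] * np.eye(n_ori)[o_bin]
    # hist[cy, cx, o] = sum over pixels of w[y, cy] * w[x, cx] * votes[y, x, o]
    hist = np.einsum("yc,xd,yxo->cdo", w, w, votes)
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

On a horizontal intensity ramp, all gradients point along +x, so every cell's energy lands in orientation bin 0. Evaluating this at every pixel re-sums a full window per pixel, which is the cost that matters for dense matching.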

  13. DAISY local image descriptor • Gaussian convolved orientation maps are calculated for every direction: $G_o^{\Sigma} = G_{\Sigma} * \left(\frac{\partial I}{\partial o}\right)^{+}$, where $G_{\Sigma}$: Gaussian convolution filter with variance $\Sigma$; $\frac{\partial I}{\partial o}$: image gradient in direction $o$; $(\cdot)^{+}$: operator $(a)^{+} = \max(a, 0)$ • We observe that every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT contains: a weighted sum of gradient norms computed over an area
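
A minimal NumPy sketch of the orientation maps, assuming H = 8 equally spaced directions and taking the derivative in direction o as cos(o)·∂I/∂x + sin(o)·∂I/∂y from central differences. The slide gives Σ as a variance, so the `sigma` argument here (a standard deviation) corresponds to √Σ; its default of 2.5 is an arbitrary placeholder. `gaussian_blur` and `orientation_maps` are hypothetical names.

```python
import numpy as np


def gaussian_blur(img, sigma, truncate=4.0):
    """Separable Gaussian blur with standard deviation sigma (edge-padded)."""
    r = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-r, r + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    p = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")  # 1-D pass along x
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")  # 1-D pass along y


def orientation_maps(image, n_ori=8, sigma=2.5):
    """Stack of G_o^Sigma maps, one per direction o = 2*pi*i/n_ori (shape: n_ori x H x W)."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    maps = []
    for i in range(n_ori):
        theta = 2 * np.pi * i / n_ori
        d = np.cos(theta) * gx + np.sin(theta) * gy  # derivative of I in direction o
        maps.append(gaussian_blur(np.maximum(d, 0.0), sigma))  # (.)^+ then G_Sigma * (.)
    return np.stack(maps)
```

Each map comes from two 1-D convolutions rather than one 2-D convolution, which is the separability that the computational-complexity slide relies on. For a horizontal ramp image, the map for o = 0 is 1 everywhere and the map for o = π is 0.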

  14. DAISY local image descriptor

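  15. DAISY local image descriptor • Histograms at every pixel location are computed: $h_{\Sigma}(u, v) = \left[G_1^{\Sigma}(u, v), \ldots, G_H^{\Sigma}(u, v)\right]^{\top}$, where $h_{\Sigma}(u, v)$: histogram at location $(u, v)$; $G_1^{\Sigma}, \ldots, G_H^{\Sigma}$: Gaussian convolved orientation maps for the $H$ directions • Histograms are normalized to unit norm, denoted $\tilde{h}_{\Sigma}(u, v)$ • Local image descriptor is computed as $\mathcal{D}(u_0, v_0) = \left[\tilde{h}_{\Sigma_1}^{\top}(u_0, v_0), \tilde{h}_{\Sigma_1}^{\top}(\mathbf{l}_1(u_0, v_0, R_1)), \ldots, \tilde{h}_{\Sigma_1}^{\top}(\mathbf{l}_T(u_0, v_0, R_1)), \ldots, \tilde{h}_{\Sigma_Q}^{\top}(\mathbf{l}_1(u_0, v_0, R_Q)), \ldots, \tilde{h}_{\Sigma_Q}^{\top}(\mathbf{l}_T(u_0, v_0, R_Q))\right]^{\top}$, where $\mathbf{l}_j(u, v, R)$ is the location at distance $R$ from $(u, v)$ in the $j$-th of $T$ equally spaced directions, and ring $q$ of radius $R_q$ uses the orientation maps smoothed with $\Sigma_q$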
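
A sketch of how the normalized histograms are gathered into one descriptor, assuming 3 rings of 8 points and 8 orientations, which gives 1 + 3 × 8 = 25 histograms and a 200-dimensional vector (consistent with the 25-length occlusion masks on slide 17). It also assumes nearest-pixel sampling of the precomputed maps and that ring q reads the maps smoothed with Σ_q; `daisy_descriptor`, `level_maps` and `radii` are hypothetical names.

```python
import numpy as np


def daisy_descriptor(level_maps, u0, v0, radii, n_points=8, eps=1e-12):
    """Concatenate unit-norm histograms at the center and on len(radii) rings.

    level_maps[q] is an (H, rows, cols) stack of orientation maps smoothed with
    Sigma_{q+1}; ring q (radius radii[q]) samples level_maps[q]. (u0, v0) = (column, row).
    """
    def hist(maps, u, v):
        rows, cols = maps.shape[1:]
        # Nearest-pixel lookup clamped to the image border (bilinear sampling is a common alternative).
        r = int(np.clip(np.rint(v), 0, rows - 1))
        c = int(np.clip(np.rint(u), 0, cols - 1))
        h = maps[:, r, c]
        return h / (np.linalg.norm(h) + eps)  # unit-norm histogram

    parts = [hist(level_maps[0], u0, v0)]  # center histogram uses the smallest smoothing
    for maps, radius in zip(level_maps, radii):
        for j in range(n_points):
            angle = 2 * np.pi * j / n_points  # l_j(u0, v0, R): j-th of n_points directions
            parts.append(hist(maps, u0 + radius * np.cos(angle), v0 + radius * np.sin(angle)))
    return np.concatenate(parts)
```

Because the maps are computed once for the whole image, a descriptor at any pixel costs only 25 lookups and normalizations; this is where the speed-up over per-pixel SIFT comes from in dense matching.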

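  16. DAISY vs SIFT: computational complexity • Convolution is time-efficient for separable kernels like the Gaussian • Convolution maps with a larger Gaussian kernel can be built upon convolution maps with a smaller Gaussian kernel: $G_o^{\Sigma_2} = G_{\Sigma_2} * \left(\frac{\partial I}{\partial o}\right)^{+} = G_{\Sigma} * G_{\Sigma_1} * \left(\frac{\partial I}{\partial o}\right)^{+} = G_{\Sigma} * G_o^{\Sigma_1}$, with $\Sigma = \Sigma_2 - \Sigma_1$ for $\Sigma_2 > \Sigma_1$, since the variances of successive Gaussian convolutions add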
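
The incremental-convolution identity can be checked numerically with the sketch below. It works with standard deviations: blurring with σ1 and then with √(σ2² − σ1²) matches a direct blur with σ2, because variances add under Gaussian convolution. The kernel truncation and the names `gaussian_blur` and `incremental_blur` are assumptions, and because of edge padding the two results agree only away from the image border.

```python
import numpy as np


def gaussian_blur(img, sigma, truncate=4.0):
    """Separable Gaussian blur; sigma is a standard deviation, borders are edge-padded."""
    r = max(1, int(np.ceil(truncate * sigma)))
    x = np.arange(-r, r + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    p = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def incremental_blur(blurred_s1, sigma1, sigma2):
    """Turn an image already blurred with sigma1 into one blurred with sigma2 > sigma1."""
    if sigma2 <= sigma1:
        raise ValueError("sigma2 must be larger than sigma1")
    # Variances add under Gaussian convolution: sigma2**2 = sigma1**2 + extra**2.
    extra = np.sqrt(sigma2 ** 2 - sigma1 ** 2)
    return gaussian_blur(blurred_s1, extra)
```

For example, on a random 80 × 80 image, `gaussian_blur(img, 4.0)` and `incremental_blur(gaussian_blur(img, 2.0), 2.0, 4.0)` agree to within 10⁻³ away from the border. Each level's extra kernel is narrower than a direct kernel would be, and the previous level is reused instead of starting again from the gradients.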

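  17. Probabilistic model for dense point matching • The model uses EM to estimate the depth map $Z$ and the occlusion map $O$ by maximizing $P(Z, O \mid \mathcal{D}_1, \ldots, \mathcal{D}_N) \propto P(\mathcal{D}_1, \ldots, \mathcal{D}_N \mid Z, O)\, P(Z, O)$, where $\mathcal{D}_n$: descriptor of image $n$ • Assume independence between pixel locations: $P(\mathcal{D}_1, \ldots, \mathcal{D}_N \mid Z, O) = \prod_{\mathbf{x}} P(\mathcal{D}_1(\mathbf{x}), \ldots, \mathcal{D}_N(\mathbf{x}) \mid Z, O)$ • Define occlusion masks as 25-length binary vectors, one entry per histogram of the descriptor • Finally, define the per-pixel likelihood to be a Laplacian of $D$, where $D$ is the distance between descriptors computed over the histograms that the occlusion mask marks as visible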
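
The slide text breaks off before defining D, so the following is only a sketch under stated assumptions. D is taken as the mean Euclidean distance between corresponding histograms of two 200-dimensional descriptors (25 histograms of 8 bins), averaged over the histograms that the binary occlusion mask marks as visible. The Laplacian likelihood is taken as exp(−D / scale) with a hypothetical `scale` parameter and no normalizing constant. The EM and graph-cut steps are not shown, and `masked_distance` and `laplacian_likelihood` are hypothetical names.

```python
import numpy as np

# Assumed layout: 1 center + 3 rings x 8 points = 25 histograms, 8 orientation bins each.
N_HIST, N_ORI = 25, 8


def masked_distance(desc_a, desc_b, mask):
    """Mean Euclidean distance between corresponding histograms, over visible ones (mask == 1)."""
    a = np.asarray(desc_a, dtype=np.float64).reshape(N_HIST, N_ORI)
    b = np.asarray(desc_b, dtype=np.float64).reshape(N_HIST, N_ORI)
    m = np.asarray(mask, dtype=np.float64)
    if m.shape != (N_HIST,) or m.sum() == 0:
        raise ValueError("mask must be a length-25 vector with at least one visible histogram")
    per_hist = np.linalg.norm(a - b, axis=1)  # one distance per histogram
    return float((m * per_hist).sum() / m.sum())


def laplacian_likelihood(distance, scale=1.0):
    """Unnormalized Laplacian-style likelihood for a non-negative distance (scale is hypothetical)."""
    return float(np.exp(-distance / scale))
```

Setting a mask entry to 0 removes that histogram from the comparison, so a ring sample that falls on an occluding surface in one view does not penalize an otherwise good match.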

  18. Results

  19. Results

  20. Results

  21. Results
