A fast local descriptor for dense matching Engin Tola, Vincent Lepetit, Pascal Fua Ecole Polytechnique Federale de Lausanne, Switzerland
Wide baseline stereo • Stereo is called wide baseline if there are significant • rotations of the camera • translation between camera centers • changes of internal camera parameters
Depth map of short-baseline stereo: easy solution • Input images
Depth map of short-baseline stereo: easy solution • Some sparse features are identified and tracked across both images
Image Rectification • Transforms both image planes so that corresponding epipolar lines become collinear and parallel to one of the image axes (usually the horizontal one)
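What rectification achieves can be checked numerically. The sketch below assumes an already ideal rectified pair: both cameras share the same intrinsic matrix `K` (values chosen arbitrarily), there is no relative rotation, and the baseline lies along the x axis. It builds the fundamental matrix F = K^-T [t]_x K^-1 and shows that the epipolar line of any left-image point is horizontal and lies on the same row. The helper names `skew` and `epipolar_line` are illustrative, not from the slides.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed ideal rectified pair: same intrinsics K for both cameras,
# no relative rotation (R = I), and a baseline along the x axis only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ K_inv  # F = K^-T [t]_x R K^-1 with R = I

def epipolar_line(F, x, y):
    """Coefficients (a, b, c) of the right-image line a*x' + b*y' + c = 0."""
    return F @ np.array([x, y, 1.0])

for x, y in [(100.0, 50.0), (400.0, 300.0)]:
    a, b, c = epipolar_line(F, x, y)
    # a == 0 means the line is horizontal; it lies on row y' = -c / b
    print(f"point ({x}, {y}): a = {a:.1e}, row y' = {-c / b:.2f}")
```

For real image pairs the rectifying homographies are estimated from calibration or point matches; this example only illustrates the end state they aim for.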
Depth map of short-baseline stereo: easy solution • Stereo matching algorithms assume that corresponding pixels in the two images lie on the same horizontal scanline. In real data this requirement is unlikely to be exactly fulfilled, so the images must be rectified
Depth map of short-baseline stereo: easy solution • Corresponding pixels between the two images are identified and the disparity map is computed
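The short-baseline pipeline can be sketched with a simple matcher. This is a minimal example, assuming an already rectified grayscale pair where a left pixel (x, y) matches a right pixel (x − d, y) on the same row. It uses winner-take-all block matching with a sum-of-absolute-differences (SAD) cost, a stand-in technique chosen for illustration rather than the method of any particular paper, then converts disparity to depth with the standard rectified-stereo relation Z = f·B/d. The function names and parameters (`max_disp`, `half_win`, `focal_px`, `baseline_m`) are hypothetical.

```python
import numpy as np

def block_matching_disparity(left, right, max_disp=16, half_win=3):
    """Winner-take-all SAD block matching along horizontal scanlines."""
    left = left.astype(float)
    right = right.astype(float)
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=int)
    for y in range(half_win, h - half_win):
        for x in range(half_win, w - half_win):
            patch = left[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]
            best_cost, best_d = np.inf, 0
            # After rectification, a left pixel (x, y) can only match (x - d, y).
            for d in range(0, min(max_disp, x - half_win) + 1):
                cand = right[y - half_win:y + half_win + 1,
                             x - d - half_win:x - d + half_win + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d for a rectified pair; zero disparity maps to infinite depth."""
    d = disparity.astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)
```

For example, with a focal length of 500 px and a 0.1 m baseline, a disparity of 5 px corresponds to a depth of 10 m. Real matchers add sub-pixel refinement, left-right consistency checks, and smoothness terms; this loop only shows why rectification reduces the search to one dimension.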
Not applicable to wide baseline stereo … • The solutions for short-baseline stereo won’t work for wide baseline stereo: • The scene is not locally planar, so a single homography transformation won’t work • Too many occlusions • Significant perspective distortions
Solutions • Find SIFT feature points in the input images, find SIFT point matches between the images, triangulate the matched feature points (build a triangle mesh over them), then rectify every triangle separately. Problems: the scene is still not guaranteed to be locally planar inside each triangle, and wrong point matches generate large errors in the depth estimate • The most recent approach (used in the paper): use a graph-cut estimation algorithm within an EM framework to find the optimal solution for dense point matching between images. It uses point descriptors for all points in the image (dense matching); SIFT descriptors, GLOH, or NCC windows are used
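The triangle-based approach can be sketched as a piecewise mapping. This sketch assumes the SIFT matches are already available as two arrays of corresponding points (detection and matching are omitted). It builds a Delaunay mesh over the left-image points with `scipy.spatial.Delaunay` and approximates each triangle's mapping by the affine transform fixed by its three matched vertices. The per-triangle affine model is a simplifying assumption; a true plane-induced homography would need extra information such as the epipolar geometry. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_affine(src_tri, dst_tri):
    """2x3 affine map A with A @ [x, y, 1] == dst vertex for all three vertices."""
    src_h = np.hstack([src_tri, np.ones((3, 1))])  # rows [x, y, 1]
    # src_h @ A.T == dst_tri, so A.T solves a 3x3 linear system
    return np.linalg.solve(src_h, dst_tri).T

def piecewise_affine(pts_left, pts_right):
    """Delaunay-triangulate the left points and fit one affine map per triangle."""
    tri = Delaunay(pts_left)
    maps = [triangle_affine(pts_left[s], pts_right[s]) for s in tri.simplices]
    return tri, maps

def map_point(tri, maps, p):
    """Map a left-image point with the affine transform of its enclosing triangle."""
    k = int(tri.find_simplex(p))
    if k < 0:
        return None  # outside the convex hull of the matched points
    return maps[k] @ np.array([p[0], p[1], 1.0])
```

A single wrong match corrupts every triangle that uses it as a vertex, which is one concrete form of the error sensitivity the slide mentions.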
Paper novelty • introduces the DAISY local image descriptor • much faster to compute than SIFT for dense point matching • performs on par with or better than SIFT • DAISY descriptors are fed into an expectation-maximization (EM) algorithm that uses graph cuts to estimate the scene’s depth
SIFT local image descriptor • The SIFT descriptor is a 3-D histogram in which two dimensions correspond to the image spatial dimensions and the third to the image gradient direction (normally discretized into 8 bins)
SIFT local image descriptor • Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center
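The two SIFT slides can be illustrated with a simplified histogram. This is a minimal sketch, assuming a square patch split into 4×4 spatial cells and 8 orientation bins; each pixel adds its gradient norm to its orientation bin in the nearby cells, with bilinear weights that shrink with distance to each cell center. It deliberately omits parts of real SIFT (Gaussian weighting window, interpolation across orientation bins, rotation to a dominant orientation, final normalization), and `sift_like_histogram` is a hypothetical name.

```python
import numpy as np

def sift_like_histogram(patch, n_cells=4, n_ori=8):
    """Simplified SIFT-style 3-D histogram indexed by (row cell, col cell, orientation)."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)  # derivative along rows (y) first, then columns (x)
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % (2 * np.pi)
    o_bin = np.floor(ori / (2 * np.pi) * n_ori).astype(int) % n_ori
    size = patch.shape[0]
    cell = size / n_cells
    hist = np.zeros((n_cells, n_cells, n_ori))
    for r in range(size):
        for c in range(size):
            # Continuous cell coordinates: 0.0 is the center of the first cell.
            cr = (r + 0.5) / cell - 0.5
            cc = (c + 0.5) / cell - 0.5
            r0, c0 = int(np.floor(cr)), int(np.floor(cc))
            for i in (r0, r0 + 1):
                for j in (c0, c0 + 1):
                    if 0 <= i < n_cells and 0 <= j < n_cells:
                        # Bilinear weight: decreases with distance to the cell center.
                        w = (1 - abs(cr - i)) * (1 - abs(cc - j))
                        hist[i, j, o_bin[r, c]] += w * mag[r, c]
    return hist
```

On a horizontal intensity ramp every gradient points along +x, so all the mass lands in orientation bin 0. The explicit per-pixel loop is what makes SIFT costly when a descriptor is needed at every pixel.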
DAISY local image descriptor • Gaussian-convolved orientation maps are calculated for every direction o: $G_o^{\Sigma} = G_{\Sigma} * \left(\frac{\partial I}{\partial o}\right)^{+}$, where $G_{\Sigma}$: Gaussian convolution filter with variance Σ; $\frac{\partial I}{\partial o}$: image gradient in direction o; $(\cdot)^{+}$: the operator $(a)^{+} = \max(a, 0)$ • We observe that every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT contains: a weighted sum of gradient norms computed over an area
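The orientation maps can be computed for the whole image at once. This sketch assumes `n_ori` equally spaced directions (default 8, mirroring SIFT's eight orientation bins) and takes the directional derivative as cos(θ)·∂I/∂x + sin(θ)·∂I/∂y. Note that `scipy.ndimage.gaussian_filter` expects a standard deviation, so a variance Σ corresponds to `sigma = sqrt(Σ)`; the default `sigma=2.5` is arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def orientation_maps(image, n_ori=8, sigma=2.5):
    """Stack of Gaussian-convolved orientation maps G_o = G_sigma * (dI/do)^+.

    `sigma` is a standard deviation (scipy's convention), i.e. sqrt of the variance.
    """
    gy, gx = np.gradient(image.astype(float))  # d/drow (y), then d/dcol (x)
    maps = []
    for k in range(n_ori):
        theta = 2.0 * np.pi * k / n_ori
        d_o = np.cos(theta) * gx + np.sin(theta) * gy  # directional derivative dI/do
        maps.append(gaussian_filter(np.maximum(d_o, 0.0), sigma))  # (.)^+ then blur
    return np.stack(maps)  # shape (n_ori, rows, cols)
```

Each pixel of `maps[k]` is a Gaussian-weighted sum of positive gradient components over a neighborhood, which is the SIFT-bin analogy from the slide, but obtained with one separable convolution per direction instead of per-descriptor accumulation.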
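DAISY local image descriptor • Histograms at every pixel location are computed: $\mathbf{h}_{\Sigma}(u, v) = \left[G_1^{\Sigma}(u, v), \ldots, G_H^{\Sigma}(u, v)\right]^{\top}$, where $\mathbf{h}_{\Sigma}(u, v)$: histogram at location (u, v); $G_1^{\Sigma}, \ldots, G_H^{\Sigma}$: Gaussian-convolved orientation maps for the H directions • Histograms are normalized to unit norm, denoted $\tilde{\mathbf{h}}_{\Sigma}$ • Local image descriptor is computed as $\mathcal{D}(u_0, v_0) = \left[\tilde{\mathbf{h}}_{\Sigma_1}^{\top}(u_0, v_0), \tilde{\mathbf{h}}_{\Sigma_1}^{\top}(\mathbf{l}_1(u_0, v_0, R_1)), \ldots, \tilde{\mathbf{h}}_{\Sigma_1}^{\top}(\mathbf{l}_N(u_0, v_0, R_1)), \ldots, \tilde{\mathbf{h}}_{\Sigma_Q}^{\top}(\mathbf{l}_1(u_0, v_0, R_Q)), \ldots, \tilde{\mathbf{h}}_{\Sigma_Q}^{\top}(\mathbf{l}_N(u_0, v_0, R_Q))\right]^{\top}$, where $\mathbf{l}_j(u, v, R)$ is the location at distance R from (u, v) in direction j, N is the number of sample points per ring and Q is the number of rings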
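Assembling a descriptor then amounts to reading unit-normalized histograms from precomputed maps. This sketch assumes a DAISY-style layout of one center point plus concentric rings of `n_points` samples, with ring q read from maps blurred by `sigmas[q]` and the center from the smallest blur. The radii and blur values are illustrative defaults, not the paper's parameters, and sample locations are rounded to the nearest pixel (a hypothetical simplification; interpolation would be more faithful).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def orientation_maps(image, sigma, n_ori=8):
    """G_o = G_sigma * (dI/do)^+ for n_ori equally spaced directions o."""
    gy, gx = np.gradient(image.astype(float))
    thetas = 2.0 * np.pi * np.arange(n_ori) / n_ori
    return np.stack([gaussian_filter(np.maximum(np.cos(t) * gx + np.sin(t) * gy, 0.0), sigma)
                     for t in thetas])

def unit_histogram(maps, u, v):
    """h(u, v) = [G_1(u, v), ..., G_H(u, v)] normalized to unit L2 norm.

    (u, v) = (column, row), rounded to the nearest pixel and clamped to the image.
    """
    rows, cols = maps.shape[1:]
    col = int(np.clip(round(u), 0, cols - 1))
    row = int(np.clip(round(v), 0, rows - 1))
    h = maps[:, row, col]
    norm = np.linalg.norm(h)
    return h / norm if norm > 0 else h

def daisy_descriptor(image, u0, v0, radii=(5.0, 10.0, 15.0),
                     sigmas=(2.5, 5.0, 7.5), n_points=8, n_ori=8):
    """Concatenate the center histogram and n_points histograms per ring."""
    maps = [orientation_maps(image, s, n_ori) for s in sigmas]
    parts = [unit_histogram(maps[0], u0, v0)]  # center uses the smallest blur
    for q, radius in enumerate(radii):
        for j in range(n_points):
            angle = 2.0 * np.pi * j / n_points
            parts.append(unit_histogram(maps[q], u0 + radius * np.cos(angle),
                                        v0 + radius * np.sin(angle)))
    return np.concatenate(parts)  # length (1 + len(radii) * n_points) * n_ori
```

With these defaults the descriptor holds 25 histograms of 8 bins, i.e. 200 values. For dense matching the maps should be computed once per image and reused for every pixel; recomputing them per call, as here, is only for brevity.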
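DAISY vs SIFT: computational complexity • Convolution is time-efficient for separable kernels such as the Gaussian • Convolution maps with a larger Gaussian kernel can be built upon convolution maps with a smaller Gaussian kernel: $G_o^{\Sigma_2} = G_{\Sigma_2} * \left(\frac{\partial I}{\partial o}\right)^{+} = G_{\Sigma} * G_{\Sigma_1} * \left(\frac{\partial I}{\partial o}\right)^{+} = G_{\Sigma} * G_o^{\Sigma_1}$, with $\Sigma = \Sigma_2 - \Sigma_1$ because the variances of convolved Gaussians add (equivalently, in standard deviations, $\sigma = \sqrt{\sigma_2^2 - \sigma_1^2}$), so each larger-scale map needs only one extra convolution with the smaller kernel $G_{\Sigma}$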
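The incremental construction can be verified numerically. This sketch uses random non-negative values as a stand-in for a rectified gradient map (∂I/∂o)^+ and compares blurring directly with standard deviation `sigma2` against blurring the already blurred `sigma1` map with the smaller step sqrt(sigma2² − sigma1²). `scipy.ndimage.gaussian_filter` applies separable 1-D filters and uses standard deviations, so variances add under cascading.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
grad_plus = np.maximum(rng.normal(size=(64, 64)), 0.0)  # stand-in for (dI/do)^+

sigma1, sigma2 = 2.0, 3.0                   # standard deviations (scipy's `sigma`)
step = np.sqrt(sigma2**2 - sigma1**2)       # variances add under convolution

g1 = gaussian_filter(grad_plus, sigma1)          # smaller-scale map
g2_direct = gaussian_filter(grad_plus, sigma2)   # larger scale, computed from scratch
g2_incremental = gaussian_filter(g1, step)       # larger scale, built on the smaller map

print("max difference:", np.max(np.abs(g2_direct - g2_incremental)))
```

The two results agree up to small discretization and kernel-truncation error, while the incremental path filters with a narrower kernel (here a standard deviation of about 2.24 instead of 3).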
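Probabilistic model for dense point matching • The model uses EM to estimate the depth map Z and the occlusion map O by maximizing $p(Z, O \mid D_1, \ldots, D_N) \propto p(D_1, \ldots, D_N \mid Z, O)\, p(Z, O)$, where $D_n$: descriptor of image n • Assume independence between pixel locations • Define occlusion masks as 25-length vectors (one entry per histogram of the descriptor) • Finally, define to be a Laplacian where D is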