776 Computer Vision, Jan-Michael Frahm, Spring 2012
Feature point extraction
• Image patches fall into three types: homogeneous regions, edges, and corners; only corners constrain position in both directions
• Find points for which the following measure is maximal, i.e. maximize the smallest eigenvalue of M
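The formula the slide refers to is a reconstruction of the standard structure-matrix criterion (Shi and Tomasi's measure; the window W and gradient notation are the usual conventions, not copied from the original deck):

```latex
M \;=\; \sum_{(x,y)\in W}
\begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix},
\qquad
\text{cornerness} \;=\; \min(\lambda_1,\, \lambda_2)
```

Both eigenvalues are small in a homogeneous region, one is large along an edge, and both are large at a corner, which is why maximizing the smallest eigenvalue selects corners.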
Comparing image regions
• Compare intensities pixel-by-pixel between windows I(x,y) and I′(x,y)
• Dissimilarity measure: Sum of Squared Differences (SSD)
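Written out over the compared window W (the standard definition):

```latex
\mathrm{SSD} \;=\; \sum_{(x,y)\in W} \bigl[\, I(x,y) - I'(x,y) \,\bigr]^2
```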
Comparing image regions
• Compare intensities pixel-by-pixel between windows I(x,y) and I′(x,y)
• Similarity measure: Zero-mean Normalized Cross-Correlation (ZNCC)
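The corresponding standard formula (Ī and Ī′ denote the window means):

```latex
\mathrm{ZNCC} \;=\;
\frac{\sum_{(x,y)\in W} \bigl[I(x,y)-\bar{I}\bigr]\,\bigl[I'(x,y)-\bar{I}'\bigr]}
     {\sqrt{\sum_{(x,y)\in W} \bigl[I(x,y)-\bar{I}\bigr]^2 \;\sum_{(x,y)\in W} \bigl[I'(x,y)-\bar{I}'\bigr]^2}}
```

Subtracting the means and normalizing makes ZNCC invariant to gain and offset changes in intensity, unlike SSD.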
Feature point extraction
• Approximate the SSD for a small displacement Δ
• Image difference, then the squared difference per pixel
• SSD over the window
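The linearization behind these bullets, reconstructed from the standard derivation (first-order Taylor expansion of the shifted image):

```latex
I(\mathbf{x}+\Delta) \;\approx\; I(\mathbf{x}) + \nabla I(\mathbf{x})^{\top}\Delta
\quad\Longrightarrow\quad
\mathrm{SSD}(\Delta) \;\approx\; \sum_{\mathbf{x}\in W} \bigl(\nabla I(\mathbf{x})^{\top}\Delta\bigr)^2
\;=\; \Delta^{\top} M\, \Delta
```

This connects the SSD to the matrix M of the previous slide: the window changes least along the smallest-eigenvalue direction of M, so requiring a large smallest eigenvalue makes the point well localizable.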
Harris corner detector
• Only use local maxima; get subpixel accuracy through second-order surface fitting
• Select the strongest features over the whole image and over each tile (e.g. 1000/image, 2/tile)
• Use a small local window
• Maximize "cornerness" (a sketch of the pipeline follows)
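A minimal NumPy/SciPy sketch of this detector, assuming grayscale float input and using the Harris cornerness det(M) − κ·trace(M)²; κ, the window size, and the threshold are illustrative choices, and the subpixel fit and per-tile selection are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma=1.5, kappa=0.04, thresh_rel=0.01):
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)                 # image gradients (rows, cols)
    # Entries of M, accumulated over a Gaussian window
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # Harris "cornerness": det(M) - kappa * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - kappa * (Sxx + Syy) ** 2
    # Keep only local maxima above a relative threshold
    local_max = (R == maximum_filter(R, size=7))
    return np.argwhere(local_max & (R > thresh_rel * R.max()))  # (row, col)
```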
Simple matching
• For each corner in image 1, find the corner in image 2 that is most similar (using SSD or NCC), and vice versa
• Only compare geometrically compatible points
• Keep mutual best matches (see the sketch below)
What transformations does this work for?
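A hedged sketch of the mutual-best-match rule using ZNCC; the patch lists are assumed to come from a detector like the one above, and all names here are illustrative:

```python
import numpy as np

def zncc(a, b):
    # a, b: equally sized float patches
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else -1.0

def mutual_matches(patches1, patches2):
    # score[i, j] = similarity of corner i in image 1 and corner j in image 2
    score = np.array([[zncc(p1, p2) for p2 in patches2] for p1 in patches1])
    best12 = score.argmax(axis=1)   # best match in image 2 for each corner in image 1
    best21 = score.argmax(axis=0)   # and vice versa
    # keep a pair only if each point is the other's best match
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i]
```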
Feature matching: example
(figure: five corresponding points, labeled 1-5, matched between two images)
• What transformations does this work for?
• What level of transformation do we need?
Feature tracking
• Identify features and track them over video
• Small difference between consecutive frames, but a potentially large difference overall
• Standard approach: KLT (Kanade-Lucas-Tomasi)
Feature tracking
• Establish correspondences between identical salient points in multiple images
Good features to track
• Use the same window for feature selection as for the tracking itself
• Compute the motion assuming it is small: differentiate the SSD and set the derivative to zero (reconstruction below)
• Affine motion is also possible, but a bit harder (a 6×6 system instead of 2×2)
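A reconstruction of the standard KLT derivation the bullet refers to; this is the textbook formulation, not copied from the original deck:

```latex
\min_{\Delta}\;\sum_{\mathbf{x}\in W}\bigl[\,I(\mathbf{x}+\Delta)-I'(\mathbf{x})\,\bigr]^2
\;\Longrightarrow\;
\underbrace{\Bigl(\sum_{\mathbf{x}\in W}\nabla I\,\nabla I^{\top}\Bigr)}_{M\;(2\times 2)}\;\Delta
\;=\;\sum_{\mathbf{x}\in W}\bigl[\,I'(\mathbf{x})-I(\mathbf{x})\,\bigr]\,\nabla I
```

The same M as in feature selection appears on the left, which is exactly why the selection window and criterion should match the tracker.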
Example
• A simple displacement model is sufficient between consecutive frames, but not for comparing against the reference template
Good features to keep tracking
• Perform an affine alignment between the first and the last frame
• Stop tracking features whose residual error is too large
Optical flow
• Brightness constancy assumption (small motion)
• 1D example; the estimate can be refined iteratively
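The 1D case in standard notation (a reconstruction of the usual derivation):

```latex
I(x+u,\,t+1) \;\approx\; I(x,t) + u\, I_x
\;\Longrightarrow\;
I_x\, u + I_t = 0
\;\Longrightarrow\;
u = -\,\frac{I_t}{I_x}
```

Because the linearization only holds near the solution, the estimate is refined iteratively: warp by the current u, recompute I_t, and re-solve until the update is small.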
Optical flow
• Brightness constancy assumption (small motion)
• 2D example: the "aperture" problem, one constraint but two unknowns
(figure: isophotes I(t) = I and I(t+1) = I)
The aperture problem
• Let M = Σ ∇I ∇Iᵀ and b = −Σ Iₜ ∇I, with the sums taken over the window
• Algorithm: at each pixel, compute the flow u by solving M u = b
• M is singular if all gradient vectors point in the same direction
• e.g., along an edge
• of course, trivially singular if the summation is over a single pixel or there is no texture
• i.e., only normal flow is available (aperture problem)
• Corners and textured areas are OK
Slide credit: S. Seitz, R. Szeliski
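The per-pixel constraint and the windowed system it leads to, in standard Lucas-Kanade notation (reconstructed rather than copied from the slide):

```latex
I_x u + I_y v + I_t = 0 \quad \text{(one equation, two unknowns per pixel)}
\qquad\Longrightarrow\qquad
\underbrace{\sum_{W} \nabla I\, \nabla I^{\top}}_{M}
\begin{pmatrix} u \\ v \end{pmatrix}
= \underbrace{-\sum_{W} I_t\, \nabla I}_{b}
```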
Optical flow
• How to deal with the aperture problem? (3 constraints if the color gradients are different)
• Assume neighbors have the same displacement
Slide credit: S. Seitz, R. Szeliski
SSD surface: textured area. The surface has a sharp, isolated minimum, so the displacement is well constrained. Slide credit: S. Seitz, R. Szeliski
SSD surface: edge. The surface has a long valley along the edge direction, so only the normal component of the motion is constrained. Slide credit: S. Seitz, R. Szeliski
SSD surface: homogeneous area. The surface is nearly flat, so the displacement is essentially unconstrained. Slide credit: S. Seitz, R. Szeliski
Lucas-Kanade
• Assume neighbors have the same displacement and solve the stacked constraints in the least-squares sense (a sketch follows)
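A minimal single-window Lucas-Kanade step in NumPy, assuming small motion and grayscale float images; the window location and size are illustrative:

```python
import numpy as np

def lk_step(I0, I1, y, x, half=7):
    # Extract the same window from both frames
    win0 = I0[y - half:y + half + 1, x - half:x + half + 1]
    win1 = I1[y - half:y + half + 1, x - half:x + half + 1]
    Iy, Ix = np.gradient(win0)      # spatial gradients
    It = win1 - win0                # temporal difference
    # Stack one brightness-constancy constraint per pixel: A [u, v]^T = b
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    # Least-squares solution, i.e. solve (A^T A) d = A^T b
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d                        # (u, v)
```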
Revisiting the small motion assumption
• Is this motion small enough?
• Probably not; it is much larger than one pixel, so second-order terms dominate
• How might we solve this problem?
From Khurram Hassan-Shafique, CAP5415 Computer Vision, 2003
Reduce the resolution!
From Khurram Hassan-Shafique, CAP5415 Computer Vision, 2003
Coarse-to-fine optical flow estimation
(figure: Gaussian pyramids of images I_{t-1} and I; a motion of u = 10 pixels at full resolution becomes u = 5, 2.5, and 1.25 pixels at successively coarser levels)
Slides from Bradski and Thrun
Coarse-to-fine optical flow estimation
• Run iterative L-K at the coarsest pyramid level
• Warp and upsample the estimate, then run iterative L-K at the next finer level, repeating down to full resolution
(figure: Gaussian pyramids of images I_{t-1} and I, with the warped intermediate image J)
Slides from Bradski and Thrun
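In practice the coarse-to-fine scheme is available off the shelf; a hedged OpenCV sketch (the file names and parameter values are placeholders, not from the slides):

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners in the first frame
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                               qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: maxLevel controls the number of pyramid levels
pts1, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts0, None, winSize=(21, 21), maxLevel=3)

# Keep only the points that were tracked successfully
tracked = pts1[status.ravel() == 1]
```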
Gain-adaptive KLT tracking
(videos: tracking under fixed gain vs. auto-gain)
• Data-parallel implementation on the GPU [Sinha, Frahm, Pollefeys, Genc MVA'07]
• Simultaneous tracking and radiometric calibration [Kim, Frahm, Pollefeys ICCV'07]
• But: not data parallel, hence hard to accelerate on the GPU
• Block-Jacobi iterations [Zach, Gallup, Frahm CVGPU'08]
• Data parallel, very efficient on the GPU
Gain estimation
(figure: camera-reported gain in blue, estimated gain in red)
[Zach, Gallup, Frahm CVGPU'08]
Limits of the gradient method
• Fails when the intensity structure within the window is poor
• Fails when the displacement is large (the typical operating range is a motion of about 1 pixel); the linearization of brightness is suitable only for small displacements
• Also, brightness is not strictly constant in images; in practice this is less problematic than it appears, since we can pre-filter the images to make them look similar
Slide credit: S. Seitz, R. Szeliski
Limitations of Yosemite
(figure: Yosemite images 7 and 8, the ground-truth flow, and the flow color coding)
• Only sequence used for quantitative evaluation
• Limitations: very simple and synthetic; small, rigid motion; minimal motion discontinuities/occlusions
Slide credit: S. Seitz, R. Szeliski
Limitations of Yosemite
(figure: Yosemite images 7 and 8, the ground-truth flow, and the flow color coding)
• Only sequence used for quantitative evaluation
• Current challenges: non-rigid motion, real sensor noise, complex natural scenes, motion discontinuities
• Need more challenging and more realistic benchmarks
Slide credit: S. Seitz, R. Szeliski
Realistic synthetic imagery
(examples: Rock, Grove)
• Randomly generate scenes with "trees" and "rocks"
• Significant occlusions, motion, texture, and blur
• Rendered using Mental Ray and a "lens shader" plugin
Slide credit: S. Seitz, R. Szeliski
Modified stereo imagery
• Recrop and resample ground-truth stereo datasets to have motion appropriate for optical flow
(examples: Venus, Moebius)
Slide credit: S. Seitz, R. Szeliski
Dense flow with hidden texture
(figure panels: setup, lights, visible-light image, UV image, cropped region)
• Paint the scene with textured fluorescent paint
• Take two images: one in visible light, one in UV light
• Move the scene in very small steps using a robot
• Generate ground truth by tracking the UV images
Slide credit: S. Seitz, R. Szeliski
Experimental results
• Algorithms:
• Pyramid LK: OpenCV-based implementation of Lucas-Kanade on a Gaussian pyramid
• Black and Anandan: authors' implementation
• Bruhn et al.: our implementation
• MediaPlayer™: code used for video frame-rate upsampling in Microsoft MediaPlayer
• Zitnick et al.: authors' implementation
Slide credit: S. Seitz, R. Szeliski
Experimental results
Slide credit: S. Seitz, R. Szeliski
Conclusions
• Difficulty: the data is substantially more challenging than Yosemite
• Diversity: substantial variation in difficulty across the various datasets
• Motion GT vs. interpolation: the best algorithms for one are not the best for the other
• Comparison with stereo: the performance of existing flow algorithms appears weak
Slide credit: S. Seitz, R. Szeliski
Motion representations • How can we describe this scene? Slide credit: S.Seitz, R. Szeliski
Block-based motion prediction • Break image up into square blocks • Estimate translation for each block • Use this to predict next frame, code difference (MPEG-2) Slide credit: S.Seitz, R. Szeliski
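A hedged sketch of exhaustive block matching in the spirit of MPEG-style coders; the block size and search range are illustrative choices:

```python
import numpy as np

def block_motion(prev, curr, block=16, search=8):
    # For each block of the current frame, find the best-matching block in the
    # previous frame within a +/- search window (minimum SSD).
    h, w = curr.shape
    motion = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(float)
            best, best_ssd = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = prev[y:y + block, x:x + block].astype(float)
                        ssd = np.sum((target - cand) ** 2)
                        if ssd < best_ssd:
                            best_ssd, best = ssd, (dy, dx)
            motion[by // block, bx // block] = best
    return motion  # per-block (dy, dx) predicting curr from prev
```

The coder then transmits the per-block vectors and codes only the residual between the prediction and the actual frame.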
Layered motion
• Break the image sequence up into "layers"
(figure: the sequence decomposed into a composition of layers)
• Describe each layer's motion
Slide credit: S. Seitz, R. Szeliski
Layered motion
• Advantages:
• can represent occlusions / disocclusions
• each layer's motion can be smooth
• video segmentation for semantic processing
• Difficulties:
• how do we determine the correct number of layers?
• how do we assign pixels?
• how do we model the motion?
Slide credit: S. Seitz, R. Szeliski
Layers for video summarization
Slide credit: S. Seitz, R. Szeliski
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
(figure: several masked frames composited into a single background sprite)
Slide credit: S. Seitz, R. Szeliski
What are layers? [Wang & Adelson, 1994]
(figure: each layer is defined by its intensities, alphas, and velocities)
Slide credit: S. Seitz, R. Szeliski
How do we form them? Slide credit: S.Seitz, R. Szeliski
How do we estimate the layers?
• Compute coarse-to-fine flow
• Estimate affine motion in blocks (regression)
• Cluster with k-means
• Assign pixels to the best-fitting affine region
• Re-estimate the affine motions in each region, and iterate (a sketch of the regression and clustering steps follows)
Slide credit: S. Seitz, R. Szeliski
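A hedged sketch of the regression and clustering steps of this recipe, assuming a dense flow field is already available; the block size and the number of clusters k are illustrative:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def affine_fit(flow_block, ys, xs):
    # Affine model: [u, v]^T = A [x, y, 1]^T, fit as two 3-parameter
    # least-squares problems (one for u, one for v).
    X = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)], axis=1)
    u = flow_block[..., 0].ravel()
    v = flow_block[..., 1].ravel()
    au, *_ = np.linalg.lstsq(X, u, rcond=None)
    av, *_ = np.linalg.lstsq(X, v, rcond=None)
    return np.concatenate([au, av])  # 6 affine parameters

def layer_motions(flow, block=32, k=4):
    # Fit affine motion in each block, then cluster the parameter vectors.
    h, w = flow.shape[:2]
    params = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ys, xs = np.mgrid[by:by + block, bx:bx + block]
            params.append(affine_fit(flow[by:by + block, bx:bx + block], ys, xs))
    centroids, labels = kmeans2(np.array(params), k, minit="points")
    return centroids  # candidate per-layer affine motions
```

Pixels are then assigned to the centroid that best explains their flow, and the affine motions are re-estimated per region.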
Layer synthesis
• For each layer:
• stabilize the sequence with the affine motion
• compute the median value at each pixel (a sketch follows)
• Determine occlusion relationships
Slide credit: S. Seitz, R. Szeliski
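A hedged sketch of the stabilize-and-median step, assuming each layer's per-frame 2×3 affine matrices have already been estimated:

```python
import cv2
import numpy as np

def synthesize_layer(frames, affines, shape):
    # affines[t] is the 2x3 matrix mapping frame t into the reference frame;
    # warp every frame into that frame, then take a per-pixel median, which
    # is robust to pixels that belong to other (occluding) layers.
    h, w = shape
    stabilized = [cv2.warpAffine(f, A, (w, h)) for f, A in zip(frames, affines)]
    return np.median(np.stack(stabilized), axis=0)
```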
Results Slide credit: S.Seitz, R. Szeliski