776 Computer Vision Jan-Michael Frahm Spring 2012
Feature point extraction
• Local image patches fall into three types: homogeneous, edge, corner
• Find points for which the cornerness criterion is maximal, i.e. maximize the smallest eigenvalue of the structure tensor M (only at corners are both eigenvalues large)
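As a concrete illustration of the criterion above, a minimal sketch (assuming a grayscale NumPy image; the function and helper names are my own, not from the slides) that computes the smallest eigenvalue of M at every pixel:

```python
import numpy as np

def min_eig_score(img, win=5):
    """Shi-Tomasi criterion: smallest eigenvalue of the structure tensor
    M = sum over window of [Ix^2 IxIy; IxIy Iy^2] at every pixel."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)                  # central differences
    def box_sum(a):                            # sum over a win x win window
        r = win // 2
        p = np.pad(a, r, mode='edge')
        out = np.zeros_like(a)
        for dy in range(win):
            for dx in range(win):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out
    sxx, syy, sxy = box_sum(Ix * Ix), box_sum(Iy * Iy), box_sum(Ix * Iy)
    # closed-form smallest eigenvalue of the 2x2 symmetric matrix
    return 0.5 * (sxx + syy - np.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))
```

On a synthetic step corner, the score is large at the corner, near zero along the edge, and exactly zero in the homogeneous region, matching the three region types above.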
Comparing image regions
• Compare intensities pixel-by-pixel between I(x,y) and I′(x,y)
• Dissimilarity measure: Sum of Squared Differences, SSD = Σ_{x,y} (I′(x,y) − I(x,y))²
Comparing image regions
• Compare intensities pixel-by-pixel between I(x,y) and I′(x,y)
• Similarity measure: Zero-mean Normalized Cross-Correlation, ZNCC = Σ (I − Ī)(I′ − Ī′) / √(Σ (I − Ī)² · Σ (I′ − Ī′)²)
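The two measures can be sketched as follows (hypothetical helper names; note that ZNCC is invariant to gain and offset changes in intensity, which is why it is preferred under illumination variation):

```python
import numpy as np

def ssd(a, b):
    """Sum of Squared Differences: 0 for identical patches,
    grows with dissimilarity."""
    d = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sum(d * d))

def zncc(a, b):
    """Zero-mean Normalized Cross-Correlation: +1 for patches that agree
    up to gain and offset, -1 for contrast-reversed patches."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    a = a - a.mean()                     # remove the mean (the "zero-mean" part)
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))
```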
Feature point extraction
• Approximate the SSD for a small displacement Δ by linearizing the image: I(x + Δ) ≈ I(x) + ∇I(x)ᵀΔ
• The squared difference per pixel, summed over the window, then gives SSD(Δ) ≈ Δᵀ M Δ, with M the window sum of ∇I ∇Iᵀ
Harris corner detector
• Use a small local window and maximize the "cornerness" det(M) − k·trace(M)²
• Only use local maxima; subpixel accuracy through second-order surface fitting
• Select the strongest features over the whole image and over each tile (e.g. 1000/image, 2/tile)
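The selection step (local maxima only, a few strongest features per tile so detections spread over the image) might look like the following sketch, given a precomputed cornerness map; the function name, tile size, and threshold are illustrative, not from the slides:

```python
import numpy as np

def select_features(response, tile=8, per_tile=2, thresh=0.0):
    """Keep only local maxima of a cornerness map, at most `per_tile`
    strongest per tile, so features spread over the whole image."""
    h, w = response.shape
    p = np.pad(response, 1, mode='constant', constant_values=-np.inf)
    local_max = response > thresh
    for dy in range(3):                # 3x3 non-maximum suppression
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue
            local_max &= response >= p[dy:dy + h, dx:dx + w]
    keep = []
    for ty in range(0, h, tile):       # strongest `per_tile` maxima per tile
        for tx in range(0, w, tile):
            ys, xs = np.nonzero(local_max[ty:ty + tile, tx:tx + tile])
            order = np.argsort(-response[ty + ys, tx + xs])[:per_tile]
            keep += [(ty + ys[i], tx + xs[i]) for i in order]
    return keep
```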
Simple matching
• For each corner in image 1, find the corner in image 2 that is most similar (using SSD or NCC), and vice versa
• Only compare geometrically compatible points
• Keep mutual best matches
What transformations does this work for?
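A minimal sketch of the mutual-best-match test over descriptor vectors (e.g. flattened patches), assuming SSD as the comparison; the geometric-compatibility filter is omitted:

```python
import numpy as np

def mutual_best_matches(desc1, desc2):
    """Match by SSD and keep only pairs (i, j) where j is i's best match
    in image 2 AND i is j's best match in image 1."""
    d1 = np.asarray(desc1, dtype=np.float64)
    d2 = np.asarray(desc2, dtype=np.float64)
    # pairwise SSD via the expansion |a - b|^2 = |a|^2 - 2 a.b + |b|^2
    ssd = ((d1 ** 2).sum(1)[:, None] - 2.0 * d1 @ d2.T
           + (d2 ** 2).sum(1)[None, :])
    best12 = ssd.argmin(axis=1)   # best match in image 2 for each corner in image 1
    best21 = ssd.argmin(axis=0)   # and vice versa
    return [(i, int(j)) for i, j in enumerate(best12) if best21[j] == i]
```

Points whose best match is not reciprocated (ambiguous or occluded features) are discarded.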
Feature matching: example
[Figure: five corresponding points labeled 1-5 in two images]
What transformations does this work for? What level of transformation do we need?
Feature tracking
• Identify features and track them over video
• Small difference between frames, but potentially a large difference overall
• Standard approach: KLT (Kanade-Lucas-Tomasi)
Feature tracking
• Establish correspondences between identical salient points in multiple images
Good features to track
• Use the same window for feature selection as for the tracking itself
• Compute the motion assuming it is small: differentiate the SSD with respect to the displacement and set it to zero, giving a 2x2 linear system
• Affine motion is also possible, but a bit harder (6x6 instead of 2x2 system)
Example: a simple displacement is sufficient between consecutive frames, but not for comparing against the reference template
Good features to keep tracking
• Perform affine alignment between the first and last frame
• Stop tracking features with too large an error
Optical flow
• Brightness constancy assumption (small motion): a point keeps its intensity as it moves
• 1D example; the estimate can be improved by iterative refinement
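The 1D case with iterative refinement can be sketched as follows (a single global shift d with g(x) = f(x + d) is assumed; `shift_1d` is an illustrative name, not from the slides):

```python
import numpy as np

def shift_1d(f, g, iters=10):
    """1D brightness constancy g(x) = f(x + d): linearize around the
    current estimate, solve for the scalar update, and iterate."""
    x = np.arange(len(f), dtype=np.float64)
    d = 0.0
    for _ in range(iters):
        fw = np.interp(x + d, x, f)      # f warped by the current estimate
        fx = np.gradient(fw)             # spatial derivative
        denom = fx @ fx
        if denom < 1e-12:                # no texture: give up
            break
        d += (fx @ (g - fw)) / denom     # least-squares update
    return d
```

A single linearized step only gets close; re-warping and solving again (iterative refinement) drives the residual down.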
Optical flow
• Brightness constancy assumption (small motion), 2D example
• The "aperture" problem: 1 constraint per pixel, 2 unknowns; the component of motion along the isophote (curve of constant intensity) is unobservable
The aperture problem
• Let M = Σ (∇I)(∇I)ᵀ and b = −Σ I_t ∇I
• Algorithm: at each pixel compute the flow u by solving M u = b
• M is singular if all gradient vectors point in the same direction, e.g. along an edge; it is trivially singular if the summation is over a single pixel or there is no texture
• In that case only the normal flow is available (aperture problem)
• Corners and textured areas are OK
Motion estimation. Slide credit: S. Seitz, R. Szeliski
Optical flow
• How to deal with the aperture problem? Use color (3 constraints if the color gradients differ), or assume neighboring pixels have the same displacement
Slide credit: S. Seitz, R. Szeliski
SSD surface for a textured area: a single well-localized minimum, so the displacement is fully determined. Slide credit: S. Seitz, R. Szeliski
SSD surface for an edge: a valley along the edge direction, so only the normal component of the motion is determined. Slide credit: S. Seitz, R. Szeliski
SSD surface for a homogeneous area: nearly flat, so no displacement can be determined. Slide credit: S. Seitz, R. Szeliski
Lucas-Kanade
• Assume neighboring pixels have the same displacement u
• Least-squares over the window: minimize Σ (∇I·u + I_t)², which gives the normal equations (Σ ∇I ∇Iᵀ) u = −Σ I_t ∇I
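A minimal single-window solve of the resulting 2x2 system (assuming grayscale NumPy images and an interior window; this is one linearized step, not the full iterative KLT tracker):

```python
import numpy as np

def lk_displacement(I, J, y, x, win=9):
    """One Lucas-Kanade step: solve the 2x2 normal equations
    (sum grad I grad I^T) d = -(sum I_t grad I) over the window at (y, x)."""
    I = I.astype(np.float64)
    J = J.astype(np.float64)
    Iy, Ix = np.gradient(I)          # spatial derivatives
    It = J - I                       # temporal derivative
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    ix, iy, it = Ix[sl].ravel(), Iy[sl].ravel(), It[sl].ravel()
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)     # displacement (dx, dy) of the patch
```

`np.linalg.solve` fails exactly when M is singular, i.e. in the edge and homogeneous cases discussed above.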
Revisiting the small motion assumption
• Is this motion small enough?
• Probably not; it is much larger than one pixel (2nd-order terms dominate)
• How might we solve this problem?
* From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003
Reduce the resolution! * From Khurram Hassan-Shafique CAP5415 Computer Vision 2003
Coarse-to-fine optical flow estimation
[Figure: Gaussian pyramids of images I_{t-1} and I; the displacement halves at each coarser level, e.g. u = 10, 5, 2.5, 1.25 pixels]
Slides from Bradski and Thrun
Coarse-to-fine optical flow estimation
[Figure: run iterative L-K at the coarsest level of the Gaussian pyramids of images I_{t-1} and I, then warp & upsample and run iterative L-K again at each finer level]
Slides from Bradski and Thrun
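The coarse-to-fine scheme can be sketched in 1D (a single global shift is assumed, with naive decimation standing in for a proper Gaussian pyramid, and warping done inside the iterative step; all names are illustrative):

```python
import numpy as np

def lk_shift_1d(f, g, d0=0.0, iters=10):
    """Plain iterative Lucas-Kanade for a single 1D shift, g(x) = f(x + d)."""
    x = np.arange(len(f), dtype=np.float64)
    d = d0
    for _ in range(iters):
        fw = np.interp(x + d, x, f)      # warp by the current estimate
        fx = np.gradient(fw)
        denom = fx @ fx
        if denom < 1e-12:
            break
        d += (fx @ (g - fw)) / denom
    return d

def coarse_to_fine_shift(f, g, levels=3):
    """Solve at the coarsest level first; at each finer level, double the
    estimate (upsample) and refine with iterative L-K."""
    pyramid = [(f, g)]
    for _ in range(levels - 1):
        f, g = f[::2], g[::2]            # naive decimation (no anti-alias filter)
        pyramid.append((f, g))
    d = 0.0
    for fl, gl in reversed(pyramid):     # coarsest -> finest
        d = lk_shift_1d(fl, gl, d0=2.0 * d)
    return d
```

A shift that is large at full resolution becomes small (well within the linearization range) at the coarsest level, which is exactly why the pyramid helps.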
Gain-adaptive KLT tracking
[Videos: fixed gain vs. auto-gain]
• Data-parallel implementation on GPU [Sinha, Frahm, Pollefeys, Genc MVA'07]
• Simultaneous tracking and radiometric calibration [Kim, Frahm, Pollefeys ICCV'07]; but not data-parallel, so hard for GPU acceleration
• Block-Jacobi iterations [Zach, Gallup, Frahm CVGPU'08]: data-parallel, very efficient on GPU
Gain estimation: camera-reported (blue) and estimated (red) gains [Zach, Gallup, Frahm CVGPU'08]
Limits of the gradient method
• Fails when the intensity structure within the window is poor
• Fails when the displacement is large (the typical operating range is about 1 pixel): linearization of brightness holds only for small displacements
• Also, brightness is not strictly constant in images; in practice this is less problematic than it appears, since we can pre-filter the images to make them look similar
Slide credit: S. Seitz, R. Szeliski
Limitations of Yosemite
[Figure: Yosemite images 7 and 8, ground-truth flow, flow color coding]
• Long the only sequence used for quantitative evaluation
• Limitations: very simple and synthetic; small, rigid motion; minimal motion discontinuities/occlusions
Slide credit: S. Seitz, R. Szeliski
Limitations of Yosemite
[Figure: Yosemite images 7 and 8, ground-truth flow, flow color coding]
• Current challenges: non-rigid motion, real sensor noise, complex natural scenes, motion discontinuities
• We need more challenging and more realistic benchmarks
Slide credit: S. Seitz, R. Szeliski
Realistic synthetic imagery ("Rock", "Grove")
• Randomly generated scenes with "trees" and "rocks"
• Significant occlusions, motion, texture, and blur
• Rendered using Mental Ray and a "lens shader" plugin
Slide credit: S. Seitz, R. Szeliski
Modified stereo imagery ("Venus", "Moebius")
• Recrop and resample ground-truth stereo datasets to have motion appropriate for optical flow
Slide credit: S. Seitz, R. Szeliski
Dense flow with hidden texture
[Figure panels: setup, lights, visible-light image, UV image, cropped]
• Paint the scene with textured fluorescent paint
• Take 2 images: one in visible light, one in UV light
• Move the scene in very small steps using a robot
• Generate ground truth by tracking the UV images
Slide credit: S. Seitz, R. Szeliski
Experimental results; algorithms compared:
• Pyramid LK: OpenCV-based implementation of Lucas-Kanade on a Gaussian pyramid
• Black and Anandan: authors' implementation
• Bruhn et al.: our implementation
• MediaPlayer™: code used for video frame-rate upsampling in Microsoft Media Player
• Zitnick et al.: authors' implementation
Slide credit: S. Seitz, R. Szeliski
Experimental results. Slide credit: S. Seitz, R. Szeliski
Conclusions
• Difficulty: the data is substantially more challenging than Yosemite
• Diversity: substantial variation in difficulty across the various datasets
• Motion ground truth vs. interpolation: the best algorithms for one are not the best for the other
• Comparison with stereo: the performance of existing flow algorithms appears weak
Slide credit: S. Seitz, R. Szeliski
Motion representations • How can we describe this scene? Slide credit: S.Seitz, R. Szeliski
Block-based motion prediction • Break image up into square blocks • Estimate translation for each block • Use this to predict next frame, code difference (MPEG-2) Slide credit: S.Seitz, R. Szeliski
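A brute-force sketch of the block matching described above (exhaustive integer search over a small window; real encoders use much faster search strategies, and the function name is my own):

```python
import numpy as np

def block_motion(prev, cur, block=8, search=4):
    """For each block of `cur`, exhaustively search `prev` within +/- `search`
    pixels for the integer displacement (dy, dx) minimising the SSD."""
    h, w = cur.shape
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block].astype(np.float64)
            best_err, best = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue      # candidate block falls outside prev
                    cand = prev[y:y + block, x:x + block].astype(np.float64)
                    err = np.sum((cand - target) ** 2)
                    if err < best_err:
                        best_err, best = err, (dy, dx)
            motion[(by, bx)] = best
    return motion
```

The predicted frame built from these per-block translations is subtracted from the true frame, and only the (small) residual is coded.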
Layered motion
• Break the image sequence up into "layers" whose composite reproduces each frame
• Describe each layer's motion
Slide credit: S. Seitz, R. Szeliski
Layered motion
• Advantages: can represent occlusions/disocclusions; each layer's motion can be smooth; provides video segmentation for semantic processing
• Difficulties: how do we determine the correct number of layers? how do we assign pixels? how do we model the motion?
Slide credit: S. Seitz, R. Szeliski
Layers for video summarization. Slide credit: S. Seitz, R. Szeliski
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding: the individual frames are composited into a single sprite
Slide credit: S. Seitz, R. Szeliski
What are layers? [Wang & Adelson, 1994]: each layer carries intensities, alphas (opacity), and velocities. Slide credit: S. Seitz, R. Szeliski
How do we form them? Slide credit: S.Seitz, R. Szeliski
How do we estimate the layers?
• Compute coarse-to-fine flow
• Estimate affine motion in blocks (regression)
• Cluster with k-means
• Assign pixels to the best-fitting affine region
• Re-estimate affine motions in each region…
Slide credit: S. Seitz, R. Szeliski
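A toy version of the clustering and assignment steps, simplified from affine parameters to per-pixel translational flow vectors (all names are illustrative, and a deterministic farthest-point initialisation replaces random seeding):

```python
import numpy as np

def flow_layers(flow, k=2, iters=10):
    """Cluster per-pixel flow vectors into k motion layers and assign each
    pixel to the best-fitting layer (translational stand-in for the affine
    per-block regression of the full method)."""
    h, w, _ = flow.shape
    v = flow.reshape(-1, 2).astype(np.float64)
    # deterministic farthest-point initialisation of the k centres
    centers = [v[0]]
    for _ in range(1, k):
        d = ((v[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(v[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):            # standard k-means iterations
        d = ((v[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)          # assign pixels to the closest motion
        for j in range(k):
            if np.any(labels == j):
                centers[j] = v[labels == j].mean(0)   # re-estimate motions
    return labels.reshape(h, w), centers
```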
Layer synthesis
• For each layer: stabilize the sequence with the affine motion, then compute the median value at each pixel
• Determine occlusion relationships
Slide credit: S. Seitz, R. Szeliski
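A sketch of the stabilize-then-median step, with an integer `np.roll` translation standing in for the affine warp of the full method (the function name is my own):

```python
import numpy as np

def layer_median(frames, shifts):
    """Stabilise each frame by undoing its (integer) translation, then take
    the per-pixel median; pixels occluded in a few frames are outvoted.
    Integer np.roll stands in for the affine warp of the full method."""
    aligned = [np.roll(f, (-dy, -dx), axis=(0, 1))
               for f, (dy, dx) in zip(frames, shifts)]
    return np.median(np.stack(aligned), axis=0)
```

The median is what makes the synthesis robust: a moving occluder visible in only a minority of frames does not contaminate the layer's appearance.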
Results Slide credit: S.Seitz, R. Szeliski