Layered Scene Representations
Vision for Graphics, CSE 590SS, Winter 2001
Richard Szeliski
Motion representations
• How can we describe this scene?
Block-based motion prediction
• Break the image up into square blocks
• Estimate a translation for each block
• Use this to predict the next frame and code the difference (MPEG-2)
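The block-based scheme above can be sketched as an exhaustive search over integer translations. This is a minimal illustration, not the MPEG-2 procedure itself: the function name `block_motion`, the sum-of-absolute-differences (SAD) cost, and the `block` and `search` parameters are assumptions made for the example. Images are plain lists of rows.

```python
def block_motion(prev, curr, block=4, search=2):
    """Estimate one integer translation per block by exhaustive SAD search.

    Returns {(by, bx): (dy, dx)}, where the block of `curr` whose top-left
    corner is (by, bx) is best predicted by the block of `prev` whose
    top-left corner is (by + dy, bx + dx).
    """
    h, w = len(curr), len(curr[0])
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    # Skip candidates that fall outside the previous frame.
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    sad = sum(abs(curr[by + i][bx + j] - prev[y0 + i][x0 + j])
                              for i in range(block) for j in range(block))
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            motion[(by, bx)] = (best[1], best[2])
    return motion
```

A coder would then transmit each block's vector plus the residual between the block and its prediction. Real coders use faster search patterns and sub-pixel refinement, which this sketch leaves out.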
Layered motion
• Break the image sequence up into “layers”
  [figure: image sequence = stack of layers]
• Describe each layer’s motion
Outline
• Why layers?
• 2-D layers [Wang & Adelson 94; Weiss 97]
• 3-D layers [Baker et al. 98]
• Layered Depth Images [Shade et al. 98]
• Transparency [Szeliski et al. 00]
• Bayesian estimation [Torr et al. 99]
Layered motion
• Advantages:
  • can represent occlusions / disocclusions
  • each layer’s motion can be smooth
  • video segmentation for semantic processing
• Difficulties:
  • how do we determine the correct number of layers?
  • how do we assign pixels to layers?
  • how do we model the motion?
Layers for video summarization
Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
  [figure: masked images combined (+) into a single background sprite (=)]
What are layers?
• [Wang & Adelson, 1994]
• intensities
• alphas
• velocities
How do we composite them?
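One common answer, sketched here as an assumption rather than as the slide's own figure, is back-to-front alpha blending (the "over" operation). The example assumes each layer has already been warped into the current frame by its velocity map, so only the intensity and alpha maps are needed. The function name `composite` is hypothetical.

```python
def composite(layers):
    """Blend layers back to front.

    layers: list of (intensity, alpha) image pairs ordered from the
    farthest layer to the nearest; alpha values lie in [0, 1].
    """
    h, w = len(layers[0][0]), len(layers[0][0][0])
    out = [[0.0] * w for _ in range(h)]
    for intensity, alpha in layers:
        for y in range(h):
            for x in range(w):
                a = alpha[y][x]
                # The nearer layer covers what is behind it in proportion to alpha.
                out[y][x] = a * intensity[y][x] + (1.0 - a) * out[y][x]
    return out
```

For an opaque background of intensity 10 and a foreground of intensity 50 with alpha 0.5, the blended value is 0.5 * 50 + 0.5 * 10 = 30.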
How do we form them?
How do we estimate the layers?
• compute coarse-to-fine flow
• estimate affine motion in blocks (regression)
• cluster the affine motions with k-means
• assign pixels to the best-fitting affine region
• re-estimate affine motions in each region…
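Two of the steps above, the affine regression and the pixel assignment, can be sketched as follows. The parameterization u = a0 + a1·x + a2·y, v = a3 + a4·x + a5·y and the names `solve3`, `fit_affine`, `affine_flow`, and `assign_pixels` are assumptions for illustration. The flow computation and the k-means clustering of the affine parameters are not shown.

```python
def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(samples):
    """Least-squares affine fit to flow samples (x, y, u, v).

    Returns [a0, ..., a5] with u = a0 + a1*x + a2*y and v = a3 + a4*x + a5*y.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for x, y, u, v in samples:
        row = (1.0, x, y)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    # u and v share the same normal matrix, so solve it twice.
    return solve3(ata, atu) + solve3(ata, atv)


def affine_flow(p, x, y):
    """Flow predicted at (x, y) by affine parameters p."""
    return p[0] + p[1] * x + p[2] * y, p[3] + p[4] * x + p[5] * y


def assign_pixels(samples, models):
    """Label each flow sample with the index of the best-fitting affine model."""
    labels = []
    for x, y, u, v in samples:
        errs = []
        for p in models:
            pu, pv = affine_flow(p, x, y)
            errs.append((u - pu) ** 2 + (v - pv) ** 2)
        labels.append(errs.index(min(errs)))
    return labels
```

Alternating `assign_pixels` with `fit_affine` on each resulting region gives the re-estimation loop from the last two bullets.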
Layer synthesis
• For each layer:
  • stabilize the sequence with the affine motion
  • compute median value at each pixel
• Determine occlusion relationships
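The median step can be sketched as below, assuming the frames have already been stabilized (warped into a common frame by the layer's affine motion) and that a Boolean mask marks which pixels of each frame belong to the layer. The function name `layer_median` and the use of per-frame masks are assumptions for this example.

```python
from statistics import median


def layer_median(stabilized, masks):
    """Per-pixel median over the frames in which the pixel belongs to the layer.

    stabilized: list of frames already aligned to a common coordinate frame.
    masks: list of Boolean images, True where the frame's pixel is in the layer.
    Pixels never observed in the layer are left as None.
    """
    h, w = len(stabilized[0]), len(stabilized[0][0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [f[y][x] for f, m in zip(stabilized, masks) if m[y][x]]
            if vals:
                out[y][x] = median(vals)
    return out
```

The median is robust to a minority of frames in which another layer occludes the pixel, which is why a single outlier frame does not disturb the result.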
Results
What if the motion is not affine?
• Use a “regularized” (smooth) motion field [Weiss, CVPR’97]
A Layered Approach To Stereo Reconstruction
Simon Baker, Richard Szeliski and P. Anandan
CVPR’98
Volumetric Approaches to Stereo
[figure: x, y, z volume viewed by Camera 1 and Camera 2]
• Examples:
  • Disparity-Spaces [Intille and Bobick, ’94] [Scharstein and Szeliski, ’96]
  • Space-Coloring [Seitz and Dyer, ’97]
  • Maximum-Flow Stereo [Roy and Cox, ’98]
• Advantages:
  • Modeling occlusions [Intille and Bobick, ’94]
  • Mixed pixels + transparency [Szeliski and Golland, ’98]
  • Equal treatment of many images [Collins, ’96]
2.5-D Layered Approach
[figure: Layer 1, Layer 2, Layer 3 viewed by Camera 1 and Camera 2]
• Additional advantages over volumetric approaches:
  • Fewer degrees of freedom
  • Fewer resampling artifacts
  • Robustness of global model + local correction
    • cf. “Plane + Parallax” and “Model-Based Stereo”
• Output particularly suitable for certain applications
  • e.g. image-based rendering and interactive editing
Layered Stereo
[figure: layers (“sprites”)]
• Use arbitrarily oriented sprites
• Estimate a 3D plane equation for each sprite
Layer Representation
[figure: a layer plane in the world coordinate frame, with the world origin and the layer axes u, v marked]
• World point: $x = (x, y, z, 1)^T$
• Plane vector: $n_l = (n_x, n_y, n_z, n_d)^T$
• Plane equation: $n_l \cdot x = 0$
• Layer sprite: $L_l = (\alpha \cdot r, \alpha \cdot g, \alpha \cdot b, \alpha)$
• Coordinate frame defined by $u_l = (u, v, 1)^T = Q_l x$
• Residual depth: $Z_l$
Image Formation
[figure: layer $l$ of the scene is imaged by camera $P_k$ to give image $I_k$ (axes u, v); the Boolean mask $B_{kl}$ selects the pixels of layer $l$, giving the masked image $M_{kl}$]
Overview
[flowchart]
• Input: images $I_k$ and cameras $P_k$
• Initialize layer assignment $B_{kl}$
• Estimate plane vectors $n_l$
• Estimate sprite images $L_l$
• Estimate residual depth $Z_l$
• Re-assign pixels to layers $B_{kl}$
• Refine layer sprites $L_l$
• Output: $n_l$, $L_l$, and $Z_l$
Layer Initialization Alternatives
• Iterate dominant motion estimation
  • e.g. [Irani et al., ’95]
• Apply simple stereo algorithm + fit planes
• Color segmentation
  • e.g. [Sawhney and Ayer, ’94]
• Human initialization
  • e.g. [Debevec et al., ’96]
Estimation of Plane Equations
[figure: cameras $P_i$, $P_j$, $P_k$ view layer $l$; homographies $H^l_{ij}$ and $H^l_{ik}$ relate the views through the layer plane]
• Warped images $H^l_{ij} \circ M_{jl}$, $H^l_{ik} \circ M_{kl}$, … are functions of $n_l$ only
• Minimize image variance using hierarchical gradient descent
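Why the warps depend only on the plane can be seen from the standard plane-induced homography. The derivation below is a sketch under assumptions that are not on the slide: calibrated cameras with intrinsics $K_i$, $K_j$, a relative pose $X_j = R X_i + t$, and the plane written in camera $i$'s frame as $\hat{n}^T X_i + n_d = 0$ with $\hat{n} = (n_x, n_y, n_z)^T$. The direction convention (pixels of view $i$ mapped to view $j$) is also an assumption.

```latex
\begin{align*}
\hat{n}^T X_i + n_d = 0
  \;&\Rightarrow\; -\frac{\hat{n}^T X_i}{n_d} = 1 \\
X_j = R X_i + t
  \;&=\; \Bigl( R - \frac{t\,\hat{n}^T}{n_d} \Bigr) X_i \\
H^l_{ij} \;&\sim\; K_j \Bigl( R - \frac{t\,\hat{n}^T}{n_d} \Bigr) K_i^{-1}
\end{align*}
```

With the cameras fixed, the only unknowns in $H^l_{ij}$ are the components of the plane vector, so the image variance can be minimized over $n_l$ alone.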
Estimation of Layer Sprites
[figure: cameras $P_i$, $P_j$, $P_k$ with masked images $M_{il}$, $M_{jl}$, $M_{kl}$ and plane $n_l$]
• “Blend” the masked images, warped onto the layer plane
Estimation of Residual Depth
• Per-pixel residual depth estimation
• plane plus parallax [Anandan et al.]
• model-based stereo [Debevec et al.]
• better accuracy / fidelity
• makes forward warping more difficult
Estimation of Residual Depth
[figure: cameras $P_i$, $P_j$, $P_k$ with masked images $M_{il}$, $M_{jl}$, $M_{kl}$ and the perturbed plane $n_l + (0,0,0,d)^T$]
• Warp masked images onto perturbed plane
• Compute variance image
• For each pixel, choose d that minimizes variance
• Smooth, incorporating confidence weighting [Szeliski & Golland, ’98]
• Recompute sprite using “Plane + Parallax” warp
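The variance and selection steps can be sketched as below, assuming the masked images have already been warped onto each perturbed plane. The function name `best_residual_depth` and the dictionary layout (candidate d mapped to its list of warped images) are assumptions. The confidence-weighted smoothing and the sprite recomputation are not shown.

```python
from statistics import pvariance


def best_residual_depth(warped_by_d):
    """Pick, per pixel, the plane offset d with the lowest variance across views.

    warped_by_d: {d: [image, ...]} where each image is a masked view already
    warped onto the plane perturbed by d. Returns (depth, variance) images.
    """
    candidates = sorted(warped_by_d)
    first = warped_by_d[candidates[0]][0]
    h, w = len(first), len(first[0])
    depth = [[None] * w for _ in range(h)]
    best_var = [[float("inf")] * w for _ in range(h)]
    for d in candidates:
        images = warped_by_d[d]
        for y in range(h):
            for x in range(w):
                # Low variance means the views agree at this offset.
                var = pvariance([img[y][x] for img in images])
                if var < best_var[y][x]:
                    best_var[y][x] = var
                    depth[y][x] = d
    return depth, best_var
```

The returned variance image could serve as a per-pixel confidence for the smoothing step, though how the original work weights it is not stated on the slide.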
Pixel Assignment
[figure: camera $P_i$, masked image $M_{il}$, plane $n_l$, sprite $L_l$, and the un-warped difference image]
• Warp masked image onto each layer plane
• Compute difference images
• Un-warp difference images
• For each pixel, choose the best (smallest) difference across layers
• Smooth pixel assignment
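The last two steps can be sketched as follows, starting from difference images that have already been un-warped into the input image. The slide does not say how the assignment is smoothed; the 3×3 majority vote here is an assumed stand-in, and the names `assign_layers` and `smooth_labels` are hypothetical.

```python
from collections import Counter


def assign_layers(diffs):
    """diffs[l][y][x] is the un-warped difference for layer l.

    Returns a label image holding, per pixel, the layer with the smallest difference.
    """
    h, w = len(diffs[0]), len(diffs[0][0])
    return [[min(range(len(diffs)), key=lambda l: diffs[l][y][x])
             for x in range(w)] for y in range(h)]


def smooth_labels(labels):
    """3x3 majority vote over the label image (assumed smoothing scheme)."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            votes = Counter(labels[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = votes.most_common(1)[0][0]
    return out
```

A majority vote removes isolated mislabeled pixels but also erodes thin structures, so a real system would likely use something more careful.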
Flower Garden Results
[panels: Image 1; Image 9; Initial segmentation; Grey-coded planar depth]
Flower Garden Results
[panels: Recovered sprite without residual depth estimation; Recovered sprite with residual depth estimation]
Graphics Symposium Results
[panels: Image 1 of 5; Initial segmentation; Grey-coded planar depth; Residual depth]
Graphics Symposium Results
• Resulting sprite collection
Graphics Symposium Results
[panels: Original image 3; Re-synthesized image 3; Novel view without residual depth; Novel view with residual depth]
Layered Stereo Demo
• SpriteViewer: renders sprites with depth
Discussion
• Layer initialization:
  • Can tolerate bad initial plane estimates
• Residual depth estimation:
  • Plane sweep algorithm, similar to [Szeliski and Golland, ’98]
• Pixel assignment:
  • Combine color and residual depth estimates
  • Currently under investigation
Summary
• New approach to stereo matching:
  • represent scene as collection of layers
  • each layer has a 3-D plane equation, an alpha-matted color image, and an optional residual depth
  • generalizes layered motion to 3-D
• Computation:
  • plane equations by warping mosaics of masked images
  • residual depth by perturbing planes
  • iteratively refine color values and pixel assignments
Layered Depth Images
Jonathan Shade, Steven Gortler, Li-wei He, Richard Szeliski
SIGGRAPH’98
How to render a layer + parallax?
• Can’t use inverse warping [Laveau 94]
3D Sprites with Depth
• 3D sprite consists of:
  • alpha-matted image $I_1(x_1, y_1)$
  • 4×4 camera matrix $C_1$: $[w_1 x_1 \;\; w_1 y_1 \;\; w_1 d_1 \;\; w_1]^T = C_1 [X \;\; Y \;\; Z \;\; 1]^T$
  • plane equation $AX + BY + CZ + D = 0$ (forms third row of $C_1$)
  • optional per-pixel depth $d_1(x_1, y_1)$
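The camera equation above can be exercised with a small projection routine. This is a sketch: the function name `project` and the example matrix are assumptions, chosen so that the third row encodes the plane Z = 2 (A, B, C, D = 0, 0, 1, -2) and the fourth row gives w = Z.

```python
def project(C, X, Y, Z):
    """Apply [w*x, w*y, w*d, w]^T = C [X, Y, Z, 1]^T and return (x, y, d)."""
    p = (X, Y, Z, 1.0)
    wx, wy, wd, w = (sum(C[r][k] * p[k] for k in range(4)) for r in range(4))
    return wx / w, wy / w, wd / w


# Hypothetical camera: unit focal length, sprite plane Z = 2 in the third row.
C1 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, -2],
      [0, 0, 1, 0]]
```

Because the third row is the plane equation, points on the sprite's plane project with d = 0, and d grows with distance from the plane; with this example matrix, `project(C1, 2, 4, 2)` gives d = 0 and `project(C1, 2, 4, 4)` gives d = 0.5.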
Sprites with Depth
• Store $d_1(x_1, y_1)$ (scaled displacement) along with each sprite image $I_1(x_1, y_1)$
  [figure: example sprite images $I_1$ shown with their displacement maps $d_1$]
3D Sprites: Reprojection
[figure: sprites → new view]
• use standard texture mapping (projective warp)
Forward Mapping
• Mapping equation with per-pixel depth $d_1$:
  $[w_2 x_2 \;\; w_2 y_2 \;\; w_2]^T = H_{1,2} [x_1 \;\; y_1 \;\; 1]^T + d_1 e_{1,2}$
  [figure: $I_1$ warped with $d_1$ to give $I_2$]
• Problems: gaps and aliasing
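The forward mapping equation, and the gap problem it causes, can be seen in a minimal nearest-pixel splatting sketch. The function names `forward_map` and `forward_warp` are assumptions; there is no gap filling, filtering, or depth ordering here, so where two source pixels land on the same target the later one in scan order simply overwrites the earlier.

```python
def forward_map(H, e, x1, y1, d1):
    """Evaluate [w2*x2, w2*y2, w2]^T = H [x1, y1, 1]^T + d1 * e and return (x2, y2)."""
    v = [H[r][0] * x1 + H[r][1] * y1 + H[r][2] + d1 * e[r] for r in range(3)]
    return v[0] / v[2], v[1] / v[2]


def forward_warp(img, depth, H, e):
    """Splat every source pixel to its nearest target pixel; unfilled pixels stay None."""
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y1 in range(h):
        for x1 in range(w):
            x2, y2 = forward_map(H, e, x1, y1, depth[y1][x1])
            xi, yi = round(x2), round(y2)
            if 0 <= xi < w and 0 <= yi < h:
                out[yi][xi] = img[y1][x1]
    return out
```

With an identity homography, e = (1, 0, 0), and a depth step from 0 to 1 in the middle of a row, the pixels on the far side of the step shift by one and leave an unfilled target pixel behind, which is the gap named on the slide.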
Inverse Mapping
• Reverse order of images 1 & 2:
  $[w_1 x_1 \;\; w_1 y_1 \;\; w_1]^T = H_{2,1} [x_2 \;\; y_2 \;\; 1]^T + d_2 e_{2,1}$
  [figure: $I_2$ sampled from $I_1$ using $d_2$]
• Problem: we don’t know $d_2$!
Crude perspective map
• How to map $d_1 \to d_2$?
• Simple idea: use perspective transform $H_{2,1}$
  [figure: $I_1$, $d_1$ → $d_2$, $I_2$]
• Works well for small amounts of motion
Better forward map
• How to map $d_1 \to d_2$?
• Better idea: use full $H_{1,2} x_1 + d_1 e_{1,2}$ forward map
  [figure: $I_1$, $d_1$ → $d_2$, $I_2$]
• Works better for moderate amounts of motion
2-pass Mapping
• Why is 2-pass mapping ($d_1 \to d_2$ forward followed by $I_1 \to I_2$ backward) a good idea?
  • can tolerate bigger errors in $d_1$ mapping (since $d_1$ is typically smooth)
  • can store/process $d_1$ at lower resolution
  • can use better filtering on color image
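The two passes can be sketched as below. Several choices here are assumptions rather than the slide's method: the names `apply_map`, `fill_gaps`, and `two_pass_warp`; nearest-pixel sampling in both passes; the row-wise gap filling; and the rescaling d2 = d1 / w2, which follows from inverting the forward equation when $H_{2,1} = H_{1,2}^{-1}$ and $e_{2,1} = -H_{2,1} e_{1,2}$.

```python
def apply_map(H, e, x, y, d):
    """Return (x', y', w') for [w'x', w'y', w']^T = H [x, y, 1]^T + d * e."""
    v = [H[r][0] * x + H[r][1] * y + H[r][2] + d * e[r] for r in range(3)]
    return v[0] / v[2], v[1] / v[2], v[2]


def fill_gaps(row):
    """Fill None entries from the nearest valid neighbor on the left, else on the right."""
    last = None
    for i, v in enumerate(row):
        if v is None:
            row[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(row) - 1, -1, -1):
        if row[i] is None:
            row[i] = nxt
        else:
            nxt = row[i]


def two_pass_warp(img1, d1, H12, e12, H21, e21):
    h, w = len(img1), len(img1[0])
    # Pass 1: forward-map the (smooth) displacement image d1 into view 2.
    d2 = [[None] * w for _ in range(h)]
    for y1 in range(h):
        for x1 in range(w):
            x2, y2, w2 = apply_map(H12, e12, x1, y1, d1[y1][x1])
            xi, yi = round(x2), round(y2)
            if 0 <= xi < w and 0 <= yi < h:
                d2[yi][xi] = d1[y1][x1] / w2  # assumed rescaling, see lead-in
    for row in d2:
        fill_gaps(row)
    # Pass 2: inverse-map the color image using the estimated d2.
    img2 = [[None] * w for _ in range(h)]
    for y2 in range(h):
        for x2 in range(w):
            if d2[y2][x2] is None:
                continue
            x1, y1, _ = apply_map(H21, e21, x2, y2, d2[y2][x2])
            xi, yi = round(x1), round(y1)
            if 0 <= xi < w and 0 <= yi < h:
                img2[y2][x2] = img1[yi][xi]
    return img2, d2
```

Because the color image is sampled by inverse mapping, every target pixel with a valid d2 is visited exactly once, so the color pass has no gaps and could use bilinear or better filtering in place of the nearest-pixel lookup shown here.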
Sprites with Depth: Demo
Refinements
• Only forward map $d_1$ with parallax component
• Use affine approximation to parallax flow
• Better gap filling
• Forward map $(u, v)$ flow instead of $d_1$ depth
Layered Depth Images (LDIs)
• Store multiple (color, z) values at each pixel
• Similar to [sparse] volumetric representation
• Render with forward warp (splat)
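The data structure can be sketched as a grid of per-pixel sample lists. The class name `LayeredDepthImage`, its methods, and the caller-supplied `warp` function are assumptions for this example. Visibility is resolved with a z-buffer, which is a substitution made for brevity and not necessarily how the cited work orders its splats.

```python
class LayeredDepthImage:
    """Each pixel holds a list of (z, color) samples sorted front to back."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[[] for _ in range(width)] for _ in range(height)]

    def add(self, x, y, color, z):
        """Insert a sample at pixel (x, y), keeping the list sorted by depth."""
        samples = self.pixels[y][x]
        samples.append((z, color))
        samples.sort(key=lambda s: s[0])

    def splat(self, warp):
        """Forward-warp every sample with warp(x, y, z) -> (x2, y2, z2).

        Each sample lands on its nearest target pixel; the smallest z2 wins.
        """
        color = [[None] * self.width for _ in range(self.height)]
        zbuf = [[float("inf")] * self.width for _ in range(self.height)]
        for y in range(self.height):
            for x in range(self.width):
                for z, c in self.pixels[y][x]:
                    x2, y2, z2 = warp(x, y, z)
                    xi, yi = round(x2), round(y2)
                    if (0 <= xi < self.width and 0 <= yi < self.height
                            and z2 < zbuf[yi][xi]):
                        zbuf[yi][xi] = z2
                        color[yi][xi] = c
        return color
```

Because hidden samples are stored behind the front one, a view change that shifts a near surface can reveal the background stored at the same pixel instead of leaving a hole.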