Layered Scene Representations

Vision for Graphics, CSE 590SS, Winter 2001
Richard Szeliski

Motion representations
• How can we describe this scene?

Block-based motion prediction
• Break image up into square blocks
• Estimate translation for each block
• Use this to predict next frame, code difference (MPEG-2)
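A minimal sketch of the block-matching idea above, assuming grayscale frames stored as lists of rows, an exhaustive search over a small window, and a sum-of-absolute-differences cost. The function names, block size, and search range are illustrative choices, not taken from MPEG-2 or the slides.

```python
def sad(prev, curr, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of curr and a shifted block of prev."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(curr[y][x] - prev[y + dy][x + dx])
    return total


def estimate_block_motion(prev, curr, size=4, search=2):
    """Return {(bx, by): (dx, dy)}: the best translation for each square block."""
    h, w = len(curr), len(curr[0])
    motion = {}
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that would read outside the previous frame.
                    if not (0 <= by + dy and by + dy + size <= h and
                            0 <= bx + dx and bx + dx + size <= w):
                        continue
                    cost = sad(prev, curr, bx, by, dx, dy, size)
                    # On equal cost, prefer the smaller displacement.
                    key = (cost, abs(dx) + abs(dy))
                    if best is None or key < best[0]:
                        best = (key, (dx, dy))
            motion[(bx, by)] = best[1]
    return motion
```

Each block of the current frame is then predicted by copying the block at offset (dx, dy) in the previous frame, and only the prediction error needs to be coded.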

Layered motion
• Break image sequence up into “layers”:
[Figure: image sequence = composite of layers]
• Describe each layer’s motion

Outline
• Why layers?
• 2-D layers [Wang & Adelson 94; Weiss 97]
• 3-D layers [Baker et al. 98]
• Layered Depth Images [Shade et al. 98]
• Transparency [Szeliski et al. 00]
• Bayesian estimation [Torr et al. 99]

Layered motion
• Advantages:
  • can represent occlusions / disocclusions
  • each layer’s motion can be smooth
  • video segmentation for semantic processing
• Difficulties:
  • how do we determine the correct number of layers?
  • how do we assign pixels to layers?
  • how do we model the motion?

Layers for video summarization

Background modeling (MPEG-4)
• Convert masked images into a background sprite for layered video coding
[Figure: several masked images combined into one background sprite]

What are layers?
• [Wang & Adelson, 1994]
• intensities
• alphas
• velocities

How do we composite them?
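The figure for this slide is not in the transcript. As a hedged illustration, the sketch below uses the standard back-to-front "over" rule, where each layer's alpha blends its intensity on top of what has been composited so far. It assumes the layers have already been warped by their velocities into the output frame; the function name and data layout are illustrative.

```python
def composite(layers):
    """Composite layers back to front with the 'over' rule.

    layers: list of (intensity, alpha) image pairs, farthest layer first.
    Each image is a list of rows of floats; alpha values lie in [0, 1].
    """
    h, w = len(layers[0][0]), len(layers[0][0][0])
    out = [[0.0] * w for _ in range(h)]
    for intensity, alpha in layers:
        for y in range(h):
            for x in range(w):
                a = alpha[y][x]
                # The nearer layer covers the running result in proportion to its alpha.
                out[y][x] = a * intensity[y][x] + (1.0 - a) * out[y][x]
    return out
```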

How do we form them?

How do we estimate the layers?
• compute coarse-to-fine flow
• estimate affine motion in blocks (regression)
• cluster with k-means
• assign pixels to best-fitting affine region
• re-estimate affine motions in each region…
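Two of the steps above, affine regression and pixel assignment, can be sketched as follows. This is a simplified illustration: it assumes flow samples are given as (x, y, u, v) tuples, fits each affine model by ordinary least squares, and omits the k-means clustering of the affine parameters. All names are hypothetical, and the solver assumes the sample positions are not collinear.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]


def fit_affine(samples):
    """Least-squares fit of u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y."""
    A = [[0.0] * 3 for _ in range(3)]
    bu, bv = [0.0] * 3, [0.0] * 3
    for x, y, u, v in samples:
        phi = (1.0, x, y)
        for i in range(3):
            bu[i] += phi[i] * u
            bv[i] += phi[i] * v
            for j in range(3):
                A[i][j] += phi[i] * phi[j]
    return solve3(A, bu), solve3(A, bv)


def residual(model, sample):
    """Squared flow error of one sample under one affine model."""
    (a0, a1, a2), (b0, b1, b2) = model
    x, y, u, v = sample
    return (u - (a0 + a1 * x + a2 * y)) ** 2 + (v - (b0 + b1 * x + b2 * y)) ** 2


def assign(models, samples):
    """Label each flow sample with the index of the best-fitting affine model."""
    return [min(range(len(models)), key=lambda k: residual(models[k], s))
            for s in samples]
```

Alternating `assign` with `fit_affine` on each resulting group corresponds to the last two bullets of the slide.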

Layer synthesis
• For each layer:
  • stabilize the sequence with the affine motion
  • compute median value at each pixel
• Determine occlusion relationships
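The median step can be sketched as below, assuming the frames have already been stabilized (warped into the layer's coordinate frame) and that pixels with no valid sample are marked `None`. The function name and the `None` convention are illustrative assumptions.

```python
from statistics import median


def median_layer(stabilized_frames):
    """Per-pixel temporal median over frames already aligned to one layer.

    Pixels marked None (outside the frame, or not assigned to this layer)
    are ignored; a pixel with no valid samples stays None.
    """
    h, w = len(stabilized_frames[0]), len(stabilized_frames[0][0])
    layer = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [f[y][x] for f in stabilized_frames if f[y][x] is not None]
            if values:
                layer[y][x] = median(values)
    return layer
```

The median is robust to the frames in which a pixel is briefly covered by another layer, which is why it suits layer synthesis better than a plain mean.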

Results

What if the motion is not affine?
• Use a “regularized” (smooth) motion field
• [Weiss, CVPR’97]

A Layered Approach To Stereo Reconstruction
Simon Baker, Richard Szeliski and P. Anandan
CVPR’98

Volumetric Approaches to Stereo
[Figure: volume with x, y, z axes viewed by Camera 1 and Camera 2]
• Examples:
  • Disparity-Spaces [Intille and Bobick, ’94] [Scharstein and Szeliski, ’96]
  • Space-Coloring [Seitz and Dyer, ’97]
  • Maximum-Flow Stereo [Roy and Cox, ’98]
• Advantages:
  • Modeling occlusions [Intille and Bobick, ’94]
  • Mixed pixels + transparency [Szeliski and Golland, ’98]
  • Equal treatment of many images [Collins, ’96]

2.5-D Layered Approach
[Figure: Layer 1, Layer 2 and Layer 3 viewed by Camera 1 and Camera 2]
• Additional advantages over volumetric approaches:
  • Fewer degrees of freedom
  • Fewer resampling artifacts
  • Robustness of global model + local correction
    • cf. “Plane + Parallax” and “Model-Based Stereo”
• Output particularly suitable for certain applications
  • e.g. image-based rendering and interactive editing

Layered Stereo
[Figure: layers (“sprites”)]
• Use arbitrarily oriented sprites
• Estimate 3D plane equation for each sprite

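Layer Representation
• World point: $\mathbf{x} = (x, y, z, 1)^T$
• Plane vector: $n_l = (n_x, n_y, n_z, n_d)^T$
• Plane equation: $n_l \cdot \mathbf{x} = 0$
• Layer sprite: $L_l = (\alpha \cdot r, \alpha \cdot g, \alpha \cdot b, \alpha)$
• Coordinate frame defined by $u_l = (u, v, 1)^T = Q_l \mathbf{x}$
• Residual depth: $Z_l$
[Figure: world origin, layer plane with (u, v) axes, and residual depth measured from the plane]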

Image Formation
[Figure: the scene (layer $l$) is viewed by camera $P_k$ to form image $I_k$; Boolean mask $B_{kl}$ selects the pixels of layer $l$, giving masked image $M_{kl}$; each image has (u, v) axes]

Overview
• Input: images $I_k$ and cameras $P_k$
• Initialize layer assignment $B_{kl}$
• Estimate plane vectors $n_l$
• Estimate sprite images $L_l$
• Estimate residual depth $Z_l$
• Re-assign pixels to layers $B_{kl}$
• Refine layer sprites $L_l$
• Output: $n_l$, $L_l$, and $Z_l$

Layer Initialization Alternatives
• Iterate dominant motion estimation
  • e.g. [Irani et al., ’95]
• Apply simple stereo algorithm + fit planes
• Color segmentation
  • e.g. [Sawhney and Ayer, ’94]
• Human initialization
  • e.g. [Debevec et al., ’96]

Estimation of Plane Equations
• Warped images $H^l_{ij} \circ M_{jl}$, $H^l_{ik} \circ M_{kl}$, … are functions of $n_l$ only
• Minimize image variance using hierarchical gradient descent
[Figure: cameras $P_i$, $P_j$, $P_k$ viewing layer $l$; masked images $M_{il}$, $M_{jl}$, $M_{kl}$ related by homographies $H^l_{ij}$ and $H^l_{ik}$]
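With the cameras known, the warp between two views of a planar layer depends only on the plane vector. The sketch below illustrates this with the standard plane-induced homography, under assumptions that are not from the slides: camera 1 sits at the world origin, camera 2 is related by X2 = R X1 + t, both cameras have identity intrinsics, and the plane does not pass through camera 1 (n_d ≠ 0). The function names are hypothetical.

```python
def plane_homography(R, t, n):
    """Homography from camera 1 to camera 2 induced by the plane n.

    n = (nx, ny, nz, nd) satisfies n . (X, Y, Z, 1) = 0 in camera-1 coordinates.
    For points on the plane, (nx, ny, nz) . X = -nd, so
    X2 = R X + t = (R - t (nx, ny, nz)^T / nd) X.
    """
    nx, ny, nz, nd = n
    normal = (nx, ny, nz)
    return [[R[i][j] - t[i] * normal[j] / nd for j in range(3)] for i in range(3)]


def apply_homography(H, x, y):
    """Apply H to the image point (x, y) and dehomogenize."""
    p = [H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3)]
    return p[0] / p[2], p[1] / p[2]
```

For example, with R the identity, t = (1, 0, 0), and the plane Z = 2 written as n = (0, 0, 1, -2), the world point (1, 1, 2) projects to (0.5, 0.5) in camera 1 and the homography sends that point to (1.0, 0.5), its projection in camera 2.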

Estimation of Layer Sprites
• “Blend” the masked images $M_{il}$, $M_{jl}$, $M_{kl}$, warped onto the layer plane
[Figure: cameras $P_i$, $P_j$, $P_k$ viewing plane $n_l$]

Estimation of Residual Depth
• Per-pixel residual depth estimation
  • plane plus parallax [Anandan et al.]
  • model-based stereo [Debevec et al.]
  • better accuracy / fidelity
  • makes forward warping more difficult

Estimation of Residual Depth
• Warp masked images $M_{il}$, $M_{jl}$, $M_{kl}$ onto perturbed plane $n_l + (0, 0, 0, d)^T$
• Compute variance image
• For each pixel, choose $d$ that minimizes variance
• Smooth, incorporating confidence weighting [Szeliski & Golland, ’98]
• Recompute sprite using “Plane + Parallax” warp
[Figure: cameras $P_i$, $P_j$, $P_k$ viewing the perturbed plane]
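The per-pixel selection of d can be sketched as follows. This is an illustration only: it assumes the masked images have already been warped onto each perturbed plane and are supplied as a dictionary keyed by the offset d, and it leaves out the confidence-weighted smoothing step. The names are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def residual_depth(warped_by_d):
    """Pick, per pixel, the plane offset d with the lowest variance across views.

    warped_by_d: {d: [image, image, ...]}; each image (a list of rows) has
    already been warped onto the plane perturbed by d.
    Returns (depth_map, variance_map).
    """
    candidates = sorted(warped_by_d)
    first = warped_by_d[candidates[0]][0]
    h, w = len(first), len(first[0])
    depth = [[None] * w for _ in range(h)]
    best_var = [[float("inf")] * w for _ in range(h)]
    for d in candidates:
        images = warped_by_d[d]
        for y in range(h):
            for x in range(w):
                var = variance([img[y][x] for img in images])
                if var < best_var[y][x]:
                    best_var[y][x] = var
                    depth[y][x] = d
    return depth, best_var
```

The returned variance map can serve as a confidence measure for the smoothing step, since a low minimum variance means the views agree well at that offset.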

Pixel Assignment
• Warp masked image $M_{il}$ onto each layer plane $n_l$
• Compute difference images
• Un-warp difference images
• For each pixel, choose the best difference across layers
• Smooth pixel assignment
[Figure: masked image $M_{il}$ from camera $P_i$ warped onto plane $n_l$ and compared with sprite $L_l$; un-warped difference image]

Flower Garden Results
[Figure panels: Image 1; Image 9; Initial segmentation; Grey-coded planar depth]

Flower Garden Results
[Figure panels: Recovered sprite without residual depth estimation; Recovered sprite with residual depth estimation]

Graphics Symposium Results
[Figure panels: Image 1 of 5; Initial segmentation; Grey-coded planar depth; Residual depth]

Graphics Symposium Results
• Resulting sprite collection

Graphics Symposium Results
[Figure panels: Original image 3; Re-synthesized image 3; Novel view without residual depth; Novel view with residual depth]

Layered Stereo Demo
• SpriteViewer: renders sprites with depth

Discussion
• Layer initialization:
  • Can tolerate bad initial plane estimates
• Residual depth estimation:
  • Plane sweep algorithm, similar to [Szeliski and Golland, ’98]
• Pixel assignment:
  • Combine color and residual depth estimates
  • Currently under investigation

Summary
• New approach to stereo matching:
  • represent scene as collection of layers
  • each layer has a 3-D plane equation, an alpha-matted color image, and an optional residual depth
  • generalizes layered motion to 3-D
• Computation:
  • plane equations by warping mosaics of masked images
  • residual depth by perturbing planes
  • iteratively refine color values and pixel assignments

Layered Depth Images
Jonathan Shade, Steven Gortler, Li-wei He, Richard Szeliski
SIGGRAPH’98

How to render a layer + parallax?
• Can’t use inverse warping [Laveau 94]

3D Sprites with Depth
• 3D sprite consists of:
  • alpha-matted image $I_1(x_1, y_1)$
  • 4×4 camera matrix $C_1$: $[w_1 x_1 \;\; w_1 y_1 \;\; w_1 d_1 \;\; w_1]^T = C_1 [X \;\; Y \;\; Z \;\; 1]^T$
  • plane equation $AX + BY + CZ + D = 0$ (forms third row of $C_1$)
  • optional per-pixel depth $d_1(x_1, y_1)$

Sprites with Depth
• Store $d_1(x_1, y_1)$ (scaled displacement) along with each sprite image $I_1(x_1, y_1)$
[Figure: two sprite images $I_1$ with their displacement maps $d_1$]

3D Sprites: Reprojection
• sprites → new view
• use standard texture mapping (projective warp)

Forward Mapping
• Mapping equation with per-pixel depth $d_1$: $[w_2 x_2 \;\; w_2 y_2 \;\; w_2]^T = H_{1,2} [x_1 \;\; y_1 \;\; 1]^T + d_1 e_{1,2}$
[Figure: $I_1$ and $d_1$ forward-mapped to $I_2$]
• Problems: gaps and aliasing
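A small sketch of the forward mapping equation and of how gaps arise. It assumes `H12` is a 3×3 nested list, `e12` is a 3-vector, and pixels are splatted to the nearest destination pixel with no z-buffering or filtering; these choices and the function names are illustrative, not the rendering method of the talk.

```python
def forward_map(H12, e12, x1, y1, d1):
    """Map pixel (x1, y1) with depth d1 from view 1 to view 2: H12 [x1 y1 1]^T + d1 e12."""
    p = [H12[i][0] * x1 + H12[i][1] * y1 + H12[i][2] + d1 * e12[i] for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


def forward_warp(image, depth, H12, e12):
    """Splat every source pixel to its nearest destination pixel.

    Destination pixels that no source pixel lands on stay None (gaps);
    when several land on the same pixel, the last one written wins.
    """
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    for y1 in range(h):
        for x1 in range(w):
            x2, y2 = forward_map(H12, e12, x1, y1, depth[y1][x1])
            xi, yi = round(x2), round(y2)
            if 0 <= xi < w and 0 <= yi < h:
                out[yi][xi] = image[y1][x1]
    return out
```

With an identity `H12`, `e12 = (1, 0, 0)`, the row `[10, 20, 30, 40]`, and depths `[0, 0, 1, 1]`, the two nearer pixels shift right by one and the result is `[10, 20, None, 30]`: a gap opens where the nearer surface moved away.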

Inverse Mapping
• Reverse order of images 1 & 2: $[w_1 x_1 \;\; w_1 y_1 \;\; w_1]^T = H_{2,1} [x_2 \;\; y_2 \;\; 1]^T + d_2 e_{2,1}$
[Figure: $I_1$ sampled from $I_2$ using $d_2$]
• Problem: we don’t know $d_2$!

Crude perspective map
• How to map $d_1 \to d_2$?
• Simple idea: use perspective transform $H_{2,1}$
[Figure: $I_1$, $d_1$ → $d_2$, $I_2$]
• Works well for small amounts of motion

Better forward map
• How to map $d_1 \to d_2$?
• Better idea: use the full forward map $H_{1,2} \mathbf{x}_1 + d_1 e_{1,2}$
[Figure: $I_1$, $d_1$ → $d_2$, $I_2$]
• Works better for moderate amounts of motion

2-pass Mapping
• Why is 2-pass mapping ($d_1 \to d_2$ forward, followed by $I_1 \to I_2$ backward) a good idea?
  • can tolerate bigger errors in $d_1$ mapping (since $d_1$ is typically smooth)
  • can store/process $d_1$ at lower resolution
  • can use better filtering on color image
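A one-dimensional toy version of the two passes, offered as a sketch under strong assumptions: the homography is the identity, parallax is a pure horizontal shift proportional to depth (x2 = x1 + shift · d1), a larger d means a nearer surface, and gaps in the warped depth are filled from the left neighbour. None of these choices come from the slides.

```python
def two_pass_warp_row(colors, depths, shift=1.0):
    """Warp one scanline: forward-map depths, then inverse-map colors."""
    w = len(colors)
    # Pass 1: forward-map d1 to get d2, keeping the larger (nearer) depth on collisions.
    d2 = [None] * w
    for x1, d in enumerate(depths):
        x2 = round(x1 + shift * d)
        if 0 <= x2 < w and (d2[x2] is None or d > d2[x2]):
            d2[x2] = d
    # Crude gap filling: copy the depth of the left neighbour, or 0 at the border.
    for x2 in range(w):
        if d2[x2] is None:
            d2[x2] = d2[x2 - 1] if x2 > 0 and d2[x2 - 1] is not None else 0.0
    # Pass 2: inverse-map the colors using d2, clamping to the valid range.
    out = []
    for x2 in range(w):
        x1 = round(x2 - shift * d2[x2])
        out.append(colors[min(max(x1, 0), w - 1)])
    return d2, out
```

Because the color pass is an inverse map, every destination pixel receives a value, so holes appear only in the depth map, where they are cheap to fill since depth is typically smooth.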

Sprites with Depth: Demo
• Demo

Refinements
• Only forward map $d_1$ with parallax component
• Use affine approximation to parallax flow
• Better gap filling
• Forward map (u, v) flow instead of $d_1$ depth

Layered Depth Images (LDIs)
• Store multiple (color, z) values at each pixel
• Similar to [sparse] volumetric representation
• Render with forward warp (splat)
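A minimal sketch of the data structure described above. It assumes a toy renderer in which each sample shifts horizontally by `parallax / z` and visibility is resolved with a z-buffer; the class name, method names, and the z-buffer are simplifying assumptions for illustration and are not claimed to match the renderer of [Shade et al. 98].

```python
class LayeredDepthImage:
    """Image in which each pixel holds a list of (color, z) samples."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[[] for _ in range(width)] for _ in range(height)]

    def add(self, x, y, color, z):
        """Add a sample at pixel (x, y), keeping samples sorted front to back."""
        samples = self.pixels[y][x]
        samples.append((color, z))
        samples.sort(key=lambda s: s[1])

    def render(self, parallax=0.0):
        """Forward-warp (splat) all samples; the nearest sample wins each pixel."""
        out = [[None] * self.width for _ in range(self.height)]
        zbuf = [[float("inf")] * self.width for _ in range(self.height)]
        for y in range(self.height):
            for x in range(self.width):
                for color, z in self.pixels[y][x]:
                    x2 = round(x + parallax / z)
                    if 0 <= x2 < self.width and z < zbuf[y][x2]:
                        zbuf[y][x2] = z
                        out[y][x2] = color
        return out
```

Storing the hidden background sample behind a foreground pixel is what lets the new view show it once the foreground shifts away, instead of leaving a hole.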
