Layered Depth Images Jonathan Shade University of Washington Steven Gortler Harvard University Li-wei He Stanford University Richard Szeliski Microsoft Research Presented by Chung, Shuo-Heng
Introduction The most familiar Image-Based Rendering method is texture mapping; however: 1. Aliasing, e.g. an infinite checkerboard. 2. Speed is limited by the number of surfaces the texture is applied to, e.g. a tree with thousands of leaves.
Introduction • Two extensions have been presented to address these two difficulties: • 1. Sprite • If the new view is near the original view, the new image can be created by altering the original sprite • Cannot reproduce parallax well • 2. Depth Image • Gaps are introduced by visibility changes when some portion of the scene becomes unoccluded or a surface is magnified.
Introduction • The paper introduces two extensions: • Sprites with depth • Layered depth images (LDIs)
Previous Work • Max: • Uses a representation similar to an LDI. • Focuses on high-quality anti-aliasing. • Warps multiple input LDIs with different cameras. • Mark et al. and Darsa et al.: • Create triangulated depth maps from input images with per-pixel depth. • Take advantage of graphics hardware pipelines.
Previous Work • Shade et al. and Schaufler et al.: • Render complex portions of a scene onto sprites. • Reuse these sprites in subsequent frames. • Lengyel and Snyder extended the above work: • Apply affine transformations to fit a set of sample points. • Transforms are allowed to change as the sample points change. • Horry et al.: • Use a single input image and some user-supplied information. • Provide approximate 3D cues.
Previous Work McMillan’s ordering algorithm: warps pixels in an occlusion-compatible (back-to-front) order, so no z-buffer is needed in the output image.
Sprites Texture maps or images with alpha rendered onto a planar surface. 1. Used directly as drawing primitives. 2. Used to generate new views by warping.
Sprites • Where is pixel (x, y) from image 1 located in image 2? • Assume image 1 is a picture of the x-y coordinate plane at z = 0. • The plane can be placed arbitrarily, but setting z = 0 allows us to ignore the z coordinate. • C1 = WPV takes a point (x, y) on the plane to pixel (x1, y1). [Figure: cameras cam1 and cam2 viewing the z = 0 plane; plane point (x, y) projects to (x1, y1) in image 1 and (x2, y2) in image 2.]
Sprites The composite transform C2 C1^-1 carries pixel (x1, y1) of image 1 to image 2; after the matrix multiply, dividing by the homogeneous coordinate w = w2 / w1 gives the pixel position (x2, y2).
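A minimal sketch of this planar warp, assuming a 3x3 homography H (e.g. C2 C1^-1 with its third row and column dropped) and a hypothetical helper name:

```python
import numpy as np

def warp_pixel(H, x1, y1):
    # Map pixel (x1, y1) through the 3x3 planar homography H,
    # then divide by the homogeneous coordinate to recover (x2, y2).
    p = H @ np.array([x1, y1, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

With H the identity, a pixel maps to itself; a general H composes the viewport, projection, and view transforms of both cameras.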
Sprites with Depth [Figure: the epipole e1,2 — the projection of camera 1’s center of projection into image 2.]
Sprites with Depth The forward map has a problem: when the view changes, gaps may be introduced.
Sprites with Depth In order to deal with this problem, the following steps are used: 1. Forward map the displacement map d1(x1, y1) to get d3(x3, y3) = d1(x1, y1), where (x3, y3) is the pure-parallax shift of (x1, y1) by d1(x1, y1) e1,2. H1,2 is obtained by dropping the third row and column of C2 C1^-1; e1,2 is the third column of H1,2^-1.
Sprites with Depth 2. Backward map d3(x3, y3) to obtain d2(x2, y2): d2(x2, y2) = d3(x3, y3), where (x3, y3) is found from (x2, y2) through the inverse homography H1,2^-1.
Sprites with Depth 3. Backward map the original sprite colors: assign the color at input-image location (x1, y1) to output-image location (x2, y2).
Sprites with Depth • These steps (first forward mapping the displacements and then backward mapping with the new displacements) have the following advantages: • Small errors in displacement-map warping are not as evident as errors in sprite-image warping. • We can give the forward-warping step a simpler form by factoring out the planar perspective warp.
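A minimal Python sketch of the three steps, assuming nearest-neighbor resampling, a 3x3 homography H, and a 2-vector parallax term e (a hypothetical argument layout; the paper splats with variable kernels rather than point-sampling):

```python
import numpy as np

def warp_sprite_with_depth(color1, d1, H, e):
    # color1: (h, w, 3) sprite image; d1: (h, w) displacement map.
    h, w = d1.shape
    d3 = np.zeros_like(d1)
    # Step 1: forward map the displacement map by pure parallax.
    for y1 in range(h):
        for x1 in range(w):
            x3 = int(round(x1 + d1[y1, x1] * e[0]))
            y3 = int(round(y1 + d1[y1, x1] * e[1]))
            if 0 <= x3 < w and 0 <= y3 < h:
                d3[y3, x3] = d1[y1, x1]
    Hinv = np.linalg.inv(H)
    color2 = np.zeros_like(color1)
    for y2 in range(h):
        for x2 in range(w):
            # Step 2: backward map through the homography to get d2(x2, y2).
            p = Hinv @ np.array([x2, y2, 1.0])
            x3f, y3f = p[0] / p[2], p[1] / p[2]
            xi, yi = int(round(x3f)), int(round(y3f))
            if not (0 <= xi < w and 0 <= yi < h):
                continue
            d2 = d3[yi, xi]
            # Step 3: undo the parallax shift to find the source color.
            x1 = int(round(x3f - d2 * e[0]))
            y1 = int(round(y3f - d2 * e[1]))
            if 0 <= x1 < w and 0 <= y1 < h:
                color2[y2, x2] = color1[y1, x1]
    return color2
```

With an identity homography and zero displacements, the sprite is reproduced unchanged; nonzero d1 shifts pixels along e before the perspective warp.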
Sprites with Depth We can rewrite the warp by factoring out the planar perspective transform: x2 ≅ H1,2 (x1 + d1(x1, y1) e1,2), i.e. a pure parallax shift followed by the homography.
Sprites with Depth Another faster but less accurate variant: in the first step, set u3(x3, y3) = x1 – x3 and v3(x3, y3) = y1 – y3; in the third step, use the backward-mapped offsets (u2, v2) to address the source sprite instead of recomputing the warp from the displacement map.
Sprites with Depth Figure panels:
• Input color (sprite) image
• Input depth map d1(x1, y1)
• Pure parallax warped depth map d3(x3, y3)
• Forward warped depth map d2(x2, y2)
• Forward warped depth map without parallax correction
• Sprite with “pyramid” depth map
• Warped by homography only (no parallax)
• Warped with homography and crude parallax (d1)
• Warped with homography and true parallax (d2)
• Warped with all three steps
Recovering sprite from image sequences • We can use computer vision techniques to extract sprites from image sequences: • Segment the sequence into coherently moving regions with a layered motion estimation algorithm. • Compute a parametric motion estimate (a planar perspective transformation) for each layer. • Determine the plane equation associated with each region by tracking feature points from frame to frame.
Recovering sprite from image sequences Initial segmentation into six layers The third of five images Recovered depth map
Recovering sprite from image sequences The five layer sprites Residual depth image for fifth layer
Recovering sprite from image sequences Re-synthesized third image Novel view without residual depth Novel view with residual depth
Layered Depth Image Layered depth images can handle more general disocclusions and large amounts of parallax as the viewpoint moves. The paper presents three ways to construct an LDI: 1. LDIs from multiple depth images. 2. LDIs from a modified ray tracer. 3. LDIs from real images.
Layered Depth Image The structure of an LDI:
DepthPixel =
  ColorRGBA: 32 bit integer
  Z: 20 bit integer
  SplatIndex: 11 bit integer
LayeredDepthPixel =
  NumLayers: integer
  Layers[0..numlayers-1]: array of DepthPixel
LayeredDepthImage =
  Camera: camera
  Pixels[0..xres-1, 0..yres-1]: array of LayeredDepthPixel
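The same structure can be mirrored in Python; this is a sketch using dataclasses (field names are illustrative, and the bit widths are only comments here — the packed layout appears later in the deck):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DepthPixel:
    color_rgba: int    # packed RGBA, 32 bits
    z: int             # quantized depth, 20 bits in the paper
    splat_index: int   # index into the splat-size table, 11 bits

@dataclass
class LayeredDepthPixel:
    layers: List[DepthPixel] = field(default_factory=list)  # front to back

    @property
    def num_layers(self) -> int:
        return len(self.layers)

@dataclass
class LayeredDepthImage:
    camera: object                          # camera parameters; placeholder type
    pixels: List[List[LayeredDepthPixel]]   # [yres][xres]
```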
LDI from Multiple Depth Images An LDI can be constructed by warping n depth images into a common camera view.
LDI from a Modified Ray Tracer Rays can be cast from any point on cube face A to any point on frustum face B. [Figure: sampling geometry with faces A and B.]
LDI from a Modified Ray Tracer • “Uniformly” sample the scene. • Any object intersection hit by a ray is reprojected into the LDI. • If the new sample is within a tolerance in depth of an existing depth pixel, the new sample’s color is averaged with the existing depth pixel. Otherwise a new depth pixel is created.
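The insertion rule can be sketched as follows, assuming each layered depth pixel is a list of [z, color] pairs (a hypothetical layout) and a simple two-sample average in place of the paper's accumulation:

```python
def insert_sample(layers, z, color, tol=0.05):
    # layers: list of [z, color] pairs kept front to back.
    # Average with an existing layer if within `tol` in depth,
    # otherwise create a new layer at the right depth.
    for layer in layers:
        if abs(layer[0] - z) < tol:
            layer[1] = (layer[1] + color) / 2.0  # merge nearby samples
            return
    layers.append([z, color])
    layers.sort(key=lambda l: l[0])  # keep front-to-back order
```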
LDI from Real Images Use the voxel coloring algorithm to obtain the LDI directly from input images.
LDI from Real Images This is a dinosaur model reconstructed from 21 photographs.
Space Efficient Representation • It is important to maintain the spatial locality of depth pixels to exploit the CPU cache. • Reorganize the depth pixels into a linear array ordered from bottom to top and left to right in screen space, and back to front along each ray. • An accumulated count array is maintained for each scanline, so a depth pixel can be retrieved with its layer number as index.
Incremental Warping Computation Recall that the warped position of a depth pixel is obtained by applying the transfer matrix C2 C1^-1 to (x1, y1, z, 1) and dividing by the resulting homogeneous coordinate.
Incremental Warping Computation Because x1 increases by exactly one along a scanline, we can simply increment the start vector by one column of the transfer matrix to get the warped position of the next layered depth pixel.
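A sketch of the incremental scanline warp, assuming M = C2 C1^-1 as a 4x4 matrix and one depth value per pixel (the per-layer loop is omitted):

```python
import numpy as np

def warp_scanline(M, y1, depths):
    # Warp pixels (x, y1, z, 1) for x = 0, 1, 2, ... through the 4x4
    # transfer matrix M. The start vector is computed once; each pixel
    # adds z times the depth column, and stepping along x adds M's
    # first column instead of redoing the full matrix multiply.
    start = M @ np.array([0.0, y1, 0.0, 1.0])  # pixel (0, y1) at z = 0
    zcol = M[:, 2]   # contribution per unit depth
    xcol = M[:, 0]   # contribution per pixel step in x
    out = []
    for z in depths:
        p = start + z * zcol
        out.append((p[0] / p[3], p[1] / p[3]))  # homogeneous divide
        start = start + xcol                    # advance to next pixel
    return out
```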
Splat Size Computation To splat the LDI into the output image, we roughly approximate the projected area of the warped pixel from the two cameras’ resolutions: res1 = 1/(w1 h1) for the LDI camera and res2 = 1/(w2 h2) for the output camera.
Splat Size Computation The square root in the size computation can be approximated more efficiently.
Splat Size Computation The square-root size can be further approximated by a lookup table: 5 bits for d1, 6 bits for the normal (3 bits for nx, 3 bits for ny), 11 bits total => 2048 possible table entries.
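Building the 11-bit table index is a matter of packing the quantized fields; a sketch (field order within the index is an assumption):

```python
def splat_table_index(d1, nx, ny):
    # Pack quantized depth (5 bits) and normal components (3 bits each
    # for nx and ny) into an 11-bit index: 2**11 = 2048 table entries.
    assert 0 <= d1 < 32 and 0 <= nx < 8 and 0 <= ny < 8
    return (d1 << 6) | (nx << 3) | ny
```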
Splat Size Computation The paper uses four splat sizes: 1x1, 3x3, 5x5 and 7x7. Each pixel in a footprint has an alpha value to approximate a Gaussian splat kernel. These alpha values are rounded to 1, ½, or ¼, so the alpha blending can be done with integer shifts and adds.
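With alpha restricted to 1, ½, or ¼, the blend alpha*src + (1 − alpha)*dst needs no multiplies; a per-channel sketch:

```python
def blend_channel(dst, src, shift):
    # Blend one 8-bit channel with alpha = 1 / 2**shift, shift in {0, 1, 2}
    # (alpha 1, 1/2, 1/4): alpha*src + (1 - alpha)*dst via shifts and adds.
    return (src >> shift) + dst - (dst >> shift)
```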
Depth Pixel Representation • To fit four depth pixels into a single cache line (32 bytes on the Pentium Pro and Pentium II): • Convert the floating-point Z value into a 20-bit integer. • The splat table index is 11 bits. • R, G, B, and alpha values fill out the other 4 bytes. • This yields a 25 percent improvement in rendering speed.
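The resulting 8-byte layout (four per 32-byte cache line) can be sketched with the struct module; the exact field order within each word is an assumption:

```python
import struct

def pack_depth_pixel(r, g, b, a, z20, splat11):
    # Pack one depth pixel into 8 bytes: RGBA in one 32-bit word,
    # the 20-bit Z and 11-bit splat index in the other (one bit spare).
    word_color = (r << 24) | (g << 16) | (b << 8) | a
    word_zs = (z20 << 11) | splat11  # 20 + 11 = 31 bits used
    return struct.pack("<II", word_color, word_zs)
```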
Clipping • Split the LDI frustum into two segments, a near one and a far one. • The near and far segments are clipped individually. • The near segment is kept smaller than the far segment. • Intersect the view frustum with the frustum of the LDI. • Only visible pixels are rendered. • Far segment first, then near segment. • This speeds rendering by a factor of 2 to 4.
Future Work • Explore representations and rendering algorithms that combine several IBR techniques. • Develop automatic techniques for taking a 3D scene and re-representing it in the most appropriate fashion for image-based rendering.