
Image Compression and Graphics: More Than a Sum of Parts?

Explore the potential of 3D geometry models in image compression and learn about view-dependent texture maps, model-based compression of talking head sequences, and the incorporation of synthetic video into motion-compensated hybrid coding.


Presentation Transcript


  1. Image Compression and Graphics: More Than a Sum of Parts? • Bernd Girod • Collaborators: Peter Eisert, Marcus Magnor, Prashant Ramanathan, Eckehard Steinbach (all Stanford), Thomas Wiegand (HHI) • Image, Video, and Multimedia Systems Group, Information Systems Laboratory, Stanford University

  2. Can 3-D Geometry Help to Compress Images? • Conjecture: 3-D geometry models help compression if a single 3-D model captures the dependencies between many views (or frames of a sequence). [Figure: mesh with 2048 triangles]

  3. Outline of this Talk • Compression of many simultaneous views (e.g. light-fields) • Encoding view-dependent texture maps with 4-d wavelets • Hierarchical image-domain light-field coder • Why image-domain encoding is (usually) superior to texture-map encoding • Model-based compression of talking head sequences • Modeling and estimation of facial expressions • Avatars • Incorporate synthetic video into motion-compensated hybrid coding

  4. Multi-View Image Capture • Coding schemes suitable for • 2-plane parametrization • Hemispherical image arrangement • (arbitrary recording positions)

  5. Align Views by Mapping onto Object Surface • Camera views: no correlation between corresponding pixels • View-dependent texture map: strong correlation between corresponding texels

  6. 3-D Reconstruction from Many Views Volumetric Reconstruction • processes all views simultaneously • exploits texture and silhouette information • yields solid 3-D voxel model • Subdivide object’s bounding box into voxels • Generation of multiple hypotheses for each voxel • Hypothesis elimination by projecting visible voxels into light-field images • Iterate over all voxels until remaining hypotheses are “photo-consistent”

  7. Surface Representation • Initial octahedral geometry • Geometry refinement • determine vertex normals • move vertices to model surface • subdivide triangles [Figure panels: voxel model; meshes with 128, 512, 2048, and 8192 triangles]
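The triangle counts shown on this slide are consistent with an octahedron (8 faces) whose triangles are each split into four at every refinement step. The 1-to-4 split is an assumption here; the slide only says "subdivide triangles". A minimal check of that arithmetic:

```python
def triangle_count(levels, base_faces=8):
    """Faces after `levels` rounds of 1-to-4 subdivision (assumed split rule)."""
    return base_faces * 4 ** levels

# Levels 2..5 of an octahedron under the assumed split rule.
counts = [triangle_count(k) for k in range(2, 6)]
print(counts)  # [128, 512, 2048, 8192]
```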

  8. Texture Map Encoding with 4-D Wavelets • Arrange images into 2-D array • Warp each image into a view-dependent texture map • Texture map correlated in 4-D • Interpolate missing texels • 4-D Haar wavelet transform • Embedded encoding of wavelet coefficients (4D-SPIHT)
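As a rough illustration of the Haar transform named on this slide, the sketch below shows a single one-dimensional analysis step and its inverse. A 4-D transform of this kind is typically built by applying such a step separably along each of the four axes. The average/half-difference normalization and the sample values are assumptions chosen for readability, not details from the talk.

```python
def haar_step(signal):
    """One Haar analysis level: pairwise averages and pairwise half-differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    lows = [(a + b) / 2 for a, b in pairs]
    highs = [(a - b) / 2 for a, b in pairs]
    return lows, highs

def haar_inverse(lows, highs):
    """Rebuild the original samples from averages and half-differences."""
    out = []
    for s, d in zip(lows, highs):
        out.extend([s + d, s - d])
    return out

# Hypothetical texel values along one axis of the texture-map array.
texels = [10, 12, 11, 11, 40, 42, 41, 39]
lows, highs = haar_step(texels)
restored = haar_inverse(lows, highs)
```

Strongly correlated neighbors give small detail values in `highs`, which is what makes the coefficients cheap to encode.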

  9. Results: Wavelet Texture Map Encoder [Plot: reconstruction quality in luminance PSNR (dB)]
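For reference, PSNR is conventionally computed from the mean squared error between original and decoded samples. The sketch below assumes 8-bit luminance (peak value 255) and flat lists of samples; the function name is illustrative.

```python
import math

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB for two equal-length sample lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# Two samples, each off by one grey level: MSE = 1.
example_db = psnr([100, 100], [101, 99])
```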

  10. Results: Wavelet Texture Map Encoder

  11. Progressive Decoding • 28.6 dB at 0.0076 bpp (26.3 Kbytes) • 36.6 dB at 0.213 bpp (736 Kbytes)

  12. Align Views by Model-aided Prediction • Given: geometry model, reference images • Render geometry for reference images and prediction image • For each pixel: determine triangle, coordinates • Find corresponding pixels in reference images • Copy & average visible pixels
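The last step on this slide, copying and averaging the visible pixels, can be sketched as follows. The sketch assumes the geometry rendering has already produced, for one target pixel, a list of `(value, is_visible)` samples from the reference images. The data layout and the fallback value for pixels seen in no reference image are assumptions, not part of the talk.

```python
def predict_pixel(candidates, fallback=128):
    """Average the reference samples that see the same surface point.

    candidates: list of (value, is_visible) pairs, one per reference image.
    fallback:   hypothetical value used when no reference image sees the point.
    """
    visible = [value for value, is_visible in candidates if is_visible]
    if not visible:
        return fallback
    return sum(visible) / len(visible)

# Two references see the point; the third is occluded and is ignored.
predicted = predict_pixel([(100, True), (110, True), (250, False)])
```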

  13. Hierarchical Image Coding Order • project camera positions on hemisphere • subdivide into 4 quadrants • INTRA-encode corner images • encode center image • image prediction • residual error coding • encode mid-side images • subdivide into sub-quadrants • encode center and mid-side images • subdivide repeatedly
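One way to picture the coding order above is on a square grid of camera positions with side length 2^n + 1. The grid is a simplifying assumption; the slide projects camera positions onto a hemisphere. The sketch lists the corners first, then for each quadrant its center and mid-side positions, and then recurses into sub-quadrants.

```python
def coding_order(size):
    """Coding order of grid positions for a square grid of side 2**n + 1."""
    last = size - 1
    order = [(0, 0), (0, last), (last, 0), (last, last)]  # INTRA-coded corners
    seen = set(order)
    level = [(0, 0, last, last)]  # quadrants as (top, left, bottom, right)
    while level:
        next_level = []
        for top, left, bottom, right in level:
            if bottom - top < 2:
                continue  # nothing left between the quadrant's corners
            mid_r, mid_c = (top + bottom) // 2, (left + right) // 2
            # Center image first, then the four mid-side images.
            for pos in [(mid_r, mid_c), (top, mid_c), (mid_r, left),
                        (mid_r, right), (bottom, mid_c)]:
                if pos not in seen:
                    seen.add(pos)
                    order.append(pos)
            next_level += [(top, left, mid_r, mid_c), (top, mid_c, mid_r, right),
                           (mid_r, left, bottom, mid_c), (mid_r, mid_c, bottom, right)]
        level = next_level
    return order

order_3x3 = coding_order(3)
order_5x5 = coding_order(5)
```

Every position after the corners is predicted from images that were coded earlier, so the decoder can follow the same order.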

  14. Model-aided Image-Domain Light-Field Coder [Block diagram: light-field image I[u,v]; residual-error DCT coder (DCT coefficients); residual-error decoder; image buffer; multiframe disparity compensation; disparity map generation; 3-D geometry reconstruction; geometry coder (compressed geometry model); geometry decoder]

  15. Picture Quality • Original: Mouse light field, 257 RGB images, 384x288 pixels, 81.3 Mbytes • Compressed 300:1: 0.077 bpp (267 KBytes), 37.9 dB PSNR
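The figures on this slide can be cross-checked with a few lines of arithmetic. The check assumes 3 bytes per RGB pixel, 1 KByte = 1024 bytes, and 1 Mbyte = 2^20 bytes; under those assumptions the raw size and bit rate match the slide, and the ratio comes out slightly above the rounded 300:1.

```python
images, width, height = 257, 384, 288

pixels = images * width * height      # total pixels in the light field
raw_bytes = pixels * 3                # assumed 3 bytes per RGB pixel
compressed_bytes = 267 * 1024         # assumed 1 KByte = 1024 bytes

raw_mbytes = raw_bytes / 2 ** 20
bpp = compressed_bytes * 8 / pixels   # bits per pixel
ratio = raw_bytes / compressed_bytes

print(round(raw_mbytes, 1), round(bpp, 3), round(ratio))  # 81.3 0.077 312
```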

  16. Model-aided vs. Texture Coding [Plot curves: Texture; Model-aided]

  17. Natural vs. Synthetic Image Set [Plot annotations: 40 %, 70 %]

  18. Inaccurate Geometry [Plot annotations: 9 dB, 7 dB, 2 dB, 2 dB]

  19. Model-based videophone

  20. Modeling of Facial Expressions • Head geometry composed of 101 triangular B-spline patches • Facial expressions by superposition of 66 FAPs (Facial Animation Parameters) according to MPEG-4 standard • FAPs act on control points of triangular B-spline patches

  21. Estimation of Facial Expressions • Displacement field constrained by FAPs • Linearize for small FAPs • Optical flow constraint equation • Solve overdetermined system by linear regression • Apply iteratively in analysis-synthesis loop • Incorporate spatial resolution pyramid
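After linearization, each pixel contributes one linear equation in the unknown FAPs, and the overdetermined system is solved in the least-squares sense. The sketch below shows that regression step for just two unknowns by way of the normal equations. The coefficient rows and right-hand side are made-up numbers, not values derived from real image gradients, and the function name is illustrative.

```python
def least_squares_2(rows, rhs):
    """Minimize ||A p - b||^2 for two unknowns using the normal equations."""
    s11 = sum(a1 * a1 for a1, _ in rows)
    s12 = sum(a1 * a2 for a1, a2 in rows)
    s22 = sum(a2 * a2 for _, a2 in rows)
    t1 = sum(a1 * b for (a1, _), b in zip(rows, rhs))
    t2 = sum(a2 * b for (_, a2), b in zip(rows, rhs))
    det = s11 * s22 - s12 * s12  # zero if the two columns are linearly dependent
    return ((t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

# Four hypothetical per-pixel equations, generated from p = (0.5, -0.25).
rows = [(1, 0), (0, 1), (1, 1), (2, 1)]
rhs = [0.5, -0.25, 0.25, 0.75]
estimate = least_squares_2(rows, rhs)
```

In the analysis-synthesis loop on the slide, an estimate like this would be applied to the model, the frame re-synthesized, and the regression repeated on the remaining error.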

  22. Results: Peter [Video panels: Original; Synthesized] • Sequence: Peter, 230 frames, CIF resolution, 25 fps • Compressed 25,000:1: 1.2 kbps, 32.8 dB PSNR
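The 25,000:1 figure is plausible under common assumptions about the source format: CIF taken as 352x288 pixels with 4:2:0 chroma subsampling, i.e. 12 bits per pixel. Neither assumption is stated on the slide.

```python
width, height = 352, 288     # CIF luminance resolution
bits_per_pixel = 12          # assumed 4:2:0 sampling, 8 bits per sample
frame_rate = 25              # frames per second, from the slide
coded_bps = 1200             # 1.2 kbps, from the slide

raw_bps = width * height * bits_per_pixel * frame_rate
video_ratio = raw_bps / coded_bps
print(raw_bps, round(video_ratio))  # 30412800 25344
```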

  23. Results: Eckehard [Video panels: Original; Synthesized] • Sequence: Eckehard, CIF resolution, 25 fps • 1.1 kbps, 32.6 dB PSNR

  24. Results: Peter as Eckehard [Video panels: Original; Synthesized] • Sequence: Peter, 230 frames, CIF resolution, 25 fps

  25. Results: Eckehard as Peter [Video panels: Original; Synthesized] • Sequence: Eckehard, CIF resolution, 25 fps

  26. Results: Peter as Akiyo [Video panels: Original; Synthesized] • Sequence: Peter, 230 frames, CIF resolution, 25 fps

  27. . . . But What About Unknown Objects? [Video panels: Original; Synthesized] • Sequence: Clap, 1.2 kbps

  28. Model-Aided Coding: Incorporating Synthetic Video into MC Hybrid Coding [Block diagram: input video; coder control (control data, incl. motion vectors); intraframe DCT coder (DCT coefficients); intraframe decoder; multiframe motion compensation; model-based coder (FAPs); model-based decoder; decoder; signal label e]

  29. R-D-Optimal Mode Decision • Selection mask minimizing D + λR [Figure labels: previous decoded frame; synthesized frame; predicted frame]
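The Lagrangian decision on this slide can be sketched per block: for each block, choose the prediction source whose cost D + λR is smallest. The block-level granularity, the mode names, and the distortion and rate numbers below are illustrative assumptions.

```python
def choose_modes(blocks, lam):
    """Pick, per block, the mode with the lowest Lagrangian cost D + lam * R.

    blocks: list of dicts mapping mode name -> (distortion, rate_in_bits).
    Returns the selection mask as a list of mode names.
    """
    mask = []
    for candidates in blocks:
        best = min(candidates,
                   key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
        mask.append(best)
    return mask

# Hypothetical costs when predicting two blocks from each reference frame.
blocks = [
    {"previous": (100.0, 20), "synthesized": (60.0, 30)},
    {"previous": (50.0, 10), "synthesized": (55.0, 12)},
]
mask_low = choose_modes(blocks, lam=2)    # distortion dominates the cost
mask_high = choose_modes(blocks, lam=5)   # rate is penalized more heavily
```

A larger λ shifts the choice toward the cheaper-to-code mode, which is why the first block switches from the synthesized frame to the previous decoded frame in the second call.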

  30. Results: Peter • H.263 (TMN-10) @ 12 kbps vs. Model-Aided Coder @ 12 kbps • Sequence: Clap, 8.33 fps, CIF resolution

  31. Results: Akiyo • H.263 (TMN-10) @ 10 kbps vs. Model-Aided Coder @ 10 kbps • Sequence: Akiyo, 10 fps, CIF resolution

  32. R-D Performance of Model-Aided Coder [Plots: Sequence: Peter; Sequence: Akiyo; annotations: ~35%, ~40%]

  33. Conclusion • Can 3-D geometry help to compress images? YES . . . IF many views of the same 3-D object/scene are to be compressed. • Applications in • Multiview image coding (light-field compression) • Compression of video sequences • Very high compression ratios (100:1 . . . 25,000:1) • Requires accurate vision algorithms for 3-D reconstruction • Image-domain compression is more resilient against inaccurate geometry and hence more practical than texture-map encoding

  34. . . . THE END
