
Construction of a scalable and evolving 3D model for video coding

27/05/2005. Construction of a scalable and evolving 3D model for video coding. Raphaèle Balter. Context: CIFRE convention between team TEMICS of IRISA/INRIA-Rennes (advisor: Luce Morin; background in 3D modeling for video coding) and lab TECH/IRIS of France Telecom R&D.


Presentation Transcript


  1. 27/05/2005. Construction of a scalable and evolving 3D model for video coding. Raphaèle Balter

  2. Context • CIFRE convention between: • Team TEMICS of IRISA/INRIA-Rennes • Advisor: Luce Morin • Background in 3D modeling for video coding • Lab TECH/IRIS of France Telecom R&D • Advisor: Patrick Gioia • Background in remote visualization and scalable representation of synthetic objects • Complementary backgrounds relevant to the subject

  3. Context: Evolution of the digital video world • Chain: coding, transmission, decoding • New sources/terminals (DV, more powerful computers or terminals, IP...) • Various networks (Internet, telephone networks: PSTN, GSM…) • New functionalities (interactivity, 3D games, DVD, broadcasting...) • New orientations in video coding: • Compact transmission with added functionalities • Content-adapted coding

  4. Context: Representations from images • + Photorealistic rendering • Dataset volume • Acquisition system • + Compact representation • + Acquisition system • Rendering • [Figure: axis from less geometry to more geometry, covering rendering with no geometry, rendering with implicit geometry and rendering with explicit geometry; representations shown: video, lightfield, lumigraph, mosaics, view morphing, view interpolation, LDIs, texture mapped 3D model, metric 3D model]

  5. Context: 3D model based coding • Principle: [Figure: processing chain with the stages capture, digitization, analysis and display, linking real world, video camera, original sequence, 3D model and reconstructed sequence] • Envisioned applications: • Virtual reality: geo-positioning aid, virtual visit… • Augmented reality: impact simulation, videoconference… • Video coding: very low bitrate coding for remote applications

  6. Context: 3D model based representations • Goal: • 3D extraction • Camera parameter computation • Assisted modeling: • Human intervention [debevec96][debevec00] • Specific acquisition system • Turntable [niem94][debevec96][gibson98] • Robot [mellor00] [zisserman01] • Knowledge of the scene contents • Faces [preteux00][girod02] • Architectural scenes [faugeras95][hartley00][dick00][bazin01][werner02][sturm02]

  7. Context: 3D model based representations • Non-assisted modeling: • Limits: only for static scenes, without reflections or pure-rotation camera motion • Single model [fitzgibbon99] [roning99][pollefeys00][yao02] [Nis03][yu04] • 3D model stream [galpin02] • [Figure: single 3D model built from the original sequence I0, I5, In, versus 3D models M0, M1 built from the original sequence I0, I3, I8, I20, In]

  8. Objectives • 3D representation suited for coding • Envisioned applications: • Video coding for remote real-time visualization on heterogeneous terminals • Constraints: • Non-assisted modeling • No assumptions on camera parameters or scene content • No assumption on video length • Scalability • [Figure label: service providers]

  9. Problems • Representations: • Single realistic model • Realistic, consistent representation • Incompatible with the video coding constraint on video length • 3D model stream • No assumption on video length • Suited to streaming • Inconsistency of the representation • Transitions between models • Solution = tradeoff between a single model and a stream

  10. Problems • Scalability: • Representing a signal with several levels of information • Allowing the signal to adapt to: • network capabilities (bandwidth) • transmission losses • terminal capabilities (computation and rendering) • Need for a multi-resolution representation • [Figure: video bitstream made of a base stream and refinement layers]
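
As a rough illustration of the base stream plus refinement layers idea, the sketch below keeps as many layers as fit a bandwidth budget. The function name `select_layers`, the layer sizes and the budget are hypothetical and are not taken from the presentation.

```python
def select_layers(layer_sizes_kbit, budget_kbit):
    """Return (number of layers kept, total size) for a given budget.

    layer_sizes_kbit[0] is the base stream; the following entries are
    refinement layers, which are only useful if all previous layers
    were received. Sizes and budget are illustrative values.
    """
    total = 0
    kept = 0
    for size in layer_sizes_kbit:
        if total + size > budget_kbit:
            break  # this layer and every later one are dropped
        total += size
        kept += 1
    return kept, total


# A terminal on a slow link receives the base stream and two refinements.
print(select_layers([100, 50, 50, 200], 220))
```

A terminal with more bandwidth or rendering power would simply call the same function with a larger budget and decode more layers.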

  11. Proposed scheme • Evolving structure • Automatic algorithm • [Figure: pipeline from 3D reconstruction [galpin02] through evolving model construction (morphing: a priori morphing, a posteriori morphing), hierarchical coding (wavelet analysis) and compression to the video bitstream]

  12. Overview • 3D information extraction • Evolving model construction • Evolving model coding • Evolving model compression • Conclusions/Perspectives • [Figure: 3D reconstruction [galpin02], evolving model construction, coding, compression, video bitstream]

  13. Overview • 3D information extraction • Evolving model construction • Evolving model coding • Evolving model compression • Conclusions/Perspectives • [Figure: 3D reconstruction [galpin02], evolving model construction, coding, compression, video bitstream]

  14. 3D extraction: principle of the Galpin algorithm • Model valid for a portion of the original sequence: a GOF (Group of Frames) • GOFs delimited by keyframes, which are used as texture images • Keyframe selection based on several criteria: • Global motion • 3D validity: epipolar residual • Ratio of outgoing points • 3D model stream: • Classical structure-from-motion algorithm [faugeras93] [horaud93][hartley-zisserman2004] • [Figure: original sequence I0, I3, I8, I20, In split by keyframes into GOF 1 and GOF 2; texture images; 3D models M0, M1; camera positions C1, C2; point M(X,Y,Z) with m1, m2; reconstructed sequence]

  15. 3D extraction: global scheme [galpin] • [Flowchart blocks] • Images • Extraction and tracking of interest points [harris88] (interest point motion) • Motion estimation of pixels [marquant00] (dense motion field) • Keyframe selection and saving of keyframes • Estimation of camera poses intra-GOF [dementhon95] (camera positions) • Estimation of depth image [huang84] • 3D mesh computation from a triangulation of the keyframe associated with the GOF (mesh and depth) • Estimation of textured 3D model • Coder inputs: textured 3D models, camera positions

  16. 3D extraction: limits of the Galpin 3D model stream • Stream of independent 3D models: • Uniform regular meshes • Different fields of view • Abrupt transitions between models • [Figure: geometric jump]

  17. 3D extraction: limits of the Galpin 3D model stream • [Figure: texture jump between texture image k and texture image k+1]

  18. 3D extraction: limits of the Galpin 3D model stream • [Figure: connectivity jump]

  19. 3D extraction: limits of the Galpin 3D model stream • Independent 3D model stream: • Abrupt transitions • Fading post-processing may introduce ghosting effects • Non-scalable geometry

  20. Overview • 3D information extraction • Evolving model construction • Evolving model coding • Evolving model compression • Conclusions/Perspectives • [Figure: 3D reconstruction [galpin02], evolving model construction, coding, compression, video bitstream]

  21. Construction: a posteriori morphing • Evolving model: • Tradeoff between a single model and a 3D model stream • Model stream with 3D morphing to link models together • Morphing [hong88][parent92][lazarus98][alexa02] • Two-step process: • Vertex mapping • Interpolation between corresponding vertices • Efficient methods are semi-automatic [bethel89] [kent92][delingette93] [decarlo96] [lee99][zockler00] [kanai00][michikawa01] => not compatible with our scheme • Contributions not detailed here: • A posteriori morphing of meshed depth maps [balter03] • A posteriori 3D model morphing [leguen04]

  22. Construction: a priori morphing • Principle of the new encoding scheme: • No more uniform grid • Corresponding vertices: vertices of successive models are the same physical 3D points of the scene • Implicit morphing based on those corresponding vertices = simple linear interpolation [equation not preserved]
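
Since the interpolation formula is cut off on this slide, the following is only a minimal sketch of what linear interpolation between corresponding vertices can look like. The function name `morph_vertices` and the parameter `t` (0 at the current model, 1 at the next one) are assumptions, not the author's notation.

```python
def morph_vertices(source, target, t):
    """Linearly interpolate between two lists of matched 3D vertices.

    source[i] and target[i] are assumed to be the same physical point
    as estimated in two successive models; t is a hypothetical blending
    parameter in [0, 1].
    """
    if len(source) != len(target):
        raise ValueError("source and target must contain matched vertices")
    return [
        tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
        for p, q in zip(source, target)
    ]


# Halfway between the two models.
print(morph_vertices([(0.0, 0.0, 2.0)], [(1.0, 2.0, 4.0)], 0.5))
```

Because the connectivity is shared, only the vertex positions change during the transition, which is what makes the morphing implicit.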

  23. Construction: inputs • From 3D extraction [galpin02]: • Dense motion field • Depth maps • Images • Texture images • Camera positions

  24. Construction: proposed algorithm • Fixed connectivity and time-evolving geometry • Initialisation with a uniform regular mesh covering the whole image surface • Tracking and update of the vertices still visible from the next viewpoint, giving the corresponding mesh (CM) • Integration of the new parts appearing in the next model, giving the new-vertices mesh (NVM) • Merge • Reinitialisation of the model for long sequences to avoid drift => GGOF (group of GOFs)

  25. Construction: additional constraints • Merge of CMn and NVMn: • Must not alter the existing connectivity • Must not create a non-manifold mesh • Vertices must be valid: • Validity map

  26. Construction: constrained merge • Manifold merge: • New vertices are triangulated under the CMn envelope constraint • CMn envelope vertices are included in the Delaunay triangulation • Faces overlapping the CMn mask • [Figure legend: CMn envelope, CMn face, CMn vertex, CMn mask, NVMn face, superimposed NVMn face, NVMn vertices]

  27. Construction: constrained merge • Proposed solution for 2-manifold merge: • Elimination of all the faces containing only vertices of the CMn envelope • Recovery of the eliminated faces that do not overlap the CMn mask • Convex areas of the envelope • Detection of holes in the mesh (Euler formula: S - A + F = 2(1 - g), with S vertices, A edges, F faces and g the genus) • [Figure legend: CMn envelope, CMn face, CMn vertex, CMn mask, NVMn face, superimposed NVMn face, NVMn vertices]
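
The Euler formula above can be checked on a small mesh with the sketch below, which counts vertices, edges and faces from a face list. The function name `euler_characteristic` is hypothetical. Note that S - A + F = 2(1 - g) holds for closed surfaces; for an open mesh every boundary loop lowers the value by one, so a disc-like mesh gives 1 and each extra hole reduces it further.

```python
def euler_characteristic(faces):
    """Return S - A + F for a mesh given as a list of vertex-index tuples."""
    vertices = set()
    edges = set()
    for face in faces:
        vertices.update(face)
        n = len(face)
        for i in range(n):
            a, b = face[i], face[(i + 1) % n]
            edges.add((min(a, b), max(a, b)))  # undirected edge
    return len(vertices) - len(edges) + len(faces)


# Closed tetrahedron: genus 0, so S - A + F = 2.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# Triangulated ring (outer square 0..3, inner square 4..7): a disc with one hole.
ring = []
for i in range(4):
    j = (i + 1) % 4
    ring.append((i, j, 4 + i))
    ring.append((j, 4 + j, 4 + i))

print(euler_characteristic(tetrahedron))  # 2
print(euler_characteristic([(0, 1, 2)]))  # 1
print(euler_characteristic(ring))         # 0
```

Comparing the value obtained after the merge with the one expected for a hole-free mesh is one simple way to flag holes; whether the presentation uses exactly this test is not stated.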

  28. Construction: matching information • How to transmit the matching information? • No additional information to transmit • Known at the encoding stage from the motion field • Retrieved at the decoding stage by: • reprojecting the model onto the following viewpoint • identifying vertices that have the same 2D coordinates • [Figure: cameras Cn and Cn+1]

  29. Construction: validity map • Uncertainty on the motion and on the decoded models due to errors in 3D estimation • Vertices are chosen among valid points • Reinitialisation: threshold on the ratio of valid points • Validity map: ensures matching consistency • [Figure: cameras Cn and Cn+1]
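
The decoder-side matching described above can be sketched as follows, under strong simplifying assumptions: a pinhole camera with no rotation, a single focal length, and a pixel tolerance `tol`. The names `project` and `match_vertices` and all numeric values are hypothetical.

```python
def project(vertex, camera_center, focal):
    """Project a 3D point with a pinhole camera looking along +Z (no rotation)."""
    x, y, z = (v - c for v, c in zip(vertex, camera_center))
    if z <= 0:
        return None  # behind the camera
    return (focal * x / z, focal * y / z)


def match_vertices(current_vertices, next_vertices, next_camera, focal, tol=0.5):
    """Map indices of the current model to indices of the next model.

    Both models are projected onto the next viewpoint; two vertices are
    matched when their 2D coordinates agree within tol pixels.
    """
    next_proj = [project(v, next_camera, focal) for v in next_vertices]
    matches = {}
    for i, v in enumerate(current_vertices):
        p = project(v, next_camera, focal)
        if p is None:
            continue
        for j, q in enumerate(next_proj):
            if q is not None and abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol:
                matches[i] = j
                break
    return matches


current = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
# Same image positions in the next view, with different depth estimates,
# plus one newly appearing vertex in the middle.
following = [(1.5, 0.0, 20.0), (3.0, 0.0, 10.0), (0.1, 0.0, 8.0)]
print(match_vertices(current, following, (0.5, 0.0, 0.0), 100.0))
```

No correspondence table is stored in this sketch: it is rebuilt from the two models and the camera position alone, which is the point made on the slide.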

  30. Construction: results • Stair sequence: lateral translation • Green: current mesh • Yellow: next mesh • Red: morphing source (subset of the current mesh) • Blue: morphing target (subset of the next mesh)

  31. Construction: results • Stair sequence: virtual navigation • Tradeoff between single model and model stream • => evolving model = consistent 3D model stream

  32. Overview • 3D information extraction • Evolving model construction • Evolving model coding • Evolving model compression • Conclusions/Perspectives • [Figure: 3D reconstruction [galpin02], evolving model construction, coding, compression, video bitstream]

  33. Coding: wavelet analysis • Case of surfaces • Goal: scalable multi-resolution representation • Classical, efficient signal processing tool: wavelets [mallat89][derose96] • Interest: • hierarchical representation of a signal => provides multiresolution • good compression • Principle: • low-frequency representation refined by well-localized high-frequency details • Successive filterings • Example: image case [jpeg00]
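
To make the principle concrete, here is a small 1D sketch using the Haar wavelet, chosen only because it is the simplest case; the presentation itself works on surfaces with other filters. The function names are hypothetical, and the signal length is assumed to be divisible by 2 to the power `levels`.

```python
def haar_step(signal):
    """One filtering pass: pairwise averages (low) and half-differences (high)."""
    approx = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Apply successive filterings; details[0] is the finest level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_reconstruct(approx, details):
    """Refine the coarse signal with the details, coarsest level first."""
    signal = list(approx)
    for d in reversed(details):
        refined = []
        for a, c in zip(signal, d):
            refined.extend([a + c, a - c])
        signal = refined
    return signal


coarse, details = haar_decompose([4, 6, 10, 12, 8, 8, 0, 2], levels=2)
print(coarse)   # [8.0, 4.5]
print(details)  # [[-1.0, -1.0, 0.0, -1.0], [-3.0, 3.5]]
print(haar_reconstruct(coarse, details))
```

Decoding only `coarse` gives a low-resolution signal; adding each detail level in turn restores the finer resolutions, which is the multiresolution property referred to above.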

  34. Coding: 2nd generation wavelet analysis • 2nd generation wavelets [loop87][dyn90] [schröder95][lounsbery97] [sweldens98]: • For non-regular surfaces • Coarse base mesh + refinements (wavelet coefficients) • [Figure: surface and base mesh, with points labelled A, B, C]

  35. Coding: 2nd generation wavelet analysis • Envisioned applications: real-time reconstruction in an adaptive way • Need for a fast algorithm => tradeoff between compression and speed • Filters: • Generated by the "lifting scheme" • Can have various sizes depending on the properties wanted for the wavelets • Compression requires a minimal-size filter • Examples of reconstruction high-pass filters: midpoint lifted, butterfly lifted, midpoint non-lifted • [Figure: filter coefficient masks]
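
As a 1D analogue of a non-lifted midpoint filter, the sketch below predicts each odd sample as the midpoint of its two even neighbours and stores only the prediction error. This is an assumption-laden simplification: the mesh version works on edge midpoints of subdivided faces, and the function names are hypothetical. The input length is assumed to be odd so that every odd sample has two neighbours.

```python
def midpoint_analysis(samples):
    """Split into coarse (even) samples and midpoint prediction errors."""
    coarse = samples[0::2]
    details = [
        samples[2 * i + 1] - 0.5 * (coarse[i] + coarse[i + 1])
        for i in range(len(coarse) - 1)
    ]
    return coarse, details


def midpoint_synthesis(coarse, details):
    """Rebuild the fine signal: insert midpoints, then add the details."""
    fine = [coarse[0]]
    for i, d in enumerate(details):
        fine.append(0.5 * (coarse[i] + coarse[i + 1]) + d)
        fine.append(coarse[i + 1])
    return fine


coarse_samples, wavelet_coeffs = midpoint_analysis([0, 1, 2, 5, 4])
print(coarse_samples)  # [0, 2, 4]
print(wavelet_coeffs)  # [0.0, 2.0]
print(midpoint_synthesis(coarse_samples, wavelet_coeffs))
```

The filter only touches two neighbours per coefficient, which illustrates why a small filter is cheap to apply at reconstruction time; a lifted variant would add an update step on the coarse samples.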

  36. Coding: independent analysis • Independent analysis: • provides an independent 3D model stream at each resolution • => not satisfactory

  37. Coding: proposed representation • Proposed representation: • Decompositions based on the same support • Transformation of each dense depth map into consistent hierarchical triangular meshes • Support dissociated from geometry: the single connectivity mesh (SCM)

  38. Coding: base meshes construction • Base mesh = coarse mesh • Evolving model construction • Large faces: need to represent the scene accurately despite the face sizes => content-based vertices ≠ regular vertices • Time evolution: management of the increased size and of stretched faces

  39. Coding: base meshes construction • New evolving model generation • [Flowchart blocks: validity map computation, Harris corner detector, Canny edge detector (at Init and at Update), Delaunay triangulation]

  40. Coding: wavelet decomposition • Decomposition scheme: • Inputs: base model MBn, dense model MDn • Hierarchical 3D meshes construction: canonical quadrisection of facets to define the scale and wavelet spaces • Information computation: depth difference • Wavelet coefficients computation: filtering • [Figure: camera and associated view lines, with labels Mni, Mni+1, Mmi, p, d]

  41. Coding: consistent wavelet decomposition • Single Connectivity Mesh (SCM) • Common connectivity decomposition support: • sufficient since wavelet coefficients are added on edges by face quadrisection • Purpose: • To gather connectivity information • Easy to construct thanks to the evolving model structure with consistent connectivity • Correspondences/implicit morphing at each level

  42. Coding: consistent wavelet decomposition • New global indexing system for vertices and faces • Global index: constant for one physical point in all meshes • Allows corresponding vertices to be identified quickly • Index = order of appearance in the list of base mesh vertices and faces • Computed with barycentric coordinates for the other resolutions • [Figure: mesh annotated with global face indices and global vertex indices; labels: unique global index, global face index k (max resolution)]

  43. Coding: results • Street sequence: travelling • Green: current mesh • Yellow: next mesh • Red: morphing source (subset of the current mesh) • Blue: morphing target (subset of the next mesh) • [Figure: base mesh; base mesh + refinements]

  44. Overview • 3D information extraction • Evolving model construction • Evolving model coding • Evolving model compression • Conclusions/Perspectives • [Figure: 3D reconstruction [galpin02], evolving model construction, coding, compression, video bitstream]

  45. Compression: media interrelations • Redundancies => exploiting interrelations between media: • (1) Texture image prediction using the previous texture image + 3D model + camera position • (2) Texture coordinates of vertices retrieved using the camera position, by reprojecting the model onto the camera position => 3 coordinates (x,y,z) instead of 5 or more (x,y,z,u,v or x,y,z,u,v,p) • [Figure: 3D encoder for mesh geometry & connectivity, 2D encoder (EBCOT) for texture, 1D encoder for camera positions, multiplexed (Mux) into the bitstream]

  46. Compression: camera position compression • Camera position compression [galpin02] • Intra: • All camera positions are encoded in intra mode • Inter / predictive scheme: • The first camera is encoded in intra mode • Key cameras are encoded incrementally relative to the previous key position • Other cameras are encoded incrementally relative to a linear prediction • [Figure legend: key positions, intermediate positions]
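
The inter scheme can be sketched as residual coding, as below. The slide does not say how the linear prediction is formed; this sketch assumes linear interpolation between the two surrounding key positions. All function names are hypothetical, and `key_indices` is assumed to be sorted, to start at 0 and to end at the last camera.

```python
def _lerp(p, q, t):
    return tuple(a + t * (b - a) for a, b in zip(p, q))


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _add(p, q):
    return tuple(a + b for a, b in zip(p, q))


def encode_cameras(positions, key_indices):
    """Return one residual per camera (the first one is the intra position)."""
    residuals = [None] * len(positions)
    residuals[key_indices[0]] = positions[key_indices[0]]  # intra
    for prev, cur in zip(key_indices, key_indices[1:]):
        # Key camera: increment relative to the previous key position.
        residuals[cur] = _sub(positions[cur], positions[prev])
        # Intermediate cameras: error relative to the assumed linear prediction.
        for i in range(prev + 1, cur):
            t = (i - prev) / (cur - prev)
            predicted = _lerp(positions[prev], positions[cur], t)
            residuals[i] = _sub(positions[i], predicted)
    return residuals


def decode_cameras(residuals, key_indices):
    """Invert encode_cameras."""
    positions = [None] * len(residuals)
    positions[key_indices[0]] = residuals[key_indices[0]]
    for prev, cur in zip(key_indices, key_indices[1:]):
        positions[cur] = _add(positions[prev], residuals[cur])
        for i in range(prev + 1, cur):
            t = (i - prev) / (cur - prev)
            predicted = _lerp(positions[prev], positions[cur], t)
            positions[i] = _add(predicted, residuals[i])
    return positions


cams = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 0.0, 0.0),
        (3.0, 0.0, 1.0), (4.0, 0.0, 2.0)]
res = encode_cameras(cams, [0, 2, 4])
print(res)
print(decode_cameras(res, [0, 2, 4]) == cams)
```

For smooth camera paths the intermediate residuals stay close to zero, which is what makes them cheap to encode.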

  47. Compression: geometry compression • Geometry compression: • Base mesh: Topological Surgery (TS) encoder, also known as MPEG4-3DMC, for (XYZ) encoding • Wavelets: adaptation [koda00] of the SPIHT algorithm (Set Partitioning In Hierarchical Trees) [said96]: • bitplane scalability is added to spatial and temporal scalability • Based on a clever partitioning of the coefficient hierarchy • Hierarchy obtained not by face subdivision but through an edge-based hierarchy
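
Bitplane scalability on its own can be illustrated with the sketch below, which sends integer coefficient magnitudes one bitplane at a time, most significant first. This is not SPIHT: the set partitioning that makes SPIHT efficient is left out, and the function names and coefficient values are hypothetical.

```python
def to_bitplanes(coeffs, num_planes):
    """Split integer coefficients into sign flags and MSB-first bitplanes."""
    signs = [c < 0 for c in coeffs]
    planes = [
        [(abs(c) >> b) & 1 for c in coeffs]
        for b in range(num_planes - 1, -1, -1)
    ]
    return signs, planes


def from_bitplanes(signs, planes, num_planes):
    """Rebuild coefficients from however many leading bitplanes were received."""
    values = [0] * len(signs)
    for k, plane in enumerate(planes):
        shift = num_planes - 1 - k
        for i, bit in enumerate(plane):
            values[i] |= bit << shift
    return [-v if s else v for v, s in zip(values, signs)]


signs, planes = to_bitplanes([13, -6, 3, 0], num_planes=4)
print(from_bitplanes(signs, planes, 4))      # all planes received
print(from_bitplanes(signs, planes[:2], 4))  # only the two most significant planes
```

Truncating the stream after any bitplane still yields a usable, coarser approximation of every coefficient, which is the kind of quality scalability the slide adds to the spatial and temporal ones.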

  48. Compression: texture compression • Predictive scheme [galpin02] • [Figure: previous texture image, texturing and projection, predicted image, next texture image]

  49. Compression: texture compression • [Figure: predictive coding diagram for texture images Tn+1 and Tn+2; labels: original image, predicted image, difference image (E), degraded difference image, network transmission, reconstructed image (T')]

  50. Compression: texture compression • Base layers: • ensure no error on the predicted image • [Figure: base layer and higher-level layers, with prediction and difference links between images labelled K]
