Efficient Prediction Structure for Multi-view Video Coding • Philipp Merkle, Aljoscha Smolic, Karsten Müller, Thomas Wiegand • CSVT 2007
Outline • Multi-view video coding (MVC) introduction • Requirements and test conditions for MVC • Prediction structures • Experimental results • Conclusion
MVC Introduction • MVC: Multi-view Video Coding • Multi-view video (MVV): video captured by a system that uses multiple camera views of the same scene. • Usage: 3DTV, free viewpoint video (FVV), etc.
Requirements for MVC • Temporal random access • View random access • Scalability • Backward compatibility • Quality consistency • Parallel processing
Temporal and inter-view correlation [Figure: prediction modes for a picture: temporal (T), inter-view, and temporal/inter-view mixed mode]
Temporal and inter-view correlation analysis • An H.264/AVC encoder was used with the following settings: • Motion compensation block size of 16×16 • Search range of ±32 pixels • Lagrange parameter (λ) of 29.5 • denotes the decrease of the average in comparison to temporal prediction only.
Temporal and inter-view correlation analysis (cont’d) • Simply including temporal and inter-view prediction modes
Lagrangian cost function • Lagrangian cost function: J = D + λR • D denotes the distortion. • R denotes the number of bits needed to transmit all components of the motion vector. • For each block in a picture, the algorithm chooses the motion vector (MV) within a search range that minimizes J. • The distortion in the subject macroblock B is calculated by: [equations (1) to (3)]
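The block-wise search described on this slide can be sketched as follows. This is a minimal illustration, not the encoder used in the experiments: the sum of squared differences as the distortion measure and the `mv_bits` rate model are assumptions made for the example, and all function names are hypothetical.

```python
# Sketch only: SSD distortion and the toy rate model are illustrative assumptions.

def ssd(cur, ref, bx, by, dx, dy, size):
    """Sum of squared differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            diff = cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx]
            total += diff * diff
    return total


def mv_bits(dx, dy):
    """Hypothetical rate model: longer vectors cost more bits."""
    return 2 + 2 * (abs(dx).bit_length() + abs(dy).bit_length())


def best_motion_vector(cur, ref, bx, by, size, search_range, lam):
    """Return (cost, (dx, dy)) minimizing J = D + lam * R over the search window."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the picture.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = ssd(cur, ref, bx, by, dx, dy, size) + lam * mv_bits(dx, dy)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

The same routine covers temporal and inter-view prediction: only the reference picture passed as `ref` changes (an earlier picture of the same view, or a picture of a neighbouring view at the same time instance).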
Test data and test conditions • 1D camera arrangements: Ballroom, Exit, Rena, Race1, Uli (linear); Breakdancers (arc) • 2D camera arrangements: Flamenco2 (cross); AkkoKayo (array) • 5 to 16 camera views are used • Target: high-quality TV-type video (640×480 or 1024×768) rather than limited-channel communication-type video.
Knowledge – hierarchical B picture, QP cascading • Hierarchical B picture, key picture, non-key picture: [Figure: hierarchical B pictures between two key pictures] • QP cascading: [1] [1] “Analysis of hierarchical B pictures and MCTF”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
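A rough sketch of the two ideas on this slide: pictures between two key pictures are assigned to dyadic temporal levels, and the quantization parameter grows with the level. The slide does not state the cascading rule, so the increment used in `cascaded_qp` (plus 4 for the first level, plus 1 for each further level) is an assumption made for illustration; the function names are hypothetical, and the GOP length is assumed to be a power of two.

```python
def temporal_levels(gop_length):
    """Map picture index 1..gop_length to its temporal level (key picture = level 0).

    Assumes gop_length is a power of two (dyadic hierarchy).
    """
    levels = {gop_length: 0}  # the key picture closes the GOP
    level, step = 1, gop_length // 2
    while step >= 1:
        # Pictures at odd multiples of `step` are introduced at this level.
        for idx in range(step, gop_length, 2 * step):
            levels[idx] = level
        level += 1
        step //= 2
    return levels


def cascaded_qp(key_qp, level):
    """QP for a picture at `level`; the +4 / +1 increments are an assumed rule."""
    qp = key_qp
    for current in range(1, level + 1):
        qp += 4 if current == 1 else 1
    return qp
```

For a GOP length of 8, `temporal_levels(8)` places picture 4 at level 1, pictures 2 and 6 at level 2, and the odd-numbered pictures at level 3, so pictures that are referenced more often are quantized more finely.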
Knowledge – DPB size • Decoded Picture Buffer (DPB) size is increased to: [2] • Memory-efficient reordering of multi-view input for compression [2] “Efficient Compression of Multi-view Video Exploiting Inter-view Dependencies Based on H.264/AVC”, IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, July 2006
Two tasks • To adapt the multi-view prediction schemes to the specific camera arrangements of the test data sets. • To adapt the prediction structures to the random access specification.
Prediction structure • Simulcast coding structure • To allow synchronization and random access, all key pictures are coded in intra mode.
Prediction structure (cont’d) • The first view is called the base view; its key pictures remain I frames.
Prediction structure (cont’d) • Alternative inter-view prediction structures for key pictures [Figure: KS_IPP, KS_PIP, and KS_IBP, each shown for a linear camera arrangement and for a 2D camera array]
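One way to read the three structure names for a linear camera arrangement is sketched below. The picture-type assignment is an interpretation of the names (I, P, and B key pictures across views), not a transcription of the figure: the centre view as the I view in KS_PIP and the handling of the last view in KS_IBP are assumptions, and `key_picture_structure` is a hypothetical helper.

```python
def key_picture_structure(mode, num_views):
    """Return one (picture_type, reference_views) tuple per view for key pictures."""
    if mode == "KS_IPP":
        # First view intra, every other view predicted from its left neighbour.
        return [("I", [])] + [("P", [v - 1]) for v in range(1, num_views)]
    if mode == "KS_PIP":
        # Assumed: the centre view is intra, prediction runs outwards from it.
        mid = num_views // 2
        views = []
        for v in range(num_views):
            if v == mid:
                views.append(("I", []))
            elif v < mid:
                views.append(("P", [v + 1]))
            else:
                views.append(("P", [v - 1]))
        return views
    if mode == "KS_IBP":
        # Even views form an I/P chain, odd views are bi-predicted from both sides.
        views = [("I", [])]
        for v in range(1, num_views):
            if v % 2 == 0:
                views.append(("P", [v - 2]))
            elif v + 1 < num_views:
                views.append(("B", [v - 1, v + 1]))
            else:
                # Assumed: a trailing odd view has no right neighbour, so it is P.
                views.append(("P", [v - 1]))
        return views
    raise ValueError("unknown mode: " + mode)
```

Listing the references per view makes the random-access cost visible: in KS_IPP the last view depends on every earlier view, while in KS_PIP the longest dependency chain is about half as long.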
Prediction structure (cont’d) • Inter-view prediction for key and non-key pictures AS_IPP mode
Experimental results – objective evaluation [Figures: average coding gains compared with anchor coding; Ballroom test result]
Experimental results – subjective evaluation • Different bit-rates were selected for the different data sets. [Figures: Ballroom test result; Race1 test result]
Experimental results – subjective evaluation • AS_IBP outperforms the anchors significantly. • The gain decreases slightly at higher bit-rates. [Figure: average results over all test sequences]
Influence of camera density • The Rena sequence is used, which consists of 16 linearly arranged cameras with a distance of 5 cm between adjacent cameras. • The experiment is repeated for each shifted set of 9 adjacent cameras. • The structures are applied to every time instance of the MVV sequence without temporal prediction.
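The shifted camera sets mentioned above can be enumerated with a simple sliding window. This is only a sketch of the bookkeeping, assuming cameras are indexed 0 to 15 from one end of the line; `adjacent_camera_sets` is a hypothetical helper name.

```python
def adjacent_camera_sets(num_cameras=16, set_size=9):
    """List every window of `set_size` adjacent cameras, shifted by one camera each time."""
    return [
        list(range(start, start + set_size))
        for start in range(num_cameras - set_size + 1)
    ]
```

With 16 cameras and sets of 9, this yields 16 - 9 + 1 = 8 shifted sets, the first covering cameras 0 to 8 and the last covering cameras 7 to 15.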
Results of experiments on camera density • Coding gain increases with decreasing camera distance and decreasing reconstruction quality.
Results of experiments on camera density (cont’d) • Results: average per-camera rate relative to the one-camera case • A larger QP value leads to a larger coding gain
Conclusion • The resulting multi-view prediction structures achieve significant coding gains and are highly flexible. • Parallel processing is supported by the presented sequential processing approach. • Problems: • Large disparities between the different views of multi-view video sequences • Illumination and color inconsistencies across views