Pre-analysis of video for its advanced coding: application to HDTV coding in H.264 streams. Olivier Brouard, July 20th, 2010. Supervisors: Dominique Barba and Vincent Ricordel. École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM). Speciality: Automatic Control, Robotics, Signal Processing and Applied Computer Science
Introduction: Motivations • Emergence of HDTV • New displays • From SDTV to HDTV • SDTV: 720x576 pixels • HDTV: 1920x1080 pixels • from 4% to 20% of the visual field • better immersion for the viewers • 5x more pixels (1920×1080 / (720×576) = 5) • Need for a new video coding standard • H.264 (or MPEG-4 AVC) 20 October, 2014 Olivier Brouard Slide 3/47
Introduction: H.264 • Advanced video coder (asymmetrical coding: the encoder is much more complex than the decoder) + rich set of prediction modes (multiple reference frames) + advanced entropy coding • higher bit-rate reduction (up to 50% compared with MPEG-2) • But • short-term decisions based on « low-level » signal properties • no consistency of the coding decisions over time
Introduction: the human as the final observer. Needs • Control the perceived quality • Ensure the temporal coherence of the coding of objects • the rendering of an object has to be temporally consistent • avoid perceptible distortions • blocking effects • flickering effects
Introduction: Objectives & proposals • How? • medium/long-term decisions • « high-level » considerations • no such tools in current encoders • Solution • perform a video pre-analysis before the encoding step • guide the encoder in its decisions
Outline • Video pre-analysis 1.1 Advanced motion estimation 1.2 Spatio-temporal segmentation 1.3 Visual attention modeling • Applications: H.264 video coding 2.1 GOP structure adaptation 2.2 Adaptive quantization
1- Video pre-analysis: Video pre-analysis • Based on HVS properties • provides « high-level » information to the encoder • The Human Visual System (HVS) • Luminance perception • Color perception • Contrast sensitivity • Masking effects • Visual attention • Bottom-up: guided by saliency • Top-down: guided by the task
1- Video pre-analysis: Visual attention • Attributes guiding the deployment of visual attention [Wolfe 04] • Contrast, Motion, Color, Orientation, … • Visual attention modeling [Itti 01; Le Meur 07; Marat 10] based on the Koch and Ullman model [Koch 85] • Perceptually important regions correspond to the most salient objects (physically and semantically) • Shapes of salient regions (saliency maps) correspond to the shapes of objects [Milanese 1993] • moving objects attract our visual attention
1- Video pre-analysis
1- Video pre-analysis – Advanced motion estimation: Spatio-temporal tube (1) • Visual fixation time in the HVS ~ 200 ms • Next generation of HDTV • 1920x1080 in progressive mode at 50 Hz • temporal segment of 9 frames: 9 × 20 ms = 180 ms [Péchard 2007] • Assumption • uniform motion over the segment • spatio-temporal tube • coherent motion over a perceptually significant duration • more homogeneous motion vector field
1- Video pre-analysis – Advanced motion estimation: Spatio-temporal tube (2) • Implementation • spatial down-sampling • temporal down-sampling - central frame = current frame - 4 reference frames • The spatio-temporal tube minimizes the global error MSE_G, which aggregates the errors MSE_k with k = -4, -2, +2, +4 • MSE_k is computed on the 3 YUV components
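The tube search can be sketched as exhaustive block matching in which a single velocity (vy, vx) per tube must explain the displacement in all four reference frames (uniform motion: the block moves by k·v in frame c + k). This is a minimal sketch, not the thesis implementation; assumed: frames are a NumPy array of shape (T, H, W, 3) holding Y, U, V planes of equal size, MSE_G is taken as the mean of the MSE_k, and the function names, block size and search range are hypothetical. Spatial down-sampling is omitted.

```python
import numpy as np

OFFSETS = (-4, -2, 2, 4)  # reference frames relative to the central frame (temporal down-sampling)

def tube_mse(frames, c, y, x, vy, vx, b):
    """MSE_G of the tube through block (y, x) of central frame c, or None if it leaves the picture.

    Assumption: uniform motion, so in frame c + k the block is displaced by k * (vy, vx)."""
    _, h, w, _ = frames.shape
    ref = frames[c, y:y + b, x:x + b].astype(np.float64)
    total = 0.0
    for k in OFFSETS:
        yy, xx = y + k * vy, x + k * vx
        if yy < 0 or xx < 0 or yy + b > h or xx + b > w:
            return None
        blk = frames[c + k, yy:yy + b, xx:xx + b].astype(np.float64)
        total += np.mean((blk - ref) ** 2)  # MSE_k over the 3 YUV components
    return total / len(OFFSETS)  # MSE_G taken here as the mean of the MSE_k (assumption)

def estimate_tube(frames, c, y, x, b=8, search=2):
    """Return (MSE_G, vy, vx) of the best tube for one block (hypothetical helper)."""
    best = None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            e = tube_mse(frames, c, y, x, vy, vx, b)
            if e is not None and (best is None or e < best[0]):
                best = (e, vy, vx)
    return best
```

With a 9-frame segment, the central frame is index 4 and the references are frames 0, 2, 6 and 8.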
1- Video pre-analysis – Spatio-temporal segmentation: Global motion • Apparent motion is due to • moving objects • camera motion • Motion segmentation • based on the residual motion • Affine model: $V_x = a_1 x + a_2 y + t_x$, $V_y = a_3 x + a_4 y + t_y$ • a1, a2, a3, a4: deformation parameters • tx, ty: translation parameters • Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube) at position (x, y)
1- Video pre-analysis – Spatio-temporal segmentation: Global motion parameter estimation • Parameter estimation from motion vector fields [Coudray 2005] • Global motion estimation in 2 steps: 1. For each MV (tube), computation of the derivatives of the MV field • accumulation of the parameter hypotheses • localization of the main peak 2. Accumulation of the residual MVs (tubes) in a 2-D histogram (tx, ty)
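The two steps above can be sketched with histograms: each tube votes for a1..a4 through the local derivatives of the MV field, the most populated bin gives each parameter, and the residual vectors then vote in a 2-D (tx, ty) histogram. This is a simplified sketch in the spirit of [Coudray 2005], not its exact procedure; assumed: the MVs lie on a regular grid with unit spacing, and the bin counts and ranges are hypothetical.

```python
import numpy as np

def histogram_peak(values, bins, value_range):
    """Centre of the most populated bin of a 1-D histogram."""
    hist, edges = np.histogram(values, bins=bins, range=value_range)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])

def global_affine(vx, vy, bins=101, deriv_range=(-0.505, 0.505),
                  t_bins=65, t_range=(-32.5, 32.5)):
    """vx, vy: 2-D arrays, one MV per tube. Returns (a1..a4), (tx, ty), residual MVs."""
    # Step 1: local derivatives of the MV field, one parameter hypothesis per tube
    dvx_dy, dvx_dx = np.gradient(vx)
    dvy_dy, dvy_dx = np.gradient(vy)
    a1 = histogram_peak(dvx_dx.ravel(), bins, deriv_range)
    a2 = histogram_peak(dvx_dy.ravel(), bins, deriv_range)
    a3 = histogram_peak(dvy_dx.ravel(), bins, deriv_range)
    a4 = histogram_peak(dvy_dy.ravel(), bins, deriv_range)
    # Step 2: remove the deformation part and accumulate the residual translations
    ys, xs = np.mgrid[0:vx.shape[0], 0:vx.shape[1]]
    rx = vx - (a1 * xs + a2 * ys)
    ry = vy - (a3 * xs + a4 * ys)
    hist, xe, ye = np.histogram2d(rx.ravel(), ry.ravel(), bins=t_bins,
                                  range=[t_range, t_range])
    i, j = np.unravel_index(np.argmax(hist), hist.shape)  # main peak
    tx = 0.5 * (xe[i] + xe[i + 1])
    ty = 0.5 * (ye[j] + ye[j + 1])
    return (a1, a2, a3, a4), (tx, ty), (rx, ry)
```

The residual vectors returned in the last tuple are what the motion segmentation step analyses.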
1- Video pre-analysis – Spatio-temporal segmentation: Motion segmentation • 2-D histogram of the translation parameters • residual MVs (tx, ty) • each histogram peak => a moving object • analysis of all the peaks • Iterative approach: 1. Initialization: detection of the main peak (greedy approach, local gradient) 2. Detection of the other peaks (greedy approach) • Figure: accumulation histogram, main peak, secondary peak, segmented space
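A greedy peak search of this kind can be sketched as hill climbing on the histogram: from every non-empty bin, move to the largest of the 8 neighbours until no neighbour is higher; bins that end on the same peak form one region of the segmented space, and the highest peak is the main one. This is an illustrative sketch with hypothetical names, not the exact iterative procedure of the thesis.

```python
import numpy as np

def climb(hist, start):
    """Greedy ascent (local gradient): move to the largest 8-neighbour until none is larger."""
    i, j = int(start[0]), int(start[1])
    h, w = hist.shape
    while True:
        best = (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and hist[ni, nj] > hist[best]:
                    best = (ni, nj)
        if best == (i, j):
            return best
        i, j = best

def segment_histogram(hist, min_count=1):
    """Label each populated bin with the index of the peak its ascent ends on.

    Returns (labels, peaks); peaks are sorted by decreasing height, so peaks[0] is
    the main peak, and empty bins get the label -1."""
    peak_of = {idx: climb(hist, idx) for idx in zip(*np.nonzero(hist >= min_count))}
    peaks = sorted(set(peak_of.values()), key=lambda p: -hist[p])
    order = {p: n for n, p in enumerate(peaks)}
    labels = np.full(hist.shape, -1, dtype=int)
    for idx, p in peak_of.items():
        labels[idx] = order[p]
    return labels, peaks
```

Each label then identifies the tubes whose residual motion falls in that region, that is, one moving object.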
1- Video pre-analysis – Spatio-temporal segmentation: Motion segmentation, results • spatial and temporal regularization is needed
1- Video pre-analysis – Spatio-temporal segmentation: Spatio-temporal regularization • Motion-based segmentation: some blocks are misclassified • more criteria to improve the segmentation • connectivity • color • texture • motion • Markovian approach
1- Video pre-analysis – Spatio-temporal segmentation: Markovian approach • site = spatio-temporal tube; E: label field, O: observation field • Markovian property: $P(e_s \mid e_{S \setminus \{s\}}) = P(e_s \mid e_{\mathcal{N}_s})$ • The Hammersley-Clifford theorem [Besag 1974]: Markov Random Field ⇔ Gibbs distribution $P(E = e \mid O = o) \propto \exp(-U(o, e))$ • U(o, e): sum of potential functions defined on cliques, $U(o, e) = \sum_{c \in \mathcal{C}} V_c(o, e)$ • the optimal label configuration minimizes the global energy function: $\hat{e} = \arg\min_{e} U(o, e)$
1- Video pre-analysis – Spatio-temporal segmentation: Spatial regularization • Spatial connectivity • segmented regions are locally homogeneous • Color features • color distributions • Bhattacharyya coefficient between discrete densities • Texture features • texture distributions from 2 spatial gradients (Sobel filters) • Bhattacharyya coefficient
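The colour and texture similarities can be sketched as Bhattacharyya coefficients $BC(p, q) = \sum_u \sqrt{p_u q_u}$ between normalized histograms: a joint histogram of the three colour components, and a 2-D histogram of the horizontal and vertical Sobel responses. The bin counts and helper names are assumptions; the thesis may build its distributions differently.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete densities (1 = identical, 0 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

def color_histogram(block, bins=8):
    """Joint histogram of the three 8-bit colour components of a block (H, W, 3)."""
    hist, _ = np.histogramdd(block.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    return hist.ravel()

def texture_histogram(gray, bins=8):
    """2-D histogram of the horizontal and vertical Sobel responses of a luminance block."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for di in range(3):  # 3x3 correlation over the valid area only
        for dj in range(3):
            patch = g[di:di + h - 2, dj:dj + w - 2]
            gx += kx[di, dj] * patch
            gy += ky[di, dj] * patch
    lim = 4 * 255.0  # largest possible Sobel response for 8-bit samples
    hist, _, _ = np.histogram2d(gx.ravel(), gy.ravel(), bins=bins,
                                range=[(-lim, lim), (-lim, lim)])
    return hist.ravel()
```

A coefficient close to 1 between a site and a neighbouring region favours giving them the same label.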
1- Video pre-analysis – Spatio-temporal segmentation: Temporal regularization • Motion features • distance between the MVs • Temporal connectivity • segmented regions => temporally homogeneous • segmentation map of the previous temporal segment • Region tracking • criteria: color, texture, overlap => video object tracking
1- Video pre-analysis – Spatio-temporal segmentation: Energy minimization • The global energy function: potential functions combined with weighting factors • Sequential processing of the sites • instability stack
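Sequential minimization with an instability stack can be sketched as an ICM-style loop: a site is popped, relabelled if another label lowers its local energy, and its neighbours are pushed back because their own energies have changed. Each relabelling strictly lowers the global energy, so the loop terminates. Assumed here: a precomputed per-label data cost (standing in for the weighted colour, texture and motion potentials) and a Potts smoothness term of weight beta on the 4-neighbourhood; all names are hypothetical.

```python
import numpy as np

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def local_energy(labels, data_cost, beta, i, j, lab):
    """Energy of site (i, j) with label `lab`: data term + Potts penalty per differing neighbour."""
    h, w = labels.shape
    e = data_cost[i, j, lab]
    for di, dj in NEIGHBOURS:
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w and labels[ni, nj] != lab:
            e += beta
    return e

def minimize(labels, data_cost, beta=1.0, max_updates=100000):
    labels = labels.copy()
    h, w = labels.shape
    stack = [(i, j) for i in range(h) for j in range(w)]
    in_stack = set(stack)
    updates = 0
    while stack and updates < max_updates:
        i, j = stack.pop()
        in_stack.discard((i, j))
        energies = [local_energy(labels, data_cost, beta, i, j, l)
                    for l in range(data_cost.shape[2])]
        lab = int(np.argmin(energies))
        if energies[lab] < energies[labels[i, j]]:  # unstable site: relabel it
            labels[i, j] = lab
            updates += 1
            for di, dj in NEIGHBOURS:  # neighbours may have become unstable
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and (ni, nj) not in in_stack:
                    stack.append((ni, nj))
                    in_stack.add((ni, nj))
    return labels
```

An isolated misclassified site surrounded by a homogeneous region is relabelled as soon as the smoothness term outweighs its data preference.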
1- Video pre-analysis – Spatio-temporal segmentation: Results • motion segmentation only vs. regularized spatio-temporal segmentation
1- Video pre-analysis – Visual attention modeling: Spatial saliency • Spatial saliency based on color contrast [Aziz 2008] • color transformation: YUV to HSV • color features influencing visual attention: 1- Saturation contrast 2- Intensity contrast 3- Hue contrast 4- Opponent-color contrast 5- Warm and cold colors contrast 6- Dominance of warm colors 7- Dominance of luminance and saturation • Spatial saliency SSP => combination of these 7 features
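The YUV to HSV transformation can be sketched per pixel by going through RGB and using Python's `colorsys`. Assumed: full-range 8-bit samples and BT.709 coefficients (the HDTV colour matrix); the source does not state which matrix or range was used.

```python
import colorsys

def yuv_to_hsv(y, u, v):
    """Convert one full-range 8-bit YUV (Y'CbCr) pixel to HSV, each component in [0, 1].

    Assumes BT.709 coefficients and full-range samples."""
    yf = y / 255.0
    cb = (u - 128) / 255.0
    cr = (v - 128) / 255.0
    r = yf + 1.5748 * cr
    g = yf - 0.1873 * cb - 0.4681 * cr
    b = yf + 1.8556 * cb

    def clamp(c):
        return min(1.0, max(0.0, c))  # out-of-gamut values are clipped to [0, 1]

    return colorsys.rgb_to_hsv(clamp(r), clamp(g), clamp(b))
```

The saturation, value and hue channels then feed the contrast features listed above.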
1- Video pre-analysis – Visual attention modeling: Temporal saliency • Temporal saliency based on the relative motion $\vec{v}_r(s) = \vec{v}(s) - \vec{v}_{\Theta}(s)$, where $\vec{v}(s)$ is the MV of site s and $\vec{v}_{\Theta}(s)$ the dominant motion at s • bounded by the maximum velocity of smooth pursuit of the eye [Daly 1998]: 80°/s • => temporal saliency ST
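A temporal saliency of this kind can be sketched by converting the relative motion from pixels per frame to degrees per second, then saturating it at the 80°/s smooth-pursuit limit. Assumed: a viewing distance of 3 picture heights, 50 frames/s, and a linear normalization to [0, 1]; the exact mapping used in the thesis is not given on the slide.

```python
import math

def pixels_per_degree(height_px=1080, distance_in_heights=3.0):
    """Angular resolution of the display for a viewing distance given in picture heights."""
    fov = 2 * math.degrees(math.atan(0.5 / distance_in_heights))  # vertical field of view
    return height_px / fov

def temporal_saliency(mv, dominant, fps=50.0, ppd=None, v_max=80.0):
    """mv, dominant: (vx, vy) in pixels per frame. Returns a value in [0, 1].

    The relative speed (degrees per second) is clipped at v_max, the maximum
    smooth-pursuit velocity, then divided by v_max (assumed linear law)."""
    if ppd is None:
        ppd = pixels_per_degree()
    rx, ry = mv[0] - dominant[0], mv[1] - dominant[1]
    speed = math.hypot(rx, ry) * fps / ppd
    return min(speed, v_max) / v_max
```

A site moving with the dominant (camera) motion gets zero temporal saliency; under these assumptions, about 91 pixels per frame of relative motion already reaches the 80°/s ceiling.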
1- Video pre-analysis – Visual attention modeling: Spatio-temporal saliency • Fusion of the spatial and temporal saliency maps • Observers tend to focus on the center of the screen [Le Meur 2005] • weighting by a 2-D Gaussian function
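The fusion and centre weighting can be sketched as follows. The fusion rule here is a plain weighted sum of the min-max normalized maps, which is an assumption (the slide does not give the rule); the Gaussian width is also a hypothetical parameter.

```python
import numpy as np

def normalize(m):
    """Min-max normalization to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def center_gaussian(h, w, sigma_frac=0.25):
    """2-D Gaussian centred on the screen, peak value 1; sigma is a fraction of each dimension."""
    ys = (np.arange(h) - (h - 1) / 2) / (sigma_frac * h)
    xs = (np.arange(w) - (w - 1) / 2) / (sigma_frac * w)
    return np.exp(-0.5 * (ys[:, None] ** 2 + xs[None, :] ** 2))

def spatiotemporal_saliency(s_sp, s_t, alpha=0.5, sigma_frac=0.25):
    """Weighted sum of the spatial and temporal maps, then centre-bias weighting (assumed rule)."""
    fused = alpha * normalize(s_sp) + (1 - alpha) * normalize(s_t)
    return fused * center_gaussian(*fused.shape, sigma_frac=sigma_frac)
```

Two equally salient points thus end up with different final saliency depending on their distance to the screen centre.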
1- Video pre-analysis – Visual attention modeling: Results
1- Video pre-analysis: Possible applications • Video pre-analysis information • moving object segmentation, object tracking • color, texture • salient regions • Applications • advanced video coding • video transmission with priority (saliency maps) • video summarization, indexing • … • ArchiPEG (ANR project) • real-time HD MPEG-4 AVC compression • video pre-analysis as a resource
Outline • Video pre-analysis 1.1 Advanced motion estimation 1.2 Spatio-temporal segmentation 1.3 Visual attention modeling • Applications: H.264 video coding 2.1 GOP structure adaptation 2.2 Adaptive quantization
2- Applications: H.264 video coding – GOP structure adaptation: GOP structure • Three kinds of frames: I, P, B • a GOP begins with an intra-coded I frame • predicted P frames at regular intervals • bi-predicted B frames between P frames • Fixed interval between I frames • not adapted to scene changes and temporal variations of the video => more bits • dynamic GOP size: irregular I-frame insertion • Typically 1 or 2 B frames: a good trade-off between bit rate and quality • low motion or camera panning • increase the number of B frames
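The regular structure described above (one I frame, then a P frame every n_b + 1 frames with n_b B frames in between) can be sketched as a display-order pattern. The function name is hypothetical; real encoders such as x264 also reorder frames for transmission, which is not shown.

```python
def gop_pattern(gop_size=25, n_b=2):
    """Frame types of one GOP in display order."""
    types = []
    for n in range(gop_size):
        if n == 0:
            types.append("I")  # intra-coded frame opening the GOP
        elif n % (n_b + 1) == 0:
            types.append("P")  # predicted frame at a regular interval
        else:
            types.append("B")  # bi-predicted frame between anchors
    return "".join(types)
```

For example, `gop_pattern(7, 2)` gives `"IBBPBBP"`; with the default 25-frame GOP and 2 B frames there are 1 I, 8 P and 16 B frames.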
2- Applications: H.264 video coding – GOP structure adaptation: B frames adaptation (1) • Analysis of video sequences • x264 encoder • different fixed numbers of B frames: 0, 1, 2, 3 • the optimal number of B frames is content dependent • => classify videos according to their content
2- Applications: H.264 video coding – GOP structure adaptation: B frames adaptation (2) • Spatio-temporal characterization: 2 indices to evaluate the spatio-temporal activity - IT: temporal activity, from the MVs - IS: spatial activity, from MSE_G • computed for each temporal segment and for the entire sequence
2- Applications: H.264 video coding – GOP structure adaptation: B frames adaptation (3) • Classification space as a function of IT and IS • class Ci => i B frames between P-P or I-P frames • IT constant between P-P or I-P frames • same rule for IS
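The indices and the class decision can be sketched per temporal segment. Everything below is an assumption for illustration: IT is taken as the mean MV magnitude of the tubes, IS as their mean MSE_G, and the thresholds that split the (IT, IS) space into classes C0 to C3 are hypothetical values, not those of the thesis.

```python
import math

def temporal_activity(mvs):
    """IT of a temporal segment, taken here as the mean MV magnitude of its tubes."""
    return sum(math.hypot(vx, vy) for vx, vy in mvs) / len(mvs)

def spatial_activity(mse_g):
    """IS of a temporal segment, taken here as the mean MSE_G of its tubes."""
    return sum(mse_g) / len(mse_g)

def b_frame_class(it, is_, it_thresholds=(2.0, 6.0, 12.0), is_limit=50.0):
    """Return i for class C_i, i.e. the number of B frames (hypothetical thresholds)."""
    n_b = 3 - sum(it >= t for t in it_thresholds)  # 3 B frames at low motion, 0 at high motion
    if is_ > is_limit and n_b > 0:  # poorly predicted content: one B frame fewer
        n_b -= 1
    return n_b
```

For instance, a segment with IT = 7 and a low IS falls in C1 (one B frame) under these thresholds; the sequence-level indices would simply average the segment values.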
2- Applications: H.264 video coding – GOP structure adaptation: GOP size adaptation (1) • Change detection within a video shot • high motion: significant changes => reduce the interval • low motion: little variation => increase the interval • mid-range motion: classical approach => fixed GOP size • 2 thresholds to detect critical changes: - sh => high motion - sb => low motion
2- Applications: H.264 video coding – GOP structure adaptation: GOP size adaptation (2) • Analysis of the evolution of IT: 3 cases (figure panels: mid-range motion, high motion, low motion)
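The two-threshold rule can be sketched as choosing the I-frame interval from IT: above sh use a short interval, below sb a long one, otherwise the classical fixed size. Assumed: I frames are only placed on 9-frame segment boundaries, and the threshold and interval values (9, 27, 54 frames) are hypothetical.

```python
def gop_interval(it, s_h, s_b, short_gop, default_gop, long_gop):
    """Target I-frame interval for a segment with temporal activity `it`."""
    if it > s_h:
        return short_gop  # high motion: significant changes, reduce the interval
    if it < s_b:
        return long_gop  # low motion: little variation, increase the interval
    return default_gop  # mid-range motion: classical fixed GOP size

def i_frame_positions(it_per_segment, seg_len=9, s_h=8.0, s_b=1.0,
                      short_gop=9, default_gop=27, long_gop=54):
    """Frame indices of the I frames, one IT value per consecutive temporal segment."""
    positions = [0]
    for n, it in enumerate(it_per_segment):
        start = n * seg_len
        if start - positions[-1] >= gop_interval(it, s_h, s_b, short_gop, default_gop, long_gop):
            positions.append(start)
    return positions
```

A sudden rise of IT therefore triggers an early I frame, while a static shot keeps a long GOP.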
2- Applications: H.264 video coding – GOP structure adaptation: Performance • 8 video sequences • 4 different bit rates defined by an expert group • Comparison between • the x264 encoder: GOP size = 25, 2 B frames • a modified version with GOP structure adaptation
2- Applications: H.264 video coding – GOP structure adaptation: Results • Rate-distortion (PSNR) [Bjontegaard 2001]
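The Bjontegaard comparison can be sketched as the average PSNR gap (BD-PSNR) between two rate-distortion curves: each curve is fitted with a cubic polynomial of PSNR versus log rate, and the difference of the integrals over the common log-rate interval is divided by its length. This is a generic sketch of the metric, not the exact script used for the results; the names are hypothetical.

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta PSNR in dB (positive: the test codec is better)."""
    lr_ref, lr_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(lr_ref, psnr_ref, 3)  # cubic fit of PSNR vs log rate
    p_test = np.polyfit(lr_test, psnr_test, 3)
    lo = max(lr_ref.min(), lr_test.min())  # overlapping log-rate interval
    hi = min(lr_ref.max(), lr_test.max())
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    return (area_test - area_ref) / (hi - lo)
```

With the 4 bit rates per sequence used here, the cubic passes exactly through the 4 measured points.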
2- Applications: H.264 video coding – GOP structure adaptation: Subjective tests • Setup • display resolution 1920x1080 • standardized viewing room [BT.500-11] • ~30 naïve observers • 72 video sequences (8 × 4 × 2 + 8) • Methodology: ACR (Absolute Category Rating) • for each sequence, observers rate its quality
2- Applications: H.264 video coding – GOP structure adaptation: Results • QGOP: MOS of the modified coder • Qx264: MOS of the x264 coder • sequences with a high IT value (high motion) benefit from GOP structure adaptation
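The MOS values such as QGOP and Qx264 can be sketched as the mean of the observers' ACR scores with a 95% confidence interval of half-width 1.96·S/√N, the usual BT.500 form. The function name is hypothetical and at least two scores are required.

```python
import math

def mos_ci95(scores):
    """Mean opinion score and 95% confidence half-width (1.96 * S / sqrt(N))."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))  # sample standard deviation
    return mean, 1.96 * std / math.sqrt(n)
```

Comparing the two coders then amounts to checking whether QGOP - Qx264 exceeds the confidence intervals.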
2- Applications: H.264 video coding – Adaptive quantization: Adaptive quantization • Objective • control the distribution of bit resources using the saliency maps • increase the perceived visual quality • Modification of the saliency maps: quantization and morphological filtering • Modification of the coder
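The saliency-driven quantization can be sketched as: quantize a per-macroblock saliency map into a few classes, apply a 3x3 grey-level dilation as the morphological filter, and map the classes linearly onto QP offsets so that salient macroblocks receive a lower QP. The number of classes, the dilation, the offset range and the linear law are all assumptions; the slide only names the two processing steps.

```python
import numpy as np

def saliency_to_qp_offsets(saliency, levels=4, max_offset=4):
    """Map a macroblock saliency map in [0, 1] to integer QP offsets (hypothetical law)."""
    s = np.clip(np.asarray(saliency, dtype=float), 0.0, 1.0)
    q = np.minimum((s * levels).astype(int), levels - 1)  # quantization into classes
    padded = np.pad(q, 1, mode="edge")
    h, w = q.shape
    # 3x3 grey-level dilation: each macroblock takes the highest class in its neighbourhood
    dil = np.max([padded[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)], axis=0)
    # linear law: class 0 -> +max_offset (fewer bits), top class -> -max_offset (more bits)
    return max_offset - np.round(2 * max_offset * dil / (levels - 1)).astype(int)
```

The dilation enlarges salient regions slightly, so that their borders are not coded more coarsely than their centre.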
2- Applications: H.264 video coding – Adaptive quantization: Results (1) • Rate-distortion (PSNR) [Bjontegaard 2001]
2- Applications: H.264 video coding – Adaptive quantization: Subjective assessments • Results • QQA: MOS of the modified coder (adaptive quantization) • Qx264: MOS of the x264 coder • no specific suitable content • unsuitable for coding and broadcasting HDTV at high bit rates • bit-rate overhead; is a linear adaptation law appropriate?
Conclusion (1) • Video pre-analysis • spatio-temporal segmentation • detection of moving objects • object tracking • visual attention modeling • saliency maps • Applications • advanced video coding • video transmission with priority based on the saliency maps [Boulos 2010] • video summarization, indexing • …
Conclusion (2) • Applications of the video pre-analysis • GOP structure adaptation • dynamic variation of the number of B frames • temporal segment classification using IT and IS • GOP size adaptation • I-frame insertion by change detection on IT • Adaptive quantization based on the saliency maps
Conclusion (3) • Subjective quality assessment tests • GOP structure adaptation: no significant differences • +0.18 (on a scale of 1 to 5) • well suited for sequences with high motion • Adaptive quantization • no content is clearly suitable; seems unsuitable for coding and broadcasting HDTV at high bit rates … the adaptation law could be modified …
Conclusion: Perspectives • Better performance evaluation of our visual attention model • eye-tracking experiments • Psychophysical experiments to optimize the model parameters and improve the fusion process [Marat 2010] • Add high-level visual information: faces, skin hue, …
Thank you. Questions?