On Perceptual Coding: Quality, Content Features and Complexity
Patrick Le Callet, Université de Nantes
Video Coding @ Machine (Deep) Learning Era (1)

STANDARD PLAYGROUND
• Hybrid approach: symbolic, statistical and/or deep learning, fully compatible with existing codecs
• Models that can predict:
• Optimal/ad hoc transforms: S. Puri, S. Lasserre, and P. Le Callet, "Annealed learning based block transforms for HEVC video coding", ICASSP 2016
• Optimal syntax prediction/signaling: S. Puri, S. Lasserre, and P. Le Callet, "CNN-based transform index prediction in multiple transforms framework to assist entropy coding", EUSIPCO 2017, pp. 798-802
• Ad hoc video quality measures…

DISRUPTIVE PLAYGROUND
• Fully deep autoencoders, GANs…
• Geometric deep learning
Video Coding @ Machine (Deep) Learning Era (2)

A hyperspace of possibilities: codec complexity, content diversity, viewing experience
• Content type (PGC, UGC…)
• New viewing experiences (technology push): HDR/WCG, VR/AR, FTV
• New distortions (e.g., codecs)
• Distortion level
Video Coding @ Machine (Deep) Learning Era (3): STANDARD PLAYGROUND
• Hybrid approach: symbolic, statistical and deep learning
• Models that can predict:
• Optimal/ad hoc transforms
• Optimal syntax prediction/signalling
• Ad hoc video quality measures…

OUR CURRENT FOCUS
• Rate / Distortion / Complexity Optimisation (RDCO)
• UGC encoding recipes
• Pre-processing / coding optimisation (PPCO)
• Ad hoc testing methodologies boosted by AI
• Video quality measures for context-resilient coding
• Characterizing content
Local vs Global: Video Quality Assessment? …a matter of use case
• Use case: how the media is exploited
• Quality range
• Optimize a system (e.g., display proc. 1 vs display proc. 2)
• Benchmark systems ("codec A vs codec B")
VIDEO QUALITY ASSESSMENT @ MACHINE LEARNING ERA
• BOOSTING SUBJECTIVE TESTS: active sampling
• IMPROVING METRIC LEARNING: active sampling
• SMART DATA vs BIG DATA: data augmentation; full-reference metrics as annotation => global or local
CONTENT CHARACTERIZATION TOWARDS RATE DISTORTION COMPLEXITY OPTIMISATION (RDCO)
Content Influence [motion search, CU size, depth…]
A. Aldahdooh, M. Barkowsky, and P. Le Callet, "The impact of complexity in the rate-distortion optimization: A visualization tool", IWSSIP 2015
Learning Content Features
• 144 features: spatial and temporal, luma and chroma
• Motion range prediction (HM / QP 32)
• Predicting block size (x265 / QP 32)
A. Aldahdooh, M. Barkowsky, and P. Le Callet, "The impact of complexity in the rate-distortion optimization: A visualization tool", IWSSIP 2015
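The slide does not list the 144 features; as an illustration of the kind of spatio-temporal luma features involved, here is a minimal sketch of the classic ITU-T P.910 spatial and temporal information measures (the exact feature set used in the work is not specified here):

```python
import numpy as np
from scipy import ndimage

def spatial_information(luma):
    """ITU-T P.910 Spatial Information of one frame: standard deviation
    of the Sobel-filtered luma plane."""
    gx = ndimage.sobel(luma.astype(float), axis=1)
    gy = ndimage.sobel(luma.astype(float), axis=0)
    return float(np.std(np.hypot(gx, gy)))

def temporal_information(frames):
    """ITU-T P.910 Temporal Information: maximum over time of the
    standard deviation of luma frame differences."""
    diffs = np.diff(np.asarray(frames, dtype=float), axis=0)
    return float(max(np.std(d) for d in diffs))
```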
CONTENT CHARACTERISATION TOWARDS USER GENERATED CONTENT (UGC) ENCODING RECIPES
Exploring the Characteristics of UGC
• Map any uploaded UGC that has the same encoding characteristics (similar R-D curves obtained with given codecs) to already-known UGC
• Predict UGC encoding characteristics (R-D category) from content characteristics?
• Distance between R-D curves: BD-Rate/Distortion/Quality (computed as in the sketch below)
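For reference, the BD-rate distance between two R-D curves can be computed as in this sketch (the standard Bjøntegaard calculation; the quality axis may be PSNR or another quality score, matching the BD-Rate/Distortion/Quality variants above):

```python
import numpy as np

def bd_rate(rates_ref, qual_ref, rates_test, qual_test):
    """Bjontegaard delta rate: average bitrate difference (%) between two
    R-D curves at equal quality. Positive means the test condition needs
    more bits than the reference for the same quality."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Fit cubic polynomials of log-rate as a function of quality.
    p_ref = np.polyfit(qual_ref, lr_ref, 3)
    p_test = np.polyfit(qual_test, lr_test, 3)
    # Integrate over the overlapping quality interval only.
    lo = max(min(qual_ref), min(qual_test))
    hi = min(max(qual_ref), max(qual_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```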
BD-Rate/PSNR based clustering
Different colors represent different R-D related categories.
• AVC/H.264: encoded with fixed QP values of 20, 22, 25, 27, 32, 36, 41, 46, and 51; clustering based on BD-rate
• VP9: encoded with fixed CRF values of 20, 22, 25, 27, 32, 36, 41, 46, and 51; clustering based on BD-rate
Clustering based on BD-Rate/Quality + feature selection: explainable AI
Pipeline (sketched in code below):
• Feature extraction => content features
• BD-Rate/Quality based clustering => R-D labels (Facebook, YouTube)
• Feature selection => selected content feature set
• R-D curve related classification => predicted R-D related labels
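A minimal sketch of the diagrammed pipeline. The specific model choices (agglomerative clustering on the BD-rate distance matrix, ANOVA-based feature selection, a random-forest classifier) are illustrative assumptions, not necessarily the models used in the work:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

def rd_cluster_and_classify(bd_dist, content_features, k_clusters=4, k_features=20):
    """bd_dist: square matrix of pairwise BD-rate distances between contents.
    content_features: per-content feature matrix (codec-independent).
    Returns R-D cluster labels, the fitted feature selector, and a
    classifier predicting the R-D category from content features alone."""
    # 1) Cluster contents by R-D behaviour (requires scikit-learn >= 1.2
    #    for the `metric` keyword; older versions used `affinity`).
    labels = AgglomerativeClustering(
        n_clusters=k_clusters, metric="precomputed", linkage="average"
    ).fit_predict(bd_dist)
    # 2) Keep the content features that best explain the clusters
    #    (this is also where the explainability comes from).
    selector = SelectKBest(f_classif, k=k_features).fit(content_features, labels)
    # 3) Predict the R-D category of unseen contents without encoding them.
    clf = RandomForestClassifier().fit(selector.transform(content_features), labels)
    return labels, selector, clf
```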
THE NEED FOR CONTENT CHARACTERISATION …and GOOD (NEW?) VIDEO QUALITY MEASURES FOR PREPROCESSING/CODING OPTIMIZATION (PPCO)
Pre-Processing (Content-Based) – Coding Evaluation
Pipeline: source video => HVSPP (HVS-based perceptual pre-processing) => encode => transmit (network) => decode => reconstructed HVSPP video, compared against the reconstructed original video. Compare performance (HOW?)
• Need an accurate quality estimator at given operating points (not global)
• Check the pre-processing/encoding efficiency for multiple viewing conditions
• Subjective tests as ground truth: paired comparison (PC) has higher discriminatory power
• Image quality estimators: often not optimized for different viewing conditions
M. Bhat, J.-M. Thiesse, P. Le Callet, "HVS based perceptual pre-processing for video coding", EUSIPCO 2019
M. Bhat, J.-M. Thiesse, P. Le Callet, "On Accuracy of Objective Metrics for Assessment of Perceptual Pre-Processing for Video Coding", ICIP 2019
[Figure: bit-rate savings at the same quality (-ΔRmax, -ΔRmean, -ΔRmin) for observers at 3H and 4.5H]
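A sketch of how such per-operating-point bit-rate savings can be computed from two rate-quality curves (interpolation-based; function and variable names are illustrative):

```python
import numpy as np

def rate_saving_at_quality(rates_ref, qual_ref, rates_test, qual_test, q):
    """Bitrate difference (%) of the test pipeline vs. the reference at one
    quality level q, interpolating log-rate as a function of quality.
    Both quality arrays are assumed sorted in ascending order. Negative
    values are savings (-dR); evaluating over a quality range and taking
    min/mean/max yields -dRmin, -dRmean, -dRmax."""
    lr_ref = np.interp(q, qual_ref, np.log(rates_ref))
    lr_test = np.interp(q, qual_test, np.log(rates_test))
    return (np.exp(lr_test - lr_ref) - 1) * 100
```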
THE NEED FOR CONTENT CHARACTERISATION …and GOOD (NEW?) VIDEO QUALITY MEASURES FOR DYNAMIC CODING
CODECs and Dynamic Coding
Selecting optimal resolution & frame rate: content dependency
[Figure: quality vs bit rate curves]
CODECs and Dynamic Coding
[Figure: quality vs bit rate, with -ΔRmax, -ΔRmean, -ΔRmin]
• Confidence in the subjective score? Confidence in the video quality metric? a.k.a. metric resolution
Objective Video Quality Measures: native spaces?
• VQMs predict a quality score on a scale, and are validated with subjective data obtained on a scale too (ACR, DSIS, SSCQE, DSCQS, SAMVIQ…)
• Figure of merit of a VQM: correlation coefficient?
Michel Saad, Patrick Le Callet and Phil Corriveau, "Blind Image Quality Assessment: Unanswered Questions and Future Directions in the Light of Consumer Needs", 2nd VQEG eLetter, 2015
MEANINGFUL METRIC? We want a metric to…
• …say whether A is of better quality than B
• => a PAIRWISE problem that should be addressed as such
[Example pairs: OM = 0.80 vs 0.75, |ΔOM| = 0.05; OM = 0.80 vs 0.45, |ΔOM| = 0.35; for one pair, ΔOM = +0.35 yet ΔOM is irrelevant]
VIDEO CODING and QUALITY ASSESSMENT: What do we need?
GETTING MORE RELIABLE DATA IN NARROW QUALITY RANGE => improving confidence of GROUND TRUTH
MEANINGFUL METRIC? We want a metric to…
• …give close scores for qualitatively similar pairs and distant scores for significantly different pairs
• …give the higher score to the significantly preferred stimulus
[Example pairs: one where the left image is preferred with statistical significance, one with no significant difference in preferences; OM values 0.80 vs 0.75 (|ΔOM| = 0.05) and 0.80 vs 0.45 (|ΔOM| = 0.35, ΔOM = +0.35)]
PAIR COMPARISON (A/B) test design and analysis
Ground truth for video quality measure development:
• Conversion to scale values is possible using Bradley-Terry or Thurstone-Mosteller models
• The goal: mapping probabilities of preference to a scale => linear models of paired comparisons
• Each stimulus Ai has a merit Vi: in psychophysics, a sensation magnitude on a scale (a fitting sketch follows)
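A minimal sketch of recovering the merits Vi from a matrix of pairwise preference counts with the Bradley-Terry model (Hunter's MM updates; assumes a connected comparison graph):

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=500, tol=1e-9):
    """Estimate Bradley-Terry merits from a pairwise preference matrix.
    wins[i, j] = number of times stimulus Ai was preferred over Aj.
    Uses the classic MM updates (Hunter, 2004). Returns log-merits,
    which live on an interval scale."""
    n = wins.shape[0]
    v = np.ones(n)
    n_ij = wins + wins.T                 # comparisons per pair
    w = wins.sum(axis=1)                 # total wins per stimulus
    for _ in range(n_iter):
        denom = n_ij / (v[:, None] + v[None, :])
        np.fill_diagonal(denom, 0.0)
        v_new = w / denom.sum(axis=1)
        v_new /= v_new.sum()             # merits are defined up to a scale
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return np.log(v)
```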
Boosting PC tests: Adaptive Square Design (ASD), an ITU and IEEE standard
For the scenario where the ranking order of the test stimuli is not available:
1. Initialize the square matrix randomly.
2. Run paired comparisons according to the rules of the square design.
3. Calculate the estimated scores (B-T model): from the current paired comparison results, calculate the scores and sort them.
4. Update the square matrix: adjacent pairs can be rearranged along a spiral.
5. Repeat steps 2 to 4 until certain conditions are satisfied (e.g., 40 observers).
[Diagram: the square matrix is rearranged between rounds of pair comparisons; final result (PoE)]
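A simplified sketch of this loop, assuming a `run_comparison(i, j)` callback that returns the current observer's preference (the real method uses B-T scoring and a spiral re-arrangement; simple win ratios and rank ordering stand in for them here):

```python
import numpy as np

def adaptive_square_design(n_stimuli, run_comparison, n_rounds=10, seed=0):
    """run_comparison(i, j) is an assumed callback returning True when
    stimulus i is preferred over stimulus j."""
    rng = np.random.default_rng(seed)
    side = int(np.ceil(np.sqrt(n_stimuli)))
    order = rng.permutation(n_stimuli)           # step 1: random square
    wins = np.zeros((n_stimuli, n_stimuli))
    for _ in range(n_rounds):                    # step 5: repeat
        grid = np.full(side * side, -1, dtype=int)
        grid[:n_stimuli] = order
        grid = grid.reshape(side, side)
        # Step 2: compare horizontally/vertically adjacent cells.
        for r in range(side):
            for c in range(side):
                for dr, dc in ((0, 1), (1, 0)):
                    rr, cc = r + dr, c + dc
                    if rr < side and cc < side and grid[r, c] >= 0 and grid[rr, cc] >= 0:
                        i, j = grid[r, c], grid[rr, cc]
                        if run_comparison(i, j):
                            wins[i, j] += 1
                        else:
                            wins[j, i] += 1
        # Step 3: estimate scores (win ratio as a stand-in for the B-T fit).
        scores = wins.sum(1) / np.maximum((wins + wins.T).sum(1), 1)
        # Step 4: re-arrange so similarly ranked stimuli become neighbours
        # (rank order approximates the spiral placement of the full method).
        order = np.argsort(-scores)
    return scores
```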
Active sampling for pairwise comparison (NIPS 2018)
Batch selection: active learning according to Bayesian theory, KL divergence, Expected Information Gain (EIG), and a Minimum Spanning Tree (MST)
[Figure: a minimum spanning tree used to select a batch of pairs]
J. Li et al., "Hybrid-MST: A Hybrid Active Sampling Strategy for Pairwise Preference Aggregation", NIPS 2018
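A sketch of the idea: estimate the expected information gain of each candidate pair under the current posterior, then use a minimum spanning tree to pick an informative, connected batch. This is a simplification of Hybrid-MST; the Gaussian-posterior EIG below is an assumption:

```python
import numpy as np
from scipy.stats import norm
from scipy.sparse.csgraph import minimum_spanning_tree

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def pair_eig(mu_i, mu_j, var_i, var_j, n_samples=512, rng=None):
    """Monte-Carlo expected information gain of comparing stimuli i and j,
    assuming Gaussian posteriors over their merits and a Thurstone-style
    outcome model (mutual information between outcome and merits)."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = rng.normal(mu_i - mu_j, np.sqrt(var_i + var_j), n_samples)
    p = norm.cdf(d)                      # P(i preferred | sampled merits)
    return binary_entropy(p.mean()) - binary_entropy(p).mean()

def select_batch(mu, var):
    """Weight the complete graph so that low weight = high EIG, then keep
    the minimum spanning tree: a maximally informative tree of pairs."""
    n = len(mu)
    eig = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            eig[i, j] = pair_eig(mu[i], mu[j], var[i], var[j])
    weights = np.triu(eig.max() - eig + 1e-9, k=1)   # low weight = high EIG
    mst = minimum_spanning_tree(weights)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))
```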
VIDEO CODING and QUALITY ASSESSMENT: What do we need?
GETTING MORE RELIABLE DATA IN NARROW QUALITY RANGE => improving confidence of GROUND TRUTH
VALIDATING OBJECTIVE IMAGE QUALITY PREDICTORS …REVISIT BENCHMARKING
Objective Video Quality Measures: native spaces?
• VQMs predict a quality score on a scale, and are validated with subjective data obtained on a scale too (ACR, DSIS, SSCQE, DSCQS, SAMVIQ…)
• Figure of merit of a VQM: RMSE? => need for mapping => what meaning regarding the accuracy of the MOS?
VQM native spaces?
Most VQMs predict a quality score on a scale, and are validated with subjective data obtained on a scale too (ACR, DSIS, SSCQE, DSCQS, SAMVIQ…)
=> alternative: pair comparison
• Dedicated framework: Hanhart et al., QoMEX 2016 …not a native space of VQMs
Alternative analysis in a native space (Krasula et al., QoMEX 2016)
• Takes the confidence of subjective scores into account
• No mapping of objective scores to a common subjective scale
(a sketch of the two analysis tasks follows)
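A sketch of the two classification tasks of this framework, operating directly on native metric outputs (array names are illustrative; assumes both classes occur in the data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def native_space_analysis(d_om, significant, preferred_sign):
    """d_om:          OM(A) - OM(B) for each evaluated pair (A, B)
    significant:      True where the subjective preference is statistically
                      significant (e.g., from a paired comparison test)
    preferred_sign:   +1 if A is significantly preferred, -1 if B is
                      (used only where `significant` is True)
    Returns the AUCs of the two tasks."""
    # Task 1, different vs. similar: can |dOM| separate significantly
    # different pairs from statistically tied ones?
    auc_different = roc_auc_score(significant.astype(int), np.abs(d_om))
    # Task 2, better vs. worse: among significant pairs, does the sign
    # of dOM agree with the subjective preference?
    m = significant
    auc_better = roc_auc_score((preferred_sign[m] > 0).astype(int), d_om[m])
    return auc_different, auc_better
```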
CODECs and Dynamic Coding
[Figure: quality vs bit rate, with -ΔRmax, -ΔRmean, -ΔRmin]
• Confidence in the subjective score? Confidence in the video quality metric? a.k.a. metric resolution
Per Quality Range Analysis
Native output of VQM/OM (no mapping)
[Figure: analysis repeated per MOS range: [1-2], [2-3], [3-4], [4-5]]
Per Quality Range Analysis (per MOS range: [1-2], [2-3], [3-4], [4-5]; [4-5] = broadcast quality)
• All VQMs challenged, AUC ≈ 0.5 (HDR-VDP-2.2 slightly better)
• VIF-PU 68%, HDR-VDP 59%, PSNR/SSIM < 50%
• All VQMs ≈ 50%
• All VQMs ≈ 50% (PSNR worst, 45%)
• HDR-VDP 73%, VIF-PU 66%, SSIM 64%, PSNR 60%
Training objective metrics on multiple databases
• 20 databases, 3,606 videos
• 6 different subjective procedures: ACR, DSIS, SSCQE, SAMVIQ, and PC
• Proof of concept: a shallow NN with the methodology as a cost function; conservative training strategy
• Significant improvement in overall metric performance
Krasula et al., "Training Objective Image and Video Quality Estimators Using Multiple Databases", IEEE Transactions on Multimedia, 2019
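A sketch of the "methodology as a cost function" idea: train a shallow network so that, within each database, significantly preferred stimuli get higher scores and statistically tied pairs get close scores (PyTorch; the architecture and loss details are assumptions, not the paper's exact setup):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowQualityNet(nn.Module):
    """A shallow network mapping per-stimulus features (e.g., outputs of
    existing objective metrics) to one quality score."""
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                 nn.Linear(16, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def pairwise_cost(model, x_a, x_b, pref):
    """Cost over pairs drawn within one database: pref = +1/-1 for a
    statistically significant preference for A/B, pref = 0 for tied pairs.
    Assumes both pair types are present in the batch."""
    d = model(x_a) - model(x_b)
    sig = pref != 0
    order_loss = F.softplus(-pref[sig] * d[sig]).mean()   # respect preferences
    tie_loss = (d[~sig] ** 2).mean()                      # keep tied pairs close
    return order_loss + tie_loss
```

Because only within-database pair relations enter the cost, no alignment of the different subjective scales across databases is needed.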
Local vs Global: Video Quality Assessment? …a matter of use case
• Use case: how the media is exploited
• Quality range
• Optimize a system (e.g., display proc. 1 vs display proc. 2)
• Benchmark systems ("codec A vs codec B")
A hyperspace of possibilities: more (meta)dimensions
• Content type
• New viewing experiences (technology push)
• Distortion level
• IDIOSYNCRASY
• CONTEXT?
TMO and Preference
[Figure: tone-mapped LDR images evaluated with and without the HDR reference]
L. Krasula, M. Narwaria, K. Fliegel and P. Le Callet, "Influence of HDR Reference on Observers Preference in Tone Mapped Images Evaluation", Seventh International Workshop on Quality of Multimedia Experience (QoMEX), May 2015
Context: the importance of viewing conditions
B. Watson: "Viewing distance should be an input parameter of the metric" & the importance of the display
M. Carnec, P. Le Callet and D. Barba, "Objective quality assessment of color images based on a generic perceptual reduced reference", Signal Processing: Image Communication, 2006
Perceptual-based approach: a closer look
Perceptual error maps: error normalized by the visibility threshold
Pipeline: reference and distorted images => from bits to visual units (using viewing distance and a display model) => visual differences & normalization => pooling => quality score
• Pooling dimensions: component (color), orientation, frequency, spatial, temporal
• Modeling the annoyance / the formation of the quality judgement
• From quantified visible errors to their impact on visual quality
(a minimal sketch of this pipeline follows)
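A minimal sketch of the error-visibility pipeline above, from visibility-normalized per-channel error maps to a pooled score (Minkowski pooling; the exponent and the band decomposition are assumptions):

```python
import numpy as np

def perceptual_error_score(diff_bands, jnd_bands, beta=4.0):
    """diff_bands / jnd_bands: lists of same-shape arrays, one per
    perceptual channel (component, orientation, frequency band).
    Errors are first expressed in visibility units (1.0 = one JND),
    then pooled over space and over channels with a Minkowski sum."""
    per_band = []
    for d, t in zip(diff_bands, jnd_bands):
        visibility = np.abs(d) / t                        # normalized error map
        per_band.append((visibility ** beta).mean() ** (1.0 / beta))
    p = np.asarray(per_band)
    return (p ** beta).mean() ** (1.0 / beta)             # channel pooling
```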
On Perceptual Coding: Quality, Content Features and Complexity
Patrick Le Callet, Université de Nantes
IMAGE ENHANCEMENT USE CASE
The no-reference case: ad hoc methods and testing methodologies
• Range effect
• Overshoot effect
Michel Saad, Patrick Le Callet and Phil Corriveau, "Blind Image Quality Assessment: Unanswered Questions and Future Directions in the Light of Consumer Needs", 2nd VQEG eLetter, 2015