Perceptual Video Distortion Metrics and Coding Dr H. R. Wu Associate Professor Audiovisual Information Processing and Digital Communications (AVIPAC) Monash University, VIC 3800, Australia TEL: +61 3 990-5 3255 or-5 3414 , FAX: +61 3 9905 5146 EMAIL: hrw,mdt@csse.monash.edu.au http://www.csse.monash.edu.au/~hrw H.R. Wu@ISEC.Stanford2K1,
Opening Remark
“Cast a brick into the ring to attract jade.” or “Offer a few commonplace remarks by way of introduction so that others may come up with more valuable opinions and contributions.” -Chinese proverb.
OUTLINE
1. Introduction
  1.1. Half a century’s endeavour
  1.2. Fundamental issues
  1.3. In pursuit of ultimate goals in digital video compression and visual communications
  1.4. A personal perspective
2. Perceptual Video Quality/Impairment Metrics
  2.1. HVS based quality/impairment metrics and measurements
  2.2. A vision model based quality metric (VQR)
  2.3. A vision model based perceptual blocking impairment metric (PBIM)
3. Perceptual image/video coding
4. Concluding Remarks
1. Introduction
  1.1. Half a century’s endeavour
  1.2. Fundamental issues
  1.3. In pursuit of ultimate goals in digital video compression and visual communications
  1.4. A personal perspective
1.1. Half A Century’s Endeavour
• The beginning of digital video coding research is commonly dated to the 1950s [R.J. Clarke]
  • C.C. Cutler’s U.S. patent on DPCM, 1952.
  • C.W. Harrison’s experiments with linear prediction in television, 1952.
  • D.A. Huffman’s paper on a method for the construction of minimum redundancy codes, 1952.
• Earlier pioneering work
  • C.E. Shannon’s monumental work on the mathematical theory of communication, 1948.
  • D. Gabor’s paper on theory of communication, 1946.
  • R.D. Kell’s British patent on the principle of frame difference signal transmission, 1920 [A. Seyler and Z. Budrikis].
1.2. Fundamental Issues
1.2.1. Rate-distortion optimization
• Given a bit-rate budget or bandwidth (constant bit-rate),
  • minimize spatio-temporal (and cross-scale) statistical redundancy and
  • psychovisual redundancy (or irrelevancy)
  • to achieve the best possible perceptual picture quality.
• Given a desired picture quality (constant quality),
  • obtain the lowest possible bit-rate or amount of data.
1.2.2. Theoretical and practical lower bounds
• Theoretical lower bound for lossless image/video coding: Shannon’s entropy.
• Theoretical lower bound for lossy image data coding: a quantitative definition of “psychovisual redundancy”?
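To make the lossless lower bound concrete, here is a minimal Python sketch that estimates the first-order entropy of a symbol sequence. It treats the source as memoryless, which is an assumption: real images have spatial and temporal dependencies, so the achievable bound with a good model can be lower. The sample pixel values are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """First-order (memoryless) entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical 8-pixel scan line with four grey levels
# (probabilities 1/2, 1/4, 1/8, 1/8).
pixels = [0, 0, 0, 0, 1, 1, 2, 3]
print(f"{shannon_entropy(pixels):.2f} bits/pixel")  # prints "1.75 bits/pixel"
```

Under this memoryless assumption, no lossless code can average fewer bits per pixel than the printed value; a fixed-length code for four grey levels would need 2 bits per pixel.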
1.3.1. En route to “superman” in image/video coding: in pursuit of ultimate goals
• Model-based coding
• Shape, context, region/object based coding techniques
• Other coding methods: matching pursuit, fractal coding, POCS
  • coding of image data or transform coefficients, versus
  • coding of transforms or projections.
• Picture restoration as an integral part of compression strategy
  • deblocking
  • deringing
  • deblurring
  • reduction of temporal granular noise, mosquito effects
• HVS factors and perceptual coding
• The works
  • Better transforms, in terms of
    • decorrelation and energy packing efficiencies, minimum MSE using a truncated number of transform coefficients for reconstruction
    • spatial, spatial-frequency, spatio-temporal-frequency localisation
  • Better prediction models, balancing
    • coding the model, and
    • coding of the prediction error or residuals.
  • Smarter/adaptive quantisation algorithms
  • Better motion prediction techniques
  • Fast algorithms and implementations
  • Rate-distortion optimisation
  • Entropy/variable length coding
1.3.2. Achievements in image/video coding
• Statistical redundancy and theoretical lower bound (Shannon’s entropy) for lossless image data coding;
• Entropy/practical variable length coding (Huffman code and arithmetic coding);
• Modelling of natural images (first-order Markov process);
• Optimal or sub-optimal transform coders
  • Karhunen-Loeve transform
  • Cosine transform
• Vector quantisation
• Subband and wavelet transform coding
• Motion prediction and practical motion compensation algorithms;
• Rate-distortion optimisation with MSE;
• Standards: JPEG, ITU-T H.261, H.263, MPEG-1 & -2 & -4, JPEG2000.
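The link between the first-order Markov image model and the two transforms listed above can be illustrated numerically. The sketch below is not taken from the talk: it assumes a unit-variance source with covariance R[i][j] = rho^|i-j|, a block size of 8 and rho = 0.95 (all illustrative choices), and compares the transform coding gain (arithmetic mean over geometric mean of coefficient variances) of the DCT with that of the Karhunen-Loeve transform. The KLT gain uses the closed-form determinant of this particular covariance matrix, so no eigen-decomposition is needed.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def coefficient_variances(basis, rho):
    """Diagonal of C R C^T for the assumed covariance R[i][j] = rho**|i-j|."""
    n = len(basis)
    return [sum(row[i] * row[j] * rho ** abs(i - j)
                for i in range(n) for j in range(n))
            for row in basis]

def coding_gain(variances):
    """Arithmetic mean divided by geometric mean of the coefficient variances."""
    n = len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return (sum(variances) / n) / geometric

def klt_coding_gain(n, rho):
    """KLT gain via det(R) = (1 - rho^2)^(n-1), which holds for this covariance model."""
    return 1.0 / (1.0 - rho * rho) ** ((n - 1) / n)

n, rho = 8, 0.95  # illustrative block size and correlation coefficient
g_dct = coding_gain(coefficient_variances(dct_matrix(n), rho))
g_klt = klt_coding_gain(n, rho)
print(f"DCT: {10 * math.log10(g_dct):.2f} dB, KLT: {10 * math.log10(g_klt):.2f} dB")
```

For highly correlated sources the two gains come out very close, which is one common argument for using the fixed cosine transform in place of the signal-dependent KLT.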
1.3.3. Frustrations in image/video coding
• Traditional methods
  • constant quantisation step-size does not lead to constant perceptual picture quality, nor does constant PSNR/MSE
  • “Whose coder provides better visual performance?”
  • more elegant HVS based adaptive quantisation/rate control algorithms (avoiding bit stuffing or forced coarse quantisation)
    • spatial & temporal masking
    • different rate-distortion slopes for different coefficients
    • reduction of certain coding artifacts and manifestation of other types
• New methods
  • Model-, object-, region- or segmentation-based coding
    • effective and efficient segmentation algorithms
    • balance of bit allocations between model description and coding of residual image
  • Recursive or projection theory based coding (matching pursuit, fractal transforms and projection onto convex sets)
    • inferior rate-distortion performance with practical systems/applications
  • Perceptual image/video coders, if only we knew how
  • Constant quality coding, and above all...
1.3.3. Frustrations in image/video coding: quality/impairment assessment
• subjective assessment (ITU-R BT.500-9)
  • subjectivity: “What do you regard as slightly annoying?”
  • large deviation, recency and contextual effects
  • time consuming and expensive
  • lack of constructive information for coder design
• quantitative quality/impairment assessment
  • “What’s wrong with mean-squared error?” [B. Girod]
  • something better than PSNR or MSE
  • various coding artifacts
  • different coding artifacts dominate at different coding rates and resolutions
• “VQEG movement”
  • Video Quality Experts Group with delegations from ITU-T Study Groups 9 and 12 and ITU-R Study Group 11;
  • Validating objective measures of video quality;
  • Leading to one or more ITU Recommendations;
  • Results reported in a final report, Mar 2000…
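One way to see why something better than PSNR or MSE is being asked for: the toy Python sketch below builds two structurally different distortions of the same picture that have identical MSE, and therefore identical PSNR. The 8x8 flat patch and the error patterns are hypothetical values chosen only so that the arithmetic works out; they are not from the talk.

```python
import math

def mse(reference, processed):
    """Mean-squared error between two equally sized lists of pixel rows."""
    diffs = [(r - p) ** 2
             for ref_row, proc_row in zip(reference, processed)
             for r, p in zip(ref_row, proc_row)]
    return sum(diffs) / len(diffs)

def psnr(reference, processed, peak=255):
    """PSNR in dB for the given peak value; infinite for identical pictures."""
    error = mse(reference, processed)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)

# Hypothetical flat 8x8 grey patch.
reference = [[128] * 8 for _ in range(8)]

# Distortion A: every pixel is off by 2 grey levels (uniform brightness shift).
shifted = [[130] * 8 for _ in range(8)]

# Distortion B: the same error energy concentrated in one 2x2 corner
# (4 pixels, each off by 8 grey levels).
localised = [row[:] for row in reference]
for y in range(2):
    for x in range(2):
        localised[y][x] += 8

print(mse(reference, shifted), mse(reference, localised))    # both are 4.0
print(psnr(reference, shifted) == psnr(reference, localised))  # True
```

Both distortions score about 42.1 dB, so the metric cannot tell a uniform shift from a localised, structured error. Telling such cases apart is what an HVS based metric is meant to do.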
Problems of existing subjective test methods
• The task:
  • categorical response, and
  • absolute judgement;
• Human subjects are not accustomed to this task in their daily life;
• The subjective data is noisy and vulnerable to biases;
• People are good at comparative judgements;
• For example,...
Is this slightly annoying? Bit rates: 1.2, 1.4, 1.6, 1.8, 2.5 Mbps
Contextual effects in the DSIS II
• Large contextual effects in the DSIS II [1]
[1] P. Corriveau et al., “All subjective scales are not created equal: The effects of context on different scales”, Signal Processing, Vol. 77 (1999) 1-9.
1.3.3. Frustrations in image/video coding: various coding artifacts
[Figure: blocking]
[Figure: DCT basis image and mosaic]
[Figure: ringing]
[Figure: stationary area temporal fluctuations; Sons&Daughter frames 40 and 41, MPEG-1 coded]
[Figure: stationary area temporal fluctuations; difference between Sons&Daughter frames 40 and 41, MPEG-1 coded]
1.3.3. Frustrations in image/video coding: VQEG work phase 1
• 1997 - 1999
• Subjective test
  • 8 independent labs;
  • 20 test sequences (50 Hz and 60 Hz) and 16 Hypothetical Reference Circuits (HRCs);
  • Method: ITU-R BT.500 DSCQS.
• Objective test
  • 10 proponents.
1.3.3. Frustrations in image/video coding: VQEG proponents
• [P0] Peak Signal to Noise Ratio (PSNR)
• [P1] CPqD (Brazil)
• [P2] Tektronix / Sarnoff (USA)
• [P3] NHK (Japan) / Mitsubishi Electric Corp. (Japan)
• [P4] KDD (Japan)
• [P5] Swiss Federal Institute of Technology (EPFL) (Switzerland)
• [P6] TAPESTRIES (European Union)
• [P7] NASA (USA)
• [P8] KPN Research (The Netherlands) / Swisscom CIT (Switzerland)
• [P9] NTIA/ITS (USA)
1.3.3. Frustrations in image/video coding: VQEG result (both 50 & 60 Hz)
• Pearson correlations
  • P0: PSNR
  • P2: Sarnoff
  • P5: EPFL
  • P7: Watson
  • P8: KPN
• VQEG Statement: 8 or 9 statistically indistinguishable models
1.3.3. Frustrations in image/video coding: VQEG result (60 Hz)
• Pearson correlations
  • P0: PSNR
  • P2: Sarnoff
  • P5: EPFL
  • P7: Watson
  • P8: KPN
• P5 (EPFL) has the highest correlation in the 60 Hz test
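For readers unfamiliar with how the objective models are scored against the subjective data, here is a minimal Python sketch of the Pearson correlation between a model's outputs and subjective scores. The numbers are made up purely for illustration and are NOT VQEG data; the variable names are likewise hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only (not VQEG results).
subjective_scores = [5.0, 12.0, 20.0, 33.0, 41.0]  # hypothetical subjective ratings
model_scores = [0.1, 0.3, 0.4, 0.7, 0.8]           # hypothetical objective metric outputs

print(round(pearson(model_scores, subjective_scores), 3))
```

A value near 1 means the model ranks and spaces the test conditions much as the viewers did. The coefficient alone does not say whether two models differ significantly, which is the point behind the "statistically indistinguishable" statement.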
1.3.4. “Is digital video compression dead?”
• Who asked the question? Prof Edward J. Delp of Purdue University, in a keynote at VCIP 2000, Perth, Australia.
• The answer? No, of course.
• A historical lesson in speech and audio coding
  • prior to 1988, focusing on low bit rate and MSE/SNR
    • source model and MSE-based coding;
    • weighted MSE.
  • PAC, the perceptual audio coder, Bell Labs [1]
  • research in the area is unabated.
• Not all methods extend well from 1-D to 2-D
  • weighted MSE has not worked as well as we would hope.
[1] J.D. Johnston, “Transform coding of audio signals using perceptual noise criteria”, IEEE Journal on Selected Areas in Communications, 6(2), pp.314-323, 1988.
1.4. A personal perspective
• 1994 discussion with Prof Martin Vetterli
  • What’s the next move, VQ, Wavelet Transform or…?
  • Mixed channel and source coding;
  • HVS based quantitative quality metrics.
• 1996 visit to Prof Tsuhan Chen at AT&T Research
  • talking about BIM;
  • question raised if BIM would work on video;
  • blockiness under cell-loss conditions?
• Dr Christian van den Branden Lambrecht’s (EPFL, HP Labs, EMC) workshop on his MPQM and NVFM at Monash University in 1996
  • multichannel vision models!
  • software implementations;
  • threshold vision models for supra-threshold vision applications.
1.4. A personal perspective
• 1997 discussion with Prof Bernd Girod on GBIM and evaluation of s-hat, MPQM and NVFM in Erlangen and PCS’97 in Berlin
  • inability to reliably assess digital video quality or degree of impairment;
  • inability to quantitatively define “psychovisual redundancy”.
• 1999 discussion with Prof Bernd Girod at Monash
  • vision model plus
  • parameterization and maybe more…
• 1999 discussions with Profs Brian Wandell and David Heeger
  • vision models so far based on very “sparse data”(!!!)
• Discussions with Dr Jeff Lubin and Albert Pica at Sarnoff
  • various models used in quality metric design.
1.4. A personal perspective
• (1999) Generous support from Stephen Wolf and his colleagues at NTIA/ITS and VQEG co-chairs Philip Corriveau and Arthur Webster
  • VQEG subjective test data;
  • VQEG test sequence;
  • Performance criteria.
• 2000 VQEG final report
  • “8 or 9 statistically indistinguishable models”
  • including PSNR!
• Forced to modify our approach
  • multichannel vision model;
  • parameterization using application specific data, i.e., VQEG subjective test data.
• Bernd was right in 1999 after all.
2. Perceptual Video Quality/Impairment Metrics
“When you can measure what you are speaking about and express it in numbers, you know something about it.” -William Thomson, Lord Kelvin (1824-1907), physicist
  2.1. HVS based quality/impairment metrics and measurements
  2.2. A vision model based quality metric (VQR)
  2.3. A vision model based perceptual blocking impairment metric (PBIM)
2.1. HVS based quality/impairment metrics and measurements for digital video
2.1.1. Introduction to HVS based metrics
• Why are they required?
• What are they? Quality metrics vs. impairment metrics
• HVS modelling.
• How to measure metric performance?
2.1.2. Previous work in the area
2.1.3. Latest development
2.1.1. Introduction to HVS based metrics: applications and standards
• Digital video compression techniques are widely used in
  • Digital TV;
  • Video conferencing;
  • Video phone;
  • Internet video;
  • VCD, DVD, etc.
• International video coding standards
  • H.261/H.263/MPEG-1/2/4.
2.1.1. Introduction to HVS based metrics
• Distortions introduced by digital video coding algorithms
  • fundamentally differ from analog video distortions
  • Structured distortions.
• Various types of distortions
  • Blocking;
  • Blurring;
  • Ringing;
  • Mosquito;
  • Jerkiness;
  • etc.
2.1.1. Introduction to HVS based metrics
Quantitative quality/impairment assessment and metrics are required to
• measure/monitor video coding/transmission system performance;
• provide a better understanding of the distortions introduced by the video coding system and to improve coding algorithms, such as
  • reliable HVS-based adaptive quantisation, and
  • bit rate control algorithms;
• design perceptual digital video codecs providing constant video quality for visual communication services; and
• quantitatively define “psychovisual redundancy” and the corresponding lower bound, if possible.
2.1.1. Introduction to HVS based metrics: assessment methods
• Subjective
  • The quality is evaluated by a group of assessors subjectively;
  • Very expensive and time-consuming;
  • Defined in ITU-R BT.500.
• Objective
  • Given a processed video sequence with or without a reference sequence, a computer program or system will evaluate the quality or the impairment of the processed sequence with an objective score.
2.1.1. Introduction to HVS based metrics: vision modelling
• Vision research is an experimental science.
  • Relies on experiments to reveal mechanisms.
• Categorised by test methods:
  • Detection vs discrimination;
  • Threshold vs suprathreshold.
2.1.1. Introduction to HVS based metrics: mechanisms of vision
• Colour encoding;
• Pattern sensitivity
  • Spatial contrast sensitivity;
  • Temporal sensitivity.
• Multiresolution image representations;
• Masking
  • Spatial
    • intra-band;
    • inter-band;
    • inter-orientation;
  • temporal; and
  • colour masking.
2.1.1. Introduction to HVS based metrics: colour encoding
• Opponent colours: B-W, R-G and B-Y
• Other colour spaces: YCbCr, CIE L*u*v*, CIE L*a*b*, and CIE XYZ
2.1.1. Introduction to HVS based metrics: contrast sensitivity, definition
• Contrast threshold is the contrast necessary to elicit a response;
• Contrast sensitivity is defined as the inverse of contrast threshold.
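The definition above can be written out in a few lines of Python. The slide does not say which contrast measure is used; the sketch assumes Michelson contrast of a grating, which is one common choice, and the luminance values are hypothetical.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a grating (an assumed contrast definition)."""
    return (l_max - l_min) / (l_max + l_min)

def contrast_sensitivity(contrast_threshold):
    """Contrast sensitivity as the inverse of the contrast threshold."""
    if contrast_threshold <= 0:
        raise ValueError("contrast threshold must be positive")
    return 1.0 / contrast_threshold

# Hypothetical grating that is just detectable when its luminance
# swings between 49.5 and 50.5 cd/m^2.
threshold = michelson_contrast(50.5, 49.5)
print(threshold, contrast_sensitivity(threshold))
```

Here a just-detectable contrast of about 0.01 corresponds to a sensitivity of about 100. A contrast sensitivity function is obtained by repeating such a threshold measurement across spatial or temporal frequencies.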
2.1.1. Introduction to HVS based metrics: spatio-temporal contrast sensitivity function
[Figure: spatio-temporal contrast sensitivity function]
2.1.1. Introduction to HVS based metrics: space-time separability
• The sensitivity scaling hypothesis
2.1.1. Introduction to HVS based metrics: temporal channels
• One sustained (low-pass) and one transient (band-pass) temporal channel
2.1.1. Introduction to HVS based metrics: spatial contrast sensitivity
[Figure: spatial contrast sensitivity]
2.1.1. Introduction to HVS based metrics: multiresolution image representation
• 5 frequency levels and 4 orientations in this example
2.1.1. Introduction to HVS based metrics: implementation by the steerable pyramid
• 6 orientations in this example
2.1.1. Introduction to HVS based metrics: steerable pyramid decomposition
• 4 orientations and 5/6 spatial frequency levels