Image and Video Quality Assessment: Objective Quality Metrics. Dr. David Corrigan.
Outline • Motivation • Subjective Image and Video Quality Assessment • Test Methodologies • Benchmarking Objective Metrics • Objective IQA Metrics • Metrics based on models of the HVS • Metrics based on structural distortion • Objective VQA Metrics • Metrics based on HVS • “Feature-based” metrics • “Motion-based” metrics • Covered in this presentation: Objective VQA Metrics
What is different about video? • It is more difficult to model perceptual quality of video • All the factors governing image quality perception apply • Temporal effects • Simply combining scores estimated on each frame separately is not wise. • Many more distortions to consider. • What about audio? • Much more data to process • Basic Strategies • Models based on the HVS • Models based on “features” (including SSIM-based methods) • Methods based on motion modelling.
Key References • A. Bovik, The Essential Guide to Video Processing, Academic Press, 2009. ISBN: 978-0-12-374456-2 • K. Seshadrinathan, R. Soundararajan, A. C. Bovik and L. K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010. • Z. Wang, L. Lu and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Process.: Image Commun., vol. 19, no. 2, pp. 121–132, Feb. 2004. • M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50, no. 3, pp. 312–322, Sep. 2004. • K. Seshadrinathan and A. C. Bovik, “Motion tuned spatio-temporal quality assessment of natural videos,” IEEE Trans. Image Process., vol. 19, no. 2, pp. 335–350, Feb. 2010.
Pre-Processing/Calibration • Temporal Calibration as well as Spatial Calibration • Sequences must be temporally aligned, e.g. in cases where frame loss occurs. • Calibration of Display Device • Gamma correction • Gain and Offset • Aspect Ratio/Eccentricity • Viewing Distance
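The temporal alignment requirement can be illustrated with a brute-force search over integer frame offsets. This is only a sketch: the function name `estimate_frame_offset`, the `max_shift` parameter and the use of mean squared error as the matching criterion are assumptions made for illustration, not a method given in the slides.

```python
import numpy as np

def estimate_frame_offset(ref, test, max_shift=5):
    """Return the shift s (in frames) for which test[t] best matches ref[t + s].

    ref, test: arrays of shape (frames, height, width).
    Hypothetical helper: exhaustive search using mean squared error.
    """
    best_shift, best_mse = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # Start indices of the overlapping frame ranges for this candidate shift.
        r0, t0 = max(s, 0), max(-s, 0)
        n = min(len(ref) - r0, len(test) - t0)
        if n <= 0:
            continue
        diff = ref[r0:r0 + n].astype(float) - test[t0:t0 + n].astype(float)
        mse = np.mean(diff ** 2)
        if mse < best_mse:
            best_shift, best_mse = s, mse
    return best_shift
```

A positive result means the test sequence has lost frames at its start relative to the reference; a real calibration stage would also have to handle frame loss in the middle of a sequence.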
HVS-based Models • Temporal Aspects of the Human Visual System • Less well understood compared to spatial mechanisms. • [Figure: temporal responses of the sustained mechanism and the transient mechanism] • Overall the HVS is low pass temporally, which is not surprising.
HVS-Based Metrics • MPQM (v.d. Branden Lambrecht et al.) • PDM (Perceptual Distortion Metric) • Genista Corp / Symmetricom • JNDMetrix / PQA 200 / PQA 500 (Lubin / Sarnoff / Tektronix) • All use low pass and bandpass temporal filtering • DVQ (Watson et al. / NASA) uses DCT and low pass IIR temporal filtering • VQA (AccepTV), based on Daly’s VDP IQA method. • Note how many have been commercialised!
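As a rough illustration of what low pass IIR temporal filtering means, the sketch below applies a first-order recursive filter along the time axis. The filter order, the coefficient `alpha` and the function name are assumptions; none of the listed metrics is claimed to use exactly this filter.

```python
import numpy as np

def temporal_lowpass(frames, alpha=0.8):
    """First-order IIR low pass along time: y[t] = alpha * y[t-1] + (1 - alpha) * x[t].

    frames: array of shape (frames, height, width). alpha is an assumed value.
    """
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    out[0] = frames[0]
    for t in range(1, len(frames)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * frames[t]
    return out
```

Slow changes pass through almost unchanged, while frame-to-frame flicker is strongly attenuated, mirroring the low pass temporal behaviour attributed to the HVS.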
VQA Methods based on Features • Quality computation based on comparison of “features” between test and reference videos. • Feature extraction from sequences before comparison. • Examples of Features • Specific distortion types • Blurriness, Noisiness, Blockiness • Descriptions of image content • Mean luminance, contrast, structure, edginess, motion information. • We will look at a metric based on SSIM and the popular Video Quality Metric (VQM)
SSIM for Video (vSSIM) • Key Features • Computational Efficiency – local SSIM index for a small number of windows • Colour Pooling – SSIM measured on all 3 channels • Spatial Pooling – based on local luminance of each window • Temporal Pooling – based on amount of motion in each frame
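Local SSIM Index Calculation • Same formula as before • For each window the local index is $\mathrm{SSIM}^{c}_{ij}$, where $i$ is the frame index, $j$ is the window index and $c$ is the colour channel index. • The SSIM index is measured for $R_s$ (?) randomly chosen windows in each frame. • The index is calculated for the Y, Cb and Cr channels. The combined value is $\mathrm{SSIM}_{ij} = W_Y\,\mathrm{SSIM}^{Y}_{ij} + W_{Cb}\,\mathrm{SSIM}^{Cb}_{ij} + W_{Cr}\,\mathrm{SSIM}^{Cr}_{ij}$, with fixed channel weights $W_Y$, $W_{Cb}$ and $W_{Cr}$.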
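The local index calculation can be sketched as follows. The SSIM formula and its constants are the standard ones for 8-bit data; uniform weighting inside the window and the channel weights 0.8/0.1/0.1 are assumptions (the latter recalled from Wang et al., 2004, and worth checking against the paper). The function names are illustrative.

```python
import numpy as np

C1 = (0.01 * 255) ** 2  # standard SSIM stabilising constants for 8-bit data
C2 = (0.03 * 255) ** 2

def window_ssim(x, y):
    """SSIM index between two same-sized windows (uniform weighting assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))

def combined_ssim(ref_ycbcr, test_ycbcr, weights=(0.8, 0.1, 0.1)):
    """Weighted sum of the Y, Cb and Cr SSIM values for one window.

    Inputs are (height, width, 3) arrays; the weights are assumed values.
    """
    ref = np.asarray(ref_ycbcr, dtype=float)
    test = np.asarray(test_ycbcr, dtype=float)
    return sum(w * window_ssim(ref[..., c], test[..., c])
               for c, w in enumerate(weights))
```

Identical windows give a combined value of 1; any distortion lowers it, with luminance distortions dominating because of the larger Y weight.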
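Frame Quality Measure (Spatial Pooling) • Authors argue that dark regions do not cause fixations and thus should be given lower weights. • For each window $j$ in frame $i$, the mean intensity $\mu_x$ of the Y channel of the reference image is used to calculate the weight $w_{ij} = \begin{cases} 0 & \mu_x \le 40 \\ (\mu_x - 40)/10 & 40 < \mu_x \le 50 \\ 1 & \mu_x > 50 \end{cases}$ • The frame quality is the weighted average $Q_i = \sum_{j=1}^{R_s} w_{ij}\,\mathrm{SSIM}_{ij} \big/ \sum_{j=1}^{R_s} w_{ij}$, where $R_s$ is the number of windows per frame.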
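A minimal sketch of luminance-weighted spatial pooling is given below. The thresholds 40 and 50 are taken as recalled from Wang et al. (2004) and should be treated as assumptions; the function names and the choice to return the total weight alongside the frame score are illustrative.

```python
def luminance_weight(mu_x):
    """Weight for one window from the mean reference luminance (0..255 scale)."""
    if mu_x <= 40:
        return 0.0                      # dark window: ignored
    if mu_x <= 50:
        return (mu_x - 40) / 10.0       # linear ramp between the thresholds
    return 1.0

def frame_quality(ssim_values, mean_luma):
    """Return (weighted mean SSIM of the frame, sum of the window weights)."""
    weights = [luminance_weight(m) for m in mean_luma]
    total = sum(weights)
    if total == 0:
        # Every window is dark; the frame carries no weight in later pooling.
        return 0.0, 0.0
    score = sum(w * s for w, s in zip(weights, ssim_values)) / total
    return score, total
```

For example, `frame_quality([0.5, 1.0], [30, 60])` ignores the dark first window entirely, so the frame score is determined by the second window alone.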
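Sequence Quality Measure (Temporal Pooling) • The sequence quality is a weighted average of the frame qualities, $Q = \sum_{i=1}^{F} W_i Q_i \big/ \sum_{i=1}^{F} W_i$, where $F$ is the number of frames in the sequence, $Q_i$ is the quality of frame $i$ and $W_i$ is its weight. • Authors argue that spatial distortions are less visible when there is a lot of motion present. • Block-based motion estimation (perhaps block matching) is used to estimate the motion vector of each window in each frame.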
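Sequence Quality Measure (Temporal Pooling) • A frame-level motion quantity $M_i$ is defined for each frame $i$ • If $M_i$ is large we should assign a lower weight $W_i$. • The authors define $W_i$ as $W_i = \begin{cases} \sum_{j} w_{ij} & M_i \le 0.8 \\ \frac{1.2 - M_i}{0.4} \sum_{j} w_{ij} & 0.8 < M_i \le 1.2 \\ 0 & M_i > 1.2 \end{cases}$ where $M_i = \frac{1}{K_M R_s} \sum_{j=1}^{R_s} m_{ij}$, $m_{ij}$ is the motion vector length of window $j$ in frame $i$, $w_{ij}$ is the spatial weight of that window, $R_s$ is the number of windows per frame and $K_M = 16$ is a normalisation constant.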
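The motion-based temporal pooling can be sketched as below. The constants (normalisation by 16, thresholds 0.8 and 1.2) are assumptions recalled from Wang et al. (2004), and the function names are illustrative rather than taken from the paper.

```python
def motion_level(motion_lengths, k_m=16.0):
    """Frame-level motion: mean motion vector length divided by a constant."""
    return (sum(motion_lengths) / len(motion_lengths)) / k_m

def frame_weight(m_i, sum_w):
    """Frame weight: full spatial weight for low motion, zero for high motion."""
    if m_i <= 0.8:
        return sum_w
    if m_i <= 1.2:
        return (1.2 - m_i) / 0.4 * sum_w   # linear roll-off
    return 0.0

def sequence_quality(frame_scores, frame_weights):
    """Weighted average of the per-frame quality scores."""
    total = sum(frame_weights)
    if total == 0:
        raise ValueError("all frames have zero weight")
    return sum(w * q for w, q in zip(frame_weights, frame_scores)) / total
```

Here `sum_w` is the sum of the spatial (luminance) weights of the windows in the frame, so frames that are both bright and slow-moving dominate the final score.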
Video Quality Metric (NTIA) • Standardised by both ANSI and ITU after VQEG FRTV Phase II tests. • Overall quality value is a linear combination of 7 different quality parameters pooled both spatially and temporally. • Each quality parameter is calculated based on the comparison of a feature extracted from a spatio-temporal subregion of both test and reference sequences. • Calibration • Spatial and temporal alignment • Valid region extraction • Gain and Offset Correction
Quality Parameters used in VQM • si_loss: measures the decrease of spatial information (i.e. gradient magnitude) from the reference to the test sequence. • hv_loss: measures a shift (w.r.t. the reference sequence) in edge orientation from horiz./vert. to diagonal. • hv_gain: measures a shift in edge orientation from diagonal to horiz./vert. • chroma_spread: measures changes in the Cb and Cr channels.
Quality Parameters used in VQM • si_gain: designed to detect improvements in edge quality due to edge sharpening in the test sequence. • ct_ati_gain: designed to detect moving edge impairments (i.e. temporal distortions in areas of rich texture content). • chroma_extreme: detects localized colour impairments in the test signal.
Basic Steps in Calculating Quality Parameters • Inputs: calibrated test and reference sequences • Divide both sequences into non-overlapping and abutting spatio-temporal (S-T) regions of specified dimensions. • Calculate features for each S-T region • Calculate quality parameters for each region by applying an appropriate comparison function. • Apply spatial and temporal pooling operations to the regions’ quality parameters
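The steps above can be sketched for a single loss-type parameter. Everything specific here is an assumption made for illustration: the block size, the use of the plain standard deviation as a stand-in feature (VQM's own features are computed on filtered images), the ratio comparison that keeps only decreases, and simple mean pooling.

```python
import numpy as np

def st_region_feature(video, block=(6, 8, 8)):
    """Standard deviation of each non-overlapping S-T block (stand-in feature).

    video: array of shape (frames, height, width).
    """
    dt, dh, dw = block
    t, h, w = video.shape
    # Crop so that the video tiles exactly into abutting blocks.
    v = video[: t - t % dt, : h - h % dh, : w - w % dw].astype(float)
    v = v.reshape(t // dt, dt, h // dh, dh, w // dw, dw)
    return v.std(axis=(1, 3, 5))

def loss_parameter(ref, test, block=(6, 8, 8), eps=1e-6):
    """Compare features per region, keep only losses, then pool by the mean."""
    f_ref = st_region_feature(ref, block)
    f_test = st_region_feature(test, block)
    ratio = (f_test - f_ref) / (f_ref + eps)
    loss = np.minimum(ratio, 0.0)          # keep only decreases of the feature
    return float(np.abs(loss).mean())      # simplified spatial/temporal pooling
```

With this stand-in, a test sequence whose contrast is halved in every region yields a loss of about 0.5, while an identical sequence yields 0.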
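Calculating Overall Quality • The overall VQM value is the linear combination of the seven quality parameters. • Values below 0 are clipped. A value of 0 implies no loss of quality in the test sequence. • Nominal maximum value of 1, but values > 1 are possible. • Values over 1 are crushed such that $\mathrm{VQM} \to 1.5$ as the uncrushed value $\to \infty$.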
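The combination, clipping and crushing can be sketched as follows. The coefficients and the crushing constant `c = 0.5` are quoted from memory of the NTIA General Model described by Pinson and Wolf (2004); treat them as assumptions and check them against the paper before relying on them. The function name is illustrative.

```python
# Assumed coefficients of the linear combination (see the hedge above).
VQM_WEIGHTS = {
    "si_loss": -0.2097,
    "hv_loss": 0.5969,
    "hv_gain": 0.2483,
    "chroma_spread": 0.0192,
    "si_gain": -2.3416,
    "ct_ati_gain": 0.0431,
    "chroma_extreme": 0.0076,
}

def overall_vqm(params, c=0.5):
    """Linear combination of the 7 parameters, clipped at 0 and crushed above 1."""
    raw = sum(VQM_WEIGHTS[name] * params[name] for name in VQM_WEIGHTS)
    value = max(raw, 0.0)                        # clip values below 0
    if value > 1.0:
        value = (1.0 + c) * value / (c + value)  # crush: tends to 1 + c
    return value
```

With `c = 0.5` the crushed value can never exceed 1.5, however bad the test sequence is.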
Video Quality Metrics: Motion Modelling Based Approaches
MOtion based Video Integrity Evaluation (MOVIE) Index • Philosophy • Other Methods too focused on spatial distortions • HVS-methods – weakly incorporated by use of temporal filters • Much research wrt HVS and Motion but these models have not been incorporated yet • vSSIM – only use motion for weighting spatial distortions • VQM – features mainly capture spatial distortions • Special effort to focus on temporal distortions • Detect Spatial Distortions & weak temporal distortions • Detect distortions which would cause the estimated trajectories of patches to change – (eg, flicker, shake etc) • MOVIE decomposes both sequences using a bank of Gabor Filters
MOVIE Index • [Figure: block diagram of the MOVIE index; labelled element: Motion Vectors]
Gabor Filters • Neurons in the visual cortex are thought to respond best to a stimulus moving in a specific direction • Gabor filters are thought to capture this neuron response well • Gabor filters are band pass • We can vary as we want: the frequency of the oscillations (i.e. the centre frequency), the phase of the oscillations, the orientation of the oscillations, and the decay rate of the oscillations (i.e. the bandwidth) • [Figure: impulse responses (real parts) of 40 different Gabor filters]
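The N-D Gabor Filter • A Gabor filter is a complex sinusoid modulated by a Gaussian function • Impulse response: $h(\mathbf{x}) = K \exp\!\left(-\tfrac{1}{2}\,\mathbf{x}^T A\,\mathbf{x}\right) \exp\!\left(j\,\mathbf{u}_0^T \mathbf{x}\right)$, where $K$ is a complex number and $A$ is a real valued matrix. • In the Fourier domain: $H(\mathbf{u}) = K' \exp\!\left(-\tfrac{1}{2}\,(\mathbf{u} - \mathbf{u}_0)^T A^{-1} (\mathbf{u} - \mathbf{u}_0)\right)$, where $K'$ is a complex number and $A^{-1}$ is a real valued matrix.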
2D Gabor Filter Example • [Figure: real part and imaginary part of a 2D Gabor filter impulse response, and the magnitude of its 2D Fourier transform]
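A 2D Gabor kernel can be generated directly from its definition as a Gaussian multiplied by a complex sinusoid. This is a sketch: the isotropic envelope, the parameter names and the default values are assumptions chosen for illustration.

```python
import numpy as np

def gabor_2d(size=31, sigma=4.0, freq=0.1, theta=0.0):
    """Complex 2D Gabor kernel: isotropic Gaussian times a complex sinusoid.

    freq is the centre frequency in cycles/pixel, theta the orientation in
    radians, sigma the Gaussian spread in pixels (larger sigma means a
    narrower bandwidth).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u0 = freq * np.cos(theta)
    v0 = freq * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(2j * np.pi * (u0 * x + v0 * y))
    return envelope * carrier
```

Taking `np.abs(np.fft.fft2(gabor_2d()))` shows the band pass behaviour: the spectrum is a single Gaussian blob centred near the chosen centre frequency rather than at DC.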
Gabor Filter Bank in MOVIE • 3 levels of filters arranged in a sphere around the origin • 35 (or 70?) filters per level, each at a different orientation • 3 levels for 3 different frequency ranges • Constant octave bandwidth (i.e. bandwidth ∝ distance from origin) • An extra low pass Gaussian filter (DC filter) centred at the origin
Notation • $\tilde{f}(\mathbf{i}, k)$ and $\tilde{g}(\mathbf{i}, k)$ are the outputs of filter no. $k$ in the filter bank for the reference and distorted videos. There are a total of $K$ filters in the filter bank • Like SSIM, differences between the signals are calculated over windows • Define a window centred at position $\mathbf{i}_0$ in the video. • Define vectors $\mathbf{f}(k)$ and $\mathbf{g}(k)$ to contain the complex magnitudes of all 3D pixels near $\mathbf{i}_0$ in the filtered reference and distorted videos respectively. • Define $f_n(k)$ to be the value of the nth 3D pixel in the window. There are $N$ pixels in the window. • [Figure: the reference video and the distorted video are each passed through the Gabor filter bank]
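Spatial MOVIE (See paper for full description) • Calculate a normalised MSE $E_S(k)$ between each pair of filter output vectors $\mathbf{f}(k)$ (reference) and $\mathbf{g}(k)$ (distorted); the normaliser is the RMS value of $\mathbf{f}(k)$ or $\mathbf{g}(k)$, whichever is higher. • $E_S(k)$ is bounded between 0 and 1. The error is 0 when $\mathbf{f}(k) = \mathbf{g}(k)$. • The average error with respect to the filter index $k$ is computed. • It is then converted to a per-pixel quality score. A separate error term is calculated for the outputs of the DC filter.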
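A sketch of the normalised error for one window is shown below. The factor of one half, the stabilising constant `c1`, the uniform window weighting and the mapping of the error to a quality score as one minus the mean error are assumptions in the spirit of the MOVIE paper, which should be consulted for the exact definitions; the DC term is omitted here.

```python
import numpy as np

def spatial_error(f, g, c1=0.1):
    """Normalised squared error between two vectors of filter outputs.

    f, g: complex (or real) filter outputs inside one window; only their
    magnitudes are compared. c1 is an assumed stabilising constant.
    """
    f = np.abs(np.asarray(f))
    g = np.abs(np.asarray(g))
    # Normalise by the RMS value of f or g, whichever is higher.
    m = max(np.sqrt(np.mean(f ** 2)), np.sqrt(np.mean(g ** 2)))
    return 0.5 * np.mean(((f - g) / (m + c1)) ** 2)

def spatial_quality(f_list, g_list, c1=0.1):
    """Per-pixel quality: one minus the error averaged over all filters."""
    errors = [spatial_error(f, g, c1) for f, g in zip(f_list, g_list)]
    return 1.0 - float(np.mean(errors))
```

Because the magnitudes are non-negative and the normaliser is the larger RMS value, the error stays between 0 and 1, matching the bound stated on the slide.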
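Temporal MOVIE (Measuring Temporal Distortions) • Consider an image patch $f(x, y)$ with Fourier transform $F(u, v)$. If the patch is moving with constant velocity $(v_x, v_y)$ such that the video is $g(x, y, t) = f(x - v_x t,\; y - v_y t)$, then the Fourier transform of $g$ is $G(u, v, w) = F(u, v)\,\delta(u v_x + v v_y + w)$, which is non-zero only on the plane $u v_x + v v_y + w = 0$.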
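The spectrum of a translating patch follows from a change of variables in the 3D Fourier integral. The symbols below ($f$ for the patch, $g$ for the video, $(v_x, v_y)$ for the velocity and $(u, v, w)$ for the spatial and temporal frequencies) are notation chosen here for illustration.

```latex
\begin{align*}
G(u, v, w)
  &= \iiint f(x - v_x t,\; y - v_y t)\, e^{-j 2\pi (u x + v y + w t)}\, dx\, dy\, dt \\
  &= \int \left[ \iint f(x', y')\, e^{-j 2\pi (u x' + v y')}\, dx'\, dy' \right]
     e^{-j 2\pi (u v_x + v v_y + w) t}\, dt
     \qquad (x' = x - v_x t,\; y' = y - v_y t) \\
  &= F(u, v)\, \delta(u v_x + v v_y + w)
\end{align*}
```

All of the energy of the video therefore lies on the plane $u v_x + v v_y + w = 0$, whose tilt is set by the velocity; a stationary patch gives the plane $w = 0$.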
Temporal MOVIE (Measuring Temporal Distortions) • Temporal distortions can be estimated by measuring how well the test and reference sequences obey this model. • Velocities are computed (i.e. using motion estimation) on the reference signal. • If temporal distortions exist in the test sequence, these velocities will be wrong for it, and thus its Fourier coefficients will not lie on the plane defined by those velocities.
Temporal MOVIE • All of the filters will respond somewhat to Fourier components on the plane, but those closest to the plane will have a greater response. • Define a relative filter weight $\alpha_k$ for each filter $k$ according to how close the centre of that Gabor filter is to the plane. • There is a different value of $\alpha_k$ for each pixel, as the velocity field is defined at full resolution. • [Figure: the plane in the frequency spectrum along which all non-zero values should lie, together with the centres of the Gabor filters]
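One way to turn "closeness to the plane" into a weight is to use the perpendicular distance from each filter centre to the plane $u v_x + v v_y + w = 0$. The weighting rule below (one minus the distance divided by the centre's distance from the origin) is an assumption made for illustration; the MOVIE paper defines its own weights.

```python
import numpy as np

def plane_weights(centres, velocity):
    """Relative weight of each filter for one pixel's velocity.

    centres: (K, 3) array of filter centre frequencies (u, v, w), none at the
    origin. velocity: (vx, vy) at the pixel. Returns weights in [0, 1].
    """
    centres = np.asarray(centres, dtype=float)
    vx, vy = velocity
    normal = np.array([vx, vy, 1.0])               # normal of u*vx + v*vy + w = 0
    dist = np.abs(centres @ normal) / np.linalg.norm(normal)
    radius = np.linalg.norm(centres, axis=1)       # distance of centre from origin
    return 1.0 - dist / radius
```

A filter whose centre lies on the plane gets weight 1, and a filter whose centre lies along the plane's normal gets weight 0. Since the velocity differs from pixel to pixel, the weights have to be recomputed at every pixel.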
Temporal MOVIE (See paper for full details) • For each window we measure how well the filtered values at each pixel agree with its motion. • This gives one vector for the reference signal and one for the distorted signal (see paper for their definitions). • The error is the MSE between these 2 vectors • The per-pixel quality score is then derived from this error (see paper).
Pooling • The spatial and temporal quality scores $Q_S$ and $Q_T$ are defined at every pixel and (potentially) every frame. • Both scores are first pooled to create a score per frame based on both the mean and standard deviation of the metric for that frame. • Lower mean implies lower quality • Higher variance implies lower quality • The frame-level scores are then pooled to create separate Spatial MOVIE and Temporal MOVIE scores for the sequence. • The overall score is then $\mathrm{MOVIE} = \text{Spatial MOVIE} \times \text{Temporal MOVIE}$
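A pooling rule consistent with the bullets above is the coefficient of variation (standard deviation divided by mean) of the per-pixel quality map, which grows when the mean drops or the spread rises. This choice, the plain averaging over frames and the function names are assumptions; the exact pooling in the MOVIE paper may differ in detail.

```python
import numpy as np

def frame_score(q_map):
    """Frame-level distortion score: std / mean of the per-pixel quality map."""
    q = np.asarray(q_map, dtype=float)
    return q.std() / q.mean()

def movie_index(spatial_maps, temporal_maps):
    """Pool per-frame scores into sequence scores and multiply them.

    spatial_maps, temporal_maps: one per-pixel quality map per frame.
    Higher return values indicate worse quality.
    """
    spatial = np.mean([frame_score(q) for q in spatial_maps])
    temporal = np.mean([frame_score(q) for q in temporal_maps])
    return float(spatial * temporal)
```

Note that with this convention the pooled values behave as distortion scores: a sequence whose quality maps are uniformly perfect scores 0.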
Video Quality Metrics: Benchmarking VQA Metrics
Benchmarking on the FRTV1 dataset • Most methods are statistically indistinguishable from PSNR • vSSIM & MOVIE perform well but perhaps not by much. • Also tests were conducted after the database was published. • The FRTV Phase 1 dataset has been criticised • Does not include distortions from H.264. • Contains only interlaced videos. • Distribution of perceptual quality is bi-modal (high and low quality clusters). • The FRTV Phase 2 dataset showed more conclusive results – (eg. VQM performed very well compared to PSNR)
Benchmarking on the LIVE VQA Dataset • Statistical Significance Testing • 0: the metric in the row is statistically worse than the metric in the column • 1: the metric in the row is statistically better than the metric in the column • x: no statistically significant difference
Health Warning • Scores of methods developed after the dataset are potentially unreliable • Algorithms can be tuned to the test data => Overfitting • Hence algorithms perform poorly on unseen data (e.g. see vSSIM results here) • The people that developed the LIVE IQA and VQA databases are the same people who developed MOVIE & SSIM • You have to trust that methods were tested fairly. E.g.: • Parameter values could have been tuned to get better results • Were the database sequences used previously to tune the algorithm? • Beware of Confirmation Bias • People tend to publish results that put their work in the best light.
Summary • We have looked at • Perceptual Study Methodologies • HVS-based metrics • Feature based metrics • Motion-tuned metrics • Benchmarking metrics • Class Test • 1 hour open book exam – 10% of total mark • Any written/printed material allowed, but no laptops/smartphones etc. or internet. • Will test your understanding of the concepts rather than information retention.