Scalable Video Coding with Wavelet-Based Approaches. Presenter: Mahin Torki. Paper title: “State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches”. Authors: Nicola Adami, Alberto Signoroni, Riccardo Leonardi.
Scalable Video Coding with Wavelet-Based Approaches Presenter: Mahin Torki ENSC 820 - Simon Fraser University
Paper Title: “State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches”
Authors: Nicola Adami, Alberto Signoroni, Riccardo Leonardi
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, September 2007
Outline
• Motivation
• Wavelet SVC (WSVC) Fundamentals
• Coding Architectures for WSVC Systems
• WSVC Reference Platform in MPEG
• Comparison between WSVC and SVC
• Conclusion
Motivation
• Several working points, corresponding to different quality, picture size, and frame rate, in a single bitstream
• Two types of SVC systems:
  • Hybrid schemes (used in all MPEG-x or H.26x standards)
  • Spatio-temporal wavelet technologies
• Main differences between SVC and transcoding systems:
  • Low complexity
  • No coding/decoding operations required
  • Simple parsing operation on the coded bitstream
Motivation
• Encode once
• Decode according to the required QoS or the available hardware resources
A Typical SVC System
A possible structure of an SVC bitstream
Extracting a scaled bitstream
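The "simple parsing operation" behind bitstream extraction can be sketched as follows. This is a toy illustration only: the `Packet` layout, its field names, and the `extract` helper are assumptions made for the example, not the syntax of any real SVC or WSVC bitstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """One unit of a hypothetical scalable bitstream."""
    temporal: int   # temporal level (0 = lowest frame rate)
    spatial: int    # spatial level (0 = smallest picture size)
    quality: int    # quality layer (0 = coarsest)
    payload: bytes


def extract(stream, max_t, max_s, max_q):
    """Keep only the packets needed for the requested working point.

    Nothing is decoded or re-encoded: packets are selected by their
    header fields alone.
    """
    return [p for p in stream
            if p.temporal <= max_t and p.spatial <= max_s and p.quality <= max_q]


# Toy stream with 2 temporal x 2 spatial x 3 quality levels.
stream = [Packet(t, s, q, bytes([t, s, q]))
          for t in range(2) for s in range(2) for q in range(3)]

# Lowest frame rate, smallest picture, two coarsest quality layers.
low = extract(stream, max_t=0, max_s=0, max_q=1)
print(len(stream), len(low))
```

The extractor never touches the payloads, which is what separates this kind of adaptation from transcoding.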
Tools Enabling Scalability
• A multi-resolution signal decomposition inherently enables low-to-high resolution scalability by representing the signal in a transformed domain
Tools Enabling Scalability
• Inter-Scale Prediction (ISP)
  • The simplest way to represent a signal with two resolutions
  • The signal x can be seen as a coarse-resolution signal c and a detail signal d
  • Not critically sampled
• Laplacian Pyramid
  • An iterated version of ISP
  • Results in a coarsest-resolution signal c and a set of details
Laplacian Pyramid
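A minimal 1-D sketch of ISP and its iteration into a Laplacian pyramid is given below. The pair-averaging down-sampler and the sample-repeating up-sampler are assumptions chosen for brevity (practical pyramids use longer filters), and the signal length is assumed to be divisible by 2 at every level.

```python
def downsample(x):
    """Coarse signal: average adjacent pairs (low-pass, then decimate)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def upsample(c):
    """Predict the fine signal by repeating each coarse sample."""
    out = []
    for v in c:
        out += [v, v]
    return out


def isp_analysis(x):
    """Split x into a coarse signal c and a full-length detail signal d."""
    c = downsample(x)
    d = [a - b for a, b in zip(x, upsample(c))]
    return c, d


def isp_synthesis(c, d):
    """Rebuild the fine signal from the coarse signal and the detail."""
    return [a + b for a, b in zip(upsample(c), d)]


def laplacian_pyramid(x, levels):
    """Iterate ISP on the coarse signal; details are stored finest first."""
    details = []
    c = x
    for _ in range(levels):
        c, d = isp_analysis(c)
        details.append(d)
    return c, details


def reconstruct(c, details):
    for d in reversed(details):
        c = isp_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
c, details = laplacian_pyramid(x, 2)
total = len(c) + sum(len(d) for d in details)
print(len(x), total)  # more coefficients than input samples
```

Each detail signal has the length of the level it refines, so the representation holds more coefficients than the input, which is what "not critically sampled" refers to.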
Spatial Scalability
• Discrete Wavelet Transform (DWT)
  • Projects the signal onto a set of multi-resolution (MR) subspaces
  • Critically sampled
  • Generates a coarse signal and a set of details
• For multi-dimensional signals such as images
  • Separable pyramidal and DWT decompositions
  • Separate filtering on rows and columns
DWT Filter Bank
• The DWT is implemented by a two-channel filter bank iterated on a dyadic tree path
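The iterated two-channel filter bank can be sketched with the shortest possible filters. The example assumes an unnormalized Haar pair (pairwise average and half-difference) so that the arithmetic stays exact; it is an illustration of the dyadic tree structure, not the filter set used in the paper.

```python
def haar_analysis(x):
    """Two-channel analysis: low-pass (c) and high-pass (d), each decimated by 2."""
    half = len(x) // 2
    c = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return c, d


def haar_synthesis(c, d):
    """Two-channel synthesis: exact inverse of haar_analysis."""
    x = []
    for ci, di in zip(c, d):
        x += [ci + di, ci - di]
    return x


def dwt(x, levels):
    """Iterate the filter bank on the low-pass branch only (dyadic tree)."""
    details = []
    c = list(x)
    for _ in range(levels):
        c, d = haar_analysis(c)
        details.append(d)
    return c, details


def idwt(c, details):
    for d in reversed(details):
        c = haar_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
c, details = dwt(x, 2)
count = len(c) + sum(len(d) for d in details)
print(c, details, count)
```

Unlike the pyramid, the number of coefficients equals the number of input samples (critical sampling), and decoding only `c` plus the coarser details yields a lower-resolution signal.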
Bit-plane Coder, 2D-DWT Transform
• 2D wavelet decomposition inherently provides spatial scalability
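A sketch of one level of separable 2D decomposition, filtering rows first and then columns. It again assumes unnormalized Haar filters and even picture dimensions; `ll_band` is a hypothetical helper that picks out the low-low subband.

```python
def analysis_1d(v):
    """One-level Haar analysis of a 1-D list: low half followed by high half."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def transpose(m):
    return [list(r) for r in zip(*m)]


def dwt2_one_level(img):
    """Separable 2D transform: filter every row, then every column."""
    rows = [analysis_1d(r) for r in img]
    cols = [analysis_1d(c) for c in transpose(rows)]
    return transpose(cols)


def ll_band(coeffs):
    """Top-left quadrant: the half-resolution approximation of the picture."""
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    return [row[:w] for row in coeffs[:h]]


img = [[10, 10, 20, 20],
       [10, 10, 20, 20],
       [30, 30, 40, 40],
       [30, 30, 40, 40]]

coeffs = dwt2_one_level(img)
print(ll_band(coeffs))
```

A decoder that stops at the LL band obtains a picture at half the width and height, which is the sense in which the decomposition provides spatial scalability by construction.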
Spatial Scalability
• Lifting scheme
  • Alternative spatial-domain processing introduced by Sweldens
  • Generates a critically sampled (c,d) representation of the signal x
Lifting Scheme
• The signal x is split into two polyphase components, the even and the odd samples (each at half the original resolution)
• The two components are correlated, so one can be predicted from the other
• The subsampled signal could contain many aliased components, so it should be updated
• Perfect reconstruction is guaranteed
• Every DWT can be factorized into a chain of lifting steps
• Lifting has a fundamental role in MC Temporal Filtering (MCTF)
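The split, predict, and update steps can be sketched as below. The example assumes a 5/3-style integer lifting pair with simple boundary clamping and an even-length input; these choices are illustrative and are not taken from the slides.

```python
def lift_forward(x):
    """Split, predict, update. Returns (c, d), each of length len(x) // 2."""
    even, odd = x[0::2], x[1::2]           # split into polyphase components
    n = len(odd)
    # Predict: each odd sample from the mean of its two even neighbours.
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    # Update: correct the even samples with the neighbouring details.
    c = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    return c, d


def lift_inverse(c, d):
    """Undo the update, then the prediction, then merge the components."""
    n = len(d)
    even = [c[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) // 2 for i in range(n)]
    x = [0] * (2 * n)
    x[0::2] = even
    x[1::2] = odd
    return x


x = [3, 7, 1, 8, 2, 9, 4, 6]
c, d = lift_forward(x)
print(lift_inverse(c, d) == x)
```

Because the inverse subtracts exactly what the forward pass added, in reverse order, reconstruction is perfect even though both steps round to integers. This structural invertibility is the property MCTF relies on.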
Temporal Scalability
• Motion-Compensated Temporal Filtering (MCTF)
  • A key tool enabling temporal scalability while exploiting temporal correlation
MCTF implementation by Lifting steps
• The index i now has a temporal meaning
• P and U can be guided by motion information
MCTF implementation by Lifting steps
• ME/MC is implemented according to a certain motion model
• ME/MC usually generates a set of motion vector fields mv(l,k)
• mv(l,k) is an estimate of the trajectory of the blocks between the temporal frames, at spatial level l, involved in the k-th MCTF temporal decomposition level
• With the lifting structure, non-dyadic temporal decomposition is possible
  • Temporal scalability factors different from a power of two
Some benefits of MCTF
• By exploiting the local adaptability of the P and U operators and using the mv(l,k) information, MCTF can:
  • Handle occlusion and uncovered-area problems
  • Reduce blocking effects by considering adjacent blocks
• When fractional-pixel MVs are provided, the lifting structure can be modified to implement the necessary pixel interpolation
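A toy sketch of one Haar-like MCTF lifting step on a pair of frames. It assumes 1-D "frames" and a single global motion vector modelled as a circular shift; real MCTF uses block-based motion fields, so `shift`, `mctf_forward`, and `mctf_inverse` are hypothetical names for illustration only.

```python
def shift(frame, mv):
    """Circularly shift a 1-D 'frame' right by mv samples (toy motion model)."""
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def mctf_forward(a, b, mv):
    """One temporal lifting step on the frame pair (a, b)."""
    # Predict: high-pass frame = b minus the motion-compensated a.
    h = [bi - pi for bi, pi in zip(b, shift(a, mv))]
    # Update: low-pass frame = a plus half of h, mapped back along the motion.
    l = [ai + ui // 2 for ai, ui in zip(a, shift(h, -mv))]
    return l, h


def mctf_inverse(l, h, mv):
    a = [li - ui // 2 for li, ui in zip(l, shift(h, -mv))]
    b = [hi + pi for hi, pi in zip(h, shift(a, mv))]
    return a, b


a = [0, 0, 9, 9, 0, 0, 0, 0]
b = shift(a, 2)                      # the object moved right by two samples

l_mc, h_mc = mctf_forward(a, b, mv=2)   # motion-compensated
l_no, h_no = mctf_forward(a, b, mv=0)   # no motion compensation
print(h_mc, h_no)
```

With the correct motion vector the high-pass frame carries no energy, while without compensation it contains the full frame difference. Dropping the high-pass frames halves the frame rate, and both variants remain perfectly invertible.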
MCTF (figure: temporal decomposition of the L0 input frames into H1/L1, H2/L2, and H3/L3 subbands over three levels)
Hybrid temporal and spatial scalability (figure: video sequence; 1st temporal level: H; 2nd temporal level: LH; 3rd temporal level: LLH, LLL)
Quality Scalability
• Wavelet-based image compression schemes provide high R-D performance with limited computational complexity
• They do not interfere with spatial scalability requirements
• High degree of quality scalability
  • Truncating the coded bitstream at arbitrary points
• Most techniques are inspired by the zero-tree idea
  • Embedded Zero Tree Wavelet (EZTW) by Shapiro
  • SPIHT, a reformulation of EZTW by Said and Pearlman
  • Embedded Zero Block Coding (EZBC), with higher performance
• Embedded Block Coding with Optimized Truncation (EBCOT)
  • Does not use the zero-tree idea
  • Adopted in JPEG2000
  • Combines layered block coding, block-based R-D optimization, and context-based arithmetic coding
  • Good scalability and high coding efficiency
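The idea of truncating an embedded bitstream can be sketched with plain bit-plane coding of integer coefficients. This is only the underlying principle: it is not EZTW, SPIHT, EZBC, or EBCOT, it has no entropy coding, and the function names and sample coefficients are invented for the example.

```python
def encode_bitplanes(coeffs, num_planes):
    """Return the signs and the magnitude bit-planes, most significant first."""
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes):
    """Reconstruct from however many leading bit-planes were received."""
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        p = num_planes - 1 - k          # bit position carried by this plane
        mags = [m | (bit << p) for m, bit in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]


coeffs = [45, -18, 7, 0, -3, 30]
signs, planes = encode_bitplanes(coeffs, 6)

errors = []
for kept in (2, 4, 6):
    approx = decode_bitplanes(signs, planes[:kept], 6)
    errors.append(max(abs(a - c) for a, c in zip(approx, coeffs)))
    print(kept, approx)
```

Every prefix of the plane list is decodable, and each extra plane can only refine the magnitudes, so the reconstruction error shrinks as more of the stream is kept.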
WSVC Notation
• xS(n) (xT(m)): the original signal undergoes an n-level (m-level) multi-resolution spatial (temporal) transform S(n) (T(m))
• The spatially transformed signal consists of a set of subbands
• A decoded version of the original signal x is identified by its temporal resolution k, its spatial resolution l, and its reduced quality rate
Basic WSVC Architectures
• T+2D
• 2D+T
• Adaptive Architectures
• Multiscale Pyramids
Basic WSVC Architectures
• T+2D
  • The temporal transform is applied before the spatial one
  • Guarantees critically sampled subbands
  • Low spatial scalability performance
  • Full-resolution motion vectors
Basic WSVC Architectures
• 2D+T
  • The spatial transform is applied before the temporal one
  • Often called In-band MCTF (IBMCTF)
  • Estimation of mv(l,k) is made independently on each spatial level
    • Leads to a structurally scalable motion representation
  • Spatial and temporal scalability are more decoupled
  • Lower coding efficiency, especially at higher temporal resolutions
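One way to see that the two orderings differ only because of motion compensation: when the temporal filter uses no motion at all, the spatial and temporal transforms act on different axes and commute. The sketch below checks this on 1-D "frames" with an unnormalized Haar transform; both simplifications are assumptions made for the example.

```python
def haar(v):
    """One-level unnormalized Haar transform: low half followed by high half."""
    half = len(v) // 2
    return ([(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)] +
            [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)])


def spatial(frames):
    """Apply the transform inside every frame."""
    return [haar(f) for f in frames]


def temporal(frames):
    """Apply the transform along the frame axis, with no motion compensation."""
    columns = [haar(list(col)) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]


frames = [[1.0, 3.0, 5.0, 7.0],
          [2.0, 2.0, 6.0, 6.0],
          [0.0, 4.0, 4.0, 8.0],
          [1.0, 1.0, 9.0, 3.0]]

t_plus_2d = spatial(temporal(frames))      # T+2D ordering
two_d_plus_t = temporal(spatial(frames))   # 2D+T ordering
print(t_plus_2d == two_d_plus_t)
```

Once the temporal step follows motion trajectories, this equality no longer holds in general, because a critically sampled DWT is not shift-invariant. That is why motion estimated at full resolution (T+2D) and motion estimated per spatial subband (2D+T) lead to different trade-offs.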
Basic WSVC Architectures
• Adaptive Architectures
  • Combine the positive aspects of the T+2D and 2D+T structures
  • Adaptive spatio-temporal decompositions optimized with respect to suitable criteria
  • Content-adaptive selection between 2D+T and T+2D improves coding performance
• Multiscale Pyramids
  • Also called 2D+T+2D
  • Compensate for the drawbacks of both T+2D and 2D+T
  • Use ISP to exploit the redundancy of the multiscale representation
  • Disadvantage: over-complete transforms, which result in a full-size residual image
Pyramidal WSVC with pyramidal decomposition before MCTF
Pyramidal WSVC with pyramidal decomposition after MCTF
Spatio-Temporal Prediction (STP)-Tool Scheme
• Promising WSVC architecture that presents some similarities to the SVC standard
• Adopted as a possible configuration of the MPEG VidWav (Video Wavelet) reference software
• Based on a multiscale pyramid, but differs in the ISP mechanism
STP-Tool Scheme
Advantages of STP-Tool Scheme
• Prediction is performed between two signals that are likely to bear similar patterns in the spatio-temporal domain
• No need to perform any interpolation
• Instead of full-resolution residuals, the spatio-temporal subbands and residues are produced for the different resolutions
WSVC Reference Platform in MPEG
• In 2004, ISO/MPEG set up a formal evaluation of SVC
  • The performance of the H.264/AVC pyramid appeared the most competitive
• Later, MPEG and ITU-T jointly adopted the JSVM (Joint Scalable Video Model)
  • As scalable reference model and software platform
• Microsoft Research Asia (MRA) was selected as the reference for wavelet technologies
• The MPEG WSVC reference model and software (RM/RS) is referred to as VidWav (Video Wavelet)
VidWav: General framework
VidWav: Main modules
• Spatial Transform
  • With pre- and post-spatial decomposition, different SVC configurations (T+2D, 2D+T, STP-Tool) can be implemented
• Temporal Transform
  • Framewise MC wavelet transform on a lifting structure
• ME and Coding
  • MB-based motion model with H.264/AVC-like partition patterns
  • Forward, backward, or bidirectional motion model for each block
• Entropy coding
  • A 3D extension of the EBCOT algorithm is used for entropy coding of the resulting coefficients
VidWav STP-Tool Configuration
Comparison between WSVC and SVC
• Single layer coding tools
• Scalable coding tools
Comparison between WSVC and SVC
• Single layer coding tools
  • VidWav uses a block-based motion model
    • Block mode types are similar to those of JSVM, but VidWav supports no Intra mode
  • JSVM operates in a local manner
    • Divides frames into MBs and treats each MB separately in all coding phases
  • VidWav operates with a global approach
    • The spatio-temporal transform is applied to a group of frames
  • Unlike JSVM, single layer VidWav only supports open-loop encoding/decoding
    • JSVM has an in-loop deblocking filter because of its closed-loop encoding
Comparison between WSVC and SVC
• Scalable coding tools
  • Spatial scalability in JSVM compared to VidWav in the STP-Tool configuration
    • Block-based versus frame-based
  • Similar to JSVM, STP-Tool can use both closed-loop and open-loop inter-layer encoding
Objective and Visual Result Comparisons
• A fair objective comparison is impaired because:
  • Visually, the reference sequences generated by wavelet filters are more detailed, but sometimes show spatial aliasing effects due to the different down-sampling filters
  • Depending on the spatial down-sampling filter used, the decoded sequences at reduced spatial resolution differ even at full quality
• PSNR is used as the performance criterion at intermediate spatio-temporal resolution levels
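For reference, PSNR between a reference frame and a decoded frame can be computed as sketched below. The 8-bit peak value of 255 and the tiny sample frames are assumptions for the example; the slides do not state the bit depth used in the tests.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    sq_errors = [(r - d) ** 2
                 for ref_row, dec_row in zip(reference, decoded)
                 for r, d in zip(ref_row, dec_row)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf            # identical frames
    return 10 * math.log10(peak ** 2 / mse)


ref = [[100, 100], [100, 100]]
dec = [[101, 99], [100, 102]]
print(round(psnr(ref, dec), 2))
```

The value depends on the chosen reference. At a reduced spatial resolution that reference is itself produced by a down-sampling filter, so PSNR figures measured against differently filtered references are not directly comparable.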
Objective Comparison Results
Subjective Comparison Results
• Visual tests conducted by ISO/MPEG included 12 expert viewers
• On average, JSVM 4.0 is superior
  • Marginal gains in SNR conditions
  • Superior gains in combined scalability settings
Applications of WSVC
• Based on a series of experiments:
  • DCT-based technologies outperform wavelet-based ones for relatively smooth signals, and vice versa
• Eligible applications for WSVC are those that produce or use High Definition/High Resolution content
Home distribution of HD video using WSVC
New Application Potentials for WSVC
• HD material storage and distribution
• Use of non-dyadic wavelet decomposition to support multiple HD formats, to be used in video surveillance and mobile video
• Efficient similarity search in large video databases
• Multiple description coding
• Space-variant resolution adaptive decoding
  • Only a certain region of the image is decoded at high resolution
Conclusion
• Brief review of the different tools used in WSVC
• WSVC architectures were introduced
• Comparison of WSVC with SVC
• Potential applications for WSVC
Any questions?
• Thank you!