
Scalable Video Coding with Wavelet-Based Approaches

Scalable Video Coding with Wavelet-Based Approaches. Presenter: Mahin Torki. Paper Title: “State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches”. Authors: Nicola Adami, Alberto Signoroni, Riccardo Leonardi


Presentation Transcript


  1. Scalable Video Coding with Wavelet-Based Approaches Presenter: Mahin Torki ENSC 820 - Simon Fraser University

  2. Paper Title: “State-of-the-Art and Trends in Scalable Video Compression With Wavelet-Based Approaches” Authors: Nicola Adami, Alberto Signoroni, Riccardo Leonardi. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, September 2007

  3. Outline • Motivation • Wavelet SVC (WSVC) Fundamentals • Coding Architectures for WSVC Systems • WSVC Reference Platform in MPEG • Comparison between WSVC and SVC • Conclusion

  4. Motivation • Several working points, corresponding to different quality, picture size and frame rate, are carried in a single bitstream • Two types of SVC systems: • Hybrid schemes (used in all MPEG-x or H.26x standards) • Spatio-temporal wavelet technologies • Main differences between SVC and transcoding systems: • SVC has low complexity • SVC does not require coding/decoding operations • SVC needs only a simple parsing operation on the coded bitstream

  5. Motivation • Encode once • Decode according to the required QoS or the available hardware resources

  6. A Typical SVC System

  7. A possible structure of an SVC bitstream

  8. Extracting a scaled bitstream
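
The parsing-only extraction described in the Motivation slides can be sketched as follows. This is a minimal illustration, not the bitstream syntax of any real codec: the `Packet` layout, its field names and the `extract` function are all hypothetical, and the sketch assumes every packet is tagged with its spatial, temporal and quality level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical bitstream unit tagged with its scalability levels."""
    spatial: int    # spatial resolution level (0 = coarsest)
    temporal: int   # temporal level (0 = lowest frame rate)
    quality: int    # quality layer (0 = base quality)
    payload: bytes


def extract(bitstream, max_spatial, max_temporal, max_quality):
    """Keep only the packets at or below the requested working point.

    No decoding or re-encoding takes place: packets are selected by
    inspecting their level tags only.
    """
    return [p for p in bitstream
            if p.spatial <= max_spatial
            and p.temporal <= max_temporal
            and p.quality <= max_quality]


# A toy stream with 3 spatial x 3 temporal x 2 quality levels.
full = [Packet(s, t, q, bytes([s, t, q]))
        for s in range(3) for t in range(3) for q in range(2)]

# Working point: half spatial resolution, full frame rate, base quality.
scaled = extract(full, max_spatial=1, max_temporal=2, max_quality=0)
```

The extractor never touches `payload`, which is the sense in which scaling is a simple parsing operation rather than a transcoding step.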

  9. Tools Enabling Scalability • A multi-resolution signal decomposition inherently enables low-to-high resolution scalability by representing the signal in a transformed domain

  10. Tools Enabling Scalability • Inter-Scale Prediction (ISP) • The simplest way to represent a signal at two resolutions • The signal x is represented as a coarse-resolution signal c and a detail signal d • Not critically sampled • Laplacian Pyramid • An iterated version of ISP • Results in a coarsest-resolution signal c and a set of details
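
A minimal 1D sketch of ISP and its iteration into a Laplacian pyramid. The pair-averaging down-sampler and sample-and-hold up-sampler are assumptions chosen for simplicity (the slides do not specify the filters), and the signal length is assumed to be divisible by 2 at every level.

```python
def downsample(x):
    # Average adjacent pairs: a crude low-pass filter plus decimation by 2.
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def upsample(c):
    # Sample-and-hold interpolation back to twice the length.
    return [v for v in c for _ in range(2)]


def isp_analysis(x):
    """Split x into a coarse signal c and a full-length detail signal d."""
    c = downsample(x)
    d = [a - b for a, b in zip(x, upsample(c))]
    return c, d


def isp_synthesis(c, d):
    """Rebuild x from the coarse signal and the prediction residual."""
    return [p + r for p, r in zip(upsample(c), d)]


def laplacian_pyramid(x, levels):
    """Iterate ISP on the coarse signal, collecting one detail per level."""
    details = []
    c = x
    for _ in range(levels):
        c, d = isp_analysis(c)
        details.append(d)
    return c, details


def reconstruct(c, details):
    for d in reversed(details):
        c = isp_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
c, details = laplacian_pyramid(x, 2)
total_samples = len(c) + sum(len(d) for d in details)
```

Here `total_samples` exceeds `len(x)` because each detail signal has the length of the level it was predicted at, which is what "not critically sampled" refers to.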

  11. Laplacian Pyramid

  12. Spatial Scalability • Discrete Wavelet Transform (DWT) • Projects the signal onto a set of multi-resolution (MR) subspaces • Critically sampled • Generates a coarse signal and a set of details • For multi-dimensional signals like images • Separable pyramidal and DWT decompositions • Separate filtering on rows and columns

  13. DWT Filter Bank • The DWT is implemented by a two-channel filter bank iterated on a dyadic tree path
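
The iterated two-channel filter bank can be sketched in 1D as below. The Haar filter pair is an assumption made to keep the example short; the slides do not name a specific wavelet. The signal length is assumed to be divisible by 2 at every level.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x):
    """One two-channel stage: low-pass and high-pass outputs, each decimated by 2."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of one stage: interleave the reconstructed sample pairs."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x


def dwt(x, levels):
    """Iterate the filter bank on the low-pass branch only (dyadic tree)."""
    details = []
    c = list(x)
    for _ in range(levels):
        c, d = haar_analysis(c)
        details.append(d)
    return c, details


def idwt(c, details):
    for d in reversed(details):
        c = haar_synthesis(c, d)
    return c


x = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
c, details = dwt(x, 3)
n_coeffs = len(c) + sum(len(d) for d in details)
x_rec = idwt(c, details)
```

The number of coefficients equals the number of input samples, so the representation is critically sampled; decoding with only `c` and the coarser detail bands yields a lower-resolution signal.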

  14. Bit-plane Coder 2D-DWT Transform • 2D Wavelet decomposition inherently provides spatial scalability

  15. Spatial Scalability • Lifting scheme • Alternative spatial-domain processing introduced by Sweldens • Generates a critically sampled (c,d) representation of the signal x

  16. Lifting Scheme • Signal x is split into two polyphase components, the even and odd samples (each at half the original resolution) • The two components are correlated • A prediction can be performed • The subsampled signal could contain many aliased components, so it should be updated • Perfect reconstruction is guaranteed • Every DWT can be factorized into a chain of lifting steps • Has a fundamental role in MC Temporal Filtering (MCTF)
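
The split, predict and update steps can be sketched as below. The linear predictor (average of the two neighbouring even samples) and the quarter-weight update are assumptions in the style of a 5/3 wavelet, used only for illustration; boundaries are handled by repeating the edge sample, and the input length is assumed to be even.

```python
def lifting_forward(x):
    """One lifting stage: split, predict, update. Returns (c, d)."""
    even, odd = x[0::2], x[1::2]          # split into polyphase components
    n = len(odd)
    # Predict: estimate each odd sample from its two even neighbours.
    d = [odd[i] - (even[i] + even[min(i + 1, n - 1)]) / 2 for i in range(n)]
    # Update: correct the even samples with the details to limit aliasing.
    c = [even[i] + (d[max(i - 1, 0)] + d[i]) / 4 for i in range(n)]
    return c, d


def lifting_inverse(c, d):
    """Undo the steps in reverse order with the signs flipped."""
    n = len(d)
    even = [c[i] - (d[max(i - 1, 0)] + d[i]) / 4 for i in range(n)]
    odd = [d[i] + (even[i] + even[min(i + 1, n - 1)]) / 2 for i in range(n)]
    x = []
    for e, o in zip(even, odd):
        x.extend([e, o])
    return x


# A linear ramp: the predictor is exact everywhere except at the boundary.
x = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
c, d = lifting_forward(x)
x_rec = lifting_inverse(c, d)
```

Perfect reconstruction does not depend on the particular predict and update operators: the inverse subtracts exactly what the forward pass added, which is why the operators can later be made motion-adaptive.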

  17. Temporal Scalability • Motion-Compensated Temporal Filtering (MCTF) • A key tool enabling temporal scalability while exploiting temporal correlation

  18. MCTF implementation by Lifting steps • Index i now has a temporal meaning • The P and U operators can be guided by motion information

  19. MCTF implementation by Lifting steps • ME/MC is implemented according to a certain motion model • ME/MC usually generates a set of motion vector fields mv(l,k) • mv(l,k) is an estimate of the trajectory of the blocks between the temporal frames, at spatial level l, involved in the k-th MCTF temporal decomposition level • With the lifting structure, non-dyadic temporal decomposition is possible • Temporal scalability factors different from a power of two
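
A minimal sketch of one MCTF level on a pair of frames, using Haar-style temporal lifting. Several simplifications are assumed: frames are 1D lists of pixels, the motion model is a single global integer displacement `mv` per frame pair (real systems use block-based vector fields), and motion compensation is a circular shift.

```python
def shift(frame, mv):
    # Circular shift by mv pixels: a stand-in for motion compensation.
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def mctf_analysis(a, b, mv):
    """Temporal lifting on frames a and b along the motion trajectory."""
    # Predict: high-pass frame is b minus the motion-compensated a.
    h = [bv - pv for bv, pv in zip(b, shift(a, mv))]
    # Update: low-pass frame is a plus half of h mapped back along the motion.
    l = [av + 0.5 * uv for av, uv in zip(a, shift(h, -mv))]
    return l, h


def mctf_synthesis(l, h, mv):
    """Invert the update first, then the prediction."""
    a = [lv - 0.5 * uv for lv, uv in zip(l, shift(h, -mv))]
    b = [hv + pv for hv, pv in zip(h, shift(a, mv))]
    return a, b


a = [0.0, 0.0, 9.0, 9.0, 0.0, 0.0, 0.0, 0.0]
b = shift(a, 2)                       # the object moved 2 pixels to the right

l_mc, h_mc = mctf_analysis(a, b, mv=2)    # correct motion
l_no, h_no = mctf_analysis(a, b, mv=0)    # motion ignored
energy_mc = sum(v * v for v in h_mc)
energy_no = sum(v * v for v in h_no)
a_rec, b_rec = mctf_synthesis(l_no, h_no, mv=0)
```

With the correct motion the high-pass frame carries no energy, while ignoring motion leaves a large residual. Reconstruction is exact in both cases, and a decoder that keeps only the low-pass frames obtains the sequence at half the frame rate.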

  20. Some benefits of MCTF • By exploiting the local adaptability of the P and U operators and using the mv(l,k) information, MCTF can: • Handle occlusion and uncovered area problems • Reduce blocking effects by considering adjacent blocks • When fractional-pixel MVs are provided, the lifting structure can be modified to implement the necessary pixel interpolation

  21. MCTF [figure: temporal decomposition of the level-0 frames L0 into the subbands H1, L1, H2, L2, H3, L3]

  22. Hybrid temporal and spatial scalability [figure: video sequence; 1st temporal level: H; 2nd temporal level: LH; 3rd temporal level: LLH, LLL]

  23. Quality Scalability • Wavelet-based image compression schemes provide high R-D performance with limited computational complexity • They do not interfere with spatial scalability requirements • High degree of quality scalability • Obtained by truncating the coded bitstream at arbitrary points • Most techniques are inspired by the zerotree idea • Embedded Zerotree Wavelet (EZW) by Shapiro • SPIHT, a reformulation of EZW by Said and Pearlman • Embedded Zero Block Coding (EZBC), with higher performance • Embedded Block Coding with Optimized Truncation (EBCOT) • Does not use the zerotree idea • Adopted in JPEG2000 • Combines layered block coding, block-based R-D optimization, and context-based arithmetic coding • Good scalability and high coding efficiency
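
The idea of truncating an embedded bitstream can be illustrated with plain bit-plane coding of integer coefficients. This sketch is not EZW, SPIHT, EZBC or EBCOT: it has no zerotree or block structure and no entropy coding, and the coefficient values are made up. It only shows that sending the most significant bit-planes first lets the decoder stop at any plane and still reconstruct an approximation.

```python
def bitplanes(coeffs, nplanes):
    """Bit-planes of the coefficient magnitudes, most significant first."""
    return [[(abs(c) >> p) & 1 for c in coeffs]
            for p in range(nplanes - 1, -1, -1)]


def decode(planes, signs, nplanes):
    """Rebuild coefficients from however many planes were received."""
    mags = [0] * len(signs)
    for k, plane in enumerate(planes):
        p = nplanes - 1 - k               # bit position of the k-th plane
        mags = [m | (b << p) for m, b in zip(mags, plane)]
    return [s * m for s, m in zip(signs, mags)]


NPLANES = 6                               # enough for magnitudes below 64
coeffs = [37, -20, 5, -3, 1, 0, 0, -1]    # illustrative wavelet coefficients
signs = [-1 if c < 0 else 1 for c in coeffs]
planes = bitplanes(coeffs, NPLANES)

# Squared error after receiving the first k planes, k = 0..NPLANES.
errors = []
for k in range(NPLANES + 1):
    approx = decode(planes[:k], signs, NPLANES)
    errors.append(sum((c - a) ** 2 for c, a in zip(coeffs, approx)))
```

The error never increases as more planes arrive and reaches zero once all planes are decoded, which is the behaviour that makes arbitrary truncation points usable for quality scalability.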

  24. WSVC Notation • xS(n) (xT(m)): the original signal x after an n-level (m-level) multi-resolution spatial (temporal) transform S(n) (T(m)) • The spatially transformed signal consists of the subband set: • is the decoded version of the original signal x, at a given temporal resolution k and spatial resolution l, at a reduced quality rate

  25. Basic WSVC Architectures • T+2D • 2D+T • Adaptive Architectures • Multiscale Pyramids

  26. Basic WSVC Architectures • T+2D • Temporal transform is applied before spatial • Guarantees critically sampled subbands • Low spatial scalability performance • Full resolution motion vectors

  27. Basic WSVC Architectures • 2D+T • Spatial transform is applied before temporal • Often called In-band MCTF (IBMCTF) • Estimation of mv(l,k) is made independently on each spatial level • Leading to a structurally scalable motion representation • Spatial and temporal scalability are more decoupled • Lower coding efficiency especially at higher temporal resolutions

  28. Basic WSVC Architectures • Adaptive Architectures • Combine the positive aspects of T+2D and 2D+T structures • Adaptive spatio-temporal decompositions optimized with respect to suitable criteria • Content-adaptive 2D+T versus T+2D improves coding performance • Multiscale Pyramids • Also called 2D+T+2D • Compensates the T+2D versus 2D+T drawbacks • Uses ISP to exploit the multiscale representation redundancy • Disadvantage: over-complete transforms, which result in a full size residual image

  29. Pyramidal WSVC with pyramidal decomposition before MCTF

  30. Pyramidal WSVC with pyramidal decomposition after MCTF

  31. Spatio-Temporal prediction (STP)-Tool Scheme • Promising WSVC architecture that presents some similarities to the SVC standard • Adopted as a possible configuration of the MPEG VidWav (Video Wavelet) reference software • Based on a multiscale pyramid but differs in the ISP mechanism

  32. STP-Tool Scheme

  33. Advantages of STP-Tool Scheme • Prediction is performed between two signals which are likely to bear similar patterns in the spatio-temporal domain • No need to perform any interpolation • Instead of full resolution residuals, the spatio-temporal subbands and residues are produced for different resolutions

  34. WSVC Reference Platform in MPEG • In 2004, ISO/MPEG set up a formal evaluation of SVC • Performance of the H.264/AVC pyramid appeared the most competitive • Later, ISO/IEC MPEG and ITU-T jointly adopted the JSVM (Joint Scalable Video Model) • As scalable reference model and software platform • Microsoft Research Asia (MRA) was selected as the reference for wavelet technologies • The MPEG WSVC reference model and software (RM/RS) is indicated as VidWav (Video Wavelet)

  35. VidWav: General framework

  36. VidWav: Main modules • Spatial Transform • With pre- and post-spatial decomposition, different SVC configurations (T+2D, 2D+T, STP-Tool) can be implemented • Temporal Transform • Framewise MC wavelet transform on a lifting structure • ME and Coding • MB-based motion model with H.264/AVC-like partition patterns • Forward, backward or bidirectional motion model for each block • Entropy coding • A 3D extension of the EBCOT algorithm is used for entropy coding of the resulting coefficients

  37. VidWav STP-Tool Configuration

  38. Comparison between WSVC and SVC • Single layer coding tools • Scalable coding tools

  39. Comparison between WSVC and SVC • Single layer coding tools • VidWav uses a block-based motion model • Block mode types are similar to JSVM but no Intra-mode is supported by VidWav • JSVM operates in a local manner • Divides frames into MB and treats MB separately in all coding phases • VidWav operates with a global approach • Spatio-temporal transform applied to a group of frames • Unlike JSVM, single layer VidWav only supports open loop encoding/decoding • In-loop deblocking filter in JSVM due to closed loop encoding

  40. Comparison between WSVC and SVC • Scalable coding tools • Spatial scalability in JSVM compared to VidWav in STP-Tool configuration • Block-based versus frame-based • Similar to JSVM, STP-Tool can use both closed and open loop inter-layer encoding

  41. Objective and Visual Result Comparisons • A fair objective comparison is impaired because: • Visually, the reference sequences generated by wavelet filters are more detailed, but sometimes show spatial aliasing effects due to the different down-sampling filters • Depending on the spatial down-sampling filter used, the decoded sequences at reduced spatial resolution differ even at full quality • PSNR is used as the performance criterion at intermediate spatio-temporal resolution levels
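
A small sketch of why the choice of down-sampling filter affects PSNR at reduced resolution. The two down-samplers (pair averaging and plain decimation), the 8-bit peak value and the sample values are assumptions for illustration only; they are not the filters used by JSVM or VidWav.

```python
import math


def psnr(ref, dec, peak=255):
    """Peak signal-to-noise ratio in dB between a reference and a decoded signal."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dec)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


full = [10, 20, 30, 40, 50, 60, 70, 80]

# Two candidate half-resolution references built from the same full signal.
ref_avg = [(full[2 * i] + full[2 * i + 1]) / 2 for i in range(len(full) // 2)]
ref_dec = full[0::2]

# A decoder that reproduces the averaged reference without any loss.
decoded = list(ref_avg)

psnr_vs_avg = psnr(ref_avg, decoded)
psnr_vs_dec = psnr(ref_dec, decoded)
```

The same decoded signal is lossless against one reference and shows a finite PSNR against the other, so PSNR figures at intermediate resolutions are only comparable when both codecs are measured against the same reference.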

  42. Objective Comparison Results

  43. Subjective Comparison Results • Visual tests conducted by ISO/MPEG included 12 expert viewers • On average JSVM 4.0 is superior • Marginal gains in SNR conditions • Superior gains in combined scalability settings

  44. Applications of WSVC • Based on a series of experiments: • DCT-based technologies outperform wavelet-based ones for relatively smooth signals and vice versa • Eligible applications for WSVC are those that produce or use High Definition/High Resolution content

  45. Home distribution of HD video using WSVC

  46. New Application Potentials for WSVC • HD material storage and distribution • Use of non-dyadic wavelet decomposition to support multiple HD formats, for use in video surveillance and mobile video • Efficient similarity search in large video databases • Multiple description coding • Space-variant resolution-adaptive decoding • Only a certain region of the image is decoded at high resolution

  47. Conclusion • Brief review of different tools used in WSVC • WSVC architectures are introduced • Comparison of WSVC with SVC • Potential applications for WSVC

  48. Any questions? • Thank you!
