Lecture 3: Multimedia Networks Lecture 4: Audio/Video Compression


  1. Audio/Video Compression 4 • Lecture 3: Multimedia Networks • Lecture 4: Audio/Video Compression • Image & Video Compression Standards • Speech & Audio Compression Standards • Wavelet Transform & its Application in Compression

  2. Introduction to Audio/Video Compression 4 • With today’s technology, only compression makes storage/transmission of digital audio/video streams possible • Compression exploits redundancy, based on features of human perception

  3. Introduction to Audio/Video Compression 4 • Spatial redundancy: Values of neighboring pixels are strongly correlated in natural images • Temporal redundancy: Adjacent frames in a video sequence often show very little change; in audio, a strong signal in a given time segment can mask lower-level distortion in the preceding and following segments

  4. Introduction to Audio/Video Compression 4 • Spectral redundancy: In multispectral images, the spectral values of the same pixel are correlated across spectral bands; in audio, a signal can completely mask a sufficiently weaker signal in its frequency vicinity • Redundancy across scale: Distinct image features are invariant under scaling • Redundancy in stereo: Correlations between stereo images/audio channels

  5. Introduction to Audio/Video Compression 4 • Spatial/spectral redundancies: Transform Coding • Temporal redundancy: DPCM (differential pulse code modulation), motion estimation/motion compensation • First compression methods: lossless • Huffman coding • Ziv-Lempel coding • Arithmetic coding • Inadequate for transmission media of low bandwidth (e.g., ISDN) or for devices of low data throughput (e.g., CD-ROM)
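The DPCM idea named on this slide can be sketched in a few lines. This is a minimal illustration only, assuming the simplest possible predictor (the previous sample) and no quantizer, so the scheme stays lossless; the function names `dpcm_encode` and `dpcm_decode` are hypothetical and not from the lecture.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    residuals = []
    previous = 0  # assumed initial predictor state
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the differences."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


signal = [100, 102, 103, 103, 101, 100]
residuals = dpcm_encode(signal)
print(residuals)               # [100, 2, 1, 0, -2, -1]
print(dpcm_decode(residuals))  # recovers the original signal
```

For a slowly varying signal the residuals cluster near zero, which is what makes them cheaper to entropy-code than the raw samples.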

  6. Introduction to Audio/Video Compression 4 • Lossless vs. lossy compression • Intraframe vs. interframe compression • Symmetrical vs. asymmetrical compression • Real-time: Encoding-decoding delay ≤ 50 ms • Scalable: Frames coded at different resolutions or quality levels • Recent advanced compression methods reduce bandwidth requirements enormously without reducing perceived quality

  7. Introduction to Audio/Video Compression 4 • Entropy coding: Arithmetic coding, Huffman coding, Run-length coding • Source coding: DPCM, DCT, DWT, motion-estimation/motion compensation • Hybrid Coding: H.261, H.263, H.263+, JPEG, MPEG1, MPEG2, MPEG4, Perceptual Audio Coder • Hybrid coding = source coding + entropy coding • [Figure: Uncompressed data → Preprocessing → Source coding → Entropy coding → Compressed data]
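Of the entropy-coding tools listed here, run-length coding is the easiest to show in code. The sketch below is illustrative only: it encodes runs as (value, count) pairs, which is one of several possible conventions, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(symbols):
    """Collapse each run of equal symbols into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(symbols)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the symbol sequence."""
    return [value for value, count in pairs for _ in range(count)]


coefficients = [0, 0, 0, 5, 5, 1]
pairs = rle_encode(coefficients)
print(pairs)              # [(0, 3), (5, 2), (1, 1)]
print(rle_decode(pairs))  # recovers the original sequence
```

Run-length coding pays off after source coding, when long runs of identical values (typically zeros) are common.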

  8. Wavelet Theory 4 • A unified framework for analysis of non-stationary signals • Wavelet transform (WT): Alternative to classical Short-Time Fourier Transform (STFT) or Gabor Transform • By contrast to STFT, WT does “constant-Q” or relative bandwidth frequency analysis: short windows at high frequencies and long windows at low frequencies

  9. Short-Time Fourier Transform 4 • Fourier Transform (FT): $X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt$ • X(f): Projection of signal x(t) along exp(j2πft) • Shows how signal energy is distributed over frequencies

  10. Short-Time Fourier Transform 4 • To know the local energy distribution, the STFT is introduced: $\mathrm{STFT}(\tau, f) = \int x(t)\, g^{*}(t-\tau)\, e^{-j2\pi f t}\, dt$ • g(t): A window of finite support • Shows how signal energy is distributed over frequencies around local time τ
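A numerical sketch of the STFT at a single (τ, f) point may help here. It assumes the usual definition STFT(τ, f) = ∫ x(t) g*(t − τ) exp(−j2πft) dt and approximates the integral by a Riemann sum; the Gaussian window, its width, and the test signal are illustrative choices, not taken from the slides.

```python
import numpy as np


def gaussian_window(t, width=0.05):
    """Illustrative window g(t); effectively of finite support."""
    return np.exp(-0.5 * (t / width) ** 2)


def stft_at(x, t, g, tau, f):
    """Riemann-sum approximation of STFT(tau, f) on the time grid t."""
    dt = t[1] - t[0]
    window = np.conj(g(t - tau))
    return np.sum(x * window * np.exp(-2j * np.pi * f * t)) * dt


# Test signal: 10 Hz during the first second, 40 Hz during the second.
t = np.arange(0.0, 2.0, 1e-3)
x = np.where(t < 1.0, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

early_10 = abs(stft_at(x, t, gaussian_window, 0.5, 10.0))
early_40 = abs(stft_at(x, t, gaussian_window, 0.5, 40.0))
late_10 = abs(stft_at(x, t, gaussian_window, 1.5, 10.0))
late_40 = abs(stft_at(x, t, gaussian_window, 1.5, 40.0))

print(early_10 > early_40)  # energy near t = 0.5 s sits at 10 Hz
print(late_40 > late_10)    # energy near t = 1.5 s sits at 40 Hz
```

A plain Fourier transform of the same signal would show both frequencies but not when each one occurs.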

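  11. Short-Time Fourier Transform 4 • Given f, STFT(τ, f): Output of a bandpass filter having the window function (modulated to f) as its impulse response • Resolution in time/frequency by window g(t): $\Delta t^2 = \frac{\int t^2 |g(t)|^2\, dt}{\int |g(t)|^2\, dt}$, $\Delta f^2 = \frac{\int f^2 |G(f)|^2\, df}{\int |G(f)|^2\, df}$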

  12. Short-Time Fourier Transform 4 • Uncertainty Principle (Heisenberg): for RMS widths Δt and Δf, $\Delta t \cdot \Delta f \ge \frac{1}{4\pi}$ • Once window g(t) is chosen, the resolution in time/frequency is fixed

  13. Continuous Wavelet Transform (CWT) 4 • If Δf/f can be kept constant, resolution in frequency becomes arbitrarily good at low frequencies while resolution in time becomes arbitrarily good at high frequencies • CWT follows the above idea but all impulse responses of the filter bank are defined as scaled versions of the same prototype or basic wavelet h(t)

  14. Continuous Wavelet Transform (CWT) 4 • Let $h_a(t) = \frac{1}{\sqrt{|a|}}\, h\!\left(\frac{t}{a}\right)$ • h(t): Any bandpass function

  15. Continuous Wavelet Transform (CWT) 4 • FT of ha(t): $H_a(f) = \sqrt{|a|}\, H(af)$

  16. Continuous Wavelet Transform (CWT) 4 • Resolution in frequency of ha(t):

  17. Continuous Wavelet Transform (CWT) 4 • Given a fixed frequency f0, if scale a is chosen as
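The formulas on slides 14 to 17 were images and are missing from this transcript. As a hedged sketch of the constant-Q argument they lead to, assume the common normalization $h_a(t) = a^{-1/2} h(t/a)$ with $a > 0$, and assume the prototype $h(t)$ has center frequency $f_0$ and bandwidth $\Delta f$ (the symbol $\Delta f_a$ for the bandwidth of $h_a$ is introduced here for illustration):

```latex
\begin{align*}
h_a(t) &= \frac{1}{\sqrt{a}}\, h\!\left(\frac{t}{a}\right)
  \;\Longrightarrow\; H_a(f) = \sqrt{a}\, H(af), \\
\text{center frequency of } h_a &= \frac{f_0}{a}, \qquad
  \Delta f_a = \frac{\Delta f}{a}, \\
a = \frac{f_0}{f} \;&\Longrightarrow\;
  \frac{\Delta f_a}{f} = \frac{\Delta f}{a f} = \frac{\Delta f}{f_0} = \text{constant}.
\end{align*}
```

Under these assumptions every filter in the bank has the same relative bandwidth, which is the "constant-Q" property contrasted with the STFT earlier.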

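  18. Continuous Wavelet Transform (CWT) 4 • By definition of CWT: $\mathrm{CWT}(\tau, a) = \frac{1}{\sqrt{|a|}} \int x(t)\, h^{*}\!\left(\frac{t-\tau}{a}\right) dt$, which for a > 0 equals $\sqrt{a} \int x(at)\, h^{*}\!\left(t - \frac{\tau}{a}\right) dt$ • Scale a is not linked to frequency modulation but to time-scaling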

  19. Continuous Wavelet Transform (CWT) 4 • Signal x(at) is seen through a constant-length filter centered at τ/a • The larger the scale a, the more contracted the signal x(t) becomes • The smaller the scale a, the more dilated the signal x(t) becomes • Larger scales: CWT(τ, a) provides a more global view of signal x(t) • Smaller scales: CWT(τ, a) provides a more detailed view of signal x(t)

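  20. Continuous Wavelet Transform (CWT) 4 • Define wavelet $h_{a,\tau}(t) = \frac{1}{\sqrt{|a|}}\, h\!\left(\frac{t-\tau}{a}\right)$ • CWT(τ, a) = ⟨x, h_{a,τ}⟩: Inner product or correlation between x(t) and h_{a,τ}(t) • CWT(τ, a) is called the analysis stage (of signal x(t)) at scale a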
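The inner-product view of the CWT can be checked numerically. This is a sketch under stated assumptions: the wavelet is taken as h_{a,τ}(t) = |a|^(-1/2) h((t − τ)/a), the prototype h(t) is a Gaussian-windowed complex exponential (a Morlet-like choice that is only approximately bandpass), and the integral is replaced by a Riemann sum. None of these specifics come from the slides.

```python
import numpy as np


def prototype(t, f0=1.0):
    """Illustrative prototype wavelet h(t) centered at frequency f0."""
    return np.exp(2j * np.pi * f0 * t) * np.exp(-0.5 * t ** 2)


def cwt_at(x, t, tau, a, h=prototype):
    """Approximate CWT(tau, a) as the inner product <x, h_{a,tau}>."""
    dt = t[1] - t[0]
    h_a_tau = h((t - tau) / a) / np.sqrt(abs(a))
    return np.sum(x * np.conj(h_a_tau)) * dt


t = np.arange(-10.0, 10.0, 1e-3)
x = np.cos(2 * np.pi * 4.0 * t)  # a 4 Hz tone

# Scale a = 0.25 moves the prototype's 1 Hz center frequency to 4 Hz.
matched = abs(cwt_at(x, t, tau=0.0, a=0.25))
# Scale a = 1 leaves the wavelet at 1 Hz, far from the tone.
mismatched = abs(cwt_at(x, t, tau=0.0, a=1.0))

print(matched > mismatched)
```

The small scale responds strongly because the contracted wavelet correlates with the 4 Hz tone, while the unscaled wavelet is nearly orthogonal to it.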

  21. Continuous Wavelet Transform (CWT) 4 • x(t) can be recovered from multi-scale analysis if

  22. Continuous Wavelet Transform (CWT) 4 • Energy conservation: • Signal energy distributed at scale a by: • |CWT(τ, a)|²: wavelet spectrogram, or scalogram, the distribution of signal energy in the time-scale plane (associated with area measure dτ da / a²)

  23. Continuous Wavelet Transform (CWT) 4 • Larger scales → more global view → coarser resolutions • Smaller scales → more detailed view → finer resolutions • CWT decomposition of signal over scales → signal energy distribution with various resolutions

  24. Discrete Wavelet Transform (DWT) 4 • Two methods developed independently in late 70’s and early 80’s • Subband Coding • Pyramid Coding or multiresolution signal analysis

  25. Multiresolution Pyramid 4 • Given an original sequence x(n), n ∈ Z, define a lower resolution signal: $y(n) = \sum_{k} g(k)\, x(2n-k)$, where g(n) is a halfband lowpass filter

  26. Multiresolution Pyramid 4 • An approximation of x(n) from y(n): $a(n) = \sum_{k} g'(k)\, y'(n-k)$, where y’(2n) = y(n), y’(2n+1) = 0, and g’(n) is an interpolation filter

  27. Multiresolution Pyramid 4 • If g(n) and g’(n) are perfect halfband filters, i.e., then a(n) provides a perfect halfband lowpass approximation to x(n)

  28. Multiresolution Pyramid 4 • It can be proved :

  29. Multiresolution Pyramid 4 • Let d(n) = x(n) - a(n) • Then x(n) = a(n) + d(n) • But there is redundancy between a(n) and d(n): • If x(n) uses sampling rate fs, then d(n) and y(n) use sampling rates fs and fs/2, respectively

  30. Multiresolution Pyramid 4 • Pyramid decomposition: a redundant representation • But redundancy is upper bounded by 1 + 1/2 + 1/4 + … < 2 in a one-dimensional system • [Figure: pyramid block diagram with signals x(n), y(n), d(n)]
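One pyramid stage, as described on the preceding slides, can be sketched as follows. The 3-tap lowpass filter and the linear-interpolation filter are illustrative stand-ins (they are far from perfect halfband filters), and the function names are hypothetical. Reconstruction is exact regardless of the filters, because the difference signal d(n) stores whatever the approximation a(n) misses.

```python
import numpy as np

g = np.array([0.25, 0.5, 0.25])       # assumed lowpass filter g(n)
g_interp = np.array([0.5, 1.0, 0.5])  # assumed interpolation filter g'(n)


def approximate(y, length):
    """Upsample y by two (zeros at odd indices) and filter with g'."""
    y_up = np.zeros(length)
    y_up[::2] = y  # y'(2n) = y(n), y'(2n+1) = 0
    return np.convolve(y_up, g_interp, mode="same")


def pyramid_stage(x):
    """Return the half-rate signal y(n) and full-rate difference d(n)."""
    y = np.convolve(x, g, mode="same")[::2]
    d = x - approximate(y, len(x))
    return y, d


x = np.sin(2 * np.pi * np.arange(16) / 16.0) + 0.1 * np.arange(16)
y, d = pyramid_stage(x)
x_rec = approximate(y, len(x)) + d  # x(n) = a(n) + d(n)

print(np.allclose(x_rec, x))  # True
print(len(y) + len(d))        # 24 samples for a 16-sample input
```

The 24 stored samples for a 16-sample input illustrate the redundancy of a single stage; repeating the stage on y(n) adds 1/4, 1/8, and so on, which is where the bound below 2 comes from.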

  31. Multiresolution Pyramid 4 • For perfect halfband lowpass filters g(n) and g’(n), it is clear that d(n) contains only frequencies above π/2 of x(n), and thus can also be subsampled by two without loss of information • In a pyramid, it is possible to take very good lowpass filters and derive visually pleasing coarse versions • In a subband scheme, critical sampling is accomplished at the price of a constrained filter design and a relatively poor lowpass version as a coarse approximation: undesirable if the coarse version is used for viewing in a compatible subchannel

  32. Subband Coding 4 • One stage of a pyramid decomposition → a half-rate low-resolution signal + a full-rate difference signal • Number of samples increased by 50% • If filters g(n) and g’(n) meet certain conditions, oversampling can be avoided • Subband coding, first popularized in speech compression, does not produce such redundancy

  33. Subband Coding 4 • A full-band one dimensional signal is decomposed into two subbands using an analysis filter bank • Ideally, the analysis filter bank consists of a lowpass filter and a highpass filter with nonoverlapping frequency responses and unit gain over their respective bandwidth • After filtering, lowpass and highpass signals each have only a half of original bandwidth or “frequency content”, and thus can be downsampled in half • But ideal filters are unrealizable

  34. Subband Coding 4 • By using overlapping responses, frequency gaps in subband signals can be prevented • Aliasing will be introduced when lowpass and highpass signals are downsampled in half • The aliasing effect can be eliminated to produce perfect reconstruction at synthesis stage • Lowpass and highpass signals will each have a bandwidth more than a half of original bandwidth • Quadrature Mirror Filters (QMF) for analysis/synthesis filtering

  35. Subband Coding 4 • Output signals from analysis bank after downsampling: y1(n)=(h1*x)(2n) y2(n)=(h2*x)(2n) • After quantization, y1(n) and y2(n) become ŷ1(n) and ŷ2(n) • After upsampling, they become:

  36. Subband Coding 4 • Output signals from synthesis bank: • Reconstructed signal:
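The synthesis-bank equations were images and are missing from this transcript. A sketch of the standard two-channel result follows, assuming quantization is ignored and writing $U_i(z)$ for the upsampled subband signals and $\hat{X}(z)$ for the reconstructed signal (both symbols are introduced here for illustration):

```latex
\begin{align*}
U_i(z) &= \tfrac{1}{2}\left[ H_i(z)X(z) + H_i(-z)X(-z) \right], \quad i = 1, 2, \\
\hat{X}(z) &= G_1(z)U_1(z) + G_2(z)U_2(z) \\
 &= \tfrac{1}{2}\left[ H_1(z)G_1(z) + H_2(z)G_2(z) \right] X(z)
  + \tfrac{1}{2}\left[ H_1(-z)G_1(z) + H_2(-z)G_2(z) \right] X(-z), \\
\text{and with } H_2(z) &= H_1(-z),\; G_1(z) = 2H_1(z),\; G_2(z) = -2H_1(-z): \\
\hat{X}(z) &= \left[ H_1^2(z) - H_1^2(-z) \right] X(z).
\end{align*}
```

The $X(-z)$ term is the aliasing component produced by downsampling; the last line shows that the QMF choice cancels it, leaving only a linear distortion of $X(z)$.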

  37. Subband Coding 4 • Ignoring quantization or coding effect, • If H1(z), G1(z) are ideal lowpass filters and H2(z), G2(z) are ideal highpass filters,

  38. Subband Coding 4 • Then

  39. Subband Coding 4 • Implying • Indicating is the aliasing component when filters are not ideal, which is desired to be zero

  40. Subband Coding 4 • To have perfect reconstruction in the non-ideal filtering case, the necessary and sufficient conditions are: • If H2(z)=H1(-z), G1(z)=2H1(z), G2(z)=-2H1(-z), the aliased term becomes zero and the reconstructed signal is given by:
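The filter choice H2(z)=H1(-z), G1(z)=2H1(z), G2(z)=-2H1(-z) can be tried out numerically. The sketch below assumes the two-tap filter H1(z) = (1 + z^-1)/2, picked for illustration because it makes H1²(z) − H1²(−z) equal to z^-1, so the output should be the input delayed by one sample. The function names are hypothetical and quantization is ignored.

```python
import numpy as np

h1 = np.array([0.5, 0.5])           # assumed H1(z) = (1 + z^-1) / 2
h2 = h1 * np.array([1.0, -1.0])     # H2(z) = H1(-z)
g1 = 2.0 * h1                       # G1(z) = 2 H1(z)
g2 = -2.0 * h2                      # G2(z) = -2 H1(-z)


def analyze(x):
    """Filter with h1, h2 and keep every second sample: y_i(n) = (h_i * x)(2n)."""
    y1 = np.convolve(x, h1)[::2]
    y2 = np.convolve(x, h2)[::2]
    return y1, y2


def synthesize(y1, y2):
    """Upsample by two, filter with g1, g2, and add the two branches."""
    u1 = np.zeros(2 * len(y1))
    u2 = np.zeros(2 * len(y2))
    u1[::2] = y1
    u2[::2] = y2
    return np.convolve(u1, g1) + np.convolve(u2, g2)


x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
y1, y2 = analyze(x)
x_hat = synthesize(y1, y2)

# The reconstruction is the input delayed by one sample.
print(np.allclose(x_hat[1:1 + len(x)], x))
```

Although the two filters overlap heavily in frequency, the aliasing introduced in each branch cancels when the branches are added.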

  41. Subband Coding 4 • For perfect reconstruction, we need or • Using symmetric linear phase FIR of length N for H1 results in

  42. Subband Coding 4 • For N even, • [Figure: QMF filters; magnitude axis up to 1, frequency axis from 0 to π with π/2 marked]

  43. Subband Coding 4 • If subband filters Hi(z), Gi(z) satisfy three conditions, perfect reconstruction also results • Aliased term

  44. Multiresolution Wavelet Representation and Approximation 4 • Embedded linear spaces in L2(R): • Let Aj be an orthogonal projection on Vj: • Let Oj be the orthogonal complement of Vj in Vj+1:

  45. Multiresolution Wavelet Representation and Approximation 4 • Let Dj be an orthogonal projection on Oj : • Then an original signal A0f can be decomposed as:

  46. Multiresolution Wavelet Representation and Approximation 4 • A-J f = the orthogonal projection of A0f on V-J • D-j f = the orthogonal projection of A0f on O-j • D-j f and D-k f (j ≠ k): orthogonal to each other or uncorrelated to each other • D-j f : orthogonal to A-J f , or uncorrelated to A-J f • A-J f : a coarse version of A0f • D-J f, …, D-1 f: details of A0f arranged from coarser to finer

  47. Multiresolution Wavelet Representation and Approximation 4 • Let be an orthonormal basis of Vj: • Aj f can be characterized by the coefficients of orthonormal expansion: • The sequence denoted by and called a discrete approximation of f in Vj

  48. Multiresolution Wavelet Representation and Approximation 4 • Let be an orthonormal basis of Oj • Dj f characterized by the coefficients • The sequence denoted by and called a discrete approximation of f in Oj

  49. Multiresolution Wavelet Representation and Approximation 4 • Thus, A0f can be characterized by • can be further characterized by • This set of discrete signals is called orthogonal “wavelet” representation • is organized as a coarse version added by increasing fine details • The orthogonal representation: decorrelated representation
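The "coarse version plus increasingly fine details" structure can be illustrated with a discrete signal standing in for A0f. The sketch assumes the orthonormal Haar basis, which the slides do not name; it is used here only because it is the simplest orthogonal wavelet. The function names are hypothetical.

```python
import numpy as np


def haar_step(approx):
    """Split a signal into a half-length coarse version and a detail signal."""
    even, odd = approx[0::2], approx[1::2]
    coarse = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return coarse, detail


def haar_decompose(signal, levels):
    """Return the coarsest approximation and the details, finest first."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Add the details back, from coarsest to finest."""
    for detail in reversed(details):
        merged = np.empty(2 * len(approx))
        merged[0::2] = (approx + detail) / np.sqrt(2.0)
        merged[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = merged
    return approx


signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coarse, details = haar_decompose(signal, levels=3)
rebuilt = haar_reconstruct(coarse, details)

energy_in = np.sum(signal ** 2)
energy_out = np.sum(coarse ** 2) + sum(np.sum(d ** 2) for d in details)
n_coeffs = len(coarse) + sum(len(d) for d in details)

print(np.allclose(rebuilt, signal))       # True
print(np.isclose(energy_in, energy_out))  # True
print(n_coeffs)                           # 8
```

Unlike the pyramid, this representation uses exactly as many coefficients as input samples, and because the basis is orthonormal the signal energy splits cleanly between the coarse version and the details.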

  50. Multiresolution Wavelet Representation and Approximation 4 • If we require: • Aj f is band-limited such that it can be sampled at a rate of 2^j, i.e., 2^j samples per time or length unit
