Lecture 3: Multimedia Networks Lecture 4: Audio/Video Compression

Audio/Video Compression
• Lecture 3: Multimedia Networks
• Lecture 4: Audio/Video Compression
• Image & Video Compression Standards
• Speech & Audio Compression Standards
• Wavelet Transform & its Application in Compression

Introduction to Audio/Video Compression
• With today’s technology, only compression makes storage and transmission of digital audio/video streams feasible
• Compression exploits redundancy, guided by the characteristics of human perception

Introduction to Audio/Video Compression
• Spatial redundancy: values of neighboring pixels are strongly correlated in natural images
• Temporal redundancy: adjacent frames in a video sequence often show very little change; in audio, a strong signal in a given time segment can mask lower-level distortion in preceding and following segments

Introduction to Audio/Video Compression
• Spectral redundancy: in multispectral images, the spectral values of the same pixel are correlated across spectral bands; in audio, a signal can completely mask a sufficiently weaker signal in its frequency vicinity
• Redundancy across scale: distinct image features are invariant under scaling
• Redundancy in stereo: correlations between stereo images or audio channels

Introduction to Audio/Video Compression
• Spatial/spectral redundancies: transform coding
• Temporal redundancy: DPCM (differential pulse code modulation), motion estimation/motion compensation
• First compression methods were lossless: Huffman coding, Ziv-Lempel coding, arithmetic coding
• Lossless methods alone are inadequate for low-bandwidth transmission media (e.g., ISDN) or low-throughput devices (e.g., CD-ROM)
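The DPCM idea named above can be illustrated with a minimal sketch. It assumes the simplest possible predictor (the previous sample), integer samples, and no quantization, so it is lossless; the function names and the sample row are hypothetical and not taken from the lecture.

```python
def dpcm_encode(samples):
    """Return first-order prediction residuals of an integer sequence."""
    residuals = []
    prediction = 0  # assumed initial predictor state
    for s in samples:
        residuals.append(s - prediction)
        prediction = s  # no quantization: the predictor tracks the exact sample
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the residuals."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples


# Hypothetical row of neighboring pixel values (strongly correlated).
row = [100, 101, 103, 103, 104, 106, 106, 105]
residuals = dpcm_encode(row)
restored = dpcm_decode(residuals)
```

After the first sample, the residuals stay close to zero, which is what makes them cheaper to entropy-code than the raw values.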

Introduction to Audio/Video Compression
• Lossless vs. lossy compression
• Intraframe vs. interframe compression
• Symmetrical vs. asymmetrical compression
• Real-time: encoding-decoding delay ≤ 50 ms
• Scalable: frames coded at different resolutions or quality levels
• Recent advanced compression methods reduce bandwidth enormously without reducing perceived quality

Introduction to Audio/Video Compression
• Entropy coding: arithmetic coding, Huffman coding, run-length coding
• Source coding: DPCM, DCT, DWT, motion estimation/motion compensation
• Hybrid coding: H.261, H.263, H.263+, JPEG, MPEG1, MPEG2, MPEG4, Perceptual Audio Coder
• Hybrid coding = source coding + entropy coding
[Figure: uncompressed data → preprocessing → source coding → entropy coding → compressed data]
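As a toy illustration of "hybrid coding = source coding + entropy coding", the sketch below chains a difference step (standing in for source coding) with a Huffman code (entropy coding). It is not how any of the listed standards work internally; the function names and the sample data are assumptions made for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]


def hybrid_encode(samples):
    """Source coding (sample differences) followed by entropy coding (Huffman)."""
    residuals = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    code = huffman_code(residuals)
    bits = "".join(code[r] for r in residuals)
    return bits, code


def hybrid_decode(bits, code):
    """Undo the Huffman code, then accumulate the differences."""
    inverse = {b: s for s, b in code.items()}
    residuals, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            residuals.append(inverse[current])
            current = ""
    samples, acc = [], 0
    for r in residuals:
        acc += r
        samples.append(acc)
    return samples


# Hypothetical slowly varying signal.
samples = [50, 50, 51, 51, 51, 52, 52, 52, 52, 53]
bits, code = hybrid_encode(samples)
decoded = hybrid_decode(bits, code)
```

The difference step concentrates the symbol distribution on a few values, and the Huffman stage then assigns the shortest codeword to the most frequent residual.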

Wavelet Theory
• A unified framework for the analysis of non-stationary signals
• Wavelet transform (WT): an alternative to the classical Short-Time Fourier Transform (STFT) or Gabor transform
• In contrast to the STFT, the WT performs “constant-Q” (constant relative bandwidth) frequency analysis: short windows at high frequencies and long windows at low frequencies

Short-Time Fourier Transform
• Fourier Transform (FT): $X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$
• $X(f)$: projection of the signal $x(t)$ onto $\exp(j 2\pi f t)$
• Shows how the signal energy is distributed over frequencies

Short-Time Fourier Transform
• To capture the local energy distribution, the STFT is introduced: $\mathrm{STFT}(\tau, f) = \int x(t)\, g^*(t - \tau)\, e^{-j 2\pi f t}\, dt$
• $g(t)$: a window of finite support
• Shows how the signal energy is distributed over frequencies around local time $\tau$

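Short-Time Fourier Transform
• For a given $f$, $\mathrm{STFT}(\tau, f)$ is the output of a bandpass filter whose impulse response is the window function modulated to frequency $f$
• Resolution in time and frequency set by the window $g(t)$: $\Delta t^2 = \frac{\int t^2 |g(t)|^2\, dt}{\int |g(t)|^2\, dt}$, $\Delta f^2 = \frac{\int f^2 |G(f)|^2\, df}{\int |G(f)|^2\, df}$, where $G(f)$ is the FT of $g(t)$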
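A discrete-time sketch of the STFT may help make the "window around local time τ" idea concrete. It assumes unit sample spacing, a real Hann window (one possible choice of g), and frequencies in cycles per sample; the test signal and all names are illustrative only.

```python
import cmath
import math


def hann(length):
    """Real Hann window, used here as one possible finite-support g(t)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]


def stft(x, window, tau, f):
    """Approximate STFT(tau, f) = sum_t x(t) g(t - tau) exp(-j 2 pi f t).

    The window is centered on sample tau; f is in cycles per sample.
    """
    half = len(window) // 2
    total = 0j
    for k, g in enumerate(window):
        t = tau - half + k
        if 0 <= t < len(x):
            total += x[t] * g * cmath.exp(-2j * math.pi * f * t)
    return total


# Non-stationary test signal: 0.05 cycles/sample, then 0.20 cycles/sample.
n = 512
x = [math.cos(2 * math.pi * (0.05 if t < n // 2 else 0.20) * t) for t in range(n)]
g = hann(65)
freqs = [k / 200 for k in range(1, 100)]  # analysis grid, 0.005 .. 0.495


def dominant_frequency(tau):
    """Frequency on the grid with the largest local energy around time tau."""
    return max(freqs, key=lambda f: abs(stft(x, g, tau, f)))


early = dominant_frequency(128)  # window lies entirely in the first half
late = dominant_frequency(384)   # window lies entirely in the second half
```

A plain FT of the whole signal would show both tones without saying when each occurs; the windowed version reports a different dominant frequency at the two analysis times.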

Short-Time Fourier Transform
• Uncertainty Principle (Heisenberg): $\Delta t\, \Delta f \ge \frac{1}{4\pi}$
• Once the window $g(t)$ is chosen, the resolution in time and frequency is fixed
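The time-bandwidth bound can be checked numerically for a Gaussian window, which is known to meet the lower bound 1/(4π) with equality when Δt and Δf are taken as RMS widths of |g(t)|² and |G(f)|². The sketch below assumes g(t) = exp(-t²/(2σ²)) and uses its analytic transform; σ and the helper name are arbitrary choices.

```python
import math


def rms_width(density, lo, hi, steps=20000):
    """sqrt( int u^2 p(u) du / int p(u) du ), by a midpoint Riemann sum."""
    du = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * du
        p = density(u)
        num += u * u * p
        den += p
    return math.sqrt(num / den)


sigma = 0.7  # arbitrary window width


def g_energy(t):
    """|g(t)|^2 for the Gaussian window g(t) = exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / sigma ** 2)


def G_energy(f):
    """|G(f)|^2 up to a constant factor, from the analytic FT of the Gaussian."""
    return math.exp(-4 * math.pi ** 2 * sigma ** 2 * f * f)


delta_t = rms_width(g_energy, -10 * sigma, 10 * sigma)
delta_f = rms_width(G_energy, -10 / sigma, 10 / sigma)
product = delta_t * delta_f
bound = 1 / (4 * math.pi)
```

Changing `sigma` trades time resolution for frequency resolution, but the product stays at the bound, which is the sense in which the resolution is fixed once the window is chosen.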

Continuous Wavelet Transform (CWT)
• If the relative bandwidth $\Delta f / f$ can be kept constant, frequency resolution becomes arbitrarily good at low frequencies while time resolution becomes arbitrarily good at high frequencies
• The CWT follows this idea, with all impulse responses of the filter bank defined as scaled versions of the same prototype, or basic wavelet, $h(t)$

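Continuous Wavelet Transform (CWT)
• Let $h_a(t) = \frac{1}{\sqrt{|a|}}\, h\!\left(\frac{t}{a}\right)$
• $h(t)$: any bandpass function

Continuous Wavelet Transform (CWT)
• FT of $h_a(t)$: $H_a(f) = \sqrt{|a|}\, H(af)$

Continuous Wavelet Transform (CWT)
• Resolution in frequency of $h_a(t)$: $\Delta f_a = \Delta f / |a|$, where $\Delta f$ is the frequency resolution of $h(t)$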

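Continuous Wavelet Transform (CWT)
• Given a fixed frequency $f_0$, if the scale is chosen as $a = f_0 / f$, the wavelet $h_a(t)$ is centered at frequency $f$ when $h(t)$ is centered at $f_0$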

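Continuous Wavelet Transform (CWT)
• By definition of the CWT (for $a > 0$): $\mathrm{CWT}(\tau, a) = \frac{1}{\sqrt{a}} \int x(t)\, h^*\!\left(\frac{t - \tau}{a}\right) dt = \sqrt{a} \int x(at)\, h^*\!\left(t - \frac{\tau}{a}\right) dt$
• The scale $a$ is not linked to frequency modulation but to time scaling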

Continuous Wavelet Transform (CWT)
• The signal $x(at)$ is seen through a constant-length filter centered at $\tau/a$
• The larger the scale $a$, the more contracted the signal $x(t)$ becomes
• The smaller the scale $a$, the more dilated the signal $x(t)$ becomes
• Larger scales: $\mathrm{CWT}(\tau, a)$ provides a more global view of the signal $x(t)$
• Smaller scales: $\mathrm{CWT}(\tau, a)$ provides a more detailed view of the signal $x(t)$

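Continuous Wavelet Transform (CWT)
• Define the wavelet $h_{a,\tau}(t) = \frac{1}{\sqrt{a}}\, h\!\left(\frac{t - \tau}{a}\right)$
• $\mathrm{CWT}(\tau, a) = \int x(t)\, h_{a,\tau}^*(t)\, dt$: inner product, or correlation, between $x(t)$ and $h_{a,\tau}$
• $\mathrm{CWT}(\tau, a)$ is called the analysis stage (of the signal $x(t)$) at scale $a$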
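The inner-product view of the CWT can be sketched directly as a sum over samples. The example assumes the Haar function as the prototype wavelet h(t) (any bandpass function would do), unit sample spacing, positive even integer scales, and the normalization CWT(τ, a) = (1/√a) Σ x(t) h((t − τ)/a); none of these choices come from the lecture.

```python
import math


def haar(t):
    """Haar prototype wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def cwt(x, tau, a):
    """Riemann-sum approximation of CWT(tau, a) for a real wavelet.

    Only samples inside the support [tau, tau + a) of h((t - tau) / a) contribute.
    """
    total = 0.0
    for t in range(tau, min(tau + a, len(x))):
        total += x[t] * haar((t - tau) / a)
    return total / math.sqrt(a)


# A constant signal has no bandpass content: the coefficient vanishes.
flat = [3.0] * 64
flat_coeff = cwt(flat, 10, 8)

# A unit step at sample 100 gives the strongest response where the
# wavelet's sign change lines up with the edge.
step = [0.0] * 100 + [1.0] * 100
scale = 16
best_tau = max(range(0, len(step) - scale), key=lambda tau: abs(cwt(step, tau, scale)))
peak = cwt(step, best_tau, scale)
```

Repeating the edge search with a smaller `scale` localizes the step with a shorter wavelet, while a larger scale averages over a wider neighborhood.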

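Continuous Wavelet Transform (CWT)
• $x(t)$ can be recovered from the multi-scale analysis if $h(t)$ has finite energy and is bandpass (zero mean): $x(t) = c \iint_{a > 0} \mathrm{CWT}(\tau, a)\, h_{a,\tau}(t)\, \frac{da\, d\tau}{a^2}$, where the constant $c$ depends only on $h(t)$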

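Continuous Wavelet Transform (CWT)
• Energy conservation (up to a constant that depends only on $h$): $\iint |\mathrm{CWT}(\tau, a)|^2\, \frac{d\tau\, da}{a^2} = \int |x(t)|^2\, dt$
• Signal energy distributed at scale $a$ by: $\int |\mathrm{CWT}(\tau, a)|^2\, d\tau$
• $|\mathrm{CWT}(\tau, a)|^2$: wavelet spectrogram, or scalogram, the distribution of signal energy in the time-scale plane (associated with the area measure $d\tau\, da / a^2$)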

Continuous Wavelet Transform (CWT)
• Larger scales → more global view → coarser resolutions
• Smaller scales → more detailed view → finer resolutions
• CWT decomposition of the signal over scales → signal energy distribution at various resolutions

Discrete Wavelet Transform (DWT)
• Two methods developed independently in the late 70’s and early 80’s
• Subband coding
• Pyramid coding, or multiresolution signal analysis

Multiresolution Pyramid
• Given an original sequence $x(n)$, $n \in \mathbb{Z}$, define a lower-resolution signal: $y(n) = \sum_k g(k)\, x(2n - k)$
• where $g(n)$ is a halfband lowpass filter

Multiresolution Pyramid
• An approximation of $x(n)$ from $y(n)$: $a(n) = \sum_k g'(k)\, y'(n - k)$
• where $y'(2n) = y(n)$, $y'(2n+1) = 0$, and $g'(n)$ is an interpolation filter

Multiresolution Pyramid
• If $g(n)$ and $g'(n)$ are perfect halfband filters, i.e., ideal lowpass filters with cutoff frequency $\pi/2$, then $a(n)$ provides a perfect halfband lowpass approximation to $x(n)$

Multiresolution Pyramid
• It can be proved:

Multiresolution Pyramid
• Let $d(n) = x(n) - a(n)$
• Then $x(n) = a(n) + d(n)$
• But there exists redundancy between $a(n)$ and $d(n)$:
• If $x(n)$ uses sampling rate $f_s$, then $d(n)$ and $y(n)$ use sampling rates $f_s$ and $f_s/2$, respectively

Multiresolution Pyramid
• Pyramid decomposition: a redundant representation
• But the redundancy is upper bounded: 1 + 1/2 + 1/4 + … < 2 in a one-dimensional system
[Figure: pyramid decomposition diagram with signals x(n), y(n), d(n)]
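One pyramid stage can be sketched as follows, using y(n) = Σ g(k) x(2n − k), y'(2n) = y(n), y'(2n+1) = 0, a(n) = Σ g'(k) y'(n − k) and d(n) = x(n) − a(n). The short filters g = [1/4, 1/2, 1/4] and g' = [1/2, 1, 1/2] (centered at k = 0) are assumed for illustration and are far from perfect halfband filters; samples outside the signal are treated as zero.

```python
def lowpass_decimate(x, g):
    """y(n) = sum_k g(k) x(2n - k), with k centered on 0 and zeros outside x."""
    half = len(g) // 2
    y = []
    for n in range(len(x) // 2):
        acc = 0.0
        for i, gk in enumerate(g):
            idx = 2 * n - (i - half)
            if 0 <= idx < len(x):
                acc += gk * x[idx]
        y.append(acc)
    return y


def interpolate(y, g_interp, length):
    """a(n) = sum_k g'(k) y'(n - k), where y'(2n) = y(n) and y'(2n+1) = 0."""
    y_up = [0.0] * length
    for n, v in enumerate(y):
        y_up[2 * n] = v
    half = len(g_interp) // 2
    a = []
    for n in range(length):
        acc = 0.0
        for i, gk in enumerate(g_interp):
            idx = n - (i - half)
            if 0 <= idx < length:
                acc += gk * y_up[idx]
        a.append(acc)
    return a


g = [0.25, 0.5, 0.25]        # assumed lowpass filter
g_interp = [0.5, 1.0, 0.5]   # assumed interpolation filter (linear interpolation)

x = [4, 6, 10, 12, 8, 6, 5, 5, 7, 9, 11, 10, 8, 6, 4, 2]
y = lowpass_decimate(x, g)                 # half-rate coarse signal
a = interpolate(y, g_interp, len(x))       # full-rate approximation of x
d = [xi - ai for xi, ai in zip(x, a)]      # full-rate difference signal

# The decoder only needs (y, d): it rebuilds a(n) from y(n) and adds d(n).
x_rec = [ai + di for ai, di in zip(interpolate(y, g_interp, len(x)), d)]
stored_samples = len(y) + len(d)
```

Reconstruction is exact whatever filters are used, because d(n) is defined as the residual; the cost is that (y, d) holds 1.5 times as many samples as x.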

Multiresolution Pyramid
• For perfect halfband lowpass filters $g(n)$ and $g'(n)$, it is clear that $d(n)$ contains the frequencies of $x(n)$ above $\pi/2$, and thus can also be subsampled by two without loss of information
• In a pyramid, it is possible to use very good lowpass filters and derive visually pleasing coarse versions
• In a subband scheme, critical sampling is achieved at the price of a constrained filter design and a relatively poor lowpass version as the coarse approximation: undesirable if the coarse version is used for viewing in a compatible subchannel

Subband Coding
• One stage of a pyramid decomposition → a half-rate low-resolution signal + a full-rate difference signal
• Number of samples increased by 50%
• If the filters $g(n)$ and $g'(n)$ meet certain conditions, oversampling can be avoided
• Subband coding, first popularized in speech compression, does not produce such redundancy

Subband Coding
• A full-band one-dimensional signal is decomposed into two subbands using an analysis filter bank
• Ideally, the analysis filter bank consists of a lowpass filter and a highpass filter with nonoverlapping frequency responses and unit gain over their respective bandwidths
• After filtering, the lowpass and highpass signals each have only half of the original bandwidth, or “frequency content”, and thus can be downsampled by two
• But ideal filters are unrealizable

Subband Coding
• By using overlapping responses, frequency gaps in the subband signals can be prevented
• Aliasing is introduced when the lowpass and highpass signals are downsampled by two
• The aliasing effect can be eliminated at the synthesis stage to produce perfect reconstruction
• The lowpass and highpass signals each have a bandwidth of more than half of the original bandwidth
• Quadrature Mirror Filters (QMF) are used for analysis/synthesis filtering

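Subband Coding
• Output signals from the analysis bank after downsampling: $y_1(n) = (h_1 * x)(2n)$, $y_2(n) = (h_2 * x)(2n)$
• After quantization, $y_1(n)$ and $y_2(n)$ → $\hat{y}_1(n)$, $\hat{y}_2(n)$
• After upsampling, they become: $y_i'(2n) = \hat{y}_i(n)$, $y_i'(2n+1) = 0$, $i = 1, 2$

Subband Coding
• Output signals from the synthesis bank: $\hat{x}_i(n) = (g_i * y_i')(n)$, $i = 1, 2$
• Reconstructed signal: $\hat{x}(n) = \hat{x}_1(n) + \hat{x}_2(n)$

Subband Coding
• Ignoring quantization or coding effects, $\hat{X}(z) = \frac{1}{2}\left[H_1(z)G_1(z) + H_2(z)G_2(z)\right] X(z) + \frac{1}{2}\left[H_1(-z)G_1(z) + H_2(-z)G_2(z)\right] X(-z)$
• If $H_1(z)$, $G_1(z)$ are ideal lowpass filters and $H_2(z)$, $G_2(z)$ are ideal highpass filters,

Subband Coding
• Then $H_1(-z)G_1(z) = 0$ and $H_2(-z)G_2(z) = 0$

Subband Coding
• Implying $\hat{X}(z) = \frac{1}{2}\left[H_1(z)G_1(z) + H_2(z)G_2(z)\right] X(z)$
• Indicating that $\frac{1}{2}\left[H_1(-z)G_1(z) + H_2(-z)G_2(z)\right] X(-z)$ is the aliasing component when the filters are not ideal, which is desired to be zero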

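Subband Coding
• To have perfect reconstruction (up to a delay of $m$ samples) in the non-ideal filtering case, the necessary and sufficient conditions are: $H_1(-z)G_1(z) + H_2(-z)G_2(z) = 0$ and $H_1(z)G_1(z) + H_2(z)G_2(z) = 2z^{-m}$
• If $H_2(z) = H_1(-z)$, $G_1(z) = 2H_1(z)$, $G_2(z) = -2H_1(-z)$, the aliased term becomes zero and the reconstructed signal is given by: $\hat{X}(z) = \left[H_1^2(z) - H_1^2(-z)\right] X(z)$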
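The QMF choice H2(z) = H1(-z), G1(z) = 2H1(z), G2(z) = -2H1(-z) can be tried out on the shortest possible filter. The sketch assumes H1(z) = (1 + z^-1)/2 (a two-tap averaging filter, chosen for illustration), for which H1²(z) − H1²(−z) = z^-1, so the output should equal the input delayed by one sample. Quantization is ignored.

```python
def convolve(h, x):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            out[i + j] += hv * xv
    return out


def analysis(x, h):
    """Filter and downsample by two: y(n) = (h * x)(2n)."""
    return convolve(h, x)[::2]


def synthesis(y, g):
    """Upsample by two (y'(2n) = y(n), y'(2n+1) = 0), then filter with g."""
    up = []
    for v in y:
        up.extend([v, 0.0])
    return convolve(g, up)


h1 = [0.5, 0.5]                                   # assumed H1(z) = (1 + z^-1)/2
h2 = [(-1) ** n * c for n, c in enumerate(h1)]    # H2(z) = H1(-z)
g1 = [2 * c for c in h1]                          # G1(z) = 2 H1(z)
g2 = [-2 * c for c in h2]                         # G2(z) = -2 H1(-z)

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y1 = analysis(x, h1)   # lowpass subband
y2 = analysis(x, h2)   # highpass subband

s1 = synthesis(y1, g1)
s2 = synthesis(y2, g2)
x_hat = [u + v for u, v in zip(s1, s2)]   # reconstructed signal
```

Each subband alone is aliased after downsampling; the aliasing only cancels when the two synthesis outputs are added.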

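Subband Coding
• For perfect reconstruction, we need $H_1^2(z) - H_1^2(-z) = z^{-m}$, or on the unit circle $H_1^2(e^{j\omega}) - H_1^2(e^{j(\omega + \pi)}) = e^{-jm\omega}$
• Using a symmetric linear-phase FIR filter of length $N$ for $H_1$ results in $\hat{X}(e^{j\omega}) = e^{-j\omega(N-1)} \left[\, |H_1(e^{j\omega})|^2 - (-1)^{N-1} |H_1(e^{j(\omega + \pi)})|^2 \,\right] X(e^{j\omega})$

Subband Coding
• For even $N$, perfect reconstruction requires $|H_1(e^{j\omega})|^2 + |H_1(e^{j(\omega + \pi)})|^2 = 1$
• QMF filters
[Figure: QMF magnitude responses; amplitude axis 0 to 1, frequency axis 0, π/2, π]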

Subband Coding
• If the subband filters $H_i(z)$, $G_i(z)$ satisfy three conditions, perfect reconstruction results as well
• Aliased term

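Multiresolution Wavelet Representation and Approximation
• Embedded linear spaces in $L^2(\mathbb{R})$: $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$
• Let $A_j$ be the orthogonal projection on $V_j$: $A_j f \in V_j$
• Let $O_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$: $V_{j+1} = V_j \oplus O_j$

Multiresolution Wavelet Representation and Approximation
• Let $D_j$ be the orthogonal projection on $O_j$: $A_{j+1} f = A_j f + D_j f$
• Then an original signal $A_0 f$ can be decomposed as: $A_0 f = A_{-J} f + \sum_{j=1}^{J} D_{-j} f$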

Multiresolution Wavelet Representation and Approximation
• $A_{-J} f$ = the orthogonal projection of $A_0 f$ on $V_{-J}$
• $D_{-j} f$ = the orthogonal projection of $A_0 f$ on $O_{-j}$
• $D_{-j} f$ and $D_{-k} f$ ($j \ne k$): orthogonal to each other, i.e., uncorrelated
• $D_{-j} f$: orthogonal to $A_{-J} f$, i.e., uncorrelated with $A_{-J} f$
• $A_{-J} f$: a coarse version of $A_0 f$
• $\{D_{-j} f,\ j = J, \ldots, 1\}$: details of $A_0 f$ arranged from coarser to finer
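The decomposition into one coarse version plus mutually orthogonal details can be checked on a discrete signal. The sketch assumes the Haar multiresolution, in which the space at level -j consists of sequences that are constant on blocks of 2^j samples, so the orthogonal projection is a block average; the lecture does not fix a particular family of spaces, and the names below are illustrative.

```python
def project(x, level):
    """Orthogonal projection onto the Haar space V_{-level}: block means over 2**level samples."""
    block = 2 ** level
    out = []
    for start in range(0, len(x), block):
        mean = sum(x[start:start + block]) / block
        out.extend([mean] * block)
    return out


def decompose(x, J):
    """Return A_{-J} f and the details D_{-j} f = A_{-j+1} f - A_{-j} f for j = 1..J."""
    coarse = project(x, J)
    details = {}
    for j in range(1, J + 1):
        finer, rougher = project(x, j - 1), project(x, j)
        details[j] = [u - v for u, v in zip(finer, rougher)]
    return coarse, details


def inner(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # plays the role of A_0 f
J = 3
coarse, details = decompose(x, J)

# A_0 f = A_{-J} f + sum over j of D_{-j} f
recon = list(coarse)
for j in range(1, J + 1):
    recon = [r + dj for r, dj in zip(recon, details[j])]
```

Here `details[3]` is the coarsest detail and `details[1]` the finest, matching the coarser-to-finer ordering described above.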

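Multiresolution Wavelet Representation and Approximation
• Let $\{\phi_{j,n}\}_{n \in \mathbb{Z}}$ be an orthonormal basis of $V_j$
• $A_j f$ can be characterized by the coefficients of its orthonormal expansion: $A_j f = \sum_n \langle f, \phi_{j,n} \rangle\, \phi_{j,n}$
• The sequence $\{\langle f, \phi_{j,n} \rangle\}_{n \in \mathbb{Z}}$ is denoted by $A_j^d f$ and called a discrete approximation of $f$ in $V_j$

Multiresolution Wavelet Representation and Approximation
• Let $\{\psi_{j,n}\}_{n \in \mathbb{Z}}$ be an orthonormal basis of $O_j$
• $D_j f$ is characterized by the coefficients $\langle f, \psi_{j,n} \rangle$: $D_j f = \sum_n \langle f, \psi_{j,n} \rangle\, \psi_{j,n}$
• The sequence $\{\langle f, \psi_{j,n} \rangle\}_{n \in \mathbb{Z}}$ is denoted by $D_j^d f$ and called a discrete approximation of $f$ in $O_j$

Multiresolution Wavelet Representation and Approximation
• Thus, $A_0 f$ can be characterized by $A_0^d f$
• $A_0^d f$ can be further characterized by $\{A_{-J}^d f,\ D_{-j}^d f,\ j = J, \ldots, 1\}$
• This set of discrete signals is called the orthogonal “wavelet” representation
• $A_0^d f$ is organized as a coarse version plus increasingly fine details
• The orthogonal representation is a decorrelated representation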

Multiresolution Wavelet Representation and Approximation
• If we require:
• $A_j f$ is band-limited such that it can be sampled at a rate of $2^j$, i.e., $2^j$ samples per unit of time or length
