Chapter 12 Multimedia Information • Lossless Data Compression • Compression of Analog Signals • Image and Video Coding
Bits, numbers, information • Bit: number with value 0 or 1 • n bits: digital representation for 0, 1, …, 2^n − 1 • Byte or octet: n = 8 • Computer word: n = 16, 32, or 64 • n bits allow enumeration of 2^n possibilities • n-bit field in a header • n-bit representation of a voice sample • Message consisting of n bits • The number of bits required to represent a message is a measure of its information content • More bits → more content
Block vs. Stream Information • Block: information that occurs in a single block • Text message • Data file • JPEG image • MPEG file • Size = bits/block or bytes/block • 1 kbyte = 2^10 bytes • 1 Mbyte = 2^20 bytes • 1 Gbyte = 2^30 bytes • Stream: information that is produced & transmitted continuously • Real-time voice • Streaming video • Bit rate = bits/second • 1 kbps = 10^3 bps • 1 Mbps = 10^6 bps • 1 Gbps = 10^9 bps
Transmission Delay • L = number of bits in the message • R = speed of the digital transmission system in bps • L/R = time to transmit the information • tprop = time for the signal to propagate across the medium • d = distance in meters • c = speed of light (3×10^8 m/s in vacuum) • tprop = d/c, so total delay = L/R + d/c • Use data compression to reduce L • Use a higher-speed modem to increase R • Place the server closer to reduce d
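As a quick sanity check, here is a minimal sketch of the delay formula; the function name and example values are illustrative only:

```python
# Sketch: time to deliver an L-bit message over an R-bps link of length d.
# Total delay = transmission delay L/R + propagation delay d/c.

C = 3e8  # propagation speed in m/s (speed of light in vacuum)

def delivery_time(L_bits, R_bps, d_meters, v=C):
    return L_bits / R_bps + d_meters / v

# Example: a 1-Mbyte file over a 10-Mbps link spanning 3000 km
print(delivery_time(8 * 2**20, 10e6, 3e6))  # ~0.85 s, dominated by L/R
```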
Compression • Information is usually not represented efficiently • Data compression algorithms represent the information using fewer bits • Noiseless (lossless): original information recovered exactly • E.g., zip, compress, GIF, fax • Noisy (lossy): information recovered approximately • E.g., JPEG • Tradeoff: # bits vs. quality • Compression ratio = #bits (original file) / #bits (compressed file)
Color Image • Color image (H × W pixels) = red component image + green component image + blue component image • Total bits = 3 × H × W pixels × B bits/pixel = 3HWB • Example: 8×10 inch picture at 400 × 400 pixels per in² • 400 × 400 × 8 × 10 = 12.8 million pixels • 8 bits/pixel/color • 12.8 megapixels × 3 bytes/pixel = 38.4 megabytes
Chapter 12 Multimedia Information • Lossless Data Compression
Data Compression • (Diagram: information stream → lossless data compression → binary stream → data expansion → original information stream) • Information is produced by a source • It usually contains redundancy • A lossless data compression system exploits redundancy to produce a more efficient (usually binary) representation of the information • The compressed stream is stored or transmitted depending on the application • A data expansion system recovers the exact original information stream
Binary Tree Codes • Suppose an information source generates symbols from A = {a1, a2, …, aK} • Binary tree code: K leaves, 1 leaf assigned to each symbol • The binary codeword for symbol aj is the sequence of bits from the root to the corresponding leaf • Encoding: use a table • Decoding: trace the path from root to leaf, output the corresponding symbol; repeat • Encoding table: a1 → 00, a2 → 1, a3 → 010, a4 → 011
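A minimal sketch of table-based encoding and leaf-by-leaf decoding for this tree code (the dictionary stands in for the tree; names are illustrative):

```python
# Prefix (tree) code from the encoding table above.
CODE = {'a1': '00', 'a2': '1', 'a3': '010', 'a4': '011'}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(symbols):
    return ''.join(CODE[s] for s in symbols)

def decode(bitstring):
    out, path = [], ''
    for b in bitstring:          # follow one branch per bit
        path += b
        if path in DECODE:       # reached a leaf: output symbol, return to root
            out.append(DECODE[path])
            path = ''
    return out

bits = encode(['a2', 'a1', 'a4'])       # '1' + '00' + '011'
assert decode(bits) == ['a2', 'a1', 'a4']
```

Because no codeword is a prefix of another (each symbol sits at a leaf), the decoder never needs commas or spaces between codewords.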
Performance of Tree Code • Average number of encoded bits per source symbol: E[l] = Σj l(aj) P[aj], where l(aj) = length of the codeword for aj and P[aj] = probability of aj • To minimize E[l], assign short codewords to frequent symbols and longer codewords to less frequent symbols
Example • Assume a 5-symbol information source {a, b, c, d, e} with symbol probabilities {1/4, 1/4, 1/4, 1/8, 1/8} • Codewords: a → 00, b → 01, c → 10, d → 110, e → 111 • aedbbad… is mapped into 00 111 110 01 01 00 110 … (17 bits) • Note: decoding is done without commas or spaces
Finding Good Tree Codes • What is the best code if K = 2? • Simple! There is only one tree code: assign 0 to one symbol and 1 to the other • What about K = 3? • Assign the longest pair of codewords to the two least frequent symbols • If you don't, then switching the most frequent symbol to the shortest codeword will reduce the average length • Picking the two least probable symbols is always the best thing to do
Huffman Code • Algorithm for finding the optimum binary tree code for a set of symbols • A = {1, 2, …, K}, denote symbols by index • Symbol probabilities: {p1, p2, p3, …, pK} • Basic step: • Identify the two least probable symbols, say i and j • Combine them into a new symbol (i, j) with probability pi + pj • Remove i and j from A and replace them with (i, j) • The new alphabet A has one fewer symbol • If A has two symbols, stop; else repeat the basic step • Building the tree code: each time two symbols are combined, join them in the binary tree (see the sketch below)
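The basic step maps naturally onto a priority queue. Below is a hedged sketch (tie-breaking among equal probabilities is arbitrary, so the exact bits may differ from the tree shown next, but the codeword lengths, and hence E[l], are optimal):

```python
import heapq, itertools

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    tiebreak = itertools.count()                 # avoids comparing dicts on ties
    heap = [(p, next(tiebreak), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # least probable symbol/group
        p2, _, c2 = heapq.heappop(heap)          # second least probable
        # joining two subtrees prepends one bit to every codeword below them
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {'a': .5, 'b': .2, 'c': .15, 'd': .1, 'e': .05}
code = huffman(probs)
print(code)
print(sum(p * len(code[s]) for s, p in probs.items()))   # 1.95 bits/symbol
```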
Building the tree code by the Huffman algorithm • Symbol probabilities: a = .50, b = .20, c = .15, d = .10, e = .05 • Merging order: (d, e) → .15; (c, (d,e)) → .30; (b, (c,d,e)) → .50; (a, (b,c,d,e)) → 1.00 • The final tree code: a → 0, b → 10, c → 110, d → 1110, e → 1111 • E[l] = 1(.5) + 2(.20) + 3(.15) + 4(.10 + .05) = 1.95 bits/symbol
What is the best performance? • Can we do better? • Huffman is optimum, so we cannot do better for A • But if we take pairs of symbols, we have a different alphabet • A′ = {aa, ab, ac, …, ba, bb, …, ea, eb, …, ee} with probabilities {(.5)(.5), (.5)(.2), …, (.05)(.05)} • By taking pairs, triplets, and so on, we can usually improve performance • So what is the best possible performance? • The entropy of the source
Entropy of an Information Source • Suppose a source • produces symbols from alphabet A = {1, 2, …, K} • with probabilities {p1, p2, p3, …, pK} • and the source outputs are statistically independent of each other • Then the entropy of the source, H = −Σk pk log2 pk bits/symbol, is the best possible performance of any code for A
Examples • Example 1: source with probabilities {.5, .2, .15, .10, .05} • H ≈ 1.92 bits/symbol • Huffman code gave E[l] = 1.95, so it's pretty close to H • Example 2: source with K equiprobable symbols • H = log2 K bits/symbol • Example 3: source with K = 2^m equiprobable symbols • H = m bits/symbol, so a fixed-length code with m bits is optimum!
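These numbers are easy to verify directly; a small sketch (assuming statistically independent symbols, as above):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([.5, .2, .15, .10, .05]))  # ~1.92 bits/symbol vs. E[l] = 1.95
print(entropy([1/8] * 8))                # K = 2**3 equiprobable -> exactly 3.0
```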
Run-Length Codes • Example: "blank" in strings of alphanumeric information: ------$5----3-------------$2--------$3------ • Example: "0" (white) and "1" (black) in fax documents • When one symbol is much more frequent than the rest, block codes don't work well • Run-length codes work better • Parse the symbol stream into runs of the frequent symbol • Apply a Huffman or similar code to encode the lengths of the runs
Binary Run-Length Code 1 • Use an m-bit counter to count complete runs of zeros up to length 2^m − 2 • If 2^m − 1 consecutive zeros occur, send m 1s (codeword 11…11) to indicate run length > 2^m − 2

| Input | Run length | Codeword | Codeword (m = 4) |
|---|---|---|---|
| 1 | 0 | 00…00 | 0000 |
| 01 | 1 | 00…01 | 0001 |
| 001 | 2 | 00…10 | 0010 |
| 0001 | 3 | 00…11 | 0011 |
| 00001 | 4 | … | … |
| … | … | … | … |
| 000…01 | 2^m − 2 | 11…10 | 1110 |
| 000…00 | run > 2^m − 2 | 11…11 | 1111 |
Example: Code 1, m = 4 • Source stream: runs of zeros (each ended by a 1) of lengths 25, 57, 36, and 15, 137 bits in total • Run symbols: >14, 10; >14, >14, >14, 12; >14, >14, 6; >14, 0 • Encoded stream: 1111 1010 1111 1111 1111 1100 1111 1111 0110 1111 0000 (44 bits) • Decoded stream: 15w 10wb; 15w 15w 15w 12wb; 15w 15w 6wb; 15w b • Code 1 performance: m / E[R] encoded bits per source bit, where E[R] is the average number of source bits consumed per codeword
Binary Run-Length Code 2 • When all-zero runs are frequent, encode that event with a single bit to get higher compression • Complete runs use m + 1 bits

| Input | Run length | Codeword | Codeword (m = 4) |
|---|---|---|---|
| 1 | 0 | 10…00 | 10000 |
| 01 | 1 | 10…01 | 10001 |
| 001 | 2 | 10…10 | 10010 |
| 0001 | 3 | 10…11 | 10011 |
| 00001 | 4 | … | … |
| … | … | … | … |
| 000…01 | 2^m − 1 | 11…11 | 11111 |
| 000…00 | run > 2^m − 1 | 0 | 0 |
Example: Code 2, m = 4 • Source stream: runs of zeros of lengths 25, 57, 36, and 15, 137 bits in total • Run symbols: >15, 9; >15, >15, >15, 9; >15, >15, 4; 15 • Encoded stream: 0 11001 0 0 0 11001 0 0 10100 11111 (26 bits) • Decoded stream: 16w 9wb; 16w 16w 16w 9wb; 16w 16w 4wb; 15wb • Code 2 performance: E[l] / E[R] encoded bits per source bit
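A hedged sketch of Code 2 that reproduces the numbers above (the helper name and string-based bit handling are illustrative, not an implementation of any particular standard):

```python
def rle2_encode(bits, m=4):
    """Binary run-length Code 2: '1' + m-bit count ends a run; a lone '0'
    means 2**m zeros with the run still continuing."""
    block = 2 ** m
    out, run = [], 0
    for b in bits:
        if b == '0':
            run += 1
            if run == block:                  # full block, run not yet ended
                out.append('0')
                run = 0
        else:                                 # a 1 terminates the current run
            out.append('1' + format(run, f'0{m}b'))
            run = 0
    return ''.join(out)

src = '0' * 25 + '1' + '0' * 57 + '1' + '0' * 36 + '1' + '0' * 15 + '1'
enc = rle2_encode(src)
print(len(src), len(enc))                     # 137 -> 26 bits, as in the example
```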
Predictive Coding • (a) Huffman code applied to white runs and black runs • (b) Encode differences between consecutive lines
Fax Documents Use Run-Length Encoding • CCITT Group 3 facsimile standard • Default: 1-D Huffman coding of run lengths • Option: 2-D (predictive) run-length coding
Adaptive Coding • Adaptive codes provide compression when symbol and pattern probabilities are unknown • Essentially, the encoder learns/discovers frequent patterns • The Lempel-Ziv algorithm is powerful & popular • Incorporated in many utilities • Whenever a pattern is repeated in the symbol stream, it is replaced by a pointer to where it first occurred & a value to indicate the length of the pattern • Example: "All tall We all are tall. All small We all are small" can be encoded as All_ta[2,3]We_[6,4]are[4,5]._[1,4]sm[6,15][31,5]., where [p, n] points back to n symbols starting at position p
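A toy sketch of the pointer idea (LZ77-flavored; the (position, length) format and brute-force search are illustrative, while real implementations use sliding windows and hash tables):

```python
def lz_encode(text, min_match=3):
    out, i = [], 0
    while i < len(text):
        best_pos, best_len = -1, 0
        for j in range(i):                    # search all earlier positions
            k = 0
            while i + k < len(text) and j + k < i and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_pos, best_len = j, k
        if best_len >= min_match:
            out.append((best_pos, best_len))  # pointer: (first occurrence, length)
            i += best_len
        else:
            out.append(text[i])               # literal symbol
            i += 1
    return out

print(lz_encode("All tall We all are tall. All small We all are small"))
```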
Chapter 12 Multimedia Information • Compression of Analog Signals
Stream Information • A real-time voice signal must be digitized & transmitted as it is produced • The analog signal level varies continuously in time (e.g., the speech signal level varies with time)
Sampling Theorem • (a) Sampler: x(t) → x(nT) • (b) Interpolation filter: x(nT) → x(t) • Nyquist: perfect reconstruction if sampling rate 1/T > 2Ws
Quantization of Analog Samples • Quantizer maps input x(nT) into the closest of 2^m representation values y(nT) • Example: 3 bits/sample, output levels ±Δ/2, ±3Δ/2, ±5Δ/2, ±7Δ/2 • Quantization error: "noise" = x(nT) − y(nT)
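A minimal sketch of the pictured midrise quantizer (m = 3 bits over an assumed input range [−4, 4], so Δ = 1 and the outputs are ±Δ/2, …, ±7Δ/2):

```python
import numpy as np

def quantize(x, m=3, V=4.0):
    levels = 2 ** m
    D = 2 * V / levels                                   # step size Delta
    k = np.clip(np.floor(x / D), -levels // 2, levels // 2 - 1)
    return (k + 0.5) * D                                 # nearest representation value

x = np.array([0.2, 1.7, -3.9, 3.2])
y = quantize(x)
print(y)         # [ 0.5  1.5 -3.5  3.5]
print(x - y)     # quantization "noise", within +/- D/2 inside the range
```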
Bit Rate of Digitized Signal • Bandwidth Ws Hertz: how fast the signal changes • Higher bandwidth → more frequent samples • Minimum sampling rate = 2 × Ws • Bit rate = 2Ws samples/second × m bits/sample • Representation accuracy: range of approximation error • Higher accuracy → smaller spacing between approximation values → more bits per sample • SNR = 6m − 7 dB
Example: Voice & Audio • Telephone voice: Ws = 4 kHz → 8000 samples/sec, 8 bits/sample → Rs = 8 × 8000 = 64 kbps • Cellular phones use more powerful compression algorithms: 8–12 kbps • CD audio: Ws = 22 kHz → 44,000 samples/sec, 16 bits/sample → Rs = 16 × 44,000 = 704 kbps per audio channel • MP3 uses more powerful compression algorithms: ~50 kbps per audio channel
Differential Coding • Successive samples tend to be correlated • A smooth signal has small successive differences • Use prediction to get better quality for m bits
Differential PCM (DPCM) • Quantize the difference between the prediction and the actual signal • Encoder: a linear predictor h forms the prediction x̃(n) from past reconstructed samples; the difference d(n) = x(n) − x̃(n) is quantized to d̂(n) = d(n) + e(n) and sent to the channel • Decoder: y(n) = x̃(n) + d̂(n) • End-to-end error: y(n) − x(n) = x̃(n) + d̂(n) − x(n) = d̂(n) − d(n) = e(n) • The end-to-end error is only the error introduced by the quantizer!
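A hedged sketch of this loop with the simplest possible predictor (h = 1, i.e., predict the previous reconstructed sample) and a uniform quantizer; both choices are illustrative:

```python
import math

def dpcm_encode(x, step=0.1):
    pred, codes = 0.0, []
    for sample in x:
        d = sample - pred                  # prediction error d(n)
        dq = step * round(d / step)        # quantized difference d^(n)
        codes.append(dq)
        pred = pred + dq                   # track the decoder's reconstruction
    return codes

def dpcm_decode(codes):
    pred, out = 0.0, []
    for dq in codes:
        pred = pred + dq                   # y(n) = prediction + quantized diff
        out.append(pred)
    return out

x = [math.sin(0.1 * n) for n in range(50)]
y = dpcm_decode(dpcm_encode(x))
print(max(abs(a - b) for a, b in zip(x, y)))   # <= step/2: only quantizer error
```

Note that the encoder tracks the decoder's reconstruction rather than the true signal, which is what keeps the quantization errors from accumulating.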
Voice Codec Standards • A variety of voice codecs have been standardized for different target bit rates and implementation complexities, including: • G.711: 64 kbps using PCM • G.723.1: 5–6 kbps using CELP • G.726: 16–40 kbps using ADPCM • G.728: 16 kbps using low-delay CELP • G.729: 8 kbps using CELP
Transform Coding • Quantization noise in PCM is "white" (flat spectrum Q(f)) • At high frequencies, the noise power can be higher than the signal power X(f) • If coding can produce noise that is "shaped" so that the signal power is always higher than the noise power, then masking effects in the ear result in better subjective quality • Transform coding maps the original signal into a different domain prior to encoding
Subband Coding • Subband coding is a form of transform coding • The original signal is decomposed into multiple signals occupying different frequency bands • Each band is PCM or DPCM encoded separately • Each band is allocated bits so that the signal power is always higher than the noise power in that band
MP3 Audio Coding • MP3 is the coding for digital audio in MPEG • Uses subband coding • Sampling rate: 16 to 48 kHz @ 16 bits/sample • The audio signal is decomposed into 32 subbands • A fast Fourier transform is used for the decomposition • Bits are allocated according to the signal power in the subbands • Adjustable compression ratio: trade off bit rate vs. quality • 32 kbps to 384 kbps per audio signal
Chapter 12 Multimedia Information • Image and Video Coding
Image Coding • Two-dimensional signal: variation in intensity in 2 dimensions • RGB color representation • Raw representation requires a very large number of bits • Linear prediction & transform techniques applicable • Joint Photographic Experts Group (JPEG) standard
Transform Coding • (a) 1-D DCT: x(t) (time) → X(f) (frequency) • The time signal is smooth, that is, it changes slowly with time • If we take its discrete cosine transform (DCT), we find that the non-negligible frequency components are clustered near zero frequency; the other components are negligible
Image Transform Coding • (b) 2-D DCT: (n, m) space → (u, v) frequency • Take a block of samples from a smooth image: neighboring pixel values change slowly • If we take the two-dimensional DCT, the non-negligible values cluster near the low spatial frequencies (the upper left-hand corner of the coefficient block)
DCT Coding • In image and video coding, the picture array is divided into 8×8 pixel blocks which are coded separately • Quantized DCT coefficients are scanned in zigzag fashion • The resulting sequence is run-length and variable-length (Huffman) coded

8×8 block of 8-bit pixel values:

180 150 115 100 100 100 100 100
250 180 128 100 100 100 100 100
190 170 120 100 100 100 100 100
160 130 110 100 100 100 100 100
110 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100

Quantized DCT coefficients (after DCT and quantization):

111 22 15 5 1 0 0 0
14 17 10 4 1 0 0 0
2 2 1 0 0 0 0 0
-4 -4 -2 -1 0 0 0 0
-3 -3 -1 0 0 0 0 0
-1 -1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
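A sketch of the zigzag scan and the (zero-run, value) pairing described above; JPEG then maps these pairs through size categories and Huffman tables, which is omitted here:

```python
import numpy as np

def zigzag(block):
    n = block.shape[0]
    # visit anti-diagonals in order, alternating direction on each one
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  -p[1] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i, j] for i, j in order]

def run_length(seq):
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))   # zeros skipped before this nonzero value
            run = 0
    pairs.append('EOB')              # end of block: nothing but zeros remain
    return pairs

coeffs = np.zeros((8, 8), dtype=int)
coeffs[0, :3] = [111, 22, 15]        # a few low-frequency coefficients
coeffs[1, :2] = [14, 17]
print(run_length(zigzag(coeffs)))
# [(0, 111), (0, 22), (0, 14), (1, 17), (0, 15), 'EOB']
```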
JPEG Image Coding Standard • Pipeline: 8×8 block → DCT → quantization (quantization matrices) → VLC coding (Huffman tables; DC coefficient: DPCM + VLI, AC coefficients: zero-run/VLI) • JPEG defines: • several coding modes for different applications • quantization matrices for the DCT coefficients • Huffman VLC coding tables • Baseline DCT/VLC coding gives 5:1 to 30:1 compression
(Figure: JPEG output at low quality, 23.5 kB, vs. high quality, 64.8 kB; look for jaggedness along boundaries)
Video Signal • Sequence of picture frames (e.g., 30 fps) • Each picture is digitized & compressed • Frame repetition rate: 10, 30, or 60 frames/second depending on quality • Frame resolution: small frames for videoconferencing, standard frames for conventional broadcast TV, HDTV frames • Rate = M bits/pixel × (W × H) pixels/frame × F frames/second (see the sketch below)
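The rate formula translates directly into code; the numbers below are illustrative:

```python
def video_rate_bps(bits_per_pixel, width, height, fps):
    return bits_per_pixel * width * height * fps

# Uncompressed 720x480 video, 24-bit RGB, 30 frames/sec:
print(video_rate_bps(24, 720, 480, 30) / 1e6)   # ~249 Mbps before compression
```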
(Figure: an m × n pixel color picture is scanned into three m × n block component signals, one each for the red, green, and blue components) • A scanned color picture produces 3 color component signals • These are represented as a luminance signal (black & white) and chrominance signals
Color Representation • RGB (red, green, blue) • Each RGB component has the same bandwidth and dynamic range • YUV • Commonly used to mean YCbCr, where Y represents the intensity and Cb and Cr represent the chrominance information • Derived from "color difference" video signals: Y, R−Y, B−Y • Y = 0.299R + 0.587G + 0.114B • Sampling ratio of Y:Cb:Cr • Y is typically sampled more finely than Cb & Cr • 4:4:4, 4:2:2, 4:2:0, 4:1:1
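A sketch of the conversion using the luminance weights above; the chrominance scale factors shown are the common BT.601 values and are an assumption here, since standards differ in scaling and offsets:

```python
def rgb_to_ycbcr(r, g, b):
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (from the slide)
    cb = 0.564 * (b - y)                     # scaled B - Y (assumed factor)
    cr = 0.713 * (r - y)                     # scaled R - Y (assumed factor)
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: modest Y, large positive Cr
```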
Example pixel rates • (a) QCIF videoconferencing: 176 × 144 at 30 frames/sec = 760,000 pixels/sec • (b) Broadcast TV: 720 × 480 at 30 frames/sec = 10.4 × 10^6 pixels/sec • (c) HDTV: 1920 × 1080 at 30 frames/sec = 62 × 10^6 pixels/sec