
Source Coding



  1. Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn hafiz@umich.edu http://www-perosnal-engin.umd.umich.edu/~hafiz

  2. Overview • Transmitter chain: Information Source → Source Encode → Encrypt → Channel Encode → Multiplex → Pulse Modulate → Multiple Access → Frequency Spreading → Bandpass Modulate → XMT • Source Coding – eliminate redundancy in the data; send the same information in fewer bits • Channel Coding – detect/correct errors in signaling and improve BER

  3. What is information? • Student slips on ice in winter in Southfield, MI • Professor slips on ice, breaks arm, class cancelled • Student slips on ice in July in Southfield, MI • Which statement contains the most information? • The statement with the greatest unpredictability conveys the most new information and surprise • Information (I) is related to -log(Pr{event}): as Pr{event} → 1, I → 0, and as Pr{event} → 0, I → ∞
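A minimal sketch (not from the slides) of this self-information measure; the event probabilities below are assumed purely for illustration:

    import math

    def self_information(p: float) -> float:
        """Self-information I = -log2(p), in bits, of an event with probability p."""
        return -math.log2(p)

    # Assumed illustrative probabilities: a likely event carries little information,
    # a rare (surprising) event carries a lot.
    for p in (0.99, 0.5, 0.01):
        print(f"Pr = {p:<5} ->  I = {self_information(p):.2f} bits")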

  4. Measure of Randomness • Suppose we have two events, E1 & E2, with Pr{E1} = p and Pr{E2} = q • E1 ∪ E2 = S, so p = 1 - q • Entropy: average information per source output, H = -p log2(p) - q log2(q) • [Figure: binary entropy H (bits) versus probability p; H peaks at 1 bit when p = 0.5 and falls to 0 at p = 0 and p = 1]
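A small sketch (added here, not from the slides) that evaluates the binary entropy and shows the behaviour plotted on the slide, with H largest at p = 0.5:

    import math

    def binary_entropy(p: float) -> float:
        """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # Entropy is maximum when the two outcomes are equally likely (p = 0.5).
    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"p = {p:<4} ->  H = {binary_entropy(p):.3f} bits")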

  5. Comments on Entropy • When the log is taken to base 2, the unit of H is bits/event; when the natural log (base e) is used, the unit is nats • Entropy is maximum for a uniformly distributed random variable • Example:
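The worked example on the slide is not reproduced in the transcript. As a stand-in, the following sketch (with an assumed 4-symbol alphabet) illustrates both bullets: entropy is largest for the uniform distribution, and the value in nats is the value in bits multiplied by ln 2.

    import math

    def entropy(probs, base=2.0):
        """Entropy H = -sum(p * log(p)) of a discrete distribution, in the given base."""
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    uniform = [0.25, 0.25, 0.25, 0.25]   # assumed 4-symbol alphabet
    skewed  = [0.70, 0.15, 0.10, 0.05]

    print(f"Uniform: {entropy(uniform):.3f} bits/event "
          f"({entropy(uniform, base=math.e):.3f} nats/event)")
    print(f"Skewed : {entropy(skewed):.3f} bits/event "
          f"({entropy(skewed, base=math.e):.3f} nats/event)")
    # The uniform distribution gives the maximum entropy, log2(4) = 2 bits/event.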

  6. Example: Average Information Content in the English Language • Calculate the average information in bits/character in English, assuming each letter is equally likely

  7. Real-world English • But English characters do not appear with equal frequency in real-world English text; the per-character probabilities are assumed to be: • Pr{a}=Pr{e}=Pr{o}=Pr{t}=0.10 • Pr{h}=Pr{i}=Pr{n}=Pr{r}=Pr{s}=0.07 • Pr{c}=Pr{d}=Pr{f}=Pr{l}=Pr{p}=Pr{u}=Pr{y}=0.02 • Pr{b}=Pr{g}=Pr{j}=Pr{k}=Pr{q}=Pr{v}=Pr{w}=Pr{x}=Pr{z}=0.01
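A hedged sketch (not part of the lecture) working out both examples: the equally-likely case of slide 6 gives log2(26) ≈ 4.70 bits/character, and the weighted case of slide 7 comes out near 4.06 bits/character. Note the probabilities listed on the slide sum to 0.98 (one letter is not listed), so the second figure is approximate.

    import math

    def entropy(probs):
        """H = -sum(p * log2(p)) in bits per character."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Slide 6: 26 equally likely letters.
    uniform = [1 / 26] * 26
    print(f"Equally likely letters: H = {entropy(uniform):.2f} bits/character")  # log2(26) ~ 4.70

    # Slide 7: the weighted probabilities listed on the slide.
    # (They sum to 0.98 -- one letter is missing -- so this is approximate.)
    weighted = [0.10] * 4 + [0.07] * 5 + [0.02] * 7 + [0.01] * 9
    print(f"Weighted letters:       H = {entropy(weighted):.2f} bits/character")  # ~ 4.06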

  8. Example: Binary Sequence • X with P(X=1) = P(X=0) = ½ • Pb = 0.01 (1 error per 100 bits) • R = 1000 binary symbols/sec

  9. Source Coding • Goal is to find an efficient description of information sources • Reduce required bandwidth • Reduce memory needed for storage • Memoryless – symbols from the source are independent; one symbol does not depend on the next • Memory – elements of a sequence depend on one another, e.g. UNIVERSIT_?; a 10-tuple of such symbols contains less information than 10 independent symbols, since the symbols are dependent

  10. Source Coding (II) • This means that it’s more efficient to code information with memory as groups of symbols

  11. Desirable Properties • Length • Fixed Length – ASCII • Variable Length – Morse Code, JPEG • Uniquely Decodable – allows the user to invert the mapping back to the original • Prefix-Free – no codeword can be a prefix of any other codeword • Average Code Length: n̄ = Σi pi ni, where ni is the code length of the ith symbol and pi its probability
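A small illustrative sketch of the average-code-length formula; the symbol probabilities and code lengths below are assumed, not from the slides:

    def average_code_length(probs, lengths):
        """n_bar = sum_i p_i * n_i, the expected codeword length in bits/symbol."""
        return sum(p * n for p, n in zip(probs, lengths))

    # Assumed example: three symbols with probabilities 0.5, 0.3, 0.2
    # and a variable-length code with codeword lengths 1, 2, 2 bits.
    probs   = [0.5, 0.3, 0.2]
    lengths = [1, 2, 2]
    print(f"Average code length = {average_code_length(probs, lengths):.2f} bits/symbol")  # 1.50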

  12. Uniquely Decodable and Prefix-Free Codes • Uniquely decodable? • Not code 1 • If "10111" were sent, is code 3 'babbb' or 'bacb'? Not code 3 or 6 • Prefix-Free? • Not code 4; its codeword '1' is a prefix of other codewords • Avg Code Length • Code 2: n̄ = 2 • Code 5: n̄ = 1.23
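The slide's table of codes 1–6 is not reproduced in the transcript, so the sketch below uses two assumed example codes to show how the prefix-free property can be checked:

    def is_prefix_free(codewords):
        """True if no codeword is a prefix of any other codeword."""
        for i, a in enumerate(codewords):
            for j, b in enumerate(codewords):
                if i != j and b.startswith(a):
                    return False
        return True

    # Assumed codes, purely for illustration (not the slide's codes 1-6).
    prefix_free_code     = ["0", "10", "110", "111"]
    not_prefix_free_code = ["1", "10", "00", "01"]   # '1' is a prefix of '10'
    print(is_prefix_free(prefix_free_code))      # True
    print(is_prefix_free(not_prefix_free_code))  # False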

  13. Huffman Code • Characteristics of Huffman Codes: • A prefix-free, variable-length code that achieves the shortest average code length for a given alphabet • The most frequent symbols get the shortest codes • Procedure • List all symbols and probabilities in descending order • Merge the two branches with the lowest probabilities and combine their probabilities • Repeat until only one branch is left

  14. Huffman Code Example • Symbol probabilities: Pr{a} = 0.4, Pr{b} = 0.2, Pr{c} = Pr{d} = Pr{e} = Pr{f} = 0.1 • The Code: A 11, B 00, C 101, D 100, E 011, F 010 • Average code length: 2.4 bits/symbol • Entropy: 2.32 bits • Compression Ratio: 3.0/2.4 = 1.25 (versus a 3-bit fixed-length code) • [Figure: Huffman tree built by repeatedly merging the two lowest-probability branches]
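A minimal sketch (not the lecture's code) of the merge procedure from slide 13, applied to the probabilities above. The exact codewords depend on how probability ties are broken, so they may differ from the slide's table, but the average code length (2.4 bits/symbol), entropy (2.32 bits), and compression ratio (1.25) come out the same:

    import heapq
    import math

    def huffman_code(probs):
        """Build a Huffman code for {symbol: probability}; return {symbol: codeword}."""
        # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, group1 = heapq.heappop(heap)   # lowest probability
            p2, _, group2 = heapq.heappop(heap)   # second lowest
            # Prepend a branch bit to every codeword in each merged group.
            merged = {s: "0" + c for s, c in group1.items()}
            merged.update({s: "1" + c for s, c in group2.items()})
            counter += 1
            heapq.heappush(heap, (p1 + p2, counter, merged))
        return heap[0][2]

    probs = {"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}
    code = huffman_code(probs)
    avg_len = sum(probs[s] * len(code[s]) for s in probs)
    entropy = -sum(p * math.log2(p) for p in probs.values())
    print(code)
    print(f"average length    = {avg_len:.2f} bits/symbol")   # 2.40
    print(f"entropy           = {entropy:.2f} bits")          # 2.32
    print(f"compression ratio = {3.0 / avg_len:.2f}")         # 1.25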

  15. Example: • Consider a random vector X = {a, b, c} with associated probabilities as listed in the Table • Calculate the entropy of this symbol set • Find the Huffman Code for this symbol set • Find the compression ratio and efficiency of this code

  16. Extension Codes • Combine alphabet symbols to increase variability • Try to combine very common 2- and 3-letter combinations, e.g. th, sh, ed, and, the, ing, ion

  17. Lempel-Ziv (ZIP) Codes • Huffman codes have some shortcomings • Symbol probabilities must be known a priori • The coding tree must be known at both coder and decoder • The Lempel-Ziv algorithm uses the text itself to iteratively construct a sequence of variable-length code words • Used in gzip, UNIX compress, and LZW algorithms

  18. Lempel-Ziv Algorithm • Look through a code dictionary of already-coded segments • If the current segment matches a dictionary entry, send <dictionary address, next character> and store the segment + new character in the dictionary • If there is no match, store the symbol in the dictionary and send <0, symbol>

  19. LZ Coding: Example • Encode [a b a a b a b b b b b b b a b b b b b a] • Code Dictionary:
  Address   Contents   Encoded Packet
  1         a          < 0 , a >
  2         b          < 0 , b >
  3         aa         < 1 , a >
  4         ba         < 2 , a >
  5         bb         < 2 , b >
  6         bbb        < 5 , b >
  7         bba        < 5 , a >
  8         bbbb       < 6 , b >
  -         -          < 4 , - >
  • Note: 9 code words; each packet uses a 3-bit dictionary address plus 1 bit for the new character
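A sketch (an assumed implementation, not the lecture's own) of the dictionary-based encoder described on slide 18, run on the sequence above; it reproduces the nine packets in the table:

    def lz_encode(symbols):
        """LZ78-style encoding: emit (dictionary address, next character) packets."""
        dictionary = {}          # phrase -> address (1-based); address 0 means "no match"
        packets = []
        phrase = ""
        for ch in symbols:
            if phrase + ch in dictionary:
                phrase += ch                     # keep extending the match
            else:
                address = dictionary.get(phrase, 0)
                packets.append((address, ch))    # <address of matched phrase, new character>
                dictionary[phrase + ch] = len(dictionary) + 1
                phrase = ""
        if phrase:                               # input ended in the middle of a match
            packets.append((dictionary[phrase], "-"))
        return packets

    sequence = "abaababbbbbbbabbbbba"
    print(lz_encode(sequence))
    # [(0,'a'), (0,'b'), (1,'a'), (2,'a'), (2,'b'), (5,'b'), (5,'a'), (6,'b'), (4,'-')]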

  20. Notes on Lempel-Ziv Codes • Compression gains occur for long sequences • Other variations exist that encode data using packets and run length encoding: <# characters back, length, next character>
