Information and Coding Theory Transmission over lossless channels. Entropy. Compression codes - Shannon code, Huffman code, arithmetic code. Juris Viksna, 2014
Information transmission We will focus on the compression/decompression parts, assuming that there are no losses during transmission. [Adapted from D.MacKay]
Noiseless channel [Adapted from D.MacKay]
Noiseless channel How many bits do we need to transfer a particular piece of information? All possible n-bit messages, each with probability 1/2^n. [Diagram: sender → noiseless channel → receiver] Obviously n bits will be sufficient. It is also not hard to see that n bits will be necessary to distinguish between all possible messages.
Noiseless channel
All possible n-bit messages:
Msg.        Prob.
000000...   ½
111111...   ½
other       0
[Diagram: sender → noiseless channel → receiver]
n bits will still be sufficient. However, we can do quite nicely with just 1 bit!
Noiseless channel
• All possible n-bit messages.
• Msg. Prob.
• 00  ¼
• 01  ¼
• 10  ½
• other  0
[Diagram: sender → noiseless channel → receiver]
Try to use 2 bits for “00” and “01” and 1 bit for “10”: 00 → 00, 01 → 01, 10 → 1
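A minimal sketch (not from the slides) that computes the expected length of this variable-length code under the message probabilities given above; it comes out to 1.5 bits per message instead of 2:

```python
# Message probabilities from the slide: "00" and "01" occur with probability 1/4,
# "10" with probability 1/2 (all other messages have probability 0).
probs = {"00": 0.25, "01": 0.25, "10": 0.5}

# The variable-length code suggested on the slide.
code = {"00": "00", "01": "01", "10": "1"}

expected_length = sum(probs[m] * len(code[m]) for m in probs)
print(expected_length)  # 1.5 bits per message on average, instead of 2
```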
Noiseless channel All possible n-bit messages, the probability of message i being p_i. [Diagram: sender → noiseless channel → receiver] We can try to generalize this by defining entropy (the minimal average number of bits we need to distinguish between messages) in the following way: H = −Σ_i p_i log p_i. The word is derived from the Greek εντροπία, "a turning towards" (εν- "in" + τροπή "a turning").
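As a small sketch of this definition in code (mine, not part of the slides), the function below computes H = −Σ_i p_i log₂ p_i for a given distribution:

```python
import math

def entropy(probs):
    """Entropy in bits, H = -sum(p_i * log2(p_i)), of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1 / 8] * 8))        # 3.0 -- eight equiprobable messages need 3 bits
print(entropy([0.25, 0.25, 0.5]))  # 1.5 -- matches the average code length on the previous slide
```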
Entropy - The idea The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X. [Adapted from T.Mitchell]
Entropy - The idea [Adapted from T.Mitchell]
Entropy - Definition Example [Adapted from D.MacKay]
Entropy - Definition NB! Unless explicitly stated otherwise, in this course (as well as in Computer Science in general) the expression log x denotes the base-2 logarithm (i.e. log2 x). [Adapted from D.MacKay]
Entropy - Definition The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X. [Adapted from T.Mitchell]
Entropy - Some examples [Adapted from T.Mitchell]
Entropy - Some examples [Adapted from T.Mitchell]
Binary entropy function Entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss. [Adapted from www.wikipedia.org]
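For illustration, a sketch (not from the slides) that evaluates the binary entropy function at a few points; the maximum of 1 bit is attained at p = 0.5:

```python
import math

def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1-p)*log2(1-p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(binary_entropy(p), 4))
# The maximum, 1 bit per trial, is attained at p = 0.5 (a fair coin).
```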
Entropy - some properties [Adapted from D.MacKay]
Entropy - some properties Entropy is maximized if the probability distribution is uniform, i.e. all probabilities pi are equal.
Sketch of proof: take any two probabilities p and q from the distribution; replacing both of them by their average (p+q)/2 does not decrease the entropy.
H(p,q) = −(p log p + q log q)
H((p+q)/2, (p+q)/2) = −((p+q)/2) log((p+q)/2) − ((p+q)/2) log((p+q)/2) = −(p+q) log((p+q)/2)
Since x log x is convex, (p log p + q log q)/2 ≥ ((p+q)/2) log((p+q)/2), i.e. p log p + q log q ≥ (p+q) log((p+q)/2), therefore
H((p+q)/2, (p+q)/2) − H(p,q) = p log p + q log q − (p+q) log((p+q)/2) ≥ 0.
In addition we also need some smoothness assumptions about H.
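A quick numerical illustration of the averaging step above (my sketch, not from the slides): replacing two unequal probabilities by their mean never decreases the entropy, and the uniform distribution gives the maximum.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

dist = [0.7, 0.1, 0.2]
averaged = [0.4, 0.4, 0.2]    # first two probabilities replaced by their mean

print(entropy(dist))          # ~1.157 bits
print(entropy(averaged))      # ~1.522 bits -- the entropy did not decrease
print(entropy([1 / 3] * 3))   # ~1.585 bits -- the uniform distribution is the maximum
```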
Joint entropy Assume that we have a set of symbols Σ with known frequencies of symbol occurrences. We have assumed that on average we will need H(Σ) bits to distinguish between symbols. What about sequences of length n of symbols from Σ (assuming independent occurrence of each symbol with the given frequency)? It turns out that the entropy of Σ^n is H(Σ^n) = nH(Σ). Later we will show that (assuming some restrictions) encodings that use nH(Σ) bits on average are the best we can get.
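A small sketch (not from the slides) checking H(Σ^n) = nH(Σ) by enumerating the product distribution of independent symbols; the alphabet and frequencies below are hypothetical:

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical symbol frequencies for a 3-letter alphabet Sigma.
sigma = {"a": 0.5, "b": 0.25, "c": 0.25}
n = 4

# Probability of each length-n sequence under independence.
seq_probs = [math.prod(sigma[s] for s in seq)
             for seq in product(sigma, repeat=n)]

print(entropy(seq_probs))           # 6.0 (up to rounding)
print(n * entropy(sigma.values()))  # 6.0 -- H(Sigma^n) = n * H(Sigma)
```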
Joint entropy The joint entropy of two discrete random variables X and Y is simply the entropy of their pairing (X,Y): H(X,Y) = −Σx,y p(x,y) log p(x,y). If X and Y are independent, their joint entropy is the sum of their individual entropies. [Adapted from D.MacKay]
Conditional entropy The conditional entropy of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y: H(X|Y) = Σy p(y) H(X|Y=y) = −Σx,y p(x,y) log p(x|y). [Adapted from D.MacKay]
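For concreteness, a sketch (mine, not from the slides) computing the joint entropy H(X,Y) and the conditional entropy H(X|Y) for a small hypothetical joint distribution:

```python
import math

# A hypothetical joint distribution P(x, y).
joint = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal of Y: P(y) = sum_x P(x, y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

joint_entropy = H(joint.values())
# H(X|Y) = -sum_{x,y} P(x,y) log2 P(x|y), with P(x|y) = P(x,y) / P(y).
cond_entropy = -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items() if p > 0)

print(joint_entropy)                     # H(X,Y)
print(cond_entropy)                      # H(X|Y)
print(joint_entropy - H(p_y.values()))   # equals H(X|Y) by the chain rule
```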
Conditional entropy [Adapted from D.MacKay]
Mutual information Mutual information measures the amount of information that can be obtained about one random variable by observing another. Mutual information is symmetric: I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y) = I(Y;X). [Adapted from D.MacKay]
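Continuing the same hypothetical joint distribution, a sketch computing I(X;Y) from the marginal and joint entropies and checking the symmetry:

```python
import math

joint = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    m = {}
    for xy, p in joint.items():
        m[xy[axis]] = m.get(xy[axis], 0.0) + p
    return m

p_x, p_y = marginal(joint, 0), marginal(joint, 1)
h_x, h_y, h_xy = H(p_x.values()), H(p_y.values()), H(joint.values())

# I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).
print(h_x + h_y - h_xy)      # I(X;Y)
print(h_x - (h_xy - h_y))    # same value: I(X;Y) = H(X) - H(X|Y)
print(h_y - (h_xy - h_x))    # same value: I(Y;X) -- mutual information is symmetric
```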
Entropy (summarized) Relations between entropies, conditional entropies, joint entropy and mutual information. [Adapted from D.MacKay]
Entropy - example [Adapted from D.MacKay]
Binary encoding - The problem Straightforward approach: use 3 bits to encode each character (e.g. '000' for a, '001' for b, '010' for c, '011' for d, '100' for e, '101' for f). The length of the encoded file will then be 300 000 bits. Can we do better? [Adapted from S.Cheng]
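The character frequency table for this example is shown only as a figure in the original slide, so the sketch below uses assumed, illustrative frequencies; it merely shows how the 300 000-bit fixed-length cost compares with the entropy lower bound for a 100 000-character file.

```python
import math

# Assumed (illustrative) relative frequencies of the six characters;
# the actual table from the slide is not reproduced here.
freq = {"a": 0.45, "b": 0.13, "c": 0.12, "d": 0.16, "e": 0.09, "f": 0.05}
n_chars = 100_000

fixed_cost = 3 * n_chars                          # 3 bits per character
entropy = -sum(p * math.log2(p) for p in freq.values())
lower_bound = entropy * n_chars                   # best possible total, in bits

print(fixed_cost)           # 300000
print(round(lower_bound))   # roughly 222000 bits with these assumed frequencies
```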
Variable length codes [Adapted from S.Cheng]
Encoding [Adapted from S.Cheng]
Decoding [Adapted from S.Cheng]
Prefix codes [Adapted from S.Cheng]
Prefix codes [Adapted from S.Cheng]
Binary trees and prefix codes [Adapted from S.Cheng]
Binary trees and prefix codes [Adapted from S.Cheng]
Optimal codes Is this prefix code optimal? [Adapted from S.Cheng]
Optimal codes [Adapted from S.Cheng]
Shannon encoding [Adapted from M.Brookes]
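The construction itself appears only as a figure in the original slide; as a hedged sketch, the code below implements one standard formulation of the Shannon code (assumed here to be the intended variant): sort symbols by decreasing probability and give symbol i the first ⌈log2(1/p_i)⌉ bits of the binary expansion of the cumulative probability of the preceding symbols.

```python
import math

def shannon_code(freqs):
    """Shannon code sketch: codeword of symbol i = first ceil(log2(1/p_i)) bits
    of the binary expansion of F_i = sum of probabilities of earlier symbols."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    code, cumulative = {}, 0.0
    for s in symbols:
        length = math.ceil(-math.log2(freqs[s]))
        # Binary expansion of the cumulative probability, truncated to `length` bits.
        bits, frac = [], cumulative
        for _ in range(length):
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits.append(str(int(bit)))
        code[s] = "".join(bits)
        cumulative += freqs[s]
    return code

# Hypothetical example distribution.
print(shannon_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} -- a prefix code
```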
Huffman encoding [Adapted from S.Cheng]
Huffman encoding - example [Adapted from S.Cheng]
Huffman encoding - example [Adapted from S.Cheng]
Huffman encoding - example [Adapted from S.Cheng]
Huffman encoding - example [Adapted from S.Cheng]
Huffman encoding - example [Adapted from S.Cheng]
Huffman encoding - example 2 Construct Huffman code for symbols with frequencies: A 15, D 6, F 6, H 3, I 1, M 2, N 2, U 2, V 2, # 7
Huffman encoding - example 2 [Adapted from H.Lewis, L.Denenberg]
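As a cross-check of example 2 (my sketch, not the slide's own construction), a standard heap-based Huffman code built from the frequencies listed above; the exact codewords depend on tie-breaking, but the weighted average length is optimal.

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code: repeatedly merge the two lowest-weight subtrees."""
    # Heap entries: (weight, tie-breaker, {symbol: codeword-so-far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, code1 = heapq.heappop(heap)
        w2, _, code2 = heapq.heappop(heap)
        # Prefix '0' to one subtree's codewords and '1' to the other's.
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

freqs = {"A": 15, "D": 6, "F": 6, "H": 3, "I": 1,
         "M": 2, "N": 2, "U": 2, "V": 2, "#": 7}
code = huffman_code(freqs)
avg = sum(freqs[s] * len(code[s]) for s in freqs) / sum(freqs.values())
print(code)
print(avg)   # average codeword length, weighted by frequency
```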
Huffman encoding - algorithm [Adapted from S.Cheng]
Huffman encoding - optimality [Adapted from S.Cheng]
Huffman encoding - optimality [Adapted from S.Cheng]
Huffman encoding - optimality [Adapted from S.Cheng]
Huffman encoding - optimality Huffman codes are optimal! [Adapted from S.Cheng]
Huffman encoding - optimality (proof 2) [Adapted from H.Lewis and L.Denenberg]
Huffman encoding - optimality (proof 2) • Proof by induction: • n = 1: OK. • Assume T is obtained by the Huffman algorithm and X is an optimal tree. • Construct T’ and X’ as described by the lemma. Then: • w(T’) ≤ w(X’) • w(T) = w(T’) + C(n1) + C(n2) • w(X) ≥ w(X’) + C(n1) + C(n2) • hence w(T) ≤ w(X) [Adapted from H.Lewis and L.Denenberg]