Language and Information September 21, 2000 Handout #2
Course Information • Instructor: Dragomir R. Radev (radev@si.umich.edu) • Office: 305A, West Hall • Phone: (734) 615-5225 • Office hours: TTh 3-4 • Course page: http://www.si.umich.edu/~radev/760 • Class meets on Thursdays, 5-8 PM in 311 West Hall
Readings • Textbook: • Oakes, Chapter 2, pages 53–76 • Additional readings: • M&S, Chapter 7 (minus Section 7.4) • M&S, Chapter 8 (minus Sections 8.3–4)
Entropy • Let p(x) be the probability mass function of a random variable X, over a discrete set of symbols (or alphabet) X: p(x) = P(X=x), x ∈ X • Example: throwing two coins and counting heads and tails • Entropy (self-information) is the average uncertainty of a single random variable: H(X) = -Σx∈X p(x) log2 p(x)
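A minimal sketch (not part of the handout) of this definition in Python, applied to the two-coin example, where X is the number of heads in two fair flips:

```python
from math import log2

def entropy(probs):
    """H(X) = -sum_x p(x) * log2 p(x); zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Two fair coins: P(0 heads) = 1/4, P(1 head) = 1/2, P(2 heads) = 1/4
print(entropy([0.25, 0.5, 0.25]))   # 1.5 bits
```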
Information theoretic measures • Claude Shannon (information theory): “information = unexpectedness” • Series of events (messages) with associated probabilities: pi (i = 1 .. n) • Goal: to measure the information content, H(p1, …, pn) of a particular message • Simplest case: the messages are words • When pi is low, the word is less informative
Properties of information content • H is a continuous function of the pi • If all p are equal (pi = 1/n), then H is a monotone increasing function of n • If a message is broken into two successive messages, the original H is a weighted sum of the resulting values of H
Example • The only function satisfying all three properties is the entropy function: H = -Σi pi log2 pi • Example distribution: p1 = 1/2, p2 = 1/3, p3 = 1/6
Example (cont’d) H = -(1/2 log2 1/2 + 1/3 log2 1/3 + 1/6 log2 1/6) = 1/2 log2 2 + 1/3 log2 3 + 1/6 log2 6 = 1/2 + 1.585/3 + 2.585/6 ≈ 1.46 • Alternative formula for H: H = Σi pi log2 (1/pi)
Another example • Example: • No tickets left: P = 1/2 • Matinee shows only: P = 1/4 • Eve. show, undesirable seats: P = 1/8 • Eve. show, orchestra seats: P = 1/8
Example (cont’d) H = -(1/2 log2 1/2 + 1/4 log2 1/4 + 1/8 log2 1/8 + 1/8 log2 1/8) H = -[(1/2 × -1) + (1/4 × -2) + (1/8 × -3) + (1/8 × -3)] H = 1.75 (bits per symbol)
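Because all four probabilities here are powers of 1/2, each message can be given a code of exactly -log2 p bits, and the average code length equals the entropy. A small sketch (not from the handout; the code assignment shown is just one possible choice):

```python
from math import log2

# Ticket-availability messages and their probabilities
probs = {"no tickets": 1/2, "matinee only": 1/4,
         "evening, undesirable seats": 1/8, "evening, orchestra seats": 1/8}

avg_code_length = sum(p * -log2(p) for p in probs.values())
print(avg_code_length)   # 1.75 bits per symbol, matching H
# One matching prefix code: 0, 10, 110, 111
```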
Characteristics of Entropy • When one of the messages has a probability approaching 1, then entropy decreases. • When all messages have the same probability, entropy increases. • Maximum entropy: when P = 1/n (H = ??) • Relative entropy: ratio of actual entropy to maximum entropy • Redundancy: 1 - relative entropy
Entropy examples • Letter frequencies in Simplified Polynesian: P(1/8), T(1/4), K(1/8), A(1/4), I (1/8), U (1/8) • What is H(P)? • What is the shortest code that can be designed to describe simplified Polynesian? • What is the entropy of a weighted coin? Draw a diagram.
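One way to check the first two questions numerically (a sketch, not part of the handout):

```python
from math import log2

# Simplified Polynesian letter probabilities
letters = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/4, "i": 1/8, "u": 1/8}

H = -sum(p * log2(p) for p in letters.values())
print(H)   # 2.5 bits per letter

# -log2 p suggests per-letter code lengths: 2 bits for t and a, 3 bits for the rest
print({s: -log2(p) for s, p in letters.items()})
```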
Joint entropy and conditional entropy • The joint entropy of a pair of discrete random variables X, Y ~ p(x,y) is the amount of information needed on average to specify both their values: H(X,Y) = -Σx Σy p(x,y) log2 p(x,y) • The conditional entropy of a discrete random variable Y given another X, for X, Y ~ p(x,y), expresses how much extra information is needed to communicate Y given that the other party knows X: H(Y|X) = -Σx Σy p(x,y) log2 p(y|x)
Connection between joint and conditional entropies • There is a chain rule for entropy (note that the products in the chain rules for probabilities have become sums because of the log): H(X,Y) = H(X) + H(Y|X) • H(X1,…,Xn) = H(X1) + H(X2|X1) + … + H(Xn|X1,…,Xn-1)
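A small numerical sketch (with a hypothetical joint distribution, not from the handout) of these definitions and of the chain rule H(X,Y) = H(X) + H(Y|X):

```python
from math import log2, isclose

# Hypothetical joint distribution p(x, y) over two binary variables
pxy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}

def H_joint(pxy):
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y)."""
    return -sum(p * log2(p) for p in pxy.values() if p > 0)

def marginal_x(pxy):
    px = {}
    for (x, _), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
    return px

def H_X(pxy):
    return -sum(p * log2(p) for p in marginal_x(pxy).values() if p > 0)

def H_Y_given_X(pxy):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)."""
    px = marginal_x(pxy)
    return -sum(p * log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)

assert isclose(H_joint(pxy), H_X(pxy) + H_Y_given_X(pxy))   # chain rule holds
```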
Mutual information H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y) • Mutual information: reduction in uncertainty of one random variable due to knowing about another, or the amount of information one random variable contains about another. H(X) – H(X|Y) = H(Y) – H(Y|X) = I(X;Y)
Mutual information and entropy • [Diagram: H(X,Y) decomposed into H(X|Y), I(X;Y), and H(Y|X)] • I(X;Y) is 0 iff the two variables are independent • For two dependent variables, mutual information grows not only with the degree of dependence, but also according to the entropy of the variables
Formulas for I(X;Y) • I(X;Y) = H(X) – H(X|Y) = H(X) + H(Y) – H(X,Y) • I(X;Y) = Σx Σy p(x,y) log2 [p(x,y) / (p(x)p(y))] • Since H(X|X) = 0, note that H(X) = H(X) – H(X|X) = I(X;X) • I(x;y) = log2 [p(x,y) / (p(x)p(y))]: pointwise mutual information
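A short sketch of pointwise mutual information on bigram counts; the corpus size and counts below are made up purely for illustration:

```python
from math import log2

N = 1_000_000      # total number of bigrams in a hypothetical corpus
count_xy = 20      # hypothetical count of the bigram ("new", "york")
count_x = 1_000    # hypothetical count of "new" as the first word
count_y = 100      # hypothetical count of "york" as the second word

p_xy, p_x, p_y = count_xy / N, count_x / N, count_y / N
pmi = log2(p_xy / (p_x * p_y))
print(pmi)   # log2(200) ≈ 7.6 bits: the pair co-occurs far more often than chance
```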
The noisy channel model • [Diagram: W → Encoder → X → Channel p(y|x) → Y → Decoder → Ŵ] • W: message from a finite alphabet; X: input to channel; Y: output from channel; Ŵ: attempt to reconstruct the message based on the output • [Diagram: binary symmetric channel: each bit is transmitted correctly with probability 1-p and flipped with probability p]
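A minimal simulation sketch (not from the handout) of the binary symmetric channel: each bit is flipped independently with crossover probability p.

```python
import random

def bsc(bits, p):
    """Pass bits through a binary symmetric channel with crossover probability p."""
    return [b ^ (random.random() < p) for b in bits]

random.seed(0)
message = [random.randint(0, 1) for _ in range(10_000)]
received = bsc(message, p=0.1)
error_rate = sum(x != y for x, y in zip(message, received)) / len(message)
print(error_rate)   # close to the crossover probability 0.1
```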
Compression • Huffman coding (prefix property) • Ziv-Lempel codes (better) • arithmetic codes (better for images - why?)
Huffman coding • Developed by David Huffman (1952) • Average of 5 bits per character • Based on frequency distributions of symbols • Algorithm: iteratively build a tree of symbols starting with the two least frequent symbols
[Diagram: example Huffman code tree over the symbols a–j, with each left branch labeled 0 and each right branch labeled 1]
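A minimal Python sketch of the greedy procedure described above, repeatedly merging the two least frequent nodes; the frequency table is hypothetical:

```python
import heapq
from itertools import count

def huffman(freqs):
    """Return a prefix-free code (symbol -> bit string) built bottom-up from frequencies."""
    tiebreak = count()   # unique counter so the heap never compares the code dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent subtree
        f2, _, right = heapq.heappop(heap)   # second least frequent subtree
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

print(huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
# More frequent symbols receive shorter codes; no code is a prefix of another.
```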
Exercise • Consider the bit string: 01101101111000100110001110100111000110101101011101 • Use the Huffman code from the example to decode it. • Try inserting, deleting, and switching some bits at random locations and try decoding.
Ziv-Lempel coding • Two types - one is known as LZ77 (used in GZIP) • Code: a sequence of triples <a,b,c> • a: how far back in the decoded text to look for the upcoming text segment • b: how many characters to copy • c: new character to add to complete the segment
<0,0,p> p • <0,0,e> pe • <0,0,t> pet • <2,1,r> peter • <0,0,_> peter_ • <6,1,i> peter_pi • <8,2,r> peter_piper • <6,3,c> peter_piper_pic • <0,0,k> peter_piper_pick • <7,1,d> peter_piper_picked • <7,1,a> peter_piper_picked_a • <9,2,e> peter_piper_picked_a_pe • <9,2,_> peter_piper_picked_a_peck_ • <0,0,o> peter_piper_picked_a_peck_o • <0,0,f> peter_piper_picked_a_peck_of • <17,5,l> peter_piper_picked_a_peck_of_pickl • <12,1,d> peter_piper_picked_a_peck_of_pickled • <16,3,p> peter_piper_picked_a_peck_of_pickled_pep • <3,2,r> peter_piper_picked_a_peck_of_pickled_pepper • <0,0,s> peter_piper_picked_a_peck_of_pickled_peppers
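A small decoder sketch (not part of the handout) for the <a,b,c> triple format described above; feeding it the triples from the trace reproduces the final string:

```python
def lz77_decode(triples):
    """Decode <back, length, char> triples: copy `length` chars from `back` positions ago, then append `char`."""
    text = ""
    for back, length, char in triples:
        if length:
            start = len(text) - back
            text += text[start:start + length]
        text += char
    return text

triples = [(0, 0, "p"), (0, 0, "e"), (0, 0, "t"), (2, 1, "r"), (0, 0, "_"),
           (6, 1, "i"), (8, 2, "r"), (6, 3, "c"), (0, 0, "k"), (7, 1, "d"),
           (7, 1, "a"), (9, 2, "e"), (9, 2, "_"), (0, 0, "o"), (0, 0, "f"),
           (17, 5, "l"), (12, 1, "d"), (16, 3, "p"), (3, 2, "r"), (0, 0, "s")]
print(lz77_decode(triples))   # peter_piper_picked_a_peck_of_pickled_peppers
```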
Arithmetic coding • Uses probabilities • Achieves about 2.5 bits per character
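A minimal sketch of the encoding idea (the probabilities below are hypothetical): each symbol narrows the current interval [low, high) in proportion to its probability, and any number in the final interval identifies the whole string.

```python
# Hypothetical symbol probabilities over the alphabet {a, b, c}
PROBS = {"a": 0.5, "b": 0.3, "c": 0.2}

def arithmetic_interval(text):
    """Return the subinterval of [0, 1) that represents `text`."""
    low, high = 0.0, 1.0
    for symbol in text:
        width = high - low
        cum = 0.0
        for s, p in PROBS.items():   # locate the symbol's cumulative range
            if s == symbol:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    return low, high

print(arithmetic_interval("aab"))   # (0.125, 0.2): any number in here encodes "aab"
```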
Exercise • Assuming the alphabet consists of a, b, and c, develop arithmetic encoding for the following strings: aaa aab aba baa abc cab cba bac