Chapter 5 A Measure of Information
Outline
• 5.1 Axioms for the uncertainty measure
• 5.2 Two Interpretations of the uncertainty function
• 5.3 Properties of the uncertainty function
• 5.4 Entropy and Coding
• 5.5 Shannon-Fano Coding
5.1 Axioms for the uncertainty measure
• X: a discrete random variable taking values x1, x2, ..., xM with probabilities p1, p2, ..., pM
• h(p): the uncertainty of an event with probability p
• h(pi): the uncertainty of {X = xi}
• The average uncertainty of X: H(p1, ..., pM) = Σ pi·h(pi)
• If p1 = p2 = ... = pM = 1/M, we write f(M) = H(1/M, ..., 1/M)
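As an illustration (my addition, not from the slides), here is a minimal Python sketch of the average uncertainty with h(p) = log2(1/p), i.e. the entropy of a finite distribution; the name `H` is just a convenience.

```python
from math import log2

def H(probs):
    """Average uncertainty H(p1, ..., pM) = sum of pi * log2(1/pi); zero-probability terms contribute 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Equally likely outcomes: H(1/M, ..., 1/M) = f(M) = log2(M)
print(H([0.25] * 4))                    # 2.0
print(H([0.3, 0.2, 0.2, 0.15, 0.15]))   # ~2.27 (the example used in 5.2 below)
```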
• Axiom 1: f(M) should be a monotonically increasing function of M, that is, M < M′ implies f(M) < f(M′). For example, f(2) < f(6).
• Axiom 2: Let X have M equally likely outcomes (x1, ..., xM) and Y have L equally likely outcomes (y1, ..., yL). If X and Y are independent, the joint experiment (X, Y) has M·L equally likely outcomes, and f(M·L) = f(M) + f(L).
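A quick numerical sanity check (my addition, assuming f(M) = log2 M, the form identified by Thm 5.1 below) that this choice satisfies Axioms 1 and 2:

```python
from math import isclose, log2

f = log2

# Axiom 1: monotonicity in M
assert f(2) < f(6)

# Axiom 2: additivity over independent equally likely experiments
assert isclose(f(4 * 8), f(4) + f(8))   # log2(32) = 2 + 3
```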
• Axiom 3 (Group Axiom): X = (x1, x2, ..., xr, xr+1, ..., xM) with probabilities p1, ..., pM. Construct a compound experiment: first choose between group A = {x1, ..., xr} (probability pA = p1 + ... + pr) and group B = {xr+1, ..., xM} (probability pB = pr+1 + ... + pM), then choose an outcome within the selected group. The uncertainty must be the same either way:
H(p1, ..., pM) = H(pA, pB) + pA·H(p1/pA, ..., pr/pA) + pB·H(pr+1/pB, ..., pM/pB)
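A small numerical check of the grouping identity (my addition), reusing the H sketch from 5.1 on one concrete distribution:

```python
from math import isclose, log2

def H(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

p = [0.3, 0.2, 0.2, 0.15, 0.15]
pA, pB = sum(p[:2]), sum(p[2:])   # group A = {x1, x2}, group B = {x3, x4, x5}

lhs = H(p)
rhs = H([pA, pB]) + pA * H([q / pA for q in p[:2]]) + pB * H([q / pB for q in p[2:]])
assert isclose(lhs, rhs)
print(round(lhs, 2))   # 2.27
```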
• Axiom 4: H(p, 1-p) is a continuous function of p, i.e., a small change in p corresponds to a small change in uncertainty.
• We can use the four axioms above to find the H function.
• Thm 5.1: The only function satisfying the four given axioms is H(p1, ..., pM) = -C·Σ pi·log pi, where C > 0 and the logarithm base is > 1.
• For example, with C = 1 and base 2, for a coin X = {tail, head}: H(p, 1-p) = -p·log2 p - (1-p)·log2(1-p). [Figure: plot of H(p, 1-p) versus p; maximum uncertainty 1 at p = ½, minimum uncertainty 0 at p = 0 and p = 1.]
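To illustrate the shape of this curve, a few values of the binary uncertainty function (my addition):

```python
from math import log2

def H2(p):
    """Binary uncertainty H(p, 1-p) in bits."""
    return sum(q * log2(1 / q) for q in (p, 1 - p) if q > 0)

for p in (0.5, 0.3, 0.1, 0.01, 0.0):
    print(p, round(H2(p), 4))
# 0.5 gives the maximum 1.0; the value falls to 0 as p approaches 0 or 1
```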
5.2 Two Interpretations of the uncertainty function
• (1) H(p1, ..., pM) may be interpreted as the expectation of a random variable W = w(X) with w(xi) = log(1/pi), so that H(p1, ..., pM) = E[W] = Σ pi·log(1/pi)
• (2) H(p1, ..., pM) may be interpreted as the minimum average number of 'yes'/'no' questions required to specify the value of X. For example, for X = {x1, x2, x3, x4, x5}, H(X) = H(0.3, 0.2, 0.2, 0.15, 0.15) = 2.27. [Decision tree: first ask "Does x = x1 or x2?"; if yes, ask "x = x1?" (settling x1 or x2); if no, ask "x = x3?"; if no again, ask "x = x4?" (settling x4 or x5).]
Avg # of questions = 2·0.7 + 3·0.3 = 2.3 > 2.27 (two questions settle x1, x2, or x3, with total probability 0.7; three are needed for x4 or x5, with total probability 0.3)
• H.W.: X = {x1, x2}, p(x1) = 0.7, p(x2) = 0.3. How many questions (on average) are required to specify the outcome of a joint experiment involving 2 independent observations of X?
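A short check of the worked example above (my addition; it recomputes only the five-outcome case, not the homework):

```python
from math import log2

probs     = [0.3, 0.2, 0.2, 0.15, 0.15]   # x1 ... x5
questions = [2,   2,   2,   3,    3]      # questions needed per outcome in the tree

avg_q = sum(p * q for p, q in zip(probs, questions))
H = sum(p * log2(1 / p) for p in probs)
print(round(avg_q, 2), round(H, 2))   # 2.3 2.27 -- the average question count exceeds the uncertainty
```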
5.3 Properties of the uncertainty function
• Lemma 5.2: Let p1, ..., pM and q1, ..., qM be arbitrary positive numbers with Σ pi = Σ qi = 1. Then -Σ pi·log pi ≤ -Σ pi·log qi, with equality iff pi = qi for all i. (The proof uses the inequality ln x ≤ x - 1. [Figure: the curves y = x - 1 and y = ln x, showing ln x ≤ x - 1.])
• Thm 5.3: H(p1, ..., pM) ≤ log M, with equality iff pi = 1/M for all i.
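A numerical illustration of Lemma 5.2 and Thm 5.3 (my addition), taking qi = 1/M so that -Σ pi·log qi = log M:

```python
import random
from math import isclose, log2

M = 6
p = [random.random() for _ in range(M)]
p = [x / sum(p) for x in p]              # a random distribution on M outcomes
q = [1 / M] * M

H_p   = -sum(pi * log2(pi) for pi in p)
cross = -sum(pi * log2(qi) for pi, qi in zip(p, q))

assert isclose(cross, log2(M))           # with qi = 1/M, the right-hand side is log M
assert H_p <= cross + 1e-12              # Lemma 5.2, hence Thm 5.3: H(p) <= log M
print(round(H_p, 4), "<=", round(log2(M), 4))
```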
5.4 Entropy and Coding
• Noiseless Coding Theorem setup:
X: x1, x2, ..., xM with probabilities p1, p2, ..., pM
Codewords: w1, w2, ..., wM with lengths n1, n2, ..., nM
Code alphabet: {a1, a2, ..., aD}; e.g., D = 2, {0, 1}
Minimize the average codeword length n̄ = Σ pi·ni
• Thm (Noiseless Coding Thm): If n̄ = Σ pi·ni is the average codeword length of a uniquely decodable code for X, then n̄ ≥ HD(X), with equality iff pi = D^(-ni) for i = 1, 2, ..., M.
• Note: HD(X) = -Σ pi·logD pi is the uncertainty of X computed by using the base D.
• A code is called "absolutely optimal" if it achieves the lower bound n̄ = HD(X) of the noiseless coding thm.
• Ex. For X with probabilities (1/2, 1/4, 1/8, 1/8) and binary code 0, 10, 110, 111 (lengths 1, 2, 3, 3): H(X) = 7/4 = n̄, so the code is absolutely optimal.
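A check of this example (my addition; the distribution (1/2, 1/4, 1/8, 1/8) and the codewords are the standard case matching H = 7/4, assumed here rather than read off the slide):

```python
from math import log2

probs = [1/2, 1/4, 1/8, 1/8]
code  = ["0", "10", "110", "111"]   # prefix-free binary code with lengths 1, 2, 3, 3

H     = sum(p * log2(1 / p) for p in probs)
n_bar = sum(p * len(w) for p, w in zip(probs, code))
print(H, n_bar)   # both 1.75 -- the code is absolutely optimal
```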
5.5 Shannon-Fano Coding
• Select the integer ni s.t. logD(1/pi) ≤ ni < logD(1/pi) + 1, i.e., ni = ⌈logD(1/pi)⌉. These lengths satisfy the Kraft inequality Σ D^(-ni) ≤ Σ pi = 1 => an instantaneous code can be constructed with the lengths n1, n2, ..., nM obtained from Shannon-Fano coding, and its average length satisfies HD(X) ≤ n̄ < HD(X) + 1.
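A minimal sketch of the length-selection rule for D = 2, plus a Kraft-inequality check (my addition):

```python
from math import ceil, log2

def shannon_fano_lengths(probs):
    """ni = ceil(log2(1/pi)) for each pi > 0."""
    return [ceil(log2(1 / p)) for p in probs]

probs   = [0.3, 0.2, 0.2, 0.15, 0.15]
lengths = shannon_fano_lengths(probs)
print(lengths)                              # [2, 3, 3, 3, 3]
print(sum(2 ** -n for n in lengths) <= 1)   # True: Kraft holds, so an instantaneous code exists
```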
• In fact, we can always approach the lower bound as closely as desired if we are allowed to use "block coding".
• Take a series of s independent observations of X. Let Y = (x1, x2, ..., xs) and assign a codeword to each value of Y. Since HD(Y) = s·HD(X), Shannon-Fano coding of Y gives s·HD(X) ≤ n̄Y < s·HD(X) + 1, i.e., HD(X) ≤ n̄Y/s < HD(X) + 1/s => block coding decreases the average codeword length per value of X (see the sketch after the next example).
• Ex. X = {x1, x2} with p(x1) = 0.7, p(x2) = 0.3 (the H.W. distribution from 5.2). Single-symbol Shannon-Fano coding gives n1 = ⌈log2(1/0.7)⌉ = 1 and n2 = ⌈log2(1/0.3)⌉ = 2, so n̄ = 1.3. But H(X) = H(0.3) = 0.88129; the value of H(p) at p = 0.3 or p = 0.7 can be looked up in a table of the binary uncertainty function.
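The sketch referred to above (my addition): per-symbol Shannon-Fano length for blocks of s independent observations of this X, compared with the bound H(X) + 1/s:

```python
from itertools import product
from math import ceil, log2

p = {"x1": 0.7, "x2": 0.3}
H = sum(q * log2(1 / q) for q in p.values())                  # ~0.88129

for s in range(1, 5):
    # Probability of each length-s block of independent observations of X
    block_probs = []
    for block in product(p.values(), repeat=s):
        prob = 1.0
        for q in block:
            prob *= q
        block_probs.append(prob)

    n_bar = sum(q * ceil(log2(1 / q)) for q in block_probs)   # Shannon-Fano on the blocks
    print(s, round(n_bar / s, 4), "<", round(H + 1 / s, 4))
```

The per-symbol length always stays below H(X) + 1/s, so it can be brought as close to H(X) as desired by taking s large enough.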
• How do we find the actual code symbols? We simply assign them in order: after computing the lengths n1 ≤ n2 ≤ ... ≤ nM by S-F coding, we assign to each xi the first string of length ni (in numerical order) that does not have any previously assigned codeword as a prefix. This is always possible because the lengths satisfy the Kraft inequality.
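A sketch of this in-order assignment for D = 2 (my addition; it uses the canonical construction, which works whenever the Kraft inequality holds):

```python
from math import ceil, log2

def sf_lengths(probs):
    return [ceil(log2(1 / p)) for p in probs]

def assign_codewords(lengths):
    """Assign binary codewords in order of non-decreasing length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    code, prev_len = 0, 0
    for i in order:
        code <<= lengths[i] - prev_len                     # extend to the new length
        codewords[i] = format(code, "0{}b".format(lengths[i]))
        code += 1                                          # next available value at this length
        prev_len = lengths[i]
    return codewords

probs = [0.3, 0.2, 0.2, 0.15, 0.15]
print(assign_codewords(sf_lengths(probs)))
# ['00', '010', '011', '100', '101'] -- prefix-free, lengths [2, 3, 3, 3, 3]
```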