§1 Entropy and mutual information
 §1.1 Discrete random variables
  §1.1.1 Discrete memoryless source and entropy
  §1.1.2 Discrete memoryless channel and mutual information
 §1.2 Discrete random vectors
§1.1.1 Discrete memoryless source and entropy
1. DMS (Discrete Memoryless Source)
Probability space: [X, P], where X takes values in {a1, a2, …, ar} with probabilities P = {p(a1), p(a2), …, p(ar)} and Σ p(ai) = 1.
• Example 1.1.1 Let X represent the outcome of a single roll of a fair die: R = {1, 2, …, 6}, p(ai) = 1/6 for every face.
§1.1.1 Discrete memoryless source and entropy
2. Self-information
• Example 1.1.2 Two bags X and Y contain balls of several colours (white, black, red, blue) in different proportions. Analyse the uncertainty of drawing a red ball from X and from Y.
§1.1.1 Discrete memoryless source and entropy
2. Self-information
I(ai) = f[p(ai)] must satisfy:
1) I(ai) is a monotone decreasing function of p(ai): if p(a1) > p(a2), then I(a1) < I(a2);
2) if p(ai) = 1, then I(ai) = 0;
3) if p(ai) = 0, then I(ai) → ∞;
4) if p(ai aj) = p(ai) p(aj), then I(ai aj) = I(ai) + I(aj).
§1.1.1 Discrete memoryless source and entropy
2. Self-information
I(ai) = -log p(ai) = log (1/p(ai))
Units: bit (log base 2), nat (log base e), hart (log base 10).
Property 4) holds when ai and aj are statistically independent.
[Figure: I(ai) as a decreasing function of p(ai) on [0, 1], with I(ai) = 0 at p(ai) = 1.]
Remark: I(ai) is the measure of the uncertainty of the outcome ai, and equally the measure of the information that outcome provides when it occurs.
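As a quick check of these definitions, here is a small Python sketch (not part of the original slides) that computes self-information in bits, nats and hartleys and verifies the additivity property 4) for independent events; the function name `self_information` is only illustrative.

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Self-information I(a) = -log_base p(a); base 2 -> bits, e -> nats, 10 -> harts."""
    if p <= 0:
        return math.inf                      # I(a) -> infinity as p(a) -> 0
    return -math.log(p, base)

p_red = 0.25
print(self_information(p_red))               # 2.0 bits
print(self_information(p_red, math.e))       # ~1.386 nats
print(self_information(p_red, 10))           # ~0.602 harts

# Additivity for independent events: I(ab) = I(a) + I(b) when p(ab) = p(a) p(b)
p_a, p_b = 0.5, 0.25
assert math.isclose(self_information(p_a * p_b),
                    self_information(p_a) + self_information(p_b))
```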
§1.1.1 Discrete memoryless source and entropy
3. Entropy
Definition: Suppose X is a discrete random variable whose range R = {a1, a2, …} is finite or countable, and let p(ai) = P{X = ai}. The entropy of X is defined by
H(X) = -Σ p(ai) log p(ai)
It is a measure of the average uncertainty (or randomness) about X, and of the average amount of information provided by X.
§1.1.1 Discrete memoryless source and entropy
• Entropy: the average amount of "information" provided by an observation of X
• Example 1.1.3 A bag contains 100 balls: 80% are red and the rest are white. One ball is drawn at random. How much information does each drawing provide on average?
H(X) = -0.8 log 0.8 - 0.2 log 0.2 = 0.722 bit/sig
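A minimal Python sketch (illustrative, not from the slides) that reproduces the 0.722 bit/sig figure of Example 1.1.3, using the 0 log 0 = 0 convention noted below:

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """H(X) = -sum p(a) log p(a), with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Example 1.1.3: 80% red balls, 20% white balls
print(round(entropy([0.8, 0.2]), 3))   # 0.722 bit/sig
```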
§1.1.1 Discrete memoryless source and entropy
• Entropy: the average "uncertainty" or "randomness" about X
• Example 1.1.4 [comparison of sources with different probability distributions; details as in the original slides]
§1.1.1 Discrete memoryless source and entropy
3. Entropy
Note:
1) Units: bit/sig, nat/sig, hart/sig
2) If p(ai) = 0, the term p(ai) log p(ai)^(-1) is taken to be 0
3) If R is infinite, H(X) may be +∞
§1.1.1 Discrete memoryless source and entropy
3. Entropy
• Example 1.1.5 Entropy of a binary source (BS): with P(X=1) = p and P(X=0) = 1-p,
H(X) = H(p) = -p log p - (1-p) log (1-p)
Viewed as a function of the probability vector (p, 1-p), this is the (binary) entropy function.
§1.1.1 Discrete memoryless source and entropy
4. The properties of entropy
Theorem 1.1 Let X assume values in R = {x1, x2, …, xr} with probabilities p1, p2, …, pr. (Theorem 1.1 in textbook)
1) H(X) ≥ 0;
2) H(X) = 0 iff pi = 1 for some i;
3) H(X) ≤ log r, with equality iff pi = 1/r for all i — the basis of data compression.
Proof:
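The bounds in Theorem 1.1 can be spot-checked numerically; the sketch below (an illustration, not the textbook's proof) compares H(X) with log r for a few distributions over r = 4 values.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

r = 4
for probs in ([1.0, 0.0, 0.0, 0.0],      # degenerate: H = 0
              [0.7, 0.1, 0.1, 0.1],      # 0 < H < log r
              [0.25] * 4):               # uniform: H = log r
    h = entropy(probs)
    assert -1e-12 <= h <= math.log2(r) + 1e-12   # 0 <= H(X) <= log r
    print(probs, round(h, 4), "log r =", math.log2(r))
```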
§1.1.1 Discrete memoryless source and entropy
4. The properties of entropy
4) H(X) depends only on the probability distribution of X, not on the values X takes.
• Example 1.1.6 Let X, Y, Z all be discrete random variables:
§1.1.1 Discrete memoryless source and entropy
4. The properties of entropy
5) If X and Y are independent, then H(XY) = H(X) + H(Y).
Proof: For the joint source (X, Y) with p(xy) = p(x)p(y),
H(XY) = -ΣΣ p(x)p(y) log [p(x)p(y)] = -ΣΣ p(x)p(y) [log p(x) + log p(y)] = H(X) + H(Y).
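Property 5) can also be verified directly from a joint distribution; the sketch below (illustrative Python, not from the slides) builds p(x, y) = p(x)p(y) for an independent pair and checks H(XY) = H(X) + H(Y).

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [0.5, 0.3, 0.2]
p_y = [0.9, 0.1]

# Independent pair: p(x, y) = p(x) p(y)
p_xy = [px * py for px in p_x for py in p_y]

assert math.isclose(entropy(p_xy), entropy(p_x) + entropy(p_y))
print(entropy(p_x), entropy(p_y), entropy(p_xy))
```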
§1.1.1 Discrete memoryless source and entropy
4. The properties of entropy
6) Convexity
Theorem 1.2 The entropy function H(p1, p2, …, pr) is a concave (∩-convex) function of the probability vector (p1, p2, …, pr).
• Example 1.1.5 (continued) Entropy of the BS: H(p) rises from 0 at p = 0 to its maximum of 1 bit at p = 1/2 and falls back to 0 at p = 1.
[Figure: H(p) versus p on [0, 1], peaking at p = 1/2.]
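The shape shown in the figure can be reproduced with a few lines of Python (a sketch using the same H(p) formula as Example 1.1.5); the midpoint check at the end also illustrates the concavity claimed in Theorem 1.2.

```python
import math

def h(p: float) -> float:
    """Binary entropy function H(p) = -p log p - (1-p) log(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"H({p}) = {h(p):.4f}")          # peaks at H(0.5) = 1 bit

# Concavity (∩-convexity): H at the midpoint >= average of H at the endpoints
p1, p2 = 0.1, 0.6
assert h((p1 + p2) / 2) >= (h(p1) + h(p2)) / 2
```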
§1.1.1 Discrete memoryless source and entropy
5. Conditional entropy
Definition: Let X, Y be a pair of discrete random variables with (X, Y) ~ p(x, y). The conditional entropy of X given Y is defined by
H(X|Y) = -Σ_x Σ_y p(x, y) log p(x|y) = Σ_y p(y) H(X|Y = y)
§1.1.1 Discrete memoryless source and entropy
5. Conditional entropy
Analysis: H(X|Y) is the average, over the observed value y, of the uncertainty H(X|Y = y) that remains about X after Y = y has been observed.
§1.1.1 Discrete memoryless source and entropy
5. Conditional entropy
• Example 1.1.7 Let pX(0) = 2/3, pX(1) = 1/3, and let Y be the output of an erasure channel: 0 → 0 with probability 3/4, 0 → ? with probability 1/4; 1 → 1 with probability 1/2, 1 → ? with probability 1/2.
H(X) = H(2/3, 1/3) = 0.9183 bit/sig
H(X|Y=0) = 0, H(X|Y=1) = 0
H(X|Y=?) = H(1/2, 1/2) = 1 bit/sig
H(X|Y) = (1/2)·0 + (1/6)·0 + (1/3)·1 = 1/3 bit/sig
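The numbers in Example 1.1.7 can be checked from the joint distribution p(x, y) = p(x) p(y|x); the sketch below (not part of the slides) recomputes H(X), each H(X|Y=y), and H(X|Y) = 1/3 bit/sig.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = {0: 2/3, 1: 1/3}
# Channel of Example 1.1.7: p(y|x)
p_y_given_x = {0: {'0': 3/4, '?': 1/4},
               1: {'1': 1/2, '?': 1/2}}

# Joint distribution p(x, y) and output marginal p(y)
p_xy = {(x, y): p_x[x] * q for x, row in p_y_given_x.items() for y, q in row.items()}
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

print("H(X) =", round(entropy(p_x.values()), 4))          # 0.9183 bit/sig

h_x_given_y = 0.0
for y, py in p_y.items():
    posterior = [p_xy[(x, y)] / py for x in p_x if (x, y) in p_xy]
    hy = entropy(posterior)
    h_x_given_y += py * hy
    print(f"H(X|Y={y}) =", round(hy, 4))                   # 0 for y=0, 1 for y=?, 0 for y=1
print("H(X|Y) =", round(h_x_given_y, 4))                   # 0.3333 bit/sig
```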
§1.1.1 Discrete memoryless source and entropy
5. Conditional entropy
Theorem 1.3 (conditioning reduces entropy) H(X|Y) ≤ H(X), with equality iff X and Y are independent.
Proof:
Review
Key words: measure of information, self-information, entropy, properties of entropy, conditional entropy
Homework
1. P44: T1.1
2. P44: T1.4
3. P44: T1.6
4. Let X be a random variable taking on a finite number of values. What is the relationship between H(X) and H(Y) if (1) Y = 2X? (2) Y = cos X?
Homework
6. Given a chessboard with 8×8 = 64 squares, a chessman is placed at random on one square and we must guess its location. Find the uncertainty of the result. If every square is identified by its row and column number and the row number of the chessman is already known, what is the remaining uncertainty?
Homework
• Thinking: Coin flip. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.
Hint: use Σ_{n≥1} r^n = r/(1-r) and Σ_{n≥1} n r^n = r/(1-r)^2 for |r| < 1.
§1 Entropy and mutual information
 §1.1 Discrete random variables
  §1.1.1 Discrete memoryless source and entropy
  §1.1.2 Discrete memoryless channel and mutual information
 §1.2 Discrete random vectors
§1.1.2 Discrete memoryless channel and mutual information
[Figure: X → Channel p(y|x) → Y]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• The model of a DMC: r input symbols {0, 1, …, r-1}, s output symbols {0, 1, …, s-1}, connected by the transition probabilities p(y|x).
[Figure: bipartite graph of inputs 0, …, r-1 and outputs 0, …, s-1 with edges labelled p(y|x).]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Representation of a DMC
• Graph: each input x is joined to each output y by an edge labelled with the transition probability p(y|x), where p(y|x) ≥ 0 for all x, y and Σ_y p(y|x) = 1 for all x.
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Matrix: the transition probability matrix P = [p(yj|xi)], an r×s matrix whose rows each sum to 1.
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Formula: the transition probabilities may also be given as an explicit formula for p(y|x), as in the examples below.
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Example 1.1.8: BSC (Binary Symmetric Channel), r = s = 2
p(0|0) = p(1|1) = 1-p, p(0|1) = p(1|0) = p
[Figure: 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p.]
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Example 1.1.9: BEC (Binary Erasure Channel) — some transmitted bits are received correctly, others are erased and delivered as "?".
§1.1.2 Discrete memoryless channel and mutual information
1. DMC (Discrete Memoryless Channel)
• Example 1.1.9: BEC (Binary Erasure Channel), r = 2, s = 3
p(0|0) = p, p(?|0) = 1-p; p(1|1) = q, p(?|1) = 1-q
[Figure: 0 → 0 with probability p, 0 → ? with probability 1-p; 1 → 1 with probability q, 1 → ? with probability 1-q.]
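In code, a DMC is fully described by its transition matrix. The sketch below (the helper names `bsc` and `bec` are illustrative, not from the textbook) builds the matrices for Examples 1.1.8 and 1.1.9 and checks the row-sum condition Σ_y p(y|x) = 1.

```python
def bsc(p: float):
    """Binary symmetric channel: rows = inputs 0, 1; columns = outputs 0, 1."""
    return [[1 - p, p],
            [p, 1 - p]]

def bec(p: float, q: float):
    """Binary erasure channel of Example 1.1.9: outputs ordered (0, 1, ?)."""
    return [[p,   0.0, 1 - p],
            [0.0, q,   1 - q]]

for matrix in (bsc(0.1), bec(0.9, 0.8)):
    for row in matrix:
        assert abs(sum(row) - 1.0) < 1e-12   # each row of p(y|x) must sum to 1
    print(matrix)
```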
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Definition: the reduction in uncertainty about X conveyed by the observation of Y; the information about X obtained from Y.
I(X;Y) = H(X) – H(X|Y)
where H(X) is the entropy (prior uncertainty about X), H(X|Y) is the equivocation (uncertainty remaining after Y is observed), and I(X;Y) is the average mutual information.
[Figure: X → channel p(y|x) → Y.]
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Definition
I(X;Y) = H(X) – H(X|Y) = Σ_x Σ_y p(x, y) log [p(x|y) / p(x)] = Σ_x Σ_y p(x, y) log [p(x, y) / (p(x) p(y))]
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• I(X;Y) and I(x;y): the mutual information of a pair of values is I(x;y) = log [p(x|y) / p(x)], and I(X;Y) = E_XY[I(x;y)].
• I(X;Y) and H(X): taking Y = X gives I(X;X) = H(X), so entropy can be viewed as the self-information of X.
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Properties
1) Non-negativity of average mutual information
Theorem 1.4 For any discrete random variables X and Y, I(X;Y) ≥ 0, with I(X;Y) = 0 iff X and Y are independent. (Theorem 1.3 in textbook)
We do not expect to be misled, on average, by observing the output of the channel.
Proof:
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Properties
[Figure: a cryptosystem — source S → encrypt (with key) → channel → decrypt → destination D, with a listener-in observing Y.] To the listener-in, a good cipher makes the intercepted ciphertext Y carry (ideally) no information about the message X: total loss. Example: Caesar cryptography, message "arrive at four", ciphertext "duulyh dw irxu".
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Properties
2) Symmetry: I(X;Y) = I(Y;X)
3) Relationship between entropy and average mutual information:
I(X;Y) = H(X) – H(X|Y)
I(X;Y) = H(Y) – H(Y|X)
I(X;Y) = H(X) + H(Y) – H(XY)
[Mnemonic Venn diagram: two overlapping circles H(X) and H(Y); the overlap is I(X;Y), the non-overlapping parts are H(X|Y) and H(Y|X), and the union is the joint entropy H(XY).]
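These three identities are easy to confirm numerically from any joint distribution; the sketch below (illustrative Python, not from the slides) does so for a BSC with crossover 0.1 and a non-uniform input, using the chain rule H(XY) = H(Y) + H(X|Y) to obtain the conditional entropies.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution for a BSC with crossover 0.1 and input P(X=0) = 0.7
p_x = [0.7, 0.3]
p_y_given_x = [[0.9, 0.1], [0.1, 0.9]]
p_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(2)] for x in range(2)]
p_y = [sum(p_xy[x][y] for x in range(2)) for y in range(2)]

h_x, h_y = entropy(p_x), entropy(p_y)
h_xy = entropy([p for row in p_xy for p in row])
h_x_given_y = h_xy - h_y          # chain rule: H(XY) = H(Y) + H(X|Y)
h_y_given_x = h_xy - h_x

i1 = h_x - h_x_given_y
i2 = h_y - h_y_given_x
i3 = h_x + h_y - h_xy
assert math.isclose(i1, i2) and math.isclose(i2, i3)
print("I(X;Y) =", round(i1, 4))
```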
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Properties — recognising special channels
[Figure: three special channels drawn as bipartite graphs — a noiseless one-to-one channel (a1 → b1, …, ar → br); a lossless channel (a1 → b1, b2 with probabilities 1/2, 1/2; a2 → b3, b4, b5 with probabilities 1/5, 2/5, 2/5), where each output identifies its input so H(X|Y) = 0 and I(X;Y) = H(X); and a deterministic channel (a1 → b1; a2, a3 → b2), where H(Y|X) = 0 and I(X;Y) = H(Y).]
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Properties
4) Convexity: I(X;Y) = f[P(x), P(y|x)] depends on both the input distribution and the channel.
Theorem 1.5 For a fixed channel P(y|x), I(X;Y) is a concave (∩-convex) function of the input probabilities P(x). (Theorem 1.6 in textbook)
Theorem 1.6 For a fixed input distribution P(x), I(X;Y) is a convex (∪-convex) function of the transition probabilities P(y|x). (Theorem 1.7 in textbook)
§1.1.2 Discrete memoryless channel and mutual information
2. Average mutual information
• Example 1.1.10 Analyse I(X;Y) of the BSC. Let the source be P(X=0) = w, P(X=1) = 1-w, and let the crossover probability be p.
[Figure: BSC with 0 → 0 and 1 → 1 with probability 1-p, and crossovers with probability p.]
H(Y|X) = H(p) and P(Y=0) = w(1-p) + (1-w)p, so
I(X;Y) = H(Y) – H(Y|X) = H(w(1-p) + (1-w)p) – H(p)
For a fixed channel p, I(X;Y) is largest at w = 1/2, where it equals 1 – H(p); for a fixed source w, I(X;Y) equals H(w) when p = 0 or p = 1 and drops to 0 when p = 1/2.
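A short sketch of this analysis (assuming the symbols w and p used in the derivation above; the function name `i_bsc` is illustrative) that evaluates I(X;Y) = H(w(1-p) + (1-w)p) - H(p) and shows the limiting behaviours:

```python
import math

def h(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def i_bsc(w: float, p: float) -> float:
    """I(X;Y) of a BSC with P(X=0) = w and crossover probability p."""
    return h(w * (1 - p) + (1 - w) * p) - h(p)

print(i_bsc(0.5, 0.1))   # 1 - H(0.1): uniform input maximises I(X;Y) for a fixed channel
print(i_bsc(0.3, 0.0))   # H(0.3): a noiseless channel passes all the source information
print(i_bsc(0.3, 0.5))   # 0: a totally noisy channel conveys nothing
```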
Review
Key words: the channel and its information measures — channel model, equivocation, average mutual information, mutual information, properties of average mutual information
§1.1.2 Discrete memoryless channel and mutual information
Thinking:
§1.1.2 Discrete memoryless channel and mutual information
• Example 1.1.11 Let the source have alphabet A = {0, 1} with p0 = p1 = 0.5. Let encoder C have alphabet B = {0, 1, …, 7}, and let the elements of B have the binary representation given by the register bits b0, b1, b2. The encoder is shown below. Find the entropy of the coded output, and find the output sequence if the input sequence is a(t) = {101001011000001100111011} and the initial contents of the registers are as given in the figure.
[Figure: a(t) feeding a chain of three D flip-flops whose outputs are b0, b1, b2.]
§1.1.2 Discrete memoryless channel and mutual information
a(t) = {101001011000001100111011}
b = {001242425124366675013666}
[Figure: state-transition diagram of the encoder, states Yt, Yt+1 ∈ {0, 1, …, 7}.]
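Since the encoder diagram did not survive extraction, the sketch below is only a generic model of Example 1.1.11: a three-stage shift register whose contents are read out as an octal symbol on every clock tick. The read-out order and the all-zero initial state are assumptions standing in for the original figure, so the printed sequence need not match the b sequence above exactly.

```python
def shift_register_encode(bits, initial=(0, 0, 0)):
    """Clock the input into a 3-stage shift register (b0 <- a(t)) and read the
    contents as an octal symbol 4*b2 + 2*b1 + b0 after each tick.
    Bit ordering and initial state are assumptions replacing the lost figure."""
    b0, b1, b2 = initial
    out = []
    for a in bits:
        b2, b1, b0 = b1, b0, a            # shift: the oldest bit falls off the end
        out.append(4 * b2 + 2 * b1 + b0)
    return out

a = [int(c) for c in "101001011000001100111011"]
print("".join(str(s) for s in shift_register_encode(a)))

# Under this model, with equiprobable independent input bits every 3-bit register
# state is equally likely, so each output symbol has entropy log2(8) = 3 bits,
# even though only 1 new bit of information enters the register per clock tick.
```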
Homework
1. P45: T1.10
2. P46: T1.19 (except c)
3. Let a DMS convey messages through a channel (source and transition probabilities as given in the original problem). Calculate:
(1) H(X) and H(Y);
(2) the mutual information I(xi; yj) (i, j = 1, 2);
(3) the equivocation H(X|Y) and the average mutual information I(X;Y).