
THE CONCEPTS OF ENTROPY, PROBABILITY AND INFORMATION THEORY

  1. THE CONCEPTS OF ENTROPY, PROBABILITY AND INFORMATION THEORY. By Prof. A. S. Kadi, P.G. Department of Statistics, Karnatak University, Dharwad.

  2. Bringing any scientific theory into practice leads to the most favourable results. Not only does practice benefit from this, but scientific investigation itself develops further, since practice reveals new subjects for investigation and new aspects of familiar subjects.

  3. One of the striking examples in this regard is the concept of ENTROPY in probability theory. This concept evolved about fifty years ago from the needs of practice. • In the beginning it was used to create a theoretical model for the transmission of information of various kinds. • Later its theoretical significance and properties, and the general nature of its applications to practice, were gradually realized in other areas.

  4. Entropy and Disorder • If you assert that nature tends to take things from order to disorder, that entropy is a measure of disorder, and that nature tends toward maximum entropy for any isolated system, then you already have some insight into the ideas of the second law of thermodynamics. • For a glass of water the number of molecules is astronomical. • A jumble of ice chips may look more disordered in comparison with a glass of water, which looks uniform and homogeneous.

  5. Entropy as Time's Arrow • The right-hand box of molecules in the figure below happened before the left-hand one.

  6. 3. ENTROPY OF FINITE SCHEMES: • In probability theory a complete system of events A_1, A_2, ..., A_n means a set of events with respective probabilities p_1, p_2, ..., p_n such that one and only one of them must occur at each trial (examples: a coin, a die). If we are given n mutually exclusive events A_1, ..., A_n together with their probabilities, then we have a finite scheme A, where

$$A = \begin{pmatrix} A_1 & A_2 & \cdots & A_n \\ p_1 & p_2 & \cdots & p_n \end{pmatrix}, \qquad p_k \ge 0, \quad \sum_{k=1}^{n} p_k = 1. \qquad (1)$$

  7. For example, in the case of a fair die,

$$A = \begin{pmatrix} A_1 & A_2 & A_3 & A_4 & A_5 & A_6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}.$$

Note that every finite scheme describes a state of uncertainty.

  8. It is obvious that the amount of uncertainty is different in different schemes. Compare, say, a two-event scheme with probabilities (0.5, 0.5) to one with probabilities (0.99, 0.01). • The first obviously represents much more uncertainty than the second. • In the second scheme the result A_1 is almost sure, whereas in the first case we refrain from making any prediction. • Again consider a scheme with intermediate probabilities, say (0.3, 0.7). • The amount of uncertainty in this case is intermediate between the preceding two.

  9. Considering the above situations, it seems desirable to introduce a quantity which reasonably measures the amount of uncertainty associated with a finite scheme. • In this direction Shannon (1948) proposed the function

$$H(p_1, p_2, \ldots, p_m) = -\sum_{k=1}^{m} p_k \log p_k.$$

  10. This function can serve as a very suitable measure of the uncertainty of a finite scheme. • Here logarithms are taken to an arbitrary but fixed base. • We adopt the convention p_k log p_k = 0 if p_k = 0. • The measure of uncertainty H(p_1, p_2, ..., p_m) is called the ENTROPY of the finite scheme.
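
As a quick illustration (an addition to the transcript, not part of the original slides), here is a minimal Python sketch of this entropy function; the name entropy and the base-2 logarithm are our own choices.

```python
import math

def entropy(probs, base=2):
    """Entropy H(p_1, ..., p_m) = -sum p_k log p_k of a finite scheme.

    The log base is arbitrary but fixed; terms with p_k = 0 are skipped,
    which implements the convention p_k log p_k = 0 when p_k = 0.
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# The illustrative two-event schemes from slide 8:
print(entropy([0.5, 0.5]))    # 1.0 bit: maximum uncertainty
print(entropy([0.99, 0.01]))  # ~0.08 bits: A1 is almost sure
print(entropy([0.3, 0.7]))    # ~0.88 bits: intermediate uncertainty
```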

  11. H(p_1, p_2, ..., p_m) = 0 if and only if one of the numbers p_1, p_2, ..., p_m is one and all the others are zero. This case corresponds to a scheme whose outcome can be predicted before the experiment with complete certainty. In all other cases the entropy is positive.

  12. The maximum entropy (uncertainty) of a scheme with m possible events arises only when all events are equally likely, i.e. p_i = 1/m for i = 1, 2, ..., m. • To see this, consider the inequality, valid for any continuous convex function φ(x),

$$\varphi\!\left(\frac{a_1 + a_2 + \cdots + a_m}{m}\right) \;\le\; \frac{1}{m}\sum_{i=1}^{m}\varphi(a_i),$$

where a_1, a_2, ..., a_m are positive numbers.

  13. Let a_i = p_i and φ(x) = x log x (a convex function), where Σ_{i=1}^{m} p_i = 1. Then we find

$$\varphi\!\left(\frac{1}{m}\right) = \frac{1}{m}\log\frac{1}{m} \;\le\; \frac{1}{m}\sum_{i=1}^{m} p_i \log p_i,$$

so that

$$H(p_1, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log p_i \;\le\; \log m = H\!\left(\tfrac{1}{m}, \ldots, \tfrac{1}{m}\right).$$

Hence the entropy (uncertainty) is maximal precisely when the events of the scheme are equally likely.
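
A brief numerical check of this maximum-entropy property (again our own sketch, with an assumed base-2 logarithm): for random distributions over m = 4 outcomes the entropy never exceeds log2 4 = 2 bits, which the uniform distribution attains.

```python
import math
import random

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

m = 4
print(entropy([1.0 / m] * m))              # 2.0 bits = log2(4): the maximum
for _ in range(5):
    w = [random.random() for _ in range(m)]
    p = [x / sum(w) for x in w]            # a random distribution over m outcomes
    assert entropy(p) <= math.log2(m) + 1e-12
    print(round(entropy(p), 4))            # always <= 2.0
```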

  14. Suppose we have two finite schemes

$$A = \begin{pmatrix} A_1 & \cdots & A_n \\ p_1 & \cdots & p_n \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 & \cdots & B_m \\ q_1 & \cdots & q_m \end{pmatrix}.$$

• Let both schemes be independent. • Let π_{ki} denote the probability of the joint occurrence of the events A_k and B_i, so that π_{ki} = p_k q_i. • Then the set of events A_k B_i (1 ≤ k ≤ n, 1 ≤ i ≤ m) with probabilities π_{ki} represents another finite scheme, denoted AB.

  15. Let H(A), H(B) and H(AB) denote the entropies of the schemes A, B and AB respectively. • Then one can show that H(AB) = H(A) + H(B). • Indeed, using Σ_k p_k = Σ_i q_i = 1,

$$H(AB) = -\sum_{k=1}^{n}\sum_{i=1}^{m} p_k q_i \log(p_k q_i) = -\sum_{k=1}^{n} p_k \log p_k \;-\; \sum_{i=1}^{m} q_i \log q_i = H(A) + H(B).$$
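
A small numerical sanity check of this additivity (our own sketch; the particular schemes are arbitrary):

```python
import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.2, 0.5, 0.3]                        # scheme A
q = [0.6, 0.4]                             # scheme B, independent of A
joint = [pk * qi for pk in p for qi in q]  # probabilities of the joint events A_k B_i

print(entropy(joint))                      # H(AB)
print(entropy(p) + entropy(q))             # H(A) + H(B): the same value
```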

  16. Now consider the case where the schemes A and B are (mutually) dependent. • Let q_{kl} be the probability that the event B_l of scheme B occurs, given that the event A_k of scheme A has occurred. • Then π_{kl} = p_k q_{kl}, for 1 ≤ k ≤ n, 1 ≤ l ≤ m. • Thus

$$H(AB) = -\sum_{k=1}^{n}\sum_{l=1}^{m} p_k q_{kl}\,\log(p_k q_{kl}) = -\sum_{k=1}^{n} p_k \log p_k \sum_{l=1}^{m} q_{kl} \;-\; \sum_{k=1}^{n} p_k \sum_{l=1}^{m} q_{kl}\log q_{kl}.$$

  17. Hence, since Σ_l q_{kl} = 1 for each k,

$$H(AB) = H(A) + \sum_{k=1}^{n} p_k\left(-\sum_{l=1}^{m} q_{kl}\log q_{kl}\right).$$

• The quantity

$$H_k(B) = -\sum_{l=1}^{m} q_{kl}\log q_{kl}$$

can be regarded as the conditional entropy of the scheme B given that the event A_k of scheme A has occurred, i.e.

$$H(AB) = H(A) + \sum_{k=1}^{n} p_k H_k(B) = H(A) + H_A(B).$$
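
The same kind of check for the dependent case (our own sketch; the numbers p_k and q_{kl} are arbitrary): H(AB) equals H(A) plus the average conditional entropy H_A(B).

```python
import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.4, 0.6]                 # scheme A: p_k
q_given = [[0.9, 0.1],         # q_{kl}: row k holds the probabilities of B_l given A_k
           [0.3, 0.7]]

joint = [pk * qkl for pk, row in zip(p, q_given) for qkl in row]   # pi_{kl} = p_k * q_{kl}

H_A   = entropy(p)
H_AB  = entropy(joint)
H_A_B = sum(pk * entropy(row) for pk, row in zip(p, q_given))      # H_A(B) = sum_k p_k H_k(B)

print(H_AB)                    # H(AB)
print(H_A + H_A_B)             # H(A) + H_A(B): matches H(AB)
```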

  18. MEASURE OF INFORMATION AND ENTROPY • Information theory is originally concerned with the analysis of an entity called a "communication system", traditionally represented by the block diagram (Fig. 1):

Source of message → Encoder → Channel → Decoder → Destination
(with Noise acting on the Channel)

  19. The source of the message may be a person or a machine that produces the information to be communicated. • The encoder associates with each message an object that is suitable for transmission through the channel. • That object may be a sequence of binary digits, as in digital computer applications, or a continuous waveform, as in radio communication. • The channel is the medium over which the coded messages are transmitted.

  20. The decoder, operating at the output end, is a device which attempts to extract the original message for delivery to the destination. • Generally the message transmitted from the input end is not received exactly as sent at the output end, because of the effect of noise in the channel.
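
To make the block diagram concrete, here is a toy end-to-end sketch (our own illustration, not from the slides): a binary source, a simple repetition-code encoder, a channel whose noise flips each bit with some probability, and a majority-vote decoder.

```python
import random

def encode(bits, n=3):
    """Repetition encoder: send each source bit n times."""
    return [b for bit in bits for b in [bit] * n]

def channel(bits, flip_prob=0.1):
    """Binary symmetric channel: noise flips each transmitted bit with probability flip_prob."""
    return [bit ^ 1 if random.random() < flip_prob else bit for bit in bits]

def decode(bits, n=3):
    """Majority-vote decoder: recover each source bit from its n noisy copies."""
    return [1 if sum(bits[i:i + n]) > n // 2 else 0 for i in range(0, len(bits), n)]

message  = [random.randint(0, 1) for _ in range(20)]   # source of message
received = decode(channel(encode(message)))            # encoder -> channel -> decoder
print(sum(m != r for m, r in zip(message, received)), "residual bit errors")
```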

  21. Information theory is an attempt to construct a probabilistic model for each of the blocks shown in Fig. 1. • Information theory was developed independently by Shannon (1948) and Wiener (1948). In Shannon's formulation it is essentially the study of one theorem, the "fundamental theorem of information theory", which states that • "It is possible to transmit information through a noisy channel at any rate less than the channel capacity with arbitrarily small probability of error." • With this in mind, we now consider the construction of a probabilistic measure of the information conveyed by a message through a channel. • As a first step, suppose that a random variable X takes the values 1, 2, 3, 4 and 5, each with probability 1/5.

  22. Let X be a discrete random variable taking the values 1, 2, 3, 4, 5 with p(x) = 1/5. We want to know how much information about X is conveyed by the message that 1 ≤ X ≤ 2. • With only the original information we would have to guess the value of X, being correct with probability 1/5. • After the message that 1 ≤ X ≤ 2 we know that X is either 1 or 2, which increases the probability of guessing the correct value.

  23. In other words, there is less uncertainty about the value of X after the second statement. • Thus the statement 1 ≤ X ≤ 2 has reduced the uncertainty about the actual value of X, and it is this reduction in uncertainty that lets us measure precisely the information transmitted. • Hence we now attempt to design a measure of uncertainty based on certain axioms.
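
Anticipating the entropy-based definitions below, a short worked calculation (added here for concreteness, using base-2 logarithms): before the message the uncertainty about X is log 5; afterwards X is uniform on {1, 2}, so the uncertainty is log 2, and the information conveyed is the difference,

$$I = \log_2 5 - \log_2 2 = \log_2 \tfrac{5}{2} \approx 1.32 \text{ bits}.$$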

  24. AXIOMS OF THE UNCERTAINTY MEASURE • Let X be a random variable taking a finite number of possible values x_1, x_2, ..., x_m with respective probabilities p_1, p_2, ..., p_m. For each m we define a function H_m of the m variables p_1, p_2, ..., p_m (with the restrictions p_i > 0 and Σ_{i=1}^{m} p_i = 1). The function H_m(p_1, p_2, ..., p_m) is to be interpreted as the average uncertainty associated with the events {X = x_i}; it is denoted simply H(p_1, ..., p_m), H(X) or H.

  25. CONDITIONS ON H(X): • We now impose certain requirements on the function H. • Let X take the values x_1, x_2, ..., x_M, all equally likely. • Denote by f(M) the average amount of uncertainty associated with these outcomes, so that f(M) = H(1/M, 1/M, ..., 1/M). • For example: • f(2) is the uncertainty associated with the toss of an unbiased coin; • f(2 × 10^8) would be the uncertainty associated with selecting an Indian at random from the Indian population. • Note that the uncertainty in the latter situation is much larger than in the former.

  26. Thus our first axiom on H is: • (1) f(M) should be a monotonically increasing function of M, i.e. M < M′ implies f(M) < f(M′) for M, M′ = 1, 2, 3, ... This is Axiom 1. • (2) Next consider an experiment involving two independent random variables X and Y, where X takes the values x_1, x_2, ..., x_M and Y takes the values y_1, y_2, ..., y_L, each with equal probabilities. • The joint experiment involving X and Y has ML equally likely outcomes, and in this case the uncertainty measure should satisfy f(ML) = f(M) + f(L) for M, L = 1, 2, 3, ..., i.e. H(XY) = H(X) + H(Y).
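
For orientation (an added remark, not in the original slides): any function of the form f(M) = C log M with C > 0 is consistent with both of these axioms, since

$$C\log M < C\log M' \ \text{ for } M < M', \qquad C\log(ML) = C\log M + C\log L.$$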

  27. (3) Now we remove the restriction of equally likely outcomes and turn to the general case. Divide the values of the random variable X into two groups A and B such that • A: x_1, x_2, ..., x_r with probabilities p_1, p_2, ..., p_r • B: x_{r+1}, x_{r+2}, ..., x_m with probabilities p_{r+1}, p_{r+2}, ..., p_m • P(A) = p_1 + p_2 + ... + p_r • P(B) = p_{r+1} + p_{r+2} + ... + p_m. • Then p(x_i | A) = p_i / (p_1 + p_2 + ... + p_r) for i = 1, 2, ..., r.

  28. Similarly, p(x_i | B) = p_i / (p_{r+1} + p_{r+2} + ... + p_m) for i = r+1, r+2, ..., m. • Let Y be the compound experiment: first one of the groups A or B is selected, with probabilities P(A) and P(B), and then a value x_i is chosen within the selected group according to the conditional probabilities above.

  29. One can then show that P(Y = x_i) = p_i for i = 1, 2, ..., m, so that Y and X have the same distribution. • Before the compound experiment, the average uncertainty about X is H(p_1, p_2, ..., p_m). • If we reveal which of the two groups A and B has been selected, then we remove on average an amount of uncertainty H(p_1 + p_2 + ... + p_r, p_{r+1} + p_{r+2} + ... + p_m).

  30. If group A is chosen (which happens with probability p_1 + ... + p_r), the remaining uncertainty is

$$H\!\left(\frac{p_1}{p_1+\cdots+p_r}, \ldots, \frac{p_r}{p_1+\cdots+p_r}\right);$$

if group B is chosen (with probability p_{r+1} + ... + p_m), the remaining uncertainty is

$$H\!\left(\frac{p_{r+1}}{p_{r+1}+\cdots+p_m}, \ldots, \frac{p_m}{p_{r+1}+\cdots+p_m}\right).$$

  31. Thus, on average, the uncertainty remaining after the group is specified is

$$(p_1+\cdots+p_r)\,H\!\left(\frac{p_1}{p_1+\cdots+p_r},\ldots,\frac{p_r}{p_1+\cdots+p_r}\right) \;+\; (p_{r+1}+\cdots+p_m)\,H\!\left(\frac{p_{r+1}}{p_{r+1}+\cdots+p_m},\ldots,\frac{p_m}{p_{r+1}+\cdots+p_m}\right).$$

• Hence the third requirement on the measure H is the grouping property:

$$H(p_1,\ldots,p_m) = H(p_1+\cdots+p_r,\; p_{r+1}+\cdots+p_m) \;+\; (p_1+\cdots+p_r)\,H\!\left(\frac{p_1}{p_1+\cdots+p_r},\ldots,\frac{p_r}{p_1+\cdots+p_r}\right) \;+\; (p_{r+1}+\cdots+p_m)\,H\!\left(\frac{p_{r+1}}{p_{r+1}+\cdots+p_m},\ldots,\frac{p_m}{p_{r+1}+\cdots+p_m}\right).$$

  32. Finally, the fourth requirement is that H(p, 1 − p) is a continuous function of p.
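
A quick numerical check of this grouping property (our own sketch; the distribution and the split into groups are arbitrary):

```python
import math

def H(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.1, 0.2, 0.3, 0.25, 0.15]
r = 2                                       # group A = first r values, group B = the rest
PA, PB = sum(p[:r]), sum(p[r:])

lhs = H(p)
rhs = (H([PA, PB])
       + PA * H([pi / PA for pi in p[:r]])
       + PB * H([pi / PB for pi in p[r:]]))
print(lhs, rhs)                             # the two sides agree
```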

  33. PROPERTIES OF THE UNCERTAINTY FUNCTION: • Lemma 1: Let p_1, p_2, ..., p_m and q_1, q_2, ..., q_m be arbitrary positive numbers with Σ_i p_i = Σ_i q_i = 1. Then

$$-\sum_{i=1}^{m} p_i \log p_i \;\le\; -\sum_{i=1}^{m} p_i \log q_i,$$

with equality if and only if p_i = q_i for all i. • Theorem 1: H(p_1, p_2, ..., p_m) ≤ log m, with equality iff p_i = 1/m for all i. This theorem emphasizes that when all the alternatives are equally likely, the uncertainty attains its maximum value.
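
Theorem 1 follows from Lemma 1 by taking q_i = 1/m for all i (a one-line step added here for completeness):

$$H(p_1,\ldots,p_m) = -\sum_{i=1}^{m} p_i \log p_i \;\le\; -\sum_{i=1}^{m} p_i \log \tfrac{1}{m} = \log m.$$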

  34. Theorem 2: H(X, Y) ≤ H(X) + H(Y), with equality iff X and Y are independent. • Corollary: H(X_1, X_2, ..., X_n) ≤ H(X_1) + H(X_2) + ... + H(X_n). • Theorem 3: H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y). • When the value of one of the two random variables X and Y is observed, say X is revealed, the remaining uncertainty about Y is H(Y|X). • Theorem 4: H(Y|X) ≤ H(Y), with equality iff X and Y are independent.

  35. RELATIONSHIP BETWEEN ENTROPY AND THE MEASURE OF INFORMATION: • Information is a message that was previously uncertain to the receiver; an already known result is certainly not information. Thus the key element in the study of information theory is measuring the amount of uncertainty contained in a piece of information, and we take the measure of information to be a reduction in uncertainty. • For illustration, consider choosing between two coins: one is unbiased and the other is two-headed. • Let X be a random variable taking the two values 0 and 1. • X = 0 denotes that the chosen coin is unbiased.

  36. X = 1 denotes that the chosen coin is two-headed. The chosen coin is tossed twice: if the number of heads is less than 2, the chosen coin must be the unbiased one; otherwise it may be the two-headed coin. • Let H(X) = initial uncertainty about the coin, • H(X|Y) = uncertainty about the coin after observing the number of heads Y in two tosses of the chosen coin. • Therefore the amount of information conveyed about X by Y is I(X|Y) = H(X) − H(X|Y).

  37. Thus I(X|Y) = E[U(X|Y)], • where, when X = x_i and Y = y_j, • U(X|Y) = − log [ p(x_i) / p(x_i | y_j) ].
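
To put numbers on the coin example, here is a short Python sketch (our addition; it assumes, which the slides do not state, that each coin is chosen with probability 1/2). It computes H(X), H(X|Y) and the information I = H(X) − H(X|Y), where Y is the number of heads in two tosses of the chosen coin.

```python
import math

def H(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Prior on the coin (assumed here: each coin equally likely to be chosen).
p_x = {0: 0.5, 1: 0.5}                           # 0 = unbiased coin, 1 = two-headed coin

# Y = number of heads in two tosses of the chosen coin.
p_y_given_x = {0: {0: 0.25, 1: 0.5, 2: 0.25},    # unbiased coin: Binomial(2, 1/2)
               1: {0: 0.0,  1: 0.0, 2: 1.0}}     # two-headed coin: always 2 heads

p_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in (0, 1, 2)}
p_x_given_y = {y: [p_x[x] * p_y_given_x[x][y] / p_y[y] for x in p_x]
               for y in p_y if p_y[y] > 0}

H_X = H(list(p_x.values()))                                  # initial uncertainty about the coin
H_X_given_Y = sum(p_y[y] * H(p_x_given_y[y]) for y in p_x_given_y)
print(H_X, H_X_given_Y, H_X - H_X_given_Y)                   # ~1.0, ~0.451, ~0.549 bits
```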

  38. REFERENCES: • Khinchin, A. I. (1957): Mathematical Foundations of Information Theory, Dover Publications, Inc., New York. • Ash, Robert (1965): Information Theory, Interscience Publishers, New York. • Gray, Robert M. (1990): Entropy and Information Theory, Springer-Verlag, New York.
