Information and Entropy
Shannon information entropy on discrete variables
• Consider W discrete events with probabilities p_i such that ∑_{i=1}^W p_i = 1.
• Shannon's(1) measure of the amount of choice carried by the p_i is H = −k ∑_{i=1}^W p_i log p_i, where k is a positive constant.
• If p_i = 1/W and k is Boltzmann's constant, then H = −k · W · (1/W) log(1/W) = k log W, which is the entropy of a system with W microscopic configurations.
• Hence (using k = 1), H = −∑_{i=1}^W p_i log p_i is Shannon's information entropy.
• Example (natural logarithm): for p_i = (1/2, 1/4, 1/8, 1/8), H = 1.21; for the uniform distribution p_i = (1/4, 1/4, 1/4, 1/4), H = 1.39 (see the sketch below).
• Schneider(2) notes that H is a measure of entropy/disorder/uncertainty. It is a measure of information in Shannon's sense only if it is interpreted as the information gained by completely removing that uncertainty (i.e., a noiseless channel).
• Second law of thermodynamics: the entropy of a system increases until it reaches equilibrium within the constraints imposed on it.
(1) C. E. Shannon. A mathematical theory of communication. Bell Sys. Tech. J., 1948.
(2) T. D. Schneider. Information Theory Primer, last updated Jan 6, 2003.
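As a quick check of the example above, here is a minimal Python sketch (not part of the original slides) that evaluates H = −k ∑ p_i log p_i with the natural logarithm; the function name shannon_entropy is an illustrative choice, not the slides' notation.

```python
import numpy as np

def shannon_entropy(p, k=1.0):
    """Shannon entropy H = -k * sum(p_i * log(p_i)), using the natural log."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log(0) = 0
    return -k * np.sum(p * np.log(p))

# The two distributions from the slide's example
print(shannon_entropy([1/2, 1/4, 1/8, 1/8]))   # ~1.21
print(shannon_entropy([1/4, 1/4, 1/4, 1/4]))   # ~1.39 (= log 4, the maximum for W = 4)
```

The uniform case reproduces the k log W result of the third bullet, since H = log 4 ≈ 1.39.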
Information entropy on continuous variables
• Information about a random variable x_map taking continuous values arises from the exclusion of its possible alternatives (realizations).
• Hence a measure of information for the continuous-valued x_map is Info(x_map) = −log f(x_map).
• The expected information is then H(x_map) = −∫ dχ_map f(χ_map) log f(χ_map).
• By noting the similarity with H = −∑_{i=1}^W p_i log p_i for discrete variables, we see that H(x_map) = −∫ dχ_map f(χ_map) log f(χ_map) is the Shannon information entropy associated with the PDF f(χ_map) of the continuous variable x_map (a numerical check follows below).
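The integral H(x_map) = −∫ dχ_map f(χ_map) log f(χ_map) can be approximated by quadrature. The sketch below is only an illustration under assumptions not in the slides: a standard Gaussian PDF stands in for f(χ_map), and the helper differential_entropy is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def differential_entropy(pdf, lower, upper):
    """H(x) = -integral of f(x) * log f(x) dx, via numerical quadrature."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    H, _ = quad(integrand, lower, upper)
    return H

# Standard Gaussian PDF used as a stand-in for f(chi_map)
H_gauss = differential_entropy(norm(loc=0, scale=1).pdf, -10, 10)
print(H_gauss)                        # ~1.42
print(0.5 * np.log(2 * np.pi * np.e)) # closed-form value for comparison
```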
Maximizing entropy given knowledge constraints
• Example 1: Given the knowledge that "two blue toys are in the corner of a room", consider two toy arrangements (a) and (b). Out of these two arrangements, arrangement (a) maximizes entropy given the knowledge constraint; hence, given our knowledge, it is the most likely toy arrangement (would kids produce (b)?).
• Example 2: Given the knowledge that "the PDF has mean m = 0 and variance σ² = 1", consider a uniform PDF and a Gaussian PDF satisfying these constraints. Out of these two PDFs, the Gaussian maximizes the information entropy given the knowledge constraint that m = 0 and σ² = 1 (uniform: σ² = 1, H = 1.24; Gaussian: σ² = 1, H = 1.42; see the sketch below).
• Hence, the prior stage of BME aims at informativeness by using all, but no more, general knowledge than is available, i.e. we seek to maximize the information entropy given constraints expressing the general knowledge.
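A short sketch confirming the two H values quoted in Example 2, using the standard closed-form differential entropies of a uniform and a Gaussian PDF with unit variance (natural logarithms assumed, matching the slide's figures).

```python
import numpy as np

# Uniform PDF of width w has variance w^2 / 12, so w = sqrt(12) gives variance 1,
# and its differential entropy is H = log(w).
H_uniform = np.log(np.sqrt(12.0))            # ~1.24

# Gaussian PDF with variance sigma^2 has H = 0.5 * log(2 * pi * e * sigma^2).
H_gauss = 0.5 * np.log(2 * np.pi * np.e)     # ~1.42

print(H_uniform, H_gauss)  # the Gaussian attains the larger entropy under the same constraints
```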