COT 5611 Operating Systems Design Principles, Spring 2014. Dan C. Marinescu. Office: HEC 304. Office hours: M-Wd 3:30 – 5:30 PM. Lecture 16. Reading assignment: Chapter 8 from the on-line text; Claude Shannon’s paper. Today: Information Theory
Lecture 16
• Reading assignment:
  • Chapter 8 from the on-line text
  • Claude Shannon’s paper
Today
• Information Theory
  • Information theory - a statistical theory of communication
  • Random variables, probability density functions (PDF), cumulative distribution functions (CDF)
  • Thermodynamic entropy
  • Shannon entropy
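Continuous and discrete random variables, pdf, cdf
Given a discrete random variable $X$ which takes the values $x_i$ with probability $p_i$, $1 \le i \le n$:
• the expected value is $E[X] = \sum_{i=1}^{n} x_i p_i$
• the variance is $\mathrm{Var}[X] = \sum_{i=1}^{n} (x_i - E[X])^2 p_i$
Given a continuous random variable $X$ which takes values $x$ in some interval $I$ with probability density $f_X(x)$:
• the probability density function is $f_X(x)$, with $f_X(x) \ge 0$ and $\int_I f_X(x)\,dx = 1$
• the cumulative distribution function is $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$
• the expected value is $E[X] = \int_I x\, f_X(x)\,dx$
• the variance is $\mathrm{Var}[X] = \int_I (x - E[X])^2 f_X(x)\,dx$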
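The discrete definitions above, $E[X] = \sum_i x_i p_i$ and $\mathrm{Var}[X] = \sum_i (x_i - E[X])^2 p_i$, can be sketched in a few lines. The function names and the fair-die example are illustrative assumptions, not part of the lecture.

```python
def expected_value(values, probs):
    """E[X] = sum of x_i * p_i over all outcomes."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var[X] = sum of p_i * (x_i - E[X])^2 over all outcomes."""
    mean = expected_value(values, probs)
    return sum(p * (x - mean) ** 2 for x, p in zip(values, probs))


# Illustrative example (an assumption): a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6

die_mean = expected_value(die_values, die_probs)
die_var = variance(die_values, die_probs)
print(die_mean, die_var)
```

For the fair die the mean is 7/2 and the variance is 35/12, which is a quick sanity check on both functions.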
Joint and conditional probability density functions
• Discrete random variables $X$ and $Y$:
  • Joint probability density function $p_{XY}(x,y)$: $p_{XY}(x_i, y_j)$ is the probability that $X = x_i$ and, at the same time, $Y = y_j$
  • Conditional probability density function of $X$ given $Y$, $p_{X|Y}(x|y)$: $p_{X|Y}(x_i | y_j)$ is the probability that $X = x_i$ when $Y = y_j$
• Continuous random variables $X$ and $Y$:
  • Joint probability density function $p_{XY}(x,y)$: the probability density that $X = x$ and, at the same time, $Y = y$
  • Conditional probability density function of $X$ given $Y$, $p_{X|Y}(x|y)$: the probability density that $X = x$ when $Y = y$
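For the discrete case, the conditional distribution follows from the joint one through the standard relation $p_{X|Y}(x_i | y_j) = p_{XY}(x_i, y_j) / p_Y(y_j)$, where $p_Y(y_j) = \sum_i p_{XY}(x_i, y_j)$. The sketch below uses a made-up joint table; the table values and function names are assumptions for illustration only.

```python
# Hypothetical joint distribution p_XY(x, y) for X, Y in {0, 1}.
# The four probabilities are invented for the example and sum to 1.
joint = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.2,
    (1, 1): 0.3,
}


def marginal_y(joint, y):
    """p_Y(y): sum the joint probabilities over all values of X."""
    return sum(p for (_, yj), p in joint.items() if yj == y)


def conditional_x_given_y(joint, x, y):
    """p_X|Y(x | y) = p_XY(x, y) / p_Y(y); undefined when p_Y(y) is 0."""
    return joint.get((x, y), 0.0) / marginal_y(joint, y)


p_x1_given_y1 = conditional_x_given_y(joint, 1, 1)
print(p_x1_given_y1)
```

For a fixed value of $Y$, the conditional probabilities over all values of $X$ sum to 1, which the example table satisfies.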
Normal distribution
• probability density function
• cumulative distribution function
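As a sketch, assuming the standard parameterization with mean `mu` and standard deviation `sigma` (names that do not appear in the slides), the normal density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$ and the cumulative distribution function can be written with the error function as $F(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. Both can be evaluated with the Python standard library:

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, expressed via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


print(normal_pdf(0.0), normal_cdf(0.0))
```

By symmetry the CDF at the mean is 1/2, and the density peaks at the mean with value $1/(\sigma\sqrt{2\pi})$.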
Exponential distribution
• probability density function
• cumulative distribution function
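As a sketch, assuming the usual rate parameterization with rate `lam` > 0 (a name not used in the slides), the exponential density is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ and the cumulative distribution function is $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$; both are 0 for $x < 0$.

```python
import math


def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


rate = 2.0                      # illustrative rate, chosen arbitrarily
median = math.log(2.0) / rate   # the point where the CDF reaches 1/2
print(exponential_pdf(0.0, rate), exponential_cdf(median, rate))
```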
Information theory
• The statistical theory of communication, introduced by Claude Shannon in 1948, answers fundamental questions:
  • How much information can be generated by an agent acting as a source of information?
  • How much information can be squeezed through a channel?
• Communication model:
  • sender – communication channel – receiver
  • the sender and the receiver share a common alphabet
• Entropy: the amount of uncertainty in a system. An important concept for any physical system. Do not forget that information is physical!
• Thermodynamic entropy: related to the number of microstates of the system, $S = k_B \ln \Omega$, where $k_B$ is the Boltzmann constant and $\Omega$ is the number of microstates.
• Shannon entropy: measures the quantity of information necessary to remove the uncertainty. Used to measure the quantity of information a source can produce: $H(X) = -\sum_{x} p_X(x) \log_2 p_X(x)$
• The two are related: Shannon entropy represents the number of bits required to label the individual microstates, as we can see on the next slide.
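The relation between the two entropies can be made concrete with a short worked step. It assumes that all $\Omega$ microstates are equally likely, an assumption added here for illustration and not stated on the slide.

```latex
% Assumption: the \Omega microstates are equiprobable, p_i = 1/\Omega.
\[
  H = -\sum_{i=1}^{\Omega} \frac{1}{\Omega} \log_2 \frac{1}{\Omega}
    = \log_2 \Omega \quad \text{bits},
  \qquad
  S = k_B \ln \Omega = (k_B \ln 2)\, H .
\]
```

For example, a system with $\Omega = 8$ equally likely microstates needs $\log_2 8 = 3$ bits to label each microstate.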
Shannon’s entropy
Consider an event which happens with probability $p$; we wish to quantify the information content of a message communicating the occurrence of this event. The measure should reflect the “surprise” brought by the occurrence of this event.
An initial guess for a measure of this surprise would be $1/p$: the lower the probability of the event, the larger the surprise. But the surprise should be additive. If an event is composed of two independent events which occur with probabilities $q$ and $r$, then the probability of the event is $p = qr$, but we see that in general $\frac{1}{p} \ne \frac{1}{q} + \frac{1}{r}$.
If the surprise is measured by the logarithm of $1/p$, then additivity is obeyed: $\log_2 \frac{1}{p} = \log_2 \frac{1}{q} + \log_2 \frac{1}{r}$.
The entropy of a random variable $X$ with a probability density function $p_X(x)$ is $H(X) = -\sum_{x} p_X(x) \log_2 p_X(x)$.
Example: if $X$ is a binary random variable, $x \in \{0, 1\}$, and $p = p_X(x = 1)$, then $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$.
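The additivity argument and the binary entropy formula can be checked numerically. This is a minimal sketch that assumes base-2 logarithms, so results are in bits; the function names and the probabilities 1/4 and 1/2 are illustrative choices.

```python
import math


def surprise(p):
    """Surprise of an event with probability p, measured as log2(1/p) bits."""
    return math.log2(1.0 / p)


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), taking 0 * log2(0) as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


q, r = 0.25, 0.5  # two independent events (illustrative values)
print(1 / (q * r), 1 / q + 1 / r)                  # 1/p is not additive
print(surprise(q * r), surprise(q) + surprise(r))  # log2(1/p) is additive
print(binary_entropy(0.5))                         # a fair bit carries 1 bit
```

The binary entropy is largest at $p = 1/2$ and falls to 0 as $p$ approaches 0 or 1, where the outcome is certain.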
Example
• Eight cars compete in several Formula 1 races. The probabilities of winning, calculated from the past race history of the eight cars, are: $p_1 = 1/2$, $p_2 = 1/4$, $p_3 = 1/8$, $p_4 = 1/16$, $p_5 = p_6 = p_7 = p_8 = 1/64$
• To send a binary message revealing the winner of a particular race we could encode the identity of the winning car in several ways.
• For example, we can use an “obvious” encoding scheme: the identities of the eight cars could be encoded using three bits, the binary representation of the integers 0-7, namely 000, 001, 010, 011, 100, 101, 110, 111, respectively. In this case the average length of the string used to communicate the winner of any race is $l = 3$ bits.
• The cars have different probabilities of winning a race, and it makes sense to assign a shorter string to a car which has a higher probability of winning. Thus, a better encoding of the identities is: 0, 10, 110, 1110, 111100, 111101, 111110, 111111. In this case the lengths of the strings encoding the identity of each car are 1, 2, 3, 4, 6, 6, 6, 6, for an average of $l = 2$ bits. Note that we have computed the average as $l = \sum_i p_i l_i$.
• This encoding is optimal: its average length equals the Shannon entropy of the distribution, which is 2 bits.
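The numbers in this example can be reproduced directly. The sketch below computes the average code length $l = \sum_i p_i l_i$ for both encodings and the Shannon entropy of the winning probabilities, and checks that no variable-length codeword is a prefix of another; the function names are illustrative assumptions.

```python
import math

# Winning probabilities and the two encodings given in the example.
probs = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]
fixed_codes = ["000", "001", "010", "011", "100", "101", "110", "111"]
variable_codes = ["0", "10", "110", "1110",
                  "111100", "111101", "111110", "111111"]


def entropy_bits(probs):
    """Shannon entropy H = -sum p_i log2 p_i, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs, codes):
    """Average codeword length l = sum p_i * l_i."""
    return sum(p * len(code) for p, code in zip(probs, codes))


def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


print(average_length(probs, fixed_codes))     # fixed-length encoding
print(average_length(probs, variable_codes))  # variable-length encoding
print(entropy_bits(probs))                    # entropy of the distribution
print(is_prefix_free(variable_codes))
```

Because every probability here is a power of 1/2, each codeword length equals $\log_2 (1/p_i)$ exactly, which is why the average length can match the entropy.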