Today: Entropy & Information Theory
Information Theory: Claude Shannon, Ph.D. (1916–2001)
Entropy: a measure of the disorder in a system
Entropy: the (average) number of yes/no questions needed to completely specify the state of a system
number of states = 2^(number of yes/no questions): 2 states, 1 question; 4 states, 2 questions; 8 states, 3 questions; 16 states, 4 questions
number of states = 2^(number of yes/no questions), so log2(number of states) = number of yes/no questions
H, Shannon's entropy, is the number of yes/no questions required to specify the state of the system: H = log2(n), where n is the number of states of the system, assumed (for now) to be equally likely
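A minimal Python sketch of this relationship, not from the lecture (the helper name `questions_needed` is purely illustrative): log2 of the number of equally likely states gives the number of yes/no questions.

```python
import math

def questions_needed(n_states: int) -> float:
    # Entropy of a system with n equally likely states, in bits:
    # the number of yes/no questions needed to pin down the state.
    return math.log2(n_states)

# Each doubling of the number of states costs one more yes/no question.
for n in (2, 4, 8, 16):
    print(n, "states ->", questions_needed(n), "questions")
```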
The Six-Sided Die: H = log2(6) = 2.585 bits
The Four-Sided Die: H = log2(4) = 2.000 bits
The Twenty-Sided Die: H = log2(20) = 4.322 bits
What about all three dice? H = log2(4 × 6 × 20) = log2(480)
What about all three dice? H = log2(4)+log2(6)+log2(20)
What about all three dice? H = 8.907 bits
What about all three dice? Entropy from independent elements of a system adds
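As a quick check on this additivity claim, here is a small Python sketch (not from the slides) comparing the joint entropy of the three independent dice with the sum of their individual entropies:

```python
import math

# Individual entropies of the three independent dice, in bits.
h4, h6, h20 = math.log2(4), math.log2(6), math.log2(20)

# Joint entropy of the combined system: one state per (d4, d6, d20) outcome.
h_joint = math.log2(4 * 6 * 20)

print(round(h4 + h6 + h20, 3))  # 8.907
print(round(h_joint, 3))        # 8.907 -- same number: entropy adds
```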
Trivial Fact 1: log2(x) = -log2(1/x). Let's rewrite this a bit...
Trivial Fact 2: if there are n equally likely possibilities, p = 1/n. Trivial Fact 1: log2(x) = -log2(1/x)
Trivial Fact 2: if there are n equally likely possibilities, p = 1/n, so H = log2(n) = -log2(1/n) = -log2(p)
What if the n states are not equally probable? Maybe we should use the expected value of the entropies, a weighted average by probability: H = -Σ pi log2(pi), summed over the states i
Let's do a simple example: n = 2. How does H change as we vary p1 and p2?
n = 2: p1 + p2 = 1
How about n = 3? p1 + p2 + p3 = 1
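A short Python sketch of this weighted-average entropy (the `entropy` helper and the particular probabilities below are assumptions chosen only for illustration), sweeping p1 for n = 2 and trying two n = 3 distributions:

```python
import math

def entropy(probs):
    # Shannon entropy H = -sum(p * log2(p)) in bits; terms with p = 0 contribute 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# n = 2: H is largest at the flat distribution p1 = p2 = 1/2 and zero at the extremes.
for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p1 = {p1:.2f}  H = {entropy([p1, 1 - p1]):.3f} bits")

# n = 3: again the flat distribution has the most entropy.
print(round(entropy([1/3, 1/3, 1/3]), 3))  # log2(3) ~ 1.585
print(round(entropy([0.8, 0.1, 0.1]), 3))  # ~ 0.922, more peaked so lower
```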
The bottom-line intuitions for Entropy:
• Entropy is a statistic for describing a probability distribution.
• Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.
• Probability distributions which are peaked, sharp, narrow, compact, etc. have LOW entropy.
• Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.
• Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
Pop Quiz: [figure: four panels, numbered 1–4]
Entropy: the (average) number of yes/no questions needed to completely specify the state of a system
At 11:16 am (Pacific) on June 29th, 2001, there were approximately 816,119 words in the English language. H(English) = 19.6 bits. Twenty Questions: 2^20 = 1,048,576. What's a winning 20 Questions strategy?
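A winning strategy makes each question split the remaining candidates roughly in half. The Python sketch below (the word count is taken from the slide; the halving loop is an illustration, not the lecture's code) shows that 20 such splits are enough:

```python
import math

n_words = 816_119
print(round(math.log2(n_words), 1))   # ~ 19.6 bits of uncertainty about the word

# A winning strategy: each yes/no question discards about half the candidates.
remaining, questions = n_words, 0
while remaining > 1:
    remaining = math.ceil(remaining / 2)
    questions += 1
print(questions)   # 20 -- fits the budget, since 2**20 = 1,048,576 > 816,119
```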
So, what is information? It’s a change in what you don’t know. It’s a change in the entropy.
[Figure: two bar plots over {heads, tails}: P(Y), uniform at 1/2, and P(Y|x=heads), also uniform at 1/2] H(Y) = 1, H(Y|x=heads) = 1, so I(X;Y) = H(Y) - H(Y|X) = 0 bits
[Figure: two bar plots over {heads, tails}: P(Y), uniform at 1/2, and P(Y|x=heads), sharply peaked] H(Y) = 1, H(Y|x=heads) ~ 0, so I(X;Y) = H(Y) - H(Y|X) ~ 1 bit
The Critical Observation: Information is Mutual. I(X;Y) = I(Y;X), i.e. H(Y) - H(Y|X) = H(X) - H(X|Y)
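A small Python sketch of this symmetry, using a made-up joint distribution P(X,Y) over two binary variables (the numbers are purely illustrative): computing the mutual information from either side gives the same value.

```python
import math

def H(probs):
    # Shannon entropy in bits.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution P(X, Y); values chosen only for illustration.
joint = {("heads", "heads"): 0.45, ("heads", "tails"): 0.05,
         ("tails", "heads"): 0.05, ("tails", "tails"): 0.45}

# Marginals P(X) and P(Y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

h_x, h_y, h_xy = H(px.values()), H(py.values()), H(joint.values())

# Conditional entropies via the chain rule: H(Y|X) = H(X,Y) - H(X), and likewise for X.
h_y_given_x = h_xy - h_x
h_x_given_y = h_xy - h_y

print(round(h_y - h_y_given_x, 3))  # I(X;Y) = H(Y) - H(Y|X)
print(round(h_x - h_x_given_y, 3))  # I(Y;X) = H(X) - H(X|Y) -- same value
```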
The Critical Observation: I(Stimulus;Spike) = I(Spike;Stimulus). What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike.
The Critical Observation: stimulus → response. This, we can measure: what our stimulus choice tells us about the likelihood of a spike.
How to use Information Theory: Show your system stimuli. Measure neural responses. Estimate P(neural response | stimulus presented). From that, estimate P(neural response). Compute H(neural response) and H(neural response | stimulus presented). Calculate I(response; stimulus).
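A sketch of that recipe in Python, using a made-up trial log of (stimulus, response) pairs; the stimulus labels, response labels, and counts are all hypothetical, chosen only to show the bookkeeping:

```python
import math
from collections import Counter

def H(probs):
    # Shannon entropy in bits.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical trial log: (stimulus presented, neural response observed).
trials = [("A", "spike"), ("A", "spike"), ("A", "no spike"), ("A", "spike"),
          ("B", "no spike"), ("B", "no spike"), ("B", "spike"), ("B", "no spike")]

n = len(trials)
stim_counts = Counter(s for s, _ in trials)      # estimates P(stimulus)
resp_counts = Counter(r for _, r in trials)      # estimates P(response)
joint_counts = Counter(trials)                   # estimates P(stimulus, response)

h_resp = H(c / n for c in resp_counts.values())  # H(neural response)

# H(response | stimulus) = sum over stimuli s of P(s) * H(response | s)
h_resp_given_stim = 0.0
for s, cs in stim_counts.items():
    cond = [joint_counts[(s, r)] / cs for r in resp_counts if joint_counts[(s, r)] > 0]
    h_resp_given_stim += (cs / n) * H(cond)

print(round(h_resp - h_resp_given_stim, 3))      # I(response; stimulus), in bits
```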