The Origin of Entropy Rick Chang
Agenda • Introduction • References • What is information? • A straightforward way to derive the form of entropy • A mathematical way to derive the form of entropy • Conclusion
Introduction • We use entropy matrices to measure dependencies between pairs of genes, but why? • What is entropy?
Introduction – cont. • I will: try to explain what information and entropy are • I will not: tell you how entropy is related to GA - I don't know (maybe a future work)
References • A Mathematical Theory of Communication, C. E. Shannon, 1948 (Part I, Appendix 2) • Information Theory, Inference, and Learning Algorithms, David J. C. MacKay, 2003 (Chapters 1, 4) • Information Theory and Reliable Communication, Robert G. Gallager, 1968 (Chapter 2)
Shannon 1916 ~ 2001
What is information? • Ensemble • The outcome x is the value of a random variable, which takes on one of a set of possible values A_X = {a_1, a_2, ..., a_I}, having probabilities P_X = {p_1, p_2, ..., p_I}, with P(x = a_i) = p_i, p_i >= 0 and sum_i P(x = a_i) = 1
What is information? • Hartley, R. V. L., "Transmission of Information": If the number of messages in the set is finite, then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely.
A Straightforward Way • When we try to measure the influence of event y on event x, we may consider the ratio p(x | y) / p(x): • > 1 : the occurrence of event y increases our belief in event x • = 1 : events x and y are independent • < 1 : the occurrence of event y decreases our belief in event x
A Straightforward Way – cont. • We define the information provided about event x by the occurrence of event y as I(x ; y) = log [ p(x | y) / p(x) ]: • > 0 : the occurrence of event y increases our belief in event x • = 0 : events x and y are independent • < 0 : the occurrence of event y decreases our belief in event x
Why use the logarithm? • It is more convenient: • practically more useful • nearer to our intuitive feeling: we intuitively measure entities by linear comparison • mathematically more suitable: many of the limiting operations are simple in terms of the logarithm
Mutual information • I(x ; y) = log [ p(x | y) / p(x) ] = log [ p(x, y) / ( p(x) p(y) ) ] = I(y ; x) • The mutual information between event x and event y
Mutual information – cont. • Mutual information => uses the logarithm to quantify the difference between our belief in event x given event y and our belief in event x => the amount of uncertainty about event x we can resolve after the occurrence of event y
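To make the event-level definition concrete, here is a minimal sketch, not from the slides: the joint distribution values and function names are made up for illustration, and base-2 logarithms are chosen so the answers come out in bits.

```python
import math

# Hypothetical joint distribution over two binary events (illustrative values only)
p_xy = {("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
        ("x1", "y0"): 0.15, ("x1", "y1"): 0.35}

def marginal(index, value):
    """Marginal probability: sum the joint distribution over the other variable."""
    return sum(p for pair, p in p_xy.items() if pair[index] == value)

def event_mutual_information(x, y):
    """I(x;y) = log2 [ p(x,y) / (p(x) p(y)) ] = log2 [ p(x|y) / p(x) ], in bits.
    The symmetric joint form makes I(x;y) = I(y;x) explicit."""
    return math.log2(p_xy[(x, y)] / (marginal(0, x) * marginal(1, y)))

print(event_mutual_information("x0", "y0"))   # > 0: seeing y0 raises our belief in x0
print(event_mutual_information("x0", "y1"))   # < 0: seeing y1 lowers our belief in x0
```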
Self-information • Consider an event y with p(x | y) = 1, so I(x ; y) = log [ 1 / p(x) ] = - log p(x) => the amount of uncertainty about event x we resolve after we know event x will certainly occur => the a priori uncertainty of event x • Define the self-information of event x as I(x) = - log p(x)
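A minimal sketch of self-information in bits (the probabilities are illustrative, not from the slides): the less probable the event, the more information its occurrence carries.

```python
import math

def self_information(p):
    """I(x) = -log2 p(x), in bits: the a priori uncertainty of an event with probability p."""
    return -math.log2(p)

print(self_information(1.0))    # 0.0 -> a certain event carries no information
print(self_information(0.5))    # 1.0 -> one fair coin flip = 1 bit
print(self_information(1/8))    # 3.0 -> an eight-way equally likely choice = 3 bits
```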
Intuitively
[Diagram slides] A bar represents the total information about the system, i.e. the point at which we know everything about the system. One part of the bar is our a priori knowledge about event x; after we learn that event x will certainly occur, the part that gets filled in is the information of event x, which equals the uncertainty of event x.
Conditional Self-information • Similarly, define the conditional self-information of event x, given the occurrence of event y, as I(x | y) = - log p(x | y) • We now have I(x ; y) = I(x) - I(x | y)
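A minimal numeric check, reusing the same made-up joint distribution as before, that the identity I(x;y) = I(x) - I(x|y) holds for individual events:

```python
import math

# Hypothetical joint and marginal distributions (illustrative values only)
p_xy = {("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
        ("x1", "y0"): 0.15, ("x1", "y1"): 0.35}
p_x = {"x0": 0.50, "x1": 0.50}
p_y = {"y0": 0.55, "y1": 0.45}

x, y = "x0", "y0"
i_x       = -math.log2(p_x[x])                               # self-information of x
i_x_giv_y = -math.log2(p_xy[(x, y)] / p_y[y])                # conditional self-information
i_xy      = math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))      # event mutual information

assert abs(i_xy - (i_x - i_x_giv_y)) < 1e-12                 # I(x;y) = I(x) - I(x|y)
print(i_x, i_x_giv_y, i_xy)
```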
Intuitively – cont.
[Diagram slides] The same bar, now drawn for event x alone: the full bar is the information about event x (we know everything about event x once we know it will certainly occur), and one part is our a priori knowledge about event x. After the occurrence of event y, the newly filled part is the mutual information between event x and event y.
A Straightforward Way – cont. • Likewise, define the joint self-information of events x and y as I(x, y) = - log p(x, y) • We now have I(x, y) = I(x) + I(y) - I(x ; y)
A Straightforward Way – cont. • The uncertainty of event y is never increased by knowledge of x: averaging over all outcomes, E[ I(y | x) ] <= E[ I(y) ], with equality if and only if x and y are independent
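A minimal sketch (with the same illustrative joint distribution, not from the slides) that averages the conditional self-information and confirms it never exceeds the unconditional average:

```python
import math

# Hypothetical joint and marginal distributions (illustrative values only)
p_xy = {("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
        ("x1", "y0"): 0.15, ("x1", "y1"): 0.35}
p_x = {"x0": 0.50, "x1": 0.50}
p_y = {"y0": 0.55, "y1": 0.45}

# H(Y) = E[I(y)]
h_y = -sum(p * math.log2(p) for p in p_y.values())

# H(Y|X) = E[I(y|x)], averaged with the joint probabilities
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

print(h_y, h_y_given_x)              # roughly 0.99 vs 0.80 bits
assert h_y_given_x <= h_y + 1e-12    # knowledge of x never increases the average uncertainty of y
```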
From instance to expectation
Taking the average (expectation) over all outcomes turns each per-event quantity into its ensemble counterpart:
• I(x ; y) → I(X ; Y)
• I(x) → H(X)
• I(x | y) → H(X | Y)
• I(x, y) → H(X, Y)
• I(x ; y) = I(x) - I(x | y) → I(X ; Y) = H(X) - H(X | Y)
• I(x, y) = I(x) + I(y) - I(x ; y) → H(X, Y) = H(X) + H(Y) - I(X ; Y)
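A minimal sketch (same illustrative joint distribution as above) computing the averaged quantities and checking both identities:

```python
import math
from collections import defaultdict

# Hypothetical joint distribution (illustrative values only)
p_xy = {("x0", "y0"): 0.40, ("x0", "y1"): 0.10,
        ("x1", "y0"): 0.15, ("x1", "y1"): 0.35}

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

def entropy(dist):
    """H = -sum p log2 p over the values of a distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy)
H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())
I_XY = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

assert abs(I_XY - (H_X - H_X_given_Y)) < 1e-12     # I(X;Y) = H(X) - H(X|Y)
assert abs(H_XY - (H_X + H_Y - I_XY)) < 1e-12      # H(X,Y) = H(X) + H(Y) - I(X;Y)
```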
Relationship
[Venn-style diagram] H(X, Y) is the whole area; H(X) and H(Y) are two overlapping regions; H(X | Y) and H(Y | X) are the non-overlapping parts; I(X ; Y) is the overlap.
Entropy • The entropy of an ensemble X is defined to be the average value of the self-information over all events x: H(X) = sum_x p(x) I(x) = - sum_x p(x) log p(x) • The average a priori uncertainty of an ensemble
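A minimal sketch (distribution values are illustrative) showing that the entropy is literally the probability-weighted average of the per-event self-information:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}    # illustrative ensemble

def self_information(prob):
    return -math.log2(prob)

# Average self-information over the ensemble ...
H_as_average = sum(prob * self_information(prob) for prob in p.values())
# ... equals the usual closed form -sum p log2 p
H_closed_form = -sum(prob * math.log2(prob) for prob in p.values())

print(H_as_average, H_closed_form)    # both 1.75 bits
```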
Interesting Properties of H(X) • H = 0 if and only if all the p_i but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish; otherwise H is positive. • For a given n, H is a maximum and equal to log(n) when all the p_i are equal, i.e., p_i = 1/n. This is also intuitively the most uncertain situation. • Any change toward equalization of the probabilities p_1, ..., p_n increases H.
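A minimal numeric illustration of the three properties (the distributions are made up):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))         # 0.0       -> certain outcome, H vanishes
print(entropy([0.25] * n), math.log2(n))     # 2.0 = 2.0 -> maximum H = log2(n) at the uniform distribution
print(entropy([0.7, 0.1, 0.1, 0.1]))         # about 1.36 bits
print(entropy([0.4, 0.3, 0.2, 0.1]))         # about 1.85 bits: more equal than the line above, so larger H
```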
A mathematical way • Can we find a measure of how uncertain we are of an ensemble? • If there is such a measure, say H(p_1, p_2, ..., p_n), it is reasonable to require of it the following properties: • H should be continuous in the p_i • If all the p_i are equal, p_i = 1/n, then H should be a monotonic increasing function of n • If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H
A mathematical way – cont. • If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H • Example: a choice among three outcomes with probabilities 1/2, 1/3, 1/6 can be made as a first choice between two equally likely options and then, half the time, a second choice with probabilities 2/3, 1/3: H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3), the second choice occurring half the time
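A quick numeric check of the decomposition example:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs)

lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])    # the second choice occurs half the time
print(lhs, rhs)                                          # both about 1.459 bits
assert abs(lhs - rhs) < 1e-12
```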
A mathematical way – cont. • Theorem: the only H satisfying the three properties above is of the form H = - K sum_i p_i log p_i, where K is a positive constant
A mathematical way – cont. • Proof: Let A(n) = H(1/n, 1/n, ..., 1/n). From property (3) we can decompose a choice from s^m equally likely possibilities into a series of m choices from s equally likely possibilities and obtain A(s^m) = m A(s)
A mathematical way – cont. • Similarly, A(t^n) = n A(t) • We can choose n arbitrarily large and find an m to satisfy s^m <= t^n < s^(m+1) • Taking logarithms and dividing by n log s: m/n <= log t / log s <= m/n + 1/n, i.e., | m/n - log t / log s | < ε ... (1)
A mathematical way – cont. • From the monotonic property of A(n): A(s^m) <= A(t^n) <= A(s^(m+1)), so m A(s) <= n A(t) <= (m+1) A(s) • Dividing by n A(s): m/n <= A(t) / A(s) <= m/n + 1/n, i.e., | m/n - A(t) / A(s) | < ε ... (2)
A mathematical way – cont. • From equations (1) and (2): | A(t) / A(s) - log t / log s | < 2ε, and since ε is arbitrary, A(t) / A(s) = log t / log s • We get A(t) = K log(t), where K must be positive to satisfy property (2)
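A small numeric illustration (using base-2 entropy as the H in question, my choice of unit) that A(n) = H(1/n, ..., 1/n) behaves exactly like K log n:

```python
import math

def A(n):
    """Entropy of a uniform distribution over n outcomes, in bits."""
    return -sum((1 / n) * math.log2(1 / n) for _ in range(n))

s, m = 3, 4
print(A(s**m), m * A(s))                           # A(s^m) = m * A(s)
print(A(10) / A(2), math.log(10) / math.log(2))    # A(t) / A(s) = log t / log s
```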
A mathematical way – cont. • Now suppose we have a choice from n possibilities with commeasurable probabilities p_i = n_i / sum_j n_j, where the n_i are integers • We can break down a choice from sum_i n_i possibilities into a choice from n possibilities with probabilities p_1, ..., p_n and then, if the i-th was chosen, a choice from n_i possibilities with equal probabilities
A mathematical way – cont. • Using property (3) again, we equate the total choice from sum_i n_i possibilities as computed by the two methods: K log ( sum_i n_i ) = H(p_1, ..., p_n) + K sum_i p_i log n_i
A mathematical way – cont. • Hence H = K [ sum_i p_i log ( sum_j n_j ) - sum_i p_i log n_i ] = - K sum_i p_i log ( n_i / sum_j n_j ) = - K sum_i p_i log p_i • If the p_i are not commeasurable, they may be approximated by rationals and the same expression must hold by our continuity assumption (property (1)) • The choice of the coefficient K is a matter of convenience and amounts to the choice of a unit of measure
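A numeric check of the grouping step with made-up integer counts n_i (K = 1 and natural logarithms chosen as the unit):

```python
import math

counts = [2, 3, 5]                       # hypothetical integer counts n_i
total = sum(counts)
p = [n / total for n in counts]          # commeasurable probabilities p_i
K = 1.0                                  # choice of unit

lhs = K * math.log(total)                                      # K log(sum n_i)
H   = -K * sum(pi * math.log(pi) for pi in p)                  # H(p_1, ..., p_n) = -K sum p_i log p_i
rhs = H + K * sum(pi * math.log(ni) for pi, ni in zip(p, counts))

assert abs(lhs - rhs) < 1e-12            # K log(sum n_i) = H(p) + K sum p_i log n_i
print(lhs, rhs)
```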
Conclusion • We first used an intuitive method to measure the information content of an event or an ensemble • We explained intuitively why the logarithm is chosen • Mutual information and entropy were introduced • We showed the relationship between information content and uncertainty • Finally, we set three assumptions and derived the only form a measure of information content can take, showing that the logarithm must be adopted
Thanks