
Understanding Information Theory and Complexity

This text introduces the concepts of information theory and complexity, covering topics such as entropy, self-information, and probability spaces. It explores the relationship between information and complexity in complex systems.



Presentation Transcript

  1. NECSI Summer School 2008, Week 3: Methods for the Study of Complex Systems • Information Theory • Hiroki Sayama, sayama@binghamton.edu [Figure: plot of I(p) against p, with p from 0 to 1 on the horizontal axis and I(p) from 0 to 6 on the vertical axis]

  2. Four approaches to complexity • Nonlinear Dynamics: Complexity = No closed-form solution, Chaos • Information: Complexity = Length of description, Entropy • Computation: Complexity = Computational time/space, Algorithmic complexity • Collective Behavior: Complexity = Multi-scale patterns, Emergence

  3. Information? • Matter: Known since ancient times • Energy: Known since the 19th century (industrial revolution) • Information: Known since the 20th century (world wars, rise of computers)

  4. An informal definition of information • Aspects of some physical phenomenon that can be used to select a smaller set of options out of the original set of options (things that reduce the number of possibilities) • An observer or interpreter is involved • A default set of options is needed

  5. Quantitative Definition of Information

  6. Quantitative definition of information • If something is expected to occur almost certainly, its occurrence should carry nearly zero information • If something is expected to occur very rarely, its occurrence should carry a very large amount of information • If an event is expected to occur with probability p, the information produced by its occurrence (self-information) is given by I(p) = - log p

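  7. Quantitative definition of information I(p) = - log p • 2 is often used as the base of log • With base 2, the unit of information is the bit (binary digit) [Figure: plot of I(p) against p, with p from 0 to 1 on the horizontal axis and I(p) from 0 to 6 on the vertical axis]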
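
The definition I(p) = - log p with base 2 can be tried out directly. The following is a minimal sketch; the function name `self_information` is chosen here for illustration and does not come from the slides.

```python
import math

def self_information(p):
    """Return I(p) = -log2(p), in bits, for an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    # log2(1/p) equals -log2(p) and avoids printing a negative zero for p = 1.
    return math.log2(1.0 / p)

# Near-certain events carry little information, rare events carry a lot.
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p:<5} I(p) = {self_information(p):.3f} bits")
```

An event with p = 1/2 carries exactly one bit, which is why the bit is a natural unit when the base is 2.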

  8. Why log? • To fulfill the additivity of information • For independent events A and B: • Self-information of “A happened”: $I(p_A)$ • Self-information of “B happened”: $I(p_B)$ • Self-information of “A and B happened”: $I(p_A p_B) = I(p_A) + I(p_B)$ • “I(p) = - log p” satisfies this additivity

  9. Exercise • You picked a card from a well-shuffled deck of cards (without jokers): • How much self-information does the event “the card is a spade” have? • How much self-information does the event “the card is a king” have? • How much self-information does the event “the card is the king of spades” have?
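
One way to check answers to this exercise numerically is sketched below, assuming a standard 52-card deck with 13 spades and 4 kings. The variable names are illustrative. Because suit and rank are independent, the three values also illustrate the additivity of self-information.

```python
import math

def self_information(p):
    """Self-information in bits: I(p) = -log2(p)."""
    return math.log2(1.0 / p)

# Assumed deck: 52 cards, 13 spades, 4 kings, exactly 1 king of spades.
p_spade = 13 / 52
p_king = 4 / 52
p_king_of_spades = 1 / 52

i_spade = self_information(p_spade)
i_king = self_information(p_king)
i_king_of_spades = self_information(p_king_of_spades)

print(f"spade:          {i_spade:.4f} bits")
print(f"king:           {i_king:.4f} bits")
print(f"king of spades: {i_king_of_spades:.4f} bits")
print(f"spade + king:   {i_spade + i_king:.4f} bits")
```

The last two printed values agree because p(king of spades) = p(spade) p(king), so the logarithm turns the product into a sum.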

  10. Information Entropy

  11. Some terminology • Event: An individual outcome (or a set of outcomes) to which a probability of occurrence can be assigned • Sample space: The set of all possible individual events • Probability space: A combination of a sample space and a probability distribution (i.e., probabilities assigned to individual events)

  12. Probability distribution and expected self-information • Probability distribution in probability space A: $p_i$ ($i = 1, \dots, n$; $\sum_i p_i = 1$) • Expected self-information H(A) when one of the individual events happened: $H(A) = \sum_i p_i I(p_i) = -\sum_i p_i \log p_i$
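
The expected self-information can be computed with a few lines of code. This is a minimal sketch assuming base-2 logarithms and the usual convention that terms with p = 0 contribute nothing; the function name `entropy` is illustrative.

```python
import math

def entropy(probs):
    """Return H = -sum_i p_i log2(p_i), in bits, for a probability distribution."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 are skipped (convention: 0 * log 0 = 0).
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin
print(entropy([0.9, 0.1]))   # biased coin: less uncertain than a fair coin
print(entropy([1.0, 0.0]))   # outcome known in advance
```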

  13. What does H(A) mean? • Average amount of self-information the observer could obtain by one observation • Average “newsworthiness” the observer should expect for one event • Ambiguity of knowledge the observer had about the system before observation • Amount of “ignorance” the observer had about the system before observation

  14. What does H(A) mean? • Amount of “ignorance” the observer had about the system before observation • It quantitatively shows the lack of information (not the presence of information) before observation • This quantity is called information entropy

  15. Information entropy • Similar to thermodynamic entropy both conceptually and mathematically • Entropy is zero if the system state is uniquely determined with no fluctuation • Entropy increases as the randomness increases within the system • Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur)

  16. Exercise • Prove the following: Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur) • Show that $f(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log p_i$ (with $\sum_{i=1}^{n} p_i = 1$) takes its maximum when $p_i = 1/n$ • Remove one variable using the constraint • Or use the method of Lagrange multipliers
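
One possible route through the Lagrange-multiplier hint is sketched below. It assumes natural logarithms (changing the base only rescales f) and introduces a multiplier symbol $\lambda$ that does not appear in the slides.

```latex
\begin{aligned}
\mathcal{L}(p_1,\dots,p_n,\lambda)
  &= -\sum_{i=1}^{n} p_i \ln p_i + \lambda\Bigl(\sum_{i=1}^{n} p_i - 1\Bigr) \\
\frac{\partial \mathcal{L}}{\partial p_i}
  &= -\ln p_i - 1 + \lambda = 0
  \;\Longrightarrow\; p_i = e^{\lambda - 1} \quad \text{for every } i \\
\sum_{i=1}^{n} p_i = 1
  &\;\Longrightarrow\; n\,e^{\lambda - 1} = 1
  \;\Longrightarrow\; p_i = \frac{1}{n} \\
f\!\left(\tfrac{1}{n},\dots,\tfrac{1}{n}\right)
  &= -\sum_{i=1}^{n} \frac{1}{n} \ln \frac{1}{n} = \ln n
\end{aligned}
```

Because each term $-p_i \ln p_i$ is concave in $p_i$, this single stationary point is a maximum rather than a minimum.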

  17. Entropy and complex systems • Entropy shows how much information would be needed to fully specify the system’s state in every single detail • Ordered -> low information entropy • Disordered -> high information entropy • May not be consistent with the usual notion of “complexity” • Multiscale views are needed to address this issue

  18. Information Entropy and Multiple Probability Spaces

  19. Probability of composite events • Probability of composite event (x, y): p(x, y) = p(y, x) = p(x | y) p(y) = p(y | x) p(x) • p(x | y): Conditional probability for x to occur when y already occurred • p(x | y) = p(x) if X and Y are independent from each other

  20. Exercise: Bayes’ theorem • Define p(x | y) using p(y | x) and p(x) • Use the following formulas as needed • $p(x) = \sum_y p(x, y)$ • $p(y) = \sum_x p(x, y)$ • p(x, y) = p(y | x) p(x) = p(x | y) p(y)
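
The three formulas above can be chained numerically to go from p(y | x) and p(x) to p(x | y). The sketch below is illustrative only: the function name `bayes_posterior` and the example numbers are assumptions, not part of the slides.

```python
def bayes_posterior(prior, likelihood):
    """Compute p(x | y) for one fixed observed y.

    prior:      dict mapping x -> p(x)
    likelihood: dict mapping x -> p(y | x) for the observed y
    """
    # p(x, y) = p(y | x) p(x)
    joint = {x: likelihood[x] * prior[x] for x in prior}
    # p(y) = sum_x p(x, y)
    p_y = sum(joint.values())
    if p_y == 0:
        raise ValueError("the observed y has zero probability")
    # p(x | y) = p(x, y) / p(y)
    return {x: joint[x] / p_y for x in joint}

# Hypothetical numbers: two equally likely events in X.
prior = {"x1": 0.5, "x2": 0.5}
likelihood = {"x1": 0.75, "x2": 0.25}
posterior = bayes_posterior(prior, likelihood)
print(posterior)
```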

  21. Product probability space • Prob. space X: {x1, x2}, {p(x1), p(x2)} • Prob. space Y: {y1, y2}, {p(y1), p(y2)} • Product probability space XY: {(x1, y1), (x1, y2), (x2, y1), (x2, y2)}, {p(x1, y1), p(x1, y2), p(x2, y1), p(x2, y2)} Composite events

  22. Joint entropy • Entropy of product probability space XY: $H(XY) = -\sum_x \sum_y p(x, y) \log p(x, y)$ • H(XY) = H(YX) • If X and Y are independent: H(XY) = H(X) + H(Y) • If Y completely depends on X: $H(XY) = H(X) \ge H(Y)$

  23. Conditional entropy • Expected entropy of Y when a specific event occurred in X: $H(Y \mid X) = \sum_x p(x) H(Y \mid X = x) = -\sum_x p(x) \sum_y p(y \mid x) \log p(y \mid x) = -\sum_x \sum_y p(y, x) \log p(y \mid x)$ • If X and Y are independent: H(Y | X) = H(Y) • If Y completely depends on X: H(Y | X) = 0

  24. Exercise • Prove the following: H(Y | X) = H(YX) - H(X) • Hint: Use Bayes’ theorem
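
Before attempting the proof, the identity can be checked numerically on a small example. This is a sketch under assumptions: the joint table `joint` holds made-up probabilities with `joint[x][y]` = p(x, y), and the helper names are illustrative.

```python
import math

def entropy(probs):
    """H = -sum p log2 p in bits, skipping zero-probability terms."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def conditional_entropy_y_given_x(joint):
    """H(Y | X) = -sum_x sum_y p(x, y) log2 p(y | x), with joint[x][y] = p(x, y)."""
    h = 0.0
    for row in joint:
        p_x = sum(row)  # p(x) = sum_y p(x, y)
        for p_xy in row:
            if p_xy > 0:
                # -log2 p(y | x) = log2(p(x) / p(x, y))
                h += p_xy * math.log2(p_x / p_xy)
    return h

# Hypothetical joint distribution over two events in X and two in Y.
joint = [[0.4, 0.1],
         [0.2, 0.3]]

h_xy = entropy([p for row in joint for p in row])   # H(XY)
h_x = entropy([sum(row) for row in joint])          # H(X)
h_y_given_x = conditional_entropy_y_given_x(joint)  # H(Y | X)

print(f"H(Y|X)        = {h_y_given_x:.6f}")
print(f"H(XY) - H(X)  = {h_xy - h_x:.6f}")
```

A numerical match on one table is not a proof, but it is a quick way to catch sign or indexing mistakes in a derivation.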

  25. Mutual Information

  26. Mutual information • Conditional entropy measures how much ambiguity still remains on Y after observing an event on X • Reduction of ambiguity on Y by one observation on X can be written as: I(Y; X) = H(Y) – H(Y | X) • I(Y; X) is called mutual information

  27. Symmetry of mutual information I(Y; X) = H(Y) – H(Y | X) = H(Y) + H(X) – H(YX) = H(X) + H(Y) – H(XY) = I(X; Y) Mutual information is symmetric in terms of X and Y

  28. Exercise • Prove the following: • If X and Y are independent: I(X; Y) = 0 • If Y completely depends on X: I(X; Y) = H(Y)
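
The two cases in this exercise can be explored numerically using the symmetric form I(X; Y) = H(X) + H(Y) – H(XY). The sketch below assumes a joint table with `joint[x][y]` = p(x, y); the example tables and function names are illustrative, not taken from the slides.

```python
import math

def entropy(probs):
    """H = -sum p log2 p in bits, skipping zero-probability terms."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(XY), with joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]          # marginal over y
    p_y = [sum(col) for col in zip(*joint)]    # marginal over x
    h_xy = entropy([p for row in joint for p in row])
    return entropy(p_x) + entropy(p_y) - h_xy

# Independent case: p(x, y) = p(x) p(y) for every cell.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
# Fully dependent case: y is determined by x.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(dependent))
```

Transposing the table swaps the roles of X and Y and leaves the result unchanged, which mirrors the symmetry of mutual information.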

  29. Exercise • Measure the mutual information between the two systems on the right:

  30. Use of mutual information • Mutual information can be used to measure how much interaction exists between two subsystems in a complex system • Correlation only works for quantitative measures and detects only linear relationships • Mutual information works for qualitative (discrete, symbolic) measures and nonlinear relationships as well

  31. Information Source

  32. Information source • Sequence of values of a random variable that obeys some probabilistic rules • Sequence may be over time or space • Values (events) may or may not be independent from each other • Example: • Repeated coin tosses • Sound • Visual image

  33. Memoryless and Markov information sources • Memoryless information source, p(0) = p(1) = 1/2: 01010010001011011001101000110 • Markov information source, p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111

  34. Markov information source • Information source whose probability distribution at time t depends only on its immediate past value $X_{t-1}$ (or past n values $X_{t-1}, X_{t-2}, \dots, X_{t-n}$) • Cases n>1 can be converted into n=1 form by defining composite events • Probabilistic rules are given as a set of conditional probabilities, which can be written in the form of a transition probability matrix (TPM)
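
A Markov information source with n = 1 can be simulated directly from its TPM. The following is a minimal sketch that assumes the column convention `tpm[y][x]` = p(next = y | current = x); the function name, the `seed` parameter, and the use of integer state labels are illustrative choices.

```python
import random

def sample_markov_source(tpm, length, start=0, seed=None):
    """Generate `length` values from a first-order Markov source.

    tpm[y][x] is assumed to be p(next = y | current = x),
    so each column of the matrix sums to 1.
    """
    if length <= 0:
        return []
    rng = random.Random(seed)
    states = list(range(len(tpm)))
    current = start
    values = [current]
    for _ in range(length - 1):
        # Column `current` holds the distribution of the next value.
        column = [tpm[y][current] for y in states]
        current = rng.choices(states, weights=column)[0]
        values.append(current)
    return values

# Binary source with p(1|0) = p(0|1) = 1/4: values tend to repeat in runs.
tpm = [[0.75, 0.25],
       [0.25, 0.75]]
print("".join(str(v) for v in sample_markov_source(tpm, 29, seed=1)))
```

With both off-diagonal entries raised to 1/2, every column becomes (1/2, 1/2) and the same code produces a memoryless source.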

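  35. State-transition diagram • Markov information source p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111 [Figure: state-transition diagram with two states, 0 and 1; each state stays the same with probability 3/4 and switches to the other state with probability 1/4]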

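  36. Matrix representation • Markov information source p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111 • Probability vector at time t = TPM × probability vector at time t-1: $\begin{pmatrix} p_0 \\ p_1 \end{pmatrix}_{t} = \begin{pmatrix} 3/4 & 1/4 \\ 1/4 & 3/4 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \end{pmatrix}_{t-1}$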

  37. Exercise abcaccaabccccaaabc aaccacaccaaaaabcc • Consider the above sequence as a Markov information source and create its state-transition diagram and matrix representation
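
A TPM can be estimated from an observed sequence by counting transitions between consecutive values. The sketch below is one simple way to do that, assuming the column convention `tpm[y][x]` = p(y | x); the function name `estimate_tpm` is illustrative, and the binary sequence from the earlier slides is used as input so that the exercise itself is left open.

```python
from collections import Counter

def estimate_tpm(sequence):
    """Estimate a first-order TPM from a sequence of symbols.

    Returns (symbols, tpm) where tpm[y][x] estimates
    p(next = symbols[y] | current = symbols[x]).
    """
    symbols = sorted(set(sequence))
    # Count (current, next) pairs of consecutive values.
    counts = Counter(zip(sequence, sequence[1:]))
    tpm = []
    for y in symbols:
        row = []
        for x in symbols:
            total_from_x = sum(counts[(x, z)] for z in symbols)
            # A symbol that is never followed by anything gets a zero column.
            row.append(counts[(x, y)] / total_from_x if total_from_x else 0.0)
        tpm.append(row)
    return symbols, tpm

symbols, tpm = estimate_tpm("01000000111111001110001111111")
print(symbols)
for row in tpm:
    print([round(a, 3) for a in row])
```

Short sequences give noisy estimates, so the counted frequencies will generally differ from the probabilities that generated the data.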

  38. Review: Convenient properties of transition probability matrix • The product of two TPMs is also a TPM • All TPMs have eigenvalue 1 • $|\lambda| \le 1$ for all eigenvalues $\lambda$ of any TPM • If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneracy)

  39. Review: TPM and asymptotic probability distribution • $|\lambda| \le 1$ for all eigenvalues $\lambda$ of any TPM • If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneracy) → If the network is also aperiodic, this eigenvalue is the unique dominant eigenvalue and the probability vector will eventually converge to its corresponding eigenvector
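
The convergence described above suggests a simple numerical procedure: apply the TPM repeatedly to a probability vector until it stops changing. The sketch below assumes the column convention `tpm[y][x]` = p(y | x) and a chain for which the iteration converges; the function name, tolerance, and the example matrix are illustrative assumptions.

```python
def asymptotic_distribution(tpm, tol=1e-12, max_steps=10_000):
    """Iterate p_t = TPM p_{t-1} from a uniform start until p stops changing."""
    n = len(tpm)
    p = [1.0 / n] * n
    for _ in range(max_steps):
        # Matrix-vector product with tpm[y][x] = p(next = y | current = x).
        nxt = [sum(tpm[y][x] * p[x] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("probability vector did not converge")

# Hypothetical two-state TPM (each column sums to 1).
example = [[0.9, 0.5],
           [0.1, 0.5]]
q = asymptotic_distribution(example)
print([round(v, 6) for v in q])
```

Solving for the eigenvector with eigenvalue 1 gives the same vector in one step; repeated multiplication is shown here because it follows the slide's description of convergence most closely.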

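  40. Exercise • Calculate the asymptotic probability distribution of the following Markov information source with p(1|0) = p(0|1) = 1/4 (01000000111111001110001111111): $\begin{pmatrix} p_0 \\ p_1 \end{pmatrix}_{t} = \begin{pmatrix} 3/4 & 1/4 \\ 1/4 & 3/4 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \end{pmatrix}_{t-1}$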

  41. Calculating Entropy of Markov Information Source

  42. Review: Information entropy • Expected information H(A) when one of the individual events happened: $H(A) = \sum_i p_i I(p_i) = -\sum_i p_i \log p_i$ • This applies only to memoryless information sources, in which events are independent of each other

  43. Generalizing information entropy • For other types of information source where events are not independent, information entropy is defined as: $H\{X\} = \lim_{k \to \infty} H(X_{k+1} \mid X_1 X_2 \dots X_k)$ • $X_k$: k-th value of random variable X

  44. Calculating information entropy of Markov information source (1) • $H\{X\} = \lim_{k \to \infty} H(X_{k+1} \mid X_1 X_2 \dots X_k)$ • This means the expected entropy of the (k+1)-th value given a specific history of the past k values • For a Markov source, all that matters is the last value of the history, so let’s focus on $X_k$

  45. Calculating information entropy of Markov information source (2) • $p(X_k = x)$: Probability for the last (k-th) value to be x • $H(X_{k+1} \mid X_1 X_2 \dots X_k) = \sum_x p(X_k = x) H(X_{k+1} \mid X_k = x) = -\sum_x p(X_k = x) \sum_y a_{yx} \log a_{yx} = \sum_x p(X_k = x) h(a_x)$ • $a_{yx}$: y-th row, x-th column element of the TPM • $h(a_x)$: Entropy of the x-th column vector of the TPM

  46. Calculating information entropy of Markov information source (3) • $H(X_{k+1} \mid X_1 X_2 \dots X_k) = \sum_x p(X_k = x) h(a_x)$ • If the information source has only one asymptotic probability distribution q: $\lim_{k \to \infty} p(X_k = x) = q_x$ (the x-th element of q), so $H\{X\} = \lim_{k \to \infty} H(X_{k+1} \mid X_1 X_2 \dots X_k) = h \cdot q$ • h: A row vector whose x-th element is $h(a_x)$

  47. Calculating information entropy of Markov information source (4) • $H\{X\} = \lim_{k \to \infty} H(X_{k+1} \mid X_1 X_2 \dots X_k) = h \cdot q$ • The information entropy of a Markov information source is given by the average of the entropies of its TPM’s column vectors, weighted by its asymptotic probability distribution • This holds if the information source has only one asymptotic probability distribution
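
The result H{X} = h·q can be put together in a short program. This is a sketch under stated assumptions: the TPM uses the column convention `tpm[y][x]` = p(y | x), logarithms are base 2, the asymptotic distribution is found by repeated multiplication (which assumes that iteration converges), and all function names are illustrative.

```python
import math

def column_entropy(tpm, x):
    """h(a_x): entropy in bits of the x-th column of the TPM."""
    return sum(tpm[y][x] * math.log2(1.0 / tpm[y][x])
               for y in range(len(tpm)) if tpm[y][x] > 0)

def asymptotic_distribution(tpm, tol=1e-12, max_steps=10_000):
    """Iterate p_t = TPM p_{t-1} from a uniform start until p stops changing."""
    n = len(tpm)
    p = [1.0 / n] * n
    for _ in range(max_steps):
        nxt = [sum(tpm[y][x] * p[x] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("probability vector did not converge")

def markov_entropy(tpm):
    """H{X} = h . q: column entropies weighted by the asymptotic distribution."""
    q = asymptotic_distribution(tpm)
    return sum(q[x] * column_entropy(tpm, x) for x in range(len(tpm)))

# Hypothetical two-state source whose columns have different entropies.
example = [[0.9, 0.5],
           [0.1, 0.5]]
print(f"H{{X}} = {markov_entropy(example):.4f} bits per value")
```

For a memoryless binary source with every TPM entry equal to 1/2, the same function returns 1 bit per value, the familiar fair-coin entropy.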

  48. Exercise • Calculate the information entropy of the following Markov information sources we discussed earlier: • 01000000111111001110001111111 • abcaccaabccccaaabc aaccacaccaaaaabcc

  49. Summary • Complexity of a system may be characterized using information • Length of description • Entropy (ambiguity of knowledge) • Mutual information quantifies the coupling between two components within a system • Entropy may be measured for Markov information sources as well
