
Understanding Information Theory and Complexity

This text introduces the concepts of information theory and complexity, covering topics such as entropy, self-information, and probability spaces. It explores the relationship between information and complexity in complex systems.


Presentation Transcript


  1. NECSI Summer School 2008, Week 3: Methods for the Study of Complex Systems. Information Theory. Hiroki Sayama (sayama@binghamton.edu) [Figure: plot of self-information I(p) against p]

  2. Four approaches to complexity • Nonlinear dynamics: complexity = no closed-form solution; chaos • Information: complexity = length of description; entropy • Computation: complexity = computational time/space; algorithmic complexity • Collective behavior: complexity = multi-scale patterns; emergence

  3. Information? • Matter: known since ancient times • Energy: known since the 19th century (industrial revolution) • Information: known since the 20th century (the world wars, the rise of computers)

  4. An informal definition of information Aspects of some physical phenomenon that can be used to select a smaller set of options out of the original set of options (Things that reduce the number of possibilities) • An observer or interpreter involved • A default set of options needed

  5. Quantitative Definition of Information

  6. Quantitative definition of information • If something is expected to occur almost certainly, its occurrence should have nearly zero information • If something is expected to occur very rarely, its occurrence should have very large information If an event is expected to occur with probability p,the information produced by its occurrence (self-information) is given by I(p) = - log p

  7. Quantitative definition of information I(p) = - log p • 2 is often used as the base of the log • The unit of information is the bit (binary digit) [Figure: plot of I(p) = - log p over 0 < p ≤ 1]
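The definition is easy to experiment with; a minimal Python sketch (base 2, so results are in bits):

```python
import math

def self_information(p, base=2):
    """Self-information I(p) = -log(p); in bits when base = 2."""
    return -math.log(p, base)

print(self_information(0.5))    # a fair coin flip carries 1 bit
print(self_information(0.25))   # a 1-in-4 event carries 2 bits
# Additivity for independent events: I(pA * pB) = I(pA) + I(pB)
print(self_information(0.5 * 0.25))  # 3 bits = 1 + 2
```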

  8. Why log? • To fulfill the additivity of information • For independent events A and B, p(A and B) = pA pB, so: Self-information of “A happened”: I(pA) Self-information of “B happened”: I(pB) Self-information of “A and B happened”: I(pA pB) = I(pA) + I(pB) • “I(p) = - log p” satisfies this additivity, since - log(pA pB) = - log pA - log pB

  9. Exercise • You picked up a card from a well-shuffled deck of cards (without jokers): • How much self-information does the event “the card is a spade” have? • How much self-information does the event “the card is a king” have? • How much self-information does the event “the card is the king of spades” have?
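A quick numerical check for this exercise in Python (spoiler: it prints the answers), using base-2 logs:

```python
import math

def I(p):
    """Self-information in bits."""
    return -math.log2(p)

print(I(13 / 52))  # spade: I(1/4) = 2 bits
print(I(4 / 52))   # king: I(1/13) = log2(13), about 3.70 bits
print(I(1 / 52))   # king of spades: log2(52), about 5.70 bits
# Suit and rank are independent, so the last value is the sum of the first two.
```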

  10. Information Entropy

  11. Some terminologies • Event: An individual outcome (or a set of outcomes) to which a probability of its occurrence can be assigned • Sample space: A set of all possible individual events • Probability space: A combination of sample space and probability distribution (i.e., probabilities assigned to individual events)

  12. Probability distribution and expected self-information • Probability distribution in probability space A: pi (i = 1…n, Σi pi = 1) • Expected self-information H(A) when one of the individual events happened: H(A) = Σi pi I(pi) = - Σi pi log pi
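The expected self-information is a direct translation of the formula above; a Python sketch (terms with pi = 0 are conventionally treated as contributing 0):

```python
import math

def entropy(probs, base=2):
    """H = -sum_i p_i log p_i; terms with p_i = 0 contribute 0."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1 bit
print(entropy([1.0]))        # certain outcome: 0 bits
print(entropy([0.25] * 4))   # four equally likely events: 2 bits
```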

  13. What does H(A) mean? • Average amount of self-information the observer could obtain by one observation • Average “newsworthiness” the observer should expect for one event • Ambiguity of knowledge the observer had about the system before observation • Amount of “ignorance” the observer had about the system before observation

  14. What does H(A) mean? • Amount of “ignorance” the observer had about the system before observation • It quantitatively shows the lack of information (not the presence of information) before observation Information Entropy

  15. Information entropy • Similar to thermodynamic entropy both conceptually and mathematically • Entropy is zero if the system state is uniquely determined with no fluctuation • Entropy increases as the randomness increases within the system • Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur)

  16. Exercise • Prove the following: Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur) • Show that f(p1, p2, …, pn) = - Σi=1…n pi log pi (with Σi=1…n pi = 1) takes its maximum when pi = 1/n • Remove one variable using the constraint • Or use the method of Lagrange multipliers
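This is not a proof, but the claim can be sanity-checked numerically in Python by comparing randomly drawn distributions against the uniform one:

```python
import math
import random

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    probs = [x / sum(w) for x in w]          # a random distribution on n events
    assert entropy(probs) <= entropy(uniform) + 1e-12

print(entropy(uniform))  # log2(4) = 2 bits, the maximum for n = 4
```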

  17. Entropy and complex systems • Entropy shows how much information would be needed to fully specify the system’s state in every single detail • Ordered -> low information entropy • Disordered -> high information entropy • May not be consistent with the usual notion of “complexity” • Multiscale views are needed to address this issue

  18. Information Entropy and Multiple Probability Spaces

  19. Probability of composite events • Probability of composite event (x, y): p(x, y) = p(y, x) = p(x | y) p(y) = p(y | x) p(x) • p(x | y): Conditional probability for x to occur when y already occurred • p(x | y) = p(x) if X and Y are independent from each other

  20. Exercise: Bayes’ theorem • Define p(x | y) using p(y | x) and p(x) • Use the following formulas as needed • p(x) = Σy p(x, y) • p(y) = Σx p(x, y) • p(x, y) = p(y | x) p(x) = p(x | y) p(y)
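A worked example of these formulas in Python; the numbers are hypothetical (a test with 99% sensitivity, a 5% false-positive rate, and a 1% base rate):

```python
# Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y),
# with p(y) obtained by marginalizing: p(y) = sum_x p(y | x) p(x).
p_x = 0.01                # p(disease)
p_y_given_x = 0.99        # p(positive | disease)
p_y_given_notx = 0.05     # p(positive | no disease)

p_y = p_y_given_x * p_x + p_y_given_notx * (1 - p_x)  # marginal p(positive)
p_x_given_y = p_y_given_x * p_x / p_y

print(p_x_given_y)  # p(disease | positive) = 1/6, about 0.167
```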

  21. Product probability space • Prob. space X: {x1, x2}, {p(x1), p(x2)} • Prob. space Y: {y1, y2}, {p(y1), p(y2)} • Product probability space XY: {(x1, y1), (x1, y2), (x2, y1), (x2, y2)}, {p(x1, y1), p(x1, y2), p(x2, y1), p(x2, y2)} Composite events

  22. Joint entropy • Entropy of product probability space XY: H(XY) = - Σx Σy p(x, y) log p(x, y) • H(XY) = H(YX) • If X and Y are independent: H(XY) = H(X) + H(Y) • If Y completely depends on X: H(XY) = H(X) (≥ H(Y))

  23. Conditional entropy • Expected entropy of Y when a specific event occurred in X: H(Y | X) = Σx p(x) H(Y | X=x) = - Σx p(x) Σy p(y | x) log p(y | x) = - Σx Σy p(y, x) log p(y | x) • If X and Y are independent: H(Y | X) = H(Y) • If Y completely depends on X: H(Y | X) = 0

  24. Exercise • Prove the following: H(Y | X) = H(YX) - H(X) • Hint: Use Bayes’ theorem
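The identity can also be verified numerically; the joint distribution in this Python sketch is a hypothetical example, not from the slides:

```python
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y):
p_xy = {('x1', 'y1'): 0.4, ('x1', 'y2'): 0.1,
        ('x2', 'y1'): 0.1, ('x2', 'y2'): 0.4}

# Marginal p(x)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional entropy from its definition:
# H(Y | X) = -sum_x sum_y p(y, x) log p(y | x)
H_Y_given_X = sum(-p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Compare with H(YX) - H(X):
H_XY = H(p_xy.values())
H_X = H(p_x.values())
assert abs(H_Y_given_X - (H_XY - H_X)) < 1e-12
```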

  25. Mutual Information

  26. Mutual information • Conditional entropy measures how much ambiguity still remains on Y after observing an event on X • The reduction of ambiguity on Y by one observation on X is the mutual information: I(Y; X) = H(Y) – H(Y | X)

  27. Symmetry of mutual information I(Y; X) = H(Y) – H(Y | X) = H(Y) + H(X) – H(YX) = H(X) + H(Y) – H(XY) = I(X; Y) Mutual information is symmetric in terms of X and Y
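The symmetric form I(X; Y) = H(X) + H(Y) – H(XY) is convenient to compute; a Python sketch with a hypothetical joint distribution:

```python
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y):
p_xy = {('x1', 'y1'): 0.4, ('x1', 'y2'): 0.1,
        ('x2', 'y1'): 0.1, ('x2', 'y2'): 0.4}

def marginal(p_xy, axis):
    """Sum out the other variable; axis 0 gives p(x), axis 1 gives p(y)."""
    m = {}
    for pair, p in p_xy.items():
        m[pair[axis]] = m.get(pair[axis], 0.0) + p
    return m

H_X = H(marginal(p_xy, 0).values())
H_Y = H(marginal(p_xy, 1).values())
H_XY = H(p_xy.values())

I = H_X + H_Y - H_XY   # symmetric in X and Y by construction
print(I)               # about 0.278 bits
```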

  28. Exercise • Prove the following: • If X and Y are independent: I(X; Y) = 0 • If Y completely depends on X: I(X; Y) = H(Y)

  29. Exercise • Measure the mutual information between the two systems shown in the accompanying figure: [Figure not included in transcript]

  30. Use of mutual information • Mutual information can be used to measure how much interaction exists between two subsystems in a complex system • Correlation only works for quantitative measures and detects only linear relationships • Mutual information works for qualitative (discrete, symbolic) measures and nonlinear relationships as well
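A small Python illustration of this point: for X uniform on {-1, 0, 1} and Y = X², the correlation vanishes while the mutual information equals H(Y), because Y completely depends on X:

```python
import math
from collections import Counter

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic nonlinear function of X.
pairs = [(-1, 1), (0, 0), (1, 1)]   # each with probability 1/3
n = len(pairs)

# Pearson covariance is exactly zero:
mean_x = sum(x for x, y in pairs) / n
mean_y = sum(y for x, y in pairs) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
print(cov)  # 0.0: correlation misses the dependency entirely

# Mutual information does not:
p_xy = {pair: 1 / n for pair in pairs}
p_x, p_y = Counter(), Counter()
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p
I = H(p_x.values()) + H(p_y.values()) - H(p_xy.values())
print(I)  # about 0.918 bits = H(Y)
```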

  31. Information Source

  32. Information source • Sequence of values of a random variable that obeys some probabilistic rules • Sequence may be over time or space • Values (events) may or may not be independent from each other • Example: • Repeated coin tosses • Sound • Visual image

  33. Memoryless and Markov information sources • Memoryless information source, p(0) = p(1) = 1/2: 01010010001011011001101000110 • Markov information source, p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111

  34. Markov information source • Information source whose probability distribution at time t depends only on its immediate past value Xt-1 (or its past n values Xt-1, Xt-2, ..., Xt-n) • Cases with n > 1 can be converted into the n = 1 form by defining composite events • Probabilistic rules are given as a set of conditional probabilities, which can be written in the form of a transition probability matrix (TPM)

  35. State-transition diagram • Markov information source, p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111 [Diagram: states 0 and 1; each state returns to itself with probability 3/4 and switches to the other state with probability 1/4]

  36. Matrix representation • Markov information source, p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111 • The probability vector at time t is the TPM times the probability vector at time t-1: p0(t) = (3/4) p0(t-1) + (1/4) p1(t-1), p1(t) = (1/4) p0(t-1) + (3/4) p1(t-1)

  37. Exercise abcaccaabccccaaabc aaccacaccaaaaabcc • Consider the above sequence as a Markov information source and create its state-transition diagram and matrix representation
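A sketch of one way to attack this exercise in Python, estimating p(y | x) by counting transitions; treating the two printed lines as one continuous sequence is an assumption:

```python
from collections import defaultdict

seq = "abcaccaabccccaaabc" + "aaccacaccaaaaabcc"

# Count transitions x -> y over consecutive symbol pairs.
counts = defaultdict(lambda: defaultdict(int))
for x, y in zip(seq, seq[1:]):
    counts[x][y] += 1

# Normalize each state's outgoing counts to get estimated p(y | x).
tpm = {}
for x, ys in counts.items():
    total = sum(ys.values())
    tpm[x] = {y: c / total for y, c in ys.items()}

for x in sorted(tpm):
    print(x, {y: round(p, 2) for y, p in sorted(tpm[x].items())})
```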

  38. Review: Convenient properties of transition probability matrix • The product of two TPMs is also a TPM • All TPMs have eigenvalue 1 • |λ| ≤ 1 for all eigenvalues of any TPM • If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneracy)

  39. Review: TPM and asymptotic probability distribution • |λ| ≤ 1 for all eigenvalues of any TPM • If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneracy) → This eigenvalue is a unique dominant eigenvalue, and the probability vector will eventually converge to its corresponding eigenvector

  40. Exercise • Calculate the asymptotic probability distribution of the following Markov information source, p(1|0) = p(0|1) = 1/4: 01000000111111001110001111111
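One way to find the asymptotic distribution numerically is power iteration: repeatedly apply the TPM until the probability vector stops changing. A Python sketch for this source:

```python
# TPM for p(1|0) = p(0|1) = 1/4; T[y][x] = p(y | x), columns sum to 1.
T = [[0.75, 0.25],
     [0.25, 0.75]]

p = [1.0, 0.0]   # any initial probability vector works
for _ in range(100):
    p = [T[0][0] * p[0] + T[0][1] * p[1],
         T[1][0] * p[0] + T[1][1] * p[1]]

print(p)  # converges to [0.5, 0.5], the eigenvector for eigenvalue 1
```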

  41. Calculating Entropy of Markov Information Source

  42. Review: Information entropy • Expected information H(A) when one of the individual events happened: H(A) = Σi pi I(pi) = - Σi pi log pi • This applies only to memoryless information sources, in which events are independent of each other

  43. Generalizing information entropy • For other types of information source where events are not independent, information entropy is defined as: H{X} = limk→∞ H(Xk+1 | X1X2…Xk) Xk: k-th value of random variable X

  44. Calculating information entropy of Markov information source (1) H{X} = limk→∞ H(Xk+1 | X1X2…Xk) • This means the expected entropy of the (k+1)-th value given a specific history of past k values • All that matters is the last value of the history, so let’s focus on Xk

  45. Calculating information entropy of Markov information source (2) • p(Xk=x): Probability for the last (k-th) value to be x H(Xk+1 | X1X2…Xk) = Σx p(Xk=x) H(Xk+1 | Xk=x) = - Σx p(Xk=x) Σy ayx log ayx = Σx p(Xk=x) h(ax) • ayx: y-th row, x-th column element in the TPM • h(ax): Entropy of the x-th column vector of the TPM

  46. Calculating information entropy of Markov information source (3) H(Xk+1 | X1X2…Xk) = Sx p(Xk=x) h(ax) • If the information source has only one asymptotic probability distribution q: limk→∞ p(Xk=x) = qx (q’s x-th element) H{X} = limk→∞ H(Xk+1 | X1X2…Xk) = h·q • h: A row vector whose x-th element is h(ax)

  47. Calculating information entropy of Markov information source (4) H{X} = limk→∞ H(Xk+1 | X1X2…Xk) = h·q • Information entropy of Markov information source is given by the average of entropies of its TPM’s column vectors weighted by its asymptotic probability distribution • If the information source has only one asymptotic probability distribution

  48. Exercise • Calculate the information entropy of the following Markov information sources we discussed earlier: 01000000111111001110001111111 abcaccaabccccaaabc aaccacaccaaaaabcc
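For the binary source, the H{X} = h·q recipe can be checked with a short Python sketch; the abc source is handled the same way once its TPM and asymptotic distribution are known:

```python
import math

def H(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# TPM of the binary source with p(1|0) = p(0|1) = 1/4; columns are p(. | x).
T = [[0.75, 0.25],
     [0.25, 0.75]]

# Asymptotic distribution q; (1/2, 1/2) here by symmetry.
q = [0.5, 0.5]

# h(ax): entropy of each column; H{X} = sum_x q_x h(a_x).
h = [H([T[y][x] for y in range(2)]) for x in range(2)]
HX = sum(qx * hx for qx, hx in zip(q, h))

print(HX)  # about 0.811 bits per symbol
```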

  49. Summary • Complexity of a system may be characterized using information • Length of description • Entropy (ambiguity of knowledge) • Mutual information quantifies the coupling between two components within a system • Entropy may be measured for Markov information sources as well
