More on complexity measures Statistical complexity J. P. Crutchfield. The calculi of emergence. Physica D. 1994
[Figures: complexity C against randomness H, with “complex” and “random” labelled; mutual information I against H] • Entropy and algorithmic complexity associate maximum complexity with randomness • pure order and pure noise are not “complex” • complex systems have • intricate structure on multiple scales • repeating patterns • continual variation, … • complexity lies between order and chaos • Wolfram’s class 4 CAs • Langton’s “edge of chaos” • Mutual information shows complexity • RBN transition example (also k-SAT): • are there other measures like this?
when randomness = noise • the measures so far assume that randomness is information • even logical depth • Randomness is not very “deep” information • Sometimes, the “randomness” actually is information • the output of good compression algorithms is highly “random” • else the remaining structure could be used to compress it more • “any sufficiently advanced communication is indistinguishable from noise” • crypto functions output “random” strings • else the remaining structure could be used to break the code
randomness and noise • in the real world, (some) randomness is just “noise” • of no interest, carrying no “information” • these pictures are all different microscopically, but all just “white noise” macroscopically • the differences are not important • information measures “overfit” noise as data • this kind of noisy randomness is intuitively simple • a small change to the noise is just the same noise
to model a coin toss … [Figure: stochastic automaton with transitions H | ½ and T | ½] • how would you create an ensemble of random bit strings? • … just toss a coin! • in other words, use a stochastic automaton • that’s quite a short description • conforming to our intuition that random strings are not very complex
Statistical complexity • In certain circumstances, we can use the theory of discrete computation and statistics to create equivalent models • Needs a discrete stochastic process that is conditionally stable • Future states do not depend on time, but only on previous states • Complexity, C, is the size of a minimal model yielding a finite description that is at the least computationally powerful level • infer the machine from a data ensemble • The collection of observed strings generated by the process of interest • Statistical complexity ignores the “computational resource” • So randomness and periodicity have zero complexity J. P. Crutchfield. The calculi of emergence. Physica D. 1994
the inferred minimal model is called an ε-machine • minimal model • size of the minimal stochastic machine • finite description • size of machine does not grow unboundedly with the size of the state • least computationally powerful level • e.g. finite state automaton, stack machine, UTM • Intuition: • Each observation represents a state, which incorporates an indirect indication of the hidden environment • States that lead to the same next state help to predict the environment • Causal states • An ε-machine captures a minimal sequence of causal states J. P. Crutchfield. The calculi of emergence. Physica D. 1994
Consider a simple process • The process is a simple automaton • A system with a two-symbol alphabet, α = {0,1} • Two recurrent states, A and B • State A can, with equal probability, • emit a 0 and return to itself • emit a 1 and go to state B • State B always emits 1 and goes to A • But all we have is a black-box process • This is Weiss’s “even process”: • between any two 0s, the 1s always come in blocks of even length C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series. 2002. http://arxiv.org/pdf/cs/0210025v3.pdf
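The two-state automaton on this slide can be simulated directly. The following is a minimal sketch, not the code used in the cited paper; the function names `even_process` and `interior_one_runs` and the choice to start in state A are assumptions made for illustration.

```python
import random


def even_process(n, seed=None):
    """Emit n symbols from the two-state automaton (assumed to start in A)."""
    rng = random.Random(seed)
    state = "A"
    out = []
    for _ in range(n):
        if state == "A":
            if rng.random() < 0.5:
                out.append("0")      # A emits 0 and returns to itself
            else:
                out.append("1")      # A emits 1 and goes to B
                state = "B"
        else:
            out.append("1")          # B always emits 1 and goes to A
            state = "A"
    return "".join(out)


def interior_one_runs(s):
    """Lengths of the runs of 1s that have a 0 on both sides."""
    # Dropping leading/trailing 1s leaves only runs bounded by 0s.
    return [len(run) for run in s.strip("1").split("0") if run]


sample = even_process(40, seed=1)
print(sample)
print(interior_one_runs(sample))
```

Every run length reported by `interior_one_runs` should be even, which is the property that gives the process its name.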
Record the process output • We need to deduce the automaton from data observations • Run the process many times • To get statistically useful data • e.g. 10^4 runs to word length = 4 C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series. 2002. http://arxiv.org/pdf/cs/0210025v3.pdf
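Recording the output amounts to tabulating how often each word of a given length appears. The sketch below assumes the process is the two-state "even process" automaton, and that each run starts from its stationary state distribution (P(A) = 2/3, P(B) = 1/3); that starting rule and the names `sample_word` and `word_frequencies` are illustrative assumptions, not taken from the slides.

```python
import random
from collections import Counter


def sample_word(length, rng):
    """One run of the even process, observed for `length` symbols."""
    # Assumption: start from the stationary distribution over states.
    state = "A" if rng.random() < 2 / 3 else "B"
    symbols = []
    for _ in range(length):
        if state == "A" and rng.random() < 0.5:
            symbols.append("0")      # A emits 0 and stays in A
        elif state == "A":
            symbols.append("1")      # A emits 1 and moves to B
            state = "B"
        else:
            symbols.append("1")      # B always emits 1 and returns to A
            state = "A"
    return "".join(symbols)


def word_frequencies(runs=10_000, length=4, seed=0):
    """Relative frequency of each observed word over many runs."""
    rng = random.Random(seed)
    counts = Counter(sample_word(length, rng) for _ in range(runs))
    return {word: n / runs for word, n in sorted(counts.items())}


for word, freq in word_frequencies().items():
    print(word, freq)
```

Words containing `010` should never appear in the table, since a single 1 cannot sit between two 0s; the missing words are part of the evidence from which the machine is inferred.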
example : “even” process (1) • Work out probabilities and infer a machine • “homogenisation” because homogeneous states are merged • Merge is the main source of error – need a lot of observations For full calculation, see: C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series. 2002. http://arxiv.org/pdf/cs/0210025v3.pdf
example : “even” process (2) • Check all states have incoming transitions • Reachability • Remove transient states • A and B form a transient cycle • The only exit is to produce a 0 and go to C • Every C state goes to C (adding 0) or D (adding 1) • Every state in D goes to C (adding 1) • “Determinisation” • Final ε-machine has states C and D only C. R. Shalizi, K. L. Shalizi, J. P. Crutchfield. An algorithm for pattern discovery in time series. 2002. http://arxiv.org/pdf/cs/0210025v3.pdf
ε-machines and stability • Replicating a process in an ε-machine requires stability • Previous states aren’t always “causal” in unstable systems • Stability is related to temporal scale • Recall flocking • At the level of birds, apparently arbitrary motion, few patterns • At the level of the flock, coherent, apparently co-ordinated motion • So, can change level (scale) to one where there is stability • A bit like choosing the level to represent in differential equations • We can tell a system is not suitably stable if the inferred ε-machine changes with the word length • That is, as the process runs over time, the ε-machine has to change to express its statistical behaviour
ε-machines and continuous systems • Most natural systems are continuous • Symbolic dynamics used to extract discrete time systems • Partition the state space and label each partition with a symbol • Over time, each point in the state space has a sequence of symbols • Its symbol at each observation point in its past and future • Loses information • Often a deterministic continuous system gives a stochastic discrete system • [Figure: state space partitioned into regions labelled Ц, Ж, Ђ, Ϡ, Ґ, with the trajectory of point a] Point a is in region Ж at time t. Over a series of discrete time observations, a moves through different regions: … ЖЖЖϠϠЂЂЂЂЖЖ … http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/symbolic-dynamics.html (and citations)
Symbolic dynamics (1) [Figure: phase space partitioned into four regions labelled a, b, c, d] • recast a continuous (space / time) dynamical system into a discrete one • partition the continuous phase space U into a finite number of sets U_i, each labelled with a unique element from a finite alphabet • observe the system at discretised time intervals, and note the label of the set U_i it occupies, to give a sequence of symbols: • d c a a b d d a a … • rationale : sequences represent “results” of “measurements” of the underlying system
Symbolic dynamics (2) [Figure: regions labelled a, b, c, d; parameter range 3.5 < λ < 4, with λ = 3.5699… marked] • the symbolic dynamics of the system is the set of all sequences that can be produced (different initial conditions, etc) • defines a language • analyse the dynamics of these sequences • using entropy, mutual information, ε-machines, etc. • e.g. Crutchfield’s analysis of the complexity and entropy of the logistic map: see J. P. Crutchfield. The calculi of emergence. Physica D. 1994
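The partition-and-label procedure from the two symbolic dynamics slides can be sketched for the logistic map x → λx(1 − x). The two-set partition at x = 0.5, the labels "a" and "b", the burn-in length, and the function name `logistic_symbols` are all assumptions chosen for illustration; they are not necessarily the partition used in Crutchfield's analysis.

```python
def logistic_symbols(lam, x0, n, burn_in=100):
    """Symbol sequence of the logistic map under an assumed binary partition."""
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = lam * x * (1 - x)
    symbols = []
    for _ in range(n):
        # Assumed partition: [0, 0.5) is labelled "a", [0.5, 1] is labelled "b".
        symbols.append("a" if x < 0.5 else "b")
        x = lam * x * (1 - x)
    return "".join(symbols)


print(logistic_symbols(2.5, 0.2, 30))   # settles on a fixed point: one symbol
print(logistic_symbols(3.9, 0.2, 30))   # chaotic regime: irregular sequence
```

The resulting strings can then be fed to the same word-counting and machine-inference steps used for any other discrete process. A different partition generally gives a different symbolic process, which is one way the discretisation loses information.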
Analysis results for logistic map [Figure: complexity C against randomness H] • periodic behaviour, small H, small C • automaton size = the period • chaotic behaviour, large H, small C • a small automaton captures the random behaviour (“coin toss”) • Complex behaviour, mid H, large C • near the transition from periodic to chaotic behaviour (“edge of chaos”) there is structure “on all scales” J. P. Crutchfield. The calculi of emergence. Physica D. 1994
Another complexity measure: multi-information • recall mutual information between two systems: $I(X;Y) = H(X) + H(Y) - H(X,Y)$ • where H(X) is the entropy of system X • H(X,Y) is the joint entropy of the systems X and Y • I = 0 if X and Y are independent • For subsystems X1, X2, and the overall system X1,2, this gives: $I(X_1;X_2) = H(X_1) + H(X_2) - H(X_{1,2})$
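As a small worked illustration of mutual information as entropy of X plus entropy of Y minus their joint entropy, the sketch below computes it from a joint probability table. The table layout (`joint[x][y]` holding P(X = x, Y = y)) and the function names are assumptions made for this example.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    pxy = [p for row in joint for p in row]     # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)


independent = [[0.25, 0.25], [0.25, 0.25]]      # two independent fair bits
copied = [[0.5, 0.0], [0.0, 0.5]]               # Y is always equal to X
print(mutual_information(independent))
print(mutual_information(copied))
```

The independent table gives zero mutual information, while the copied bit gives one full bit, matching the slide's statement that I = 0 when X and Y are independent.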
multi-information (1) • multi-information generalises this to n subsystems of an overall system • System $X_S = X_{1,2,\ldots,n}$ • Subsystems $X_1, X_2, \ldots, X_n$ • $MI(X_S) = \sum_{i=1}^{n} H(X_i) - H(X_S)$ • where $H(X_i)$ is the entropy of subsystem $X_i$ and $H(X_S)$ is the joint entropy of the whole system • MI = 0 if all the subsystems are independent M. Studeny, J. Vejnarova. The multiinformation function as a tool for measuring stochastic dependence. In Learning in Graphical Models. Kluwer, 1998
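Multi-information, taken here as the sum of the subsystem entropies minus the joint entropy of the whole system, can be computed for a small system whose joint distribution is known. In this sketch the joint distribution is a dictionary from state tuples to probabilities; that representation and the function names are assumptions for illustration.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a {state: probability} dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, indices):
    """Marginal distribution over the subsystems at the given positions."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in indices)] += p
    return dict(out)


def multi_information(joint):
    """Sum of single-subsystem entropies minus the joint entropy."""
    n = len(next(iter(joint)))
    singles = sum(entropy(marginal(joint, [i])) for i in range(n))
    return singles - entropy(joint)


independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(multi_information(independent))
print(multi_information(copies))
```

Three independent fair bits give zero, and three copies of one fair bit give two bits of multi-information (three bits of single-subsystem entropy, one bit of joint entropy).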
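multi-information (2) [Figure: tree with XS at the top, split into Xa and Xb, over subsystems X1, X2, …, Xk, …, Xn] • now consider partitioning the top level system XS into two subcomponents: Xa, comprising subsystems X1,…, Xk, and Xb, comprising subsystems Xk+1,…, Xn • the relationship between the multi-information of the whole system and its two big components is $MI(X_S) - MI(X_a) - MI(X_b) = H(X_a) + H(X_b) - H(X_S)$ • rearranging, and substituting the mutual information $I(X_a;X_b) = H(X_a) + H(X_b) - H(X_S)$: $MI(X_S) = MI(X_a) + MI(X_b) + I(X_a;X_b)$ • so (unless the subsystems Xa and Xb are independent) the MI of the whole is bigger than the parts: $MI(X_S) \ge MI(X_a) + MI(X_b)$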
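multi-information (3) [Figure: system XS with subsystems X1, X2, X3, and its three subcomponents of two subsystems: {X1, X2}, {X1, X3}, {X2, X3}] • instead of considering one big subcomponent comprising k subsystems, now consider all possible such big subcomponents of k subsystems, each comprising subsystems $X_{i_1}, \ldots, X_{i_k}$ • consider the average multi-information of these, $\langle MI(X^k) \rangle$, taken over all choices of k subsystems out of n • note that $\langle MI(X^1) \rangle = 0$ and $\langle MI(X^n) \rangle = MI(X_S)$ • given the MI of the whole is bigger than the parts, we have $\langle MI(X^k) \rangle \le \langle MI(X^{k+1}) \rangle$ • so the MI increases with the size of the subsystems considered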
multi-information = complexity • complexity is the difference between a linear increase of this average and its actual increase: $C = \sum_{k=1}^{n} \left[ \frac{k}{n} MI(X_S) - \langle MI(X^k) \rangle \right]$, where $\langle MI(X^k) \rangle$ is the average multi-information over all subcomponents of k subsystems • C ≥ 0 • C is low if the system is random • all subsystems are independent, and so MI = 0 • C is low if the system is homogeneously structured • average MI increases linearly • C is high in the intermediate case, inhomogeneous groupings and clumpings • high, non-linearly increasing, average MIs G. Tononi, et al. A measure for brain complexity: relating functional segregation and integration in the nervous system. PNAS 91:5033-37, 1994
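The complexity on this slide compares, for each subset size k, a linear share k/n of the whole system's multi-information with the average multi-information over all subsets of k subsystems. The sketch below computes that sum by brute force for a small system with a known joint distribution; the dictionary representation and the function names (including `tononi_complexity`) are assumptions for illustration, and the enumeration of all subsets is only practical for small n.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a {state: probability} dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, indices):
    """Marginal distribution over the subsystems at the given positions."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in indices)] += p
    return dict(out)


def multi_information(joint, indices):
    """Multi-information of the subcomponent made of the given subsystems."""
    singles = sum(entropy(marginal(joint, [i])) for i in indices)
    return singles - entropy(marginal(joint, indices))


def average_mi(joint, n, k):
    """Average multi-information over all subsets of k out of n subsystems."""
    subsets = list(combinations(range(n), k))
    return sum(multi_information(joint, s) for s in subsets) / len(subsets)


def tononi_complexity(joint):
    """Sum over k of (k/n) * MI(whole) minus the average MI at size k."""
    n = len(next(iter(joint)))
    whole = multi_information(joint, range(n))
    return sum(k / n * whole - average_mi(joint, n, k) for k in range(1, n + 1))


independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(tononi_complexity(independent))
print(tononi_complexity(copies))
```

For three independent fair bits every term vanishes, so the complexity is zero, consistent with the slide's claim that C is low for a random system.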
which complexity measure? • unconditional entropy is probably not appropriate • counts randomness as maximally “complex” • entropy variance readily calculated • between different space / time parts of self • algorithmic complexity K • useful for theoretical analyses, but not for analysing practical results • conditional entropy/mutual information/multi-information • between two systems • which can be between different space / time parts of self • appears to be maximised around interesting transitions • or between hierarchical levels of a system • statistical complexity C • of single system; appears to be maximised at “edge of chaos”
Some general sources • http://www.scholarpedia.org/article/Complexity • R. Badii, A. Politi. Complexity. Cambridge University Press. 1997 • J. P. Sethna. Statistical mechanics. Oxford University Press. 2006