Coarse-Graining, Symbolic Description, and Complexity

Coarse-Graining, Symbolic Description, and Complexity
Bailin Hao
Institute of Theoretical Physics, Academia Sinica, Beijing
T-Life Research Center, Fudan University, Shanghai
The Santa Fe Institute, New Mexico
http://tlife.fudan.edu.cn/ http://www.itp.ac.cn/~hao/
CSSS2007 Beijing

There are complex systems and complex behavior in natural and social phenomena. • Complexity goes with specificity. There is no universal measure of complexity. • One must specify the phenomenon under consideration and set a framework for study. • This lecture will describe one such framework. (Dave Feldman has provided most of the prerequisites.)

Start from an observation
u d c s b t (Quarks with charge, mass, flavor, charm, …)
p n e (Particles with charge, mass, spin, magnetic moment, …)
H C N O P … (Atoms with atomic number, ion radius, valence, affinity, …)
H2O NO CO2 … (Molecules with molecular weight, polarity, color, …)
a c g t (Nucleotides with strong or weak coupling)
A D E F G H … W Y V (Amino acids with different physico-chemical properties)
BRCA1 PDGF (Genes, proteins, “words” taken as single symbols appear in pathways and networks)
… … …

(Almost) all symbols in science are embodiments of coarse-graining. Coarse-graining may lead to rigorous results. Geoffrey West: had Galileo been equipped with our high-precision measuring instruments, he would not have been able to discover the law of free-falling bodies and would have had to write a 42-volume Treatise on Falling Bodies.

Coarse-Grained Description of Nature ↓ Use of Symbols ↓ Symbolic Sequences ↓ Language, Grammar, Automaton

A Reminder: Theorem 3 in Shannon’s Famous 1948 Paper • Theorem 3: Given any ε>0 and δ>0, we can find an N0 such that the sequences of any length N≥N0 fall into two classes: • A set whose total probability is less than ε. • The remainder, all of whose members have probabilities p satisfying the inequality $\left| \frac{\log p^{-1}}{N} - H \right| < \delta$, where H is the entropy of the source.

Intuitive Explanation of the Theorem • There are $2^N$ sequences of length N over the alphabet (0, 1) • Roughly speaking, these sequences are divided into two groups when N is very large • A big group of typical sequences • A small group of atypical sequences: $0^N$, $1^N$, $(01)^N$, $(10)^N$, … and more complex ones which must be characterized almost individually
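The split into typical and atypical sequences can be checked numerically. The sketch below is an illustration only, assuming a memoryless binary source that emits 1 with probability p (the source model and the function name `typical_set` are assumptions, not from the lecture). It groups the sequences of length N by their number of ones and reports the total probability of the typical class together with the fraction of all sequences that belong to it.

```python
import math

def typical_set(N, p, delta):
    """Return (total probability, fraction of all 2^N sequences) of the
    delta-typical sequences of a Bernoulli(p) source, with 0 < p < 1."""
    q = 1.0 - p
    H = -(p * math.log2(p) + q * math.log2(q))   # entropy in bits per symbol
    count = 0        # number of typical sequences (exact integer)
    mass = 0.0       # their total probability
    for k in range(N + 1):                        # k = number of ones
        log2_prob = k * math.log2(p) + (N - k) * math.log2(q)
        if abs(-log2_prob / N - H) < delta:       # Shannon's inequality
            n_seq = math.comb(N, k)               # sequences with k ones
            count += n_seq
            mass += 2.0 ** (math.log2(n_seq) + log2_prob)
    return mass, count / 2 ** N

if __name__ == "__main__":
    mass, fraction = typical_set(N=1000, p=0.2, delta=0.1)
    print(f"probability of typical class: {mass:.6f}")
    print(f"fraction of all sequences:    {fraction:.3e}")
```

For a biased source the typical class carries almost all of the probability while containing only a tiny fraction of the $2^N$ sequences; for p = 0.5 every sequence is typical.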

Our Starting Points: • Complexity goes with specificity. • Therefore, one has to look at real data. • These data are often noisy, incomplete, and of low signal-to-noise ratio. This is especially true for biological data. • Therefore, statistical methods are a must, but one should go beyond statistics. • Visualization with a certain degree of coarse-graining is crucial for highlighting the “regularities” immersed in huge amounts of data.

Symbolic sequences naturally lead to language and grammar

Language and Grammatical Complexity
Alphabet Σ
Example 1. Σ = {a, c, g, t}
Example 2. Σ = {A, C, D, …, W, Y}
Example 3. Σ = {a, …, z, A, …, Z, +, –, …}
All possible strings made of symbols from the alphabet, plus the empty string ε → Σ*
Any subset of Σ* is called a language over the alphabet Σ
Grammar = {Alphabet, Initial symbols, Production Rules}

Classification of Formal Languages • Chomsky Hierarchy: sequential production rules • Lindenmayer Systems: parallel production rules

Chomsky Hierarchy of Formal Languages

A Finite State Automaton (FA) • (i) State diagram: an arc labeled R leads from node a to node b • (ii) A transfer function: δ(a, R) = b

FA: Finite State Automata • Deterministic FA • Non-Deterministic FA • Equivalence of DFA and NDFA: subset construction • Minimal DFA • Myhill-Nerode theorem (1958): number of nodes in minDFA

A Pushdown Automaton • Pushdown list (stack) • First In, Last Out (FILO)

A Turing Machine • Alan M. Turing (1912-1954) • FA + infinite R/W tape • Church-Turing Thesis (1936): Any effective (mechanical) computation can be carried out by a Turing machine

Example: {a^i b^i c^i | i > 0} is a CSL
Terminals = {a, b, c}
Non-terminals = {A, B}
Sequential rules: B → aBAc | abc; bA → bb; cA → Ac
Derivations:
B ⇒ abc
B ⇒ aBAc ⇒ aabcAc ⇒ aabAcc ⇒ aabbcc
B ⇒ aBAc ⇒ aaBAcAc ⇒ aaBAAcc ⇒ aaabcAAcc ⇒ aaabAcAcc ⇒ aaabAAccc ⇒ aaabbAccc ⇒ aaabbbccc
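The sequential rules of this example (B → aBAc | abc, bA → bb, cA → Ac, start symbol B) can be explored mechanically. The following is a minimal sketch, not the lecturer's code: it does a breadth-first search over sentential forms, applying one rule at one position per step, and collects every all-terminal word up to a length bound. The bound is safe because no rule shortens a string. The name `generate` is hypothetical.

```python
from collections import deque

# Production rules of the example grammar: (left-hand side, right-hand side).
RULES = [("B", "aBAc"), ("B", "abc"), ("bA", "bb"), ("cA", "Ac")]

def generate(max_len, start="B"):
    """Return all terminal words of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():            # only terminals a, b, c remain
            words.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:          # rewrite each occurrence of lhs in turn
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)

if __name__ == "__main__":
    print(generate(9))
```

With a bound of 9 the search yields the three shortest words of the language.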

Classification of Formal Languages • Chomsky Hierarchy: sequential production rules • Lindenmayer Systems: parallel production rules

Development of Anabaena catenula (a bead-chain alga of the genus Anabaena)
Alphabet: Σ = {ar, al, br, bl}
Production rules P: ar → al br; al → bl ar; br → ar; bl → al
Initial symbol (axiom): ω = ar
Grammar: G = (Σ, P, ω)
Language: L(G) ⊆ Σ*
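Parallel rewriting is easy to simulate. The sketch below assumes the production rules ar → al br, al → bl ar, br → ar, bl → al with axiom ar, and replaces every cell simultaneously at each step. The function name `develop` is hypothetical.

```python
# D0L system for Anabaena catenula: every symbol is rewritten at each step.
RULES = {
    "ar": ["al", "br"],
    "al": ["bl", "ar"],
    "br": ["ar"],
    "bl": ["al"],
}

def develop(axiom, steps):
    """Return the list of words (lists of cell symbols) for steps 0..steps."""
    word = list(axiom)
    history = [word]
    for _ in range(steps):
        # Parallel production: rewrite all cells of the current word at once.
        word = [new for cell in word for new in RULES[cell]]
        history.append(word)
    return history

if __name__ == "__main__":
    for n, word in enumerate(develop(["ar"], 5)):
        print(n, len(word), " ".join(word))
```

Under these rules the filament lengths follow the Fibonacci recurrence (1, 2, 3, 5, 8, …).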

Lindenmayer Systems: parallel production rules. Finer classification:
D0L – Deterministic, no interaction, i.e., context-free
0L – non-deterministic, no interaction
IL – non-deterministic, with Interaction, i.e., context-sensitive
T0L – with Table of production rules
TIL –
E0L – Extended to non-terminal symbols
ET0L –
EIL = REL of Chomsky

[Figure: language classes FIN, RGL, CFL, CSL, REL, and D0L.] RGL: Regular • CFL: Context-Free • CSL: Context-Sensitive • REL: Recursively Enumerable

[Figure: Chomsky classes (0: REL, 1: CSL, 2: CFL, 3: RGL) compared with Lindenmayer classes (EIL, ET0L, IL, E0L, T0L, 0L, D0L) and the Indexed languages (IND).]

Example à la Lindenmayer
L = {a^i b^i c^i | i > 0} (CSL)
G = (Σ, T, ω), ω = abc
Σ = {a, b, c}
T = {t1, t2}
t1 = {a → aa, b → bb, c → cc}
t2 = {a → , b → , c → }
T0L

Dyck language: A language of nested parentheses • Many but finite types of parentheses • Matched parentheses • Finite depth of nesting • Context-free language (CFL) • Tree structures, list structures, RNA secondary structures (without pseudoknots), etc.
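Recognizing matched parentheses is the textbook job of a pushdown stack. The sketch below is illustrative only: the three bracket types and the optional `max_depth` parameter (modeling "finite depth of nesting") are assumptions made for the example, and `is_dyck` is a hypothetical name.

```python
# Closing bracket -> matching opening bracket (a finite set of types).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_dyck(word, max_depth=None):
    """Return True if word is a correctly nested bracket string."""
    stack = []                       # the pushdown list (first in, last out)
    for ch in word:
        if ch in OPENERS:
            stack.append(ch)
            if max_depth is not None and len(stack) > max_depth:
                return False         # nesting deeper than allowed
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False         # unmatched or mismatched closer
        else:
            return False             # symbol outside the alphabet
    return not stack                 # every opener must be closed

if __name__ == "__main__":
    for w in ["([]{})", "([)]", "((", ""]:
        print(repr(w), is_dyck(w))
```

The same check applies to dot-free bracket notations of tree-like structures, since each opener is paired with exactly one later closer.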

Factorizable Languages • Symbolic dynamics leads to factorizable languages • A complete genome defines a factorizable language • An amino acid sequence with unique reconstruction (at a certain K) defines a factorizable language • More on factorizable languages in the next lecture

Coarse-Grained Dynamics ↓ Symbolic Dynamics

Graphic iteration of a map

Coarse-Graining in Dynamics • Phase space → L, R • Numerical orbit → symbolic orbit • Many to one correspondence • Possibility for classification
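The passage from numerical to symbolic orbits can be made concrete. The sketch below assumes the logistic map f(x) = r x (1 − x) with critical point C at x = 0.5 as a representative unimodal map; the map, the parameter value, and the function name `symbolic_orbit` are assumptions chosen for illustration, not taken from the lecture.

```python
def symbolic_orbit(x0, length, r=4.0):
    """Coarse-grain the orbit of x0 under f(x) = r*x*(1-x) into L/C/R symbols."""
    symbols = []
    x = x0
    for _ in range(length):
        if x < 0.5:
            symbols.append("L")      # left of the critical point
        elif x > 0.5:
            symbols.append("R")      # right of the critical point
        else:
            symbols.append("C")      # exactly at the critical point
        x = r * x * (1.0 - x)        # one iteration of the map
    return "".join(symbols)

if __name__ == "__main__":
    # Two different numerical orbits share the same symbolic prefix.
    print(symbolic_orbit(0.10, 6))
    print(symbolic_orbit(0.11, 6))
```

Nearby initial points produce the same leading symbols, which is the many-to-one correspondence that makes classification possible.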

Basic properties • Natural order on the interval: L < C < R • Monotonicity: on L, f_L is increasing (↑); on R, f_R is decreasing (↓) • Parity: L is +; R is – • L preserves order, R reverses order • Continuity: L ← C → R • Inverse branches: L → L(y) ≡ $f_L^{-1}(y)$ • R → R(y) ≡ $f_R^{-1}(y)$
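The parity rule turns the natural order L < C < R into an order on symbolic sequences: compare the first differing symbols, and reverse the verdict if the common prefix contains an odd number of R's. A minimal sketch follows (the function name `compare` is hypothetical, and sequences where one is a prefix of the other are simply reported as equal).

```python
from functools import cmp_to_key

NATURAL = {"L": -1, "C": 0, "R": 1}   # natural order on the interval

def compare(s, t):
    """Return -1, 0 or 1 as sequence s lies left of, with, or right of t."""
    sign = 1                           # parity of the common prefix so far
    for x, y in zip(s, t):
        if x != y:
            natural = 1 if NATURAL[x] > NATURAL[y] else -1
            return sign * natural
        if x == "R":                   # each R reverses the order
            sign = -sign
    return 0                           # no difference within the shared length

if __name__ == "__main__":
    points = ["C", "RLRRC", "LRRC", "RRC", "RC"]
    print(sorted(points, key=cmp_to_key(compare)))
```

Applied to the five shifts of the sequence RLRRC, the rule places them from left to right as LRRC, C, RRC, RC, RLRRC.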

Infinitely many numerical orbits • Only two symbolic orbits: $L^\infty$ and $RL^\infty$ • Simple dynamics, simple language: 2 word types only

Languages in unimodal maps: 1991 • Feigenbaum attractor corresponds to a CSL; • Are there other CSLs and CFLs? • Periodic and eventually periodic orbits are RGLs • Are there other RGLs?

Periodic orbit (RLRRC)∞ and Finite State Automaton • x0=CRLRRC… • x1=RLRRC… • x2=LRRC… • x3=RRC… • x4=RC…

Transformation of subintervals under (RLRRC)∞ • L: a → c+d • R: b → d • R: c → b+c • R: d → a

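Transfer Matrix and Transfer Function
States: a, b, c, d • Input: R, L
Transfer matrix (rows and columns in the order a, b, c, d):
$$\begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$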

Nondeterministic Finite State Automaton for (RLRRC)∞

Subset construction

Deterministic Finite State Automaton for (RLRRC)∞
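The step from the nondeterministic to the deterministic automaton can be sketched with the subset construction. The transitions below are read off the subinterval transformations L: a → c+d, R: b → d, R: c → b+c, R: d → a. Two choices are assumptions made for this illustration rather than statements from the lecture: the construction starts from the set of all four subintervals, and every reachable subset counts as accepting (a word is rejected only when no transition exists). Function names are hypothetical.

```python
from collections import deque

# NFA: (state, input symbol) -> set of possible next states.
NFA = {
    ("a", "L"): {"c", "d"},
    ("b", "R"): {"d"},
    ("c", "R"): {"b", "c"},
    ("d", "R"): {"a"},
}
START = frozenset("abcd")   # assumption: an orbit may start in any subinterval

def subset_construction(nfa, start, alphabet=("L", "R")):
    """Return (DFA transition table, set of reachable DFA states)."""
    dfa = {}
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for sym in alphabet:
            # Union of the NFA moves from every state in the subset.
            target = frozenset(t for s in subset for t in nfa.get((s, sym), ()))
            if not target:
                continue              # no move: the word is inadmissible
            dfa[(subset, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, seen

def accepts(dfa, start, word):
    """Follow word through the DFA; fail as soon as a transition is missing."""
    state = start
    for sym in word:
        state = dfa.get((state, sym))
        if state is None:
            return False
    return True

DFA, DFA_STATES = subset_construction(NFA, START)

if __name__ == "__main__":
    for subset in sorted(DFA_STATES, key=sorted):
        print("".join(sorted(subset)))
    print(accepts(DFA, START, "RLRR"), accepts(DFA, START, "LL"))
```

Under these assumptions the construction reaches four subsets, and the word LL is rejected because the only L-transition leads into subintervals that lie on the R side.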

Are there other RGLs in unimodal maps? Theorem (Xie, 1993): In the dynamical languages of unimodal maps, the class of RGLs contains only periodic and eventually periodic sequences.

Fibonacci sequences • Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, 13, … • $F_0=0$, $F_1=1$, $F_n=F_{n-1}+F_{n-2}$ • Periodic orbits with period $F_n$, n=0,1,2,3,… • How about n→∞? • There are many different Fibonacci sequences in the unimodal map

How to go beyond RGLs? Block concatenation: $b_{2n}=b_{2(n-1)}\,b_{2n-1}$, $b_{2n+1}=b_{2n}\,b_{2n-1}$ • (a) b0=L, b1=RR • (b) b0=R, b1=LR • (c) b0=L, b1=RL • (d) b0=R, b1=LL
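The block concatenation can be unrolled directly. The sketch below reads the rule as: for even index k, b_k = b_{k-2} b_{k-1}; for odd index k, b_k = b_{k-1} b_{k-2}. This is one reading of the slide's formula, and the function name `blocks` is hypothetical.

```python
def blocks(b0, b1, count):
    """Return the first `count` blocks b_0, b_1, ... of the concatenation rule."""
    seq = [b0, b1]
    for k in range(2, count):
        if k % 2 == 0:
            seq.append(seq[k - 2] + seq[k - 1])   # b_2n   = b_2(n-1) b_2n-1
        else:
            seq.append(seq[k - 1] + seq[k - 2])   # b_2n+1 = b_2n b_2n-1
    return seq[:count]

if __name__ == "__main__":
    for name, (b0, b1) in {"a": ("L", "RR"), "b": ("R", "LR"),
                           "c": ("L", "RL"), "d": ("R", "LL")}.items():
        seq = blocks(b0, b1, 6)
        print(name, [len(b) for b in seq], seq[4])
```

In all four cases the block lengths grow as 1, 2, 3, 5, 8, 13, …, which is why these orbits are called Fibonacci sequences.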

How about $(b_n)^\infty$? • Finite n: must be RGL • Infinite n? The closure at n→∞ may be non-RGL • Indeed, it is non-RGL • Is it CFL or CSL? • How to comprehend infinite “periodic” orbits? Transfer matrices come to our aid

Transfer matrix for case (a)

Xie Huimin and a PhD student proved 50 lemmas in 40 days, and the last lemma says: case (a) corresponds to a CSL.

Transfer matrix for case (b)

Transfer matrix for case (c)

Transfer matrix for case (d)

It is easier to prove that cases (b), (c), and (d) all correspond to CSLs.

Conjecture: there is no CFL in unimodal maps (Xie, 1996). An open conjecture for 11 years

Dynamical languages in unimodal maps: 1991, 1999
