Explore the concepts of coarse-graining and symbolic description in complex systems. This lecture discusses the use of symbols, languages, grammars, and automata to study and understand complex phenomena.
Coarse-Graining, Symbolic Description, and Complexity • Bailin Hao • Institute of Theoretical Physics, Academia Sinica, Beijing • T-Life Research Center, Fudan University, Shanghai • The Santa Fe Institute, New Mexico • http://tlife.fudan.edu.cn/ http://www.itp.ac.cn/~hao/ • CSSS2007 Beijing
There are complex systems and complex behavior in natural and social phenomena. • Complexity goes with specificity. There is no universal measure of complexity. • One must specify the phenomenon under consideration and set a framework for study. • This lecture will describe one such framework. (Dave Feldman has provided most of the prerequisites.)
Start from an observation • u d c s b t (Quarks with charge, mass, flavor, charm, …) • p n e (Particles with charge, mass, spin, magnetic moment, …) • H C N O P … (Atoms with atomic number, ion radius, valence, affinity, …) • H2O NO CO2 … (Molecules with molecular weight, polarity, color, …) • a c g t (Nucleotides with strong or weak coupling) • A D E F G H … W Y V (Amino acids with different physico-chemical properties) • BRCA1 PDGF (Genes and proteins: “words” taken as single symbols that appear in pathways and networks) • …
(Almost) all symbols in science are embodiments of Coarse-Graining • Coarse-graining may lead to rigorous results • Geoffrey West: had Galileo been equipped with our high-precision measuring instruments, he would not have been able to discover the law of free-falling bodies and would have had to write a 42-volume Treatise on Falling Bodies.
Coarse-Grained Description of Nature ↓ Use of Symbols ↓ Symbolic Sequences ↓ Language, Grammar, Automaton
A Reminder: Theorem 3 in Shannon’s Famous 1948 Paper • Theorem 3: Given any ε > 0 and δ > 0, we can find an N0 such that the sequences of any length N ≥ N0 fall into two classes: • A set whose total probability is less than ε. • The remainder, all of whose members have probabilities p satisfying the inequality | log(1/p)/N − H | < δ, where H is the entropy of the source.
Intuitive Explanation of the Theorem • There are 2^N sequences of length N over the alphabet {0, 1} • Roughly speaking, these sequences are divided into two groups when N is very large • A big group of typical sequences • A small group of atypical sequences: 0^N, 1^N, (01)^N, (10)^N, … and more complex ones which must be characterized almost individually
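The split into typical and atypical sequences can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes a memoryless binary source that emits 1 with probability p (Shannon's theorem covers more general ergodic sources), and the function name `typical_set_stats` is hypothetical. It counts the sequences whose per-symbol log-probability lies within δ of the entropy H and adds up their total probability.

```python
import math

def typical_set_stats(n, p, delta):
    """Return (count, probability) of the delta-typical binary sequences of
    length n for an assumed i.i.d. source emitting 1 with probability p."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # entropy in bits per symbol
    count, prob = 0, 0.0
    for k in range(n + 1):
        # Every sequence with exactly k ones has the same probability.
        log2_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_p_seq / n - h) < delta:
            count += math.comb(n, k)
            # Work with logarithms to avoid overflow for large n.
            log_n_seqs = (math.lgamma(n + 1) - math.lgamma(k + 1)
                          - math.lgamma(n - k + 1))
            prob += math.exp(log_n_seqs + log2_p_seq * math.log(2))
    return count, prob

count, prob = typical_set_stats(1000, 0.3, 0.1)
print(f"typical sequences: about 2^{math.log2(count):.0f} of 2^1000, "
      f"total probability {prob:.6f}")
```

For p = 0.5 every sequence is typical, so the example is only informative for a biased source: there the typical set is a vanishing fraction of all 2^N sequences yet carries almost all of the probability.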
Our Starting Points: • Complexity goes with specificity. • Therefore, one has to look at real data. • These data are often noisy, incomplete, and of low signal-to-noise ratio. This is especially true for biological data. • Therefore, statistical methods are a must, but one should go beyond statistics. • Visualization with a certain degree of coarse-graining is crucial for highlighting the “regularities” immersed in huge amounts of data.
Language and Grammatical Complexity • Alphabet Σ • Example 1. Σ = {a, c, g, t} • Example 2. Σ = {A, C, D, …, W, Y} • Example 3. Σ = {a, …, z, A, …, Z, +, –, …} • All possible strings made of symbols from the alphabet Σ, plus the empty string ε → Σ* • Any subset of Σ* is called a language over the alphabet Σ • Grammar = {Alphabet, Initial symbols, Production Rules}
Classification of Formal Languages • Chomsky Hierarchy: sequential production rules • Lindenmayer Systems: parallel production rules
Generative Grammar • S → NP VP • VP → V NP • NP → (Art) Adj* N • S → if S then S • S → either S or S • N → boy | girl | scientist | … • V → sees | believes | loves | eats | … • Adj → young | good | beautiful | … • Art → a | one | the • S: Sentence; NP: Noun Phrase; VP: Verb Phrase; Adj: Adjective; Art: Article • Non-Terminal and Terminal Symbols
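As a rough illustration of how such production rules generate sentences, the sketch below expands the non-terminals at random until only terminal words remain. The function names (`expand_s`, `expand_np`, `expand_vp`), the probabilities, and the depth limit are assumptions made for the example, and the word lists contain only the words shown on the slide.

```python
import random

# Terminal words for each lexical category, as listed in the grammar above.
LEXICON = {
    "N": ["boy", "girl", "scientist"],
    "V": ["sees", "believes", "loves", "eats"],
    "Adj": ["young", "good", "beautiful"],
    "Art": ["a", "one", "the"],
}

def expand_np(rng, max_adj=2):
    """NP -> (Art) Adj* N: optional article, up to max_adj adjectives, a noun."""
    words = []
    if rng.random() < 0.5:
        words.append(rng.choice(LEXICON["Art"]))
    for _ in range(rng.randint(0, max_adj)):
        words.append(rng.choice(LEXICON["Adj"]))
    words.append(rng.choice(LEXICON["N"]))
    return words

def expand_vp(rng):
    """VP -> V NP."""
    return [rng.choice(LEXICON["V"])] + expand_np(rng)

def expand_s(rng, depth=0, max_depth=2):
    """S -> NP VP | if S then S | either S or S, with bounded recursion."""
    if depth < max_depth and rng.random() < 0.3:
        left = expand_s(rng, depth + 1, max_depth)
        right = expand_s(rng, depth + 1, max_depth)
        if rng.random() < 0.5:
            return ["if"] + left + ["then"] + right
        return ["either"] + left + ["or"] + right
    return expand_np(rng) + expand_vp(rng)

rng = random.Random(2007)
for _ in range(3):
    print(" ".join(expand_s(rng)))
```

The depth limit is there only to keep the output short; the grammar itself allows arbitrarily deep nesting of the two recursive rules.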
A Finite State Automaton (FA) [Figure: (i) a transition from state a to state b; (ii) the corresponding transfer function δ(a, R) = b]
FA: Finite State Automata • Deterministic FA • Non-Deterministic FA • Equivalence of DFA and NDFA: subset construction • Minimal DFA • Myhill-Nerode theorem (1958): number of nodes in minDFA
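The equivalence of DFA and NDFA rests on the subset construction, which can be sketched in a few lines. Everything specific here is an assumption for illustration: the example NDFA (words over {L, R} whose second-to-last symbol is R), its state names, and the helper names `subset_construction` and `accepts`. Epsilon moves are not handled.

```python
from collections import deque

def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NDFA without epsilon moves into a DFA.
    nfa maps (state, symbol) -> set of next states; DFA states are frozensets."""
    dfa_start = frozenset([start])
    dfa, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            # The DFA successor is the union of all NDFA successors.
            nxt = frozenset(t for s in current for t in nfa.get((s, sym), ()))
            dfa[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    dfa_accepting = {q for q in seen if q & accepting}
    return dfa, dfa_start, dfa_accepting

def accepts(dfa, start, accepting, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for sym in word:
        state = dfa[(state, sym)]
    return state in accepting

# Hypothetical NDFA: accepts words whose second-to-last symbol is R.
NFA = {
    ("q0", "L"): {"q0"},
    ("q0", "R"): {"q0", "q1"},
    ("q1", "L"): {"q2"},
    ("q1", "R"): {"q2"},
}
DFA, DFA_START, DFA_ACCEPT = subset_construction(NFA, "q0", {"q2"}, "LR")
print(len({state for state, _ in DFA}), "reachable DFA states")
print(accepts(DFA, DFA_START, DFA_ACCEPT, "LRL"))
```

Only subsets reachable from the start state are built, so the resulting DFA is usually much smaller than the full power set, though it is not necessarily minimal.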
A Pushdown Automaton Pushdown list Stack First In Last Out (FILO)
A Turing Machine • Alan M. Turing (1912-1954) • FA + R/W tape • Church-Turing Thesis (1936): Any effective (mechanical) computation can be carried out by a Turing machine
Example: L = {a^i b^i c^i | i > 0} (CSL) • Terminals = {a, b, c} • Non-terminals = {A, B} • Sequential rules: B → aBAc | abc; bA → bb; cA → Ac • B ⇒ abc • B ⇒ aBAc ⇒ aabcAc ⇒ aabAcc ⇒ aabbcc • B ⇒ aBAc ⇒ aaBAcAc ⇒ aaBAAcc ⇒ aaabcAAcc ⇒ aaabAcAcc ⇒* aaabbAccc ⇒ aaabbbccc
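The derivations above can be reproduced mechanically. The sketch below, an illustration and not part of the lecture, searches breadth-first over all sentential forms reachable from the start symbol B with the rules B → aBAc | abc, bA → bb, cA → Ac, and collects the forms that consist of terminals only. The function name `generated_words` is hypothetical. Because no rule shortens a string, forms longer than the length bound can be discarded safely.

```python
from collections import deque

# Sequential (one-at-a-time) production rules of the example grammar.
RULES = [("B", "aBAc"), ("B", "abc"), ("bA", "bb"), ("cA", "Ac")]

def generated_words(max_len):
    """Return all terminal words of length <= max_len derivable from B."""
    start = "B"
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if form.islower():  # only the terminals a, b, c remain
            words.add(form)
            continue
        for lhs, rhs in RULES:
            pos = form.find(lhs)
            while pos != -1:
                # Rewrite a single occurrence of lhs at position pos.
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)

print(generated_words(9))  # ['abc', 'aabbcc', 'aaabbbccc']
```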
Classification of Formal Languages • Chomsky Hierarchy: sequential production rules • Lindenmayer Systems: parallel production rules
Development of Anabaena catenula (a bead-chain alga of the genus Anabaena) [Figure: successive generations of a filament whose cells are labeled ar, al, br, bl] • Alphabet: Σ = {ar, al, br, bl} • Production rules: P = … • Initial symbol (axiom): ω = ar • Grammar: G = (Σ, P, ω) • Language: L(G) ⊂ Σ*
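Parallel rewriting is easy to simulate. The production rules did not survive in this transcript, so the sketch below assumes the rules commonly quoted in the L-system literature for Anabaena catenula (ar → al br, al → bl ar, br → ar, bl → al); they may differ from the ones on the original slide. The function name `develop` is hypothetical. In each generation every cell is rewritten simultaneously.

```python
# Assumed textbook rules for Anabaena catenula; not recovered from the slide.
RULES = {
    "ar": ["al", "br"],
    "al": ["bl", "ar"],
    "br": ["ar"],
    "bl": ["al"],
}

def develop(axiom, steps):
    """Apply the rules in parallel to every cell, returning all generations."""
    generations = [list(axiom)]
    for _ in range(steps):
        current = generations[-1]
        generations.append([cell for parent in current for cell in RULES[parent]])
    return generations

for generation in develop(["ar"], 4):
    print(" ".join(generation))
```

Under these assumed rules the filament lengths grow as 1, 2, 3, 5, 8, which is the Fibonacci pattern.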
Lindenmayer Systems • Parallel production rules • Finer classification: • D0L: Deterministic, no interaction, i.e., context-free • 0L: non-deterministic, no interaction • IL: non-deterministic, with Interaction, i.e., context-sensitive • T0L: with Table of production rules • TIL • E0L: Extended to non-terminal symbols • ET0L • EIL = REL of Chomsky
[Figure: nested language classes FIN, RGL, CFL, CSL, REL, with D0L shown for comparison] • RGL: Regular • CFL: Context-Free • CSL: Context-Sensitive • REL: Recursively Enumerable
[Figure: the Chomsky hierarchy (0: REL, 1: CSL, 2: CFL, 3: RGL) compared with the Lindenmayer classes (EIL, ET0L, IL, E0L, T0L, 0L, D0L); IND: Indexed languages]
Example à la Lindenmayer • L = {a^i b^i c^i | i > 0} (CSL) • G = (Σ, T, ω), ω = abc • Σ = {a, b, c} • T = {t1, t2} • t1 = {a → aa, b → bb, c → cc} • t2 = {a → …, b → …, c → …} • T0L
Dyck language: A language of nested parentheses • Many but finite types of parentheses • Matched parentheses • Finite depth of nesting • Context-free language (CFL) • Tree structures, list structures, RNA secondary structures (without pseudoknots), etc.
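Membership in a Dyck language is the standard job for a pushdown stack: push each opening parenthesis and pop it when the matching closing one arrives. The sketch below assumes three parenthesis types and a hypothetical function name `is_dyck`; any other symbol is rejected.

```python
# Closing parenthesis -> the opening parenthesis it must match (assumed types).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_dyck(word):
    """Return True if word is a correctly nested string of parentheses."""
    stack = []
    for ch in word:
        if ch in OPENERS:
            stack.append(ch)  # remember the opener (first in, last out)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False  # unmatched or wrongly nested closer
        else:
            return False  # symbol outside the parenthesis alphabet
    return not stack  # every opener must have been closed

print(is_dyck("([]{})"), is_dyck("([)]"))  # True False
```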
Factorizable Languages • Symbolic dynamics leads to factorizable languages • A complete genome defines a factorizable language • An amino acid sequence with unique reconstruction (at a certain K) defines a factorizable language • More on factorizable languages in the next lecture
Coarse-Grained Dynamics ↓ Symbolic Dynamics
Coarse-Graining in Dynamics • Phase space → L, R • Numerical orbit → symbolic orbit • Many to one correspondence • Possibility for classification
Basic properties • Natural order on the interval: L < C < R • Monotonicity: f_L is increasing on L; f_R is decreasing on R • Parity: L is +; R is – • L preserves order, R reverses order • Continuity: L ← C → R • Inverse branches: L → L(y) ≡ f_L^{-1}(y) • R → R(y) ≡ f_R^{-1}(y)
Infinitely many numerical orbits • Only two symbolic orbits: L^∞ and RL^∞ • Simple dynamics • Simple language: 2 word types only
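The many-to-one passage from numerical orbits to symbolic orbits can be tried out directly. The lecture does not say which unimodal map it uses, so the sketch below assumes the logistic map f(x) = μx(1 − x) with critical point C = 0.5 and the parameter μ = 1.5, where the fixed point lies on the left branch. The function name `symbolic_orbit` is hypothetical.

```python
def symbolic_orbit(x0, mu, n, critical=0.5):
    """Iterate the assumed map f(x) = mu*x*(1-x) from x0 and record, for each
    of the first n points, whether it lies left (L) or right (R) of the
    critical point, or exactly on it (C)."""
    symbols = []
    x = x0
    for _ in range(n):
        if x < critical:
            symbols.append("L")
        elif x > critical:
            symbols.append("R")
        else:
            symbols.append("C")
        x = mu * x * (1.0 - x)
    return "".join(symbols)

# Different starting points collapse onto the same symbolic orbit.
for x0 in (0.1, 0.2, 0.3, 0.7, 0.9):
    print(x0, symbolic_orbit(x0, 1.5, 8))
```

With this parameter the map never exceeds 0.375, so every orbit is on the left branch after at most one step: starting points below C give L repeated, and starting points above C give one R followed by L repeated.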
Languages in unimodal maps: 1991 • The Feigenbaum attractor corresponds to a CSL; • Are there other CSLs and CFLs? • Periodic and eventually periodic orbits are RGLs • Are there other RGLs?
Periodic orbit (RLRRC)^∞ and Finite State Automaton • x0 = CRLRRC… • x1 = RLRRC… • x2 = LRRC… • x3 = RRC… • x4 = RC…
Transformation of subintervals under (RLRRC)^∞ • L: a → c+d • R: b → d • R: c → b+c • R: d → a
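Transfer Matrix and Transfer Function • States: a, b, c, d • Input: R, L • Transfer matrix (rows and columns ordered a, b, c, d; entry 1 where the row subinterval maps onto the column subinterval):
0 0 1 1
0 0 0 1
0 1 1 0
1 0 0 0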
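The transfer matrix follows directly from the subinterval transformations listed for (RLRRC)^∞ (L: a → c+d; R: b → d; R: c → b+c; R: d → a). The sketch below rebuilds it from those rules and counts the n-step paths through the subintervals by summing the entries of the n-th matrix power. The helper names (`transfer_matrix`, `mat_mul`, `count_paths`) are hypothetical.

```python
STATES = ["a", "b", "c", "d"]

# (input symbol, source subinterval) -> subintervals covered by its image.
TRANSITIONS = {
    ("L", "a"): ["c", "d"],
    ("R", "b"): ["d"],
    ("R", "c"): ["b", "c"],
    ("R", "d"): ["a"],
}

def transfer_matrix():
    """Build the 0/1 matrix with entry [i][j] = 1 when state i maps onto j."""
    index = {s: i for i, s in enumerate(STATES)}
    m = [[0] * len(STATES) for _ in STATES]
    for (_symbol, src), images in TRANSITIONS.items():
        for dst in images:
            m[index[src]][index[dst]] = 1
    return m

def mat_mul(x, y):
    """Multiply two square integer matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def count_paths(n):
    """Number of n-step paths a -> ... through the transition graph."""
    m = transfer_matrix()
    power = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(n):
        power = mat_mul(power, m)
    return sum(sum(row) for row in power)

for row in transfer_matrix():
    print(*row)
print([count_paths(n) for n in range(1, 6)])
```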
Are there other RGLs in Unimodal maps? • Theorem (Xie, 1993): In the dynamical languages of unimodal maps, the class of RGLs contains only periodic and eventually periodic sequences.
Fibonacci sequences • Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, 13, … • F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2} • Periodic orbits with period F_n, n = 0, 1, 2, 3, … • How about n → ∞? • There are many different Fibonacci sequences in the unimodal map
How to go beyond RGLs? • Block concatenation: b_{2n} = b_{2(n-1)} b_{2n-1}, b_{2n+1} = b_{2n} b_{2n-1} • (a) b_0 = L, b_1 = RR • (b) b_0 = R, b_1 = LR • (c) b_0 = L, b_1 = RL • (d) b_0 = R, b_1 = LL
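The block concatenation can be spelled out in code. The sketch below assumes the reading b_{2n} = b_{2(n-1)} b_{2n-1} and b_{2n+1} = b_{2n} b_{2n-1} of the recursion, whose subscripts are partly garbled in this transcript, and uses a hypothetical function name `blocks`. It is shown for case (a), b_0 = L, b_1 = RR.

```python
def blocks(b0, b1, n_max):
    """Return [b_0, ..., b_{n_max}] built by the assumed concatenation rule."""
    b = [b0, b1]
    for k in range(2, n_max + 1):
        if k % 2 == 0:
            b.append(b[k - 2] + b[k - 1])  # b_{2n} = b_{2(n-1)} b_{2n-1}
        else:
            b.append(b[k - 1] + b[k - 2])  # b_{2n+1} = b_{2n} b_{2n-1}
    return b

case_a = blocks("L", "RR", 5)
for k, block in enumerate(case_a):
    print(k, len(block), block)
```

Under this reading the block lengths in case (a) are 1, 2, 3, 5, 8, 13, so each block length is a Fibonacci number.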
How about (b_n)^∞? • Finite n: must be RGL • Infinite n? The closure at n → ∞ may be non-RGL • Indeed, it is non-RGL • Is it CFL or CSL? • How to comprehend infinite “periodic” orbits? Transfer matrices come to our aid
Xie Huimin and a PhD student proved 50 lemmas in 40 days, and the last lemma says: case (a) corresponds to a CSL.
It is easier to prove that cases (b), (c), and (d) all correspond to CSLs.
Conjecture: there is no CFL in Unimodal maps (Xie, 1996) • An open conjecture for 11 years
Dynamical languages in Unimodal maps: 1991 and 1999