Finite State Machines for Strings over Infinite Alphabets. F. Neven, T. Schwentick and V. Vianu. ACM Transactions on Computational Logic, Vol. V No. N, 01/03. Automata Seminar - Spring 2007. Tamar Aizikowitz.
Finite Machine for Infinite Alphabet? • Finite automaton: • Transitions based on current state and input value • δ defined on Q × Σ • Infinite alphabet ⇒ infinite transition function? • Solution: • Store a finite number of values • Transitions based on stored values • New values can be stored during computation
Register Automata • Suggested by Kaminski and Francez, 1994 • Finite automata + finite number of registers • Registers store values from alphabet • Register operations: • Compare register value with current value • Store current value in register • Transitions specify change of state, whether value is stored and movement of head.
Infinite Alphabets - Definitions • D: an infinite set (e.g. set of data values) • D-string: w = d1…dn s.t. di ∈ D • dom(w) = {1,…,|w|} • val_w(i) = di for i ∈ dom(w) • ⊳, ⊲ ∉ D delimit the input string • 2-way automata work on w = ⊳v⊲ • dom+(w) = {0,…,|w|+1} where val_w(0) = ⊳ and val_w(|w|+1) = ⊲
Nondeterministic 2-Way k-Register Automata (2N-RA) • A = ⟨D, Q, q0, τ0, δ, F⟩ • D – infinite alphabet • Q, q0, F – as usual • τ0: {1,…,k} → D ∪ {⊳,⊲} – initial register assignment • δ – transition function • Two types of transitions: • (i,q) → (p,d) – current value = register i value • q → (p,i,d) – store current value in register i • d ∈ {stay,right,left} – movement direction of head
Configurations • Configuration: γ = [j, q, τ] (j: head position, q: current state, τ: register assignment) • Initial configuration: γ0 = [1,q0,τ0] • Accepting configuration: γf = [j,qf,τ], qf ∈ F
Computations • [j, q, τ] ⊢ [j’, q’, τ’] iff: (1) (i,q) → (q’,d) ∈ δ, j’ = j+d, val_w(j) = τ(i) and τ’ = τ, or (2) q → (q’,i,d) ∈ δ, j’ = j+d and τ’ = τ except τ’(i) = val_w(j) • Note: Type 2 transition relevant only if no type 1 transition applies (why?) • w is accepted by A iff there exists γf s.t. γ0 ⊢* γf
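The step relation above can be explored mechanically. The following is a minimal Python sketch, not the paper's construction: it searches the configuration graph breadth-first, which terminates because registers only ever hold initial values or values from the tape. The names `accepts`, `compare`, `store` and the `start` parameter are hypothetical; registers are 0-indexed here, and a type 2 (store) transition is tried only when the current value is in no register, as the note on the slide suggests.

```python
from collections import deque

LEFT_END, RIGHT_END = "⊳", "⊲"          # delimiters, assumed not to occur in D
MOVES = {"stay": 0, "right": 1, "left": -1}

def accepts(word, q0, tau0, compare, store, final, start=1):
    """Search the configuration graph of a register automaton on ⊳word⊲.

    compare maps (i, q) to a list of (p, d):    type 1 transitions
    store   maps q      to a list of (p, i, d): type 2 transitions
    Registers are 0-indexed here (assumption; the slides count from 1).
    """
    tape = [LEFT_END, *word, RIGHT_END]
    first = (start, q0, tuple(tau0))      # γ0 = [1, q0, τ0] by default
    seen, queue = {first}, deque([first])
    while queue:
        j, q, tau = queue.popleft()
        if q in final:
            return True
        value = tape[j]
        successors = []
        if value in tau:
            # Type 1: the current value equals the content of register i.
            for i, content in enumerate(tau):
                if content == value:
                    for p, d in compare.get((i, q), []):
                        successors.append((j + MOVES[d], p, tau))
        else:
            # Type 2: store the current value in register i.
            for p, i, d in store.get(q, []):
                new_tau = tau[:i] + (value,) + tau[i + 1:]
                successors.append((j + MOVES[d], p, new_tau))
        for conf in successors:
            # Drop moves that would leave the delimited tape.
            if 0 <= conf[0] < len(tape) and conf not in seen:
                seen.add(conf)
                queue.append(conf)
    return False
```

A transition table is passed in as two dictionaries, so the same function can be reused for one-way and two-way automata; for a two-way automaton that starts on the left delimiter, `start=0` would be the natural choice.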
Variants • Deterministic: at most one transition applies to each configuration. • One way: no left moves in transition function. • xC-RA: notation for the various models • Where x ∈ {1,2} and C ∈ {D,N}
Example 1: 1N-RA • L1 = {d1…dn | ∃i,j : i ≠ j ∧ di = dj} contains all words where some value appears more than once • Construction idea: • Read input string from left to right • “Guess” i and store value in register • Look for stored value in remaining input
Example 1: Continued… • A = ⟨D, {q0,q1,qf}, q0, <#,#>, δ, {qf}⟩ • Register 1: “trash” register • Register 2: register for storing the repeating value • q0 – look for i: • Go right: q0 → (1,q0) • Guess i, store value, move to q1: q0 → (2,q1) • q1 – look for j: • Go right: q1 → (1,q1) • If found value, move to qf: (2,q1) → qf • qf: • Accepting configuration reached!
Example 1: Concluded • Example of run on w = 13234 (animated figure): q0 → (1,q0) on 1, q0 → (2,q1) on the first 3, q1 → (1,q1) on 2, (2,q1) → qf on the second 3 • w accepted!
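The run on w = 13234 can be replayed in a few lines. This is a hedged Python sketch that follows one nondeterministic branch at a time; `run_with_guess` and `in_l1` are hypothetical names, the guess is a 0-based position, and storing into the trash register is treated as always allowed (a simplification of the transition rules).

```python
def run_with_guess(word, guess):
    """Follow the branch that stores word[guess] in register 2.

    Returns (accepted, trace), where trace lists (state, register 1, register 2)
    before the first symbol and after each symbol read.
    """
    trash, keep = "#", "#"                # initial register assignment <#,#>
    state = "q0"
    trace = [(state, trash, keep)]
    for pos, value in enumerate(word):
        if state == "q0":
            if pos == guess:
                keep, state = value, "q1"  # q0 → (2,q1): guess i
            else:
                trash = value              # q0 → (1,q0): go right
        elif state == "q1":
            if value == keep:
                state = "qf"               # (2,q1) → qf: repetition found
            else:
                trash = value              # q1 → (1,q1): go right
        trace.append((state, trash, keep))
        if state == "qf":
            break
    return state == "qf", trace

def in_l1(word):
    """The word is accepted iff some guess leads to qf."""
    return any(run_with_guess(word, g)[0] for g in range(len(word)))
```

For example, `run_with_guess([1, 3, 2, 3, 4], 1)` follows the accepting branch, while every guess fails on `[1, 2, 3, 4]`.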
Example 2: 2N-RA • L2 = {d1…dn | ∀i,j : i ≠ j → di ≠ dj} contains all words with distinct values • Construction idea: • Scan symbols from left to right. For each symbol: • Store value in register • Look for stored value in remaining input • If found, reject • Else proceed to check next symbol (how?)
Example 2: Continued… • A = ⟨D, {q0,q1,q2,q3,qrej,qacc}, q0, <⊳,#,⊲>, δ, {qacc}⟩ • Register 1: left delimiter, later “trash” • Register 2: current di • Register 3: right delimiter • (1,q0) → (q1,right) • q1 → (2,q2,right) • q2 → (1,q2,right) • (2,q2) → (qrej,stay) • (3,q2) → (q3,left) • q3 → (1,q3,left) • (2,q3) → (q1,right) • (3,q1) → (qacc,stay)
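The eight transitions above can be traced with a small loop. This is a sketch under assumptions rather than a faithful implementation: it follows the definitions slide in taking ⊳ as the left and ⊲ as the right delimiter, lets register 1 start with the left delimiter and register 3 with the right one, starts the head on the left delimiter, and treats a store into the trash register as always allowed. The name `all_distinct` is hypothetical.

```python
LEFT_END, RIGHT_END = "⊳", "⊲"   # assumed not to occur among the data values

def all_distinct(word):
    """Deterministic two-way run of the Example 2 automaton on ⊳word⊲."""
    tape = [LEFT_END, *word, RIGHT_END]
    regs = [LEFT_END, "#", RIGHT_END]     # registers 1..3 at indices 0..2
    head, state = 0, "q0"
    while state not in ("qacc", "qrej"):
        value = tape[head]
        if state == "q0" and value == regs[0]:     # (1,q0) → (q1,right)
            state, head = "q1", head + 1
        elif state == "q1" and value == regs[2]:   # (3,q1) → (qacc,stay)
            state = "qacc"
        elif state == "q1":                        # q1 → (2,q2,right)
            regs[1] = value
            state, head = "q2", head + 1
        elif state == "q2" and value == regs[1]:   # (2,q2) → (qrej,stay)
            state = "qrej"
        elif state == "q2" and value == regs[2]:   # (3,q2) → (q3,left)
            state, head = "q3", head - 1
        elif state == "q2":                        # q2 → (1,q2,right)
            regs[0] = value
            head += 1
        elif state == "q3" and value == regs[1]:   # (2,q3) → (q1,right)
            state, head = "q1", head + 1
        elif state == "q3":                        # q3 → (1,q3,left)
            regs[0] = value
            head -= 1
        else:
            state = "qrej"                         # no transition applies
    return state == "qacc"
```

The return trip in q3 is what answers the "(how?)" on the previous slide: the head walks left until it meets the stored value again and then steps one position to the right.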
Logic • Variants of first order and monadic second order logic over D-strings. • w represented by logical structure: • Domain dom(w) with natural ordering < • Value function val: dom(w) → D instantiated by val_w • Atomic Formulae: • x = y, x < y • val(x) = val(y) • val(x) = d for d ∈ D ∪ {⊳,⊲}
FO* and MSO* • The logic FO* • Atomic formulae • Boolean connectives • First order quantification over dom+(w) • The logic MSO* • FO* • Quantification over unary predicates on dom+(w)
FO* and MSO* Definability • L(φ) := {w ∈ D* | w ⊨ φ} • For example… • What φ defines L1? ∃x∃y (x ≠ y ∧ val(x) = val(y)) • What φ defines L2? ∀x∀y (x ≠ y → val(x) ≠ val(y))
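Both formulas can be checked by brute force on short strings. The sketch below transcribes the quantifiers into `any` / `all`; the function names are hypothetical, and the quantifiers range over dom(w) only, which does not change the result because the two delimiters carry distinct values outside D.

```python
def satisfies_phi1(w):
    """∃x ∃y (x ≠ y ∧ val(x) = val(y))"""
    dom = range(len(w))
    return any(x != y and w[x] == w[y] for x in dom for y in dom)

def satisfies_phi2(w):
    """∀x ∀y (x ≠ y → val(x) ≠ val(y))"""
    dom = range(len(w))
    # The implication a → b is written as (not a) or b.
    return all(x == y or w[x] != w[y] for x in dom for y in dom)
```

On every string exactly one of the two functions returns True, since the second formula is the negation of the first.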
RAs vs. MSO* • Theorem 3.1: 2D-RA ⊈ MSO* • Proof: Consider the language L of strings u#v where the number of distinct symbols appearing in u equals the number of distinct symbols appearing in v. • Part 1: There exists a 2D-RA which accepts L. • Part 2: L is not MSO* definable.
Proof: Preliminaries • N_u / N_v = the set of distinct symbols in u / v • L = {u#v | |N_u| = |N_v|} • lmo_w(d) = leftmost occurrence of d in w • N_u = {a1,…,an} and N_v = {b1,…,bm} s.t. for every i < j, lmo_u(ai) < lmo_u(aj) and lmo_v(bi) < lmo_v(bj). • Note: u#v ∈ L iff n = m
Proof: Part 1 (L is 2D-RA) • Question: How can we build a 2D-RA for L? • Basic concept: • Visit lmo_u(a1), lmo_v(b1), lmo_u(a2), … in order • If lmo_u(an) and lmo_v(bm) are reached simultaneously, accept • Else reject • How can we visit the lmo-s in order? • Finding lmo_u(a1), lmo_v(b1) is easy… (how?)
Proof: Part 1 Concluded • Assume ai is stored in a register. Compute lmo_u(a_{i+1}) as follows: • Move head to lmo_u(ai) • Go left until ⊳ • Go right until ai (leftmost occurrence) • For positions lmo_u(ai)+j (start j=1) test if lmo_u(a_{i+1}) • Store value and proceed to move left • If value is encountered then check next position (j++) • Else, if ⊳ is reached then lmo_u(a_{i+1}) = lmo_u(ai)+j • Similar for bi-s… • ⇒ Language accepted
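The lockstep walk over leftmost occurrences can be sketched as follows. This is an illustration of the idea only: it uses Python integers for head positions, whereas the automaton has no counters and recovers a position by re-scanning for the stored value. The names `is_leftmost`, `next_lmo` and `same_number_of_symbols` are hypothetical.

```python
def is_leftmost(s, pos):
    """Store s[pos], walk left, and compare; True if the left end is reached."""
    stored = s[pos]
    head = pos - 1
    while head >= 0:
        if s[head] == stored:
            return False          # value seen before, so not a leftmost occurrence
        head -= 1
    return True

def next_lmo(s, pos):
    """Position of the next leftmost occurrence to the right of pos, or None."""
    j = pos + 1
    while j < len(s):
        if is_leftmost(s, j):
            return j
        j += 1
    return None

def same_number_of_symbols(u, v):
    """Visit lmo_u(a1), lmo_v(b1), lmo_u(a2), ... and accept on a simultaneous end."""
    pu, pv = next_lmo(u, -1), next_lmo(v, -1)
    while pu is not None and pv is not None:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
    return pu is None and pv is None
```

Only equality tests against one stored value are used in `is_leftmost`, which is what makes the procedure implementable with a fixed number of registers.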
Proof: Part 2 (L not MSO*) • Assume by contradiction that φ* is an MSO* sentence s.t. u#v ⊨ φ* iff |N_u| = |N_v|. • Let C be the set of D-symbols appearing in φ*. w is admissible iff: • w is of the form u#v • w contains no symbols from C • N_u ∩ N_v = ∅ • Each D-symbol occurs at most once in u or v
Proof: Part 2 Continued… • Let φ be obtained from φ* by replacing: • val(x) = val(y) by x = y • val(x) = d by false if d ≠ # • For every admissible string w = d1…dn#e1…em: • a^n#a^m ⊨ φ • ⇔ d1…dn#e1…em ⊨ φ (letters don’t matter in φ) • ⇔ d1…dn#e1…em ⊨ φ* (w has no letters from C) • ⇔ n = m (because all letters are different) • Note: φ is MSO
Proof: Part 2 Concluded • For every n, there exists an admissible string d1…dn#e1…en (why?) • For every n, a^n#a^n ⊨ φ • Note: φ is in MSO (no value comparisons) • Define a formula for the form a^n#a^m: • ψ := ∃x (val(x) = # ∧ ∀y (val(y) = a ∨ (val(y) = # ∧ y = x))) • L’ = {a^n#a^n | n ∈ ℕ} is MSO definable by φ ∧ ψ • ⇒ L’ is regular ⇒ Contradiction!
2N-RA vs. FO* • Theorem 3.7: (weak version) FO* ⊈ 2N-RA • Proof: Define a language L ⊆ D* s.t.: • Part 1: No 2N-RA can accept L. • Part 2: L is FO* definable.
Proof: Part 1 (L not 2N-RA) • Based on communication complexity methodology: • Input string divided between two parties I and II • Parties can send messages according to a pre-defined protocol • String is accepted if both parties accept • Each party has unlimited computational power • Restriction only on form of messages
Proof: Part 1 Continued… • We consider strings of the form u#v • u, v encode sets of subsets of D • L = {u#v | u, v represent the same set of sets} • Claim: L cannot be accepted by 2N-RAs • Assume by contradiction that there exists a 2N-RA A s.t. L(A) = L • We simulate A by defining an appropriate protocol…
Proof: Part 1 Continued… • Define communication protocol as follows: • I is given u while II is given v • I simulates A until A tries to cross # to the right • Sends configuration information to II • II simulates A until A tries to cross # to the left • Sends configuration information to I • And so on, until one of the parties reaches an accepting configuration or gets stuck. • If A exists, such a protocol accepts L
Proof: Part 1 Continued… • It remains to bound the number of messages the protocol needs… • Restrict u#v to at most N data values • Assume A has |Q| states and k registers • M := |Q|·N^k different messages needed • Each message needs to be sent no more than once in each direction (why?) • At most M^(2M) different possible series of messages (dialogs) need to be considered
Proof: Part 1 Concluded • M^(2M) is only singly exponential in a polynomial of N • Number of sets of sets of N values is 2^(2^N) • For large N, there exist u, v s.t.: • u#u and v#v are accepted by the same dialog • u, v represent different sets of sets • ⇒ u#v is also accepted • ⇒ No such protocol can accept L • ⇒ No 2N-RA can accept L
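The counting step can be spelled out as a short calculation. This is a sketch using only the quantities named on the slides (|Q| states, k registers, N data values); the base-2 logarithm is a choice made here for convenience.

```latex
\[
M = |Q| \cdot N^{k}
\quad\Longrightarrow\quad
M^{2M} = 2^{\,2M \log_2 M} = 2^{\,2|Q| N^{k} \log_2 (|Q| N^{k})}
\]
\[
\bigl|\{\text{sets of subsets of } N \text{ values}\}\bigr| = 2^{2^{N}}
\]
\[
2|Q| N^{k} \log_2 (|Q| N^{k}) < 2^{N} \text{ for all sufficiently large } N
\quad\Longrightarrow\quad
M^{2M} < 2^{2^{N}}
\]
```

So for large N there are more sets of sets than dialogs, and by the pigeonhole principle two different sets of sets must share an accepting dialog.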
Proof: Part 2 (L is FO*) • We show that L is FO* definable… • First we define an encoding for u, v: • Assume $ not in D • u, v of the form $ d_1^1…d_n^1 $ d_1^2…d_n^2 $ … $ d_1^m…d_n^m $ • Each d_1^j…d_n^j represents a subset of D-values • Goal: Define a formula verifying that every subset in u appears in v and vice versa.
Proof: Part 2 Continued… • We start with some smaller formulae… • w is of the form u#v: form := ∃x (val(x) = # ∧ ∀y (val(y) = # → y = x)) • x is in the interval [y,z]: x ∈ [y,z] := y < x ∧ x < z • The interval [y,z] represents a subset: subs(y,z) := val(y) = $ ∧ val(z) = $ ∧ y < z ∧ ∀x (x ∈ [y,z] → val(x) ≠ # ∧ val(x) ≠ $)
Proof: Part 2 Continued… • Some more… • The subset [y,z] is a subset of [y’,z’]: [y,z] ⊆ [y’,z’] := ∀x (x ∈ [y,z] → ∃x’ (x’ ∈ [y’,z’] ∧ val(x) = val(x’))) • The subset [y,z] equals the subset [y’,z’]: [y,z] = [y’,z’] := [y,z] ⊆ [y’,z’] ∧ [y’,z’] ⊆ [y,z] • The subset [y,z] is in u: [y,z] ∈ u := subs(y,z) ∧ ∀x (val(x) = # → z < x) • The subset [y,z] is in v: [y,z] ∈ v := subs(y,z) ∧ ∀x (val(x) = # → x < y)
Proof: Part 2 Concluded • Two last formulae… • Every subset in u appears in v: usubv := ∀y∀z ([y,z] ∈ u → ∃y’∃z’ ([y’,z’] ∈ v ∧ [y,z] = [y’,z’])) • vsubu defined similarly • And now to put it all together… φ := form ∧ usubv ∧ vsubu • It follows that w ⊨ φ iff w ∈ L ⇒ L is FO* definable.
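The formula can be evaluated by brute force on small encodings, which makes it easy to see what each sub-formula contributes. This is a minimal sketch, assuming single-character data values and quantification over dom(w) only; `models_phi` and the helper names are hypothetical and mirror the sub-formulae defined on the slides.

```python
def models_phi(w):
    """Evaluate φ := form ∧ usubv ∧ vsubu on the string w by brute force."""
    dom = range(len(w))

    def inside(x, y, z):                      # x ∈ [y,z]
        return y < x < z

    def subs(y, z):                           # [y,z] represents a subset
        return (w[y] == "$" and w[z] == "$" and y < z and
                all(w[x] not in ("#", "$") for x in dom if inside(x, y, z)))

    def subset(y, z, y2, z2):                 # [y,z] ⊆ [y',z']
        return all(any(inside(x2, y2, z2) and w[x] == w[x2] for x2 in dom)
                   for x in dom if inside(x, y, z))

    def equal(y, z, y2, z2):                  # [y,z] = [y',z']
        return subset(y, z, y2, z2) and subset(y2, z2, y, z)

    def in_u(y, z):                           # [y,z] ∈ u: left of every #
        return subs(y, z) and all(z < x for x in dom if w[x] == "#")

    def in_v(y, z):                           # [y,z] ∈ v: right of every #
        return subs(y, z) and all(x < y for x in dom if w[x] == "#")

    def contained(in_a, in_b):                # usubv / vsubu
        return all(any(in_b(y2, z2) and equal(y, z, y2, z2)
                       for y2 in dom for z2 in dom)
                   for y in dom for z in dom if in_a(y, z))

    # form: there is exactly one position carrying #.
    form = any(w[x] == "#" and all(y == x for y in dom if w[y] == "#")
               for x in dom)
    return form and contained(in_u, in_v) and contained(in_v, in_u)
```

For instance, `models_phi("$12$3$#$3$21$")` compares the sets {{1,2},{3}} and {{3},{2,1}}; the nested loops are polynomial but slow, so this is meant for short strings only.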
Decision Problems • Kaminski and Francez showed that emptiness for 1N-RAs is decidable • And what of universality? • We will show that universality for 1N-RA is undecidable by reduction from a known undecidable problem, PCP.
Post Correspondence Problem • Introduced by Emil Post in 1946 • Input: A sequence of pairs (x1,y1),…,(xn,yn) s.t. xi, yi ∈ {a,b}* for i = 1,…,n • Solution: A non-empty sequence of indices α1,…,αm ∈ {1,…,n} s.t. x_α1…x_αm = y_α1…y_αm • Output: Does the given input instance have a solution?
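Checking a proposed solution is easy even though the existence question is undecidable. The sketch below is illustrative only: `is_solution`, `bounded_search` and the instance used in the usage note are hypothetical and not taken from the slides, and the search is cut off at a caller-chosen length.

```python
from itertools import product

def is_solution(pairs, indices):
    """pairs: list of (x_i, y_i); indices: 1-based sequence α_1, ..., α_m."""
    if not indices:
        return False                      # a solution must use at least one pair
    top = "".join(pairs[i - 1][0] for i in indices)
    bottom = "".join(pairs[i - 1][1] for i in indices)
    return top == bottom

def bounded_search(pairs, max_len):
    """Try all index sequences up to max_len; None means 'none found so far'."""
    n = len(pairs)
    for m in range(1, max_len + 1):
        for indices in product(range(1, n + 1), repeat=m):
            if is_solution(pairs, indices):
                return indices
    return None
```

With the illustrative instance `[("a", "baa"), ("ab", "aa"), ("bba", "bb")]`, the sequence 3, 2, 3, 1 spells bbaabbbaa on both sides. A `None` result from `bounded_search` proves nothing about longer sequences, which is where undecidability shows up.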
PCP Example • Input: [figure] • Solution: [figure]
PCP Undecidability • PCP is known to be undecidable. • Proof sketch: Reduction from L_u (the universal language): • Given a Turing Machine M and a word w • Define PCP instance P based on M and w s.t. P has a solution iff M accepts w • A solution for P encodes a run of M on w • x-series is always ‘one step ahead’ of y-series • y-series can ‘catch up’ only if the computation in the x-series reaches an accepting state
PCP Undecidability Continued • Add instance pairs of the following forms: • Start computation • Encode transitions • Copy symbols • qacc ‘eats’ symbols
Undecidability of Universality • Theorem 5.1: It is undecidable whether a given 1N-RA is universal. • Proof: For a given PCP instance P, construct a 1N-RA A s.t. A accepts an input string iff it does not represent a solution for P. • P has no solution iff A is universal • ⇒ Decidability of universality would imply decidability of PCP • ⇒ Universality of 1N-RA is undecidable
PCP Encoding • Assume w.l.o.g. that Sym = {1,…,n,a,b,#,$} ⊆ D • Candidate: a string u#v s.t.: • u encodes x_α1, …, x_αm • v encodes y_β1, …, y_βl • Candidate is a solution if: • l = m • αi = βi (matching pairs) • x_α1…x_αm = y_α1…y_αm
PCP Encoding Continued • x_αj encoding: $ γ αj δ1 a1 … δk ak • $ acts as separator • γ represents j by a unique value • αj ∈ {1,…,n} • δi encode positions in the word • γ and δ values appear only once in u / v • x_αj = a1…ak • y_βj encoded similarly
PCP Encoding Continued • u#v is syntactically correct if: • γ-projection of u=γ-projection of v • δ-projection of u =δ-projection of v • u#v represents a solution if: • u#v is syntactically correct • For each γ, the number to the right of γ is the same in u and in v • For each δ, the symbol to the right of δ is the same in u and in v
Construction of A • Assume the values of Sym are stored in the initial register assignment • A works as follows: • “Guesses” why w is not a valid solution • Checks whether w meets the chosen criterion • If yes, accepts • Else rejects • w has an accepting computation ⇔ w meets some criterion for being “wrong” ⇔ w is not a solution for the PCP instance
When is w “wrong” • w is of the wrong form: • w ≠ u#v • u or v ∉ ($γαδ…)* • xi ≠ a1…ak or yj ≠ a1…ak in u or v • γ-projections are wrong: • First / last γ in u ≠ first / last γ in v • Two γ’s are the same in u / v • γ1 and γ2 are successors in u but not in v
When is w “wrong” Concluded • δ projections are wrong: • Similar to γ-projections • w does not represent a solution: • The α-value for some γ in u is different than the corresponding β-value in v • The a- / b-value for some δ in u is different than the corresponding a- / b-value in v
Equivalence and Inclusion • Corollary 5.2: Equivalence of 1N-RAs is undecidable. • Proof: • Assume equivalence were decidable • Build an automaton A_D* that accepts every possible input word • Universality is then decidable by checking equivalence to A_D* ⇒ Contradiction! • Corollary: Inclusion is also undecidable.