Finite State Machines for Strings over Infinite Alphabets

F. Neven, T. Schwentick and V. Vianu • ACM Transactions on Computational Logic, Vol. V, No. N, 01/03 • Automata Seminar, Spring 2007 • Tamar Aizikowitz

Finite Machine for Infinite Alphabet? • Finite automaton: • Transitions based on current state and input value • ⇒ δ defined on Q × Σ • Infinite alphabet ⇒ infinite transition function? • Solution: • Store a finite number of values • Transitions based on stored values • New values can be stored during computation

Register Automata • Suggested by Kaminski and Francez, 1994 • Finite automata + finite number of registers • Registers store values from alphabet • Register operations: • Compare register value with current value • Store current value in register • Transitions specify change of state, whether value is stored and movement of head.

Infinite Alphabets - Definitions • D: an infinite set (e.g. set of data values) • D-string: w = d1⋯dn s.t. di ∈ D • dom(w) = {1,…,|w|} • val_w(i) = di for i ∈ dom(w) • ⊳, ⊲ ∉ D delimit the input string • 2-way automata work on w = ⊳v⊲ • dom+(w) = {0,…,|w|+1} where val_w(0) = ⊳ and val_w(|w|+1) = ⊲

Nondeterministic 2-Way k-Register Automata (2N-RA) • A = ⟨D, Q, q0, τ0, δ, F⟩ • D – infinite alphabet • Q, q0, F – as usual • τ0: {1,…,k} → D ∪ {⊳,⊲} – initial register assignment • δ – transition function • Two types of transitions: • (i,q) → (p,d) – applies when the current value equals the value in register i • q → (p,i,d) – store current value in register i • d ∈ {stay, right, left} – movement direction of head

Configurations • Configuration: γ = [j, q, τ], where j is the head position, q the current state and τ the register assignment • Initial configuration: γ0 = [1, q0, τ0] • Accepting configuration: γf = [j, qf, τ] with qf ∈ F

Computations • [j, q, τ] ⊢ [j’, q’, τ’] iff: (1) (i,q) → (q’,d) ∈ δ, j’ = j+d, val_w(j) = τ(i) and τ’ = τ, or (2) q → (q’,i,d) ∈ δ, j’ = j+d and τ’ is τ with τ(i) replaced by val_w(j) • Note: A type 2 transition is relevant only if no type 1 transition applies (why?) • w is accepted by A iff there exists γf s.t. γ0 ⊢* γf

Variants • Deterministic: at most one transition applies to each configuration. • One way: no left moves in transition function. • xC-RA: notation for the various models • Where x ∈ {1,2} and C ∈ {D,N}

Example 1: 1N-RA • L1 = {d1⋯dn | ∃i,j : i ≠ j ∧ di = dj} ⇒ contains all words where some value appears more than once • Construction idea: • Read input string from left to right • “Guess” i and store value in register • Look for stored value in remaining input

Example 1: Continued… • A = ⟨D, {q0,q1,qf}, q0, ⟨#,#⟩, δ, {qf}⟩ • Register 1: “trash” register • Register 2: register for storing the repeating value • q0 – look for i: • Go right: q0 → (1,q0) • Guess i, store value, move to q1: q0 → (2,q1) • q1 – look for j: • Go right: q1 → (1,q1) • If found value, move to qf: (2,q1) → qf • qf: • Accepting configuration reached!

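Example 1: Concluded • Example of run on w = 13234 • Start: state q0, registers ⟨#,#⟩ • Read 1: q0 → (1,q0), registers ⟨1,#⟩, state q0 • Read 3: q0 → (2,q1), registers ⟨1,3⟩, state q1 • Read 2: q1 → (1,q1), registers ⟨2,3⟩, state q1 • Read 3: (2,q1) → qf, state qf • w ACCEPTED!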
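The run above can be replayed mechanically. The following is a minimal Python sketch, not taken from the presentation. It assumes a one-way automaton whose head moves right on every step, that a store (type 2) transition is tried only when no compare (type 1) transition applies, and that a word is accepted as soon as a final state is reached. The names `accepts`, `COMPARE`, `STORE` and `has_repeat`, and the dictionary encoding of δ, are illustrative choices.

```python
def accepts(word, init_regs, start, finals, compare, store):
    """Explore all runs of a one-way nondeterministic register automaton.

    compare maps (register index, state) to a list of successor states;
    store maps a state to a list of (register index, successor state).
    Register indices are 1-based, as on the slides.
    """
    stack = [(0, start, tuple(init_regs))]
    seen = set()
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        pos, state, regs = config
        if state in finals:
            return True              # accepting configuration reached
        if pos == len(word):
            continue                 # input exhausted on this branch
        value = word[pos]
        # Type 1: the current value equals the content of register i.
        successors = [(pos + 1, nxt, regs)
                      for i, content in enumerate(regs, start=1)
                      if content == value
                      for nxt in compare.get((i, state), [])]
        # Type 2: tried only when no type 1 transition applies.
        if not successors:
            for i, nxt in store.get(state, []):
                new_regs = regs[:i - 1] + (value,) + regs[i:]
                successors.append((pos + 1, nxt, new_regs))
        stack.extend(successors)
    return False


# Transitions of Example 1: register 1 is the trash register,
# register 2 holds the guessed repeating value.
COMPARE = {(2, "q1"): ["qf"]}
STORE = {"q0": [(1, "q0"), (2, "q1")], "q1": [(1, "q1")]}


def has_repeat(word):
    return accepts(word, ("#", "#"), "q0", {"qf"}, COMPARE, STORE)


print(has_repeat([1, 3, 2, 3, 4]))  # the word used in the run above
print(has_repeat([1, 2, 3, 4]))     # a word with pairwise distinct values
```

The nondeterministic "guess" of i shows up as the two entries under `"q0"` in `STORE`: the search simply tries both.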

Example 2: 2N-RA • L2 = {d1⋯dn | ∀i,j : i ≠ j → di ≠ dj} ⇒ contains all words with distinct values • Construction idea: • Scan symbols from left to right. For each symbol: • Store value in register • Look for stored value in remaining input • If found ⇒ reject • Else proceed to check next symbol (how?)

Example 2: Continued… • A = ⟨D, {q0,q1,q2,q3,qrej,qacc}, q0, ⟨⊲,#,⊳⟩, δ, {qacc}⟩ • Tape (figure): ⊲ ⋯ di ⋯ dj ⋯ ⊳, so in this example ⊲ marks the left end and ⊳ the right end; register 2 holds the value di being checked • (1,q0) → (q1,right) • q1 → (2,q2,right) • q2 → (1,q2,right) • (2,q2) → (qrej,stay) • (3,q2) → (q3,left) • q3 → (1,q3,left) • (2,q3) → (q1,right) • (3,q1) → (qacc,stay)
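The eight transitions can be executed directly. The sketch below is an illustration under stated assumptions, not the presentation's own code: the head starts on the left end marker, a store (type 2) transition fires only when no compare (type 1) transition applies, data values never equal the markers or #, and a configuration with no applicable transition rejects. The function name `all_distinct` and the step budget `max_steps` are hypothetical.

```python
LEFT, RIGHT = "⊲", "⊳"   # end markers, oriented as in this example
MOVE = {"stay": 0, "right": 1, "left": -1}

# Type 1 transitions: (register, state) -> (next state, direction).
COMPARE = {
    (1, "q0"): ("q1", "right"),
    (2, "q2"): ("qrej", "stay"),
    (3, "q2"): ("q3", "left"),
    (2, "q3"): ("q1", "right"),
    (3, "q1"): ("qacc", "stay"),
}
# Type 2 transitions: state -> (next state, register to overwrite, direction).
STORE = {
    "q1": ("q2", 2, "right"),
    "q2": ("q2", 1, "right"),
    "q3": ("q3", 1, "left"),
}


def all_distinct(word, max_steps=100_000):
    """Run the two-way automaton of Example 2 on word (deterministically)."""
    tape = [LEFT] + list(word) + [RIGHT]
    pos, state, regs = 0, "q0", [LEFT, "#", RIGHT]
    for _ in range(max_steps):
        if state == "qacc":
            return True
        if state == "qrej":
            return False
        value = tape[pos]
        for i, content in enumerate(regs, start=1):
            if content == value and (i, state) in COMPARE:
                state, direction = COMPARE[(i, state)]   # type 1 applies
                break
        else:
            if state not in STORE:
                return False                             # stuck: reject
            state, i, direction = STORE[state]           # type 2: store value
            regs[i - 1] = value
        pos += MOVE[direction]
    raise RuntimeError("step budget exhausted")


print(all_distinct("abc"))
print(all_distinct("aba"))
```

Register 1 loses the left marker after the first store, which is harmless here because the marker is compared only in the very first step.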

Logic • Variants of first order and monadic second order logic over D-strings. • w represented by logical structure: • Domain dom(w) with natural ordering < • Value function val: dom(w) → D instantiated by val_w • Atomic Formulae: • x = y, x < y • val(x) = val(y) • val(x) = d for d ∈ D ∪ {⊳,⊲}

FO* and MSO* • The logic FO* • Atomic formulae • Boolean connectives • First order quantification over dom+(w) • The logic MSO* • FO* • Quantification over unary predicates on dom+(w)

FO* and MSO* Definability • L(φ) := {w ∈ D* | w ⊨ φ} • For example… • What φ defines L1? ∃x∃y(x ≠ y ∧ val(x) = val(y)) • What φ defines L2? ∀x∀y(x ≠ y → val(x) ≠ val(y))
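Because both formulas quantify only over positions, they can be checked by brute force on any concrete word. A small Python sketch, assuming quantifiers range over the positions of w (0-based here) and using the hypothetical names `phi_1` and `phi_2`:

```python
def phi_1(w):
    """∃x∃y (x ≠ y ∧ val(x) = val(y)): some value occurs twice."""
    dom = range(len(w))
    return any(x != y and w[x] == w[y] for x in dom for y in dom)


def phi_2(w):
    """∀x∀y (x ≠ y → val(x) ≠ val(y)): all values are distinct."""
    dom = range(len(w))
    # The implication A → B is evaluated as (not A) or B.
    return all(x == y or w[x] != w[y] for x in dom for y in dom)


for word in ([1, 3, 2, 3, 4], [1, 2, 3, 4], []):
    print(word, phi_1(word), phi_2(word))
```

On every word exactly one of the two formulas holds, since the second is the negation of the first.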

RAs vs. MSO* • Theorem 3.1: 2D-RA ⊈ MSO* • Proof: Consider the language L of strings u#v where the number of distinct symbols appearing in u equals the number of distinct symbols appearing in v. • Part 1: There exists a 2D-RA which accepts L. • Part 2: L is not MSO* definable.

Proof: Preliminaries • N_u / N_v = the set of distinct symbols in u / v • L = {u#v | |N_u| = |N_v|} • lmo_w(d) = leftmost occurrence of d in w • N_u = {a_1,…,a_n} and N_v = {b_1,…,b_m} s.t. for every i < j, lmo_u(a_i) < lmo_u(a_j) and lmo_v(b_i) < lmo_v(b_j). • Note: u#v ∈ L iff n = m

Proof: Part 1 (L is 2D-RA) • Question: How can we build a 2D-RA for L? • Basic concept: • Visit lmo_u(a_1), lmo_v(b_1), lmo_u(a_2), … in order • If lmo_u(a_n) and lmo_v(b_m) are reached simultaneously ⇒ accept • Else ⇒ reject • How can we visit the lmo-s in order? • Finding lmo_u(a_1), lmo_v(b_1) is easy… (how?)

Proof: Part 1 Concluded • Assume a_i is stored in a register. Compute lmo_u(a_{i+1}) as follows: • Move head to lmo_u(a_i): • Go left until ⊳ • Go right until a_i (leftmost occurrence) • For positions lmo_u(a_i)+j (starting with j=1), test whether the position is lmo_u(a_{i+1}): • Store the value at that position and move left • If the value is encountered, check the next position (j++) • Else, if ⊳ is reached, then lmo_u(a_{i+1}) = lmo_u(a_i)+j • Similar for the b_i-s… ⇒ Language accepted
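The leftmost-occurrence walk can be imitated in ordinary code. The following Python sketch mirrors the idea rather than the automaton itself: it uses one "register" variable and a leftward scan, exactly as in the steps above, but keeps two independent positions instead of a single head. The names `next_lmo` and `same_distinct_count` are hypothetical.

```python
def next_lmo(w, pos):
    """Smallest position p > pos whose value does not occur to its left.

    Mirrors the test on the slide: store the value at p, walk left,
    and succeed only if the left end is reached without meeting it.
    """
    for p in range(pos + 1, len(w)):
        register = w[p]                  # store the candidate value
        k = p - 1
        while k >= 0 and w[k] != register:
            k -= 1                       # move left
        if k < 0:                        # left end reached: p is leftmost
            return p
    return None                          # no further distinct symbol


def same_distinct_count(u, v):
    """Decide whether u and v contain the same number of distinct symbols."""
    pu, pv = -1, -1
    while True:
        pu, pv = next_lmo(u, pu), next_lmo(v, pv)
        if pu is None or pv is None:
            # Accept only if both sides run out simultaneously.
            return pu is None and pv is None


print(same_distinct_count("abca", "xyz"))
print(same_distinct_count("aab", "c"))
```

No counter is ever kept: the comparison of n and m emerges from advancing both sides in lockstep, which is what makes the language recognisable with finitely many states and registers.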

Proof: Part 2 (L not MSO*) • Assume by contradiction that φ* is an MSO* sentence s.t. u#v ⊨ φ* iff |N_u| = |N_v|. • Let C be the set of D-symbols appearing in φ*. w is admissible iff: • w is of the form u#v • w contains no symbols from C (apart from #) • N_u ∩ N_v = ∅ • Each D-symbol occurs at most once in u or v

Proof: Part 2 Continued… • Let φ be obtained from φ* by replacing: • val(x) = val(y) by x = y • val(x) = d by false if d ≠ # • For every admissible string w = d1⋯dn#e1⋯em: • a^n#a^m ⊨ φ • ⇔ d1⋯dn#e1⋯em ⊨ φ (letters don’t matter in φ) • ⇔ d1⋯dn#e1⋯em ⊨ φ* (w has no letters from C) • ⇔ n = m (because all letters are different)

Proof: Part 2 Concluded • For every n, there exists an admissible string d1⋯dn#e1⋯en (why?) • ⇒ For every n, a^n#a^n ⊨ φ • Note: φ is in MSO (no value comparisons) • Define a formula for the form a^n#a^m: • ψ := ∃x(val(x)=# ∧ ∀y(val(y)=a ∨ (val(y)=# ∧ y=x))) • ⇒ L’ = {a^n#a^n | n ∈ ℕ} is MSO definable by φ ∧ ψ • ⇒ L’ would be regular (MSO-definable string languages are regular), but {a^n#a^n} is not regular ⇒ Contradiction!

2N-RA vs. FO* • Theorem 3.7 (weak version): FO* ⊈ 2N-RA • Proof: Define a language L ⊆ D* s.t.: • Part 1: No 2N-RA can accept L. • Part 2: L is FO* definable.

Proof: Part 1 (L not 2N-RA) • Based on communication complexity methodology: • Input string divided between two parties I and II • Parties can send messages according to a pre-defined protocol • String is accepted if both parties accept • Each party has unlimited computational power • Restriction only on form of messages

Proof: Part 1 Continued… • We consider strings of the form u#v • u, v encode sets of subsets of D • L = {u#v | u, v represent the same set of sets} • Claim: L cannot be accepted by 2N-RAs • Assume by contradiction that there exists a 2N-RA A s.t. L(A) = L • We simulate A by defining an appropriate protocol…

Proof: Part 1 Continued… • Define communication protocol as follows: • I is given u while II is given v • I simulates A until A tries to cross # to the right • Sends configuration information to II • II simulates A until A tries to cross # to the left • Sends configuration information to I • And so on, until one of the parties reaches an accepting configuration or gets stuck. • If A exists, such a protocol accepts L

Proof: Part 1 Continued… • It remains to define an appropriate protocol… • Restrict u#v to at most N data values • Assume A has |Q| states and k registers • M := |Q|·N^k different messages needed • Each message needs to be sent no more than once in each direction (why?) • At most M^(2M) different possible series of messages (dialogues) need to be considered

Proof: Part 1 Concluded • M^(2M) is only singly exponential in N (M is polynomial in N) • Number of sets of sets of N values is 2^(2^N), which is doubly exponential in N • ⇒ For large N, there exist u, v s.t.: • u#u and v#v are accepted by the same dialogue • u, v represent different sets of sets • u#v is also accepted • ⇒ No such protocol can accept L • ⇒ No 2N-RA can accept L
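The counting step is a pigeonhole argument: once there are more sets of sets than dialogues, two different inputs must share a dialogue. The Python sketch below compares the two quantities on a log2 scale, using the bounds from the slides, M = |Q|·N^k messages and at most M^(2M) dialogues. The function names and the sample values |Q| = 2, k = 1 are illustrative assumptions.

```python
import math


def log2_dialogues(num_states, num_registers, n):
    """log2 of the bound M^(2M), with M = |Q| * N^k."""
    m = num_states * n ** num_registers
    return 2 * m * math.log2(m)


def log2_sets_of_sets(n):
    """log2 of 2^(2^N), the number of sets of subsets of N values."""
    return 2 ** n


def smallest_crossover(num_states, num_registers):
    """Smallest N at which sets of sets outnumber the dialogue bound."""
    n = 1
    while log2_sets_of_sets(n) <= log2_dialogues(num_states, num_registers, n):
        n += 1
    return n


n = smallest_crossover(num_states=2, num_registers=1)
print(n, log2_sets_of_sets(n), log2_dialogues(2, 1, n))
```

Larger |Q| or k only push the crossover to a larger N; they cannot remove it, because 2^N eventually dominates any polynomial in N times a logarithm.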

Proof: Part 2 (L is FO*) • We show that L is FO* definable… • First we define an encoding for u, v: • Assume $ not in D • u, v of the form $ d_{1,1}⋯d_{n_1,1} $ d_{1,2}⋯d_{n_2,2} $ ⋯ $ d_{1,m}⋯d_{n_m,m} $ • Each block d_{1,j}⋯d_{n_j,j} represents a subset of D-values • Goal: Define a formula verifying that every subset in u appears in v and vice versa.

Proof: Part 2 Continued… • We start with some smaller formulae… • w is of the form u#v: form := ∃x(val(x) = # ∧ ∀y(val(y) = # → y = x)) • x is in the interval [y,z]: x ∈ [y,z] := y < x ∧ x < z • The interval [y,z] represents a subset: subs(y,z) := val(y)=$ ∧ val(z)=$ ∧ y < z ∧ ∀x(x ∈ [y,z] → val(x) ≠ # ∧ val(x) ≠ $)

Proof: Part 2 Continued… • Some more… • The subset [y,z] is a subset of [y’,z’]: [y,z] ⊆ [y’,z’] := ∀x(x ∈ [y,z] → ∃x’(x’ ∈ [y’,z’] ∧ val(x) = val(x’))) • The subset [y,z] equals the subset [y’,z’]: [y,z] = [y’,z’] := [y,z] ⊆ [y’,z’] ∧ [y’,z’] ⊆ [y,z] • The subset [y,z] is in u: [y,z] ∈ u := subs(y,z) ∧ ∀x(val(x)=# → z < x) • The subset [y,z] is in v: [y,z] ∈ v := subs(y,z) ∧ ∀x(val(x)=# → x < y)

Proof: Part 2 Concluded • Two last formulae… • Every subset in u appears in v: usubv := ∀y∀z([y,z] ∈ u → ∃y’∃z’([y’,z’] ∈ v ∧ [y,z] = [y’,z’])) • vsubu defined similarly • And now to put it all together… φ := form ∧ usubv ∧ vsubu • It follows that w ⊨ φ iff w ∈ L ⇒ L is FO* definable.
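What φ expresses can be cross-checked on concrete strings by decoding both halves and comparing them as sets of sets. The Python sketch below assumes single-character data values and a well-formed encoding in which each half starts and ends with $; apart from requiring exactly one #, it does not validate the form. The names `decode` and `same_set_of_sets` are hypothetical.

```python
def decode(half, sep="$"):
    """Turn '$ab$c$' into {frozenset('ab'), frozenset('c')}."""
    blocks = half.split(sep)[1:-1]      # drop the empty outer pieces
    return {frozenset(block) for block in blocks}


def same_set_of_sets(w):
    """True iff w = u#v and u, v encode the same set of subsets."""
    if w.count("#") != 1:               # the 'form' conjunct
        return False
    u, v = w.split("#")
    # usubv and vsubu together amount to equality of the two families.
    return decode(u) == decode(v)


print(same_set_of_sets("$ab$c$#$c$ba$"))
print(same_set_of_sets("$ab$#$a$"))
```

Order and repetition are ignored both inside a block and among blocks, which is exactly why a one-pass comparison is not enough and the formula needs the nested quantifier pattern ∀…∃….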

Decision Problems • Kaminski and Francez showed that emptiness for 1N-RAs is decidable • And what of universality? • We will show that universality for 1N-RA is undecidable by reduction from a known undecidable problem, PCP.

Post Correspondence Problem • Introduced by Emil Post in 1946 • Input: A sequence of pairs (x1,y1),…,(xn,yn) s.t. xi, yi ∈ {a,b}* for i = 1,…,n • Solution: A nonempty sequence of indices α1,…,αm ∈ {1,…,n} s.t. xα1⋯xαm = yα1⋯yαm • Output: Whether the given input instance has a solution.
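Since PCP is undecidable, no program can settle every instance, but solutions up to a fixed length can be searched for exhaustively. The following Python sketch does exactly that; the function name `pcp_bounded`, the bound `max_len`, and the sample instance are illustrative and are not the example from the presentation.

```python
from itertools import product


def pcp_bounded(pairs, max_len):
    """Search for a PCP solution using at most max_len indices.

    Returns a list of 1-based indices, or None if no solution of that
    length exists. None says nothing about longer index sequences.
    """
    for m in range(1, max_len + 1):
        for idx in product(range(len(pairs)), repeat=m):
            top = "".join(pairs[i][0] for i in idx)
            bottom = "".join(pairs[i][1] for i in idx)
            if top == bottom:
                return [i + 1 for i in idx]
    return None


# Hypothetical sample instance (x_i, y_i) over {a, b}.
instance = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(pcp_bounded(instance, 4))
print(pcp_bounded([("a", "b")], 5))
```

The search space grows as n^m, so this is only practical for tiny instances; the point is to make the definition of a solution concrete.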

PCP Example • Input: (figure not reproduced) • Solution: (figure not reproduced)

PCP Undecidability • PCP is known to be undecidable. • Proof sketch: Reduction from Lu: • Given a Turing Machine M and a word w • Define PCP instance P based on M and w s.t. P has a solution iff M accepts w • A solution for P encodes a run of M on w • x-series is always ‘one step ahead’ of y-series • y series can ‘catch up’ only if computation in x series reaches an accepting state

PCP Undecidability Continued • Add instance pairs of the following forms (the pairs themselves appeared as figures and are not reproduced): • Start computation • Encode transitions • Copy symbols • qacc ‘eats’ symbols

Undecidability of Universality • Theorem 5.1: It is undecidable whether a given 1N-RA is universal. • Proof: For a given PCP instance P, construct a 1N-RA A s.t. A accepts an input string iff it does not represent a solution for P. ⇒ P has no solution iff A is universal ⇒ Decidability of universality leads to decidability of PCP ⇒ Universality of 1N-RA is undecidable

PCP Encoding • Assume w.l.o.g. that Sym = {1,…,n,a,b,#,$} ⊆ D • Candidate: a string u#v s.t.: • u encodes xα1,…,xαm • v encodes yβ1,…,yβl • Candidate is a solution if: • l = m • αi = βi (matching pairs) • xα1⋯xαm = yα1⋯yαm

PCP Encoding Continued • xαj encoding: $ γ αj δ1 a1 ⋯ δk ak • $ acts as separator • γ represents j by a unique value • αj ∈ {1,…,n} • δi encode positions in the word • γ and δ values appear only once in u / v • xαj = a1⋯ak • yβj encoded similarly

PCP Encoding Example

PCP Encoding Continued • u#v is syntactically correct if: • γ-projection of u=γ-projection of v • δ-projection of u =δ-projection of v • u#v represents a solution if: • u#v is syntactically correct • For each γ, the number to the right of γ is the same in u and in v • For each δ, the symbol to the right of δ is the same in u and in v

Construction of A • Assume the values of Sym are stored in the initial register assignment • A works as follows: • “Guesses” why w is not a valid solution • Checks whether w meets the chosen criterion • If yes, accepts • Else rejects • w has an accepting computation ⇔ w meets some criterion for being “wrong” ⇔ w is not a solution for the PCP instance

When is w “wrong” • w is of the wrong form: • w ≠ u#v • u or v ∉ ($γαδ…)* • xi ≠ a1⋯ak or yj ≠ a1⋯ak in u or v • γ-projections are wrong: • First / last γ in u ≠ first / last γ in v • Two γ’s are the same in u / v • γ1 and γ2 are successors in u but not in v

When is w “wrong” Concluded • δ projections are wrong: • Similar to γ-projections • w does not represent a solution: • The α-value for some γ in u is different than the corresponding β-value in v • The a- / b-value for some δ in u is different than the corresponding a- / b-value in v

Equivalence and Inclusion • Corollary 5.2: Equivalence of 1N-RAs is undecidable. • Proof: • Assume equivalence was decidable • Build an automaton A_D* that accepts every possible input word • ⇒ Universality is decidable by checking equivalence to A_D* ⇒ Contradiction! • Corollary: Inclusion is also undecidable.
