330 likes | 399 Views
Fall 2011. The Chinese University of Hong Kong. CSCI 3130: Formal languages and automata theory. Undecidable problems for CFGs. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Decidable vs. undecidable. decidable. undecidable. “ DFA M accepts w ”. “ TM M accepts w ”.
E N D
Fall 2011 The Chinese University of Hong Kong CSCI 3130: Formal languages and automata theory Undecidable problems for CFGs Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Decidable vs. undecidable decidable undecidable “DFAM accepts w” “TMM accepts w” “CFGG generates w” “TM M halts on w” “TM Maccepts some input” “DFAs M and M’ accept same inputs” “TM Mand M’ accept same inputs” “CFG G generates all inputs” ? “CFG G is ambiguous” more?
Representing computations L1 = {w%w: w ∈{a, b}*} a/aR b/bR x/xR q1 q5 q2 q1 qa q0 %/%R a/aL abbaa%abbaa q1 q2 a/aL b/bL a/xL b/bL x/xL x/xR a/xR xbbaa%abbaa %/%R %/%R ☐/☐R q0 q7 q5 q6 qa ... b/xR xbbaa%abbaa b/xL xbbaa%abbaa q3 q4 %/%R xbbaa%xbbaa a/aR x/xR ... b/bR xxxxx%xxxxx x/xR
Configurations • A configuration consists of the current state, the head position, and tape contents configuration qacc q1 … … a a b b a b ☐ ☐ abq1a a/bR q1 qacc abbqacc
Computation histories a/aR b/bR x/xR q0a0b0%0a0b0 x0q1b0%0a0b0 x0b6q1%0a0b0 x0b6%0q2a0b0 x0b6q5%0x1b0 x0q6b0%0x0b0 %/%R a/aL q1 q2 a/aL b/bL a/xL b/bL x/xL x/xR a/xR %/%R %/%R ☐/☐R q0 q7 q5 q6 qa b/xR b/xL ... q3 q4 x0x0%0x0x0q7 x0x0%0x0x0☐aqa %/%R a/aR x/xR b/bR computation history x/xR
Computation histories as strings If M halts on w, the computation history of (M, w) is the sequence of configurations C1, ..., Clthat M goes through on input w. #q0ab%ab#xq1b%ab# . . . #xx%xx☐qa# q0a0b0%0a0b0 x0q1b0%0a0b C1 C2 Cl ... The computation history can be written as a string histover alphabet ∪Q∪{#} x0x0%0x0x0q7 x0x0%0x0x0qa M accepts w qacc occurs in hist accepting history: qrej occurs in hist rejecting history: M rejects w
Undecidable problems for CFGs • We will argue that ALLCFG = {〈G〉: Gis a CFG that generates all strings} The language ALLCFGis undecidable. If ALLCFGcan be decided, so can ATM.
Undecidable problems for CFGs accept ifG generates all strings A 〈G〉 reject if not accept ifM rej/loops w construct G 〈M, w〉 reject ifM accepts w A 〈G〉 G generates all strings if M rejects or loops on w Gfails to generate some string if M accepts w
Undecidable problems for CFGs Gfails to generate some string 〈M, w〉 construct G 〈G〉 M accepts w The alphabet of G will be ∪Q∪{#} G will generate all strings except the computation history of (M, w), if it is accepting First we construct a PDA P, then convert it to CFG G
Undecidability via computation histories candidate computation history hist of (M, w) accept everything P except accepting hist reject #q0ab%ab#xq1b%ab# . . . #xx%xx☐qa# // try to spot a mistake in hist P: On input hist, If hist is not of the form #w1#w2#...#wk#, accept. e If w1 ≠ q0w or wk does not contain qa, accept. e q0 e If two consecutive blocks wi#wi+1 do not followfrom the transitions of M, accept. Otherwise, hist is an accepting history, so reject.
Computation is local a/aR b/bR x/xR #0q0a0b0%0a0b0 #0x0q1b0%0a0b0 #0x0b6q1%0a0b0 #0x0b6%0q2a0b0 #0x0b6q5%0x1b0 #0x0q6b0%0x0b0 %/%R a/aL q1 q2 a/aL b/bL a/xL b/bL x/xL x/xR a/xR %/%R %/%R ☐/☐R q0 q7 q5 q6 qa b/xR b/xL ... q3 q4 #0x0x0%0x0x0q7 #0x0x0%0x0x0☐aqa %/%R a/aR x/xR b/bR x/xR Changes between configurations always occur around the head
Legal and illegal transition windows legal windows illegal windows … 6a3b0x0 … … 0a6b0x0…0 … 6q2a0b0 … … 0a6b0q2 …0 … 6a3q2a0 … … 0q5a6x0…0 … 6q2q2a0 … … 0q2q2x3…0 q2 a/xL … 6a3b0a0 … … 0a6b0q5 …0 … 6a3q2a0 … … 0q5a6b0…0 q5 … 6a3a0☐0 … … 0x6a0☐0…0 … 6a3q2a0 … … 0a6q5x0…0
Implementing P If two consecutive blocks wi#wi+1 do not followfrom the transitions of M, accept: #0x0b6%0q2a0b #0x0b6q5%0x1b For every position of wi: Remember offset from # in wi on stack Remember first row of window in state offset After reaching the next #: Pop offset from # from stack as you consume input Remember second row of window in state If window is illegal, accept; Otherwise reject.
The computation history method • G accepts all strings except accepting computation histories of (M, w) • We first construct a PDA P, then convert it to CFG G ALLCFG = {〈G〉: Gis a CFG that generates all strings} If ALLCFGcan be decided, so can ATM. 〈M, w〉 construct G 〈G〉
The Post Correspondence Problem • Input: A set of tiles like this • Given an infinite supply of such tiles, can you match top and bottom? bab cc a ab baa a a baba bab e c ab a ab baa a bab e c ab c ab bab cc a baba
Undecidability of PCP PCP = {〈T〉: Tis a collection of tiles that contains a top-bottom match} The language PCPis undecidable.
Ambiguity of CFGs • We will argue that AMB = {G: Gis an ambiguous CFG} The language AMBis undecidable. If AMBcan be decided, so can PCP.
Ambiguity of CFGs • Step 1: Number the tiles T G (collection of tiles) (CFG) If T can be matched, then G isambiguous If T cannot be matched, then G is unambiguous 2 3 1 bab cc a ab c ab
Ambiguity of CFGs T G (CFG) (collection of tiles) a, b, c, 1, 2, 3 Terminals: Variables: S, T, B Productions: T → babT1 T → cT2 T → aT3 B → ccB1 B → abB2 B → abB3 2 3 1 a ab bab cc c ab S → T | B
Ambiguity of CFGs T G (CFG) (collection of tiles) a, b, c, 1, 2, 3 Terminals: Variables: S, T, B 1 2 3 Productions: bab cc c ab a ab T → cT2 T → aT3 S → T | B T → babT1 T → c2 T → a3 T → bab1 B → abB2 B → abB3 B → ccB1 B → ab2 B → ab3 B → cc1
Ambiguity of CFGs • Each sequence of tiles gives two derivations • If the tiles match, these two derive the same string 2 2 1 bab cc c ab c ab S ⇒ T ⇒ babT1 ⇒ babcT21⇒ babcc221 S ⇒ B ⇒ ccB1 ⇒ ccabB21⇒ ccabab221
Ambiguity of CFGs T G • If G is ambiguous then ambiguity must look like this (collection of tiles) (CFG) ✓ If T can be matched, then G isambiguous ✓ If T cannot be matched, then G is unambiguous Then n1...ni = m1…mj S S T B So there is a match a1 n1 b1 m1 T B n1 n2 ni … … a2 n2 b2 m2 a2 b2 ai bi a1 b1 … T B ai ni bj mj
Undecidability of PCP (optional)
Undecidability of PCP • We show that PCP = {〈T〉: Tis a collection of tiles that contains a top-bottom match} The language PCPis undecidable. If PCPcan be decided, so can ATM.
Undecidability of PCP • Idea: Matches represent accepting histories 〈M, w〉 T (collection of tiles) M accepts w T contains a match #q0ab%ab#xq1b%ab#...#xx%xx☐qa# #q0ab%ab#xq1b%ab#...#xx%xx☐qa# e#q0ab%ab #q0a #xq1 b b a a % % a a b b # # xq1% x%q2 …
An assumption • We will assume that one of the PCP tiles is marked as a starting tile • Later we’ll see how to “simulate” the starting tile by an ordinary tile s bab cc a ab baa a a baba c ab
Undecidability of PCP • On input 〈M, w〉 we construct these tiles for PCP 〈M, w〉 T (collection of tiles) M accepts w T contains a match s ☐#qix1 ☐#x2x3 ☐# ☐# xx xqa qa qax qa qa## # e #q0w x1qix2 x3x4x5 for each valid window with state qi in top middle for all x inG∪{#}
Undecidability of PCP tile type purpose s represents initial configuration e #q0w represent valid transitionsbetween configurations x1qix2 x3x4x5 xx add blank spacesbefore # if necessary ☐#qix1 ☐#x2x3 ☐# ☐# complete matchif computation accepts xqa qa qax qa qa## #
Undecidability of PCP #q0a%ab#xq1%ab#...#xx%xxq7☐#xx%xx☐qa# accepting computation history # #q0a b % ab# xq1b%ab#... #xx%x xq7☐ #q0ab%ab #xq1 b % ab# ...#xx%xxq7☐ #xx%x x☐qa # s ☐#qix1 ☐#x2x3 ☐# ☐# xx x1qix2 x3x4x5 e #q0w
Undecidability of PCP • Once the accepting state symbol occurs, the last two tiles can “eat up” the rest of the symbols #xx%xx ☐qa #xx%x xqa #xx%xqa#... # qa## #xx%xx☐qa #xx%xx qa #xx%x qa #...#qa # # xx xqa qa qax qa qa## #
Undecidability of PCP • If M rejects on input w, then qrej appears on bottom at some point, but it cannot be matched on top • If M loops on w, then matching keeps going forever s ☐#qia2 ☐#b1b2 ☐# ☐# xx xqa qa qax qa qa## # e #q0w a1qia3 b1b2b3 for each valid window of this form for all x inG∪{#}
Getting rid of the starting tile • We assumed that one tile marked as starting tile • We can remove assumption by changing tiles a bit s a aba ba bb b c cca a *a* *a*b*a b*a* *b*b b* *c c*c*a* *a * “starting tile”begins with * “final tile” “middle tiles”
Getting rid of the starting tile s a aba ba bb b c b c cca a *a* *a*b*a b*a* *b*b b* *c b* *c c*c*a* *a * can only useas starting tile can only useto complete match