Automata, Grammars and Languages Discourse 04 Context-Free Grammars and Pushdown Automata C SC 473 Automata, Grammars & Languages
Backus-Naur Form Grammars (CFGs) • Algol 60, Algol 68: first “block-structured” languages • Ex: begin s ; begin s ; s ; s end ; s end • CF Grammar:
<program> ::= <block>
<statement> ::= s | <block>
<block> ::= begin <list> end
<list> ::= <statement> ; <list> | <statement>
• Slide annotations: start variable “S”; terminals; nonterminals = variables; rules = productions
Grammars are “Generators” • $\Rightarrow$ is read “yields” or “derives in one step” • Apply one production to one variable in the string • The process is nondeterministic
A Particular Derivation • One possible derivation. The variable being rewritten at each stage is underscored • Two choices at each derivation step: • Which variable (nonterminal) is to be rewritten? • Which rule with that variable as LHS is to be applied? • All possible terminal strings obtainable in this way make up L(G)
Why CFGs? • Most natural or artificial (e.g. programming) languages are not regular • Ex: C programs • We know that the latter language is not regular, so …
Derivation (Parse) Tree [figures omitted: a parse tree whose yield (frontier, terminal string) is read off its leaves; the continuation slide numbers the tree nodes 1 to 15]
Context-Free Grammar • Defn 2.2: A context-free grammar G is a 4-tuple $G = (V, \Sigma, R, S)$ where • $V$ is a finite set, the variables (nonterminals) • $\Sigma$ is a finite set disjoint from $V$, the terminals • $R$ is a finite set of rules, of the form $A \to w$ with $A \in V$ and $w \in (V \cup \Sigma)^*$ (technically, an ordered pair $(A, w)$) • $S \in V$ is the start variable • Ex: strings with balanced parentheses. Formally: • Ex: informally • Variables = upper case • Terminals = lower case
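Yields & Derives Relations • Defn: The relation yields (derives in 1 step), written $\Rightarrow$, is defined as follows: if $A \to w$ is a rule in $R$, then $uAv \Rightarrow uwv$ for all $u, v \in (V \cup \Sigma)^*$ • Defn: derives in $k$ steps: $u \Rightarrow^k v$ iff $u = u_0 \Rightarrow u_1 \Rightarrow \cdots \Rightarrow u_k = v$ for some strings $u_0, \dots, u_k$ • Defn: derives: $u \Rightarrow^* v$ iff $u \Rightarrow^k v$ for some $k \ge 0$ • In other words: $\Rightarrow^*$ is the reflexive, transitive closure of $\Rightarrow$ • Defn: A derivation (of $n$ steps) from $u$ is any sequence of strings $u_0, u_1, \dots, u_n$ satisfying $u_0 = u$ and $u_{i-1} \Rightarrow u_i$ for $1 \le i \le n$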
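The one-step and k-step relations can be made concrete with a small sketch. The grammar below (S → (S) | SS | ε for balanced parentheses) and the names `RULES`, `yields`, and `derives_in_k` are assumptions chosen for illustration, not taken from the slides; variables are single upper-case characters.

```python
from typing import Dict, List, Set

# Hypothetical example grammar (an assumption): S -> (S) | SS | ε
RULES: Dict[str, List[str]] = {"S": ["(S)", "SS", ""]}


def yields(form: str, rules: Dict[str, List[str]]) -> Set[str]:
    """All strings v with form => v: one rule applied to one variable occurrence."""
    result = set()
    for i, symbol in enumerate(form):
        # Only variables have entries in `rules`; terminals are skipped.
        for rhs in rules.get(symbol, []):
            result.add(form[:i] + rhs + form[i + 1:])
    return result


def derives_in_k(form: str, k: int, rules: Dict[str, List[str]]) -> Set[str]:
    """All strings v with form =>^k v (exactly k steps)."""
    current = {form}
    for _ in range(k):
        current = {v for u in current for v in yields(u, rules)}
    return current


if __name__ == "__main__":
    print(sorted(yields("(S)", RULES)))
    print("()" in derives_in_k("S", 2, RULES))
```

The nondeterminism of the relation shows up as set-valued results: each call returns every string reachable by some choice of variable and rule.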
Language Generated • Defn: The language generated by G is the set of all terminal strings derived from S: $L(G) = \{ w \in \Sigma^* \mid S \Rightarrow^* w \}$ • A partial derivation is one that starts with S and ends in a string that still contains variables from V • Ex: • Partial: • Terminal or terminated:
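To see a generated language as a set, one can enumerate the terminal strings of bounded length. This is a minimal sketch under stated assumptions: the grammar S → (S)S | ε, the function name `language_up_to`, and the cap on sentential-form length (added only so the search terminates) are all hypothetical choices, not part of the slides.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical grammar for balanced parentheses (an assumption): S -> (S)S | ε
RULES: Dict[str, List[str]] = {"S": ["(S)S", ""]}


def language_up_to(start: str, rules: Dict[str, List[str]], max_len: int) -> Set[str]:
    """Terminal strings w with start =>* w and |w| <= max_len, found by
    breadth-first search over leftmost derivations."""
    limit = 2 * max_len + 2  # assumed cap on form length so the search terminates
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None for a terminal string.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for s in new if s not in rules)
            if terminals <= max_len and len(new) <= limit and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


if __name__ == "__main__":
    print(sorted(language_up_to("S", RULES, 4), key=lambda w: (len(w), w)))
```

Forms that still contain a variable when the search stops correspond to partial derivations; only forms with no variables are added to the result.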
Derivations and Parse Trees • Ex: • Notice: the completed (terminated) parse tree is the same for both derivations, though the sequence “grows” differently
Derivation $\leftrightarrow$ Parse Tree • Proposition 1: For every (terminated or partial) derivation $D$ of a string $\alpha$ from $S$ there is a unique parse tree $T$ with frontier $\alpha$, constructible from $D$. • Proposition 2: For every parse tree T in G and any traversal order that is top-down (visits parents before children), there is a unique derivation for the frontier of T from S, and it is constructible from T. • Corollary 3: For every parse tree T in G there is a unique leftmost derivation constructible from T. Pf: Pre-order traverse T, expanding variables as their nodes are visited.
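The proof idea of Corollary 3 (expand variables in pre-order) can be sketched in a few lines. The tree encoding, the function name `leftmost_derivation`, and the example tree (for the string ()() under the hypothetical rules S → SS | (S) | ε) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A node is (symbol, children). Assumed encoding:
#   children is None for a terminal leaf,
#   children is a list for a variable (an empty list encodes an ε-rule).
Node = Tuple[str, Optional[list]]


def leftmost_derivation(tree: Node) -> List[str]:
    """Sentential forms of the leftmost derivation read off a parse tree."""
    form: List[Node] = [tree]
    steps = ["".join(node[0] for node in form)]
    while True:
        # Leftmost node that is still an unexpanded variable.
        pos = next((i for i, node in enumerate(form) if node[1] is not None), None)
        if pos is None:
            return steps
        # Replace the variable by its children, i.e. apply the rule used in the tree.
        form[pos:pos + 1] = form[pos][1]
        steps.append("".join(node[0] for node in form))


# Hypothetical parse tree for "()()" using S -> SS, S -> (S), S -> ε.
PAREN: Node = ("S", [("(", None), ("S", []), (")", None)])
TREE: Node = ("S", [PAREN, PAREN])

if __name__ == "__main__":
    print(" => ".join(leftmost_derivation(TREE)))
```

Always picking the leftmost unexpanded variable visits the variable nodes in the same order as a pre-order traversal of the tree, which is why the resulting derivation is determined by the tree alone.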
Ex: Leftmost Derivation [figures omitted: a leftmost derivation and its parse tree, with the tree nodes numbered in preorder-traversal order]
Syntactic Ambiguity • 2 distinct parse trees for the same terminal string • 2 distinct leftmost derivations for the same terminal string • Leftmost derivations and parse trees correspond 1-to-1 • A CFG is unambiguous $\iff$ every $w \in L(G)$ has a unique parse tree (equivalently, a unique leftmost derivation)
Ex: Ambiguous Grammar (English)
<Sent> → <NP> <VP>
<NP> → <N> | <Adj> <N>
<VP> → <V> <Obj> | <V> <AdvP>
<AdvP> → <Adv> | <AdvP>
<AdvP> → <Prep> <Obj>
<Obj> → <Adj> <N>
<N> → fruit | flies | …
…
“Fruit flies like a banana”: two parse trees under the same grammar
Tree 1: [<Sent> [<NP> [<Adj> fruit] [<N> flies]] [<VP> [<V> like] [<Obj> [<Adj> a] [<N> banana]]]]
Tree 2: [<Sent> [<NP> [<N> fruit]] [<VP> [<V> flies] [<AdvP> [<Prep> like] [<Obj> [<Adj> a] [<N> banana]]]]]
Right-Linear Grammars & Regular Languages • Defn: A CFG is right-linear iff each rule is of one of the forms $A \to wB$ or $A \to w$, where $A, B$ are variables and $w \in \Sigma^*$ • Chomsky (1958) called these “Type 3” • Thm: L is a regular language iff L = L(G) for some right-linear grammar G. There are algorithms for converting from finite automata to right-linear grammars, and conversely. [diagram omitted: conversion algorithms connect DFA M, NFA N, regular expression E, and right-linear grammar G]
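Right-Linear & Regular (cont’d) • Pf: ($\Rightarrow$) Assume $L = L(M)$ where $M = (Q, \Sigma, \delta, q_0, F)$ is a DFA. Construct $G = (Q, \Sigma, R, q_0)$ with $R$ having rule $q \to a q'$ if $\delta(q, a) = q'$ in $M$ and rule $q \to \varepsilon$ if $q$ is a final state. Claim: for every $w \in \Sigma^*$ with $|w| = n$ and every state $q$, $q_0 \Rightarrow^* wq$ iff $M$ reaches $q$ from $q_0$ on input $w$. Pf: easy induction on n. The ($\Rightarrow$) proof direction follows since $w \in L(M)$ iff $M$ reaches some $q \in F$ on $w$ iff $q_0 \Rightarrow^* wq \Rightarrow w$. • Pf: ($\Leftarrow$) Assume $L = L(G)$ where $G = (V, \Sigma, R, S)$ is right-linear. Construct NFA $N$ with state set $V \cup \{f\}$, start state $S$, and accept state $f$, where $f$ is a new symbol. $N$ has the transition $A \xrightarrow{w} B$ if $A \to wB$ in $R$ and the transition $A \xrightarrow{w} f$ if $A \to w$ in $R$.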
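Right-Linear & Regular (cont’d) • Claim: for every $w \in \Sigma^*$ and $A \in V$, $S \Rightarrow^* wA$ in $n$ steps iff $N$ can move from $S$ to $A$ in $n$ transitions whose labels concatenate to $w$. Pf: easy induction on n. The ($\Leftarrow$) proof direction follows since $w \in L(G)$ iff $N$ has a computation from $S$ to $f$ reading $w$.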
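The DFA-to-grammar direction of the proof (one rule per transition, plus an ε-rule per final state) can be sketched as follows. The function names `dfa_to_right_linear` and `derive`, the rule encoding, and the example DFA (even number of a's over {a, b}) are assumptions for illustration; state names double as grammar variables and are single characters.

```python
from typing import Dict, List, Optional, Set, Tuple

# A rule (A, a, B) encodes A -> aB; (A, "", None) encodes A -> ε.
Rule = Tuple[str, str, Optional[str]]


def dfa_to_right_linear(delta: Dict[Tuple[str, str], str],
                        finals: Set[str]) -> List[Rule]:
    """Build the right-linear rules from a DFA transition table."""
    rules: List[Rule] = [(q, a, q2) for (q, a), q2 in sorted(delta.items())]
    rules += [(q, "", None) for q in sorted(finals)]
    return rules


def derive(rules: List[Rule], start: str, word: str) -> Optional[List[str]]:
    """Return the derivation start =>* word as sentential forms, or None."""
    step = {(A, a): B for (A, a, B) in rules if B is not None}
    can_stop = {A for (A, _, B) in rules if B is None}
    var, prefix, forms = start, "", [start]
    for ch in word:
        if (var, ch) not in step:
            return None
        var = step[(var, ch)]
        prefix += ch
        forms.append(prefix + var)   # form is always "terminals + one variable"
    if var not in can_stop:
        return None
    forms.append(prefix)             # final step applies var -> ε
    return forms


# Hypothetical DFA: state E = even number of a's seen (start, final), O = odd.
DELTA = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
RULES = dfa_to_right_linear(DELTA, {"E"})

if __name__ == "__main__":
    print(derive(RULES, "E", "aba"))
```

Because the grammar comes from a DFA, each sentential form has exactly one variable at its right end, and that variable is the DFA state reached so far; this is the content of the claim proved by induction.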
Ex: Right-Linear $\leftrightarrow$ FA • Ex: • Ex: [diagrams omitted; the second uses the new accept state f] • “useless” rules can be eliminated
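Pushdown Automaton • Defn 2.12: A pushdown automaton M is a 6-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, F)$ where • $Q$ is a finite set, the states • $\Sigma$ is a finite set, the input alphabet • $\Gamma$ is a finite set, the stack alphabet • $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \times (\Gamma \cup \{\varepsilon\}) \to \mathcal{P}(Q \times (\Gamma \cup \{\varepsilon\}))$ is the transition function • $q_0 \in Q$ is the start state • $F \subseteq Q$ is the set of accept (final) states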
Pushdown Automaton [figure omitted: a finite control reading an input string in $\Sigma^*$ (seen part, current input symbol, part to come) and a stack in $\Gamma^*$ with its top and bottom marked; no end-marker is supplied] • configuration: (state, rest of input, stack)
PDA (cont’d) • Initially: the finite control is in the start state $q_0$, the whole input $w$ is unread, and the stack is empty • configuration: $(q_0, w, \varepsilon)$
PDA (cont’d) • Transition: [figure omitted: finite control, input, and stack before and after one move, with the two corresponding configurations]
PDA (cont’d) • Can have • $\varepsilon$-move: consume no input • Pop-move: erase top stack symbol • Push-only move: ignore the stack contents • Any combination is possible
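PDA (cont’d) • Finally: configuration $(q, \varepsilon, \gamma)$ with the input fully read • Defn: $M$ recognizes $w$ iff $(q_0, w, \varepsilon) \vdash^* (q, \varepsilon, \gamma)$ for some $q \in F$ and some $\gamma \in \Gamma^*$ • Defn: $L(M) = \{ w \in \Sigma^* \mid M \text{ recognizes } w \}$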
Example: PDA • Recognizer for palindromes with center-mark [state diagram and two traces omitted: one input is accepted, the other is not accepted (blocked)]
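A PDA of this kind can be simulated by a breadth-first search over configurations (state, input position, stack). Everything specific below is an assumption for illustration: the state names q0 to q3, the bottom marker `$`, the alphabet {a, b} with center-mark c, the function name `accepts`, and the cap on stack height that keeps the search finite.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

# delta maps (state, input symbol or "", stack top or "") to a set of
# (next state, string pushed in place of the popped top). Stack top = index 0.
Delta = Dict[Tuple[str, str, str], Set[Tuple[str, str]]]

# Hypothetical PDA for { w c w^R : w in {a,b}* }.
DELTA: Delta = {
    ("q0", "", ""): {("q1", "$")},    # push bottom marker
    ("q1", "a", ""): {("q1", "a")},   # push first half
    ("q1", "b", ""): {("q1", "b")},
    ("q1", "c", ""): {("q2", "")},    # center-mark: switch to popping
    ("q2", "a", "a"): {("q2", "")},   # match second half against stack
    ("q2", "b", "b"): {("q2", "")},
    ("q2", "", "$"): {("q3", "")},    # stack back to bottom marker: accept
}


def accepts(delta: Delta, start: str, finals: Set[str], word: str,
            max_stack: Optional[int] = None) -> bool:
    """True iff some computation ends in a final state with all input read."""
    if max_stack is None:
        max_stack = len(word) + 2     # assumed bound, enough for this example
    first = (start, 0, "")
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if state in finals and pos == len(word):
            return True
        inputs = [""] + ([word[pos]] if pos < len(word) else [])
        tops = [""] + ([stack[0]] if stack else [])
        for a in inputs:
            for x in tops:
                for nstate, push in delta.get((state, a, x), ()):
                    cfg = (nstate, pos + len(a), push + stack[len(x):])
                    if len(cfg[2]) <= max_stack and cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False


if __name__ == "__main__":
    for w in ["abcba", "c", "abcab", "ab"]:
        print(repr(w), accepts(DELTA, "q0", {"q3"}, w))
```

A blocked computation is simply a configuration with no successors; the search then runs out of configurations and reports that the input is not accepted.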
Example: PDA w/ nondeterminism • Last example (palindromes with center-mark) was a deterministic PDA (DPDA) • NPDA for … [state diagram and trace omitted: the marked move is a nondeterministic “guess”; the traced computation does not accept (blocked)]
Example: PDA • Recall well-nested parentheses, e.g. (()) and (()()) • This PDA is a DPDA!
Example: PDA • “guesses” which pattern • “checks” whether the guess is correct • accepts iff some guess is correct and passes the check
CFG $\to$ PDA • Thm 2.20: A language is CF $\iff$ some PDA recognizes it. • There are algorithms for converting a grammar to an equivalent automaton, and conversely. • Lemma 2.21: There is an algorithm for constructing, from any CFG G, a PDA M such that L(G) = L(M). Pf: In constructing a PDA, we can permit, without losing generality, “multi-push” moves, which push a string $Y_1 Y_2 \cdots Y_k$ of $k \ge 2$ stack symbols in a single step. We may break a multi-push into a sequence of single-push moves by introducing new states. Henceforth we will allow multi-push moves in our PDAs.
CFG $\to$ PDA • Idea: use nondeterminism. Given G, construct PDA P to • Load S on the stack & simulate a leftmost derivation on the stack: • When a variable symbol A comes to the stack top, “guess” a grammar rule $A \to \alpha$, pop A and push $\alpha$ • When a terminal character comes to the stack top, compare it to the next input symbol. • If they match, pop the top and advance the input (“check off”) • If they fail to match, jam (not an accepting computation) • If the input holds a word in L(G) and P guesses the correct leftmost derivation (rules to apply), then all the input characters will be checked off against those at the top of the stack and the stack will empty as the last input is checked off. Otherwise at some point the PDA will jam
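CFG $\to$ PDA (cont’d) • Given $G = (V, \Sigma, R, S)$ construct PDA $P$ • States: a start state, a loop state, and an accept state • Input alphabet: $\Sigma$ • Stack alphabet: the variables, the terminals, and a bottom-of-stack marker • Start state: the start state above • Accept states: the single accept state • Transition function: • Initialize stack: push the bottom marker and then $S$, and enter the loop state • Simulate rules: for each rule $A \to \alpha$ in $R$, with $A$ on top and reading no input, pop $A$ and push $\alpha$ • Check off terminals: for each $a \in \Sigma$, with $a$ on top and $a$ the next input symbol, pop $a$ and advance the input • Detect null stack & accept: with the bottom marker on top, pop it and enter the accept state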
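The stack discipline of this construction can be tried out with a direct simulation: rather than building the transition function explicitly, the sketch below searches over configurations (input position, stack), expanding variables and checking off terminals. The grammar S → (S)S | ε, the name `cfg_pda_accepts`, and the two pruning bounds are assumptions; the cap on stack length is only there to keep the search finite and may be too small for grammars other than this example.

```python
from collections import deque
from typing import Dict, List

# Hypothetical grammar (an assumption): S -> (S)S | ε
RULES: Dict[str, List[str]] = {"S": ["(S)S", ""]}


def cfg_pda_accepts(rules: Dict[str, List[str]], start: str, word: str) -> bool:
    """Simulate the PDA that runs a leftmost derivation on its stack.
    The stack is a string with its top at index 0."""
    first = (0, start)                       # load the start variable
    seen, queue = {first}, deque([first])
    while queue:
        pos, stack = queue.popleft()
        if not stack:
            if pos == len(word):
                return True                  # stack empties as input ends
            continue
        top, rest = stack[0], stack[1:]
        if top in rules:
            # "Guess" a rule top -> rhs: pop the variable, push rhs.
            successors = [(pos, rhs + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            successors = [(pos + 1, rest)]   # check off a matching terminal
        else:
            successors = []                  # jam
        for cfg in successors:
            terminals = sum(1 for s in cfg[1] if s not in rules)
            # Every stacked terminal must still be matched against input.
            fits = terminals <= len(word) - cfg[0]
            small = len(cfg[1]) <= 2 * len(word) + 2   # assumed cap
            if fits and small and cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False


if __name__ == "__main__":
    for w in ["(())", "()()", "(()", ")("]:
        print(repr(w), cfg_pda_accepts(RULES, "S", w))
```

The breadth-first search plays the role of nondeterminism: it follows every possible guess of rule, and the input is accepted if any one sequence of guesses checks off the whole input.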
CFG $\to$ PDA (cont’d) • Ex: [example omitted: a grammar G, the PDA P constructed from it, and a CFG leftmost derivation shown side by side with the corresponding PDA computation]
PDA $\to$ CFG • Lemma 2.27: There is an algorithm for constructing, from any PDA P, a CFG G such that L(G) = L(P). • Pf: Given a PDA we can convert it into a PDA with the following simplified structure: • it has only one accept state: • add $\varepsilon$-transitions from multiple accept states • it empties its stack just before entering the accept state: • Loop on a state that just pops: • each PDA transition is either a “pure push” • or a “pure pop” • introduce new intermediate states
PDA $\to$ CFG (cont’d) • [two diagrams omitted: each shows a transition that is neither a pure push nor a pure pop and what it becomes] • Idea of proof: construct G with variables $A_{pq}$ for each p and q in the set of states Q. Arrange that if $A_{pq}$ generates terminal string x, then PDA P started in state p with an empty stack on input string x has a computation that reaches state q with an empty stack. And conversely, if P started in state p with an empty stack has a computation on input string x that reaches state q with an empty stack, then $A_{pq} \Rightarrow^* x$. • How does P, when started on an empty stack in state p, operate on an input string x, ending with an empty stack in state q? • First move must be a push • Last move must be a pop
PDA $\to$ CFG (cont’d) • Trace computation of P on x starting in state p with empty stack, and ending in state q with empty stack: (1) stack never empties [Fig. 1 omitted: stack height plotted against input]
PDA $\to$ CFG (cont’d) • Trace computation of P on x starting in state p with empty stack, and ending in state q with empty stack: (2) stack empties somewhere [Fig. 2 omitted: stack height plotted against input]
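PDA $\to$ CFG (cont’d) • Construction. Given PDA $P = (Q, \Sigma, \Gamma, \delta, q_0, \{q_f\})$ in the simplified form, construct $G = (V, \Sigma, R, A_{q_0 q_f})$ with $V = \{A_{pq} \mid p, q \in Q\}$ and the following rules in R: • (1) If $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$ then $A_{pq} \to a A_{rs} b$ • (2) $A_{pq} \to A_{pr} A_{rq}$ for all $p, q, r \in Q$ • (3) $A_{pp} \to \varepsilon$ for all $p \in Q$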
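Generating the grammar rules from a simplified PDA is mechanical, as the following sketch suggests. It assumes the standard three rule shapes for this construction ($A_{pq} \to a A_{rs} b$ for matching push-pop pairs, $A_{pq} \to A_{pr} A_{rq}$, and $A_{pp} \to \varepsilon$); the transition encoding, the name `pda_to_cfg`, and the four-state example PDA are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Assumed encodings for a PDA whose moves are pure pushes or pure pops:
#   push (p, a, X, r): in state p reading a (or "" for ε), push X, go to r
#   pop  (s, b, X, q): in state s reading b (or "" for ε), pop X, go to q
Push = Tuple[str, str, str, str]
Pop = Tuple[str, str, str, str]
# A rule is (variable, right-hand side); the variable A_pq is the pair (p, q).
Rule = Tuple[Tuple[str, str], list]


def pda_to_cfg(states: List[str], pushes: List[Push], pops: List[Pop]) -> List[Rule]:
    rules: List[Rule] = []
    # Kind 1: a push of X paired with a pop of the same X gives A_pq -> a A_rs b.
    for (p, a, X, r) in pushes:
        for (s, b, Y, q) in pops:
            if X == Y:
                rhs = [sym for sym in (a, (r, s), b) if sym != ""]
                rules.append(((p, q), rhs))
    # Kind 2: A_pq -> A_pr A_rq for every triple of states.
    for p, r, q in product(states, repeat=3):
        rules.append(((p, q), [(p, r), (r, q)]))
    # Kind 3: A_pp -> ε.
    for p in states:
        rules.append(((p, p), []))
    return rules


# Hypothetical PDA: push $, push X per 0, pop X per 1, pop $ and accept.
STATES = ["s", "q", "r", "f"]
PUSHES = [("s", "", "$", "q"), ("q", "0", "X", "q")]
POPS = [("q", "1", "X", "r"), ("r", "1", "X", "r"),
        ("r", "", "$", "f"), ("q", "", "$", "f")]
RULES = pda_to_cfg(STATES, PUSHES, POPS)

if __name__ == "__main__":
    for lhs, rhs in RULES[:4]:
        print(lhs, "->", rhs)
```

The second loop always produces $|Q|^3$ rules (27 for a three-state PDA), most of which involve variables that generate nothing and can be pruned as useless.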
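PDA $\to$ CFG (cont’d) • Claim 2.30: If $A_{pq} \Rightarrow^* x$ then $(p, x, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$. Pf: by induction on the length k of a derivation in G. Base: k=1. The only derivations of length 1 are $A_{pp} \Rightarrow \varepsilon$ and we have $(p, \varepsilon, \varepsilon) \vdash^* (p, \varepsilon, \varepsilon)$. Step: Assume (IH) true for derivations of at most k steps. Want Claim true for derivations of k+1 steps. Suppose that $A_{pq} \Rightarrow^{k+1} x$. The first derivation step is either of the form $A_{pq} \Rightarrow a A_{rs} b$ or $A_{pq} \Rightarrow A_{pr} A_{rq}$. Case $A_{pq} \Rightarrow a A_{rs} b$. Then $x = ayb$ with $A_{rs} \Rightarrow^{k} y$. So by IH $(r, y, \varepsilon) \vdash^* (s, \varepsilon, \varepsilon)$. By construction, since $A_{pq} \to a A_{rs} b$ is a rule of G, there is a stack symbol $X$ with $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$, so $(p, ayb, \varepsilon) \vdash (r, yb, X) \vdash^* (s, b, X) \vdash (q, \varepsilon, \varepsilon)$.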
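PDA $\to$ CFG (cont’d) • Case $A_{pq} \Rightarrow A_{pr} A_{rq}$. Then $x = yz$ with $A_{pr} \Rightarrow^* y$ and $A_{rq} \Rightarrow^* z$, each in at most k steps. So by IH $(p, y, \varepsilon) \vdash^* (r, \varepsilon, \varepsilon)$ and $(r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$. Putting these together: $(p, yz, \varepsilon) \vdash^* (r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$.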
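PDA $\to$ CFG (cont’d) • Claim 2.31: If $(p, x, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$ then $A_{pq} \Rightarrow^* x$. Pf: by induction on the length k of a computation in P. Base: k=0. The only computations of length 0 are $(p, \varepsilon, \varepsilon) \vdash^0 (p, \varepsilon, \varepsilon)$ where $x = \varepsilon$. By construction $A_{pp} \to \varepsilon$ is a rule of G. Step: Assume (IH) true for computations of at most k steps. Want Claim true for computations of k+1 steps. Suppose that $(p, x, \varepsilon) \vdash^{k+1} (q, \varepsilon, \varepsilon)$. Two cases: either the stack does not empty in the midst of this computation (Fig. 1) or it becomes empty during the computation (Fig. 2). Call these Case 1 and Case 2.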
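PDA $\to$ CFG (cont’d) • Case 1: See Fig. 1. The symbol X pushed in the 1st move is the same as that popped in the last move. Let the 1st and last moves be governed by the push/pop transitions $(r, X) \in \delta(p, a, \varepsilon)$ and $(q, \varepsilon) \in \delta(s, b, X)$. By construction, there is a rule $A_{pq} \to a A_{rs} b$ in G. Let x = ayb. Since X stays on the stack between the first and last moves, we must have $(r, y, \varepsilon) \vdash^{k-1} (s, \varepsilon, \varepsilon)$. By IH $A_{rs} \Rightarrow^* y$. Then, using $A_{pq} \to a A_{rs} b$, we conclude $A_{pq} \Rightarrow a A_{rs} b \Rightarrow^* ayb = x$.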
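PDA $\to$ CFG (cont’d) • Case 2: See Fig. 2. Let r be the intermediate state where the stack becomes empty. Then $x = yz$ with $(p, y, \varepsilon) \vdash^* (r, \varepsilon, \varepsilon)$ and $(r, z, \varepsilon) \vdash^* (q, \varepsilon, \varepsilon)$, each in at most k steps. By the IH, $A_{pr} \Rightarrow^* y$ and $A_{rq} \Rightarrow^* z$. Since by construction there is a rule in G of the form $A_{pq} \to A_{pr} A_{rq}$, then $A_{pq} \Rightarrow A_{pr} A_{rq} \Rightarrow^* yz = x$.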
PDA $\to$ CFG (cont’d) • Ex: Rules of G: (1) push-pop pairs (1st kind):
PDA $\to$ CFG (cont’d) • Note: If (p´ unreachable) then (abbreviated ). Such variables are useless; all rules involving them on left or right sides can be eliminated as useless productions. For this grammar • (2) Rules of the 2nd kind (with useless rules removed: only 10/27 survive) in the order s, q, f: