Computability • Joke • Context-free grammars • Parsing • Chomsky normal form • Homework: Design a grammar for a [simple] computer language
Proof by induction • Requires the subject domain to be indexed by the natural numbers: 0 or 1 or some other starting point, and then all the numbers following it • Prove a starting case, for example, N=1 • Then prove either • if it is true for k, then it is true for k+1, FOR ALL k, or • if it is true for all p<k, then it is true for k, FOR ALL k • Think of the induction step as a shortcut for proving the theorem for 2, 3, 4, 5, … SO, with my screaming capital letters as a hint, what was wrong with the "All horses are the same color" proof?
Preview on proofs • Another typical form of proof is by construction • build / design the FSM, etc. • Another is by contradiction: assume the result is false and show that this leads to some falsehood • One category is: assuming you can make a list of all Xs… then some special example must be on the list, but then …
Hierarchy • Moving from languages defined by FSMs (aka finite state automata), which are equivalent to non-deterministic FSMs and to regular expressions, to • languages defined by context-free grammars, equivalent to [non-deterministic] push-down automata • It will turn out that deterministic PDAs are less powerful. • An FSM can be considered a special type of PDA, one that never uses its stack.
Grammar • A grammar has a [finite] alphabet A (sometimes called terminals) plus a finite set V of variables. • The starting symbol S is a member of V. • A production rule is a mapping/substitution of strings. • A grammar has a finite set of production rules. • A context-free grammar has production rules of the form: a single variable → a string of symbols from A and V • V → string of symbols from A and V • A non-context-free production rule would be • aVb → adWb, meaning that when V is in the context of a and b, you can substitute dW • Can combine production rules with the same left-hand side using |, e.g., V → x | y means V → x and V → y
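A minimal Python sketch of these definitions, under representational assumptions of our own: each rule is a pair of symbol tuples, symbols are separated by spaces, `->` stands for the arrow, and an empty right-hand side stands for the empty string. The names `Rule`, `parse_rule_line`, and `is_context_free` are hypothetical helpers, not from any library. A rule set is context-free exactly when every left-hand side is a single variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production rule: replace the symbols in lhs by the symbols in rhs."""
    lhs: tuple
    rhs: tuple  # an empty tuple stands for the empty string ε


def parse_rule_line(line):
    """Split 'V -> x | y z' into one Rule per alternative separated by |."""
    left, right = line.split("->")
    lhs = tuple(left.split())
    return [Rule(lhs, tuple(alt.split())) for alt in right.split("|")]


def is_context_free(rules, variables):
    """Context-free: every left-hand side is exactly one variable."""
    return all(len(r.lhs) == 1 and r.lhs[0] in variables for r in rules)


cf_rules = parse_rule_line("E -> ( E ) | E OP E | F")
print(len(cf_rules), is_context_free(cf_rules, {"E", "F", "OP"}))  # 3 True

non_cf = parse_rule_line("a V b -> a d W b")
print(is_context_free(non_cf, {"V", "W"}))  # False
```

The `|` shorthand is just a way of writing several rules with the same left-hand side, which is why `parse_rule_line` returns a list.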
Derivations • Starting from S and applying rules until there are no more variables, just terminals, is a derivation. • A string is in the language defined by the grammar if there is a derivation of it. • Think of the variables as the parts of speech.
Example • Let the alphabet of terminals be: (, ), +, *, v, w, x, y, z • Let the variables be • E, the starting symbol, think of it as expression • F, factor • OP, operator (I use two letters for readability) • Rules are • E → ( E ) | E OP E | F • F → v | w | x | y | z • OP → + | *
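To connect derivations to the language, the sketch below mechanically generates every string of the example grammar up to a given length. It always expands the leftmost variable and explores sentential forms breadth-first; `GRAMMAR` and `strings_up_to` are hypothetical names. Pruning forms longer than the limit is only safe because this grammar has no ε-rules, so a sentential form can never get shorter.

```python
from collections import deque

# The example grammar: each variable maps to its alternatives (tuples of symbols).
GRAMMAR = {
    "E": [("(", "E", ")"), ("E", "OP", "E"), ("F",)],
    "F": [("v",), ("w",), ("x",), ("y",), ("z",)],
    "OP": [("+",), ("*",)],
}


def strings_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Assumes no ε-rules, so every symbol yields at least one character."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if only terminals remain
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))  # a finished derivation
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


words = strings_up_to(GRAMMAR, "E", 3)
print(len(words))  # 60
```

Up to length 3 this should give 60 strings: the 5 single letters, 5 strings such as (x), and 50 strings such as x+y (5 letters × 2 operators × 5 letters).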
Sample derivation (rules applied in order): E → E OP E, E → ( E ), E → E OP E, E → F, F → v, OP → +, E → F, F → w, etc. FINISH! Draw as a tree. Trees in computer science are upside down!
Parsing a string is producing a set of rules, often recorded as a tree, that derive (cover) the string. So for the string (x+y): E → ( E ), E → E OP E, E → F, F → x, OP → +, E → F, F → y
Parsing • If there isn't a parse tree, then the string isn't in the language, though showing that no parse tree exists may require some proof…
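A brute-force parser for the example grammar, as a sketch: it assumes single-character terminals, ignores spaces, and encodes a tree as nested tuples (our own choice; `parse` is a hypothetical name). For each substring it tries the rules E → F, E → ( E ), and E → E OP E, memoizing results, and returns one parse tree, or `None` if no tree exists, which is the "not in the language" case. Every recursive call is on a strictly shorter substring, so the left-recursive rule E → E OP E cannot loop forever here, as it would in naive recursive descent.

```python
from functools import lru_cache

LETTERS = set("vwxyz")  # F -> v | w | x | y | z
OPERATORS = set("+*")   # OP -> + | *


def parse(text):
    """Return one parse tree for text, or None if E cannot derive it."""
    s = text.replace(" ", "")

    @lru_cache(maxsize=None)
    def parse_e(i, j):
        # Can E derive s[i:j]? Return a tree or None.
        # E -> F, F -> letter
        if j - i == 1 and s[i] in LETTERS:
            return ("E", ("F", s[i]))
        # E -> ( E )
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            inner = parse_e(i + 1, j - 1)
            if inner is not None:
                return ("E", "(", inner, ")")
        # E -> E OP E: try each operator position as the split point
        for k in range(i + 1, j - 1):
            if s[k] in OPERATORS:
                left, right = parse_e(i, k), parse_e(k + 1, j)
                if left is not None and right is not None:
                    return ("E", left, ("OP", s[k]), right)
        return None

    return parse_e(0, len(s)) if s else None


print(parse("(x+y)"))
print(parse("x+"))  # None
```

For (x+y) the returned tree encodes exactly the rules E → ( E ), E → E OP E, E → F, F → x, OP → +, E → F, F → y. For an ambiguous string it returns just one of the possible trees.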
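Derivation vs Parsing • Opposite directions: a derivation starts from the start symbol and produces a string; parsing starts from the string and works back to the start symbol. • The goal of parsing is to find a derivation that generates the string. • In compiling, parsing produces information that directs the compiler to generate code.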
Exercises Produce the tree(s) for • x • x + (v*w) • x + y * z • (x*y)+(v*w) • When are trees the same and when are they different? • Ambiguity is when the trees are really different, not just expanded in a different order. This will be made formal next.
Leftmost derivation • A derivation of a string w in a grammar G is a leftmost derivation if at each step the leftmost remaining variable is the one replaced. • A string is derived ambiguously in a CF grammar if it has two or more different leftmost derivations. A grammar G is ambiguous if it generates some string ambiguously.
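Applied in the order listed, the rules of the sample derivation always rewrite the leftmost remaining variable, so that derivation is leftmost. The hypothetical helper below checks this step by step, using a dict encoding of the example grammar (our own representation), and records each sentential form.

```python
GRAMMAR = {
    "E": [("(", "E", ")"), ("E", "OP", "E"), ("F",)],
    "F": [("v",), ("w",), ("x",), ("y",), ("z",)],
    "OP": [("+",), ("*",)],
}

# The rule sequence from the sample derivation.
SAMPLE_STEPS = [
    ("E", ("E", "OP", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("E", "OP", "E")),
    ("E", ("F",)),
    ("F", ("v",)),
    ("OP", ("+",)),
    ("E", ("F",)),
    ("F", ("w",)),
]


def leftmost_derivation(grammar, start, steps):
    """Apply (variable, rhs) steps in order, each to the leftmost variable.

    Returns every sentential form as a space-separated string, and raises
    ValueError if a step does not rewrite the leftmost remaining variable."""
    form = [start]
    forms = [start]
    for var, rhs in steps:
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None or form[idx] != var or rhs not in grammar[var]:
            raise ValueError(f"{var} -> {' '.join(rhs)} is not a valid leftmost step")
        form[idx:idx + 1] = rhs  # splice the right-hand side in place of var
        forms.append(" ".join(form))
    return forms


for line in leftmost_derivation(GRAMMAR, "E", SAMPLE_STEPS):
    print(line)
```

The last line printed should be ( v + w ) OP E; the remaining OP E is what "etc. FINISH!" leaves to the reader.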
Compare for ambiguity • Variables E, T (for term), F (for factor), alphabet {a, +, *, (, )}. Rules: E → E+T | T, T → T*F | F, F → (E) | a • Variables E, alphabet {a, +, *, (, )}. Rules: E → E+E | E*E | (E) | a • Try each on the strings: a+a*a, a+(a*a), (a+a)*a, a+a+a+a
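One way to compare the two grammars is to count parse trees, which correspond one-to-one with leftmost derivations. The sketch below (`count_parse_trees` is a hypothetical name) counts by memoized recursion over substrings; it assumes single-character terminals, no ε-rules, and no cycle of unit rules such as A → B, B → A, all of which hold for both grammars here.

```python
from functools import lru_cache

G1 = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
G2 = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("a",)]}
STRINGS = ["a+a*a", "a+(a*a)", "(a+a)*a", "a+a+a+a"]


def count_parse_trees(grammar, start, text):
    """Number of distinct parse trees (= leftmost derivations) of text."""

    @lru_cache(maxsize=None)
    def count_var(var, i, j):
        # trees rooted at var covering text[i:j]
        return sum(count_seq(rhs, i, j) for rhs in grammar[var])

    @lru_cache(maxsize=None)
    def count_seq(seq, i, j):
        # ways the symbol sequence seq can cover text[i:j]
        if not seq:
            return 1 if i == j else 0
        first, rest = seq[0], seq[1:]
        if first not in grammar:  # terminal: must match the next character
            return count_seq(rest, i + 1, j) if i < j and text[i] == first else 0
        total = 0
        # no ε-rules, so each remaining symbol needs at least one character
        for k in range(i + 1, j - len(rest) + 1):
            total += count_var(first, i, k) * count_seq(rest, k, j)
        return total

    return count_var(start, 0, len(text))


for s in STRINGS:
    print(s, count_parse_trees(G1, "E", s), count_parse_trees(G2, "E", s))
```

The first grammar should give one tree for every string, while the second gives 2 trees for a+a*a and 5 for a+a+a+a (one per way of parenthesizing four operands), so the second grammar is ambiguous.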
Regular languages • All regular languages are context-free languages! • Proof: Consider the FSM that recognizes a language. Define the following context-free grammar: • the alphabet of the FSM is the terminal alphabet • let each state of the FSM be a variable, and let the initial state be the starting variable. • Rules are: if there is an arrow from state V to state W labeled with letter a, then add the production rule V → aW. If state X is an accepting state, add the rule X → ε • So the strings generated by the grammar are exactly the strings the FSM accepts: each derivation traces a path through the machine from the initial state to an accepting state.
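The construction in the proof can be sketched in Python. The FSM here (binary strings with an even number of 1s) is a hypothetical example, not from the slides, and all names are our own. Because every sentential form of such a grammar is some terminals followed by one variable, `grammar_generates` only tracks which variables can come next; the final check compares the grammar with the FSM on every string up to length 6.

```python
from itertools import product

# Hypothetical example FSM: strings over {0, 1} with an even number of 1s.
# States double as grammar variables.
START = "Q_even"
ACCEPTING = {"Q_even"}
DELTA = {
    ("Q_even", "0"): "Q_even", ("Q_even", "1"): "Q_odd",
    ("Q_odd", "0"): "Q_odd", ("Q_odd", "1"): "Q_even",
}


def fsm_to_grammar(delta, accepting):
    """V -> aW for each arrow V --a--> W; X -> ε (empty rhs) for accepting X."""
    rules = [(v, (a, w)) for (v, a), w in delta.items()]
    rules += [(x, ()) for x in sorted(accepting)]
    return rules


def grammar_generates(rules, start, s):
    """Track the set of variables that can follow the terminals derived so far."""
    current = {start}
    for ch in s:
        current = {rhs[1] for lhs, rhs in rules
                   if lhs in current and len(rhs) == 2 and rhs[0] == ch}
    # the derivation must end with some rule X -> ε
    return any(lhs in current and rhs == () for lhs, rhs in rules)


def fsm_accepts(s):
    state = START
    for ch in s:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


RULES = fsm_to_grammar(DELTA, ACCEPTING)
same = all(grammar_generates(RULES, START, "".join(w)) == fsm_accepts("".join(w))
           for n in range(7) for w in product("01", repeat=n))
print(same)  # True
```

Tracking a set of variables, rather than a single one, means the same check would also work for a grammar built from a non-deterministic FSM.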
CF languages • Each regular language is CF, but not vice versa… • Recall B = {0^i1^i | i ≥ 0}: the strings consisting of some number of 0s followed by the same number of 1s. This was shown to be non-regular. Let the grammar be S → 0S1 | ε
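The grammar S → 0S1 | ε can be read directly as a recursive test, where each call undoes one application of S → 0S1. A small sketch with hypothetical names:

```python
def derives_from_S(s):
    """True iff S derives s using S -> 0S1 | ε."""
    if s == "":
        return True  # S -> ε
    # S -> 0S1: strip the outer 0 ... 1; the middle must also come from S
    return len(s) >= 2 and s[0] == "0" and s[-1] == "1" and derives_from_S(s[1:-1])


def derive(i):
    """Apply S -> 0S1 i times, then S -> ε."""
    form = "S"
    for _ in range(i):
        form = form.replace("S", "0S1")
    return form.replace("S", "")


print(derive(3))                                        # 000111
print(derives_from_S("0011"), derives_from_S("0101"))   # True False
```

The single S in every sentential form is what lets the grammar "remember" how many 0s still need matching 1s, which no FSM can do.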
Chomsky normal form • A CF grammar is in Chomsky normal form if each rule is of the form • A → BC • A → a where A, B, and C are variables, a is any terminal, and B and C are not the start variable S. It is permitted (but not required) to have the rule S → ε, but no other variable can produce the empty string. • There are several other normal forms.
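A sketch of a Chomsky-normal-form checker, assuming rules are (lhs, rhs) pairs and that every variable appears on some left-hand side, so any other symbol counts as a terminal (`is_cnf` is a hypothetical name). The example input is the CNF grammar for B = {0^i1^i | i ≥ 0} worked out in these notes.

```python
def is_cnf(rules, start):
    """Check Chomsky normal form for a list of (lhs, rhs_tuple) rules."""
    variables = {lhs for lhs, _ in rules}
    for lhs, rhs in rules:
        if len(rhs) == 0:  # only S -> ε is allowed
            if lhs != start:
                return False
        elif len(rhs) == 1:  # A -> a, with a a terminal
            if rhs[0] in variables:
                return False
        elif len(rhs) == 2:  # A -> BC, B and C variables other than the start
            if any(sym not in variables or sym == start for sym in rhs):
                return False
        else:
            return False
    return True


cnf_example = [("S0", ("A0", "A1")), ("S0", ("A0", "A3")), ("S0", ()),
               ("S", ("A0", "A1")), ("S", ("A0", "A3")),
               ("A0", ("0",)), ("A1", ("1",)), ("A3", ("S", "A1"))]
print(is_cnf(cnf_example, "S0"))                          # True
print(is_cnf([("S", ("0", "S", "1")), ("S", ())], "S"))   # False
```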
Outline of proof for Chomsky NF: Any context-free language can be generated by a grammar in Chomsky normal form. • Create a new start variable S0 and rule S0 → S, so the start variable never appears on the right-hand side • Eliminate A → ε rules. If there is a rule R → uAv, add the rule R → uv. If R → uAvAw, add R → uvAw | uAvw | uvw • Remove unit rules A → B. If B → u, then add A → u (unless A → u is a unit rule previously removed) • If A → u1u2…uk and k ≥ 3, add new variables Ai and replace the rule with A → u1A1, A1 → u2A2, …, A(k-2) → u(k-1)uk. In any rule A → u1u2, replace each terminal ui with a new variable Ui and add the rule Ui → ui • Read on-line, the Sipser text on reserve, videos, etc. for the complete proof.
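The last step of the outline (replace terminals in long rules by new variables, then break rules with three or more symbols into a chain) can be sketched as follows. It assumes the ε-rule and unit-rule steps are already done; `binarize` and the generated names `U0`, `S0_1`, etc. are hypothetical. Rules A → a and S0 → ε are left alone, since they are already allowed.

```python
from itertools import count


def binarize(rules, variables):
    """Last CNF step (sketch) for (lhs, rhs_tuple) rules with no unit rules."""
    fresh = count(1)
    term_var = {}
    out = []

    def as_variable(sym):
        if sym in variables:
            return sym
        if sym not in term_var:  # first time we see this terminal: add U_t -> t
            term_var[sym] = "U" + sym
            out.append((term_var[sym], (sym,)))
        return term_var[sym]

    for lhs, rhs in rules:
        if len(rhs) < 2:  # A -> a and S0 -> ε are already fine
            out.append((lhs, rhs))
            continue
        syms = [as_variable(s) for s in rhs]
        current = lhs
        while len(syms) > 2:  # A -> u1 A_1, A_1 -> u2 A_2, ...
            new_var = f"{lhs}_{next(fresh)}"
            out.append((current, (syms[0], new_var)))
            current, syms = new_var, syms[1:]
        out.append((current, tuple(syms)))
    return out


# Rules for B after the ε-rule and unit-rule steps of the example.
rules = [("S0", ("0", "1")), ("S0", ("0", "S", "1")), ("S0", ()),
         ("S", ("0", "1")), ("S", ("0", "S", "1"))]
for lhs, rhs in binarize(rules, {"S0", "S"}):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Up to renaming (U0 as A0, U1 as A1, S_2 as A3), the output is the CNF grammar for B = {0^i1^i | i ≥ 0}; the sketch introduces a separate S0_1 where a hand conversion can reuse A3.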
Example: B = {0^i1^i | i ≥ 0}. Convert S → 0S1 | ε to CNF • Add a new start variable and rule: S0 → S • Remove S → ε; add S → 01 | 0S1 and S0 → ε • Replace the unit rule S0 → S: S0 → 01 | 0S1 | ε and S → 01 | 0S1 • Address the remaining problems by creating new variables: S0 → A0A1 | A0A3 | ε, S → A0A1 | A0A3, A0 → 0, A1 → 1, A3 → SA1 • Does this work (produce strings in the pattern)? Claim: yes, because notice that an A3 only arises if there was an A0 before it.
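One way to test "Does this work?" mechanically is the CYK algorithm, a standard membership test for grammars in Chomsky normal form (not mentioned on the slides). The sketch below encodes the final grammar above by hand (names are hypothetical) and compares CYK's answers with the pattern 0^i1^i on every 0/1 string up to length 8. This is evidence, not a proof.

```python
from itertools import product

# Final CNF grammar from the example: A -> BC rules and A -> a rules.
START = "S0"
BINARY = [
    ("S0", "A0", "A1"), ("S0", "A0", "A3"),
    ("S", "A0", "A1"), ("S", "A0", "A3"),
    ("A3", "S", "A1"),
]
TERMINAL = [("A0", "0"), ("A1", "1")]
START_DERIVES_EMPTY = True  # the rule S0 -> ε


def cyk(s):
    """CYK membership test: table[(i, n)] holds the variables deriving s[i:i+n]."""
    if s == "":
        return START_DERIVES_EMPTY
    table = {}
    for i, ch in enumerate(s):
        table[(i, 1)] = {a for a, t in TERMINAL if t == ch}
    for length in range(2, len(s) + 1):
        for i in range(len(s) - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                cell |= {a for a, b, c in BINARY if b in left and c in right}
            table[(i, length)] = cell
    return START in table[(0, len(s))]


def in_B(s):
    """Direct check of the pattern 0^i 1^i."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half


agree = all(cyk("".join(bits)) == in_B("".join(bits))
            for n in range(9) for bits in product("01", repeat=n))
print(agree)  # True
```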
Intuition… • Context-free grammars appear to be able to keep track of things, such as matching the number of 0s to the number of 1s… • Even the leftmost derivation rule still has something like recursion.
Preview • Will define push-down automata, a type of machine equivalent to context-free grammars for defining languages • Pumping lemma
Classwork/Homework • Create a grammar for a simple programming language: • assignment statements • if statements • function calls • expressions can include function calls as well as operators and parentheses • terminals are names and numbers (lexical units) plus operators (+ and *) and parentheses, brackets, and ;