Module 28 • Context Free Grammars • Definition of a grammar G • Deriving strings and defining L(G) • Context-Free Language definition
Context-Free Grammars Definition
Definition • A context-free grammar G = (V, Σ, S, P) • V: finite set of variables (nonterminals) • Σ: finite set of characters (terminals) • S: start variable • element of V • role is similar to that of q0 for an FSA or NFA • P: finite set of grammar rules or production rules • Syntax of a production • variable --> string of variables and terminals
English Context-Free Grammar • ECFG = (V, Σ, S, P) • V = {<sentence>, <noun phrase>, <verb phrase>, ... } • people sometimes use < > to delimit variables • In this course, we generally will use capital letters to denote variables • Σ = {a, b, c, ..., z, ;, ,, ., ...} • S = <sentence> • P = { <sentence> --> <noun phrase> <verb phrase> <pct>, <noun phrase> --> <article> <adj> <noun>, ...}
{a^i b^i | i > 0} CFG • ABG = (V, Σ, S, P) • V = {S} • Σ = {a, b} • S = S • P = {S --> aSb, S --> ab} or S --> aSb | ab • second format saves some space
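A minimal sketch, not from the original slides, of how the 4-tuple could be written down as Python data. The class name `Grammar`, its field names, and the choice to store each production as a (variable, right-hand side) pair of single-character symbols are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    start: str             # S
    productions: tuple     # P, as pairs (variable, right-hand side)

    def __post_init__(self):
        # The start variable must be an element of V.
        assert self.start in self.variables
        for lhs, rhs in self.productions:
            # Left side is a single variable; right side is a string
            # of variables and terminals.
            assert lhs in self.variables
            assert all(sym in self.variables | self.terminals for sym in rhs)

# The grammar ABG from the slide: S --> aSb | ab
ABG = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    start="S",
    productions=(("S", "aSb"), ("S", "ab")),
)
```

The checks in `__post_init__` mirror the definition: a production whose left side is not a single variable would be rejected.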
Context-Free Grammars Deriving strings, defining L(G), and defining context-free languages
Defining -->, ==> notation • First: --> notation • This is used to define the productions of a grammar • S --> aSb | ab • Second: ==>_G notation • This is used to denote the application of a production rule from a grammar G • S ==>_ABG aSb ==>_ABG aaSbb ==>_ABG aaabbb • We say that string S derives string aSb (in one step) • We say that string aSb derives string aaSbb (in one step) • We say that string aaSbb derives string aaabbb (in one step) • We often omit the grammar subscript when the intended grammar is unambiguous
Defining ==> continued • Third: ==>^k_G notation • This is used to denote k applications of production rules from a grammar G • S ==>^2_ABG aaSbb • We say that string S derives string aaSbb in two steps • aSb ==>^2_ABG aaabbb • We say that string aSb derives string aaabbb in two steps • We often omit the grammar subscript when the intended grammar is unambiguous
Defining ==> continued • Fourth: ==>*_G notation • This is used to denote 0 or more applications of production rules from a grammar G • S ==>*_ABG S • We say that string S derives string S in 0 or more steps • S ==>*_ABG aaSbb • We say that string S derives string aaSbb in 0 or more steps • aSb ==>*_ABG aaSbb • We say that string aSb derives string aaSbb in 0 or more steps • aSb ==>*_ABG aaabbb • We say that string aSb derives string aaabbb in 0 or more steps • We often omit the grammar subscript when the intended grammar is unambiguous
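The three "derives" relations (one step, k steps, 0 or more steps) can be sketched in Python as follows. This is an illustration under assumptions, not part of the slides: symbols are single characters, and the function names `one_step`, `k_steps`, and `derives` are made up here. Because "0 or more steps" is unbounded, `derives` takes an explicit step limit.

```python
# Productions of ABG: S --> aSb | ab
ABG_RULES = [("S", "aSb"), ("S", "ab")]

def one_step(form, rules):
    """All strings derivable from `form` by exactly one rule application."""
    results = set()
    for i, symbol in enumerate(form):
        for lhs, rhs in rules:
            if symbol == lhs:
                results.add(form[:i] + rhs + form[i + 1:])
    return results

def k_steps(form, rules, k):
    """All strings derivable from `form` in exactly k steps."""
    current = {form}
    for _ in range(k):
        current = {nxt for f in current for nxt in one_step(f, rules)}
    return current

def derives(form, target, rules, max_steps):
    """Check form ==>* target, trying 0, 1, ..., max_steps steps."""
    return any(target in k_steps(form, rules, k) for k in range(max_steps + 1))
```

For example, `k_steps("S", ABG_RULES, 2)` contains aaSbb, and `derives("S", "S", ABG_RULES, 0)` holds because zero steps are allowed.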
Defining derivations * • Derivation of a string x • The complete step by step derivation of a string x from the start variable S • Key fact: each step in a derivation makes only one application of a production rule from G • Example: Derivation of string aaabbb using ABG • S ==>_ABG aSb ==>_ABG aaSbb ==>_ABG aaabbb • Example 2: AG = (V, Σ, S, P) where P = S --> SS | a • Deriving string aaa • S ==> SS ==> Sa ==> SSa ==> aSa ==> aaa
Defining L(G) * • Generating strings • If S ==>*_G x, then grammar G generates string x • Note G generates strings which contain terminals and nonterminals • aSb contains nonterminals and terminals • S contains only nonterminals • aaabbb contains only terminals • L(G) • The set of strings over Σ generated by grammar G • Note we only consider terminal strings generated by G • {a^i b^i | i > 0} = L(ABG) • {a^i | i > 0} = L(AG)
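To make "only terminal strings count" concrete, here is a hedged sketch that enumerates the members of L(G) up to a length bound by breadth-first search over sentential forms. The function name `language_up_to` is hypothetical, symbols are assumed to be single characters, and the pruning step assumes no production has an empty right-hand side (true for ABG and AG, not for grammars with λ rules).

```python
from collections import deque

def language_up_to(rules, start, variables, max_len):
    """Terminal strings of length <= max_len generated from `start`.

    Assumes no rule shrinks a string, so any sentential form longer
    than max_len can be discarded.
    """
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(sym in variables for sym in form):
            found.add(form)      # terminals only: a member of L(G)
            continue
        for i, sym in enumerate(form):
            for lhs, rhs in rules:
                if sym == lhs:
                    nxt = form[:i] + rhs + form[i + 1:]
                    if len(nxt) <= max_len and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return found

ABG_RULES = [("S", "aSb"), ("S", "ab")]
AG_RULES = [("S", "SS"), ("S", "a")]
```

Forms such as aSb are visited during the search but never returned, which matches the slide's note that L(G) contains only terminal strings.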
Context-Free Languages * • Context-Free Languages • A language L is a context-free language (CFL) iff there exists a CFG G such that L(G) = L • Results so far • {a^i | i > 0} is a CFL • One CFG G such that L(G) = this language is AG • Note this language is also regular • {a^i b^i | i > 0} is a CFL • One CFG G such that L(G) = this language is ABG • Note this language is NOT regular
Example * • Let BAL = the set of strings over {(,)} in which the parentheses are balanced • Prove that BAL is a CFL • To prove this, you need to come up with a CFG BALG such that L(BALG) = BAL • BALG = (V, Σ, S, P) • V = {S} • Σ = {(, )} • S = S • P = ? • Give derivations of ((( ))) and ( )(( )) with your grammar
Module 29 • Parse/Derivation Trees • Leftmost derivations, rightmost derivations • Ambiguous Grammars • Examples • Arithmetic expressions • If-then-else Statements • Inherently ambiguous CFL’s
Context-Free Grammars Parse Trees Leftmost/rightmost derivations Ambiguous grammars
Parse Tree • Parse/derivation trees are structured derivations • The structure graphically illustrates semantic information about the string • Formalization of concept we encountered in regular languages unit • Note, what we saw before were not exactly parse trees as we define them now, but they were close
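Parse Tree Example • Parse tree for string ( )(( )) and grammar BALG • BALG = (V, Σ, S, P) • V = {S}, Σ = {(, )}, S = S • P = S --> SS | (S) | λ • One derivation of ( )(( )) • S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )) • Parse tree • [Figure: parse tree with root S and two S children; the left S expands to ( S ) with S --> λ; the right S expands to ( S ), whose inner S expands to ( S ) with S --> λ]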
Comments about Example * • Syntax • Draw a unique arrow from each variable to each character that is a direct child of that variable • A line instead of an arrow is ok • The derived string can be read in a left to right traversal of the leaves • Semantics • The tree graphically illustrates the nesting structure of the string of parentheses • [Figure: the parse tree for ( )(( ))]
Leftmost/Rightmost Derivations • There is more than one derivation of the string ( )(( )). • S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )) • S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S) ==> ( )((S)) ==> ( )(( )) • S ==> SS ==> S(S) ==> S((S)) ==> S(( )) ==> (S)(( )) ==> ( )(( )) • Leftmost derivation • Leftmost variable is always expanded • Which one of the above is leftmost? • Rightmost derivation • Rightmost variable is always expanded • Which one of the above is rightmost? • [Figure: the parse tree for ( )(( ))]
Comments • Fix a string and a grammar • Any derivation corresponds to a unique parse tree • Any parse tree can correspond to many different derivations • Example • The one parse tree corresponds to all three derivations • S ==> SS ==> (S)S ==> ( )S ==> ( )(S) ==> ( )((S)) ==> ( )(( )) • S ==> SS ==> (S)S ==> (S)(S) ==> ( )(S) ==> ( )((S)) ==> ( )(( )) • S ==> SS ==> S(S) ==> S((S)) ==> S(( )) ==> (S)(( )) ==> ( )(( )) • Unique mappings • For any parse tree, there is exactly one leftmost derivation and exactly one rightmost derivation that it corresponds to • [Figure: the parse tree for ( )(( ))]
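The unique leftmost (or rightmost) derivation can be read off a parse tree mechanically: keep a sentential form and always expand the leftmost (or rightmost) unexpanded variable. Below is a small sketch of that idea. The tree encoding (a node is a pair of a variable and a list of children, a leaf is a terminal string, and the empty string stands for λ) and the names `TREE`, `derivation`, and `render` are assumptions for illustration.

```python
# Parse tree for ( )(( )) under S --> SS | (S) | lambda.
TREE = ("S", [
    ("S", ["(", ("S", [""]), ")"]),
    ("S", ["(", ("S", ["(", ("S", [""]), ")"]), ")"]),
])

def render(form):
    """Show a sentential form: variables by name, terminals as themselves."""
    return "".join(item[0] if isinstance(item, tuple) else item for item in form)

def derivation(tree, leftmost=True):
    """List the sentential forms of the leftmost (or rightmost) derivation."""
    form = [tree]                  # unexpanded subtrees and terminal strings
    steps = [render(form)]
    while True:
        idxs = [i for i, item in enumerate(form) if isinstance(item, tuple)]
        if not idxs:
            return steps
        i = idxs[0] if leftmost else idxs[-1]
        form[i:i + 1] = form[i][1]  # replace the variable by its children
        steps.append(render(form))
```

With this encoding, `derivation(TREE)` walks through S, SS, (S)S, ()S, ()(S), ()((S)), ()(()), and `derivation(TREE, leftmost=False)` walks through S, SS, S(S), S((S)), S(()), (S)(()), ()(()).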
Example * • S ==> SS ==> SSS ==> (S)SS ==> ( )SS ==> ( )S ==> ( ) • The above is a leftmost derivation of the string ( ) from the grammar BALG • Draw the corresponding parse tree • Draw the corresponding rightmost derivation • S ==> (S) ==> (SS) ==> (S(S)) ==> (S( )) ==> (( )) • The above is a rightmost derivation of the string (( )) from the grammar BALG • Draw the corresponding parse tree • Draw the corresponding leftmost derivation
Ambiguous Grammars Examples: Arithmetic Expressions If-then-else statements Inherently ambiguous grammars
Ambiguous Grammars • A grammar G is ambiguous if there exists a string x in L(G) with two or more distinct parse trees • (2 or more distinct leftmost/rightmost derivations) • Example • Grammar AG is ambiguous • String aaa in L(AG) has 2 rightmost derivations • S ==> SS ==> SSS ==> SSa ==> Saa ==> aaa • S ==> SS ==> Sa ==> SSa ==> Saa ==> aaa
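Ambiguity can be checked for a particular string by counting its parse trees. The sketch below does this for grammars in Chomsky normal form (every rule is A --> BC or A --> terminal), which happens to include AG. The function name `count_parse_trees` and the rule encoding are assumptions; this is a counting variant of the standard CYK idea, not something taken from the slides.

```python
from functools import lru_cache

def count_parse_trees(word, start, unit_rules, pair_rules):
    """Count parse trees of `word` for a grammar in Chomsky normal form.

    unit_rules: set of (A, terminal) for rules A --> terminal
    pair_rules: set of (A, B, C) for rules A --> BC
    """
    @lru_cache(maxsize=None)
    def count(var, i, j):
        # Number of parse trees deriving word[i:j] from var.
        if j - i == 1:
            return 1 if (var, word[i]) in unit_rules else 0
        total = 0
        for a, b, c in pair_rules:
            if a == var:
                for k in range(i + 1, j):   # split point between B and C
                    total += count(b, i, k) * count(c, k, j)
        return total

    return count(start, 0, len(word)) if word else 0

# AG: S --> SS | a
AG_UNIT = {("S", "a")}
AG_PAIR = {("S", "S", "S")}
```

For aaa the count is 2, matching the two rightmost derivations on the slide; any count above 1 for some string shows the grammar is ambiguous.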
2 Simple Examples • Grammar BALG is ambiguous • String ( ) in L(BALG) has >1 leftmost derivation • S ==> (S) ==> ( ) • S ==> (S) ==> (SS) ==> (S) ==> ( ) • Give another leftmost derivation of ( ) from BALG • Grammar ABG is NOT ambiguous • Consider any string x in {a^i b^i | i > 0} • There is a unique parse tree for x
Legal Arithmetic Expressions • Develop a grammar MATHG = (V, Σ, S, P) for the language of legal arithmetic expressions • Σ = {0, 1, +, *, -, /, (, )} • Strings in the language include • 0 • 10 • 10*11111+100 • 10*(11111+100) • Strings not in the language include • 10+ • 11++101 • )(
Grammar MATHG1 • V = {E, N} • Σ = {0, 1, +, *, -, /, (, )} • S = E • P: • E --> N | E+E | E*E | E/E | E-E | (E) • N --> N0 | N1 | 0 | 1
MATHG1 is ambiguous • E --> N | E+E | E*E | E/E | E-E | (E) • N --> N0 | N1 | 0 | 1 • Come up with two distinct leftmost derivations of the string 11+0*11 • E ==> E+E ==> N+E ==> N1+E ==> 11+E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11 • E ==> E*E ==> E+E*E ==> N+E*E ==> N1+E*E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11 • Draw the corresponding parse trees
Corresponding Parse Trees • E ==> E+E ==> N+E ==> N1+E ==> 11+E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11 • E ==> E*E ==> E+E*E ==> N+E*E ==> N1+E*E ==> 11+E*E ==> 11+N*E ==> 11+0*E ==> 11+0*N ==> 11+0*N1 ==> 11+0*11 • [Figure: two parse trees for 11+0*11; the first has E+E at the root with E*E under the right E, the second has E*E at the root with E+E under the left E]
Parse Tree Meanings • Note how the parse trees capture the semantic meaning of string 11+0*11 • More specifically, what number does the first parse tree represent? • What number does the second parse tree represent? • [Figure: the two parse trees for 11+0*11, one rooted at E+E and one rooted at E*E]
Implications • Two interpretations of string 11+0*11 • 11+(0*11) = 11 • (11+0)*11 = 1001 • What if a line in a program is • MSU_Tuition = 11+0*11; • What is MSU_Tuition? • Depends on how the expression 11+0*11 is parsed. • This is not good. • Ambiguity in grammars is undesirable, particularly if the grammar is used to develop a compiler for a programming language like C++. • In this case, there is an unambiguous grammar for the language of arithmetic expressions
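The two interpretations can be checked by evaluating the two parse trees directly. This sketch assumes the numerals are binary, which is what the results 11 and 1001 on the slide imply; the tuple encoding of the trees and the name `evaluate` are made up for illustration, and division is left out to keep the example to integer arithmetic.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub}

def evaluate(tree):
    """Evaluate an expression tree whose leaves are binary numerals."""
    if isinstance(tree, str):
        return int(tree, 2)
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

# Tree from the derivation starting E ==> E+E: 11+(0*11)
tree1 = ("+", "11", ("*", "0", "11"))
# Tree from the derivation starting E ==> E*E: (11+0)*11
tree2 = ("*", ("+", "11", "0"), "11")

print(bin(evaluate(tree1))[2:])  # 11
print(bin(evaluate(tree2))[2:])  # 1001
```

The same string yields 3 under one tree and 9 under the other (in decimal), which is exactly why a compiler cannot be built on the ambiguous grammar as is.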
If-Then-Else Statements • A grammar ITEG = (V, Σ, S, P) for the language of legal If-Then-Else statements • V = {S, BOOL} • Σ = {adv<80, adv>50, grade=3.5, grade=3.0, if, then, else} • S = S • P: • S --> if BOOL then S else S | if BOOL then S | grade=3.5 | grade=3.0 • BOOL --> adv<80 | adv>50
ITEG is ambiguous • S --> if BOOL then S | grade=3.5 | grade=3.0 | if BOOL then S else S • BOOL --> adv<80 | adv>50 • Come up with two distinct leftmost derivations of the string • if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • S ==> if BOOL then S else S ==> if adv<80 then S else S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • S ==> if BOOL then S ==> if adv<80 then S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • Draw the corresponding parse trees
Corresponding Parse Trees • S ==> if BOOL then S else S ==> if adv<80 then S else S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • S ==> if BOOL then S ==> if adv<80 then S ==> if adv<80 then if BOOL then S else S ==> if adv<80 then if adv>50 then S else S ==> if adv<80 then if adv>50 then grade=3.5 else S ==> if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • [Figure: two parse trees; in the first the root is "if B then S else S" and the inner S is "if B then S", in the second the root is "if B then S" and the inner S is "if B then S else S"]
Parse Tree Meanings • If you receive a 90 on advanced points, what is your grade? • By parse tree 1 • By parse tree 2 • [Figure: the two parse trees for the if-then-else string; tree 1 attaches the else to the outer if, tree 2 attaches it to the inner if]
Implications • Two interpretations of string • if adv<80 then if adv>50 then grade=3.5 else grade=3.0 • Issue is which if-then does the last ELSE attach to? • This phenomenon is known as the “dangling else” • Answer: Typically, else binds to NEAREST if-then • In this case, there is an unambiguous grammar for handling if-then’s as well as if-then-else’s
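The two readings of the dangling else can be written out as ordinary nested conditionals. This is only an illustration: the function names `outer_else` and `nearest_else` are hypothetical, and returning `None` to mean "no grade was assigned" is an assumption of the sketch.

```python
def outer_else(adv):
    """Parse tree 1: the else attaches to the OUTER if.

    if adv<80 then (if adv>50 then grade=3.5) else grade=3.0
    """
    grade = None              # None means no assignment was executed
    if adv < 80:
        if adv > 50:
            grade = 3.5
    else:
        grade = 3.0
    return grade

def nearest_else(adv):
    """Parse tree 2: the else attaches to the NEAREST (inner) if.

    if adv<80 then (if adv>50 then grade=3.5 else grade=3.0)
    """
    grade = None
    if adv < 80:
        if adv > 50:
            grade = 3.5
        else:
            grade = 3.0
    return grade
```

With adv = 90, `outer_else` assigns 3.0 while `nearest_else` assigns nothing; with adv = 40 the situation is reversed. Only for 50 < adv < 80 do the two readings agree.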
Inherently ambiguous CFL’s • A CFL L is inherently ambiguous iff for all CFG’s G such that L(G) = L, G is ambiguous • Examples so far • None of the CFL’s we’ve seen so far are inherently ambiguous • While some of the CFG’s we’ve seen are ambiguous, there do exist unambiguous CFG’s for those CFL’s. • Later result • There exist inherently ambiguous CFL’s • Example: {a^i b^j c^k | i=j or j=k or i=j=k} • Note i=j=k is unnecessary, but I added it here for clarity
Summary • Parse trees illustrate “semantic” information about strings • Ambiguous grammars are undesirable • This means there are multiple parse trees for some string • These strings can be interpreted in multiple ways • There are some heuristics people use for taking an ambiguous grammar and making it unambiguous, but this is not the focus of this course • There are some inherently ambiguous CFL’s • Thus, the above heuristics do not always work
Module 30 • EQUAL language • Designing a CFG • Proving the CFG is correct
EQUAL language Designing a CFG
EQUAL • EQUAL is the set of strings over {a,b} with an equal number of a’s and b’s • Strings in EQUAL include • aabbab • bbbaaa • abba • Strings in {a,b}* not in EQUAL include • aaa • bbb • aab • ababa
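Membership in EQUAL is easy to test directly, which is handy later for checking a candidate grammar against examples. A minimal sketch, with the function name `in_equal` chosen here for illustration:

```python
def in_equal(x):
    """True iff x is a string over {a, b} with as many a's as b's."""
    return set(x) <= {"a", "b"} and x.count("a") == x.count("b")
```

Note that the empty string passes this test, since it has zero a's and zero b's.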
Designing a CFG for EQUAL • Think recursively • Base Case • What is the shortest possible string in EQUAL? • Production Rule:
Recursive Case • Recursive Case • Now consider a longer string x in EQUAL • Since x has length > 0, x must have a first character • This must be a or b • Two possibilities for what x looks like • x = ay • What must be true about relative number of a’s and b’s in y? • x = bz • What must be true about relative number of a’s and b’s in z?
Case 1: x=ay • x = ay where y has one extra b • What must y look like? • Some examples • b • babba • aabbbab • aaabbbb • Is there a general pattern that applies to all of the above examples? • More specifically, show how we can decompose all of the above strings y into 3 pieces, two of which belong to EQUAL. • Some of these pieces might be the empty string λ
Decomposing y • y has one extra b • Possible examples • b, babba, aabbbab, aaabbbb • Decomposition • y = ubv where • u and v both have an equal number of a’s and b’s • Decompose the 4 strings above into u, b, v (the middle b is set off by spaces) • λ b λ, λ b abba, aabb b ab, aaabbb b λ
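One way to see that such a decomposition always exists is to scan y from the left while tracking the number of b's minus the number of a's. The first position where this balance reaches +1 must be a b, everything before it is balanced, and so is everything after it. The sketch below implements that scan; the name `decompose` is hypothetical and this particular choice of the middle b is just one valid option.

```python
def decompose(y):
    """Split y (one more b than a) as u + 'b' + v with u, v in EQUAL."""
    assert y.count("b") == y.count("a") + 1
    balance = 0                    # (#b - #a) seen so far
    for i, ch in enumerate(y):
        balance += 1 if ch == "b" else -1
        if balance == 1:           # first time the b's pull ahead
            return y[:i], y[i + 1:]
    raise AssertionError("unreachable: the final balance is 1")
```

For example, `decompose("aabbbab")` returns `("aabb", "ab")`, and `decompose("babba")` returns `("", "abba")`, where the empty string plays the role of λ.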
Implication • Case 1: x=ay • y has one extra b • Case 1 refined: x=aubv • u, v belong to EQUAL • Production rule for this case?
Case 2: x=bz • Case 2: x=bz • z has one extra a • Case 2 refined: x=buav • u, v belong to EQUAL • Production rule for this case?
Final Grammar • EG = (V, Σ, S, P) • V = {S} • Σ = {a,b} • S = S • P:
EQUAL language Proving CFG is correct
Is our grammar correct? • How do we prove our grammar is correct? • Informal • Test some strings • Review logic behind program (CFG) design • Formal • First, show every string derived by EG belongs to EQUAL • That is, show L(EG) is a subset of EQUAL • Second, show every string in EQUAL can be derived by EG • That is, show EQUAL is a subset of L(EG) • Both proofs will be inductive proofs • Inductive proofs and recursive algorithms go well together
L(EG) subset of EQUAL • Let x be an arbitrary string in L(EG) • What does this mean? • S ==>*_EG x • Follows from definition of x in L(EG) • We will prove the following • If S ==>^1_EG x, then x is in EQUAL • If S ==>^2_EG x, then x is in EQUAL • If S ==>^3_EG x, then x is in EQUAL • If S ==>^4_EG x, then x is in EQUAL • ...