AUTOMATA THEORY
Chapter 05 CONTEXT-FREE GRAMMARS AND LANGUAGES
Introduction
Context-free grammars (CFGs) have played a central role in compiler technology since the 1960s. They turned the implementation of parsers from an ad-hoc implementation task into a routine job. Parsers are functions that discover the structure of a program.
An informal example
Let us consider the language of palindromes. A palindrome is a string that reads the same forward and backward, such as otto or madamimadam. We describe only the palindromes over the alphabet {0,1}, for example 0110 and 11011.
A Context-free Grammar for Palindromes
P → ε
P → 0
P → 1
P → 0P0
P → 1P1
This grammar covers binary strings only.
Definition of CFG
A CFG is a way of describing a language by recursive rules called productions. A CFG consists of:
• A finite set of symbols, called terminals or terminal symbols.
• A finite set of variables, also called nonterminals.
• A start symbol (start variable), which is one of the variables.
• A finite set of productions, or rules.
Definition of CFG (continued)
Each production consists of:
• The head of the production, a variable.
• The production symbol →.
• The body of the production, a string of zero or more terminals and variables.
Definition of CFG (continued)
The four components of a CFG G can be represented as follows:
G = (V, T, P, S)
where V is the set of variables, T the set of terminals, P the set of productions, and S the start variable.
A Context-free Grammar for Palindromes
The grammar G_pal for the palindromes is represented by
G_pal = ({P}, {0,1}, A, P)
where A represents the set of five productions:
• P → ε
• P → 0
• P → 1
• P → 0P0
• P → 1P1
This grammar covers binary strings only.
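The four-tuple can be written down directly as a data structure. The following is a minimal sketch in Python, not part of the original slides: the names `CFG`, `G_PAL` and `is_palindrome` are illustrative assumptions, and the recognizer simply mirrors the five productions of G_pal.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    """G = (V, T, P, S); the field names are an assumption for illustration."""
    variables: frozenset
    terminals: frozenset
    productions: dict  # head -> list of bodies, each body a tuple of symbols
    start: str


G_PAL = CFG(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions={
        "P": [(), ("0",), ("1",), ("0", "P", "0"), ("1", "P", "1")],
    },
    start="P",
)


def is_palindrome(w: str) -> bool:
    """Decide membership in L(G_pal) by following the productions."""
    if w == "":                      # P -> ε
        return True
    if len(w) == 1:                  # P -> 0 and P -> 1
        return w in G_PAL.terminals
    # P -> 0P0 and P -> 1P1: equal outer terminals around a shorter palindrome
    return (w[0] == w[-1] and w[0] in G_PAL.terminals
            and is_palindrome(w[1:-1]))


print(is_palindrome("0110"), is_palindrome("11011"), is_palindrome("10"))
```

The empty body `()` stands for ε, so the production set has exactly five entries, as on the slide.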
Example of CFG
A CFG for simple expressions in which the operators '+' and '*' are present. Identifiers use only the letters 'a' and 'b' and the digits '0' and '1'. Every identifier must begin with a or b, which may be followed by any string in {a,b,0,1}*.
G = ({E, I}, T, P, E)
T = {0, 1, a, b, +, *, (, )}
Productions:
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
Derivation using grammar
Derivation of (ab+ab0); the number at the right is the production applied at each step (productions numbered 1 to 10 as in the expression grammar):
E ⇒ (E)          (4)
  ⇒ (E+E)        (2)
  ⇒ (I+E)        (1)
  ⇒ (Ib+E)       (8)
  ⇒ (ab+E)       (5)
  ⇒ (ab+I)       (1)
  ⇒ (ab+I0)      (9)
  ⇒ (ab+Ib0)     (8)
  ⇒ (ab+ab0)     (5)
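The derivation of (ab+ab0) can be replayed mechanically. The sketch below is an illustration rather than anything from the slides: it assumes every grammar symbol is a single character, and the helper names `apply_leftmost` and `derive` are hypothetical. Each step rewrites the leftmost variable with the numbered production.

```python
# Productions 1-10 of the expression grammar, as (head, body) pairs.
PRODUCTIONS = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}
VARIABLES = {"E", "I"}


def apply_leftmost(form: str, number: int) -> str:
    """Rewrite the leftmost variable of `form` using production `number`."""
    head, body = PRODUCTIONS[number]
    for i, symbol in enumerate(form):
        if symbol in VARIABLES:
            if symbol != head:
                raise ValueError(
                    f"production {number} has head {head}, "
                    f"but the leftmost variable is {symbol}")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no variable left to rewrite")


def derive(start: str, numbers: list) -> list:
    """Return every sentential form produced by the given production numbers."""
    forms = [start]
    for number in numbers:
        forms.append(apply_leftmost(forms[-1], number))
    return forms


steps = derive("E", [4, 2, 1, 8, 5, 1, 9, 8, 5])
print(" => ".join(steps))
```

Applying a production whose head does not match the leftmost variable raises an error, so a wrong production number in the sequence is caught at that step.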
Example of CFG
A CFG for syntactically correct infix algebraic expressions in the variables x, y and z.
G = ({S}, T, P, S)
T = {x, y, z, -, +, *, /, (, )}
Productions:
S → x
S → y
S → z
S → S + S
S → S - S
S → S * S
S → S / S
S → ( S )
Derivation using grammar productions: S → x S → y S → z S → S + S S → S - S S → S * S S → S / S S → ( S )
LMD and RMD
LMD (Left Most Derivation): At each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation. We indicate that a derivation is leftmost by using the relations ⇒_lm and ⇒*_lm, for one or many steps respectively.
RMD (Right Most Derivation): At each step we replace the rightmost variable by one of its production bodies. Such a derivation is called a rightmost derivation. We indicate that a derivation is rightmost by using the relations ⇒_rm and ⇒*_rm, for one or many steps respectively.
Left Most Derivation
CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
LMD of a*(a+b00):
E ⇒_lm E*E ⇒_lm I*E ⇒_lm a*E ⇒_lm a*(E) ⇒_lm a*(E+E) ⇒_lm a*(I+E) ⇒_lm a*(a+E) ⇒_lm a*(a+I) ⇒_lm a*(a+I0) ⇒_lm a*(a+I00) ⇒_lm a*(a+b00)
Right Most Derivation
CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
RMD of a*(a+b00):
E ⇒_rm E*E ⇒_rm E*(E) ⇒_rm E*(E+E) ⇒_rm E*(E+I) ⇒_rm E*(E+I0) ⇒_rm E*(E+I00) ⇒_rm E*(E+b00) ⇒_rm E*(I+b00) ⇒_rm E*(a+b00) ⇒_rm I*(a+b00) ⇒_rm a*(a+b00)
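Whether a sequence of sentential forms really is a leftmost or a rightmost derivation can be checked step by step. The sketch below is only an illustration under the assumption of single-character symbols; `one_step` and `is_derivation` are hypothetical helper names, and `LMD` and `RMD` are the two derivations of a*(a+b00).

```python
PRODUCTIONS = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def one_step(form: str, rightmost: bool = False) -> set:
    """Forms reachable by rewriting the leftmost (or rightmost) variable once."""
    positions = [i for i, s in enumerate(form) if s in PRODUCTIONS]
    if not positions:
        return set()
    i = positions[-1] if rightmost else positions[0]
    return {form[:i] + body + form[i + 1:] for body in PRODUCTIONS[form[i]]}


def is_derivation(forms: list, rightmost: bool = False) -> bool:
    """True if each form follows from the previous one by a single lm/rm step."""
    return all(nxt in one_step(cur, rightmost)
               for cur, nxt in zip(forms, forms[1:]))


LMD = ["E", "E*E", "I*E", "a*E", "a*(E)", "a*(E+E)", "a*(I+E)", "a*(a+E)",
       "a*(a+I)", "a*(a+I0)", "a*(a+I00)", "a*(a+b00)"]
RMD = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(E+I)", "E*(E+I0)", "E*(E+I00)",
       "E*(E+b00)", "E*(I+b00)", "E*(a+b00)", "I*(a+b00)", "a*(a+b00)"]

print(is_derivation(LMD), is_derivation(RMD, rightmost=True))
```

Checking `RMD` with `rightmost=False` fails at the second step, because E*(E) rewrites the right-hand E while a leftmost step would have to rewrite the left-hand one.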
The Language of a Grammar
If G = (V,T,P,S) is a CFG, the language of G, denoted L(G), is the set of terminal strings that have derivations from the start symbol. That is,
L(G) = { w in T* | S ⇒*_G w }
If a language L is the language of some context-free grammar, then L is said to be a context-free language, or CFL.
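For small grammars, the definition of L(G) can be explored by enumerating derivations. The sketch below uses a breadth-first search over sentential forms. It is an illustration, not a general algorithm: `language_up_to` is a hypothetical helper, symbols are assumed to be single characters, and pruning on the number of terminals is enough for the palindrome grammar but would not terminate for a grammar such as S → SS | ε.

```python
from collections import deque


def language_up_to(productions: dict, start: str, max_len: int) -> set:
    """Terminal strings of length <= max_len derivable from `start`.

    Assumption: sentential forms with at most max_len terminals are finitely
    many (true for the palindrome grammar, which keeps at most one variable).
    """
    seen = {start}
    words = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, or None if the form is terminal.
        i = next((k for k, s in enumerate(form) if s in productions), None)
        if i is None:
            words.add(form)
            continue
        for body in productions[form[i]]:
            new = form[:i] + body + form[i + 1:]
            terminals = sum(1 for s in new if s not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


PAL = {"P": ["", "0", "1", "0P0", "1P1"]}  # "" stands for ε
words = language_up_to(PAL, "P", 3)
print(sorted(words, key=lambda w: (len(w), w)))
```

Only leftmost rewriting is explored, which loses nothing because every derivable terminal string also has a leftmost derivation.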
Parse Tree
A tree representation for derivations that shows clearly how the symbols of a terminal string are grouped into substrings. It is a graphical representation of a derivation. The parse tree is the data structure used in a compiler to represent the source program. In a compiler, the tree structure of the source program facilitates the translation of the source program into executable code by allowing natural, recursive functions to perform this translation process.
Constructing Parse Tree
Let us fix on a grammar G = (V,T,P,S). The parse trees for G are trees with the following conditions:
• Each interior node is labeled by a variable in V.
• Each leaf is labeled by either a variable, a terminal, or ε. If a leaf is labeled ε, it must be the only child of its parent.
• If an interior node is labeled A, and its children are labeled X1, X2, …, Xk respectively, from the left, then A → X1X2…Xk is a production in P.
Parse Tree Example
A parse tree showing the derivation of I+E from E:
E
├─ E
│  └─ I
├─ +
└─ E
Parse Tree Example (continued)
A parse tree showing the derivation P ⇒* 0110, using the productions P → ε | 0 | 1 | 0P0 | 1P1:
P
├─ 0
├─ P
│  ├─ 1
│  ├─ P
│  │  └─ ε
│  └─ 1
└─ 0
The Yield of a Parse Tree
If we look at the leaves of any parse tree and concatenate them from the left, we get a string called the yield of the parse tree, which is always a string that is derived from the root variable. Of special importance are the parse trees in which:
• The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
• The root is labeled by the start symbol.
Parse tree showing a*(a+b00)
E
├─ E
│  └─ I
│     └─ a
├─ *
└─ E
   ├─ (
   ├─ E
   │  ├─ E
   │  │  └─ I
   │  │     └─ a
   │  ├─ +
   │  └─ E
   │     └─ I
   │        ├─ I
   │        │  ├─ I
   │        │  │  └─ b
   │        │  └─ 0
   │        └─ 0
   └─ )
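The conditions on parse trees and the notion of yield translate into two short recursive functions. The sketch below is an illustration only: trees are encoded as `(label, children)` tuples with plain strings as leaves, an encoding chosen here rather than taken from the slides, and `tree_yield` and `is_valid` are hypothetical names.

```python
PRODUCTIONS = {
    "E": {"I", "E+E", "E*E", "(E)"},
    "I": {"a", "b", "Ia", "Ib", "I0", "I1"},
}


def label(node) -> str:
    """Label of a node: leaves are strings, interior nodes are tuples."""
    return node if isinstance(node, str) else node[0]


def tree_yield(node) -> str:
    """Concatenate the leaves from the left."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])


def is_valid(node) -> bool:
    """Every interior node A with children X1..Xk needs A -> X1...Xk in P."""
    if isinstance(node, str):
        return True
    head, children = node
    body = "".join(label(child) for child in children)
    return (body in PRODUCTIONS.get(head, set())
            and all(is_valid(child) for child in children))


# The parse tree for a*(a+b00), built bottom-up.
a = ("E", [("I", ["a"])])
b00 = ("E", [("I", [("I", [("I", ["b"]), "0"]), "0"])])
tree = ("E", [a, "*", ("E", ["(", ("E", [a, "+", b00]), ")"])])

print(tree_yield(tree), is_valid(tree))
```

A leaf labeled by a variable is accepted as well, so the same functions handle partial trees such as the one for I+E.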
Inference, Derivations, and Parse Trees
[Figure: the equivalence of parse tree, leftmost derivation, rightmost derivation, derivation, and recursive inference]
Self Study
Sections 5.2.4, 5.2.5, 5.2.6; Theorems 5.12, 5.14, 5.18
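Ambiguous Grammar
So far we have assumed that a grammar uniquely determines a structure for each string in its language. However, not every grammar provides unique structures. When a grammar fails to provide a unique structure for some string, it is known as an ambiguous grammar: at least one string in its language has more than one parse tree (equivalently, more than one leftmost derivation).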
Ambiguous Grammar example
Let us consider a CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
Expression: a + a*a
LMD: E ⇒_lm E+E ⇒_lm I+E ⇒_lm a+E ⇒_lm a+E*E ⇒_lm a+I*E ⇒_lm a+a*E ⇒_lm a+a*I ⇒_lm a+a*a
RMD: E ⇒_rm E*E ⇒_rm E*I ⇒_rm E*a ⇒_rm E+E*a ⇒_rm E+I*a ⇒_rm E+a*a ⇒_rm I+a*a ⇒_rm a+a*a
The two derivations correspond to two different parse trees for the same string.
LMD
E
├─ E
│  └─ I
│     └─ a
├─ +
└─ E
   ├─ E
   │  └─ I
   │     └─ a
   ├─ *
   └─ E
      └─ I
         └─ a
Fig: Tree with yield a+a*a
RMD
E
├─ E
│  ├─ E
│  │  └─ I
│  │     └─ a
│  ├─ +
│  └─ E
│     └─ I
│        └─ a
├─ *
└─ E
   └─ I
      └─ a
Fig: Tree with yield a+a*a
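The ambiguity can be confirmed by counting parse trees. The following sketch counts, for every substring, how many trees derive it from each symbol. It is an illustration, not a general parser: it assumes single-character symbols and a grammar with no ε-productions and no cycles of unit productions (both hold for this expression grammar), and `count_trees` is a hypothetical name.

```python
from functools import lru_cache

GRAMMAR = {
    "E": ["I", "E+E", "E*E", "(E)"],
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
}


def count_trees(grammar: dict, start: str, w: str) -> int:
    """Number of parse trees with root `start` and yield `w`."""

    @lru_cache(maxsize=None)
    def sym(s: str, i: int, j: int) -> int:
        # Trees with root s and yield w[i:j]; a terminal must match exactly.
        if s not in grammar:
            return 1 if w[i:j] == s else 0
        return sum(seq(body, i, j) for body in grammar[s])

    @lru_cache(maxsize=None)
    def seq(body: str, i: int, j: int) -> int:
        # Ways for the symbols of `body`, in order, to derive w[i:j].
        if len(body) == 1:
            return sym(body, i, j)
        rest = len(body) - 1  # every remaining symbol needs >= 1 character
        return sum(sym(body[0], i, k) * seq(body[1:], k, j)
                   for k in range(i + 1, j - rest + 1))

    return sym(start, 0, len(w))


print(count_trees(GRAMMAR, "E", "a+a*a"))
```

For a+a*a the count is 2, one tree with + at the root and one with * at the root, matching the two figures.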
Removing Ambiguity from Grammar
There are two causes of ambiguity in the expression grammar:
• The precedence of operators is not respected.
• A sequence of identical operators can group either from the left or from the right.
[Figure: two derivation trees for the same expression, a bad tree and a good tree; the expression result is computed using the tree.] Prof. Busch - LSU
Removing Ambiguity from Grammar
The solution to the problem of enforcing precedence is to introduce several different variables.
• A factor is an expression that cannot be broken apart by any adjacent operator. The only factors in our expression language are:
  i. Identifiers: it is not possible to separate the letters of an identifier by attaching an operator.
  ii. Any parenthesized expression, no matter what appears inside the parentheses.
• A term is an expression that cannot be broken by the '+' operator. A term is a product of one or more factors.
• An expression is a sum of one or more terms.
Removing Ambiguity from Grammar
Let us consider the CFG:
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
An unambiguous expression grammar:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
Unambiguous Grammar example
CFG:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T*F
E → T | E+T
Expression: a + a*a
Derivation (leftmost):
E ⇒ E+T ⇒ T+T ⇒ F+T ⇒ I+T ⇒ a+T ⇒ a+T*F ⇒ a+F*F ⇒ a+I*F ⇒ a+a*F ⇒ a+a*I ⇒ a+a*a
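Inherent Ambiguity
Topic 5.4.4
A context-free language is inherently ambiguous if every grammar for it is ambiguous. Example:
L = {a^n b^n c^m d^m | n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n | n ≥ 1, m ≥ 1}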
Unambiguous Grammar example
The parse tree for a + a*a in the unambiguous grammar:
E
├─ E
│  └─ T
│     └─ F
│        └─ I
│           └─ a
├─ +
└─ T
   ├─ T
   │  └─ F
   │     └─ I
   │        └─ a
   ├─ *
   └─ F
      └─ I
         └─ a
Fig: Tree with yield a+a*a
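The layering into expressions, terms and factors maps naturally onto a hand-written recursive-descent parser. The sketch below is one possible illustration, not the method of the slides: the left-recursive productions E → E+T and T → T*F are replaced by loops (a standard transformation that keeps left-to-right grouping), and the function name `parse` and the tuple-shaped result are assumptions made here.

```python
def parse(text: str):
    """Parse `text` with E -> T | E+T, T -> F | T*F, F -> I | (E).

    Returns a nested tuple (operator, left, right); identifiers are strings.
    """
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def identifier() -> str:
        # I -> a | b | Ia | Ib | I0 | I1
        nonlocal pos
        start = pos
        if peek() not in {"a", "b"}:
            raise SyntaxError(f"expected identifier at position {pos}")
        pos += 1
        while peek() in {"a", "b", "0", "1"}:
            pos += 1
        return text[start:pos]

    def factor():
        # F -> I | (E)
        if peek() == "(":
            expect("(")
            node = expression()
            expect(")")
            return node
        return identifier()

    def term():
        # T -> F | T*F, written as F (* F)* with left grouping
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expression():
        # E -> T | E+T, written as T (+ T)* with left grouping
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expression()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


print(parse("a+a*a"))
```

Because `term` is called from `expression`, every product is grouped before any sum, which is the precedence the unambiguous grammar enforces.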
Example of CFG
A CFG that generates prefix expressions with operands x and y and binary operators +, -, *.
Productions:
E → x
E → y
E → +EE
E → -EE
E → *EE
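Because every production body of the prefix grammar begins with a distinct terminal, membership can be decided by reading the string once from the left. The sketch below is illustrative; `parse_prefix` and `in_language` are hypothetical names.

```python
def parse_prefix(s: str, i: int = 0) -> int:
    """Return the index just past one E starting at s[i], or -1 if none."""
    if i >= len(s):
        return -1
    if s[i] in ("x", "y"):            # E -> x | y
        return i + 1
    if s[i] in ("+", "-", "*"):       # E -> +EE | -EE | *EE
        j = parse_prefix(s, i + 1)    # first operand
        return parse_prefix(s, j) if j != -1 else -1
    return -1


def in_language(s: str) -> bool:
    """True if the whole string is exactly one prefix expression."""
    return parse_prefix(s) == len(s)


print(in_language("+x*yx"), in_language("+x"), in_language("xy"))
```

The string +x*yx is accepted, while +x (a missing operand) and xy (two expressions in a row) are rejected.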
Example of CFG
Design a CFG for the set of all strings with an equal number of a's and b's.
Productions:
S → aSbS | bSaS | ε
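A design like this can be sanity-checked by brute force: compare what the grammar derives against a direct count of a's and b's for every short string. The sketch below is an illustration, with `derives` as a hypothetical helper that follows the three productions and assumes its input is a string over {a, b}.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(w: str) -> bool:
    """True if S =>* w for S -> aSbS | bSaS | ε (w over {a, b})."""
    if w == "":                                   # S -> ε
        return True
    other = "b" if w[0] == "a" else "a"
    # S -> aSbS (or bSaS): w = first + u + other + v with S =>* u and S =>* v
    return any(w[k] == other and derives(w[1:k]) and derives(w[k + 1:])
               for k in range(1, len(w)))


agrees = all(
    derives("".join(p)) == (p.count("a") == p.count("b"))
    for n in range(9)
    for p in product("ab", repeat=n)
)
print(agrees)
```

The check covers all strings up to length 8. It supports the design but is not a proof that the grammar is correct for every length.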
Example of CFG
Design a CFG over {a, b} such that no string in L(G) has ba as a substring.
Productions:
S → aS | Sb | a | b
This grammar generates the nonempty strings of the form a^i b^j.
Example of CFG
Design a CFG for the regular expression 0*1(0+1)*.
Productions:
S → A1B
A → 0A | ε
B → 0B | 1B | ε
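The grammar can be compared with the regular expression on all short binary strings. The sketch below is an illustration: the three functions follow the productions of S, A and B one for one (their names are assumptions), and the pattern `0*1(0|1)*` is the Python `re` spelling of 0*1(0+1)*, where the slide's + means union.

```python
import re
from itertools import product


def derives_A(w: str) -> bool:
    """A -> 0A | ε"""
    return w == "" or (w[0] == "0" and derives_A(w[1:]))


def derives_B(w: str) -> bool:
    """B -> 0B | 1B | ε"""
    return w == "" or (w[0] in ("0", "1") and derives_B(w[1:]))


def derives_S(w: str) -> bool:
    """S -> A1B: try every position of the 1 that separates A from B."""
    return any(w[k] == "1" and derives_A(w[:k]) and derives_B(w[k + 1:])
               for k in range(len(w)))


PATTERN = re.compile(r"0*1(0|1)*")

agrees = all(
    derives_S("".join(p)) == (PATTERN.fullmatch("".join(p)) is not None)
    for n in range(8)
    for p in product("01", repeat=n)
)
print(agrees)
```

Here A derives the leading block of 0s and B derives the arbitrary tail, so S accepts exactly the binary strings that contain at least one 1.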
Application of CFG
CFGs were originally conceived as a way to describe natural languages. Two of their uses:
1. Parsers
2. Markup languages (HTML, XML)
Parsers: A parse tree is a graphical representation for derivations. Parsing is the process of determining whether a string of tokens can be generated by a grammar. A compiler may not actually construct a parse tree; however, a parser must be capable of constructing such a tree. A parser can be constructed for any grammar. The CFG is an essential concept for the implementation of parsers.
YACC Parser Generator
Tools such as YACC take a CFG as input and produce a parser.
Exp : Id              {…}
    | Exp '+' Exp     {…}
    | Exp '*' Exp     {…}
    | '(' Exp ')'     {…}
    ;
Id  : 'a'             {…}
    | 'b'             {…}
    | Id 'a'          {…}
    | Id 'b'          {…}
    | Id '0'          {…}
    | Id '1'          {…}
    ;