340 likes | 530 Views
Context Free Grammar. Introduction. Why do we want to learn about Context Free Grammars ? Used in many parsers in compilers Yet another compiler-compiler, yacc GNU project parser generator, bison Web application In describing the formats of XML ( eXtensible Markup Language) documents.
E N D
Introduction • Why do we want to learn about Context Free Grammars? • Used in many parsers in compilers • Yet another compiler-compiler, yacc • GNU project parser generator, bison • Web application • In describing the formats of XML (eXtensible Markup Language) documents
Context Free Grammar • G = (V, T, P, S) • V – a set of variables, e.g. {S, A, B, C, D, E} • T – a set of terminals, e.g. {a, b, c} • P – a set of productions rules • In the form of A , where AV, (VT)* e.g. S aB • S is a special variable called the start symbol
CFG – Example 1 Construct a CFG for the following language : L1={0n1n | n>0} Thought: If is a string in L1, so is 01, i.e. 0k+11k+1 = 0(0k1k)1 S 01 | 0S1
CFG – Example 2 Construct a CFG for the following language : L2={0i1j | ij and i, j>0} Thought: Similar to L1, but at least one more ‘1’ or at least one more ‘0’ S AC | CB A 0A | 0 B B1 | 1 C 0C1 |
CFG – Example 3 Determine the language generated by the CFG: S AS | A A1 | 0A1 | 01 S generates consecutive ‘A’s A generates 0i1j where i j Languages: each block of ‘0’s is followed by at least as many ‘1’s
Derivation How a string is generated by a grammar G Consider the grammar G S AS | A A1 | 0A1 | 01 S 0110011 ?
S A S A A 1 S 0 1 0 A 1 0 1 Derivation Rightmost Derivation Leftmost Derivation Parse Tree or Derivation Tree S AS A1S 011S 011AS 0110A1S 0110011S 0110011 S AS AAS AA A0A1 A0011 A10011 0110011
Ambiguity Each parse tree has one unique leftmost derivation and one unique rightmost derivation. A grammar is ambiguous if some strings in it have more than one parse trees, i.e., it has more than one leftmost derivations (or more than one rightmost derivations).
Ambiguity • Consider the following grammar G: • S AS | a | b • A SS | ab • A string generated by this grammar can have more than one parse trees. Consider the string abb: S S A S A S b S S b a b a b
Ambiguity • As another example, consider the following grammar: E E + E | E * E | (E) | x | y | z • There are 2 leftmost derivations for x + y + z: E E + E x + E x + E + E x + y + E x + y + z E E + E E + E + E x + E + E x + y + E x + y + z E E E E E E + + x z y z x y + +
Simplification of CFG • Remove useless variables • Generating Variables • Reachable Variables • Remove -productions, e.g. A • Remove unit-productions, e.g. AB
Useless Variables • A variable X is useless if: • X does not generate any string of terminals, or • The start symbol, S, cannot generate X. S X • where , (VT)*, X V and T*
Removal of Non-generating Variables • Mark each production of the form: X where T* • Repeat • Mark X where consists of terminals or variables which are on the left hand side of some marked productions. • Until no new productions is marked. • Remove all unmarked productions.
Example • CFG: S AB|CA A a B BC|AB C aB|b S CA A a C b
Example • CFG: S aAa|aB A aS|bD B aBa|b C abb|DD D aDa S aAa|aB A aS B aBa|b C abb
Removal of Non-Reachable Variables • Mark each production of the form: S where (VT)* • Repeat • Mark X where X appears on the right hand side of some marked productions. • Until no new productions is marked. • Remove all unmarked productions.
Example CFG: S aAa|aB A aS B aBa|b C abb S aAa|aB A aS B aBa|b
Example CFG: S Aab|AB A a B bD| bB C A | B D a S Aab|AB A a B bD | bB D a
Remove Useless Variables A Bac|bTC|ba T aB|TB B aTB|bBC C TBc|aBC|ac A ba C ac A ba
Removal of -productions Step 1: Find nullable variables Step 2: Remove all -productions by replacing some productions
How to find nullable variables ? • Mark all variables A for which there exists a production of the form A . • Repeat • Mark X for which there exists X where V* and all symbols in have been marked. • Until no new variables is marked.
Removal of -Productions • We can remove all -productions (except S if S is nullable) by rewriting some other productions. • e.g., If X1 and X3 are nullable, we should replace • A X1X2X3X4 by: • A X1X2X3X4 | X2X3X4 | X1X2X4 | X2X4
Removal of -productions Example: S A | B | C A aAa | B B bB | bb C aCaa | D D baD | abD | Step1: nullable variables are D, C and S
Removal of -productions(3) Step 2: • eliminate D by replacing: • D baD | abD into D baD | abD | ba | ab • eliminate C by replacing: • C aCaa | D into C aCaa | D | aaa • S A | B | C into S A | B | C | • The new grammar: • SA | B | C | • AaAa | B • BbB | bb • CaCaa | D | aaa • DbaD | abD | ba | ab Example:S A | B | C A aAa | B B bB | bb C aCaa | D D baD | abD |
Example S ABCD A a B C ED | D BC E b S ABCD|ABC|ABD|ACD|AB|AC|AD|A • A a • C ED|E D BC|B|C • E b
Removal of Unit Productions • Unit production: A B, where A and B V • 1. Detect cycles of unit productions: • A1 A2 A3 … A1 • Replaced A1, A2, A3 , …, Ak by any one of them • 2. Detect paths of unit productions: • A1 A2 A3 …, Ak , where (VT)*/V • Add productions A1 , A2 , …, Ak and remove all the unit productions
Removal of Unit Productions Example: S A | B | C A aa | B B bb | C C cc | A Cycle: A B C A Remove by replace B,C by A S A | A | A A aa | A A bb | A A cc | A Becomes: S A A aa | bb | cc Path: S A Remove by adding productions S aa | bb | cc
Example CFG: S → A | Bb A → C | a B → aBa | b C → aSa S → Bb | aSa |a → A → a | aSa B → aBa | b C → aSa
Example Substitute
Example Unit productions of form can be removed immediately Remove
Example Substitute
Example Remove repeated productions Final grammar