Chapter 6: Simplification of CFGs and Normal Forms
Parsing (Review) • Given a string w and a grammar G, a parser finds a derivation of the string w from the grammar G, or else determines that the string is not part of the language • Thus, a parser solves the membership problem for a language, which is the problem of deciding, for any string w and grammar G, whether w belongs to the language generated by G • Typically, a parser also constructs a parse tree for the string (which can be used by a compiler for code generation)
Parsing (Review) • Can we solve the membership problem for context-free languages? • That is, can we develop a parsing algorithm for any context-free language? • If so, can we develop an efficient parsing algorithm? • We saw in the previous chapter (ch5) that we can, if we place restrictions on the grammar. • Normal forms of context-free grammars are interesting in that, although they are restricted forms, it can be shown that every context-free grammar can be converted to a normal form. • The two types of normal forms that we will look at are Chomsky normal form and Greibach normal form.
Parsing (Review) • Simplified forms can eliminate ambiguity and otherwise “improve” a grammar • We would like all productions in a context-free grammar to have a form such that the length of the sentential form never decreases during a derivation. • Given this form, any sentential form longer than the input string can be discarded, so only finitely many sentential forms need to be examined; if none of them yields the input string, the string does not belong to the language.
6.1: Methods for Transforming Grammars (1) A Useful Substitution Rule • Theorem 6.1: • This intuitive theorem allows us to simplify grammars. • Let G = (NT, T, S, P) be a context-free grammar. Suppose that P contains a production rule of the form • A → xBz • Assume that A and B are different non-terminals and that • B → y1 | y2 | ... | yn is the set of all productions in P which have B as the left side. • Let G’ = (NT, T, S, P’) be the grammar in which P’ is constructed from P by replacing rule • A → xBz with A → xy1z | xy2z | ... | xynz • Then L(G’) = L(G)
6.1: Methods for Transforming Grammars (2) A Useful Substitution Rule • Let G be • S → a | aaS | abBc • B → abbS | b • Applying Theorem 6.1 to the rule S → abBc results in • S → a | aaS | ababbSc | abbc • B → abbS | b • The rules B → abbS | b are still part of the grammar, but they no longer serve any purpose, because B no longer occurs on the right side of any rule reachable from S. • Both of these useless rules may be deleted without changing the language.
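The substitution in Theorem 6.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a grammar is a dict from a non-terminal to a list of right-hand-side strings, every symbol is a single character, and upper-case letters are non-terminals. The function name `substitute` is made up for this sketch.

```python
def substitute(grammar, a, rhs, pos):
    """Replace the rule a -> rhs, where rhs[pos] is a non-terminal B != a,
    by one rule a -> x y z for every production B -> y (Theorem 6.1)."""
    b = rhs[pos]
    assert b != a and b in grammar, "B must be a different non-terminal"
    x, z = rhs[:pos], rhs[pos + 1:]
    new_rules = []
    for rule in grammar[a]:
        if rule == rhs:
            # expand B by each of its right-hand sides
            new_rules.extend(x + y + z for y in grammar[b])
        else:
            new_rules.append(rule)
    result = dict(grammar)
    result[a] = new_rules
    return result


# The grammar from the slide: S -> a | aaS | abBc, B -> abbS | b
g = {"S": ["a", "aaS", "abBc"], "B": ["abbS", "b"]}
g2 = substitute(g, "S", "abBc", 2)
print(g2["S"])
```

The B rules are left in place on purpose; deleting them is a separate step (removing useless productions).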
6.1: Methods for Transforming Grammars (3) Removing Useless Productions • A non-terminal A is useful (it occurs in at least one derivation) if: • it is reachable: it occurs in a sentential form, S ⇒* αAβ • it is live: it generates a terminal string, A ⇒* w with w ∈ T* • A non-terminal A is useless if: • A does not occur in any sentential form (it cannot be reached from the start symbol), OR • A does not generate any string of terminals (it cannot derive a terminal string) • A terminal is useful if it occurs in a sentence w ∈ L(G) • Any production involving a useless symbol is a useless production.
6.1: Methods for Transforming Grammars (4) Removing Useless Productions • To eliminate useless symbols: • First: Find the set TERM that contains all non-terminals that derive a terminal string • A ⇒* w, where w ∈ T* • Non-terminals NOT in TERM are useless; they cannot contribute to generating strings in L(G) • Second: Find the set REACH that contains all non-terminals A ∈ TERM that are reachable from S • S ⇒* αAβ
6.1: Methods for Transforming Grammars (5) Removing Useless Productions • Example 1 • G: S → AC | BS | B; A → aA | aF; B → CF | b; C → cC | D; D → aD | BD | C; E → aA | BSA; F → bB | b • L(G) is b+ • B, F ∈ TERM, since B → b and F → b • S ∈ TERM, since S → B and hence S ⇒* b • A ∈ TERM, since A → aF and hence A ⇒* ab • E ∈ TERM, since E → aA and hence E ⇒* aab
6.1: Methods for Transforming Grammars (6) Removing Useless Productions • C and D do not belong to TERM, so all rules containing C or D are removed • The new grammar is • GT: S → BS | B; A → aA | aF; B → b; E → aA | BSA; F → bB | b • All non-terminals in GT derive terminal strings • Now, we must remove the non-terminals that do not occur in sentential forms of the grammar • A set REACH is built that contains all non-terminals in TERM that are derivable from S
6.1: Methods for Transforming Grammars (7) Removing Useless Productions • GT: S → BS | B; A → aA | aF; B → b; E → aA | BSA; F → bB | b • S ∈ REACH, since it is the start symbol • B ∈ REACH, since S → BS, and hence B is derivable from S • A, E, and F cannot be derived from S or B, so all rules containing A, E, or F are removed
6.1: Methods for Transforming Grammars (8) Removing Useless Productions • The new grammar is • GU: S →BS | B B→b • L(GU) = b+ • The set of terminals of GU is {b}, a is removed since it does not occur in any string in the language of GU • The order is important: • Applying Second step (REACH) before First Step (TERM) may not remove all useless symbols.
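The two-step procedure (TERM first, then REACH) can be sketched as follows. This is an illustrative sketch, not a reference implementation. It assumes a grammar is a dict from a non-terminal to a list of right-hand-side strings, that every symbol is a single character, and that upper-case letters are non-terminals. The function names are invented for the sketch.

```python
def term_set(grammar):
    """Non-terminals that derive some terminal string (the set TERM)."""
    term = set()
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            if nt in term:
                continue
            # nt is live if some rule uses only terminals and live non-terminals
            if any(all(not s.isupper() or s in term for s in rhs) for rhs in rules):
                term.add(nt)
                changed = True
    return term


def restrict(grammar, keep):
    """Drop every non-terminal outside `keep` and every rule mentioning one."""
    return {
        nt: [r for r in rules if all(not s.isupper() or s in keep for s in r)]
        for nt, rules in grammar.items() if nt in keep
    }


def reach_set(grammar, start):
    """Non-terminals reachable from the start symbol (the set REACH)."""
    reach, todo = {start}, [start]
    while todo:
        nt = todo.pop()
        for rhs in grammar.get(nt, []):
            for s in rhs:
                if s.isupper() and s not in reach:
                    reach.add(s)
                    todo.append(s)
    return reach


def remove_useless(grammar, start):
    gt = restrict(grammar, term_set(grammar))      # first step: TERM
    return restrict(gt, reach_set(gt, start))      # second step: REACH


# Example 1 from the slides
g = {
    "S": ["AC", "BS", "B"], "A": ["aA", "aF"], "B": ["CF", "b"],
    "C": ["cC", "D"], "D": ["aD", "BD", "C"], "E": ["aA", "BSA"],
    "F": ["bB", "b"],
}
gu = remove_useless(g, "S")
print(gu)
```

Running REACH on the original grammar first would keep A and F (both are reachable through S → AC and A → aF), which is why the order of the two steps matters.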
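6.1: Methods for Transforming Grammars (9) Removing Useless Productions • Home exercise 1: Remove all useless productions. • S → AB | CD | ADF | CF | EA • A → abA | ab • B → bB | aD | BF | aF • C → cB | EC | Ab • D → bB | FFB • E → bC | AB • F → abbF | baF | bD | BB • G → EbE | CE | ba • Home exercise 2: Let G = ({S, A, B, C}, {a, b}, S, {S → aS | A | C, A → a, B → aa, C → aCb}) be a CFG. • Remove all useless productions • C is removed because it derives no terminal string, and B because it is not reachable from S • Final grammar is • G’ = ({S, A}, {a}, S, {S → aS | A, A → a}) • Applying Theorem 6.1 to S → A and then deleting the now unreachable A gives the equivalent grammar ({S}, {a}, S, {S → aS | a})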
6.1: Methods for Transforming Grammars (10) Removing ε-Productions • Let G be S → SaB | aB; B → bB | ε • A non-terminal symbol that can derive the null string (ε) is called nullable. • For example, in G above, B is nullable since B → ε • A grammar without nullable non-terminals is called non-contracting • G, above, is not non-contracting, since it has one nullable non-terminal, which is B.
6.1: Methods for Transforming Grammars (11) Removing ε-Productions • How to find nullable non-terminals? • Mark all non-terminals A for which there exists a production of the form A → ε • Repeat • Mark each non-terminal X for which there exists a production X → β such that all symbols in β have been marked as nullable • Until no new non-terminal is marked • Read Theorem 6.3
6.1: Methods for Transforming Grammars (12) Removing ε-Productions • The set of nullable non-terminals of the grammar S → ACA; A → aAa | B | C; B → bB | b; C → cC | ε is {S, A, C} • C is nullable since C → ε and hence C ⇒* ε • A is nullable since A → C, and C is nullable • S is nullable since S → ACA, and A and C are nullable
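The marking procedure for nullable non-terminals is a small fixed-point computation. The sketch below is one possible way to write it. It assumes a grammar is a dict from a non-terminal to a list of right-hand-side strings with one character per symbol, and that ε is written as the empty string "". The name `nullable_set` is chosen only for this sketch.

```python
def nullable_set(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            if nt in nullable:
                continue
            # A rule qualifies if every symbol on its right side is already
            # marked; the empty right side "" qualifies trivially.
            if any(all(s in nullable for s in rhs) for rhs in rules):
                nullable.add(nt)
                changed = True
    return nullable


# The grammar from the slide: S -> ACA, A -> aAa | B | C, B -> bB | b, C -> cC | ε
g = {"S": ["ACA"], "A": ["aAa", "B", "C"], "B": ["bB", "b"], "C": ["cC", ""]}
print(sorted(nullable_set(g)))
```

Terminals are never added to the set, so a rule containing a terminal can never qualify, as expected.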
6.1: Methods for Transforming Grammars (13) Removing ε-Productions • Find the nullable non-terminals. • S → aS | SS | bA • A → BB • B → CC | ab | aAbC • C → ε
6.1: Methods for Transforming Grammars (14) Removing ε-Productions • If ε ∉ L(G), we can eliminate all productions A → ε • For every B referring to A, add a copy of the rule with A omitted: B → aAb | …; A → ε | … becomes B → ab | aAb | …; A → … • For example, if B → ε and A → BABa • Then after eliminating the rule B → ε, new rules for A will be added • A → BABa • A → ABa • A → BAa • A → Aa
6.1: Methods for Transforming Grammars (15) Removing ε-Productions • Let G be S → SaB | aB; B → bB | ε • After removing ε-productions, the new grammar will be S → SaB | Sa | aB | a; B → bB | b • Let G = (NT, T, S, P) be a CFG. If A ⇒* w, then the grammar G’ = (NT, T, S, P ∪ {A → w}) is equivalent to G (i.e., L(G) = L(G’)) • The removal of ε-productions increases the number of rules but reduces the length of derivations.
6.1: Methods for Transforming Grammars (16) Removing ε-Productions • Let G be S → ACA; A → aAa | B | C; B → bB | b; C → cC | ε • The equivalent essentially non-contracting grammar GL is • GL: S → ACA | CA | AA | AC | A | C | ε; A → aAa | aa | B | C; B → bB | b; C → cC | c • Since S ⇒* ε in G, the rule S → ε is allowed in GL, but all other ε-productions are replaced • A grammar satisfying these conditions is called essentially non-contracting (only the start symbol is nullable)
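The construction of an essentially non-contracting grammar can be sketched by expanding each rule into every variant obtained by keeping or dropping its nullable symbols. The code below is a sketch under assumptions: a grammar is a dict from a non-terminal to a list of right-hand-side strings, one character per symbol, with ε written as "". The function names are hypothetical, and the nullable computation is repeated here only so that the block runs on its own.

```python
from itertools import product


def nullable_set(grammar):
    """Non-terminals that derive the empty string (fixed-point marking)."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            if nt not in nullable and any(all(s in nullable for s in r) for r in rules):
                nullable.add(nt)
                changed = True
    return nullable


def remove_epsilon(grammar, start):
    """Return an essentially non-contracting grammar for the same language."""
    nullable = nullable_set(grammar)
    result = {}
    for nt, rules in grammar.items():
        new_rules = []
        for rhs in rules:
            # every nullable symbol may either stay or be dropped
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_rules:
                    new_rules.append(candidate)
        result[nt] = new_rules
    if start in nullable:
        result[start].append("")   # keep S -> ε only for the start symbol
    return result


g = {"S": ["ACA"], "A": ["aAa", "B", "C"], "B": ["bB", "b"], "C": ["cC", ""]}
gl = remove_epsilon(g, "S")
for nt, rules in gl.items():
    print(nt, "->", " | ".join(r or "ε" for r in rules))
```

On the grammar above this yields the same rule sets as GL, although the rules may be listed in a different order. The number of variants grows exponentially with the number of nullable symbols in a single rule, which is one reason the rule count increases.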
6.1: Methods for Transforming Grammars (17) Removing ε-Productions • Let G be • S → aS | SS | bA • A → BB • B → ab | aAbC | aAb | CC • C → ε • We eliminate C → ε by replacing: • B → CC with B → CC, B → C, and B → ε • B → aAbC with B → aAbC and B → aAb • Since C → ε is the only C production, C disappears from the grammar • so of these only B → ε and B → aAb are retained. • The new grammar: • S → aS | SS | bA • A → BB • B → ε | ab | aAb
6.1: Methods for Transforming Grammars (18) Removing ε-Productions • The new grammar: • S → aS | SS | bA • A → BB • B → ε | ab | aAb • We eliminate B → ε by replacing • A → BB with A → BB, A → B, and A → ε • Since there are other B productions, these are all retained • The new grammar: • S → aS | SS | bA • A → BB | B | ε • B → ab | aAb
6.1: Methods for Transforming Grammars (19) Removing ε-Productions • The new grammar: • S → aS | SS | bA • A → BB | B | ε • B → ab | aAb • Finally we eliminate A → ε by replacing • B → aAb with B → aAb and B → ab • S → bA with S → bA | b • The final CFG is: • S → aS | SS | bA | b • A → BB | B • B → ab | aAb
6.1: Methods for Transforming Grammars (20) Removing of Unit Rules • Rules of the form A → B, where B is a single non-terminal, are called unit rules • Consider the rules • A → aA | a | B • B → bB | b | C • The unit rule A → B indicates that any string derivable from B is also derivable from A • The removal of unit rules increases the number of rules but reduces the length of derivations.
6.1: Methods for Transforming Grammars (21) Removing of Unit Rules • To eliminate the unit rule A → B, add A rules that directly generate the same strings as B • Add a rule A → u for each rule B → u, and delete A → B from the grammar • Read Theorem 6.4 • A → B; B → a | ... becomes A → a | ...; B → a | ...
6.1: Methods for Transforming Grammars (22) Removing of Unit Rules • Consider the rules A → aA | a | B; B → bB | b | C • The new rules after eliminating the unit rule A → B are A → aA | a | bB | b | C; B → bB | b | C • We add new rules to A by replacing the right side B with each right side of the B rules • This introduces a new unit rule A → C, which must be eliminated in the same way
6.1: Methods for Transforming Grammars (23) Removing of Unit Rules • GL: S → ACA | CA | AA | AC | A | C | ε; A → aAa | aa | B | C; B → bB | b; C → cC | c • The new equivalent grammar (without unit rules) • GC: S → ACA | CA | AA | AC | aAa | aa | bB | b | cC | c | ε; A → aAa | aa | bB | b | cC | c; B → bB | b; C → cC | c
6.1: Methods for Transforming Grammars (24) Removing of Unit Rules • Remove unit rules: • Original grammar: S → T | S + T; T → F | F * T; F → a | (S) • After removing T → F: S → T | S + T; T → a | (S) | F * T; F → a | (S) • After removing S → T: S → a | (S) | F * T | S + T; T → a | (S) | F * T; F → a | (S)
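Unit rules can also be removed in one pass instead of one rule at a time. The sketch below uses a closure-based variant, which is a swapped-in technique rather than the step-by-step substitution shown on the slides: for each non-terminal it collects every non-terminal reachable through unit rules alone, then copies their non-unit rules. It assumes a grammar is a dict from a non-terminal to a list of right-hand-side strings with one character per symbol, upper-case letters as non-terminals, and no spaces inside right-hand sides. The function names are invented for the sketch.

```python
def is_unit(rhs):
    """A unit rule has exactly one symbol on the right, and it is a non-terminal."""
    return len(rhs) == 1 and rhs.isupper()


def remove_unit_rules(grammar):
    result = {}
    for nt in grammar:
        # chain: non-terminals derivable from nt using unit rules only
        chain, todo = {nt}, [nt]
        while todo:
            cur = todo.pop()
            for rhs in grammar.get(cur, []):
                if is_unit(rhs) and rhs not in chain:
                    chain.add(rhs)
                    todo.append(rhs)
        # copy every non-unit rule of every non-terminal in the chain
        rules = []
        for b in grammar:
            if b in chain:
                for rhs in grammar[b]:
                    if not is_unit(rhs) and rhs not in rules:
                        rules.append(rhs)
        result[nt] = rules
    return result


# Expression grammar from the slide, written without spaces
g = {"S": ["T", "S+T"], "T": ["F", "F*T"], "F": ["a", "(S)"]}
gc = remove_unit_rules(g)
for nt, rules in gc.items():
    print(nt, "->", " | ".join(rules))
```

The result has the same rule sets as the last grammar on the slide; only the order in which alternatives are listed may differ.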
6.2: Chomsky Normal Form (1) • The Chomsky normal form places restrictions on the length and the composition of the right-hand side of a rule • Definition 6.4: • A CFG is in Chomsky normal form if each production rule has one of the following forms: • A → a • A → BC • S → ε • where a ∈ T and B, C ∈ NT • Read Theorem 6.6
6.2: Chomsky Normal Form (2) • Algorithm Step 1 • Make sure that the following are satisfied: • No ε-productions (other than S → ε) • No chain (unit) rules • No useless symbols
6.2: Chomsky Normal Form (3) • Algorithm Step 2 • Eliminate terminals from the RHS of productions • For each production A → X1X2…Xm • where Xi ∈ NT ∪ T • If m > 1, replace each terminal a on the RHS of A • Add (if needed) Ca → a for each a ∈ T, where each Ca is a new non-terminal. • In the production for A, replace each terminal a with the corresponding Ca
6.2: Chomsky Normal Form (4) • Algorithm Step 3 • Eliminate productions with long RHS: • For each production: • A → B1B2…Bm, m > 2, where Bi ∈ NT • replace with productions • A → B1D1 • D1 → B2D2 • … • Dm-2 → Bm-1Bm • where D1…Dm-2 are new non-terminals.
6.2: Chomsky Normal Form (5) • Original grammar (no chain rules, useless symbols, or ε-productions): S → XaY | Yb; X → YXaY | a; Y → SS | aX | b • Grammar after eliminating terminals from RHSs: S → XAY | YB; A → a; X → YXAY | a; B → b; Y → SS | AX | b • Grammar after eliminating long RHSs: S → XT | YB; T → AY; A → a; X → YF | a; F → XG; B → b; Y → SS | AX | b; G → AY • Note: Could simplify by combining the redundant variables T and G
6.2: Chomsky Normal Form (6) • Original grammar (no chain rules, useless symbols, or ε-productions): S → aXYZ | a; X → aX | a; Y → bcY | bc; Z → cZ | c • Grammar after eliminating terminals from RHSs: S → AXYZ | a; A → a; X → AX | a; B → b; Y → BCY | BC; C → c; Z → CZ | c • Grammar after eliminating long RHSs: S → AF | a; A → a; F → XG; X → AX | a; B → b; G → YZ; Y → BH | BC; C → c; H → CY; Z → CZ | c • See Example 6.8
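Steps 2 and 3 of the conversion can be sketched as below. Step 1 is assumed to have been done already, so the input has no unit rules or useless symbols. In this sketch a right-hand side is a tuple of symbol names, the non-terminals are exactly the keys of the dict, and the generated names `C_a` and `D1`, `D2`, … are assumptions of the sketch (they play the roles of A, B, C and F, G, H in the example above and are assumed not to clash with existing names).

```python
def to_cnf(grammar):
    """Apply Step 2 (replace terminals) and Step 3 (split long right sides)."""
    nonterminals = set(grammar)
    result = {nt: [] for nt in grammar}
    term_nt = {}       # terminal a -> its new non-terminal C_a
    counter = 0        # numbering for the new non-terminals D1, D2, ...
    for nt, rules in grammar.items():
        for rhs in rules:
            if len(rhs) == 1:
                result[nt].append(rhs)          # already of the form A -> a
                continue
            # Step 2: replace each terminal in a long rule by C_a
            symbols = []
            for s in rhs:
                if s not in nonterminals:
                    if s not in term_nt:
                        term_nt[s] = "C_" + s
                        result[term_nt[s]] = [(s,)]
                    s = term_nt[s]
                symbols.append(s)
            # Step 3: A -> B1 D1, D1 -> B2 D2, ..., D(m-2) -> B(m-1) Bm
            head = nt
            while len(symbols) > 2:
                counter += 1
                new = "D" + str(counter)
                result[head].append((symbols[0], new))
                result[new] = []
                head, symbols = new, symbols[1:]
            result[head].append(tuple(symbols))
    return result


# S -> aXYZ | a, X -> aX | a, Y -> bcY | bc, Z -> cZ | c
g = {
    "S": [("a", "X", "Y", "Z"), ("a",)],
    "X": [("a", "X"), ("a",)],
    "Y": [("b", "c", "Y"), ("b", "c")],
    "Z": [("c", "Z"), ("c",)],
}
cnf = to_cnf(g)
for nt, rules in cnf.items():
    print(nt, "->", " | ".join(" ".join(r) for r in rules))
```

Up to renaming of the new non-terminals, the output corresponds to the final grammar in the example above. The sketch creates a fresh D for every long rule, so redundant variables such as T and G in the earlier example are not merged.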
6.2: Greibach Normal Form (7) • A context-free grammar is in Greibach Normal Form (GNF) if every production is of the form A → aX • where A ∈ NT, X ∈ NT*, and a ∈ T • Examples: • G1 = ({S, A}, {a, b}, S, {S → aSA | a, A → aA | b}) • in GNF • G2 = ({S, A}, {a, b}, S, {S → AS | AAS, A → SA | aa}) • not in GNF • The grammar S → AB; A → aA | bB | b; B → b is not in GNF • The grammar S → aAB | bBB | bB; A → aA | bB | b; B → b is in GNF
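Checking whether a grammar is in GNF only requires inspecting the shape of each rule. A minimal sketch, assuming a grammar is a dict from a non-terminal to a list of right-hand-side strings with one character per symbol, and that the non-terminals are exactly the keys of the dict (the name `is_gnf` is made up here):

```python
def is_gnf(grammar):
    """True if every rule is A -> aX with a a terminal and X in NT*."""
    nts = set(grammar)
    for rules in grammar.values():
        for rhs in rules:
            if len(rhs) == 0:
                return False                      # ε on the right side
            if rhs[0] in nts:
                return False                      # must start with a terminal
            if any(s not in nts for s in rhs[1:]):
                return False                      # the rest must be non-terminals
    return True


g1 = {"S": ["aSA", "a"], "A": ["aA", "b"]}
g2 = {"S": ["AS", "AAS"], "A": ["SA", "aa"]}
g3 = {"S": ["AB"], "A": ["aA", "bB", "b"], "B": ["b"]}
g4 = {"S": ["aAB", "bBB", "bB"], "A": ["aA", "bB", "b"], "B": ["b"]}
print(is_gnf(g1), is_gnf(g2), is_gnf(g3), is_gnf(g4))
```

G2 fails for two reasons: S → AS starts with a non-terminal, and A → aa has a terminal in a position other than the first.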
CFG Simplification: Example • How can the following be simplified? • S → AB; S → ACD; A → Aa; A → a; A → aA; A → a; C → ε; D → dD; D → E; E → eAe; F → ff • 1) Delete S → AB: B is useless because nothing is derivable from B • 2) Delete either A → Aa or A → aA • 3) Delete one of the identical productions A → a • 4) Delete C → ε, and also replace S → ACD with S → AD • 5) Replace D → E with D → eAe • 6) Delete E: useless after change #5 • 7) Delete F: useless because not derivable from S • Note how some simplifications can allow other subsequent simplifications.