Chapter 6 Simplification of CFGs and Normal Forms


Parsing (Review)
• Given a string w and a grammar G, a parser finds a derivation of w from G, or else determines that w is not part of the language
• Thus, a parser solves the membership problem for a language, which is the problem of deciding, for any string w and grammar G, whether w belongs to the language generated by G
• Typically, a parser also constructs a parse tree for the string (which can be used by a compiler for code generation)

Parsing (Review)
• Can we solve the membership problem for context-free languages?
• That is, can we develop a parsing algorithm for any context-free language?
• If so, can we develop an efficient parsing algorithm?
• We saw in the previous chapter (ch5) that we can, if we place restrictions on the grammar.
• Normal forms of context-free grammars are interesting in that, although they are restricted forms, it can be shown that every context-free grammar can be converted to a normal form.
• The two normal forms that we will look at are Chomsky normal form and Greibach normal form.

Parsing (Review)
• Simplified forms can eliminate ambiguity and otherwise “improve” a grammar
• What we would like is for all productions in a context-free grammar to be in a form such that the length of the sentential form never decreases during a derivation.
• Given this form, any sentential form longer than the input string can be discarded, because it can never derive the input. Only finitely many sentential forms remain to be tried, so if none of them yields the input string, the string does not belong to the language.
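The length argument can be illustrated with a small search over sentential forms. This is a minimal sketch, not a practical parser: it assumes single-character symbols, a grammar stored as a dict from non-terminal to a list of right-hand-side strings, and no ε-productions. The function name `derives` and the example grammar S → aSb | ab are hypothetical.

```python
from collections import deque

def derives(grammar, start, target):
    """Breadth-first search over sentential forms, pruned by length.

    Assumption: no production shortens a sentential form, so any form
    longer than the target can be discarded safely.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Expand only the leftmost non-terminal (leftmost derivation).
        for i, symbol in enumerate(form):
            if symbol in grammar:
                for rhs in grammar[symbol]:
                    new_form = form[:i] + rhs + form[i + 1:]
                    if len(new_form) <= len(target) and new_form not in seen:
                        seen.add(new_form)
                        queue.append(new_form)
                break
    return False

# Hypothetical example grammar: S -> aSb | ab
grammar = {"S": ["aSb", "ab"]}
print(derives(grammar, "S", "aaabbb"))  # True
print(derives(grammar, "S", "aabbb"))   # False
```

The search terminates because only finitely many sentential forms have length at most |w|, and each is visited once.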

6.1: Methods for Transforming Grammars (1) A Useful Substitution Rule
• Theorem 6.1: This intuitive theorem allows us to simplify grammars.
• Let G = (NT, T, S, P) be a context-free grammar. Suppose that P contains a production rule of the form
  A → xBz
• Assume that A and B are different non-terminals and that
  B → y1 | y2 | ... | yn
  is the set of all productions in P which have B as the left side.
• Let G’ = (NT, T, S, P’) be the grammar in which P’ is constructed from P by replacing the rule
  A → xBz with A → xy1z | xy2z | ... | xynz
• Then L(G’) = L(G)
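The substitution in Theorem 6.1 can be sketched in a few lines. The representation (a dict from non-terminal to a list of right-hand-side strings, single-character symbols), the function name `substitute`, and the example grammar are all assumptions made for illustration.

```python
def substitute(grammar, a, rhs, b):
    """Replace the rule a -> rhs, where rhs = x + b + z, by a -> x + y + z
    for every production b -> y. The productions of b are left in place."""
    assert a != b and b in rhs
    i = rhs.index(b)                 # first occurrence of b in rhs
    x, z = rhs[:i], rhs[i + 1:]
    new_rules = [x + y + z for y in grammar[b]]
    result = {nt: list(rules) for nt, rules in grammar.items()}
    pos = result[a].index(rhs)
    result[a][pos:pos + 1] = new_rules   # splice the new rules in place
    return result

# Hypothetical grammar: A -> a | aaA | abBc,  B -> abbA | b
g = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
g2 = substitute(g, "A", "abBc", "B")
print(g2["A"])  # ['a', 'aaA', 'ababbAc', 'abbc']
```

Here x = ab and z = c, so the single rule A → abBc becomes A → ababbAc | abbc.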

6.1: Methods for Transforming Grammars (3) Removing Useless Productions
• A non-terminal A is useful (it occurs in at least one derivation of a sentence) if:
  • it is reachable: it occurs in a sentential form, S ⇒* αAβ
  • it is live: it generates a terminal string, A ⇒* w, w ∈ T*
• A non-terminal A is useless if:
  • A does not occur in any sentential form (it cannot be reached from the start symbol), OR
  • A does not generate any string of terminals (it cannot derive a terminal string)
• A terminal is useful if it occurs in a sentence w ∈ L(G)
• Any production involving a useless symbol is a useless production.

6.1: Methods for Transforming Grammars (4) Removing Useless Productions
• To eliminate useless symbols:
• First: Find the set TERM that contains all non-terminals that derive a terminal string
  • A ⇒* w, where w ∈ T*
  • Non-terminals NOT in TERM are useless; they cannot contribute to generating strings in L(G)
• Second: Find the set REACH that contains all non-terminals A ∈ TERM that are reachable from S
  • S ⇒* αAβ

6.1: Methods for Transforming Grammars (6) Removing Useless Productions
• C and D do not belong to TERM, so all rules containing C and D are removed
• The new grammar is GT:
  S → BS | B
  A → aA | aF
  B → b
  E → aA | BSA
  F → bB | b
• All non-terminals in GT derive terminal strings
• Now, we must remove the non-terminals that do not occur in sentential forms of the grammar
• A set REACH is built that contains all non-terminals in TERM that are derivable from S

6.1: Methods for Transforming Grammars (7) Removing Useless Productions
• GT:
  S → BS | B
  A → aA | aF
  B → b
  E → aA | BSA
  F → bB | b
• S ∈ REACH, since it is the start symbol
• B ∈ REACH, since S → BS, and hence B is derivable from S
• A, E, and F cannot be derived from S or B, so all rules containing A, E and F are removed

6.1: Methods for Transforming Grammars (8) Removing Useless Productions
• The new grammar is GU:
  S → BS | B
  B → b
• L(GU) = b+
• The set of terminals of GU is {b}; a is removed since it does not occur in any string in the language of GU
• The order is important: applying the second step (REACH) before the first step (TERM) may not remove all useless symbols.

6.1: Methods for Transforming Grammars (9) Removing Useless Productions
• Home exercise: Remove all useless productions.
  S → AB | CD | ADF | CF | EA
  A → abA | ab
  B → bB | aD | BF | aF
  C → cB | EC | Ab
  D → bB | FFB
  E → bC | AB
  F → abbF | baF | bD | BB
  G → EbE | CE | ba
• Let G = ({S, A, B, C}, {a, b}, S, {S → aS | A | C, A → a, B → aa, C → aCb}) be a CFG.
• Remove all useless productions: C is not in TERM and B is not in REACH, which leaves S → aS | A, A → a
• Substituting A (Theorem 6.1), the final grammar is
  G’ = ({S}, {a}, S, {S → aS | a})
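The two-step procedure (TERM first, then REACH) can be sketched as follows and applied to the second exercise grammar. This is an illustrative sketch: the dict representation, single-character symbols, and the name `remove_useless` are assumptions, and every non-terminal is assumed to appear as a key of the dict.

```python
def remove_useless(grammar, start):
    """Remove useless non-terminals: TERM step first, then REACH step."""
    def is_live(rhs, term):
        # Every symbol is a terminal or a non-terminal already in TERM.
        return all(s not in grammar or s in term for s in rhs)

    # Step 1: TERM = non-terminals that derive a terminal string.
    term = set()
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            if nt not in term and any(is_live(rhs, term) for rhs in rules):
                term.add(nt)
                changed = True
    live = {
        nt: [rhs for rhs in rules if is_live(rhs, term)]
        for nt, rules in grammar.items() if nt in term
    }

    # Step 2: REACH = non-terminals in TERM reachable from the start symbol.
    reach = set()
    stack = [start] if start in live else []
    while stack:
        nt = stack.pop()
        if nt in reach:
            continue
        reach.add(nt)
        for rhs in live[nt]:
            stack.extend(s for s in rhs if s in live)
    return {nt: rules for nt, rules in live.items() if nt in reach}

# S -> aS | A | C,  A -> a,  B -> aa,  C -> aCb
g = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(g, "S"))  # {'S': ['aS', 'A'], 'A': ['a']}
```

The result still contains the unit rule S → A; folding A into S with the substitution rule of Theorem 6.1 gives S → aS | a.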

6.1: Methods for Transforming Grammars (10) Removing ε-Productions
• Let G be
  S → SaB | aB
  B → bB | ε
• A non-terminal symbol that can derive the null string (ε) is called nullable.
• For example, in G above, B is nullable since B → ε
• A grammar without nullable non-terminals is called non-contracting
• G, above, is not non-contracting, since it has one nullable non-terminal, which is B.

6.1: Methods for Transforming Grammars (11) Removing ε-Productions
How to find nullable non-terminals?
  Mark all non-terminals A for which there exists a production of the form A → ε
  Repeat
    Mark each non-terminal X for which there exists a production X → β such that all symbols in β have been marked as nullable
  Until no new non-terminal is marked
Read Theorem 6.3

6.1: Methods for Transforming Grammars (12) Removing ε-Productions
The set of nullable non-terminals of the grammar
  S → ACA
  A → aAa | B | C
  B → bB | b
  C → cC | ε
is {S, A, C}
C is nullable since C → ε and hence C ⇒* ε
A is nullable since A → C, and C is nullable
S is nullable since S → ACA, and A and C are nullable
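The marking procedure can be sketched directly and checked against the grammar above. The representation is an assumption made for illustration: a dict from non-terminal to a list of right-hand-side strings, with the empty string "" standing for ε; `nullable_set` is a hypothetical name.

```python
def nullable_set(grammar):
    """Return the set of nullable non-terminals ("" encodes ε)."""
    # Mark every A that has a production A -> ε.
    nullable = {nt for nt, rules in grammar.items() if "" in rules}
    changed = True
    while changed:  # repeat until no new non-terminal is marked
        changed = False
        for nt, rules in grammar.items():
            if nt in nullable:
                continue
            # Mark X if some non-empty right-hand side consists only of
            # symbols that are already marked.
            if any(rhs and all(s in nullable for s in rhs) for rhs in rules):
                nullable.add(nt)
                changed = True
    return nullable

# S -> ACA,  A -> aAa | B | C,  B -> bB | b,  C -> cC | ε
g = {"S": ["ACA"], "A": ["aAa", "B", "C"], "B": ["bB", "b"], "C": ["cC", ""]}
print(sorted(nullable_set(g)))  # ['A', 'C', 'S']
```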

6.1: Methods for Transforming Grammars (13) Removing ε-Productions
Find nullable non-terminals.
  S → aS | SS | bA
  A → BB
  B → CC | ab | aAbC
  C → ε

6.1: Methods for Transforming Grammars (14) Removing ε-Productions
• If ε ∉ L(G), we can eliminate all productions A → ε
• For every B referring to A, add a copy of the rule with A omitted:
  Before: B → aAb | …    A → ε | …
  After:  B → ab | aAb | …    A → …
• For example, if B → ε and A → BABa
• Then after eliminating the rule B → ε, new rules for A will be added
  A → BABa
  A → ABa
  A → BAa
  A → Aa

6.1: Methods for Transforming Grammars (15) Removing ε-Productions
Let G be
  S → SaB | aB
  B → bB | ε
After removing ε-productions, the new grammar will be
  S → SaB | Sa | aB | a
  B → bB | b
Let G = (NT, T, S, P) be a CFG. If A ⇒* w, then the grammar G’ = (NT, T, S, P ∪ {A → w}) is equivalent to G (i.e., L(G) = L(G’))
The removal of ε-productions increases the number of rules but reduces the length of derivations.
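The construction can be sketched by first computing the nullable set and then emitting every variant of each right-hand side in which some nullable symbols are dropped. This is a sketch under assumptions: grammars are dicts from non-terminal to right-hand-side strings, "" encodes ε, and `remove_epsilon` is a hypothetical name. The rule S → ε is kept only when the start symbol is nullable.

```python
from itertools import product

def remove_epsilon(grammar, start):
    """Build an essentially non-contracting grammar ("" encodes ε)."""
    # Nullable non-terminals: all(...) is True for the empty right-hand side.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, rules in grammar.items():
            if nt not in nullable and any(
                all(s in nullable for s in rhs) for rhs in rules
            ):
                nullable.add(nt)
                changed = True

    result = {}
    for nt, rules in grammar.items():
        new_rules = []
        for rhs in rules:
            # Each nullable symbol may be kept or dropped.
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_rules:
                    new_rules.append(candidate)
        result[nt] = new_rules
    if start in nullable:
        result[start].append("")  # only the start symbol may keep ε
    return result

# S -> SaB | aB,  B -> bB | ε
g = {"S": ["SaB", "aB"], "B": ["bB", ""]}
print(remove_epsilon(g, "S"))
# {'S': ['SaB', 'Sa', 'aB', 'a'], 'B': ['bB', 'b']}
```

A non-terminal whose only production is ε ends up with no rules at all; removing useless symbols afterwards cleans up the rules that still mention it.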

6.1: Methods for Transforming Grammars (16) Removing ε-Productions
Let G be
  S → ACA
  A → aAa | B | C
  B → bB | b
  C → cC | ε
The equivalent essentially non-contracting grammar GL is
  S → ACA | CA | AA | AC | A | C | ε
  A → aAa | aa | B | C
  B → bB | b
  C → cC | c
Since S ⇒* ε in G, the rule S → ε is allowed in GL, but all other ε-productions are replaced
A grammar satisfying these conditions is called essentially non-contracting (only the start symbol is nullable)

6.1: Methods for Transforming Grammars (17) Removing ε-Productions
• Let G be
  S → aS | SS | bA
  A → BB
  B → ab | aAbC | aAb | CC
  C → ε
• We eliminate C → ε by replacing:
  • B → CC with B → CC, B → C, and B → ε
  • B → aAbC with B → aAbC and B → aAb
• Since C → ε is the only C production, only B → ε and B → aAb are retained.
• The new grammar:
  S → aS | SS | bA
  A → BB
  B → ε | ab | aAb

6.1: Methods for Transforming Grammars (18) Removing ε-Productions
• The new grammar:
  S → aS | SS | bA
  A → BB
  B → ε | ab | aAb
• We eliminate B → ε by replacing
  • A → BB with A → BB, A → B, and A → ε
• Since there are other B productions, these are all retained
• The new grammar:
  S → aS | SS | bA
  A → BB | B | ε
  B → ab | aAb

6.1: Methods for Transforming Grammars (19) Removing ε-Productions
• The new grammar:
  S → aS | SS | bA
  A → BB | B | ε
  B → ab | aAb
• Finally we eliminate A → ε by replacing
  • B → aAb with B → aAb, B → ab
  • S → bA with S → bA | b
• The final CFG is:
  S → aS | SS | bA | b
  A → BB | B
  B → ab | aAb

6.1: Methods for Transforming Grammars (20) Removing Unit Rules
• Rules of the form A → B, where B is a single non-terminal, are called unit rules
• Consider the rules
  A → aA | a | B
  B → bB | b | C
• The unit rule A → B indicates that any string derivable from B is also derivable from A
• The removal of unit rules increases the number of rules but reduces the length of derivations.

6.1: Methods for Transforming Grammars (21) Removing Unit Rules
To eliminate the unit rule A → B, add A rules that directly generate the same strings as B:
add a rule A → u for each rule B → u, and delete A → B from the grammar
Read Theorem 6.4
  Before: A → B          B → a | ...
  After:  A → a | ...    B → a | ...

6.1: Methods for Transforming Grammars (22) Removing Unit Rules
Consider the rules
  A → aA | a | B
  B → bB | b | C
The new rules after eliminating the unit rule A → B
  A → aA | a | bB | b | C
  B → bB | b | C
We add new rules to A by replacing B in A → B with each right-hand side of B. This introduces the unit rule A → C, which is eliminated in the same way.

6.1: Methods for Transforming Grammars (23) Removing Unit Rules
GL:
  S → ACA | CA | AA | AC | A | C | ε
  A → aAa | aa | B | C
  B → bB | b
  C → cC | c
The new equivalent grammar (without unit rules) GC:
  S → ACA | CA | AA | AC | aAa | aa | bB | b | cC | c | ε
  A → aAa | aa | bB | b | cC | c
  B → bB | b
  C → cC | c

6.1: Methods for Transforming Grammars (24) Removing Unit Rules
Remove unit rules:
  S → T | S + T
  T → F | F * T
  F → a | (S)
After eliminating T → F:
  S → T | S + T
  T → a | (S) | F * T
  F → a | (S)
After eliminating S → T:
  S → a | (S) | F * T | S + T
  T → a | (S) | F * T
  F → a | (S)
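Unit-rule removal can be sketched by collecting, for each non-terminal, everything reachable through unit rules alone and then copying over the non-unit right-hand sides. The sketch assumes ε-productions have already been handled, single-character symbols, and a dict from non-terminal to right-hand-side strings; `remove_unit_rules` is a hypothetical name.

```python
def remove_unit_rules(grammar):
    """Replace unit rules A -> B by the non-unit rules of every
    non-terminal reachable from A through unit rules only."""
    result = {}
    for nt in grammar:
        # chain = {nt} plus all non-terminals reachable via unit rules.
        chain, stack = {nt}, [nt]
        while stack:
            current = stack.pop()
            for rhs in grammar[current]:
                if rhs in grammar and rhs not in chain:  # rhs is a unit rule
                    chain.add(rhs)
                    stack.append(rhs)
        rules = []
        for member in grammar:  # iterate in grammar order for stable output
            if member in chain:
                for rhs in grammar[member]:
                    if rhs not in grammar and rhs not in rules:
                        rules.append(rhs)
        result[nt] = rules
    return result

# S -> T | S+T,  T -> F | F*T,  F -> a | (S)
g = {"S": ["T", "S+T"], "T": ["F", "F*T"], "F": ["a", "(S)"]}
for nt, rules in remove_unit_rules(g).items():
    print(nt, "->", " | ".join(rules))
```

For the expression grammar this yields the same rule sets as the final grammar on the slide, possibly listed in a different order.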

6.2: Chomsky Normal Form (1)
• The Chomsky normal form places restrictions on the length and the composition of the right-hand side of a rule
• Definition 6.4: A CFG is in Chomsky normal form if each production rule has one of the following forms:
  A → a
  A → BC
  S → ε
  where B, C ∈ NT
• Read Theorem 6.6

6.2: Chomsky Normal Form (2)
• Algorithm Step 1: Make sure that the following are satisfied:
  • No ε-productions (other than S → ε)
  • No chain (unit) rules
  • No useless symbols

6.2: Chomsky Normal Form (3)
• Algorithm Step 2: Eliminate terminals from the RHS of productions
• For each production A → X1X2…Xm, where Xi ∈ NT ∪ T:
  • If m > 1, replace each terminal a in the RHS of A
  • Add (if needed) Ca → a for each a ∈ T, where each Ca is a new non-terminal.
  • In production A, replace terminal a with the corresponding Ca

6.2: Chomsky Normal Form (4)
• Algorithm Step 3: Eliminate productions with long RHS
• For each production
  A → B1B2…Bm, m > 2, where Bi ∈ NT
• replace with productions
  A → B1D1
  D1 → B2D2
  …
  Dm-2 → Bm-1Bm
• where D1…Dm-2 are new non-terminals.
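Steps 2 and 3 can be sketched together. Because new non-terminals need multi-character names, this sketch stores each right-hand side as a tuple of symbols. It assumes Step 1 has already been carried out and that the generated names `C_a` and `D1`, `D2`, … do not clash with existing symbols; the function name `to_cnf` and the example grammar are hypothetical.

```python
def to_cnf(grammar):
    """Steps 2 and 3 of the CNF conversion (Step 1 assumed already done)."""
    # Step 2: in every right-hand side of length > 1, replace each
    # terminal a by a new non-terminal C_a and add the rule C_a -> a.
    terminal_names = {}
    step2 = {}
    for nt, rules in grammar.items():
        step2[nt] = []
        for rhs in rules:
            if len(rhs) > 1:
                rhs = tuple(
                    s if s in grammar
                    else terminal_names.setdefault(s, "C_" + s)
                    for s in rhs
                )
            step2[nt].append(rhs)
    for terminal, name in terminal_names.items():
        step2[name] = [(terminal,)]

    # Step 3: split A -> B1 B2 ... Bm (m > 2) into a chain of binary rules.
    result = {}
    counter = 0
    for nt, rules in step2.items():
        result.setdefault(nt, [])
        for rhs in rules:
            head = nt
            while len(rhs) > 2:
                counter += 1
                new_nt = "D" + str(counter)
                result.setdefault(head, []).append((rhs[0], new_nt))
                head, rhs = new_nt, rhs[1:]
            result.setdefault(head, []).append(rhs)
    return result

# Hypothetical grammar: S -> ABa,  A -> aab,  B -> Ac
g = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for nt, rules in to_cnf(g).items():
    print(nt, "->", " | ".join(" ".join(rhs) for rhs in rules))
```

For this grammar the sketch produces S → A D1, D1 → B C_a, A → C_a D2, D2 → C_a C_b, B → A C_c, plus C_a → a, C_b → b, C_c → c.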

6.2: Greibach Normal Form (7)
• A context-free grammar is in Greibach Normal Form if every production is of the form A → aX
  • where A ∈ NT, X ∈ NT*, and a ∈ Σ (the terminal alphabet)
• Examples:
  • G1 = ({S, A}, {a, b}, S, {S → aSA | a, A → aA | b}) is in GNF
  • G2 = ({S, A}, {a, b}, S, {S → AS | AAS, A → SA | aa}) is not in GNF
• This grammar is not in GNF:
  S → AB
  A → aA | bB | b
  B → b
• This grammar is in GNF:
  S → aAB | bBB | bB
  A → aA | bB | b
  B → b
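The definition translates into a short check that can be run against the four example grammars. The representation (dict from non-terminal to right-hand-side strings, single-character symbols) and the name `is_gnf` are assumptions made for illustration.

```python
def is_gnf(grammar):
    """True if every production has the form A -> aX with a a terminal
    and X a (possibly empty) string of non-terminals."""
    for rules in grammar.values():
        for rhs in rules:
            # The right-hand side must start with a terminal.
            if not rhs or rhs[0] in grammar:
                return False
            # Everything after the first symbol must be a non-terminal.
            if any(s not in grammar for s in rhs[1:]):
                return False
    return True

g1 = {"S": ["aSA", "a"], "A": ["aA", "b"]}
g2 = {"S": ["AS", "AAS"], "A": ["SA", "aa"]}
g3 = {"S": ["AB"], "A": ["aA", "bB", "b"], "B": ["b"]}
g4 = {"S": ["aAB", "bBB", "bB"], "A": ["aA", "bB", "b"], "B": ["b"]}
print(is_gnf(g1), is_gnf(g2), is_gnf(g3), is_gnf(g4))  # True False False True
```

G2 fails twice over: S → AS starts with a non-terminal, and A → aa has a terminal after the first position.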

CFG Simplification: Example
How can the following be simplified?
  S → AB
  S → ACD
  A → Aa
  A → a
  A → aA
  A → a
  C → ε
  D → dD
  D → E
  E → eAe
  F → ff
1) Delete B: useless because nothing is derivable from B
2) Delete either A → Aa or A → aA
3) Delete one of the identical productions A → a
4) Delete C → ε, and also replace S → ACD with S → AD
5) Replace D → E with D → eAe
6) Delete E: useless after change #5
7) Delete F: useless because it is not derivable from S
Note how some simplifications can allow other subsequent simplifications.
