Module 32 • Chomsky Normal Form (CNF) • 4-step process
Chomsky Normal Form • A CFG is in Chomsky normal form (CNF) if every production is one of these two types: A → BC or A → a, where A, B, and C are variables and a is a terminal • Key ideas • Eliminating λ-productions (e.g. S → λ) • Eliminating unit productions (e.g. A → B)
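The definition above can be checked mechanically. The sketch below is only an illustration: the grammar representation (a dict mapping each variable to a set of right-hand-side tuples, with any symbol that is not a dict key treated as a terminal) and the name `is_cnf` are assumptions, not something the slides prescribe.

```python
# Hypothetical representation (not from the slides): a grammar is a dict that
# maps each variable to a set of right-hand sides; each right-hand side is a
# tuple of symbols. Any symbol that is not a key of the dict is a terminal.
def is_cnf(grammar):
    """Return True if every production has the form A -> BC or A -> a."""
    for bodies in grammar.values():
        for body in bodies:
            two_variables = len(body) == 2 and all(s in grammar for s in body)
            one_terminal = len(body) == 1 and body[0] not in grammar
            if not (two_variables or one_terminal):
                return False
    return True


cnf_example = {
    "S": {("A", "B"), ("a",)},
    "A": {("a",)},
    "B": {("b",)},
}
not_cnf_example = {
    "S": {("A", "S", "b"), ()},  # a body of length 3, plus a λ-production
    "A": {("a",)},
}
print(is_cnf(cnf_example), is_cnf(not_cnf_example))
```

The empty tuple `()` stands for λ here, so a λ-production fails both tests and the grammar is rejected.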
Nullable Variables • A variable A in a CFG G = (V, Σ, S, P) is defined as nullable if: • Base case: P contains the production A → λ • Recursive case: P contains the production A → B1B2 … Bn and B1 through Bn are nullable • No other variables are nullable
Finding Nullable Variables • Initialize N0 to be the set of variables that are nullable by the base case of the definition • i = 0; • do • i = i+1; • Ni = N(i-1) ∪ {A | P contains A → α where α is a string in N(i-1)*} • while Ni ≠ N(i-1); • The final Ni is the set of nullable variables
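The loop above is a fixed-point iteration, and a minimal Python sketch of it might look as follows. The dict-of-tuples grammar representation and the function name `nullable_variables` are assumptions made for illustration; the empty tuple `()` plays the role of λ.

```python
def nullable_variables(grammar):
    """Fixed-point computation of the nullable set, mirroring N0, N1, ..."""
    # N0: variables with a direct λ-production (empty right-hand side).
    nullable = {a for a, bodies in grammar.items() if () in bodies}
    while True:
        # A is added when some right-hand side consists only of nullable variables.
        bigger = nullable | {
            a for a, bodies in grammar.items()
            if any(all(s in nullable for s in body) for body in bodies)
        }
        if bigger == nullable:  # Ni == N(i-1): nothing new, so stop
            return nullable
        nullable = bigger


# Hypothetical encoding of a small grammar; symbols that are not keys are terminals.
grammar = {
    "S": {("A", "A", "C", "D")},
    "A": {("a", "A", "b"), ()},
    "C": {("a", "C"), ("a",)},
    "D": {("a", "D", "a"), ("b", "D", "b"), ()},
}
print(sorted(nullable_variables(grammar)))  # ['A', 'D']
```

S is not nullable in this grammar because its only right-hand side contains C, which always produces at least one terminal.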
Eliminating λ-productions • Given CFG G = (V, Σ, S, P), construct a CFG G1 = (V, Σ, S, P1) as follows. • Initialize P1 = P • Find the set N of all nullable variables in G • For every production A → α in P, add to P1 every production that can be obtained from it by deleting one or more occurrences of nullable variables from α • Example: A → BBCD where B and C are nullable leads to A → BCD | BCD | BBD | BD | BD | CD | D (the repeated BCD and BD are removed in clean-up) • Clean up • Delete all λ-productions from P1 • Delete any duplicate productions • Delete any productions of the form A → A • Thm: L(G1) = L(G) − {λ}
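One possible way to code this construction is sketched below. It assumes a hypothetical representation in which a grammar is a dict from each variable to a set of right-hand-side tuples, `()` stands for λ, and any symbol that is not a dict key is a terminal; the name `eliminate_lambda` is likewise illustrative. Using sets makes the "delete duplicates" clean-up step automatic.

```python
from itertools import product


def eliminate_lambda(grammar):
    """Return a grammar without λ-productions, following the steps above."""
    # Nullable set via fixed-point iteration (empty bodies seed the set).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a in nullable:
                continue
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(a)
                changed = True

    new_grammar = {}
    for a, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable occurrence is independently kept or dropped.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                candidate = tuple(s for part in pick for s in part)
                # Clean up: skip λ-productions and productions A -> A.
                if candidate and candidate != (a,):
                    new_bodies.add(candidate)
        new_grammar[a] = new_bodies
    return new_grammar


grammar = {
    "S": {("A", "A", "C", "D")},
    "A": {("a", "A", "b"), ()},
    "C": {("a", "C"), ("a",)},
    "D": {("a", "D", "a"), ("b", "D", "b"), ()},
}
for variable, bodies in eliminate_lambda(grammar).items():
    print(variable, "->", " | ".join("".join(b) for b in sorted(bodies)))
```

Keeping every occurrence is one of the choices, so the original productions are carried over as well, which corresponds to the "Initialize P1 = P" step.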
A-derivable Variables • B is A-derivable in a CFG G if and only if A ==>G* B • Recursive definition: A variable B in a CFG G = (V, Σ, S, P) is defined as A-derivable if: • Base case: P contains the production A → B or B = A • Recursive case: Variable C is A-derivable and P contains the production C → B • No other variables are A-derivable • Easy to turn into an algorithm
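As a rough illustration of how the recursive definition becomes an algorithm, the sketch below does a simple graph search over unit productions. The representation (dict from variable to a set of right-hand-side tuples) and the name `derivable_variables` are assumptions. Note that it follows the recursive definition, which coincides with A ==>* B when the grammar has no λ-productions.

```python
def derivable_variables(grammar, a):
    """Return the set of a-derivable variables, using unit productions only."""
    derivable = {a}   # base case B = A
    frontier = [a]    # variables whose unit productions are still unexplored
    while frontier:
        c = frontier.pop()
        for body in grammar[c]:
            # A unit production has exactly one variable on its right-hand side.
            is_unit = len(body) == 1 and body[0] in grammar
            if is_unit and body[0] not in derivable:
                derivable.add(body[0])
                frontier.append(body[0])
    return derivable


# Hypothetical grammar with a chain of unit productions S -> C -> D.
grammar = {
    "S": {("A", "C"), ("C",)},
    "A": {("a",)},
    "C": {("a", "C"), ("D",)},
    "D": {("b",)},
}
print(sorted(derivable_variables(grammar, "S")))  # ['C', 'D', 'S']
```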
Eliminating unit productions • Given CFG G = (V, Σ, S, P), construct a CFG G1 = (V, Σ, S, P1) as follows. • Initialize P1 = P • For each A in V, find the set of A-derivable variables in V • For each pair (A,B) such that B is A-derivable and each non-unit production B → α in P, add the production A → α to P1 • Clean up • Delete all unit productions from P1 • Delete any duplicate productions • Thm: L(G1) = L(G) if G did not have any λ-productions
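A compact sketch of this construction follows, under the same assumed representation (dict from variable to a set of right-hand-side tuples; symbols that are not keys are terminals). The name `eliminate_units` is illustrative. Because A itself counts as A-derivable, copying only non-unit productions covers both "Initialize P1 = P" and the final deletion of unit productions.

```python
def eliminate_units(grammar):
    """Return a grammar without unit productions; assumes no λ-productions."""
    def is_unit(body):
        return len(body) == 1 and body[0] in grammar

    new_grammar = {}
    for a in grammar:
        # Collect the A-derivable variables by following unit productions.
        derivable, frontier = {a}, [a]
        while frontier:
            c = frontier.pop()
            for body in grammar[c]:
                if is_unit(body) and body[0] not in derivable:
                    derivable.add(body[0])
                    frontier.append(body[0])
        # Copy every non-unit production B -> alpha of each A-derivable B to A.
        new_grammar[a] = {
            body for b in derivable for body in grammar[b] if not is_unit(body)
        }
    return new_grammar


grammar = {
    "S": {("A", "A", "C", "D"), ("A", "C", "D"), ("A", "A", "C"),
          ("C", "D"), ("A", "C"), ("C",)},
    "A": {("a", "A", "b"), ("a", "b")},
    "C": {("a", "C"), ("a",)},
    "D": {("a", "D", "a"), ("b", "D", "b"), ("a", "a"), ("b", "b")},
}
for variable, bodies in eliminate_units(grammar).items():
    print(variable, "->", " | ".join("".join(b) for b in sorted(bodies)))
```

In this grammar the only unit production is S → C, so S gains C's productions aC and a while the other variables are unchanged.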
Making CNF grammar • First eliminate λ-productions • Then eliminate unit productions • For each terminal a in Σ, introduce a variable Xa with production rule Xa → a • For each production of the form A → α where terminal a appears and |α| > 1, replace a with Xa • Finally, replace each production of the form A → B1B2 … Bn with n > 2 by a series of productions: • A → B1Y1 • Y1 → B2Y2 • … • Y(n-2) → B(n-1)Bn • Other methods can be used for this last step
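The last two steps (introducing Xa variables and shortening long productions) could be sketched as below. This is an illustration under stated assumptions: the grammar is a dict from variable to a set of right-hand-side tuples, λ- and unit productions are already gone, and the generated names `X_a` and `Y1, Y2, ...` are hypothetical and assumed not to clash with existing variable names.

```python
def to_cnf_shape(grammar):
    """Apply the terminal-replacement and shortening steps described above."""
    terminals = {s for bodies in grammar.values() for body in bodies
                 for s in body if s not in grammar}
    result = {}
    # One new variable per terminal, with the single production X_a -> a.
    for t in sorted(terminals):
        result["X_" + t] = {(t,)}
    counter = 0
    for a, bodies in grammar.items():
        result.setdefault(a, set())
        for body in bodies:
            # In bodies longer than 1, replace each terminal a with X_a.
            if len(body) > 1:
                body = tuple("X_" + s if s in terminals else s for s in body)
            # Split A -> B1 B2 ... Bn into A -> B1 Y1, Y1 -> B2 Y2, ...
            head = a
            while len(body) > 2:
                counter += 1
                fresh = "Y" + str(counter)  # fresh variable for this chain
                result.setdefault(head, set()).add((body[0], fresh))
                head, body = fresh, body[1:]
            result.setdefault(head, set()).add(body)
    return result


# A single long production, analogous to S -> AACD.
small = {"S": {("A", "A", "C", "D")}, "A": {("a",)}, "C": {("c",)}, "D": {("d",)}}
for variable, bodies in to_cnf_shape(small).items():
    print(variable, "->", " | ".join(" ".join(b) for b in sorted(bodies)))
```

Each long production gets its own chain of fresh variables here; sharing chains between productions with a common suffix would give a smaller grammar and is one of the "other methods" the slide alludes to.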
Example: Eliminate λ-productions • S → AACD • A → aAb | λ • C → aC | a • D → aDa | bDb | λ • Nullable variables: A & D • New grammar • S → AACD | ACD | AAC | CD | AC | C • A → aAb | ab • C → aC | a • D → aDa | bDb | aa | bb
Example: Eliminate unit productions • S → AACD | ACD | AAC | CD | AC | C • A → aAb | ab • C → aC | a • D → aDa | bDb | aa | bb • New grammar: • S → AACD | ACD | AAC | CD | AC | aC | a • A → aAb | ab • C → aC | a • D → aDa | bDb | aa | bb
Example: Add Xa and Xb • S → AACD | ACD | AAC | CD | AC | aC | a • A → aAb | ab • C → aC | a • D → aDa | bDb | aa | bb • New grammar • S → AACD | ACD | AAC | CD | AC | XaC | a • A → XaAXb | XaXb • C → XaC | a • D → XaDXa | XbDXb | XaXa | XbXb • Xa → a • Xb → b
Example: Shorten long productions • S → AACD | ACD | AAC | CD | AC | XaC | a • A → XaAXb | XaXb • C → XaC | a • D → XaDXa | XbDXb | XaXa | XbXb • Xa → a • Xb → b • Example replacement • S → AACD becomes S → AT1, T1 → AT2, T2 → CD
Observations • Consider a derivation from a CNF grammar G that begins: S ==>G* ABCD • How short can the final derived terminal string be? • Why?
Observation 2 • A path in a parse tree has length k if it contains k variables • Consider a parse tree T for a string x and a CNF grammar G with m variables. • Suppose the longest path in T has length k. How long can the string x be? • Suppose string x has length 2^m. How short can the longest path in T be?