Chomsky Normal Form and the CYK Algorithm
Normal Forms
A grammar can be brought into certain special forms that make it easier to work with:
• Chomsky Normal Form
• Greibach Normal Form
Chomsky Normal Form
A context-free grammar is in Chomsky Normal Form if every rule has one of the forms:
• A ⟶ BC
• A ⟶ a
• S ⟶ ε
where A, B, C are variables (B and C are not the start variable), a is a terminal, and S is the start variable.
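The definition above can be checked mechanically. The following is a minimal sketch, assuming a grammar is stored as a Python dict that maps each variable to a set of right-hand sides, each right-hand side being a tuple of symbols (the empty tuple stands for ε). The representation and the name `is_cnf` are assumptions made for illustration, not part of the slides.

```python
def is_cnf(rules, start):
    """Return True if every rule has the form A -> BC, A -> a, or start -> epsilon.

    rules: dict mapping each variable to a set of tuples of symbols.
    Any symbol that is not a key of `rules` is treated as a terminal.
    """
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                # Only the start variable may produce the empty string.
                if head != start:
                    return False
            elif len(body) == 1:
                # A single symbol on the right must be a terminal.
                if body[0] in variables:
                    return False
            elif len(body) == 2:
                # Two symbols must both be variables other than the start variable.
                if any(s not in variables or s == start for s in body):
                    return False
            else:
                return False
    return True


# A small grammar that is not in CNF: the rule S -> CSC is too long.
not_cnf = {
    "S": {("C", "S", "C"), ("B",)},
    "C": {("0", "0"), ()},
    "B": {("0", "1", "B"), ("1",)},
}
print(is_cnf(not_cnf, "S"))
```

A dict of sets keeps duplicate rules from piling up, which is convenient when rules are rewritten repeatedly.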
Chomsky Normal Form
There are 5 steps to follow in order to transform a grammar into CNF:
• Add a new start variable S0 and the production rule S0 ⟶ S.
• Eliminate the ε-rules.
• Eliminate the unary productions A ⟶ B.
• Add rules of the form Vt ⟶ t for every terminal t and replace t with the variable Vt.
• Transform the remaining rules to the form A ⟶ BC (A, B, C variables).
1. Add a new start variable
• We have to make sure that the start variable does not occur on the right side of any rule.
• Thus, we add a new start variable S0 and the rule S0 ⟶ S, where S is the old start variable.
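2. Eliminate ε-rules
• We have to eliminate all productions of the form A ⟶ ε, for A being any non-start variable.
• To do so, we remove the rule A ⟶ ε and, for every rule whose right side contains A, we add copies of that rule with every possible combination of occurrences of A deleted. For example, B ⟶ ACA gives B ⟶ ACA | CA | AC | C.
• If this creates a new ε-rule for another non-start variable, we repeat the process for that variable.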
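One way to sketch this step in code is to first compute the set of variables that can derive ε and then expand every rule over all ways of keeping or dropping those variables. This handles all ε-rules at once rather than one at a time, so it is a variant of the procedure described above, not a transcription of it. The dict-of-sets grammar representation and the name `eliminate_epsilon` are assumptions made for illustration.

```python
from itertools import product


def eliminate_epsilon(rules, start):
    """Remove epsilon rules from every variable except `start`.

    rules: dict mapping each variable to a set of tuples of symbols;
    the empty tuple stands for epsilon.
    """
    # Find all variables that can derive epsilon (fixed-point iteration).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head in nullable:
                continue
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True

    new_rules = {}
    for head, bodies in rules.items():
        expanded = set()
        for body in bodies:
            # Each nullable symbol may either be kept or dropped.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                # Keep an empty right side only for the start variable.
                if new_body or head == start:
                    expanded.add(new_body)
        new_rules[head] = expanded
    return new_rules


grammar = {
    "S0": {("S",)},
    "S": {("C", "S", "C"), ("B",)},
    "C": {("0", "0"), ()},
    "B": {("0", "1", "B"), ("1",)},
}
result = eliminate_epsilon(grammar, "S0")
```

On this grammar only C is nullable, so S ⟶ CSC expands to CSC, SC, CS and S, and the rule C ⟶ ε disappears.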
3. Eliminate unary productions
• A unary production is a production of the form A ⟶ B (with both A, B being variables).
• The only productions involving variables should be of the form V1 ⟶ V2V3, thus we have to eliminate unary productions.
• To do so, we replace A ⟶ B with A ⟶ w for every rule B ⟶ w. A production of the form A ⟶ A is simply removed.
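A possible sketch of this step: for each variable, collect every variable reachable through chains of unary productions, then copy over their non-unary right-hand sides. The dict-of-sets grammar representation and the name `eliminate_unit` are assumptions made for illustration.

```python
def eliminate_unit(rules):
    """Remove productions of the form A -> B where B is a variable.

    rules: dict mapping each variable to a set of tuples of symbols.
    """
    variables = set(rules)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    new_rules = {}
    for head in rules:
        # Variables reachable from `head` using unary productions only.
        reachable = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for body in rules[current]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # `head` inherits every non-unary right side of a reachable variable.
        new_rules[head] = {
            body
            for var in reachable
            for body in rules[var]
            if not is_unit(body)
        }
    return new_rules


grammar = {
    "S0": {("S",)},
    "S": {("C", "S", "C"), ("B",), ("C", "S"), ("S", "C"), ("S",)},
    "C": {("0", "0")},
    "B": {("0", "1", "B"), ("1",)},
}
result = eliminate_unit(grammar)
```

Because the reachable set is computed before anything is copied, chains such as S0 ⟶ S ⟶ B are resolved in one pass, and S ⟶ S drops out automatically.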
4. Add Vt ⟶ t and replace t with Vt
• The only rules involving terminals should be of the form A ⟶ t, so terminals must disappear from every rule whose right side has more than one symbol.
• To do so, we add a new variable Vt for every terminal t and we replace every appearance of t with Vt, except those in rules of the form A ⟶ t.
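This step might be sketched as follows. The dict-of-sets grammar representation, the function name `replace_terminals`, and the naming scheme `V_` plus the terminal for the new variables are assumptions made for illustration; the sketch also assumes that no existing variable already uses such a name.

```python
def replace_terminals(rules):
    """Replace terminals by variables in every right side of length >= 2.

    rules: dict mapping each variable to a set of tuples of symbols.
    Any symbol that is not a key of `rules` is treated as a terminal.
    """
    variables = set(rules)
    new_rules = {head: set() for head in rules}
    terminal_var = {}  # terminal -> name of the variable that produces it

    for head, bodies in rules.items():
        for body in bodies:
            if len(body) < 2:
                # Rules A -> t (and epsilon rules) are left untouched.
                new_rules[head].add(body)
                continue
            new_body = []
            for symbol in body:
                if symbol in variables:
                    new_body.append(symbol)
                else:
                    # Hypothetical naming scheme: V_<terminal>.
                    name = terminal_var.setdefault(symbol, "V_" + symbol)
                    new_body.append(name)
            new_rules[head].add(tuple(new_body))

    # Add the rules V_t -> t for every terminal that was replaced.
    for terminal, name in terminal_var.items():
        new_rules[name] = {(terminal,)}
    return new_rules


grammar = {
    "S0": {("C", "S", "C"), ("0", "1", "B"), ("1",), ("C", "S"), ("S", "C")},
    "S": {("C", "S", "C"), ("0", "1", "B"), ("1",), ("C", "S"), ("S", "C")},
    "C": {("0", "0")},
    "B": {("0", "1", "B"), ("1",)},
}
result = replace_terminals(grammar)
```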
5. Transform rules to A ⟶ BC
• All the rules involving only variables should be of the form A ⟶ BC. Thus we have to take care of all the rules with more than 2 variables on the right side.
• For the rule V ⟶ A1A2A3…An, we shorten the right side by repeatedly replacing its first two variables with one new variable, until only two variables remain (this creates n-2 new variables).
5. Transform rules to A ⟶ BC
V ⟶ A1 A2 A3 A4 A5 A6 … An

5. Transform rules to A ⟶ BC
V ⟶ B1 A3 A4 A5 A6 … An
B1 ⟶ A1 A2

5. Transform rules to A ⟶ BC
V ⟶ B2 A4 A5 A6 … An
B2 ⟶ B1 A3
B1 ⟶ A1 A2

5. Transform rules to A ⟶ BC
V ⟶ B3 A5 A6 … An
B3 ⟶ B2 A4
B2 ⟶ B1 A3
B1 ⟶ A1 A2

5. Transform rules to A ⟶ BC
V ⟶ B4 A6 … An
B4 ⟶ B3 A5
B3 ⟶ B2 A4
B2 ⟶ B1 A3
B1 ⟶ A1 A2

5. Transform rules to A ⟶ BC
V ⟶ Bn-2 An
Bn-2 ⟶ Bn-3 An-1
…
B4 ⟶ B3 A5
B3 ⟶ B2 A4
B2 ⟶ B1 A3
B1 ⟶ A1 A2
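The repeated replacement of the first two variables can be sketched as a loop. The dict-of-sets grammar representation, the function name `binarize`, and the fresh variable names `X1`, `X2`, … are assumptions made for illustration. As a small design choice, the sketch reuses one fresh variable for each distinct pair, so two rules that start with the same two variables share it.

```python
def binarize(rules):
    """Shorten every right side with more than two symbols to exactly two.

    rules: dict mapping each variable to a set of tuples of variables.
    """
    new_rules = {head: set() for head in rules}
    pair_var = {}  # (first, second) -> fresh variable producing that pair
    counter = 0

    for head, bodies in rules.items():
        for body in bodies:
            while len(body) > 2:
                pair = body[:2]
                if pair not in pair_var:
                    # Pick a fresh name that is not already a variable.
                    counter += 1
                    while f"X{counter}" in new_rules:
                        counter += 1
                    pair_var[pair] = f"X{counter}"
                    new_rules[pair_var[pair]] = {pair}
                # Replace the first two symbols with the fresh variable.
                body = (pair_var[pair],) + body[2:]
            new_rules[head].add(body)
    return new_rules


# V -> A1 A2 A3 A4 A5 has n = 5 symbols, so n - 2 = 3 fresh variables appear.
result = binarize({"V": {("A1", "A2", "A3", "A4", "A5")}})
```

For this single rule the sketch yields X1 ⟶ A1 A2, X2 ⟶ X1 A3, X3 ⟶ X2 A4 and V ⟶ X3 A5, mirroring the B1, B2, … chain above.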
Example
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1

Example
1. Add new start variable
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example
2. Eliminate ε-rules
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1

Example
2. Eliminate ε-rules
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1

Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | B | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
4. Create Vt for every terminal t
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Z ⟶ 0

Example
4. Create Vt for every terminal t
S0 ⟶ CSC | Z1B | 1 | CS | SC
S ⟶ CSC | Z1B | 1 | CS | SC
C ⟶ ZZ
B ⟶ Z1B | 1
Z ⟶ 0

Example
4. Create Vt for every terminal t
S0 ⟶ CSC | Z1B | 1 | CS | SC
S ⟶ CSC | Z1B | 1 | CS | SC
C ⟶ ZZ
B ⟶ Z1B | 1
Z ⟶ 0
A ⟶ 1

Example
4. Create Vt for every terminal t
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
Example
5. Take care of long rules
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS

Example
5. Take care of long rules
S0 ⟶ DC | ZAB | 1 | CS | SC
S ⟶ DC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS

Example
5. Take care of long rules
S0 ⟶ DC | ZAB | 1 | CS | SC
S ⟶ DC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS
E ⟶ ZA

Example
5. Take care of long rules
S0 ⟶ DC | EB | 1 | CS | SC
S ⟶ DC | EB | 1 | CS | SC
C ⟶ ZZ
B ⟶ EB | 1
Z ⟶ 0
A ⟶ 1
D ⟶ CS
E ⟶ ZA
CYK Introduction
• Problem: Given a context-free grammar and a string s, decide whether s can be generated by the grammar.
• If the grammar is in an arbitrary form, testing this directly is not so efficient.
• If the grammar is in Chomsky Normal Form, we have an elegant algorithm for this test, the CYK algorithm, whose running time is cubic in the length of s.
The CYK algorithm
• Suppose that we are given a grammar in Chomsky Normal Form:
S → AB
A → BB | 0
B → AA | 1
• We would like to see whether 10110 is generated by this grammar or not.
Substrings of length 1
• Since the only way to produce terminals is by following the rules A → a, just replace every terminal with the variables that produce it.
1 0 1 1 0
B A B B A
Substrings of length 2
• Suppose now that we want to see how every substring of length 2 can be generated. This is equivalent to finding ways to produce all the length 2 substrings where terminals are replaced with the variables that represent them.
• Since every such rule is of the form A → BC, it suffices to replace every two consecutive variables with the variables that produce them.
1 0 1 1 0
B A B B A
- S A -
Substrings of length 3
• To produce the substring 101 (in 10110) we can either take 1 with 01 or 10 with 1. Taking 1 with 01 gives BS, which cannot be produced by any variable. Taking 10 with 1 gives no pair, since 10 cannot be produced.
1 0 1 1 0
B A B B A
- S A -
-

Substrings of length 3
• To produce the substring 011 (in 10110) we can either take 0 with 11 or 01 with 1. Taking 0 with 11 gives AA, which is produced by B. Taking 01 with 1 gives SB, which cannot be produced by any variable.
1 0 1 1 0
B A B B A
- S A -
- B

Substrings of length 3
• To produce the substring 110 (in 10110) we can either take 1 with 10 or 11 with 0. Taking 1 with 10 gives no pair, since 10 cannot be produced by a variable. Taking 11 with 0 gives AA, which is produced by B.
1 0 1 1 0
B A B B A
- S A -
- B B
Substrings of length 4
• To produce the substring 1011 (in 10110) we can take 1 with 011, or 10 with 11, or 101 with 1. Taking 1 with 011 gives BB, which is produced by A. Taking 10 with 11 gives no pair, since 10 cannot be produced. Taking 101 with 1 gives no pair, since 101 cannot be produced.
1 0 1 1 0
B A B B A
- S A -
- B B
A

Substrings of length 4
• To produce the substring 0110 (in 10110) we can take 0 with 110, or 01 with 10, or 011 with 0. Taking 0 with 110 gives AB, which is produced by S. Taking 01 with 10 gives no pair, since 10 cannot be produced. Taking 011 with 0 gives BA, which cannot be produced by any variable.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
Combine previous solutions
• In order now to produce the whole string 10110 we can take 1 with 0110, or 10 with 110, or 101 with 10, or 1011 with 0. Taking 1 with 0110 gives BS, which cannot be produced.
1 0 1 1 0
B A B B A
- S A -
- B B
A S
-
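The table-filling procedure walked through on these slides can be sketched in a few lines. This is a minimal sketch, assuming the grammar is stored as a dict that maps each variable to a set of right-hand sides given as tuples of symbols; the names `cyk_table` and `cyk_accepts` are assumptions made for illustration.

```python
def cyk_table(rules, word):
    """Fill the CYK table for a grammar in Chomsky Normal Form.

    table[length][i] is the set of variables that derive the substring
    of `word` of the given length starting at position i (0-based).
    """
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n + 1)]

    # Substrings of length 1: variables with a rule A -> a.
    for i, symbol in enumerate(word):
        for head, bodies in rules.items():
            if (symbol,) in bodies:
                table[1][i].add(head)

    # Longer substrings: try every way of splitting into two parts.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[split][i]
                right = table[length - split][i + split]
                for head, bodies in rules.items():
                    if any((b, c) in bodies for b in left for c in right):
                        table[length][i].add(head)
    return table


def cyk_accepts(rules, start, word):
    """Return True if `start` derives `word`."""
    if not word:
        return () in rules.get(start, set())
    return start in cyk_table(rules, word)[len(word)][0]


grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}
table = cyk_table(grammar, "10110")
accepted = cyk_accepts(grammar, "S", "10110")
```

On the grammar above, the sketch reproduces the rows B A B B A, then - S A -, then - B B, then A S. After all four splits are tried, the cell for the whole string contains only B (from 1011 with 0, which gives AA), so S is not in it and the sketch reports that 10110 is not generated.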