Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Normal forms and parsing Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Testing membership and parsing
• Given a grammar:
    S → 0S1 | 1S0S1 | T
    T → S | ε
• How can we know if a string x is in its language?
• If so, can we obtain a parse tree for x?
• Can we tell if the parse tree is unique?
First attempt
• Maybe we can try all possible derivations:
    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111

    S
      0S1
        00S11
        01S0S11
        0T1
      1S0S1
        10S10S1
        ...
      T
        S
        ε
• When do we stop?
Problems
• How do we know when to stop?
    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111

    S
      0S1
        00S11
        01S0S11
        0T1
      1S0S1
        10S10S1
        ...
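Problems
• Idea: stop a derivation when its length exceeds |x|
• Not right because of ε-productions: a sentential form can grow longer than x and then shrink again
• We might want to eliminate ε-productions too
    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 01011

    derivation:  S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
    length:      1   3     7         6        5
  (each step that erases an S abbreviates S ⇒ T ⇒ ε)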
Problems
• Loops among the variables (S → T → S) might make us go on forever
• We want to eliminate such loops
    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111
Removal of ε-productions
• A variable N is nullable if there is a derivation N ⇒* ε
• How to remove ε-productions (except from S):
    1. Find all nullable variables N1, ..., Nk
    2. For every production of the form A → αNiβ, add another production A → αβ
    3. If Ni → ε is a production, remove it
    4. If S is nullable, add the special production S → ε
Example
• Find the nullable variables
    grammar:
    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b

    nullable variables: B, C, D
• Step used: find all nullable variables N1, ..., Nk
Finding nullable variables
• To find nullable variables, we work backwards
• First, mark all variables A such that A → ε as nullable
• Then, as long as there are productions of the form
    A → A1…Ak
  where all of A1, …, Ak are marked as nullable, mark A as nullable
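The marking procedure above can be sketched in Python. This is a minimal sketch, not code from the course: the function name `nullable_variables` and the dictionary encoding of the grammar (one-character symbols, the empty string `""` standing for ε) are assumptions made for illustration.

```python
def nullable_variables(productions):
    """Return the set of variables that can derive the empty string.

    `productions` maps each variable to a list of right-hand sides.
    Each right-hand side is a string of one-character symbols, and the
    empty string "" stands for an epsilon-production (assumed encoding).
    """
    # Step 1: mark every variable A that has a production A -> epsilon.
    nullable = {a for a, rhss in productions.items() if "" in rhss}

    # Step 2: keep marking A while some production A -> A1...Ak has
    # every symbol on its right-hand side already marked.
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a in nullable:
                continue
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    return nullable


# The example grammar from the slides.
grammar = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'D']
```

Terminals are never marked, so a production containing a terminal can never make its left-hand side nullable.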
Eliminating ε-productions
    grammar:
    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b

    nullable variables: B, C, D

    added productions:
    D → C
    S → AD
    D → B
    D → ε
    S → AC
    S → A
    C → E
• For every production of the form A → αNiβ, add another production A → αβ
• If Ni → ε is a production, remove it (this also removes the added D → ε)
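One way to carry out this step mechanically is to generate, for each production, every variant obtained by dropping some of its nullable symbols, and to discard the empty right-hand sides. The sketch below does this in one pass instead of adding productions one at a time; the function names and the grammar encoding (one-character symbols, `""` for ε, start variable `S`) are illustrative assumptions.

```python
from itertools import product


def nullable_variables(productions):
    """Variables that can derive the empty string ("" encodes epsilon)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a not in nullable and any(
                all(s in nullable for s in rhs) for rhs in rhss
            ):
                nullable.add(a)
                changed = True
    return nullable


def remove_epsilon_productions(productions, start="S"):
    """Return an equivalent grammar whose only epsilon-production,
    if any, is start -> epsilon."""
    nullable = nullable_variables(productions)
    result = {}
    for a, rhss in productions.items():
        new_rhss = []
        for rhs in rhss:
            # Each nullable symbol may be kept or dropped; others must stay.
            choices = [(s, "") if s in nullable else (s,) for s in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                # Skip epsilon right-hand sides and duplicates.
                if candidate and candidate not in new_rhss:
                    new_rhss.append(candidate)
        result[a] = new_rhss
    if start in nullable:
        result[start].append("")  # the special production S -> epsilon
    return result


grammar = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
for variable, rhss in remove_epsilon_productions(grammar).items():
    print(variable, "->", " | ".join(rhss))
```

On the example grammar, B ends up with no productions at all, because its only production was B → ε.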
Dealing with loops
• A unit production is a production of the form
    A1 → A2
  where A1 and A2 are both variables
• Example
    grammar:
    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR

    unit productions:
    S → T
    T → S
    T → R
Removal of unit productions
• If there is a cycle of unit productions
    A1 → A2 → ... → Ak → A1
  delete it and replace every variable in the cycle with A1
• Example
    before:
    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR

    after:
    S → 0S1 | 1S0S1
    S → R | ε
    R → 0SR
  T is replaced by S in the {S, T} cycle
Removal of unit productions
• For other unit productions, replace every chain
    A1 → A2 → ... → Ak → α
  by the productions A1 → α, ..., Ak → α
• Example
    before:
    S → 0S1 | 1S0S1 | R | ε
    R → 0SR

    after:
    S → 0S1 | 1S0S1 | 0SR | ε
    R → 0SR
  S → R → 0SR is replaced by S → 0SR, R → 0SR
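A sketch of unit-production removal in Python follows. Note that it uses a different but equivalent formulation from the two steps on the slides: rather than first collapsing cycles and then replacing chains, it computes for each variable every variable reachable through unit productions and copies over their non-unit right-hand sides. Variables on a cycle are therefore kept as duplicates instead of being merged. The function name and grammar encoding (one-character symbols, `""` for ε) are assumptions for illustration.

```python
def remove_unit_productions(productions):
    """Return an equivalent grammar with no productions of the form A -> B."""
    variables = set(productions)

    def is_unit(rhs):
        # A right-hand side is a unit production if it is a single variable.
        return rhs in variables

    result = {}
    for a in productions:
        # Collect every variable reachable from a through unit productions.
        reachable = {a}
        stack = [a]
        while stack:
            b = stack.pop()
            for rhs in productions[b]:
                if is_unit(rhs) and rhs not in reachable:
                    reachable.add(rhs)
                    stack.append(rhs)
        # a inherits every non-unit right-hand side of the variables it reaches.
        new_rhss = []
        for b in sorted(reachable):
            for rhs in productions[b]:
                if not is_unit(rhs) and rhs not in new_rhss:
                    new_rhss.append(rhs)
        result[a] = new_rhss
    return result


grammar = {
    "S": ["0S1", "1S0S1", "R", ""],
    "R": ["0SR"],
}
for variable, rhss in remove_unit_productions(grammar).items():
    print(variable, "->", " | ".join(rhs or "ε" for rhs in rhss))
```

Because reachability is computed with a visited set, cycles such as S → T → S cause no infinite loop.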
Recap
• After eliminating ε-productions and unit productions, we know that every derivation
    S ⇒* a1…ak    where a1, …, ak are terminals
  doesn't shrink in length and doesn't go into cycles
• Exception: S → ε
• We will not use this rule at all, except to check if ε ∈ L
• Note: ε-productions must be eliminated before unit productions
Example: testing membership
    original grammar:
    S → 0S1 | 1S0S1 | T
    T → S | ε

    after eliminating unit and ε-productions:
    S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

    x = 00111

    S
      01, 101
      0S1
        0011, 01011
        00S11: only strings of length ≥ 6
        otherwise: strings of length ≥ 6
      10S1
        10011
        otherwise: strings of length ≥ 6
      1S01
        10101
        otherwise: strings of length ≥ 6
      1S0S1: only strings of length ≥ 6
Algorithm 1 for testing membership
• How to check if a string x ≠ ε is in L(G):
    Eliminate all ε-productions and unit productions
    Let X := S
    While some new rule R can be applied to X
        Apply R to X
        If X = x, you have found a derivation for x
        If |X| > |x|, backtrack
    If no more rules can be applied to X, x is not in L
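The search described by Algorithm 1 can be sketched as a depth-first exploration of sentential forms. The sketch below assumes the grammar has already had its ε-productions and unit productions eliminated, expands only the leftmost variable (which is enough to find a derivation if one exists), and keeps a set of already visited forms. The function name `derives` and the grammar encoding (one-character symbols, `""` for ε) are illustrative assumptions.

```python
def derives(productions, x, start="S"):
    """Return True if the nonempty string x has a derivation from `start`.

    Assumes epsilon-productions (other than start -> epsilon) and unit
    productions were eliminated, so sentential forms never shrink.
    """
    variables = set(productions)
    seen = {start}
    stack = [start]
    while stack:
        form = stack.pop()
        if form == x:
            return True  # found a derivation for x
        # Position of the leftmost variable, if any.
        pos = next((i for i, s in enumerate(form) if s in variables), None)
        if pos is None:
            continue  # all terminals but different from x: dead end
        for rhs in productions[form[pos]]:
            if rhs == "":
                continue  # start -> epsilon is never used when x is nonempty
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) > len(x) or new_form in seen:
                continue  # too long (backtrack) or already explored
            seen.add(new_form)
            stack.append(new_form)
    return False  # no more rules can be applied


grammar = {"S": ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]}
print(derives(grammar, "00111"))
print(derives(grammar, "0011"))
```

The length bound is what guarantees termination: only finitely many sentential forms have length at most |x|.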
Practical limitations of Algorithm 1
• This method can be very slow if x is long
    G = CFG of the Java programming language
    x = code for a 200-line Java program
    the algorithm might take about 10^200 steps!
• There is a faster algorithm, but it requires that we do some more transformations on the grammar
Chomsky Normal Form
• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
    A → BC    or    A → a
• Conversion to Chomsky Normal Form is easy:
    A → BcDE
  replace terminals with new variables:
    A → BCDE
    C → c
  break up sequences with new variables:
    A → BX1
    X1 → CX2
    X2 → DE
    C → c
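The two conversion steps can be sketched in Python as follows. This is a sketch under stated assumptions: the input grammar has no unit productions and no ε-productions other than possibly S → ε, right-hand sides are tuples of symbol names, and the fresh variable names `T_c` (for a terminal c) and `X1`, `X2`, ... are hypothetical names assumed not to clash with existing variables.

```python
def to_cnf(productions):
    """Convert a grammar without unit productions to Chomsky Normal Form.

    `productions` maps each variable name to a list of right-hand sides,
    each a tuple of symbol names; () encodes epsilon (assumed encoding).
    """
    variables = set(productions)
    result = {a: [] for a in productions}
    terminal_var = {}  # terminal -> name of the variable standing for it
    counter = 0

    def var_for(terminal):
        # Introduce T_c -> c the first time the terminal c is seen.
        if terminal not in terminal_var:
            name = f"T_{terminal}"
            terminal_var[terminal] = name
            result[name] = [(terminal,)]
        return terminal_var[terminal]

    for a, rhss in productions.items():
        for rhs in rhss:
            if len(rhs) <= 1:
                result[a].append(tuple(rhs))  # A -> a (or S -> epsilon) stays
                continue
            # Step 1: replace terminals with new variables.
            symbols = [s if s in variables else var_for(s) for s in rhs]
            # Step 2: break up sequences longer than two with new variables.
            head = a
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[head].append((symbols[0], fresh))
                result[fresh] = []
                head = fresh
                symbols = symbols[1:]
            result[head].append(tuple(symbols))
    return result


grammar = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],
    "D": [("d",)],
    "E": [("e",)],
}
for variable, rhss in to_cnf(grammar).items():
    print(variable, "->", " | ".join(" ".join(rhs) for rhs in rhss))
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, which matches the shape of the example above with T_c playing the role of C.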
Exercise
• Convert this CFG into Chomsky Normal Form:
    S → ε | ADDA
    A → a
    C → c
    D → bCb
Algorithm 2 for testing membership
    S → AB | BC
    A → BA | a
    B → CC | b
    C → AB | a

    x = baaba

    SAC
    –     SAC
    –     B     B
    SA    B     SC    SA
    B     AC    AC    B     AC
    b     a     a     b     a
• Idea: we generate each substring of x bottom up
Parse tree reconstruction
    S → AB | BC
    A → BA | a
    B → CC | b
    C → AB | a

    x = baaba

    SAC
    –     SAC
    –     B     B
    SA    B     SC    SA
    B     AC    AC    B     AC
    b     a     a     b     a
• Tracing back the derivations, we obtain the parse tree
Cocke-Younger-Kasami algorithm
• Input: grammar G in CNF, string x = x1…xk
    table cells:
    1k
    …     …
    12    23    …
    11    22    …     kk
    x1    x2    …     xk
• For cells in the last row:
    If there is a production A → xi, put A in table cell ii
• For cells st in the other rows:
    If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, for some s ≤ j < t, put A in cell st
• Cell ij remembers all possible derivations of the substring xi…xj
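The table-filling procedure can be sketched in Python as below. It is a minimal sketch rather than a reference implementation: the names `cyk_table` and `cyk_accepts` and the grammar encoding (one-character symbols, right-hand sides as strings) are assumptions, cells are indexed from 0 instead of 1, and each cell stores only the set of variables, not the back-pointers needed to rebuild a parse tree.

```python
def cyk_table(productions, x):
    """Fill the CYK table for a grammar in Chomsky Normal Form.

    table[(s, t)] is the set of variables that derive x[s..t]
    (0-indexed, both ends inclusive).
    """
    k = len(x)
    table = {}
    # Last row: cell ii holds every A with a production A -> x_i.
    for i in range(k):
        table[(i, i)] = {a for a, rhss in productions.items() if x[i] in rhss}
    # Other rows, in order of increasing substring length.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split x[s..t] into x[s..j] and x[j+1..t]
                for a, rhss in productions.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and rhs[0] in table[(s, j)]
                                and rhs[1] in table[(j + 1, t)]):
                            cell.add(a)
            table[(s, t)] = cell
    return table


def cyk_accepts(productions, x, start="S"):
    """True if the nonempty string x is derivable from `start`."""
    return len(x) > 0 and start in cyk_table(productions, x)[(0, len(x) - 1)]


grammar = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}
table = cyk_table(grammar, "baaba")
print(sorted(table[(0, 4)]))          # ['A', 'C', 'S']
print(cyk_accepts(grammar, "baaba"))  # True
```

The three nested loops over length, start position and split point give a running time cubic in |x| (times the grammar size), in contrast to the exhaustive search of Algorithm 1.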