CSC 3130: Automata theory and formal languages

Fall 2009. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Normal forms and parsing. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Testing membership and parsing. Given a grammar, how can we know if a string x is in its language?



  1. Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Normal forms and parsing Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Testing membership and parsing
     • Given a grammar
         S → 0S1 | 1S0S1 | T
         T → S | ε
     • How can we know if a string x is in its language?
     • If so, can we obtain a parse tree for x?
     • Can we tell if the parse tree is unique?

  3. First attempt
     • Maybe we can try all possible derivations:
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111
     • Derivations starting from S:
         S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ...
         S ⇒ 1S0S1 ⇒ 10S10S1, ...
         S ⇒ T ⇒ S, ε
     • When do we stop?

  4. Problems
     • How do we know when to stop?
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111
     • Derivations starting from S:
         S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1, ... (when do we stop?)
         S ⇒ 1S0S1 ⇒ 10S10S1, ...

  5. Problems
     • Idea: Stop the derivation when its length exceeds |x|
     • Not right because of ε-productions: a sentential form can grow longer than x and then shrink back
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 01011
         S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
         lengths: 1, 3, 7, 6, 5
     • We might want to eliminate ε-productions too

  6. Problems
     • Loops among the variables (S → T → S) might make us go forever
     • We want to eliminate such loops
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111

  7. Removal of ε-productions
     • A variable N is nullable if there is a derivation N ⇒* ε
     • How to remove ε-productions (except from S):
         1. Find all nullable variables N1, ..., Nk
         2. For every production of the form A → αNiβ, add another production A → αβ
         3. If Ni → ε is a production, remove it
         4. If S is nullable, add the special production S → ε

  8. Example
     • Step 1: Find all nullable variables N1, ..., Nk
         grammar:
           S → ACD
           A → a
           B → ε
           C → ED | ε
           D → BC | b
           E → b
         nullable variables: B, C, D

  9. Finding nullable variables
     • To find nullable variables, we work backwards
     • First, mark all variables A such that A → ε is a production as nullable
     • Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
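
The marking procedure above can be sketched in Python. This is only an illustration, not part of the original slides: it assumes a grammar stored as a dict from single-character variables to lists of right-hand-side strings, with the empty string standing for ε. The name `nullable_variables` is hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of nullable variables.

    Assumed representation: grammar maps each variable to a list of
    right-hand sides (strings); "" stands for an ε-production.
    """
    # First, mark every variable that has an ε-production.
    nullable = {a for a, bodies in grammar.items() if "" in bodies}
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a in nullable:
                continue
            # Mark A if some body consists only of already-marked variables.
            if any(body and all(sym in nullable for sym in body) for body in bodies):
                nullable.add(a)
                changed = True
    return nullable


# The example grammar from slide 8.
grammar = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'D']
```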

  10. Eliminating ε-productions
      grammar:
        S → ACD
        A → a
        B → ε
        C → ED | ε
        D → BC | b
        E → b
      nullable variables: B, C, D
      • For every production of the form A → αNiβ, add another production A → αβ
          added: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
      • If Ni → ε is a production, remove it
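
A minimal Python sketch of the whole elimination step follows, under the same assumed representation (dict of single-character variables to lists of strings, "" for ε). Instead of adding one production at a time, it emits every variant of a body with any subset of nullable occurrences dropped, which gives the same final set of productions. The name `remove_epsilon` is hypothetical.

```python
from itertools import product


def remove_epsilon(grammar, start="S"):
    """Return an equivalent grammar whose only ε-production (if any) is start → ε."""
    # Step 1: mark nullable variables until nothing changes.
    # all() over an empty body is True, so A → ε marks A directly.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            if a not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(a)
                changed = True

    # Steps 2 and 3: for each body, keep or drop each nullable occurrence,
    # and discard the empty variant (the ε-production itself).
    result = {}
    for a, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for pick in product(*choices):
                variant = "".join(pick)
                if variant and variant not in new_bodies:
                    new_bodies.append(variant)
        result[a] = new_bodies

    # Step 4: the special production for a nullable start variable.
    if start in nullable:
        result[start].append("")
    return result


grammar = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
result = remove_epsilon(grammar)
for variable, bodies in result.items():
    print(variable, "→", " | ".join(bodies))
```

On the example grammar this yields S → ACD | AC | AD | A, C → ED | E and D → BC | B | C | b, which agrees with the productions listed on the slide. B is left with no productions at all, so it can no longer derive anything.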

  11. Dealing with loops
      • A unit production is a production of the form A1 → A2 where A1 and A2 are both variables
      • Example
          grammar:
            S → 0S1 | 1S0S1 | T
            T → S | R | ε
            R → 0SR
          unit productions: S → T, T → S, T → R

  12. Removal of unit productions
      • If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
      • Example: T is replaced by S in the {S, T} cycle
          before:
            S → 0S1 | 1S0S1 | T
            T → S | R | ε
            R → 0SR
          after:
            S → 0S1 | 1S0S1
            S → R | ε
            R → 0SR

  13. Removal of unit productions
      • For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
      • Example: S → R → 0SR is replaced by S → 0SR, R → 0SR
          before:
            S → 0S1 | 1S0S1 | R | ε
            R → 0SR
          after:
            S → 0S1 | 1S0S1 | 0SR | ε
            R → 0SR
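
The chain replacement can be sketched in Python as follows. Note that this sketch uses a slightly different technique from the slides: rather than first collapsing cycles into one variable, it computes for each variable every variable reachable through unit productions and copies over their non-unit bodies. The generated language is the same, but variables in a cycle are kept as separate names. The representation (dict of single-character variables to lists of strings) and the name `remove_unit_productions` are assumptions.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without productions of the form A → B."""
    variables = set(grammar)

    def is_unit(body):
        # A body is a unit production when it is exactly one variable name.
        return body in variables

    result = {}
    for a in grammar:
        # Collect every variable reachable from a through unit productions.
        reach = [a]
        for v in reach:  # the list grows while we iterate over it
            for body in grammar[v]:
                if is_unit(body) and body not in reach:
                    reach.append(body)
        # a inherits the non-unit bodies of everything it can reach.
        bodies = []
        for v in reach:
            for body in grammar[v]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[a] = bodies
    return result


# The grammar from the example above; "" stands for ε.
grammar = {"S": ["0S1", "1S0S1", "R", ""], "R": ["0SR"]}
print(remove_unit_productions(grammar))
# {'S': ['0S1', '1S0S1', '', '0SR'], 'R': ['0SR']}
```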

  14. Recap
      • After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles
      • Exception: S → ε
          We will not use this rule at all, except to check if ε ∈ L
      • Note: ε-productions must be eliminated before unit productions

  15. Example: testing membership
      original grammar:
        S → 0S1 | 1S0S1 | T
        T → S | ε
      after eliminating unit and ε-productions:
        S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
      x = 00111
      derivations from S:
        S ⇒ 01, 101
        S ⇒ 0S1 ⇒ 0011, 01011, 00S11 (strings of length ≥ 6 only), other strings of length ≥ 6
        S ⇒ 10S1 ⇒ 10011, strings of length ≥ 6
        S ⇒ 1S01 ⇒ 10101, strings of length ≥ 6
        S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

  16. Algorithm 1 for testing membership
      • How to check if a string x ≠ ε is in L(G):
          Eliminate all ε-productions and unit productions
          Let X := S
          While some new rule R can be applied to X
            Apply R to X
            If X = x, you have found a derivation for x
            If |X| > |x|, backtrack
          If no more rules can be applied to X, x is not in L
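
Algorithm 1 can be sketched as a depth-first search over sentential forms. This is a sketch under stated assumptions: the grammar is already free of ε- and unit productions (so forms never shrink and never loop), variables are single characters, and only the leftmost variable is expanded, which is enough because every derivable string has a leftmost derivation. The name `derives` is hypothetical.

```python
def derives(grammar, x, start="S"):
    """Return True if the nonempty string x can be derived from start."""
    seen = set()  # sentential forms that were already explored

    def search(form):
        if form == x:
            return True
        # Backtrack when the form is too long or was tried before.
        if len(form) > len(x) or form in seen:
            return False
        seen.add(form)
        for i, sym in enumerate(form):
            if sym in grammar:
                # Try every rule for the leftmost variable.
                return any(
                    search(form[:i] + body + form[i + 1:])
                    for body in grammar[sym]
                )
        return False  # only terminals left, and the form differs from x

    return search(start)


# The grammar from slide 15, with the rule S → ε left out as Algorithm 1 requires.
grammar = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}
print(derives(grammar, "00111"))  # False
print(derives(grammar, "01011"))  # True
```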

  17. Practical limitations of Algorithm 1
      • This method can be very slow if x is long
          G = CFG of the Java programming language
          x = code for a 200-line Java program
          the algorithm might take about 10^200 steps!
      • There is a faster algorithm, but it requires that we do some more transformations on the grammar

  18. Chomsky Normal Form
      • A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
          A → BC  or  A → a
      • Conversion to Chomsky Normal Form is easy:
          start with:
            A → BcDE
          replace terminals with new variables:
            A → BCDE
            C → c
          break up sequences with new variables:
            A → BX1
            X1 → CX2
            X2 → DE
            C → c
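
The two conversion steps can be sketched in Python. This is an illustration only: bodies are tuples of symbols so that fresh variables can have multi-character names, the grammar is assumed to have no unit productions, and the generated names `T_c` and `X1`, `X2`, ... are hypothetical and assumed not to clash with existing variables. The name `to_cnf` is also hypothetical.

```python
def to_cnf(grammar):
    """Convert a grammar (variable -> list of tuples of symbols) to CNF."""
    result = {a: [] for a in grammar}
    terminal_var = {}  # terminal -> fresh variable that produces it
    counter = 0

    def var_for(terminal):
        # Create (once) a variable T_c with the single production T_c → c.
        if terminal not in terminal_var:
            name = "T_" + terminal
            terminal_var[terminal] = name
            result[name] = [(terminal,)]
        return terminal_var[terminal]

    for a, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                result[a].append(body)  # already of the form A → a
                continue
            # Replace terminals with new variables.
            symbols = [s if s in grammar else var_for(s) for s in body]
            # Break up long sequences with new variables.
            head = a
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(head, []).append((symbols[0], fresh))
                head, symbols = fresh, symbols[1:]
            result.setdefault(head, []).append(tuple(symbols))
    return result


# The production A → BcDE from the slide, plus placeholder rules for B, D, E.
grammar = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],
    "D": [("d",)],
    "E": [("e",)],
}
cnf = to_cnf(grammar)
for variable, bodies in cnf.items():
    print(variable, "→", " | ".join(" ".join(body) for body in bodies))
```

Unlike the slide, which names the new variable C, the sketch always introduces a fresh variable for each terminal, so the output reads A → B X1, X1 → T_c X2, X2 → D E, T_c → c.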

  19. Exercise
      • Convert this CFG into Chomsky Normal Form:
          S → ε | ADDA
          A → a
          C → c
          D → bCb

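  20. Algorithm 2 for testing membership
      grammar:
        S → AB | BC
        A → BA | a
        B → CC | b
        C → AB | a
      x = baaba
      Idea: We generate each substring of x bottom up
      table (each cell lists the variables that derive the substring it spans; - marks an empty cell):
        SAC
        -    SAC
        -    B    B
        SA   B    SC   SA
        B    AC   AC   B    AC
        b    a    a    b    a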

  21. Parse tree reconstruction
      grammar:
        S → AB | BC
        A → BA | a
        B → CC | b
        C → AB | a
      x = baaba
      table:
        SAC
        -    SAC
        -    B    B
        SA   B    SC   SA
        B    AC   AC   B    AC
        b    a    a    b    a
      Tracing back the derivations, we obtain the parse tree

  22. Cocke-Younger-Kasami algorithm
      Input: Grammar G in CNF, string x = x1…xk
      table cells:
        1k
        …    …
        12   23   …
        11   22   …    kk
        x1   x2   …    xk
      Cell ij remembers all possible derivations of substring xi…xj
      • For cells in last row
          If there is a production A → xi
          Put A in table cell ii
      • For cells st in other rows
          If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, for some s ≤ j < t
          Put A in cell st
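
A minimal Python sketch of the table-filling procedure, run on the grammar and string from slide 20. The representation is an assumption: the grammar is a dict from single-character variables to lists of bodies, each body being either one terminal or two variables, and cells are indexed from 0 rather than 1. The name `cyk` is hypothetical.

```python
def cyk(grammar, x):
    """Fill the CYK table for a nonempty string x.

    table[(s, t)] is the set of variables that derive x[s..t]
    (0-based, both ends inclusive).
    """
    k = len(x)
    table = {}
    # Last row: cell ii gets every A with a production A → xi.
    for i, ch in enumerate(x):
        table[(i, i)] = {a for a, bodies in grammar.items() if ch in bodies}
    # Other rows, from shorter substrings to longer ones.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split into x[s..j] and x[j+1..t]
                for a, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(a)
            table[(s, t)] = cell
    return table


grammar = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}
x = "baaba"
table = cyk(grammar, x)
print(sorted(table[(0, len(x) - 1)]))  # ['A', 'C', 'S']
print("S" in table[(0, len(x) - 1)])   # True, so x is in the language
```

The top cell contains S, A and C, matching the table on slide 20. The string is accepted because the start variable S appears there.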
