
CSC 3130: Automata theory and formal languages

Fall 2008. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Normal forms and parsing. Andrej Bogdanov, http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Testing membership and parsing: given a grammar, how can we know if a string x is in its language?


Presentation Transcript


  1. CSC 3130: Automata theory and formal languages
     Normal forms and parsing
     Andrej Bogdanov, The Chinese University of Hong Kong, Fall 2008
     http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Testing membership and parsing
     • Given a grammar
         S → 0S1 | 1S0S1 | T
         T → S | ε
       how can we know if a string x is in its language?
     • If so, can we reconstruct a parse tree for x?

  3. First attempt
     • Maybe we can try all possible derivations:
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111
     [Figure: derivation tree from S. S ⇒ 0S1, 1S0S1, or T; 0S1 ⇒ 00S11, 01S0S11, or 0T1; 1S0S1 ⇒ 10S10S1, ...; T ⇒ S or ε. Annotation: "when do we stop?"]

  4. Problems
     • How do we know when to stop?
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111
     [Figure: the derivation tree from S (S ⇒ 0S1 ⇒ 00S11, 01S0S11, or 0T1; S ⇒ 1S0S1 ⇒ 10S10S1, ...), annotated "when do we stop?"]

  5. Problems
     • Idea: stop a derivation when its length exceeds |x|
     • Not right because of ε-productions: a sentential form can grow longer than x and then shrink back
     • We might want to eliminate ε-productions too
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 01011
         S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011
         lengths: 1, 3, 7, 6, 5

  6. Problems
     • Loops among the variables (S → T → S) might make us go on forever
     • We might want to eliminate such loops
         S → 0S1 | 1S0S1 | T
         T → S | ε
         x = 00111

  7. Unit productions
     • A unit production is a production of the form
         A1 → A2
       where A1 and A2 are both variables
     • Example
         grammar:          S → 0S1 | 1S0S1 | T
                           T → S | R | ε
                           R → 0SR
         unit productions: S → T, T → S, T → R

  8. Removal of unit productions
     • If there is a cycle of unit productions
         A1 → A2 → ... → Ak → A1
       delete it and replace every variable in the cycle with A1
     • Example
         before: S → 0S1 | 1S0S1 | T
                 T → S | R | ε
                 R → 0SR
         after:  S → 0S1 | 1S0S1 | R | ε
                 R → 0SR
       T is replaced by S in the {S, T} cycle

  9. Removal of unit productions
     • For other unit productions, replace every chain
         A1 → A2 → ... → Ak → α   (α not a single variable)
       by the productions A1 → α, ..., Ak → α
     • Example
         before: S → 0S1 | 1S0S1 | R | ε
                 R → 0SR
         after:  S → 0S1 | 1S0S1 | 0SR | ε
                 R → 0SR
       S → R → 0SR is replaced by S → 0SR, R → 0SR

  10. Removal of ε-productions
     • A variable N is nullable if there is a derivation N ⇒* ε
     • How to remove ε-productions (except from S):
         Find all nullable variables N1, ..., Nk
         For i = 1 to k
           For every production of the form A → αNiβ,
             add another production A → αβ
           If Ni → ε is a production, remove it
         If S is nullable, add the special production S → ε
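A minimal Python sketch of ε-production removal, assuming a hypothetical encoding (each variable a single uppercase letter, bodies as strings, "" for ε) and assuming the nullable variables are already known and passed in. It does not follow the slide's one-variable-at-a-time loop literally; it uses the common formulation with the same goal: for every production, add each version obtained by dropping some subset of the nullable occurrences, then discard the ε-bodies and re-add S → ε if S is nullable. On the grammar of slides 11 and 13 it yields the same productions as processing B, C, D in turn.

```python
from itertools import product

# Hypothetical encoding: one uppercase letter per variable, any other character
# is a terminal, and "" stands for an epsilon body.
Grammar = dict[str, list[str]]


def eliminate_epsilon(g: Grammar, nullable: set[str], start: str = "S") -> Grammar:
    result: Grammar = {}
    for head, bodies in g.items():
        new_bodies: list[str] = []
        for body in bodies:
            # Each nullable occurrence may be kept or dropped; other symbols are always kept.
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                # Skip epsilon bodies and duplicates; S -> epsilon is re-added below.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    if start in nullable:
        result[start].append("")  # the special production S -> epsilon
    return result
```

For the slide 11 grammar with nullable variables {B, C, D}, the result has S → ACD | AC | AD | A, C → ED | E and D → BC | B | C | b, while B is left with no productions at all.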

  11. Example
     • Find all nullable variables N1, ..., Nk
         grammar: S → ACD
                  A → a
                  B → ε
                  C → ED | ε
                  D → BC | b
                  E → b
         nullable variables: B, C, D

  12. Finding nullable variables
     • To find nullable variables, we work backwards
     • First, mark all variables A such that A → ε as nullable
     • Then, as long as there are productions of the form
         A → A1…Ak
       where all of A1, …, Ak are marked as nullable, mark A as nullable

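  13. Eliminating ε-productions
     • Grammar: S → ACD, A → a, B → ε, C → ED | ε, D → BC | b, E → b
     • Nullable variables: B, C, D (processed in the order B, C, D)
         For i = 1 to k
           For every production of the form A → αNiβ,
             add another production A → αβ
           If Ni → ε is a production, remove it
     • Productions added:
         processing B: D → C
         processing C: S → AD, D → B, D → ε
         processing D: S → AC, S → A, C → E
       D → ε is created while processing C and removed again when D is processed; B → ε and C → ε are removed in their own steps.
     • Result: S → ACD | AC | AD | A, A → a, C → ED | E, D → BC | B | C | b, E → b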

  14. Recap
     • After eliminating ε-productions and unit productions, we know that every derivation
         S ⇒* a1…ak, where a1, …, ak are terminals,
       doesn't shrink in length and doesn't go into cycles
     • Exception: S → ε
       We will not use this rule at all, except to check if ε ∈ L
     • Note: ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (such as D → C and D → B in the previous example)

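  15. Example: testing membership
         S → 0S1 | 1S0S1 | T
         T → S | ε
     • After eliminating unit and ε-productions:
         S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
         x = 00111
     [Figure: derivation tree from S, pruned by length:
         S ⇒ 01, 101
         S ⇒ 0S1 ⇒ 0011, 01011, or 00S11, ... (only strings of length ≥ 6)
         S ⇒ 10S1 ⇒ 10011, or strings of length ≥ 6
         S ⇒ 1S01 ⇒ 10101, or strings of length ≥ 6
         S ⇒ 1S0S1 ⇒ only strings of length ≥ 6]
     • None of the strings of length at most 5 equals 00111, so x is not in the language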

  16. Algorithm 1 for testing membership
     • We can now use the following algorithm to check if a string x is in the language of G:
         Eliminate all ε-productions and unit productions
         If x = ε and S → ε is a production, accept; otherwise delete S → ε
         Let X := S
         While some new production P can be applied to X
           Apply P to X
           If X = x, accept
           If |X| > |x|, backtrack
         If no more productions can be applied to X, reject

  17. Practical limitations of Algorithm 1
     • The previous algorithm can be very slow if x is long
     • There is a faster algorithm, but it requires that we do some more transformations on the grammar
         G = CFG of the Java programming language
         x = code for a 200-line Java program
         The algorithm might take about 10^200 steps!

  18. Chomsky Normal Form
     • A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
         A → a   or   A → BC
     • Conversion to Chomsky Normal Form is easy (once ε-productions and unit productions have been eliminated):
         A → BcDE
         replace terminals with new variables:  A → BCDE, C → c
         break up sequences with new variables: A → BX1, X1 → CX2, X2 → DE, C → c

  19. Exercise
     • Convert this CFG into Chomsky Normal Form:
         S → ε | ADDA
         A → a
         C → c
         D → bCb

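  20. Algorithm 2 for testing membership
     • Idea: we generate each substring of x bottom up
         S → AB | BC
         A → BA | a
         B → CC | b
         C → AB | a
         x = baaba
     [Table: the entry in row "length ℓ", position p lists the variables that derive the substring of length ℓ starting at position p (SAC means {S, A, C}; ∅ means none)]
         length 5:  SAC
         length 4:  ∅     SAC
         length 3:  ∅     B     B
         length 2:  SA    B     SC    SA
         length 1:  B     AC    AC    B     AC
                    b     a     a     b     a
     • S appears in the length-5 entry, so baaba is in the language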

  21. Parse tree reconstruction
         S → AB | BC
         A → BA | a
         B → CC | b
         C → AB | a
         x = baaba
     [Figure: the table of variables for the substrings of baaba from the previous slide, with the choices traced back from S in the top entry]
     • Tracing back the derivations, we obtain the parse tree
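A sketch of the trace-back idea in Python, assuming a hypothetical CNF encoding in which each body is either one terminal ("a") or two variables ("BA"); the function name `cyk_parse` and the tuple-based tree format are ours. It fills the same table as the previous slide, but for each variable in each entry it also records one production and split point that put it there; following those records down from S in the top entry rebuilds a parse tree. Only the first choice found is kept, so for an ambiguous grammar this returns one parse tree out of possibly several.

```python
from typing import Optional

# Hypothetical CNF encoding: a body is either one terminal ("a") or two variables ("BA").
Grammar = dict[str, list[str]]
# A leaf is (variable, terminal); an inner node is (variable, left subtree, right subtree).
Tree = tuple


def cyk_parse(g: Grammar, x: str, start: str = "S") -> Optional[Tree]:
    n = len(x)
    if n == 0:
        return None  # the empty string is handled by the special rule S -> epsilon
    # back[(i, j)][A] records one way A derives x[i..j] (0-based, inclusive):
    # (terminal,) for a single character, or (k, B, C) for A -> BC split after position k.
    back: dict[tuple[int, int], dict[str, tuple]] = {}
    for i, ch in enumerate(x):
        back[(i, i)] = {a: (ch,) for a, bodies in g.items() if ch in bodies}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell: dict[str, tuple] = {}
            for k in range(i, j):  # left part x[i..k], right part x[k+1..j]
                for a, bodies in g.items():
                    for body in bodies:
                        if (a not in cell and len(body) == 2
                                and body[0] in back[(i, k)]
                                and body[1] in back[(k + 1, j)]):
                            cell[a] = (k, body[0], body[1])
            back[(i, j)] = cell
    if start not in back[(0, n - 1)]:
        return None  # x is not in the language

    def build(a: str, i: int, j: int) -> Tree:
        # Trace back the recorded choice for variable a on x[i..j].
        entry = back[(i, j)][a]
        if len(entry) == 1:
            return (a, entry[0])
        k, b, c = entry
        return (a, build(b, i, k), build(c, k + 1, j))

    return build(start, 0, n - 1)
```

With the slide's grammar, `cyk_parse(g, "baaba")` returns a tree rooted at S whose leaves spell baaba, and `cyk_parse(g, "baab")` returns None because no variable derives baab.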

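  22. Cocke-Younger-Kasami algorithm
     Input: grammar G in CNF, string x = x1…xk
     • For i = 1 to k
         If there is a production A → xi, put A in table cell ii
     • For b = 2 to k
         For s = 1 to k - b + 1
           Set t = s + b - 1
           For j = s to t - 1
             If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st
     • Cell ij holds every variable A with A ⇒* xi…xj, so x is in the language exactly when S is in cell 1k
     [Figure: triangular table over x1 x2 … xk; the bottom row holds cells 11, 22, …, kk, the next row 12, 23, …, and the top cell is 1k. Row b holds the substrings of length b, and cell st is filled from cells sj and (j+1)t.]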
