1 / 27

CSCI 3130: Automata theory and formal languages

Fall 2011. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Ambiguity Parsing algorithms. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Ambiguity. E  E + E | E * E | ( E ) | N N  1 N | 2 N | 1 | 2. . E. 1+2*2. E. E. E.

chasea
Download Presentation

CSCI 3130: Automata theory and formal languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2011 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages AmbiguityParsing algorithms Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. Ambiguity E  E + E | E * E | (E)| N N  1N | 2N | 1 | 2  E 1+2*2 E E E * E + E + E E N = 5 = 6 N E * E N N 2 N N 1 A CFG is ambiguous if some string has more than one parse tree 1 2 2 2

  3. Example Is S  SS | x ambiguous? Yes, because S S S S S S S S S S xxx x x x x x x

  4. Disambiguation S  SS | x S  Sx | x S S S x x x Sometimes we can rewrite the grammar to remove ambiguity

  5. Disambiguation E  E + E | E * E | (E)| N N  1N | 2N | 1 | 2 same precedence! F F T T Divide expression into terms and factors F F 2 * (1 + 2 * 2)

  6. Disambiguation E  E + E | E * E | (E)| N N  1N | 2N | 1 | 2 An expression is a sum of one or more terms E  T | E + T Each term is a product of one or more factors T  F | T * F Each factor is a parenthesized expression or a number F  (E) |1 | 2

  7. Parsing example E E  T | E + T T  F | T * F F  (E) |1 | 2 E T + T F F T * E ( ) F E + T E T T F + * T F F F 2 * (1 + 1 + 2 * 2) + 1

  8. Disambiguation • Disambiguation is not always possible because • There exist inherently ambiguous languages • There is no general procedure for disambiguation • In programming languages, ambiguity comes from precedence rules, and we can do like in example • In English, ambiguity is sometimes a problem: He ate the cookies on the floor

  9. Ambiguity in English He ate the cookies on the floor

  10. Parsing S → 0S1 | 1S0S | T T → S | e input: 0011 How would we program the computer to build a parse tree for us?

  11. Parsing S → 0S1 | 1S0S | T T → S | e input: 0011 ... S 0S1 00S11 000S111 ⇒ ⇒ ⇒ ⇒ 01S0S1 ⇒ ... ⇒ 0T1 00T11 ⇒ 00S11 ✔ ⇒ 0011 1S0S 10S10S ⇒ ⇒ ... ... T S ⇒ ⇒ ⇒  First idea: Try all derivations

  12. Problems Trying all derivations may take a very long time  If input is not in the language, parsing will never stop 

  13. When to stop Idea 2: Stop when S → 0S1 | 1S0S | T T → S | e |derived string| > |input| Problems: S  T  S  T  … S  0S1 0T1  01 Derived strings may shrink because of “e-productions” 1 3 3 Derivation may loop because of “unit productions” 2 Task: remove e and unit productions

  14. Removal of -productions • A variable N is nullable if it derives the empty string * N • Identify all nullable variables N  • Remove nullable variables carefully  • If start variable S is nullable: Add a new start variable S’Add special productionsS’ → S |  

  15. Example grammar nullable variables B C D S  ACD A a B   C  ED |  D  BC | b E  b • Repeat the following: • If X  , mark X as nullable • If X  YZ…W, all marked nullable, • mark X as nullable also. • Identify all nullable variables 

  16. Eliminating e-productions D  C S  AD D  B D  e S  AC S  A C  E S  ACD A a B   C  ED |  D  BC | b E  b • For every nullable N: • If you see X → N, add X →  nullable:B, C, D • If you see N → , remove it. • Remove nullable variables carefully 

  17. Eliminating unit productions • A unit production is a production of the form A → B grammar: unit productions graph: S → 0S1 | 1S0S | T T → S | R |  R → 0SR S T R

  18. Removal of unit productions • If there is a cycle of unit productionsdelete it and replace everything with A  A → B→ ... → C→ A S T  S → 0S1 | 1S0S | T T → S | R |  R → 0SR S → 0S1 | 1S0S S → R |  R → 0SR  R replace T by S

  19. Removal of unit productions • Replace every chainby A → , B→ ,... , C →   A → B→ ... → C→  S S → 0S1 | 1S0S | R |  R → 0SR S → 0S1 | 1S0S | 0SR |  R → 0SR R S → R → 0SR is replaced by S → 0SR, R → 0SR

  20. Recap If input is not in the language, parsing will never stop Problem: Solution: Eliminate  productions Eliminate unit productions important to do in this order Try all possible derivations but stop parsing when |derived string| > |input|

  21. Example S → 0S1 | 0S0S | T T → S | 0 S → 0S1 | 1S0S | 0 input:0011 conclusion: 0011∉ L S 0 ⇒ ✘ 001 0S1 ✘ ⇒ ⇒ too long 00S11 too long 00S0S1 0S0S 000S 0000 ⇒ ✘ ⇒ ⇒ too long too long 1000S1 00S10S too long too long 1000S0S 00S0S0S

  22. Problems Trying all derivations may take a very long time  If input is not in the language, parsing will never stop 

  23. Preparations • To use it we must prepare the CFG: A faster way to parse:the Cocke-Younger-Kasami algorithm Eliminate  productions Eliminate unit productions Convert CFG to Chomsky Normal Form

  24. Chomsky Normal Form • A CFG is in Chomsky Normal Formif every production* has the form A → a A → BC or Noam Chomsky Convert to Chomsky Normal Form: A → BcDE A → BX X → CY Y → DE A → BCDE C → c break up sequences with new variables replace terminals with new variables C → c * Exception: We allow S→ e for start variable only

  25. Cocke-Younger-Kasami algorithm SAC S  AB | BC A  BA | a B  CC | b C  AB | a – SAC – B B SA B SC SA B AC AC B AC x = baaba b a a b a Idea: We generate each substring of x bottom up

  26. SAC – SAC – B B SA B SC SA B AC AC B AC b a a b a Parse tree reconstruction S  AB | BC A  BA | a B  CC | b C  AB | a x = baaba Tracing back the derivations, we obtain the parse tree

  27. Cocke-Younger-Kasami algorithm Grammar without e and unit productions in Chomsky Normal Form table cells Input stringx = x1…xk 1k … … • For all cells in last rowIf there is a production A  xiPutA in table cell ii • For cells st in other rows If there is a production A  BC whereB is in cell sj and C is in cell (j+1)tPutA in cell st 23 12 22 kk 11 x1 x2 … xk s j t k 1 Cell ij remembers all possible derivations of substring xi…xj

More Related