1 / 29

CSC 3130: Automata theory and formal languages

Fall 2009. The Chinese University of Hong Kong. CSC 3130: Automata theory and formal languages. Parsers for programming languages. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. CFG of the java programming language. Identifier: IDENTIFIER QualifiedIdentifier:

cybill
Download Presentation

CSC 3130: Automata theory and formal languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fall 2009 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Parsers for programming languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

  2. CFG of the java programming language Identifier: IDENTIFIER QualifiedIdentifier: Identifier { . Identifier } Literal: IntegerLiteral FloatingPointLiteral CharacterLiteral StringLiteral BooleanLiteral NullLiteral Expression: Expression1 [AssignmentOperator Expression1]] AssignmentOperator: = += -= *= /= &= |= … from http://java.sun.com/docs/books/jls /second_edition/html/syntax.doc.html#52996

  3. Parsing java programs class Point2d { /* The X and Y coordinates of the point--instance variables */ private double x; private double y; private boolean debug; // A trick to help with debugging public Point2d (double px, double py) { // Constructor x = px; y = py; debug = false; // turn off debugging } public Point2d () { // Default constructor this (0.0, 0.0); // Invokes 2 parameter Point2D constructor } // Note that a this() invocation must be the BEGINNING of // statement body of constructor public Point2d (Point2d pt) { // Another consructor x = pt.getX(); y = pt.getY(); } } … Simple java program: about 1000 symbols

  4. Parsing algorithms • How long would it take to parse this? • Can we parse faster? • No! CYK is the fastest known general-purposeparsing algorithm exhaustive algorithm about 1080 years (longer than life of universe) CYK algorithm about 1 week!

  5. Another way of thinking Scientist: Find an algorithm thatcan parse strings inany grammar Engineer: Design your grammar so it has a very fastparsing algorithm

  6. An example Stack Input S  Tc(1) T  TA(2) | A(3) A  aTb(4) | ab(5) Action  a ab A T Ta Taa Taab TaA TaT TaTb TA T Tc S abaabbc baabbc aabbc aabbc aabbc abbc bbc bc bc bc c c c   shift shift reduce (5) reduce (3) shift shift shift reduce (5) reduce (3) shift reduce (4) reduce (2) shift reduce (1) input: abaabbc S T A T T A A a a b b c a b

  7. Items A  aTb(4) S  Tc(1) T  TA(2) T  A(3) A  ab(5) A  •aTb A  a•Tb A  aT•b A  aTb• A  •ab A  a•b A  ab• T  •A T  A• S  •Tc S T•c S  Tc• T  •TA T  T•A T  TA• Stack Input Action  a ab A T Ta abaabbc baabbc aabbc aabbc aabbc abbc shift shift reduce (5) reduce (3) shift shift • • • • • • Idea of parsing algorithm: Try to match complete items to top of stack

  8. Some terminology Stack Input S  Tc(1) T  TA(2) | A(3) A  aTb(4) | ab(5) Action  a ab A T Ta Taa Taab TaA TaT TaTb TA T Tc S abaabbc baabbc aabbc aabbc aabbc abbc bbc bc bc bc c c c   shift shift reduce (5) reduce (3) shift shift shift reduce (5) reduce (3) shift reduce (4) reduce (2) shift reduce (1) input: abaabbc handle valid items: a•Tb, a•b valid items: T•a, T•c, aT•b

  9. Outline of LR(0) parsing algorithm • As the string is being read, it is pushed on a stack • Algorithm keeps track of all valid items • Algorithm can perform two actions: no complete itemis valid there is one valid item,and it is complete reduce shift

  10. Running the algorithm Valid Items Input A Stack  a aa aab aA aAb A aabb abb bb b b   A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  ab• A  aA•b A  aAb• S S S R S R A  aAb | ab A  aAb  aabb

  11. How to update valid items • Initial set of valid items • Updating valid items on “shift b” • After these updates, for every valid item A  a•Cb andproductionC  •d, we also addas a valid item S  •a for every production S  a is updated to A  ab•b A  a•bb disappears if X ≠ b A  a•Xb a, b: terminals A, B: variables X, Y: mixed symbols a, b: mixed strings notation C  •d

  12. How to update valid items • Updating valid items on “reduce b to B” • First, we backtrack to valid items before reduce • Then, we apply same rules as for “shift B” (as if B were a terminal) is updated to A  aB•b A  a•Bb disappears if X ≠ B A  a•Xb C  •d is added for every valid item A  a•Cband productionC  •d

  13. Viable item updates by NFA • States of NFA will be items (plus a start state q0) • For every item S  •a we have a transition • For every item A  •X we have a transition • For every item A  a•Cb and production C  •d e q0 S  •a X A  •X A  X• e A  •C C  •d

  14. Example A  aAb | ab a A A  aA•b A  a•Ab A  •aAb   b q0  A  aAb•  A  ab• A  •ab A  a•b a b

  15. Convert NFA to DFA a 2 A  a•Ab A  a•b A  •aAb A  •ab 1 4 A a A  •aAb A •ab A  aA•b b b 5 3 A  ab• A  aAb• die states correspond to sets of valid items transitions are labeled by variables / terminals

  16. Shift states and reduce states a 2 A  a•Ab A  a•b A  •aAb A  •ab 1 4 A a A  •aAb A •ab A  aA•b b b 5 3 A  ab• A  aAb• are shift states 1 2 4 are reduce states 3 5

  17. Attempt at parsing with DFA DFA state Input A Stack  a aa aab aA aabb abb bb b b A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  ab• A  aA•b 1 2 2 3 ? S S S R A  aAb | ab A  aAb  aabb

  18. Remember the state in stack! DFA state Input A Stack 1 1a2 1a2a2 1a2a2b3 1a2A4 1a2A4b5 1A aabb abb bb b b   A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  ab• A  aA•b A  aAb• 1 2 2 3 4 5 S S S R S R A  aAb | ab A  aAb  aabb

  19. Reconstructing the parse tree DFA state Input A Stack 1 12 122 1223 124 1245 1 aabb abb bb b b   A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  a•Ab A  a•b A  •aAb A  •ab A  ab• A  aA•b A  aAb• 1 2 2 3 4 5 S S S R S R A A a a b b • • • • • A  aAb | ab A  aAb  aabb

  20. LR(0) grammars and deterministic PDAs • The parsing procedure can be implemented by adeterministic pushdown automaton • A PDA is deterministic if in every state there is atmost one possible transition • for every input symbol and pop symbol, including e • Example: PDA for w#wR is deterministic, but PDA forwwR is not

  21. LR(0) grammars and deterministic PDAs • Not every PDA can be made deterministic • Since PDAs are equivalent to CFGs, LR(0) parsing algorithm must fail for some CFG, e.g. • Why does LR(0) parsing algorithm fail? L = {wwR : w ∈ {a, b}*}

  22. Example 1 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e a A a e A  •aAa A  a•Aa A  aA•a A  aAa• e e e e q0 A  • e e e b A b e A  •bAb A  b•Ab A  bA•b A  bAb•

  23. Example 1 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e a, b a A  aAa• A  a•Aa A  b•Ab A  •aAa A  aA•a a, b A A  •bAb A  •aAa A  bA•b A  • A  •bAb A  • b A  bAb• shift-reduce conflict

  24. Example 1 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e a, b a A  aAa• A  a•Aa A  b•Ab A  •aAa A  aA•a a, b A A  •bAb A  •aAa A  bA•b A  • A  •bAb A  • b A  bAb• shift or reduce? shift or reduce? input: abba

  25. When you can’t LR(0) parse • Algorithm can perform two actions: • What if: no complete itemis valid there is one valid item,and it is complete reduce (R) shift (S) some valid itemscomplete, some not more than one validcomplete item R / R conflict S / R conflict

  26. Example 2 L = {w#wR : w ∈ {a, b}*} A  aAa | bAb | # a A a e A  •aAa A  a•Aa A  aA•a A  aAa• e e e e # q0 A  •# A  #• e e e b A b e A  •bAb A  b•Ab A  bA•b A  bAb•

  27. Example 2 L = {wwR : w ∈ {a, b}*} A  aAa | bAb | e a, b 4 2 a A  aAa• 1 A  a•Aa 3 A  b•Ab A  •aAa A  aA•a a, b A A  •bAb A  •aAa A  bA•b A  •# A  •bAb 5 A  •# b A  bAb• # # 6 A  #• No S/R or R/R conflicts!

  28. Example 2: parsing State A Stack 1 12 122 1226 1223 12234 123 1236 1 1 2 2 6 3 4 3 5 S S S R S R S R A A A # b a a b • • • • • •

  29. Hierarchy of context-free grammars context-free grammars parse using CYK algorithm (slow) LR(∞) grammars … to be continued… java perl python … LR(1) grammars LR(0) grammars parse using LR(0) algorithm

More Related