Parsing


  1. Parsing Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn

  2. Derivations • A string is valid in a language if and only if there exists a derivation from the start symbol that produces it • Begin with the start symbol, and apply grammar rules until you produce the string • Note that the final string (sentence) consists of only terminals

  3. Question • Given a formal grammar G and a sentence (program) p, is p derivable from grammar G? • Or equivalently, is a given program p valid according to some language’s syntax (say, C)?

  7. Example: Context-Free Grammar
      S ::= x A | y B
      A ::= u C | v C
      B ::= t
      C ::= w | z
      // derivable? xum xuwz xwu xuz

  8. Lexical Analyzer • The lexical analyzer translates the source program into a stream of lexical tokens • Source program: • a stream of (ASCII or Unicode) characters • Lexical token: • a compiler data structure that represents an occurrence of a terminal symbol • A valid sentence consists only of allowable terminals

  10. Example: Context-Free Grammar
      S ::= x A | y B
      A ::= u C | v C
      B ::= t
      C ::= w | z
      // all terminals T = {x, y, u, v, t, w, z}
      // allowable strings T*

  11. Predictive Parsing • Parsing: recognizing a string and doing something useful with it • The most naïve way to implement a parser is recursive descent • A form of top-down parsing • Not as powerful as other methods, but easy enough to implement by hand

  12. Predictive Parsing
      S ::= x A | y B
      A ::= u C | v C
      B ::= t
      C ::= w | z
      // valid? xum xuwz xwu xuz

  13. A Predictive Parser in C (Sketch)
      tokenTy token;

      void parseS () {
        switch (token.kind) {
        case x:
          token = nextToken ();
          parseA ();
          break;
        case y:
          token = nextToken ();
          parseB ();
          break;
        default:
          error (…);
        }
      }
      // other functions are similar

  14. Output: Abstract Syntax Tree for xuz
      S
        x
        A
          u
          C
            z

  15. A Predictive Parser Emitting AST in C (Sketch)
      tokenTy token;

      S parseS () {
        A a;
        B b;
        switch (token.kind) {
        case x:
          token = nextToken ();
          a = parseA ();
          return newS1 (x, a);
        case y:
          token = nextToken ();
          b = parseB ();
          return newS2 (y, b);
        default:
          error (…);
        }
      }
      // other functions are similar

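  16. Predictive Parsing Difficulties
      S ::= x A | x B
      A ::= u C | v C
      B ::= t
      C ::= w | z
      // derivable? xuz
      // Both alternatives of S begin with x, so the lookahead token x does not tell parseS which rule to use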

  17. Or Even Worse: derive 15*(3+4) from
      1  E ::= id
      2     | num
      3     | E + E
      4     | E * E
      5     | ( E )

      E => E * E           by 4
        => E * ( E )       by 5
        => E * ( E + E )   by 3
        => E * ( E + 4 )   by 2
        => E * ( 3 + 4 )   by 2
        => 15 * ( 3 + 4 )  by 2

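  18. Or Even Worse: two derivations of 15*(3+4)
      rightmost derivation: E => E * E => E * ( E ) => E * ( E + E ) => E * ( E + 4 ) => E * ( 3 + 4 ) => 15 * ( 3 + 4 )
      leftmost derivation:  E => E * E => 15 * E => 15 * ( E ) => 15 * ( E + E ) => 15 * ( 3 + E ) => 15 * ( 3 + 4 )
      // a rightmost derivation always rewrites the rightmost nonterminal; a leftmost derivation always rewrites the leftmost one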

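  19. Ambiguous Grammars • A grammar is ambiguous if there is a sentence with more than one parse tree • Example: 15 * 3 + 4 has two parse trees under E ::= id | num | E + E | E * E | ( E ) • Tree 1: root E ::= E + E, whose left operand is E ::= E * E, i.e. (15 * 3) + 4 • Tree 2: root E ::= E * E, whose right operand is E ::= E + E, i.e. 15 * (3 + 4)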

  20. Eliminating ambiguity • In programming language syntax, ambiguity often arises from missing operator precedence or associativity • * higher precedence than +? • * and + are left associative? • Can sometimes rewrite the grammar to disambiguate this • Beyond the scope of this course

  21. Unambiguous Grammar
      E ::= E + T | T
      T ::= T * F | F
      F ::= id | num | ( E )
      // Accepts the same language as E ::= id | num | E + E | E * E | ( E ), but parses unambiguously

  22. Limitations with Predictive Parsing • The grammar often has to be rewritten: • to resolve ambiguity • The resulting grammars/trees are ugly • But… the parser is easy to write by hand, and very good for error reporting

  23. Doing better • We can do better • We can use a parsing algorithm that handles a much larger class of context-free languages • (though not every context-free grammar for them) • Remember: a context-free language might have many different context-free grammars

  24. The Yacc Tool • (Diagram: specification → Yacc → parser; semantic analyzer) • Originally developed for C; now almost every mainstream language has its own Yacc-style tool: bison (C), ml-yacc (SML), CUP (Java), GPPG (C#), …

  25. Whole Structure • (Diagram: source code → lexical analyzer → tokens → parser → abstract syntax tree → other parts → Pentium)

More Related