Parsing Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn
Derivations
• A string is valid in a language if and only if there exists a derivation from the start symbol that produces it
• Begin with the start symbol, and apply grammar rules until you produce the string
• Note that the final string (sentence) consists of only terminals
Question
• Given a formal grammar G and a sentence (program) p, is p derivable from grammar G?
• Or equivalently, is a given program p valid according to some language’s syntax (say C)?
Example: Context-Free Grammar
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
// derivable?
  xum
  xuwz
  xwu
  xuz
Lexical Analyzer
• The lexical analyzer translates the source program into a stream of lexical tokens
• Source program:
  • a stream of (ASCII or Unicode) characters
• Lexical token:
  • a compiler data structure that represents the occurrence of a terminal symbol
• A valid sentence consists only of allowable terminals
Example: Context-Free Grammar
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
// all terminals
  T = {x, y, u, v, t, w, z}
// allowable strings
  T*
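The check that a string contains only allowable terminals (that it lies in T*) can be sketched as a tiny lexical analyzer. This is a minimal sketch, assuming every terminal is a single ASCII character; the names `TokenKind`, `Token`, `next_token`, and `only_terminals` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds for the terminals T = {x, y, u, v, t, w, z}.
   TK_EOF and TK_ERROR are bookkeeping kinds assumed by this sketch. */
typedef enum {
    TK_X, TK_Y, TK_U, TK_V, TK_T, TK_W, TK_Z, TK_EOF, TK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    size_t pos;            /* offset of the token in the source string */
} Token;

/* Scan one token from src starting at *pos and advance *pos past it.
   At the end of the string, return TK_EOF without advancing. */
Token next_token(const char *src, size_t *pos) {
    assert(src != NULL && pos != NULL);
    Token tok = { TK_ERROR, *pos };
    char c = src[*pos];
    if (c == '\0') {
        tok.kind = TK_EOF;
        return tok;
    }
    switch (c) {
    case 'x': tok.kind = TK_X; break;
    case 'y': tok.kind = TK_Y; break;
    case 'u': tok.kind = TK_U; break;
    case 'v': tok.kind = TK_V; break;
    case 't': tok.kind = TK_T; break;
    case 'w': tok.kind = TK_W; break;
    case 'z': tok.kind = TK_Z; break;
    default:  tok.kind = TK_ERROR; break;   /* not a terminal of the grammar */
    }
    (*pos)++;
    return tok;
}

/* Return 1 if src consists only of allowable terminals (src is in T*). */
int only_terminals(const char *src) {
    size_t pos = 0;
    for (;;) {
        Token tok = next_token(src, &pos);
        if (tok.kind == TK_EOF)   return 1;
        if (tok.kind == TK_ERROR) return 0;
    }
}
```

With this sketch, `only_terminals("xum")` rejects the string because `m` is not in T, while `xuwz`, `xwu`, and `xuz` all pass the lexical check and are left for the parser to judge.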
Predictive Parsing
• Parsing: recognizing a string and doing something useful with it
• The simplest approach to implementing a parser is recursive descent
• A form of top-down parsing
• Not as powerful as other methods, but easy enough to implement by hand
Predictive Parsing
  S ::= x A | y B
  A ::= u C | v C
  B ::= t
  C ::= w | z
// Valid?
  xum
  xuwz
  xwu
  xuz
A Predictive Parser in C (Sketch)
  tokenTy token;

  void parseS () {
    switch (token.kind) {
    case x:
      token = nextToken ();
      parseA ();
      break;
    case y:
      token = nextToken ();
      parseB ();
      break;
    default:
      error (…);
    }
  }
  // other functions are similar
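The sketch above can be filled out into a complete recognizer for the example grammar. The version below is one possible completion, not the author's code: it assumes the input is a C string with one character per token, and the helpers `peek`, `advance`, `error`, and the entry point `derivable` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Parser state: the input string, the index of the lookahead token,
   and a flag that is cleared on the first syntax error. */
static const char *input;
static size_t pos;
static int ok;

static char peek(void)    { return input[pos]; }
static void advance(void) { if (input[pos] != '\0') pos++; }
static void error(void)   { ok = 0; }

/* C ::= w | z */
static void parseC(void) {
    switch (peek()) {
    case 'w': case 'z': advance(); break;
    default: error();
    }
}

/* A ::= u C | v C */
static void parseA(void) {
    switch (peek()) {
    case 'u': case 'v': advance(); parseC(); break;
    default: error();
    }
}

/* B ::= t */
static void parseB(void) {
    switch (peek()) {
    case 't': advance(); break;
    default: error();
    }
}

/* S ::= x A | y B */
static void parseS(void) {
    switch (peek()) {
    case 'x': advance(); parseA(); break;
    case 'y': advance(); parseB(); break;
    default: error();
    }
}

/* Return 1 if s is derivable from S, 0 otherwise.
   The whole input must be consumed, so trailing tokens are an error. */
int derivable(const char *s) {
    assert(s != NULL);
    input = s;
    pos = 0;
    ok = 1;
    parseS();
    return ok && peek() == '\0';
}
```

Each nonterminal becomes one function, and the lookahead token alone selects the alternative. Under these assumptions, `derivable` accepts `xuz` and rejects `xum`, `xuwz`, and `xwu`.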
Output: Abstract Syntax Tree
Input: xuz
  S
    x
    A
      u
      C
        z
A Predictive Parser Emitting AST in C (Sketch)
  tokenTy token;

  // A and B are assumed to be the AST node types returned by parseA and parseB
  S parseS () {
    switch (token.kind) {
    case x: {
      token = nextToken ();
      A a = parseA ();
      return newS1 (x, a);   // node for S ::= x A
    }
    case y: {
      token = nextToken ();
      B b = parseB ();
      return newS2 (y, b);   // node for S ::= y B
    }
    default:
      error (…);
    }
  }
  // other functions are similar
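A runnable variant of the AST-emitting parser might look as follows. It is a sketch under several assumptions that are not in the slides: a single generic `Node` type stands in for the separate `S`, `A`, `B`, `C` node types, tokens are single characters, a syntax error is reported by returning `NULL`, and `render` is a hypothetical helper that prints the tree in a parenthesized form.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char label;             /* 'S', 'A', 'B', 'C', or the terminal itself */
    struct Node *kid[2];
    int nkids;
} Node;

static Node *new_node(char label, Node *k0, Node *k1) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->label = label;
    n->kid[0] = k0;
    n->kid[1] = k1;
    n->nkids = (k0 != NULL) + (k1 != NULL);
    return n;
}

void free_tree(Node *n) {
    if (n == NULL) return;
    free_tree(n->kid[0]);
    free_tree(n->kid[1]);
    free(n);
}

/* Consume the current token and wrap it in a leaf node. */
static Node *leaf(const char **p) {
    char c = **p;
    (*p)++;
    return new_node(c, NULL, NULL);
}

/* C ::= w | z */
static Node *parseC(const char **p) {
    if (**p == 'w' || **p == 'z') return new_node('C', leaf(p), NULL);
    return NULL;
}

/* A ::= u C | v C */
static Node *parseA(const char **p) {
    if (**p != 'u' && **p != 'v') return NULL;
    Node *t = leaf(p);
    Node *c = parseC(p);
    if (c == NULL) { free_tree(t); return NULL; }
    return new_node('A', t, c);
}

/* B ::= t */
static Node *parseB(const char **p) {
    if (**p == 't') return new_node('B', leaf(p), NULL);
    return NULL;
}

/* S ::= x A | y B */
static Node *parseS(const char **p) {
    if (**p != 'x' && **p != 'y') return NULL;
    int is_x = (**p == 'x');
    Node *t = leaf(p);
    Node *sub = is_x ? parseA(p) : parseB(p);
    if (sub == NULL) { free_tree(t); return NULL; }
    return new_node('S', t, sub);
}

/* Parse the whole string; return the tree, or NULL on a syntax error. */
Node *parse(const char *s) {
    assert(s != NULL);
    Node *t = parseS(&s);
    if (t != NULL && *s != '\0') { free_tree(t); return NULL; }
    return t;
}

/* Append the tree to buf (which must hold a valid string) in the
   form S(x,A(u,C(z))). Output is truncated if buf is too small. */
void render(const Node *n, char *buf, size_t cap) {
    size_t len = strlen(buf);
    if (len + 1 < cap) { buf[len] = n->label; buf[len + 1] = '\0'; }
    if (n->nkids == 0) return;
    strncat(buf, "(", cap - strlen(buf) - 1);
    for (int i = 0; i < n->nkids; i++) {
        if (i > 0) strncat(buf, ",", cap - strlen(buf) - 1);
        render(n->kid[i], buf, cap);
    }
    strncat(buf, ")", cap - strlen(buf) - 1);
}
```

For the input `xuz`, rendering the result of `parse` into an empty buffer gives `S(x,A(u,C(z)))`, which mirrors the tree on the previous slide. The caller owns the returned tree and releases it with `free_tree`.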
Predictive Parsing Difficulties
  S ::= x A | x B
  A ::= u C | v C
  B ::= t
  C ::= w | z
// derivable?
  xuz
// Both alternatives for S begin with x, so the first token alone cannot select a rule
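One common way around this particular conflict is left factoring, a technique the slides do not cover: rewrite S as `S ::= x Rest` with `Rest ::= A | B`, so that the shared `x` is consumed first and the next token (`u` or `v` for A, `t` for B) selects the alternative. The sketch below assumes single-character tokens; `Rest`, `parseRest`, and `derivable_factored` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns 1 on success and advances *p past what it parsed. */

/* C ::= w | z */
static int parseC(const char **p) {
    if (**p == 'w' || **p == 'z') { (*p)++; return 1; }
    return 0;
}

/* A ::= u C | v C */
static int parseA(const char **p) {
    if (**p == 'u' || **p == 'v') { (*p)++; return parseC(p); }
    return 0;
}

/* B ::= t */
static int parseB(const char **p) {
    if (**p == 't') { (*p)++; return 1; }
    return 0;
}

/* Rest ::= A | B
   A can only start with u or v, B only with t, so one token decides. */
static int parseRest(const char **p) {
    switch (**p) {
    case 'u': case 'v': return parseA(p);
    case 't':           return parseB(p);
    default:            return 0;
    }
}

/* S ::= x Rest   (left-factored form of S ::= x A | x B) */
static int parseS(const char **p) {
    if (**p != 'x') return 0;
    (*p)++;
    return parseRest(p);
}

/* Return 1 if s is derivable from S and fully consumed, else 0. */
int derivable_factored(const char *s) {
    assert(s != NULL);
    return parseS(&s) && *s == '\0';
}
```

The factored grammar accepts the same strings as the original, for example `xuz` and `xt`, but the parser never has to guess between two rules that start with the same token.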
Or Even Worse
Grammar:
  1  E ::= id
  2      | num
  3      | E + E
  4      | E * E
  5      | ( E )
Derivation of 15*(3+4):
  E
  => E * E          (by 4)
  => E * (E + E)    (by 5, then 3)
  => E * (E + 4)    (by 2)
  => E * (3 + 4)    (by 2)
  => 15 * (3 + 4)   (by 2)
Or Even Worse
15*(3+4)
Rightmost derivation:
  E
  E * E
  E * (E + E)
  E * (E + 4)
  E * (3 + 4)
  15 * (3 + 4)
Leftmost derivation:
  E
  E * E
  15 * E
  15 * (E + E)
  15 * (3 + E)
  15 * (3 + 4)
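Ambiguous grammars
• A grammar is ambiguous if there is a sentence with more than one parse tree
• Example: 15 * 3 + 4 has two parse trees
  • Tree 1: the root is E + E; its left E expands to E * E (15 and 3), its right E is 4
  • Tree 2: the root is E * E; its left E is 15, its right E expands to E + E (3 and 4)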
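Ambiguity matters because the two parse trees can mean different things. As an illustration (not part of the slides), the sketch below builds both trees for `15 * 3 + 4` by hand and evaluates them; the `Expr` type and the helpers `num`, `bin`, `eval`, and `free_expr` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Expr {
    char op;                    /* '+', '*', or 'n' for a number leaf */
    int value;                  /* used only when op == 'n' */
    struct Expr *left, *right;  /* used only for '+' and '*' */
} Expr;

Expr *num(int v) {
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->op = 'n';
    e->value = v;
    e->left = e->right = NULL;
    return e;
}

Expr *bin(char op, Expr *l, Expr *r) {
    assert(op == '+' || op == '*');
    Expr *e = malloc(sizeof *e);
    assert(e != NULL);
    e->op = op;
    e->value = 0;
    e->left = l;
    e->right = r;
    return e;
}

/* Evaluate a tree bottom-up; the tree shape fixes the order of operations. */
int eval(const Expr *e) {
    switch (e->op) {
    case '+': return eval(e->left) + eval(e->right);
    case '*': return eval(e->left) * eval(e->right);
    default:  return e->value;
    }
}

void free_expr(Expr *e) {
    if (e == NULL) return;
    free_expr(e->left);
    free_expr(e->right);
    free(e);
}
```

Building `bin('+', bin('*', num(15), num(3)), num(4))` gives the tree with `+` at the root and evaluates to 49, while `bin('*', num(15), bin('+', num(3), num(4)))` gives the tree with `*` at the root and evaluates to 105. The same sentence therefore has two different meanings under the ambiguous grammar.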
Eliminating ambiguity
• In programming language syntax, ambiguity often arises from missing operator precedence or associativity
  • Does * have higher precedence than +?
  • Are * and + left associative?
• Can sometimes rewrite the grammar to disambiguate this
  • Beyond the scope of this course
Unambiguous Grammar
Ambiguous:
  E ::= id | num | E + E | E * E | ( E )
Unambiguous:
  E ::= E + T | T
  T ::= T * F | F
  F ::= id | num | ( E )
Accepts the same language, but parses unambiguously
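The layered E/T/F grammar encodes that `*` binds tighter than `+` and that both are left associative. A hand-written parser cannot call `parseE` as the first step of `parseE`, so the sketch below replaces the left-recursive rules with loops (`E ::= T { + T }`, `T ::= F { * F }`), which is a standard substitution and not something the slides show. It evaluates while parsing, handles only `num` and parentheses (no `id`), ignores integer overflow, and uses the hypothetical entry point `eval_expr`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cur;   /* current position in the input */
static int failed;        /* set to 1 on the first syntax error */

static void skip_spaces(void) { while (*cur == ' ') cur++; }

static int parseE(void);

/* F ::= num | ( E ) */
static int parseF(void) {
    skip_spaces();
    if (isdigit((unsigned char)*cur)) {
        int v = 0;
        while (isdigit((unsigned char)*cur)) {
            v = v * 10 + (*cur - '0');
            cur++;
        }
        return v;
    }
    if (*cur == '(') {
        cur++;
        int v = parseE();
        skip_spaces();
        if (*cur == ')') cur++; else failed = 1;
        return v;
    }
    failed = 1;
    return 0;
}

/* T ::= T * F | F, written as F followed by zero or more "* F" */
static int parseT(void) {
    int v = parseF();
    for (;;) {
        skip_spaces();
        if (failed || *cur != '*') return v;
        cur++;
        v *= parseF();
    }
}

/* E ::= E + T | T, written as T followed by zero or more "+ T" */
static int parseE(void) {
    int v = parseT();
    for (;;) {
        skip_spaces();
        if (failed || *cur != '+') return v;
        cur++;
        v += parseT();
    }
}

/* Parse and evaluate s. On success store the value in *result and
   return 1; on a syntax error return 0 and leave *result unchanged. */
int eval_expr(const char *s, int *result) {
    assert(s != NULL && result != NULL);
    cur = s;
    failed = 0;
    int v = parseE();
    skip_spaces();
    if (failed || *cur != '\0') return 0;
    *result = v;
    return 1;
}
```

With this grammar there is only one way to read `15 * 3 + 4`: the product is grouped first, so the sketch yields 49, and parentheses are needed to get 105 from `15 * (3 + 4)`.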
Limitations of Predictive Parsing
• Rewriting the grammar:
  • is needed to resolve ambiguity
  • the resulting grammars and trees are ugly
• But… easy to write code by hand, and very good for error reporting
Doing better
• We can do better
• We can use a parsing algorithm that can handle all context-free languages
  • (though not all context-free grammars)
• Remember: a context-free language might have many different context-free grammars
The Yacc Tool
Diagram: semantic analyzer specification -> Yacc -> parser
• Originally developed for C, and now almost every mainstream language has its own Yacc-like tool: bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …
Whole Structure
Diagram: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium