Parsing
Administration • Groups • Forum • https://forums.cs.tau.ac.il/viewforum.php?f=76
Compiler pipeline: IC Program (.ic) → IC compiler [Lexical Analysis → Syntax Analysis (Parsing) → AST → Symbol Table etc. → Intermediate Representation (IR) → Code Generation] → x86 executable (.exe)
Parsing • Input: • Sequence of tokens • A context-free grammar • Actions • Output: • Abstract syntax tree • A decision whether the program satisfies the syntactic structure
Parsing • Context-free grammars (CFG) • Capture program structure (hierarchy) • Allow “efficient” parsers to be created automatically • Grammar: E → id, E → num, E → E + E, E → E * E, E → ( E )
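From text to abstract syntax • Program text: 5 + (7 * x) • Lexical analyzer → token stream: num(5) + ( num(7) * id(x) ) • Parser, using the grammar E → id, E → num, E → E + E, E → E * E, E → ( E ) → a parse tree if the syntax is valid, otherwise a syntax error • Parse tree: E[ E[num(5)] + E[ ( E[ E[num(7)] * E[id(x)] ] ) ] ] • Abstract syntax tree: +( num(5), *( num(7), id(x) ) )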
Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run.
Parsing terminology • Symbols: terminals (tokens) + * ( ) id num; non-terminals E • Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Convention: the non-terminal on the left-hand side of the first rule is the start (initial) non-terminal • Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3 • Parse tree: E[ E[1] + E[ E[2] * E[3] ] ]
Ambiguity • Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Definition: a grammar is ambiguous if some input string has two different parse trees (equivalently, two different leftmost derivations) • Example, 1 + 2 * 3: • Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3, with parse tree E[ E[1] + E[ E[2] * E[3] ] ] • Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3, with parse tree E[ E[ E[1] + E[2] ] * E[3] ]
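To make the definition concrete, here is a small Java sketch (our own, not from the course material) that counts how many parse trees the ambiguous grammar assigns to a string, using dynamic programming over substrings. It assumes single-character tokens: each digit is a num and each letter an id. A count above 1 means the string witnesses the ambiguity.

```java
import java.util.Arrays;

/** Counts parse trees of  E -> id | num | E + E | E * E | ( E )
 *  for a string of single-character tokens (digits = num, letters = id; an assumption). */
final class AmbiguityCounter {
    private final char[] toks;
    private final long[][] memo;   // memo[i][j] = number of trees for toks[i..j), -1 = not computed

    AmbiguityCounter(String input) {
        this.toks = input.replace(" ", "").toCharArray();
        int n = toks.length;
        this.memo = new long[n + 1][n + 1];
        for (long[] row : memo) Arrays.fill(row, -1);
    }

    long count() { return toks.length == 0 ? 0 : count(0, toks.length); }

    // Number of distinct parse trees deriving toks[i..j) from E.
    private long count(int i, int j) {
        if (memo[i][j] >= 0) return memo[i][j];
        long total = 0;
        if (j - i == 1 && Character.isLetterOrDigit(toks[i])) {
            total = 1;                                          // E -> id | num
        } else {
            if (j - i >= 3 && toks[i] == '(' && toks[j - 1] == ')') {
                total += count(i + 1, j - 1);                   // E -> ( E )
            }
            for (int k = i + 1; k < j - 1; k++) {               // E -> E op E, op at position k
                if (toks[k] == '+' || toks[k] == '*') {
                    total += count(i, k) * count(k + 1, j);
                }
            }
        }
        return memo[i][j] = total;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1+2*3", "1+(2*3)", "1+2+3+4"}) {
            System.out.println(s + " -> " + new AmbiguityCounter(s).count() + " parse tree(s)");
        }
    }
}
```

For 1+2*3 it finds 2 trees, the two listed on the slide; parenthesizing as 1+(2*3) leaves exactly 1, and 1+2+3+4 already has 5.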
Grammar rewriting • Ambiguous grammar: E → id, E → num, E → E + E, E → E * E, E → ( E ) • Unambiguous grammar: E → E + T, E → T, T → T * F, T → F, F → id, F → num, F → ( E ) • Derivation: E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3 • Parse tree: E[ E[ T[ F[1] ] ] + T[ T[ F[2] ] * F[3] ] ] • Note the difference between a language and a grammar: a grammar defines a language, and a language can be defined by many grammars.
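A recursive-descent sketch for the unambiguous grammar, in Java. Recursive descent cannot use the left-recursive rules E → E + T and T → T * F directly, so they are written as loops (E → T { + T }), a standard rewrite that keeps left associativity. The class name and the fully parenthesized output format are our own choices.

```java
/** Recursive-descent parser for  E -> E + T | T,  T -> T * F | F,  F -> id | num | ( E ).
 *  Left recursion is replaced by loops; the result is a fully parenthesized string. */
final class ExprGrammarParser {
    private final String in;
    private int pos = 0;

    private ExprGrammarParser(String input) { this.in = input.replace(" ", ""); }

    static String parse(String input) {
        ExprGrammarParser p = new ExprGrammarParser(input);
        String tree = p.parseE();
        if (p.pos != p.in.length()) throw p.error("unexpected '" + p.in.charAt(p.pos) + "'");
        return tree;
    }

    private String parseE() {                 // E -> T { + T }
        String left = parseT();
        while (peek() == '+') {
            pos++;
            left = "(" + left + " + " + parseT() + ")";
        }
        return left;
    }

    private String parseT() {                 // T -> F { * F }
        String left = parseF();
        while (peek() == '*') {
            pos++;
            left = "(" + left + " * " + parseF() + ")";
        }
        return left;
    }

    private String parseF() {                 // F -> id | num | ( E )
        if (peek() == '(') {
            pos++;
            String inner = parseE();
            if (peek() != ')') throw error("expected ')'");
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) pos++;
        if (start == pos) throw error("expected id, num or '('");
        return in.substring(start, pos);
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException("syntax error at " + pos + ": " + msg);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3"));   // (1 + (2 * 3))
        System.out.println(parse("1 + 2 + 3"));   // ((1 + 2) + 3)
    }
}
```

Because the rewritten grammar admits only one tree per string, 1 + 2 * 3 always comes out as (1 + (2 * 3)).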
Parsing methods – Top down • Starts with the start symbol • Tries to transform it into the input • Grammar: S → if E then S else S, S → begin S L, S → print E, L → end, L → ; S L, E → num • Example input: if 5 then print 8 else… • Token : rule applied, and the resulting sentential form (starting from S): • if : S → if E then S else S, giving if E then S else S • 5 : E → num, giving if 5 then S else S • print : S → print E, giving if 5 then print E else S • …
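The same grammar can be parsed top-down by recursive descent: one method per non-terminal, each choosing its rule from the next token alone. This Java sketch records a "token : rule" line for every decision, as on the slide. Assumptions: tokens are separated by spaces, num is any digit string, and the else branch in the demo input (print 9) is our own completion of the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Recursive-descent parser for
 *    S -> if E then S else S | begin S L | print E
 *    L -> end | ; S L
 *    E -> num
 *  Tokens must be separated by spaces (an assumption of this sketch). */
final class TopDownParser {
    private final String[] toks;
    private int pos = 0;
    private final List<String> trace = new ArrayList<>();

    private TopDownParser(String input) { this.toks = input.trim().split("\\s+"); }

    static List<String> parse(String input) {
        TopDownParser p = new TopDownParser(input);
        p.parseS();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("extra input at token " + p.pos);
        return p.trace;
    }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected '" + t + "' but got '" + peek() + "'");
        pos++;
    }

    private void parseS() {                    // the next token alone selects the S rule
        switch (peek()) {
            case "if":
                trace.add("if : S -> if E then S else S");
                expect("if"); parseE(); expect("then"); parseS(); expect("else"); parseS();
                break;
            case "begin":
                trace.add("begin : S -> begin S L");
                expect("begin"); parseS(); parseL();
                break;
            case "print":
                trace.add("print : S -> print E");
                expect("print"); parseE();
                break;
            default:
                throw new IllegalArgumentException("unexpected '" + peek() + "' at start of S");
        }
    }

    private void parseL() {
        switch (peek()) {
            case "end": trace.add("end : L -> end"); expect("end"); break;
            case ";":   trace.add("; : L -> ; S L"); expect(";"); parseS(); parseL(); break;
            default: throw new IllegalArgumentException("expected 'end' or ';' but got '" + peek() + "'");
        }
    }

    private void parseE() {                    // E -> num
        String t = peek();
        if (!t.matches("[0-9]+")) throw new IllegalArgumentException("expected num but got '" + t + "'");
        trace.add(t + " : E -> num");
        pos++;
    }

    public static void main(String[] args) {
        parse("if 5 then print 8 else print 9").forEach(System.out::println);
    }
}
```

The first three trace lines match the slide: if : S -> if E then S else S, then 5 : E -> num, then print : S -> print E.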
Parsing methods – Bottom up • Starts with the input • Attempts to rewrite it into the start symbol • Widely used in practice • LR(0), SLR(1), LR(1), LALR(1) • JavaCup implements LALR(1)
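Bottom-up parsing • Grammar: E → E + ( E ), E → i • Input 1 + (2) + (3), reduced step by step (a rightmost derivation in reverse): 1 + (2) + (3), E + (2) + (3), E + (E) + (3), E + (3), E + (E), E • Parse tree: E[ E[ E[1] + ( E[2] ) ] + ( E[3] ) ]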
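A minimal shift-reduce sketch in Java for this grammar. It shifts one token at a time and reduces as soon as a right-hand side appears on top of the stack. That greedy policy happens to be enough for E → E + ( E ) | i; a generated LALR(1) parser instead decides between shift and reduce from its tables. Integer literals play the role of i, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedy shift-reduce parser for  E -> E + ( E ) | i,  where i is an integer literal. */
final class ShiftReduceDemo {
    /** Splits the input into integer literals and the symbols + ( ). */
    static List<String> tokenize(String s) {
        List<String> toks = new ArrayList<>();
        for (int k = 0; k < s.length(); ) {
            char c = s.charAt(k);
            if (Character.isWhitespace(c)) { k++; continue; }
            if (Character.isDigit(c)) {
                int start = k;
                while (k < s.length() && Character.isDigit(s.charAt(k))) k++;
                toks.add(s.substring(start, k));
            } else if (c == '+' || c == '(' || c == ')') {
                toks.add(String.valueOf(c));
                k++;
            } else {
                throw new IllegalArgumentException("bad character '" + c + "'");
            }
        }
        return toks;
    }

    /** Returns the sentential forms, from the input back to the start symbol E. */
    static List<String> parse(String input) {
        List<String> rest = tokenize(input);
        List<String> stack = new ArrayList<>();
        List<String> forms = new ArrayList<>();
        forms.add(String.join(" ", rest));
        while (!rest.isEmpty()) {
            stack.add(rest.remove(0));                           // shift
            boolean reduced = true;
            while (reduced) {                                    // reduce while a handle is on top
                reduced = false;
                int n = stack.size();
                if (stack.get(n - 1).matches("[0-9]+")) {        // E -> i
                    stack.set(n - 1, "E");
                    reduced = true;
                } else if (n >= 5 && String.join(" ", stack.subList(n - 5, n)).equals("E + ( E )")) {
                    stack.subList(n - 5, n).clear();             // E -> E + ( E )
                    stack.add("E");
                    reduced = true;
                }
                if (reduced) {                                   // record stack + remaining input
                    List<String> form = new ArrayList<>(stack);
                    form.addAll(rest);
                    forms.add(String.join(" ", form));
                }
            }
        }
        if (stack.size() != 1 || !stack.get(0).equals("E")) {
            throw new IllegalArgumentException("syntax error, stack = " + stack);
        }
        return forms;
    }

    public static void main(String[] args) {
        parse("1 + (2) + (3)").forEach(System.out::println);
    }
}
```

On 1 + (2) + (3) it reports the same six sentential forms as the slide, ending in E.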
Problems • Ambiguity: with E → E + E, E → i, the input 1 + 2 + 3 can be read as (1 + 2) + 3 or as 1 + (2 + 3)
CUP • Constructor of Useful Parsers • Automatic LALR(1) parser generator • Input: CUP spec file • Output: syntax analyzer (parser) in Java • Workflow: parser spec → JavaCup → Parser.java → javac → Parser; at run time the parser reads tokens and produces the AST
Expression calculator

    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    non terminal Integer expr;

    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr
           | LPAREN expr RPAREN
           | NUMBER
           ;
Ambiguities • a * b + c has two parse trees: (a * b) + c and a * (b + c) • a + b + c has two parse trees: (a + b) + c and a + (b + c)
Expression calculator with precedence

    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    terminal UMINUS;
    non terminal Integer expr;

    precedence left PLUS, MINUS;
    precedence left DIV, MULT;
    precedence left UMINUS;

    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;

• Precedence increases down the list of precedence declarations: PLUS and MINUS bind most loosely, UMINUS most tightly • %prec UMINUS gives the rule MINUS expr a contextual precedence
Resolving ambiguity • precedence left PLUS • a + b + c parses as (a + b) + c
Resolving ambiguity • precedence left PLUS; precedence left MULT • a * b + c parses as (a * b) + c
Resolving ambiguity • precedence left PLUS; precedence left MULT • a + b * c parses as a + (b * c)
Resolving ambiguity • precedence left PLUS; precedence left MULT • - a * b still has two parse trees: (- a) * b and - (a * b)
Resolving ambiguity

    terminal Integer NUMBER;
    terminal PLUS, MINUS, MULT, DIV;
    terminal LPAREN, RPAREN;
    terminal UMINUS;

    precedence left PLUS, MINUS;
    precedence left DIV, MULT;
    precedence left UMINUS;

    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;

• UMINUS is never returned by the scanner (it is used only to define precedence) • The rule MINUS expr %prec UMINUS has the precedence of UMINUS
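For comparison, this Java sketch evaluates expressions with the same precedence table, but using precedence climbing, a hand-written technique, rather than CUP's generated LALR(1) parser. Higher levels bind tighter; a left-associative operator parses its right operand one level higher, and unary minus parses its operand at the UMINUS level, so - 2 * 3 groups as (- 2) * 3. The class and method names are our own.

```java
/** Precedence-climbing evaluator using the calculator's precedence table. */
final class PrecedenceCalc {
    private static final int UMINUS = 3;      // precedence left UMINUS (tightest)

    private final String in;
    private int pos = 0;

    private PrecedenceCalc(String s) { this.in = s.replace(" ", ""); }

    static int eval(String s) {
        PrecedenceCalc p = new PrecedenceCalc(s);
        int v = p.expr(1);
        if (p.pos != p.in.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return v;
    }

    /** Binary precedence levels; 0 means "not a binary operator". */
    private static int prec(char op) {
        switch (op) {
            case '+': case '-': return 1;     // precedence left PLUS, MINUS
            case '*': case '/': return 2;     // precedence left DIV, MULT
            default: return 0;
        }
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    /** Parses an expression whose binary operators all have level >= minPrec (minPrec >= 1). */
    private int expr(int minPrec) {
        int left = primary();
        while (prec(peek()) >= minPrec) {
            char op = in.charAt(pos++);
            int right = expr(prec(op) + 1);   // left associativity: the right operand binds tighter
            switch (op) {
                case '+': left += right; break;
                case '-': left -= right; break;
                case '*': left *= right; break;
                default:  left /= right; break; // '/': Java integer division
            }
        }
        return left;
    }

    private int primary() {
        char c = peek();
        if (c == '-') {                       // MINUS expr %prec UMINUS
            pos++;
            return -expr(UMINUS);
        }
        if (c == '(') {                       // LPAREN expr RPAREN
            pos++;
            int v = expr(1);
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return v;
        }
        int start = pos;                      // NUMBER
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a number at " + pos);
        return Integer.parseInt(in.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // 7
        System.out.println(eval("10 - 4 - 3"));  // 3, i.e. (10 - 4) - 3
        System.out.println(eval("-2 * 3"));      // -6
    }
}
```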
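Disambiguation • Each terminal is assigned a precedence • By default, all terminals have the lowest precedence • The user can assign their own precedence • CUP assigns each production a precedence: • the precedence of the last terminal in the production, e.g. expr MINUS expr has the precedence of MINUS • or a user-specified contextual precedence, e.g. MINUS expr %prec UMINUS • On a shift/reduce conflict, the production's precedence is compared with that of the lookahead terminal: if the production's is higher, reduce; if the terminal's is higher, shift; if they are equal, associativity decides (left: reduce, right: shift, nonassoc: error)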
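The resolution rule can be sketched as a small Java function. This is a simplified model, not CUP's source code: level 0 stands for "no precedence declared", and when the production itself has no precedence the sketch just reports the conflict as unresolved (how CUP treats that case should be checked in its manual). All names are hypothetical.

```java
/** Simplified model of precedence-based shift/reduce conflict resolution. */
final class ConflictResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR, UNRESOLVED }

    /** A precedence level (higher binds tighter; 0 = none declared) plus associativity. */
    static final class Prec {
        final int level;
        final Assoc assoc;
        Prec(int level, Assoc assoc) { this.level = level; this.assoc = assoc; }
    }

    /** Decides between reducing by a production and shifting the lookahead terminal. */
    static Action resolve(Prec production, Prec lookahead) {
        if (production.level == 0) return Action.UNRESOLVED;           // simplification: reported as a conflict
        if (production.level > lookahead.level) return Action.REDUCE;  // an undeclared terminal (level 0) loses
        if (production.level < lookahead.level) return Action.SHIFT;
        switch (lookahead.assoc) {                                      // equal levels: associativity decides
            case LEFT:  return Action.REDUCE;   // a + b + c groups as (a + b) + c
            case RIGHT: return Action.SHIFT;    // right-associative operators group to the right
            default:    return Action.ERROR;    // nonassoc: 1 < 2 < 3 is a syntax error
        }
    }

    public static void main(String[] args) {
        // Levels mirror the calculator's declarations: PLUS/MINUS = 1, DIV/MULT = 2, UMINUS = 3.
        Prec plus = new Prec(1, Assoc.LEFT), mult = new Prec(2, Assoc.LEFT), uminus = new Prec(3, Assoc.LEFT);
        System.out.println(resolve(plus, mult));    // SHIFT:  a + b * c  ->  a + (b * c)
        System.out.println(resolve(mult, plus));    // REDUCE: a * b + c  ->  (a * b) + c
        System.out.println(resolve(plus, plus));    // REDUCE: a + b + c  ->  (a + b) + c
        System.out.println(resolve(uminus, mult));  // REDUCE: - a * b    ->  (- a) * b
        Prec neq = new Prec(1, Assoc.NONASSOC);     // precedence nonassoc NEQ (level chosen for the demo)
        System.out.println(resolve(neq, neq));      // ERROR:  1 != 2 != 3 is rejected
    }
}
```

Here the production argument carries the precedence CUP would assign it (its last terminal, or the %prec terminal), which is why MINUS expr %prec UMINUS is passed in as the UMINUS level.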
More CUP directives • precedence nonassoc NEQ; • Non-associative operators: < > == != etc. • 1 < 2 < 3 is reported as a syntax error, as is 6 == 7 == 8 == 9 • start with non-terminal; • Specifies a start non-terminal other than the first non-terminal • Useful for testing parts of the grammar • Getting the internal representation • Command-line options: • -dump_grammar • -dump_states • -dump_tables • -dump
Scanner integration (JFlex spec; the sym class is generated from the terminal declarations in the .cup file, and the parser gets its terminals from the scanner)

    import java_cup.runtime.*;
    %%
    %cup
    %eofval{
      return new Symbol(sym.EOF);
    %eofval}
    NUMBER=[0-9]+
    %%
    <YYINITIAL>"+" { return new Symbol(sym.PLUS); }
    <YYINITIAL>"-" { return new Symbol(sym.MINUS); }
    <YYINITIAL>"*" { return new Symbol(sym.MULT); }
    <YYINITIAL>"/" { return new Symbol(sym.DIV); }
    <YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
    <YYINITIAL>")" { return new Symbol(sym.RPAREN); }
    <YYINITIAL>{NUMBER} { return new Symbol(sym.NUMBER, Integer.valueOf(yytext())); }
    <YYINITIAL>\n { }
    <YYINITIAL>. { }
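To show what the generated scanner does, here is a hand-written Java equivalent of the rules above (a sketch, not JFlex output). Token kinds are plain strings standing in for the sym constants CUP generates, and Tok is a hypothetical stand-in for java_cup.runtime.Symbol. Like the spec, it silently skips any character it does not recognize.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written equivalent of the calculator's JFlex rules (illustration only). */
final class CalcScanner {
    /** Hypothetical stand-in for java_cup.runtime.Symbol: a token kind plus an optional value. */
    static final class Tok {
        final String kind;      // e.g. "PLUS", "NUMBER", "EOF" (stand-ins for sym.*)
        final Integer value;    // semantic value, only for NUMBER
        Tok(String kind, Integer value) { this.kind = kind; this.value = value; }
        @Override public String toString() { return value == null ? kind : kind + "(" + value + ")"; }
    }

    static List<Tok> scan(String text) {
        List<Tok> out = new ArrayList<>();
        int k = 0;
        while (k < text.length()) {
            char c = text.charAt(k);
            if (Character.isDigit(c)) {                        // NUMBER=[0-9]+ (longest match)
                int start = k;
                while (k < text.length() && Character.isDigit(text.charAt(k))) k++;
                out.add(new Tok("NUMBER", Integer.valueOf(text.substring(start, k))));
                continue;
            }
            switch (c) {
                case '+': out.add(new Tok("PLUS", null)); break;
                case '-': out.add(new Tok("MINUS", null)); break;
                case '*': out.add(new Tok("MULT", null)); break;
                case '/': out.add(new Tok("DIV", null)); break;
                case '(': out.add(new Tok("LPAREN", null)); break;
                case ')': out.add(new Tok("RPAREN", null)); break;
                default: break;                                // the \n and "." rules: skipped
            }
            k++;
        }
        out.add(new Tok("EOF", null));                         // %eofval
        return out;
    }

    public static void main(String[] args) {
        // [NUMBER(12), PLUS, LPAREN, NUMBER(3), MULT, NUMBER(4), RPAREN, EOF]
        System.out.println(scan("12 + (3 * 4)"));
    }
}
```

As in the JFlex spec, only NUMBER carries a semantic value; the parser later reads it through a symbol label such as NUMBER:n.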
Assigning meaning • So far, only validation • Add Java code implementing semantic actions

    expr ::= expr PLUS expr
           | expr MINUS expr
           | expr MULT expr
           | expr DIV expr
           | MINUS expr %prec UMINUS
           | LPAREN expr RPAREN
           | NUMBER
           ;
Assigning meaning

    expr ::= expr:e1 PLUS expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() + e2.intValue()); :}
           | expr:e1 MINUS expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() - e2.intValue()); :}
           | expr:e1 MULT expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() * e2.intValue()); :}
           | expr:e1 DIV expr:e2
               {: RESULT = Integer.valueOf(e1.intValue() / e2.intValue()); :}
           | MINUS expr:e1
               {: RESULT = Integer.valueOf(0 - e1.intValue()); :}
               %prec UMINUS
           | LPAREN expr:e1 RPAREN
               {: RESULT = e1; :}
           | NUMBER:n
               {: RESULT = n; :}
           ;

• Symbol labels (e1, e2, n) name the semantic values of the right-hand-side symbols • RESULT names the value of the left-hand-side symbol
Building an AST • More useful representation of syntax tree • Less clutter • Actual level of detail depends on your design • Basis for semantic analysis • Later annotated with various information • Type information • Computed values
Parse tree vs. AST for 1 + (2) + (3) • Parse tree: expr[ expr[ expr[1] + ( expr[2] ) ] + ( expr[3] ) ] • AST: +( +(1, 2), 3 )
AST construction • AST nodes are constructed during parsing • Bottom-up parser • Grammar rules are annotated with actions for AST construction • When a node is constructed, all its children are already available (already constructed)
AST construction

    expr ::= expr:e1 PLUS expr:e2  {: RESULT = new plus(e1,e2); :}
           | LPAREN expr:e RPAREN  {: RESULT = e; :}
           | INT_CONST:i           {: RESULT = new int_const(…, i); :}
           ;

• Reductions on 1 + (2) + (3): 1 + (2) + (3), expr + (2) + (3), expr + (expr) + (3), expr + (3), expr + (expr), expr • Resulting AST: plus(e1: plus(e1: int_const val = 1, e2: int_const val = 2), e2: int_const val = 3)
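The bottom-up order of construction can be replayed by hand. In this Java sketch, IntConst and Plus are hypothetical stand-ins for the slide's int_const and plus node classes (the extra constructor argument elided on the slide is left out). Each node is created only after its children exist, and the parentheses produce no node of their own.

```java
/** Hypothetical AST node types in the spirit of the slide's plus / int_const. */
interface Expr {
    int eval();
}

final class IntConst implements Expr {
    final int val;
    IntConst(int val) { this.val = val; }
    public int eval() { return val; }
    @Override public String toString() { return "int_const(" + val + ")"; }
}

final class Plus implements Expr {
    final Expr e1, e2;
    Plus(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    public int eval() { return e1.eval() + e2.eval(); }
    @Override public String toString() { return "plus(" + e1 + ", " + e2 + ")"; }
}

final class AstDemo {
    /** Replays, in reduction order, the actions a bottom-up parser runs on 1 + (2) + (3). */
    static Expr build() {
        Expr one = new IntConst(1);           // INT_CONST:i           -> new int_const(..., i)
        Expr two = new IntConst(2);
        Expr paren2 = two;                    // LPAREN expr:e RPAREN  -> RESULT = e (no new node)
        Expr sum12 = new Plus(one, paren2);   // expr:e1 PLUS expr:e2  -> new plus(e1, e2)
        Expr three = new IntConst(3);
        Expr paren3 = three;
        return new Plus(sum12, paren3);       // children already built, as the slide requires
    }

    public static void main(String[] args) {
        Expr ast = build();
        System.out.println(ast);          // plus(plus(int_const(1), int_const(2)), int_const(3))
        System.out.println(ast.eval());   // 6
    }
}
```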