230 likes | 406 Views
Abstract Syntax. Mooly Sagiv html://www.cs.tau.ac.il/~msagiv/courses/wcc04.html. Outline. The general idea Bison Motivating example Interpreter for arithmetic expressions The need for abstract syntax Abstract syntax for arithmetic expressions.
E N D
Abstract Syntax Mooly Sagiv html://www.cs.tau.ac.il/~msagiv/courses/wcc04.html
Outline • The general idea • Bison • Motivating example Interpreter for arithmetic expressions • The need for abstract syntax • Abstract syntax for arithmetic expressions
Semantic Analysis during Recursive Descent Parsing • Scanner returns “semantic values” for some tokens • The function of every non-terminal returns the “corresponding subtree value” • When A B C Dis appliedthe function for A can use the values returned by B, C, and D • The function can also pass parameters, e.g., to D(), reflecting left contexts
int E() { switch (tok) { case num : temp=tok.val; eat(num); returnEP(temp); default: error(…); }} E num E’ E’ E’+ num E’ int EP(int left) { switch (tok) { case $: return left; case + : eat(+); temp=tok.val; eat(num); returnEP(left + temp); default: error(…) ;}}
Semantic Analysis during Bottom-Up Parsing • Scanner returns “semantic values” for some tokens • Use parser stack to store the “corresponding subtree values” • When A B C Dis reducedthe function for A can use the values returned by B, C, and D • No action in the middle of the rule
Example num 5 E E + num E num + E 7 E 12
Bison Specification Declarations %% Productions %% C -Routines
Interpreter (in Bison) %{ declarations of yylex() and yyeror() %} %union { int num; string id;} %token <id> ID %token <num> NUM %type <num> e f t %start e %% e : e ‘+’ t {$$ = $1 + $3 ; } | e ‘-’ t { $$ = $1 - $3 ; } | t { $$ = $1; } ; t : t ‘*’ f { $$ = $1 * $3; } | t ‘/’ f { $$= $1 / $3; } | f { $$ = $1; } ; f : NUM { $$ = $1; } | ID { $$ = lookup($1); } | ‘-’ e { $$ = - $2; } | ‘(‘ e ‘)’ { $$ = $2; } ;
%{ declarations of yylex() and yyeror() %} %union { int num; string id;} %token <id> ID %token <num> NUM %type <num> e %start e %left PLUS MINUS %left MUL DIV %right UMINUS %% Interpreter (compact spec.) e : e PLUS e {$$ = $1 + $3 ; } | e MINUS e { $$ = $1 - $3 ; } | e MUL e { $$ = $1 * $3; } | e DIV e { $$= $1 / $3; } | NUM | ID { $$ = lookup($1); } | MINUS e % prec UMINUS { $$ = - $2; } | ‘(‘ e ‘)’ { $$ = $2; } ;
stack input action $ 7+11+17$ shift num 7 $ +11+17$ reduce e num e 7 $ +11+17$ shift + e 7 $ 11+17$ shift num 11 + e 7 $ +17$ reduce enum
stack input action e 18 $ +17$ shift + e 18 $ 17$ shift num 17 + e 18 $ $ reduce enum e 11 + e 7 $ +17$ reduce e e+e
stack input action $ accept e 35 $ e 17 + e 18 $ $ reduce e::= e+e
So why can’t we write all the compiler code in Bison/CUP?
Historical Perspective • Originally parsers were written w/o tools • yacc, bison, ... make tools acceptable • But it is still difficult to write compilers in parser actions (top-down and bottom-up) • Natural grammars are ambiguous • No modularity principle • Many useful programming language features prevent code generation while parsing • Use before declaration • gotos
Abstract Syntax • Intermediate program representation • Defines a tree - Preserves program hierarchy • Generated by the parser • Declared using an (ambiguous) context free grammar (relatively flat) • Not meant for parsing • Keywords and punctuation symbols are not stored (Not relevant once the tree exists) • Big programs can be also handled (possibly via virtual memory)
Issues • Concrete vs. Abstract syntax tree • Need to store concrete source position • Abstract syntax can be defined by: • Ambiguous context free grammar • C recursive data type • “Constructor” functions • Debugging routines linearize the tree
Exp id Unop - (UnMin) (IdExp) Exp num (NumExp) Exp Exp Binop Exp Exp Unop Exp (UnOpExp) (BinOpExp) Binop + (Plus) Binop - (Minus) Binop * (Times) Binop / (Div) Abstract Syntax for Arithmetic Expressions
/* absyn.h */ typedef enum {A_Plus, A_Minus, A_Times, A_Div} Binop; typedef enum {A_UnMin} Unop; typedef enum {A_IdExp, A_NumExp, A_BinOpExp, A_UnopExp} ExpKind; typedef struct exp { ExpKind kind ; union { struct {string repr ;} IdExp; struct {int number ; } NumExp; struct {Binop op; struct exp *left; struct exp *right;} BinExp; struct {Unop op; struct exp *arg;} UnExp; } exp; } *Exp; extern Exp idExp(string repr); extern Exp numExp(int number); extern Exp binOp(Binop, struct exp *, struct exp*); extern Exp unOp(Unop, struct exp*);
e : e ‘+’ t {$$ = binop(A_Plus, $1, $3) ; } | e ‘-’ t { $$ = binop(A_Minus, $1, $3) ; } | t { $$ = $1; } ; t : t ‘*’ f { $$ = binop(A_Times,$1, $3); } | t ‘/’ f { $$= binop(A_Div, $1, $3); } | f { $$ = $1; } ; f : NUM { $$ = numExp($1); } | ID { $$ = idExp($1); } | ‘-’ e { $$ = unOp(A_UnMinus, $2); } | ‘(‘ e ‘)’ { $$ = $2; } ; %{ declarations of yylex() and yyeror() #include “absyn.h” %} %union { int num; string id; Exp exp;} %token <id> ID %token <num> NUM %type <exp> e f t %start e %%
%{ declarations of yylex() and yyeror() #include “absyn.h” %} %union { int num; string id; Exp exp; } %token <id> ID %token <num> NUM %type <exp> e %start e %left PLUS MINUS %left MUL DIV %right UMINUS %% e : e PLUS e {$$ = binop(A_Plus, $1, $3) ; } | e MINUS e { $$ = binop(A_Minus, $1, $3) ; } | e MUL e { $$ = binop(A_Times, $1, $3); } | e DIV e { $$= binop(A_Div, $1, $3); } | NUM {$$ = numExp($1); } | ID { $$ = idExp($1); } | MINUS e % prec UMINUS { $$ = unOp(A_UnMin,$2); } | ‘(‘ e ‘)’ { $$ = $2; } ;
Summary • Flex and Bison simplify the task of writing compiler/interpreter front-ends • Abstract syntax provides a clear interface with other compiler phases • Supports general programming languages • But the design of an abstract syntax for a given PL may take some time