Compiler construction in4020 – lecture 5

Compiler construction in4020 – lecture 5. Koen Langendoen, Delft University of Technology, The Netherlands. Summary of lecture 4: syntax analysis (tokens → AST), bottom-up parsing, push-down automaton, ACTION/GOTO tables, LR(0) (NO look-ahead)

sylvia


  1. Compiler construction in4020 – lecture 5. Koen Langendoen, Delft University of Technology, The Netherlands

  2. Summary of lecture 4
     • syntax analysis: tokens → AST
     • bottom-up parsing
     • push-down automaton
     • ACTION/GOTO tables
     • LR(0): NO look-ahead
     • SLR(1): one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts
     • LR(1): SLR(1), but FOLLOW set per item
     • LALR(1): LR(1), but “equal” states are merged

  3. Quiz 2.50
     Is the following grammar LR(0), SLR(1), LALR(1), or LR(1)?
     (a) S → x S x | y
     (b) S → x S x | x

  4. Overview
     • semantic analysis
     • identification – symbol tables
     • type checking
     • assignment
     • yacc
     • LLgen
     [diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator derives the syntax analyzer from the language grammar]

  5. Semantic analysis
     se-man-tic: of or relating to meaning in language. (Webster’s Dictionary)
     • information is scattered throughout the program
     • identifiers serve as connectors
     • find defining occurrence of each applied occurrence of an identifier in the AST
     • undefined identifiers → error
     • unused identifiers → warning
     • check rules in the language definition
     • type checking
     • control flow (dead code)

  7. Symbol table
     • global storage used by all compiler phases
     • holds information about identifiers:
       • type
       • location
       • size
     [diagram: the symbol table sits alongside the whole pipeline: program text → lexical analysis → syntax analysis → context handling → annotated AST → optimizations → code generation → executable]

  8. Symbol table implementation
     • extensible string-indexable array
     • linear list
     • tree
     • hash table
     [diagram: hash function: string → int maps ”aap”, ”noot” and ”mies” to buckets 0–3; each bucket is a linked list of entries with the fields next, name, type, …]
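
The hash-table variant on this slide can be sketched in C as follows. This is a minimal illustration, not code from the lecture: the names `sym_insert` and `sym_lookup`, the multiplicative hash, and the string-valued `type` field are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4   /* tiny on purpose, so collisions are easy to provoke */

/* One entry in a bucket's linked list (the next/name/type boxes). */
struct entry {
    struct entry *next;
    char *name;
    const char *type;   /* illustrative; a real table stores richer info */
};

static struct entry *bucket[NBUCKETS];

/* Hash function: string -> int (a simple multiplicative hash, assumed). */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the entry for name, or NULL if it is not in the table. */
struct entry *sym_lookup(const char *name)
{
    struct entry *e;
    for (e = bucket[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Insert name with the given type; an existing entry is returned unchanged. */
struct entry *sym_insert(const char *name, const char *type)
{
    unsigned h = hash(name);
    struct entry *e = sym_lookup(name);
    if (e != NULL)
        return e;
    e = malloc(sizeof *e);
    assert(e != NULL);              /* sketch: abort on allocation failure */
    e->name = malloc(strlen(name) + 1);
    assert(e->name != NULL);
    strcpy(e->name, name);
    e->type = type;
    e->next = bucket[h];            /* prepend to the bucket's chain */
    bucket[h] = e;
    return e;
}
```

Chaining keeps insertion cheap and lets the table hold more names than it has buckets; lookups then cost one hash computation plus a walk along a single chain.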

  9. Identification
     • different kinds of identifiers
     • variables
     • type names
     • field selectors
     • name spaces
     • scopes

        typedef int i;
        int j;

        void foo(int j)
        {
            struct i {i i;} i;
          i:
            i.i = 3;
            printf("%d\n", i.i);
        }

  11. Handling scopes
      • stack of scope elements
      • when entering a scope a new element is pushed on the stack
      • declared identifiers are entered in the top scope element
      • applied identifiers are looked up in the scope elements from top to bottom
      • the top element is removed upon scope exit
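
The scope-stack discipline described on this slide can be sketched in C as below. The fixed-size arrays and the names `enter_scope`, `leave_scope`, `declare` and `lookup` are assumptions chosen for brevity, not part of the lecture material.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16
#define MAX_DECLS 32

struct decl { const char *name; const char *type; };

/* One scope element: the identifiers declared at this nesting level. */
struct scope { struct decl decls[MAX_DECLS]; int ndecls; };

static struct scope stack[MAX_DEPTH];
static int depth = 0;                  /* number of currently open scopes */

/* Entering a scope pushes a new, empty element. */
void enter_scope(void)
{
    assert(depth < MAX_DEPTH);
    stack[depth++].ndecls = 0;
}

/* Leaving a scope removes the top element and all its declarations. */
void leave_scope(void)
{
    assert(depth > 0);
    depth--;
}

/* Enter name in the top scope element; returns 0 on redeclaration. */
int declare(const char *name, const char *type)
{
    struct scope *top;
    int i;
    assert(depth > 0);
    top = &stack[depth - 1];
    for (i = 0; i < top->ndecls; i++)
        if (strcmp(top->decls[i].name, name) == 0)
            return 0;
    assert(top->ndecls < MAX_DECLS);
    top->decls[top->ndecls].name = name;
    top->decls[top->ndecls].type = type;
    top->ndecls++;
    return 1;
}

/* Look up an applied occurrence from top to bottom; NULL if undefined. */
const struct decl *lookup(const char *name)
{
    int level, i;
    for (level = depth - 1; level >= 0; level--)
        for (i = 0; i < stack[level].ndecls; i++)
            if (strcmp(stack[level].decls[i].name, name) == 0)
                return &stack[level].decls[i];
    return NULL;
}
```

Searching from the top element downwards is what makes an inner declaration hide an outer one with the same name.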

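  12. A scoped hash-based symbol table

         aap(int noot)
         {
             int mies, aap;
             ....
         }

      [diagram: a hash table (buckets 0–3) holds the entries ”aap”, ”noot” and ”mies”; each entry has a list of declarations (decl) tagged with a scope level and properties; a scope stack with levels 0, 1 and 2 links to the declarations made at each level]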

  13. Identification: complications
      • overloading
      • operators: N*2   prijs*2.20371
      • functions: PUT(s:STRING)   PUT(i:INTEGER)
      • solution: yield set of possibilities (to be constrained by type checking)
      • imported scopes
      • C++ scope resolution operator x::
      • Modula FROM module IMPORT ...
      • solution: stack (or merge) the new scope

  14. Type checking
      • operators and functions impose restrictions on the types of the arguments
      • types
      • basic types
      • structured types
      • type names

         typedef struct {
             double re;
             double im;
         } complex;

  15. Forward declarations
      • recursive data structures
      • type information must be stored
      • type table
      • type information must be resolved
      • undefined types
      • circularities

         TYPE Tree = POINTER TO Node;
         TYPE Node = RECORD
             element : Integer;
             left, right : Tree;
         END RECORD;
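
A type table that tolerates forward references might look like the following C sketch. All names (`type_ref`, `type_define`, `type_check_table`) and the return codes are hypothetical. The idea is that a name is entered as UNDEFINED when first referenced, and the table is checked for leftover undefined types and circular aliases once all declarations have been seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { UNDEFINED, BASIC, ALIAS, POINTER, RECORD };

struct type {
    const char *name;
    enum kind kind;
    struct type *target;   /* ALIAS: the aliased type; POINTER: the pointee */
};

#define MAX_TYPES 32
static struct type table[MAX_TYPES];
static int ntypes = 0;

/* Find name in the type table, entering it as UNDEFINED when first seen. */
struct type *type_ref(const char *name)
{
    int i;
    for (i = 0; i < ntypes; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    assert(ntypes < MAX_TYPES);
    table[ntypes].name = name;
    table[ntypes].kind = UNDEFINED;
    table[ntypes].target = NULL;
    return &table[ntypes++];
}

/* Record the definition of name, resolving earlier forward references. */
void type_define(const char *name, enum kind kind, struct type *target)
{
    struct type *t = type_ref(name);
    t->kind = kind;
    t->target = target;
}

/* Run after all declarations: 0 = ok, 1 = undefined type, 2 = circularity. */
int type_check_table(void)
{
    int i, steps;
    for (i = 0; i < ntypes; i++) {
        struct type *t = &table[i];
        if (t->kind == UNDEFINED)
            return 1;
        /* Follow the alias chain; more than ntypes steps means a cycle. */
        for (steps = 0; t->kind == ALIAS; t = t->target)
            if (++steps > ntypes)
                return 2;
    }
    return 0;
}
```

A cycle that passes through a POINTER, as in Tree and Node, is legitimate, which is why only ALIAS links are followed when looking for circularities.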

  16. Type equivalence
      • name equivalence [all types get a unique name]

         VAR a : ARRAY [Integer 1..10] OF Real;
         VAR b : ARRAY [Integer 1..10] OF Real;

      • structural equivalence [difficult to check]

         TYPE c = RECORD
             i : Integer;
             p : POINTER TO c;
         END RECORD;
         TYPE d = RECORD
             i : Integer;
             p : POINTER TO RECORD
                 i : Integer;
                 p : POINTER TO c;
             END RECORD;
         END RECORD;
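
One reason structural equivalence is hard to check is that recursive types make a naive comparison loop forever. A common remedy, sketched here in C under simplifying assumptions (field names are ignored and fields are compared by position; all identifiers are made up for the example), is to remember which pairs of types are already assumed equal.

```c
#include <assert.h>
#include <stddef.h>

enum kind { T_INT, T_REAL, T_POINTER, T_RECORD };

#define MAX_FIELDS 4
struct type {
    enum kind kind;
    struct type *pointee;             /* used by T_POINTER */
    struct type *field[MAX_FIELDS];   /* used by T_RECORD, declaration order */
    int nfields;
};

/* Name equivalence: every declaration yields its own distinct type node. */
int name_equiv(const struct type *a, const struct type *b)
{
    return a == b;
}

/* Pairs currently assumed equal; this breaks cycles in recursive types. */
#define MAX_PAIRS 64
static const struct type *assumed[MAX_PAIRS][2];
static int nassumed;

static int equiv(const struct type *a, const struct type *b)
{
    int i;
    if (a == b)
        return 1;
    if (a->kind != b->kind)
        return 0;
    for (i = 0; i < nassumed; i++)
        if (assumed[i][0] == a && assumed[i][1] == b)
            return 1;                 /* already being compared: assume equal */
    assert(nassumed < MAX_PAIRS);
    assumed[nassumed][0] = a;
    assumed[nassumed][1] = b;
    nassumed++;
    switch (a->kind) {
    case T_POINTER:
        return equiv(a->pointee, b->pointee);
    case T_RECORD:
        if (a->nfields != b->nfields)
            return 0;
        for (i = 0; i < a->nfields; i++)
            if (!equiv(a->field[i], b->field[i]))
                return 0;
        return 1;
    default:
        return 1;                     /* basic types of the same kind */
    }
}

/* Structural equivalence: compare the shapes of the two types. */
int struct_equiv(const struct type *a, const struct type *b)
{
    nassumed = 0;
    return equiv(a, b);
}
```

With types built to mirror c and d from the slide, `name_equiv` rejects the pair while `struct_equiv` accepts it.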

  17. Coercions
      • implicit data and type conversion to match operand (argument) type
      • coercions complicate identification (ambiguity)
      • two phase approach
      • expand a type to a set by applying coercions
      • reduce type sets based on constraints imposed by (overloaded) operators and language semantics

         VAR a : Real;
         ...
         a := 5;
         3.14 + 7
         8 + 9
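
The two-phase approach can be illustrated with type sets stored as bitmasks. The C sketch below assumes a single coercion (int to real), an overloaded `+` with the variants int+int and real+real, and a tie-breaking rule that prefers the variant needing the fewest coercions. These are assumptions for the example, not rules stated in the lecture.

```c
#include <assert.h>

enum { INT = 1 << 0, REAL = 1 << 1 };   /* a type set is a bitmask */

/* Phase 1: expand a type to the set reachable by coercion (int -> real). */
unsigned expand(unsigned type)
{
    assert(type == INT || type == REAL);
    return type == INT ? (INT | REAL) : REAL;
}

/* Variants of the overloaded '+': operands and result share one type.
 * Ordered so that the variant needing fewest coercions is tried first. */
static const unsigned plus_variants[] = { INT, REAL };

/* Phase 2: reduce the operand sets using the operator's constraints.
 * Returns the result type of left + right, or 0 if no variant fits. */
unsigned resolve_plus(unsigned left, unsigned right)
{
    unsigned l = expand(left), r = expand(right);
    unsigned i;
    for (i = 0; i < sizeof plus_variants / sizeof plus_variants[0]; i++)
        if ((l & plus_variants[i]) && (r & plus_variants[i]))
            return plus_variants[i];
    return 0;
}

/* An assignment is legal when the source can be coerced to the target. */
int assignable(unsigned target, unsigned source)
{
    return (expand(source) & target) != 0;
}
```

For the slide's examples, `a := 5` is accepted because 5 expands to {int, real}, `3.14 + 7` resolves to the real variant, and `8 + 9` fits both variants, so the ordering decides in favour of int.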

  18. Variable: value or location?
      • two usages of variables
        rvalue: value
        lvalue: location
      • insert coercion to dereference variable
      • checking rules:

         VAR p : Real;
         VAR q : Real;
         ...
         p := q;

      [diagram: AST for p := q, with := at the root, (location of) p as the left child and deref applied to (location of) q as the right child]
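
Inserting the dereference coercion can be sketched as a small AST transformation in C. The node kinds and the helper names `to_rvalue` and `make_assign` are hypothetical. The sketch treats a variable node as a location and wraps it in a deref node wherever a value is required.

```c
#include <assert.h>
#include <stdlib.h>

enum node_kind { N_VAR, N_CONST, N_DEREF, N_ASSIGN };

struct node {
    enum node_kind kind;
    struct node *left, *right;   /* N_DEREF uses left only */
};

struct node *new_node(enum node_kind kind, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);           /* sketch: abort on allocation failure */
    n->kind = kind;
    n->left = l;
    n->right = r;
    return n;
}

/* A variable denotes a location (lvalue); everything else here is a value. */
int is_lvalue(const struct node *n)
{
    return n->kind == N_VAR;
}

/* Coerce to an rvalue by inserting a deref node above a location. */
struct node *to_rvalue(struct node *n)
{
    return is_lvalue(n) ? new_node(N_DEREF, n, NULL) : n;
}

/* Build dst := src; returns NULL when dst is not a location. */
struct node *make_assign(struct node *dst, struct node *src)
{
    if (!is_lvalue(dst))
        return NULL;
    return new_node(N_ASSIGN, dst, to_rvalue(src));
}
```

For `p := q` this yields the tree from the slide: the left operand stays a location, and the right operand becomes deref of the location of q.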

  19. Exercise (5 min.) complete the table

  20. Answers complete table

  21. Break

  22. Assignment (practicum)
      Asterix compiler
      1) replace yacc by LLgen
      2) make Asterix object-oriented
      • classes and objects
      • inheritance and dynamic binding
      [diagram: Asterix program → lexical analysis (generated by lex from the token description) → syntax analysis (generated by yacc from the Asterix grammar) → context handling → code generation → C-code]

  23. Yet another compiler compiler
      • yacc (bison): parser generator for UNIX
      • LALR(1) grammar → C code
      • format of the yacc input file:

         definitions     tokens + properties
         %%
         rules           grammar rules + actions
         %%
         user code       auxiliary C-code

  24. Yacc-based expression interpreter
      • input file
      • yacc maintains a stack of “values” that may be referenced ($i) in the semantic actions

         %token DIGIT
         %%
         line : expr '\n'      { printf("%d\n", $1);}
              ;
         expr : expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | DIGIT
              ;
         %%

      (left column: grammar; right column: semantics)

  25. Yacc interface to lexical analyzer
      • yacc invokes yylex() to get the next token
      • the “value” of a token must be stored in the global variable yylval
      • the default value type is int, but can be changed

         %%
         int yylex(void)
         {
             int c;

             c = getchar();
             if (isdigit(c)) {        /* needs <ctype.h> */
                 yylval = c - '0';    /* token value: the digit's number */
                 return DIGIT;
             }
             return c;                /* other characters are their own token */
         }

  26. Yacc interface to back-end
      • yacc generates a function named yyparse()
      • syntax errors are reported by invoking a callback function yyerror()

         %%
         int yylex(void) { ... }

         int main(void)
         {
             return yyparse();
         }

         void yyerror(const char *msg)
         {
             printf("syntax error\n");
             exit(1);
         }

  27. Yacc-based expression interpreter
      • input file (desk0)
      • run yacc

         %%
         line : expr '\n'      { printf("%d\n", $1);}
              ;
         expr : expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | DIGIT
              ;
         %%

         > make desk0
         bison -v desk0.y
         desk0.y contains 4 shift/reduce conflicts.
         gcc -o desk0 desk0.tab.c
         >

  28. Conflict resolution in Yacc • shift-reduce: prefer shift • reduce-reduce: prefer the rule that comes first

  29. Yacc-based expression interpreter
      • input file (desk0)
      • run yacc
      • run desk0, is it correct? NO: preferring shift groups the input as 2*(3+4), giving 14 instead of 10

         %%
         line : expr '\n'      { printf("%d\n", $1);}
              ;
         expr : expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | DIGIT
              ;
         %%

         > desk0
         2*3+4
         14

  30. Operator precedence in Yacc
      priority from top (low) to bottom (high)

         %token DIGIT
         %left '+'
         %left '*'
         %%
         line : expr '\n'      { printf("%d\n", $1);}
              ;
         expr : expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | DIGIT
              ;
         %%
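
The effect of the precedence declarations on the shift/reduce decision can be imitated with a small hand-written evaluator. The C sketch below is not how yacc's LALR(1) tables work internally. It only mimics the decision taken at a conflict, for single-digit operands without parentheses, and the function `eval` with its `use_prec` flag is made up for the illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Precedence as declared by %left '+' (low) and %left '*' (high). */
static int prec(char op) { return op == '*' ? 2 : 1; }

static int apply(char op, int a, int b) { return op == '+' ? a + b : a * b; }

/* Evaluate single-digit operands joined by '+' and '*'.
 * use_prec == 0: always shift on a conflict (yacc's default resolution).
 * use_prec != 0: reduce when the stacked operator has the same or higher
 *                precedence than the look-ahead (the effect of %left). */
int eval(const char *s, int use_prec)
{
    int vals[32];
    char ops[32];
    int nv = 0, no = 0;

    for (;;) {
        assert(isdigit((unsigned char)*s) && nv < 32);
        vals[nv++] = *s++ - '0';                     /* shift DIGIT */
        /* shift/reduce decision: stacked operator versus look-ahead */
        while (no > 0 &&
               (*s == '\0' ||
                (use_prec && prec(ops[no - 1]) >= prec(*s)))) {
            int b = vals[--nv], a = vals[--nv];
            vals[nv++] = apply(ops[--no], a, b);     /* reduce expr op expr */
        }
        if (*s == '\0')
            return vals[0];
        assert(*s == '+' || *s == '*');
        ops[no++] = *s++;                            /* shift the operator */
    }
}
```

Here `eval("2*3+4", 0)` returns 14, matching the desk0 run on slide 29, while `eval("2*3+4", 1)` returns 10.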

  31. Exercise (7 min.)
      multiple lines:

         %%
         lines: line
              | lines line
              ;
         line : expr '\n'      { printf("%d\n", $1);}
              ;
         expr : expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | DIGIT
              ;
         %%

      Extend the interpreter to a desk calculator with registers named a – z.
      Example input: v=3*(w+4)

  32. Answers

  33. Answers

         %{
         int reg[26];
         %}
         %token DIGIT
         %token REG
         %right '='
         %left '+'
         %left '*'
         %%
         expr : REG '=' expr   { $$ = reg[$1] = $3;}
              | expr '+' expr  { $$ = $1 + $3;}
              | expr '*' expr  { $$ = $1 * $3;}
              | '(' expr ')'   { $$ = $2;}
              | REG            { $$ = reg[$1];}
              | DIGIT
              ;
         %%

  34. Answers

         %%
         int yylex(void)
         {
             int c = getchar();

             if (isdigit(c)) {
                 yylval = c - '0';
                 return DIGIT;
             } else if ('a' <= c && c <= 'z') {
                 yylval = c - 'a';    /* register index 0..25 */
                 return REG;
             }
             return c;
         }

  35. LLgen: LL(1) parser generator
      • LLgen is part of the Amsterdam Compiler Kit
      • takes LL(1) grammar + semantic actions in C and generates a recursive descent parser
      • LLgen features:
        • repetition operators
        • advanced error handling
        • parameter passing
        • control over semantic actions
        • dynamic conflict resolvers

  36. LLgen example: expression interpreter
      • start from LR(1) grammar
      • make grammar LL(1)
      • use repetition operators
      • left recursion
      • operator precedence

      yacc:

         lines : line
               | lines line
               ;
         line  : expr '\n'
               ;
         expr  : expr '+' expr
               | expr '*' expr
               | '(' expr ')'
               | DIGIT
               ;

  37. LLgen example: expression interpreter
      • start from LR(1) grammar
      • make grammar LL(1)
      • use repetition operators
      • left recursion
      • operator precedence
      • add semantic actions
      • attach parameters to grammar rules
      • insert C-code between the symbols

      LLgen:

         %token DIGIT;

         main   : [line]+ ;
         line   : expr '\n' ;
         expr   : term [ '+' term ]* ;
         term   : factor [ '*' factor ]* ;
         factor : '(' expr ')'
                | DIGIT
                ;

  38. grammar and semantics

         main : [line]+ ;
         line {int e;} :
             expr(&e) '\n'        { printf("%d\n", e);}
         ;
         expr(int *e) {int t;} :
             term(e)
             [ '+' term(&t)       { *e += t;}
             ]*
         ;
         term(int *t) {int f;} :
             factor(t)
             [ '*' factor(&f)     { *t *= f;}
             ]*
         ;
         factor(int *f) :
             '(' expr(f) ')'
           | DIGIT                { *f = yylval;}
         ;

  39. grammar and semantics: values/results are passed as parameters (same grammar as on slide 38)

  40. grammar and semantics: semantic actions are C code between {} (same grammar as on slide 38)
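
Slide 35 notes that LLgen generates a recursive descent parser. The hand-written C sketch below shows roughly what such a parser looks like for the grammar of slide 38. It is not LLgen's actual output. Reading from a string through the hypothetical `input` pointer and `parse_expr` wrapper, and aborting with `assert` instead of recovering from errors, are simplifications assumed for the example.

```c
#include <assert.h>
#include <ctype.h>

enum { DIGIT = 256 };

static const char *input;   /* hypothetical: parse a string, not stdin */
static int yylval;          /* value of the most recent DIGIT token */
static int token;           /* one-token look-ahead */

static int yylex(void)
{
    int c = (unsigned char)*input;
    if (c == '\0')
        return 0;                        /* end of input */
    input++;
    if (isdigit(c)) { yylval = c - '0'; return DIGIT; }
    return c;
}

static void expr(int *e);

static void match(int expected)
{
    assert(token == expected);           /* sketch: no error recovery */
    token = yylex();
}

/* factor(int *f) : '(' expr(f) ')' | DIGIT { *f = yylval;} ; */
static void factor(int *f)
{
    if (token == '(') {
        match('(');
        expr(f);
        match(')');
    } else {
        *f = yylval;                     /* read the value before advancing */
        match(DIGIT);
    }
}

/* term(int *t) : factor(t) [ '*' factor(&f) { *t *= f;} ]* ; */
static void term(int *t)
{
    int f;
    factor(t);
    while (token == '*') { match('*'); factor(&f); *t *= f; }
}

/* expr(int *e) : term(e) [ '+' term(&t) { *e += t;} ]* ; */
static void expr(int *e)
{
    int t;
    term(e);
    while (token == '+') { match('+'); term(&t); *e += t; }
}

/* Parse one expression from text and return its value. */
int parse_expr(const char *text)
{
    int e;
    input = text;
    token = yylex();
    expr(&e);
    assert(token == 0);                  /* all input must be consumed */
    return e;
}
```

Each grammar rule becomes one C function, each repetition `[ ... ]*` becomes a `while` loop on the look-ahead token, and the rule parameters carry results back to the caller, so `parse_expr("2*3+4")` yields 10 without any precedence declarations.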

  41. LLgen interface to lexical analyzer
      • by default LLgen invokes yylex() to get the next token
      • the “value” of a token can be stored in any global variable (yylval) of any type (int)

         int yylex(void)
         {
             int c;

             c = getchar();
             if (isdigit(c)) {
                 yylval = c - '0';
                 return DIGIT;
             }
             return c;
         }

  42. LLgen interface to back-end
      • LLgen generates a user-named function (parse)
      • LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens
      • LLmessage() is invoked to notify the lexical analyzer

         %start parse, main;

         void LLmessage(int class)
         {
             switch (class) {
             case -1:
                 printf("expecting EOF, ");
                 /* fall through */
             case 0:
                 printf("deleting token (%d)\n", LLsymb);
                 break;
             default:
                 /* push back token LLsymb */
                 printf("inserting token (%d)\n", class);
                 break;
             }
         }

  43. Exercise (5 min.) • extend LLgen-based interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

  44. Answers

  45. Answers

         %token REG;

         expr(int *e) {int r,t;} :
             %if (ahead() == '=')
             reg(&r) '=' expr(e)      { reg[r] = *e;}
           | term(e)
             [ '+' term(&t)           { *e += t;}
             ]*
         ;
         factor(int *f) {int r;} :
             '(' expr(f) ')'
           | DIGIT                    { *f = yylval;}
           | reg(&r)                  { *f = reg[r];}
         ;
         reg(int *r) :
             REG                      { *r = yylval;}
         ;

  46. Answers: the %if (ahead() == '=') clause is the dynamic conflict resolution (same grammar as on slide 45)

  47. Homework
      • study sections:
        • 2.2.4.6 LLgen
        • 2.2.5.9 yacc
      • assignment 1:
        • replace yacc with LLgen
        • deadline April 9 08:59
      • print handout for next week [blackboard]
