Review: Syntax directed translation. Translation is done according to the parse tree.

Review: Syntax directed translation.
• Translation is done according to the parse tree.
• Each production (when used in parsing) corresponds to a substructure of the parse tree.
• Attributes are associated with grammar symbols.
  • Each grammar symbol represents a construct of the program.
  • Attributes represent the results of translation for the construct.
  • E.g.: translation = construct the syntax tree; use a tree attribute with each symbol.
  • E.g.: translation = calculate the result of the expression; use a val attribute to represent the result.
• Semantic rules tell what to do (how to compute the related attributes) when the substructure is found.

Two types of attributes:
• Synthesized attribute:
  • Associated with the left hand side symbol of a production.
  • The value depends on the attributes associated with the symbols on the right hand side of the production (attributes of its children nodes in the parse tree).
• Inherited attribute:
  • Associated with a symbol on the right hand side of a production.
  • The value depends on the attributes of its parent or sibling nodes in the parse tree.

Two ways to define the translation:
• Syntax directed definition.
  • Just define the attributes and semantic rules without specifying the order in which to evaluate the rules.
  • The order is implicit in the rules.
  • To realize a general syntax directed definition, the compiler conceptually needs to do the following:
    • Build the parse tree -> topologically sort the nodes based on the implicit order -> evaluate the attributes.
    • This is inefficient when it actually has to be done.
  • Some special definitions can be implemented efficiently without actually building the parse tree:
    • S-attributed definitions.
    • L-attributed definitions.
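The three conceptual steps (build the tree, sort, evaluate) can be sketched as below. This is only an illustration, assuming Python's standard graphlib module and a hand-written dependency graph for the expression 3 * 5 + 4; the attribute instance names (F1.val, T2.val, ...) and the rules table are hypothetical and stand in for a real parse tree.

```python
from graphlib import TopologicalSorter

# Hypothetical attribute instances for the parse tree of 3 * 5 + 4.
# deps maps each attribute to the attributes its semantic rule reads.
deps = {
    "F1.val": [], "F2.val": [], "F3.val": [],
    "T1.val": ["F1.val"],
    "T2.val": ["T1.val", "F2.val"],
    "E1.val": ["T2.val"],
    "T3.val": ["F3.val"],
    "E.val": ["E1.val", "T3.val"],
}

# One semantic rule per attribute instance; v holds already-computed values.
rules = {
    "F1.val": lambda v: 3,                          # F -> digit
    "F2.val": lambda v: 5,                          # F -> digit
    "F3.val": lambda v: 4,                          # F -> digit
    "T1.val": lambda v: v["F1.val"],                # T -> F
    "T2.val": lambda v: v["T1.val"] * v["F2.val"],  # T -> T1 * F
    "E1.val": lambda v: v["T2.val"],                # E -> T
    "T3.val": lambda v: v["F3.val"],                # T -> F
    "E.val": lambda v: v["E1.val"] + v["T3.val"],   # E -> E1 + T
}

def evaluate(deps, rules):
    """Evaluate every attribute in an order that respects the dependencies."""
    values = {}
    for attr in TopologicalSorter(deps).static_order():
        values[attr] = rules[attr](values)
    return values

values = evaluate(deps, rules)
print(values["E.val"])
```

The rules never state an evaluation order; the order comes entirely from the dependency graph, which is why this general approach needs the whole tree before it can start.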

Two ways to define the translation:
• Syntax directed translation scheme.
  • Not only defines the attributes and the semantic rules, but also specifies the order in which the semantic rules are applied.

Realizing an S-attributed definition in an LR parser:
• Extend the stack to have an additional field (val) for the S-attribute.

  Parser stack:
         State      val
         ...        ...
         (X, sx)    X.x
         (Y, sy)    Y.y
  top -> (Z, sz)    Z.z
         ...        ...

L  E n {print(e.val)} E  E1 + T {E.val = E1.val + T.val} E  T {E.val = T.val} T  T1 * F {T.val = T1.val + F.val} T  F {T.val = F. val} F  ( E ) {F.val = E.val} F  digit {F.val = digit.lexval} • Realizing a S-attributed definition in a LR parser (example 5.17 at page 296): L  E n {print(val[top];} E  E1 + T {val[top-2] = val[top-2] + val[top];} E  T T  T1 * F {val[top-2] = val[top-2] * val[top];} T  F F  ( E ) {val[top-2] = val[top-1];} F  digit

YACC directly supports only synthesized attributes.
• It can also handle special types of L-attributes:
  • An attribute can depend on the attributes of the siblings to its left.
  • Those attributes are already on the stack. How to access them: $i with i <= 0. See the example yacc_inherit.y.
  • Using this is somewhat tricky: you need to make sure that the stack context below the production is exactly the same wherever the production is used.
  • Markers are needed in many cases.
  • Alternatively, pass the attributes through global variables. This is also tricky.
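The idea behind $i with i <= 0 can be simulated outside YACC. The sketch below is not the yacc_inherit.y example; it assumes a hypothetical declaration grammar D -> T L, T -> int | real, L -> L , id | id, and models the parser stack as a Python list. The helper dollar and the function declare are illustrative names only.

```python
def dollar(stack, handle_len, i):
    """Value of $i for a handle of handle_len symbols on top of the stack.

    $1 is the first symbol of the handle; $0 and below lie under the handle.
    """
    return stack[len(stack) - handle_len + i - 1]

def declare(tokens):
    """Process tokens such as ['int', 'a', ',', 'b'] and return name -> type."""
    stack, table = [], {}
    stack.append(tokens[0])        # T -> int | real leaves T.type on the stack
    for tok in tokens[1:]:
        stack.append(tok)          # shift ',' or an identifier
        if tok == ",":
            continue
        # Reduce by L -> id (handle length 1) or L -> L , id (handle length 3).
        handle_len = 1 if len(stack) == 2 else 3
        name = dollar(stack, handle_len, handle_len)   # the id, last in the handle
        table[name] = dollar(stack, handle_len, 0)     # $0 is T.type below the handle
        del stack[-handle_len:]
        stack.append("L")          # replace the handle by L
    return table

table = declare(["int", "a", ",", "b", ",", "c"])
print(table)
```

This only works because T always sits directly below the handle of L, in both productions. If L could appear in another context, $0 would name a different symbol, which is the situation where markers become necessary.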

Static checking and symbol table
• Chapter 6, chapter 7.6 and chapter 8.2.
• Static checking: check whether the program follows both the syntactic and semantic conventions at compile time (versus dynamic checking, which checks at run time).
• Examples of static checking:
  • Type checks:
      int a, b[10], c; … a = b + c;
  • Flow of control checks:
      main { int I …. I++; break; }
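A type check for the a = b + c example could look like the sketch below. It is deliberately simplified: it assumes addition is defined only on int operands, and the types dictionary and check_add function are hypothetical names.

```python
# Declared types for: int a, b[10], c;
types = {"a": "int", "b": ("array", 10, "int"), "c": "int"}

def check_add(types, left, right):
    """Return the result type of left + right, or raise TypeError."""
    lt, rt = types[left], types[right]
    if lt != "int" or rt != "int":
        raise TypeError(f"cannot add {lt} and {rt}")
    return "int"

print(check_add(types, "a", "c"))
```

Calling check_add(types, "b", "c") raises TypeError, because b is an array and not an int.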

Examples of static checks
• Uniqueness check:
    main() { int i, j; double i, j; …. }
• Defined before use:
    main() { int i; i1 = 0; …. }
• Name related check:
    LOOPA: LOOP
      EXIT WHEN I=N
      I=I+1;
    END LOOP LOOPB;
• Some checks can only be done at run time:
  • Array bound checking in Java: a[i] = 0;
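The uniqueness and defined-before-use checks can be sketched in a few lines. The statement encoding below (tuples tagged "decl" or "use") and the check function are assumptions made for this illustration, not part of any real compiler.

```python
def check(statements):
    """Return a list of error messages for a flat list of statements.

    Each statement is ("decl", name, type) or ("use", name).
    """
    declared, errors = {}, []
    for kind, name, *rest in statements:
        if kind == "decl":
            if name in declared:
                errors.append(f"redeclaration of {name}")   # uniqueness check
            else:
                declared[name] = rest[0]
        elif kind == "use" and name not in declared:
            errors.append(f"{name} used before declaration")
    return errors

# int i, j; double i, j;
dup = [("decl", "i", "int"), ("decl", "j", "int"),
       ("decl", "i", "double"), ("decl", "j", "double")]
# int i; i1 = 0;
undeclared = [("decl", "i", "int"), ("use", "i1")]

print(check(dup))
print(check(undeclared))
```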

To perform static checks, semantic information must be recorded somewhere: the symbol table.
• The grammar specifies the syntax; additional (semantic) information, sometimes called attributes, must be recorded in the symbol table for all identifiers.
• Typically the attributes in a symbol table entry include type and offset (where in memory can I find this variable?).
  • struct {int id; int type; int offset;} stentry;
• Organization of a symbol table:
  • Basic requirement: must be able to find the information associated with a symbol (identifier) quickly.
  • Examples: array, linked list, hash table.
• Provides two functions: enter(table, name, type, offset) and lookup(name).
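A hash-table organization is the easiest to sketch, since a Python dict already is one. The following is a minimal illustration under that assumption; the Entry class is hypothetical, and lookup takes the table as an explicit argument here, which differs from the one-argument lookup(name) listed above.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    type: str
    offset: int   # where in memory the variable lives

def enter(table, name, type, offset):
    """Add a new entry; reject a second declaration of the same name."""
    if name in table:
        raise KeyError(f"{name} is already declared")
    table[name] = Entry(name, type, offset)

def lookup(table, name):
    """Return the entry for name, or None if it was never entered."""
    return table.get(name)

table = {}
enter(table, "j", "real", 40)
enter(table, "k", "integer", 48)
print(lookup(table, "j"))
print(lookup(table, "missing"))
```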

  program sort(input, output)
    var a: array [0..10] of integer;
        x: integer;
    procedure readarray
      var x : real;
      begin …. x …. end
    procedure quicksort(i, j)
      begin … x … end
    begin … x … end

  main() {
    int a, b;
    a = 0;
    { int a; a = 1; }
    printf("a = %d\n", a);
  }

• Dealing with nested scope:
  • How to organize the symbol table?
  • How to do lookup and enter?
  • One symbol table for each scope (procedures, blocks)?
  • Maintain a stack of symbol tables for lookup/enter.
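One way to realize the stack of symbol tables is sketched below, following the C example where an inner block redeclares a. The ScopeStack class and its method names are assumptions for illustration; enter always writes to the innermost table, and lookup searches from the innermost table outward.

```python
class ScopeStack:
    """A stack of symbol tables, one per open scope."""

    def __init__(self):
        self.tables = [{}]            # the outermost scope

    def open_scope(self):
        self.tables.append({})

    def close_scope(self):
        self.tables.pop()

    def enter(self, name, info):
        self.tables[-1][name] = info  # always the innermost scope

    def lookup(self, name):
        for table in reversed(self.tables):   # innermost scope first
            if name in table:
                return table[name]
        return None

scopes = ScopeStack()
scopes.enter("a", "outer a")
scopes.enter("b", "outer b")
scopes.open_scope()                   # { int a; a = 1; }
scopes.enter("a", "inner a")
inside = scopes.lookup("a")
scopes.close_scope()
outside = scopes.lookup("a")
print(inside, "/", outside)
```

After the inner block closes, a refers to the outer declaration again, which is why the printf in the example prints the value assigned outside the block.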

Symbol tables for sort:
[Figure: three linked symbol tables.
  Symbol table for sort: header (link: nil); entries a, x, readarray, quicksort.
  Symbol table for readarray: header; entry x.
  Symbol table for quicksort: header.]

How does the compiler create the symbol table?
• First let us consider the simple case: no nested scope, everything is entered into one symbol table (table) by using
  • enter(table, id, type, offset)
• Grammar:
    P -> D
    D -> D ; D
    D -> id : T
    T -> integer
    T -> real
    T -> array [num] of T
    T -> ^T
• Example:
    I : array [10] of integer;
    j : real;
    k : integer

    name   type                 offset
    I      array(10, integer)   0
    j      real                 40
    k      integer              48

  P -> {offset = 0;} D
  D -> D ; D
  D -> id : T             {enter(table, id.name, T.type, offset); offset = offset + T.width;}
  T -> integer            {T.type = integer; T.width = 4;}
  T -> real               {T.type = real; T.width = 8;}
  T -> array [num] of T1  {T.type = array(num.val, T1.type); T.width = num.val * T1.width;}
  T -> ^T1                {T.type = pointer(T1.type); T.width = 4;}
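The scheme can be mirrored directly in code. The sketch below assumes declarations have already been parsed into (name, type) pairs, with types written as strings or nested tuples; that encoding and the names type_info and process are hypothetical. The widths (4, 8, 4) are the ones used in the rules.

```python
def type_info(t):
    """Return (type expression, width) for a type description."""
    if t == "integer":
        return ("integer", 4)
    if t == "real":
        return ("real", 8)
    if t[0] == "array":                       # ("array", num, element type)
        _, num, elem = t
        elem_type, elem_width = type_info(elem)
        return (("array", num, elem_type), num * elem_width)
    if t[0] == "pointer":                     # ("pointer", target type)
        target_type, _ = type_info(t[1])
        return (("pointer", target_type), 4)
    raise ValueError(f"unknown type: {t!r}")

def process(decls):
    """Enter each declaration at the current offset; return (table, total width)."""
    table, offset = [], 0                     # P -> {offset = 0;} D
    for name, t in decls:                     # D -> id : T
        typ, width = type_info(t)
        table.append((name, typ, offset))
        offset += width
    return table, offset

decls = [("I", ("array", 10, "integer")), ("j", "real"), ("k", "integer")]
table, total = process(decls)
for row in table:
    print(row)
print(total)
```

For the three declarations I, j, k this reproduces the offsets 0, 40 and 48.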

Now consider the case of nested procedures (blocks can be considered special procedures).
• Must maintain a stack of symbol tables and create a new one when entering a new procedure.
• Must reset offset when entering a new procedure (a stack of offsets).
• Let us also compute the total size of a table.
• Grammar:
    P -> D
    D -> D ; D
    D -> id : T
    D -> proc id ; D ; S
    T -> integer | real | array [num] of T | ^T

• mktable(previous): make a new table and properly set all links and related information.
• enter(table, name, type, offset): enter name with its type and offset into table.
• addwidth(table, width): record the total memory needed by the entries of the symbol table.
• enterproc(table, name, newtable): enter the procedure name with its symbol table into the old table.
• Grammar:
    P -> {t = mktable(nil); push(t, tblptr); push(0, offset);}
         D
         {addwidth(top(tblptr), top(offset));}
    D -> D ; D
    D -> id : T
         {enter(top(tblptr), id.name, T.type, top(offset));
          top(offset) = top(offset) + T.width;}
    D -> proc id ;
         {t = mktable(top(tblptr)); push(t, tblptr); push(0, offset);}
         D ; S
         {t = top(tblptr); addwidth(t, top(offset));
          pop(tblptr); pop(offset);
          enterproc(top(tblptr), id.name, t);}
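The following sketch walks already-parsed declarations and performs the same stack operations as the actions in the grammar. It rests on several assumptions: the Table class, the tuple encoding of declarations, and the declare and translate functions are hypothetical, only integer and real are supported, and the variable names in the example are made up.

```python
class Table:
    def __init__(self, previous):
        self.previous = previous   # link to the enclosing table (None for nil)
        self.width = 0
        self.entries = {}

def mktable(previous):
    return Table(previous)

def enter(table, name, type, offset):
    table.entries[name] = (type, offset)

def addwidth(table, width):
    table.width = width

def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)

WIDTH = {"integer": 4, "real": 8}

def declare(decls, tblptr, offset):
    for d in decls:
        if d[0] == "var":                       # D -> id : T
            _, name, typ = d
            enter(tblptr[-1], name, typ, offset[-1])
            offset[-1] += WIDTH[typ]
        else:                                   # D -> proc id ; D ; S
            _, name, inner = d
            tblptr.append(mktable(tblptr[-1]))  # new table, linked to the old top
            offset.append(0)                    # offsets restart at 0
            declare(inner, tblptr, offset)
            t = tblptr.pop()
            addwidth(t, offset.pop())
            enterproc(tblptr[-1], name, t)

def translate(decls):                           # P -> D
    tblptr, offset = [mktable(None)], [0]
    declare(decls, tblptr, offset)
    addwidth(tblptr[-1], offset[-1])
    return tblptr[-1]

root = translate([
    ("var", "x", "integer"),
    ("proc", "readarray", [("var", "x", "real")]),
    ("proc", "quicksort", [("var", "k", "integer"), ("var", "v", "real")]),
])
print(root.width, root.entries["quicksort"][1].width)
```

A procedure name takes no space in the enclosing table here, so the width of the outer table counts only its own variables.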

Dealing with structures (records):
• T -> record D end
• Make a new symbol table for all the fields in the record.

  T -> record
       {t = mktable(nil); push(t, tblptr); push(0, offset);}
       D end
       {T.type = record(top(tblptr)); T.width = top(offset);
        pop(tblptr); pop(offset);}
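In code, a record type gets its own field table and its own offset counter, and its width is the final value of that counter. The sketch below assumes a tuple encoding ("record", [(field, type), ...]) and supports only integer and real base types; type_info and WIDTH are hypothetical names.

```python
WIDTH = {"integer": 4, "real": 8}

def type_info(t):
    """Return (type expression, width) for a type description."""
    if isinstance(t, str) and t in WIDTH:
        return (t, WIDTH[t])
    if t[0] == "record":
        fields, offset = {}, 0            # new table and a fresh offset
        for name, field_type in t[1]:
            ftype, fwidth = type_info(field_type)
            fields[name] = (ftype, offset)
            offset += fwidth
        return (("record", fields), offset)   # T.width = final offset
    raise ValueError(f"unknown type: {t!r}")

point = ("record", [("x", "integer"), ("y", "real")])
typ, width = type_info(point)
print(typ)
print(width)
```

Field offsets are relative to the start of the record, which is why the offset counter restarts at 0 for each record.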

Question: How does allowing variable declarations anywhere in a program (as in C++ and Java) affect the maintenance of the symbol tables?
