Chap. 4, Intermediate Code Generation

Compilation in a Nutshell 1 [Figure: source code (character stream) "if (b == 0) a = b;" → lexical analysis → token stream → parsing → abstract syntax tree (AST) → semantic analysis → decorated AST annotated with types (boolean, int, lvalue)]

Compilation in a Nutshell 2 [Figure: decorated AST → intermediate code generation → IR tree built from CJUMP, MEM, CONST, MOVE and NOP nodes over fp+4 and fp+8 → optimization → code generation → CMP CX, 0; CMOVZ DX, CX]

Outline • Intermediate Code Representation • Expressions Translation • Array Element Translation • Type Conversion • Boolean Expression Translation • Procedure Translation

Role of Intermediate Code • Closer to the target language: simplifies code generation. • Machine-independent: simplifies retargeting of the compiler. • Allows a variety of optimizations to be implemented in a machine-independent way. • Many compilers use several different intermediate representations.

Different Kinds of IRs • Graphical IRs: the program structure is represented as a graph (or tree). Examples: parse trees, syntax trees, DAGs. • Linear IRs: the program is represented as a list of instructions for some virtual machine. Example: three-address code. • Hybrid IRs: combine elements of graphical and linear IRs.

Graphical IRs 1: Parse Trees • A parse tree is a tree representation of a derivation during parsing. • Constructing a parse tree: • The root is the start symbol S of the grammar. • Given a parse tree for αXβ, if the next derivation step is αXβ ⇒ αγ1…γnβ, then the new parse tree is obtained by giving the leaf X the children γ1, …, γn.

Graphical IRs 2: Abstract Syntax Trees (AST) A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree. • Each node represents a computation to be performed. • The children of the node represent what that computation is performed on.

Graphical IRs 3: Directed Acyclic Graphs (DAGs) A DAG is a contraction of an AST that avoids duplication of nodes. • Reduces compiler memory requirements. • Exposes redundancies. E.g.: for the expression (x+y)*(x+y), the AST contains two separate x+y subtrees, whereas the DAG contains a single x+y node that serves as both operands of *.
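One common way to obtain a DAG rather than a tree is to look each node up in a table before creating it, so that an identical node is reused. The following is only a minimal sketch of that idea; the names DagBuilder, leaf and op are hypothetical helpers, not part of the slides.

```python
class DagBuilder:
    """Builds expression nodes, reusing a node when an identical one already exists."""

    def __init__(self):
        self.nodes = []   # node id -> (label, left id, right id)
        self.index = {}   # (label, left id, right id) -> node id

    def _node(self, key):
        # Create the node only if no structurally identical node exists yet.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._node((name, None, None))

    def op(self, operator, left, right):
        return self._node((operator, left, right))


b = DagBuilder()
left = b.op("+", b.leaf("x"), b.leaf("y"))
right = b.op("+", b.leaf("x"), b.leaf("y"))   # finds the existing "+" node
root = b.op("*", left, right)

print(left == right)     # True: both operands of "*" are the same node
print(len(b.nodes))      # 4 nodes (x, y, x+y, the product); the AST needs 7
```

The table lookup is what exposes the redundancy: the second x+y is detected as already present at the moment it would be created.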

Linear IR 1: Three Address Code • Instructions are of the form ‘x = y op z’, where x, y, z are variables, constants, or “temporaries”. • At most one operator is allowed on the RHS, so there are no “built-up” expressions.

Three Address Code: Example
• Source: if ( x + y*z > x*y + z) a = 0;
• Three Address Code:
  t1 = y*z
  t2 = x+t1      // x + y*z
  t3 = x*y
  t4 = t3+z      // x*y + z
  if (t2 <= t4) goto L
  a = 0
  L:

An Example Intermediate Instruction Set
• Assignment: x = y op z (op binary); x = op y (op unary); x = y
• Jumps: if ( x op y ) goto L (L a label); goto L
• Pointer and indexed assignments: x = y[ z ]; y[ z ] = x; x = &y; x = *y; *y = x
• Procedure call/return: param x, k (x is the kth param); retval x; call p; enter p; leave p; return; retrieve x
• Type conversion: x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
• Miscellaneous: label L

Three Representations of Instructions • Three representations of instructions in a data structure • Quadruples • Triples • Indirect triples

Quadruples • Quadruple (quad): four fields • op, arg1, arg2, result • Exceptions: • Unary operators: no arg2 • Param: no arg2 and result • Conditional and unconditional jumps: put the target label in result

Quadruples for a=c*b+c*b
       op    arg1   arg2   result
(1)    *     c      b      T1
(2)    *     c      b      T2
(3)    +     T1     T2     a

Quadruples for a:=(b+c)*e+(b+c)/f

Triple • Triple : three fields • op, arg1, arg2
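To make the difference between the two representations concrete, the sketch below stores the quadruples for a=c*b+c*b and derives triples from them by replacing each temporary with the position of the triple that computes it. The function name quads_to_triples, the ("ref", n) encoding and the explicit ‘:=’ triple for the final assignment are assumptions made for illustration, not notation from the slides.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

quads = [
    Quad("*", "c", "b", "T1"),
    Quad("*", "c", "b", "T2"),
    Quad("+", "T1", "T2", "a"),
]

def quads_to_triples(quads, temps):
    """Replace each temporary name by the index of the triple that computes it."""
    position = {}   # temporary name -> index of the defining triple
    triples = []

    def ref(arg):
        return ("ref", position[arg]) if arg in position else arg

    for q in quads:
        triples.append((q.op, ref(q.arg1), ref(q.arg2)))
        if q.result in temps:
            position[q.result] = len(triples) - 1
        else:
            # A named variable needs an explicit assignment triple.
            triples.append((":=", q.result, ("ref", len(triples) - 1)))
    return triples

triples = quads_to_triples(quads, temps={"T1", "T2"})
for n, t in enumerate(triples):
    print(n, t)
```

Because a triple has no result field, other triples refer to its value by position, which is why the temporaries T1 and T2 disappear.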

SDD for Expression Translation • The synthesized attribute S.code represents the three-address code for non-terminal S. • Each non-terminal E has two attributes: • E.place represents the place that stores E’s value. • E.code represents the three-address code for non-terminal E. • Function newtemp returns a different temporary variable, such as T1, T2, …, on each call.

Three-address Code Generation: SDD for Expression Translation
Production     Semantic Rules
S→id:=E        S.code:=E.code || gen(id.place ‘:=’ E.place)
E→E1+E2        E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E→E1*E2        E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
E→-E1          E.place:=newtemp; E.code:=E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E→(E1)         E.place:=E1.place; E.code:=E1.code
E→id           E.place:=id.place; E.code:=‘ ’

Three-address Code Generation: SDT for Expression Translation
S→id:=E        { p:=lookup(id.name); if p≠nil then emit(p ‘:=’ E.place) else error }
E→E1+E2        { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
E→E1*E2        { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘*’ E2.place) }

Three-address Code Generation: SDT for Expression Translation (continued)
E→-E1          { E.place:=newtemp; emit(E.place ‘:=’ ‘uminus’ E1.place) }
E→(E1)         { E.place:=E1.place }
E→id           { p:=lookup(id.name); if p≠nil then E.place:=p else error }

a:=(b+c)*e+(b+c)/f
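The translation scheme can be tried out on this assignment with a small recursive translator. This is a sketch under stated assumptions: expressions are given as nested tuples rather than parsed, the symbol-table lookup is omitted, and ‘/’ is handled like ‘+’ and ‘*’ even though the grammar on the slides lists no production for it.

```python
code = []          # emitted three-address instructions
temp_count = 0

def newtemp():
    """Return a fresh temporary name T1, T2, ... on each call."""
    global temp_count
    temp_count += 1
    return f"T{temp_count}"

def emit(instr):
    code.append(instr)

def translate(e):
    """Return E.place for expression e, emitting code for its operands first."""
    if isinstance(e, str):              # E -> id
        return e
    if e[0] == "uminus":                # E -> -E1
        p1 = translate(e[1])
        place = newtemp()
        emit(f"{place} := uminus {p1}")
        return place
    op, e1, e2 = e                      # E -> E1 op E2
    p1, p2 = translate(e1), translate(e2)
    place = newtemp()
    emit(f"{place} := {p1} {op} {p2}")
    return place

def assign(name, e):                    # S -> id := E
    emit(f"{name} := {translate(e)}")

assign("a", ("+", ("*", ("+", "b", "c"), "e"), ("/", ("+", "b", "c"), "f")))
print("\n".join(code))
# T1 := b + c
# T2 := T1 * e
# T3 := b + c
# T4 := T3 / f
# T5 := T2 + T4
# a := T5
```

Note that b + c is computed twice (T1 and T3): the scheme translates the tree node by node and does not detect common subexpressions.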

Addressing Array Elements
A: array[1..2, 1..3]
• Column major: A[1, 1], A[2, 1], A[1, 2], A[2, 2], A[1, 3], A[2, 3]
• Row major: A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2], A[2, 3]
Address of A[i1, i2] in row-major order (n2 is the number of elements in the second dimension, w the element width):
  base + ((i1 - low1) × n2 + (i2 - low2)) × w
  = ((i1 × n2) + i2) × w + (base - ((low1 × n2) + low2) × w)

Addressing Array Elements
• For an array A[low..low+n-1] with n elements, A[i] begins at: base + (i - low) × w
• For k-dimensional arrays, where lowj is the lower bound and nj the number of elements of the j-th dimension, A[i1, i2, …, ik] begins at:
  ((…((i1 × n2 + i2) × n3 + i3)…) × nk + ik) × w                                  (VARPART)
  + base - ((…((low1 × n2 + low2) × n3 + low3)…) × nk + lowk) × w       (CONSPART)
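The split into VARPART and CONSPART can be checked numerically. The sketch below evaluates both parts with the same nested multiply-and-add pattern as the formula; the function name element_address and the values base = 1000 and w = 4 are assumptions chosen only for the example.

```python
def element_address(base, w, lows, sizes, idx):
    """Row-major address of A[idx], computed as CONSPART + VARPART.

    lows[j]  : lower bound of dimension j
    sizes[j] : number of elements n_j in dimension j
    """
    var = 0   # ((i1*n2 + i2)*n3 + i3)...
    con = 0   # ((low1*n2 + low2)*n3 + low3)...
    for i, low, n in zip(idx, lows, sizes):
        var = var * n + i
        con = con * n + low
    varpart = var * w             # depends on the index values (run time)
    conspart = base - con * w     # depends only on the declaration (compile time)
    return conspart + varpart

# A: array[1..2, 1..3], assumed to start at address 1000 with 4-byte elements.
lows, sizes = [1, 1], [2, 3]
print(element_address(1000, 4, lows, sizes, [1, 1]))   # 1000, the first element
print(element_address(1000, 4, lows, sizes, [1, 2]))   # 1004
print(element_address(1000, 4, lows, sizes, [2, 3]))   # 1020, the sixth element
```

Only VARPART has to be computed by generated code; CONSPART can be folded into a single constant when the array bounds are known at compile time.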

Array Element Processing Grammar
• L → id [ Elist ] | id
  Elist → Elist, E | E
• To facilitate processing, we rewrite the grammar as
  L → Elist ] | id
  Elist → Elist, E | id [ E

New attributes and functions • Elist.array: symbol table entry of id. • Elist.ndim: number of dimensions processed so far. • Elist.place: a temporary variable that stores the value calculated from the index expressions. • limit(array, j): returns the length of the j-th dimension.

Each non-terminal L has two attribute values • L.place: • Symbol table entry of L if L is a simple variable • CONSPART value if L is an indexed variable • L.offset: • Null if L is a simple variable • VARPART value if L is an indexed variable

(1) S→L:=E
(2) E→E+E
(3) E→(E)
(4) E→L
(5) L→Elist ]
(6) L→id
(7) Elist→Elist, E
(8) Elist→id [ E

(1) S→L:=E
    { if L.offset=null then /* L is a simple variable */
          emit(L.place ‘:=’ E.place)
      else
          emit(L.place ‘[’ L.offset ‘]’ ‘:=’ E.place) }
(2) E→E1+E2
    { E.place:=newtemp;
      emit(E.place ‘:=’ E1.place ‘+’ E2.place) }

(3) E→(E1)
    { E.place:=E1.place }
(4) E→L
    { if L.offset=null then
          E.place:=L.place
      else begin
          E.place:=newtemp;
          emit(E.place ‘:=’ L.place ‘[’ L.offset ‘]’)
      end }

Address of A[i1, i2, …, ik]:
  ((…((i1 × n2 + i2) × n3 + i3)…) × nk + ik) × w + base - ((…((low1 × n2 + low2) × n3 + low3)…) × nk + lowk) × w

(8) Elist→id [ E
    { Elist.place:=E.place; Elist.ndim:=1; Elist.array:=id.place }

(7) Elist→Elist1, E
    { t:=newtemp; m:=Elist1.ndim+1;
      emit(t ‘:=’ Elist1.place ‘*’ limit(Elist1.array, m));
      emit(t ‘:=’ t ‘+’ E.place);
      Elist.array:=Elist1.array; Elist.place:=t; Elist.ndim:=m }

(5) L→Elist ]
    { L.place:=newtemp; emit(L.place ‘:=’ Elist.array ‘-’ C);
      L.offset:=newtemp; emit(L.offset ‘:=’ w ‘*’ Elist.place) }
    where C = ((…((low1 × n2 + low2) × n3 + low3)…) × nk + lowk) × w

(6) L→id
    { L.place:=id.place; L.offset:=null }

a:=B[i,j]
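The actions for array references can be traced on this assignment with a short simulation. The declaration B: array[1..10, 1..20] with 4-byte elements is an assumption made for the example (the slide does not state B's bounds), and array_ref is a hypothetical helper that performs actions (8), (7) and (5) in sequence instead of driving them from a parser.

```python
code, counter = [], 0

def newtemp():
    global counter
    counter += 1
    return f"T{counter}"

def emit(s):
    code.append(s)

# Hypothetical symbol-table entry for B: array[1..10, 1..20] of 4-byte elements.
arrays = {"B": {"lows": [1, 1], "sizes": [10, 20], "w": 4}}

def limit(array, j):
    return arrays[array]["sizes"][j - 1]       # length of the j-th dimension

def array_ref(array, index_places):
    """Actions (8), (7) and (5): return (L.place, L.offset) for array[index_places]."""
    place, ndim = index_places[0], 1           # (8) Elist -> id [ E
    for e_place in index_places[1:]:           # (7) Elist -> Elist1 , E
        t, ndim = newtemp(), ndim + 1
        emit(f"{t} := {place} * {limit(array, ndim)}")
        emit(f"{t} := {t} + {e_place}")
        place = t
    info, c = arrays[array], 0                 # (5) L -> Elist ]
    for low, n in zip(info["lows"], info["sizes"]):
        c = c * n + low                        # (low1*n2 + low2)...
    l_place, l_offset = newtemp(), newtemp()
    emit(f"{l_place} := {array} - {c * info['w']}")
    emit(f"{l_offset} := {info['w']} * {place}")
    return l_place, l_offset

l_place, l_offset = array_ref("B", ["i", "j"])
value = newtemp()                              # (4) E -> L, indexed case
emit(f"{value} := {l_place}[{l_offset}]")
emit(f"a := {value}")                          # (1) S -> L := E, a is a simple variable
print("\n".join(code))
# T1 := i * 20
# T1 := T1 + j
# T2 := B - 84
# T3 := 4 * T1
# T4 := T2[T3]
# a := T4
```

With these assumed bounds the constant subtracted from the base is (1 × 20 + 1) × 4 = 84.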

Type Conversion • E.type: the data type of non-terminal E • Suppose there are two data types, each with its own operators: • int op • real op • The semantic action for E → E1 op E2: { if E1.type=integer and E2.type=integer then E.type:=integer else E.type:=real }

Type Conversion Example
• x:=y+i*j, in which x, y are real and i, j are int.
• Three-address code:
  T1:=i int* j
  T3:=inttoreal T1
  T2:=y real+ T3
  x:=T2

Semantic Action for E→E1 + E2
{ E.place:=newtemp;
  if E1.type=integer and E2.type=integer then begin
      emit(E.place ‘:=’ E1.place ‘int+’ E2.place);
      E.type:=integer
  end
  else if E1.type=real and E2.type=real then begin
      emit(E.place ‘:=’ E1.place ‘real+’ E2.place);
      E.type:=real
  end
  else if E1.type=integer and E2.type=real then begin
      u:=newtemp;
      emit(u ‘:=’ ‘inttoreal’ E1.place);
      emit(E.place ‘:=’ u ‘real+’ E2.place);
      E.type:=real
  end
  else if E1.type=real and E2.type=integer then begin
      u:=newtemp;
      emit(u ‘:=’ ‘inttoreal’ E2.place);
      emit(E.place ‘:=’ E1.place ‘real+’ u);
      E.type:=real
  end
  else E.type:=type_error }
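The semantic action can be exercised on x:=y+i*j with a compact sketch. The function gen_binary is a hypothetical helper that folds the four cases into one routine and is applied to ‘*’ as well as ‘+’; operands are passed as (place, type) pairs instead of grammar attributes.

```python
code, counter = [], 0

def newtemp():
    global counter
    counter += 1
    return f"T{counter}"

def emit(s):
    code.append(s)

def gen_binary(op, e1, e2):
    """e1 and e2 are (place, type) pairs; returns (place, type) of the result."""
    (p1, t1), (p2, t2) = e1, e2
    place = newtemp()                          # E.place := newtemp
    if t1 == "integer" and t2 == "integer":
        emit(f"{place} := {p1} int{op} {p2}")
        return place, "integer"
    if {t1, t2} <= {"integer", "real"}:
        # At least one operand is real: widen the integer operand, if any.
        if t1 == "integer":
            u = newtemp()
            emit(f"{u} := inttoreal {p1}")
            p1 = u
        elif t2 == "integer":
            u = newtemp()
            emit(f"{u} := inttoreal {p2}")
            p2 = u
        emit(f"{place} := {p1} real{op} {p2}")
        return place, "real"
    return place, "type_error"

# x := y + i*j with x, y real and i, j integer
prod = gen_binary("*", ("i", "integer"), ("j", "integer"))
total = gen_binary("+", ("y", "real"), prod)
emit(f"x := {total[0]}")
print("\n".join(code))
# T1 := i int* j
# T3 := inttoreal T1
# T2 := y real+ T3
# x := T2
```

T2 is allocated before T3 because E.place is created at the start of the action, before the conversion temporary u, which matches the numbering in the example.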

Two translation methods
• Direct translation: A or B and C=D
  (1) (=, C, D, T1)
  (2) (and, B, T1, T2)
  (3) (or, A, T2, T3)
• Translation with optimization: if (x<100 or x>200 and x<>y) x:=0;
  if x<100 goto L2
  ifFalse x>200 goto L1
  ifFalse x<>y goto L1
  L2: x=0
  L1:

Outline • Three-Address Code • Expressions Translation • Array Element Translation • Type Conversion • Boolean Expression Translation • Direct Translation • Optimized Translation • Backpatching • Procedure Translation

Direct translation
• a or b and not c can be translated into
  T1:=not c
  T2:=b and T1
  T3:=a or T2
• a<b can be written as if a<b then 1 else 0. Hence, it can be translated into
  100: if a<b goto 103
  101: T:=0
  102: goto 104
  103: T:=1
  104:

Boolean Expression Direct Translation SDT • emit: prints a three-address instruction to the output file • nextstat: the address (index) of the next three-address instruction • Each call to emit generates one new three-address instruction and adds 1 to nextstat

Boolean Expression Direct Translation SDT
E→E1 or E2     { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘or’ E2.place) }
E→E1 and E2    { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘and’ E2.place) }
E→not E1       { E.place:=newtemp; emit(E.place ‘:=’ ‘not’ E1.place) }
E→(E1)         { E.place:=E1.place }

Boolean Expression Direct Translation SDT
• a<b is translated into
  100: if a<b goto 103
  101: T:=0
  102: goto 104
  103: T:=1
  104:
E→id1 relop id2
    { E.place:=newtemp;
      emit(‘if’ id1.place relop.op id2.place ‘goto’ nextstat+3);
      emit(E.place ‘:=’ ‘0’);
      emit(‘goto’ nextstat+2);
      emit(E.place ‘:=’ ‘1’) }
E→id
    { E.place:=id.place }

Direct Translation of a<b or c<d and e<f
Applying the actions for E→id1 relop id2, E→E1 and E2 and E→E1 or E2 gives:
  100: if a<b goto 103
  101: T1:=0
  102: goto 104
  103: T1:=1
  104: if c<d goto 107
  105: T2:=0
  106: goto 108
  107: T2:=1
  108: if e<f goto 111
  109: T3:=0
  110: goto 112
  111: T3:=1
  112: T4:=T2 and T3
  113: T5:=T1 or T4
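The listing can be reproduced by a small translator that follows the same actions. In this sketch the expression is supplied as nested tuples rather than parsed, the starting address 100 is taken from the example, and nextstat is modelled as a function over the list of instructions emitted so far; these representation choices are assumptions, not part of the slides.

```python
code = []           # code[k] holds the instruction at address START + k
START = 100
counter = 0

def nextstat():
    return START + len(code)     # address of the next instruction to be emitted

def emit(s):
    code.append(s)

def newtemp():
    global counter
    counter += 1
    return f"T{counter}"

def translate(e):
    """e is an identifier, ('relop', op, id1, id2), ('not', e1) or (op, e1, e2)."""
    if isinstance(e, str):                       # E -> id
        return e
    if e[0] == "relop":                          # E -> id1 relop id2
        _, op, a, b = e
        place = newtemp()
        emit(f"if {a}{op}{b} goto {nextstat() + 3}")
        emit(f"{place}:=0")
        emit(f"goto {nextstat() + 2}")
        emit(f"{place}:=1")
        return place
    if e[0] == "not":                            # E -> not E1
        p1 = translate(e[1])
        place = newtemp()
        emit(f"{place}:=not {p1}")
        return place
    op, e1, e2 = e                               # E -> E1 or E2 | E1 and E2
    p1 = translate(e1)
    p2 = translate(e2)
    place = newtemp()
    emit(f"{place}:={p1} {op} {p2}")
    return place

# a<b or c<d and e<f, with "and" binding tighter than "or"
expr = ("or", ("relop", "<", "a", "b"),
              ("and", ("relop", "<", "c", "d"), ("relop", "<", "e", "f")))
result = translate(expr)
for k, instr in enumerate(code):
    print(f"{START + k}: {instr}")
```

Every operand is evaluated in full here, including c<d and e<f when a<b already holds.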
