Chap. 4, Intermediate Code Generation
Compilation in a Nutshell 1
Source code (character stream): if (b == 0) a = b;
→ Lexical analysis → Token stream: if ( b == 0 ) a = b ;
→ Parsing → Abstract syntax tree (AST): an if node whose condition is == (b, 0) and whose body is the statement a = b
→ Semantic analysis → Decorated AST: b, 0 and a are typed int, the == node is typed boolean, and a is marked as an lvalue
Compilation in a Nutshell 2
Decorated AST → Intermediate code generation → IR tree built from CJUMP (on ==), MEM, CONST, MOVE and NOP nodes, with the variables accessed in memory at offsets from the frame pointer (fp + 4, fp + 8)
→ Optimization → the memory operands are replaced by registers CX (b) and DX (a)
→ Code generation: CMP CX, 0 followed by CMOVZ DX, CX
Outline • Intermediate Code Representation • Expressions Translation • Array Element Translation • Type Conversion • Boolean Expression Translation • Procedure Translation
Role of Intermediate Code • Closer to the target language than the source is, which simplifies code generation. • Machine-independent, which simplifies retargeting the compiler to a new machine. • Allows a variety of optimizations to be implemented in a machine-independent way. • Many compilers use several different intermediate representations.
Different Kinds of IRs • Graphical IRs: the program structure is represented as a graph (or tree). Examples: parse trees, syntax trees, DAGs. • Linear IRs: the program is represented as a list of instructions for some virtual machine. Example: three-address code. • Hybrid IRs: combine elements of graphical and linear IRs.
Graphical IRs 1: Parse Trees • A parse tree is a tree representation of a derivation during parsing. • Constructing a parse tree: • The root is the start symbol S of the grammar. • Given a parse tree with a leaf labeled X, if the next derivation step is X → β1…βn, the new parse tree is obtained by giving that leaf the children β1, …, βn, from left to right.
Graphical IRs 2: Abstract Syntax Trees (AST) A syntax tree shows the structure of a program by abstracting away irrelevant details of the parse tree. • Each node represents a computation to be performed; • the children of the node represent the operands of that computation.
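Graphical IRs 3: Directed Acyclic Graphs (DAGs) A DAG is a contraction of an AST that avoids duplication of nodes: identical subexpressions share a single node. • It reduces compiler memory requirements; • it exposes redundancies. E.g.: for the expression (x+y)*(x+y), the AST has two separate subtrees for x+y, while in the DAG both operands of * point to one shared + node.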
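The node sharing described above can be sketched with hash-consing, i.e. a table lookup before every node is created. This is a minimal illustration, assuming a node is identified by its operator plus the ids of its children; the class name `DagBuilder` and its methods are hypothetical, not taken from the slides.

```python
class DagBuilder:
    """Builds expression DAGs by hash-consing: an (op, left, right) node is
    created once and reused whenever the same node is requested again."""

    def __init__(self):
        self.nodes = []   # node id -> (op, left_id, right_id)
        self.table = {}   # (op, left_id, right_id) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.table:            # first request: create the node
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]               # later requests: reuse it

    def leaf(self, name):
        return self.node(name)               # leaves have no children


dag = DagBuilder()
s1 = dag.node('+', dag.leaf('x'), dag.leaf('y'))
s2 = dag.node('+', dag.leaf('x'), dag.leaf('y'))   # same node as s1
root = dag.node('*', s1, s2)
print(s1 == s2, len(dag.nodes))   # True 4 (the AST would need 7 nodes)
```

Because the second x+y is found in the table, the DAG for (x+y)*(x+y) has only four nodes: x, y, one + and the *.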
Linear IR 1: Three Address Code • Instructions have the form x = y op z, where x, y and z are variables, constants, or compiler-generated temporaries. • At most one operator is allowed on the right-hand side, so there are no built-up expressions.
Three Address Code: Example • Source: if ( x + y*z > x*y + z) a = 0; • Three Address Code:
t1 = y*z
t2 = x+t1 // x + y*z
t3 = x*y
t4 = t3+z // x*y + z
if (t2 <= t4) goto L
a = 0
L:
An Example Intermediate Instruction Set
• Assignment: x = y op z (op binary); x = op y (op unary); x = y
• Jumps: if ( x op y ) goto L (L a label); goto L
• Pointer and indexed assignments: x = y[ z ]; y[ z ] = x; x = &y; x = *y; *y = x
• Procedure call/return: param x, k (x is the kth param); retval x; call p; enter p; leave p; return; retrieve x
• Type conversion: x = cvt_A_to_B y (A, B base types), e.g. cvt_int_to_float
• Miscellaneous: label L
Three Representations of Instructions • Three-address instructions can be stored in a data structure in three ways: • Quadruples • Triples • Indirect triples
Quadruples • Quadruple (quad): four fields • op, arg1, arg2, result • Exceptions: • Unary operators: no arg2 • param: no arg2 and no result • Conditional and unconditional jumps: the target label goes in result
Quadruples for a=c*b+c*b
       op    arg1   arg2   result
(1)    *     c      b      T1
(2)    *     c      b      T2
(3)    +     T1     T2     a
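Triple • Triple: three fields • op, arg1, arg2 • There is no result field: the result of an instruction is referred to by the instruction's position.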
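To make the difference between the two representations concrete, here is a small sketch that turns the quadruples for a=c*b+c*b into triples. It assumes, as in that example, that compiler temporaries are exactly the names starting with T, and it numbers triples from 0; `quads_to_triples` is a hypothetical helper.

```python
# Quadruples for a = c*b + c*b: (op, arg1, arg2, result)
quads = [
    ('*', 'c', 'b', 'T1'),
    ('*', 'c', 'b', 'T2'),
    ('+', 'T1', 'T2', 'a'),
]


def quads_to_triples(quads, is_temp=lambda name: name.startswith('T')):
    """Drop the result field: a temporary is replaced by the position of the
    triple that computes it, and a store into a program variable becomes a
    separate '=' triple."""
    triples = []
    defined_at = {}                      # temporary -> position of its triple
    for op, arg1, arg2, result in quads:
        triples.append((op, defined_at.get(arg1, arg1), defined_at.get(arg2, arg2)))
        if is_temp(result):
            defined_at[result] = len(triples) - 1
        else:
            triples.append(('=', result, len(triples) - 1))
    return triples


for pos, triple in enumerate(quads_to_triples(quads)):
    print(pos, triple)
```

The four resulting triples are ('*', 'c', 'b'), ('*', 'c', 'b'), ('+', 0, 1) and ('=', 'a', 2): the temporaries T1 and T2 disappear because later triples refer to earlier ones by position.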
SDD for Expression Translation • The synthesized attribute S.code represents the three-address code for non-terminal S. • Each non-terminal E has two attributes: • E.place represents the place that holds E's value. • E.code represents the three-address code for E. • Function newtemp returns a new temporary variable, such as T1, T2, …, on each call.
Three-address Code Generation: SDD for Expression Translation (|| denotes concatenation of code sequences)
Production     Semantic Rules
S→id:=E        S.code:=E.code || gen(id.place ‘:=’ E.place)
E→E1+E2        E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘+’ E2.place)
E→E1*E2        E.place:=newtemp; E.code:=E1.code || E2.code || gen(E.place ‘:=’ E1.place ‘*’ E2.place)
E→-E1          E.place:=newtemp; E.code:=E1.code || gen(E.place ‘:=’ ‘uminus’ E1.place)
E→(E1)         E.place:=E1.place; E.code:=E1.code
E→id           E.place:=id.place; E.code:=‘ ’
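A direct transcription of these rules into Python may help: each call returns the pair (E.place, E.code), and code concatenation (||) becomes list concatenation. This is a sketch under assumptions: expressions are given as already-parsed nested tuples rather than produced by a parser, and `SddTranslator` is a hypothetical name.

```python
import itertools


class SddTranslator:
    """Computes (E.place, E.code) bottom-up, following the SDD rules.
    Expressions are nested tuples: a name, ('uminus', e1), or (op, e1, e2)."""

    def __init__(self):
        self._count = itertools.count(1)

    def newtemp(self):
        return f"T{next(self._count)}"          # T1, T2, ... on successive calls

    def expr(self, e):
        if isinstance(e, str):                   # E -> id: no code
            return e, []
        if e[0] == 'uminus':                     # E -> -E1
            p1, c1 = self.expr(e[1])
            place = self.newtemp()
            return place, c1 + [f"{place} := uminus {p1}"]
        op, left, right = e                      # E -> E1 + E2 | E1 * E2
        p1, c1 = self.expr(left)
        p2, c2 = self.expr(right)
        place = self.newtemp()
        return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]

    def assign(self, name, e):                   # S -> id := E
        place, code = self.expr(e)
        return code + [f"{name} := {place}"]


# a := b * -c + b * -c  (E -> (E1) is implicit in the tuple nesting)
code = SddTranslator().assign(
    'a', ('+', ('*', 'b', ('uminus', 'c')), ('*', 'b', ('uminus', 'c'))))
print("\n".join(code))
```

The output is T1 := uminus c, T2 := b * T1, T3 := uminus c, T4 := b * T3, T5 := T2 + T4, a := T5. Note that the SDD by itself does not share the repeated subexpression; that is what a DAG would expose.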
Three-address Code Generation: SDT for Expression Translation (instead of building E.code, each action calls emit to write its instruction to the output directly)
S→id:=E { p:=lookup(id.name); if p≠nil then emit(p ‘:=’ E.place) else error }
E→E1+E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
E→E1*E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘*’ E2.place) }
E→-E1 { E.place:=newtemp; emit(E.place ‘:=’ ‘uminus’ E1.place) }
E→(E1) { E.place:=E1.place }
E→id { p:=lookup(id.name); if p≠nil then E.place:=p else error }
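In the SDT version, code is no longer carried in an attribute; every action appends to one global output through emit. The sketch below assumes a hypothetical symbol table holding the declared names a, b and c, with `None` standing in for nil and a raised `NameError` standing in for error; `expr` and `assign` are illustrative helpers, not part of the slides.

```python
import itertools

symtab = {'a', 'b', 'c'}       # hypothetical symbol table: declared names
code = []                      # instructions written by emit, in order
_temps = itertools.count(1)


def newtemp():
    return f"T{next(_temps)}"


def emit(instr):
    code.append(instr)         # stands in for writing to the output file


def lookup(name):
    return name if name in symtab else None     # None plays the role of nil


def expr(e):
    """Return E.place; instructions are emitted as a side effect."""
    if isinstance(e, str):                      # E -> id
        p = lookup(e)
        if p is None:
            raise NameError(f"undeclared identifier: {e}")
        return p
    if e[0] == 'uminus':                        # E -> -E1
        p1 = expr(e[1])
        place = newtemp()
        emit(f"{place} := uminus {p1}")
        return place
    op, left, right = e                         # E -> E1 + E2 | E1 * E2
    p1 = expr(left)
    p2 = expr(right)
    place = newtemp()
    emit(f"{place} := {p1} {op} {p2}")
    return place


def assign(name, e):                            # S -> id := E
    place = expr(e)                             # E is reduced before the action runs
    p = lookup(name)
    if p is None:
        raise NameError(f"undeclared identifier: {name}")
    emit(f"{p} := {place}")


assign('a', ('+', 'b', ('*', 'c', 'b')))
print("\n".join(code))
```

For a := b + c*b this emits T1 := c * b, T2 := b + T1, a := T2. Emitting as a side effect avoids copying ever-longer code lists up the tree, at the cost of depending on the order in which actions run.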
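Addressing Array Elements
A: array[1..2, 1..3]
• Column major: A[1, 1], A[2, 1], A[1, 2], A[2, 2], A[1, 3], A[2, 3]
• Row major: A[1, 1], A[1, 2], A[1, 3], A[2, 1], A[2, 2], A[2, 3]
In row-major order, with n2 elements in the second dimension and element width w, the address of A[i1, i2] is
$$\text{base} + \big((i_1 - \mathit{low}_1) \times n_2 + (i_2 - \mathit{low}_2)\big) \times w = \big((i_1 \times n_2) + i_2\big) \times w + \big(\text{base} - ((\mathit{low}_1 \times n_2) + \mathit{low}_2) \times w\big)$$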
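Addressing Array Elements
• For an array A[low..low+n-1] with n elements, A[i] begins at base + (i − low) × w.
• For k-dimensional arrays, where low_j is the lower bound of the j-th dimension and n_j is its number of elements, the address of A[i1, i2, …, ik] is
$$\underbrace{((\cdots((i_1 n_2 + i_2) n_3 + i_3)\cdots) n_k + i_k) \times w}_{\text{VARPART}} + \underbrace{\text{base} - ((\cdots((\mathit{low}_1 n_2 + \mathit{low}_2) n_3 + \mathit{low}_3)\cdots) n_k + \mathit{low}_k) \times w}_{\text{CONSPART}}$$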
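The split into VARPART and CONSPART can be checked numerically. The sketch assumes the 2×3 array A: array[1..2, 1..3] from the previous slide, with a made-up base address of 1000 and element width w = 4, and compares the split form against the direct row-major formula for every element; all function names are hypothetical.

```python
def address_direct(base, w, lows, dims, idx):
    """Row-major address base + offset * w, with the offset built by Horner's rule."""
    offset = 0
    for i, low, n in zip(idx, lows, dims):
        offset = offset * n + (i - low)
    return base + offset * w


def varpart(w, dims, idx):
    """((...((i1*n2 + i2)*n3 + i3)...)*nk + ik) * w: depends on the indices."""
    v = idx[0]
    for i, n in zip(idx[1:], dims[1:]):
        v = v * n + i
    return v * w


def conspart(base, w, lows, dims):
    """base - ((...((low1*n2 + low2)*n3 + low3)...)*nk + lowk) * w: a compile-time constant."""
    c = lows[0]
    for low, n in zip(lows[1:], dims[1:]):
        c = c * n + low
    return base - c * w


# A: array[1..2, 1..3], assuming base address 1000 and element width w = 4
base, w, lows, dims = 1000, 4, (1, 1), (2, 3)
for i1 in range(1, 3):
    for i2 in range(1, 4):
        split = varpart(w, dims, (i1, i2)) + conspart(base, w, lows, dims)
        assert split == address_direct(base, w, lows, dims, (i1, i2))
        print(f"A[{i1}, {i2}] -> {split}")
```

CONSPART is 984 here, and the printed addresses 1000, 1004, …, 1020 follow the row-major order listed above. Only VARPART has to be computed at run time.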
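Array Element Processing Grammar
• L → id [ Elist ] | id
  Elist → Elist, E | E
• To facilitate processing, we rewrite the grammar as
  L → Elist ] | id
  Elist → Elist, E | id [ E
so that id belongs to the first Elist and its symbol-table entry can be passed along (as Elist.array) while the index expressions are translated.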
New attributes and functions • Elist.array: symbol-table entry of id • Elist.ndim: number of dimensions (index expressions) processed so far • Elist.place: a temporary variable holding the value computed so far from the index expressions • limit(array, j): returns the length of the j-th dimension
Each non-terminal L has two attributes • L.place: • the symbol-table entry of L if L is a simple variable • the CONSPART value if L is an indexed variable • L.offset: • null if L is a simple variable • the VARPART value if L is an indexed variable
(1) S→L:=E
(2) E→E+E
(3) E→(E)
(4) E→L
(5) L→Elist ]
(6) L→id
(7) Elist→Elist, E
(8) Elist→id [ E
(1) S→L:=E { if L.offset=null then /* L is a simple variable */ emit(L.place ‘:=’ E.place) else emit(L.place ‘[’ L.offset ‘]’ ‘:=’ E.place) }
(2) E→E1+E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘+’ E2.place) }
(3) E→(E1) {E.place:=E1.place} (4) E→L { if L.offset=null then E.place:=L.place else begin E.place:=newtemp; emit(E.place ‘:=’ L.place ‘[’ L.offset ‘]’ ) end }
(8) Elist→id [ E { Elist.place:=E.place; Elist.ndim:=1; Elist.array:=id.place }
(7) Elist→Elist1, E { t:=newtemp; m:=Elist1.ndim+1; emit(t ‘:=’ Elist1.place ‘*’ limit(Elist1.array, m)); emit(t ‘:=’ t ‘+’ E.place); Elist.array:=Elist1.array; Elist.place:=t; Elist.ndim:=m }
(5) L→Elist ] { L.place:=newtemp; emit(L.place ‘:=’ Elist.array ‘-’ C); L.offset:=newtemp; emit(L.offset ‘:=’ w ‘*’ Elist.place) }
Here Elist.array stands for the base address of the array and C = ((…((low1 n2+low2)n3+low3)…)nk+lowk)×w is a compile-time constant, so L.place receives CONSPART and L.offset receives VARPART.
(6) L→id { L.place:=id.place; L.offset:=null }
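Putting rules (8), (7), (5), (4) and (1) together, the sketch below translates x := A[i, j]. It assumes a hypothetical declaration A: array[1..10, 1..20] with w = 4, so limit(A, 2) = 20 and C = (1×20 + 1)×4 = 84; the indices are plain identifiers, so their E.place is the name itself. In the emitted code, A stands for the base address of A, as Elist.array does in rule (5). `array_ref` and `const_c` are illustrative names.

```python
import itertools

# Hypothetical symbol-table entry: A: array[1..10, 1..20] with 4-byte elements
ARRAYS = {'A': {'lows': (1, 1), 'dims': (10, 20), 'w': 4}}
code = []
_temps = itertools.count(1)


def newtemp():
    return f"T{next(_temps)}"


def emit(instr):
    code.append(instr)


def limit(array, j):
    return ARRAYS[array]['dims'][j - 1]       # length of the j-th dimension


def const_c(array):
    """C = ((...((low1*n2 + low2)*n3 + ...)*nk + lowk) * w, known at compile time."""
    a = ARRAYS[array]
    c = a['lows'][0]
    for low, n in zip(a['lows'][1:], a['dims'][1:]):
        c = c * n + low
    return c * a['w']


def array_ref(array, index_places):
    """Apply rules (8), (7), (5) to id[E1, ..., Ek]; return (L.place, L.offset)."""
    place, ndim = index_places[0], 1           # (8) Elist -> id [ E
    for e_place in index_places[1:]:           # (7) Elist -> Elist1 , E
        t = newtemp()
        m = ndim + 1
        emit(f"{t} := {place} * {limit(array, m)}")
        emit(f"{t} := {t} + {e_place}")
        place, ndim = t, m
    l_place = newtemp()                        # (5) L -> Elist ]
    emit(f"{l_place} := {array} - {const_c(array)}")   # base address minus C
    l_offset = newtemp()
    emit(f"{l_offset} := {ARRAYS[array]['w']} * {place}")
    return l_place, l_offset


# x := A[i, j]
l_place, l_offset = array_ref('A', ['i', 'j'])
t = newtemp()
emit(f"{t} := {l_place}[{l_offset}]")          # (4) E -> L with L indexed
emit(f"x := {t}")                              # (1) S -> L := E with L simple
print("\n".join(code))
```

The emitted sequence is T1 := i * 20, T1 := T1 + j, T2 := A - 84, T3 := 4 * T1, T4 := T2[T3], x := T4. Only the VARPART instructions depend on i and j; T2 could be folded to a constant once the base address is known.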
Type Conversion • E.type: the data type of non-terminal E • Suppose there are two data types, integer and real, and each operator op comes in two versions: • int op • real op • The semantic action for E→E1 op E2: { if E1.type=integer and E2.type=integer then E.type:=integer else E.type:=real }
Type Conversion Example • x:=y+i*j, in which x and y are real and i and j are integers. Three-address code:
T1:=i int* j
T3:=inttoreal T1
T2:=y real+ T3
x:=T2
Semantic Action for E→E1+E2 { E.place:=newtemp; if E1.type=integer and E2.type=integer then begin emit(E.place ‘:=’ E1.place ‘int+’ E2.place); E.type:=integer end else if E1.type=real and E2.type=real then begin emit(E.place ‘:=’ E1.place ‘real+’ E2.place); E.type:=real end
else if E1.type=integer and E2.type=real then begin u:=newtemp; emit(u ‘:=’ ‘inttoreal’ E1.place); emit(E.place ‘:=’ u ‘real+’ E2.place); E.type:=real end else if E1.type=real and E2.type=integer then begin u:=newtemp; emit(u ‘:=’ ‘inttoreal’ E2.place); emit(E.place ‘:=’ E1.place ‘real+’ u); E.type:=real end else E.type:=type_error }
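The case analysis above can be sketched as a single Python function that reproduces the x:=y+i*j example. It assumes operands arrive as (place, type) pairs with types spelled integer and real, and generalizes + to any operator symbol; `arith` is a hypothetical name. E.place is allocated before u, exactly as in the semantic action, which is why the converted value ends up in T3 while the sum is in T2.

```python
import itertools

code = []
_temps = itertools.count(1)


def newtemp():
    return f"T{next(_temps)}"


def emit(instr):
    code.append(instr)


def arith(op, e1, e2):
    """E -> E1 op E2 with int/real coercion; e1 and e2 are (place, type) pairs."""
    (p1, t1), (p2, t2) = e1, e2
    place = newtemp()                            # E.place := newtemp comes first
    if t1 == t2 and t1 in ('integer', 'real'):
        prefix = 'int' if t1 == 'integer' else 'real'
        emit(f"{place} := {p1} {prefix}{op} {p2}")
        return place, t1
    if {t1, t2} == {'integer', 'real'}:
        u = newtemp()
        if t1 == 'integer':                      # convert the left operand
            emit(f"{u} := inttoreal {p1}")
            emit(f"{place} := {u} real{op} {p2}")
        else:                                    # convert the right operand
            emit(f"{u} := inttoreal {p2}")
            emit(f"{place} := {p1} real{op} {u}")
        return place, 'real'
    return place, 'type_error'


# x := y + i*j, where x, y are real and i, j are integer
rhs = arith('+', ('y', 'real'), arith('*', ('i', 'integer'), ('j', 'integer')))
emit(f"x := {rhs[0]}")     # x is real, so no conversion is needed on assignment
print("\n".join(code))
```

Python evaluates the inner arith('*', …) call before the outer one, which mirrors the bottom-up order of the SDT: i*j is reduced before y+(i*j).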
Two translation methods
• Direct translation: A or B and C=D becomes
(1) (=, C, D, T1)
(2) (and, B, T1, T2)
(3) (or, A, T2, T3)
• Translation with optimization: if (x<100 or x>200 and x<>y) x:=0; becomes
if x<100 goto L2
ifFalse x>200 goto L1
ifFalse x<>y goto L1
L2: x=0
L1:
Outline • Three-Address Code • Expressions Translation • Array Element Translation • Type Conversion • Boolean Expression Translation • Direct Translation • Optimized Translation • Backpatching • Procedure Translation
Direct translation • a or b and not c can be translated into
T1:=not c
T2:=b and T1
T3:=a or T2
• a<b can be written as if a<b then 1 else 0. Hence, it can be translated into
100: if a<b goto 103
101: T:=0
102: goto 104
103: T:=1
104:
Boolean Expression Direct Translation SDT • emit: writes a three-address instruction to the output file • nextstat: the index of the next three-address instruction • each call to emit generates one instruction and increments nextstat by 1
Boolean Expression Direct Translation SDT
E→E1 or E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘or’ E2.place) }
E→E1 and E2 { E.place:=newtemp; emit(E.place ‘:=’ E1.place ‘and’ E2.place) }
E→not E1 { E.place:=newtemp; emit(E.place ‘:=’ ‘not’ E1.place) }
E→(E1) { E.place:=E1.place }
Boolean Expression Direct Translation SDT
• a<b is translated into
100: if a<b goto 103
101: T:=0
102: goto 104
103: T:=1
104:
E→id1 relop id2 { E.place:=newtemp; emit(‘if’ id1.place relop.op id2.place ‘goto’ nextstat+3); emit(E.place ‘:=’ ‘0’); emit(‘goto’ nextstat+2); emit(E.place ‘:=’ ‘1’) }
E→id { E.place:=id.place }
Direct Translation of a<b or c<d and e<f (using the rules above, with and binding tighter than or)
100: if a<b goto 103
101: T1:=0
102: goto 104
103: T1:=1
104: if c<d goto 107
105: T2:=0
106: goto 108
107: T2:=1
108: if e<f goto 111
109: T3:=0
110: goto 112
111: T3:=1
112: T4:=T2 and T3
113: T5:=T1 or T4
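The rules can be simulated directly to reproduce the listing for a<b or c<d and e<f. This sketch assumes the expression has already been parsed into nested tuples (with and binding tighter than or), starts nextstat at 100 as in the slides, and uses the hypothetical class name `DirectBoolTranslator`.

```python
import itertools

RELOPS = ('<', '<=', '=', '<>', '>', '>=')


class DirectBoolTranslator:
    """Numerical (direct) translation of boolean expressions.
    Expressions: a name, ('not', e1), ('and' | 'or', e1, e2), or (relop, id1, id2)."""

    def __init__(self, start=100):
        self.nextstat = start            # index of the next instruction
        self.code = []                   # (index, instruction) pairs
        self._temps = itertools.count(1)

    def newtemp(self):
        return f"T{next(self._temps)}"

    def emit(self, instr):
        self.code.append((self.nextstat, instr))
        self.nextstat += 1               # each emit advances nextstat by 1

    def translate(self, e):
        """Return E.place after emitting E's code."""
        if isinstance(e, str):                           # E -> id
            return e
        op = e[0]
        if op in ('and', 'or'):                          # E -> E1 and/or E2
            p1 = self.translate(e[1])
            p2 = self.translate(e[2])
            place = self.newtemp()
            self.emit(f"{place} := {p1} {op} {p2}")
            return place
        if op == 'not':                                  # E -> not E1
            p1 = self.translate(e[1])
            place = self.newtemp()
            self.emit(f"{place} := not {p1}")
            return place
        if op in RELOPS:                                 # E -> id1 relop id2
            _, a, b = e
            place = self.newtemp()
            self.emit(f"if {a}{op}{b} goto {self.nextstat + 3}")
            self.emit(f"{place} := 0")
            self.emit(f"goto {self.nextstat + 2}")
            self.emit(f"{place} := 1")
            return place
        raise ValueError(f"unknown operator {op!r}")


# a<b or c<d and e<f, with 'and' binding tighter than 'or'
tr = DirectBoolTranslator()
tr.translate(('or', ('<', 'a', 'b'), ('and', ('<', 'c', 'd'), ('<', 'e', 'f'))))
for index, instr in tr.code:
    print(f"{index}: {instr}")
```

The goto targets nextstat+3 and nextstat+2 are computed before emit increments nextstat, which is why they land on 103 and 104 for the first comparison. The same class also reproduces the a or b and not c example (T1 := not c, T2 := b and T1, T3 := a or T2).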