Intermediate Code Generation
Mooly Sagiv
msagiv@post.tau.ac.il
Schrierber 317, 03-640-7606
Wed 10:00-12:00
html://www.math.tau.ac.il/~msagiv/courses/wcc01.html
Chapter 7 (Chapter 6 next week)
Basic Compiler Phases
Source program (string)
  → lexical analysis → Tokens
  → syntax analysis → Abstract syntax tree
  → semantic analysis
  → Translate → Intermediate representation
  → Instruction selection → Assembly
  → Register Allocation → Fin. Assembly
Why use intermediate languages?
[Figure: source languages C++, Java, C, Pascal, ML and target machines Pentium, MIPS, Sparc, each shown twice, with an IR node]
• Simplify the compilation phase
• Ultimately leads to more efficient code
• Portability of the compiler front-end
• Reusability of the compiler back-end
IR Design Goals
• Convenient to generate IR from the source
• Convenient to generate machine code from IR
• Mismatches between source and target
• Clear operational meaning
• Solution
  • Simple intermediate instructions
  • Tree-like expressions
A Grammar for the Tree IR
T_stm ::= T_stm T_stm                                  (T_SEQ)
T_stm ::= Temp_label                                   (T_LABEL)
T_stm ::= T_exp Temp_labelList                         (T_JUMP)
T_stm ::= T_relOp T_exp T_exp Temp_label Temp_label    (T_CJUMP)
T_stm ::= T_exp T_exp                                  (T_MOVE)
T_stm ::= T_exp                                        (T_EXP)
T_exp ::= T_binOp T_exp T_exp                          (T_BINOP)
T_exp ::= T_exp                                        (T_MEM)
T_exp ::= Temp_temp                                    (T_TEMP)
T_exp ::= T_stm T_exp                                  (T_ESEQ)
T_exp ::= Temp_label                                   (T_NAME)
T_exp ::= int                                          (T_CONST)
T_exp ::= T_exp T_expList                              (T_CALL)
/* tree.h */
typedef struct T_stm_ *T_stm;
typedef struct T_exp_ *T_exp;

/* Operator enumerations, declared before the prototypes that use them. */
typedef enum {T_plus, T_minus, T_mul, T_div, T_and, T_or,
              T_lshift, T_rshift, T_arshift, T_xor} T_binOp;
typedef enum {T_eq, T_ne, T_lt, T_gt, T_le, T_ge,
              T_ult, T_ule, T_ugt, T_uge} T_relOp;

struct T_stm_ {
    enum {T_SEQ, T_LABEL, T_JUMP, …, T_EXP} kind;
    union {
        struct {T_stm left, right;} SEQ;
        …
    } u;
};

/* Constructors for statements */
T_stm T_Seq(T_stm left, T_stm right);
T_stm T_Label(Temp_label);
T_stm T_Jump(T_exp exp, Temp_labelList labels);
T_stm T_Cjump(T_relOp op, T_exp left, T_exp right,
              Temp_label _true, Temp_label _false);
T_stm T_Move(T_exp, T_exp);
T_stm T_Exp(T_exp);

struct T_exp_ {
    enum {T_BINOP, T_MEM, T_TEMP, …, T_CALL} kind;
    union {
        struct {T_binOp op; T_exp left; T_exp right;} BINOP;
        …
    } u;
};
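To make the tagged-union layout concrete, here is a minimal sketch of a cut-down expression tree with only constants and binary operators. The names Exp, mk_const, mk_binop, and eval are hypothetical and are not part of tree.h; the evaluator is only there to show that each node has a clear operational meaning.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down tree IR: constants and binary operators only. */
typedef struct Exp_ *Exp;
typedef enum { OP_PLUS, OP_MINUS, OP_MUL } BinOp;

struct Exp_ {
    enum { E_CONST, E_BINOP } kind;          /* tag selecting the union member */
    union {
        int constant;                        /* valid when kind == E_CONST */
        struct { BinOp op; Exp left, right; } binop;  /* kind == E_BINOP */
    } u;
};

/* Constructor for a constant leaf. */
Exp mk_const(int n)
{
    Exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = E_CONST;
    e->u.constant = n;
    return e;
}

/* Constructor for an interior node with two subtrees. */
Exp mk_binop(BinOp op, Exp left, Exp right)
{
    Exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = E_BINOP;
    e->u.binop.op = op;
    e->u.binop.left = left;
    e->u.binop.right = right;
    return e;
}

/* Evaluate a tree bottom-up by induction on its structure. */
int eval(Exp e)
{
    if (e->kind == E_CONST)
        return e->u.constant;
    int l = eval(e->u.binop.left);
    int r = eval(e->u.binop.right);
    switch (e->u.binop.op) {
    case OP_PLUS:  return l + r;
    case OP_MINUS: return l - r;
    case OP_MUL:   return l * r;
    }
    assert(0);
    return 0;
}
```

For example, eval(mk_binop(OP_MUL, mk_const(3), mk_binop(OP_PLUS, mk_const(4), mk_const(5)))) evaluates the tree for 3 * (4 + 5).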
Example: factorial
let
  function nfactor(n: int): int =
    if n = 0 then 1
    else n * nfactor(n-1)
in
  nfactor(10)
end
Abstract Tiger Program
letExp(
  decList(
    functionDec(
      fundecList(
        fundec(nfactor,
          fieldList(field(n, int, fld-escaped=FALSE), fieldList()),
          int,
          ifExp(
            opExp(EQUAL, varExp(simpleVar(n)), intExp(0)),
            intExp(1),
            opExp(TIMES,
              varExp(simpleVar(n)),
              callExp(nfactor,
                expList(opExp(MINUS, varExp(simpleVar(n)), intExp(1)),
                        expList()))))),
        fundecList())),
    decList()),
  seqExp(
    expList(
      callExp(nfactor, expList(intExp(10), expList())),
      expList())))
IR for Main
/* prologue of main starts with l1 */
/* body of main */
MOV(TEMP(RV),
    CALL(NAME(l2),
         ExpList(CONST(10), null /* next argument */)))
/* epilogue of main */
IR for nfactor
/* prologue of nfactor starts with l2 */
/* body of nfactor */
MOV(TEMP(RV),
  ESEQ(
    SEQ(
      CJUMP(=, “n”, CONST(0), NAME(l3), NAME(l4)),
      LABEL(l3), /* then-clause */
      MOV(TEMP(t1), CONST(1)),
      JUMP(NAME(l5)),
      LABEL(l4), /* else-clause */
      MOV(TEMP(t1),
        BINOP(MUL, “n”,
          CALL(NAME(l2),
            ExpList(BINOP(MINUS, “n”, CONST(1)), null /* next argument */)))),
      LABEL(l5)),
    TEMP(t1)))
/* epilogue of nfactor */
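The control flow of this tree can be read off directly as labels and jumps. The following is a sketch in C that mirrors the IR statement by statement, assuming the labels l3, l4, l5 and the temporary t1 map onto C labels and a local variable; it is an illustration of the tree's meaning, not compiler output.

```c
#include <assert.h>

/* C rendering of the IR body: each statement corresponds to one IR node. */
int nfactor(int n)
{
    int t1;
    assert(n >= 0);                       /* the recursion only terminates for n >= 0 */
    if (n == 0) goto l3; else goto l4;    /* CJUMP(=, n, CONST(0), l3, l4) */
l3:                                       /* then-clause */
    t1 = 1;                               /* MOV(TEMP(t1), CONST(1)) */
    goto l5;                              /* JUMP(NAME(l5)) */
l4:                                       /* else-clause */
    t1 = n * nfactor(n - 1);              /* MOV(TEMP(t1), BINOP(MUL, n, CALL(...))) */
l5:
    return t1;                            /* the ESEQ yields TEMP(t1), moved into RV */
}
```

Calling nfactor(10) corresponds to the body of main.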
Outline of the Translation (translate.c)
• Top-down traversal over the abstract syntax tree
• Generate code to allocate memory for declarations and initializations (next week)
• Generate code for function declarations:
  • Prologue
  • The body expression
  • Epilogue
• Generate code for expressions
  • Value expressions
    • x + y
  • Location expressions
    • x < y
  • Statements
    • x := y
    • while-statement
The rest of this lecture
• L-values and R-values
• Arithmetic expressions
• Conditionals and loops
• Conversions
• Complex data types
  • Arrays
  • Structures
• Memory checks
L-values vs. R-values
• Assignment x := exp is compiled into:
  • Compute the address of x
  • Compute the value of exp
  • Store the value of exp into the address of x
• Generalization
  • R-value: the value of an expression
  • L-value: the address of a location that can be assigned to
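The three steps can be sketched in C, where a pointer plays the role of an l-value and a dereference produces an r-value. The function name assign_y_plus_1 is hypothetical; it implements the single assignment x := y + 1.

```c
#include <assert.h>
#include <stddef.h>

/* x := y + 1, with x used as an l-value and y used as an r-value. */
void assign_y_plus_1(int *x_addr, const int *y_addr)
{
    assert(x_addr != NULL && y_addr != NULL);
    int *dst = x_addr;          /* step 1: compute the address of x (l-value) */
    int value = *y_addr + 1;    /* step 2: load y (r-value) and compute exp   */
    *dst = value;               /* step 3: store the value at the address     */
}
```

Note that y appears only on the right-hand side, so only its r-value is needed and its storage is never written.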
Translating Expressions
• Straightforward by induction on the abstract expression tree
/* translate.c */
Tr_exp Tr_opExp(A_oper oper, Tr_exp left, Tr_exp right)
{
    switch (oper) {
    case A_plusOp:  return Tr_opArithExp(T_plus, left, right);
    case A_minusOp: return Tr_opArithExp(T_minus, left, right);
    case A_timesOp: …
    case A_eqOp:    return Tr_opCondExp(T_eq, left, right);
    case A_neqOp:   return Tr_opCondExp(T_ne, left, right);
    case A_ltOp:    …
    }
    assert(0);
    return NULL;
}
Conditional Expressions
• Translating expressions in conditions may be tricky
• Two options
  • Value computation
    • Compute the value of the Boolean expression
  • Location computation
    • Compute a label in the code that will be reached if the expression holds
    • Allows shortcut computations
Example C code
• if (a < 6 && b+1 > 7) a = b * b
• Tree code
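One way to read the location computation for this condition is as two conditional jumps sharing a false label, so that b+1 > 7 is never evaluated when a < 6 fails. The sketch below writes that out with C labels; the label names second, t, and f are hypothetical, and this is an illustration rather than the slide's original tree.

```c
#include <assert.h>

/* if (a < 6 && b+1 > 7) a = b * b;  returns the resulting value of a. */
int if_and_example(int a, int b)
{
    if (a < 6) goto second; else goto f;   /* CJUMP(LT, a, 6, second, f) */
second:
    if (b + 1 > 7) goto t; else goto f;    /* CJUMP(GT, b+1, 7, t, f)    */
t:
    a = b * b;                             /* body of the if-statement   */
f:
    return a;
}

/* Small self-check covering the three paths through the jumps. */
void check_if_and_example(void)
{
    assert(if_and_example(5, 7) == 49);    /* both tests hold            */
    assert(if_and_example(6, 7) == 6);     /* first test fails, shortcut */
    assert(if_and_example(5, 6) == 5);     /* second test fails          */
}
```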
Conditional Expressions in Tiger
• if a > b then x := 5
SEQ(CJUMP(GT, “a”, “b”, NAME t, NAME f),
    SEQ(LABEL t,
        SEQ(code for x := 5,
            LABEL f)))
static Tr_exp Tr_opCondExp(T_relOp oper, Tr_exp left, Tr_exp right)
{
    struct Cx cx;
    /* The true and false targets are left NULL and filled in later. */
    cx.stm = T_Cjump(oper, left, right, NULL, NULL);
    /* Each patch list records the address of a label field to fill in. */
    cx.trues = PatchList(&cx.stm->u.CJUMP._true, NULL);
    cx.falses = PatchList(&cx.stm->u.CJUMP._false, NULL);
    return Tr_Cx(cx.trues, cx.falses, cx.stm);
}
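The idea of leaving the jump targets empty and filling them in later can be sketched as follows. The types Label, Patch_, and Cjump and the functions patch_list and do_patch are hypothetical stand-ins, assuming a patch list is a linked list of addresses of label fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef const char *Label;                 /* stand-in for Temp_label */

/* A patch list: addresses of label fields that still need a target. */
typedef struct Patch_ *PatchList;
struct Patch_ { Label *slot; PatchList tail; };

/* Stand-in for a CJUMP node with its two target fields. */
struct Cjump { Label if_true, if_false; };

/* Prepend the address of one label field to a patch list. */
PatchList patch_list(Label *slot, PatchList tail)
{
    PatchList p = malloc(sizeof *p);
    assert(p != NULL);
    p->slot = slot;
    p->tail = tail;
    return p;
}

/* Fill every recorded field with the now-known label. */
void do_patch(PatchList list, Label label)
{
    for (; list != NULL; list = list->tail)
        *list->slot = label;
}
```

The list must hold the address of each field rather than its current value, since the value is still NULL when the list is built.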
Loops
• Similar to if-then-else
• Need to handle break
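A while-loop with a break can be sketched with three labels: one for the test, one for the body, and one for the exit, which is also where break jumps. The function sum_until and the label names are hypothetical; the loop being translated is shown in the comment.

```c
#include <assert.h>

/* while i < n do (if i = stop then break; sum := sum + i; i := i + 1) */
int sum_until(int n, int stop)
{
    int i = 0, sum = 0;
test:
    if (i < n) goto body; else goto done;  /* loop condition as a CJUMP   */
body:
    if (i == stop) goto done;              /* break: jump to the exit label */
    sum += i;
    i += 1;
    goto test;                             /* back edge                   */
done:
    return sum;
}

/* Small self-check: full loop, early break, and empty loop. */
void check_sum_until(void)
{
    assert(sum_until(5, 100) == 10);       /* 0+1+2+3+4, no break         */
    assert(sum_until(5, 3) == 3);          /* 0+1+2, break at i = 3       */
    assert(sum_until(0, 0) == 0);          /* condition fails immediately */
}
```

The translation of the loop body has to know the current done label so that a nested break can jump to it.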
Conversions
• Local translation may require converting between representations
• Value computation ↔ Location computation
• Examples
  if (x+5) then 0 else 1
  (a > b) + b
  (if a>b then a else b) + 1
  x := (a > b)
  x := if (a>b) then a else b
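Both directions of the conversion can be sketched with jumps. The functions cond_to_value and value_to_cond are hypothetical: the first turns a location computation into a 0/1 value, as needed for x := (a > b), and the second turns a value into a jump by comparing it with zero, as needed for if (x+5) then 0 else 1.

```c
#include <assert.h>

/* x := (a > b): a condition converted into a value. */
int cond_to_value(int a, int b)
{
    int r = 1;                             /* assume true                 */
    if (a > b) goto t; else goto f;        /* CJUMP(GT, a, b, t, f)       */
f:
    r = 0;                                 /* false path overwrites r     */
t:
    return r;
}

/* if (x+5) then 0 else 1: a value converted into a condition. */
int value_to_cond(int x)
{
    int v = x + 5;                         /* value computation           */
    if (v != 0) goto t; else goto f;       /* CJUMP(NE, v, CONST 0, t, f) */
t:
    return 0;                              /* then-branch                 */
f:
    return 1;                              /* else-branch                 */
}

/* Small self-check of both conversions. */
void check_conversions(void)
{
    assert(cond_to_value(3, 2) == 1);
    assert(cond_to_value(2, 3) == 0);
    assert(value_to_cond(-5) == 1);
    assert(value_to_cond(0) == 0);
}
```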
Complex Data Types
• Data types like arrays, strings, and records may require special treatment
• Important questions
  • Duration
  • Static vs. dynamic size
  • Structured L-values
Complex Data Types in Tiger
• Arrays, strings, and record fields are long-lived
  • Usually allocated in the heap
• No structured L-values
• Example: Tiger record allocation
type foo = {a : ty1, b : ty2}
... = foo {a = e1, b = e2}
ESEQ(
  SEQ(MOV(TEMP r, CALL(NAME MALLOC, CONST 2*W)),
      SEQ(MOV(MEM(+(0*W, TEMP r)), TransExp(e1)),
          MOV(MEM(+(1*W, TEMP r)), TransExp(e2)))),
  TEMP r)
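The same allocation can be sketched in C, assuming every field occupies one machine word of size W. The type Word and the function alloc_foo are hypothetical names; e1 and e2 stand for the already computed field values.

```c
#include <assert.h>
#include <stdlib.h>

typedef long Word;                 /* assumed machine word              */
enum { W = sizeof(Word) };         /* word size in bytes                */

/* foo {a = e1, b = e2}: allocate two words and initialize both fields. */
Word *alloc_foo(Word e1, Word e2)
{
    char *r = malloc(2 * W);       /* MOV(TEMP r, CALL(NAME MALLOC, CONST 2*W)) */
    assert(r != NULL);
    *(Word *)(r + 0 * W) = e1;     /* MOV(MEM(+(0*W, TEMP r)), e1)      */
    *(Word *)(r + 1 * W) = e2;     /* MOV(MEM(+(1*W, TEMP r)), e2)      */
    return (Word *)r;              /* the ESEQ yields TEMP r            */
}
```

The result is a pointer to the record, so assigning a record variable copies one word, consistent with Tiger having no structured l-values.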
Example: Tiger Arrays
let
  type intArray = array of int
  var a := intArray[12] of 0
  var b := intArray[13] of 7
in
  a := b
end
SEQ(EXP(CONST 0),
    SEQ(MOVE(TEMP ta, CALL(NAME initArray, CONST 12, CONST 0)),
        SEQ(MOVE(TEMP tb, CALL(NAME initArray, CONST 13, CONST 7)),
            MOVE(TEMP ta, TEMP tb))))
L-values of Arrays and Structures (Tiger)
• The l-value of a[i]
  MEM(+(“a”, *(CONST W, “i”)))
• For a structure field s.f
  MEM(+(“s”, *(CONST W, CONST kf)))
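The address arithmetic inside these MEM nodes can be sketched in C. The names Word, elem_addr, and field_addr are hypothetical, and the sketch assumes that every element and field occupies one word of size W and that kf is the index of field f within the record.

```c
#include <assert.h>
#include <stddef.h>

typedef long Word;                 /* assumed machine word              */
enum { W = sizeof(Word) };         /* word size in bytes                */

/* Address of a[i]: base address plus W * i, as in +(a, *(CONST W, i)). */
Word *elem_addr(Word *a, long i)
{
    assert(a != NULL);
    return (Word *)((char *)a + W * i);
}

/* Address of s.f: base address plus W * kf, with kf a compile-time constant. */
Word *field_addr(Word *s, long kf)
{
    assert(s != NULL);
    return (Word *)((char *)s + W * kf);
}
```

Used as an l-value, the address is the target of a store; used as an r-value, it is dereferenced to load the word.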
Big L-values
• In some programming languages, more than one word needs to be copied or stored
• Examples:
  • C structures
  • Pascal arrays
• How can this be handled?
Memory checks
• Can the compiler guarantee that no invalid memory is referenced?
  • At compile time
  • At runtime
• Examples
  • Array references
    • Algol, Pascal, Java, PL/I
      • Runtime checks
    • C
      • No checks
    • Ada
      • User control
  • Field and pointer dereferences
• The best solutions combine runtime and compile-time checks
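A runtime check on an array reference can be sketched as a comparison of the index against a stored length before the load. The type CheckedArray and the function checked_load are hypothetical; the sketch assumes the length is kept next to the data, and it reports failure to the caller rather than aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An array that carries its length so that references can be checked. */
typedef struct {
    size_t length;
    int *data;
} CheckedArray;

/* Load arr[i] into *out; return false without touching memory if i is out of range. */
bool checked_load(const CheckedArray *arr, long i, int *out)
{
    assert(arr != NULL && out != NULL);
    if (i < 0 || (size_t)i >= arr->length)
        return false;              /* the check the compiler would insert */
    *out = arr->data[i];
    return true;
}
```

When the compiler can prove at compile time that the index is in range, for example for a constant index into an array of known size, the comparison can be omitted.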
Summary
• Intermediate code simplifies the translation and increases reuse
• Tree-like intermediate code simplifies the translation of expressions
  • No temporaries
• Abstract syntax helps
• Memory management is interesting
  • Mostly next week