340 likes | 463 Views
CSE P501 – Compiler Construction. Mapping source code to x86 Basic statements and expressions Objects, method calls, and dynamic dispatch. Review: Variables. For MiniJava , data is held in either A stack frame - for variables local to the method An object - for fields
E N D
CSE P501 – Compiler Construction Mapping source code to x86 Basic statements and expressions Objects, method calls, and dynamic dispatch Jim Hogg - UW - CSE - P501
Review: Variables • For MiniJava, data is held in either • A stack frame - for variables local to the method • An object - for fields • Local variables accessed via ebp ('above' ebp) • eg: moveax, [ebp- 12] • Fields accessed via object address in a register • Instead of ebp pointing to start of frame, use ecx pointing to start of object • Details later Jim Hogg - UW - CSE - P501
Basic Statements and Expressions • How to translate language constructs from MiniJava into assembly code? • X = Y, X op Y, IF-THEN, IF-THEN-ELSE, WHILE, etc • If you know assembler code for any chip, the translation templates should be obvious • If you don't know assembler code, imagine translating MiniJava constructs into a primitive language whose only control flow is GOTO Jim Hogg - UW - CSE - P501
Conventions in Examples • Examples show code snippets in isolation • Real code generator needs to deal with things like: • Which registers are in use at which point in the program • Which registers to spill into memory (pushed onto stack or stored in stack frame) when a new register is needed and no free ones are available • Register eaxused below as a generic example • Rename as needed for more complex code • Examples include a few peephole optimizations Jim Hogg - UW - CSE - P501
Code Generation for Constants • Source • 17 • Idea • Realize constant value in a register • x86 • moveax, 17 • Peephole Optimization • if constant is 0, then: xoreax, eax Jim Hogg - UW - CSE - P501
Assignment Statement • Source • var = exp; // var is a local variable • x86 • <evaluate exp into, say, eax> • mov [ebp-offsetvar], eax Jim Hogg - UW - CSE - P501
Unary Minus • Source • -exp • x86 • <evaluate exp into eax> • negeax • Optimization • Collapse -(-exp) to exp • Unary plus is a no-op Jim Hogg - UW - CSE - P501
Binary + • Source exp1 + exp2 • x86 <evaluate exp1 into eax> <evaluate exp2 into edx> add eax, edx • Optimizations • If exp2 is a simple variable or constant: add eax, exp2 • Change exp1 + (-exp2) into exp1 - exp2 • If exp2 is 1 then: inceax Jim Hogg - UW - CSE - P501
Binary -, * • Same as + • Use sub for – (but not commutative!) • Use imul for * • Optimizations • Use left shift to multiply by powers of 2 • If your multiplier is really slow or you’ve got free scalar units and multiplier is busy, consider: 10*x = (8*x)+(2*x) • But this may perform worse on a different micro-architecture • Use x+x instead of 2*x, etc (faster) • Use dec for x-1 Jim Hogg - UW - CSE - P501
Integer Division • Slow on x86 • Divides 64-bit integer in edx:eax by a 32-bit integer (eg, in ebx) • ~50 clock latency • Source exp1 / exp2 • x86 <evaluate exp1 into eax> <evaluate exp2 into ebx> cdq ; extend eax into edx:eax, clobbers edx idivebx ; => quotient in eax; remainder in edx Jim Hogg - UW - CSE - P501
Control Flow • Basic idea: decompose higher level operation into jmps and conditional jmps • In following slides, jfalse is used to mean jump when a condition is false • No such instruction on x86! • Will have to use appropriate instructions to set condition codes, followed by conditional jumps • Normally wouldn’t actually generate the value “true” or “false” in a register Jim Hogg - UW - CSE - P501
While • Source while (cond) stm • x86 loop: <evaluate cond> jfalse done <code for stm> jmp loop done: • Note • In generated asm code we’ll need to create a unique label name for each loop, conditional statement, etc. eg: L123: Jim Hogg - UW - CSE - P501
Optimization for While • Put the test at the end jmp test loop: <code for stm> test: <evaluate cond> jtrue loop • Why bother? • Pulls one jmp instruction out of the loop • May avoid pipeline stall on jmp on each iteration • But many chips assume backwards-branch is always taken • Hardware branch-predictor will 'tune in' to the pattern • Easy to do from AST or other IR; not so easy if generating code on-the-fly (eg, recursive-descent, 1-pass compiler) Jim Hogg - UW - CSE - P501
Do-While • Source do stm while (cond); • x86 loop: <code for stm> <evaluate cond> jtrue loop Jim Hogg - UW - CSE - P501
IF-THEN • Source if (cond) stm • x86 <evaluate cond> jfalse skip <code for stm> skip: Jim Hogg - UW - CSE - P501
IF-THEN-ELSE • Source if (cond) stm1 else stm2 • x86 <evaluate cond> jfalse else <code for stm1> jmp done else: <code for stm2> done: Jim Hogg - UW - CSE - P501
Jump Chaining • Observation: naïve code gen can will produce jumps to jumps • Optimization: if a jump targets an unconditional jump, change target of first jump to the target of the second • Repeat until no further changes Jim Hogg - UW - CSE - P501
IF-ELSEIF-ELSE • x86 <evaluate cond1> jfalseelseif <code for stm1> jmp done elseif: <evaluate cond2> jfalseelse <code for stm2> jmp done else: <code for stm3> jmp done done: • Source if (cond1) stm1 else if (cond2) stm2 else stm3 Note: jump-to-next redundancy Jim Hogg - UW - CSE - P501
Boolean Expressions • What do we do with this? x > y • Expression evaluates to true or false • Could generate the value in a register (eg: 0|1) • But normally we don’t want/need the value; we’re only trying to decide whether to jump • Eg: if (exp1 > exp2) stm <evaluate exp1 to eax> <evaluate exp2 to edx> cmpeax, edx jle L123 <code for stm> L123: Jim Hogg - UW - CSE - P501
Boolean Operators: ! • Source • ! exp • eg: if (!exp) stm • To compile: • <evaluate expinto eaxas 0 or 1> • jnzL123, where L123 is label after stm Jim Hogg - UW - CSE - P501
Boolean Operators: && and || • In C/C++/Java/C#, these are short-circuit operators • Right operand is evaluated only if needed • if (cond1 || cond2) • if cond1 evaluates to true, then no need to evaluate cond2 • if (cond1 && cond2) • if cond1 evaluates to false, then no need to evaluate cond2 • if (p != null && p.x == 2) . . . • this check relies upon short-circuit evaluation! • Basically, generate if statements that jump appropriately and only evaluate operands when needed Jim Hogg - UW - CSE - P501
Example: Code for && • Source • if (exp1 && exp2) stm • x86 <evaluate exp1> jfalse skip <evaluate exp2> jfalse skip <code for stm> skip: Jim Hogg - UW - CSE - P501
Example: Code for || • Source • if (exp1 || exp2) stm • x86 <evaluate exp1> jtruedoit <evaluate exp2> jfalse skip doit: <code for stm> skip: Jim Hogg - UW - CSE - P501
Realizing Boolean Values • If a boolean value needs to be stored in a variable or method call parameter, generate code to create result • Typical representations: 0 for false, +1 or -1 for true • C specifies 0 and 1 if stored; we’ll use that • Best choice can depend on machine instructions; normally some convention is established during the primeval history of the architecture (eg: VAX blbc) Jim Hogg - UW - CSE - P501
Boolean Values: Example • Source • var = bexp ; • x86 <code for bexp> jfalsegenFalse moveax, 1 jmpstoreIt genFalse: moveax, 0 storeIt: mov [ebp+offsetvar], eax Jim Hogg - UW - CSE - P501
Better, If Enough Registers • Source • var = bexp ; • x86 xoreax, eax <code for bexp> jfalsestoreIt inceax storeIt: mov [ebp+offsetvar], eax • Or use conditional move instruction; avoids pipeline stalls moveax, 1 ; assume bexp true <code for bexp> cmovzeax, 0 mov [ebp+offsetvar], eax Jim Hogg - UW - CSE - P501
Alternatively: setcc • Source • var = x < y; • x86 moveax, [ebp+offsetx] ; load x cmpeax, [ebp+offsety] ; compare to y setl al; set AL to 0|1 if less than movzxeax,al ; zero-extend to 32 bits mov [ebp+offsetvar],eax ; generated by asgstm • Gnu mnemonic for movzx (byte->dbl word) is movzbl • Or use conditional move (cmovcc) instruction for sequences like:x = y < z ? y : z Jim Hogg - UW - CSE - P501
Other Control Flow: switch • Naïve: generate a chain of nested if-then-else’s • Better: switch is intended to allow an O(1) selection, provided the set of switch values is reasonably compact • Idea: create a 1-D array of jumps or labels and use the switch expression to select the right one • Also need to generate an IF check that expression lies within bounds Jim Hogg - UW - CSE - P501
Source switch (exp) { case 0: stms0; case 1: stms1; case 2: stms2; } X86-ish <evaluate exp into eax> “if (eax < 0 || eax > 2) jmp done” moveax, swtab[eax * 4] jmpeax L0: <code for stms0> jmp done L1: <code for stms1> jmp done L2: <code for stms2> jmp done done: .data swtabddL0 ; switch table dd L1 ddL Switch Assume no "fallthru" Jim Hogg - UW - CSE - P501
Arrays • C/C++/Java have 1-dimensional arrays • 0-origin • array with n elements contains variables: a[0] … a[n-1] • C/C++ n-dimensional arrays • Multiple dimensions; row major order • Java n-dimensional "jagged" arrays Jim Hogg - UW - CSE - P501
0-Origin 1-D Integer Arrays • Source exp1[exp2] • x86 <evaluate exp1into eax> <evaluate exp2into edx> address is [eax+4*edx] ; 4 bytes per element contiguous cells in memory Jim Hogg - UW - CSE - P501
int[][] array = {{3, 4, 5}, {77, 50}}; 2-D Arrays ... • C, etc (the curly-brace languages) use row-major order • Fortran (and Excel) uses column-major order • Exercises: What is the layout? How do you calculate location of a[i, j]? What happens when you pass array references between Fortran and C code? • Java has "jagged" arrays - array of arrays Java jagged arrays int[][] a = { {1,5,-4,23}, {1,2,6,3}, {99} } 1 5 -4 23 1 2 6 3 99 Jim Hogg - UW - CSE - P501
How to find A[row, col] column row-major layout row int A[row, col] = @A + (row * numcols + col) * 4 // 0-based A[row, col] = @A + (row – rowmin) * (colmax – colmin + 1) * elemsize + (col – colmin) * elemsize Jim Hogg - UW - CSE - P501
Next • Code Generation for Objects • Representation • Method calls • Inheritance and overriding • Strategies for implementing code generators • Code improvement – optimization Jim Hogg - UW - CSE - P501