Code generation
• Our book's target machine (appendix A):
    • opcode source1, source2, destination
    • add r1, r2, r3
    • addI r1, c, r2
    • loadI c, r2
    • load r1, r2
    • loadAI r1, c, r2 // r2 := *(r1+c)
    • loadAO r1, r2, r3
    • i2i r1, r2 // r2 := r1 (for integers)
    • cmp_LE r1, r2, r3 // if r1<=r2, r3:=true, else r3:=false
    • cbr r1, L1, L2 // if (r1) goto L1, else goto L2
    • jump r1
• Symbols:
    • @x represents x's offset from the sp

Code generation
• Let's start with some examples.
• Generate code from a tree representing x = a+2 - (c+d-4)
• Issues:
    • Which children should go first?
    • What if we already had a-c in a register?
    • Does it make a difference if a and c are floating point as opposed to integer?
• Generate code for a switch statement
• Generate code for w = w*2*x*y*z

Code generation
• Code generation =
    • instruction selection
    • instruction scheduling
    • register allocation

Instruction selection
• IR to assembly
• Why is it an issue?
    • Example: copy a value from r1 to r2
    • Let me count the ways...
• Criteria
• How hard is it?
    • Use a cost model to choose.
• How about register usage?

Instruction selection
• How hard is it?
    • Can make locally optimal choices
    • Finding a globally optimal selection is NP-complete
• Criteria
    • speed of generated code
    • size of generated code
    • power consumption
• Considering registers
    • Assume enough registers are available and let the register allocator figure it out.

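To make the "count the ways" example concrete, here is a minimal sketch of choosing a register copy with a cost model. The candidate list and the cycle counts are illustrative assumptions (they reuse the scheduling example's rule that a multiply takes 2 cycles and other non-memory ops take 1), and opcodes such as subI, multI and lshiftI are assumed to exist on the target.

```python
# Candidate sequences that all copy r1 into r2, written as
# "opcode source1, source2, destination".
# The cycle counts are assumed for illustration only.
COPY_CANDIDATES = [
    ("i2i r1, r2", 1),
    ("addI r1, 0, r2", 1),
    ("subI r1, 0, r2", 1),
    ("lshiftI r1, 0, r2", 1),
    ("multI r1, 1, r2", 2),
]


def cheapest(candidates):
    """Pick the candidate with the lowest cost (first one wins on ties)."""
    return min(candidates, key=lambda candidate: candidate[1])


choice, cost = cheapest(COPY_CANDIDATES)
print(choice, cost)
```

Speed is the only criterion here. A selector that cares about code size or power could swap in a different cost function without changing the selection logic.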
Instruction scheduling
• Reorder instructions to hide latencies.
• Example:
    ( ) loadAI $sp, @w, r1
    ( ) add r1, r1, r1
    ( ) loadAI $sp, @x, r2
    ( ) mult r1, r2, r1
    ( ) loadAI $sp, @y, r2
    ( ) mult r1, r2, r1
    ( ) loadAI $sp, @z, r2
    ( ) mult r1, r2, r1
    ( ) storeAI r1, $sp, @w
• Latencies:
    • memory ops: 3 cycles
    • multiply: 2 cycles
    • everything else: 1 cycle

Instruction scheduling
• Reorder instructions to hide latencies.
• Example, original order (the store finishes in cycle 20):
    (1) loadAI $sp, @w, r1
    (4) add r1, r1, r1
    (5) loadAI $sp, @x, r2
    (8) mult r1, r2, r1
    (9) loadAI $sp, @y, r2
    (12) mult r1, r2, r1
    (13) loadAI $sp, @z, r2
    (16) mult r1, r2, r1
    (18) storeAI r1, $sp, @w
• Example, reordered (the store finishes in cycle 13):
    (1) loadAI $sp, @w, r1
    (2) loadAI $sp, @x, r2
    (3) loadAI $sp, @y, r3
    (4) add r1, r1, r1
    (5) mult r1, r2, r1
    (6) loadAI $sp, @z, r2
    (7) mult r1, r3, r1
    (9) mult r1, r2, r1
    (11) storeAI r1, $sp, @w

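The issue cycles in both listings can be reproduced with a small simulation. This is a sketch under an assumed machine model: one instruction issues per cycle, in program order, and an instruction stalls until all of its source registers have been produced. The tuple encoding and the function name are hypothetical.

```python
# Latencies as given on the slide: memory ops 3, multiply 2, everything else 1.
LATENCY = {"loadAI": 3, "storeAI": 3, "mult": 2, "add": 1}


def issue_cycles(code):
    """Return (issue cycle of each instruction, last busy cycle)."""
    ready = {}      # register -> first cycle in which its value is usable
    previous = 0    # issue cycle of the previous instruction
    starts = []
    for opcode, sources, dest in code:
        # Wait for the next issue slot and for every operand ($sp is always ready).
        start = max([previous + 1] + [ready.get(r, 1) for r in sources])
        starts.append(start)
        if dest is not None:
            ready[dest] = start + LATENCY[opcode]
        previous = start
    last = max(s + LATENCY[op] - 1 for s, (op, _, _) in zip(starts, code))
    return starts, last


# Each instruction is (opcode, source registers, destination register or None).
ORIGINAL = [
    ("loadAI", ["$sp"], "r1"),        # w
    ("add", ["r1", "r1"], "r1"),
    ("loadAI", ["$sp"], "r2"),        # x
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["$sp"], "r2"),        # y
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["$sp"], "r2"),        # z
    ("mult", ["r1", "r2"], "r1"),
    ("storeAI", ["r1", "$sp"], None),
]
REORDERED = [
    ("loadAI", ["$sp"], "r1"),        # w
    ("loadAI", ["$sp"], "r2"),        # x
    ("loadAI", ["$sp"], "r3"),        # y
    ("add", ["r1", "r1"], "r1"),
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["$sp"], "r2"),        # z
    ("mult", ["r1", "r3"], "r1"),
    ("mult", ["r1", "r2"], "r1"),
    ("storeAI", ["r1", "$sp"], None),
]

print(issue_cycles(ORIGINAL))
print(issue_cycles(REORDERED))
```

Note that the reordered version pays for its shorter schedule with one extra register (r3), which is the interaction between scheduling and register demand.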
Instruction scheduling
• Reorder instructions to hide latencies.
• Example 2:
    (1) loadAI $sp, @x, r1
    (4) mult r1, r1, r1
    (6) mult r1, r1, r1
    (8) mult r1, r1, r1
    (10) storeAI r1, $sp, @x

Instruction scheduling
• Reorder instructions to hide latencies.
• We need to collect dependence info
• Scheduling affects register lifetimes ==> different demand for registers
    • Should we do register allocation before or after?
• How hard is it?
    • more than one instruction may be ready
    • too many variables may be live at the same time
    • NP-complete!

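As an illustration of collecting dependence info, the sketch below compares every pair of instructions in a straight-line block and records register dependences. It is deliberately conservative (it also reports dependences that a later write makes redundant) and it ignores dependences through memory; the tuple encoding and the function name are assumptions made for this example.

```python
def dependences(code):
    """Return a set of (earlier index, later index, kind) edges.

    kind is "RAW" (true dependence), "WAR" (anti-dependence)
    or "WAW" (output dependence), judged on register names only.
    """
    edges = set()
    for j, (_, sources_j, dest_j) in enumerate(code):
        for i in range(j):
            _, sources_i, dest_i = code[i]
            if dest_i is not None and dest_i in sources_j:
                edges.add((i, j, "RAW"))    # j reads what i wrote
            if dest_j is not None and dest_j in sources_i:
                edges.add((i, j, "WAR"))    # j overwrites what i reads
            if dest_j is not None and dest_j == dest_i:
                edges.add((i, j, "WAW"))    # both write the same register
    return edges


# First five instructions of the w = w*2*x*y*z example:
# (opcode, source registers, destination register).
block = [
    ("loadAI", ["$sp"], "r1"),       # 0: load w
    ("add", ["r1", "r1"], "r1"),     # 1
    ("loadAI", ["$sp"], "r2"),       # 2: load x
    ("mult", ["r1", "r2"], "r1"),    # 3
    ("loadAI", ["$sp"], "r2"),       # 4: load y, reuses the name r2
]

for edge in sorted(dependences(block)):
    print(edge)
```

The WAR edge from instruction 3 to instruction 4 exists only because the name r2 is reused. Loading y into a different register removes that edge, which is what lets a scheduler move the load earlier.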
Register allocation
• Consists of two parts:
    • register allocation (which values are kept in registers)
    • register assignment (which register each of those values gets)
• Goal: minimize spills
• How hard is it?
    • For a single basic block with one size of data: polynomial
    • Otherwise, NP-complete
• Based on graph coloring.

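The graph-coloring view can be sketched as follows: nodes are values, an edge means two values are live at the same time, and colors are registers. This is a simple greedy heuristic chosen for brevity, not the allocator from the book; the example graph and all names are hypothetical.

```python
def color(graph, k):
    """Greedily color an interference graph with k registers.

    graph maps each value to the set of values it interferes with.
    Returns (assignment of value -> register number, list of spilled values).
    """
    assignment, spilled = {}, []
    # Visit the most constrained values first; break ties by name.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {assignment[nb] for nb in graph[node] if nb in assignment}
        free = [reg for reg in range(k) if reg not in used]
        if free:
            assignment[node] = free[0]   # assignment: pick a concrete register
        else:
            spilled.append(node)         # allocation: this value stays in memory
    return assignment, spilled


# Hypothetical interference graph: a, b, c are all live together; d overlaps only c.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(color(interference, 3))
print(color(interference, 2))
```

With three registers nothing spills; with two, one of the three mutually interfering values has to be spilled.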
Code generation
• Generating code for simple expressions
    • If the expression is represented by a tree, a post-order walk gives an evaluation sequence
    • Changing the evaluation sequence may result in code that uses fewer registers.
    • Idea: find out how many registers are needed for each subtree and evaluate the most expensive one first.
    • Consider the following algorithm that labels tree nodes with the number of registers needed (for the ILOC architecture):

\[
\text{label}(n) =
\begin{cases}
1 & \text{if } n \text{ is a left leaf, or a right leaf and a variable} \\
0 & \text{if } n \text{ is a right leaf and a constant} \\
\max(\text{label}_{lchild}, \text{label}_{rchild}) & \text{if } \text{label}_{lchild} \neq \text{label}_{rchild} \\
\text{label}_{lchild} + 1 & \text{if } \text{label}_{lchild} = \text{label}_{rchild}
\end{cases}
\]

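The labeling rule translates almost directly into a recursive function. The sketch below applies it to the tree for a+2 - (c+d-4) from the earlier example; the Node class and its field names are assumptions made for illustration, and a tree consisting of a single leaf is treated as a left leaf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str                          # operator, variable name, or constant text
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    is_const: bool = False           # only meaningful for leaves


def label(n, is_left=True):
    """Number of registers needed to evaluate the subtree rooted at n."""
    if n.left is None and n.right is None:
        # A right-hand constant fits in an immediate (opI) and needs no register.
        return 0 if (n.is_const and not is_left) else 1
    left_label = label(n.left, True)
    right_label = label(n.right, False)
    if left_label != right_label:
        return max(left_label, right_label)
    return left_label + 1


def var(name):
    return Node(name)


def const(text):
    return Node(text, is_const=True)


# a+2 - (c+d-4)
tree = Node("sub",
            Node("add", var("a"), const("2")),
            Node("sub", Node("add", var("c"), var("d")), const("4")))

print(label(tree.left), label(tree.right, False), label(tree))
```

Here the left subtree a+2 needs one register and the right subtree c+d-4 needs two, so the whole expression needs two.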
Code generation
• Generating code for simple expressions
• The idea behind our algorithm:
    • Variables need to be loaded into registers, so leaves need one register.
    • For rhs constants we can use opI operations, so no extra register is needed.
    • If we need k registers for each subtree, we'll use k for the left one, keep 1 for the result, and then use another k for the right subtree. That's a total of k+1 registers.
    • If we need k registers for one subtree and m<k for the other, then use k for the one subtree, keep the result in 1, and use the remaining k-1 (which is at least m) for the other subtree. That's a total of k registers.
    • When generating code, make sure to evaluate the most expensive subtree first.

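Putting the reasoning above to work, this sketch generates code for a+2 - (c+d-4) and evaluates the more expensive subtree first. The Node class, the free-list register handling, and the use of loadAI with @name offsets for variables are assumptions made for illustration; running out of registers simply raises an error, since spilling is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str                          # opcode, variable name, or constant text
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    is_const: bool = False


def is_leaf(n):
    return n.left is None and n.right is None


def label(n, is_left=True):
    """Registers needed for the subtree, following the labeling rule."""
    if is_leaf(n):
        return 0 if (n.is_const and not is_left) else 1
    l, r = label(n.left, True), label(n.right, False)
    return max(l, r) if l != r else l + 1


def gen(n, free, code):
    """Append code for n to `code`; return the register holding the result."""
    if is_leaf(n):
        reg = free.pop(0)            # raises IndexError if no register is left
        if n.is_const:
            code.append(f"loadI {n.op}, {reg}")
        else:
            code.append(f"loadAI $sp, @{n.op}, {reg}")
        return reg
    if is_leaf(n.right) and n.right.is_const:
        # rhs constant: use the immediate form of the operation (opI).
        reg = gen(n.left, free, code)
        code.append(f"{n.op}I {reg}, {n.right.op}, {reg}")
        return reg
    # Evaluate the most expensive subtree first.
    if label(n.left, True) >= label(n.right, False):
        left_reg = gen(n.left, free, code)
        right_reg = gen(n.right, free, code)
    else:
        right_reg = gen(n.right, free, code)
        left_reg = gen(n.left, free, code)
    code.append(f"{n.op} {left_reg}, {right_reg}, {left_reg}")
    free.insert(0, right_reg)        # the right operand's register is free again
    return left_reg


# a+2 - (c+d-4)
tree = Node("sub",
            Node("add", Node("a"), Node("2", is_const=True)),
            Node("sub", Node("add", Node("c"), Node("d")),
                 Node("4", is_const=True)))

code = []
result = gen(tree, ["r1", "r2"], code)
print("\n".join(code))
```

With only two registers available, the generator evaluates c+d-4 first and then a+2, matching the label of 2 that the labeling rule assigns to the root.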
Code generation
• Generating code for expressions
    • What if an expression contains a function call?
    • Should the compiler be allowed to change the evaluation order?
• Big idea #1 with respect to optimizing or moving code around: Be conservative!
• Generating code for the following (read the book):
    • boolean operators
    • relational operators
    • array references