Peephole Optimization

bjorn
Presentation Transcript


  1. Peephole Optimization
     • Final pass over generated code: examine a few (2 to 4) consecutive instructions
     • See if an obvious replacement is possible, e.g. store/load pairs:
         MOV %eax => mema
         MOV mema => %eax
     • The second instruction can be eliminated without needing any global knowledge of mema
     • Use algebraic identities
     • Special-case individual instructions
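The store/load case can be sketched as a single scan over adjacent instruction pairs. This is a minimal illustration, not the slides' implementation: the `(opcode, source, destination)` tuple format and the function name are assumptions, and the sketch takes for granted that no label sits between the two instructions and that the memory location is not volatile.

```python
# Hypothetical instruction format: (opcode, source, destination).
def drop_redundant_loads(code):
    """Drop a load that directly follows a store of the same register to the same address."""
    out = []
    for ins in code:
        if out:
            prev = out[-1]
            # Store "MOV reg => mem" followed by load "MOV mem => reg".
            is_reload = (
                prev[0] == "MOV" and ins[0] == "MOV"
                and prev[1] == ins[2] and prev[2] == ins[1]
                and prev[1].startswith("%") and not prev[2].startswith("%")
            )
            if is_reload:
                continue  # the register already holds the stored value
        out.append(ins)
    return out


code = [
    ("MOV", "%eax", "mema"),
    ("MOV", "mema", "%eax"),
    ("ADD", "%ebx", "%eax"),
]
optimized = drop_redundant_loads(code)
```

Only the previous emitted instruction is inspected, which is what keeps the check local.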

  2. Algebraic identities
     • Worth recognizing in single instructions with a constant operand:
     • A * 2 = A + A
     • A * 1 = A
     • A * 0 = 0
     • A / 1 = A
     • More delicate with floating-point
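For integer operands, the identities above amount to a small rewrite function. The sketch below assumes a hypothetical expression form `(op, operand, constant)` and is limited to integers; with floating point, for instance, A * 0 is not 0 when A is a NaN or an infinity.

```python
def simplify(op, a, k):
    """Simplify the integer expression `a op k`, where k is a constant.

    Returns a constant, a bare operand name, or an expression tuple.
    """
    if op == "*" and k == 0:
        return 0
    if op == "*" and k == 1:
        return a
    if op == "*" and k == 2:
        return ("+", a, a)      # an add is typically cheaper than a multiply
    if op == "/" and k == 1:
        return a
    return (op, a, k)           # nothing recognized: leave unchanged


doubled = simplify("*", "A", 2)
```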

  3. Is this ever helpful?
     • Why would anyone write X * 1?
     • Why bother to correct such obvious junk code?
     • In fact one might write:
         #define MAX_TASKS 1
         ...
         a = b * MAX_TASKS;
     • Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

  4. Replace Multiply by Shift
     • A := A * 4;
     • Can be replaced by a 2-bit left shift (signed or unsigned)
     • But must worry about overflow if the language does
     • A := A / 4;
     • If unsigned, can be replaced by a 2-bit right shift
     • For signed values, arithmetic right shift is a well-known problem
     • The language may allow it anyway (traditional C)
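The overflow concern can be made concrete by emulating a 32-bit register. In this sketch (the helper names and the 32-bit width are assumptions), the bare shift silently wraps, whereas a language that checks overflow would have to signal an error for the same input.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def wrap32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x


def times4_shift(a):
    """What a bare 2-bit left shift leaves in a 32-bit register."""
    return wrap32(a << 2)


def times4_checked(a):
    """What a language with overflow checking requires for A * 4."""
    result = a * 4
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError("A * 4 does not fit in 32 bits")
    return result
```

The two functions agree whenever the product fits, for positive and negative operands alike; they differ only in how the out-of-range case is reported.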

  5. Addition chains for multiplication
     • If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into sums and differences of powers of two can be effective:
     • X * 125 = X * 128 - X * 4 + X
     • Two shifts, one subtract and one add, which may be faster than one multiply
     • Note the similarity with the efficient exponentiation method
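One way to find such a decomposition mechanically is the non-adjacent form, which writes a constant as a sum of signed powers of two. The slides do not say which method a compiler would use, so treat this as an illustrative choice; both function names are hypothetical.

```python
def signed_powers(c):
    """Non-adjacent form of c > 0: (sign, shift) pairs whose signed powers of two sum to c."""
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((sign, shift))
            c -= sign               # clears the low bit, possibly carrying upward
        c >>= 1
        shift += 1
    return terms


def multiply_by_constant(x, c):
    """Compute x * c using only shifts, adds and subtracts."""
    total = 0
    for sign, shift in signed_powers(c):
        if sign > 0:
            total += x << shift
        else:
            total -= x << shift
    return total


terms_125 = signed_powers(125)
```

For 125 the terms are +1, -4 and +128, which matches the decomposition on the slide.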

  6. The Right Shift problem
     • Arithmetic right shift (SAR): shift right and use the sign bit to fill the most significant bits
         -5         111111...1111111011
         SAR by 1   111111...1111111101
     • The result is -3, not -2
     • In most languages -5 / 2 = -2
     • Prior to C99, implementations were allowed to truncate towards or away from zero if either operand was negative
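Python's `>>` on integers is an arithmetic shift, so the mismatch is easy to reproduce. The sketch below also shows a common repair that the slides do not cover: add a bias of 2**k - 1 to negative values before shifting. The function names are illustrative.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, as C99 specifies."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def div_pow2(x, k):
    """x / 2**k truncating toward zero, using only an add and an arithmetic shift."""
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k


shifted = -5 >> 1              # arithmetic shift rounds toward negative infinity
divided = trunc_div(-5, 2)     # division rounds toward zero
repaired = div_pow2(-5, 1)     # biased shift agrees with the division
```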

  7. Folding Jumps to Jumps
     • A jump to an unconditional jump can copy the target address
               JNE lab1
               ...
         lab1: JMP lab2
     • Can be replaced by JNE lab2
     • As a result, lab1 may become dead (unreferenced)
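A sketch of this folding over a flat instruction list follows. The `(label, opcode, target)` tuples and the set of jump opcodes are assumptions made for illustration; the `seen` set keeps a cycle of jumps from looping forever.

```python
JUMPS = {"JMP", "JNE", "JE", "JNL"}   # hypothetical subset of jump opcodes


def fold_jump_chains(code):
    """Retarget every jump whose destination is itself an unconditional JMP."""
    # Map each label to the instruction it marks.
    at = {label: (op, arg) for label, op, arg in code if label}
    out = []
    for label, op, arg in code:
        seen = set()
        # Follow the chain of unconditional jumps to its final target.
        while op in JUMPS and arg in at and at[arg][0] == "JMP" and arg not in seen:
            seen.add(arg)
            arg = at[arg][1]
        out.append((label, op, arg))
    return out


code = [
    (None, "JNE", "lab1"),
    (None, "NOP", None),
    ("lab1", "JMP", "lab2"),
    ("lab2", "RET", None),
]
folded = fold_jump_chains(code)
```

After the pass nothing targets lab1 any more, so a later pass is free to treat it as dead.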

  8. Jump to Return
     • A jump to a return can be replaced by a return
               JMP lab1
               ...
         lab1: RET
     • Can be replaced by RET
     • lab1 may become dead code

  9. Tail Recursion Elimination
     • A subprogram is tail-recursive if the last computation is a call to itself:
         function last (lis : list_type) return list_type is
         begin
            if lis.next = null then
               return lis;
            else
               return last (lis.next);
            end if;
         end;
     • The recursive call can be replaced with:
         lis := lis.next;
         goto start;   -- added label
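The same transformation, written out by hand in Python for illustration (Python itself does not eliminate tail calls). The `Node` class and the `build` helper are hypothetical stand-ins for the list type in the Ada example.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def last_recursive(lis):
    if lis.next is None:
        return lis
    return last_recursive(lis.next)     # tail call: nothing remains to do afterwards


def last_iterative(lis):
    while True:                         # plays the role of the added "start" label
        if lis.next is None:
            return lis
        lis = lis.next                  # parameter assignment replaces the call


def build(n):
    """Build the list 1 -> 2 -> ... -> n without recursion."""
    head = None
    for v in range(n, 0, -1):
        head = Node(v, head)
    return head
```

The iterative version uses constant stack space, so it handles lists far longer than Python's default recursion limit would permit for `last_recursive`.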

  10. Advantages of tail recursion elimination
     • Saves time: an assignment and a jump are faster than a call with one parameter
     • Saves stack space: converts linear stack usage to constant usage
     • In languages with no loops, this may be a required optimization: it is specified in the Scheme standard

  11. Tail-recursion elimination at the Instruction Level
     • Consider the sequence on the x86:
         CALL func
         RET
     • CALL pushes the return point on the stack, RET in the body of func removes it, RET in the caller returns
     • Can generate instead:
         JMP func
     • Now RET in func returns to the original caller, because there is a single return address on the stack
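A toy interpreter with an explicit return-address stack makes the equivalence visible. Everything here is an assumption for illustration: the instruction tuples, the `WORK` pseudo-op that records a trace entry, and the `None` sentinel standing for the original caller's return address.

```python
def run(code, entry):
    """Execute a toy program; return (trace of WORK items, deepest stack size reached)."""
    index = {label: i for i, (label, _, _) in enumerate(code) if label}
    stack = [None]                  # None marks the return address of the original caller
    trace, deepest = [], 1
    pc = index[entry]
    while pc is not None:
        _, op, arg = code[pc]
        if op == "CALL":
            stack.append(pc + 1)    # push the return point
            deepest = max(deepest, len(stack))
            pc = index[arg]
        elif op == "JMP":
            pc = index[arg]
        elif op == "RET":
            pc = stack.pop()        # pop a return address and continue there
        else:                       # WORK: record the argument and fall through
            trace.append(arg)
            pc += 1
    return trace, deepest


with_call = [
    ("caller", "WORK", "caller body"),
    (None, "CALL", "func"),
    (None, "RET", None),
    ("func", "WORK", "func body"),
    (None, "RET", None),
]
with_jmp = [
    ("caller", "WORK", "caller body"),
    (None, "JMP", "func"),
    ("func", "WORK", "func body"),
    (None, "RET", None),
]
```

Both programs perform the same work in the same order; the JMP version never pushes a second return address.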

  12. Peephole optimization in the REALIA COBOL compiler
     • Full compiler for Standard COBOL, targeted to the IBM PC
     • Now distributed by Computer Associates
     • Runs in 150K bytes, but must be able to handle very large programs that run on mainframes
     • No global optimization possible: multiple linear passes over the code, no global data structures, no flow graph
     • Multiple peephole optimizations; the compiler iterates until the code is stable
     • Each pass scans the code backwards to minimize address recomputations

  13. Typical COBOL code: control structures and perform blocks
         Process-Balance.
             if Balance is negative then
                 perform Send-Bill
             else
                 perform Record-Credit
             end-if.
         Send-Bill.
             ...
         Record-Credit.
             ...

  14. Simple Assembly: perform equivalent to call
         Pb: cmp balance, 0
             jnl L1
             call Sb
             jmp L2
         L1: call Rc
         L2: ret
         Sb: ...
             ret
         Rc: ...
             ret

  15. Fold jump to return statement
         Pb: cmp balance, 0
             jnl L1
             call Sb
             jmp L2       -- jump to return
         L1: call Rc
         L2: ret
         Sb: ...
             ret
         Rc: ...
             ret

  16. Corresponding Assembly
         Pb: cmp balance, 0
             jnl L1       -- jump to unconditional jump
             call Sb
             ret          -- folded
         L1: jmp Rc       -- will become useless
         L2: ret
         Sb: ...
             ret
         Rc: ...
             ret

  17. Code following a jump is unreachable
         Pb: cmp balance, 0
             jnl Rc       -- folded
             jmp Sb
             ret          -- unreachable
         L1: jmp Rc       -- unreachable
         Sb: ...
             ret
         Rc: ...
             ret

  18. Jump to following instruction is a no-op
         Pb: cmp balance, 0
             jnl Rc
             jmp Sb       -- jump to next instruction
         Sb: ...
             ret
         Rc: ...
             ret

  19. Final code
         Pb: cmp balance, 0
             jnl Rc
         Sb: ...
             ret
         Rc: ...
             ret
     • Final code is as efficient as inlining
     • All transformations are local
     • Each optimization may yield further optimization opportunities
     • Iterate until no further change
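The whole sequence from slides 14 to 19 can be replayed with a handful of local passes run to a fixed point. This is a sketch under stated assumptions, not the REALIA implementation: the `(label, opcode, operand)` tuples, the pass names, and the `exported` label set are invented for illustration, and the passes scan forwards rather than backwards.

```python
def targets(code):
    """Labels referenced by some jump or call."""
    return {arg for _, op, arg in code if op in ("JMP", "JNL", "CALL")}


def jump_to_return(code):
    at = {label: op for label, op, _ in code if label}
    return [(l, "RET", None) if op == "JMP" and at.get(a) == "RET" else (l, op, a)
            for l, op, a in code]


def call_then_return(code):
    out = list(code)
    for i in range(len(out) - 1):
        label, op, arg = out[i]
        if op == "CALL" and out[i + 1][1] == "RET":
            out[i] = (label, "JMP", arg)       # tail call becomes a jump
    return out


def jump_to_jump(code):
    at = {label: (op, arg) for label, op, arg in code if label}
    out = []
    for l, op, a in code:
        seen = set()                           # guards against cycles of jumps
        while op in ("JMP", "JNL") and a in at and at[a][0] == "JMP" and a not in seen:
            seen.add(a)
            a = at[a][1]
        out.append((l, op, a))
    return out


def remove_unreachable(code, exported=("Pb",)):
    live = targets(code) | set(exported)
    out, dead = [], False
    for l, op, a in code:
        if l in live:
            dead = False                       # a referenced label is reachable again
        if not dead:
            out.append((l, op, a))
        if op in ("JMP", "RET"):
            dead = True                        # nothing falls through a JMP or RET
    return out


def jump_to_next(code):
    out = []
    for i, (l, op, a) in enumerate(code):
        next_label = code[i + 1][0] if i + 1 < len(code) else None
        if op == "JMP" and l is None and a == next_label:
            continue                           # jump to the following instruction
        out.append((l, op, a))
    return out


PASSES = [jump_to_return, call_then_return, jump_to_jump, remove_unreachable, jump_to_next]


def peephole(code):
    """Apply every pass repeatedly until a full round changes nothing."""
    while True:
        new = code
        for apply_pass in PASSES:
            new = apply_pass(new)
        if new == code:
            return code
        code = new


start = [
    ("Pb", "CMP", "balance, 0"), (None, "JNL", "L1"), (None, "CALL", "Sb"),
    (None, "JMP", "L2"), ("L1", "CALL", "Rc"), ("L2", "RET", None),
    ("Sb", "...", None), (None, "RET", None),
    ("Rc", "...", None), (None, "RET", None),
]
final = peephole(start)
```

On this input the first round already reaches the final code of slide 19, and the second round confirms that nothing changes. Each pass looks only at an instruction, its neighbour, or the instruction at a jump target.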

  20. Arcane tricks
     • Consider a typical maximum computation:
         if A >= B then
            C := A;
         else
            C := B;
         end if;
     • For simplicity assume all values are unsigned and held in registers

  21. Eliminating max jump on x86
     • Simple-minded asm code:
             CMP A, B
             JNAE L1
             MOV A=>C
             JMP L2
         L1: MOV B=>C
         L2:
     • One jump in either case

  22. Computing max without jumps on x86
     • Architecture-specific trick: use the subtract-with-borrow instruction and the carry flag
         CMP A, B          ; CF = 1 if B > A, CF = 0 if A >= B
         SBB %eax,%eax     ; all 1's if B > A, all 0's if A >= B
         MOV %eax=>C
         NOT C             ; all 0's if B > A, all 1's if A >= B
         AND B=>%eax       ; B if B > A, 0 if A >= B
         AND A=>C          ; 0 if B > A, A if A >= B
         OR %eax=>C        ; B if B > A, A if A >= B
     • More instructions, but NO JUMPS
     • Supercompiler: exhaustive search of instruction patterns to uncover similar tricks
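The instruction sequence can be checked by emulating 32-bit registers with masked Python integers. This is a line-by-line transcription for illustration only; the function name and the 32-bit width are assumptions, and the inputs are taken to be unsigned values below 2**32.

```python
MASK = 0xFFFFFFFF                   # emulate 32-bit registers


def max_branchless(a, b):
    """Unsigned 32-bit max following the CMP/SBB sequence, with no conditional."""
    cf = ((a - b) >> 32) & 1        # CMP A, B: the borrow (CF) is 1 exactly when B > A
    eax = (0 - cf) & MASK           # SBB %eax,%eax: all ones if CF is set, else zero
    c = eax                         # MOV %eax=>C
    c = ~c & MASK                   # NOT C
    eax &= b                        # AND B=>%eax: B if B > A, else 0
    c &= a                          # AND A=>C:    0 if B > A, else A
    c |= eax                        # OR %eax=>C:  exactly one side is non-masked
    return c


larger = max_branchless(7, 12)
```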
