Ch 15. Procedure Optimizations
2006.5.1, Advanced Compilers presentation
Presenter: 김영식

Overview
• Tail-Call Optimization vs. Tail-Recursion Elimination
• Procedure Integration vs. In-line Expansion
• Leaf-routine Optimization vs. Shrink Wrapping

Drawbacks of “Call”
• Calling convention
  • caller: parameter passing, caller-saved registers, return address, branch
  • callee
    • prologue: save FP, compute SP, save callee-saved registers
    • epilogue: restore callee-saved registers, return value, SP, FP, branch
• Optimization view
  • less chance to optimize across procedure boundaries, e.g. because of aliasing

Definition
• Tail-Recursion: the last action of f is a call to itself
    void f(int x) { ... f(x); (return;) }
• Tail-Call: the last action of f is a call to another procedure g
    void f(int x) { ... g(x); (return;) }

Effect of tail-recursion
• eliminate procedure-call overhead
• enable loop optimization

Before:
    void insert_node(int n, struct node *l)
    {
        if (n > l->value)
            if (l->next == NULL) make_node(l, n);
            else insert_node(n, l->next);
    }

After:
    void insert_node(int n, struct node *l)
    {
    Loop:
        if (n > l->value)
            if (l->next == NULL) make_node(l, n);
            else { l = l->next; goto Loop; }
    }

Effect of tail-call
    void make_node(struct node *p, int n)
    {
    L0: struct node *q = malloc(...);
        p->next = q;
        ...
    }

    void insert_node(int n, struct node *l)
    {
        if (n > l->value)
            if (l->next == NULL) make_node(l, n);    ← replace with "goto L0"?
        ...
    }
• Two problems with a high-level implementation
  1. Branching into the body of another procedure
  2. Local scope of parameters

Low-level implementation

Before:
    make_node:
        ...
        return
    insert_node:
        ...
        if !r7 goto L2
        r2 ← r1
        r1 ← r4
        call make_node
        return
    L2: r2 ← r2 *. next
        call insert_node
        return

After:
    make_node:
        ...
        return
    insert_node:
        ...
        if !r7 goto L2
        r2 ← r1
        r1 ← r4
        goto make_node
    L2: r2 ← r2 *. next
        goto insert_node

An issue about stack frame
• Original stack frame implementation (function call)
    caller’s caller (main() or other)
    caller (insert_node())
    fp →
    callee (make_node())
    sp →

An issue about stack frame
• Stack frames after optimization
    caller’s caller
    caller / callee
• Result of the optimization: caller and callee behave as one procedure
• We do not know the size of the stack frame needed by the callee
• If the callee’s stack frame is larger than the caller’s, either
  → allocate the remainder of the callee’s stack frame, or
  → deallocate the caller’s stack frame and reallocate the callee’s frame with the standard procedure prologue

Determining tail-call
• The routine performing the call does nothing after the call returns except return itself
• It’s easy!

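The check can be sketched over a flat instruction list. This is a minimal sketch: the `Insn` type and `is_tail_call` are hypothetical names, and a real compiler would also have to confirm that any result of the call is returned unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal instruction kinds, for illustration only. */
typedef enum { OP_OTHER, OP_CALL, OP_RETURN } OpKind;

typedef struct {
    OpKind kind;
    const char *target;   /* callee name for OP_CALL, otherwise NULL */
} Insn;

/* The call at index i is a tail call if the very next instruction is a
   return, i.e. the caller does nothing after the call except return. */
bool is_tail_call(const Insn *code, size_t n, size_t i)
{
    return i + 1 < n
        && code[i].kind == OP_CALL
        && code[i + 1].kind == OP_RETURN;
}
```

In the insert_node example, both `call make_node` and `call insert_node` are immediately followed by `return`, so both would qualify.
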
Performing tail-recursion elimination
• Replace the recursive call by
  • assigning the proper values to the parameters, and
  • branching to the beginning of the body of the procedure
• Delete the ‘return’ after the recursive call

Before:
    void replace(int n)
    {
        if (n >= 10) return;
        if (A[n] == 0) A[n] = 1;
        else replace(n + 1);
    }

After:
    void replace(int n)
    {
    Loop:
        if (n >= 10) return;
        if (A[n] == 0) A[n] = 1;
        else { n = n + 1; goto Loop; }
    }

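To check that the transformation preserves behavior, the two versions of replace can be run side by side. In this sketch the array is passed as a parameter instead of being a global, and the names `replace_rec` and `replace_loop` are assumptions made so both versions fit in one file.

```c
#define N 10

/* Recursive form: the recursive call is the last action of the routine. */
void replace_rec(int A[], int n)
{
    if (n >= N) return;
    if (A[n] == 0) A[n] = 1;
    else replace_rec(A, n + 1);       /* tail-recursive call */
}

/* After tail-recursion elimination: the call becomes a parameter
   assignment plus a branch to the top of the body. */
void replace_loop(int A[], int n)
{
Loop:
    if (n >= N) return;
    if (A[n] == 0) A[n] = 1;
    else { n = n + 1; goto Loop; }
}
```

Both routines set the first zero element at or after index n to 1 and leave the rest of the array unchanged.
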
Performing tail-call optimization (1)
• Both procedure bodies should be visible to the compiler
  • same compilation unit, or saved intermediate code
• Need to know about the callee
  • where it expects to find its parameters
  • where to branch to
  • its stack frame size

Performing tail-call optimization (2)
• Replace the call by three things:
  • evaluation of the arguments, putting them where the callee expects to find them
  • if the callee’s stack frame is larger than the caller’s, an instruction that extends the stack frame by the difference
  • a branch to the beginning of the body of the callee
• Also, delete the ‘return’ after the call

Before:
    insert_node:
        ...
        r2 ← r1
        r1 ← r4
        call make_node
        return

After:
    insert_node:
        ...
        r2 ← r1
        r1 ← r4
        goto make_node

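The rewrite itself can be sketched as a small in-place edit on a call/return pair. Everything here is hypothetical: the `LirInsn` representation, the `LIR_EXTEND_SP` pseudo-instruction, and the frame sizes passed in as plain integers. The argument moves are assumed to be in place already, as in the listing above.

```c
#include <stddef.h>

typedef enum {
    LIR_OTHER, LIR_CALL, LIR_RETURN, LIR_GOTO, LIR_EXTEND_SP, LIR_NOP
} LirOp;

typedef struct {
    LirOp op;
    const char *target;   /* callee or branch target, if any */
    int amount;           /* bytes; used only by LIR_EXTEND_SP */
} LirInsn;

/* Rewrites "call f; return" at positions i and i+1 in place.
   Returns 1 if the pair was rewritten, 0 if it is not a tail call. */
int rewrite_tail_call(LirInsn *code, size_t n, size_t i,
                      int caller_frame, int callee_frame)
{
    if (i + 1 >= n || code[i].op != LIR_CALL || code[i + 1].op != LIR_RETURN)
        return 0;

    const char *callee = code[i].target;
    if (callee_frame > caller_frame) {
        /* Grow the frame by the difference, then branch to the callee. */
        code[i]     = (LirInsn){ LIR_EXTEND_SP, NULL, callee_frame - caller_frame };
        code[i + 1] = (LirInsn){ LIR_GOTO, callee, 0 };
    } else {
        code[i]     = (LirInsn){ LIR_GOTO, callee, 0 };
        code[i + 1] = (LirInsn){ LIR_NOP, NULL, 0 };   /* the return is deleted */
    }
    return 1;
}
```

Reusing the two slots of the call/return pair keeps the sketch free of list insertion; a real pass would work on whatever instruction list the compiler already has.
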
An issue about architecture
• Alpha
  • both “jmp” and “jsr” (jump to subroutine) take registers as operands
• MIPS
  • both “jal” (jump and link) and “j” use a 26-bit absolute target address
• SPARC
  • “call”: 30-bit PC-relative word displacement
  • “ba” (branch always): 22-bit PC-relative word displacement
  • “jmpl”: 32-bit absolute address (sum of two registers)

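On an architecture like SPARC, the call and the plain branch have different reach, so a tail call can only become a “ba” when the target is close enough. A minimal sketch of that range check follows; the helper names `fits_signed` and `word_disp` are assumptions, and instructions are taken to be 4 bytes wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a signed displacement, counted in words, fits in `bits` bits. */
bool fits_signed(int64_t disp_words, unsigned bits)
{
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return disp_words >= lo && disp_words <= hi;
}

/* Displacement in 4-byte words from the branch at `pc` to `target`. */
int64_t word_disp(uint32_t pc, uint32_t target)
{
    return ((int64_t)target - (int64_t)pc) / 4;
}
```

A compiler could test `fits_signed(word_disp(pc, target), 22)` before emitting the short branch and fall back to a register-based jump such as “jmpl” otherwise.
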
Procedure Integration
• Also called ‘automatic inlining’
• Replaces calls with copies of the procedure body
  • call: the effect of the procedure on aliased variables is unknown
  • local code: exposes the effects and enables more optimization
• Can do better than C++ ‘inline’, which relies on the programmer’s intuition

Issues of Procedure Integration (1)
• Range of inlined procedures
  • inlining across compilation units requires saving intermediate-code representations
• Languages of caller and callee (across compilation units)
  • different languages require different parameter-passing conventions
  • “external language_name procedure_name” declaration to specify the source language
• Saving intermediate code of inlined routines
  • depends on the purpose of saving the intermediate code

Issues of Procedure Integration (2)
• A copy of the whole inlined procedure still needs to be compiled when
  • the address of the procedure has been taken
  • there may be calls from other compilation units that are currently invisible
• Inlining recursive procedures
  • inlining until no calls to them remain could be an infinite process
  • inlining once or twice can still be valuable

Which procedures are worth inlining? (1)
• Goal: reduce execution time
• Inlining every procedure
  • decreases the overhead cost of calls
  • increases object code size → more cache misses
  • for recursive procedures, compilation terminates only by exhaustion of resources
• We need heuristics or profiling feedback

Which procedures are worth inlining? (2)
Choose the procedure
• whose body is small,
• that is called from few call sites,
• that is called inside a loop, and
• whose call includes constant-valued parameters

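The four criteria can be combined into a simple predicate. This is only a sketch of one possible heuristic: the `CallInfo` fields and the numeric thresholds are arbitrary illustrative assumptions, not values from the slides.

```c
#include <stdbool.h>

typedef struct {
    int  body_size;       /* callee size, in intermediate-code instructions */
    int  call_sites;      /* number of static calls to the callee */
    bool in_loop;         /* is this call inside a loop? */
    bool has_const_arg;   /* is some argument a compile-time constant? */
} CallInfo;

/* Thresholds are made up for illustration. */
bool worth_inlining(const CallInfo *c)
{
    if (c->call_sites == 1)
        return true;              /* single call site: little or no code growth */

    int budget = 20;              /* base size limit */
    if (c->in_loop)       budget *= 2;   /* call overhead is paid repeatedly */
    if (c->has_const_arg) budget *= 2;   /* constants may simplify the body */
    return c->body_size <= budget;
}
```

Profiling feedback, where available, would replace the fixed budget with measured call counts.
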
How to perform the inlining?
• Three major issues
  • Different parameter-passing conventions
    • “external language_name procedure_name” declaration
    • call-by-reference in Fortran vs. call-by-value in C
  • Name conflicts
    • conflicts between source symbol names
    • detect conflicts and rename the symbols of the called procedure
  • Static variables
    • keep only one copy of each static variable
    • initialize it once

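The static-variable rule can be illustrated by inlining a small routine by hand. In this sketch the names `next_id`, `next_id_counter` and `two_ids` are hypothetical; the point is that the inlined copies and the remaining out-of-line copy all refer to one shared variable that is initialized once.

```c
/* The callee's static variable, hoisted to file scope under a name that
   cannot conflict with the caller's symbols. There is exactly one copy. */
static int next_id_counter = 100;

/* Out-of-line copy, kept for callers that were not inlined. */
int next_id(void)
{
    return next_id_counter++;
}

/* Caller after integration: two inlined copies of next_id's body,
   both using the single shared counter. */
void two_ids(int *first, int *second)
{
    *first  = next_id_counter++;   /* inlined copy 1 */
    *second = next_id_counter++;   /* inlined copy 2 */
}
```

If each inlined copy had its own counter, the two call sites would hand out the same id, which is why only one copy may exist.
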
In-Line Expansion
• similar to procedure integration
• works at a low level (assembly language, machine code)
• enables high-level operations by providing templates
• ex) exchange the values of two registers
    ra ← ra xor rb
    rb ← ra xor rb
    ra ← ra xor rb

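The three-XOR exchange can be tried out in C. This is a sketch with a hypothetical helper name, `xor_swap`; the guard for identical operands is an addition, since the sequence zeroes the value when both operands name the same location.

```c
#include <stdint.h>

/* Exchanges *ra and *rb using the three-XOR sequence from the template. */
void xor_swap(uint32_t *ra, uint32_t *rb)
{
    if (ra == rb) return;   /* same location: XOR with itself would give 0 */
    *ra = *ra ^ *rb;
    *rb = *ra ^ *rb;        /* now holds the original *ra */
    *ra = *ra ^ *rb;        /* now holds the original *rb */
}
```
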
In-Line Expansion
• makes it possible to write an OS or I/O device drivers in a high-level language
• ex) template for DisableInterrupts()
  • functionality: set bit 15 in the PSW
    getpsw ra
    ori ra,0x8000,ra
    setpsw ra

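From the high-level side, such a template behaves like an ordinary function whose body is three instructions. The sketch below simulates it in C: the `psw` variable stands in for the real status register, which is an assumption made so the code can run anywhere.

```c
#include <stdint.h>

/* Simulated processor status word; on real hardware this would be a
   privileged register reached through getpsw/setpsw. */
static uint32_t psw;

#define PSW_INT_DISABLE 0x8000u   /* bit 15 */

static inline void DisableInterrupts(void)
{
    uint32_t ra = psw;        /* getpsw ra         */
    ra |= PSW_INT_DISABLE;    /* ori ra,0x8000,ra  */
    psw = ra;                 /* setpsw ra         */
}
```
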
Providing in-line expansion
• make the assembly-language sequence into a template
• the “inliner” performs the inlining
  • the template specifies real registers
  • register coalescing is needed
• definition of the template
    .template ProcName, ArgBytes, regs=(r1,...,rn)
    ...
    instructions
    ...
    .end

Leaf-Routine Optimization
• leaf routine
  • a leaf node in the call graph of a program
  • a routine that calls no procedures
  • many procedures are leaf routines
• leaf-routine optimization
  • simplify the way parameters are passed
  • remove the procedure prologue / epilogue
  • highly desirable, with little effort

Candidates
• the routine calls no others (obvious)
• architecture-dependent conditions
  • how many registers and how much stack space does the procedure require?
  • requires no more registers than the caller-saved registers
  • requires no stack space (given sufficient registers)

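The candidate test amounts to three comparisons. A minimal sketch, assuming a hypothetical `RoutineInfo` summary that a compiler would fill in after register allocation:

```c
#include <stdbool.h>

typedef struct {
    int calls_made;     /* number of call instructions in the body */
    int regs_needed;    /* registers the routine needs */
    int stack_bytes;    /* stack space needed for locals and spills */
} RoutineInfo;

/* A routine qualifies if it calls nothing, fits in the caller-saved
   registers, and needs no stack space of its own. */
bool is_leaf_candidate(const RoutineInfo *r, int caller_saved_regs)
{
    return r->calls_made == 0
        && r->regs_needed <= caller_saved_regs
        && r->stack_bytes == 0;
}
```

The number of caller-saved registers is the architecture-dependent input; it is passed as a parameter here for that reason.
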
Shrink Wrapping (1)
• moving the prologue and epilogue code to enclose the minimal part of the procedure
• problems
  • the moved code may end up inside a loop
  • many copies of the code may be made
[Figure: prologue and epilogue placement, two variants]

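Shrink Wrapping (2)
[Figure: two flowgraphs over the blocks a > b, a ← 1, c ← a, c ← b. One has the prologue at the entry and the epilogue at the exit; the other has save and restore code placed inside the procedure instead.]
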
Shrink Wrapping (3)
• Define again:
  • move the prologue and epilogue code to enclose the minimal code segments
  • without placing it inside a loop
  • while maintaining correctness
• data-flow analysis
  • similar to the ‘anticipation’ problem in PRE
  • similar to the ‘available expressions’ problem in global CSE

Data-flow analysis (1)
• a register is anticipatable at a point if all execution paths from that point contain definitions or uses of the register
• a register is available at a point if all execution paths to that point include definitions or uses of it
• for basic block i,
  • RUSE(i): set of registers used or defined in block i
  • RANTin(i), RANTout(i): sets of registers anticipatable on entry to / exit from block i
  • RAVin(i), RAVout(i): sets of registers available on entry to / exit from block i

Data-flow analysis (2)
• Anticipatable registers
  • backward problem
  • meet operator: ∩ (intersection)
  • initialization: RANTin(exit) = {}, RANTin(b) = U for every other block b
  • transfer function: RANTin(i) = RUSE(i) ∪ RANTout(i)
• Available registers
  • forward problem
  • meet operator: ∩ (intersection)
  • initialization: RAVout(entry) = {}, RAVout(b) = U for every other block b
  • transfer function: RAVout(i) = RUSE(i) ∪ RAVin(i)
• Representation using bit vectors
  • on a machine with 32 registers, each bit vector fits in a single word

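Both problems can be solved with one iterative bit-vector loop. The sketch below makes several assumptions: the `Flow` structure and `solve` are hypothetical names, at most 16 blocks are supported, and the boundary conditions are applied as RANTout = {} for blocks without successors and RAVin = {} for blocks without predecessors, which matches the initialization above when the entry and exit blocks use no registers.

```c
#include <stdint.h>

#define MAX_BLOCKS 16

typedef struct {
    int n;                        /* blocks 0..n-1 */
    uint32_t ruse[MAX_BLOCKS];    /* RUSE(i), one bit per register */
    uint16_t succ[MAX_BLOCKS];    /* successors of i, one bit per block */
    uint32_t rant_in[MAX_BLOCKS], rant_out[MAX_BLOCKS];
    uint32_t rav_in[MAX_BLOCKS],  rav_out[MAX_BLOCKS];
} Flow;

static int has_edge(const Flow *f, int from, int to)
{
    return (f->succ[from] >> to) & 1;
}

void solve(Flow *f)
{
    /* Start every set at U (all ones) and iterate down to a fixed point. */
    for (int i = 0; i < f->n; i++) {
        f->rant_in[i] = f->rant_out[i] = UINT32_MAX;
        f->rav_in[i]  = f->rav_out[i]  = UINT32_MAX;
    }

    int changed = 1;
    while (changed) {
        changed = 0;

        /* Backward problem: anticipatable registers. */
        for (int i = f->n - 1; i >= 0; i--) {
            uint32_t out = UINT32_MAX;
            int any = 0;
            for (int j = 0; j < f->n; j++)
                if (has_edge(f, i, j)) { out &= f->rant_in[j]; any = 1; }
            if (!any) out = 0;                    /* no successors: {} */
            uint32_t in = f->ruse[i] | out;
            if (in != f->rant_in[i] || out != f->rant_out[i]) changed = 1;
            f->rant_in[i] = in;
            f->rant_out[i] = out;
        }

        /* Forward problem: available registers. */
        for (int i = 0; i < f->n; i++) {
            uint32_t in = UINT32_MAX;
            int any = 0;
            for (int j = 0; j < f->n; j++)
                if (has_edge(f, j, i)) { in &= f->rav_out[j]; any = 1; }
            if (!any) in = 0;                     /* no predecessors: {} */
            uint32_t out = f->ruse[i] | in;
            if (in != f->rav_in[i] || out != f->rav_out[i]) changed = 1;
            f->rav_in[i] = in;
            f->rav_out[i] = out;
        }
    }
}
```

For a diamond-shaped flowgraph in which only one arm uses a register, the register comes out anticipatable on entry to that arm and available on exit from it, but neither at the entry nor at the join, so its save and restore can be confined to that arm.
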
Data-flow analysis (3)
• for register r and block i,
  • insert the save code at the earliest point leading to contiguous blocks that use r
  • where there is no previous save of r
• by symmetry, the same reasoning gives the placement of the restore code

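One way to write these conditions as set equations, consistent with the bullets above, is given here. Treat the exact form as an assumption; U is the set of all registers, and Pred(i) and Succ(i) are the predecessors and successors of block i.

```latex
\begin{align*}
\mathrm{SAVE}(i) &= \bigl(\mathrm{RANTin}(i) - \mathrm{RAVin}(i)\bigr)
  \;\cap \bigcap_{j \in \mathrm{Pred}(i)} \bigl(U - \mathrm{RANTin}(j)\bigr) \\
\mathrm{RSTR}(i) &= \bigl(\mathrm{RAVout}(i) - \mathrm{RANTout}(i)\bigr)
  \;\cap \bigcap_{j \in \mathrm{Succ}(i)} \bigl(U - \mathrm{RAVout}(j)\bigr)
\end{align*}
```

Read the first line as: r is saved on entry to block i if it is anticipatable there, not yet available (so no earlier save is guaranteed), and not anticipatable on entry to any predecessor (so i is the earliest such point). The second line mirrors this for restores on exit from i.
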
Data-flow analysis (4)
• Still suffers from two problems
  • save / restore inside a loop
    • move the save and restore code outward to surround the loop
  • correctness
    • split the edge and move the save code onto it
[Figure: flowgraph with blocks a ← 1, c ← a, c ← b, annotated with two save points and the epilogue]