Feedback directed optimization in Compaq’s compilation tools for Alpha

Robert Cohn (Robert.Cohn@compaq.com)
P. Geoffrey Lowney (Geoff.Lowney@compaq.com)
Compaq Computer Corporation

Feedback directed optimization
• Compilers
  • profiles used to determine frequently executed paths
  • optimization makes common paths fast
  • other paths might be slower
FDO workshop

Feedback directed optimization
• Mature and powerful classical optimizer
  • leverage existing optimizations
• Feedback directed optimizations:
  • Augment cost model with profile information
  • Simple feedback directed restructuring
    • enables classical optimizations
• FDO 1% of compiler

Feedback directed optimization in the tool chain
• Compiler
  • FrontEnd: Source → IL
  • Optimizer: IL → IL
  • CodeGen: IL → obj
• Linker: obj → bin
• Bin Opt: bin → bin
• Optimizations: inliner, tracer, switch, commando, loe, switch, real flow, register allocation, scheduling, layout, alignment

Profile information
• Basic block counts
  • pixie: instrumentation
  • DCPI: statistical sampling
• Call edge counts computed from basic block counts
• Flow edge counts estimated from basic block counts
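
The slide does not say how edge counts are derived, so the following is only a minimal sketch of the cases that basic block counts determine exactly. It assumes every control transfer appears in the edge list: a call executes as often as the block containing it, an edge that is the only way out of its source block carries that block's count, and an edge that is the only way into its destination carries the destination's count. The names `edge`, `call_edge_count`, and `estimate_edges` are hypothetical; any other edge is left as -1, where a real tool would need further heuristics.

```c
#include <stddef.h>

/* Hypothetical edge record: src and dst index into a block-count array. */
struct edge {
    int  src, dst;
    long count;   /* set by estimate_edges; -1 means "not determined" */
};

/* A call site runs once per execution of the block that contains it. */
long call_edge_count(const long *block_count, int call_block)
{
    return block_count[call_block];
}

/* Number of edges leaving block b (out != 0) or entering it (out == 0). */
static int degree(const struct edge *e, size_t n, int b, int out)
{
    int d = 0;
    for (size_t i = 0; i < n; i++)
        if ((out ? e[i].src : e[i].dst) == b)
            d++;
    return d;
}

/* Fill in the flow edge counts that block counts alone pin down. */
void estimate_edges(const long *block_count, struct edge *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (degree(e, n, e[i].src, 1) == 1)
            e[i].count = block_count[e[i].src];   /* only exit of src */
        else if (degree(e, n, e[i].dst, 0) == 1)
            e[i].count = block_count[e[i].dst];   /* only entry of dst */
        else
            e[i].count = -1;                      /* needs a heuristic */
    }
}
```

For a diamond A → {B, C} → D with block counts 100, 90, 10, 100, every edge falls into one of the two determined cases.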

Procedure inliner
• Static heuristics estimate the benefit of inlining a call site:
  • code size, register pressure, constant arguments, number of static callers
• Frequency of execution:
  • lower or raise desirability
  • number of dynamic callers
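
One way to picture how frequency can lower or raise a static estimate is the sketch below. It is not the heuristic used in the Compaq compiler: the `call_site` fields, the weights, and the 10% hot threshold are all assumptions chosen for illustration, and register pressure is left out entirely.

```c
/* Hypothetical summary of one call site; every field and weight is an
   assumption made for this sketch. */
struct call_site {
    int  callee_size;     /* instructions in the callee */
    int  const_args;      /* arguments that are compile-time constants */
    int  static_callers;  /* call sites in the program targeting the callee */
    long call_count;      /* profiled executions of this call site */
};

/* Static estimate: constants help, code growth hurts. */
static int static_score(const struct call_site *cs)
{
    int score = 10 * cs->const_args - cs->callee_size;
    if (cs->static_callers == 1)
        score += 50;      /* sole caller: no out-of-line copy is needed */
    return score;
}

/* Profile feedback raises the score of hot sites and lowers cold ones. */
int should_inline(const struct call_site *cs, long total_calls)
{
    int score = static_score(cs);
    if (cs->call_count == 0)
        score -= 40;                                /* never executed */
    else if (cs->call_count * 10 >= total_calls)
        score += 40;                                /* >= 10% of all calls */
    return score > 0;
}
```

With these made-up weights, a 30-instruction callee with no constant arguments is inlined at a site that accounts for half of all calls and rejected at a site that never ran.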

Tracer
• Transforms complicated control flow to superblocks
  • single entrance, multiple exit code sequence
• Benefit from larger superblocks:
  • bigger scheduling unit
  • isolation of infrequently executed paths

Tracer: superblock formation
• Pick trace N1,…, Nn
• Change trace to superblock
  • visit nodes N2,…, Nn
  • if > 1 predecessor
    • copy node and outgoing edges
    • redirect incoming trace edge to copy
[Figure: control flow graph with nodes A, B, C, D, E; successive builds add the copy C1 and then the copy E1]
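
The steps listed on this slide can be sketched on a small array-based control flow graph. The `cfg` representation, the two-successor limit, and the function names are assumptions for illustration, not the compiler's data structures.

```c
#define MAX_NODES 16
#define MAX_SUCC  2

/* Hypothetical CFG: each node has up to two successors, -1 = none. */
struct cfg {
    int n;                            /* nodes in use */
    int succ[MAX_NODES][MAX_SUCC];
};

/* Count incoming edges of a node. */
static int pred_count(const struct cfg *g, int node)
{
    int c = 0;
    for (int i = 0; i < g->n; i++)
        for (int s = 0; s < MAX_SUCC; s++)
            if (g->succ[i][s] == node)
                c++;
    return c;
}

/* Turn trace[0..len-1] into a superblock. trace[] is updated in place so
   that it names the copies. Returns 0, or -1 if the node table is full. */
int form_superblock(struct cfg *g, int *trace, int len)
{
    for (int i = 1; i < len; i++) {           /* visit N2..Nn */
        int node = trace[i];
        if (pred_count(g, node) <= 1)
            continue;                         /* already single entrance */
        if (g->n >= MAX_NODES)
            return -1;
        int copy = g->n++;
        for (int s = 0; s < MAX_SUCC; s++)    /* copy node and outgoing edges */
            g->succ[copy][s] = g->succ[node][s];
        int prev = trace[i - 1];
        for (int s = 0; s < MAX_SUCC; s++)    /* redirect incoming trace edge */
            if (g->succ[prev][s] == node)
                g->succ[prev][s] = copy;
        trace[i] = copy;
    }
    return 0;
}
```

On a hypothetical graph with edges A→C, A→B, B→C, C→E, C→D, D→E and the trace A, C, E, the routine creates one copy of C and one copy of E, and each copy ends up with exactly one predecessor. The originals stay in place to serve the off-trace paths.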

Tracer: loop peeling
• Pull 1 or 2 iterations out of loop
• Implemented as superblock formation
Before:
    do { p = p->n; } while (p != a);
    return p;
After:
    p = p->n;
    if (p == a) goto L1;
    do { p = p->n; } while (p != a);
    L1: return p;

Tracer: loop peeling
• Pick trace N1,…, Nn
• Change trace to superblock
  • visit nodes N2,…, Nn
  • if > 1 predecessor
    • copy node and outgoing edges
    • redirect incoming trace edge to copy
[Figure: control flow graph with nodes A, B, C annotated with execution counts of 0 and 1; successive builds add the copies B1, B2, and C1]

Commando loop optimization
• Restructure loop
  • frequent paths are in inner loop
  • infrequent paths moved to outer loop
• Create opportunities for classical optimizations
  • loop invariant removal
  • register allocation
• Generalization of superblock loop optimization

Commando loop optimization
• Make two loop bottoms
  • Redirect infrequent back edges to one and the rest to the other
• Add loop preheader
  • Infrequent loop bottom targets preheader
  • Frequent loop bottom targets loop top
[Figure: control flow graph of a loop with nodes Q, R, S, T, U, V, W and counts 0, 4, 40, 42; successive builds add nodes H and C, then P above Q with the counts 4 and 82, and label the loop headed by Q "Inner loop"]
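
The compiler performs this restructuring on the control flow graph, but the effect can be approximated at source level. In the sketch below (function names and the example loop are hypothetical), the infrequent path is the only one that writes `*scale`. Once it is moved to an outer loop, the load of `*scale` becomes invariant in the inner loop and can sit in the preheader.

```c
/* Original loop: frequent and infrequent paths share one back edge. */
long sum_scaled(const int *v, int n, int *scale)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            *scale += 1;                   /* infrequent path */
        else
            sum += (long)v[i] * *scale;    /* frequent path */
    }
    return sum;
}

/* Restructured: the inner loop holds only the frequent path. */
long sum_scaled_commando(const int *v, int n, int *scale)
{
    long sum = 0;
    int i = 0;
    while (i < n) {                /* outer loop */
        int s = *scale;            /* preheader: invariant load hoisted */
        for (; i < n; i++) {       /* inner loop: frequent bottom -> loop top */
            if (v[i] < 0)
                break;             /* leave via the infrequent path */
            sum += (long)v[i] * s;
        }
        if (i < n) {               /* infrequent path, now in the outer loop */
            *scale += 1;
            i++;                   /* infrequent bottom -> back to preheader */
        }
    }
    return sum;
}
```

Both functions return the same sum and leave `*scale` with the same value; the second keeps `s` in a register for as long as the frequent path keeps running.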

Code layout
• Place code to improve:
  • instruction cache utilization
  • memory working set
  • instruction prefetch
• Pettis and Hansen
  • Basic block chaining
  • Routine ordering
  • Routine splitting
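
Basic block chaining in the Pettis and Hansen style can be sketched as a greedy pass over flow edges from heaviest to lightest: an edge joins two chains only when its source is the tail of one chain and its destination is the head of a different one, so the hot successor becomes the fall-through. The `flow_edge` layout, the `MAX_BLOCKS` limit, and the function names are assumptions for this sketch; ordering the finished chains is not shown.

```c
#include <stdlib.h>

#define MAX_BLOCKS 64

/* Hypothetical flow edge with a profiled weight. */
struct flow_edge { int src, dst; long weight; };

/* qsort comparator: heaviest edge first. */
static int by_weight_desc(const void *a, const void *b)
{
    const struct flow_edge *x = a, *y = b;
    return (x->weight < y->weight) - (x->weight > y->weight);
}

/* On return, next[b] is the block laid out right after b in its chain,
   or -1 if b ends a chain. Sorts e[] in place. Returns -1 if n_blocks
   exceeds MAX_BLOCKS, else 0. */
int chain_blocks(struct flow_edge *e, int n_edges, int n_blocks, int *next)
{
    int has_pred[MAX_BLOCKS];
    if (n_blocks > MAX_BLOCKS)
        return -1;
    for (int b = 0; b < n_blocks; b++) {
        next[b] = -1;
        has_pred[b] = 0;
    }
    qsort(e, (size_t)n_edges, sizeof *e, by_weight_desc);
    for (int i = 0; i < n_edges; i++) {
        int s = e[i].src, d = e[i].dst;
        if (s == d || next[s] != -1 || has_pred[d])
            continue;              /* s is not a tail or d is not a head */
        int tail = d;
        while (next[tail] != -1)
            tail = next[tail];
        if (tail == s)
            continue;              /* same chain: joining would form a cycle */
        next[s] = d;
        has_pred[d] = 1;
    }
    return 0;
}
```

For a diamond whose hot side is A → B → D, the result is the chain A, B, D, with the cold block C left as a chain of its own.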

Switch statement optimization
• C switch statement
• test for most frequent case first
Before:
    switch (a) {
    case 1: return 3;
    case 2: return 4;
    case 4: return 5;
    }
After:
    if (a == 4) return 5;
    else switch (a) {
    case 1: return 3;
    case 2: return 4;
    }

Evaluation
• DS20: 500 MHz 21264
• SPECint95
  • train: train workload
  • time: ref workload
• Aggressive optimization for baseline
• Median of 9 runs

[Chart: Speedup by optimization]

[Chart: Speedup for inlining]

[Chart: Code layout]

[Chart: Tracer]

[Chart: Commando]

[Chart: Loop unroller]

[Chart: Switch optimization]

[Chart: Code growth by optimization]

Summary and conclusions
• FDO is effective: 17% speedup
• Complement to a strong classical optimizer
  • augment cost model of static optimization
  • simple restructuring transformations
• Inlining is most important
  • Reduces code size

Acknowledgements
• Gene Albert
• Michael Adler, David Blickstein, Peter Craig, Caroline Davidson, Neil Faiman, Kent Glossop, David Goodwin, Rich Grove, Lucy Hamnett, Steve Hobbs, Bob Nix, Bill Noyce, and John Pieper
