Explore feedback-directed optimization in Compaq's compilation tools for Alpha processors. Learn about profiles used, superblock formation, loop peeling, and more.
Feedback directed optimization in Compaq’s compilation tools for Alpha
Robert Cohn (Robert.Cohn@compaq.com)
P. Geoffrey Lowney (Geoff.Lowney@compaq.com)
Compaq Computer Corporation
Feedback directed optimization
• Compilers
  • profiles used to determine frequently executed paths
  • optimization makes common paths fast
  • other paths might be slower
FDO workshop
Feedback directed optimization
• Mature and powerful classical optimizer
  • leverage existing optimizations
• Feedback directed optimizations:
  • Augment cost model with profile information
  • Simple feedback directed restructuring
    • enables classical optimizations
• FDO is 1% of the compiler
Feedback directed optimization in the tool chain
[Figure: tool chain with the stages Compiler (FrontEnd, Optimizer, CodeGen), Linker, and Bin Opt, connected by Source, IL, obj, and bin files. Components labeled in the figure: inliner, tracer, switch, commando, loe, real flow, register allocation, scheduling, layout, alignment.]
Profile information
• Basic block counts
  • pixie: instrumentation
  • DCPI: statistical sampling
• Call edge counts computed from basic block counts
• Flow edge counts estimated from basic block counts
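As a rough illustration of how edge counts can follow from block counts: a call executes once each time its enclosing block executes, so a call edge count can be read directly from that block's count. For flow edges, one case that can be resolved exactly is a two-way branch whose one target has no other predecessor. The sketch below covers only that case; the names `BranchEdges` and `split_branch` are hypothetical, and the slides do not describe the estimator the tools actually use.

```c
#include <assert.h>

/* Edge counts for a two-way branch (hypothetical type for illustration). */
typedef struct {
    long to_sole_pred_target;  /* edge into the target with no other predecessor */
    long to_other_target;      /* the remaining outgoing edge */
} BranchEdges;

/* If one branch target is reached only from the branching block, the edge
 * into it must carry that target's whole count; the other edge carries
 * whatever is left of the branching block's count. Assumes exact counts;
 * sampled counts may violate the precondition and would need clamping. */
BranchEdges split_branch(long branch_block_count, long sole_pred_target_count)
{
    assert(sole_pred_target_count >= 0);
    assert(sole_pred_target_count <= branch_block_count);
    BranchEdges e;
    e.to_sole_pred_target = sole_pred_target_count;
    e.to_other_target = branch_block_count - sole_pred_target_count;
    return e;
}
```

When neither target has a single predecessor, the block counts alone do not determine the edge counts, which is presumably why the slide says "estimated".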
Procedure inliner
• Static heuristics: estimate the benefit of inlining a call site:
  • code size, register pressure, constant arguments, number of static callers
• Frequency of execution:
  • lower or raise desirability
  • number of dynamic callers
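One way to picture "lower or raise desirability" is a static score that the profile then adjusts. The sketch below is purely illustrative: the `CallSite` fields, the weights, and the 10% hotness threshold are all invented for this example and are not taken from the Compaq inliner. Register pressure is left out for brevity.

```c
#include <assert.h>

/* Hypothetical description of one call site. */
typedef struct {
    int  callee_size;     /* static estimate of callee size, in instructions */
    int  constant_args;   /* arguments that are compile-time constants */
    int  static_callers;  /* call sites in the program that name the callee */
    long site_count;      /* profile: executions of this call site */
    long hottest_count;   /* profile: executions of the hottest call site */
} CallSite;

/* Static score first, then a profile adjustment. All weights are made up. */
int inline_desirability(const CallSite *cs)
{
    assert(cs->hottest_count > 0);
    assert(cs->site_count >= 0 && cs->site_count <= cs->hottest_count);

    /* Static heuristics: small callees and constant arguments help. */
    int score = 40 - cs->callee_size + 10 * cs->constant_args;
    if (cs->static_callers == 1)
        score += 20;      /* sole caller: the out-of-line copy can go away */

    /* Frequency of execution lowers or raises the static score. */
    if (cs->site_count == 0)
        score -= 50;      /* never executed in the training run */
    else if (cs->site_count * 10 >= cs->hottest_count)
        score += 50;      /* at least 10% as hot as the hottest site */
    return score;
}
```

A caller would inline sites whose score exceeds some budget-dependent cutoff; choosing that cutoff is outside the scope of this sketch.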
Tracer
• Transforms complicated control flow to superblocks
  • single entrance, multiple exit code sequence
• Benefit from larger superblocks:
  • bigger scheduling unit
  • isolation of infrequently executed paths
Tracer: superblock formation
• Pick trace N1,…, Nn
• Change trace to superblock
  • visit nodes N2,…, Nn
  • if > 1 predecessor
    • copy node and outgoing edges
    • redirect incoming trace edge to copy
[Figure: control flow graph with nodes A, B, C, D, and E. Successive builds add the copies C1 and E1 as the trace is converted to a superblock.]
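The steps on this slide can be sketched in C over a small adjacency-list graph. This is a minimal sketch, not the tracer's code: the `Cfg` layout, the fixed size limits, and the function names are assumptions made for the example, and trace selection (picking N1,…, Nn from the profile) is not shown.

```c
#include <assert.h>

#define MAX_NODES 32
#define MAX_SUCCS 2

/* Hypothetical control flow graph: each node has up to two successors. */
typedef struct {
    int n_nodes;
    int n_succs[MAX_NODES];
    int succ[MAX_NODES][MAX_SUCCS];
} Cfg;

/* Count the edges that enter `node`. */
static int pred_count(const Cfg *g, int node)
{
    int n = 0;
    for (int u = 0; u < g->n_nodes; u++)
        for (int k = 0; k < g->n_succs[u]; k++)
            if (g->succ[u][k] == node)
                n++;
    return n;
}

/* Turn trace[0..len-1] into a superblock. trace[] is updated in place so
 * that it names the nodes of the superblock (copies replace originals). */
void form_superblock(Cfg *g, int trace[], int len)
{
    for (int i = 1; i < len; i++) {        /* visit N2..Nn */
        int node = trace[i];
        int prev = trace[i - 1];           /* may itself be a copy */
        if (pred_count(g, node) <= 1)
            continue;                      /* no side entrance */

        assert(g->n_nodes < MAX_NODES);
        int copy = g->n_nodes++;

        /* Copy the node and its outgoing edges. */
        g->n_succs[copy] = g->n_succs[node];
        for (int k = 0; k < g->n_succs[node]; k++)
            g->succ[copy][k] = g->succ[node][k];

        /* Redirect the incoming trace edge to the copy. */
        for (int k = 0; k < g->n_succs[prev]; k++)
            if (g->succ[prev][k] == node)
                g->succ[prev][k] = copy;

        trace[i] = copy;
    }
}
```

Once one node on the trace has been copied, every later trace node gains an extra predecessor (the copy), so it is copied too. Running the same routine on a trace that revisits a loop header, for example A, B, B, C where B branches to itself, peels iterations off the loop.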
Tracer: loop peeling
• Pull 1 or 2 iterations out of loop
• Implemented as superblock formation
Before:
    do { p = p->n; } while (p != a);
    return p;
After peeling one iteration:
    p = p->n;
    if (p == a) goto L1;
    do { p = p->n; } while (p != a);
    L1: return p;
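To make the transformation concrete, here is the loop from this slide wrapped in two complete functions, so the original and peeled forms can be compiled and compared. The `struct node` definition and the function names are assumptions added for the example; only the loop bodies come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the slide only shows the field `n`. */
struct node {
    struct node *n;
};

/* Original loop: follow `n` links until `a` is reached. */
struct node *advance_original(struct node *p, const struct node *a)
{
    assert(p != NULL && a != NULL);
    do {
        p = p->n;
    } while (p != a);
    return p;
}

/* Same loop with the first iteration peeled off. When the loop usually
 * runs once, the common path is straight-line code that skips the loop. */
struct node *advance_peeled(struct node *p, const struct node *a)
{
    assert(p != NULL && a != NULL);
    p = p->n;                 /* peeled iteration */
    if (p == a)
        goto L1;
    do {
        p = p->n;
    } while (p != a);
L1:
    return p;
}
```

Both functions return the same pointer for any list in which `a` is reachable from `p->n`; the peeled version only changes which instructions the first iteration executes.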
Tracer: loop peeling
• Pick trace N1,…, Nn
• Change trace to superblock
  • visit nodes N2,…, Nn
  • if > 1 predecessor
    • copy node and outgoing edges
    • redirect incoming trace edge to copy
[Figure: control flow graph with nodes A, B, and C, annotated with edge counts of 0 and 1. Successive builds add the copies B1, B2, and C1.]
Commando loop optimization
• Restructure loop
  • frequent paths are in inner loop
  • infrequent paths moved to outer loop
• Create opportunities for classical optimizations
  • loop invariant removal
  • register allocation
• Generalization of superblock loop optimization
Commando loop optimization
• Make two loop bottoms
• Redirect infrequent back edges to one and the rest to other
• Add loop preheader
• Infrequent loop bottom targets preheader
• Frequent loop bottom targets loop top
[Figure: loop with nodes Q, R, S, T, U, V, and W and edge counts 0, 4, 40, and 42. Successive builds add nodes H and C, then node P with counts 4 and 82, and finally mark the inner loop.]
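A source-level analogue may help show why this restructuring exposes classical optimizations. In the sketch below, the single loop has a rare path that modifies `scale`; in the nested form the rare path sits in the outer loop, so `scale` is invariant in the inner loop and can be held in a register. The functions and the example computation are invented for illustration; the compiler performs the transformation on its control flow graph, not on source code.

```c
#include <assert.h>
#include <stddef.h>

/* Single loop: the rare branch (negative element) changes `scale`,
 * so `scale` cannot be treated as loop invariant. */
long sum_single_loop(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long sum = 0;
    int scale = 1;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0)
            scale = -v[i];                 /* infrequent path */
        else
            sum += (long)scale * v[i];     /* frequent path */
    }
    return sum;
}

/* Nested form: frequent path in the inner loop, infrequent path in the
 * outer loop, which re-enters the inner loop through a preheader. */
long sum_nested_loops(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    long sum = 0;
    int scale = 1;
    size_t i = 0;
    while (i < n) {
        const int s = scale;               /* preheader: hoisted invariant */
        while (i < n && v[i] >= 0) {       /* inner loop: frequent path only */
            sum += (long)s * v[i];
            i++;
        }
        if (i < n) {                       /* infrequent loop bottom */
            scale = -v[i];
            i++;
        }
    }
    return sum;
}
```

The two functions compute the same result; the difference is that the inner loop of the second has no store to `scale`, which is the kind of opportunity the slide lists under loop invariant removal and register allocation.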
Code layout
• Place code to improve:
  • instruction cache utilization
  • memory working set
  • instruction prefetch
• Pettis and Hansen
  • Basic block chaining
  • Routine ordering
  • Routine splitting
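Basic block chaining can be sketched as a greedy pass over flow edges from hottest to coldest, linking two blocks whenever the edge runs from the tail of one chain to the head of another. The code below is a simplified sketch in the style of Pettis and Hansen, not the layout tool's implementation: the `Edge` type, `MAX_BLOCKS`, and `chain_blocks` are assumptions, and the later step that orders the finished chains is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BLOCKS 64

/* Hypothetical flow edge with its profile count. */
typedef struct {
    int src, dst;
    long count;
} Edge;

/* qsort comparator: larger counts sort first. */
static int by_count_desc(const void *a, const void *b)
{
    const Edge *x = a, *y = b;
    return (x->count < y->count) - (x->count > y->count);
}

/* For each block b, next[b] receives the block laid out directly after b
 * in its chain, or -1 if b ends a chain. Sorts edges[] in place. */
void chain_blocks(Edge *edges, size_t n_edges, int n_blocks, int next[])
{
    int prev[MAX_BLOCKS];
    int chain[MAX_BLOCKS];     /* chain id of each block */

    assert(n_blocks > 0 && n_blocks <= MAX_BLOCKS);
    for (int b = 0; b < n_blocks; b++) {
        next[b] = -1;
        prev[b] = -1;
        chain[b] = b;          /* every block starts as its own chain */
    }

    qsort(edges, n_edges, sizeof edges[0], by_count_desc);

    for (size_t e = 0; e < n_edges; e++) {
        int s = edges[e].src, d = edges[e].dst;
        /* Link only a chain tail to the head of a different chain. */
        if (next[s] != -1 || prev[d] != -1 || chain[s] == chain[d])
            continue;
        next[s] = d;
        prev[d] = s;
        int absorbed = chain[d];
        for (int b = 0; b < n_blocks; b++)
            if (chain[b] == absorbed)
                chain[b] = chain[s];
    }
}
```

The effect is that the hottest successor of a block tends to become its fall-through, which serves the instruction cache and prefetch goals listed on the slide.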
Switch statement optimization
• For a C switch statement, test for the most frequent case first
Before:
    switch (a) { case 1: return 3; case 2: return 4; case 4: return 5; }
After:
    if (a == 4) return 5;
    else switch (a) { case 1: return 3; case 2: return 4; }
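The two forms on this slide can be wrapped in functions and checked against each other. The function names, the `return 0` fall-through for values that match no case, and the `check_equivalent` helper are assumptions added so the example compiles on its own; the slide shows only the switch bodies.

```c
#include <assert.h>

/* Original form: all cases dispatched by the switch. */
int lookup_original(int a)
{
    switch (a) {
    case 1: return 3;
    case 2: return 4;
    case 4: return 5;
    }
    return 0;   /* assumed result when no case matches */
}

/* Profile says a == 4 is the most frequent case, so test it first. */
int lookup_hot_case_first(int a)
{
    if (a == 4)
        return 5;
    else
        switch (a) {
        case 1: return 3;
        case 2: return 4;
        }
    return 0;   /* assumed result when no case matches */
}

/* Assert that both forms agree for every a in [lo, hi]. */
void check_equivalent(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        assert(lookup_original(a) == lookup_hot_case_first(a));
}
```

The transformation pays off only when the profile is representative: if another case dominates at run time, the extra comparison sits on the hot path.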
Evaluation
• DS20: 500 MHz 21264
• SPECint95
  • train: train workload
  • time: ref workload
• Aggressive optimization for baseline
• Median of 9 runs
[Chart: Speedup by optimization]
[Chart: Speedup for inlining]
[Chart: Code layout]
[Chart: Tracer]
[Chart: Commando]
[Chart: Loop unroller]
[Chart: Switch optimization]
[Chart: Code growth by optimization]
Summary and conclusions
• FDO is effective: 17% speedup
• Complement to a strong classical optimizer
  • augment cost model of static optimization
  • simple restructuring transformations
• Inlining is most important
• Reduces code size
Acknowledgements
• Gene Albert
• Michael Adler, David Blickstein, Peter Craig, Caroline Davidson, Neil Faiman, Kent Glossop, David Goodwin, Rich Grove, Lucy Hamnett, Steve Hobbs, Bob Nix, Bill Noyce, and John Pieper