
Feedback directed optimization in Compaq’s compilation tools for Alpha

Explore feedback-directed optimization in Compaq's compilation tools for Alpha processors. Learn about profiles used, superblock formation, loop peeling, and more.


Presentation Transcript


  1. Feedback directed optimization in Compaq’s compilation tools for Alpha
     Robert Cohn (Robert.Cohn@compaq.com)
     P. Geoffrey Lowney (Geoff.Lowney@compaq.com)
     Compaq Computer Corporation

  2. Feedback directed optimization
     • Compilers
       • profiles used to determine frequently executed paths
       • optimization makes common paths fast
       • other paths might be slower
     FDO workshop

  3. Feedback directed optimization
     • Mature and powerful classical optimizer
       • leverage existing optimizations
     • Feedback directed optimizations:
       • Augment cost model with profile information
       • Simple feedback directed restructuring
         • enables classical optimizations
     • FDO 1% of compiler

  4. Feedback directed optimization in the tool chain
     [Diagram: stages of the tool chain]
     • Compiler
       • FrontEnd: Source → IL
       • Optimizer: IL → IL
       • CodeGen: IL → obj
     • Linker: obj → bin
     • Bin Opt: bin → bin
     [Diagram labels for the feedback directed components: inliner, tracer, switch, commando, loe, switch, real flow, register allocation, scheduling, layout, alignment]

  5. Profile information
     • Basic block counts
       • pixie: instrumentation
       • DCPI: statistical sampling
     • Call edge counts computed from basic block counts
     • Flow edge counts estimated from basic block counts
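The slides do not say how flow edge counts are estimated. One plausible approach, sketched here under that assumption, is flow conservation: the counts on a block's outgoing edges (and likewise on its incoming edges) sum to the block's count, so an edge is determined whenever it is the only unknown on one side of a block. The `Edge` type and `estimate_edge_counts` are hypothetical names, not part of the Compaq tools.

```c
#include <stdbool.h>

typedef struct {
    int  src, dst;   /* indices of the source and destination blocks */
    long count;      /* estimated execution count of the edge */
    bool known;      /* true once the count has been determined */
} Edge;

/*
 * Propagate basic block counts onto flow edges.
 * Assumption: the entry block has no incoming edges and the exit block has
 * no outgoing edges, so every block with edges on a side obeys
 * "sum of edge counts on that side == block count".
 * Returns the number of edges that stay undetermined; a real tool would
 * need a heuristic for those.
 */
int estimate_edge_counts(const long *block_count, int nblocks,
                         Edge *edges, int nedges)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < nblocks; b++) {
            for (int dir = 0; dir < 2; dir++) {  /* 0: out-edges, 1: in-edges */
                long known_sum = 0;
                int unknown = -1, n_unknown = 0;
                for (int e = 0; e < nedges; e++) {
                    int end = (dir == 0) ? edges[e].src : edges[e].dst;
                    if (end != b)
                        continue;
                    if (edges[e].known) {
                        known_sum += edges[e].count;
                    } else {
                        unknown = e;
                        n_unknown++;
                    }
                }
                if (n_unknown == 1) {            /* exactly one edge left */
                    long c = block_count[b] - known_sum;
                    edges[unknown].count = c < 0 ? 0 : c;  /* clamp sampling noise */
                    edges[unknown].known = true;
                    changed = true;
                }
            }
        }
    }
    int remaining = 0;
    for (int e = 0; e < nedges; e++)
        if (!edges[e].known)
            remaining++;
    return remaining;
}
```

For a diamond A → {B, C} → D with block counts 100, 70, 30, 100, the sketch assigns 70 to both edges through B and 30 to both edges through C.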

  6. Procedure inliner
     • Static heuristics estimate the benefit of inlining a call site:
       • code size, register pressure, constant arguments, number of static callers
     • Frequency of execution:
       • lowers or raises desirability
       • number of dynamic callers

  7. Tracer
     • Transforms complicated control flow to superblocks
       • single entrance, multiple exit code sequence
     • Benefit from larger superblocks:
       • bigger scheduling unit
       • isolation of infrequently executed paths

  8-13. Tracer: superblock formation
     • Pick trace N1,…, Nn
     • Change trace to superblock
       • visit nodes N2,…, Nn
       • if > 1 predecessor
         • copy node and outgoing edges
         • redirect incoming trace edge to copy
     [Animated figure: control flow graph with nodes A, B, C, D, E; the build adds copy C1 of C, then copy E1 of E]
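The superblock formation steps above can be sketched on a small array-based control flow graph. This is an illustration only: the `Cfg` and `Node` types, the fixed limits, and `form_superblock` are hypothetical, and the example graph in the usage note is not the one from the slide's figure.

```c
#include <stdbool.h>

#define MAX_NODES 32
#define MAX_SUCCS 4

typedef struct {
    int nsuccs;
    int succ[MAX_SUCCS];   /* indices of successor nodes */
} Node;

typedef struct {
    int  nnodes;
    Node node[MAX_NODES];
} Cfg;

/* Count the edges that enter node n. */
static int count_preds(const Cfg *g, int n)
{
    int c = 0;
    for (int i = 0; i < g->nnodes; i++)
        for (int j = 0; j < g->node[i].nsuccs; j++)
            if (g->node[i].succ[j] == n)
                c++;
    return c;
}

/*
 * Turn trace[0..len-1] into a superblock by tail duplication.
 * trace[] is updated in place to name the copies that now form the trace.
 * Returns false if the graph runs out of node slots.
 */
bool form_superblock(Cfg *g, int *trace, int len)
{
    for (int i = 1; i < len; i++) {          /* visit nodes N2..Nn */
        int n = trace[i];
        if (count_preds(g, n) <= 1)
            continue;                        /* no side entrance here */
        if (g->nnodes == MAX_NODES)
            return false;
        int copy = g->nnodes++;
        g->node[copy] = g->node[n];          /* copy node and outgoing edges */
        Node *prev = &g->node[trace[i - 1]];
        for (int j = 0; j < prev->nsuccs; j++)
            if (prev->succ[j] == n)
                prev->succ[j] = copy;        /* redirect incoming trace edge */
        trace[i] = copy;
    }
    return true;
}
```

With edges A→B, A→D, B→C, D→C, C→E and the trace A, B, C, E, the sketch creates a copy of C (the side entrance from D stays on the original) and then a copy of E, because copying C gives E a second predecessor. The resulting trace has a single entrance at A.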

  14. Tracer: loop peeling
     • Pull 1 or 2 iterations out of loop
     • Implemented as superblock formation
     Before:
         do { p = p->n; } while (p != a);
         return p;
     After peeling one iteration:
         p = p->n;
         if (p == a) goto L1;
         do { p = p->n; } while (p != a);
         L1: return p;

  15-22. Tracer: loop peeling
     • Pick trace N1,…, Nn
     • Change trace to superblock
       • visit nodes N2,…, Nn
       • if > 1 predecessor
         • copy node and outgoing edges
         • redirect incoming trace edge to copy
     [Animated figure: graph with nodes A, B, C annotated with edge counts; the build adds copies B1 and B2, then copy C1]
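The peeling example from slide 14 can be written out as two complete functions to check that the peeled form returns the same result. The `Cell` type and the function names are assumptions added for illustration; the loop bodies follow the slide.

```c
typedef struct Cell {
    struct Cell *n;   /* next cell, as in the slide's p->n */
} Cell;

/* Original loop: advance p until it reaches a. */
Cell *walk(Cell *p, Cell *a)
{
    do {
        p = p->n;
    } while (p != a);
    return p;
}

/* Same loop with the first iteration peeled off in front of it. */
Cell *walk_peeled(Cell *p, Cell *a)
{
    p = p->n;              /* peeled iteration */
    if (p == a)
        goto L1;           /* common case when the loop runs once */
    do {
        p = p->n;
    } while (p != a);
L1:
    return p;
}
```

When the profile says the loop usually executes a single iteration, the peeled copy becomes straight-line code on the frequent path and the remaining loop is only reached in the infrequent case.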

  23. Commando loop optimization
     • Restructure loop
       • frequent paths are in inner loop
       • infrequent paths moved to outer loop
     • Create opportunities for classical opt.
       • loop invariant removal
       • register allocation
     • Generalization of superblock loop optimization

  24-30. Commando loop optimization
     • Make two loop bottoms
     • Redirect infrequent back edges to one bottom and the rest to the other
     • Add loop preheader
     • Infrequent loop bottom targets preheader
     • Frequent loop bottom targets loop top
     [Animated figure: loop with nodes Q, R, S, T, U, V, W and edge counts 0, 42, 40, 4; the build adds nodes H and C, then node P with counts 4 and 82, and finally the label "Inner loop"]
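A source-level analogy may help show what the restructuring buys. The compiler performs it on the control flow graph, not on source code, and the functions below are hypothetical examples rather than anything from the slides. In the original loop the infrequent path changes `scale`, so `scale` is not loop invariant. After the split into an inner loop (frequent path) and an outer loop (infrequent path), `scale` is invariant in the inner loop and can be read once in the preheader.

```c
/* Original: a single loop whose infrequent path updates scale. */
long sum_scaled(const int *a, int n)
{
    long sum = 0;
    int scale = 1;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) {               /* infrequent path */
            scale = -a[i];
            continue;
        }
        sum += (long)a[i] * scale;    /* frequent path */
    }
    return sum;
}

/* Restructured: frequent path in the inner loop, infrequent path outside. */
long sum_scaled_commando(const int *a, int n)
{
    long sum = 0;
    int scale = 1;
    int i = 0;
    while (i < n) {                   /* outer loop */
        const int s = scale;          /* preheader: invariant in inner loop */
        while (i < n && a[i] >= 0) {  /* inner loop, frequent back edge */
            sum += (long)a[i] * s;
            i++;
        }
        if (i < n) {                  /* infrequent path */
            scale = -a[i];
            i++;                      /* back edge goes to the preheader */
        }
    }
    return sum;
}
```

For the input `{1, 2, -3, 4, 5, -2, 6}` both functions return 42.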

  31. Code layout
     • Place code to improve:
       • instruction cache utilization
       • memory working set
       • instruction prefetch
     • Pettis and Hansen
     • Basic block chaining
     • Routine ordering
     • Routine splitting
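Basic block chaining in the style of Pettis and Hansen can be sketched as a greedy pass over flow edges in order of decreasing count: an edge joins two chains when its source is the last block of one chain and its target is the first block of another. This is a simplified sketch under stated assumptions; `FlowEdge`, `chain_blocks`, and the `next`/`prev` array representation are hypothetical, and ordering the finished chains is left out.

```c
#include <stdlib.h>

typedef struct {
    int  src, dst;   /* block indices */
    long weight;     /* profiled or estimated edge count */
} FlowEdge;

/* qsort comparator: heaviest edges first. */
static int by_weight_desc(const void *x, const void *y)
{
    const FlowEdge *a = x, *b = y;
    return (a->weight < b->weight) - (a->weight > b->weight);
}

/* Follow prev[] links back to the first block of b's chain. */
static int chain_head(const int *prev, int b)
{
    while (prev[b] >= 0)
        b = prev[b];
    return b;
}

/*
 * Greedy chaining. On return, next[b] is the block laid out directly
 * after b (or -1) and prev[b] is the block directly before b (or -1).
 * The edges array is sorted in place.
 */
void chain_blocks(FlowEdge *edges, int nedges, int nblocks,
                  int *next, int *prev)
{
    for (int b = 0; b < nblocks; b++)
        next[b] = prev[b] = -1;       /* every block starts as its own chain */

    qsort(edges, (size_t)nedges, sizeof *edges, by_weight_desc);

    for (int e = 0; e < nedges; e++) {
        int s = edges[e].src, d = edges[e].dst;
        if (next[s] != -1 || prev[d] != -1)
            continue;                 /* s is not a tail or d is not a head */
        if (chain_head(prev, s) == d)
            continue;                 /* same chain: linking would form a cycle */
        next[s] = d;                  /* make d the fall-through of s */
        prev[d] = s;
    }
}
```

With edges A→B (90), B→D (85), A→C (10), C→D (9), the sketch produces the chain A, B, D and leaves C as a separate chain, so the frequent path falls through without taken branches.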

  32. Switch statement optimization
     • C switch statement
     • test for most frequent case first
     Before:
         switch (a) {
         case 1: return 3;
         case 2: return 4;
         case 4: return 5;
         }
     After:
         if (a == 4) return 5;
         else switch (a) {
         case 1: return 3;
         case 2: return 4;
         }

  33. Evaluation
     • DS20: 500 MHz 21264
     • SPECint95
       • train: train workload
       • time: ref workload
     • Aggressive optimization for baseline
     • Median of 9 runs

  34. Speedup by optimization

  35. Speedup for inlining

  36. Code layout

  37. Tracer

  38. Commando

  39. Loop unroller

  40. Switch optimization

  41. Code growth by optimization

  42. Summary and conclusions
     • FDO is effective: 17% speedup
     • Complement to a strong classical optimizer
       • augment cost model of static optimization
       • simple restructuring transformations
     • Inlining is most important
     • Reduces code size

  43. Acknowledgements
     • Gene Albert
     • Michael Adler, David Blickstein, Peter Craig, Caroline Davidson, Neil Faiman, Kent Glossop, David Goodwin, Rich Grove, Lucy Hamnett, Steve Hobbs, Bob Nix, Bill Noyce, and John Pieper
