Performance Tools Developed at IBM Haifa
http://www.haifa.il.ibm.com/dept/svt/code_paot.html
Gad Haber (haber@il.ibm.com)
HRL Performance Tools
• FDPR-Pro
  • Feedback-based optimizer operating on binary executable files
  • Part of AIX 5L
  • Available on Linux on Power via alphaWorks
  • Under development for:
    • Mac OS X (to be available soon via alphaWorks)
    • z/OS
• CodeAnalyzer
  • Eclipse plugin for analyzing executable files
  • Under development
  • To be added as part of the Performance Workbench (PerfWB)
• BProber
  • Utility for instrumenting binary executable files
  • Under development
• ESTO
  • Utility for identifying the optimal set of optimization options
  • Under development
FDPR-Pro Feedback Directed Program Restructuring
FDPR-Pro - Feedback Directed Program Restructuring
• Uses a global view of the entire program
• Operates on the executable file after linking
• These properties enable FDPR-Pro to perform:
  • Global code reordering
  • Optimizations across procedure boundaries
  • Static data rearrangement
  • Constant area rearrangement
  • Data prefetching
• Examples of additional FDPR-Pro optimizations:
  • Use of branch tables
  • Optimized use of TOC load instructions
  • More...
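These slides do not describe the algorithm FDPR-Pro actually uses for global code reordering. As a rough illustration of the idea, the sketch below assumes a simple greedy chaining heuristic in the spirit of Pettis and Hansen: control-flow edges are visited from hottest to coldest, and blocks are glued into fall-through chains so frequently executed paths become straight-line code while cold blocks drift to the end. The block names and edge counts are hypothetical.

```python
from collections import defaultdict


def chain_blocks(edges):
    """Greedily merge basic blocks into fall-through chains (illustrative sketch).

    edges: dict mapping (src, dst) -> execution count from a hypothetical profile.
    Returns a list of chains ordered hottest first; each chain is a list of block names.
    """
    chain_of = {}  # block name -> the chain (list) it currently belongs to
    for src, dst in edges:
        for blk in (src, dst):
            if blk not in chain_of:
                chain_of[blk] = [blk]

    # Visit edges from hottest to coldest.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Join only if src ends one chain and dst starts a different chain.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Collect each distinct chain once, keeping first-seen order.
    chains, seen = [], set()
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)

    # Weight of a chain = total count of edges leaving its blocks.
    weight = defaultdict(int)
    for (src, _dst), count in edges.items():
        weight[id(chain_of[src])] += count
    chains.sort(key=lambda c: -weight[id(c)])
    return chains


edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 10, ("D", "A"): 5}
print(chain_blocks(edges))  # [['A', 'B', 'D'], ['C']]
```

Here the hot path A, B, D is laid out contiguously and the rarely executed block C is placed after it. A real binary rewriter must also keep the procedure entry first and patch every branch whose target moved.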
Method
• Phase 1: Code instrumentation
  • At the basic-block level
• Phase 2: Profile information gathering
  • Selection of the "right" input set (a representative workload)
  • Accumulation over several input sets
• Phase 3: Global code and data optimizations
  • Complements the compiler
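Phase 2 can be pictured with a small sketch: per-basic-block execution counters from several training runs are summed, and the accumulated profile is then split into hot and cold blocks. The block names, the counter format, and the 90% coverage rule for "hot" are assumptions for illustration only, not FDPR-Pro's profile format or its actual hotness criterion.

```python
from collections import Counter


def accumulate(profiles):
    """Sum per-basic-block execution counters from several training runs."""
    total = Counter()
    for run in profiles:
        total.update(run)  # Counter.update adds counts rather than replacing them
    return total


def hot_blocks(counts, coverage=0.9):
    """Return the hottest blocks that together cover `coverage` of all executions.

    The coverage threshold is a hypothetical choice for this sketch.
    """
    grand_total = sum(counts.values())
    hot, covered = [], 0
    for block, n in counts.most_common():
        if covered >= coverage * grand_total:
            break
        hot.append(block)
        covered += n
    return hot


runs = [
    {"bb0": 100, "bb1": 80, "bb2": 5},  # training input set 1
    {"bb0": 120, "bb1": 90, "bb3": 5},  # training input set 2
]
profile = accumulate(runs)
print(hot_blocks(profile))  # ['bb0', 'bb1']
```

Accumulating over several inputs keeps a block that is hot only for one unusual input from dominating the layout, which is why choosing a representative workload matters.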
FDPR-Pro Optimization Options
• -RC: Reorder code
• -bf: Branch folding
• -bp: Branch prediction bit setting
• -align: Code alignment
• -nop: Eliminate nop instructions
• -uce: Unreachable code elimination
• -hco_resched: Hot/cold instruction scheduling
• -RD, -build_dcg: Static data reordering
• -tocload, -reduce_toc: TOC load optimizations
• -si, -ipht, -ihf, -isf: Aggressive function inlining options
• -ptrgl_optimization: Optimize function calls via pointers
• -dcbt_optimization: Inject data prefetching instructions
• -link_reg_optimization: Eliminate saves/restores of the link register
• -volatile_regs: Eliminate saves/restores by using available volatile registers
• -killed_regs: Eliminate saves/restores of killed registers
• -load_after_store: Separate frequent loads from stores to the same address
• -loop_unroll: Loop unrolling
• -stack_opt: Reduce the stack frame size of hot functions
• -dce: Dead code elimination
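As one example of what an option such as -uce involves, the sketch below finds unreachable basic blocks by a breadth-first walk of the control flow graph from the entry block. The CFG representation and block names are hypothetical. A binary-level tool must additionally treat indirect branch targets, branch tables, and code addresses referenced from data as reachable, which this sketch ignores.

```python
from collections import deque


def unreachable_blocks(cfg, entry):
    """Return the blocks of `cfg` that cannot be reached from `entry`.

    cfg: dict mapping block name -> list of successor block names (hypothetical CFG).
    """
    reached = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, []):
            if succ not in reached:
                reached.add(succ)
                work.append(succ)
    return sorted(set(cfg) - reached)


cfg = {
    "entry": ["loop"],
    "loop": ["loop", "exit"],
    "exit": [],
    "old_path": ["exit"],  # no edge leads here, so it is a removal candidate
}
print(unreachable_blocks(cfg, "entry"))  # ['old_path']
```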
CodeAnalyzer - Motivation
• Architectures are becoming more complex
• Detecting potential performance bottlenecks in a given program with hardware simulators alone is difficult
• There is a need for performance tools that can statically analyze and visualize programs for a given platform design, to be used by:
  • Hardware architects
  • Compiler writers
  • Application developers
CodeAnalyzer
• CodeAnalyzer is an Eclipse plugin that performs comprehensive static analysis of executable files and DLLs
• Relies on FDPR-Pro for the analysis phase
• Displays the analyzed information together with profiling data collected by:
  • tprof
  • FDPR-Pro
• The code is then colored according to:
  • Frequency counters, gathered by FDPR-Pro
  • Hardware event ticks, gathered by tprof
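The slides do not say how CodeAnalyzer turns counters into colors. One plausible scheme, sketched below purely as an assumption, buckets each block's frequency counter on a logarithmic scale relative to the hottest block, since execution counts typically span many orders of magnitude. The five-color palette is hypothetical.

```python
import math

# Hypothetical palette, ordered from cold to hot.
COLORS = ["blue", "green", "yellow", "orange", "red"]


def heat_color(count, max_count):
    """Map an execution count to a color bucket on a logarithmic scale."""
    if count <= 0 or max_count <= 0:
        return None  # never executed: leave the code uncolored
    # Position of this count relative to the hottest one, roughly in [0, 1].
    frac = math.log1p(count) / math.log1p(max_count)
    idx = min(int(frac * len(COLORS)), len(COLORS) - 1)
    return COLORS[idx]


for n in (0, 1, 1_000, 1_000_000):
    print(n, heat_color(n, 1_000_000))
```

With a linear scale, everything except the single hottest loop would land in the coldest bucket; the log scale keeps moderately warm code visible.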
CodeAnalyzer – (continued)
• Provides several views of the input binary:
  • Assembly instructions
  • Basic blocks
  • Procedures
  • CSECT modules
  • Control flow graph
  • Hot loops
  • Call graph
  • Annotated source code
  • Dispatch group formation
  • Pipeline slots and functional units
CodeAnalyzer – Performance Comments
• Performance comments displayed by CodeAnalyzer:
  • Comments that do not require profiling
    • Pipeline stalls for the Power architecture
    • Unreachable code and unused data
  • Profile-based comments
    • Loop-invariant instructions within hot loops
    • Hot function calls preceded by overwriting of non-volatile registers
    • Hot saves and restores of registers that could be relocated to cold spill areas
    • Hot instructions that could be scheduled into colder areas of the code
    • Removable hot branches
      • Hot direct unconditional branches
      • Hot direct conditional branches that are taken and have a colder fall-through
    • Hot call sites that are good candidates for function inlining
    • Hot call sites that are good candidates for function specialization
    • Hot loops that are good candidates for loop unrolling
    • Hot TOC load instructions that could be replaced by add-immediate instructions
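The "hot direct conditional branches that are taken, with a colder fall-through" comment can be illustrated with a small profile query. The sketch assumes a hypothetical per-branch record holding taken and fall-through counts, plus an arbitrary hotness threshold. A branch it flags is a candidate for inverting its condition and laying out the taken target as the fall-through, so the common path no longer executes a taken branch.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    addr: int         # address of the conditional branch (hypothetical record)
    taken: int        # number of times the branch was taken
    fallthrough: int  # number of times execution fell through


def invertible_hot_branches(branches, hot_threshold):
    """Return addresses of hot conditional branches taken more often than not.

    hot_threshold is an assumed minimum taken count for a branch to count as hot.
    """
    return [
        b.addr
        for b in branches
        if b.taken >= hot_threshold and b.taken > b.fallthrough
    ]


branches = [
    CondBranch(0x1000, taken=9000, fallthrough=100),   # hot, taken path dominates
    CondBranch(0x1010, taken=50, fallthrough=10),      # taken more often, but cold
    CondBranch(0x1020, taken=200, fallthrough=8000),   # fall-through already hot
]
print([hex(a) for a in invertible_hot_branches(branches, hot_threshold=1000)])  # ['0x1000']
```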
PerfWB
• CodeAnalyzer is part of the Performance Workbench (PerfWB)
• PerfWB is a collection of Eclipse plugins that provide performance monitoring, tuning, and analysis
• PerfWB consists of the following Eclipse plugins:
  • ProcMon: system-level monitoring tool for displaying system state and monitoring running processes and threads
  • E-Tune: visualizer of feedback information produced by tprof
  • CodeAnalyzer: performance analyzer of executables and DLLs