
Software Performance Analysis Using CodeAnalyst for Windows

Lei Yu, Member Technical Staff, SRD, Advanced Micro Devices (lei.yu@amd.com). Sherry Hurwitz, SW Applications Manager, SRD, Advanced Micro Devices (sherry.hurwitz@amd.com).

Presentation Transcript


  1. Software Performance Analysis Using CodeAnalyst for Windows
     Lei Yu, Member Technical Staff, SRD (lei.yu@amd.com), Advanced Micro Devices
     Sherry Hurwitz, SW Applications Manager, SRD (sherry.hurwitz@amd.com), Advanced Micro Devices

  2. Session Outline
     • Exploiting Performance Opportunities
       • Obvious Performance Potential
       • Hidden Performance Potential
       • Exposing Untapped Performance Potential
       • Analyzing Performance Improvement Trials
     • AMD CodeAnalyst Performance Analysis Tool
       • Capabilities of CodeAnalyst
       • Functionality of CodeAnalyst
       • Profile Capabilities
       • Thread Analysis
       • Pipeline Simulation

  3. Obvious Performance Potential
     • Processor Architecture
       • x64 Processors
         • Extended Memory Addressing
         • Additional Registers
         • Deeper Execution Pipeline
       • Multi-Core Processors
         • Multiprocessing for the desktop system
       • Multiple processor platforms
     • 64-bit Windows® operating systems
     • Compiler optimization switches
     • Optimized libraries (for example, AMD ACML)

  4. Hidden Performance Potential
     • Efficient algorithms
     • Cache-friendly memory access
     • Branch-prediction-friendly conditionals
     • Parallel work through threads
     • Object synchronization

  5. Expose Untapped Performance Potential
     • Profile your application with the AMD CodeAnalyst Performance Analyzer
       • Timer-based sampling: identify time-consuming or frequently executed code (hot spots), which may point to algorithm issues
       • Opteron and Athlon 64 processor performance events: evaluate the application's use of architectural features
       • Thread View: evaluate effective use of multiple processors
       • Pipeline Simulation: understand how data dependencies can stall processor execution
     • Iterate between profiling and code modifications, testing whether each change brings a performance benefit

  6. Analyzing Performance Improvement Trials

  7. Capabilities of AMD CodeAnalyst
     • CodeAnalyst CAN:
       • Assist in optimizing your application
       • Identify program bottlenecks
       • Monitor and analyze software performance
     • CodeAnalyst CANNOT:
       • Identify defects in your program (profile a functioning, stable application)
     • CodeAnalyst RUNS ON:
       • Windows: WinNT, Win2K, WinXP, 64-bit Windows® operating systems

  8. Key Functionality of AMD CodeAnalyst
     • Profiling
       • Timer-based sampling
       • Event-based sampling
     • Thread analysis
     • Execution pipeline simulation

  9. Profile Capabilities
     • Low-overhead, system-wide profiling
     • Timer-based profile:
       • 0.1 ms resolution on APIC-enabled systems
       • 1.0 ms resolution on APIC-disabled systems
     • Event-based profile:
       • 32 AMD Athlon™ and AMD Athlon™ XP performance events
       • 78 AMD Opteron™ and AMD Athlon™ 64 performance events
       • Simultaneously profile up to 4 user-selected performance events
     • Profiles multiprocessor systems with up to 16 processor cores
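To put the two timer resolutions in perspective, here is a rough worked example. It assumes samples are taken at a fixed interval on a single core, and uses a hypothetical function that accounts for 2 seconds of CPU time:

```latex
N \approx \frac{T_{\text{cpu}}}{\Delta t}, \qquad
\frac{2\ \text{s}}{0.1\ \text{ms}} = 20\,000\ \text{samples}, \qquad
\frac{2\ \text{s}}{1.0\ \text{ms}} = 2\,000\ \text{samples}
```

The finer interval collects ten times as many samples for the same code, which makes it easier for short-running functions to register as statistically meaningful hot spots.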

  10. Profile Analysis
     • Identifies all active process names, process IDs, and thread IDs
     • Identifies each process's CPU affinity
     • Reports performance events per CPU
     • Maps sample addresses to process, module, function, source line, assembly instruction, and code byte
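The address-to-function step in the last bullet can be sketched as a binary search over a symbol table sorted by start address. This is a minimal illustration, not CodeAnalyst's implementation. The `Symbol` type, the function names, and the addresses are hypothetical, and a real profiler typically obtains symbol ranges from the module's symbol or debug information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: one function's address range. */
typedef struct {
    const char *name;
    uint64_t start;     /* address of the function's first byte */
    uint64_t size;      /* length of the function in bytes */
    unsigned samples;   /* samples attributed to this function */
} Symbol;

/* Binary search over symbols sorted by start address, with non-overlapping
   ranges. Returns the index of the function containing addr, or -1. */
static int find_symbol(const Symbol *syms, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < syms[mid].start)
            hi = mid;                        /* addr lies below this function */
        else if (addr - syms[mid].start >= syms[mid].size)
            lo = mid + 1;                    /* addr lies past its end */
        else
            return (int)mid;                 /* start <= addr < start + size */
    }
    return -1;
}

/* Attribute each sampled address to a function. Returns the number of
   samples that fell outside every known function. */
static unsigned attribute_samples(Symbol *syms, size_t nsyms,
                                  const uint64_t *addrs, size_t naddrs)
{
    unsigned unknown = 0;
    assert(syms != NULL || nsyms == 0);
    for (size_t i = 0; i < naddrs; i++) {
        int idx = find_symbol(syms, nsyms, addrs[i]);
        if (idx >= 0)
            syms[idx].samples++;
        else
            unknown++;
    }
    return unknown;
}
```

Sorting the functions by their `samples` field afterwards gives the kind of hot-spot ranking that a timer-based profile presents.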

  11. Hierarchical Navigation of Data Views
     • System Data View
     • System Graph View
     • Module Data View
     • Module Graph View
     • Source View
     • Disassembly View
     The demo will show the details of each of these views and how to navigate between them.

  12. Timer-based Profiling - the First Level of Analysis
     • Exposes areas of intense activity
     • Identifies the most likely suspects
     • Provides a sample distribution chart
     • Lets you drill down through several data views
     • Shows the source code at and around each sample
     • Algorithmic issues may be evident from the hot spot code
     • Hot spot code might suggest particular events to profile in the next level of analysis

  13. Common Hot Spots
     • Loops
       • Loops with large bodies and large iteration counts are natural hot spots, but are not necessarily bad for performance
       • Loops with small bodies and small, fixed iteration counts should be unrolled
       • Move redundant constant (loop-invariant) calculations out of inner loops, including out of inner control structures
     • Long logical expressions in if statements
     • Long data-dependent expressions
     • Complicated floating-point expressions
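The two loop guidelines can be illustrated with a minimal C sketch. The function names are hypothetical. An optimizing compiler may already hoist the invariant product or unroll the fixed-count loop by itself, so profile before and after instead of assuming a gain.

```c
#include <assert.h>
#include <stddef.h>

/* Before: scale * offset never changes inside the loop, yet the source
   recomputes it on every iteration. */
static void scale_before(double *out, const double *in, size_t n,
                         double scale, double offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * scale + scale * offset;
}

/* After: the loop-invariant product is computed once, before the loop. */
static void scale_after(double *out, const double *in, size_t n,
                        double scale, double offset)
{
    assert(n == 0 || (out != NULL && in != NULL));
    const double bias = scale * offset;   /* hoisted invariant */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * scale + bias;
}

/* A small loop with a fixed trip count of 4 ... */
static double dot4_loop(const double a[4], const double b[4])
{
    double s = 0.0;
    for (int i = 0; i < 4; i++)
        s += a[i] * b[i];
    return s;
}

/* ... and the same computation fully unrolled: no counter, compare, or loop
   branch. The additions run in the same left-to-right order as the loop. */
static double dot4_unrolled(const double a[4], const double b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```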

  14. Event-based Profile - Second Level of Analysis
     • Useful events to identify memory issues
       • Profile “Data Cache Access” and “Data Cache Misses” simultaneously
         • Use the ratio of misses to accesses
       • Count misaligned data references
     • Useful events to identify branching issues
       • Profile “Retired branch mispredicted” and “Retired taken branches”
         • Use the ratio of mispredicted branches to taken branches
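The ratios suggested above are simple quotients of event counts. The sketch below assumes that each event is sampled once every `interval` occurrences, so that samples multiplied by the interval estimates the event count. The helper names are hypothetical, and the numbers in the example are illustrative, not CodeAnalyst output.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sampling model: one sample every `interval` occurrences of an
   event, so samples * interval estimates the total event count. */
static uint64_t estimated_count(uint64_t samples, uint64_t interval)
{
    assert(interval > 0);
    return samples * interval;
}

/* Ratio of two event counts, e.g. data cache misses / data cache accesses,
   or mispredicted branches / taken branches. Returns 0.0 for an empty
   denominator instead of dividing by zero. */
static double ratio(uint64_t numerator, uint64_t denominator)
{
    return denominator ? (double)numerator / (double)denominator : 0.0;
}
```

For example, 250 miss samples at an interval of 1,000 against 2,000 access samples at an interval of 5,000 gives an estimated miss ratio of 250,000 / 10,000,000 = 2.5%. The same `ratio` helper applies to mispredicted versus taken branches.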

  15. Examples of Memory Issues
     • Large data structures whose members of varying size are not sorted by size
     • Use of pointer notation to manipulate large data arrays
     • Dereferencing pointer arguments inside a function
     • Large sets of local variables declared in random order with respect to size
     • Memory buffers shared between threads
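Two of the memory issues above can be made concrete in C11. Sorting members from largest to smallest reduces alignment padding. Padding per-thread data to a full cache line avoids false sharing, which is one common way that a memory buffer shared between threads hurts performance. The struct names are hypothetical. The 64-byte line size is an assumption, though it matches the AMD Athlon 64 and AMD Opteron processors.

```c
#include <assert.h>

/* Members declared in random order with respect to size: the compiler
   inserts padding after `flag` and `kind` to align `value` and `count`. */
struct record_unsorted {
    char   flag;
    double value;
    char   kind;
    int    count;
};

/* The same members sorted from largest to smallest need less padding. */
struct record_sorted {
    double value;
    int    count;
    char   flag;
    char   kind;
};

/* Hypothetical per-thread counter aligned to its own 64-byte cache line,
   so threads updating adjacent counters never write to the same line. */
struct padded_counter {
    _Alignas(64) long value;
};

static_assert(sizeof(struct record_sorted) <= sizeof(struct record_unsorted),
              "sorting members by size should not increase the size");
static_assert(sizeof(struct padded_counter) == 64,
              "each counter occupies exactly one 64-byte line");
```

On typical x64 ABIs, including 64-bit Windows, `struct record_unsorted` occupies 24 bytes and `struct record_sorted` occupies 16. Exact sizes depend on the ABI, so the block only checks the ordering relationship.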

  16. Examples of Branch Prediction Issues
     • Order of the expressions in compound branch conditions
     • Order of operands in logical expressions
     • Large switch statements with noncontiguous case values
     • Large switch statements whose cases are not ordered by probability
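These ordering issues come down to making the common case cheap and predictable. The sketch below is hypothetical. It assumes that profiling has shown most requests are not cached and that `MSG_DATA` dominates the message stream. With different frequencies, the best order changes. Compilers may also lower a switch to a jump table or reorder its comparisons, so any change should be confirmed with the branch events listed earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical filter. Assumption: most requests are NOT cached, so the
   cheap `cached` test usually decides the && expression, and the costlier
   strcmp() is skipped by short-circuit evaluation. */
static bool should_serve_from_cache(bool cached, const char *path)
{
    assert(path != NULL);
    return cached && strcmp(path, "/index.html") == 0;
}

/* Hypothetical message types with sparse (noncontiguous) values.
   Assumption: MSG_DATA accounts for most messages. */
enum msg_type { MSG_DATA = 1, MSG_ACK = 7, MSG_PING = 300, MSG_CLOSE = 4096 };

static int handle(enum msg_type t)
{
    /* Test the dominant value first, so the common path is a single,
       well-predicted comparison rather than a search through sparse cases. */
    if (t == MSG_DATA)
        return 0;
    switch (t) {
    case MSG_ACK:   return 1;
    case MSG_PING:  return 2;
    case MSG_CLOSE: return 3;
    default:        return -1;   /* unknown message type */
    }
}
```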

  17. Thread Analysis
     • Identifies threads in the target application
     • Shows thread creation and termination
     • Monitors the CPU affinity of each thread
     • Identifies non-local memory accesses
     • Graphs thread activity on each CPU

  18. Thread Analysis Data View

  19. Pipeline Simulation Capabilities
     CodeAnalyst can simulate a user-specified block of code on AMD microprocessors and provide cycle-precise execution information.
     • Requirement: to define a code block to simulate, the user must provide debug information for the target module.
     • Limitations:
       • Cannot simulate instructions in system space
       • Cannot simulate multithreaded code

  20. Some Assumptions in the Simulator
     • Assumes a perfect memory subsystem
       • All load/store micro-ops hit in the data cache
       • Assumes that 1 misaligned load = 2 back-to-back aligned loads (64-bit)
       • Assumes no cache bank conflicts
       • 100% instruction cache hit rate
     • Assumes perfect branch prediction
     • Assumes all schedulers are of infinite size
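The misaligned-load assumption can be expressed as a count of aligned 64-bit accesses. This sketch reads "misaligned" as a load that straddles an 8-byte boundary. That reading is an interpretation, not the simulator's documented rule, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of naturally aligned 8-byte (64-bit) blocks touched by a load of
   `size` bytes starting at `addr`. A load contained in one block counts as
   one access. A load that straddles a block boundary counts as two, which
   mirrors "1 misaligned load = 2 back-to-back aligned loads". */
static unsigned aligned_loads(uint64_t addr, unsigned size)
{
    assert(size >= 1 && size <= 8);
    uint64_t first = addr / 8;               /* block holding the first byte */
    uint64_t last = (addr + size - 1) / 8;   /* block holding the last byte */
    return (unsigned)(last - first + 1);
}
```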

  21. Pipeline Data View

  22. CodeAnalyst Simulation Analysis
     • User specifies the simulation configuration
     • User sets the trace start point, trace end point, and trace trigger
     • Pipeline Data View
       • Pipeline stage
       • Penalty
       • Dependency
       • Delta completion
       • IPC
     • User can view the simulation history
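Long dependency chains show up in the Dependency and IPC columns. A common remedy is to split one accumulator into independent ones, so that consecutive additions no longer wait on each other. The sketch below uses hypothetical function names. IPC is simply retired instructions divided by cycles. Floating-point addition is not associative, so compilers generally do not make this transformation without relaxed floating-point options. For arbitrary inputs, the split version may also differ from the serial one in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each addition must wait for the previous one, so the
   loop is a single serial dependency chain. */
static double sum_one_chain(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent accumulators: the even-index and odd-index additions do
   not depend on each other, so they can overlap in the pipeline. */
static double sum_two_chains(const double *v, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)              /* odd element count: add the last element */
        s0 += v[i];
    return s0 + s1;
}

/* Instructions per cycle for a simulated code block. */
static double ipc(unsigned long long instructions, unsigned long long cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}
```

With exactly representable inputs, such as small integers, both versions return the same sum. Whether the split version actually runs faster should be checked in the simulator or with a timer-based profile.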

  23. Call to Action
     • Download CodeAnalyst
     • Improve Your Software!

  24. Additional Resources
     • Web resources at:
       • http://www.developwithamd.com
         • Download CodeAnalyst
         • Software Optimization Guide for AMD Athlon 64 and AMD Opteron
         • AMD64 Architecture Programmer's Manual Volume 1: Application Programming
         • AMD64 Architecture Programmer's Manual Volume 2: System Programming
         • AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions
         • AMD64 Architecture Programmer's Manual Volume 4: 128-Bit Media Instructions
         • AMD64 Architecture Programmer's Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions
       • http://www.devx.com
         • Optimizing Your C/C++ Applications, Part 1 & 2
     • Whitepapers:
       • Porting and Optimizing Applications on 64-bit Windows for AMD64 Architecture, WinHEC 2004 paper by Mike Wall
