
TAU Performance Tuning Tool

TAU is a performance system framework for scalable parallel and distributed high-performance computing, offering a portable, configurable profiling and tracing facility. It supports manual and automatic instrumentation options for in-depth analysis.

Presentation Transcript


  1. TAU Performance Tuning Tool
     Sun Yongzhao

  2. Introduction – What is TAU?
     • Tuning and Analysis Utilities
     • Performance system framework for scalable parallel and distributed high-performance computing
     • Targets a general complex system computation model
        • nodes / contexts / threads
        • Multi-level: system / software / parallelism
     • Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
     • Portable, configurable performance profiling/tracing facility
     • Open software approach
     • University of Oregon
        • http://www.cs.uoregon.edu/research/paracomp/tau

  3. TAU Performance System Architecture
     [Figure: TAU architecture diagram; legible labels include Paraver and EPILOG]

  4. General Complex System Computation Model
     • Node: physically distinct shared memory machine
        • Message passing node interconnection network
     • Context: distinct virtual memory space within node
     • Thread: execution threads (user/system) in context
     [Figure: physical view (SMP nodes, each with its own node memory, joined by an interconnection network carrying inter-node message communication) and model view (VM spaces, contexts, threads)]
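To make the node / context / thread hierarchy concrete, the sketch below models a thread's position as a triple. The `ThreadId` class and `enumerate_threads` helper are hypothetical; the `profile.<node>.<context>.<thread>` file name is an assumption about how per-thread profile files are named, chosen to fit the `profile.*` files pprof reads and the `NODE 0;CONTEXT 0;THREAD 0` headers on slides 14 and 15.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ThreadId:
    """Hypothetical position of one thread in the node / context / thread model."""

    node: int     # physically distinct shared-memory machine
    context: int  # distinct virtual memory space within the node
    thread: int   # execution thread within the context

    def profile_filename(self) -> str:
        # Assumed convention: one profile file per thread, profile.<node>.<context>.<thread>
        return f"profile.{self.node}.{self.context}.{self.thread}"

    def pprof_label(self) -> str:
        # Same header layout pprof prints on slides 14 and 15
        return f"NODE {self.node};CONTEXT {self.context};THREAD {self.thread}:"


def enumerate_threads(nodes: int, contexts_per_node: int, threads_per_context: int) -> list:
    """List every thread of a regular layout, ordered by node, then context, then thread."""
    return [
        ThreadId(n, c, t)
        for n in range(nodes)
        for c in range(contexts_per_node)
        for t in range(threads_per_context)
    ]
```

Because the triple is ordered node first, sorting profile identifiers groups all threads of one machine together, which is how per-node summaries would be gathered.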

  5. Definitions – Profiling
     • Profiling
        • Recording of summary information during execution
           • inclusive, exclusive time, # calls, hardware statistics, …
        • Reflects performance behavior of program entities
           • functions, loops, basic blocks
           • user-defined “semantic” entities
        • Helps to expose performance bottlenecks and hotspots
        • Implemented through
           • sampling: periodic OS interrupts or hardware counter traps
           • instrumentation: direct insertion of measurement code
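A minimal sketch of the instrumentation approach, assuming properly nested start/stop calls: each routine pushes a timer on entry and pops it on exit, and the time a callee runs is subtracted from its caller's exclusive time. The `Profiler` class is illustrative only, not TAU's implementation; the injectable `clock` parameter exists so the arithmetic can be checked deterministically.

```python
import time
from collections import defaultdict


class Profiler:
    """Illustrative instrumentation-based profiler (not TAU's implementation).

    start()/stop() stand for measurement code inserted at routine entry and exit.
    Timers must nest properly; a stack of active timers splits elapsed time into
    inclusive time (routine plus callees) and exclusive time (routine body only).
    """

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stack = []                     # entries: [name, start time, time spent in callees]
        self.inclusive = defaultdict(float)
        self.exclusive = defaultdict(float)
        self.calls = defaultdict(int)
        self.subrs = defaultdict(int)       # calls made from each routine to others

    def start(self, name):
        if self.stack:
            self.subrs[self.stack[-1][0]] += 1
        self.stack.append([name, self.clock(), 0.0])

    def stop(self, name):
        top, t0, callee_time = self.stack.pop()
        if top != name:
            raise ValueError(f"stop({name!r}) does not match open timer {top!r}")
        elapsed = self.clock() - t0
        self.calls[name] += 1
        self.inclusive[name] += elapsed     # recursive calls would be double-counted here
        self.exclusive[name] += elapsed - callee_time
        if self.stack:
            self.stack[-1][2] += elapsed    # charge this call to the caller's callee time
```

For example, if `main` runs from t=0 to t=10 and calls `work` from t=1 to t=4, `main` gets 10 inclusive and 7 exclusive, `work` gets 3 and 3, and `main` records one subroutine call. A production profiler also has to guard against recursion, which this sketch does not.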

  6. Definitions – Tracing
     • Tracing
        • Recording of information about significant points (events) during program execution
        • Save information in event record
           • timestamp
           • CPU identifier, thread identifier
           • Event type and event-specific information
        • Event trace is a time-sequenced stream of event records
        • Can be used to reconstruct dynamic program behavior
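Unlike a profile, a trace keeps every event's timestamp, so behaviour over time can be rebuilt afterwards. A sketch, assuming a hypothetical record layout with `enter`/`exit` event kinds: `reconstruct_intervals` pairs each exit with the most recent matching enter on the same thread to recover call intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # when the event occurred
    cpu: int          # CPU identifier
    thread: int       # thread identifier
    kind: str         # event type; "enter" or "exit" in this hypothetical encoding
    name: str         # event-specific information: the routine name


def reconstruct_intervals(trace):
    """Rebuild (thread, routine, start, end) call intervals from a stream of events."""
    open_calls = {}  # (thread, routine) -> stack of entry timestamps
    intervals = []
    for ev in sorted(trace, key=lambda e: e.timestamp):  # process in time order
        key = (ev.thread, ev.name)
        if ev.kind == "enter":
            open_calls.setdefault(key, []).append(ev.timestamp)
        elif ev.kind == "exit":
            start = open_calls[key].pop()
            intervals.append((ev.thread, ev.name, start, ev.timestamp))
    return intervals
```

With enter/exit events for `main` at 0 and 10 and for `work` at 1 and 4, the result is `[(0, "work", 1.0, 4.0), (0, "main", 0.0, 10.0)]`: the same totals a profile would report, plus when each call happened.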

  7. TAU Instrumentation Options
     • Manual instrumentation
        • TAU Profiling API
     • Automatic instrumentation approaches
        • PDT – Source-to-source translation
        • MPI – Wrapper interposition library
        • Opari – OpenMP directive rewriting
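Wrapper interposition means the tool supplies a routine with the same name and signature as the one the application calls; the wrapper measures, then forwards to the real routine. For MPI this relies on the standard profiling interface, where every `MPI_` routine is also reachable under a `PMPI_` name. The Python analogue below (the `interpose` helper is hypothetical) swaps a function in a namespace for a recording wrapper, leaving the caller's code unchanged.

```python
import functools


def interpose(namespace, name, log):
    """Replace namespace[name] with a recording wrapper; return the original routine.

    Hypothetical helper: the wrapper keeps the original's name and signature,
    records the call, then forwards to the real routine (for MPI, a tool's
    MPI_Send would likewise forward to PMPI_Send).
    """
    original = namespace[name]

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        log.append((name, args))           # measurement step
        return original(*args, **kwargs)   # forward to the real routine

    namespace[name] = wrapper
    return original
```

Returning the original lets the tool restore it later, and `functools.wraps` keeps the wrapper's `__name__` identical, so anything that looks the routine up by name still sees `send`.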

  8. Manual Instrumentation – Using TAU
     • Install TAU
          % configure
          % make install
     • Instrument application
        • TAU Profiling API
     • Modify application makefile
        • include TAU’s stub makefile, modify variables
     • Execute application
          % mpirun -np <procs> a.out
     • Analyze performance data
        • jracy, Vampir, pprof, Paraver, …

  9. TAU Measurement
     • Performance information
        • High-resolution timer library (real-time / virtual clocks)
        • General software counter library (user-defined events)
        • Hardware performance counters
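The real-time versus virtual clock distinction can be seen with Python's standard clocks: `time.perf_counter()` is a high-resolution wall clock, while `time.process_time()` counts only CPU time consumed by the process, roughly what a virtual clock measures. The helpers `measure`, `busy` and `idle` are illustrative names, not part of TAU.

```python
import time


def measure(fn, *args):
    """Run fn(*args) once; return (wall-clock seconds, CPU seconds) it took."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args)
    return time.perf_counter() - wall0, time.process_time() - cpu0


def busy(n):
    # Pure computation: both clocks advance
    return sum(i * i for i in range(n))


def idle(seconds):
    # Sleeping: wall-clock time passes, the process uses almost no CPU
    time.sleep(seconds)
```

`measure(idle, 0.2)` reports about 0.2 s of wall-clock time but close to zero CPU time, while `measure(busy, 2_000_000)` advances both. A routine that mostly waits, for example on a message, looks expensive on the real-time clock and cheap on the virtual one.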

  10. TAU Measurement (continued)
     • Parallel profiling
        • Function-level, block-level, statement-level
        • Supports user-defined events
        • TAU parallel profile database
        • Hardware count values
     • Tracing
        • All profile-level events
        • Inter-process communication events
        • Timestamp synchronization
     • User-configurable measurement library (user controlled)
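User-defined events record values rather than times. A sketch of an accumulator (the `UserEvent` class is hypothetical) that keeps running sums and reports the columns pprof prints under USER EVENTS: NumSamples, MaxValue, MinValue, MeanValue and Std. Dev. The "Memory allocated by arrays" row on slide 15, two samples of 2048 and 4096 with a standard deviation of 1024, implies the population formula (divide by n), so that is what the sketch uses.

```python
import math


class UserEvent:
    """Hypothetical accumulator for a user-defined event.

    Keeps only running sums, so memory use stays constant no matter how many
    samples are recorded.
    """

    def __init__(self, name):
        self.name = name
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.max = -math.inf
        self.min = math.inf

    def trigger(self, value):
        self.n += 1
        self.total += value
        self.total_sq += value * value
        self.max = max(self.max, value)
        self.min = min(self.min, value)

    def summary(self):
        """Return (NumSamples, MaxValue, MinValue, MeanValue, Std. Dev.)."""
        if self.n == 0:
            raise ValueError(f"event {self.name!r} has no samples")
        mean = self.total / self.n
        # Population variance; clamped at zero to absorb floating-point rounding
        variance = max(self.total_sq / self.n - mean * mean, 0.0)
        return self.n, self.max, self.min, mean, math.sqrt(variance)
```

Triggering the event with 2048 and then 4096 yields `(2, 4096, 2048, 3072.0, 1024.0)`, the same figures as that slide 15 row.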

  11. TAU Analysis
     • Profile analysis
        • pprof
           • parallel profiler with text-based display
        • racy
           • graphical interface to pprof
        • jracy
           • Java implementation of racy
     • Trace analysis and visualization
        • Trace merging and clock adjustment (if necessary)
        • Trace format conversion
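When each process writes its own trace with its own clock, merging takes two steps: shift every timestamp onto a common timeline, then interleave the streams in time order. A sketch assuming each per-process trace is already sorted and that per-rank clock offsets have been estimated somehow; the `merge_traces` helper and its `offsets` input are hypothetical, and `heapq.merge` does the k-way interleave.

```python
import heapq


def merge_traces(per_process, offsets):
    """Merge per-process traces into one time-ordered stream.

    per_process: {rank: [(local_timestamp, event_name), ...]}, each list sorted.
    offsets: {rank: correction} added to that rank's clock so all timestamps
    share one timeline (hypothetical input; ranks without an entry get 0).
    """
    adjusted = (
        [(ts + offsets.get(rank, 0.0), rank, name) for ts, name in events]
        for rank, events in per_process.items()
    )
    # Adding a constant keeps each list sorted, so a k-way merge suffices
    return list(heapq.merge(*adjusted))
```

For example, with `{0: [(2.0, "send")], 1: [(1.0, "recv")]}` the uncorrected merge puts rank 1's receive before rank 0's send; with an offset of 3.0 for rank 1 the receive moves to 4.0, after the send, as causality requires.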

  12. jracy (NAS Parallel Benchmark – LU)
     [Figure: jracy displays of global profiles, a routine profile across all nodes, and an individual profile; n: node, c: context, t: thread]

  13. jracy

  14. Using TAU In Boss
        /ihepbatch/bes/sunyz/workarea/TestRelease-bak/TestRelease-00-00-03/run:boss.exe HelloWorldOptions.txt
        /ihepbatch/bes/sunyz/workarea/TestRelease-bak/TestRelease-00-00-03/run:pprof
        Reading Profile files in profile.*
        NODE 0;CONTEXT 0;THREAD 0:
        ---------------------------------------------------------------------------------------
        %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                      msec   total msec                          usec/call
        ---------------------------------------------------------------------------------------
        100.0        0.442        0.442           1           0        442 execute() int ()
        ---------------------------------------------------------------------------------------
        USER EVENTS Profile :NODE 0, CONTEXT 0, THREAD 0
        ---------------------------------------------------------------------------------------
        NumSamples   MaxValue   MinValue  MeanValue  Std. Dev.  Event Name
        ---------------------------------------------------------------------------------------
                 1       2048       2048       2048          0  Memory allocated by arrays
                 1          1          1          1          0  Number of Iterates
        ---------------------------------------------------------------------------------------
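The columns in this output are related: Inclusive usec/call is the inclusive time converted to microseconds and divided by #Call, so 0.442 msec over one call gives 442. A hypothetical parser for rows laid out like the one above (plain-millisecond times only) that checks this relation:

```python
import math
from dataclasses import dataclass


@dataclass
class ProfileRow:
    percent_time: float
    exclusive_msec: float
    inclusive_msec: float
    calls: int
    subrs: int
    inclusive_usec_per_call: float
    name: str


def parse_row(line):
    """Split one flat-profile row whose times are plain milliseconds (hypothetical parser)."""
    # Six numeric columns, then the routine name, which may itself contain spaces
    pct, excl, incl, calls, subrs, per_call, name = line.split(None, 6)
    return ProfileRow(float(pct), float(excl), float(incl),
                      int(calls), int(subrs), float(per_call), name)


def per_call_consistent(row):
    """Check Inclusive usec/call == inclusive msec * 1000 / #Call.

    The msec columns are printed with limited precision, so allow a 1% mismatch.
    """
    expected = row.inclusive_msec * 1000.0 / row.calls
    return math.isclose(expected, row.inclusive_usec_per_call, rel_tol=0.01)
```

Splitting at most six times keeps a signature such as `execute() int ()` intact as the last field.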

  15. Using TAU In Boss
        /ihepbatch/bes/sunyz/workarea/TestRelease/TestRelease-00-00-06/run:boss.exe jobOptions.G4Sim.txt
        /ihepbatch/bes/sunyz/workarea/TestRelease/TestRelease-00-00-06/run:pprof
        Reading Profile files in profile.*
        NODE 0;CONTEXT 0;THREAD 0:
        ---------------------------------------------------------------------------------------
        %Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
                      msec   total msec                          usec/call
        ---------------------------------------------------------------------------------------
        100.0     2:09.364     2:09.364           1           0  129364843 initializ() int ()
          0.4          494          494           1           0     494272 execute() int ()
        ---------------------------------------------------------------------------------------
        USER EVENTS Profile :NODE 0, CONTEXT 0, THREAD 0
        ---------------------------------------------------------------------------------------
        NumSamples   MaxValue   MinValue  MeanValue  Std. Dev.  Event Name
        ---------------------------------------------------------------------------------------
                 2       4096       2048       3072       1024  Memory allocated by arrays
                 2          1          1          1          0  Number of Iterates
        ---------------------------------------------------------------------------------------
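Longer times are printed as minutes:seconds, so 2:09.364 is (2 × 60 + 9.364) s = 129364 msec, consistent with 129364843 usec for the single call. The figures also suggest %Time is taken against the largest inclusive time: initializ() shows 100.0 although execute() ran outside it (#Subrs is 0), and 494 / 129364 ≈ 0.38 %, printed as 0.4. A sketch of both conversions; the function names are hypothetical and hour-scale fields are not handled.

```python
def pprof_time_to_msec(field):
    """Convert a pprof time field to milliseconds (hypothetical helper).

    Short times are printed as plain milliseconds ("494"); longer ones as
    minutes:seconds ("2:09.364"). Hour-scale fields are not handled here.
    """
    if ":" in field:
        minutes, seconds = field.split(":")
        return (int(minutes) * 60 + float(seconds)) * 1000.0
    return float(field)


def percent_time(inclusive_msec, reference_msec):
    """%Time against a reference inclusive time, rounded to one decimal as printed."""
    return round(100.0 * inclusive_msec / reference_msec, 1)
```

With the reference taken as `pprof_time_to_msec("2:09.364")`, `percent_time(494, reference)` gives 0.4 and the reference itself gives 100.0, matching both rows.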
