Dynaprof Evaluation Report
Adam Leko, Hans Sherburne
UPC Group
HCS Research Laboratory
University of Florida
Color encoding key:
• Blue: Information
• Red: Negative note
• Green: Positive note
Basic Information
• Name: Dynaprof
• Developer: Philip Mucci (UTK)
• Current versions:
  • Dynaprof CVS as of 2/21/2005
  • DynInst API v4.1.1 (dependency)
  • PAPI v3.0.7 (dependency)
• Website: http://www.cs.utk.edu/~mucci/dynaprof/
• Contact:
  • Philip Mucci (mucci@cs.utk.edu)
Dynaprof Overview
• Merges existing tools
  • PAPI
  • DynInst API
• Command-line tool
• Dynamically instruments programs at runtime
  • Requires no recompilation!
  • Inserts probes at runtime
• Metrics available
  • Wall clock time
  • Any PAPI metric
  • Can be extended
• Only a simple GUI is available (see screenshot)
  • Just a wrapper around the command-line version
  • Currently pretty broken
[Screenshot: Dynaprof GUI showing its startup banner: "DynaProf 0.9, Philip J. Mucci, 2000-2003, provided courtesy of UTK's Innovative Computing Laboratory (http://icl.cs.utk.edu)"]
Instrumentation Overview
• Instrumentation is very easy
  • Especially for sequential/threaded applications
• Compile the application normally (-g makes function naming easier later)
  • gcc -O3 -g -o camel camel.c
• Dynaprof commands
  • Load the executable
    • load camel
  • Specify which probe you wish to use
    • use papiprobe [args]
  • List available functions
    • list camel.c
  • Instrument command
    • All functions in a file: instr module camel.c
    • A single function: instr function camel.c main
  • Run command
    • continue
    • <CTRL-C> pauses execution (currently does not work)
• Instrumentation output is written to a separate file (its location is printed at runtime)
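The command sequence above is regular enough to generate mechanically. The Python sketch below assembles it for either whole-file (`instr module`) or single-function (`instr function`) instrumentation. The helper `dynaprof_commands` is hypothetical; only the command strings themselves come from this slide, and because the exact syntax of the `papiprobe` arguments is not shown here, they are passed through untouched.

```python
from typing import Iterable, List, Optional


def dynaprof_commands(exe: str, source: str, probe: str = "papiprobe",
                      probe_args: Iterable[str] = (),
                      functions: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: build the Dynaprof command sequence from this slide.

    functions=None instruments every function in `source` ("instr module");
    otherwise only the named functions are instrumented ("instr function").
    probe_args are passed through verbatim (their syntax is an assumption).
    """
    cmds = [f"load {exe}",                          # load the executable
            " ".join(["use", probe, *probe_args]),  # select the probe
            f"list {source}"]                       # list available functions
    if functions is None:
        cmds.append(f"instr module {source}")
    else:
        cmds.extend(f"instr function {source} {fn}" for fn in functions)
    cmds.append("continue")                          # start the program
    return cmds


if __name__ == "__main__":
    print("\n".join(dynaprof_commands("camel", "camel.c", functions=["main"])))
```

Run as a script, this prints the five commands for instrumenting only `main` in `camel.c`. Writing the same list to a file is one way to prepare a command file for Dynaprof's batch mode, though how that file is handed to Dynaprof is not described here.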
Instrumentation Overview (2)
• No special commands needed for
  • Sequential applications
  • pthread applications
• MPI not supported directly through the command line
  • Wrapper scripts available for MPICH and LAM
  • Dynaprof must be run in “batch mode”
    • A file containing all instrumentation commands
    • Halts the app before MPI_Init() is called
  • However, not working with the current version of MPICH
    • Get an assertion failure and Dynaprof stops working
    • Can only use MPI programs with 1 process
• UPC?
  • Tried
    • GCC-UPC
    • BUPC (smp + pthreads)
  • Both produced no output or crashed Dynaprof
Instrumentation Overhead
• Could only instrument one-process MPI code
  • MPI run wrapper script broken
  • No PPerfMark apps! (all require > 1 process)
• Camel overhead very high
  • Only instrumented main
• LU overhead really low?
• Possible causes of overhead
  • Frequent subroutine calls from main
  • Use of tsc.h processor counters for timers confuses Dynaprof
• Expect overhead similar to Paradyn
  • 5-10% for most applications with a reasonable number of instrumentation points
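The overhead figures on this slide are relative slowdowns: (instrumented run time - uninstrumented run time) / uninstrumented run time. A minimal Python sketch of that calculation follows; the timings in the example are hypothetical, not measurements from this evaluation.

```python
def overhead_percent(baseline_s: float, instrumented_s: float) -> float:
    """Instrumentation overhead as a percentage of the uninstrumented run time."""
    if baseline_s <= 0:
        raise ValueError("baseline run time must be positive")
    return (instrumented_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Hypothetical timings (seconds): 8.0 s without probes, 8.5 s with probes.
    print(overhead_percent(8.0, 8.5))  # 6.25, inside the 5-10% range quoted above
```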
Dynaprof Probe Information
• Probes perform all data collection and analysis
  • Provide the code inserted into a function when it is instrumented
• Probes can be called at 4 different points
  • Function entry point
  • Function exit point
  • Function call point
  • Function return point
• Each probe is encapsulated in a shared library
  • Allows relatively easy creation of new probes
• Available probes
  • “Wallclock” probe (records wall clock time)
  • PAPI wallclock probe (same as wallclock, but uses high-resolution timers)
  • PAPI probe (records any PAPI metric, such as FLOPs)
    • Specify PAPI metrics as args in the use papiprobe [args] command
• Existing probes provide profile-style data only
  • Although there is no reason a trace could not also be collected
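Dynaprof's probes are shared libraries whose code runs at the four instrumentation points listed above. The Python sketch below mirrors only that structure: a base class with one hook per point, and a wall-clock probe that overrides just the entry and exit hooks. The class and method names are invented for illustration and are not Dynaprof's actual probe API; the clock is injectable so the example is deterministic.

```python
import time
from collections import defaultdict
from typing import Callable


class Probe:
    """Hypothetical probe interface: one hook per instrumentation point.

    The default hooks do nothing, so a probe overrides only the points it needs.
    """

    def on_entry(self, func: str) -> None: pass
    def on_exit(self, func: str) -> None: pass
    def on_call(self, caller: str, callee: str) -> None: pass
    def on_return(self, caller: str, callee: str) -> None: pass


class WallclockProbe(Probe):
    """Records call counts and inclusive wall-clock time per function."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self.clock = clock
        self.calls = defaultdict(int)
        self.inclusive = defaultdict(float)
        self._starts = []  # stack of (function name, entry timestamp)

    def on_entry(self, func: str) -> None:
        self.calls[func] += 1
        self._starts.append((func, self.clock()))

    def on_exit(self, func: str) -> None:
        name, start = self._starts.pop()
        if name != func:
            raise RuntimeError(f"exit from {func} while {name} is active")
        self.inclusive[func] += self.clock() - start
```

For example, with a fake clock returning 0, 1, 3 and 4, entering `main`, entering and leaving `solve`, then leaving `main` yields inclusive times of 4 for `main` and 2 for `solve`. A recursive function would have its nested time counted more than once in this simple version.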
Probe Output
• After running, an ASCII file containing raw data is created
  • At runtime, a message like “…output will be in /home/leko/…” is printed, indicating where the file will be
• Three programs are provided that analyze the raw data
  • wallclockrpt – for the wallclock probe
  • papiclockrpt – for the PAPI wallclock probe
  • papiproberpt – for the PAPI probe
• Summary statistics provided
  • Exclusive profile (metric collected excluding children)
  • Inclusive profile (metric collected including children)
  • 1-level-deep call tree (shows which functions an instrumented function called)
• Output from the *rpt programs is simple ASCII (sample on next slide)
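The difference between the exclusive and inclusive profiles, and how a 1-level call tree falls out of the same bookkeeping, can be shown on a stream of entry/exit events. This Python sketch illustrates the definitions on this slide; it is not the algorithm inside the *rpt programs, and the `(kind, function, timestamp)` event format is an assumption.

```python
from collections import defaultdict


def summarize(events):
    """events: ("enter" | "exit", function, timestamp) tuples in time order.

    Returns (inclusive, exclusive, tree), where tree[parent][child] is the
    inclusive time of `child` when called directly from `parent`.
    """
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    tree = defaultdict(lambda: defaultdict(float))
    stack = []  # each entry: [function, start time, time spent in children]
    for kind, func, t in events:
        if kind == "enter":
            stack.append([func, t, 0.0])
        elif kind == "exit":
            name, start, child_time = stack.pop()
            if name != func:
                raise ValueError(f"exit from {func} while {name} is active")
            elapsed = t - start
            inclusive[name] += elapsed                # includes children
            exclusive[name] += elapsed - child_time   # excludes children
            if stack:                                 # charge call to parent
                parent = stack[-1]
                parent[2] += elapsed
                tree[parent[0]][name] += elapsed
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return dict(inclusive), dict(exclusive), {p: dict(c) for p, c in tree.items()}
```

With `main` entered at 0 and left at 10, and children `f` (1 to 4) and `g` (5 to 6), `main` has an inclusive time of 10 but an exclusive time of only 6, because its children account for 3 and 1.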
Sample Probe Report (lu.W.1)
[leko@eta-1 dynaprof]$ wallclockrpt lu-1.wallclock.16143

Exclusive Profile.
Name            Percent     Total      Calls
-------------   ---------   ---------  -----
TOTAL           100         1.436e+11  1
unknown         100         1.436e+11  1
main            3.837e-06   5511       1

Inclusive Profile.
Name            Percent     Total      SubCalls
-------------   ---------   ---------  --------
TOTAL           100         1.436e+11  0
main            100         1.436e+11  5

1-Level Inclusive Call Tree.
Parent/-Child   Percent     Total      Calls
-------------   ---------   ---------  -----
TOTAL           100         1.436e+11  1
main            100         1.436e+11  1
 - f_setarg.0   1.414e-05   2.03e+04   1
 - f_setsig.1   1.324e-05   1.902e+04  1
 - f_init.2     2.569e-05   3.691e+04  1
 - atexit.3     7.042e-06   1.012e+04  1
 - MAIN__.4     0           0          1

Note: only “main” was instrumented in this profiled run
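The Percent column appears to be 100 × Total / TOTAL; this is inferred from the numbers, not documented. The Python sketch below recomputes it from the rounded figures printed in the sample report and agrees with the printed percentages to within 0.1%, the remaining difference coming from rounding of the printed values. The report does not state the units of the Total column, so none are assumed.

```python
import math

GRAND_TOTAL = 1.436e+11  # "TOTAL" row of the sample report

# (row, printed Percent, printed Total), copied from the sample report above
ROWS = [
    ("main (exclusive)", 3.837e-06, 5511),
    ("f_setarg.0", 1.414e-05, 2.03e+04),
    ("f_setsig.1", 1.324e-05, 1.902e+04),
    ("f_init.2", 2.569e-05, 3.691e+04),
    ("atexit.3", 7.042e-06, 1.012e+04),
]


def percent_of_total(total: float, grand_total: float = GRAND_TOTAL) -> float:
    """Inferred formula for the Percent column: 100 * Total / TOTAL."""
    return 100.0 * total / grand_total


def rows_consistent(rows, rel_tol: float = 1e-3) -> bool:
    """True if every printed Percent matches the recomputed value within rel_tol."""
    return all(math.isclose(percent_of_total(total), pct, rel_tol=rel_tol)
               for _, pct, total in rows)


if __name__ == "__main__":
    for name, pct, total in ROWS:
        print(f"{name:18} printed {pct:.3e}  recomputed {percent_of_total(total):.3e}")
```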
Bottleneck Identification Test Suite
• Testing metric: what did the probe output tell us?
• CAMEL: FAILED
  • Instrumenting main caused too much application perturbation
• NAS LU (“W” workload): TOSS-UP
  • Given enough time, any bottleneck could be identified
    • Even cache miss problems, thanks to PAPI!
  • But how much time does it take to identify bottlenecks?
  • Communication problems difficult/impossible to pinpoint
    • No tracing
    • No communication visualization
• PPerfMark tests: NOT TESTED
  • Could not evaluate PPerfMark suite (running MPI commands broken)
  • However, the same comments as for LU would probably apply to all
• In general,
  • Heavily reliant on the user’s proficiency at pinpointing problems
  • Incremental approach
    • Instrument, re-run, instrument w/PAPI, re-run…
  • Process can be tedious
    • But the ease of instrumentation makes it less so
Dynaprof General Comments
• Good points
  • Free
  • Source code available, relatively organized
    • Good reference on how to use PAPI & DynInst API
  • Very easy to use
  • Relatively easy to extend
  • Developer very responsive to questions
• Not-so-good points
  • High instrumentation overhead in a few cases
  • Simple to understand, but not much available functionality
    • Only profiling data with current probes
  • Not really being updated much anymore
  • Changing program arguments requires reloading & reinstrumenting the executable
• Dynaprof illustrates that a tool doesn’t have to be ultra-complicated to be useful
  • KISS!
Adding UPC/SHMEM Support
• However, a few potential problems
  • Reliance on DynInst
    • Hard to port
    • Hard to compile!
  • Reliance on PAPI
    • Can add own probes that do not use PAPI, though…
• Best way to use Dynaprof
  • Steal ideas on how to make a tool extensible
    • Probes as shared libraries: nice idea!
  • Steal code on how to use DynInst & PAPI
• UPC support
  • Would need to do a ton of work
  • Best bet
    • Provide a UPC probe
    • Instrument “known” UPC runtime functions
      • GASNet functions for Berkeley UPC
      • Etc.
    • Need one probe per UPC runtime/compiler environment
• SHMEM support
  • No extra work necessary!
  • Handles instrumenting libraries like any other code
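The “instrument known runtime functions” approach amounts to a per-runtime table of name patterns applied to the functions Dynaprof can see (for instance, via `list`). The sketch below is hypothetical: the table, the helper name and the example function names are illustrative, and only the `gasnet_` prefix reflects GASNet's actual public naming convention. Which runtime symbols are really visible in a given UPC binary would still need to be checked.

```python
import re
from typing import Iterable, List

# Hypothetical table: one entry per UPC runtime/compiler environment,
# mirroring the "one probe per environment" point above. Only the
# Berkeley UPC (GASNet) prefix is filled in; other runtimes need their own.
RUNTIME_PATTERNS = {
    "berkeley": re.compile(r"gasnet_"),
}


def select_runtime_functions(function_names: Iterable[str], runtime: str) -> List[str]:
    """Return, sorted, the names a UPC probe would instrument for `runtime`."""
    pattern = RUNTIME_PATTERNS[runtime]
    # re.match anchors at the start of the name, so this is a prefix test
    return sorted(name for name in function_names if pattern.match(name))
```

For instance, `select_runtime_functions(["main", "gasnet_put", "upc_memcpy", "gasnet_get"], "berkeley")` returns `["gasnet_get", "gasnet_put"]`.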
Evaluation (1)
• Available metrics: 1/5
  • Can use PAPI to get lots of data
  • Limited in what you can collect in a single run; only
    • Two PAPI metrics, or
    • Wall clock time
• Cost: 5/5
  • Free
• Documentation quality: 4/5
  • Minimal documentation, but covers the basics pretty well
• Extensibility: 3.5/5
  • Open source
  • Can add new functionality by writing new probes
  • Must write new code to extend (not much existing functionality)
• Filtering and aggregation: 2/5
  • Most program data is filtered out for you
    • Direct result of the profile nature of current probes
    • Many times too much information is lost
  • Filtering and aggregation behavior fixed in the source code of the probes
Evaluation (2)
• Hardware support: 3/5
  • 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
  • Most everything supported: Linux, AIX, IRIX, HP-UX
  • Reliance on PAPI and DynInst could hinder porting
  • No Cray support
• Heterogeneity support: 0/5 (not supported)
• Installation: 3/5
  • Dynaprof easy to compile, but
    • PAPI and DynInst a nightmare to install
    • Also had to hack up some source code a bit to work with newer versions of gcc & javac (JDK 1.5)
• Interoperability: 0.5/5
  • No export interoperability with other tools
  • There is a half-done TAU probe
    • Not sure if it works
    • Or how useful it is!
• Learning curve: 4/5
  • Very easy to use
  • Anyone used to prof/gprof will feel right at home
Evaluation (3)
• Manual overhead: 3/5
  • Can automatically instrument all functions, a handful of functions, or all function calls within a given function
  • Very easy to choose which functions you want instrumented
  • Can script the behavior of the dynaprof executable
  • Reinstrumenting requires no recompilation
• Measurement accuracy: 5/5
  • For LU, instrumentation overhead almost negligible using PAPI probes
  • Instrumentation overhead small as long as the number of instrumented functions is kept reasonable
  • Program’s correctness of execution not affected
  • Dynamic instrumentation does not get in the compiler’s way for optimizations
• Multiple executions: 0/5
  • Not supported
• Multiple analyses & views: 1/5
  • One way of recording data, one way of presenting it
  • Probes could theoretically present things differently, but none currently do
Evaluation (4)
• Performance bottleneck identification: 1/5
  • No automatic detection
  • Usefulness of tool directly related to cleverness of user
  • Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
• Profiling/tracing support: 2/5
  • Only supports profiling
  • Could feasibly add tracing if you were willing to write the code
• Response time: 3/5
  • No data at all until after the run has completed and the raw data file has been opened
  • Generating reports from raw data is instantaneous, though
• Software support: 4.5/5
  • Can link against (and instrument!!) any existing library
  • Supports MPI (although broken) and shared-memory threaded programs
• Source code correlation: 2/5
  • Data reported to user at the function-name level
• Searching: 0/5 (not supported)
Evaluation (5)
• System stability: 3/5
  • Command-line interface relatively stable
  • <CTRL-C> pause while running is broken in the command-line version
  • GUI severely broken
• Technical support: 4/5
  • Responses from contact within 24 hours
  • Philip Mucci very helpful, knowledgeable