Dynaprof Evaluation Report



Presentation Transcript


  1. Dynaprof Evaluation Report
     Adam Leko, Hans Sherburne
     UPC Group, HCS Research Laboratory, University of Florida
     Color encoding key: Blue: Information; Red: Negative note; Green: Positive note

  2. Basic Information
     • Name: Dynaprof
     • Developer: Philip Mucci (UTK)
     • Current versions:
       • Dynaprof CVS as of 2/21/2005
       • DynInst API v4.1.1 (dependency)
       • PAPI v3.0.7 (dependency)
     • Website: http://www.cs.utk.edu/~mucci/dynaprof/
     • Contact:
       • Philip Mucci (mucci@cs.utk.edu)

  3. Dynaprof Overview
     Banner and prompt text shown on the slide:
         DynaProf 0.9
         Philip J. Mucci, mucci@cs.utk.edu, 2000-2003
         Provided courtesy of UTK's Innovative Computing Laboratory.
         See http://icl.cs.utk.edu for more information.
         This is Open Source Software!
         (dynaprof)
     • Merges existing tools
       • PAPI
       • DynInst API
     • Command-line tool
     • Dynamically instruments programs at runtime
       • Requires no recompilation!
       • Insert probes at runtime
     • Metrics available
       • Wall clock time
       • Any PAPI metrics
       • Can be extended
     • Only simple GUI available (see right)
       • Just wrapper around command-line version
       • Currently pretty broken

  4. Instrumentation Overview
     • Instrumentation very easy
       • Especially for sequential/threaded applications
     • Compile application regularly (-g eases naming later)
       • gcc -O3 -g -o camel camel.c
     • Dynaprof commands
       • Load the exe
         • load camel
       • Specify which probe you wish to use
         • use papiprobe [args]
       • List available functions
         • list camel.c
       • Instrument command
         • All functions in a file: instr module camel.c
         • A single function: instr function camel.c main
       • Run command
         • continue
         • <CTRL-C> pauses execution (currently does not work)
     • Instrumentation output is written to an additional file (its location is printed at runtime)

  5. Instrumentation Overview (2)
     • No special commands needed for
       • sequential applications
       • pthread applications
     • MPI not supported directly through command line
       • Wrapper scripts available for MPICH and LAM
       • Dynaprof must be run in “batch mode”
         • A file containing all instrumentation commands
         • Halts the app before MPI_Init() is called
       • However, not working with current version of MPICH
         • Hits an assertion failure and stops working
         • Can only use MPI programs with 1 process
     • UPC?
       • Tried
         • GCC-UPC
         • BUPC (smp + pthreads)
       • Both produced no output or crashed Dynaprof
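The batch-mode command file mentioned on this slide could be generated with a small helper like the one below. This is only a sketch: the commands themselves (load, use, instr module, instr function, continue) come from the slides, but the one-command-per-line layout, the helper name build_batch_script, and the way the file is handed to Dynaprof are assumptions. The probe arguments are left empty because the slides give them only as "[args]".

```python
def build_batch_script(executable, probe, probe_args=(), modules=(), functions=()):
    """Return the text of a Dynaprof command file.

    Assumption: one command per line, in the same order a user would
    type them at the (dynaprof) prompt.
    """
    lines = [f"load {executable}"]

    use = f"use {probe}"
    if probe_args:
        use += " " + " ".join(probe_args)
    lines.append(use)

    # Instrument whole source modules first, then individual functions.
    for module in modules:
        lines.append(f"instr module {module}")
    for module, function in functions:
        lines.append(f"instr function {module} {function}")

    lines.append("continue")  # start running the instrumented program
    return "\n".join(lines) + "\n"


script = build_batch_script("camel", "papiprobe", functions=[("camel.c", "main")])
print(script, end="")
```

Keeping the command list in a generated file makes the instrument, re-run cycle repeatable without retyping commands.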

  6. Instrumentation Overhead
     • Could only instrument one-process MPI code
       • MPI run wrapper script broken
       • No PPerf apps! (all require > 1 process)
     • Camel overhead very high
       • Only instrumented main
     • LU overhead really low?
     • Possible causes of overhead
       • Frequent subroutine calls from main
       • Use of tsc.h processor counters for timers confuses Dynaprof
     • Expect overhead similar to Paradyn
       • 5-10% for most applications with a reasonable number of instrumentation points

  7. Dynaprof Probe Information
     • Probes perform all data collection and analysis
       • Provide code to insert into a function when instrumented
     • Probes can be called at 4 different points
       • Function entry point
       • Function exit point
       • Function call point
       • Function return point
     • Each probe is encapsulated in a shared library
       • Allows relatively easy creation of new probes
     • Available probes
       • “Wallclock” probe (records wall clock time)
       • PAPI wallclock probe (same as wallclock, uses high-resolution timers)
       • PAPI probe (records any PAPI metric, such as FLOPs)
         • Specify PAPI metrics as args in use papiprobe [args] command
     • Existing probes provide profile-style data only
       • Although there is no reason that a trace could not also be collected
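As a rough analogy for the entry-point and exit-point probes described on this slide, the Python sketch below wraps a function so that wall clock time is sampled on entry and on exit. It is not how Dynaprof works internally (Dynaprof patches the running binary through DynInst and loads probes from shared libraries); the names wallclock_probe, profile, and work are hypothetical and exist only for illustration.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store; a real probe writes a raw data file instead.
profile = defaultdict(lambda: {"calls": 0, "total": 0.0})


def wallclock_probe(func):
    """Wrap func so a timestamp is taken at its entry and exit points."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # analogous to the function entry point
        try:
            return func(*args, **kwargs)
        finally:
            # Analogous to the function exit point; runs even on exceptions.
            elapsed = time.perf_counter() - start
            entry = profile[func.__name__]
            entry["calls"] += 1
            entry["total"] += elapsed  # inclusive time: children are counted
    return wrapper


@wallclock_probe
def work(n):
    return sum(i * i for i in range(n))


result = work(1000)
work(10)
print(profile["work"]["calls"], "calls recorded for work()")
```

Because the timestamps bracket the whole call, the accumulated value is an inclusive measurement; separating out time spent in callees needs the call-point and return-point probes as well.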

  8. Probe Output
     • After running, an ASCII file containing raw data is created
       • At runtime, a message like “…output will be in /home/leko/…” is printed indicating where the file will be
     • Three programs are provided which analyze the raw data
       • wallclockrpt: for wall clock probe
       • papiclockrpt: for PAPI wall clock probe
       • papiproberpt: for PAPI probe
     • Summary statistics are provided
       • Exclusive profile (metric collected excluding children)
       • Inclusive profile (metric collected including children)
       • 1-call level deep profile (see which functions an instrumented function called)
     • Output from *rpt programs is simple ASCII (sample next page)
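The difference between the exclusive and inclusive profiles can be illustrated with a short sketch. It assumes a simple call tree in which every function has exactly one caller, and it uses made-up numbers rather than output from a Dynaprof run; the slides do not say how the *rpt programs compute these values internally.

```python
def exclusive_times(inclusive, children):
    """Exclusive value = inclusive value minus the inclusive values of
    the function's direct children (assumes a tree-shaped call graph)."""
    return {
        name: total - sum(inclusive[child] for child in children.get(name, ()))
        for name, total in inclusive.items()
    }


# Hypothetical inclusive measurements, in arbitrary units.
inclusive = {"main": 100.0, "solve": 70.0, "io": 20.0, "kernel": 50.0}
# Hypothetical call tree: main calls solve and io; solve calls kernel.
children = {"main": ["solve", "io"], "solve": ["kernel"]}

exclusive = exclusive_times(inclusive, children)
for name in inclusive:
    print(f"{name:8s} inclusive={inclusive[name]:6.1f} exclusive={exclusive[name]:6.1f}")
```

In a tree like this, the exclusive values sum to the inclusive value of the root, which is a handy sanity check when reading a report.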

  9. Sample Probe Report (lu.W.1)

     [leko@eta-1 dynaprof]$ wallclockrpt lu-1.wallclock.16143

     Exclusive Profile.

     Name            Percent     Total       Calls
     -------------   -------     -----       -------
     TOTAL           100         1.436e+11   1
     unknown         100         1.436e+11   1
     main            3.837e-06   5511        1

     Inclusive Profile.

     Name            Percent     Total       SubCalls
     -------------   -------     -----       -------
     TOTAL           100         1.436e+11   0
     main            100         1.436e+11   5

     1-Level Inclusive Call Tree.

     Parent/-Child   Percent     Total       Calls
     -------------   -------     -----       --------
     TOTAL           100         1.436e+11   1
     main            100         1.436e+11   1
     - f_setarg.0    1.414e-05   2.03e+04    1
     - f_setsig.1    1.324e-05   1.902e+04   1
     - f_init.2      2.569e-05   3.691e+04   1
     - atexit.3      7.042e-06   1.012e+04   1
     - MAIN__.4      0           0           1

     Note: only “main” was instrumented in this profiled run
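The Percent column in the sample report appears to be each row's Total expressed as a percentage of the TOTAL row. That relationship is inferred from the printed numbers, not stated on the slides, so the check below is a sketch under that assumption. The row values are copied from the report; the tolerance is loose because the report prints only four significant digits.

```python
import math

TOTAL = 1.436e11  # "Total" of the TOTAL row, as printed (already rounded)

# (name, total, reported percent) copied from the sample report
rows = [
    ("main (exclusive)", 5511, 3.837e-06),
    ("f_setarg.0", 2.03e04, 1.414e-05),
    ("f_setsig.1", 1.902e04, 1.324e-05),
    ("f_init.2", 3.691e04, 2.569e-05),
    ("atexit.3", 1.012e04, 7.042e-06),
]


def percent_of_total(total, grand_total=TOTAL):
    """Assumed formula: a row's share of the grand total, in percent."""
    return 100.0 * total / grand_total


for name, total, reported in rows:
    computed = percent_of_total(total)
    # Printed values are rounded, so compare with a relative tolerance.
    match = math.isclose(computed, reported, rel_tol=2e-3)
    print(f"{name:18s} computed={computed:.3e} reported={reported:.3e} match={match}")
```

Read this way, the report shows that almost none of the measured total is attributed to main itself, which fits the note that only main was instrumented.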

  10. Bottleneck Identification Test Suite
     • Testing metric: what did output of probe tell us?
     • CAMEL: FAILED
       • Instrumenting main caused too much application perturbation
     • NAS LU (“W” workload): TOSS-UP
       • Given enough time, any bottleneck could be identified
         • Even cache miss problems, thanks to PAPI!
       • But how much time to identify bottlenecks?
       • Communication problems difficult/impossible to pinpoint
         • No tracing
         • No communication visualization
     • PPerfMark tests: NOT TESTED
       • Could not evaluate PPerfMark suite (running MPI commands broken)
       • However, same comments for LU would probably apply to all
     • In general,
       • Heavily reliant on user’s proficiency with pinpointing problems
       • Incremental approach
         • Instrument, re-run, instrument w/PAPI, re-run…
         • Process can be tedious
         • But, ease of instrumentation does ease this

  11. Dynaprof General Comments
     • Good points
       • Free
       • Source code available, relatively organized
       • Good reference on how to use PAPI & DynInst API
       • Very easy to use
       • Relatively easy to extend
       • Developer very responsive to questions
     • Not-so-good points
       • High instrumentation overhead in a few cases
       • Simple to understand, but not much available functionality
       • Only profiling data with current probes
       • Not really being updated much any more
       • Changing program arguments requires reloading & reinstrumenting executable
     • Dynaprof illustrates that a tool doesn’t have to be ultra-complicated to be useful
       • KISS!

  12. Adding UPC/SHMEM Support
     • However, a few potential problems
       • Reliance on DynInst
         • Hard to port
         • Hard to compile!
       • Reliance on PAPI
         • Can add own probes which do not use PAPI though…
     • Best way to use Dynaprof
       • Steal ideas on how to make tool extensible
         • Probes as shared libraries nice idea!
       • Steal code on how to use DynInst & PAPI
     • UPC support
       • Would need to do a ton of work
       • Best bet
         • Provide a UPC probe
         • Instrument “known” UPC runtime functions
           • GASNet functions for Berkeley
           • Etc.
         • Need one probe per UPC runtime/compiler environment
     • SHMEM support
       • No extra work necessary!
       • Handles instrumenting libraries like any other code

  13. Evaluation (1)
     • Available metrics: 1/5
       • Can use PAPI to get lots of data
       • A single run can collect only
         • Two PAPI metrics or
         • Wall clock time
     • Cost: 5/5
       • Free
     • Documentation quality: 4/5
       • Minimal documentation, but covers the basics pretty well
     • Extensibility: 3.5/5
       • Open source
       • Can add new functionality by writing new probes
       • Must write new code to extend (not much existing functionality)
     • Filtering and aggregation: 2/5
       • Most program data is filtered out for you
         • Direct result of profile-nature of current probes
         • Many times too much information is lost
       • Filtering and aggregation behavior fixed in source code of probes

  14. Evaluation (2)
     • Hardware support: 3/5
       • 64-bit Linux (Itanium only), Sparc, IRIX, AlphaServer (Tru64), IBM SP (AIX)
       • Most everything supported: Linux, AIX, IRIX, HP-UX
       • Reliance on PAPI and DynInst could hinder porting
       • No Cray support
     • Heterogeneity support: 0/5 (not supported)
     • Installation: 3/5
       • Dynaprof easy to compile, but
         • PAPI and DynInst a nightmare to install
         • Also had to hack up some source code a bit to work with newer versions of gcc & javac (JDK1.5)
     • Interoperability: 0.5/5
       • No export interoperability with other tools
       • There is a half-done TAU probe
         • Not sure if it works
         • Or how useful it is!
     • Learning curve: 4/5
       • Very easy to use
       • Anyone used to prof/gprof will feel right at home

  15. Evaluation (3)
     • Manual overhead: 3/5
       • Can automatically instrument all functions, a handful of functions, and all function calls within a given function
       • Very easy to choose which functions you want instrumented
       • Can script behavior of dynaprof executable
       • Reinstrumenting requires no recompilation
     • Measurement accuracy: 5/5
       • For LU, tracing overhead almost negligible using PAPI probes
       • Tracing overhead small as long as number of instrumented functions kept reasonable
       • Program’s correctness of execution not affected
       • Dynamic instrumentation does not get in compiler’s way for optimizations
     • Multiple executions: 0/5
       • Not supported
     • Multiple analyses & views: 1/5
       • One way of recording data, one way of presenting it
       • Probes could theoretically present things differently, but none currently do

  16. Evaluation (4)
     • Performance bottleneck identification: 1/5
       • No automatic detection
       • Usefulness of tool directly related to cleverness of user
       • Many bottlenecks would be very difficult to detect with only the basic profile information given by hardware counters
     • Profiling/tracing support: 2/5
       • Only supports profiling
       • Could feasibly add tracing if you wanted to code
     • Response time: 3/5
       • No data at all until after run has completed and tracefile has been opened
       • Generating reports from raw data instantaneous though
     • Software support: 4.5/5
       • Can link against (and instrument!!) any existing library
       • Supports MPI (although broken) and shared-memory threaded programs
     • Source code correlation: 2/5
       • Data reported to user at the function name level
     • Searching: 0/5 (not supported)

  17. Evaluation (5)
     • System stability: 3/5
       • Command-line interface relatively stable
       • <CTRL-C> pause while running broken in command-line
       • GUI severely broken
     • Technical support: 4/5
       • Responses from contact within 24 hours
       • Philip Mucci very helpful, knowledgeable
