Projections – A Performance Tool for Charm++ Applications

Chee Wai Lee cheelee@uiuc.edu Parallel Programming Laboratory Dept. of Computer Science University of Illinois at Urbana Champaign http://charm.cs.uiuc.edu Projections – A Performance Tool for Charm++ Applications

Outline • General Introduction to Projections • Projections Basics • Advanced Features • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

Projections • Projections is a performance tool designed for use with Charm++/AMPI. • Trace-based, post-mortem analysis. • Supports highly detailed traces, summary formats and a flexible user-level API. • Java-based visualization tool for presenting performance information. 10/19/05 Projections Tutorial 3

Charm++ Model • Object Oriented. Chares (objects) encapsulate data, standard C++ methods and entry methods. • Message Driven. Entry methods represent work units activated by an incoming message. • Only one entry method may execute at any time in a Chare. • A runtime schedules incoming messages. 10/19/05 Projections Tutorial 4

What you will need • A version of Charm++ built without the CMK_OPTIMIZE flag. • User-built, download our release or check out a copy from anonymous CVS. • Pre-built, check with your machine sysadmin. • Java Runtime 1.3.1 or higher. • Projections Visualization binary (projections.jar) • User-built Charm++, located in charm/tools/projections/bin • Pre-built, check with sysadmin or acquire the binary separately.

Outline: Basics • General Introduction to Projections • Projections Basics • Instrumentation • Trace generation • Visualization • Advanced Features • Features to aid Effective Analysis • Extremely Large Datasets

Trace Generation: Basics • Automatic trace instrumentation - No user codes required by default. • Any Charm++ version built without the CMK_OPTIMIZE flag supports tracing. • All Charm++ entry methods and messaging events are traced.

Trace Generation: Steps • Link your application with link-time options “-tracemode projections -tracemode summary” • Run your application normally. • At the end of the run, you will see “.log”,“.sum” as well as “.sts” files generated on the same directory as your application binary.

Visualization: Basic Steps • Run the script found at charm/tools/projections/bin as such: • projections [<application>.sts] • Or activate the Java binary projections.jar via the command-line passing, in the optional <application>.sts filename as argument.

Visualization: Main Window 10/19/05 Projections Tutorial 10

Visualization: Overview

Visualization: Usage Profile

Visualization: Time Profile

Visualization: Timeline

Visualization: Task Histogram

Visualization: Communication

Visualization: Tabulated Call Info

Projections Basic demo. 10/19/05 Projections Tutorial 18

Outline: Advanced Features • General Introduction to Projections • Projections Basics • Advanced Features • Partial Tracing • Tracing User Events • Tracing AMPI Functions • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

Partial Trace Generation • The following API calls are provided by the tracing framework: • void traceBegin() • void traceEnd() • The above calls turns tracing on/off for the processor on which the call was made. • int traceIsOn() queries the tracing framework status. • +traceoff runtime option • Causes tracing (over all processors) to be turned off when the application is started. 10/19/05 Projections Tutorial 20

Partial Tracing “watch-it”s • traceBegin() and traceEnd() calls apply only on the processor on which it invoked. • Offers flexibility but vulnerable to programmer error. • Typically used in a collective manner. (eg. NAMD does this at specific load balancing operations) • Partial trace calls are invoked in the context of an entry method. One should be prepared to drop initial performance data just after turning tracing on and just before turning tracing off. • Appropriate use of the +traceoff runtime option is essential.

Partial Tracing Example // in the case when trace is off at the beginning, // only turn trace of from after the first LB to the firstLdbStep after // the second LB. // 1 2 3 4 5 6 7 // off on Alg7 refine refine ... on #if CHARM_VERSION >= 050606 if (traceAvailable()) { static int specialTracing = 0; if (ldbCycleNum == 1 && traceIsOn() == 0) specialTracing = 1; if (specialTracing) { if (ldbCycleNum == 4) traceBegin(); if (ldbCycleNum == 6) traceEnd(); } } #endif

Tracing User Events • The following APIs are provided for user event registration and tracing: • int traceRegisterUserEvent(char *eventDesc, int EventNum=-1) • Acquire or specify an event ID to be associated with event name. • void traceUserEvent(int eventNum) • Use a valid event ID to record an event. • void traceUserBracketEvent(int eventNum, double startTime, double endTime) • Use a valid event ID to record an event interval. 10/19/05 Projections Tutorial 23

AMPI Function Tracing API • Works like User Events in Charm++ Applications. • Why? MPI Function abstraction is invisible to the Charm++ runtime. • REGISTER_FUNCTION(<namestring>) to register <namestring> as a string-id to be traced. • TRACEFUNC(<funcall>,<namestring>) to record a call to function <funcall> to be associated with the event registered as <namestring>.

Advanced Features Demo

Outline: Effective Analysis • General Introduction to Projections • Projections Basics • Advanced Features • Features to aid Effective Analysis • How Tracing Works • Memory Footprint Control • Data Volume Control • Visualization Controls • Extremely Large Datasets • Tips and Notes

Log Event Tracing • Each Charm++ event and any registered user events are recorded into a pre-assigned memory buffer on each processor. • Default Buffer size is 10,000 trace log entries. • When a buffer is full, a special flush event is logged and buffer is flushed to disk. • Flushing is done independent of other processors. There is no synchronization on a flush.

Summary Tracing • Invoked by the same event set as in Log tracing. User events are, however, ignored. • Memory buffer is organized into k bins of representing an initial time of 1ms. Each event contributes data into the appropriate time-bin. • When a buffer is filled, bin-time representation is doubled and the data is packed into the first k/2 bins. Event contribution continues at the (k/2+1)th bin.

Memory Footprint Control • Tracing Memory Footprint • Event Log tracemode (-tracemode projections) • Default 10,000 event entries. • Controlled by runtime flag “+logsize <size>”. • Summary tracemode (-tracemode summary) • Default 10,000 time bins of 1 ms. • Controlled by runtime flags “+bincount <#bins>” and “+binsize <seconds>”.

Memory Footprint Control (2) • Trade-offs to consider • Log Buffers • Flush overhead vs Memory usage • Frequency of flushes vs Size of flushes • Summary Buffers • Frequency of compaction + Data Granularity vs Memory usage

Controlling Data Volume • [Code-time] User API for partial tracing. • [Link-time] Generating only summary data. • [Run-time] Writing compressed output (runtime flag): +gz-trace • [Post-run] Deleting subset of generated logs. • [Visualization] Parameter range control, analysis “Memory”.

Visualization Parameter Control

Specific Visualization Tool Issues • Memory usage constraints • Timeline - dependent on the event-density of the selected time range of the application. Typical workable range is 10ms to 10s for between 10-20 processors. • Processor based tools (Overview, Communications, Usage Profile) - limit to 2000 processors or less. • Interval based tools (Graph, Time Profile, Communication vs Time, Animation) - limit to 1500 time intervals. • Histogram tools – limit to 1000 bins or less. 10/19/05 Projections Tutorial 33

Analysis Techniques • Zoom in from wide-ranged low-detail views to detailed look at problem spots. • Make use of range histories. • Control data volume. • Use effective colors. • Make effective use of specific tool features (see manual).

Analysis Techniques (2) • Load Imbalance: Overview, Usage Profile. • Where's my work going?: Time Profile. • How is my communication behaving?: Communication, Communication vs Time. • Are there critical paths? I need details!: Timeline.

Putting it all together

Outline: Extremely Large Datasets • General Introduction to Projections • Projections Basics • How Tracing Works • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

Considerations for Extremely Large Datasets • Large number of files – ware thee the filesystem. • Huge amounts of data – control!

Demo – NAMD on 8192 processors

Outline: Miscellany • General Introduction to Projections • Projections Basics • How Tracing Works • Features to aid Effective Analysis • Extremely Large Datasets • Tips and Notes

Conveniently Placing Logs • Specifying a user-defined output location (runtime option): +traceroot <desired log directory> • It is important to note that <desired log directory> must be available on a machine's compute nodes. 10/19/05 Projections Tutorial 41

Perturbation Issues • Perturbation of application. • Tracing overhead • Timer overheads (rdtsc, machine wallclock). • Acquisition of performance data and storage. • Observed in the case of NAMD with timesteps below 10ms with many compute objects in the microsecond range. 10/19/05 Projections Tutorial 42

Look Closely! • Do not be fooled! Projections does not handle some visualization artifacts well: • Fine grain details can sometimes look like one big solid block on timeline. • It is hard to mouse-over items that represent fine-grained events. • Other times, tiny slivers of activity become too small to be drawn.

Answering my questions (again!) • Load Imbalance: Overview, Usage Profile. • Where's my work going?: Time Profile. • How is my communication behaving?: Communication, Communication vs Time. • Are there critical paths? I need details!: Timeline.

Frequently Asked Questions Q: I tried user events and projections visualization crashes! A: Did you register the events before using them? Q: There are giant stretched event(s) in my run! A: Did you set a large enough log buffer size? Q: Projections visualization crashes! A: Are your logs corrupted? These can happen on bad I/O (experienced on NFS).

Projections – A Performance Tool for Charm++ Applications