MPICL/ParaGraph Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida
Color encoding key: Blue: Information | Red: Negative note | Green: Positive note
Basic Information • Name: MPICL/ParaGraph • Developers: • ParaGraph: University of Illinois, University of Tennessee • MPICL: ORNL • Current versions: • ParaGraph (no version number, but last available update 1999) • MPICL 2.0 • Websites: http://www.csar.uiuc.edu/software/paragraph/ and http://www.csm.ornl.gov/picl/ • Contacts: • ParaGraph • Michael Heath (heath@cs.uiuc.edu) • Jennifer Finger • MPICL • Patrick Worley (worleyph@ornl.gov) Note: ParaGraph last updated 1999, MPICL last updated 2001 [both projects appear dead]
MPICL/ParaGraph Overview • MPICL • Trace file creation library • Uses the MPI profiling interface (sketched below) • Only records MPI commands • Support for “custom” events via manual instrumentation • Writes traces in the documented ASCII PICL format • ParaGraph • PICL trace visualization tool • Very old tool (first written during 1989-1991) • Offers a lot of visualizations • Analogy: MPICL is to ParaGraph as MPE is to Jumpshot
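As background on how the profiling interface enables automatic tracing: the MPI standard exports every routine under a second PMPI_ name, so a tracing library can interpose its own wrappers at link time. A minimal sketch of the technique (illustrative only, not MPICL's actual source; MPICL's real wrappers emit PICL trace records rather than printing):

    #include <mpi.h>
    #include <stdio.h>

    /* Wrapper interposed at link time; the real MPI_Send is still
       reachable through its standard PMPI_Send alias. */
    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();                 /* entry timestamp */
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        double t1 = MPI_Wtime();                 /* exit timestamp  */
        fprintf(stderr, "send to=%d tag=%d count=%d dt=%g\n",
                dest, tag, count, t1 - t0);      /* stand-in for a PICL record */
        return rc;
    }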
MPICL Overview • Installation is a nightmare • Requires knowledge of the F2C symbol naming convention (!) • Had to edit and remove some code to work with newer versions of MPICH • Hardcoded values for certain field sizes had to be updated • One statement in the Fortran environment setup caused instrumented programs to core dump on startup • Automatic instrumentation of MPI programs offered via the profiling interface • Once installed, very easy to use • Have to add 3 lines of code to enable creation of trace files (see the sketch below) • Calls to tracefiles(), tracelevel(), and tracenode() (see ParaGraph documentation) • Minor annoyance; could be done automatically • Manual instrumentation routines also available • Calls to tracedata() and traceevent() (see ParaGraph documentation) • Notion of program “phases” allows a crude form of source code correlation • Also has extra code to ensure accurate clock synchronization • Extra work is done to ensure consistent ordering of events • Helps prevent “tachyons” (messages shown as received before they are sent) • Delays startup by several seconds (but is not mandatory) • After the trace file is collected, it must be sorted using tracesort
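For concreteness, a minimal sketch of what the 3-line setup looks like in an MPI program. The routine names (tracefiles, tracelevel, tracenode, traceevent) come from the ParaGraph documentation; the argument lists and values below are assumptions for illustration only, so consult the MPICL documentation for the real signatures:

    #include <mpi.h>

    /* Prototypes would normally come from an MPICL header; the
       argument lists here are assumed, not taken from the docs. */
    void tracefiles(char *tmpfile, char *permfile, int verbose);
    void tracelevel(int mpi, int user, int stats);
    void tracenode(int bufsize, int flush, int sync);
    void traceevent(char *kind, int id, int ndata);

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* The three setup calls that enable trace file creation. */
        tracefiles("", "trace.out", 0);   /* assumed: temp file, final trace, verbosity */
        tracelevel(1, 1, 0);              /* assumed: event detail levels               */
        tracenode(100000, 0, 1);          /* assumed: buffer size, flush, clock sync    */

        traceevent("entry", 1, 0);        /* assumed: optional manual task marker */
        /* ... MPI calls here are traced automatically via PMPI ... */
        traceevent("exit", 1, 0);

        MPI_Finalize();                   /* afterwards, sort the trace with tracesort */
        return 0;
    }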
MPICL Overhead • Instrumentation performed using the MPI profiling interface • Used a 5MB buffer for trace files • On average, instrumentation is relatively intrusive, but overhead stays within 20% • Does not include overhead for synchronizing clocks • Note: benchmarks marked with * have high variability in runtimes
ParaGraph Overview • Uses its own widget set • Probably necessary when it was first written in 1989 • Widgets look extremely crude by today’s standards • Button = square with text in the middle • Uses its own conventions; takes a bit of getting used to • Once you adjust to the interface it becomes less of an issue, but at times the conventions become cumbersome • Example: closing any child window shuts down the entire application • ParaGraph philosophy • Provide as many different types of visualizations as possible • 4 categories: utilization, communication, tasks, other • Use a tape-player abstraction for viewing trace data • Similar to Paraver; cumbersome for maneuvering to specific times • All visualizations use a form of animation • Trace data is drawn as fast as possible • This creates problems on modern machines • “Slow motion” option available, but doesn’t work that well • Supports application-specific visualizations • Have to write custom code and link it in when compiling ParaGraph
ParaGraph Visualizations • Utilization visualizations • Display a rough estimate of processor utilization • Utilization broken down into 3 states (see the sketch below): • Idle – when the program is blocked waiting for a communication operation (or has stopped execution) • Overhead – when the program is performing communication but is not blocked (time spent within the MPI library) • Busy – when executing any part of the program other than communication • “Busy” doesn’t necessarily mean useful work is being done, since it assumes (not communication) := busy • Communication visualizations • Display different aspects of communication • Frequency, volume, overall pattern, etc. • “Distance” computed from the topology set in the options menu • Task visualizations • Display information about when processors start & stop tasks • Require manually instrumented code to identify when processors start/stop tasks • Other visualizations • Miscellaneous things • Can load/save a visualization window set (does not work)
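A sketch of the three-state heuristic as we understand it (the state logic is our reading of the documentation; the helper names are hypothetical):

    typedef enum { IDLE, OVERHEAD, BUSY } util_state;

    /* Everything outside the MPI library counts as "busy", whether or
       not useful work is being done; inside the library, blocked time
       is "idle" and the rest is "overhead". */
    util_state classify(int in_mpi_call, int blocked_on_comm)
    {
        if (!in_mpi_call)
            return BUSY;                  /* (not communication) := busy */
        return blocked_on_comm ? IDLE : OVERHEAD;
    }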
Utilization Visualizations – Utilization Count • Displays # of processors in each state at a given moment in time • Busy shown on bottom, overhead in middle, idle on top
Utilization Visualizations – Gantt Chart • Displays utilization state of each processor as a function of time
Utilization Visualizations – Kiviat Diagram • Shows our friend, the Kiviat diagram • Each spoke is a single processor • Dark green shows moving average, light green shows current high watermark • Timing parameters for each can be adjusted • Metric shown can be “busy” or “busy + overhead”
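A sketch of the two quantities each spoke plots; using an exponential moving average here is an assumption (the tool's timing parameters are adjustable and its exact averaging scheme is not documented in this slide):

    typedef struct {
        double avg;     /* dark green: moving average of busy fraction */
        double peak;    /* light green: current high watermark         */
        double alpha;   /* smoothing weight (assumed)                  */
    } spoke_t;

    void spoke_update(spoke_t *s, double busy_fraction)
    {
        s->avg = s->alpha * busy_fraction + (1.0 - s->alpha) * s->avg;
        if (s->avg > s->peak)
            s->peak = s->avg;
    }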
Utilization Visualizations – Streak • Shows “streak” of state • Similar to winning/losing streaks of baseball teams • Win = overhead or busy • Loss = idle • Not sure how useful this is
Utilization Visualizations – Utilization Summary • Shows percentage of time spent in each utilization state up to current time
Utilization Visualizations – Utilization Meter • Shows percentage of processors in each utilization state at current time
Utilization Visualizations – Concurrency Profile • Shows histograms of # processors in a particular utilization state • Ex: Diagram shows • Only 1 processor was busy ~5% of the time • All 8 processors were busy ~90% of the time
Communication Visualizations – Color Code • Color code controls colors used on most communication visualizations • Can have color indicate message sizes, message distance, or message tag • Distance computed by topology set in options menu
Communication Visualizations – Communication Traffic • Shows overall traffic at a given time • Bandwidth used, or • Number of messages in flight • Can show single node or aggregate of all nodes
Communication Visualizations – Spacetime Diagram • Shows standard space-time diagram for communication • Messages sent from node to node at which times
Communication Visualizations – Message Queues • Shows data about message queue lengths • Incoming/outgoing • Number of bytes queued/number of messages queued • Colors mean different things • Dark color shows current moving average • Light color shows high watermark
Communication Visualizations – Communication Matrix • Shows which processors sent data to which other processors
Communication Visualizations – Communication Meter • Shows percentage of communication capacity used at the current time • Message count or bandwidth • 100% = max # of messages / max bandwidth used by the application at any one time
Communication Visualizations – Animation • Animates messages as they occur in the trace file • Can overlay messages on a topology • Available topologies (hop-count sketch below): • Mesh • Ring • Hypercube • User-specified • Can lay out each node as you want • Can store layouts to a file and load them later
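Message “distance” follows directly from the selected topology; for example, hop counts on a ring and a hypercube can be computed as below (a sketch of the standard formulas, not ParaGraph's code):

    #include <stdlib.h>

    /* Ring of p nodes: the shorter way around. */
    int ring_distance(int a, int b, int p)
    {
        int d = abs(a - b);
        return d < p - d ? d : p - d;
    }

    /* Hypercube: Hamming distance between node IDs. */
    int hypercube_distance(int a, int b)
    {
        int x = a ^ b, d = 0;
        while (x) { d += x & 1; x >>= 1; }
        return d;
    }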
Communication Visualizations – Node Data • Shows detailed communication data • Can display • Metrics • Which node • Message tag • Message distance • Message length • For a single node, or aggregate for all nodes
Task Visualizations – Task Count • Shows number of processors that are executing a task at the current time • At end of run, changes to show summary of all tasks
Task Visualizations – Task Gantt • Shows Gantt chart of which task each processor was working on at a given time
Task Visualizations – Task Speed • Similar to Gantt chart, but displays “speed” of each task • Must record work done by task in instrumentation call (not done for example shown above)
Task Visualizations – Task Status • Shows which tasks have started and finished at the current time
Task Visualizations – Task Summary • Shows % time spent on each task • Also shows any overlap between tasks
Task Visualizations – Task Surface • Shows time spent on each task by each processor • Useful for seeing load imbalance on a task-by-task basis
Task Visualizations – Task Work • Displays work done by each processor • Shows rate and volume of work being done • The example doesn’t show anything because no work amounts were recorded in the trace being visualized
Other Visualizations – Clock, Coordinates • Clock • Shows current time • Coordinate information • Shows coordinates when you click on any visualization
Other Visualizations – Critical Path • Highlights the critical path (longest serial path) in the space-time diagram in red • Depends on point-to-point communication (collectives can screw it up) • See the sketch below for how the path can be computed
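A sketch of how the longest serial path can be computed from a sorted trace: treat each event as a node in a DAG with edges along each processor's timeline and along every point-to-point message, then take the longest path. The event layout is hypothetical, and collectives are ignored here, which is exactly why they can confuse the real visualization:

    typedef struct {
        double cost;     /* time spent in this event               */
        int    prev;     /* previous event on same processor, or -1 */
        int    sender;   /* matching send event, or -1             */
        double longest;  /* longest path ending here (output)      */
    } event_t;

    /* Events assumed topologically ordered by timestamp, which is
       what tracesort's consistent event ordering provides. */
    void critical_path(event_t *ev, int n)
    {
        for (int i = 0; i < n; i++) {
            double best = 0.0;
            if (ev[i].prev   >= 0 && ev[ev[i].prev].longest   > best)
                best = ev[ev[i].prev].longest;
            if (ev[i].sender >= 0 && ev[ev[i].sender].longest > best)
                best = ev[ev[i].sender].longest;
            ev[i].longest = best + ev[i].cost;
        }
        /* The event with the maximal "longest" value ends the critical
           path; following prev/sender links back recovers the red path. */
    }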
Other Visualizations – Phase Portrait • Shows relationship between processor utilization and communication usage
Other Visualizations – Statistics • Gives overall statistics for run • Data • % busy, overhead, idle time • Total count and bandwidth of messages • Max, min, average • Message size • Distance • Transit time • Shows max of 16 processors at a time
Other Visualizations – Processor Status • Shows • Processor status • Which task each processor is executing • Communication (sends & receives) • Each processor is a square in the grid (8-processor example shown)
Other Visualizations – Trace Events • Shows text output of all trace file events
Bottleneck Identification Test Suite • Testing metric: what did the visualizations tell us (no manual instrumentation)? • Program correctness was not affected by instrumentation • CAMEL: PASSED • Space-time diagram & bandwidth utilization visualizations showed a large number of small messages at the beginning • Utilization graphs showed low overhead, few idle states • LU: PASSED • Space-time diagram showed a large number of small messages • Kiviat diagram showed the moving average of processor utilization was low • Phase portrait showed a strong correlation between communication and low processor utilization • Big messages: PASSED • Utilization Gantt and space-time diagrams showed a large amount of overhead at the time of each send • Diffuse procedure: PASSED • Utilization Gantt showed one processor busy & the rest idle • Need manual instrumentation to determine that one routine takes too long
Bottleneck Identification Test Suite (2) • Hot procedure: FAILED • Purely sequential code, so ParaGraph could not distinguish between idle and busy states • Intensive server: PASSED • Utilization Gantt chart showed all processors except the first idle • Space-time chart showed processor 0 being inundated with messages • Ping-pong: PASSED • Space-time chart showed a large # of small messages dependent on each other • Random barrier: TOSS-UP • Utilization count showed one processor busy throughout execution • Utilization Gantt chart showed the busy processor randomly dispersed • However, the “waiting for barrier” state is shown as idle, so it is difficult to trace the problem back to the barrier without extra manual instrumentation
Bottleneck Identification Test Suite (3) • Small messages: PASSED • Utilization Gantt chart showed lots of time spent in MPI code (overhead) • Space-time diagram showed large numbers of small messages • System time: FAILED • All processes show as busy, no distinction of user vs. system time • No communication = classification of processor states not really done at all, everything just gets attributed to busy time • Wrong order: PASSED • Space-time diagram showed messages being received in the reverse order they were sent • But, have to pay close attention to how the diagram is drawn
How to Best Use ParaGraph/MPICL • Don’t use MPICL • Better trace file formats and libraries are available now • We probably should look over the clock synchronization code, though this probably isn’t useful if high-resolution timers are available • Especially for shared-memory machines • Don’t use ParaGraph’s code directly • But it has a lot of neat visualizations we could copy • At most we should scan the code to see how a visualization is calculated • In summary: just take the best ideas & visualizations
Evaluation (1) • Available metrics: 2/5 • Only records communication, task entrance and exit • Approximates processor state by equating (not communication) = busy • Cost: 5/5 • Free! • Documentation quality: 2/5 • ParaGraph has an excellent manual • Very hard to find information on MPICL • MPICL installation instructions woefully inadequate • Extensibility: 2/5 • Can add custom visualizations, but must write code and recompile ParaGraph • Open source, but uses an old X-Windows API & its own widget set • Dead project (no updates since 1999) • Filtering and aggregation: 1/5 • Not really performed • A few visualizations can be restricted to a certain processor • Can output summary statistics (Other visualizations -> Statistics)
Evaluation (2) • Hardware support: 5/5 • Cray X1, AlphaServer (Tru64), IBM SP (AIX), SGI Altix, 64-bit Linux clusters (Opteron & Itanium) • Support for a large number of vendor-specific MPI libraries • Would probably need a lot of effort to port to more modern architectures though • Heterogeneity support: 0/5 (not supported) • Installation: 1.5/5 • ParaGraph relatively easy to compile and install • MPICL installation is extremely difficult, especially with modern versions of MPICH/LAM • Interoperability: 0/5 • Does not interoperate with other tools • Learning curve: 2.5/5 • MPICL library easy to use • ParaGraph interface unintuitive, can get in the way
Evaluation (3) • Manual overhead: 1/5 • Can record all MPI calls by linking, but this requires the addition of trace control instructions in source code • Task visualizations depend on manual instrumentation • Measurement accuracy: 2/5 • CAMEL: ~18% overhead • Instrumentation adds a bit of runtime overhead, especially when many messages are sent • Multiple executions: 0/5 (not supported) • Multiple analyses & views: 5/5 • Many, many ways of looking at trace data • Performance bottleneck identification: 4/5 • Bottleneck identification must be performed manually • Many visualizations help with bottleneck detection, but no guidance is provided on which one you should examine first
Evaluation (4) • Profiling/tracing support: 3/5 • Only tracing supported • Profiling data can be shown in ParaGraph after processing the trace file • Response time: 2/5 • Nothing reported until after the program runs • Also need a (computationally expensive) trace sort to be performed before you can view the trace file • Large trace files take a while to load (ParaGraph must pass over the entire trace before displaying anything) • Searching: 0/5 (not supported) • Software support: 3/5 • Can link against any library when using the MPI profiling interface, but that library will not be instrumented • Only MPI and some (very old, obsolete) vendor-specific message-passing libraries are supported
Evaluation (5) • Source code correlation: 0/5 • Not supported • Can do indirectly via manual instrumentation of tasks, but still hard to figure out exactly where things occur in source code • System stability: 3.5/5 • MPICL relatively stable after bugs were fixed during compilation • ParaGraph stable as long as you don’t try to do weird things (load the wrong file) • Not very robust with error handling • ParaGraph’s load/save window set doesn’t work • Technical support: 0/5 • Dead project • Project email addresses still seem valid, but not sure how much help we could get from the developers now