
MPICL/ParaGraph Evaluation Report



1. MPICL/ParaGraph Evaluation Report
Adam Leko, Hans Sherburne
UPC Group, HCS Research Laboratory, University of Florida
Color encoding key:
• Blue: Information
• Red: Negative note
• Green: Positive note

2. Basic Information
• Name: MPICL/ParaGraph
• Developers:
  • ParaGraph: University of Illinois, University of Tennessee
  • MPICL: ORNL
• Current versions:
  • ParaGraph: no version number, but last available update 1999
  • MPICL: 2.0
• Websites:
  • http://www.csar.uiuc.edu/software/paragraph/
  • http://www.csm.ornl.gov/picl/
• Contacts:
  • ParaGraph: Michael Heath (heath@cs.uiuc.edu), Jennifer Finger
  • MPICL: Patrick Worley (worleyph@ornl.gov)
• Note: ParaGraph last updated 1999, MPICL last updated 2001 [both projects appear dead]

3. MPICL/ParaGraph Overview
• MPICL
  • Trace file creation library
  • Uses the MPI profiling interface
  • Only records MPI commands
  • Support for "custom" events via manual instrumentation
  • Writes traces in the documented ASCII PICL format
• ParaGraph
  • PICL trace visualization tool
  • Very old tool (first written during 1989-1991)
  • Offers a lot of visualizations
• Analogy: MPICL is to ParaGraph as MPE is to Jumpshot

4. MPICL Overview
• Installation is a nightmare
  • Requires knowledge of the Fortran-to-C (F2C) symbol naming convention (!)
  • Had to edit and remove some code to work with a new version of MPICH
  • Hardcoded values for certain field sizes had to be updated
  • One statement in the Fortran environment setup caused instrumented programs to core-dump on startup
• Automatic instrumentation of MPI programs offered via the profiling interface
• Once installed, very easy to use
  • Have to add 3 lines of code to enable creation of trace files
    • Calls to tracefiles(), tracelevel(), and tracenode() (see ParaGraph documentation)
    • Minor annoyance; could be done automatically
• Manual instrumentation routines also available
  • Calls to tracedata() and traceevent() (see ParaGraph documentation)
  • Notion of program "phases" allows a crude form of source code correlation
• Also has extra code to ensure accurate clock synchronization
  • Extra work is done to ensure consistent ordering of events
  • Helps prevent "tachyons" (messages shown as received before they are sent)
  • Delays startup by several seconds (but is not mandatory)
• After the trace file is collected, it has to be sorted using tracesort
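The sort step matters because each node writes events against its own clock; without correction a merged trace can contain "tachyons". A minimal sketch of the idea, using hypothetical (time, node, kind, msg_id) event tuples rather than the real PICL record layout:

```python
# Sketch: merge per-node events by timestamp (as tracesort does) and flag
# "tachyons" -- receives that end up ordered before their matching send,
# which indicates clock skew between nodes. Event tuples are hypothetical.

def sort_and_check(events):
    """events: iterable of (time, node, kind, msg_id) tuples."""
    ordered = sorted(events, key=lambda e: e[0])
    sent = set()
    tachyons = []
    for t, node, kind, msg in ordered:
        if kind == "send":
            sent.add(msg)
        elif kind == "recv" and msg not in sent:
            tachyons.append(msg)  # receive appears before its send
    return ordered, tachyons
```

In this toy example, message "b" is stamped as received at t=2.0 but sent at t=2.1, so it is flagged; a real tool would then shift the offending node's clock rather than just report the problem.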

5. MPICL Overhead
• Instrumentation performed using the MPI profiling interface
• Used a 5 MB buffer for trace files
• On average, instrumentation is relatively intrusive, but within 20%
  • Does not include overhead for synchronizing clocks
• Note: benchmarks marked with * have high variability in runtimes

6. ParaGraph Overview
• Uses its own widget set
  • Probably necessary when it was first written in 1989
  • Widgets look extremely crude by today's standards (button = square with text in the middle)
  • Uses its own conventions; takes a bit of getting used to
  • Once you adjust to the interface it becomes less of an issue, but at times the conventions become cumbersome
    • Example: closing any child window shuts down the entire application
• ParaGraph philosophy
  • Provide as many different types of visualizations as possible
    • 4 categories: utilization, communication, tasks, other
  • Use a tape-player abstraction for viewing trace data
    • Similar to Paraver; cumbersome when trying to maneuver to specific times
  • All visualizations use a form of animation
    • Trace data is drawn as fast as possible, which creates problems on modern machines
    • A "slow motion" option is available, but doesn't work that well
• Supports application-specific visualizations
  • Have to write custom code and link against it during ParaGraph compilation

7. ParaGraph Visualizations
• Utilization visualizations
  • Display a rough estimate of processor utilization
  • Utilization broken down into 3 states:
    • Idle – program is blocked waiting for a communication operation (or has stopped execution)
    • Overhead – program is performing communication but is not blocked (time spent within the MPI library)
    • Busy – program is executing anything other than communication
  • "Busy" doesn't necessarily mean useful work is being done, since the tool assumes (not communicating) := busy
• Communication visualizations
  • Display different aspects of communication: frequency, volume, overall pattern, etc.
  • "Distance" computed from the topology set in the options menu
• Task visualizations
  • Display information about when processors start & stop tasks
  • Require manually instrumented code to identify when processors start/stop tasks
• Other visualizations
  • Miscellaneous things
• Can load/save a visualization window set (does not work)
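The three-state model can be reconstructed from a single processor's timeline: time inside a communication call that is blocked is idle, the rest of the communication time is overhead, and everything else defaults to busy. An illustrative sketch (interval data and function name are hypothetical, and blocked intervals are assumed to lie inside communication intervals):

```python
# Sketch: split one processor's run into busy / overhead / idle time.
# comm and blocked are lists of (start, end) intervals; blocked intervals
# are assumed to be contained in comm intervals. Anything outside a
# communication call counts as "busy", which is why "busy" need not mean
# useful work is being done.

def utilization(total, comm, blocked):
    idle = sum(e - s for s, e in blocked)
    in_comm = sum(e - s for s, e in comm)
    overhead = in_comm - idle      # in the MPI library but not blocked
    busy = total - in_comm         # everything else defaults to busy
    return {"busy": busy, "overhead": overhead, "idle": idle}
```

For a 10-second run with one communication call from t=2 to t=5 that blocks from t=3 to t=4, this yields 7 s busy, 2 s overhead, 1 s idle.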

8. Utilization Visualizations – Utilization Count
• Displays the # of processors in each state at a given moment in time
• Busy shown on the bottom, overhead in the middle, idle on top

9. Utilization Visualizations – Gantt Chart
• Displays the utilization state of each processor as a function of time

10. Utilization Visualizations – Kiviat Diagram
• Shows our friend, the Kiviat diagram
• Each spoke is a single processor
• Dark green shows the moving average; light green shows the current high watermark
  • Timing parameters for each can be adjusted
• Metric shown can be "busy" or "busy + overhead"
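The two quantities the diagram tracks per spoke are easy to state precisely: a windowed moving average of the utilization metric, plus the highest average seen so far. A sketch under assumed sample data and window size (both hypothetical):

```python
# Sketch: per-processor moving average and high watermark of a utilization
# fraction, as a Kiviat-style display would track them. The window size
# and input samples are illustrative, not ParaGraph's actual parameters.
from collections import deque

def kiviat_series(samples, window=4):
    buf = deque(maxlen=window)   # sliding window of recent samples
    hwm = 0.0                    # high watermark of the moving average
    out = []
    for s in samples:
        buf.append(s)
        avg = sum(buf) / len(buf)
        hwm = max(hwm, avg)
        out.append((avg, hwm))
    return out
```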

11. Utilization Visualizations – Streak
• Shows a "streak" of state, similar to the winning/losing streaks of baseball teams
  • Win = overhead or busy
  • Loss = idle
• Not sure how useful this is

12. Utilization Visualizations – Utilization Summary
• Shows the percentage of time spent in each utilization state up to the current time

13. Utilization Visualizations – Utilization Meter
• Shows the percentage of processors in each utilization state at the current time

14. Utilization Visualizations – Concurrency Profile
• Shows histograms of the # of processors in a particular utilization state
• Ex: the diagram shows
  • Only 1 processor was busy ~5% of the time
  • All 8 processors were busy ~90% of the time
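The underlying computation is just a histogram: for each possible count k of processors in a state, the fraction of time samples in which exactly k were in that state. A sketch with hypothetical sample data:

```python
# Sketch: concurrency profile. busy_counts holds, for each time sample,
# how many of the nprocs processors were busy (hypothetical input data).
# Returns, for each k in 0..nprocs, the fraction of samples with exactly
# k processors busy.

def concurrency_profile(busy_counts, nprocs):
    hist = [0] * (nprocs + 1)
    for k in busy_counts:
        hist[k] += 1
    n = len(busy_counts)
    return [c / n for c in hist]
```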

15. Communication Visualizations – Color Code
• The color code controls the colors used on most communication visualizations
• Color can indicate message size, message distance, or message tag
  • Distance computed from the topology set in the options menu

16. Communication Visualizations – Communication Traffic
• Shows overall traffic at a given time
  • Bandwidth used, or
  • Number of messages in flight
• Can show a single node or an aggregate of all nodes

17. Communication Visualizations – Spacetime Diagram
• Shows a standard space-time diagram for communication
  • Which messages were sent from node to node at which times

18. Communication Visualizations – Message Queues
• Shows data about message queue lengths
  • Incoming/outgoing
  • Number of bytes queued / number of messages queued
• Colors mean different things
  • Dark color shows the current moving average
  • Light color shows the high watermark

19. Communication Visualizations – Communication Matrix
• Shows which processors sent data to which other processors

20. Communication Visualizations – Communication Meter
• Shows the percentage of communication used at the current time
  • Message count or bandwidth
  • 100% = max # of messages / max bandwidth used by the application at a specific time

21. Communication Visualizations – Animation
• Animates messages as they occur in the trace file
• Can overlay messages over a topology
• Available topologies:
  • Mesh
  • Ring
  • Hypercube
  • User-specified
    • Can lay out each node as you want
    • Can store to a file and load later on
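For the regular topologies above, a natural "message distance" is the hop count between node IDs. A sketch of plausible distance functions for the ring and hypercube cases (the node-numbering convention is an assumption; ParaGraph's exact metric may differ):

```python
# Sketch: hop-count distance between node ids for two of the supported
# topologies. Node numbering conventions are illustrative assumptions.

def ring_distance(i, j, n):
    """Shortest path on a ring of n nodes: go left or right."""
    d = abs(i - j)
    return min(d, n - d)

def hypercube_distance(i, j):
    """Hops in a hypercube = number of differing address bits."""
    return bin(i ^ j).count("1")
```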

22. Communication Visualizations – Node Data
• Shows detailed communication data
• Can display
  • Metrics
  • Which node
  • Message tag
  • Message distance
  • Message length
• For a single node, or an aggregate of all nodes

23. Task Visualizations – Task Count
• Shows the number of processors that are executing a task at the current time
• At the end of the run, changes to show a summary of all tasks

24. Task Visualizations – Task Gantt
• Shows a Gantt chart of which task each processor was working on at a given time

25. Task Visualizations – Task Speed
• Similar to the Gantt chart, but displays the "speed" of each task
• Work done by a task must be recorded in the instrumentation call (not done for the example shown above)

26. Task Visualizations – Task Status
• Shows which tasks have started and finished at the current time

27. Task Visualizations – Task Summary
• Shows the % of time spent on each task
• Also shows any overlap between tasks

28. Task Visualizations – Task Surface
• Shows the time spent on each task by each processor
• Useful for seeing load imbalance on a task-by-task basis

29. Task Visualizations – Task Work
• Displays the work done by each processor
• Shows the rate and volume of work being done
• The example doesn't show anything because no work amounts were recorded in the trace being visualized

30. Other Visualizations – Clock, Coordinates
• Clock: shows the current time
• Coordinate information: shows coordinates when you click on any visualization

31. Other Visualizations – Critical Path
• Highlights the critical path (longest serial path) in the space-time diagram in red
• Depends on point-to-point communication (collective operations can throw it off)
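A critical path is the longest chain of dependent events, where a receive depends on its matching send and each event depends on its predecessor on the same processor. A sketch of the computation over a hypothetical dependency graph (event names, durations, and the `deps` structure are illustrative, not ParaGraph's internal representation):

```python
# Sketch: longest (critical) path through a DAG of trace events.
# durations maps event id -> time cost; deps maps event id -> the events
# it must wait for. Both inputs are hypothetical example data.

def critical_path(durations, deps):
    finish = {}

    def t(e):
        # earliest finish time of e = its duration plus the latest
        # finish time among the events it depends on
        if e not in finish:
            finish[e] = durations[e] + max(
                (t(d) for d in deps.get(e, [])), default=0)
        return finish[e]

    for e in durations:
        t(e)
    end = max(durations, key=lambda e: finish[e])
    # walk back along the predecessor that determined each finish time
    path = [end]
    while deps.get(path[-1]):
        path.append(max(deps[path[-1]], key=lambda d: finish[d]))
    return list(reversed(path)), finish[end]
```

This also shows why collectives are awkward: a barrier makes every process depend on every other, which collapses the "longest serial chain" structure the red line is meant to expose.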

32. Other Visualizations – Phase Portrait
• Shows the relationship between processor utilization and communication usage

33. Other Visualizations – Statistics
• Gives overall statistics for the run
• Data:
  • % busy, overhead, and idle time
  • Total count and bandwidth of messages
  • Max, min, and average of:
    • Message size
    • Distance
    • Transit time
• Shows a max of 16 processors at a time

34. Other Visualizations – Processor Status
• Shows
  • Processor status
  • Which task each processor is executing
  • Communication (sends & receives)
• Each processor is a square in the grid (8-processor example shown)

35. Other Visualizations – Trace Events
• Shows text output of all trace file events

36. Bottleneck Identification Test Suite
• Testing metric: what did the visualizations tell us (no manual instrumentation)?
• Program correctness was not affected by instrumentation
• CAMEL: PASSED
  • Space-time diagram & bandwidth utilization visualizations showed a large number of small messages at the beginning
  • Utilization graphs showed low overhead and few idle states
• LU: PASSED
  • Space-time diagram showed a large number of small messages
  • Kiviat diagram showed a low moving average of processor utilization
  • Phase portrait showed a strong correlation between communication and low processor utilization
• Big messages: PASSED
  • Utilization Gantt and space-time diagrams showed a large amount of overhead at the time of each send
• Diffuse procedure: PASSED
  • Utilization Gantt showed one processor busy & the rest idle
  • Need manual instrumentation to determine that one routine takes too long

37. Bottleneck Identification Test Suite (2)
• Hot procedure: FAILED
  • Purely sequential code, so ParaGraph could not distinguish between idle and busy states
• Intensive server: PASSED
  • Utilization Gantt chart showed all processors except the first idle
  • Space-time chart showed processor 0 being inundated with messages
• Ping-pong: PASSED
  • Space-time chart showed a large # of small messages dependent on each other
• Random barrier: TOSS-UP
  • Utilization count showed one processor busy throughout execution
  • Utilization Gantt chart showed the busy processor randomly dispersed
  • However, the "waiting for barrier" state is shown as idle, so it is difficult to track the problem down to the barrier without extra manual instrumentation

38. Bottleneck Identification Test Suite (3)
• Small messages: PASSED
  • Utilization Gantt chart showed lots of time spent in MPI code (overhead)
  • Space-time diagram showed large numbers of small messages
• System time: FAILED
  • All processes show as busy; no distinction between user and system time
  • With no communication, classification of processor states is not really done at all; everything just gets attributed to busy time
• Wrong order: PASSED
  • Space-time diagram showed messages being received in the reverse of the order they were sent
  • But you have to pay close attention to how the diagram is drawn

39. How to Best Use ParaGraph/MPICL
• Don't use MPICL
  • Better trace file formats and libraries are available now
  • We probably should look over the clock synchronization code, but this probably isn't useful if high-resolution timers are available
    • Especially for shared-memory machines
• Don't use ParaGraph's code directly
  • But it has a lot of neat visualizations we could copy
  • At most we should scan the code to see how a visualization is calculated
• In summary: just take the best ideas & visualizations

40. Evaluation (1)
• Available metrics: 2/5
  • Only records communication and task entrance/exit
  • Approximates processor state by equating "not communicating" with busy
• Cost: 5/5
  • Free!
• Documentation quality: 2/5
  • ParaGraph has an excellent manual
  • Very hard to find information on MPICL
  • MPICL installation instructions are woefully inadequate
• Extensibility: 2/5
  • Can add custom visualizations, but must write code and recompile ParaGraph
  • Open source, but uses an old X Window System API & its own widget set
  • Dead project (no updates since 1999)
• Filtering and aggregation: 1/5
  • Not really performed
  • A few visualizations can be restricted to a certain processor
  • Can output summary statistics (other visualizations -> statistics)

41. Evaluation (2)
• Hardware support: 5/5
  • Cray X1, AlphaServer (Tru64), IBM SP (AIX), SGI Altix, 64-bit Linux clusters (Opteron & Itanium)
  • Support for a large number of vendor-specific MPI libraries
  • Would probably need a lot of effort to port to more modern architectures, though
• Heterogeneity support: 0/5 (not supported)
• Installation: 1.5/5
  • ParaGraph is relatively easy to compile and install
  • MPICL installation is extremely difficult, especially with modern versions of MPICH/LAM
• Interoperability: 0/5
  • Does not interoperate with other tools
• Learning curve: 2.5/5
  • MPICL library is easy to use
  • ParaGraph interface is unintuitive and can get in the way

42. Evaluation (3)
• Manual overhead: 1/5
  • Can record all MPI calls just by linking, but this requires the addition of trace control instructions in the source code
  • Task visualizations depend on manual instrumentation
• Measurement accuracy: 2/5
  • CAMEL: ~18% overhead
  • Instrumentation adds a bit of runtime overhead, especially when many messages are sent
• Multiple executions: 0/5 (not supported)
• Multiple analyses & views: 5/5
  • Many, many ways of looking at trace data
• Performance bottleneck identification: 4/5
  • Bottleneck identification must be performed manually
  • Many visualizations help with bottleneck detection, but no guidance is provided on which one to examine first

43. Evaluation (4)
• Profiling/tracing support: 3/5
  • Only tracing is supported
  • Profiling data can be shown in ParaGraph after processing the trace file
• Response time: 2/5
  • Nothing is reported until after the program runs
  • The (computationally expensive) trace sort must also be performed before you can view the trace file
  • Large trace files take a while to load (ParaGraph must pass over the entire trace before displaying anything)
• Searching: 0/5 (not supported)
• Software support: 3/5
  • Can link against any library, but calls that don't go through the MPI profiling interface will not be instrumented
  • Only MPI and some (very old, obsolete) vendor-specific message-passing libraries are supported

44. Evaluation (5)
• Source code correlation: 0/5
  • Not supported
  • Can be done indirectly via manual instrumentation of tasks, but it is still hard to figure out exactly where things occur in the source code
• System stability: 3.5/5
  • MPICL relatively stable after bugs were fixed during compilation
  • ParaGraph stable as long as you don't try to do weird things (e.g., load the wrong file)
  • Not very robust with error handling
  • ParaGraph's load/save window set doesn't work
• Technical support: 0/5
  • Dead project
  • Project email addresses still seem valid, but not sure how much help we could get from the developers now
