Synchronizing the timestamps of concurrent events in traces of hybrid MPI/OpenMP applications
Cluster systems
• Cluster systems represent the majority of today’s supercomputers
• Availability of inexpensive commodity components
• Vast diversity
  • Architecture
  • Interconnect technology
  • Software environment
• Message-passing and shared-memory programming models for communication and synchronization
Event tracing
• Application areas
  • Performance analysis
  • Time-line visualization
  • Wait-state analysis
  • Performance modeling
  • Performance prediction
  • Debugging
• Events are recorded at runtime to enable post-mortem analysis of dynamic program behavior
• An event includes at least a timestamp, a location, and an event type
[Figure: Send, Recv, and Barrier operations are recorded as event sequences (E, S, R, X, MX), written to local traces, and optionally merged]
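The minimal event record described above (timestamp, location, event type) can be sketched as follows. This is an illustration only: the class names, the mapping of the letters E, X, S, R to enter, exit, send, and receive, and the `merge_traces` helper are assumptions, not the trace format used by the authors.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # Letter codes follow the figure; their meaning is assumed here.
    ENTER = "E"
    EXIT = "X"
    SEND = "S"
    RECV = "R"


@dataclass(frozen=True)
class Event:
    timestamp: float   # reading of the local clock, in seconds
    location: int      # process or thread that produced the event
    type: EventType


def merge_traces(local_traces):
    """Merge per-location traces into one list sorted by timestamp.

    Hypothetical helper: ties are broken by location so the result is
    deterministic. Sorting by raw timestamps is only meaningful once the
    local clocks have been synchronized.
    """
    merged = [ev for trace in local_traces for ev in trace]
    return sorted(merged, key=lambda ev: (ev.timestamp, ev.location))
```

If the local clocks disagree, a merged trace of this kind can show a receive before its matching send, which is the problem the following slides address.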
Clock synchronization
• Query time from reference clocks synchronized at regular intervals (Mills)
• Measure offset values and determine an interpolation function (Dunigan, Maillet, Tron, Doleschal)
• Determine a medial smoothing function based on send/receive differences (Duda, Hofmann, Hilgers)
• Restore and preserve logical correctness (Lamport, Mattern, Fidge, Rabenseifner)
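As an illustration of the offset-measurement approach, the sketch below maps local timestamps to a reference clock by linear interpolation between two offset measurements, one taken near the start and one near the end of the run. The function name and parameters are hypothetical, and a constant clock drift between the two measurements is assumed.

```python
def make_offset_interpolation(t_start, offset_start, t_end, offset_end):
    """Return a function that maps a local timestamp to reference time.

    offset_start and offset_end are the measured differences
    (reference clock minus local clock) at local times t_start and t_end.
    Assumption: the drift between the two measurements is constant.
    """
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    drift = (offset_end - offset_start) / (t_end - t_start)

    def to_reference_time(local_time):
        # Offset at local_time, interpolated linearly between measurements.
        offset = offset_start + drift * (local_time - t_start)
        return local_time + offset

    return to_reference_time
```

Interpolation of this kind removes most of the offset and drift, but residual errors can still leave a receive timestamped before its send, which is why logical correction is applied afterwards.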
Controlled logical clock (CLC)
[Figure: timeline of a send (E S X) and the matching receive (E R X), separated by the minimum latency µmin]
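The slide gives no formula, so the following is a simplified sketch of a forward correction in the spirit of the controlled logical clock: a receive is moved to at least the corrected send time plus µmin, and later events on the same location are pushed forward so that local intervals are kept. The function name, the `recv_bounds` argument, and the factor `gamma` are assumptions made for this example.

```python
def forward_correct(timestamps, recv_bounds, mu_min, gamma=1.0):
    """Forward-correct the timestamps of one location.

    timestamps  -- original local timestamps in ascending order
    recv_bounds -- maps the index of a receive event to the (already
                   corrected) timestamp of its matching send
    mu_min      -- minimum message latency
    gamma       -- fraction of each original local interval to preserve
                   (hypothetical control parameter, 0 <= gamma <= 1)
    """
    corrected = []
    for i, t in enumerate(timestamps):
        candidate = t  # never move an event backward
        if corrected:
            # Keep at least gamma times the original distance
            # to the previous event on this location.
            interval = t - timestamps[i - 1]
            candidate = max(candidate, corrected[-1] + gamma * interval)
        if i in recv_bounds:
            # Clock condition: receive >= send + minimum latency.
            candidate = max(candidate, recv_bounds[i] + mu_min)
        corrected.append(candidate)
    return corrected
```

For example, `forward_correct([1.0, 2.0, 3.0], {1: 2.5}, mu_min=0.5)` moves the receive at index 1 from 2.0 to 3.0 and the following event from 3.0 to 4.0.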
MPI semantics
[Figure: event patterns (E, MX) of MPI collective operations]
Limitations of the CLC algorithm
• Neither restores nor preserves the clock condition in OpenMP event semantics
• May introduce violations in locations that were previously intact
[Figure: timeline with a send (S), receives (R), and omp_barrier operations]
• Consider OpenMP constructs as composed of multiple logical messages
• Define logical send/receive pairs for each flavor
• Collective communication
[Figure: omp_barrier operations with their events (E, OX)]
OpenMP semantics
• Tasking
[Figure: event patterns (F, J, E, OX, L, U) of OpenMP constructs]
Happened-before relation
• An operation may have multiple logical receive and send events
• Multiple receives are used to synchronize multiple clocks
• The latest send event is the relevant send event
[Figure: operations with events (E, OX, MX) on several locations]
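The rule that the latest send event is the relevant one can be sketched for a single barrier instance. The mapping used here, where each thread's barrier enter acts as a logical send and each barrier exit as a logical receive, is an assumption made for illustration; the function names are hypothetical as well.

```python
def barrier_violations(enters, exits, mu_min=0.0):
    """Return the exits that violate the clock condition for one barrier.

    enters, exits -- map thread id to the enter/exit timestamp
    Assumption: every enter is a logical send to all threads, and every
    exit is a logical receive, so the latest enter is the relevant send.
    """
    latest_enter = max(enters.values())
    return {thread: ts for thread, ts in exits.items()
            if ts < latest_enter + mu_min}


def correct_barrier_exits(enters, exits, mu_min=0.0):
    """Move each violating exit to the latest enter plus mu_min."""
    latest_enter = max(enters.values())
    return {thread: max(ts, latest_enter + mu_min)
            for thread, ts in exits.items()}
```

With `enters = {0: 1.0, 1: 2.0}` and `exits = {0: 1.5, 1: 2.5}`, thread 0 appears to leave the barrier before thread 1 has entered it, so its exit is moved from 1.5 to 2.0.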
Parallelization
• Correct local traces in parallel
• Keep whole trace in memory
• Exploit distributed memory & processing capabilities
• Replay communication
  • Traverse trace in parallel
  • Exchange data at synchronization points
  • Use operation of same type (MPI functions, OpenMP constructs)
Forward replay
[Figure: forward replay through successive omp_barrier operations, labeled 1 to 3]
Backward amortization
• Avoid new violations
• Do not advance a send farther than its matching receive
[Figure: send (S) and receive (R) pairs]
Backward replay
• Data on sender side needed
• Communication direction
  • Communication proceeds in backward direction
  • Roles of sender and receiver are inverted
• Traversal direction
  • Start at end of trace
  • Avoid deadlocks
[Figure: send (S) and receive (R) pairs with roles inverted during backward replay]
Amortization interval
• Piece-wise correction
• $\min\bigl(LC_k'(\text{corr. receive event}) - \mu - LC_i^b\bigr)$
[Figure: amortization interval ∆t preceding receive events (R), with differences to $LC_i^b$ and send events (S)]
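A simplified sketch of backward amortization under stated assumptions: the jump introduced at a corrected receive is spread linearly over the preceding amortization interval, and the shift of each send is capped so that it is not advanced past its matching receive. The cap is read here as the corrected receive time minus µ minus the send's current timestamp, which is one interpretation of the expression above. The function and parameter names are hypothetical, and the authors' piece-wise interpolation may differ in detail.

```python
def backward_amortize(times, interval_start, recv_time, jump, send_caps):
    """Shift the events that precede a corrected receive.

    times          -- timestamps inside the amortization interval,
                      ascending, all in [interval_start, recv_time)
    interval_start -- beginning of the amortization interval
    recv_time      -- timestamp of the receive before its correction
    jump           -- amount by which the receive was advanced
    send_caps      -- maps the index of a send event to its maximum
                      allowed shift (assumed: corrected matching receive
                      minus mu minus the send timestamp)
    """
    span = recv_time - interval_start
    if span <= 0:
        raise ValueError("recv_time must be later than interval_start")

    # Linear ramp: no shift at the interval start, full jump at the receive.
    shifts = [jump * (t - interval_start) / span for t in times]

    # Walk backward so that a cap on a send also limits all earlier
    # events. Shifts then never decrease over time, which keeps the
    # event order and never shrinks a local interval.
    limit = float("inf")
    for i in range(len(times) - 1, -1, -1):
        if i in send_caps:
            limit = min(limit, max(send_caps[i], 0.0))
        shifts[i] = min(shifts[i], limit)

    return [t + s for t, s in zip(times, shifts)]
```

Propagating the cap backward is a conservative design choice: it gives up some smoothing in exchange for a simple guarantee that no new violations are introduced.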
Experimental evaluation
Evaluation focused on the frequency of clock condition violations and on the accuracy and scalability of the correction
• Nicole cluster (JSC@FZJ)
  • 32 compute nodes
  • 2 quad-core Opteron processors running at 2.4 GHz
  • InfiniBand
• Applications
  • PEPC (4 threads per process)
  • Jacobi solver (2 threads per process)
• A significant percentage of messages (up to 5%) violated the clock condition
• After correction, all traces were free of clock condition violations
Accuracy of the algorithm
• Event position
  • Absolute deviations correspond to the value of the clock condition violations
  • Relative deviations are negligible
• Event distance
  • Larger relative deviations are possible
  • Impact on analysis results is negligible
• The correction changed the length of local intervals only marginally
Synchronizing hybrid codes
• Only MPI semantics were violated in the original trace
• Roughly half of the corrections correspond to OpenMP semantics
• The algorithm preserved OpenMP semantics
[Figure: timeline with a send (S), receives (R), and omp_barrier operations]
Outlook
• Exploit knowledge of MPI-internal messaging inside collective operations using PERUSE
• Leverage periodic offset measurements at global synchronization points