Scheduling Considerations for building Dynamic Verification Tools for MPI Sarvani Vakkalanka, Michael DeLisi, Ganesh Gopalakrishnan, Robert M. Kirby School of Computing, University of Utah, Salt Lake City Supported by Microsoft HPC Institutes, NSF CNS-0509379 http://www.cs.utah.edu/formal_verification
Background The scientific community is increasingly employing expensive supercomputers driven by distributed programming libraries… (BlueGene/L - Image courtesy of IBM / LLNL) (Image courtesy of Steve Parker, CSAFE, Utah) …to program large-scale simulations in all walks of science, engineering, math, economics, etc.
Current Programming Realities • Code written using mature libraries (MPI, OpenMP, PThreads, …) • API calls made from real programming languages (C, Fortran, C++) • Runtime semantics determined by realistic compilers and runtimes • How best to verify codes that will run on actual platforms?
Classical Model Checking Extract a finite-state model of the concurrent program, then check properties on the model. Extraction of finite-state models for realistic programs is difficult.
Dynamic Verification (Actual Concurrent Program → Check Properties) • Avoid model extraction, which can be tedious and imprecise • The program serves as its own model • Reduce complexity through reduction of interleavings (and other methods)
Dynamic Verification (Actual Concurrent Program + One Specific Test Harness → Check Properties) • Need a test harness in order to run the code • Will explore ONLY RELEVANT INTERLEAVINGS (all Mazurkiewicz traces) for the given test harness • Conventional testing tools cannot do this !! • E.g. 5 threads, 5 instructions each: ~10^10 interleavings !!
Dynamic Verification (Actual Concurrent Program + One Specific Test Harness → Check Properties) • Need to consider all test harnesses • FOR MANY PROGRAMS, this number seems small (e.g. Hypergraph Partitioner)
Related Work • Dynamic verification tools: • CHESS • Verisoft (POPL '97) • DPOR (POPL '05) • JPF • ISP is similar to CHESS and DPOR
Dynamic Partial Order Reduction (DPOR) [Figure: three processes P0, P1, P2, each executing lock(x) … unlock(x) (lock events L0/L1/L2, unlock events U0/U1/U2); DPOR explores only the distinct orders in which the lock can be acquired]
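As a concrete instance of the program sketched above, here is a minimal C/Pthreads version (the shared counter and the exact thread count are illustrative, not from the slides): three threads each acquire and release the same mutex, so DPOR needs to explore only the 3! = 6 distinct lock-acquisition orders rather than every instruction interleaving.

/* Minimal sketch of the DPOR example: three threads, each doing
 * lock(x); ...critical section...; unlock(x).  Only the order in
 * which the threads acquire the lock is relevant, so DPOR explores
 * the 6 acquisition orders instead of all instruction interleavings. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;            /* illustrative shared state */

static void *worker(void *arg)
{
    long id = (long)arg;
    pthread_mutex_lock(&x);        /* L0 / L1 / L2 on the slide */
    counter++;                     /* the "....." critical section */
    printf("thread %ld in critical section\n", id);
    pthread_mutex_unlock(&x);      /* U0 / U1 / U2 on the slide */
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    printf("counter = %d\n", counter);
    return 0;
}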
ISP [Architecture: the executable's processes Proc1, Proc2, …, Procn run over a Profiler that communicates with the Scheduler; the Scheduler drives the MPI Runtime] • Run the MPI program, manifesting only/all relevant interleavings (DPOR) • Manifest ALL relevant interleavings of the MPI Progress Engine: done by DYNAMIC REWRITING of WILDCARD Receives.
Using PMPI [Figure: P0's call stack. The user function calls MPI_Send; the profiler's MPI_Send wrapper reports the call's envelope (P0: MPI_Send) to the Scheduler over a TCP socket, then issues PMPI_Send, which executes inside the MPI Runtime]
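The PMPI profiling interface is what makes this interposition possible: a wrapper library defines MPI_Send itself and forwards to PMPI_Send, so neither the application nor the MPI library needs to change. Below is a minimal sketch of such a wrapper, compiled into a library linked ahead of the MPI library; notify_scheduler_and_wait is a hypothetical placeholder for ISP's scheduler handshake over TCP, not its actual API.

/* Sketch of a PMPI interposition layer in the spirit of the slide:
 * MPI_Send is intercepted, the call's envelope is reported to the
 * scheduler, and only then is the real operation issued via PMPI_Send. */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for ISP's scheduler handshake (the real tool
 * talks to the scheduler over a TCP socket and blocks until "go"). */
static void notify_scheduler_and_wait(const char *op, int dest, int tag)
{
    fprintf(stderr, "[profiler] %s envelope: dest=%d tag=%d\n", op, dest, tag);
    /* ... block here until the scheduler replies ... */
}

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    notify_scheduler_and_wait("MPI_Send", dest, tag);
    return PMPI_Send(buf, count, datatype, dest, tag, comm);  /* into the MPI runtime */
}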
DPOR and MPI • Implemented an implicit deadlock detection technique from a single program trace. • Issues with the MPI progress engine for wildcard receives could not be resolved. • More details can be found in our CAV 2008 paper: "Dynamic Verification of MPI Programs with Reductions in Presence of Split Operations and Relaxed Orderings"
POE [Step 1: the Scheduler collects operations from each process, sending sendNext to let them advance: P0: Isend(1, req); Barrier; Wait(req) — P1: Irecv(*, req); Barrier; Recv(2); Wait(req) — P2: Barrier; Isend(1, req); Wait(req). Nothing is issued into the MPI Runtime yet.]
POE [Step 2: the Scheduler now holds P0's Isend(1), P1's wildcard Irecv(*), and Barrier calls from all three processes; the Barriers form a complete match set, while the wildcard receive is held back until its potential senders are known.]
POE [Step 3: the matched Barriers are issued into the MPI Runtime, and collection continues past the barrier.]
POE [Step 4: beyond the barrier the Scheduler sees P1's Recv(2), P2's Isend(1), and the Wait calls. The wildcard Irecv(*) is dynamically rewritten to each potential sender in turn; in the interleaving where it matches P2's Isend(1), P1's Recv(2) and P0's Isend(1)/Wait have no match set → Deadlock! A compilable version of this example follows below.]
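For concreteness, a C/MPI rendering of the example walked through on these slides (tags and payloads are illustrative choices; run with exactly 3 ranks). Under ordinary testing the run usually completes, because the runtime tends to match the wildcard with P0's message; ISP/POE also forces the match with P2 and exposes the deadlock.

/* C/MPI rendering of the POE example (run with 3 ranks).
 * P0: Isend(to 1); Barrier; Wait
 * P1: Irecv(ANY_SOURCE); Barrier; Recv(from 2); Wait
 * P2: Barrier; Isend(to 1); Wait
 * If Irecv(*) is matched with P2's Isend, Recv(from 2) has no
 * matching send and P0's message is never received: deadlock. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, buf = 0, recv1 = 0, recv2 = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Irecv(&recv1, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Recv(&recv2, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}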
MPI_Waitany + POE [Example: P0: Isend(1, req[0]); Isend(2, req[1]); Waitany(2, req); Barrier — P1: Recv(0); Barrier — P2: Recv(0); Barrier. The Scheduler collects the calls and sends sendNext as before.]
MPI_Waitany + POE [Waitany(2, req) completes exactly one request: req[0] stays a valid request while req[1] becomes the invalid MPI_REQUEST_NULL. The Scheduler must track which slot each interleaving invalidates; treating the now-null req[1] as still live is an error.]
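The MPI semantics POE has to model here: MPI_Waitany completes exactly one of the pending requests and sets that slot to MPI_REQUEST_NULL, while the other slot stays live and must still be completed. A small self-contained illustration of just that behavior (message contents and tags are arbitrary; this is not ISP's test case itself):

/* Illustration of MPI_Waitany semantics (run with exactly 3 ranks).
 * Rank 0 posts two Isends; MPI_Waitany completes exactly one of them
 * and sets that slot to MPI_REQUEST_NULL; the other request is still
 * live and must be completed separately. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf[2] = {11, 22}, recvbuf, idx;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request req[2];
        MPI_Isend(&buf[0], 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(&buf[1], 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitany(2, req, &idx, MPI_STATUS_IGNORE);
        /* req[idx] is now MPI_REQUEST_NULL; the other slot is not. */
        printf("Waitany completed req[%d]; req[%d] still pending\n", idx, 1 - idx);
        MPI_Wait(&req[1 - idx], MPI_STATUS_IGNORE);   /* complete the other one */
    } else {
        MPI_Recv(&recvbuf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}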
MPI Progress Engine Issues [Example: P0: Irecv(1, req); Barrier; Wait(req) — P1: Barrier; Isend(0, req); Wait. If the Scheduler lets P0's Wait go into the runtime as a plain PMPI_Wait before P1's Isend has been issued, the call does not return and the Scheduler hangs; the profiler therefore works with PMPI_Irecv + PMPI_Wait so that control returns to the Scheduler.]
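One way to see the resolution hinted at on this slide: a blocking call can be decomposed into its nonblocking form plus a wait, so the profiler regains control between posting the operation and committing to block. The sketch below illustrates the idea for MPI_Recv; yield_to_scheduler is a hypothetical placeholder, and this is a conceptual sketch rather than ISP's actual wrapper code.

/* Conceptual sketch (not ISP's implementation): a blocking receive
 * decomposed into PMPI_Irecv + PMPI_Wait.  The decomposition gives the
 * profiler a point of control before the process is stuck inside a
 * PMPI call that cannot return until a matching send is issued. */
#include <mpi.h>

/* Placeholder for the handshake with the ISP scheduler. */
static void yield_to_scheduler(const char *op) { (void)op; }

int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
             int tag, MPI_Comm comm, MPI_Status *status)
{
    MPI_Request req;
    int rc = PMPI_Irecv(buf, count, datatype, source, tag, comm, &req);
    if (rc != MPI_SUCCESS) return rc;
    yield_to_scheduler("MPI_Recv posted");   /* scheduler can act here */
    return PMPI_Wait(&req, status);
}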
Experiments • ISP was run on 69 examples of the Umpire test suite. • Detected deadlocks in these examples that tools like Marmot cannot detect. • Produced a far smaller number of interleavings than runs without reduction. • ISP run on Game of Life ~ 500 lines of code. • ISP run on ParMETIS ~ 14K lines of code • Widely used for parallel partitioning of large hypergraphs • ISP run on MADRE • (Memory Aware Data Redistribution Engine by Siegel and Siegel, EuroPVM/MPI 08) • Found a previously KNOWN deadlock, but AUTOMATICALLY and within one second ! • Results available at: http://www.cs.utah.edu/formal_verification/ISP_Tests
Concluding Remarks • Tool available (download and try it) • Future work • Distributed ISP scheduler • Handle MPI + threads • Do a large-scale bug hunt now that ISP can execute large-scale codes.
Implicit Deadlock Detection [Figure: Scheduler trace — P0: Irecv(*, req); Recv(2); Wait(req) — P1: Isend(0, req); Wait(req) — P2: Isend(0, req); Wait(req). In the interleaving where the wildcard Irecv(*) is matched with P2's Isend, P0's Recv(2) has no matching send → Deadlock! The Scheduler detects this from the collected trace without issuing the calls into the MPI Runtime.]
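A toy sketch of the scheduler-side check this slide illustrates, under our assumption (not stated on the slides) that the scheduler keeps a list of outstanding operations: if some process is blocked on an operation for which no send/receive match can be formed, a deadlock is reported without ever issuing the calls into the MPI runtime. The data structures and match rule below are illustrative, not ISP's.

/* Toy illustration of a match-set check for implicit deadlock
 * detection: given the operations still outstanding at the end of a
 * trace, report a deadlock if a blocked receive has no possible
 * matching send. */
#include <stdio.h>

enum kind { SEND, RECV };
struct op { enum kind k; int owner; int peer; /* peer == -1 means wildcard */ };

/* Returns 1 if some outstanding send/receive pair can be matched. */
static int match_exists(const struct op *ops, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (ops[i].k == RECV && ops[j].k == SEND &&
                ops[j].peer == ops[i].owner &&
                (ops[i].peer == -1 || ops[i].peer == ops[j].owner))
                return 1;
    return 0;
}

int main(void)
{
    /* Outstanding ops after Irecv(*) was matched with P2's Isend:
     * P0 is blocked in Recv(from 2) and no send to P0 remains. */
    struct op pending[] = { { RECV, 0, 2 } };
    if (!match_exists(pending, 1))
        printf("Deadlock: blocked process with no possible match set\n");
    return 0;
}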