
Scheduling Considerations for building Dynamic Verification Tools for MPI

Sarvani Vakkalanka, Michael DeLisi, Ganesh Gopalakrishnan, Robert M. Kirby. School of Computing, University of Utah, Salt Lake City. Supported by Microsoft HPC Institutes, NSF CNS-0509379.


Presentation Transcript


  1. Scheduling Considerations for building Dynamic Verification Tools for MPI Sarvani Vakkalanka, Michael DeLisi, Ganesh Gopalakrishnan, Robert M. Kirby School of Computing, University of Utah, Salt Lake City Supported by Microsoft HPC Institutes, NSF CNS-0509379 http://www.cs.utah.edu/formal_verification

  2. Background The scientific community is increasingly employing expensive supercomputers and distributed programming libraries to program large-scale simulations in all walks of science, engineering, math, economics, etc. (Images: BlueGene/L, courtesy of IBM/LLNL; simulation image courtesy of Steve Parker, CSAFE, Utah.)

  3. Current Programming Realities • Code is written using mature libraries (MPI, OpenMP, PThreads, …) • API calls are made from real programming languages (C, Fortran, C++) • Runtime semantics are determined by realistic compilers and runtimes • How best to verify codes that will run on actual platforms?

  4. Classical Model Checking (Diagram: a finite-state model of the concurrent program is extracted, and properties are checked against it.) Extraction of finite-state models for realistic programs is difficult.

  5. Dynamic Verification • Avoid model extraction, which can be tedious and imprecise • The program serves as its own model • Reduce complexity through reduction of interleavings (and other methods) (Diagram: Actual Concurrent Program → Check Properties)

  6. Dynamic Verification • Need a test harness in order to run the code • Will explore ONLY RELEVANT INTERLEAVINGS (all Mazurkiewicz traces) for the given test harness • Conventional testing tools cannot do this!! E.g. 5 threads, 5 instructions each → 10^10 interleavings!! (Diagram: Actual Concurrent Program + One Specific Test Harness → Check Properties)
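
An aside not on the original slide, to ground the interleaving count: n threads each executing k atomic steps admit (nk)! / (k!)^n interleavings, so for n = k = 5 the count is 25! / (5!)^5 ≈ 6.2 × 10^14, well above 10^10.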

  7. Dynamic Verification • Need to consider all test harnesses • For MANY PROGRAMS, this number seems small (e.g. a hypergraph partitioner) (Diagram: Actual Concurrent Program + One Specific Test Harness → Check Properties)

  8. Related Work • Dynamic verification tools: CHESS, Verisoft (POPL ’97), DPOR (POPL ’05), JPF • ISP is similar to CHESS and DPOR

  9. Dynamic Partial Order Reduction (DPOR) (Diagram: processes P0, P1, P2 each execute lock(x) … unlock(x) on the same lock x, labeled L0/U0, L1/U1, L2/U2; DPOR permutes only these dependent lock operations rather than every instruction.)
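
A minimal sketch (not from the slides) of the kind of program the DPOR diagram depicts: three threads contend for the same lock, so only the order in which the lock(x) operations are granted matters, and DPOR explores just those 3! = 6 orderings instead of all instruction-level interleavings.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared lock x, corresponding to lock(x)/unlock(x) in the DPOR diagram. */
    static pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        long id = (long)arg;
        pthread_mutex_lock(&x);    /* Li: the only step dependent on other threads */
        printf("thread %ld in critical section\n", id);
        pthread_mutex_unlock(&x);  /* Ui */
        return NULL;
    }

    int main(void) {
        pthread_t t[3];
        for (long i = 0; i < 3; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);
        return 0;
    }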

  10. ISP (Diagram: the executable's processes Proc1 … Procn are linked against a profiler; a scheduler drives the run of the MPI program over the MPI runtime.) • Manifest only/all relevant interleavings (DPOR) • Manifest ALL relevant interleavings of the MPI progress engine: done by DYNAMIC REWRITING of WILDCARD receives.
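
A hypothetical illustration (the helper name and details are assumptions, not ISP's actual code) of what dynamic rewriting of wildcard receives means: the interception layer replaces MPI_ANY_SOURCE with a concrete source chosen by the scheduler before the receive reaches the MPI runtime, and the scheduler replays the program once per feasible sender.

    #include <mpi.h>

    /* Hypothetical helper: ask the scheduler which matching sender this
     * wildcard receive should be bound to in the current interleaving. */
    int scheduler_pick_source(int tag, MPI_Comm comm);

    int MPI_Irecv(void *buf, int count, MPI_Datatype dt, int source,
                  int tag, MPI_Comm comm, MPI_Request *req)
    {
        if (source == MPI_ANY_SOURCE) {
            /* Dynamic rewriting: the runtime never sees the wildcard, so it
             * cannot resolve the match nondeterministically on its own. */
            source = scheduler_pick_source(tag, comm);
        }
        return PMPI_Irecv(buf, count, dt, source, tag, comm, req);
    }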

  11. Using PMPI (Diagram: on P0's call stack, a user function calls MPI_Send; ISP's MPI_Send wrapper sends the call's envelope to the scheduler over a TCP socket and then issues PMPI_Send, which executes inside the MPI runtime.)
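
A minimal sketch of the interception pattern slide 11 depicts, with hypothetical helper names standing in for ISP's profiler-to-scheduler protocol: the wrapper reports the call's envelope to the scheduler over a TCP socket, waits for permission, and only then calls the PMPI_ entry point.

    #include <mpi.h>

    /* Hypothetical helpers (not ISP's real API): report the envelope of an
     * intercepted call and block until the scheduler allows it to proceed. */
    void scheduler_send_envelope(const char *op, int dest, int tag, MPI_Comm comm);
    void scheduler_wait_for_go_ahead(void);

    int MPI_Send(const void *buf, int count, MPI_Datatype dt,
                 int dest, int tag, MPI_Comm comm)
    {
        scheduler_send_envelope("MPI_Send", dest, tag, comm);
        scheduler_wait_for_go_ahead();
        /* The real send: PMPI_Send bypasses the wrapper layer. */
        return PMPI_Send(buf, count, dt, dest, tag, comm);
    }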

  12. DPOR and MPI • Implemented an implicit deadlock detection technique from a single program trace. • Issues with the MPI progress engine for wildcard receives could not be resolved. • More details can be found in our CAV 2008 paper: “Dynamic Verification of MPI Programs with Reductions in Presence of Split Operations and Relaxed Orderings”.

  13. POE (Diagram: the example program that slides 13-16 step through. P0: Isend(to 1, req); Barrier; Wait(req). P1: Irecv(from *, req); Barrier; Recv(from 2); Wait(req). P2: Barrier; Isend(to 1, req); Wait(req). The scheduler intercepts P0's Isend(1) and Barrier, answering sendNext so P0 runs ahead past the nonblocking call; nothing is issued into the MPI runtime yet. A C rendering of this program is given after slide 16 below.)

  14. POE (Diagram, continued: P1 reaches its Irecv(*) and Barrier, which are collected in the same way; the scheduler now holds Barrier calls from P0 and P1.)

  15. POE (Diagram, continued: P2's Barrier arrives; the three matching Barrier calls form a match set and are issued into the MPI runtime together.)

  16. POE (Diagram, continued: after the Barrier, the remaining operations are P0: Wait(req); P1: Recv(2), Wait(req); P2: Isend(1, req), Wait(req). The wildcard Irecv(*) has two potential senders, P0 and P2. In the interleaving where it is rewritten to receive from P2, P1's Recv(2) is left with no matching send, no match set can be formed, and POE reports: Deadlock!)
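
For reference, a self-contained C rendering of the program that slides 13-16 schedule (tag 0 and the buffer names are assumptions; run with mpiexec -n 3):

    #include <mpi.h>

    /* P1 posts a wildcard Irecv that can match a send from either P0 or P2;
     * the later Recv(2) deadlocks in the interleaving where the wildcard
     * already consumed P2's only send. */
    int main(int argc, char **argv)
    {
        int rank, buf = 42, in1 = 0, in2 = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Irecv(&in1, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Recv(&in2, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        } else if (rank == 2) {
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }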

  17. MPI_Waitany + POE (Diagram: P0: Isend(to 1, req[0]); Isend(to 2, req[1]); Waitany(2, req); Barrier. P1: Recv(from 0); Barrier. P2: Recv(from 0); Barrier. The scheduler collects the calls, answering sendNext after each nonblocking Isend.)

  18. MPI_Waitany + POE (Diagram, continued: when the Waitany is forwarded into the MPI runtime, req[0] is the valid, completable request, while req[1] is marked invalid, i.e. MPI_REQUEST_NULL: "Error! req[1] invalid".)
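
For reference (standard MPI semantics rather than anything specific to ISP), the Waitany pattern from slides 17-18 as rank 0 might issue it: exactly one request completes, its index is returned, and that slot is set to MPI_REQUEST_NULL while the other request stays pending.

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch of P0's side of the slide 17 example; assumes at least 3 ranks. */
    void rank0_waitany_example(int payload)
    {
        MPI_Request req[2];
        int which = -1;

        MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(&payload, 1, MPI_INT, 2, 0, MPI_COMM_WORLD, &req[1]);

        /* Exactly one request completes; req[which] becomes MPI_REQUEST_NULL. */
        MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);
        printf("request %d completed first\n", which);

        /* The other request is still live and must also be completed. */
        MPI_Wait(&req[1 - which], MPI_STATUS_IGNORE);
    }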

  19. MPI Progress Engine Issues (Diagram: P0: Irecv(from 1, req); Barrier; Wait(req). P1: Barrier; Isend(to 0, req). If P0's Wait is forwarded into the runtime as PMPI_Wait before P1's matching Isend has been issued, PMPI_Wait does not return and the scheduler hangs. The slide labels the remedy "PMPI_Irecv + PMPI_Wait".)
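
One way such a hang can be avoided, sketched here only as an illustration (the helper is hypothetical, not ISP's actual mechanism): the MPI_Wait wrapper defers the real PMPI_Wait until the scheduler confirms that the operation completing this request has already been issued into the runtime.

    #include <mpi.h>

    /* Hypothetical helper: block on the scheduler connection until the
     * operation that will complete *req (e.g. the matching Isend for a
     * pending Irecv) has been handed to the MPI runtime. */
    void scheduler_wait_until_completable(MPI_Request *req);

    int MPI_Wait(MPI_Request *req, MPI_Status *status)
    {
        /* Calling PMPI_Wait too early would block this process inside the
         * MPI runtime and hang the centralized scheduler (slide 19). */
        scheduler_wait_until_completable(req);
        return PMPI_Wait(req, status);
    }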

  20. Experiments • ISP was run on 69 examples of the Umpire test suite. • It detected deadlocks in examples where tools like Marmot cannot. • It produced a far smaller number of interleavings than exploration without reduction. • ISP was run on Game of Life (~500 lines of code). • ISP was run on ParMETIS (~14k lines of code), widely used for parallel partitioning of large graphs and meshes. • ISP was run on MADRE (the memory-aware data redistribution engine by Siegel and Siegel, EuroPVM/MPI ’08) and found a previously KNOWN deadlock, but AUTOMATICALLY and within one second! • Results available at: http://www.cs.utah.edu/formal_verification/ISP_Tests

  21. Concluding Remarks • Tool available (download and try it) • Future work: a distributed ISP scheduler; handling MPI + threads; a large-scale bug hunt, now that ISP can execute large-scale codes.

  22. Implicit Deadlock Detection (Diagram: P0: Irecv(*, req); Recv(2); Wait(req). P1: Isend(0, req); Wait(req). P2: Isend(0, req); Wait(req). From the single collected trace P0: Irecv(*), P1: Isend(P0), P2: Isend(P0), P0: Recv(P2), P1: Wait(req), P2: Wait(req), P0: Wait(req), the scheduler determines that if the wildcard Irecv(*) is matched with P2's Isend, the later Recv(2) has no matching send: Deadlock!)
