This research presents a formal model and analysis of MPI functions, aiming to eliminate concurrency bugs in HPC programs. The study focuses on checking and integrating litmus tests into VisualStudio, revealing omissions in standards to enhance library design. Challenges and modeling approaches for library semantics and SW model checking are discussed, emphasizing formal descriptions for deeper understanding and testing. The contributions include executable specifications of MPI operations, integration with Microsoft tools, and a customized MPI model checker. The study serves as a valuable resource for future library design.
An Approach to Formalization and Analysis of Message Passing Libraries
Robert Palmer, Intel Validation Research Labs, Hillsboro, OR (work done at the Univ. of Utah as a PhD student)
Michael DeLisi (undergraduate; his first research paper), Ganesh Gopalakrishnan, Robert M. Kirby, School of Computing, University of Utah
Supported by: Microsoft HPC Institutes, NSF CNS 0509379
MPI is the de facto standard for programming cluster machines
[Images: BlueGene/L (courtesy of IBM / LLNL); simulation image (courtesy of Steve Parker, CSAFE, Utah)]
Our focus: Eliminate Concurrency Bugs from HPC Programs!
An Inconvenient Truth: bugs mean more CO2 and bad numbers!
So many ways to eliminate bugs …
• Our Contribution:
  - A Formal Model of 50 (of the 300) MPI functions
  - An execution environment from which to check simple “litmus tests”
  - Visual Studio integration with the Microsoft Phoenix compiler front-end
• Has spawned other research (e.g., developing POR for MPI programs, and an In-Situ model checker for MPI programs)
• Formalization helped reveal omissions in the standard
• Can potentially help designers understand today’s complex standards
• Recommended for future libraries (APIs)
A Simple MPI/C Program
    /* Add-up integrals calculated by each process */
    if (my_rank == 0) {
        total = integral;
        for (source = 0; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        MPI_Send(&integral, 1, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
    }
1/3/2020
Library Semantics Dictates Behavior
e.g. mismatched send/recv causing deadlock
[Diagram labels: p0:fr 0, p0:fr 1, p0:fr 2; p1:to 0, p2:to 0, p3:to 0]
    /* Add-up integrals calculated by each process */
    if (my_rank == 0) {
        total = integral;
        for (source = 0; source < p; source++) {
            MPI_Recv(&integral, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);
            total = total + integral;
        }
    } else {
        MPI_Send(&integral, 1, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
    }
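The mismatch on this slide can be checked in a toy model written in plain C. This is only a sketch, not the authors' TLA+ specification: it assumes every non-zero rank sends exactly one message to rank 0 (dest == 0, as the diagram labels suggest) and that a receive naming a specific source can complete only if that source sends. The names `sends_to_root`, `first_blocked_source`, and `MAX_PROCS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 64  /* hypothetical bound for this toy model */

/* Toy model of the slide's program: does rank r ever send to rank 0?
 * Only the else-branch (non-zero ranks) calls MPI_Send. */
static bool sends_to_root(int r) {
    return r != 0;
}

/*
 * Walk rank 0's loop "for (source = first_source; source < p; source++)".
 * Each blocking receive names a specific source, so it can complete only
 * if that source has a pending send to rank 0.
 * Returns the first source on which rank 0 blocks forever,
 * or -1 if every receive finds a matching send.
 */
int first_blocked_source(int p, int first_source) {
    assert(p > 0 && p <= MAX_PROCS && first_source >= 0);

    bool send_pending[MAX_PROCS];
    for (int r = 0; r < p; r++)
        send_pending[r] = sends_to_root(r);

    for (int source = first_source; source < p; source++) {
        if (!send_pending[source])
            return source;             /* no matching send: deadlock */
        send_pending[source] = false;  /* message consumed */
    }
    return -1;
}
```

In this model, `first_blocked_source(4, 0)` reports rank 0 waiting on itself, while starting the loop at source 1 lets every receive match.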
Challenges for SW Model Checking
• Build debugging tools that “understand” library semantics
• Perform static analysis and model reductions modulo library semantics!
• A new world-order where the embedding program serves as a ‘control scaffolding’, with the “action” being within library calls
Library Semantics Modeling Approaches
• Natural Language Documents
  - They alone don’t suffice (obvious drawbacks)
• Formal Descriptions
  - Use standard notations, ideally executable
Practitioners must be able to benefit…
• They must be able to gain a deeper understanding of the library through the spec
• They must be able to submit “litmus tests” and see outcomes in familiar ways
Example: Challenge posed by a 5-line MPI program…
    p0: { Irecv(rcvbuf1, from p1); Irecv(rcvbuf2, from p1); … }
    p1: { sendbuf1 = 6; sendbuf2 = 7; Issend(sendbuf1, to p0); Isend(sendbuf2, to p0); … }
• In-order message delivery (rcvbuf1 == 6)
• Can access the buffers only after a later wait / test
• The second receive may complete before the first
• When Issend (synch.) is posted, all that is guaranteed is that Irecv(rcvbuf1,…) has been posted
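The first and third bullets can be illustrated together with a small plain-C model. It is a sketch, not the authors' specification, and it assumes both messages use the same tag and communicator, as the slide's pseudocode implies: the i-th posted receive is matched with the i-th send at post time, but the transfers may finish in any order. The function `complete_receives` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of in-order matching for one sender/receiver pair.
 * The i-th posted receive is matched with the i-th posted send.
 * completion_order lists receive indices in the order in which the
 * transfers happen to finish; it may be any permutation of 0..n-1.
 */
void complete_receives(const int *send_payloads, int *recv_bufs,
                       const size_t *completion_order, size_t n) {
    for (size_t step = 0; step < n; step++) {
        size_t i = completion_order[step];
        assert(i < n);                    /* must name a posted receive */
        recv_bufs[i] = send_payloads[i];  /* match was fixed at post time */
    }
}
```

With payloads {6, 7}, the first receive buffer ends up holding 6 whether the completion order is {0, 1} or {1, 0}; only the moment at which each buffer becomes valid differs, which is why the buffers may be read only after a later wait or test.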
Our Contributions
• Formal executable spec of the point-to-point operations of MPI, written in TLA+
• Simple MPI / C programs are compiled into TLA+ models and linked with the formal semantics, all under Microsoft Visual Studio
• Errors in litmus tests generate error traces that can step the Visual Studio debugger
• The same framework includes a customized MPI model checker and, soon, a dynamic execution-based model checker with DPOR
Executable Formal Specification and MPIC Model Checker Integration into VS
[Figure: Visual Studio 2005 verification environment. Components: Phoenix Compiler, MPIC IR, TLA+ MPI Library Model, TLA+ Prog. Model, MPIC Program Model, TLC Model Checker, MPIC Model Checker]
MPI Formal Specification Organization
[Figure: specification modules: Requests, Collective Context, Group, Communicator, Point to Point Operations, Collective Operations, Constants, MPI 1.1 API]
Related Work (Formalization and tool integration)
• Use of TLA+ (or similar notations) to write executable specs is nothing new
• Its use to model a subset of MPI is new
• Integration with VS and the VS debugger (or similar tools) may help designers become comfortable with formal specs
Related Work (MPI formalization)
• Siegel (VMCAI 2007) has proposed a Promela model for MPI
  - Uses Promela constructions to mimic MPI behavior
  - Uses the Promela / C interface
  - Uses an elaborately hand-crafted state machine
  - Is much faster, and rides on established technology
  - The declarative reading (emphasizing “what”) is lost
Concluding Remarks (1 of 3)
• Quote from Lynn Conway (quote in VLSI, paraphrased): “There are two realities that must be met:
  - the architecture of a microprocessor, and
  - the polygons of the layout.
  Everything in-between is a luxury to be availed depending on our resources.”
Concluding Remarks (2 of 3)
• In our world:
  - The executable formal spec is the “what” (architecture)
  - An In-Situ Dynamic Partial Order Reduction (ISP) model checker is the “how” (or “polygons”) (paper in EuroPVM / MPI 2007)
  - The “in-between” is a customized MPI model checker for some of its constructs (PADTAD 2007)
Concluding Remarks (3 of 3)
• Formal Spec of Concurrency Libraries is essential for the development of a whole range of FV tools
• It helps programmers avoid misunderstandings about the library
• It can help during the platform testing of Library Implementations (think about multicores and transactions used in future library implementations)
The Model Checking Community and the Formal Spec Community must work hand-in-hand in addressing the issues in tomorrow’s Parallel and Distributed Programs
Questions?
The verification environment is downloadable from http://www.cs.utah.edu/formal_verification/mpic
It is at an early stage of development
Answers!
• We are extending it to Collective Operations
  - lesson learned from de Supinski
• We may perform Formal Testing of MPI Library Implementations based on the Formal Semantics
• We plan to analyze mixed MPI / Threads
• That is a very good question: let’s talk!