Coupling Parallel Programs via MetaChaos
Alan Sussman
Computer Science Dept., University of Maryland
With thanks to Mike Wiltberger (Dartmouth/NCAR)
What is MetaChaos?
• A runtime meta-library that achieves direct data transfers between data structures managed by different parallel libraries
• "Runtime meta-library" means that it interacts with the data parallel libraries and languages used by separate programs (including MPI)
• Can exchange data between separate (sequential or parallel) programs, running on different machines
• Also manages data transfers between different libraries in the same application
• This is often referred to as the MxN problem in parallel programming (e.g. by the CCA Forum)
How does MetaChaos work?
• It all starts with the data descriptor (cf. an ESMF state)
  • information about how the data in each program is distributed across the processors
  • usually supplied by the library/program developer
  • we are working on generalizing this to work with complex data distributions
• MetaChaos then uses a linearization of the data to be moved (the regions) to determine the best method for moving data from a set of regions in program A (S_A) to a set of regions in program B (S_B)
• Moving the data is a three-step process:
  1. LS_A = l_ProgX(S_A) – linearize the source regions in program X
  2. LS_B = LS_A – copy the linearized data from X to Y
  3. S_B = l^-1_ProgY(LS_B) – apply the inverse linearization on the destination
• The only constraint on this operation is that each region must have the same number of elements
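To make the three steps concrete, here is a minimal, purely illustrative C++ sketch of the pattern (the Region type and both helper functions are invented for this example, not MetaChaos calls); in the real system the copy step is a message between programs rather than a local assignment:

// Generic sketch of the linearize / copy / delinearize idea (not the
// MetaChaos API): each side knows only its own layout, and the linear
// buffer is the common intermediate form.
#include <cstddef>
#include <vector>

// Hypothetical region: a strided slice [lo, hi) of a local array.
struct Region { std::size_t lo, hi, stride; };

// Step 1, l_ProgX: pack A's regions into a linear buffer.
std::vector<double> linearize(const double* a, const std::vector<Region>& sA) {
    std::vector<double> buf;
    for (const Region& r : sA)
        for (std::size_t i = r.lo; i < r.hi; i += r.stride)
            buf.push_back(a[i]);
    return buf;
}

// Step 3, l^-1_ProgY: unpack the buffer into B's regions, which may be
// laid out completely differently, as long as element counts match.
void delinearize(double* b, const std::vector<Region>& sB,
                 const std::vector<double>& buf) {
    std::size_t k = 0;
    for (const Region& r : sB)
        for (std::size_t i = r.lo; i < r.hi; i += r.stride)
            b[i] = buf[k++];
}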
MetaChaos goals
• Main goal is minimal modification to existing programs
• To enable a program to be coupled to others, add calls to:
  • describe the data distribution across processors – build a data descriptor
  • describe the data to be moved (imported or exported) – build a set of regions
  • move the data – build a communication pattern/schedule, then use it
    • this is the part that requires interaction with the other program
MetaChaos goals
• The other main goal is low overhead and efficient data transfers
• Low overhead comes from building schedules efficiently
  • take advantage of the characteristics of the data descriptor
• Efficient data transfers come from customized all-to-all message passing between source and destination processes
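A schedule-driven transfer can be sketched generically: once the schedule records exactly which elements each process exchanges with each peer, the transfer reduces to point-to-point messages between only the processes that actually share data. Below is a minimal, illustrative C++/MPI sketch of executing such a schedule (MetaChaos itself currently runs over PVM, and its real schedule format is not shown in these slides; Message and run_schedule are invented names):

// Minimal sketch of executing a precomputed communication schedule.
#include <mpi.h>
#include <vector>

struct Message {                  // one entry of the schedule
    int peer;                     // rank of the other process
    std::vector<double> buf;      // packed elements for/from that peer
};

void run_schedule(std::vector<Message>& sends, std::vector<Message>& recvs) {
    std::vector<MPI_Request> reqs;
    reqs.reserve(sends.size() + recvs.size());
    // Post receives first, then sends; only peers named in the schedule
    // are contacted, so sparse patterns avoid a full all-to-all exchange.
    for (Message& m : recvs) {
        reqs.emplace_back();
        MPI_Irecv(m.buf.data(), (int)m.buf.size(), MPI_DOUBLE,
                  m.peer, 0, MPI_COMM_WORLD, &reqs.back());
    }
    for (Message& m : sends) {
        reqs.emplace_back();
        MPI_Isend(m.buf.data(), (int)m.buf.size(), MPI_DOUBLE,
                  m.peer, 0, MPI_COMM_WORLD, &reqs.back());
    }
    MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);
}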
More details
• Bindings for C/C++ and Fortran77; Fortran90 coming (data descriptor issues)
  • similar interface to MCEL, but with direct communication (no server)
• Currently message passing and program interconnection via PVM
  • programs/components can run wherever is convenient
  • heading towards Globus and other Grid services
• Each model/program can do whatever it wants internally (MPI, pthreads, sockets, …) – and can start up by whatever mechanism it wants (e.g. CCSM)
A Simple Example: Wave Eq Using P++

#include <A++.h>

// Problem parameters (iNPES, iNumX, iNumY, dW, dTime, daX, dPi, dLenX,
// dC, dDT, dDX, iNSteps) are set up elsewhere – omitted for space.
int main(int argc, char **argv) {
  Optimization_Manager::Initialize_Virtual_Machine("", iNPES, argc, argv);

  // Solution at time levels n-1, n, n+1 (interior plus ghost cells)
  doubleArray daUnm1(iNumX+2, iNumY+2), daUn(iNumX+2, iNumY+2);
  doubleArray daUnp1(iNumX+2, iNumY+2);
  Index I(1, iNumX), J(1, iNumY);  // Indices for computational domain

  // Initialize the first two time levels
  for (int j = 1; j < iNumY+1; j++) {
    daUnm1(I, j) = sin(dW*dTime + (daX(I)*2*dPi)/dLenX);
    daUn(I, j)   = sin(dW*0 + (daX(I)*2*dPi)/dLenX);
  }
  // Apply BC omitted for space

  // Evolve forward in time: second-order wave equation update
  for (int i = 1; i < iNSteps; i++) {
    daUnp1(I, J) = ((dC*dC*dDT*dDT)/(dDX*dDX)) *
                     (daUn(I-1, J) - 2*daUn(I, J) + daUn(I+1, J))
                   + 2*daUn(I, J) - daUnm1(I, J);
    // Apply BC omitted for space
  }

  Optimization_Manager::Exit_Virtual_Machine();
}
Split into two using MetaChaos

#include <A++.h>

int main(int argc, char **argv) {
  Optimization_Manager::Initialize_Virtual_Machine("", NPES, argc, argv);

  // Register this program and rendezvous with the other program
  this_pgm  = InitPgm(pgm_name, NPES);
  other_pgm = WaitPgm(other_pgm_name, NPES_other);
  Sync2Pgm(this_pgm, other_pgm);

  // Build the set of regions to export (the boundary columns)
  BP_set = Alloc_setOfRegion();
  left[0] = 4; right[0] = 4; stride[0] = 1;
  left[1] = 5; right[1] = 5; stride[1] = 1;
  reg = Alloc_R_Block(DIM, left, right, stride);
  Add_Region_setOfRegion(reg, BP_set);

  // Build the data descriptor for daUn, then the communication schedule
  BP_da = getPartiDescriptor(&daUn);
  sched = ComputeScheduleForSender(…, BP_da, BP_set, …);

  for (i = 1; i < iNSteps; i++) {
    daUnp1(I, J) = ((dC*dC*dDT*dDT)/(dDX*dDX)) *
                     (daUn(I-1, J) - 2*daUn(I, J) + daUn(I+1, J))
                   + 2*daUn(I, J) - daUnm1(I, J);
    // Exchange boundary data with the other program via the schedule
    iDataMoveSend(other_pgm, sched, daUn.getLocalArray().getDataPointer());
    iDataMoveRecv(other_pgm, sched, daUn.getLocalArray().getDataPointer());
    Sync2Pgm(this_pgm, other_pgm);
  }

  Optimization_Manager::Exit_Virtual_Machine();
}
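The slide shows only the sending program's side. For orientation, the receiving program would follow the same pattern: build its own descriptor and region set, build the matching schedule, and mirror the send/receive calls. A rough sketch, assuming a receiver-side scheduling call symmetric to ComputeScheduleForSender (the actual MetaChaos receiver API is not shown in these slides):

// Receiver-side sketch (hypothetical: ComputeScheduleForReceiver is
// assumed by symmetry with ComputeScheduleForSender shown above).
// The receiver describes where the incoming elements should land.
BR_set = Alloc_setOfRegion();                 // regions to import
left[0] = 0; right[0] = 0; stride[0] = 1;     // e.g. the ghost columns
left[1] = 1; right[1] = 1; stride[1] = 1;
reg = Alloc_R_Block(DIM, left, right, stride);
Add_Region_setOfRegion(reg, BR_set);

BR_da = getPartiDescriptor(&daUn);            // descriptor for local daUn
sched = ComputeScheduleForReceiver(…, BR_da, BR_set, …);  // assumed name

// Inside the time loop, mirror the sender's calls:
iDataMoveRecv(other_pgm, sched, daUn.getLocalArray().getDataPointer());
iDataMoveSend(other_pgm, sched, daUn.getLocalArray().getDataPointer());
Sync2Pgm(this_pgm, other_pgm);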
Space weather framework
• A set of tools/services
  • not an integrated framework
• Allows new models/programs to interoperate (exchange data) with ones that already use the tools/interfaces
• An application builder plugs together the various models and specifies how/when they interact (exchange data)
• At least 5 physical models already, with more to come
  • from CISM (Center for Integrated Space Weather Modeling, led by Boston U.)
What are we working on now?
• Adding generalized block data distributions and completely irregular, explicit distributions (sketched below)
• Infrastructure for controlling interactions between programs
  • the tools for building coupled applications that run in a high performance, distributed, heterogeneous Grid environment – not just a coordination language
  • built on top of basic Grid services (Globus, NWS, resource schedulers/co-schedulers, etc.)
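As a rough illustration of what those distributions have to record (a generic sketch, not MetaChaos's descriptor format; both struct names are invented for this example): a generalized block distribution stores explicit, possibly uneven block boundaries, while a completely irregular distribution maps every element to its owner explicitly.

#include <cstddef>
#include <vector>

// Generalized block: each process owns one contiguous block, but the
// boundaries may be uneven, so they must be stored explicitly.
struct GeneralizedBlock1D {
    std::vector<std::size_t> boundaries;  // boundaries[p] = first global
                                          // index owned by process p
    int owner(std::size_t g) const {      // which process owns index g?
        int p = 0;
        while (p + 1 < (int)boundaries.size() && boundaries[p + 1] <= g) ++p;
        return p;
    }
};

// Completely irregular: one (owner, local offset) entry per element.
struct IrregularDistribution {
    std::vector<int>         owner;   // owner[g] = owning process
    std::vector<std::size_t> local;   // local[g] = offset on that process
};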
What is Overture?
• A collection of C++ classes that can be used to solve PDEs on overlapping grids
• Key features
  • High-level interface for PDEs on adaptive and curvilinear grids
  • Provides a library of finite difference operators
    • conservative and non-conservative
    • 2nd and 4th order
  • Uses the A++/P++ array classes for serial/parallel array operations
  • Extensive grid generation capabilities
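To give a flavor of that high-level interface, here is a small example in the style of the Overture primer, reconstructed from memory – treat the exact class and function names, and the grid file name, as assumptions rather than verified API:

#include "Overture.h"

int main(int argc, char *argv[]) {
  Overture::start(argc, argv);           // initialize Overture (and A++/P++)

  // Read an overlapping grid produced by the grid generator Ogen
  // ("mygrid.hdf" is a placeholder file name)
  aString nameOfOGFile = "mygrid.hdf";
  CompositeGrid cg;
  getFromADataBase(cg, nameOfOGFile);
  cg.update();

  // A grid function living on every component grid of cg
  Range all;
  realCompositeGridFunction u(cg, all, all, all);
  u = 1.;

  // Attach difference operators, then apply one with a single call
  CompositeGridOperators op(cg);
  u.setOperators(op);
  realCompositeGridFunction ux = u.x();  // x-derivative on all grids

  Overture::finish();
  return 0;
}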
Overture: A toolkit for solving PDEs – layered components
• Solvers: Oges, Ogmg, OverBlown
• Operators: div, grad, BCs
• Grid generator: Ogen
• Adaptive mesh refinement
• Mappings (geometry)
• Grids: MappedGrid, GridCollection
• Grid functions: MappedGridFunction, GridCollectionFunction
• A++/P++ array class
• Graphics (OpenGL), database (HDF), Boxlib (LBL)