Learn about parallel computing and how it can be useful, including different parallel paradigms, how to parallelize problems, and an overview of the Message Passing Interface (MPI) standard.
FLASH Tutorial, May 13, 2004: Parallel Computing and MPI
What is Parallel Computing? And why is it useful
• Parallel computing is more than one CPU working together on one problem
• It is useful when
  • the problem is large and would take very long on a single processor
  • the data are too big to fit in the memory of one processor
• When to parallelize
  • when the problem can be subdivided into relatively independent tasks
• How much to parallelize
  • as long as the speedup relative to a single processor stays of the order of the number of processors
Parallel Paradigms
• SIMD – Single Instruction, Multiple Data
  • processors work in lock-step
• MIMD – Multiple Instruction, Multiple Data
  • processors do their own thing, with occasional synchronization
• Shared memory
  • one-sided communication
• Distributed memory
  • message passing
• Loosely coupled
  • the process on each CPU is fairly self-contained and relatively independent of processes on other CPUs
• Tightly coupled
  • CPUs need to communicate with each other frequently
How to Parallelize
• Divide the problem into a set of mostly independent tasks
  • partitioning the problem
• Tasks get their own data
  • localize each task
• They operate on their own data for the most part
  • try to make each task self-contained
• Occasionally
  • data may be needed from other tasks
    • inter-process communication
  • synchronization may be required between tasks
    • global operations
• Map tasks to different processors
  • one processor may get more than one task
  • the task distribution should be well balanced
New Code Components
• Initialization
• Query parallel state
  • identify process
  • identify number of processes
• Exchange data between processes
  • local, global
• Synchronization
  • barriers, blocking communication, locks
• Finalization
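A minimal sketch of these components in C, using MPI (described on the next slide). Only initialization, process identification, a synchronization point, and finalization are shown; the computation and data exchange are left as a comment.

/* Minimal sketch of the new code components a parallel program needs. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);                     /* initialization        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* identify this process */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);     /* number of processes   */

    printf("Process %d of %d starting\n", rank, nprocs);

    /* ... exchange data, compute on the local piece of the problem ... */

    MPI_Barrier(MPI_COMM_WORLD);                /* synchronization       */
    MPI_Finalize();                             /* finalization          */
    return 0;
}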
MPI
• Message Passing Interface: the standard for the distributed-memory model of parallelism
• MPI-2 adds support for one-sided communication, commonly associated with shared-memory operations
• Works with communicators: collections of processes
  • MPI_COMM_WORLD is the default communicator
• Has support for the lowest-level communication operations as well as composite operations
• Has blocking and non-blocking operations
Communicators
(Diagram: two communicators, COMM1 and COMM2.)
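As an illustration of working with communicators, the sketch below splits MPI_COMM_WORLD into two sub-communicators by even/odd rank, similar in spirit to COMM1 and COMM2 in the diagram; the even/odd split is only an example, not how FLASH actually builds its communicators.

/* Sketch: split MPI_COMM_WORLD into two sub-communicators by even/odd rank. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, subrank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes with the same "color" end up in the same new communicator. */
    int color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    MPI_Comm_rank(subcomm, &subrank);
    printf("world rank %d has rank %d in sub-communicator %d\n",
           world_rank, subrank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}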
Low-level Operations in MPI
• MPI_Init
• MPI_Comm_size
  • find the number of processes
• MPI_Comm_rank
  • find my process number (rank)
• MPI_Send / MPI_Recv
  • communicate with other processes one at a time
• MPI_Bcast
  • global data transmission
• MPI_Barrier
  • synchronization
• MPI_Finalize
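A small, self-contained sketch that exercises most of the calls listed above: rank 0 broadcasts a value, each even rank then sends it to the next odd rank, and a barrier synchronizes everyone before finalization. The value 42 and the even/odd pairing are purely illustrative.

/* Sketch of the low-level MPI operations: Bcast, Send/Recv, Barrier. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) value = 42;                 /* data known only on rank 0   */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* global transmission */

    if (rank % 2 == 0 && rank + 1 < nprocs) {
        /* point-to-point: even rank sends to the odd rank to its right */
        MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    } else if (rank % 2 == 1) {
        MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);               /* synchronization */
    MPI_Finalize();
    return 0;
}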
Advanced Constructs in MPI
• Composite operations
  • Gather/Scatter
  • Allreduce
  • Alltoall
• Cartesian grid operations
  • Shift
• Communicators
  • creating subgroups of processes to operate on
• User-defined datatypes
• I/O
  • parallel file operations
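As an example of a composite operation, the sketch below computes a global sum with MPI_Allreduce, the kind of reduction used to accumulate a physical quantity over all processes. The local value here is just a stand-in.

/* Sketch of a composite operation: global sum via MPI_Allreduce. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    double local_sum, global_sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local_sum = (double)(rank + 1);   /* stand-in for a locally computed quantity */

    /* Every process ends up with the sum over all processes. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) printf("global sum = %g\n", global_sum);
    MPI_Finalize();
    return 0;
}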
Communication Patterns
(Diagram: point-to-point, collective, one-to-all broadcast, all-to-all, and shift patterns among processes 0–3.)
Communication Overheads
• Latency vs. bandwidth
• Blocking vs. non-blocking
  • overlap
  • buffering and copy
• Scale of communication
  • nearest neighbor
  • short range
  • long range
• Volume of data
• Resource contention for links
• Efficiency
  • hardware, software, communication method
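The sketch below illustrates overlapping communication with computation: non-blocking MPI_Isend/MPI_Irecv are posted first, local work proceeds while the messages are in flight, and MPI_Waitall is called only when the remote data is actually needed. do_local_work() is a hypothetical placeholder.

/* Sketch of hiding communication latency behind local computation. */
#include <mpi.h>

static void do_local_work(void) { /* interior work needing no remote data */ }

int main(int argc, char *argv[])
{
    int rank, nprocs;
    double sendval = 1.0, recvval = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int right = (rank + 1) % nprocs;
    int left  = (rank - 1 + nprocs) % nprocs;

    /* Post the communication first ... */
    MPI_Irecv(&recvval, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendval, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... compute while the messages are in flight ... */
    do_local_work();

    /* ... and wait only when the remote data is needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}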
Parallelism in FLASH
• Short-range communications
  • nearest neighbor
• Long-range communications
  • regridding
• Other global operations
  • all-reduce operations on physical quantities
  • specific to solvers
    • multipole method
    • FFT-based solvers
Domain Decomposition
(Diagram: the computational domain distributed over processors P0, P1, P2, P3.)
Border Cells / Ghost Points
• When solnData is split across processors, each processor needs data from the others
• A layer of cells is needed from each neighboring processor
• These ghost cells must be updated every time step
Border/Ghost Cells: Short-Range Communication
Two MPI Methods for Doing It

Method 1: Cartesian topology
• MPI_Cart_create – create the topology
• MPE_Decomp1d – domain decomposition on the topology
• MPI_Cart_shift – who is on the left/right?
• MPI_Sendrecv – fill ghost cells on the left
• MPI_Sendrecv – fill ghost cells on the right

Method 2: manual decomposition
• MPI_Comm_rank, MPI_Comm_size
• manually decompose the grid over processors
• calculate the left/right neighbors
• MPI_Send / MPI_Recv – carefully, to avoid deadlocks
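A sketch of the first method for a 1-D decomposition with one guard layer: build a Cartesian topology, find the left/right neighbors with MPI_Cart_shift, and fill the ghost cells with two MPI_Sendrecv calls. The block size N and the array u are illustrative, and MPE_Decomp1d is omitted here.

/* Sketch: 1-D Cartesian topology with ghost-cell exchange via MPI_Sendrecv. */
#include <mpi.h>

#define N 16   /* interior cells per process (illustrative) */

int main(int argc, char *argv[])
{
    int nprocs, dims[1] = {0}, periods[1] = {0};
    int left, right;
    double u[N + 2];               /* u[0] and u[N+1] are ghost cells */
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Build the 1-D process topology. */
    dims[0] = nprocs;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);

    /* Who is on my left/right?  (MPI_PROC_NULL at the domain boundary.) */
    MPI_Cart_shift(cart, 0, 1, &left, &right);

    for (int i = 0; i < N + 2; i++) u[i] = 0.0;   /* stand-in initial data */

    /* Send my last interior cell right, receive my left ghost cell. */
    MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0, cart, MPI_STATUS_IGNORE);
    /* Send my first interior cell left, receive my right ghost cell. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[N + 1], 1, MPI_DOUBLE, right, 1, cart, MPI_STATUS_IGNORE);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}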
Adaptive Grid Issues
• The discretization is not uniform
• Simple left-right guard-cell fills are inadequate
• Adjacent grid points may not be mapped to nearest neighbors in the processor topology
• Redistribution of work becomes necessary
Regridding
• Change in the number of cells/blocks
• Some processors get more work than others
  • load imbalance
• Redistribute data to even out the work on all processors
  • long-range communications
  • large quantities of data moved
Other Parallel Operations in FLASH
• Global max/sum etc. (Allreduce)
  • physical quantities
  • in solvers
  • performance monitoring
• Alltoall
  • FFT-based solver on the uniform grid (UG)
• User-defined datatypes and file operations
  • parallel I/O
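As a sketch of parallel file operations, the example below has every process write its local block of doubles into one shared file at a rank-dependent offset using a collective MPI-IO write. The file name and block size are illustrative and do not reflect FLASH's actual I/O scheme.

/* Sketch of parallel I/O with MPI-IO: one shared file, one slice per process. */
#include <mpi.h>

#define NLOCAL 8   /* values per process (illustrative) */

int main(int argc, char *argv[])
{
    int rank;
    double buf[NLOCAL];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < NLOCAL; i++) buf[i] = rank;   /* stand-in data */

    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each process writes its contiguous slice; the write is collective. */
    offset = (MPI_Offset)rank * NLOCAL * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, NLOCAL, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}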