AstroBEAR Parallelization Options
Areas With Room For Improvement
• Ghost Zone Resolution
• MPI Load-Balancing
• Re-Gridding Algorithm
• Upgrading MPI Library
Ghost Zone Resolution
• Can exceed 30% of total program execution time.
• Affects fixed-grid as well as AMR runs.
• For runs using >2 processors, 98-99% of ghost zone execution time is MPI processing.
Ghost Zone Resolution Options
• Duplex Transmission
  • The old version swaps ghost zone data serially between two processors.
  • Duplex transmission would have the two processors handle sending, receiving and copying concurrently.
Pros:
• Reduces the amount of duplicated overhead.
• Makes more efficient use of worker processors.
Cons:
• Little reduction in the amount of MPI overhead.
• Still has a high computation cost relative to the number of nodes.
Status: In progress
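The duplex idea can be illustrated without MPI. The following is a minimal sketch in Python, using threads and queues as stand-ins for two processors and their message channels; the function names are hypothetical and this is not AstroBEAR's code. Each side posts its send and waits on its receive at the same time, instead of one side sending while the other waits. In MPI terms this pattern usually corresponds to a combined call such as `MPI_Sendrecv`, or to a non-blocking send/receive pair.

```python
import queue
import threading


def duplex_exchange(my_ghosts, outbox, inbox, received, key):
    """One 'processor': send its ghost data while receiving the neighbor's."""
    sender = threading.Thread(target=outbox.put, args=(my_ghosts,))
    sender.start()                 # the send proceeds in the background
    received[key] = inbox.get()    # blocks until the neighbor's data arrives
    sender.join()


def swap_ghost_zones(ghosts_a, ghosts_b):
    """Run both sides concurrently; return what A and B each received."""
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    received = {}
    proc_a = threading.Thread(
        target=duplex_exchange, args=(ghosts_a, a_to_b, b_to_a, received, "a"))
    proc_b = threading.Thread(
        target=duplex_exchange, args=(ghosts_b, b_to_a, a_to_b, received, "b"))
    proc_a.start()
    proc_b.start()
    proc_a.join()
    proc_b.join()
    return received["a"], received["b"]


if __name__ == "__main__":
    got_a, got_b = swap_ghost_zones([1.0, 2.0], [3.0, 4.0])
    print(got_a, got_b)
```

Neither side has to wait for the other to finish sending before it starts receiving, which is the point of the duplex scheme; the amount of data moved is unchanged.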
Alternate option: Ghost Zone broadcast
• Use the MPI broadcast routines to have a grid send all of its ghost zones to its neighbors at once; the neighbors then process that data and broadcast their own ghost zones when it is their turn.
Pros:
• Eliminates the need for pairwise iteration over the level (i.e., the transfer would only be done once per grid).
Cons:
• Potential congestion if all of a grid’s neighbors are on the same processor.
• No guarantee that it’s an improvement over pairwise duplex transmission.
Status: Speculative
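To make the trade-off concrete, here is a small bookkeeping sketch in Python. It assumes a hypothetical neighbor map for one level and a hypothetical grid-to-processor assignment; it only counts transfers and does not model message size or latency.

```python
from collections import Counter


def transfer_counts(neighbors):
    """Return (pairwise exchanges, broadcasts) needed for one level."""
    # One exchange per unordered pair of neighboring grids.
    pairs = {frozenset((g, n)) for g, ns in neighbors.items() for n in ns}
    # One broadcast per grid that has at least one neighbor.
    broadcasts = sum(1 for ns in neighbors.values() if ns)
    return len(pairs), broadcasts


def receivers_per_processor(grid, neighbors, owner):
    """Count how many of a grid's neighbors live on each processor."""
    return Counter(owner[n] for n in neighbors[grid])


# Hypothetical 2x2 arrangement of grids in which every grid touches the others.
neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
# Hypothetical assignment: grid 0 on processor 0, the rest on processor 1.
owner = {0: 0, 1: 1, 2: 1, 3: 1}

if __name__ == "__main__":
    print(transfer_counts(neighbors))                    # (6, 4)
    print(receivers_per_processor(0, neighbors, owner))  # Counter({1: 3})
```

In this toy layout the broadcast scheme needs 4 transfers where the pairwise scheme needs 6, but all three of grid 0's receivers sit on processor 1, which is the congestion case listed under Cons.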
Load Balancing
• Does it need to be done as often?
• The Ramses code rebalances only every ten frames.
• Re-gridding happens locally as usual, but it is assumed that the AMR structure does not change enough between two iterations to warrant a load rebalance.
Pros:
• Significant reduction in MPI overhead (BalanceLoads() gets called a lot).
• Non-MPI overhead will likely be reduced as well, as the current load-balancing scheme recalculates the load across the entire Forest.
Cons:
• “Patch-based AMR” vs. “tree-based AMR”: can the approach be adapted to AstroBEAR?
• Requires implementation of some Hilbert space-filling-curve algorithm: how complex or computationally intensive would it be?
Status: Speculative
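As a rough gauge of the complexity involved, the sketch below assumes that the "Hilbert" item refers to ordering cells along a Hilbert space-filling curve and cutting that ordering into contiguous pieces of similar load. It is a 2D Python illustration with hypothetical names (`hilbert_index`, `partition_by_curve`, `should_rebalance`), not the Ramses or AstroBEAR implementation; the index computation is the standard bitwise coordinate-to-distance conversion.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve on an n-by-n grid.

    n must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_curve(cells, loads, n, nprocs):
    """Split cells into nprocs contiguous curve segments of similar load.

    Assumes the total load is positive.
    """
    ordered = sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
    total = sum(loads[c] for c in ordered)
    parts = [[] for _ in range(nprocs)]
    running = 0
    for c in ordered:
        # Assign by where this cell starts in the cumulative load.
        p = min(nprocs - 1, int(running * nprocs / total))
        parts[p].append(c)
        running += loads[c]
    return parts


def should_rebalance(frame, interval=10):
    """Rebalance only every `interval` frames (ten, as quoted for Ramses)."""
    return frame % interval == 0


if __name__ == "__main__":
    n = 4
    cells = [(x, y) for x in range(n) for y in range(n)]
    loads = {c: 1 for c in cells}   # hypothetical uniform load
    for rank, part in enumerate(partition_by_curve(cells, loads, n, 4)):
        print(rank, part)
```

Because consecutive cells on the curve are spatial neighbors, each processor's segment tends to be compact, which keeps ghost zone traffic between processors down. The 2D index itself is a short loop over bits; a 3D version and the mapping from patches to curve positions would be the more involved parts.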
Re-Gridding Parallelization
• Parallelization of re-gridding is handled using MPI and OpenMP.
• Problem: MPI-1 limits thread usage.
  • Only one thread for the worker processors and two for the master processor.
  • Only one thread on each processor is MPI-capable.
• Performance bottlenecks happen if one processor gets tied up.
Advantage of Multiple Threads
[Figure: two panels, "MPI with OpenMP, single thread" and "MPI with OpenMP, multi-thread", each showing nodes labeled 0 to 3.]
Unfortunately...
• LAM MPI is not thread-safe.
  • You can write multi-threaded applications using LAM MPI, but it is explicitly not thread-safe and so we would be responsible for maintaining MPI exclusion.
  • In a collaborative development environment like AstroBEAR, this is a bad idea.
  • LAM is making noise about supporting this eventually, but they're not there yet.
• Alternatives:
  • Improve efficiency of pairwise message passing.
  • Offload more re-gridding computation to worker processors.
Status: We're looking at it.
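"Maintaining MPI exclusion" would mean that every call into the library, from any thread, goes through a single lock. The Python sketch below shows the shape of that discipline with a hypothetical `FakeComm` standing in for a non-thread-safe message library; none of these names come from LAM MPI or AstroBEAR.

```python
import threading
import time


class FakeComm:
    """Hypothetical stand-in for a message library that is not thread-safe."""

    def __init__(self):
        self.active = 0      # threads currently inside send()
        self.overlaps = 0    # times two threads were inside at once
        self.sent = []

    def send(self, msg):
        self.active += 1
        if self.active > 1:
            self.overlaps += 1
        time.sleep(0.001)    # widen the window in which a clash could occur
        self.sent.append(msg)
        self.active -= 1


class SerializedComm:
    """Wrapper that lets only one thread at a time into the library."""

    def __init__(self, comm):
        self._comm = comm
        self._lock = threading.Lock()

    def send(self, msg):
        with self._lock:
            self._comm.send(msg)


def send_from_threads(comm, count):
    """Issue `count` sends, each from its own thread."""
    threads = [threading.Thread(target=comm.send, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    raw = FakeComm()
    send_from_threads(SerializedComm(raw), 8)
    print(raw.overlaps, sorted(raw.sent))
```

The wrapper only protects callers that use it. A single direct call to the underlying library from any module bypasses the lock, which is why relying on this convention is risky in a code base with many contributors.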