Application-specific Topology-aware Mapping for Three Dimensional Topologies

Abhinav Bhatelé, Laxmikant V. Kalé

Presentation Transcript


  1. Abhinav Bhatelé Laxmikant V. Kalé Application-specific Topology-aware Mapping for Three Dimensional Topologies

  2. Outline • Motivation • The Mapping Problem • Static Mapping: 3D Stencil • Load Balancing: NAMD • Future Work

  3. The network latency for wormhole routing is $(L_f/B) \times D + L/B$, where $L_f$ = length of each flit, $B$ = bandwidth, $D$ = number of hops, and $L$ = length of the message. Lionel M. Ni and Philip K. McKinley, “A Survey of Wormhole Routing Techniques in Direct Networks”, Computer, Volume 26, Issue 2, pages 62-76, 1993
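The latency formula above can be evaluated directly. The following is a minimal sketch. The parameter values (32-byte flits, 100 bytes/µs links, a 1000-byte message) are hypothetical and were chosen only to keep the arithmetic readable.

```python
def wormhole_latency(flit_len, bandwidth, hops, msg_len):
    """Latency model for wormhole routing: (Lf / B) * D + L / B.

    flit_len  -- Lf, length of each flit (bytes)
    bandwidth -- B, link bandwidth (bytes per microsecond in this example)
    hops      -- D, number of hops from source to destination
    msg_len   -- L, length of the message (bytes)
    """
    return (flit_len / bandwidth) * hops + msg_len / bandwidth


# Hypothetical parameters, for illustration only:
# 32-byte flits, 100 bytes/us links, a 1000-byte message.
for d in (1, 4, 16):
    print(f"D = {d:2d}: {wormhole_latency(32, 100.0, d, 1000):.2f} us")
```

With these numbers the message takes 10.32, 11.28 and 15.12 µs at 1, 4 and 16 hops. The hop count enters only through the small flit term. This model assumes no link contention, so it does not capture the extra delay that heavily loaded links would add.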

  4. Message Latencies (figure legend: NN = Near Neighbor, RND = Random)

  5. Hardware Latencies • Blue Gene/L • Near neighbor: < 1 µs • Worst case: 7 µs • Blue Gene/P • Near neighbor: < 1 µs • Worst case: 5 µs • MPI messages show corresponding differences

  6. Topology-aware mapping • Problem: Given an object communication graph and a processor graph, find an optimal mapping that • Minimizes communication • Ensures load balance • Metric for communication traffic • Hop-bytes = number of links (hops) traversed × message size, summed over all messages
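The hop-bytes metric can be computed from a mapping of objects to processor coordinates. This minimal sketch makes two assumptions. First, the processors form a 3D torus with wraparound links. Second, routing is minimal, so the hop count is the shortest distance around each ring. The object ids, coordinates and message sizes are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Shortest hop count between two nodes of a 3D torus (wraparound links)."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)  # take the short way around each ring
    return hops


def hop_bytes(messages: List[Tuple[int, int, int]],
              mapping: Dict[int, Coord],
              dims: Coord) -> int:
    """Sum of (hops traversed x message size) over all messages.

    messages -- (src_object, dst_object, size_in_bytes) triples
    mapping  -- object id -> processor coordinates (x, y, z)
    """
    return sum(size * torus_hops(mapping[s], mapping[d], dims)
               for s, d, size in messages)


dims = (8, 8, 8)
mapping = {0: (0, 0, 0), 1: (1, 0, 0), 2: (7, 7, 7), 3: (4, 4, 4)}
msgs = [(0, 1, 1024), (0, 2, 1024), (0, 3, 1024)]
print(hop_bytes(msgs, mapping, dims))  # 1024 * (1 + 3 + 12) = 16384
```

Thanks to the wraparound links, object 2 at (7, 7, 7) is only 3 hops from (0, 0, 0). On a mesh without wraparound, `torus_hops` would reduce to the plain Manhattan distance.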

  7. Machine Topology • Information required at runtime • No. of processors in the allocated partition • No. of processors along each dimension • Physical coordinates of each processor

  8. Communication Graph • Static • 3D Stencil: regular communication graph • Dynamic • Molecular dynamics application • Changes as atoms migrate from one processor to another

  9. Static Graph - 3D Stencil
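Because the 3D stencil has a regular communication graph, a topology-aware static mapping can be as simple as placing the block (x, y, z) on the processor (x, y, z). This minimal sketch makes several assumptions. It uses a 7-point stencil with periodic boundaries and one block per processor on a 4 x 4 x 4 torus. A random permutation stands in for a topology-oblivious placement. It is not necessarily the mapping scheme used in the talk.

```python
import random
from itertools import product


def stencil_messages(dims, msg_bytes):
    """7-point 3D stencil: every block sends msg_bytes to each of its
    6 face neighbours (periodic boundaries assumed)."""
    X, Y, Z = dims
    msgs = []
    for x, y, z in product(range(X), range(Y), range(Z)):
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nbr = ((x + dx) % X, (y + dy) % Y, (z + dz) % Z)
            msgs.append(((x, y, z), nbr, msg_bytes))
    return msgs


def torus_hops(a, b, dims):
    """Shortest hop count on a 3D torus with wraparound links."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


def hop_bytes(msgs, mapping, dims):
    return sum(size * torus_hops(mapping[s], mapping[d], dims)
               for s, d, size in msgs)


dims = (4, 4, 4)                       # object grid == processor torus
blocks = list(product(range(4), repeat=3))
msgs = stencil_messages(dims, 1000)

aware = {b: b for b in blocks}         # block (x, y, z) -> processor (x, y, z)
shuffled = blocks[:]
random.Random(0).shuffle(shuffled)
oblivious = dict(zip(blocks, shuffled))  # topology-oblivious placement

print("topology-aware:", hop_bytes(msgs, aware, dims))  # 384 messages * 1 hop * 1000 bytes
print("random mapping:", hop_bytes(msgs, oblivious, dims))
```

With the matched mapping, every stencil message travels exactly one hop. This is the lowest possible hop-bytes for any one-to-one placement, so any other mapping can only match it or do worse.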

  10. Performance

  11. Hop counts

  12. Dynamic Graph - NAMD • Molecular Dynamics (MD) application • Simulation box is a 3D cell full of atoms

  13. Load Balancing in NAMD • Measurement-based (Charm++) • Principle of persistence: computational loads and communication patterns tend to persist over time, so recent measurements predict future behavior • Patches are statically mapped • Orthogonal recursive bisection • Computes can be migrated • Load balancing framework gathers the communication information • Goal • Minimize communication • Maximize load balance
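The slide names orthogonal recursive bisection (ORB) for the static patch placement but gives no details. The following is a minimal, unweighted sketch of ORB. At each step it cuts the point set along its longest axis, at the position that splits the points in proportion to the number of parts on each side. It balances patch counts only. A production mapper would likely weight patches by load, which is not modeled here. The 4 x 4 x 4 patch grid is hypothetical.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]


def orb(points: List[Point], nparts: int) -> List[List[Point]]:
    """Orthogonal recursive bisection of a point set into nparts groups."""
    if nparts == 1 or len(points) <= 1:
        return [points]

    def extent(axis: int) -> float:
        return max(p[axis] for p in points) - min(p[axis] for p in points)

    axis = max(range(3), key=extent)            # cut across the longest dimension
    pts = sorted(points, key=lambda p: p[axis])
    left = nparts // 2
    cut = len(pts) * left // nparts             # split proportionally to part counts
    return orb(pts[:cut], left) + orb(pts[cut:], nparts - left)


# Hypothetical example: a 4 x 4 x 4 grid of patch centres over 8 processors.
patches = list(product(range(4), repeat=3))
for proc, group in enumerate(orb(patches, 8)):
    print(proc, group)
```

In this example, each of the 8 groups is a compact 2 x 2 x 2 block of neighbouring patches. Compact blocks of this kind keep patch-to-patch communication local.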

  14. Old strategy • Greedy approach • Pick the heaviest compute • Place it on a processor that holds one of its patches, OR • On a processor that already has a compute for this patch
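The greedy strategy can be sketched in a few lines. This is a minimal sketch, not NAMD's actual load balancer, and it rests on several assumptions. Each compute lists the patches it needs. A hypothetical `overload` cap limits how far above the average load a processor may go. Picking the least-loaded candidate, and falling back to the least-loaded processor overall when no candidate has room, are also assumptions rather than details given on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def greedy_map(computes: Dict[str, Tuple[float, List[int]]],
               patch_home: Dict[int, int],
               nprocs: int,
               overload: float = 1.25) -> Dict[str, int]:
    """Greedy compute placement, heaviest compute first.

    computes   -- compute id -> (measured load, ids of the patches it needs)
    patch_home -- patch id -> processor that owns the patch
    overload   -- hypothetical cap, as a multiple of the average load
    """
    limit = overload * sum(load for load, _ in computes.values()) / nprocs
    proc_load = [0.0] * nprocs
    procs_with_patch = defaultdict(set)  # patch -> procs already hosting a compute for it
    placement = {}
    for cid, (load, patches) in sorted(computes.items(), key=lambda kv: -kv[1][0]):
        # Candidates: processors owning one of the patches, or already
        # holding a compute that uses one of them.
        candidates = {patch_home[p] for p in patches}
        for p in patches:
            candidates |= procs_with_patch[p]
        fits = [q for q in sorted(candidates) if proc_load[q] + load <= limit]
        if fits:
            target = min(fits, key=lambda q: proc_load[q])
        else:  # no candidate has room: fall back to the least-loaded processor
            target = min(range(nprocs), key=lambda q: proc_load[q])
        placement[cid] = target
        proc_load[target] += load
        for p in patches:
            procs_with_patch[p].add(target)
    return placement


computes = {"c01": (4.0, [0, 1]), "c12": (3.0, [1, 2]),
            "c23": (2.0, [2, 3]), "c30": (1.0, [3, 0])}
print(greedy_map(computes, {0: 0, 1: 1, 2: 2, 3: 3}, nprocs=4))
```

In this toy input, the heaviest compute exceeds the cap on every processor, so it falls back to processor 0. Each remaining compute then lands on a processor that owns one of its patches or already hosts a compute for that patch.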

  15. Hop-bytes ~17 %

  16. Future Work • Reasons for contention • Heavy communication exceeding bandwidth • Link contention (such as in deterministic routing) • Use UPC/PAPI on Blue Gene/L and /P

  17. Future Work • Automatic Mapping • Initial Static Mapping • Use case – meshing applications • Extend work on the Charm++ load balancers • Section-multicast aware load balancers • Useful in matrix multiplication

  18. Future Work • Optimization on other topologies • SiCortex (Kautz Graph) • InfiniBand clusters (Fat-tree)

  19. Summary • Topology mapping helps! • Especially for heavily communication-bound applications • Static mapping • Dynamic mapping during load balancing • Automatic mapping to relieve the user
