Collective Communication on Architectures that Support Simultaneous Communication over Multiple Links Ernie Chan
Authors • Ernie Chan, Robert van de Geijn (Department of Computer Sciences, The University of Texas at Austin) • William Gropp, Rajeev Thakur (Mathematics and Computer Science Division, Argonne National Laboratory)
Testbed Architecture • IBM Blue Gene/L • 3D torus point-to-point interconnect network • One rack • 1024 dual-processor nodes • Two 8 x 8 x 8 midplanes • Special feature: a node can send simultaneously over multiple links • Exploited with multiple calls to MPI_Isend
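A minimal sketch of how the multi-link sends might be driven from MPI, assuming one nonblocking send per link and matching receives posted on the neighboring nodes; NUM_LINKS, the neighbor list, and the buffer are illustrative placeholders, not the authors' code.

#include <mpi.h>

#define NUM_LINKS 6   /* up to 2N = 6 links per node on the 3D torus */

/* Post one MPI_Isend per link so the network can progress all of them at once;
   each neighbor is assumed to have a matching receive posted. */
void send_to_neighbors(double *buf, int count,
                       const int neighbors[NUM_LINKS], MPI_Comm comm)
{
    MPI_Request reqs[NUM_LINKS];
    for (int i = 0; i < NUM_LINKS; i++)
        MPI_Isend(buf, count, MPI_DOUBLE, neighbors[i], 0, comm, &reqs[i]);
    MPI_Waitall(NUM_LINKS, reqs, MPI_STATUSES_IGNORE);
}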
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Model of Parallel Computation • Target Architectures • Distributed-memory parallel architectures • Indexing • p computational nodes • Indexed 0 … p - 1 • Logically Fully Connected • A node can send directly to any other node
Model of Parallel Computation • Topology • N-dimensional torus (figure: a 4 x 4 two-dimensional torus with nodes numbered 0–15)
Model of Parallel Computation • Old Model of Communicating Between Nodes • Unidirectional sending or receiving • Simultaneous sending and receiving • Bidirectional exchange
Model of Parallel Computation • Communicating Between Nodes • A node can send or receive with 2N other nodes simultaneously along its 2N different links
Model of Parallel Computation • Communicating Between Nodes • Cannot perform bidirectional exchange on any link while sending or receiving simultaneously with multiple nodes
Model of Parallel Computation • Cost of Communication: α + nβ • α: startup time (latency) • n: number of bytes to communicate • β: per-byte transmission time (inverse of bandwidth)
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Sending Simultaneously • Old Cost of Communication with Sends to Multiple Nodes • Cost to send to m separate nodes (α + nβ) m
Sending Simultaneously • New Cost of Communication with Simultaneous Sends • (α + nβ) m can be replaced with (α + nβ) + (α + nβ) (m - 1) τ, where 0 ≤ τ ≤ 1 • (α + nβ): cost of one send • (α + nβ) (m - 1) τ: cost of the extra sends
Sending Simultaneously • Benchmarking Sending Simultaneously • Log-log timing graphs • One midplane: 512 nodes • Sending simultaneously to 1 – 6 neighbors • Message sizes from 8 bytes to 4 MB
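A minimal sketch of the kind of timing loop this benchmark implies, assuming a designated root that sends simultaneously to m of its torus neighbors while those neighbors post matching receives; NITER, the neighbor list, and the output format are illustrative choices, not the authors' benchmark code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BYTES (4 * 1024 * 1024)   /* largest message size in the experiments: 4 MB */
#define NITER     100                 /* repetitions per message size (illustrative) */

void time_simultaneous_sends(int m, const int *neighbors, int root, MPI_Comm comm)
{
    int rank, is_neighbor = 0;
    MPI_Comm_rank(comm, &rank);
    for (int i = 0; i < m; i++)
        if (neighbors[i] == rank) is_neighbor = 1;

    char *buf = malloc(MAX_BYTES);

    for (int n = 8; n <= MAX_BYTES; n *= 2) {
        MPI_Barrier(comm);                      /* line all nodes up before timing */
        double start = MPI_Wtime();
        for (int iter = 0; iter < NITER; iter++) {
            if (rank == root) {
                MPI_Request reqs[6];            /* at most 2N = 6 simultaneous sends */
                for (int i = 0; i < m; i++)
                    MPI_Isend(buf, n, MPI_CHAR, neighbors[i], 0, comm, &reqs[i]);
                MPI_Waitall(m, reqs, MPI_STATUSES_IGNORE);
            } else if (is_neighbor) {
                MPI_Recv(buf, n, MPI_CHAR, root, 0, comm, MPI_STATUS_IGNORE);
            }
        }
        if (rank == root)
            printf("m = %d, %d bytes: %e s per round\n",
                   m, n, (MPI_Wtime() - start) / NITER);
    }
    free(buf);
}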
Sending Simultaneously • Cost of Communication with Simultaneous Sends (α + nβ) (1 + (m - 1) τ)
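As a minimal sketch, with made-up parameter values rather than measured Blue Gene/L constants, the two models can be compared directly: sequential sends cost m (α + nβ), while simultaneous sends cost (α + nβ)(1 + (m - 1) τ); τ = 0 makes the extra sends free and τ = 1 recovers the sequential cost.

#include <stdio.h>

static double cost_sequential(double alpha, double beta, double n, int m)
{
    return m * (alpha + n * beta);                       /* m back-to-back sends */
}

static double cost_simultaneous(double alpha, double beta, double n, int m, double tau)
{
    return (alpha + n * beta) * (1.0 + (m - 1) * tau);   /* m overlapping sends */
}

int main(void)
{
    /* illustrative values only, not measured Blue Gene/L parameters */
    double alpha = 3.0e-6, beta = 1.0 / 150.0e6, tau = 0.1, n = 1.0e6;
    for (int m = 1; m <= 6; m++)
        printf("m = %d: sequential %.3e s, simultaneous %.3e s\n",
               m, cost_sequential(alpha, beta, n, m),
               cost_simultaneous(alpha, beta, n, m, tau));
    return 0;
}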
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Collective Communication • Broadcast (Bcast) • Motivating example (figure: the root's data before the broadcast, and a copy on every node after)
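For reference, the operation being motivated is the standard MPI broadcast: before the call only the root holds the vector, and afterwards every node has a copy. The buffer size and root rank below are arbitrary illustrative choices; the generalized algorithms in this talk are alternative implementations of this same operation.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    double data[1024] = {0};
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* only the root fills the buffer ("before") */
        for (int i = 0; i < 1024; i++) data[i] = (double)i;
    }
    /* after this call, every rank holds the same 1024 doubles ("after") */
    MPI_Bcast(data, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}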
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Generalized Algorithms • Short-Vector Algorithms • Minimum-Spanning Tree • Long-Vector Algorithms • Bucket Algorithm
Generalized Algorithms • Minimum-Spanning Tree
Generalized Algorithms • Minimum-Spanning Tree • Divide p nodes into N+1 partitions
Generalized Algorithms • Minimum-Spanning Tree • Disjoint partitions on the N-dimensional mesh (figure: the 4 x 4 torus of nodes 0–15 split into disjoint partitions)
Generalized Algorithms • Minimum-Spanning Tree • Divide dimensions by a decrementing counter from N+1 (figure: the partitioning applied to the 4 x 4 torus of nodes 0–15)
Generalized Algorithms • Minimum-Spanning Tree • Now divide into 2N+1 partitions (figure: the 4 x 4 torus of nodes 0–15 split into 2N+1 partitions; a sketch of the recursive structure follows below)
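A minimal sketch of the minimum-spanning-tree recursion using plain point-to-point MPI and a two-way split per step; the generalized algorithm on the torus instead splits the nodes into N+1 or 2N+1 partitions per step so the root's simultaneous sends can feed every partition at once. This sketch only shows the recursive structure and is not the authors' implementation; all names are hypothetical.

#include <mpi.h>

/* Broadcast buf from root to every rank in [lo, hi]; all ranks in the range call this. */
static void mst_bcast(void *buf, int count, MPI_Datatype dt,
                      int root, int lo, int hi, MPI_Comm comm)
{
    if (lo >= hi) return;                      /* one node left: done */

    int rank;
    MPI_Comm_rank(comm, &rank);

    int mid = (lo + hi) / 2;                   /* split [lo, hi] into two partitions */
    int in_low = (root <= mid);
    /* pick a representative for the partition that does not contain the root */
    int new_root = in_low ? hi : lo;

    if (rank == root)
        MPI_Send(buf, count, dt, new_root, 0, comm);
    else if (rank == new_root)
        MPI_Recv(buf, count, dt, root, 0, comm, MPI_STATUS_IGNORE);

    /* recurse independently inside each partition */
    if (rank <= mid)
        mst_bcast(buf, count, dt, in_low ? root : new_root, lo, mid, comm);
    else
        mst_bcast(buf, count, dt, in_low ? new_root : root, mid + 1, hi, comm);
}

Every node would call mst_bcast(buf, count, MPI_DOUBLE, 0, 0, p - 1, comm) with identical arguments to broadcast from rank 0.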
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Performance Results Single point-to-point communication
Performance Results my-bcast-MST
Outline • Testbed Architecture • Model of Parallel Computation • Sending Simultaneously • Collective Communication • Generalized Algorithms • Performance Results • Conclusion
Conclusion • IBM Blue Gene/L supports sending simultaneously over multiple links • Benchmarking, checked against the cost model, verifies this claim • The new generalized algorithms show clear performance gains
Conclusion • Future Directions • Room for optimization to reduce implementation overhead • What if the communicator is not MPI_COMM_WORLD? • A possible new variant of the Bucket Algorithm • Questions? echan@cs.utexas.edu