Tutorial Collective Communication: Theory and Practice. David Payne, Lance Shuler, Robert van de Geijn, Jerrell Watts
Acknowledgements This work was sponsored in part by the Intel Research Council and Intel Scalable Systems Division. Experiments were performed on the Intel Paragon system operated by the California Institute of Technology on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by Intel SSD and Caltech.
Meet the Team James Overfelt • Graduate student at UT-Austin David Payne • Scalable Systems Division, Intel Corporation Lance Shuler • Masters in Math from UT-Austin. Now with SNL. Robert van de Geijn • Associate Professor of C.S. at UT-Austin Jerrell Watts • B.S. in C.S. from UT-Austin. Now Ph.D. student at Caltech
Former Contributors Mike Barnett • Ph.D. in C.S. from UT-Austin. Assistant Professor of C.S. at Univ. of Idaho Satya Gupta • Scalable Systems Division, Intel Corporation Rik Littlefield • PNL Prasenjit Mitra • Masters in C.S. from UT-Austin. Now with Oracle
Outline Part I: Theory • Model of parallel computation • Collective communications • A building block approach to library implementation Part II: Practice • Implementation on the Paragon • Performance results • Applications
Model of Parallel Computation • p nodes • physical two dimensional mesh • r rows, c columns • nodes have physical indices (i,j) • often logically viewed as a linear array • indexed 0, ... , p-1 • nodes are numbered in row-major order
Figure: 3 x 4 physical mesh, nodes labeled with physical indices (0,0) through (2,3) • physical two dimensional mesh • r rows, c columns • nodes have physical indices (i,j)
Figure: 12 nodes numbered 0 through 11 • often logically viewed as a linear array • indexed 0, ... , p-1 • nodes are numbered in row-major order
Figure: 9 nodes numbered 0 through 8 • often logically viewed as a linear array • indexed 0, ... , p-1
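The row-major numbering described above can be sketched in a few lines of Python. The function names `to_linear` and `to_physical` are hypothetical, chosen for illustration; they are not part of any library discussed in the tutorial.

```python
def to_linear(i, j, c):
    """Row-major index of the node at physical position (i, j) in a mesh with c columns."""
    return i * c + j


def to_physical(k, c):
    """Physical (row, column) position of the node whose row-major index is k."""
    return divmod(k, c)


if __name__ == "__main__":
    r, c = 3, 4  # the 3 x 4 mesh used in the slides
    for i in range(r):
        # Print one mesh row of linear indices per line.
        print(" ".join(f"{to_linear(i, j, c):2d}" for j in range(c)))
```

For the 3 x 4 mesh, node (1,2) maps to index 1 * 4 + 2 = 6, and index 11 maps back to (2,3).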
Model of Parallel Computation • x-y routing • a node can send directly to any other node • a node can simultaneously receive and send • cost of communication • sending a message of length n between any two nodes costs $\alpha + n\beta$ (startup cost $\alpha$, per-item cost $\beta$)
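As a rough illustration of the model on this slide, the sketch below traces an x-y route on the mesh and evaluates a linear cost for one message. It assumes that a message first travels within its row (the x direction) and then within the destination column (the y direction), and it assumes the cost of a message of length n is alpha + n * beta, with alpha a startup cost and beta a per-item cost. The names `xy_route` and `message_cost` and the numeric values are hypothetical.

```python
def xy_route(src, dst):
    """Nodes visited from src to dst, moving along the row first, then along the column.

    Assumption: "x" means changing the column index j, "y" means changing the row index i.
    """
    (i, j), (di, dj) = src, dst
    path = [(i, j)]
    step = 1 if dj > j else -1
    while j != dj:  # x phase: stay in the source row
        j += step
        path.append((i, j))
    step = 1 if di > i else -1
    while i != di:  # y phase: stay in the destination column
        i += step
        path.append((i, j))
    return path


def message_cost(n, alpha, beta):
    """Assumed linear model: startup cost alpha plus n items at beta each."""
    return alpha + n * beta


if __name__ == "__main__":
    path = xy_route((0, 0), (2, 3))
    print("route:", path)
    print("links traversed:", len(path) - 1)
    print("cost of a 100-item message:", message_cost(100, alpha=10.0, beta=0.5))
```

Under this model the cost does not depend on the number of links traversed, which is why the route length is reported separately from the cost.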
The Cost of Communication • send a message of length n over d links • packetize the message • Example: d=6
The Cost of Communication • send a message of length n over d links • k packets • Cost: • Example revisited ...
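One way to get a feel for why packetizing helps is a simple pipelining estimate. The sketch below is an assumption for illustration and not necessarily the cost expression used on the slide: the message is split into k equal packets, each packet takes alpha + (n/k) * beta to cross one link, and packets follow each other through the d links in pipeline fashion, so the last packet arrives after d + k - 1 packet steps. The function name `pipelined_cost` and the parameter values are hypothetical.

```python
def pipelined_cost(n, d, k, alpha, beta):
    """Estimated time to move n items over d links using k pipelined packets.

    Assumed model: each packet of n/k items costs alpha + (n/k)*beta per link,
    and the pipeline needs d + k - 1 packet steps to drain.
    """
    per_step = alpha + (n / k) * beta
    return (d + k - 1) * per_step


if __name__ == "__main__":
    n, d = 6000, 6  # d = 6 as in the slide's example; n is an arbitrary choice
    alpha, beta = 10.0, 0.1  # hypothetical startup and per-item costs
    for k in (1, 2, 6, 12, 60):
        print(f"k = {k:3d}  estimated cost = {pipelined_cost(n, d, k, alpha, beta):.1f}")
```

With k = 1 the whole message is forwarded link by link and the estimate grows in proportion to d. As k increases, the per-item term approaches n * beta, while the startup term grows with k, so some intermediate packet count gives the lowest estimate.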