Parallel Scaling of parsparsecircuit3.c Tim Warburton
1 Process Per Node • In these tests we use only one of the two processors on each node.
blackbear: 16 processors, 16 nodes • Apart from the mpi_allreduce calls, this is an almost perfect picture of parallelism.
2 Processes Per Node • We use both processors on each node
blackbear: 8 nodes, 16 processes • Notice the prevalence of waitany. Clearly this code is not working as well as it does when running with 1 process per node.
blackbear: 8 nodes, 16 processes (zoom in) • I suspect that the threaded MPI communicators for the non-blocking isend and irecv are competing for CPU time with the user code. There could also be competition between the two processors on a node for the memory bus and the network bus.
Timings for Two Processes Per Node on Los Lobos • Timings courtesy of Zhaoxian Zhou
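The slides do not say how the timings were reduced to a scaling measure. One common way to compare runs is through speedup and parallel efficiency, and a minimal sketch in C might look like the following. The function names and the choice of baseline run are assumptions made for illustration; they are not taken from parsparsecircuit3.c.

```c
#include <assert.h>

/* Hypothetical helpers, not part of parsparsecircuit3.c. */

/* Speedup relative to a baseline run: S = t_base / t_parallel.
 * Both arguments are wall-clock times in the same unit. */
double speedup(double t_base, double t_parallel)
{
    assert(t_base > 0.0 && t_parallel > 0.0);
    return t_base / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of processes.
 * A value of 1.0 corresponds to ideal scaling; contention for CPU time,
 * the memory bus, or the network would show up as a lower value. */
double efficiency(double t_base, double t_parallel, int nprocs)
{
    assert(nprocs > 0);
    return speedup(t_base, t_parallel) / (double)nprocs;
}
```

As a usage example with made-up numbers (not measurements from blackbear or Los Lobos): a baseline of 16.0 s and a 16-process time of 2.0 s give speedup(16.0, 2.0) = 8.0 and efficiency(16.0, 2.0, 16) = 0.5. Computing the efficiency separately for the 1-process-per-node and 2-processes-per-node runs at the same process count would put a number on the difference described above.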