
Parallel Scaling of parsparsecircuit3.c



Presentation Transcript


  1. Parallel Scaling of parsparsecircuit3.c Tim Warburton

  2. 1 Process Per Node • In these tests we use only one of the two processors per node.

  3. blackbear: 16 processors, 16 nodes

  4. blackbear: 16 processors, 16 nodes Apart from the MPI_Allreduce calls, this is an almost perfect picture of parallelism.

  5. 2 Processes Per Node • We use both processors on each node.

  6. blackbear 8 nodes, 16 processes Notice the prevalence of MPI_Waitany. Clearly this code is not performing as well as it does when running with 1 process per node.

  7. blackbear 8 nodes, 16 processes (zoom in) I suspect that the threaded MPI communicators for the nonblocking MPI_Isend and MPI_Irecv are competing for CPU time with the user code. Also, there could be competition for the memory bus and the network bus between the two processors.

  8. Timings for M=1024 (N=1024^2) (blackbear, -O3)

  9. Timings for Two Processes Per Node on Los Lobos Timings courtesy of Zhaoxian Zhou
