
A Look at Application Performance Sensitivity to the Bandwidth and Latency of Infiniband Networks

Darren J. Kerbyson, Performance and Architecture Laboratory (PAL), http://www.c3.lanl.gov/pal, Computer and Computational Sciences Division, Los Alamos National Laboratory.



Presentation Transcript


  1. A Look at Application Performance Sensitivity to the Bandwidth and Latency of Infiniband Networks • Darren J. Kerbyson • Performance and Architecture Laboratory (PAL) • http://www.c3.lanl.gov/pal • Computer and Computational Sciences Division • Los Alamos National Laboratory

  2. Performance and Architecture Lab • Performance analysis team at Los Alamos • Measurement • Modeling • Simulation • Large-scale: • systems (10,000s to 100,000s processors) • applications • Analyze existing systems (or near-to-market systems) • Examine possible future systems • e.g. IBM PERCS (DARPA HPCS), next generation Blue Gene, … • Recent work includes: • Modeling and optimization of ASCI Q (SC03 best paper) • Comparison of systems: e.g. Earth Simulator, & Top 5 (CCPE05) • Blue Gene/L (SC04) • Large-scale Optical Circuit Switch network (SC05)

  3. Assessing impact of network performance • Context • What would the performance improvement be if we had: • a network with higher bandwidth? • a network with lower latency? • Is it worth procuring an enhanced configuration? • Approach • Use application performance models • Application abstraction encapsulating performance-related features • Compute factors: single processor / node performance • Parallel factors: boundary exchanges, collectives etc. • Parameterized in terms of system characteristics • Node characteristics • Network characteristics (inc. bandwidth and latency)
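The modeling approach on this slide can be illustrated with a deliberately simplified sketch. It is not the PAL models themselves, which the transcript does not reproduce; it only shows the idea of separating a compute term from a communication term parameterized by latency and bandwidth. The names `Network`, `message_time`, and `iteration_time` are hypothetical, and the linear latency-plus-size-over-bandwidth cost is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Hypothetical container for the two network parameters under study."""
    latency_s: float       # near-neighbor latency, in seconds
    bandwidth_Bps: float   # achievable bandwidth, in bytes per second


def message_time(net: Network, nbytes: int) -> float:
    """Assumed linear cost of one message: latency + size / bandwidth."""
    return net.latency_s + nbytes / net.bandwidth_Bps


def iteration_time(compute_s: float, n_messages: int, nbytes: int,
                   net: Network) -> float:
    """Compute factor plus parallel factor (here: n equal-sized messages)."""
    return compute_s + n_messages * message_time(net, nbytes)


if __name__ == "__main__":
    # Illustrative numbers only: 1 ms of compute and ten 1 KB exchanges.
    slow = Network(latency_s=4e-6, bandwidth_Bps=0.8e9)
    fast = Network(latency_s=1.5e-6, bandwidth_Bps=0.8e9)
    print(iteration_time(1e-3, 10, 1024, slow))
    print(iteration_time(1e-3, 10, 1024, fast))
```

Once such a model is validated against measurements, changing only the `Network` values gives a "what if" estimate for a system that has not been built or bought yet.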

  4. Applications • Three applications of interest to Los Alamos: • Sweep3D: kernel application representing the heart of a deterministic SN transport calculation • SAGE: AMR hydrocode for shock propagation • Partisn: Deterministic SN transport code • Performance models previously developed • Validated on large-scale systems including: • Blue Gene/L (Lawrence Livermore) up to 32K nodes • Red Storm (Sandia) up to 8K processors • ASCI Q (Los Alamos) up to 8K processors • Typical ~10% error • Once validated can be used to explore performance on new systems

  5. Application characteristics

  6. Network Characteristics • Latency of 4µs seems optimistic (currently) • Latency of 1.5µs is close to PathScale (1.29µs) • Achievable bandwidth assumed is ~80% of peak • Infiniband fabric assumed to be a 12-ary fat-tree with switch latency of 200ns.
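The "~80% of peak" assumption can be turned into concrete numbers once peak rates are fixed. The slide's own table is not in the transcript, so the sketch below assumes single-data-rate InfiniBand data rates of 1, 2, and 3 GB/s for 4x, 8x, and 12x links; those values, the constant names, and the helper functions are assumptions for illustration. The number of switch hops depends on where two nodes sit in the fat-tree, so it is left as an input.

```python
# Assumed peak data rates in GB/s (SDR InfiniBand after 8b/10b encoding).
# These are NOT taken from the slides; the original table was not captured.
PEAK_GBYTES_PER_S = {"4x": 1.0, "8x": 2.0, "12x": 3.0}

ACHIEVABLE_FRACTION = 0.80   # from the slide: ~80% of peak
SWITCH_LATENCY_S = 200e-9    # from the slide: 200 ns per switch


def achievable_bandwidth(width: str) -> float:
    """Achievable bandwidth in GB/s for a given link width."""
    return PEAK_GBYTES_PER_S[width] * ACHIEVABLE_FRACTION


def switch_latency(hops: int) -> float:
    """Latency added by traversing `hops` switches, in seconds."""
    return hops * SWITCH_LATENCY_S


if __name__ == "__main__":
    for width in ("4x", "8x", "12x"):
        print(width, achievable_bandwidth(width), "GB/s")
    print("3 switch hops add", switch_latency(3), "s")
```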

  7. Performance studies • Sensitivity to network bandwidth and latency • 4x, 8x, & 12x bandwidths • 4µs, & 1.5µs near-neighbor latency • Effect of node size • Varying the number of processors in a node • Assumes single-core but applicable to multi-core • Assumes node: 2GHz AMD Opterons • Use of measured single processor performance • Vary system size • From 1 processor up to 8,192 processors • Concentrate on 256, 512 and 1024 processor clusters
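The study is effectively a sweep over a small parameter grid. A minimal sketch of enumerating that grid follows; the node sizes 1, 2, 4, and 8 are an assumption (the slides only say "1 to 8 way"), and the dictionary keys are hypothetical names.

```python
from itertools import product

bandwidths = ["4x", "8x", "12x"]   # link widths from the slide
latencies_us = [4.0, 1.5]          # near-neighbor latencies from the slide
node_sizes = [1, 2, 4, 8]          # assumed steps within "1 to 8 way"
cluster_sizes = [256, 512, 1024]   # processor counts the study concentrates on

configs = [
    {
        "bandwidth": bw,
        "latency_us": lat,
        "node_size": ways,
        "processors": procs,
        "nodes": procs // ways,    # number of nodes (and NICs) in the cluster
    }
    for bw, lat, ways, procs in product(
        bandwidths, latencies_us, node_sizes, cluster_sizes
    )
]

if __name__ == "__main__":
    print(len(configs), "configurations")
    print(configs[0])
```

Each configuration would then be fed to an application model to predict a runtime.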

  8. Communication cost - example (panels: Sweep3D, SAGE) • 4x, 8x, 12x IB with near-neighbor latency of 4µs • 4-way nodes

  9. Performance sensitivity: Partisn • Relative to a baseline configuration: • 4-way, 4x IB with 4µs latency • X-axis indicates node-size sensitivity (1 to 8 way) • Bar height indicates bandwidth sensitivity • 4x = lowest bar value • 12x = highest bar value • 8x = white ‘mid’ line • Difference in solid and shaded bars indicates latency sensitivity (4µs & 1.5µs)
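How one bar in these charts could be derived is sketched below. It assumes that relative performance means baseline runtime divided by configuration runtime, which the slide does not state explicitly; the function names and the example runtimes are hypothetical.

```python
def relative_performance(baseline_time_s: float, config_time_s: float) -> float:
    """Assumed definition: values above 1.0 mean faster than the baseline."""
    return baseline_time_s / config_time_s


def bar_for_node_size(baseline_time_s: float,
                      times_by_bandwidth: dict) -> dict:
    """Summarize one bar: 4x at the bottom, 8x as the mid line, 12x at the top."""
    rel = {
        bw: relative_performance(baseline_time_s, t)
        for bw, t in times_by_bandwidth.items()
    }
    return {"bottom": rel["4x"], "mid": rel["8x"], "top": rel["12x"]}


if __name__ == "__main__":
    # Made-up runtimes in seconds for one node size at one latency.
    bar = bar_for_node_size(10.0, {"4x": 10.0, "8x": 9.0, "12x": 8.0})
    print(bar)
```

Computing the same bar for both the 4µs and the 1.5µs latency gives the solid and shaded pair described on the slide.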

  10. Performance sensitivity: Partisn • 512 processor cluster • Highest sensitivity to node-size • Multiple processors sharing NIC • More sensitive to bandwidth than latency

  11. Performance sensitivity: Sweep3D • 512 processor cluster • Highest sensitivity to latency • Most messages are small (~1KB) • Similar sensitivity to bandwidth and to node-size
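A back-of-the-envelope calculation suggests why ~1KB messages make latency the dominant term. The sketch assumes a linear cost of latency plus size over bandwidth and achievable bandwidths of 0.8 GB/s (4x) and 2.4 GB/s (12x); the bandwidth values are assumptions, since the slide table is not in the transcript.

```python
MESSAGE_BYTES = 1024   # "~1KB" from the slide

# Assumed achievable bandwidths in bytes/second (80% of assumed SDR peak).
BW_4X = 0.8e9
BW_12X = 2.4e9


def message_time(latency_s: float, bandwidth_Bps: float,
                 nbytes: int = MESSAGE_BYTES) -> float:
    """Assumed linear cost model for a single message."""
    return latency_s + nbytes / bandwidth_Bps


baseline = message_time(4e-6, BW_4X)        # 4x link, 4 us latency
lower_latency = message_time(1.5e-6, BW_4X)   # same link, 1.5 us latency
more_bandwidth = message_time(4e-6, BW_12X)   # 12x link, 4 us latency

latency_gain = baseline / lower_latency
bandwidth_gain = baseline / more_bandwidth

if __name__ == "__main__":
    print(f"baseline: {baseline * 1e6:.2f} us")
    print(f"gain from 1.5 us latency: {latency_gain:.2f}x")
    print(f"gain from 12x bandwidth:  {bandwidth_gain:.2f}x")
```

Under these assumptions the wire time of a 1KB message is well below the 4µs latency, so cutting latency helps each message far more than tripling bandwidth does.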

  12. Performance sensitivity: SAGE • 512 processor cluster • Similar sensitivity to bandwidth and node size (1 to 4-way) • No change from 4 to 8-way due to application effect • Little sensitivity to latency

  13. Sensitivity summary • Says nothing about cost, or relative workload usage

  14. Conclusions • Performance improvement from an enhanced network is application dependent • Bandwidth on SAGE • Latency on Sweep3D • Mixture (node size and bandwidth) on Partisn • Compute performance dampens any performance enhancement of the network • Faster processors would increase performance sensitivity to the network • Performance modeling can be used to assess configurations prior to procurement
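The dampening effect in the conclusions can be illustrated with an Amdahl-style estimate. This is a generic sketch, not the models used in the talk: it assumes runtime splits cleanly into a compute part and a communication part with no overlap, and the function names and example fractions are hypothetical.

```python
def overall_speedup(comm_fraction: float, network_speedup: float) -> float:
    """Whole-application speedup when only communication gets faster."""
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / network_speedup)


def comm_fraction_after_cpu_speedup(comm_fraction: float,
                                    cpu_speedup: float) -> float:
    """New communication share of runtime once compute runs faster."""
    compute = (1.0 - comm_fraction) / cpu_speedup
    return comm_fraction / (compute + comm_fraction)


if __name__ == "__main__":
    f = 0.2   # made-up: 20% of runtime spent communicating
    print(overall_speedup(f, 2.0))          # network twice as fast
    f_fast_cpu = comm_fraction_after_cpu_speedup(f, 2.0)
    print(f_fast_cpu)                       # share grows with faster CPUs
    print(overall_speedup(f_fast_cpu, 2.0)) # same network gain matters more
```

With these illustrative numbers, doubling network speed yields about 1.11x overall, but about 1.2x once the processors are twice as fast, which matches the direction of the slide's claim.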
