An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats Joseph P. White, Ph.D., Scientific Programmer, Center for Computational Research, University at Buffalo, SUNY. XSEDE14, July 13–18, 2014
Outline • Motivation • Overview of tools (XDMoD, tacc_stats) • Background • Results • Conclusions • Discussion
Co-authors • Robert L. DeLeon (UB) • Thomas R. Furlani (UB) • Steven M. Gallo (UB) • Matthew D. Jones (UB) • Amin Ghadersohi (UB) • Cynthia D. Cornelius (UB) • Abani K. Patra (UB) • James C. Browne (UTexas) • William L. Barth (TACC) • John Hammond (TACC)
Motivation • Node sharing benefits: • increases throughput by up to 26% • increases energy efficiency by up to 22% (Breslow et al.) • Node sharing disadvantages: • resource contention • The number of cores per node keeps increasing • Ulterior motive: • prove out the toolset • A. D. Breslow, L. Porter, A. Tiwari, M. Laurenzano, L. Carrington, D. M. Tullsen, and A. E. Snavely. The case for colocation of HPC workloads. Concurrency and Computation: Practice and Experience, 2013. http://dx.doi.org/10.1002/cpe.3187
Tools • XDMoD • NSF-funded open source tool that provides a wide range of usage and performance metrics on XSEDE systems • Web-based interface • Powerful charting features • tacc_stats • Low-overhead collection of system-wide performance data • Runs on every node of a resource; collects data at job start, at job end, and periodically during the job (see the aggregation sketch below) • CPU usage • Hardware performance counters • Memory usage • I/O usage
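To make the periodic collection concrete, here is a minimal sketch of how per-node counter samples can be reduced to a per-job average rate. This is not the actual tacc_stats/XDMoD pipeline; the sample layout (timestamp, node, cumulative counter value) and node names are assumptions for illustration.

```python
# Hypothetical sketch: reduce periodic per-node counter samples to a per-job
# average rate. Not the real tacc_stats/XDMoD code; record layout is assumed.
from collections import defaultdict

def job_average_rate(samples, job_nodes, job_start, job_end):
    """samples: iterable of (timestamp, node, cumulative_counter_value).
    Returns the mean per-node rate (counts/second) over the job's wall time."""
    per_node = defaultdict(list)
    for ts, node, value in samples:
        if node in job_nodes and job_start <= ts <= job_end:
            per_node[node].append((ts, value))

    rates = []
    for series in per_node.values():
        series.sort()
        (t0, v0), (t1, v1) = series[0], series[-1]
        if t1 > t0:
            rates.append((v1 - v0) / (t1 - t0))
    return sum(rates) / len(rates) if rates else 0.0

# Example: two nodes sampled every 600 s during a 1200 s job.
samples = [(0, "cpn-01", 0), (600, "cpn-01", 3.0e9), (1200, "cpn-01", 6.0e9),
           (0, "cpn-02", 0), (600, "cpn-02", 2.5e9), (1200, "cpn-02", 5.0e9)]
print(job_average_rate(samples, {"cpn-01", "cpn-02"}, 0, 1200))
```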
Background • CCR's HPC resource "Rush" • 8000+ cores • Heterogeneous cluster with 8, 12, 16, or 32 cores per node • InfiniBand interconnect • Panasas parallel filesystem • SLURM resource manager • node sharing enabled by default • cgroup plugin used to isolate jobs • Academic computing center: higher % of smaller jobs than large XSEDE resources • All data from Jan–Feb 2014 (~370,000 jobs)
Results • Exclusive jobs: no other jobs ran concurrently on the allocated node(s) (left hand side of plots) • Shared jobs: at least one other job was running concurrently on the allocated node(s) (right hand side; see the classification sketch below) • Process memory usage • Total OS memory usage • LLC read miss rates • Job exit status • Parallel filesystem bandwidth • InfiniBand interconnect bandwidth
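A minimal sketch of the exclusive/shared split defined above, assuming each job record carries a node list and start/end times; the field names are hypothetical, not the actual XDMoD schema, and the quadratic scan is only for clarity.

```python
# Hypothetical sketch of the exclusive/shared classification described above.
# A job is "exclusive" if no other job overlapped in time on any of its
# allocated nodes; otherwise it is "shared". Job fields are assumptions.
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    nodes: frozenset
    start: float
    end: float

def classify(jobs):
    """Return {job_id: 'exclusive' | 'shared'}."""
    labels = {}
    for j in jobs:
        shared = any(
            o.id != j.id
            and o.nodes & j.nodes                     # same node(s)
            and o.start < j.end and j.start < o.end   # overlapping in time
            for o in jobs
        )
        labels[j.id] = "shared" if shared else "exclusive"
    return labels

jobs = [Job(1, frozenset({"cpn-01"}), 0, 100),
        Job(2, frozenset({"cpn-01"}), 50, 150),   # overlaps job 1 on cpn-01
        Job(3, frozenset({"cpn-02"}), 0, 100)]
print(classify(jobs))  # {1: 'shared', 2: 'shared', 3: 'exclusive'}
```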
Memory usage per core • Computed as (MemUsed - FilePages - Slab) from /sys/devices/system/node/node0/meminfo (sketched below) • [Histograms: memory usage per core (GB), exclusive jobs vs. shared jobs]
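A minimal sketch of how the (MemUsed - FilePages - Slab) metric can be read from the per-NUMA-node meminfo files named on the slide. Dividing by the number of cores per node to get a per-core figure, and the default of 8 cores, are assumptions for illustration rather than details taken from the slides.

```python
# Sketch: compute (MemUsed - FilePages - Slab) from per-NUMA-node meminfo
# files such as /sys/devices/system/node/node0/meminfo, whose lines look like
# "Node 0 MemUsed:  12345678 kB". The per-core division is an assumption.
import glob
import re

def node_meminfo(path):
    """Parse one per-node meminfo file into {field_name: value_in_kB}."""
    fields = {}
    with open(path) as f:
        for line in f:
            m = re.match(r"Node\s+\d+\s+(\w+):\s+(\d+)\s*kB", line)
            if m:
                fields[m.group(1)] = int(m.group(2))
    return fields

def app_memory_gb_per_core(cores_per_node=8):
    total_kb = 0
    for path in glob.glob("/sys/devices/system/node/node*/meminfo"):
        f = node_meminfo(path)
        total_kb += f["MemUsed"] - f["FilePages"] - f["Slab"]
    return total_kb / (1024.0 ** 2) / cores_per_node  # kB -> GB, per core

if __name__ == "__main__":
    print(app_memory_gb_per_core())
```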
Total memory usage per core (4 GB/core nodes) • [Histograms: total memory usage per core (GB), exclusive jobs vs. shared jobs]
Last level cache (LLC) read miss rate per socket • UNC_LLC_MISS:READ on the Intel Westmere uncore • Gives an upper-bound estimate of DRAM bandwidth (see the conversion sketch below) • [Histograms: LLC read miss rate (10⁶/s), exclusive jobs vs. shared jobs]
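A brief illustration of the "upper bound estimate of DRAM bandwidth" bullet: if each LLC read miss pulls at most one 64-byte cache line from memory, the miss rate converts to a read-bandwidth bound as below. The example rate is illustrative, not a value from the plots.

```python
# Sketch of the upper-bound DRAM read bandwidth implied by the LLC miss rate:
# assume each LLC read miss fetches one 64-byte cache line from memory.
CACHE_LINE_BYTES = 64

def dram_read_bw_upper_bound_gbs(llc_read_misses_per_s):
    """Return an upper-bound estimate of DRAM read bandwidth in GB/s."""
    return llc_read_misses_per_s * CACHE_LINE_BYTES / 1e9

# e.g. 100e6 misses/s per socket corresponds to at most ~6.4 GB/s of reads
print(dram_read_bw_upper_bound_gbs(100e6))
```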
Panasas parallel filesystem write rate per node • [Histograms: write rate per node (B/s), exclusive jobs vs. shared jobs]
InfiniBand write rate per node • Peaks truncated: ~45,000 for exclusive jobs, ~80,000 for shared jobs • [Histograms: write rate per node, log₁₀(B/s), exclusive jobs vs. shared jobs]
Conclusions • Little difference on average between shared and exclusive jobs on Rush • The majority of jobs use far less of each resource than the maximum available • We have created data collection and processing software that makes it easy to evaluate system usage
Discussion • Limitations of current work: • Unable to determine the impact (if any) on job wall time • Comparing overall average values for jobs • Statistics for jobs on shared nodes are convolved with those of their co-located jobs • Exit code is not a reliable way to determine job failure
Future work • Use Application Kernels to get a detailed analysis of interference • Many more metrics now available (see the sketch below): • FLOPS • CPU clock cycles per instruction (CPI) • CPU clock cycles per L1D cache load (CPLD) • Add support for per-job metrics on shared nodes • Study classes of applications
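A small sketch of how the derived metrics named above follow from raw hardware counter totals. The input names (cycles, instructions, l1d_loads, flops) are generic placeholders, not actual tacc_stats counter names, and the example values are illustrative only.

```python
# Sketch of the derived metrics listed above, computed from raw counter
# totals accumulated over a job. Input names are placeholders, not the
# actual tacc_stats counter identifiers.
def derived_metrics(cycles, instructions, l1d_loads, flops, wall_time_s):
    return {
        "FLOPS": flops / wall_time_s,   # floating-point operations per second
        "CPI":   cycles / instructions, # clock cycles per instruction
        "CPLD":  cycles / l1d_loads,    # clock cycles per L1D cache load
    }

print(derived_metrics(cycles=3.0e12, instructions=2.0e12,
                      l1d_loads=8.0e11, flops=1.5e12, wall_time_s=1000))
```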
Questions • BOF: XDMoD: A Tool for Comprehensive Resource Management of HPC Systems • 6:00pm–7:00pm tomorrow, Room A602 • XDMoD: https://xdmod.ccr.buffalo.edu/ • tacc_stats: http://github.com/TACCProjects/tacc_stats • Contact: xdmod-help@ccr.buffalo.edu
Acknowledgments • This work is supported by the National Science Foundation under grant numbers OCI 1203560 and OCI 1025159 for the Technology Audit Service (TAS) for XSEDE