The HPC Challenge (HPCC) Benchmark Suite

The HPC Challenge (HPCC)Benchmark Suite Piotr Luszczek The MathWorks, Inc. http://icl.cs.utk.edu/hpcc/

HPCC: Components Ax=b ------------------------------------------------------- name kernel bytes/iter FLOPS/iter ------------------------------------------------------- COPY: a(i) = b(i) 16 0 SCALE: a(i) = q*b(i) 16 1 SUM: a(i) = b(i) + c(i) 24 1 TRIAD: a(i) = b(i) + q*c(i) 24 2 ------------------------------------------------------- • HPL (High Performance LINPACK) • STREAM • PTRANS A ←AT+B • RandomAccess • FFT • Matrix-matrix multiply • b_eff (effective bandwidth/latency) 64 bits T[k] (+) ai T: +1 zk=xjexp(-2-1 jk/n) -1 C ← s*C + t * A*B ping pong

HPC Challenge Benchmarks Select Applications HPCC: Motivation and measurement FFT High DGEMM HPL Generated by PMaC @ SDSC Temporal locality Mission partner applications PTRANS RandomAccess STREAM Low Spatial locality High Concept Measurement Spatial and temporal data locality here is for one node/processor - i.e., locally or “in the small.”

M M M M P P P P P P P P HPL STREAM FFT ... RandomAccess(1 m) HPL (25%) system CPU thread G EP S Network Single M M M M P P P P P P P P OpenMP Vectorize Network CPU core(s) Softwaremodules Embarrassingly Parallel Computationalresources M M M M P P P P P P P P MPI Memory Interconnect Network Global HPCC: Scope and naming conventions

HPCC: Hardware probes HPC Challenge Benchmark Corresponding Memory Hierarchy HPCS Performance Targets (improvement) • Top500: solves a system Ax = b • STREAM: vector operations A = B + s x C • FFT: 1D fast Fourier transform Z = FFT(X) • RandomAccess: random updates T(i) = XOR( T(i), r ) Registers 2 Petaflops (8x) Instr. Operands 6.5 Petabytes (40x) Cache Blocks 0.5 Petaflops (200x) Local memory bandwidth latency Messages 64,000 GUPS (2000x) Remote memory Pages Disk • HPCS program has developed a new suite of benchmarks (HPC Challenge). • Each benchmark focuses on a different part of the memory hierarchy. • HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier.

HPCC: Official submission process • Download • Install • Run • Upload results • Confirm via @email@ • Tune • Run • Upload results • Confirm via @email@ Prerequisites: • C compiler • BLAS • MPI Provide detailed installation and execution environment. • Only some routines can be replaced. • Data layout needs to be preserved. • Multiple languages can be used. Results are immediately available on the Web site: • Interactive HTML • XML • MS Excel • Kiviat charts (radar plots) Optional

HPCC: Submissions over time Sum Sum HPL [Tflop/s] FFT [Gflop/s] #1 #1 STREAM [GB/s] Sum Sum RandomAccess [GUPS] #1 #1

HPCC: Comparing three interconnects Kiviat chart (radar plot) • 3 AMD Opteron clusters • Clock: 2.2 GHz • 64-processor cluster • Interconnect types • Vendor • Commodity • GigE • G-HPL • Matrix-matrix multiply • Cannot be differentiated based on • G-HPL • Matrix-matrix multiply • Available on HPCC Web site • http://icl.cs.utk.edu/hpcc/

HPCS ~102 HPC ~104 Clusters ~106 HPCC: Analysis of sample results Peta • All results in words/second • Highlights memory hierarchy • Clusters • Hierarchy steepens • HPC systems • Hierarchy constant • HPCS goals • Hierarchy flattens • Easier to program Tera Effective Bandwidth (words/second) Giga Mega Kilo Systems (in Top500 order)

TOP500 rating Data provided by HPCC database HPCC: Augmenting June TOP500

Contacts Piotr Luszczek The MathWorks, Inc.(508) 647-6767luszczek@eecs.utk.edu

The HPC Challenge (HPCC) Benchmark Suite

The HPC Challenge (HPCC) Benchmark Suite

Presentation Transcript

HPCC Systems Flavio Villanustre VP, Products and Infrastructure HPCC Systems

HPCC Best Practices Exchange Report to COPC November 16, 2010

High Performance Compute Cluster

HPCC Review

Chapel: HPCC Benchmarks

IEEE HPCC, IEEE ICESS, and CSS

HPCC Best Practices Exchange Report to CSAB April 13-14, 2010

2010 @ HPCC A Year in Review

HPCC Systems for Big Data Analytics

HPCC Review

HPCC Review

The ImmeraDesk: An Immersive NGI

Shujia Zhou, Zhiguang Wang, Ran Qi, Yelena Yesha IAB Meeting Research Report June 13, 2013

Harrop-Procter Community Forest Ideals into action

A Climate Data Portal An FY2000 HPCC Proposal

IT/HPCC

HPCC Mid-Morning Break

High Performance Compute Cluster

Immersive Collaborative Virtual Environment

Towards an Agent enabled Gird environment

High Performance Computing Cluster (HPCC)

OceanShare An FY 1999 HPCC Proposal