The HPC Challenge (HPCC) Benchmark Suite
Piotr Luszczek
http://icl.cs.utk.edu/hpcc/
HPCC Components
• HPL (High Performance LINPACK): solves Ax = b
• STREAM:
-------------------------------------------------------
name     kernel                  bytes/iter  FLOPS/iter
-------------------------------------------------------
COPY:    a(i) = b(i)                 16          0
SCALE:   a(i) = q*b(i)               16          1
SUM:     a(i) = b(i) + c(i)          24          1
TRIAD:   a(i) = b(i) + q*c(i)        24          2
-------------------------------------------------------
• PTRANS: A ← A^T + B
• RandomAccess: T[k] ← T[k] XOR a_i, on a table T of 64-bit entries
• FFT: z_k = Σ_j x_j exp(-2π √(-1) jk/n)
• Matrix-matrix multiply: C ← s*C + t*A*B
• b_eff (effective bandwidth/latency): ping pong
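The STREAM table above can be read as a recipe for turning a timed loop into a bandwidth figure. The following is a minimal sketch in pure Python, not the benchmark's own code: the function names, the `KERNELS` dictionary, and the array size are illustrative assumptions, and the byte counts assume 8-byte words as the table implies. Interpreted Python will report numbers far below what the hardware can sustain, so this only illustrates the accounting.

```python
import time

# Bytes moved and floating-point operations per loop iteration,
# copied from the STREAM table on the slide (8-byte words assumed).
KERNELS = {
    "COPY": (16, 0),
    "SCALE": (16, 1),
    "SUM": (24, 1),
    "TRIAD": (24, 2),
}


def stream_copy(a, b):
    """COPY: a(i) = b(i)"""
    for i in range(len(a)):
        a[i] = b[i]


def stream_scale(a, b, q):
    """SCALE: a(i) = q*b(i)"""
    for i in range(len(a)):
        a[i] = q * b[i]


def stream_sum(a, b, c):
    """SUM: a(i) = b(i) + c(i)"""
    for i in range(len(a)):
        a[i] = b[i] + c[i]


def stream_triad(a, b, c, q):
    """TRIAD: a(i) = b(i) + q*c(i)"""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def bandwidth_gb_per_s(kernel, n, seconds):
    """Bytes per iteration times n iterations, divided by elapsed time."""
    bytes_per_iter, _ = KERNELS[kernel]
    return bytes_per_iter * n / seconds / 1e9


def gflop_per_s(kernel, n, seconds):
    """FLOPS per iteration times n iterations, divided by elapsed time."""
    _, flops_per_iter = KERNELS[kernel]
    return flops_per_iter * n / seconds / 1e9


if __name__ == "__main__":
    n = 100_000  # illustrative size, far smaller than a real run would use
    a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
    start = time.perf_counter()
    stream_triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    rate = bandwidth_gb_per_s("TRIAD", n, elapsed)
    print(f"TRIAD: {rate:.3f} GB/s (interpreted Python, not a real measurement)")
```

TRIAD counts 24 bytes per iteration because it reads two 8-byte operands and writes one; COPY counts 16 because it reads one and writes one.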
HPCC: Motivation and Measurement
[Figure: temporal locality (low to high) plotted against spatial locality (low to high), in two panels, Concept and Measurement. Points shown: the HPC Challenge benchmarks (DGEMM, HPL, FFT, PTRANS, RandomAccess, STREAM) and selected mission partner applications. Generated by PMaC @ SDSC.]
Spatial and temporal data locality here is for one node/processor, i.e., locally or "in the small".
HPCC: Scope and Naming Conventions
[Figure: three diagrams of M and P blocks joined by a Network, one per scope: Single, Embarrassingly Parallel, Global.]
• Tests: HPL, STREAM, FFT, ..., RandomAccess(1m), HPL(25%)
• Scope: system, CPU, thread
• Prefixes: G (Global), EP (Embarrassingly Parallel), S (Single)
• Computational resources: CPU core(s), Memory, Interconnect
• Software modules: OpenMP, Vectorize, MPI
HPCC: Hardware Probes
HPC Challenge Benchmark, with HPCS Performance Target (improvement):
• Top500: solves a system Ax = b. Target: 2 Petaflops (8x)
• STREAM: vector operations A = B + s x C. Target: 6.5 Petabyte/s (40x)
• FFT: 1D Fast Fourier Transform Z = FFT(X). Target: 0.5 Petaflops (200x)
• RandomAccess: random updates T(i) = XOR( T(i), r ). Target: 64,000 GUPS (2000x)
Corresponding Memory Hierarchy (bandwidth, latency):
• Registers (Instr. Operands)
• Cache (Blocks)
• Local Memory (Messages)
• Remote Memory (Pages)
• Disk
Summary:
• HPCS program has developed a new suite of benchmarks (HPC Challenge)
• Each benchmark focuses on a different part of the memory hierarchy
• HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier
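The RandomAccess update T(i) = XOR( T(i), r ) and the GUPS unit (giga-updates per second) can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the shift-register generator with feedback constant `POLY`, the seed, the power-of-two table size, and all function names are assumptions chosen for the sketch. The slides only specify a 64-bit random value XORed into a table entry.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # assumed feedback constant for the 64-bit shift-register sequence


def next_random(a):
    """Advance the assumed 64-bit pseudo-random sequence by one step."""
    high_bit = a >> 63           # bit that is shifted out
    a = (a << 1) & MASK64        # keep the value within 64 bits
    return a ^ POLY if high_bit else a


def random_access(table, n_updates, seed=1):
    """Apply n_updates XOR updates T[k] ^= a_i, with k taken from the low bits of a_i."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    a = seed
    for _ in range(n_updates):
        a = next_random(a)
        table[a & (size - 1)] ^= a
    return table


def gups(n_updates, seconds):
    """Giga-updates per second: updates performed divided by elapsed time."""
    return n_updates / seconds / 1e9


if __name__ == "__main__":
    table = list(range(1 << 10))           # small illustrative table
    random_access(table, 4 * len(table))   # update count is an assumption
    random_access(table, 4 * len(table))   # same sequence again undoes it
    print("table restored:", table == list(range(1 << 10)))
```

Because XOR is its own inverse, replaying the same update sequence restores the table, which gives a cheap way to check a run. The access pattern has almost no spatial or temporal locality, which is why the benchmark probes the far end of the memory hierarchy.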
HPCC: Official Submission Process
Base run:
• Download
• Install
• Run
• Upload results
• Confirm via @email@
Optional tuned run:
• Tune
• Run
• Upload results
• Confirm via @email@
Prerequisites:
• C compiler
• BLAS
• MPI
Provide detailed installation and execution environment
Rules for tuning:
• Only some routines can be replaced
• Data layout needs to be preserved
• Multiple languages can be used
Results are immediately available on the web site:
• Interactive HTML
• XML
• MS Excel
• Kiviat charts (radar plots)
HPCC Submissions over Time
[Figure: four panels, HPL [Tflop/s], FFT [Gflop/s], STREAM [GB/s], and RandomAccess [GUPS], each showing two series: Sum and #1.]
HPCC: Comparing 3 Interconnects
Kiviat chart (radar plot)
• 3 AMD Opteron clusters
  • Clock: 2.2 GHz
  • 64-processor cluster
• Interconnect types
  • Vendor
  • Commodity
  • GigE
• Cannot be differentiated based on:
  • G-HPL
  • Matrix-matrix multiply
• Available on HPCC website
  • http://icl.cs.utk.edu/hpcc/
HPCC: Sample Results' Analysis
• All results in words/second
• Highlights memory hierarchy
• Clusters
  • Hierarchy steepens
• HPC systems
  • Hierarchy constant
• HPCS Goals
  • Hierarchy flattens
  • Easier to program
[Figure: Effective Bandwidth (words/second), scale Kilo, Mega, Giga, Tera, Peta, plotted against Systems (in Top500 order). Annotations: HPCS ~10^2, HPC ~10^4, Clusters ~10^6.]
HPCC Augmenting June'07 TOP500
[Figure: TOP500 rating. Data provided by HPCC database.]
Contacts Piotr Luszczek http://icl.cs.utk.edu/hpcc/