Energy Profiling and Analysis of the HPC Challenge Benchmarks
Shuaiwen Song, Hung-ching Chang, Rong Ge†, Xizhou Feng†, Dong Li, and Kirk W. Cameron
Scalable Performance Laboratory, Department of Computer Science, Virginia Tech
s562673@vt.edu, hcchang@vt.edu, lid@cs.vt.edu, rong.ge@marquette.edu, xizhou.feng@gmail.com, cameron@vt.edu
† Also affiliated with Marquette University.

Key Findings
(1) This work identifies power profiles at the system-component and application-function levels.
(2) This work reveals the correlation between spatio-temporal locality and energy use for these benchmarks.
(3) This work explores the relationship between scalability and energy use for high-end systems.

System G and PowerPack 2.0: What makes System G so Green?
System G (Green) provides a research platform for the development of high-performance software tools and applications with extreme efficiency at scale.

System G Stats
• 325 Mac Pro compute nodes, each with two quad-core 2.8 GHz Intel Xeon processors.
• Each node has 8 GB of RAM; each core has a 6 MB cache.
• Mellanox 40 Gb/s end-to-end InfiniBand adapters and switches.
• LINPACK result: 22.8 TFLOPS (trillion floating-point operations per second).
• Over 10,000 power and thermal sensors.
• Variable power modes: DVFS control, fan-speed control, concurrency throttling, and dynamic system temperature control.
• Intelligent power distribution units (Dominion PX).

About the HPC Challenge Benchmarks
The HPC Challenge (HPCC) benchmarks are specifically designed to stress aspects of application and system design ignored by the NAS benchmarks and LINPACK, to aid in system procurement and evaluation. HPCC organizes the benchmarks into four categories; each category represents a type of memory access pattern characterized by the benchmarks' spatial and temporal locality. We use a classification scheme to separate the performance phases that make up the HPCC benchmark suite, as shown in the table:
1. Local (single processor)
2. Star (embarrassingly parallel)
3. Global (explicit parallel data communication)

The PowerPack 2.0 Framework
Components:
• Hardware power/energy profiling
• Software power/energy profiling control
• Software system power/energy control
• Data collection/fusion/analysis
• System under test
Main features:
a) Direct measurement of the power consumption of a system's major components (e.g., CPU, memory, and disk) and/or an entire computing unit.
b) Automatic logging of power profiles and synchronization with application source code (a minimal sketch of this idea follows below).
c) Scalable, fast, and accurate.
[Figure: the PowerPack framework.]
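Feature (b) above, the synchronization of power logs with application source code, can be pictured with a minimal, self-contained sketch. The code below is not the PowerPack API; it simply writes timestamped phase markers from the application so that a power trace recorded by external meters can later be aligned with code regions by timestamp. The function name marker_log and the file name phase_markers.log are illustrative assumptions, not part of PowerPack.

```c
/*
 * Minimal sketch (not the PowerPack API): emit timestamped phase markers
 * so an externally logged power trace can be aligned with code regions.
 * Names such as marker_log() and "phase_markers.log" are illustrative.
 */
#include <stdio.h>
#include <time.h>

static FILE *marker_fp;

/* Append "<seconds>.<nanoseconds> <event> <label>" to the marker log. */
static void marker_log(const char *event, const char *label)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    fprintf(marker_fp, "%ld.%09ld %s %s\n",
            (long)ts.tv_sec, ts.tv_nsec, event, label);
    fflush(marker_fp);   /* keep markers on disk even if the run is cut short */
}

int main(void)
{
    marker_fp = fopen("phase_markers.log", "w");
    if (!marker_fp) { perror("phase_markers.log"); return 1; }

    marker_log("BEGIN", "compute_phase");
    /* Stand-in for an instrumented region, e.g., an HPL or DGEMM call. */
    volatile double sum = 0.0;
    for (long i = 0; i < 100000000L; i++)
        sum += (double)i * 1e-9;
    marker_log("END", "compute_phase");

    fclose(marker_fp);
    printf("checksum %.3f; markers written to phase_markers.log\n", sum);
    return 0;
}
```

Joining such markers with a separately recorded power trace by timestamp yields per-phase power segments like those shown in the function-level results below.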
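Given an aligned per-component power trace, per-component energy follows by integrating power over time (E ≈ Σ ½(P_i + P_{i+1}) Δt), and average power is energy divided by elapsed time. The sketch below assumes a simple CSV format (time_s,cpu_w,mem_w,disk_w,mobo_w, with no header row) that is not prescribed by PowerPack; it only illustrates the kind of post-processing behind the per-component numbers in the results that follow.

```c
/*
 * Post-processing sketch: integrate a per-component power trace into energy.
 * Assumed CSV columns (illustrative, not a PowerPack format), no header row:
 *   time_s,cpu_w,mem_w,disk_w,mobo_w
 */
#include <stdio.h>

#define NCOMP 4

int main(void)
{
    const char *name[NCOMP] = { "CPU", "Memory", "Disk", "Motherboard" };
    double energy[NCOMP] = { 0 };             /* joules per component */
    double p_prev[NCOMP] = { 0 };
    double t, p[NCOMP];
    double t_prev = 0, t_start = 0, t_end = 0;
    int n = 0;

    FILE *fp = fopen("power_trace.csv", "r");
    if (!fp) { perror("power_trace.csv"); return 1; }

    while (fscanf(fp, "%lf,%lf,%lf,%lf,%lf",
                  &t, &p[0], &p[1], &p[2], &p[3]) == 5) {
        if (n > 0) {
            double dt = t - t_prev;           /* seconds between samples */
            for (int i = 0; i < NCOMP; i++)   /* trapezoidal rule */
                energy[i] += 0.5 * (p[i] + p_prev[i]) * dt;
        } else {
            t_start = t;
        }
        for (int i = 0; i < NCOMP; i++) p_prev[i] = p[i];
        t_prev = t;
        t_end = t;
        n++;
    }
    fclose(fp);

    double span = t_end - t_start;            /* elapsed time in seconds */
    for (int i = 0; i < NCOMP; i++)
        printf("%-12s energy = %10.1f J, avg power = %6.1f W\n",
               name[i], energy[i], span > 0 ? energy[i] / span : 0.0);
    return 0;
}
```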
Results I: Power Profiling and Analysis
We present detailed power, energy, and performance profiling and analysis of the global HPCC benchmarks, including scalability tests, parallel efficiency, and power-to-function mapping.

HPCC Power Profile of a Full Benchmark Run
The power signature of each application is unique. In the figure below, power consumption is separated by major computing component: CPU, memory, disk, and motherboard. These four components capture nearly all of the system's dynamic power usage. A full run of HPCC consists of the micro-benchmark tests in the following order: 1. PTRANS; 2. HPL; 3. Star DGEMM + single DGEMM; 4. Star STREAM; 5. MPI_RandomAccess; 6. Star_RandomAccess; 7. Single_RandomAccess; 8. MPI_FFT, Star_FFT, single FFT, and latency/bandwidth.
[Figure: a snapshot of the HPCC power profile over a full benchmark run.]
[Figure: detailed power profiles for four global HPCC benchmarks across eight computing nodes (32 cores).]

Analysis:
1) Each test in the benchmark suite stresses processor and memory power in proportion to its use of those components. For example, because Global HPL and Star DGEMM have high temporal and spatial locality, they spend little time waiting on data and stress the processor's floating-point execution units intensively, consuming more processor power than the other tests.
2) Changes in processor and memory power profiles correlate with communication-to-computation ratios. Power varies for global tests such as PTRANS, HPL, and MPI_FFT because of their alternating computation and communication phases.
3) Disk power and motherboard power are relatively stable across all tests.
4) Processors consume more power during GLOBAL and STAR tests, since these use all processor cores in the computation; LOCAL tests use only one core per node and thus consume less energy.

Results II: Detailed Function-Level Analysis
[Figure: detailed power-function mapping of MPI_FFT in HPCC, with phases amplified at the function level.]

Spatio-Temporal Locality vs. Average Power Use
• HPCC is designed to stress all aspects of a high-performance system, including CPU, memory, disk, and network. We characterized the HPCC results based on data locality.
• Because lower temporal and spatial locality imply higher average memory access delays, applications with (low, low) temporal-spatial locality use less power on average.
• Because higher temporal and spatial locality imply lower average memory access delays, applications with (high, high) temporal-spatial locality use more power on average.
• Mixed temporal and spatial locality yields results that fall between the average power ranges of the (high, high) and (low, low) temporal-spatial locality codes.

Energy Profiling and Efficiency under Strong and Weak Scaling of HPCC
[Figures: strong-scaling and weak-scaling results.]
The scaling results show that parallel computation changes the locality of data accesses and thereby changes the power profiles of the major computing components over the execution of the benchmarks.

Conclusions
• Each application has a unique power profile, characterized by the distribution of power among major system components.
• The power profiles of the HPCC benchmark suite reveal power boundaries for real applications.
• Energy efficiency is a critical issue in high-performance computing that requires further study, since interactions between hardware and applications affect power usage dramatically.

Publications
Portions of this work have appeared in the following publications:
• Shuaiwen Song, Rong Ge, Xizhou Feng, and Kirk W. Cameron, "Energy Profiling and Analysis of HPC Challenge Benchmarks," International Journal of High Performance Computing Applications, Vol. 23, No. 3, pp. 265-276, 2009.
• Rong Ge, Xizhou Feng, Shuaiwen Song, Hung-Ching Chang, Dong Li, and Kirk W. Cameron, "PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications," IEEE Transactions on Parallel and Distributed Systems, to appear, 2009.

Acknowledgments
The authors thank the National Science Foundation for its support of this work under grants CCF #0848670, CNS #0720750, and CNS #0709025.