Extending PAPI to Multiple Measurement Domains
Jack Dongarra, Kevin London, Shirley Moore, Philip Mucci, Daniel Terpstra, and Haihang You
University of Tennessee and Oak Ridge National Laboratory
Motivation
• Increasing CPU speeds and densities place greater importance on:
  • Thermal health and management
  • Power consumption
• Higher processor counts make communication metrics more critical:
  • Bandwidth
  • Latency
  • Dropped packets
  • Bytes transferred
• No industry-standard interfaces exist to measure these metrics.
• Hybrid machines require simultaneous access to multiple processor counter substrates.
PAPI 3.0 Design
Layers, top to bottom:
• Portable layer (hardware independent): PAPI High Level, PAPI Low Level
• Machine-specific layer: PAPI Machine Dependent Substrate, Kernel Extension, Operating System
• Hardware Performance Counters
PAPI 4.0 Multiple Substrate Design
Layers, top to bottom:
• Portable layer (hardware independent): PAPI High Level, PAPI Low Level
• Machine-specific layer, one stack per substrate, each with its own Kernel Extension and Operating System support:
  • PAPI CPU Dependent Substrate
  • PAPI Machine Dependent Substrate
  • PAPI Network Dependent Substrate
• Hardware: Hardware Performance Counters and Off-Processor Hardware Counters
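The multi-substrate idea can be illustrated with a minimal Python sketch: one portable front end routes each requested event to whichever substrate claims it, so CPU counters, a thermal sensor, and a network counter can be read in a single call. Everything here is hypothetical: the class and method names (`Substrate`, `PortableLayer`, `read_all`), the `NET_SENDS` event name, and the simulated values are stand-ins, not PAPI's actual internal interface.

```python
from typing import Dict, List


class Substrate:
    """Hypothetical machine-dependent component that owns a set of event names."""

    def __init__(self, name: str, events: Dict[str, int]):
        self.name = name
        self._events = events  # event name -> current (simulated) counter value

    def provides(self, event: str) -> bool:
        return event in self._events

    def read(self, event: str) -> int:
        # A real substrate would query hardware, the kernel, or a NIC here.
        return self._events[event]


class PortableLayer:
    """Hypothetical hardware-independent layer that routes events to substrates."""

    def __init__(self, substrates: List[Substrate]):
        self.substrates = substrates

    def substrate_for(self, event: str) -> Substrate:
        for sub in self.substrates:
            if sub.provides(event):
                return sub
        raise KeyError(f"no substrate provides {event}")

    def read_all(self, events: List[str]) -> Dict[str, int]:
        # One call spans CPU, thermal, and network substrates alike.
        return {ev: self.substrate_for(ev).read(ev) for ev in events}


# Simulated values only.
cpu = Substrate("cpu", {"PAPI_FP_OPS": 1_000_000, "PAPI_TOT_CYC": 4_200_000})
acpi = Substrate("acpi", {"ACPI_TEMP": 52})
net = Substrate("net", {"NET_SENDS": 17})  # hypothetical network event name

layer = PortableLayer([cpu, acpi, net])
values = layer.read_all(["PAPI_FP_OPS", "ACPI_TEMP", "NET_SENDS"])
print(values)
```

The design choice being illustrated is that callers never name a substrate: adding a new measurement domain means registering one more substrate, while the portable layer stays unchanged.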
Multiple Measurements
• HPCC HPL benchmark on Opteron with 3 performance metrics:
  • FLOPS, Temperature, Network Sends/Receives
• Temperature is from an on-chip thermal diode
For More Information
• http://icl.cs.utk.edu/papi/
  • Software and documentation
  • Reference materials
  • Papers and presentations
  • Third-party tools
  • Mailing lists
• Team members:
  • Jack Dongarra, Kevin London, Shirley Moore, Philip Mucci, Daniel Terpstra, Haihang You
Correlating Temperature and PAPI Events
• Can Multi-Substrate PAPI be used to correlate temperature with PAPI presets?
• Measure temperature and all 42 PAPI presets on an Opteron cluster across the HPCC suite.
• Statistically examine the results for correlations using cluster analysis and principal component analysis.
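A simple first pass at correlating temperature with each preset is a Pearson coefficient between the temperature series and each event's series, ranked by magnitude. This is a sketch under assumptions: the slides do not say which correlation measure was used, the samples are assumed to be already aligned (one temperature reading per logged sample), and the numbers are invented for illustration.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def rank_by_correlation(temp: List[float],
                        events: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Events sorted by |r| with temperature, strongest first."""
    scored = [(name, pearson(series, temp)) for name, series in events.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Illustrative data only, not measurements from the study.
temp = [50.0, 51.0, 53.0, 54.0, 56.0]
events = {
    "PAPI_TOT_INS": [10.0, 12.0, 15.0, 16.0, 19.0],  # rises with temperature
    "PAPI_FPU_IDL": [9.0, 8.0, 6.0, 5.0, 3.0],       # falls as temperature rises
    "PAPI_HW_INT": [4.0, 4.0, 4.0, 4.0, 4.0],        # flat, uncorrelated
}
for name, r in rank_by_correlation(temp, events):
    print(f"{name:14s} r = {r:+.3f}")
```

The sign of `r` separates directly from inversely proportional events; ranking by `abs(r)` keeps both kinds near the top.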
Dendrogram of Temperature and PAPI Events
• Cluster analysis shows 8 PAPI preset events whose behavior is similar to temperature.
• Half are L2 cache related.
• The other four are:
  • Resource stalls
  • Hardware interrupts
  • TLB misses
  • Total cycles
[Figure: dendrogram with leaves ACPI_TEMP, PAPI_TLB_TL, PAPI_TOT_CYC, PAPI_HW_INT, PAPI_RES_STL, PAPI_L2_TCM, PAPI_L2_STM, PAPI_L2_DCM, PAPI_L2_DCR]
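A dendrogram like this comes from agglomerative (hierarchical) clustering: start with every metric in its own cluster and repeatedly join the two closest clusters. The slides do not state the distance or linkage used, so this sketch assumes correlation distance d = 1 - r and average linkage; the data are invented so that temperature pairs with an L2 event first.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def agglomerate(series: Dict[str, List[float]]) -> List[Merge]:
    """Average-linkage clustering on correlation distance d = 1 - r.

    Returns the merge steps in order; each step is one join in the dendrogram.
    """
    names = list(series)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(series[a], series[b])
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [frozenset([n]) for n in names]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean pairwise distance between members.
            d = sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges


# Illustrative series only.
data = {
    "ACPI_TEMP":   [50.0, 51.0, 53.0, 54.0, 56.0],
    "PAPI_L2_DCM": [20.0, 21.0, 24.0, 25.0, 28.0],
    "PAPI_L1_DCA": [90.0, 70.0, 95.0, 60.0, 80.0],
    "PAPI_FP_INS": [88.0, 69.0, 97.0, 61.0, 79.0],
}
for left, right, d in agglomerate(data):
    print(sorted(left), "+", sorted(right), f"at d = {d:.3f}")
```

The merge heights (`d`) are what a dendrogram draws on its vertical axis; events that join the temperature leaf at a low height are the ones reported as behaving like temperature.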
Principal Component Analysis
• Simplifies a dataset by transforming it to a new coordinate system.
• The first principal component captures the greatest variance; each later component captures the greatest remaining variance.
• In this example, the first two components contain the bulk of the temperature variance.
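PCA's components are the eigenvectors of the metrics' covariance matrix, ordered by eigenvalue (the variance each one captures). The minimal pure-Python sketch below assumes each metric is first standardized to z-scores (counts and degrees have very different scales, so this makes the covariance a correlation matrix) and finds the top two components by power iteration with deflation. The data are invented: three metrics move together and one is unrelated.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(columns: List[List[float]]) -> List[List[float]]:
    """Z-score each metric (assumes no metric is constant)."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def covariance(columns: List[List[float]]) -> Matrix:
    """Covariance of already-centered columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def top_components(cov: Matrix, k: int, iters: int = 500) -> List[Tuple[float, List[float]]]:
    """Leading k (eigenvalue, unit eigenvector) pairs by power iteration with deflation."""
    m = [row[:] for row in cov]
    dim = len(m)
    rng = random.Random(0)
    result = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(dim)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(dim)) for i in range(dim))
        result.append((lam, v))
        # Deflate: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return result


# Illustrative samples (one list per metric), not data from the study.
names = ["ACPI_TEMP", "PAPI_TOT_INS", "PAPI_FPU_IDL", "PAPI_HW_INT"]
columns = [
    [50.0, 51.0, 53.0, 54.0, 56.0, 55.0],
    [10.0, 12.0, 15.0, 16.0, 19.0, 17.0],
    [9.0, 8.0, 6.0, 5.0, 3.0, 4.0],
    [3.0, 7.0, 2.0, 6.0, 4.0, 5.0],
]
cov = covariance(standardize(columns))
total = sum(cov[i][i] for i in range(len(cov)))  # total variance
for lam, vec in top_components(cov, 2):
    print(f"explained {lam / total:.1%}:", {n: round(x, 2) for n, x in zip(names, vec)})
```

An eigenvector's overall sign is arbitrary; only the relative signs of its entries (loadings) carry meaning.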
First Principal Component
• Inversely proportional:
  • PAPI_TLB_TL
  • PAPI_L2_STM
  • PAPI_RES_STL
• Inversely proportional:
  • PAPI_TLB_DM
  • PAPI_L2_STM
  • PAPI_FPU_IDL
• Proportional:
  • PAPI_L1_TCA
  • PAPI_L1_TCH
  • PAPI_L1_ICR
  • PAPI_L1_ICA
  • PAPI_L1_DCH
  • PAPI_FML_INS
  • PAPI_L1_DCA
  • PAPI_FAD_INS
  • PAPI_FP_OPS
  • PAPI_L1_ICH
• Proportional:
  • ACPI_THERM
  • PAPI_TOT_INS
  • PAPI_FP_INS
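The "Proportional" and "Inversely proportional" groups can be read off the first component's loadings: an event whose loading shares the sign of the temperature loading moves with temperature along that axis, and an opposite sign means it moves against it. This sketch assumes the loadings are already available as a dict; the `min_abs` cutoff for discarding near-zero loadings is a hypothetical parameter, and the loading values are invented.

```python
from typing import Dict, List, Tuple


def split_by_sign(loadings: Dict[str, float], reference: str,
                  min_abs: float = 0.1) -> Tuple[List[str], List[str]]:
    """Split events into (proportional, inversely proportional) relative to `reference`.

    Loadings smaller than `min_abs` in magnitude are treated as unrelated and dropped.
    Comparing against the reference's sign makes the result independent of the
    arbitrary overall sign of the component.
    """
    ref_sign = 1.0 if loadings[reference] >= 0 else -1.0
    proportional, inverse = [], []
    for name, value in loadings.items():
        if name == reference or abs(value) < min_abs:
            continue
        (proportional if value * ref_sign > 0 else inverse).append(name)
    return sorted(proportional), sorted(inverse)


# Invented loadings for illustration.
pc1 = {"ACPI_THERM": 0.31, "PAPI_TOT_INS": 0.28, "PAPI_FP_INS": 0.25,
       "PAPI_TLB_DM": -0.22, "PAPI_FPU_IDL": -0.27, "PAPI_HW_INT": 0.03}
prop, inv = split_by_sign(pc1, "ACPI_THERM")
print("proportional:", prop)
print("inversely proportional:", inv)
```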
First vs. Second Principal Component
[Figure: plot of events on the first two principal components, with regions labeled Proportional and Inversely Proportional. Events shown: PAPI_L1_ICH, PAPI_L1_ICR, PAPI_L1_DCH, PAPI_L1_DCA, PAPI_TOT_INS, PAPI_VEC_INS, PAPI_FML_INS, PAPI_FP_INS, PAPI_FAD_INS, PAPI_FPU_IDL, PAPI_TLB_DM, PAPI_TLB_TL, PAPI_HW_INT, PAPI_RES_STL, PAPI_L2_TCM, PAPI_L2_DCM, PAPI_L1_TCM, PAPI_L1_DCM]
Temperature Correlation
• Multi-Substrate PAPI made it easy to collect the data needed to analyze, and reduce, the number of performance metrics required.
• Found approximately 10 events that are either directly or inversely proportional to temperature.
• This redundancy suggests that as few as 4-5 events may suffice to estimate temperature.
• Potential for an automated search for relevant performance metrics on new hardware.
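One way to test whether a handful of events can stand in for the thermal sensor is to fit a linear model T ≈ b0 + b1·x1 + ... + bk·xk. The slides do not name an estimator, so ordinary least squares (normal equations solved by Gaussian elimination) is an assumption here. The three predictor events are hypothetical; the same code accepts four or five. The data are synthetic, generated from known coefficients so the fit can be checked.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features: List[List[float]], target: List[float]) -> List[float]:
    """Least-squares coefficients [intercept, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + f for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: List[float], feature: List[float]) -> float:
    return coef[0] + sum(w * x for w, x in zip(coef[1:], feature))


# Synthetic per-sample values for three hypothetical predictor events,
# generated from T = 40 + 2*a - 1.5*b + 0.5*c.
samples = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [3.0, 3.0, 1.0],
           [4.0, 0.0, 2.0], [5.0, 2.0, 5.0], [0.0, 4.0, 4.0]]
temps = [40 + 2 * a - 1.5 * b + 0.5 * c for a, b, c in samples]
coef = fit_linear(samples, temps)
print([round(c, 3) for c in coef])
```

With real counter data the fit will not be exact; held-out samples would show whether 4-5 events predict temperature well enough.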
PAPI 4.0 Status
• Multi-substrate development complete
• Some CPU platforms not yet ported
• Substrates available for:
  • ACPI (Advanced Configuration and Power Interface)
  • Myrinet MX
• Substrates under development for:
  • InfiniBand
  • GigE
• Friendly User release available now for CVS checkout
• Release target: Q3 2006
Acknowledgement: This work was supported by the U.S. Department of Energy Los Alamos Computer Science Institute under subcontract R7A827-79200 through Rice University.
PAPI 4.0
• Multi-substrate work complete
• Substrates available for:
  • ACPI (Advanced Configuration and Power Interface)
  • Myrinet MX
• Substrates under development for:
  • InfiniBand
  • GigE
• Friendly User release available now for CVS checkout
• PAPI 4.0 Beta release expected Q3 2006
Support Slide: Setting Up the Counters
• Test is run on a 1.4 GHz AMD Opteron
  • Supports 42 PAPI preset events
  • 4 hardware counters
• HPCC calls a function to set up the PAPI events and uses a timer
• Run on 1 processor; only that processor's temperature is of interest
• Multiway multiplexing:
  • 11 eventsets are needed to monitor all events
  • Each eventset gets a 20 ms timeslice
  • Eventsets run in randomized order
  • Results are logged after 5 iterations
• Resulted in 1631 logged results of 43 different performance metrics (42 PAPI presets and 1 temperature)
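The eventset count follows from the counter budget: 42 presets on 4 counters is 42 / 4 = 10.5, rounded up to 11 eventsets. The sketch below builds that partition, a randomized rotation schedule, and the usual multiplexing extrapolation (raw count × total time / time the event was actually counted). Several pieces are assumptions, not taken from the slides: reading one "iteration" as one pass over all 11 eventsets in a fresh random order, the scaling rule, and the placeholder event names. It does not use PAPI's real multiplexing interface; `partition`, `schedule`, and `scale` are hypothetical helpers.

```python
import random
from typing import Dict, List

NUM_COUNTERS = 4    # hardware counters on the Opteron, per the slide
TIMESLICE_MS = 20   # per-eventset timeslice, per the slide
ITERATIONS = 5      # passes before logging (interpretation assumed)


def partition(events: List[str], counters: int) -> List[List[str]]:
    """Pack events into eventsets of at most `counters` events each."""
    return [events[i:i + counters] for i in range(0, len(events), counters)]


def schedule(num_sets: int, iterations: int, seed: int = 0) -> List[int]:
    """Eventset indices in run order; each iteration is a fresh random permutation."""
    rng = random.Random(seed)
    order: List[int] = []
    for _ in range(iterations):
        perm = list(range(num_sets))
        rng.shuffle(perm)
        order.extend(perm)
    return order


def scale(raw: Dict[str, int], active_ms: Dict[str, int], total_ms: int) -> Dict[str, float]:
    """Extrapolate each count to the full interval: raw * total / active."""
    return {ev: raw[ev] * total_ms / active_ms[ev] for ev in raw}


events = [f"PRESET_{i:02d}" for i in range(42)]  # placeholder names
sets = partition(events, NUM_COUNTERS)
order = schedule(len(sets), ITERATIONS)
total_ms = len(order) * TIMESLICE_MS
print(len(sets), "eventsets;", len(order), "timeslices over", total_ms, "ms")

# Each event is live only while its own eventset holds the counters.
active_ms = {ev: order.count(i) * TIMESLICE_MS for i, s in enumerate(sets) for ev in s}
```

Randomizing the order within each pass is one way to keep any eventset from always sampling the same phase of the benchmark; the extrapolation assumes the event rate during an eventset's slices is representative of the whole interval.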