
Evaluating Performance Information for Mapping Algorithms to Advanced Architectures

This presentation describes methodologies for gathering performance data on High-Performance Computing (HPC) systems and evaluating how algorithms map onto advanced architectures. It addresses how factors such as compiler optimizations and parallel interactions affect system performance, and presents a comprehensive tuning methodology. The work covers the mapping process, performance analysis tools, and the use of instrumentation and libraries to optimize algorithms. Through designed experiments and statistical analyses, it aims to reduce the programmer's burden and characterize system-software interactions. Integrating performance data with the code helps programmers make informed decisions, ultimately improving the efficiency of HPC systems.


Presentation Transcript


  1. Evaluating Performance Information for Mapping Algorithms to Advanced Architectures Nayda G. Santiago, PhD, PE Electrical and Computer Engineering Department University of Puerto Rico, Mayaguez Campus Sept 1, 2006

  2. Outline • Introduction • Problems • Methodology • Objectives • Previous Work • Description of Methodology • Case Study • Results • Conclusions • Future Work

  3. Introduction • Problem solving on an HPC facility • Conceptualization • Instantiation • Mapping • Parallel Implementation • Goal • Can we obtain metrics to characterize what is happening in the HPC system? • Test a methodology for obtaining information from the HPC system • Compare with current results

  4. Introduction: The Mapping Process • Source code → compiler → linker → executable file → running program • Instrumentation libraries support measurement of the running program

  5. Introduction • Application Programmer Decisions • Programming paradigm • Language • Compiler • Libraries • Advanced architecture • Programming style • Algorithms

  6. Problems • Different factors affect the performance of an implementation • Information about high-level effects is lost in the mapping process • Out-of-order execution • Compiler optimizations • Complex interactions between parallel code and systems • Current performance analysis tools are not appealing

  7. Current Tuning Methodology (diagram) • The high-level code is shaped by the system configuration, programming style, programming paradigm, languages, algorithms, and libraries • The code runs on the computer system, where instrumentation tools collect performance data • To use that data, the programmer needs analysis and evaluation tools, experience, knowledge of the tools, in-depth knowledge of the computer system, and an understanding of the relations between performance data and code • Result: a heavy burden on the programmer

  8. New Tuning Methodology (diagram) • The high-level code runs on the computer system; instrumentation tools and experimentation over alternatives produce performance data • A problem-solving environment applies statistical data analysis to turn the performance data into information • A knowledge-based system uses that information to give the programmer suggestions on how to modify the code

  9. My Work: Proposed Tuning Methodology (diagram) • High-level code runs on the computer system; instrumentation tools and experimentation over alternatives produce performance data • Statistical data analysis converts the performance data into information • A knowledge-based system turns that information into suggestions the programmer uses to modify the code

  10. Integrative Performance Analysis • Measurement – low-level information is collected • Abstraction – low-level information is hidden behind metrics • Problem translation – can the metrics be mapped back to the user's view? • Between the user's view and the system lie several levels: machine, OS, node, network, tools, high-level language, domain factors

  11. Objectives • Obtain information on the relation between low-level performance information and factors that affect performance. • Lessen the burden of the programmer to incorporate experience and knowledge into the tuning process. • Identify the most important metrics describing system-software interactions. • Identify how many metrics convey most of the information of the system.

  12. Methodology Preliminary Problem Analysis Design of Experiment Data Collection Data Analysis

  13. Methodology (overview) • Steps: preliminary problem analysis, design of experiment, data collection, data analysis • These steps sit inside the proposed tuning methodology: instrumented experimentation over alternatives produces performance data, statistical data analysis extracts information, and a knowledge-based system returns suggestions the programmer uses to modify the high-level code

  14. Preliminary Problem Analysis • Inputs: application, performance goal, potential factors affecting performance, evaluation of alternatives, screening experiment • Outputs: factors for experimentation, understanding of the problem • Results: profiling is useful for preliminary analysis • Contribution: screening is required to limit the number of factors in the experiment and keep it feasible • Significance: because of the large number of factors affecting performance and the long running times, experimentation has not been commonly used for performance data evaluation; screening makes it feasible • A screening-enumeration sketch follows below
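
The screening step can be pictured with a small enumeration of candidate factors. The sketch below is a minimal illustration, assuming hypothetical factors and levels (compiler flag, math library, process count); a real screening study would typically use a formal design such as a fractional factorial to keep the number of runs feasible.

```python
# Minimal sketch of a two-level screening enumeration over hypothetical factors.
from itertools import product

factors = {
    "compiler_opt": ["-O2", "-O3"],          # hypothetical levels
    "math_library": ["reference", "tuned"],  # hypothetical levels
    "processes": [4, 8],                     # hypothetical levels
}

# Full two-level factorial over 3 factors: 2**3 = 8 treatment combinations.
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for run_id, treatment in enumerate(treatments):
    print(run_id, treatment)
```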

  15. Design of Experiment (DOE) • Inputs: design factors, levels of each factor, response variable • Outputs: choice of design, order of treatments • DOE is the systematic planning of experiments to extract the most information while minimizing the effect of extraneous factors • Aims at establishing causal relations, not merely correlational relations

  16. Design of Experiment • Three basic principles • Replication – estimates the experimental error and improves precision • Randomization – gives independence between observations and averages out the effect of extraneous factors • Blocking – a block is a set of homogeneous experimental conditions • A run-order sketch follows below
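
The three DOE principles can be combined into a concrete run order. The sketch below is a minimal illustration, assuming hypothetical factors, levels, and blocks; each block holds one replication of a full factorial, and the treatment order is randomized within each block.

```python
# Minimal sketch of a replicated, randomized, blocked run order.
import random
from itertools import product

levels = {"compiler_opt": ["-O2", "-O3"], "processes": [4, 8]}   # hypothetical
blocks = ["block1", "block2", "block3"]                          # 3 replications

run_order = []
for block in blocks:
    # One replication of the full factorial per block (homogeneous conditions).
    treatments = [dict(zip(levels, combo)) for combo in product(*levels.values())]
    random.shuffle(treatments)            # randomization within the block
    run_order += [{"block": block, **t} for t in treatments]

for i, run in enumerate(run_order):
    print(i, run)
```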

  17. Design of Experiment • Results • The appropriate randomization scheme, number of replications, and treatment order for the experimental runs. • Contributions • Innovative use of DOE for establishing causal relations for application tuning • The appropriate design of experiment should be selected according to the performance analysis problem • Significance • The use of DOE and ANOVA will determine the cause of the performance differences in the results

  18. Data Collection • Instrumentation is dependent on the system • The computing system must be observable: the metrics of interest can be observed in the system • Inputs: executable file, system, instrumentation tool • Outputs: raw data (sampling, profiles)

  19. Data Collection • Instrumentation • Software • Hardware • Instrumentation tool setup • Experimental runs and data collection

  20. Data Collection • Inputs: tool configuration, order of runs, crontab file, system • Output: raw data (metrics) • Results: measurements of the metrics observed from the system • Particular to this case study: between 36 and 52 metrics were collected

  21. Data Analysis • Turns raw data into information • Statistical analysis: correlation matrix, multidimensional methods, dimensionality estimation, subset selection with an entropy cost function, ANOVA, post hoc comparisons

  22. Data Analysis (pipeline) • Raw data → convert format → performance data matrix → correlation study → normalization → dimension estimation → subset selection → ANOVA → post hoc comparisons → information

  23. Data Analysis: Data Conversion • Raw data comes from sampling and profiling • Each sampled metric is treated as a random process and reduced to an average, giving one random variable per metric • Convert format: raw data → performance data matrix

  24. Data Analysis: Data Conversion • The performance data matrix is a multidimensional K × P array with entries m_a[k, p], where a indicates an absolute (abs) or averaged (avg) value, k is the experimental run, and p is the metric identification number:

  M = \begin{bmatrix} m_a[0,0] & m_a[0,1] & \cdots & m_a[0,P-1] \\ m_a[1,0] & m_a[1,1] & \cdots & m_a[1,P-1] \\ \vdots & \vdots & \ddots & \vdots \\ m_a[K-1,0] & m_a[K-1,1] & \cdots & m_a[K-1,P-1] \end{bmatrix}

  25. Data Analysis: Data Conversion • Performance data matrix example, with row k holding the metrics of Run k (Metric 0 = ExecTime, Metric 1 = Pgfaults/s, …, Metric P-1 = IdleTime):

  M = \begin{bmatrix} \text{ExecTime}[0] & \text{Pgfaults/s}[0] & \cdots & \text{IdleTime}[0] \\ \text{ExecTime}[1] & \text{Pgfaults/s}[1] & \cdots & \text{IdleTime}[1] \\ \vdots & \vdots & \ddots & \vdots \\ \text{ExecTime}[K-1] & \text{Pgfaults/s}[K-1] & \cdots & \text{IdleTime}[K-1] \end{bmatrix}

  A sketch of building this matrix from raw samples follows below.
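
The conversion from raw data to the performance data matrix can be sketched as follows. The metric names and raw values are hypothetical; sampled metrics are reduced to their average per run, while absolute metrics are passed through unchanged.

```python
# Minimal sketch of converting raw metrics into the K x P performance data matrix.
import numpy as np

raw_runs = [  # hypothetical raw data: one dict per experimental run
    {"ExecTime": 102.4, "Pgfaults/s": [11.0, 9.5, 10.2], "IdleTime": [3.1, 2.8]},
    {"ExecTime": 98.7,  "Pgfaults/s": [12.3, 11.8, 12.0], "IdleTime": [2.5, 2.6]},
]

metrics = ["ExecTime", "Pgfaults/s", "IdleTime"]

def reduce_metric(value):
    # Average sampled series; pass scalars (absolute values) through unchanged.
    return float(np.mean(value)) if isinstance(value, (list, tuple)) else float(value)

M = np.array([[reduce_metric(run[p]) for p in metrics] for run in raw_runs])
print(M.shape)   # (K runs, P metrics)
print(M)
```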

  26. Data Analysis: Correlation Study (pipeline recap; current step: correlation matrix)

  27. Data Analysis: Correlation Study • The performance data matrix is reduced, through pairwise correlations, to the correlation matrix • Correlation measures the linear relation among variables; it carries no causal information

  28. Data Analysis: Correlation Study Example

  29. Data Analysis: Correlation Study • Sample correlation formula: r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, S_x S_y}, where S_x and S_y are the sample estimates of the standard deviations • Which metrics were most correlated with execution time? • Results of the correlation analysis: collinearity among metrics and the effect of software instrumentation • A correlation-matrix sketch follows below
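
A minimal sketch of the correlation study, using NumPy's Pearson correlation on a placeholder performance data matrix; the metric names are hypothetical, and execution time is assumed to sit in column 0.

```python
# Minimal sketch: P x P correlation matrix and ranking of metrics by their
# correlation with execution time. Correlation here is linear association only.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 5))                          # placeholder K x P matrix
metric_names = ["ExecTime", "m1", "m2", "m3", "m4"]   # hypothetical names

R = np.corrcoef(M, rowvar=False)                      # columns are metrics

exec_time_corr = R[0]                                 # correlation with ExecTime
order = np.argsort(-np.abs(exec_time_corr))
for p in order:
    print(metric_names[p], round(exec_time_corr[p], 3))
```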

  30. Data Analysis: Normalization (pipeline recap; current step: normalization)

  31. Data Analysis: Normalization • The scales of the metrics vary widely, so the performance data matrix is normalized per metric before analysis • Log normalization: n_a[k,p] = \log(m_a[k,p]) • Min-max normalization: n_a[k,p] = \frac{m_a[k,p] - \min_k m_a[k,p]}{\max_k m_a[k,p] - \min_k m_a[k,p]} • Dimension (Euclidean-norm) normalization: n_a[k,p] = \frac{m_a[k,p]}{\lVert m_a[\cdot,p] \rVert_2} • A sketch of the three schemes follows below
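
A minimal sketch of the three normalization schemes, applied column-wise (per metric) to a placeholder performance data matrix; the data are synthetic and positive so the log scheme is well defined.

```python
# Minimal sketch of the three per-metric (column-wise) normalization schemes.
import numpy as np

def log_normalize(M):
    return np.log(M)                                  # assumes positive entries

def min_max_normalize(M):
    mn, mx = M.min(axis=0), M.max(axis=0)
    return (M - mn) / (mx - mn)                       # maps each metric to [0, 1]

def euclidean_normalize(M):
    return M / np.linalg.norm(M, axis=0)              # unit Euclidean norm per metric

M = np.abs(np.random.default_rng(1).normal(loc=5.0, size=(20, 4)))  # placeholder data
N = min_max_normalize(M)
print(N.min(axis=0), N.max(axis=0))                   # each column now spans [0, 1]
```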

  32. Data Analysis: Normalization • Normalization evaluation: artificially assign classes to the data set (long vs. short execution time) and use a visual separability criterion • Principal Component Analysis (PCA): project the data along the principal components and visualize the separation of the classes • A PCA-projection sketch follows below
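
A minimal sketch of the visual-separability check: artificially split placeholder runs into long and short execution-time classes, project the data onto the first two principal components, and inspect how the classes separate (here summarized by class centroids rather than a scatter plot).

```python
# Minimal sketch of PCA projection for checking visual separability of classes.
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(40, 6))                      # placeholder normalized data
labels = M[:, 0] > np.median(M[:, 0])             # "long" vs "short" execution time

X = M - M.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)  # principal directions in rows of Vt
scores = X @ Vt[:2].T                             # projection onto first two PCs

for cls in (False, True):
    pts = scores[labels == cls]
    print("class", "long" if cls else "short", "centroid:", pts.mean(axis=0).round(2))
```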

  33. Data Analysis: Normalization Not Normalized

  34. Data Analysis: Normalization Not Normalized

  35. Data Analysis: Normalization Min-max normalization

  36. Data Analysis: Normalization Normalizing to range (0,1)

  37. Data Analysis: Normalization Normalizing with Euclidean Norm

  38. Data Analysis: Normalization Normalizing with Euclidean Norm

  39. Data Analysis: Normalization • Results • Appropriate normalization scheme • Euclidean Normalization • Contribution • Usage of normalization schemes for performance data • Significance • Due to the effect of differences in scale, some statistical methods may be biased. By normalizing, results obtained will be due to the true nature of the problem and not caused by scale variations.

  40. Data Analysis: Dimension Estimation (pipeline recap; current step: dimension estimation)

  41. Data Analysis: Dimension Estimation • Dimensionality estimation: how many metrics explain the system's behavior? The goal is to go from the P measured metrics down to K metrics with K << P • Scree test: plot the eigenvalues of the correlation matrix and look for the elbow • Cumulative percentage of total variation: keep the components that explain most of the variance of the data • Kaiser-Guttman: keep the components whose correlation-matrix eigenvalues are greater than one • A sketch of the three rules follows below
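
A minimal sketch of the three dimension-estimation rules applied to the eigenvalues of the correlation matrix of a placeholder data set; the 90% cutoff for the cumulative-variation rule is an assumption, not a value taken from the study.

```python
# Minimal sketch: scree data, cumulative variation, and Kaiser-Guttman criterion.
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(50, 10))                       # placeholder K x P data
R = np.corrcoef(M, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]      # eigenvalues, largest first

print("scree data:", eigvals.round(2))              # plot these to look for the elbow

cum_var = np.cumsum(eigvals) / eigvals.sum()
k_cumvar = int(np.searchsorted(cum_var, 0.90) + 1)  # components for 90% of variation
print("cumulative-variation rule:", k_cumvar)

k_kaiser = int((eigvals > 1.0).sum())               # eigenvalues greater than one
print("Kaiser-Guttman rule:", k_kaiser)
```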

  42. Data Analysis: Dimension Estimation • Example

  43. Data Analysis: Dimension Estimation • Results • Dimension reduction to approximately 18% of the original size • All three methods give similar results • Contribution • Estimation of the dimension of performance data sets • Significance • Provides the minimum set of metrics that contains most of the information needed to evaluate the system

  44. Data Analysis: Metric Subset Selection (pipeline recap; current step: subset selection)

  45. Data Analysis: Metric Subset Selection • Subset selection reduces the P metrics to K metrics, K << P • Sequential forward search • Entropy cost function over pairwise similarities, E = -\sum_i \sum_j \left[ S_{ij}\log S_{ij} + (1-S_{ij})\log(1-S_{ij}) \right], where S_{ij} is the similarity value of two instances i and j • A forward-selection sketch follows below
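
A minimal sketch of sequential forward search driven by an entropy cost over pairwise similarities. The similarity definition (an exponential of pairwise distance) and the greedy selection loop are one common formulation and may differ from the exact cost function used in the study; the data are placeholders.

```python
# Minimal sketch of sequential forward selection with an entropy cost function.
import numpy as np

def entropy_cost(X):
    # Pairwise distances -> similarities in (0, 1) -> entropy of the similarities.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    alpha = -np.log(0.5) / (d.mean() + 1e-12)     # similarity 0.5 at the mean distance
    s = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
    s = s[np.triu_indices(len(X), k=1)]           # each pair counted once
    return float(-(s * np.log(s) + (1 - s) * np.log(1 - s)).sum())

def forward_select(M, k):
    selected, remaining = [], list(range(M.shape[1]))
    while len(selected) < k:
        # Greedily add the metric whose inclusion gives the lowest entropy.
        best = min(remaining, key=lambda p: entropy_cost(M[:, selected + [p]]))
        selected.append(best)
        remaining.remove(best)
    return selected

M = np.random.default_rng(4).normal(size=(30, 8))   # placeholder normalized data
print(forward_select(M, 3))                          # indices of 3 selected metrics
```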

  46. Data Analysis: Metric Subset Selection • Results • Establishment of the most important metrics for the case study • Experiment 1: paging activity • Experiment 2: memory faults • Experiment 3: buffer activity • Experiment 4: a mix of metrics

  47. Data Analysis: Metric Subset Selection • Contributions • The usage of: • Feature subset selection to identify the most important metrics • Entropy as a cost function for this purpose • Significance • The system is viewed as a source of information. If we can select metrics based on the amount of information they provide, we can narrow down the search for sources of performance problems.

  48. Data Analysis: ANOVA (pipeline recap; current step: ANOVA and post hoc comparisons)

  49. Data Analysis: ANOVA • Analysis of Variance (ANOVA) tests whether the factors cause the variations in a metric; the null hypothesis is that they do not • Post hoc comparisons are performed once the null hypothesis is rejected, to determine which levels differ significantly and how • An ANOVA-with-post-hoc sketch follows below
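
A minimal sketch of a one-way ANOVA followed by a post hoc comparison, using SciPy and statsmodels on hypothetical replicated runs at three levels of a single factor; Tukey's HSD is used here as a representative post hoc test, which may differ from the one used in the study.

```python
# Minimal sketch: one-way ANOVA on a metric, then Tukey HSD post hoc comparisons.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(5)
exec_time = {                                   # hypothetical replicated runs
    "-O1": rng.normal(120, 5, size=10),
    "-O2": rng.normal(110, 5, size=10),
    "-O3": rng.normal(108, 5, size=10),
}

f_stat, p_value = stats.f_oneway(*exec_time.values())
print("ANOVA F =", round(f_stat, 2), "p =", round(p_value, 4))

if p_value < 0.05:                              # factor explains the variation
    values = np.concatenate(list(exec_time.values()))
    groups = np.repeat(list(exec_time.keys()), [len(v) for v in exec_time.values()])
    print(pairwise_tukeyhsd(values, groups))    # which pairs of levels differ
```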

  50. Data Analysis: ANOVA • Results • The set of factors affecting the metric values, and those values • Contribution • Use of ANOVA for the analysis of performance metrics • Significance • ANOVA identifies whether the variations in the measurements are due to the random nature of the data or to the factors; incorrect conclusions may be reached if personal judgment alone is used
