Predicting performance of applications and infrastructures. Tania Lorido 27th May 2011. Problem definition. Objective: Predicting utilization of resources (memory, CPU, ...) on different computing systems in order to determine application behavior.
Predicting performance of applications and infrastructures Tania Lorido 27th May 2011
Problem definition • Objective: predict the utilization of resources (memory, CPU, ...) on different computing systems in order to determine application behavior • To predict performance if the available resources change • To change available resources in elastic infrastructures • Three scenarios: • Benchmark traces on a simulator (INSEE), using the NAS Parallel Benchmarks • Real applications on real systems (data from U. of Florida) • Applications running in the cloud (Arsys)
What is INSEE? • Interconnection Network Simulation and Evaluation Environment • Input: Traces containing messages sent among nodes. • Output: Execution time • And many other network-related figures
Objectives • Get a dataset running several traces on the simulator • Create different models -> execution time prediction • Learn about ML techniques
Input traces • NAS Parallel Benchmark suite • Scientific codes implemented in Fortran + MPI • Can run on systems of different sizes • Tested with 16 or 64 tasks • Run on a real system (Kalimero-like cluster) • Captured the whole list of point-to-point messages sent between every pair of tasks
Topologies • 2D mesh • 2D torus
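The difference between the two topologies can be sketched with a hop-count function: a torus is a mesh with wrap-around links in each dimension. This is a minimal illustration assuming minimal-path routing and (x, y) node coordinates; the function names are hypothetical and are not part of INSEE.

```python
def mesh_distance(a, b):
    """Hop count between nodes a=(x, y) and b=(x, y) on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_distance(a, b, width, height):
    """Hop count on a 2D torus: each dimension may use its wrap-around link."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


# Opposite corners of a 16x16 network (assumed minimal routing).
corner_a, corner_b = (0, 0), (15, 15)
print(mesh_distance(corner_a, corner_b))
print(torus_distance(corner_a, corner_b, 16, 16))
```

On the mesh the two corners are far apart, while on the torus the wrap-around links make them near neighbours, which is one reason task placement can matter differently on each topology.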
We have… • … a set of tasks: 16 or 64 • … a set of nodes: 256 (16x16 torus) • How to assign tasks to nodes?
Partitioning • Selecting a set of nodes • Three options: random, band & quadrant • An example: • We need 4 nodes • Topology: mesh
[Figure panels: Random, Quadrant, Band]
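A rough sketch of the three partitioning options follows. The slides do not define them precisely, so the definitions here are assumptions: nodes are numbered in row-major order, "band" takes consecutive node ids, and "quadrant" takes a square sub-grid. All function names are hypothetical.

```python
import math
import random


def random_partition(width, height, n, rng=random):
    """Pick n distinct nodes anywhere in the network."""
    return sorted(rng.sample(range(width * height), n))


def band_partition(width, height, n, start=0):
    """Assumed definition: n consecutive node ids in row-major order."""
    if start + n > width * height:
        raise ValueError("band does not fit in the network")
    return list(range(start, start + n))


def quadrant_partition(width, height, n, origin=(0, 0)):
    """Assumed definition: a square sub-grid holding exactly n nodes."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square for this sketch")
    ox, oy = origin
    if ox + side > width or oy + side > height:
        raise ValueError("quadrant does not fit in the network")
    return [(oy + r) * width + (ox + c)
            for r in range(side) for c in range(side)]


# The slide's example: 4 nodes, here on a 16x16 network.
print(band_partition(16, 16, 4))
print(quadrant_partition(16, 16, 4))
print(random_partition(16, 16, 4, random.Random(0)))
```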
Mapping • Assigning each task to one of the nodes in the set • Two options: random & consecutive • Example… • … with band partitioning
[Figure panels: Random, Consecutive]
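The two mapping options can be sketched as follows, assuming the partition is given as a list of node ids and tasks are numbered from 0. The function names are hypothetical.

```python
import random


def consecutive_mapping(nodes):
    """Task i is placed on the i-th node of the partition."""
    return {task: node for task, node in enumerate(nodes)}


def random_mapping(nodes, rng=random):
    """Tasks are placed on the partition's nodes in shuffled order."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return {task: node for task, node in enumerate(shuffled)}


band = [0, 1, 2, 3]  # e.g. a band partition of 4 nodes
print(consecutive_mapping(band))
print(random_mapping(band, random.Random(1)))
```

Both mappings use exactly the same nodes; only the task-to-node assignment changes, so any difference in simulated execution time comes from which tasks end up close to each other.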
Background noise • In a real environment, several applications compete for the network. • We emulate that with random messages sent among nodes: background noise • Different levels
Experiment • A model for each trace type (7 types) • Class variable: execution time discretized in 3 bins • Equal width • Equal height (equal frequency) • Classifiers: KNN, Naive Bayes, J48 tree • 10 repetitions of 5-fold cross-validation • Evaluation metric: accuracy
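The two discretization schemes for the class variable can be sketched as below. This is a simplified illustration in plain Python, not the tool used in the experiment; the function names and the sample execution times are made up.

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it to the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bins by rank so each bin holds about the same number of points."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


times = [1, 2, 3, 4, 5, 100]  # hypothetical execution times with one outlier
print(equal_width_bins(times, 3))      # [0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(times, 3))  # [0, 0, 1, 1, 2, 2]
```

With skewed data, equal-width bins can leave most points in a single class, while equal-frequency bins keep the classes balanced. Note that this rank-based sketch may place tied values in different bins.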
Interpretation of results • Quite good results (80-100% accuracy) • Background noise has almost no effect on the class (information gain = 0.00015) • … learning about ML techniques.
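Information gain measures how much knowing an attribute reduces the entropy of the class. The sketch below computes it for discrete attributes; the toy data is invented purely to show an informative attribute next to an uninformative one, and is not taken from the experiment.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Class entropy minus the weighted entropy after splitting on feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 'mapping' separates the classes, 'noise' does not.
exec_time = ["fast", "fast", "slow", "slow"]
mapping = ["consecutive", "consecutive", "random", "random"]
noise = ["low", "high", "low", "high"]
print(information_gain(mapping, exec_time))
print(information_gain(noise, exec_time))
```

A gain close to zero, as reported for background noise, means the attribute tells the classifier almost nothing about the execution-time bin.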
Second scenario: parallel application data from the U. of Florida
What have they done? • Run a couple of real applications on real systems to obtain datasets • Apply several regression techniques to predict execution time and other parameters related to resource usage • KNN, LR, DT, SVM, … • Propose a new algorithm and compare it with the “classical” ones
Objectives • Repeat the experiment – same results? • Discretize variables and apply classification techniques. • Multidimensional prediction
Real applications • Bioinformatics applications: • BLAST: Basic Local Alignment Search Tool • RAxML: Randomized Axelerated Maximum Likelihood
Datasets are available • BLAST: 6592 data points • Two class variables: execution time (seconds) and output size (bytes) • RAxML: 487 data points • Two class variables: execution time (seconds) and Resident Set Size, RSS (bytes)
Attribute selection • Different sets chosen by the authors
First experiment - Regression • 10 repetitions of 10-fold cross-validation • Classifier evaluation: percentage error, where f_i = forecast value and a_i = actual value • Mean Percentage Error
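The formula on this slide did not survive in the transcript. The sketch below assumes a common definition, PE_i = 100 * |f_i - a_i| / a_i, averaged over all predictions to obtain the Mean Percentage Error; the exact form used in the experiment may differ, and the sample values are invented.

```python
def percentage_errors(forecasts, actuals):
    """Assumed definition: 100 * |f_i - a_i| / a_i for each prediction."""
    return [100.0 * abs(f - a) / a for f, a in zip(forecasts, actuals)]


def mean_percentage_error(forecasts, actuals):
    """Average of the per-prediction percentage errors."""
    errors = percentage_errors(forecasts, actuals)
    return sum(errors) / len(errors)


# Hypothetical execution times in seconds.
forecast = [110, 90]
actual = [100, 100]
print(mean_percentage_error(forecast, actual))  # 10.0
```

This form requires non-zero actual values, which holds for execution times and output sizes.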
Second experiment – Classification • Output variable discretized in 4 bins • Equal width • Equal height (equal frequency) • Predictive variables discretized with the Fayyad-Irani method • Groups values so as to minimize entropy • Same classifiers, except Linear Regression and SVM • Classifier evaluation criterion: accuracy
Interpretation • Height-based discretization: 65-75% accuracy • Width-based discretization: 92-96% accuracy … BUT…
Attribute selection • Information gain with respect to the class is 0 (or close to 0) for some variables • The previous attribute selection was based on the authors' criteria • So… we apply: • Attribute Evaluator: CfsSubsetEval • Search Method: BestFirst • And the results…
Conclusions • Regression experiment repeated with the same results • Width-based discretization discarded • “Same results” after attribute selection And next… • Multidimensional prediction: • BLAST: Execution time & output size • RAxML: Execution time & memory size (RSS)
Third scenario: prediction of resource demands in cloud computing • This is future work
What does Arsys offer? (I) • Traditional application and web hosting • An IaaS cloud computing platform
What does Arsys offer? (II) • A tool for the client to create and manage their own VMs: • RAM • Number of cores • Disk space • Theoretically, no limits on resource usage • Resources can be changed dynamically: elasticity
What do they want? • A tool that: • Monitors resource utilization by a user’s VM… • … and predicts future utilization to… • … proactively modify resource reservations… • … to optimize application performance… • … and cost • Initially we will focus on the prediction part
Variables to predict (an example) • Used amount of RAM (MB) • Used amount of swap (MB) • Amount of free disk space (MB) • Disk performance (KB/s) • Processor load (MHz) • Processor use (percentage) • Network bandwidth usage (Kb/s)
Approaches • 1/0 predictions based on a threshold • Will a variable reach a certain value? • Interval-based predictions • Regression • Time series • Prediction based on trends
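As one possible illustration of combining a trend-based prediction with a 1/0 threshold question, the sketch below fits a straight line to recent samples and asks whether the extrapolated value reaches a limit. It assumes equally spaced monitoring samples and a linear trend; the function names and RAM figures are hypothetical, and this is not the approach chosen for Arsys, which the slides leave as future work.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def will_reach(samples, threshold, horizon):
    """1/0 prediction: does the trend reach threshold `horizon` steps ahead?"""
    slope, intercept = linear_trend(samples)
    forecast = intercept + slope * (len(samples) - 1 + horizon)
    return forecast >= threshold


ram_mb = [100, 110, 120, 130]  # hypothetical used-RAM samples
print(will_reach(ram_mb, threshold=150, horizon=3))  # True
print(will_reach(ram_mb, threshold=200, horizon=3))  # False
```

A positive answer could be used to request more resources before the limit is hit, which is the proactive behaviour described in the previous slides.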