Advanced Data Processing Methods for Dependability Benchmarking and Log Analysis
BEHIND THE SCENE: A COLLECTION OF OBSERVATIONS NOT DESCRIBED IN OUR PAPERS
András Pataricza, pataric@mit.bme.hu
Trends in IT
• Evolution
  • Environment
  • Specification
  • Technology
  • Adaptivity
• Drivers:
  • Run-time task allocation (virtualization)
  • Extends to cyber-physical systems
  • Functional: adaptive task ecosystem
    • Context-sensitive, on-demand
  • Optimization of resource use
    • Computational
    • Communication
    • Energy
• Traditional pre-operation-phase design moves to run-time
  • Off-line assessment -> run-time control configuration/parametrization + run-time assessment
• Assessment criteria
  • Generalizable
  • Reusable
  • Parametrizable
  • Covers all main aspects needed for operation decisions and control
• Information sources
  • In vitro: benchmarking
  • In vivo: field measurement, log analysis
• Reusability constraints
  • Different levels of detail
  • Anytime algorithms
  • Incremental algorithms
  • Complexity
  • Go continuous?
Typical use case: control of infrastructure (Source: AMBER teaching material)
Self-* Computing
• Controlled computing
  • Autonomic
  • Virtualization
  • Cloud
• Self-* properties
• Emphasizes the control loop
• Relation to
  • control theory
  • signal processing
Obstacle: we deal with networks of (practically) black boxes!
System Management as a Control Problem
Control theory applied to IT infrastructures:
• Controlled plant: the service, provided by software components deployed on the supervised nodes
• Monitoring (sensors): collect and store data about the state of the infrastructure
• Controller (decision making): follows a control policy against a control objective, based on human expertise or automation; runs on the monitoring/control node
• Actuator (provisioning): effectuates changes in the infrastructure
Performability Management
• QoS requirement → objective metric (e.g. response time < 3 sec)
• Set a reference value (e.g. 2.5 sec): "have some margin but do not overperform"
• The service (software component deployed on a node) provides the metric; it is compared against the reference (metric − reference)
• Decision making → provisioning: reconfigure the service provider (see the sketch below)
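To make the loop concrete, here is a minimal, hypothetical Python sketch of the monitoring → decision → provisioning cycle on this slide: the measured response time is compared against the 2.5 s reference (kept below the 3 s objective), and a node is added or released, with a dead-band to avoid oscillation. All names and numeric parameters beyond the objective and reference are illustrative assumptions, not part of the original experiments.

```python
# Minimal sketch of the performability control loop (assumed names/parameters).
OBJECTIVE = 3.0   # QoS objective: response time < 3 s
REFERENCE = 2.5   # reference value: some margin, but do not overperform
DEADBAND  = 0.5   # tolerance around the reference to avoid oscillation (assumed)

def control_step(measured_rt: float, current_nodes: int,
                 min_nodes: int = 1, max_nodes: int = 8) -> int:
    """One iteration of the monitoring -> decision -> provisioning loop."""
    error = measured_rt - REFERENCE
    if error > DEADBAND:               # too slow: provision an extra node
        return min(current_nodes + 1, max_nodes)
    if error < -DEADBAND:              # overperforming: release a node
        return max(current_nodes - 1, min_nodes)
    return current_nodes               # within the margin: do nothing
```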
A Simple Performance Management Pattern
• A very common pattern
  • Simplicity
  • Platform support
• Control/rule design?
  • that is practical
Diagram: load → service (load-balanced cluster) → response; spare pool or other service
A Simple Performance Management Pattern
For the IT system management expert
Diagram: service = load-balanced cluster + spare pool or other service
A Simple Performance Management Pattern
For the control expert
Diagram: service = load-balanced cluster + spare pool or other service
Objective: Proactive Qualitative Performance Control
• Predict state
• Decide on action (see the sketch below)
Diagram: service = load-balanced cluster + spare pool or other service
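A hypothetical Python fragment of the proactive pattern: the decision acts on the *predicted* qualitative state rather than the currently measured one, pulling a node from the spare pool before saturation is reached. The state names and the simple rule are assumptions for illustration only.

```python
from enum import Enum

class State(Enum):
    IDLE = 0
    NORMAL = 1
    DEGRADING = 2
    SATURATED = 3

def decide(predicted: State, current_nodes: int, max_nodes: int = 8) -> int:
    """Proactive rule: act on the predicted state, not the current one."""
    if predicted in (State.DEGRADING, State.SATURATED):
        return min(current_nodes + 1, max_nodes)   # activate a spare-pool node early
    if predicted is State.IDLE and current_nodes > 1:
        return current_nodes - 1                   # return a node to the spare pool
    return current_nodes
```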
Empirical dependability characterization
• Design time: modeling, analysis, testing → design decisions (validation)
• Operation: RT data acquisition and monitoring → operational decisions (validation)
Diagram: service layers – Service (human), Service (J2EE), Service (Web), Service (e-mail), IMS transaction
Empirical dependability characterization
Design time: modeling, analysis, testing
• Challenges:
  • Incompleteness
  • Environment sensitivity
  • Change tolerance
• CORE ISSUE: from instance assessment to prediction
• Assessment criteria
  • Generalizable
  • Reusable
  • Parametrizable
• KNOWLEDGE EXTRACTION
Empirical dependability characterization
Operation: RT data acquisition and monitoring
• Some challenges:
  • Threshold configuration
  • Embedding diagnosis
  • Embedding forecasting
  • Over-/under-monitoring
Examples – pilot components
• Apache ~ load balancer – UA (task)
• Tomcat (application-specific: platform-independent + implementation-dependent)
• Linux OS Agent (platform + task)
• MySQL ~ VI Agent – add-on (platform + task)
Faults taken into account – side note
SRDS WS, 2008-05-10
• Source: qualitative dynamic modelling of Apache
• Separate work: representativeness
HOW TO GENERALIZE MEASUREMENTS?
TPC-W Workload
• A standard benchmark for multi-tier systems
• Models an e-bookshop
• Customer behavioral models:
  • 14 different web pages
  • Varying load on the system
  • 3 standard workload mixes
• Highly non-deterministic
• ABSOLUTELY INAPPROPRIATE AS A PLATFORM BENCHMARK
• Representativeness: synthetic vs. natural benchmarks
The Problem of Over-Instrumentation
• Overly complex rule set/model
  • V&V?
  • Maintenance?
  • Control design?
• Only a few variables are significant w.r.t. a management goal
• "Control theory for IT" works do not tackle this
Diagram: many services and software components, each providing dozens of metrics
WHAT TO MEASURE: measurable ≠ to be measured → the variable selection problem
Design phase – measurement
• Objectives:
  • Design time: all candidate control variables
  • Runtime: few (selection)
• Stress the system (scalability) to reveal operation domains and dynamics
EDCC-5: Pintér G., Madeira H., Vieira M., Pataricza A. and Majzik I., "A Data Mining Approach to Identify Key Factors in Dependability Experiments"
Component Metrics Gathered
Database + IBM Tivoli Monitoring Agent
• Phenomenological service metrics:
  • Average response time
  • Failed SQL statements %
  • Number of active sessions
  • …
• "Causal" metrics:
  • DB2 status
  • Buffer pool hit ratio
  • Average pool read/write time
  • Average locks held
  • Rows read/write rate
  • …
• Phenomenological resource metrics:
  • Average CPU usage
  • Average disk I/O usage
  • …
Number of database metrics: MySQL: 12, Oracle: 640, DB2: 880
Qualitative State Definition for Prediction
• Coarse control: intuitively, an interval aggregate defines the state
• High-frequency "jitter" is noise; lower-level means
• Aggregation interval: match the prediction horizon!
• Alternatively: explicitly filter out the "noise" (the same intent)
• Presented:
  • Amplitude filter
  • Median filtering (see the sketch below)
Diagram: throughput data → filtering → classification into qualitative states (0% – 25% – 45% – 100%)
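A minimal sketch of this preprocessing, assuming the throughput trace is given in percent of capacity: a median filter suppresses the high-frequency jitter, then the smoothed signal is binned at the 25% and 45% breakpoints from the slide into three qualitative states. The window length is an assumed parameter.

```python
# Median filtering + qualitative classification (breakpoints from the slide,
# window size assumed for illustration).
import numpy as np
from scipy.signal import medfilt

def qualitative_states(throughput_pct: np.ndarray, window: int = 5) -> np.ndarray:
    """Map a raw throughput trace (% of capacity) to qualitative states."""
    smoothed = medfilt(throughput_pct, kernel_size=window)   # remove jitter/outliers
    bins = [25.0, 45.0]                                      # 0-25%, 25-45%, 45-100%
    return np.digitize(smoothed, bins)                       # 0 = low, 1 = medium, 2 = high

# Example: a noisy trace with one outlier spike that the median filter removes
trace = np.array([10, 12, 11, 80, 13, 30, 33, 35, 60, 62, 61], dtype=float)
print(qualitative_states(trace))
```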
Design phase – variable selection
• Objective: control variables
  • As few as possible, as much as needed
• mRMR (minimum Redundancy Maximum Relevance) feature selection
• Cf. AUTONOMICS 2009 paper
Variable Selection
Pipeline: full dataset (160+ metrics) → filtering → variable selection against the goal metric → 12 metrics chosen (algorithm: mRMR; see the sketch below)
Simple statistics is insufficient – signal processing is needed
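A sketch of the greedy mutual-information formulation of mRMR, assuming the metrics have already been discretized (e.g. into the qualitative states above). The actual implementation used in the AUTONOMICS 2009 work may differ in details such as how relevance and redundancy are combined.

```python
# Greedy mRMR sketch (MID criterion: relevance minus average redundancy).
# Assumes X holds discretized metric columns and y the discretized goal metric.
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr(X: np.ndarray, y: np.ndarray, k: int = 12) -> list:
    """Select k columns of X: high MI with y, low MI with already selected ones."""
    n_features = X.shape[1]
    k = min(k, n_features)
    relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]            # start with the most relevant metric
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy         # maximum relevance, minimum redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```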
Example Selected Metrics: Median Filtering
• First 7 of 12 ("value" decreases)
• Tomcat load/CPU always in the top 3 – bottleneck
• Cluster characterization: ongoing work
Operation phase – measurement
Decide on the system state based on the samples
1-Minute Prediction for Median Filtering
• Qualitative prediction accuracy: >90% (prediction setup sketched below)
• (multiple runs; 4-hour validation set)
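One plausible way to set up such a prediction, sketched under the assumption that the selected metrics are sampled once per minute: a simple classifier is trained on a short lag window of the metrics to predict the qualitative state one step ahead. The classifier choice and lag length are illustrative assumptions; only the reported >90% accuracy and the 4-hour validation set come from the slide.

```python
# Sketch of one-minute-ahead qualitative state prediction (assumed setup).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def make_lagged(X: np.ndarray, states: np.ndarray, lags: int = 3, horizon: int = 1):
    """Build (features, target) pairs: lagged metric windows -> future state."""
    rows, targets = [], []
    for t in range(lags, len(states) - horizon):
        rows.append(X[t - lags:t].ravel())        # flatten the lag window of metrics
        targets.append(states[t + horizon])       # qualitative state 'horizon' steps ahead
    return np.array(rows), np.array(targets)

def train_predictor(X_metrics: np.ndarray, states: np.ndarray):
    """X_metrics: (time x selected metrics); states: qualitative state per sample."""
    features, targets = make_lagged(X_metrics, states)
    return DecisionTreeClassifier(max_depth=5).fit(features, targets)
```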
Operational domains?
• Normal operational state
  • Internal relationships tend to be linear (with some "noise")
• Saturation (over-loaded)
  • Objective metrics behave linearly again
  • Physical limits of the system
• Degrading state
  • The point of interest!
  • Seemingly non-linear behaviour
  • mRMR metric selection performs better here – for this specific case (see the sketch below)
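A rough, assumed heuristic that follows from the linearity observation above: fit a line between two related metrics over a sliding window and watch the R². The normal and saturated domains stay close to linear, so a falling R² hints at the degrading, seemingly non-linear transition of interest. Window length and threshold are illustrative, not taken from the experiments.

```python
# Flagging the "degrading" domain via breakdown of local linearity (assumed heuristic).
import numpy as np

def rolling_r2(x: np.ndarray, y: np.ndarray, window: int = 60) -> np.ndarray:
    """R^2 of a least-squares line fitted over a sliding window."""
    scores = []
    for t in range(len(x) - window):
        xs, ys = x[t:t + window], y[t:t + window]
        slope, intercept = np.polyfit(xs, ys, 1)
        resid = ys - (slope * xs + intercept)
        ss_tot = np.sum((ys - ys.mean()) ** 2)
        scores.append(1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else 0.0)
    return np.array(scores)

def in_degrading_domain(x, y, threshold: float = 0.6) -> bool:
    """True if the latest window no longer looks linear (assumed threshold)."""
    r2 = rolling_r2(np.asarray(x, float), np.asarray(y, float))
    return bool(len(r2)) and r2[-1] < threshold
```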
Minimum Mean Square Error
Plot: should have been monotonically decreasing
Concluding remarks
• Assessment for predictive system management needs SIGNAL PROCESSING (at the moment more than control theory)
  • Shannon's law is in there?
  • Asynchronous sampling problem
• Our experiment: design flaws
  • TPC-W: closed loop
  • Result: coupling of workload and transfer characteristics
  • Too strong autocorrelation of client behavior
  • Methodology still valid
• Introducing dependability: "easy"