Proactivity = Observation + Analysis + Knowledge extraction + Action planning? András Pataricza, Budapest University of Technology and Economics. Contributors: Prof. G. Horváth (BME), I. Kocsis (BME), Z. Micskei (BME), K. Gáti (BME), Zs. Kocsis (IBM), I. Szombath (BME)
Contributors
• Prof. G. Horváth (BME)
• I. Kocsis (BME)
• Z. Micskei (BME)
• K. Gáti (BME)
• Zs. Kocsis (IBM)
• I. Szombath (BME)
• And many others
How can traditional signal processing help proactivity?
• Proactive stance: builds on foreknowledge (intelligence) and creativity to anticipate the situation as an opportunity, regardless of how threatening or how bad it looks; influences the system constructively instead of reacting
Reactivity vs. proactivity
• Reactive control: "acting in response to a situation rather than creating or controlling it"
• Proactive control: "controlling a situation rather than just responding to it after it has happened"
Test configuration
• Virtual desktop infrastructure
  • a few tens of VMs per host
  • a few tens of hosts per cluster
• vSphere monitoring and supervisory control
• Objective:
  • VM-level SLA control
  • Capacity planning
  • Proactive migration
• "CPU ready" metric:
  • The VM is ready to run, but lacks the resources to start
Performance monitoring
• Detecting a possible problem at the VM or host level
• A failure indicator as well
Actions to prevent a performance issue
• Add limits to neighbouring VMs
• Live-migrate the VM to another (underutilized) host
Aggregation over population
• Statistical cluster behavior versus QoS over the VM population
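A minimal sketch of one population-level aggregate: the share of VMs whose CPU-ready ratio exceeds a threshold at a given moment. The function name, the dictionary layout, and the 5% default are illustrative assumptions, not part of the monitored setup.

```python
def bad_vm_ratio(cpu_ready_by_vm, threshold=0.05):
    """Return the fraction of VMs whose CPU-ready ratio exceeds `threshold`.

    `cpu_ready_by_vm` maps a (hypothetical) VM name to its CPU-ready ratio,
    expressed as a fraction between 0 and 1.
    """
    if not cpu_ready_by_vm:
        raise ValueError("no VMs in the population")
    values = list(cpu_ready_by_vm.values())
    return sum(1 for v in values if v > threshold) / len(values)


# Hypothetical snapshot of four VMs on one host.
snapshot = {"vm-a": 0.01, "vm-b": 0.07, "vm-c": 0.20, "vm-d": 0.0}
ratio = bad_vm_ratio(snapshot)
```

Tracking this ratio over time gives a cluster-level series that can be compared with per-VM QoS.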
Mean of the goal VM metric (VM_CPU_READY)
• VM application:
  • Ready to run
  • Resource lack -> Performance bottleneck -> Availability problem
• VMware-recommended thresholds:
  • 5%: watching
  • 10%: action is typically needed
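The two thresholds can be turned into a simple three-way classification. This is only a sketch: the label strings, the function name, and the choice to treat the boundaries as inclusive are assumptions.

```python
WATCH_THRESHOLD = 0.05   # 5%: keep watching
ACTION_THRESHOLD = 0.10  # 10%: action is typically needed


def classify_cpu_ready(ratio):
    """Map a CPU-ready ratio (fraction of time) to 'ok', 'watch' or 'action'.

    Boundary values are treated as belonging to the more severe class;
    this is an assumption, the slides do not specify it.
    """
    if ratio >= ACTION_THRESHOLD:
        return "action"
    if ratio >= WATCH_THRESHOLD:
        return "watch"
    return "ok"
```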
The two traps
• Visual processing: you believe your eyes
• Automated processing: you believe your computer
Mean of the goal VM metric
• Statistics:
  • Mean: 0.007 -> a good system
  • Only 2/3 of the samples are error-free -> a bad system
  • After eliminating failure-free cases below the threshold: mean 0.023 -> a good system
• Visual inspection:
  • Lots of bad values
  • This is a bad system
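The following sketch shows how the same sample set can look good or bad depending on which statistic is read. It assumes that a sample equal to zero counts as error-free and uses 10% as the action threshold; both choices, the function name, and the sample values are illustrative.

```python
from statistics import mean


def summarize(samples, action_threshold=0.10):
    """Compute several views of the same CPU-ready samples.

    A sample of exactly 0 is treated as 'error-free' (an assumption).
    """
    nonzero = [s for s in samples if s > 0]
    return {
        "mean": mean(samples),
        "error_free_fraction": 1 - len(nonzero) / len(samples),
        "mean_of_nonzero": mean(nonzero) if nonzero else 0.0,
        "above_threshold_fraction":
            sum(1 for s in samples if s > action_threshold) / len(samples),
    }


# Hypothetical data: mostly zeros, one mild value, one severe spike.
stats = summarize([0.0, 0.0, 0.0, 0.0, 0.01, 0.2])
```

A low overall mean can coexist with a noticeable share of samples above the threshold, which is why a single aggregate is not enough to judge the system.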
Host shared and used memory over time
• Noisy…
• High-frequency components dominate
• But they correlate (93%!)
• YOU DON'T SEE IT
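A correlation that is invisible in a noisy plot is easy to compute. The sketch below implements the Pearson correlation coefficient in plain Python; the series names in the usage lines are hypothetical placeholders for the two memory metrics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples of host shared and host used memory.
shared_memory = [10.0, 12.5, 11.0, 14.0, 13.5]
used_memory = [20.5, 24.0, 22.5, 28.5, 27.0]
r = pearson(shared_memory, used_memory)
```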
… and a host of more mundane observations
• Computing power use = CPU use × CPU clock rate (const.)
• Should be purely proportional
• Correlation coefficient: 0.99998477434137
• Well visible, but numerically suppressed
• Origin???
Most important factor: host CPU usage mean
• Host CPU usage vs. the ratio of VMs with "bad" vCPU ready
Impacts of temporal resolution
• Nyquist–Shannon sampling theorem:
  • Sampling frequency = 2 × bandwidth
  • Sampling period = 20 sec -> Sampling frequency = 0.05 Hz -> Bandwidth = 0.025 Hz
• Additionally:
  • Sampling clock jitter (SW sampling)
  • Clock skew (distributed system)
  • Precision Time Protocol (PTP) (IEEE 1588-2008)
• No fine-granular prediction
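The sampling arithmetic can be checked with a few lines. This is a sketch with a hypothetical function name: the sampling frequency is the reciprocal of the sampling period, and the highest signal frequency that can be captured without aliasing is half of the sampling frequency.

```python
def nyquist_limits(sampling_period_s):
    """Return (sampling frequency, usable bandwidth) in Hz for a period in seconds."""
    if sampling_period_s <= 0:
        raise ValueError("sampling period must be positive")
    sampling_frequency_hz = 1.0 / sampling_period_s
    bandwidth_hz = sampling_frequency_hz / 2.0
    return sampling_frequency_hz, bandwidth_hz


# A 20 s sampling period gives 0.05 Hz sampling and 0.025 Hz bandwidth,
# so anything faster than one cycle per 40 s cannot be resolved.
fs, bw = nyquist_limits(20.0)
```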
Proactivity
• Proactivity needs:
  • Situation recognition based on historical experience
    • What is to be expected?
  • Identification of the principal factors
    • Single factor / multiple factors
  • Operation domains leading to failures
    • Boundaries
  • Predictor design
    • High failure coverage
    • Temporal lookahead sufficient for reaction
  • Design of reaction
Situations to be covered
• Single VM: application demand > resource allocated
• VM-host: overcommitment, overload due to other VMs
• VM-host-cluster
Data preparation
• Data cleaning
• Data reduction
Data reduction
• Huge initial set of samples
• Reduction
  • Object sampling: representative measurement objects
  • Parameter selection/reduction:
    • Aggregation
    • Relevance
    • Redundancy
  • Temporal
    • Sampling
    • Relevance
Object sampling
• In pursuit of discovering fine-grained behavior and the reasons for outliers
Subsample: ratio > 0 + random subsampling
• For presentation purposes only:
  • Reduction of the sample size to 400
  • Manageability
• Real-life analysis:
  • Keep enough data to maintain a proper correlation with the operation
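A minimal sketch of this two-step reduction: first keep only the records with a positive ratio, then draw a random subsample of at most 400 of them. The record layout (a dictionary with a `ratio` key), the function name, and the fixed seed are assumptions made for illustration.

```python
import random


def subsample(records, size=400, seed=0):
    """Keep records with ratio > 0, then randomly reduce them to `size` items.

    `records` is a list of dicts with a (hypothetical) 'ratio' field.
    A fixed seed keeps the selection reproducible.
    """
    nonzero = [r for r in records if r["ratio"] > 0]
    if len(nonzero) <= size:
        return nonzero
    return random.Random(seed).sample(nonzero, size)


# Hypothetical input: 3000 records, two thirds of them with ratio > 0.
records = [{"vm": i, "ratio": (i % 3) * 0.01} for i in range(3000)]
reduced = subsample(records)
```

For a real analysis the `size` argument would be chosen much larger, so that the reduced set still reflects the operation of the system.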
Visual multifactor analysis
• Visual analytics for an arbitrary number of factors
  • Inselberg, A.: Parallel Coordinates: Visual Multidimensional Geometry and Its Applications, Springer, 2009
• You can do much, much more:
  • Redundancy reduction
  • Correlation analysis
  • Clustering
  • Data mining
  • Approximation
  • Optimization
Prediction at the cluster level
• What ratio of the VMs will become problematic?
Pinpointed interval for one VM
• Situation of interest
• Training time > Prediction time
Classification error (simplest predictor)
• False alarm rate is low (dominant pattern)
• Feature set selection is critical to detection
• More is less (PROPER selection is needed, cf. PFARM 2010)
• Case separation for different situations
• Long-term prediction is hard (automated reactions)
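The error figures of a binary predictor can be derived from its confusion counts. The sketch below is not the predictor used in the study; it only shows how the false alarm rate and the detection rate are computed, with label 1 standing for a problematic situation (an assumed encoding).

```python
def error_rates(actual, predicted):
    """Return (false alarm rate, detection rate) for binary labels (1 = problem)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    # False alarms are measured against all truly problem-free cases,
    # detections against all truly problematic cases.
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    return false_alarm_rate, detection_rate


# Hypothetical ground truth and predictions for six intervals.
far, dr = error_rates([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```

When problem-free intervals dominate the data, a low false alarm rate says little by itself, so the detection rate should be reported alongside it.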
Case study – Connectivity testing in large networks
• In dynamic infrastructures the active internode topology has to be discovered as well…
Large networks
• Not known explicitly
• Too complex for conventional algorithms
• Social network graph
  • Yahoo! Instant Messenger friend connectivity graph*
  • 1.8M nodes, ~4M edges
  • Serves as a model of large infrastructures
• Typical power-law network
  • 75% of the friendships are related to 35% of users
* Yahoo! Research Alliance Webscope program, ydata-yim-friends-graph-v1_0, http://research.yahoo.com/Academic_Relations
Typical model: random graphs
[Figure: adjacency matrices shown in random order and ordered by degree; limit: graphon. Panels: Yahoo! Instant Messenger dataset; preferential attachment graph.]
Approximating edge density by subgraph sampling
• Graph with 800 nodes, 320000 edges
• Subgraph sampling method: random induced subgraph
  • Take k random nodes
  • Repeat n times
• Sample size k = 35, repeated n = 20 times: 2% error with 4% of the graph examined
[Figure: relative error as a function of sample size (k) and number of samples (n); white: error < 5%. Inset: a random k = 4 sample.]
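The sampling method can be sketched as follows: draw k random nodes, count the edges of the induced subgraph, divide by the number of possible node pairs, and average over n repetitions. The adjacency-dictionary representation, the function name, and the seed are assumptions made for illustration; the graph is treated as undirected.

```python
import random
from itertools import combinations


def estimate_edge_density(adj, k, n, seed=0):
    """Estimate the edge density of an undirected graph by induced subgraph sampling.

    `adj` maps each node to the set of its neighbours.
    `k` is the sample size (nodes per sample), `n` the number of samples.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    nodes = list(adj)
    possible_pairs = k * (k - 1) // 2
    estimates = []
    for _ in range(n):
        sample = rng.sample(nodes, k)  # k distinct random nodes
        edges = sum(1 for u, v in combinations(sample, 2) if v in adj[u])
        estimates.append(edges / possible_pairs)
    return sum(estimates) / n
```

Each repetition inspects only k(k-1)/2 node pairs, so the cost depends on k and n and not on the size of the full graph.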
Neighborhood sampling: fault-tolerant services
• Neighborhood sampling
  • Take random nodes
  • Explore the neighborhood to a given depth (m)
• Trends
  • No. of 3- and 4-cycles = possible redundancy
  • A high-degree node has many substitute nodes (e.g. load balancer)
  • Distributions approximated from samples are very close!
[Figure labels: root node; redundancy?; fault-tolerant domain.]
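A minimal sketch of neighborhood sampling: pick random root nodes, explore each neighborhood breadth-first to depth m, and count the 3-cycles inside it as a rough redundancy indicator. The function names, the adjacency-dictionary representation, and the restriction to 3-cycles are assumptions; counting 4-cycles would need a separate routine.

```python
import random
from collections import deque
from itertools import combinations


def neighborhood(adj, root, depth):
    """Return the set of nodes within `depth` hops of `root` (breadth-first)."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adj[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return set(distance)


def count_triangles(adj, nodes):
    """Count 3-cycles among `nodes` in an undirected graph."""
    return sum(
        1
        for a, b, c in combinations(list(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def sample_neighborhoods(adj, roots, depth, seed=0):
    """Triangle counts in the depth-limited neighborhoods of random root nodes."""
    rng = random.Random(seed)
    chosen = rng.sample(list(adj), roots)
    return {r: count_triangles(adj, neighborhood(adj, r, depth)) for r in chosen}
```

The brute-force triangle count is cubic in the neighborhood size, which is acceptable only for small depths.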
Summary: proactivity needs
• Observations
  • All relevant cases (stress test)
• Analysis
  • Check of input data
  • Visual analysis
  • UNDERSTANDING
  • Automated methods for calculation
• Knowledge extraction
  • Clustering (situation recognition)
  • Predictor (generalization)
• Action planning
  • Situation-defining principal factors are indicative
Thank you for your attention