Efficient and Accurate Anomaly Identification Using Reduced Metric Space in Cloud Computing Systems. Qiang Guan, Ziming Zhang and Song Fu, University of North Texas.
Introduction • Anomaly detection is a vital element of operations in large-scale datacenters. • It detects patterns in a given data set that do not conform to an established normal behavior.
Challenges • Continuous monitoring and large system scale produce an overwhelming volume of data collected by health monitoring tools. • The large number of measured metrics makes the data model extremely complex. • High metric dimensionality causes low detection accuracy and high computational complexity.
This paper • Presents a metric selection framework for online anomaly detection in utility clouds. • Selects the most essential metrics by applying metric selection and extraction methods. • Identifies anomalies using an incremental clustering approach. • Implements a prototype and evaluates its performance.
Dimensionality Reduction • Transforms the collected health-related performance data to a new metric space with only the most important metrics preserved. • In this paper: • Metric selection using mutual information. • Metric extraction by metric space combination and separation.
Metric Selection • Select the best subset of the original metric set based on mutual information. • The mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables.
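For reference, the standard definition of mutual information for two discrete random variables X and Y is given below. The slide does not show the formula, so the notation here (p(x, y) for the joint distribution, p(x) and p(y) for the marginals) is an assumption taken from the textbook definition rather than from the paper.

```latex
I(X;Y) \;=\; \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
```

I(X;Y) is zero exactly when X and Y are independent, and it grows as knowing one variable tells more about the other.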
Metric Selection (Cont.) • However, finding the optimal metric subset is NP-hard. =>
Incremental Search Method • Given S_{k-1}, select as the k-th metric the one from the remaining metrics (M − S_{k-1}) that maximizes dependency(). • → S_1 ⊂ S_2 ⊂ ... ⊂ S_n
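The greedy step on this slide can be sketched as follows. The arguments of dependency() did not survive on the slide, so the score used here (relevance to the label minus mean redundancy with the metrics already selected, both measured by mutual information) is an assumption for illustration, not necessarily the authors' criterion. The names `mutual_information`, `incremental_search`, and the toy metrics are hypothetical, and the metrics are assumed to be discrete.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) of two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def incremental_search(metrics, labels, k):
    """Greedily build S_1 ⊂ S_2 ⊂ ... ⊂ S_k; returns metric names in pick order.

    ASSUMPTION: the score is relevance minus mean redundancy, a stand-in for
    the slide's dependency() whose exact form is not given.
    """
    selected = []
    remaining = set(metrics)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(metrics[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(metrics[name], metrics[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        # sorted() makes tie-breaking deterministic
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: the label is determined by metrics "a" and "b";
# "a_copy" duplicates "a", and "c" is unrelated to the label.
toy_metrics = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "c":      [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_labels = [0, 0, 1, 1, 2, 2, 3, 3]
picked = incremental_search(toy_metrics, toy_labels, 2)
```

Because each step only adds one metric to the previous subset, the prefixes of the returned list form the nested subsets S_1 ⊂ S_2 ⊂ ... on the slide.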
Incremental Search Method (Cont.) • S_{n*} • Find the range of i where the cross-validation error err_i has a small mean and a small variance. • err* = min(err_i) • n* is the smallest i for which S_i attains err*.
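One way to read the rule above is sketched below. The per-fold error lists, the `max_variance` cut-off that defines "small variance", and the function name `choose_n_star` are all assumptions made for illustration; the slide does not say how the candidate range is chosen.

```python
from statistics import mean, pvariance

def choose_n_star(fold_errors, max_variance):
    """fold_errors[i-1] holds the cross-validation errors of subset S_i.

    Returns (n_star, err_star). ASSUMPTION: the candidate range is every i
    whose error variance across folds is at most max_variance.
    """
    candidates = [
        (i, mean(errs))
        for i, errs in enumerate(fold_errors, start=1)
        if pvariance(errs) <= max_variance
    ]
    if not candidates:
        raise ValueError("no subset satisfies the variance cut-off")
    err_star = min(err for _, err in candidates)
    # the smallest i that attains err*
    n_star = min(i for i, err in candidates if err == err_star)
    return n_star, err_star

# Hypothetical errors for S_1..S_4 over two folds
example_errors = [[0.30, 0.34], [0.10, 0.12], [0.10, 0.12], [0.02, 0.40]]
n_star, err_star = choose_n_star(example_errors, max_variance=0.01)
```

In this toy input, S_4 is excluded because its error varies too much across folds, and S_2 is preferred over S_3 because it reaches the same mean error with fewer metrics.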
Metric Extraction • Creates new metrics by transformation or combination of the original metrics. • Two methods: • Metric space combination • Metric space separation
Metric Space Combination • Dataset D = [x_1, x_2, …, x_L] • Record x_i = [x_{1,i}, x_{2,i}, …, x_{n,i}]^T • Covariance matrix of D: V = DD^T • Calculate the eigenvalues {λ_i} of V and sort them in descending order. • Choose n’ by:
Metric Space Combination (Cont.) • The eigenvectors corresponding to the n’ largest eigenvalues are the new metrics. • Apply the Gram-Schmidt orthogonalization process to compute the eigenvectors {e_j}.
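The combination steps can be sketched with NumPy as below. Two points are assumptions: the rule for choosing n’ is missing from the slide, so a cumulative-variance threshold (`variance_kept`, hypothetical) is used in its place, and `numpy.linalg.eigh` is used to obtain orthonormal eigenvectors instead of an explicit Gram-Schmidt pass. D is assumed to hold one record per column, and it is centered before forming V = DD^T.

```python
import numpy as np

def combine_metric_space(D, variance_kept=0.95):
    """D has shape (n_metrics, L_records). Returns (E, reduced).

    E holds the n' leading eigenvectors as columns; reduced is the data
    projected onto them. ASSUMPTION: n' is the smallest count of leading
    eigenvalues whose sum reaches variance_kept of the total.
    """
    D = np.asarray(D, dtype=float)
    Dc = D - D.mean(axis=1, keepdims=True)      # center each metric
    V = Dc @ Dc.T                               # V = D D^T (unnormalized covariance)
    eigvals, eigvecs = np.linalg.eigh(V)        # ascending order, orthonormal vectors
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_prime = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(eigvals))
    E = eigvecs[:, :n_prime]
    return E, E.T @ Dc

# Toy data: the second metric is twice the first, the third is tiny noise
t = np.arange(10.0)
D_toy = np.vstack([t, 2 * t, 0.01 * (-1) ** np.arange(10)])
E_toy, reduced_toy = combine_metric_space(D_toy, variance_kept=0.95)
```

On the toy data a single combined metric captures nearly all of the variance, so three original metrics collapse to one.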
Metric Space Separation • Separate desired data from mixed data. • Record x = [x_1, x_2, …, x_L]^T • Component e = [e_1, e_2, …, e_{n’}]^T • x = Ae → e = Wx • Find an optimal transformation matrix W so that the {e_j} are maximally independent.
Metric Space Separation (Cont.) • Independent component analysis (ICA) • A computational method for separating a multivariate signal into additive subcomponents. • A special case of blind source separation.
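The relation x = Ae → e = Wx can be illustrated with a small NumPy example. This is only the algebra of the mixing model, not ICA itself: the mixing matrix `A_toy` is hypothetical and known here, so W is simply its inverse, whereas ICA has to estimate W from x alone by making the components as independent as possible.

```python
import numpy as np

def mix(A, e):
    """Observed records x = A e (each column is one record)."""
    return A @ e

def unmix(W, x):
    """Recovered components e = W x."""
    return W @ x

# ASSUMPTION: a known, invertible 2x2 mixing matrix, for illustration only
A_toy = np.array([[1.0, 0.5],
                  [0.3, 1.0]])
e_toy = np.array([[1.0, -1.0,  2.0, 0.0],
                  [0.5,  0.5, -1.0, 3.0]])

x_toy = mix(A_toy, e_toy)
W_toy = np.linalg.inv(A_toy)      # ideal unmixing matrix when A is known
e_recovered = unmix(W_toy, x_toy)
```

When A is unknown, an ICA routine plays the role of `np.linalg.inv` here and returns an estimate of W, typically only up to the order and scale of the components.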
Incremental Clustering • Data points are considered one at a time and assigned to existing groups without significantly affecting those groups. • “A data point goes into the nearest group if the Euclidean distance between this point and the centroid of the group is smaller than δ; else create a new group.” • Update the centroid after a new point comes in. • Adjust δ if cloud operators find false positives: normal points assigned to an anomaly group.
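The assignment rule quoted above can be sketched in plain Python as follows. The class name `IncrementalClusterer`, the running-mean centroid update, and the toy points are assumptions for illustration; the slide does not specify how the centroid is updated or how δ is adjusted, so δ is just a settable attribute here.

```python
import math

class IncrementalClusterer:
    """One-pass clustering: each point joins its nearest group or starts a new one."""

    def __init__(self, delta):
        self.delta = delta      # distance threshold δ; operators may adjust it later
        self.centroids = []     # one centroid tuple per group
        self.counts = []        # number of points in each group

    def assign(self, point):
        """Assign one point and return the index of its group."""
        point = tuple(float(v) for v in point)
        if self.centroids:
            nearest = min(
                range(len(self.centroids)),
                key=lambda g: math.dist(point, self.centroids[g]),
            )
            if math.dist(point, self.centroids[nearest]) < self.delta:
                # ASSUMPTION: the centroid is updated as a running mean
                n = self.counts[nearest] + 1
                old = self.centroids[nearest]
                self.centroids[nearest] = tuple(
                    c + (p - c) / n for c, p in zip(old, point)
                )
                self.counts[nearest] = n
                return nearest
        # no group is close enough: create a new group
        self.centroids.append(point)
        self.counts.append(1)
        return len(self.centroids) - 1

clusterer = IncrementalClusterer(delta=1.0)
groups = [clusterer.assign(p) for p in [(0, 0), (0.5, 0), (10, 10), (0.3, 0.1)]]
```

Each point is processed once and only one centroid changes per assignment, which is what keeps the approach suitable for online use.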
Experiment Setting • 362 servers. • Each server hosts up to ten VMs. • Benchmarks: • RUBiS distributed online service benchmark • MapReduce jobs • Fault injection • CPU, memory, disk, and network faults.
Experiment Setting (Cont.) • Monitoring tools • sysstat: runtime performance data in Dom0 • Modified perf: performance counters from the hypervisor. • 518 metrics in total (182 + 336). • However, only 406 are non-constant. • Monitored every minute from 2011/01/20 to 2011/08/11.
Metric Selection Result • 406 → 14 • Metric space reduced by 96.6%
Metric Extraction Results • Metric extraction and metric selection vs. metric extraction only.
Conclusion • Anomaly detection is important for self-managing cloud resources and enhancing system dependability. • The paper presents a metric selection framework with metric selection and extraction mechanisms. • The selected and extracted metric set contributes to highly efficient and accurate anomaly detection.