Process Monitoring with Supervised Learning and Artificial Contrasts
Wookyeon Hwang, Univ. of South Carolina
George Runger, Industrial Engineering; Industrial, Systems, and Operations Engineering; School of Computing, Informatics, and Decision Systems Engineering, Arizona State University
Eugene Tuv, Intel
runger@asu.edu
Statistical Process Control/Anomaly Detection
• Objective is to detect change in a system
• Transportation, environmental, security, health, processes, etc.
• In the modern approach, leverage massive data
• Continuous, categorical, missing, outliers, nonlinear relationships
• Goal is a widely applicable, flexible method
• Normal conditions and fault type unknown
• Capture relationships between multiple variables
• Learn patterns, exploit patterns
• Traditional Hotelling’s T² captures structure, provides a control region (boundary), quantifies false alarms
Traditional Monitoring
• Traditional approach is Hotelling’s (1948) T-squared chart
• Numerical measurements, based on multivariate normality
• Simple elliptical pattern (Mahalanobis distance)
• Time-weighted extensions: exponentially weighted moving average and cumulative sum
• More efficient, but same elliptical patterns
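The T-squared statistic on this slide is the squared Mahalanobis distance of an observation from the in-control mean. A minimal sketch follows, assuming NumPy and an illustrative bivariate reference sample; the function name `hotelling_t2`, the simulated data, and the large-sample chi-square limit are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def hotelling_t2(reference, new_points):
    """T^2 = (x - mean)' S^{-1} (x - mean), with mean and S estimated from reference data."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = np.atleast_2d(new_points) - mean
    # Solve S z = diff' instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, solved)

rng = np.random.default_rng(0)
# Illustrative in-control data: two correlated normal variables (assumed example).
reference = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)

t2_center = hotelling_t2(reference, reference.mean(axis=0))
t2_far = hotelling_t2(reference, np.array([3.0, -3.0]))

# Large-sample 95% limit: chi-square quantile with 2 degrees of freedom is -2 ln(0.05).
limit = -2.0 * np.log(0.05)
signals = t2_far > limit
```

The contour `T^2 = limit` is an ellipse, which is the "simple elliptical pattern" the slide refers to.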
Transform to Supervised Learning
• Process monitoring can be transformed to a supervised learning problem
• One approach: supplement with artificial, contrasting data
• Any one of multiple learners can be used, without pre-specified faults
• Results can generalize monitoring in several directions, such as arbitrary (nonlinear) in-control conditions, fault knowledge, and categorical variables
• High-dimensional problems can be handled with an appropriate learner
Learn Process Patterns
• Learn pattern compared to a “structureless” alternative
• Generate noise: artificial data without structure, to differentiate
• For example, f(x) = f1(x1) f2(x2) …, the joint distribution as a product of marginals (enforces independence)
• Or f(x) = product of uniforms
• Define and assign y = +/-1 to “actual” and “artificial” data (artificial contrast)
• Use a supervised (classification) learner to distinguish the data sets
• Only simple examples used here
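One way to build the artificial contrast described on this slide can be sketched as follows, assuming NumPy. Sampling from the product of marginals is approximated here by permuting each column independently, which is an assumption about the sampling scheme; the function name `artificial_contrast` and the simulated data are hypothetical.

```python
import numpy as np

def artificial_contrast(actual, rng, method="marginals"):
    """Stack actual data (y = +1) with structureless artificial data (y = -1)."""
    n, p = actual.shape
    if method == "marginals":
        # Permuting each column on its own keeps every marginal distribution
        # but destroys the dependence between variables.
        artificial = np.column_stack(
            [rng.permutation(actual[:, j]) for j in range(p)]
        )
    else:
        # Product of uniforms over the observed range of each variable (assumed range).
        lo, hi = actual.min(axis=0), actual.max(axis=0)
        artificial = rng.uniform(lo, hi, size=(n, p))
    X = np.vstack([actual, artificial])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

rng = np.random.default_rng(1)
actual = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X, y = artificial_contrast(actual, rng)
```

Any classifier trained on `(X, y)` can then only separate the two classes by using the dependence structure of the actual data, which is the pattern to be learned.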
Learn Pattern from Artificial Contrast (figure)
Regularized Least Squares (Kernel Ridge) Classifier with Radial Basis Functions
• Model with a linear combination of basis functions
• Smoothness penalty controls complexity
• Closely related to Support Vector Machines (SVM)
• Regularized least squares allows a closed-form solution but gives up sparsity to get it; may not want to trade!
• Previous example: a challenge for a generalized learner, multivariate normal data!
• Figure: f(x) plotted over x1 and x2
RLS Classifier
• [model equation not recovered] where [kernel equation not recovered] with parameters γ, σ
• Solution: [equation not recovered]
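The slide's own equations did not survive in this text, so the sketch below uses the standard textbook form of a regularized least squares classifier as an assumption: f(x) = Σ c_i K(x, x_i) with a Gaussian kernel K(x, x') = exp(-||x - x'||² / (2σ²)), and coefficients solving (K + nγI)c = y. The exact kernel scaling, the mapping of the slides' "Complexity" value onto γ, and all function names are assumptions, and NumPy is assumed.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def rlsc_fit(X, y, gamma, sigma2):
    """Closed-form coefficients: solve (K + n*gamma*I) c = y."""
    n = len(y)
    K = rbf_kernel(X, X, sigma2)
    return np.linalg.solve(K + n * gamma * np.eye(n), y)

def rlsc_predict(X_train, coef, X_new, sigma2):
    """Evaluate f(x) = sum_i c_i K(x, x_i); the sign of f assigns +1 or -1."""
    return rbf_kernel(X_new, X_train, sigma2) @ coef

rng = np.random.default_rng(0)
actual = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=300)
artificial = rng.uniform(-4.0, 4.0, size=(600, 2))  # structureless contrast (assumed range)
X = np.vstack([actual, artificial])
y = np.concatenate([np.ones(300), -np.ones(600)])

n = len(y)
gamma, sigma2 = 4.0 / n, 5.0  # illustrative values only
coef = rlsc_fit(X, y, gamma, sigma2)
scores = rlsc_predict(X, coef, X, sigma2)
f_origin, f_corner = rlsc_predict(X, coef, np.array([[0.0, 0.0], [3.5, -3.5]]), sigma2)
```

The zero contour of the fitted function plays the role of the control boundary. Every training point contributes a coefficient, which is the loss of sparsity relative to an SVM.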
Patterns Learned from Artificial Contrast: RLSC
• True Hotelling’s 95% probability bound
• Red: learned contour function to assign +/-1
• Actual: n = 1000
• Artificial: n = 2000
• Complexity: 4/3000
• σ² = 5
More Challenging Example with Hotelling’s Contour (figure)
Patterns Learned from Artificial Contrast: RLSC
• Actual: n = 1000
• Artificial: n = 2000
• Complexity: 4/3000
• σ² = 5
Patterns Learned from Artificial Contrast: RLSC
• Actual: n = 1000
• Artificial: n = 1000
• Complexity: 4/2000
• σ² = 5
RLSC for p = 10 dimensions
Tree-Based Ensembles, p = 10
• Alternative learner
• Works with mixed data
• Elegantly handles missing data
• Scale invariant
• Outlier resistant
• Insensitive to extraneous predictors
• Provides an implicit ability to select key variables
Nonlinear Patterns
• Hotelling’s boundary is not a good solution when patterns are not linear
• Control boundaries from supervised learning capture the normal operating condition
Tuned Control
• Extend to incorporate specific process knowledge of faults
• Artificial contrasts generated from the specified fault distribution
• or from a mixture of samples from different fault distributions
• Numerical optimization to design a control statistic can be very complicated
• maximizes the likelihood function under a specified fault (alternative)
Tuned Control
• Fault: means of both variables x1 and x2 are known to increase
• Artificial data (black) are sampled from 12 independent normal distributions
• Mean vectors are selected from a grid over the area [0, 3] x [0, 3]
• Learned control region is shown in the right panel; it approximately matches the theoretical result in Testik et al., 2004.
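A rough sketch of how fault-directed artificial data like this could be generated, assuming NumPy. The slide gives 12 distributions and the area [0, 3] x [0, 3] but not the grid layout or the variances, so the 4 x 3 grid, unit standard deviation, and the function name `fault_contrast` are all assumptions.

```python
import numpy as np

def fault_contrast(n_per_mean, rng, nx=4, ny=3, low=0.0, high=3.0, scale=1.0):
    """Sample artificial data from normals whose means lie on a grid of shifted means."""
    grid1 = np.linspace(low, high, nx)
    grid2 = np.linspace(low, high, ny)
    means = np.array([(a, b) for a in grid1 for b in grid2])
    # One independent bivariate normal sample per grid point (assumed unit variance).
    samples = [rng.normal(loc=m, scale=scale, size=(n_per_mean, 2)) for m in means]
    return means, np.vstack(samples)

rng = np.random.default_rng(2)
means, fault_data = fault_contrast(n_per_mean=100, rng=rng)
# fault_data would be labeled y = -1 and contrasted with in-control data labeled y = +1.
```

Because the artificial class now represents the expected fault directions instead of generic noise, the learned boundary is tuned toward upward mean shifts.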
Incorporate Time-Weighted Rules
• What form of statistic should be filtered and monitored?
• Log likelihood ratio
• Some learners provide class probability estimates
• Bayes’ theorem (for equal sample size) gives [equation not recovered]
• Log likelihood ratio for an observation x_t estimated as [equation not recovered]
• Apply EWMA (or CUSUM, etc.) to l_t
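The filtering step can be sketched with the standard library only. With equal class sizes, the ratio of estimated class probabilities estimates the likelihood ratio; the sign convention below (artificial over actual, so larger values point away from normal operation), the clipping constant, the smoothing weight, and the function names are assumptions, since the slide's equations are not present in this text.

```python
import math

def log_likelihood_ratio(p_artificial, eps=1e-6):
    """l_t = log(p / (1 - p)), where p is the estimated probability of the artificial class."""
    p = min(max(p_artificial, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p / (1.0 - p))

def ewma(values, lam=0.2, start=0.0):
    """Exponentially weighted moving average: z_t = lam * v_t + (1 - lam) * z_{t-1}."""
    z, out = start, []
    for v in values:
        z = lam * v + (1.0 - lam) * z
        out.append(z)
    return out

# Illustrative probability estimates: in control first, then a shift.
probs = [0.1] * 5 + [0.9] * 5
l_values = [log_likelihood_ratio(p) for p in probs]
smoothed = ewma(l_values)
```

A signal would be raised when the smoothed statistic crosses a control limit chosen to give the desired in-control average run length.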
Time-Weighted ARLs
• ARLs for selected schemes applied to the l_t statistic
• 10-dimensional, independent normal
Example: 50 Dimensions (figure)
Example: 50 Dimensions
• Hotelling’s: left
• Artificial contrast: right
Example: Credit Data (UCI)
• 20 attributes: 7 numerical and 13 categorical
• Associated class label of “good” or “bad” credit risk
• Artificial data generated from continuous and discrete uniform distributions, respectively, independently for each attribute
• Ordered by 300 “good” instances followed by 300 “bad”
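Generating uniform artificial data for mixed attribute types can be sketched with the standard library. The toy rows and column meanings below are hypothetical and are not the UCI attributes; using the observed range and the observed levels as the support of each uniform distribution is also an assumption.

```python
import random

def uniform_contrast(rows, categorical_cols, n_artificial, rng):
    """Draw each attribute independently: continuous uniform for numeric columns,
    discrete uniform over observed levels for categorical columns."""
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    levels = {j: sorted(set(columns[j])) for j in categorical_cols}
    artificial = []
    for _ in range(n_artificial):
        new_row = []
        for j in range(n_cols):
            if j in categorical_cols:
                new_row.append(rng.choice(levels[j]))
            else:
                new_row.append(rng.uniform(min(columns[j]), max(columns[j])))
        artificial.append(tuple(new_row))
    return artificial

rng = random.Random(0)
# Hypothetical two-attribute records: (amount, housing status).
rows = [(1200.0, "own"), (350.0, "rent"), (5000.0, "own"), (800.0, "free")]
artificial = uniform_contrast(rows, categorical_cols={1}, n_artificial=50, rng=rng)
```

The actual rows would be labeled +1 and the artificial rows -1 before training a learner that accepts mixed data, such as a tree-based ensemble.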
Artificial Contrasts for Credit Data
• Plot of l_t over time
Diagnostics: Contribution Plots
• 50 dimensions: 2 contributors, 48 noise variables (scatter plot projections to contributor variables)
Contributor Plots from PCA T² (figure)
Contributor Plots from PCA SPE (figure)
Contributor Plots from Artificial Contrast Ensemble (ACE)
• Impurity importance weighted by Δ means of split variable
Contributor Plots for Nonlinear System
• Contributor plots from SPE, T² and ACE in left, center, right, respectively
Conclusions
• Can/must leverage the automated-ubiquitous, data-computational environment
• Professional obsolescence
• Employ a flexible, powerful control solution for broad applications: environment, health, security, etc., as well as manufacturing
• “Normal” sensors not obvious, patterns not known
• Include automated diagnosis
• Tools to filter to identify contributors
• Computational feasibility in embedded software
This material is based upon work supported by the National Science Foundation under Grant No. 0355575.