Feature Extraction for Outlier Detection in High-Dimensional Spaces

Hoang Vu Nguyen, Vivekanand Gopalkrishnan


Presentation Transcript


  1. Feature Extraction for Outlier Detection in High-Dimensional Spaces
     Hoang Vu Nguyen, Vivekanand Gopalkrishnan

  2. Motivation
     • Outlier detection techniques
       • Compute distances between points in the full feature space
       • Suffer from the curse of dimensionality
       • Solution: feature extraction
     • Feature extraction techniques
       • Do not consider class imbalance
       • Not suitable for asymmetric classification (and outlier detection!)

  3. Overview
     • DROUT
       • Dimensionality Reduction/Feature Extraction for OUTlier Detection
       • Extracts features for the detection process
       • To be integrated with outlier detectors
     [Diagram: training set → DROUT → features → detector; testing set → detector → outliers]

  4. Background
     • Training set:
       • Normal class ω_m: cardinality N_m, mean vector μ_m, covariance matrix Σ_m
       • Anomaly class ω_a: cardinality N_a, mean vector μ_a, covariance matrix Σ_a
       • N_m >> N_a
       • Total number of points: N_t = N_m + N_a
     • Within-class scatter: Σ_w = (N_m/N_t) Σ_m + (N_a/N_t) Σ_a
     • Between-class scatter: Σ_b = (N_m/N_t) (μ_m - μ_t)(μ_m - μ_t)^T + (N_a/N_t) (μ_a - μ_t)(μ_a - μ_t)^T, where μ_t is the mean vector of all N_t points
     • Total scatter: Σ_t = Σ_w + Σ_b
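The scatter matrices on slide 4 can be sketched in NumPy as follows. This is only an illustration under stated assumptions: μ_t is taken to be the mean of all N_t points, covariances use the 1/N normalization, and the function name `scatter_matrices` and the synthetic data are hypothetical, not from the slides.

```python
import numpy as np

def scatter_matrices(X_m, X_a):
    """Return (S_w, S_b, S_t) for normal points X_m and anomalous points X_a.

    Rows are points, columns are features. Covariances use the 1/N
    normalization (bias=True), which is an assumption of this sketch.
    """
    N_m, N_a = len(X_m), len(X_a)
    N_t = N_m + N_a
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    mu_t = (N_m * mu_m + N_a * mu_a) / N_t  # assumed: mean of all points
    S_m = np.cov(X_m, rowvar=False, bias=True)
    S_a = np.cov(X_a, rowvar=False, bias=True)
    S_w = (N_m / N_t) * S_m + (N_a / N_t) * S_a
    d_m, d_a = mu_m - mu_t, mu_a - mu_t
    S_b = (N_m / N_t) * np.outer(d_m, d_m) + (N_a / N_t) * np.outer(d_a, d_a)
    return S_w, S_b, S_w + S_b

# Synthetic, imbalanced training set (illustrative only).
rng = np.random.default_rng(0)
X_m = rng.normal(size=(200, 5))
X_a = rng.normal(loc=3.0, size=(10, 5))
S_w, S_b, S_t = scatter_matrices(X_m, X_a)
```

With the 1/N normalization, Σ_t computed this way coincides with the covariance matrix of the pooled training set, which gives a quick sanity check.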

  5. Background (cont.)
     • Eigenspace of a scatter matrix Σ (spanned by its eigenvectors)
       • Consists of 3 subspaces: principal, noise, and null
       • Solving the eigenvalue problem yields d eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
       • The noise and null subspaces are caused by noise and, mainly, by insufficient training data
     • Existing methods: discard the noise and null subspaces → loss of information
     • Jiang et al. 2008: regularize all 3 subspaces before performing feature extraction
     [Figure: plot of eigenvalues against index 1…d, divided into the principal subspace P (1…m), noise subspace N (m+1…r), and null subspace Ø (r+1…d)]

  6. DROUT Approach
     • Weight-adjusted within-class scatter matrix
       • Σ_w = (N_m/N_t) Σ_m + (N_a/N_t) Σ_a
       • N_m >> N_a → Σ_a is far less reliable than Σ_m
       • Weighting Σ_m and Σ_a by (N_m/N_t) and (N_a/N_t):
         → when doing feature extraction on Σ_w (using PCA etc.), dimensions (eigenvectors) specified mainly by small eigenvalues of Σ_m are unexpectedly removed
         → the extracted dimensions are not really relevant for the asymmetric classification task
       • Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009
     • Solution
       • Σ_w = w_m Σ_m + w_a Σ_a
       • w_m < w_a and w_m + w_a = 1
       • More suitable for asymmetric classification
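A minimal sketch of the weight-adjusted within-class scatter matrix, assuming NumPy. The defaults w_m = 0.1 and w_a = 0.9 are the parameter settings reported on slide 14; the function name `weighted_within_scatter`, the 1/N covariance normalization, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def weighted_within_scatter(X_m, X_a, w_m=0.1, w_a=0.9):
    """Sigma_w = w_m * Sigma_m + w_a * Sigma_a with w_m < w_a, w_m + w_a = 1."""
    if not (w_m < w_a and np.isclose(w_m + w_a, 1.0)):
        raise ValueError("need w_m < w_a and w_m + w_a = 1")
    S_m = np.cov(X_m, rowvar=False, bias=True)  # normal-class covariance
    S_a = np.cov(X_a, rowvar=False, bias=True)  # anomaly-class covariance
    return w_m * S_m + w_a * S_a

# Synthetic, imbalanced training set (illustrative only).
rng = np.random.default_rng(0)
X_m = rng.normal(size=(450, 4))
X_a = rng.normal(loc=2.0, size=(50, 4))
S_w = weighted_within_scatter(X_m, X_a)
```

Because the weights no longer follow the class sizes, the small anomaly class is not swamped by the large normal class when Σ_w is decomposed.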

  7. DROUT Approach (cont.)
     • Which matrix to regularize first?
       • Goal: extract features that minimize the within-class variance and maximize the between-class variance
       • Within-class variances are estimated from limited training data
         → small estimated variances tend to be unstable and cause overfitting
         → proceed by regularizing the 3 subspaces of the adjusted within-class scatter matrix

  8. DROUT Approach (cont.)
     • Subspace decomposition
       • Solve the eigenvalue problem on the (weight-adjusted) Σ_w to obtain eigenvectors {e_1, e_2, …, e_d} with corresponding eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
       • Identify m:
         • v_med = median_{i ≤ r} {v_i}
         • v_{m+1} = max_{i ≤ r} {v_i | v_i < 2 v_med - v_r}
     [Figure: plot of eigenvalues against index 1…d, divided into the principal subspace P (1…m), noise subspace N (m+1…r), and null subspace Ø (r+1…d)]
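The rule for identifying m can be sketched as below. The slides do not say how r is obtained; this sketch assumes r is the numerical rank (the number of eigenvalues above a small relative tolerance). The function name `split_index` and the `tol` parameter are hypothetical.

```python
import numpy as np

def split_index(eigvals, tol=1e-10):
    """Return (m, r) for eigenvalues already sorted in descending order.

    r: number of eigenvalues treated as nonzero (assumption: numerical rank).
    m: size of the principal subspace, chosen so that v_{m+1} is the largest
       eigenvalue below 2 * v_med - v_r.
    """
    v = np.asarray(eigvals, dtype=float)
    r = int(np.sum(v > tol * v[0]))
    v_med = np.median(v[:r])
    threshold = 2.0 * v_med - v[r - 1]  # v[r - 1] is v_r in 1-based notation
    below = np.nonzero(v[:r] < threshold)[0]
    # v is descending, so the first index below the threshold holds v_{m+1};
    # a 0-based index k corresponds to v_{k+1}, hence m = k.
    m = int(below[0]) if below.size else r
    return m, r

v = [10.0, 8.0, 6.0, 1.0, 0.9, 0.8, 0.7, 0.0, 0.0]
m, r = split_index(v)
```

For this toy spectrum the median of the seven nonzero eigenvalues is 1.0, so the threshold is 2(1.0) - 0.7 = 1.3 and the first eigenvalue below it is v_4 = 1.0, giving m = 3 and r = 7. In degenerate spectra the rule can return a very small m, which a real implementation would need to guard against.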

  9. DROUT Approach (cont.)
     • Subspace regularization
       • a = v_1 v_m (m - 1)/(v_1 - v_m)
       • b = (m v_m - v_1)/(v_1 - v_m)
       • Regularize:
         • i ≤ m: x_i = v_i
         • m < i ≤ r: x_i = a/(i + b)
         • r < i ≤ d: x_i = a/(r + 1 + b)
       • A = [w_i e_i]_{1 ≤ i ≤ d}, where w_i = 1/sqrt(x_i)
     [Figure: eigenvalue index axis 1…d, divided into the principal subspace P (1…m), noise subspace N (m+1…r), and null subspace Ø (r+1…d)]
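A worked sketch of the regularization formulas, assuming NumPy. Note that the model a/(i + b) passes through v_1 at i = 1 and v_m at i = m, so the regularized spectrum continues the principal eigenvalues smoothly. The function names `regularized_eigenvalues` and `regularizing_matrix` are hypothetical, and m ≥ 2 with v_1 > v_m is assumed so that a and b are well defined.

```python
import numpy as np

def regularized_eigenvalues(v, m, r):
    """Return x_1..x_d from descending eigenvalues v (assumes m >= 2, v_1 > v_m)."""
    v = np.asarray(v, dtype=float)
    d = v.size
    v1, vm = v[0], v[m - 1]
    a = v1 * vm * (m - 1) / (v1 - vm)
    b = (m * vm - v1) / (v1 - vm)
    i = np.arange(1, d + 1)               # 1-based eigenvalue index
    x = np.where(i <= m, v, a / (i + b))  # principal: keep v_i; noise: model
    x[r:] = a / (r + 1 + b)               # null subspace: constant value
    return x

def regularizing_matrix(E, x):
    """A = [w_i e_i] with w_i = 1/sqrt(x_i); columns of E are eigenvectors."""
    return E / np.sqrt(x)

v = [10.0, 8.0, 6.0, 1.0, 0.9, 0.8, 0.7, 0.0, 0.0]
x = regularized_eigenvalues(v, m=3, r=7)
A = regularizing_matrix(np.eye(len(v)), x)  # identity stands in for eigenvectors
```

Here a = 10·6·2/4 = 30 and b = (18 - 10)/4 = 2, so x_4 = 30/6 = 5 and every null-subspace entry equals 30/(7 + 1 + 2) = 3.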

  10. DROUT Approach (cont.)
     • Subspace regularization
       • p_T = A^T p, where p is a data point and p_T is its transformed version
       • Form the new (weight-adjusted) total scatter matrix (slide 4) from the transformed data and solve the eigenvalue problem on it
       • B = matrix of the c resulting eigenvectors with the largest eigenvalues
         → feature extraction is done only after regularization → limits loss of information
       • Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008
     • Transform matrix: M = A B

  11. DROUT Approach (cont.)
     • Summary:
       • Let Σ_w = w_m Σ_m + w_a Σ_a
       • Compute A from Σ_w
       • Transform the training set using A
       • Compute the new total scatter matrix Σ_t
       • Compute B by solving the eigenvalue problem on Σ_t
       • M = A B
       • Use M to transform the testing set
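The summary steps can be strung together roughly as follows. This is a sketch under several assumptions, not the authors' implementation: r is taken as the numerical rank, the weight-adjusted total scatter is formed by replacing N_m/N_t and N_a/N_t on slide 4 with w_m and w_a (including in the overall mean), m is clamped to at least 2 to keep the formulas defined, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def eig_desc(S):
    """Eigen-decomposition of a symmetric matrix, eigenvalues descending."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    return np.clip(vals[order], 0.0, None), vecs[:, order]

def weighted_scatter(X_m, X_a, w_m, w_a):
    """Weight-adjusted (S_w, S_t); assumption: weights replace N_c / N_t."""
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    mu_t = w_m * mu_m + w_a * mu_a
    S_w = (w_m * np.cov(X_m, rowvar=False, bias=True)
           + w_a * np.cov(X_a, rowvar=False, bias=True))
    d_m, d_a = mu_m - mu_t, mu_a - mu_t
    S_b = w_m * np.outer(d_m, d_m) + w_a * np.outer(d_a, d_a)
    return S_w, S_w + S_b

def drout_fit(X_m, X_a, c, w_m=0.1, w_a=0.9, tol=1e-10):
    """Return the d x c transform matrix M = A B."""
    # Weighted within-class scatter and its eigen-decomposition.
    S_w, _ = weighted_scatter(X_m, X_a, w_m, w_a)
    v, E = eig_desc(S_w)
    d = v.size
    # Subspace decomposition: r (numerical rank, assumed) and m.
    r = int(np.sum(v > tol * v[0]))
    threshold = 2.0 * np.median(v[:r]) - v[r - 1]
    below = np.nonzero(v[:r] < threshold)[0]
    m = int(below[0]) if below.size else r
    m = min(max(m, 2), d)  # safeguard for this sketch only
    # Subspace regularization and the matrix A.
    v1, vm = v[0], v[m - 1]
    a = v1 * vm * (m - 1) / (v1 - vm)
    b = (m * vm - v1) / (v1 - vm)
    i = np.arange(1, d + 1)
    x = np.where(i <= m, v, a / (i + b))
    x[r:] = a / (r + 1 + b)
    A = E / np.sqrt(x)
    # Transform the training set, then extract c features from the new S_t.
    _, S_t = weighted_scatter(X_m @ A, X_a @ A, w_m, w_a)
    _, vecs = eig_desc(S_t)
    B = vecs[:, :c]
    return A @ B

# Synthetic data with an uneven spectrum (illustrative only).
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0, 1.0, 0.9, 0.8, 0.7, 0.6])
X_m = rng.normal(size=(300, 8)) * scales
X_a = rng.normal(loc=2.0, size=(20, 8)) * scales
M = drout_fit(X_m, X_a, c=4)
Z_test = (rng.normal(size=(5, 8)) * scales) @ M  # transformed testing points
```

Data points are stored as rows, so multiplying by M on the right applies M^T to each point. The reduced `Z_test` is what would then be handed to an outlier detector.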

  12. Related Work
     • APCDA
       • Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009
       • Uses weight-adjusted scatter matrices for feature extraction
       • Discards the noise and null subspaces → loss of information
     • ERE
       • Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008
       • Performs regularization before feature extraction
       • Ignores class imbalance → not suitable for outlier detection
     • ACP
       • David Lindgren and Per Spangeus: A novel feature extraction algorithm for asymmetric classification. IEEE Sensors Journal, 4(5):643–650, 2004
       • Considers neither the noise and null subspaces nor class imbalance

  13. Outlier Detection with DROUT
     • Detectors:
       • ORCA
         • Stephen D. Bay and Mark Schwabacher: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD, pages 29–38, 2003
       • BSOUT
         • George Kollios, Dimitrios Gunopulos, Nick Koudas, and Stefan Berchtold: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. Knowl. Data Eng., 15(5):1170–1187, 2003
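To show where extracted features would be consumed, here is a brute-force distance-based scorer: each point is scored by the distance to its k-th nearest neighbor. This is a simple stand-in, not ORCA's randomized pruning algorithm and not BSOUT's biased sampling; the function name `knn_outlier_scores`, the choice k = 5, and the synthetic data are assumptions.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Score each row of X by the distance to its k-th nearest neighbor."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    return np.sort(dist, axis=1)[:, k - 1]

# 100 clustered points plus one far-away point at index 100 (illustrative).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 3)), [[8.0, 8.0, 8.0]]])
scores = knn_outlier_scores(X, k=5)
top = int(np.argmax(scores))  # index of the highest-scoring point
```

In the DROUT setting, `X` would be the testing set after transformation by M, so the distances are computed in the reduced feature space rather than the full one.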

  14. Outlier Detection with DROUT (cont.)
     • Datasets:
       • KDD Cup 1999
         • Normal class (60593 records) vs. U2R class (246 records)
         • d = 34 (7 categorical attributes are excluded)
         • Training set: 1000 normal records vs. 50 anomalous records
       • Ann-thyroid 1
         • Class 3 vs. class 1
         • d = 21
         • Training set: 450 normal records vs. 50 anomalous records
       • Ann-thyroid 2
         • Class 3 vs. class 2
         • d = 21
         • Training set: 450 normal records vs. 50 anomalous records
     • Parameter settings:
       • w_m = 0.1 and w_a = 0.9
       • Number of extracted features b ≤ d/2

  15. Results

  16. Results (cont.)

  17. Conclusion
     • Summary of contributions
       • Explore the effect of feature extraction on outlier detection
       • Results on three real datasets with two detectors (ORCA and BSOUT) are promising
       • A novel framework for ensemble outlier detection; experiments on real data sets appear promising
     • Future work
       • More experiments on larger datasets
       • Examine other possibilities for dimensionality reduction

  18. Last words…
     Thank you for your attention!
