Dataset Shift Detection in Non-Stationary Environments using EWMA Charts

Prof. Girijesh Prasad. Co-authors: Haider Raza, Yuhua Li. School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry. g.prasad@ulster.ac.uk

Presentation Transcript


  1. Dataset Shift Detection in Non-Stationary Environments using EWMA Charts. Prof. Girijesh Prasad. Co-authors: Haider Raza, Yuhua Li. School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry. g.prasad@ulster.ac.uk

  2. Outline • Motivation • Background • Proposed contribution • Future work and Conclusion

  3. Motivation • Classical learning systems assume that the training and test inputs follow the same distribution. • Real-world environments are often non-stationary (e.g., EEG-based BCI). • Learning in real-time environments is therefore difficult: non-stationarity degrades system performance over time. • Predictors need to adapt online, but online adaptation, particularly for classifiers, is difficult to perform and should be avoided as far as possible. Deciding when adaptation is actually needed requires a test that runs in real time: • Non-stationary shift-detection test.

  4. Background • Supervised learning (Mitchell, 1997), (Sugiyama et al. 2009) • Non-stationary environments (M Krauledat 2008), (Sugiyama 2012) • Dataset shift (Torres et al. 2012) • Dataset shift-detection (Shewhart 1939), (Page 1954), (Roberts 1959), (Alippi et al. 2011b), (Alippi & Roveri 2008a; Alippi & Roveri 2008b) • Proposed work: shift-detection

  5. Supervised Learning • Training samples: input ($x$) and output ($y$) • Learn input-output rule: $y = f(x)$ • Assumption: "Training and test samples are drawn from the same probability distribution", i.e., $P_{tr}(x, y) = P_{tst}(x, y)$. Is this assumption really true? No, not always. Reason: non-stationary environments.

  6. Non-Stationarity • Learning from the past only is of limited use. Examples: • Brain-computer interface • Robot control • Remote sensing applications • Network intrusion detection What is the challenge?

  7. Dataset Shift Dataset shift appears when the training and test joint distributions are different, that is, when $P_{tr}(x, y) \neq P_{tst}(x, y)$ (Torres, 2012). Note: relationship between covariates (x) and class label (y): X→Y: predictive model (e.g., spam filtering); Y→X: generative model (e.g., fault detection). Types of Dataset Shift • Covariate shift: appears only in X→Y problems • Prior probability shift: appears only in Y→X problems • Concept shift: appears in both X→Y and Y→X problems
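The three shift types on this slide can be written out in terms of the factors of the joint distribution. The block below follows the formulation commonly used in the dataset-shift literature; the subscripts $tr$ and $tst$ (training and test) are notation assumed here, not taken from the slide.

```latex
\begin{align*}
\text{Covariate shift } (X \to Y):\quad
  & P_{tr}(y \mid x) = P_{tst}(y \mid x), \quad P_{tr}(x) \neq P_{tst}(x) \\
\text{Prior probability shift } (Y \to X):\quad
  & P_{tr}(x \mid y) = P_{tst}(x \mid y), \quad P_{tr}(y) \neq P_{tst}(y) \\
\text{Concept shift } (X \to Y):\quad
  & P_{tr}(y \mid x) \neq P_{tst}(y \mid x), \quad P_{tr}(x) = P_{tst}(x) \\
\text{Concept shift } (Y \to X):\quad
  & P_{tr}(x \mid y) \neq P_{tst}(x \mid y), \quad P_{tr}(y) = P_{tst}(y)
\end{align*}
```

In each case the joint distribution changes, but only one of its two factors is responsible.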

  8. Dataset Shift-Detection Detecting abrupt and gradual shifts in time-series data is called dataset shift-detection. Types of Shift-Detection • Retrospective/offline detection (i.e., shift-point analysis) • Real-time/online detection (i.e., control charts) Types of Control Charts • Shewhart Chart (Shewhart, 1939) • Cumulative Sum (CUSUM) (E S Page, 1954) • Exponentially Weighted Moving Average (EWMA) (S W Roberts, 1959) • Computational Intelligence CUSUM (CI-CUSUM) (Alippi et al., 2008) • Intersection of Confidence Intervals (ICI) (Alippi et al., 2011)

  9. Proposed Contribution • We propose a dataset shift-detection test: • Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA) model

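  10. Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA) The EWMA statistic is $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$ (1), where $\lambda$ is the smoothing constant ($0 < \lambda \le 1$). It applies to a first-order integrated moving average, ARIMA(0,1,1), model: $x_i = x_{i-1} + \varepsilon_i - \theta \varepsilon_{i-1}$ (2), where $\varepsilon_i$ is a sequence of i.i.d. random signals with zero mean and constant variance. Equation (1) with $\lambda = 1 - \theta$ is the optimal 1-step-ahead prediction for this process. The 1-step-ahead errors are calculated as $err_i = x_i - z_{i-1}$ (3). If the 1-step-ahead error is normally distributed, then the control limits are $UCL_i = z_{i-1} + L \sigma_{err}$ and $LCL_i = z_{i-1} - L \sigma_{err}$, where $\sigma_{err}$ is the standard deviation of the 1-step-ahead errors and $L$ is the control-limit multiplier.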

  11. Proposed Algorithm: SD-EWMA
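The algorithm on this slide did not survive in the transcript. What follows is a minimal sketch of an EWMA-based shift detector, not the authors' exact procedure. It assumes the standard EWMA recursion z_i = λ·x_i + (1 − λ)·z_{i−1} and control limits of the form z_{i−1} ± L·σ_err. The function name `sd_ewma`, the defaults for `lam`, `L` and `n_train`, and the choice to estimate σ_err once on a training prefix are all assumptions made for illustration.

```python
import math


def sd_ewma(stream, lam=0.2, L=3.0, n_train=100):
    """Return the indices whose value falls outside the EWMA control limits.

    lam, L and n_train are illustrative defaults, not values from the slides.
    """
    train = stream[:n_train]

    # Training phase: start the EWMA at the training mean and estimate the
    # standard deviation of the 1-step-ahead errors err_i = x_i - z_{i-1}.
    z = sum(train) / len(train)
    sq_err = 0.0
    for x in train:
        err = x - z
        sq_err += err * err
        z = lam * x + (1 - lam) * z  # EWMA recursion
    sigma_err = math.sqrt(sq_err / len(train))

    # Testing phase: compare each new point with limits centred on the
    # previous EWMA value (assumed simplification: sigma_err stays fixed).
    detections = []
    for i in range(n_train, len(stream)):
        x = stream[i]
        ucl = z + L * sigma_err
        lcl = z - L * sigma_err
        if x > ucl or x < lcl:
            detections.append(i)
        z = lam * x + (1 - lam) * z
    return detections
```

Because the EWMA keeps tracking the data after a shift, the detector flags the first few points after a jump and then falls silent once z has caught up with the new level.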

  12. Datasets Synthetic Data Dataset 1, Jumping Mean (D1): an autoregressive time series driven by noise with mean $\mu$ and standard deviation 1.5. A change point is inserted every 100 time steps by resetting the noise mean $\mu$ at time $t$ according to $N$, a natural number identifying the 100-step segment that contains $t$. Dataset 2, Scaling Variance (D2): a change point is inserted every 100 time steps by resetting the noise standard deviation $\sigma$ at time $t$ according to $N$, defined as for D1. Dataset 3, Positive Auto-correlated (D3): the dataset consists of 2000 data points; the non-stationarity occurs in the middle of the data stream, where the data shift from one normal distribution to another. Here $N(\mu, \sigma)$ denotes the normal distribution with mean $\mu$ and standard deviation $\sigma$.
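A jumping-mean series of the kind described for D1 can be sketched as follows. The transcript preserves only the noise standard deviation (1.5) and the change interval (100 steps). The AR(2) coefficients, the zero initial values, the fixed jump size and the function name `jumping_mean` are hypothetical choices made here for illustration.

```python
import random


def jumping_mean(n_segments=5, seg_len=100, jump=2.0, noise_sd=1.5, seed=0):
    """Generate an AR(2) series whose noise mean jumps every seg_len steps."""
    rng = random.Random(seed)
    a1, a2 = 0.6, -0.5  # assumed stable AR(2) coefficients, not from the slides
    y = [0.0, 0.0]      # assumed initial values
    mu = 0.0            # noise mean of the current segment
    change_points = []
    for t in range(2, n_segments * seg_len):
        if t % seg_len == 0:
            mu += jump  # assumed fixed jump; the slide's rule was lost
            change_points.append(t)
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(mu, noise_sd))
    return y, change_points
```

The returned `change_points` list gives the ground-truth shift locations, which makes it straightforward to score a detector's hits and false alarms.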

  13. Dataset 4, Auto-correlated (D4): the dataset is a time series of 2000 data points generated with the 1-D digital filter function from MATLAB. The filter function creates a direct form II transposed implementation of a standard difference equation. The denominator coefficient of the filter is changed from 2 to 0.5 after 1000 points have been produced. Real-world Dataset: EEG-Based Brain Signals The real-world data used here are from the BCI competition-III dataset (IV-b). This dataset contains 2 classes, 118 EEG channels (0.05-200 Hz), a 1000 Hz sampling rate down-sampled to 100 Hz, 210 training trials, and 420 test trials. Figure: PDF plot of data from 3 different sessions taken from the training dataset. The plot shows that the distribution changes from session to session through a shift in the mean.

  14. Figure: Shift detection based on SD-EWMA, Dataset 1 (jumping mean): (a) a shift point is detected at every 100th point; (b) zoomed view of (a): the shift is detected at the 401st sample, where the upper control limit is crossed. Figure: Shift detection based on SD-EWMA: (a) Dataset 2 (scaling variance): the shift is detected at 3 points. (b) Dataset 3 (positive auto-correlated): the shift is detected after 1000 observations. (c) Dataset 4 (auto-correlated): the shift is detected after 1000 observations.

  15. Table: SD-EWMA shift detection in time-series data. Table: Simulation results on different tests.

  16. Figure 4: A window of 2000 samples obtained from real-world dataset. Table 4: SD-EWMA shift detection in BCI data

  17. Conclusion and Future Work • The drawback of classical supervised learning techniques in non-stationary environments and the motivation behind dataset shift-detection were discussed. • The background of non-stationary environments and dataset shift-detection was presented. • The proposed SD-EWMA method was presented and the results were discussed. • In future, SD-EWMA will be combined into an adaptive learning framework for non-stationary learning.

  18. Questions. Thank you!
