Change Detection in Data Streams by Testing Exchangeability. Shen-Shyang Ho, JPL/Caltech. The research is part of the author’s PhD dissertation (in computer science) at George Mason University. Conference travel is partially sponsored by a NASA Postdoctoral Program (NPP) Travel Grant.
Outline • Introduction • Previous Work (Statistics and Machine Learning/Data Mining/Computer Vision) • Intuition • Background (Exchangeability/Martingale) • Methodology • Comparison and Experimental Results • Application I: Adaptive Support Vector Machine (Classification Model) • Application II: Video Shot Change Detection (Cluster Model)
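Introduction Let $x_1, x_2, \dots, x_n$ be a sequence of independent $p$-dimensional random vectors with parameters $\theta_1, \theta_2, \dots, \theta_n$. Test the following hypothesis: $H_0: \theta_1 = \theta_2 = \dots = \theta_n$ against $H_1$: there exists an integer $k$, $1 \le k < n$, such that $\theta_1 = \dots = \theta_k \ne \theta_{k+1} = \dots = \theta_n$. Assumption: Data vectors are observed sequentially.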
Previous Work • Statistics: Sequential analysis is statistical inference in which the number of observations/samples required is not predetermined. • Sequential Probability Ratio Test – A. Wald (1945) • Application: Quality Control (Military/Manufacturing) • CUSUM (Cumulative Sum) – E. S. Page (1954) • Refer to the journal “Sequential Analysis: Design Methods and Applications” for recent research. • In the most recent issue (vol 27, no 2, 2008), 3 out of 6 papers address structural change, a minimax method for change-point detection problems, and multidecision quickest change-point detection. • Machine Learning/Data Mining: • Applications: Concept Drift Problem, Adaptive Classifier, Anomaly in Internet Traffic, Video-Shot Change Detection • Proposed methodology is usually problem-specific • Monitoring error, sliding window, weighted data, ensemble classifier … • Statistical methods: Likelihood ratio method, Bayesian methods, Hypothesis Testing …
Related Data Mining/Machine Learning/Computer Vision Research • Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka: Statistical change detection for multi-dimensional data. KDD 2007: 667-676 • Kolter, J.Z. and Maloof, M.A. Dynamic Weighted Majority: An ensemble method for drifting concepts. Journal of Machine Learning Research 8:2755--2790, 2007. • Klinkenberg, Ralf and Joachims, Thorsten: Detecting Concept Drift with Support Vector Machines. Proceedings of the Seventeenth International Conference on Machine Learning (ICML): 487--494, 2000. • Bi Song, Namrata Vaswani, Amit K. Roy Chowdhury: Closed-Loop Tracking and Change Detection in Multi-Activity Sequences. CVPR 2007 • Paul L. Rosin: Thresholding for Change Detection. ICCV 1998: 274-279 • Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, Yan Chen: Sketch-based change detection: methods, evaluation, and applications. Internet Measurement Conference 2003: 234-247 • Tsuyoshi Idé, Keisuke Inoue: Knowledge Discovery from Heterogeneous Dynamic Systems using Change-Point Correlations. SDM 2005 • Tsuyoshi Idé, Koji Tsuda: Change-Point Detection using Krylov Subspace Learning. SDM 2007 • Daniel Kifer, Shai Ben-David, Johannes Gehrke, Detecting Changes in Data Streams, Proc. 30th VLDB Conference, 2004. • ... …
Motivation “Lack of Exchangeability” implies “Change in Data Distribution/Model” 9/20/2014 7
Intuition • 1 2 3 4 5 6 7 8 9 10 • 1 9 3 5 2 6 7 4 8 10 • 2 3 4 5 6 7 8 9 10 • 1 9 3 5 2 6 7 2 8 10 • Identically distributed but may be dependent
Background • Vovk et al.’s work on “Testing Exchangeability Online” (ICML 2003) and “Algorithmic Learning in a Random World” (Springer): • Testing the exchangeability assumption in an online mode. • Explicit martingale for testing the hypothesis of exchangeability (refer to http://www.vovk.net (conformal prediction))
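Background Let $\{Z_i : 1 \le i < \infty\}$ be a sequence of random variables. A finite sequence of random variables $Z_1, \dots, Z_n$ is exchangeable if the joint distribution $p(Z_1, \dots, Z_n)$ is invariant under any permutation of the indices of the random variables. A martingale is a sequence of random variables $\{M_i : 0 \le i < \infty\}$ such that $M_n$ is a measurable function of $Z_1, \dots, Z_n$ for all $n = 0, 1, \dots$ (in particular, $M_0$ is a constant value) and the conditional expectation of $M_{n+1}$ given $M_0, \dots, M_n$ is equal to $M_n$, i.e., $E(M_{n+1} \mid M_0, \dots, M_n) = M_n$.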
Methodology - Strangeness • Strangeness measures how well a data point is represented by a data model compared to the other points (it is computed for each data point seen so far) • Applicable to classification, regression, or cluster models • Measures diversity/disagreement, i.e., the higher the strangeness of a point, the less likely it is to come from the model • Condition for a valid strangeness measure: the strangeness value of a data point at a particular time instance should be independent of the order in which it is observed with respect to the other data points.
Classification Model • Strangeness (K-NN) • Example data stream over time t: segment A (t = 1 to 1000), segment B (t = 1001 to 2000), segment C (t = 2001 to 3000): aaaaa…aaaaabbbbbb…bbbbbccccc…cccccc • Strangeness (SVM): Lagrange Multiplier
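The K-NN formula itself did not survive on this slide. As a rough illustration only, the sketch below uses the ratio that is common in the conformal prediction literature: the summed distances to the k nearest neighbours of the same class, divided by the summed distances to the k nearest neighbours of other classes. The function name `knn_strangeness`, the Euclidean metric, and the default `k` are assumptions for illustration, not taken from the talk.

```python
import math

def knn_strangeness(points, labels, i, k=3):
    """Strangeness of points[i] under a k-NN classification model (assumed form).

    Ratio of the summed distances to the k nearest same-class points over the
    summed distances to the k nearest other-class points. The value depends
    only on the set of points, not on the order in which they arrived.
    """
    same, other = [], []
    for j, (x, y) in enumerate(zip(points, labels)):
        if j == i:
            continue  # a point is not its own neighbour
        d = math.dist(points[i], x)
        (same if y == labels[i] else other).append(d)
    same.sort()
    other.sort()
    denom = sum(other[:k])
    if denom == 0.0:
        return math.inf  # coincides with other-class points (or none exist)
    return sum(same[:k]) / denom

# Two well-separated classes plus one 'a' point sitting inside class 'b'.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]
lab = ["a", "a", "a", "b", "b", "b", "a"]
typical = knn_strangeness(pts, lab, 0, k=2)
outlier = knn_strangeness(pts, lab, 6, k=2)
```

A point that sits among its own class gets a value well below 1, while the mislabeled-looking point gets a value above 1, which matches the reading that higher strangeness means a poorer fit to the model.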
Cluster Model Strangeness of a data vector in a cluster
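The cluster-model formula is missing from the extracted slide. One simple choice that satisfies the order-independence condition is the distance from the data vector to the cluster centre; the sketch below assumes that choice, with a Euclidean metric and the mean as the centre. The name `cluster_strangeness` is hypothetical.

```python
import math

def cluster_strangeness(cluster, x):
    """Distance from x to the mean of `cluster` (assumed strangeness form).

    `cluster` is the collection of vectors seen so far; the mean does not
    depend on the order of the vectors, so the measure is order-independent.
    """
    dim = len(x)
    centre = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return math.dist(x, centre)

cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
near = cluster_strangeness(cluster, (1.0, 1.0))  # at the centre
far = cluster_strangeness(cluster, (4.0, 5.0))   # far from the centre
```

Whether the new vector is included in `cluster` before computing the centre is a design choice left to the caller in this sketch.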
Regression Model • Strangeness: $s_i = \frac{|y_i - f(x_i)|}{\exp(g(x_i))}$, where $f$ is the regression function and $g$ is the error estimation function for $f$ at $x_i$ (Papadopoulos et al., Inductive Confidence Machines for Regression, ECML, LNAI 2430, pp 345-356, 2002)
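Methodology • p-value of a new point $x_n$ given the previously seen data points: $p_n = \frac{\#\{i : s_i > s_n\} + \theta_n \, \#\{i : s_i = s_n\}}{n}$, $i = 1, \dots, n$ • where $s_i$ is the strangeness measure for $x_i$ • and $\theta_n$ is randomly chosen from [0,1] for each new point • $\theta_n$: necessary so that, when the data are exchangeable, the sequence of p-values is uniformly distributed in [0,1] for any strangeness measure (Vovk, 2003)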
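The p-value step can be sketched in a few lines. This assumes the randomized p-value used in conformal prediction: the fraction of points seen so far that are strictly stranger than the new point, with ties (including the new point itself) weighted by a random number drawn from [0,1]. The function name and the optional `theta` argument (exposed only so the example is reproducible) are illustrative.

```python
import random

def randomized_p_value(strangeness, theta=None):
    """Randomized p-value of the newest point.

    `strangeness` holds the values for all points seen so far, newest last.
    `theta` is drawn uniformly from [0, 1] unless supplied by the caller.
    """
    if theta is None:
        theta = random.random()
    s_new = strangeness[-1]
    greater = sum(1 for s in strangeness if s > s_new)
    equal = sum(1 for s in strangeness if s == s_new)  # counts the new point too
    return (greater + theta * equal) / len(strangeness)

p_strange = randomized_p_value([1.0, 2.0, 3.0, 4.0], theta=0.5)   # newest is strangest
p_ordinary = randomized_p_value([4.0, 3.0, 2.0, 1.0], theta=0.5)  # newest is least strange
```

A new point that is stranger than everything before it receives a small p-value, which is what drives the change test that follows.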
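Methodology Consider the null hypothesis $H_0$: “no change in the data stream” against the alternative hypothesis $H_1$: “a change occurs in the data stream”. The test for change continues as long as $0 < M_n < \lambda$, where $M_n$ is the martingale value after $n$ points and $\lambda$ is a chosen threshold. One rejects the null hypothesis when $M_n \ge \lambda$.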
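A minimal sketch of the test loop, assuming the randomized power martingale from Vovk et al.'s exchangeability test: the martingale starts at 1 and is multiplied by `epsilon * p ** (epsilon - 1)` for each new p-value `p`, and a change is declared once it reaches the threshold. The function name and the default values of `epsilon` and `threshold` are illustrative assumptions. For a nonnegative martingale started at 1, Doob's maximal inequality bounds the probability of ever reaching the threshold under the null hypothesis by 1/threshold.

```python
def power_martingale_detect(p_values, epsilon=0.92, threshold=20.0):
    """Return the 1-based index at which the martingale first reaches
    `threshold`, or None if it never does.

    Each update multiplies the martingale by epsilon * p**(epsilon - 1),
    which has expectation 1 when p is uniform on [0, 1].
    """
    m = 1.0
    for n, p in enumerate(p_values, start=1):
        p = max(p, 1e-12)  # guard against p == 0, where the factor is undefined
        m *= epsilon * p ** (epsilon - 1.0)
        if m >= threshold:
            return n  # reject the null hypothesis: change detected
    return None

# Consistently small p-values (strange points) make the martingale grow.
hit = power_martingale_detect([0.01] * 50)
# Unremarkable p-values make it shrink, so no change is declared.
miss = power_martingale_detect([0.5] * 50)
```

After a detection, a practical detector would typically reset the martingale to 1 and restart from the points that follow; that step is omitted here.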
Methodology
Experimental Result – Performance Measure
Experimental Result – Varying
Experimental Result – Varying • Linearly Non-separable Classification Model • Linearly Separable Classification Model
Experimental Result • Ringnorm/Twonorm (change in dataset every 1000 points) • Nursery Categorical Dataset (change in class compositions every 1000 points)
Experimental Result
Application: Adaptive SVM • Simulated USPS 3-digit image data stream over time t: 01120120…0340033404…156556115…77789987…
Application: Adaptive SVM • A (blue): True change point known to the SVM • B (red): Adaptive SVM using martingale method • C (magenta): SVM using sliding window of size 250 • D (black): SVM using sliding window of size 500 • E (green): SVM using sliding window of size 1000
Application: Video-Shot Change Detection Martingale Change Detection using multiple features (MVMT: Multiple-view martingale test)
Application: Video-Shot Change Detection • HI: Histogram Intersection • Chi-Square Measure • Euclidean Distance (ED)
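The three measures listed above can be sketched for a pair of normalized histograms, for example colour histograms of two video frames. The definitions below are common textbook forms and are assumptions here (the chi-square measure in particular has several variants); how the talk feeds them into the martingale test is not recoverable from the slide.

```python
import math

def histogram_intersection(h, g):
    """Similarity: sum of bin-wise minima (1.0 for identical normalized histograms)."""
    return sum(min(a, b) for a, b in zip(h, g))

def chi_square(h, g):
    """Dissimilarity: sum of (h_i - g_i)^2 / (h_i + g_i), skipping empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)

def euclidean_distance(h, g):
    """Dissimilarity: straight-line distance between the two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, g)))

h = [0.5, 0.5, 0.0]
g = [0.25, 0.25, 0.5]
hi_value = histogram_intersection(h, g)
chi_value = chi_square(h, g)
ed_value = euclidean_distance(h, g)
```

Note that histogram intersection is a similarity (larger means more alike), while the other two are dissimilarities, so a strangeness measure built on them would need to treat the directions consistently.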
Reference • S.-S. Ho and H. Wechsler, Detecting Change-Points in Unlabeled Data Streams using Martingale, Proc. 20th Int. Joint Conf. on Artificial Intelligence (IJCAI 2007), Hyderabad, India, Jan. 6 - 12, 2007. • S.-S. Ho, A Martingale Framework for Concept Change Detection in Time-Varying Data Streams, Proc. Int. Conf. on Machine Learning (ICML 2005), Bonn, Germany, Aug. 7 - 11, 2005. • S.-S. Ho and H. Wechsler, Adaptive Support Vector Machine for Time-Varying Data Streams Using the Martingale, Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, July 30 - Aug. 5, 2005. • S.-S. Ho and H. Wechsler, On the Detection of Concept Change in Time-Varying Data Streams by Testing Exchangeability, Proc. Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, Scotland, July 26 - 29, 2005. • http://shenshyang.googlepages.com/codes (MATLAB codes + datasets)
Acknowledgement • Harry Wechsler, PhD Advisor (George Mason University) • Volodya Vovk (Royal Holloway, University of London) • Alexander Gammerman (Royal Holloway, University of London) • Oak Ridge Associated Universities (ORAU)