A framework to efficiently detect concept changes in a data streaming setting where concepts change frequently. Uses martingale tests and theoretical justifications for accuracy. Works well with high-dimensional, multiclass data streams.
A Martingale Framework for Concept Change Detection in Time-Varying Data Stream Ho Shen-Shyang sho@gmu.edu Department of Computer Science George Mason University
Preview: • Problem: In a data streaming setting, data points are observed one by one. The concepts to be learned from the data stream may change infinitely often. • How do we detect the changes efficiently? • Other Topics: Concept Drift, Anomaly Detection, ... • Testing Exchangeability Online (Vovk et al., ICML 2003)
Outline: • Background: Strangeness, Martingale, Exchangeability • Martingale Framework - Two Tests • Theoretical Justifications • Additional Theoretical Results • Experimental Results
Strangeness Measure α (Saunders et al., IJCAI 1999): a score of how different a data point is from the rest. • Support Vector Machine: value of the Lagrange multiplier or distance from the hyperplane (we use SVM/Lagrange multiplier – incremental SVM (Cauwenberghs and Poggio, NIPS 2000)) • K-nearest-neighbor rule: A/B, where A – sum of the distances of a point from its k nearest points with the same label, B – sum of the distances of a point from its k nearest points with a different label
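The k-nearest-neighbor strangeness A/B can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the function name `knn_strangeness`, the Euclidean distance, and the default `k=3` are all assumptions made here.

```python
import math

def knn_strangeness(point, label, examples, k=3):
    """Strangeness A / B of (point, label) relative to labelled examples.

    examples: list of (vector, label) pairs, not including the point itself.
    A = sum of the k smallest distances to examples with the same label.
    B = sum of the k smallest distances to examples with a different label.
    Euclidean distance is an assumption of this sketch.
    """
    same = sorted(math.dist(point, x) for x, y in examples if y == label)
    other = sorted(math.dist(point, x) for x, y in examples if y != label)
    if not same or not other:
        raise ValueError("need at least one example of each kind")
    a = sum(same[:k])
    b = sum(other[:k])
    if b == 0:
        return math.inf  # the point coincides with differently labelled points
    return a / b
```

A point that sits among examples of its own label gets a value below 1, and the same point with the wrong label gets a value above 1.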
Testing Exchangeability: Definitions Let { Zi : 1 ≤ i < ∞ } be a sequence of r.v. A finite sequence of r.v. Z1,...,Zn is exchangeable if the joint distribution p(Z1,...,Zn) is invariant under any permutation of the indices of the r.v. A martingale is a sequence of r.v. { Mi : 0 ≤ i < ∞ } such that Mn is a measurable function of Z1,...,Zn for all n = 0, 1, ... (M0 is a constant, say 1) and the conditional expectation of Mn+1 given M1,...,Mn is equal to Mn, i.e. E(Mn+1 | M1,...,Mn) = Mn
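Testing Exchangeability (Vovk et al., ICML 2003) The p-value of the newest point is $p_n = V(Z \cup \{z_n\}, \theta_n) = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \, \#\{i : \alpha_i = \alpha_n\}}{n}$, where $\alpha_i$ is the strangeness of $z_i$ and $\theta_n$ is drawn uniformly from [0,1]. The randomized power martingale is $M_n(\varepsilon) = \prod_{i=1}^{n} \varepsilon \, p_i^{\varepsilon - 1}$, where ε in [0,1] (say 0.92) and M0 = 1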
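A minimal sketch of the two quantities on this slide, assuming the randomized p-value and power martingale of Vovk et al.: the p-value is the fraction of strangeness values larger than the newest one, with ties weighted by θn, and the martingale is updated as Mn = Mn-1 · ε · pn^(ε-1). The function names are hypothetical.

```python
def p_value(strangeness, theta):
    """Randomized p-value of the newest point.

    strangeness: list alpha_1..alpha_n; the last entry belongs to the new point.
    theta: a draw from Uniform[0, 1] used to break ties.
    """
    a_n = strangeness[-1]
    greater = sum(1 for a in strangeness if a > a_n)
    equal = sum(1 for a in strangeness if a == a_n)  # includes the new point itself
    return (greater + theta * equal) / len(strangeness)

def power_martingale(p_values, epsilon=0.92):
    """Return [M_0, M_1, ..., M_n] with M_0 = 1 and M_i = M_{i-1} * eps * p_i**(eps - 1).

    Every p-value must be strictly positive, otherwise p**(eps - 1) is undefined.
    """
    values = [1.0]
    for p in p_values:
        values.append(values[-1] * epsilon * p ** (epsilon - 1))
    return values
```

In use, θn could be drawn as `1.0 - random.random()`, which lies in (0, 1] and keeps every p-value positive.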
Performing Kolmogorov-Smirnov Test on the p-value distribution as data is observed one by one. Skewed p-value distribution: small p-values inflate the martingale values
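Under exchangeability the p-values are expected to look uniform on [0,1], so their skew can be measured with the one-sample Kolmogorov-Smirnov statistic against the uniform distribution. The sketch below computes only the statistic D, the largest gap between the empirical CDF and the uniform CDF; critical values are not computed, and the function name is an assumption.

```python
def ks_statistic_uniform(p_values):
    """One-sample KS statistic of p_values against Uniform[0, 1]."""
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Gap just after the jump at x, and gap just before it.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d
```

Evenly spread p-values give a small D, while a run of small p-values pushes D toward 1.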
Martingale Framework: Test for Change Detection Consider the simple null hypothesis H0: “no concept change in the data stream” against the alternative hypothesis H1: “concept change occurs in the data stream”
Martingale Framework: Test for Change Detection Martingale Test 1 (MT1) 0 < Mn(ε)< λ where λ is a positive number. One rejects the null hypothesis when Mn(ε) ≥ λ. Martingale Test 2 (MT2) 0 < | Mn(ε) - Mn-1(ε) |< t where t is a positive number. One rejects the null hypothesis when | Mn(ε) - Mn-1(ε) | ≥ t.
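The two tests can be sketched as a single pass over the p-values. This sketch assumes the power martingale update Mn = Mn-1 · ε · pn^(ε-1) and, as a further assumption not stated on the slide, restarts the martingale at 1 after each detection. The name `detect_changes` and the default λ = 20 are illustrative only.

```python
def detect_changes(p_values, epsilon=0.92, lam=20.0, t=None):
    """Return the 1-based indices at which the null hypothesis is rejected.

    MT1 rejects when M_n >= lam.
    MT2 rejects when |M_n - M_{n-1}| >= t (skipped when t is None).
    After a rejection the martingale restarts from 1 (an assumption of this sketch).
    """
    detections = []
    m_prev = 1.0
    for n, p in enumerate(p_values, start=1):
        m = m_prev * epsilon * p ** (epsilon - 1)
        mt1 = m >= lam
        mt2 = t is not None and abs(m - m_prev) >= t
        if mt1 or mt2:
            detections.append(n)
            m_prev = 1.0
        else:
            m_prev = m
    return detections
```

Only the previous martingale value is stored, so no sliding window over the stream is needed.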
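Justification for Martingale Test 1: Doob's Maximal Inequality Assuming that { Mi : 0 ≤ i < ∞ } is a nonnegative martingale, Doob's Maximal Inequality states that for any λ > 0 and 0 ≤ n < ∞, $\lambda \, P\left(\max_{k \le n} M_k \ge \lambda\right) \le E(M_n)$. Hence, if E(Mn) = E(M0) = 1, then $P\left(\max_{k \le n} M_k \ge \lambda\right) \le \frac{1}{\lambda}$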
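Because the false-alarm probability is at most 1/λ, a target rate α suggests the threshold λ = 1/α. The sketch below computes that threshold and checks the bound by simulation with uniform p-values, which is what exchangeability is assumed to produce here. The function names, run length, and number of runs are illustrative assumptions.

```python
import random

def doob_threshold(alpha):
    """Threshold lam such that P(max_k M_k >= lam) <= alpha when E(M_n) = 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return 1.0 / alpha

def false_alarm_rate(lam, epsilon=0.92, length=200, runs=2000, seed=0):
    """Fraction of no-change runs in which the power martingale reaches lam."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(runs):
        m = 1.0
        for _ in range(length):
            p = 1.0 - rng.random()  # uniform on (0, 1], so p is never zero
            m *= epsilon * p ** (epsilon - 1)
            if m >= lam:
                alarms += 1
                break
    return alarms / runs
```

The simulated rate should stay below 1/λ; the bound is often loose, so the observed rate can be well under it.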
Justification for Martingale Test 2: Hoeffding-Azuma Inequality Let c1, ..., cm be positive constants and let Y1, ..., Ym be a martingale difference sequence with |Yk| ≤ ck for each k. Then for any t ≥ 0, $P\left(\left|\sum_{k=1}^{m} Y_k\right| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 \sum_{k=1}^{m} c_k^2}\right)$. At each n, the martingale difference is maximum and bounded when pn is 1/n for the deterministic martingale (θn = 1 for all n)
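Justification for Martingale Test 2: When m = 1, the Hoeffding-Azuma Inequality becomes $P(|Y_1| \ge t) \le 2 \exp\left(-\frac{t^2}{2 c_1^2}\right)$. Assuming that Mn-1(ε) = M0(ε) = 1, the largest difference occurs at pn = 1/n, where it equals $\varepsilon n^{1-\varepsilon} - 1$, so $P\left(|M_n(\varepsilon) - M_{n-1}(\varepsilon)| \ge t\right) \le 2 \exp\left(-\frac{t^2}{2 (\varepsilon n^{1-\varepsilon} - 1)^2}\right)$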
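A threshold t for Martingale Test 2 can be obtained by setting the m = 1 bound equal to a target probability δ and solving for t, which gives t = c·sqrt(2·ln(2/δ)). The sketch assumes the power martingale with Mn-1 = 1, so the one-step difference is ε·pn^(ε-1) - 1 with pn in [1/n, 1]. Taking c as the larger of the two extreme magnitudes is an assumption made here so that c remains a valid bound for small n. All function names are hypothetical.

```python
import math

def difference_bound(n, epsilon=0.92):
    """Bound c on |M_n - M_{n-1}| when M_{n-1} = 1 and p_n lies in [1/n, 1].

    The difference eps * p**(eps - 1) - 1 ranges from eps - 1 (at p = 1)
    up to eps * n**(1 - eps) - 1 (at p = 1/n).
    """
    return max(1.0 - epsilon, epsilon * n ** (1.0 - epsilon) - 1.0)

def azuma_bound(t, c):
    """Hoeffding-Azuma bound for m = 1: 2 * exp(-t^2 / (2 c^2)), capped at 1."""
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * c * c)))

def azuma_threshold(delta, n, epsilon=0.92):
    """Smallest t whose bound equals delta: t = c * sqrt(2 * ln(2 / delta))."""
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    c = difference_bound(n, epsilon)
    return c * math.sqrt(2.0 * math.log(2.0 / delta))
```

Since c grows with n, the threshold derived this way grows with the number of points seen since the martingale last started.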
Some Theoretical Results for Martingale Test 1 (Ho & Wechsler, UAI 2005) • The martingale test based on Doob's Inequality is an approximation of the sequential probability ratio test. Where α is the desired size (type I error) and β is the probability of the type II error • The mean delay time from the true change point is: where
Experiments Precision = Number of Correct Detections / Number of Detections. Recall = Number of Correct Detections / Number of True Changes. Precision: probability that a detection is actually correct. Recall: probability that the system recognizes a true change. Delay time (for a detected change): the number of time units from a true change point to the detected change point, if any
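The three metrics can be computed from the true change points and the detection times. The matching rule used below is an assumption of this sketch: a true change counts as detected if at least one detection falls at or after it and before the next true change, and only the first such detection is counted as correct. The function name `evaluate` is hypothetical.

```python
def evaluate(true_changes, detections):
    """Return (precision, recall, mean_delay) for a change detector.

    true_changes, detections: iterables of time indices.
    mean_delay is None when no true change was detected.
    """
    true_changes = sorted(true_changes)
    detections = sorted(detections)
    delays = []
    for i, change in enumerate(true_changes):
        next_change = true_changes[i + 1] if i + 1 < len(true_changes) else float("inf")
        hits = [d for d in detections if change <= d < next_change]
        if hits:
            delays.append(hits[0] - change)  # first detection after the change
    correct = len(delays)
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    mean_delay = sum(delays) / correct if correct else None
    return precision, recall, mean_delay
```

For example, with true changes at 100, 200, 300 and detections at 110, 150, 320, the detection at 150 is a false alarm and the change at 200 is missed.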
Experimental Results: Synthetic Data Stream with noise (10-D Rotating Hyperplane) – Precision and Recall
Experimental Results: Synthetic Data Stream – Mean and Median Delay Time
Experimental Results: Numerical (WaveNorm & TwoNorm) and Categorical data streams (Nursery)
Experimental Results: Multi-class data streams (Modified USPS data-set) Dataset: 10 classes, 256 dimensions, 7291 data points Data stream: 3 classes.
Experimental Results: Multi-class data streams (Modified USPS data-set)
Conclusions: • Our martingale approach is an efficient, one-pass incremental algorithm that • Does not require a sliding window on the data stream • Does not require monitoring the performance of a base classifier as data is streaming • Works well for high-dimensional, multiclass data streams • Is theoretically justified.
Conclusions/Future (Current) Work: • Previous works: Kifer et al. (VLDB 2004), Fan et al. (SDM 2004), Wald (1947), Page (1957) ... • Extension to Unlabeled and One-class data streams • Application: Keyframe Extraction, Anomaly Detection, Adaptive Classifier (Ho and Wechsler, IJCAI 2005) • Comparison using different classifiers (i.e., different strangeness measures, including weak classifiers) • Comparison with other change detection algorithms. • http://cs.gmu.edu/~sho/research/change_detection.html Acknowledgement: Vladimir Vovk, Harry Wechsler.