Feature extraction for change detection
Ludmila I Kuncheva
School of Computer Science, Bangor University
Can you detect an abrupt change in this picture? Answer – at the end
Plan
• Zeno says there is no such thing as change...
• If change exists, is it a good thing?
• Context or nothing!
• Feature extraction for change detection – PCA backwards?
Zeno’s Paradox of the Arrow
“If everything, when it occupies an equal space, is at rest, and if that which is in locomotion is always occupying such a space at any moment, the flying arrow is therefore motionless.” – as recounted by Aristotle, Physics VI:9, 239b5
Zeno of Elea (ca. 490–430 BC)
No motion, no movement, NO CHANGE
Does change exist? Zeno says ‘no’...
Nonetheless... Change Types
• Possible applications:
• fraud detection
• market analysis
• medical condition monitoring
• network traffic control
• Univariate detectors (control charts):
• Shewhart's method
• CUSUM (CUmulative SUM)
• SPRT (Wald's Sequential Probability Ratio Test)
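The talk names these detectors without showing code. As an illustration, a minimal one-sided CUSUM for an upward mean shift might look as follows (the allowance `k` and threshold `h` defaults are illustrative choices, not from the talk):

```python
import numpy as np

def cusum_detect(x, target_mean, k=0.5, h=5.0):
    """One-sided CUSUM for an upward shift in the mean.

    k is the allowance (slack), h the decision threshold; both are
    illustrative defaults. Returns the index of the first alarm, or -1.
    """
    s = 0.0
    for i, xi in enumerate(x):
        # accumulate evidence of an upward shift, clipped at zero
        s = max(0.0, s + (xi - target_mean - k))
        if s > h:
            return i
    return -1

rng = np.random.default_rng(0)
# 100 in-control points from N(0, 1), then the mean shifts to +2
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
alarm = cusum_detect(x, target_mean=0.0)   # index where the alarm fires
```

The clipping at zero is what makes CUSUM forget old in-control data, so a sustained small shift accumulates while isolated outliers do not.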
2 approaches
1. Use an adaptive algorithm (no need to identify the type of change or to detect change explicitly).
2. Detect change (update/re-train the algorithm if necessary).
Either way, the data may be labelled or unlabelled.
Classification – labels are available
[Diagram: Data (all features) → Classifier → Error rate → Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are available:
[Diagram: Data (all features) → Classifier → Error rate → Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are NOT available:
[Diagram: Data (all features) → Feature EXTRACTOR → Features → multidimensional Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are available – typical methods: GMM, HMM, Parzen windows, kernel methods, martingales
[Diagram: Data (all features) → Classifier → Error rate → threshold → Change/NO change]
Labels are NOT available – typical methods: clustering, kernel methods, GMM, kd-trees, Hotelling
[Diagram: Data (all features) → Feature EXTRACTOR → Features → threshold → Change/NO change]
Classification • A change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Vote, please! • A change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Classification • No change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Vote, please! • No change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Literature
[Diagram: the literature landscape – classifier ensembles, brain-computer interface, MathWorks products – and my scope of interest]
Shewhart with threshold 2 sigma – Yes!
[Chart: the mean (moving average) crosses the mean – 2·std control limit, flagging a change]
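A Shewhart-style check like the one in the chart can be sketched in a few lines: flag any point lying outside mean ± 2·std estimated from an in-control reference window (the function name, symmetric limits, and example data are illustrative):

```python
import numpy as np

def shewhart_alarms(x, ref, n_sigma=2.0):
    """Indices of points in x outside mean(ref) +/- n_sigma * std(ref)
    (a basic Shewhart-style control check)."""
    mu, sd = np.mean(ref), np.std(ref)
    return np.flatnonzero(np.abs(np.asarray(x) - mu) > n_sigma * sd)

ref = np.array([-1.0, 0.0, 1.0])                      # in-control reference window
alarms = shewhart_alarms([0.1, -0.2, 3.0, 0.0], ref)  # flags index 2
```

In practice the reference statistics would come from a sliding window or a designated training period rather than three points.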
Is there a change? Yes, for the purposes of “Spot the difference”. No, as this is a bee with a flower in the sun.
Is there a change?
[Plots: sin(10x) * randn vs sin(20x) * randn]
No! (in amplitude) ... Yes! (in frequency)
ENTER Feature Extraction
Context: Amplitude variability Feature: AMPLITUDE
Context: Time series patterns in a fixed window. Feature: A PATTERN IN A FIXED WINDOW
Context: Children’s puzzle Feature: PIXEL B/W VALUE
Context: Frequency variability [plots: sin(10x) * randn vs sin(20x) * randn] Feature: FREQUENCY
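To make the point that the context dictates the feature, here is a small sketch that extracts FREQUENCY via the FFT; for clarity it uses clean 5 Hz and 10 Hz sinusoids rather than the noisy sin(10x)*randn signals from the slides, and the sampling rate is an illustrative choice. An amplitude feature (the standard deviation) sees no change between the two windows, while the frequency feature does:

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency (Hz) of the strongest FFT bin, ignoring DC."""
    spec = np.abs(np.fft.rfft(x))
    spec[0] = 0.0                                  # drop the DC component
    return np.fft.rfftfreq(len(x), d=1.0 / fs)[np.argmax(spec)]

fs = 100.0                                         # sampling rate (illustrative)
t = np.arange(0, 10, 1 / fs)
w1 = np.sin(2 * np.pi * 5 * t)                     # "old" window: 5 Hz
w2 = np.sin(2 * np.pi * 10 * t)                    # "new" window: 10 Hz

amp_change = abs(np.std(w1) - np.std(w2))          # amplitude feature: ~0
freq_change = abs(dominant_frequency(w2, fs) - dominant_frequency(w1, fs))
```

A detector monitoring `np.std` would answer "No!", while one monitoring `dominant_frequency` would answer "Yes!" – same data, different context.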
Suppose that CONTEXT is not available. Principal Component Analysis (PCA) captures data variability. Then why not use PCA here?
[Diagram: labels are NOT available – Data (all features) → Feature EXTRACTOR → Features → Distribution modelling → Change statistic → threshold → Change/NO change]
PCA intuition: the components corresponding to the largest eigenvalues explain most of the variance of the data, and are therefore considered the most important.
But is this the case for change detection? The intuition reverses: along PC1 (high variance) the distributions before and after the change are similar (small sensitivity to change), while along PC2 (low variance) the distributions are different (large sensitivity to change).
• This holds for “blind” changes:
• translation
• rotation
• variance change
• ...
Kuncheva L.I. and W.J. Faithfull, PCA feature extraction for change detection in multidimensional unlabelled data, IEEE Transactions on Neural Networks and Learning Systems, 25(1), 2014, 69–80.
Some experiments:
1. Take a data set with n features.
2. Sample random “windows” W1 and W2 with K objects in each window.
3. Calculate PCA from W1. Choose a proportion of explained variance and use the remaining (low-variance) components.
4. NEGATIVE INSTANCE (no change): transform W2 using the calculated PCs, keep the low-variance components, calculate the CHANGE DETECTION CRITERION between W1 and W2, and store the value.
5. POSITIVE INSTANCE (change): generate a random integer k between 1 and n and distort W2 first, in one of two ways:
(a) Shuffle VALUES – choose k features at random; for each chosen feature, randomly shuffle its values within window W2.
(b) Shuffle FEATURES – choose k features at random and randomly permute the respective columns in window W2.
Then transform the distorted W2 as above, calculate the criterion, and store the value.
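The protocol above can be sketched in runnable form, under simplifying assumptions: the change criterion below (distance between the Gaussian summaries of the projected windows) is a toy stand-in for the criterion of the cited paper, and the data generator, window size, and `var_kept` value are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def low_variance_pcs(W1, var_kept=0.95):
    """Mean of W1 and the principal directions OUTSIDE the top
    var_kept fraction of explained variance (low-variance PCs)."""
    cov = np.cov(W1 - W1.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending
    cum = np.cumsum(vals) / vals.sum()
    keep_from = np.searchsorted(cum, var_kept) + 1
    return W1.mean(axis=0), vecs[:, keep_from:]

def criterion(W1, W2, mean1, P):
    """Toy change statistic: distance between the mean and std of the
    two windows after projection onto the retained components P."""
    Z1, Z2 = (W1 - mean1) @ P, (W2 - mean1) @ P
    return (np.linalg.norm(Z1.mean(axis=0) - Z2.mean(axis=0))
            + np.linalg.norm(Z1.std(axis=0) - Z2.std(axis=0)))

def make_window(K=200, n=5, noise=0.1):
    """Strongly correlated toy data: one latent factor plus noise."""
    base = rng.normal(size=(K, 1))
    return base + noise * rng.normal(size=(K, n))

W1, W2 = make_window(), make_window()
mean1, P = low_variance_pcs(W1, var_kept=0.9)
c_neg = criterion(W1, W2, mean1, P)            # NEGATIVE instance (no change)
W2pos = W2.copy()
W2pos[:, 0] = rng.permutation(W2pos[:, 0])     # shuffle VALUES of one feature
c_pos = criterion(W1, W2pos, mean1, P)         # POSITIVE instance (change)
```

Shuffling a feature's values preserves its marginal distribution but destroys its correlation with the rest, which is exactly the kind of change the low-variance components pick up: `c_pos` comes out much larger than `c_neg`.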
Run 100 times for POS and 100 times for NEG to get the ROC curve for a given data set. Repeat the runs without applying PCA to get a second ROC curve for the same data set. Use the Area Under the Curve (AUC), however disputed this measure might have become recently... Larger AUC corresponds to better change detection.
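The AUC itself needs no explicit ROC construction: it can be computed directly from the 100 positive and 100 negative criterion values via the Mann–Whitney identity (a standard result, not specific to the talk; the function name is illustrative):

```python
import numpy as np

def auc(neg_scores, pos_scores):
    """Area under the ROC curve as P(positive score > negative score),
    counting ties as 1/2 (the Mann-Whitney U identity)."""
    neg = np.asarray(neg_scores, dtype=float)[:, None]
    pos = np.asarray(pos_scores, dtype=float)[None, :]
    # average over all (neg, pos) pairs via broadcasting
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

Applied to the stored NEG and POS criterion values, an AUC near 1 means the detector separates change from no-change cleanly, while 0.5 is chance level.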
Conclusion
• A change in the data distribution may be harmful, beneficial or indifferent to classification performance.
• Change does not exist out of context, therefore GENERIC algorithms for change detection are somewhat pointless...
• Feature extraction for change detection may not follow conventional intuition.
Remember my little puzzle? Can you detect an abrupt change in this picture? [Frames 1–3 vs frames 4–6]