Feature extraction for change detection
Ludmila I Kuncheva
School of Computer Science, Bangor University
Can you detect an abrupt change in this picture? Answer – at the end
Plan
• Zeno says there is no such thing as change...
• If change exists, is it a good thing?
• Context or nothing!
• Feature extraction for change detection – PCA backwards?
Zeno’s Paradox of the Arrow
“If everything, when it occupies an equal space, is at rest, and if that which is in locomotion is always occupying such a space at any moment, the flying arrow is therefore motionless.” – as recounted by Aristotle, Physics VI:9, 239b5
Zeno of Elea (ca. 490–430 BC)
No motion, no movement, NO CHANGE
Does change exist? Zeno says ‘no’...
Nonetheless... Change Types
• Possible applications:
• fraud detection
• market analysis
• medical condition monitoring
• network traffic control
• Univariate detectors (control charts):
• Shewhart's method
• CUSUM (CUmulative SUM)
• SPRT (Wald's Sequential Probability Ratio Test)
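The talk names these detectors without showing code. As an illustration, a minimal one-sided CUSUM for an upward mean shift might look as follows (the allowance `k` and threshold `h` defaults are illustrative choices, not from the talk):

```python
import numpy as np

def cusum_detect(x, target_mean, k=0.5, h=5.0):
    """One-sided CUSUM for an upward shift in the mean.

    k is the allowance (slack), h the decision threshold; both are
    illustrative defaults. Returns the index of the first alarm, or -1.
    """
    s = 0.0
    for i, xi in enumerate(x):
        # accumulate evidence of an upward shift, clipped at zero
        s = max(0.0, s + (xi - target_mean - k))
        if s > h:
            return i
    return -1

rng = np.random.default_rng(0)
# 100 in-control points from N(0, 1), then the mean shifts to +2
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
alarm = cusum_detect(x, target_mean=0.0)   # index where the alarm fires
```

The clipping at zero is what makes CUSUM forget old in-control data, so a sustained small shift accumulates while isolated outliers do not.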
2 approaches
1. Use an adaptive algorithm (no need to identify the type of change or to detect change explicitly).
2. Detect change (update/re-train the algorithm if necessary).
Either way, the data may be labelled or unlabelled.
Classification – labels are available
[Diagram: Data (all features) → Classifier → Error rate → Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are available:
[Diagram: Data (all features) → Classifier → Error rate → Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are NOT available:
[Diagram: Data (all features) → Feature EXTRACTOR → Features → multidimensional Distribution modelling → Change statistic → threshold → Change/NO change]
Labels are available – typical methods: GMM, HMM, Parzen windows, kernel methods, martingales
[Diagram: Data (all features) → Classifier → Error rate → threshold → Change/NO change]
Labels are NOT available – typical methods: clustering, kernel methods, GMM, kd-trees, Hotelling
[Diagram: Data (all features) → Feature EXTRACTOR → Features → threshold → Change/NO change]
Classification • A change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Vote, please! • A change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Classification • No change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Vote, please! • No change in the (unconditional) data distribution will: • render the classifier useless • make no difference to the classification performance • improve the classification performance
Literature
[Diagram: the literature landscape – classifier ensembles, brain-computer interface, MathWorks products – and my scope of interest]
Shewhart with threshold 2 sigma – Yes!
[Chart: the mean (moving average) crosses the mean – 2·std control limit, flagging a change]
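A Shewhart-style check like the one in the chart can be sketched in a few lines: flag any point lying outside mean ± 2·std estimated from an in-control reference window (the function name, symmetric limits, and example data are illustrative):

```python
import numpy as np

def shewhart_alarms(x, ref, n_sigma=2.0):
    """Indices of points in x outside mean(ref) +/- n_sigma * std(ref)
    (a basic Shewhart-style control check)."""
    mu, sd = np.mean(ref), np.std(ref)
    return np.flatnonzero(np.abs(np.asarray(x) - mu) > n_sigma * sd)

ref = np.array([-1.0, 0.0, 1.0])                      # in-control reference window
alarms = shewhart_alarms([0.1, -0.2, 3.0, 0.0], ref)  # flags index 2
```

In practice the reference statistics would come from a sliding window or a designated training period rather than three points.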
Is there a change? Yes, for the purposes of “Spot the difference”. No, as this is a bee with a flower in the sun.
Is there a change?
[Plots: sin(10x) * randn vs sin(20x) * randn]
No! (in amplitude) ... Yes! (in frequency)
ENTER Feature Extraction
Context: Amplitude variability Feature: AMPLITUDE
Context: Time series patterns in a fixed window. Feature: A PATTERN IN A FIXED WINDOW
Context: Children’s puzzle Feature: PIXEL B/W VALUE
Context: Frequency variability [plots: sin(10x) * randn vs sin(20x) * randn] Feature: FREQUENCY
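To make the point that the context dictates the feature, here is a small sketch that extracts FREQUENCY via the FFT; for clarity it uses clean 5 Hz and 10 Hz sinusoids rather than the noisy sin(10x)*randn signals from the slides, and the sampling rate is an illustrative choice. An amplitude feature (the standard deviation) sees no change between the two windows, while the frequency feature does:

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency (Hz) of the strongest FFT bin, ignoring DC."""
    spec = np.abs(np.fft.rfft(x))
    spec[0] = 0.0                                  # drop the DC component
    return np.fft.rfftfreq(len(x), d=1.0 / fs)[np.argmax(spec)]

fs = 100.0                                         # sampling rate (illustrative)
t = np.arange(0, 10, 1 / fs)
w1 = np.sin(2 * np.pi * 5 * t)                     # "old" window: 5 Hz
w2 = np.sin(2 * np.pi * 10 * t)                    # "new" window: 10 Hz

amp_change = abs(np.std(w1) - np.std(w2))          # amplitude feature: ~0
freq_change = abs(dominant_frequency(w2, fs) - dominant_frequency(w1, fs))
```

A detector monitoring `np.std` would answer "No!", while one monitoring `dominant_frequency` would answer "Yes!" – same data, different context.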
Suppose that CONTEXT is not available. Principal Component Analysis (PCA) captures data variability. Then why not use PCA here?
[Diagram: labels are NOT available – Data (all features) → Feature EXTRACTOR → Features → Distribution modelling → Change statistic → threshold → Change/NO change]
PCA intuition: the components corresponding to the largest eigenvalues explain most of the variance of the data, and are therefore considered the most important.
But is this the case for change detection? The intuition reverses: along PC1 (high variance) the distributions before and after the change are similar (small sensitivity to change), while along PC2 (low variance) the distributions are different (large sensitivity to change).
• This holds for “blind” changes:
• translation
• rotation
• variance change
• ...
Kuncheva L.I. and W.J. Faithfull, PCA feature extraction for change detection in multidimensional unlabelled data, IEEE Transactions on Neural Networks and Learning Systems, 25(1), 2014, 69–80.
Some experiments:
1. Take a data set with n features.
2. Sample random “windows” W1 and W2 with K objects in each window.
3. Calculate PCA from W1. Choose a proportion of explained variance and use the remaining (low-variance) components.
4. NEGATIVE INSTANCE (no change): transform W2 using the calculated PCs, keep the low-variance components, calculate the CHANGE DETECTION CRITERION between W1 and W2, and store the value.
5. POSITIVE INSTANCE (change): generate a random integer k between 1 and n and distort W2 first, in one of two ways:
(a) Shuffle VALUES – choose k features at random; for each chosen feature, randomly shuffle its values within window W2.
(b) Shuffle FEATURES – choose k features at random and randomly permute the respective columns in window W2.
Then transform the distorted W2 as above, calculate the criterion, and store the value.
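The protocol above can be sketched in runnable form, under simplifying assumptions: the change criterion below (distance between the Gaussian summaries of the projected windows) is a toy stand-in for the criterion of the cited paper, and the data generator, window size, and `var_kept` value are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def low_variance_pcs(W1, var_kept=0.95):
    """Mean of W1 and the principal directions OUTSIDE the top
    var_kept fraction of explained variance (low-variance PCs)."""
    cov = np.cov(W1 - W1.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending
    cum = np.cumsum(vals) / vals.sum()
    keep_from = np.searchsorted(cum, var_kept) + 1
    return W1.mean(axis=0), vecs[:, keep_from:]

def criterion(W1, W2, mean1, P):
    """Toy change statistic: distance between the mean and std of the
    two windows after projection onto the retained components P."""
    Z1, Z2 = (W1 - mean1) @ P, (W2 - mean1) @ P
    return (np.linalg.norm(Z1.mean(axis=0) - Z2.mean(axis=0))
            + np.linalg.norm(Z1.std(axis=0) - Z2.std(axis=0)))

def make_window(K=200, n=5, noise=0.1):
    """Strongly correlated toy data: one latent factor plus noise."""
    base = rng.normal(size=(K, 1))
    return base + noise * rng.normal(size=(K, n))

W1, W2 = make_window(), make_window()
mean1, P = low_variance_pcs(W1, var_kept=0.9)
c_neg = criterion(W1, W2, mean1, P)            # NEGATIVE instance (no change)
W2pos = W2.copy()
W2pos[:, 0] = rng.permutation(W2pos[:, 0])     # shuffle VALUES of one feature
c_pos = criterion(W1, W2pos, mean1, P)         # POSITIVE instance (change)
```

Shuffling a feature's values preserves its marginal distribution but destroys its correlation with the rest, which is exactly the kind of change the low-variance components pick up: `c_pos` comes out much larger than `c_neg`.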
Run 100 times for POS and 100 times for NEG to get the ROC curve for a given data set. Repeat the runs without applying PCA to get a second ROC curve for the same data set. Use the Area Under the Curve (AUC), however disputed this measure might have become recently... Larger AUC corresponds to better change detection.
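The AUC itself needs no explicit ROC construction: it can be computed directly from the 100 positive and 100 negative criterion values via the Mann–Whitney identity (a standard result, not specific to the talk; the function name is illustrative):

```python
import numpy as np

def auc(neg_scores, pos_scores):
    """Area under the ROC curve as P(positive score > negative score),
    counting ties as 1/2 (the Mann-Whitney U identity)."""
    neg = np.asarray(neg_scores, dtype=float)[:, None]
    pos = np.asarray(pos_scores, dtype=float)[None, :]
    # average over all (neg, pos) pairs via broadcasting
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

Applied to the stored NEG and POS criterion values, an AUC near 1 means the detector separates change from no-change cleanly, while 0.5 is chance level.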
Conclusion
• A change in the data distribution may be harmful, beneficial or indifferent to classification performance.
• Change does not exist out of context, therefore GENERIC algorithms for change detection are somewhat pointless...
• Feature extraction for change detection may not follow conventional intuition.
Remember my little puzzle? Can you detect an abrupt change in this picture? [Frames 1–3 vs frames 4–6]