Fast Time Series Classification Using Numerosity Reduction
Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, Chotirat Ann Ratanamahatana
xxi@cs.ucr.edu
Department of Computer Science and Engineering, University of California, Riverside
June 28, 2006. ICML 2006, CMU, Pittsburgh
Overview
• Background
• Motivation
• Naïve rank reduction
• Adaptive warping window in DTW
• Experimental results
• Conclusions
Time Series
[Figure: three example time series (A, B, C): an ECG heartbeat, a stock price series, and the outline of a flat-tailed horned lizard.]
Time Series Classification
• Applications: insect species, heartbeats, etc.
• 1-Nearest Neighbor classifier
• Distance measures: Euclidean, LCSS, DTW, ...
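For orientation, a minimal sketch of 1-NN classification with a pluggable distance measure; the function names are illustrative, not from the paper:

```python
import numpy as np

def euclidean(q, c):
    """Euclidean distance between two equal-length sequences."""
    return np.sqrt(np.sum((np.asarray(q) - np.asarray(c)) ** 2))

def nn_classify(query, train_X, train_y, dist=euclidean):
    """Label the query with the class of its nearest training instance."""
    best = min(range(len(train_X)), key=lambda i: dist(query, train_X[i]))
    return train_y[best]
```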
Dynamic Time Warping
[Figure: Euclidean alignment vs. DTW alignment of two sequences; the DTW warping path w is constrained to lie inside a Sakoe-Chiba band of width r, the warping window.]
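A minimal sketch of DTW under a Sakoe-Chiba band, assuming equal-length sequences and the squared point cost commonly paired with LB_Keogh; the signature is illustrative:

```python
import numpy as np

def dtw_distance(q, c, r):
    """DTW between equal-length 1-D sequences q and c, with the warping
    path constrained to a Sakoe-Chiba band of half-width r (in samples)."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # only cells with |i - j| <= r are reachable inside the band
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```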
[Figure: alignment of a flat-tailed horned lizard series and a Texas horned lizard series by Dynamic Time Warping.]
Observation I
• 1-Nearest Neighbor with DTW distance is hard to beat in time series classification
[Figure: comparison of classification error rates (%) on Control Chart, 0 to 8%, for 1-NN DTW, a multiple-classifier system, first-order logic rules, multi-scale histograms, a perceptron neural network, and a super-kernel fusion scheme.]
Observation I
• 1-NN DTW achieves high accuracy but is slow
• 1-NN needs O(N^2) distance computations, where N is the dataset size
• DTW itself is computationally expensive
• Can we speed up 1-NN DTW?
Observation II
• As the dataset size decreases, a larger warping window achieves higher accuracy (motivating numerosity reduction)
• The accuracy peaks very early, at a small warping window (motivating accelerated DTW computation)
[Figure: accuracy (%) vs. warping window r (%) on Gun-Point, for 100, 50, 24, 12, and 6 instances; the smaller the dataset, the larger the window needed to peak.]
DTW gives better accuracy than Euclidean distance, and the accuracy peaks very early, at a small warping window.
Speed Up 1-NN DTW
• Numerosity reduction (data editing)
• Heuristic search order: prune the worst exemplar first (Naive Rank)
• Dynamically adjust the warping window
Naïve Rank Reduction
• Assign a rank to each instance
• Prune the instance with the lowest rank
• Reassign ranks to the remaining instances
• Repeat the above steps until the stopping criterion is met
Naïve Rank Reduction
[Figure: instance P (class +) is the nearest neighbor of three same-class (+) instances and one different-class (-) instance.]
• Rank(P) = 1 + 1 + 1 - 2 = 1: each same-class instance that has P as its nearest neighbor contributes +1, each different-class instance contributes -2
• Ties in rank are broken by a secondary criterion
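A sketch of the reduction loop under the scoring rule above (+1 per same-class instance that has x as nearest neighbor, -2 per different-class instance); the helper name is mine, and ties in rank simply fall to the lowest index rather than the slide's tie-breaker:

```python
def naive_rank_prune(X, y, dist, keep):
    """Naive Rank reduction: repeatedly drop the instance with the lowest
    rank until `keep` instances remain. Rank(x) sums +1 for each same-class
    instance whose nearest neighbor is x, and -2 for each different-class
    instance (as in the Rank(P) = 1+1+1-2 example above)."""
    idx = list(range(len(X)))
    while len(idx) > keep:
        rank = {i: 0 for i in idx}
        for i in idx:
            # nearest neighbor of instance i among the other survivors
            nn = min((j for j in idx if j != i), key=lambda j: dist(X[i], X[j]))
            rank[nn] += 1 if y[i] == y[nn] else -2
        idx.remove(min(idx, key=rank.get))  # prune the worst-ranked instance
    return idx  # indices of the retained instances
```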
Adaptive Warping Window
• Basic idea: adjust the warping window dynamically during numerosity reduction
• Prune instances one at a time, increasing the warping band by 1% whenever necessary (a code sketch follows the trace below)
[Figure: trace of the adaptive search as instances are pruned; the warping window searched grows (2%, 2%, 3%, 4%, 5%, ...) and the accuracy under window search (97%, 99%, 99%, 99%, 99%, 92%, 98%, 98%) is compared against the accuracy under data pruning alone (97%, 98%, 97%, 96%, 95%, 96%, 94%, 96%).]
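A sketch of the adaptive loop, reusing dtw_distance and naive_rank_prune from the earlier sketches; loo_accuracy and the 1%-step policy follow the slide, but everything here is recomputed from scratch, which is exactly the cost the caching scheme on the next slide removes:

```python
def loo_accuracy(X, y, r):
    """Leave-one-out 1-NN accuracy with DTW at window r (% of series length)."""
    w = max(1, r * len(X[0]) // 100)
    hits = 0
    for i in range(len(X)):
        nn = min((j for j in range(len(X)) if j != i),
                 key=lambda j: dtw_distance(X[i], X[j], w))
        hits += (y[i] == y[nn])
    return hits / len(X)

def adaptive_prune(X, y, keep, r=1):
    """Interleave pruning and window search: drop one instance at a time,
    then widen the band by 1% for as long as that improves LOO accuracy."""
    X, y = list(X), list(y)
    while len(X) > keep:
        w = max(1, r * len(X[0]) // 100)
        kept = naive_rank_prune(X, y, lambda a, b: dtw_distance(a, b, w),
                                len(X) - 1)        # drop the worst instance
        X, y = [X[i] for i in kept], [y[i] for i in kept]
        while r < 100 and loo_accuracy(X, y, r + 1) > loo_accuracy(X, y, r):
            r += 1                                 # increase the band by 1%
    return X, y, r
```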
Speeding Up DTW Classification
• LB_Keogh lower bounding, amortized cost O(n)
• Store the DTW distance matrix and nearest-neighbor matrix, updating them dynamically
• Compute accuracy by looking up the matrices
• 4 to 5 orders of magnitude faster
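A sketch of the LB_Keogh bound, assuming equal-length sequences: a candidate can be skipped without running full DTW whenever the bound already exceeds the best-so-far distance. The envelope here is recomputed per point, making this version O(nr); a sliding-window min/max brings it to the amortized O(n) cited above:

```python
def lb_keogh(q, c, r):
    """LB_Keogh lower bound on DTW(q, c) for band half-width r: points of q
    falling outside the min/max envelope of c contribute their squared
    distance to the envelope; points inside contribute nothing."""
    n = len(c)
    total = 0.0
    for i, qi in enumerate(q):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        u, l = max(c[lo:hi]), min(c[lo:hi])  # envelope of c around position i
        if qi > u:
            total += (qi - u) ** 2
        elif qi < l:
            total += (qi - l) ** 2
    return total ** 0.5
```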
Experimental Results
[Figure: accuracy (%) vs. number of training instances on Two Patterns (training set), comparing 1-NN Euclidean, random pruning with Euclidean, random pruning with fixed-window DTW, NaiveRank with fixed-window DTW, and NaiveRank with adaptive-window DTW; the adaptive window sizes found range from about 4% to 14%.]
Experimental Results
[Figure: accuracy (%) vs. number of training instances on the Two Patterns and Swedish Leaf test sets, comparing 1-NN Euclidean, RT1, RT2, RT3, and 1-NN DTW.]
RT algorithms are introduced in Wilson, D. R., & Martinez, T. R. (1997). Instance Pruning Techniques. ICML '97.
Conclusions
• 1-NN DTW is very competitive in time series classification
• We present novel observations on the relationship between warping window size and dataset size
• We produce an extremely fast, accurate classifier
[Figure: computational cost of brute force, LB_Keogh, and fast DTW on Two Patterns (1,000 training / 4,000 test instances), on scales of x10^8 and x10^5.]