180 likes | 388 Views
Online Discovery and Maintenance of Time Series Motifs. Abdullah Mueen Eamonn Keogh University of California, Riverside. Time Series Motifs. Repeated pattern in a time series Utility: Activity/Event discovery Summarization Classification Application
E N D
Online Discovery and Maintenance of Time Series Motifs Abdullah Mueen Eamonn Keogh University of California, Riverside
Time Series Motifs • Repeated pattern in a time series • Utility: • Activity/Event discovery • Summarization • Classification • Application • Data Center Chiller Management (Patnaik, et al. KDD 09) • Designing Smart Cane for Elders (Vahdatpour, et al. IJCAI 09) 30 20 10 0 2000 4000 6000 8000 0
Online Time Series Motifs • Streaming time series • Sliding window of the recent history • What minute long trace repeated in the last hour?
Problem Formulation Discovery 1 2 3 4 5 6 7 8 . . . 873 100 200 300 400 500 600 700 800 900 1000 The most similar pair of non-overlapping subsequences -7000 -7500 -8000 . . . 1 2 3 4 5 6 7 8 . . . 873 874 Maintenance time:1005 time:1003 time:1004 time:1002 time:1001 1 2 3 4 5 6 7 8 . . . 873 874 875 1 2 3 4 5 6 7 8 . . . 873 874 875 876 1 2 3 4 5 6 7 8 . . . 873 874 875 876 877 1 2 3 4 5 6 7 8 . . . 873 874 875 876 877 878 Update motif pair after every time tick time
Challenge • A subsequence is a high dimensional point • The dynamic closest pair of points problem • Closest pair may change upon every update • Naïve approach: Do quadratic comparisons. 3 3 3 4 4 4 1 1 7 7 7 Deletion Insertion 6 6 6 5 5 5 8 8 8 2 2 2 9
Our Approach • Goal: Algorithm with Linear update time • Previous method for dynamic closest pair (Eppstein,00) • A matrix of all-pair distances is maintained • O(w2) space required • Quad-tree is used to update the matrix • Maintain a set of neighbors and reverse neighbors for all points • We do it in O( ) space
Outline • Methods • Evaluation • Extensions • Case Studies • Conclusion
Maintaining Motif • Smallest nearest neighbor → Closest pair • Upon insertion • Find the nearest neighbor; Needs O(w) comparisons. • Upon deletion • Find the next NN of all the reverse NN 3 3 3 4 4 4 1 1 1 7 7 7 6 6 6 5 5 5 8 8 8 2 2 2 9 9 Insertion Deletion
Data Structure 7 8 6 4 5 1 3 2 Total number of nodes in R-lists is w R-lists 4 Points that should be updated 7 4 8 7 1 2 8 1 2 3 4 7 5 1 7 1 3 3 6 FIFO Sliding window (ptr, distance) 1 2 3 4 5 6 7 8 1 7 5 7 6 5 7 2 4 5 6 5 2 1 4 3 8 7 5 7 N-lists Still O(w2) space Neighbors in order of distances 3 4 8 2 4 5 2 1 8 2 8 6 1 8 3 1 8 3 6 3 5 6 6 4 6 4
Observations • While inserting • Updating NN of old points is not necessary • A point can be removed from the neighbor list if it violates the temporal order Average length O( )
Example 4 6 8 7 7 5 9 4 3 4 5 6 7 8 9 10 3 8 3 3 3 10 4 4 4 3 4 3 3 6 6 5 2 5 6 4 5 4 7 8 7 1 2 3 4 5 6 7 8 7 7 7 1 1 6 6 6 5 8 1 1 1 2 3 1 2 5 5 5 6 9 2 3 4 5 3 6 8 8 8 2 2 2 8 6 4 7 5 4 5 10 3 7 9 6 9 9 2 3 4 5 6 7 8 9 1 2 1 2 3 2 3 3 2 6 4 5 4 6 8 5 7 6
Evaluation 0.7 • Up to 8x speedup from general dynamic closest pair • Stable space cost per point with increasing window size RW 0.18 350 0.6 RW 8x 0.16 110 300 0.14 0.5 100 Fast Pair 250 EOG Insect 0.12 90 Insect Fast Pair 0.4 Average Update Time (Sec) RW Average Update Time (Sec) 0.1 200 80 EOG 0.3 EOG 0.08 70 RW 150 Average Length of N-list EEG EOG EEG 60 0.2 0.06 Average Length of N-list 100 Insect Insect 50 0.04 0.1 40 EEG 50 0.02 0 30 EEG 0 0 0 200 400 600 800 1000 20 0 0.5 1 1.5 2 2.5 3 3.5 4 Motif Length (m) 0 200 400 600 800 1000 10 x 104 Window Size (w) Motif Length (m) 0 0.5 1 1.5 2 2.5 3 3.5 4 x 104 Window Size (w)
Outline • Methods • Evaluation • Extensions • Case Studies • Conclusion
Extension • Multidimensional Motif • Maintain motif in Multiple Synchronous time series • Points in one series can have neighbors in others • Arbitrary Data Rate • Load shedding: Skip points if can’t handle 1.0 0.9 0.8 Fraction of Motifs Discovered 0.7 0.6 EOG Random Walk EEG 0.5 Insect 0.4 FIFO for each streams 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Fraction of Data Used
Raw PLA A MR B A B B B B B A 0 500 1000 1500 2000 Case Study 1: Online Compression • Replace all the occurrences of a motif by a pointer to that motif. • For signals with regular repetitions • Higher compression rate with less error. Dictionary
Case Study 2: Closing The Loop • Check if a robot closed a loop • Convert the stream of video frames to stream of color histograms. • Most similar color histograms are good candidates for loop detection. 16 12 8 4 16 Start Loop Closure Detected 0 12 0 400 800 1200 1600 8 4 0 0 400 800 1200 1600
Conclusion • First attempt to maintain time series motif online • Maintains minutes long repetition in hours long sliding window • Linear update time with less space cost than quad-tree based method • O( ) Vs O(w2) • Faster than general dynamic closest pair solution
Thank You Code and Data: http://www.cs.ucr.edu/~mueen/OnlineMotif