Time Series Filtering
Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.
[Figure: a time series with a matching subsequence (Q11) highlighted, and a set of 12 candidates.]
Filtering vs. Querying
[Figure: two panels. Querying: a single query (template) is compared against a database to find the best match. Filtering: data is compared against a set of queries to find matches (Q11 in the example).]
Euclidean Distance Metric
Given two time series Q = q1…qn and C = c1…cn, their Euclidean distance is defined as:
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$
[Figure: time series Q and C plotted over 0 to 100 on the time axis.]
Early Abandon
During the computation, if the running sum of the squared differences between each pair of corresponding data points exceeds r², we can safely stop the calculation, because the distance is already known to be greater than r.
[Figure: Q and C plotted over 0 to 100, with the point at which the calculation is abandoned marked.]
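A minimal sketch of early abandoning, assuming both sequences are equal-length lists of numbers. The function name `early_abandon_distance` and the choice to return `None` when the calculation is abandoned are illustrative assumptions, not part of the slides.

```python
import math


def early_abandon_distance(q, c, r):
    """Return the Euclidean distance between q and c, or None if it exceeds r.

    The running sum of squared differences is compared with r**2 after
    every point, so the loop can stop before reaching the end.
    """
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    r_squared = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r_squared:
            return None  # abandoned: the distance is already greater than r
    return math.sqrt(total)
```

For example, `early_abandon_distance([0.0, 0.0], [3.0, 4.0], 5.0)` completes and returns the distance, while the same call with `r = 4.0` is abandoned at the second point.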
Classic Approach
Individually compare each candidate sequence to the query using the early abandoning algorithm.
[Figure: a time series and a set of 12 candidates.]
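The classic approach could be sketched as follows. This is an illustration under stated assumptions: every candidate has the same length, the query is each sliding window of the time series taken with a step of one point, and the names `within_r` and `classic_filter` are hypothetical.

```python
def within_r(q, c, r):
    """Early-abandoning test: True if the Euclidean distance is at most r."""
    r_squared = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r_squared:
            return False  # abandoned early
    return True


def classic_filter(series, candidates, r):
    """Return (start, candidate_index) for every window within r of a candidate.

    Assumes all candidates share one length n; each length-n window of
    the series is compared with every candidate individually.
    """
    n = len(candidates[0])
    matches = []
    for start in range(len(series) - n + 1):
        window = series[start:start + n]
        for index, candidate in enumerate(candidates):
            if within_r(window, candidate, r):
                matches.append((start, index))
    return matches
```

Each window costs up to k early-abandoning comparisons for k candidates, which is the baseline the wedge-based method tries to improve on.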
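Wedge
Having candidate sequences C1, .. , Ck, we can form two new sequences U and L:
$$U_i = \max(C_{1i}, \dots, C_{ki}), \qquad L_i = \min(C_{1i}, \dots, C_{ki})$$
They form the smallest possible bounding envelope that encloses sequences C1, .. , Ck. We call the combination of U and L a wedge, and denote a wedge as W = {U, L}.
A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:
$$\mathrm{LB\_Keogh}(Q, W) = \sqrt{\sum_{i=1}^{n} \begin{cases} (q_i - U_i)^2 & \text{if } q_i > U_i \\ (q_i - L_i)^2 & \text{if } q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$
[Figure: candidates C1 and C2, the envelope U and L that forms wedge W, and a query Q compared against W.]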
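The wedge construction and the lower bound can be sketched as follows, assuming equal-length candidates stored as lists. The names `build_wedge` and `lb_keogh` are illustrative, and the lower bound is written in the standard LB_Keogh form.

```python
import math


def build_wedge(candidates):
    """Return (U, L): the pointwise maximum and minimum over all candidates."""
    upper = [max(values) for values in zip(*candidates)]
    lower = [min(values) for values in zip(*candidates)]
    return upper, lower


def lb_keogh(q, wedge):
    """Lower bound on the Euclidean distance from q to any sequence in the wedge.

    Only the points of q that fall outside the envelope contribute.
    """
    upper, lower = wedge
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)
```

Because every candidate lies inside the envelope, the value returned by `lb_keogh` never exceeds the true distance from the query to any single candidate in the wedge.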
Generalized Wedge
• Use W(1,2) to denote that a wedge is built from sequences C1 and C2.
• Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C3.
[Figure: C1 (or W1), C2 (or W2), and C3 (or W3) merged into W(1,2) and then into W((1,2),3).]
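Nesting can be illustrated by treating a single candidate as a degenerate wedge (U = L = C) and merging two wedges pointwise. The helper names `leaf_wedge` and `merge_wedges` and the sample values are assumptions made for this sketch.

```python
def leaf_wedge(candidate):
    """A single sequence is a wedge whose upper and lower bounds coincide."""
    return list(candidate), list(candidate)


def merge_wedges(a, b):
    """Return the smallest wedge that encloses wedges a and b."""
    (upper_a, lower_a), (upper_b, lower_b) = a, b
    upper = [max(x, y) for x, y in zip(upper_a, upper_b)]
    lower = [min(x, y) for x, y in zip(lower_a, lower_b)]
    return upper, lower


# W(1,2) is built from C1 and C2; W((1,2),3) is built from W(1,2) and C3.
c1, c2, c3 = [1, 5, 2], [2, 4, 3], [0, 6, 2]
w12 = merge_wedges(leaf_wedge(c1), leaf_wedge(c2))
w123 = merge_wedges(w12, leaf_wedge(c3))
```

The nested wedge `w123` equals the envelope computed directly from all three candidates, so the merge order only changes the hierarchy, not the final envelope.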
H-Merge
• Compare the query to the wedge using LB_Keogh.
• If the LB_Keogh function early abandons, we are done.
• Otherwise, individually compare each candidate sequence to the query using the early abandoning algorithm.
[Figure: a time series and a set of 12 candidates.]
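The three steps above can be sketched for a single, flat wedge. This is a simplified illustration rather than the authors' implementation: it does not descend through nested wedges, and the names `lb_keogh_abandons`, `within_r`, and `h_merge_flat` are hypothetical.

```python
def lb_keogh_abandons(q, upper, lower, r):
    """True if the early-abandoning LB_Keogh sum exceeds r**2."""
    r_squared = r * r
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
        if total > r_squared:
            return True
    return False


def within_r(q, c, r):
    """Early-abandoning test: True if the Euclidean distance is at most r."""
    r_squared = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > r_squared:
            return False
    return True


def h_merge_flat(q, candidates, r):
    """Return indices of candidates within r of q, using one wedge as a filter."""
    upper = [max(values) for values in zip(*candidates)]
    lower = [min(values) for values in zip(*candidates)]
    if lb_keogh_abandons(q, upper, lower, r):
        return []  # the lower bound rules out every candidate at once
    return [i for i, c in enumerate(candidates) if within_r(q, c, r)]
```

When the wedge prunes the query, all candidates are dismissed for the cost of at most one pass over the envelope. When it does not, the wedge comparison is extra work on top of the classic approach.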
Hierarchical Clustering
[Figure: dendrogram over C1 (or W1), C2 (or W2), C3 (or W3), C4 (or W4), and C5 (or W5), giving one wedge set per level:]
• K = 5: W1, W2, W3, W4, W5
• K = 4: W(2,5), W3, W1, W4
• K = 3: W(2,5), W3, W(1,4)
• K = 2: W((2,5),3), W(1,4)
• K = 1: W(((2,5),3),(1,4))
Which wedge set to choose?
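One way to produce a wedge set for every K is bottom-up agglomerative merging. The sketch below is an assumption-laden illustration: the slides do not state the linkage criterion, so it merges the pair whose combined wedge has the smallest area (the sum of U minus L), and the names `build_wedge_sets`, `merge_wedges`, and `wedge_area` are hypothetical.

```python
def merge_wedges(a, b):
    """Return the smallest wedge that encloses wedges a and b."""
    (upper_a, lower_a), (upper_b, lower_b) = a, b
    upper = [max(x, y) for x, y in zip(upper_a, upper_b)]
    lower = [min(x, y) for x, y in zip(lower_a, lower_b)]
    return upper, lower


def wedge_area(wedge):
    """Area between the upper and lower envelopes (assumed merge criterion)."""
    upper, lower = wedge
    return sum(u - l for u, l in zip(upper, lower))


def build_wedge_sets(candidates):
    """Return {K: [(label, wedge), ...]} for K = k down to 1."""
    clusters = [(str(i + 1), (list(c), list(c))) for i, c in enumerate(candidates)]
    wedge_sets = {len(clusters): list(clusters)}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = merge_wedges(clusters[i][1], clusters[j][1])
                area = wedge_area(merged)
                if best is None or area < best[0]:
                    best = (area, i, j, merged)
        _, i, j, merged = best
        label = "(%s,%s)" % (clusters[i][0], clusters[j][0])
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(label, merged)]
        wedge_sets[len(clusters)] = list(clusters)
    return wedge_sets
```

The labels follow the slide's notation without the leading W, so a label such as `((1,2),(3,4))` stands for W((1,2),(3,4)).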
Which Wedge Set to Choose?
• Test all k wedge sets on a representative sample of data.
• Choose the wedge set that performs best.
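The selection step could look like the sketch below. It rests on several assumptions: a "step" is counted as one point comparison, each wedge set is a flat list of `(upper, lower, members)` tuples with no nested descent, and the function names are hypothetical.

```python
def lb_steps(q, upper, lower, r):
    """Early-abandoning LB_Keogh; return (pruned, number_of_steps)."""
    r_squared, total, steps = r * r, 0.0, 0
    for qi, ui, li in zip(q, upper, lower):
        steps += 1
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
        if total > r_squared:
            return True, steps
    return False, steps


def distance_steps(q, c, r):
    """Number of steps used by an early-abandoning Euclidean comparison."""
    r_squared, total, steps = r * r, 0.0, 0
    for qi, ci in zip(q, c):
        steps += 1
        total += (qi - ci) ** 2
        if total > r_squared:
            break
    return steps


def cost(wedge_set, sample_windows, r):
    """Total steps needed to filter every sample window with this wedge set."""
    steps = 0
    for q in sample_windows:
        for upper, lower, members in wedge_set:
            if len(members) > 1:  # a single candidate needs no wedge test
                pruned, used = lb_steps(q, upper, lower, r)
                steps += used
                if pruned:
                    continue
            steps += sum(distance_steps(q, c, r) for c in members)
    return steps


def choose_wedge_set(wedge_sets, sample_windows, r):
    """Return the K whose wedge set uses the fewest steps on the sample."""
    return min(wedge_sets, key=lambda k: cost(wedge_sets[k], sample_windows, r))
```

A sample dominated by windows far from every candidate tends to favor a few large wedges, while a sample of near matches tends to favor many small ones, which is why the choice is made empirically.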
Upper Bound on H-Merge
• The wedge-based approach seems to be efficient when comparing a set of time series to a large batch dataset.
• But what about streaming time series?
  • Streaming algorithms are limited by their worst case.
  • Being efficient on average does not help.
• Worst case:
[Figure: a subsequence compared against W((1,2),3), which is built from C1 (or W1), C2 (or W2), and C3 (or W3) via W(1,2).]
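Triangle Inequality
If dist(W((2,5),3), W(1,4)) >= 2r, then a subsequence that is within r of W((2,5),3) must be farther than r from W(1,4), so the comparison against W(1,4) fails. The subsequence cannot fail on both wedges.
[Figure: the wedge hierarchy for K = 5 down to K = 1, with a subsequence compared against W((2,5),3) and W(1,4), the two children of W(((2,5),3),(1,4)).]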
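The argument can be checked numerically. The slides do not define the distance between two wedges, so this sketch assumes it is the smallest Euclidean distance between any point of one envelope and any point of the other (the pointwise gap). The names `wedge_gap` and `sibling_is_pruned` are hypothetical.

```python
import math


def lb_keogh(q, wedge):
    """Lower bound on the distance from q to any sequence inside the wedge."""
    upper, lower = wedge
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)


def wedge_gap(a, b):
    """Assumed wedge-to-wedge distance: the minimum distance between envelopes."""
    (upper_a, lower_a), (upper_b, lower_b) = a, b
    total = 0.0
    for ua, la, ub, lb in zip(upper_a, lower_a, upper_b, lower_b):
        gap = max(0.0, la - ub, lb - ua)  # zero where the envelopes overlap
        total += gap * gap
    return math.sqrt(total)


def sibling_is_pruned(q, near, far, r):
    """True if the triangle inequality guarantees lb_keogh(q, far) > r.

    Holds when q is within r of `near` and the two wedges are at least
    2r apart, since then q is more than 2r - r = r away from `far`.
    """
    return lb_keogh(q, near) < r and wedge_gap(near, far) >= 2 * r
```

When `sibling_is_pruned` returns True, the comparison against the far wedge is guaranteed to abandon, so at most one of the two sibling wedges has to be opened for that subsequence.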
Experimental Setup
• Datasets
  • ECG Dataset
  • Stock Dataset
  • Audio Dataset
• We measure the number of computational steps used by the following methods:
  • Brute force
  • Brute force with early abandoning (classic)
  • Our approach (H-Merge)
  • Our approach with random wedge set (H-Merge-R)
Experimental Results: ECG Dataset
• Batch time series
  • 650,000 data points (half an hour's ECG signals)
• Candidate set
  • 200 time series of length 40
• r = 0.5
[Figure: bar chart of Number of Steps (×10^9) by algorithm: brute force, classic, H-Merge-R, H-Merge.]
Experimental Results: Stock Dataset
• Batch time series
  • 2,119,415 data points
• Candidate set
  • 337 time series of length 128
• r = 4.3
[Figure: bar chart of Number of Steps (×10^10) by algorithm: brute force, classic, H-Merge-R, H-Merge.]
Experimental Results: Audio Dataset
• Batch time series
  • 46,143,488 data points (one hour's sound)
• Candidate set
  • 68 time series of length 101
• r = 4.14
• Sliding window
  • 11,025 (1 second)
• Step
  • 5,512 (0.5 second)
[Figure: bar chart of Number of Steps (×10^7) by algorithm: brute force, H-Merge-R, classic, H-Merge.]
Experimental Results: Sorting
• Wedge
  • with length 1,000
• Random walk time series
  • with length 65,536