Exact Indexing of Dynamic Time Warping
Eamonn Keogh
Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521
eamonn@cs.ucr.edu
Presented by: Ankit Hirdesh, Piyush Goswami
INTRODUCTION
• Time series
  • A collection of observations made sequentially in time
  • Occur in medical, business, and scientific domains
• Measuring the similarity between two time series is required in many time series data mining applications
CHALLENGES
• How do we define similarity?
  • We need a method that allows elastic shifting of the time axis, to accommodate sequences that are similar but out of phase
• Large amounts of data
  • How do we search quickly?
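SOLUTIONS
• Euclidean distance
  • Aligns the points one to one (the i-th point of one series with the i-th point of the other)
  • Cannot detect the similarity between out-of-phase signals
• Dynamic Time Warping (DTW)
  • Allows the two series to be aligned non-linearly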
WHAT IS TIME WARPING?
[Figure: a query Q and a candidate C aligned by a warping path]
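DYNAMIC TIME WARPING
• Cumulative distance: $\gamma(i,j) = d(q_i, c_j) + \min\{\gamma(i-1,j-1),\ \gamma(i-1,j),\ \gamma(i,j-1)\}$
• Three basic constraints on the warping path
  • Boundary: the path must include the beginning and the end, i.e. start at (1, 1) and finish at (n, m)
  • Continuity: the path must not have any jumps (each step moves to an adjacent cell)
  • Monotonicity: the path cannot go back in time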
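The recurrence above can be sketched in a few lines of Python. This is our illustration, not the author's code: the names `euclidean` and `dtw` are hypothetical, and we assume the local cost d(q_i, c_j) = (q_i - c_j)^2 with the final distance taken as the square root of γ(n, m). The example contrasts DTW with Euclidean distance on two copies of the same bump, one shifted by a single step.

```python
import math


def euclidean(q, c):
    """Euclidean distance: q[i] is only ever compared with c[i]."""
    assert len(q) == len(c)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def dtw(q, c):
    """Unconstrained DTW using
    gamma(i, j) = d(q_i, c_j) + min(gamma(i-1, j-1), gamma(i-1, j), gamma(i, j-1)),
    with d the squared difference (an assumption); returns sqrt(gamma(n, m))."""
    n, m = len(q), len(c)
    inf = float("inf")
    # Row 0 and column 0 are an infinite border, so every path starts at (1, 1).
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1],  # diagonal step
                                     gamma[i - 1][j],      # c_j matched again
                                     gamma[i][j - 1])      # q_i matched again
    return math.sqrt(gamma[n][m])  # the path ends at (n, m)


q = [0, 0, 1, 2, 1, 0, 0, 0]
c = [0, 0, 0, 1, 2, 1, 0, 0]  # same shape, shifted one step to the right
print(euclidean(q, c))  # 2.0
print(dtw(q, c))        # 0.0
```

The full table costs O(nm) time and memory, which is what the global constraints on the next slide are meant to reduce.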
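Global Constraints for Speedy Calculation
• Limit the warping path to stay close to the diagonal: each element $w_k = (i,j)_k$ must satisfy $j - r \le i \le j + r$, where r is the "reach"
• Speeds up the calculation from O(n²) to O(nr), which is linear in n for a fixed reach r
• Prevents pathological warpings, where a small section of one sequence maps onto a much larger section of the other
• The permitted region around the diagonal is called the warping window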
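A minimal sketch of DTW restricted to a Sakoe-Chiba band, assuming equal-length sequences and the same squared-difference cost as an unconstrained DTW. The name `dtw_band` is hypothetical. For simplicity it still allocates the full table; only the cells with |i - j| <= r are ever filled, which is where the O(nr) running time comes from.

```python
import math


def dtw_band(q, c, r):
    """DTW where cell (i, j) is reachable only if |i - j| <= r (reach r).
    Cells outside the band stay infinite, so no path can use them."""
    n, m = len(q), len(c)
    inf = float("inf")
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        # Visit only the (at most 2r + 1) columns inside the warping window.
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            gamma[i][j] = cost + min(gamma[i - 1][j - 1],
                                     gamma[i - 1][j],
                                     gamma[i][j - 1])
    return math.sqrt(gamma[n][m])


q = [0, 0, 1, 2, 1, 0, 0, 0]
c = [0, 0, 0, 1, 2, 1, 0, 0]  # q shifted one step to the right
print(dtw_band(q, c, 0))  # 2.0: r = 0 allows only the diagonal (Euclidean)
print(dtw_band(q, c, 1))  # 0.0: a reach of 1 absorbs the one-step shift
```

With r = 0 the band degenerates to the one-to-one alignment of Euclidean distance, which makes the band a convenient knob between the two measures.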
Lower Bounding
• Similarity search under both Euclidean distance and DTW is highly demanding in terms of CPU and I/O time
• A lower bounding function LB, with LB(Q, C) ≤ DTW(Q, C) for every pair, speeds up similarity search by discarding sequences that could not possibly be the best match
• It must be fast to compute
• It must be tight, i.e. as close to the true distance as possible
Existing Lower Bounding Techniques
• Lower bounding measure by Kim et al.: the maximum squared difference between the two sequences' first (A), last (D), minimum (B) and maximum (C) points is returned as the lower bound.
• Lower bounding measure by Yi et al.: the sum of the squared lengths of the gray lines is returned as the lower bound.
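The two earlier bounds can be sketched as follows. This is our reading of the slide, not the original authors' code: `lb_kim` and `lb_yi` are hypothetical names, we assume the gray lines in Yi's figure run from each point of C lying above max(Q) or below min(Q) to that range, and we take square roots so both results are on the same scale as a DTW distance defined as sqrt(γ(n, m)).

```python
import math


def lb_kim(q, c):
    """Kim-style bound from four features: first, last, minimum, maximum.
    Each feature pair must be aligned somewhere on any warping path,
    so the largest squared difference cannot exceed gamma(n, m)."""
    features = [
        (q[0], c[0]),        # first points (A)
        (q[-1], c[-1]),      # last points (D)
        (min(q), min(c)),    # minimum points (B)
        (max(q), max(c)),    # maximum points (C)
    ]
    return math.sqrt(max((a - b) ** 2 for a, b in features))


def lb_yi(q, c):
    """Yi-style bound (assumed form): every point of c above max(q) or below
    min(q) costs at least its squared distance to the range [min(q), max(q)]."""
    hi, lo = max(q), min(q)
    total = 0.0
    for x in c:
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return math.sqrt(total)


q = [0, 1, 3, 1, 0]
c = [1, 2, 5, 2, -1]
print(lb_kim(q, c))  # 2.0
print(lb_yi(q, c))   # sqrt(5), about 2.236
```

Both bounds summarize one of the sequences by only a few numbers, which is why they tend to be loose.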
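Proposed Lower Bounding Method
• Let us define two sequences, where r is the reach and U and L stand for Upper and Lower respectively:
  $U_i = \max(q_{i-r}, \ldots, q_{i+r})$, $L_i = \min(q_{i-r}, \ldots, q_{i+r})$
• Also: $U_i \ge q_i \ge L_i$ for all i
• [Figure: A: bounding envelope for the Sakoe-Chiba band; B: bounding envelope for the Itakura parallelogram]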
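A direct sketch of the envelope definition, assuming a Sakoe-Chiba band with a constant reach r and windows clipped at the ends of the sequence. The function name `envelope` is hypothetical. This naive version costs O(nr); a sliding-window maximum/minimum would bring it down to O(n).

```python
def envelope(q, r):
    """Return (U, L) with U[i] = max(q[i-r .. i+r]) and L[i] = min(q[i-r .. i+r]),
    clipping the window at both ends of q."""
    n = len(q)
    upper, lower = [], []
    for i in range(n):
        window = q[max(0, i - r):min(n, i + r + 1)]
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


q = [0, 1, 3, 1, 0]
U, L = envelope(q, 1)
print(U)  # [1, 3, 3, 3, 1]
print(L)  # [0, 0, 1, 0, 0]
```

Because each window contains q[i] itself, U[i] >= q[i] >= L[i] holds by construction.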
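Proposed Lower Bounding Method – LB_Keogh
• The query sequence Q is enclosed in the bounding envelope formed by U and L.
• For every point of the candidate sequence C that falls outside the envelope, the squared distance to the nearest orthogonal edge of the envelope is taken; the square root of the sum of these squared distances is returned as the lower bound:
$$LB\_Keogh(Q,C) = \sqrt{\sum_{i=1}^{n} \begin{cases} (c_i - U_i)^2 & \text{if } c_i > U_i \\ (c_i - L_i)^2 & \text{if } c_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$
• A and B have the same meaning as on the previous slide.
• [Figure: LB_Keogh(Q, C) compared with DTW(Q, C)]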
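A minimal sketch of the LB_Keogh formula, assuming the query's envelope (U, L) has already been computed. The name `lb_keogh` and the example values are ours; the envelope shown is that of q = [0, 1, 3, 1, 0] for a reach of 1.

```python
import math


def lb_keogh(c, upper, lower):
    """Only points of the candidate c outside [lower, upper] contribute,
    each by its squared vertical distance to the nearest envelope edge."""
    total = 0.0
    for ci, u, l in zip(c, upper, lower):
        if ci > u:
            total += (ci - u) ** 2   # above the upper envelope
        elif ci < l:
            total += (ci - l) ** 2   # below the lower envelope
    return math.sqrt(total)


# Envelope of q = [0, 1, 3, 1, 0] for reach r = 1.
upper = [1, 3, 3, 3, 1]
lower = [0, 0, 1, 0, 0]
c = [1, 2, 0, 2, 1]
print(lb_keogh(c, upper, lower))  # 1.0: only c[2] = 0 lies below lower[2] = 1
```

Since the envelope depends only on Q, it is computed once per query and reused for every candidate, so each bound costs O(n). For the same pair, a bound based only on min(Q) = 0 and max(Q) = 3 would return 0, because every point of C lies inside that range.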
The tightness of the lower bound for each technique is proportional to the length of its gray lines.
[Figure panels: LB_Kim; LB_Yi; LB_Keogh (Sakoe-Chiba); LB_Keogh (Itakura)]
How to Index Dynamic Time Warping
• Piecewise Aggregate Approximation (PAA)
  • Represents a time series as a sequence of box basis functions
  • Reduces the dimensionality from n to N: raw time series may contain a large number of points, which degrades indexing performance
  • The data is divided into N equal-sized frames, and each frame is replaced by the mean of its points: $\bar{c}_i = \frac{N}{n} \sum_{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N} i} c_j$
  • Extremely fast to calculate
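A small sketch of PAA as described above, assuming for simplicity that N divides n exactly (the hypothetical function `paa` rejects other inputs rather than handling uneven frames).

```python
def paa(x, N):
    """Piecewise Aggregate Approximation: split x into N equal frames
    and represent each frame by its mean."""
    n = len(x)
    if n % N:
        raise ValueError("this sketch assumes N divides len(x)")
    w = n // N  # points per frame
    return [sum(x[k * w:(k + 1) * w]) / w for k in range(N)]


print(paa([1, 3, 2, 4, 6, 8, 0, 0], 4))  # [2.0, 3.0, 7.0, 0.0]
```

Each output value costs one pass over its frame, so the whole reduction is O(n).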
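PAA continued
• The PAA of U and L, denoted by Û and L̂, takes the maximum of U and the minimum of L in each frame (rather than the mean), so the reduced envelope still encloses the original one:
  $\hat{U}_i = \max(U_{\frac{n}{N}(i-1)+1}, \ldots, U_{\frac{n}{N} i})$, $\hat{L}_i = \min(L_{\frac{n}{N}(i-1)+1}, \ldots, L_{\frac{n}{N} i})$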
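A sketch of the reduced envelope and of a lower bound computed against a candidate's PAA representation. The names `paa_envelope` and `lb_paa` are hypothetical, N is assumed to divide n, and the sqrt(n/N) scaling follows the LB_PAA form in Keogh's paper as we understand it; treat it as an illustration rather than a reference implementation.

```python
import math


def paa_envelope(upper, lower, N):
    """Reduce an envelope of length n to N frames: max of U and min of L per
    frame, so every original envelope point stays inside the reduced one."""
    w = len(upper) // N
    u_hat = [max(upper[k * w:(k + 1) * w]) for k in range(N)]
    l_hat = [min(lower[k * w:(k + 1) * w]) for k in range(N)]
    return u_hat, l_hat


def lb_paa(c_bar, u_hat, l_hat, n):
    """Bound between the query's reduced envelope and the PAA vector c_bar
    of a candidate whose original length is n."""
    N = len(c_bar)
    total = 0.0
    for cb, u, l in zip(c_bar, u_hat, l_hat):
        if cb > u:
            total += (cb - u) ** 2
        elif cb < l:
            total += (cb - l) ** 2
    # Each reduced dimension stands for n / N original points.
    return math.sqrt(n / N) * math.sqrt(total)


upper = [1, 3, 3, 3, 1, 1, 2, 2]
lower = [0, 0, 1, 0, 0, 0, 1, 1]
u_hat, l_hat = paa_envelope(upper, lower, 2)
print(u_hat, l_hat)                           # [3, 2] [0, 0]
print(lb_paa([4.0, 1.0], u_hat, l_hat, n=8))  # 2.0
```

Using max and min instead of the mean is what keeps this a valid bound after dimensionality reduction: a frame mean above Û_i implies at least one original point above the envelope in that frame.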
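Indexing Dynamic Time Warping
• Two time series, a query sequence Q and a candidate sequence C, each of length n, are both reduced to N dimensions.
• The index approximates groups of candidate sequences C with a minimum bounding rectangle (MBR) R = (L, H) in the N-dimensional PAA space, where L = {l_1, l_2, …, l_N} and H = {h_1, h_2, …, h_N} are the lower and upper edges of R in each dimension.
• MINDIST(Q, R) lower-bounds the distance from Q to every sequence inside R, using the PAA envelope Û, L̂ of Q:
$$MINDIST(Q,R) = \sqrt{\frac{n}{N}} \sqrt{\sum_{i=1}^{N} \begin{cases} (l_i - \hat{U}_i)^2 & \text{if } l_i > \hat{U}_i \\ (h_i - \hat{L}_i)^2 & \text{if } h_i < \hat{L}_i \\ 0 & \text{otherwise} \end{cases}}$$
• [Figure: MINDIST(Q, R) between the query envelope and an MBR with lower edges l_1, l_2 and upper edges h_1, h_2]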
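A sketch of MINDIST between the query's reduced envelope and an MBR. The function name and parameters are hypothetical; the MBR edges are called `low` and `high` here (the slide's L and H) to avoid confusion with the lower envelope L̂, and n is the original sequence length.

```python
import math


def mindist(u_hat, l_hat, low, high, n):
    """MINDIST from the query envelope (u_hat, l_hat) to the MBR R = (low, high).
    A dimension contributes only when the box lies entirely above u_hat
    or entirely below l_hat in that dimension."""
    N = len(u_hat)
    total = 0.0
    for u, l, lo, hi in zip(u_hat, l_hat, low, high):
        if lo > u:
            total += (lo - u) ** 2   # box is above the envelope
        elif hi < l:
            total += (hi - l) ** 2   # box is below the envelope
    return math.sqrt(n / N) * math.sqrt(total)


print(mindist([3, 2], [0, 0], low=[4, -3], high=[6, 0], n=8))  # 2.0
```

For any PAA point inside the box, each of its coordinates is at least as far from the envelope as the nearest box edge, so MINDIST never exceeds the per-sequence bound of anything the box contains.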
K-Nearest Neighbor Search Algorithm
• Given a query sequence Q, find its K nearest time series neighbors in a set C
• A priority queue stores index items in increasing order of their (lower-bound) distance from Q
• Push the root node of the index into the queue
• At each step, pop the item at the top of the queue
  • If the popped item is a PAA point C, compute the exact DTW(Q, C) and insert it into a temporary list 'temp'
  • If it is an index node, compute the distance (MINDIST) from Q to each of its children and push them into the queue
• Move C from temp to the result only when we are sure that it is one of the K nearest neighbors of Q, i.e. when DTW(Q, C) is no larger than the distance at the top of the queue
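The search loop can be sketched generically, so that MINDIST, a per-sequence lower bound and DTW are passed in as functions. This is our illustration, not the paper's code: `Node`, `knn_search` and the three callback parameters are hypothetical, and the example uses 1-D numbers with absolute difference as a stand-in for sequences and DTW.

```python
import heapq
import itertools
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Index node; children are Nodes or leaf entries (name, data)."""
    box: object      # whatever node_bound needs, e.g. an MBR (L, H)
    children: list


def knn_search(query, root, k, node_bound, entry_bound, exact):
    """Best-first K-NN search over a tree of Nodes.
    node_bound(query, box)   -> lower bound for everything under a node (e.g. MINDIST)
    entry_bound(query, data) -> lower bound for a single entry (e.g. LB_PAA)
    exact(query, data)       -> true distance (e.g. DTW)"""
    tie = itertools.count()            # stops heapq from comparing payloads
    queue = [(0.0, next(tie), root)]   # ordered by lower-bound distance
    temp, result = [], []              # temp: exact distances not yet final
    while len(result) < k:
        # Smallest lower bound still queued (infinite once the queue is empty).
        frontier = queue[0][0] if queue else math.inf
        # Nothing still queued can beat a candidate at or below the frontier.
        temp.sort(key=lambda pair: pair[0])
        while temp and temp[0][0] <= frontier and len(result) < k:
            result.append(temp.pop(0))
        if len(result) == k or not queue:
            break
        _, _, item = heapq.heappop(queue)
        if isinstance(item, Node):
            for child in item.children:
                if isinstance(child, Node):
                    bound = node_bound(query, child.box)
                else:
                    bound = entry_bound(query, child[1])
                heapq.heappush(queue, (bound, next(tie), child))
        else:
            name, data = item          # leaf entry: pay for the exact distance now
            temp.append((exact(query, data), name))
    return result


# Toy index: 1-D values grouped into two leaves with interval "MBRs".
def box_dist(q, box):
    return max(box[0] - q, 0, q - box[1])


def dist(q, x):
    return abs(q - x)


leaf_a = Node((0, 5), [("a1", 1.0), ("a2", 4.0)])
leaf_b = Node((10, 20), [("b1", 12.0), ("b2", 19.0)])
root = Node(None, [leaf_a, leaf_b])
print(knn_search(9.0, root, 2, box_dist, dist, dist))  # [(3.0, 'b1'), (5.0, 'a2')]
```

Re-sorting `temp` on every iteration keeps the sketch short; a second heap would avoid the repeated sorting in a real index.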
Experimental Evaluation
• Described by the author as the most comprehensive and detailed set of time series indexing experiments ever conducted
• A Sakoe-Chiba band with a width of 10% was used
• 32 datasets from various sources were used; from each, 50 sequences of length 256 were randomly extracted
• The tightness of the lower bounding functions, T = (lower bound estimate of DTW) / (true DTW distance), was compared by taking one sequence at a time and comparing it with the other 49
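Experimental Evaluation (contd.)
• The pruning power of the lower bounding functions, P = (number of objects that do not require a full DTW computation) / (number of objects in the database), was compared in the same way
• LB_Keogh was also evaluated against a linear scan in terms of normalized CPU cost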
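Both evaluation measures can be sketched with the lower bound and the exact distance passed in as functions. The names are hypothetical, and for pruning power we assume a 1-NN linear scan in the given order that skips the exact computation whenever the bound already reaches the best distance so far (the result therefore depends on scan order). The toy example uses 1-D numbers, with half the absolute difference as a deliberately loose bound.

```python
import math


def tightness(pairs, lower_bound, exact):
    """Mean tightness T = LB(Q, C) / DTW(Q, C) over (Q, C) pairs.
    Pairs with zero true distance are skipped to avoid dividing by zero."""
    ratios = []
    for q, c in pairs:
        d = exact(q, c)
        if d > 0:
            ratios.append(lower_bound(q, c) / d)
    return sum(ratios) / len(ratios)


def pruning_power(query, database, lower_bound, exact):
    """P = fraction of the database for which a 1-NN scan never computes
    the exact distance, because the lower bound rules the candidate out."""
    best = math.inf
    pruned = 0
    for c in database:
        if lower_bound(query, c) >= best:
            pruned += 1          # cannot beat the current best match
            continue
        best = min(best, exact(query, c))
    return pruned / len(database)


# Toy stand-ins: exact distance |q - c|, and a bound of half that.
def exact(q, c):
    return abs(q - c)


def half_bound(q, c):
    return abs(q - c) / 2


print(tightness([(0, 2), (1, 5)], half_bound, exact))        # 0.5
print(pruning_power(0, [1, 10, 3, 0.5], half_bound, exact))  # 0.5
```

A tightness near 1 means the bound is almost as informative as DTW itself, and higher tightness generally translates into higher pruning power.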
Conclusion
• This paper provides a way to speed up DTW similarity search by indexing
• DTW allows similarity matching between sequences that are out of phase; Euclidean distance does not
• A new lower bounding function, LB_Keogh, was proposed, which is superior to the ones proposed previously
• A method to index time series using the proposed lower bounding function was shown
References
• Eamonn J. Keogh: Exact Indexing of Dynamic Time Warping. VLDB 2002: 406-417
• Slides for the above paper by the same author (all colored pictures in this presentation are from the author's slides)
• Slides from the following class web page: www.csis.hku.hk/~nikos/courses/CSIS7101/multimedia.ppt