FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space

Stan Salvador and Philip Chan, Department of Computer Sciences, Florida Institute of Technology

Outline • Dynamic Time Warping (DTW) • Problem Statement • Related Work for Speeding up DTW • FastDTW Algorithm • Evaluation of FastDTW • Contributions • Limitations and Future Work

Dynamic Time Warping (DTW) • Aligns two time series by warping the time dimension • Warping - expanding/contracting the time dimension

The Dynamic Time Warping Algorithm • A dynamic programming approach • Solutions to slightly smaller subproblems are used to build solutions to larger ones
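The dynamic-programming step can be written as a recurrence. The notation here is introduced for illustration: $D(i,j)$ is the minimum cumulative cost of aligning $x_1,\dots,x_i$ with $y_1,\dots,y_j$, cells outside the matrix count as $\infty$, and $\operatorname{dist}$ is a local distance between two points (the absolute difference is a common choice, assumed here).

```latex
D(1,1) = \operatorname{dist}(x_1, y_1), \qquad
D(i,j) = \operatorname{dist}(x_i, y_j) + \min\bigl\{ D(i-1,j-1),\; D(i-1,j),\; D(i,j-1) \bigr\}, \qquad
\mathrm{DTW}(X,Y) = D(|X|, |Y|)
```

Each cell depends only on its three neighbours (diagonal, above, left), so filling the matrix row by row visits every cell once, which is where the quadratic cost comes from.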

The DTW Cost Matrix
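A minimal Python sketch of filling the full cost matrix. The function name `dtw_cost_matrix` is hypothetical, indices are 0-based, and the local distance is assumed to be the absolute difference, since the slides do not fix one. The DTW distance is the bottom-right entry.

```python
import math
from typing import List, Sequence


def dtw_cost_matrix(x: Sequence[float], y: Sequence[float]) -> List[List[float]]:
    """Fill the full len(x) x len(y) cumulative cost matrix (0-based)."""
    n, m = len(x), len(y)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(x[i] - y[j])  # local distance (assumed: absolute difference)
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            # Cheapest of the three predecessors; missing neighbours count as inf.
            best = min(
                D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                D[i - 1][j] if i > 0 else math.inf,
                D[i][j - 1] if j > 0 else math.inf,
            )
            D[i][j] = cost + best
    return D


print(dtw_cost_matrix([1, 2, 3], [1, 3])[-1][-1])  # prints 1
```

Both time and memory grow with `n * m`, because every cell is stored and filled.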

Distance of Min-Cost Warp Path
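A tiny hand-worked example (the values are chosen purely for illustration) shows that the distance of the min-cost warp path is the last cell of the cumulative cost matrix. Indices are 1-based, and $D(i,j)$ is the minimum cost of aligning $x_1,\dots,x_i$ with $y_1,\dots,y_j$:

```latex
X = (1, 2, 3), \quad Y = (1, 3), \quad \operatorname{dist}(x_i, y_j) = |x_i - y_j|

D = \begin{pmatrix} 0 & 2 \\ 1 & 1 \\ 3 & 1 \end{pmatrix}
\qquad \text{(row } i \text{ indexes } X\text{, column } j \text{ indexes } Y\text{)}

\mathrm{DTW}(X, Y) = D(3, 2) = 1
```

Two min-cost warp paths tie here: (1,1), (2,1), (3,2) and (1,1), (2,2), (3,2), each with local costs 0 + 1 + 0 = 1.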

Finding Min-Cost Warp Path
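Once the cumulative matrix is filled, the warp path can be recovered by walking backwards from the last cell, each time stepping to the cheapest of the left, upper, and diagonal neighbours. The sketch below assumes a 0-based matrix `D` that already holds cumulative costs. The name `backtrack_warp_path` and the diagonal-first tie-breaking are illustrative choices, not something the slides specify.

```python
from typing import List, Tuple


def backtrack_warp_path(D: List[List[float]]) -> List[Tuple[int, int]]:
    """Greedy walk from the last cell of a filled cost matrix back to (0, 0)."""
    i, j = len(D) - 1, len(D[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1  # first row: the only way back is to the left
        elif j == 0:
            i -= 1  # first column: the only way back is up
        else:
            # Cheapest predecessor; min() keeps the first candidate on ties,
            # so the diagonal step wins ties.
            i, j = min(
                ((i - 1, j - 1), (i - 1, j), (i, j - 1)),
                key=lambda c: D[c[0]][c[1]],
            )
        path.append((i, j))
    path.reverse()  # report the path from (0, 0) to the last cell
    return path
```

For the matrix `[[0, 2], [1, 1], [3, 1]]` this returns `[(0, 0), (1, 0), (2, 1)]`.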

Advantages of DTW • DTW finds an optimal (minimum-distance) warp path • An intuitive distance measurement • Local variation in the time axis is common, e.g.: • Handwriting • Speech • “Events” that start after varying delays

Disadvantages of DTW • O(N²) time and space complexity • Only practical for small data sets (<3,000 points) • Time series are often very long • Data mining requires a scalable DTW algorithm
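A back-of-the-envelope estimate shows why quadratic space becomes the binding constraint. It assumes one 8-byte double per cost-matrix cell, and uses the 3,000-point practical limit and the 180,000-point maximum length from the execution-time experiments:

```latex
\begin{aligned}
N = 3{,}000:&\quad N^2 = 9 \times 10^{6}\ \text{cells} \times 8\ \text{B} = 7.2 \times 10^{7}\ \text{B} = 72\ \text{MB} \\
N = 180{,}000:&\quad N^2 = 3.24 \times 10^{10}\ \text{cells} \times 8\ \text{B} \approx 2.6 \times 10^{11}\ \text{B} \approx 259\ \text{GB}
\end{aligned}
```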

Problem Statement • We desire an efficient Dynamic Time Warping algorithm • Linear time complexity • Linear space complexity • Warp path is needed in addition to warp distance • Warp path must be nearly optimal

Does DTW Need to be Faster? “Myth 3: There is a need (and room) for improvements in the speed of DTW for data mining applications.” (Keogh today-9:45am) • Keogh: many time series • FastDTW: Long time series

Existing Methods to Speed Up DTW • Constraints – only fill in part of the cost matrix • Abstraction – sample the data before time warping

Constraints • Sakoe-Chiba Band (Sakoe & Chiba 1978) • Itakura Parallelogram (Itakura 1975) • Still O(N²) if the window width is a function of the input size (linear only if the width is constant) • Assumes a near-optimal warp path stays near the i = j diagonal • Accuracy depends on the domain
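A minimal sketch of the Sakoe-Chiba band: the DTW recurrence is evaluated only for cells with |i − j| ≤ r. It assumes equal-length series, absolute-difference local distance, and a hypothetical function name `dtw_band`. For readability it still allocates the full matrix, so only the time spent on filling cells, not the memory, is reduced.

```python
import math
from typing import Sequence


def dtw_band(x: Sequence[float], y: Sequence[float], r: int) -> float:
    """DTW distance restricted to the Sakoe-Chiba band |i - j| <= r.

    Assumes len(x) == len(y) so the band always contains the end cell.
    """
    n = len(x)
    assert len(y) == n, "this sketch assumes equal-length series"
    INF = math.inf
    D = [[INF] * n for _ in range(n)]  # cells outside the band stay at inf
    for i in range(n):
        for j in range(max(0, i - r), min(n, i + r + 1)):  # band cells only
            cost = abs(x[i] - y[j])
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            D[i][j] = cost + min(
                D[i - 1][j - 1] if i and j else INF,
                D[i - 1][j] if i else INF,
                D[i][j - 1] if j else INF,
            )
    return D[n - 1][n - 1]
```

With `x = [0, 1, 2, 3]` and `y = [1, 2, 3, 4]`, a band of `r = 0` forces the one-to-one alignment (distance 4), while `r = 1` already admits the optimal shifted path (distance 2). This illustrates why accuracy depends on whether the best path stays near the diagonal.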

Abstraction (Keogh & Pazzani 2000; Chu et al. 2002) • O(N) if N points are sampled down to ≤ √N points • Assumptions • Sampling preserves time series structure • Small deviations from the optimal path cause little increase in warp-path distance
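The slides do not say which sampling scheme the cited abstraction methods use. As an assumption, this sketch uses piecewise averaging (often called PAA): the series is cut into `m` near-equal segments and each segment is replaced by its mean. Choosing `m = math.isqrt(len(series))` keeps a subsequent full DTW on the reduced series linear in the original length, because (√N)² = N. The function name `paa` is illustrative.

```python
from typing import List, Sequence


def paa(series: Sequence[float], m: int) -> List[float]:
    """Average `series` over m near-equal-width segments."""
    n = len(series)
    if not 0 < m <= n:
        raise ValueError("m must be in 1..len(series)")
    out = []
    for k in range(m):
        lo, hi = k * n // m, (k + 1) * n // m  # segment [lo, hi) is never empty
        seg = series[lo:hi]
        out.append(sum(seg) / len(seg))
    return out
```

For example, `paa(list(range(16)), 4)` averages blocks of four points and gives `[1.5, 5.5, 9.5, 13.5]`. Any warp path found on the reduced series is only as good as this averaging preserves the shape, which is the first assumption listed above.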

Our FastDTW Algorithm • A multi-resolution approach inspired by a multi-level graph bisection algorithm (Karypis 1997) • 3 key operations • Coarsening – reduce the resolution of a time series • Projection – use a low-resolution warp path as an initial solution at a higher resolution • Refinement – refine a projected warp path by locally adjusting it
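Coarsening can be sketched as halving the resolution by averaging adjacent pairs of points. This is an assumption about the exact scheme: the appendix pseudocode only names the step `reduceByHalf`. Here a trailing odd point is carried over unchanged, and the Python name `reduce_by_half` is illustrative.

```python
from typing import List, Sequence


def reduce_by_half(series: Sequence[float]) -> List[float]:
    """Coarsen a series by averaging each adjacent pair of points.

    An odd trailing point becomes its own 'pair' and is kept as is.
    """
    out = []
    for i in range(0, len(series), 2):
        pair = series[i:i + 2]  # one or two points
        out.append(sum(pair) / len(pair))
    return out
```

For example, `reduce_by_half([1, 3, 5, 7, 9])` gives `[2.0, 6.0, 9.0]`. Applying it repeatedly produces the coarser resolutions on which projection and refinement operate.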

Sample Run of FastDTW

FastDTW Algorithm • Set the resolution to be the coarsest • Find the initial path using regular DTW • Repeat • Double the resolution • Project the path onto the finer resolution • Find a path through the projected area (plus a small radius around the projected area) • Until the original resolution is reached
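The projection step ("project the path onto the finer resolution … plus a small radius") can be sketched as follows. It assumes the coarse series were produced by halving, so each coarse cell (i, j) covers the 2×2 block of fine cells starting at (2i, 2j). Each block is then widened by `radius` and clipped to the grid. The function name `expand_window` and this exact widening rule are illustrative assumptions.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def expand_window(low_res_path: List[Cell], n: int, m: int, radius: int) -> Set[Cell]:
    """Project a coarse warp path onto an n x m grid and widen it by `radius`."""
    window: Set[Cell] = set()
    for i, j in low_res_path:
        # Coarse cell (i, j) covers fine rows 2i..2i+1 and columns 2j..2j+1;
        # extend that block by `radius` cells in every direction.
        for a in range(2 * i - radius, 2 * i + 2 + radius):
            for b in range(2 * j - radius, 2 * j + 2 + radius):
                if 0 <= a < n and 0 <= b < m:  # clip to the fine grid
                    window.add((a, b))
    return window
```

Refinement then runs DTW only over the returned cells. The window's size grows with the path length times a factor depending on `radius`, not with `n * m`.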

Complexity • O(N) time • O(N) space • Details in the paper
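An informal argument for the linear bound, under a simplifying assumption that is ours and not the paper's exact analysis: refinement at a level of length $n$ evaluates at most $c(r)\,n$ cells, for a constant $c(r)$ that depends only on the radius $r$. Because the length halves at each coarser level, the total is a geometric series:

```latex
\text{cells}(N) \;\le\; c(r)\Bigl(N + \tfrac{N}{2} + \tfrac{N}{4} + \cdots\Bigr) \;\le\; 2\,c(r)\,N \;=\; O(N) \quad \text{for fixed } r
```

Coarsening adds at most $N + N/2 + \cdots \le 2N$ operations, so it does not change the bound.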

Evaluation Criteria • Accuracy: the error of an approximate time warping algorithm is % error $= \frac{\text{approxDist} - \text{optimalDist}}{\text{optimalDist}} \times 100$, where: approxDist – the warp path distance of the approximate algorithm; optimalDist – the warp path distance of the DTW algorithm • Efficiency: runtime (measured in seconds)
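The accuracy metric translates directly into a helper. The name `percent_error` and the choice to reject a zero optimal distance, where the ratio is undefined, are illustrative.

```python
def percent_error(approx_dist: float, optimal_dist: float) -> float:
    """Excess of an approximate warp-path distance over the optimal
    (full DTW) distance, as a percentage of the optimal distance."""
    if optimal_dist == 0:
        raise ValueError("percent error is undefined when optimalDist is 0")
    return (approx_dist - optimal_dist) / optimal_dist * 100.0
```

For instance, `percent_error(108.6, 100.0)` is approximately 8.6. Because an approximate path can never be cheaper than the optimal one, the value is non-negative.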

Evaluation Procedure (Accuracy) • Data Sets – UCR Time Series Data Mining Archive (Keogh & Folias 2002), 3 groups used: • Random – 45 unrelated time series (earthquakes, random walk, eeg, speech, etc.) • Trace – 200 time series simulating nuclear power plant failure (4 classes) • Gun – 200 time series of a gun being drawn and pointed (2 classes) • Procedure • Run FastDTW, Constraints (Sakoe-Chiba Band), and Data Abstraction on all pairs of time series within each data set group, varying the radius • Record the average error of each of the three methods for every combination of data group and radius

Average % Error (Accuracy)

Error in Different Data Sets

Evaluation Procedure (Execution-time) • Data Sets • Synthetic sine waves with Gaussian noise • 10 to 180,000 data points • Procedure • Run FastDTW and DTW on each data set, varying the radius for FastDTW • Compare the execution times
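A sketch of how such synthetic inputs could be generated. The slides give only the recipe (a sine wave plus Gaussian noise, 10 to 180,000 points). The period, noise standard deviation, seeding, and the name `noisy_sine` are all assumptions made for illustration.

```python
import math
import random
from typing import List


def noisy_sine(n: int, period: float = 100.0, noise_sd: float = 0.1,
               seed: int = 0) -> List[float]:
    """Sine wave of length n plus i.i.d. Gaussian noise.

    period and noise_sd are illustrative defaults, not the paper's values.
    """
    rng = random.Random(seed)  # seeded so timing runs use identical inputs
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_sd)
            for t in range(n)]
```

Seeding a private `random.Random` instance keeps each test series reproducible without touching the global generator, so repeated timing runs compare like with like.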

Execution Time

Summary of Contributions • FastDTW – an approximation of DTW • O(N) time and space complexity • Scales well to long time series • Accurate, 8.6% error if radius=1, 0.8% error if radius=20

Limitations and Future Work • Limitations • FastDTW does not always find an optimal solution • Future Work • Examine using different step sizes between resolutions • Investigate search algorithms to help improve refinement • Examine the number of cells evaluated vs. accuracy for the FastDTW, Abstraction, and Band algorithms

Questions? Thanks to those who helped with this research: Matt Mahoney (Florida Institute of Technology), Brian Buckley, Walter Schiefele (Interface & Control Systems) This research is partially supported by NASA

FastDTW Pseudocode
Input: X, Y, radius
Output: 1) A minimum-distance warp path between X and Y; 2) the warp path distance between X and Y

    // The minimum size of the coarsest resolution.
    Integer minTSsize = radius + 2

    IF (|X| ≤ minTSsize OR |Y| ≤ minTSsize)
    {
        // Base case: for a very small time series, run the full DTW algorithm.
        RETURN DTW(X, Y)
    }
    ELSE
    {
        // Recursive case: project the warp path from a coarser resolution onto the current resolution.
        // Run DTW only along the projected path (and within radius cells of the projected path).
        TimeSeries shrunkX = X.reduceByHalf()    // Coarsening
        TimeSeries shrunkY = Y.reduceByHalf()    // Coarsening

        // Only the warp path of the recursive result is needed here.
        WarpPath lowResPath = FastDTW(shrunkX, shrunkY, radius)

        SearchWindow window = ExpandedResWindow(lowResPath, X, Y, radius)    // Projection

        RETURN DTW(X, Y, window)    // Refinement
    }
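The pseudocode can be turned into a runnable Python sketch. This is not the authors' implementation. It assumes absolute-difference local distance, coarsening by averaging adjacent pairs, and a projection that maps each coarse cell to a 2×2 fine block widened by `radius`. The snake_case names mirror the pseudocode but are our own. Windowed DTW stores costs in dictionaries so that only window cells are touched.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]


def reduce_by_half(x: Sequence[float]) -> List[float]:
    # Coarsening: average adjacent pairs; an odd trailing point is kept as is.
    return [sum(x[i:i + 2]) / len(x[i:i + 2]) for i in range(0, len(x), 2)]


def expanded_res_window(low_path: List[Cell], n: int, m: int, radius: int) -> Set[Cell]:
    # Projection: coarse cell (i, j) covers the fine 2x2 block at (2i, 2j);
    # widen each block by `radius` cells and clip to the n x m grid.
    window: Set[Cell] = set()
    for i, j in low_path:
        for a in range(max(0, 2 * i - radius), min(n, 2 * i + 2 + radius)):
            for b in range(max(0, 2 * j - radius), min(m, 2 * j + 2 + radius)):
                window.add((a, b))
    return window


def dtw(x: Sequence[float], y: Sequence[float],
        window: Optional[Set[Cell]] = None) -> Tuple[float, List[Cell]]:
    # DTW restricted to `window` (every cell if None); returns (distance, path).
    n, m = len(x), len(y)
    if window is None:
        window = {(i, j) for i in range(n) for j in range(m)}
    cost: Dict[Cell, float] = {}
    prev: Dict[Cell, Optional[Cell]] = {}
    # Sorted order visits cells row by row, so predecessors are filled first.
    # (Sorting adds a log factor; a tuned version would walk rows directly.)
    for i, j in sorted(window):
        d = abs(x[i] - y[j])
        if (i, j) == (0, 0):
            cost[(0, 0)], prev[(0, 0)] = d, None
            continue
        preds = [c for c in ((i - 1, j - 1), (i - 1, j), (i, j - 1)) if c in cost]
        if preds:  # cells with no filled predecessor are unreachable; skip them
            best = min(preds, key=cost.__getitem__)
            cost[(i, j)], prev[(i, j)] = d + cost[best], best
    # Follow back-pointers from the last cell to recover the warp path.
    path: List[Cell] = []
    cell: Optional[Cell] = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = prev[cell]
    path.reverse()
    return cost[(n - 1, m - 1)], path


def fastdtw(x: Sequence[float], y: Sequence[float],
            radius: int = 1) -> Tuple[float, List[Cell]]:
    min_ts_size = radius + 2  # size of the coarsest resolution
    if len(x) <= min_ts_size or len(y) <= min_ts_size:
        return dtw(x, y)  # base case: full DTW on a small problem
    _, low_res_path = fastdtw(reduce_by_half(x), reduce_by_half(y), radius)
    window = expanded_res_window(low_res_path, len(x), len(y), radius)  # projection
    return dtw(x, y, window)  # refinement
```

Because refinement searches only a subset of cells, `fastdtw(x, y, radius)[0]` can never be smaller than the full `dtw(x, y)[0]`. When `radius + 2` is at least the length of either series, it falls back to exact DTW.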
