Analysis of Constrained Time-Series Similarity Measures Vladimir Kurbalija, Miloš Radovanović, Zoltan Geler, Mirjana Ivanović Department of Mathematics and Informatics Faculty of Science University of Novi Sad Serbia
Agenda • Introduction • Related Work • Experimental Evaluation • Computational Times • The Change of 1NN Graph • Conclusions and Future Work
Time Series • A time series (TS) consists of a sequence of values or events obtained over repeated measurements in time • Time-series analysis (TSA) comprises methods that attempt to understand such time series • The goal is to understand the underlying context of the data points, or to make forecasts
Applications and Task Types • Applications: • stock market analysis, • economic and sales forecasting, • observation of natural phenomena, • scientific and engineering experiments, • medical treatments etc. • Task Types • indexing, • classification, • clustering, • prediction, • segmentation, • anomaly detection, etc.
Important Concepts • Pre-processing transformation • Time-series representation • Similarity/distance measure
Pre-processing Transformation • “Raw” time series usually contain some distortions • The presence of distortions can seriously degrade indexing performance • Some of the most common pre-processing tasks are: • offset translation, • amplitude scaling, • removing linear trend, • removing noise etc.
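The first three pre-processing tasks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, amplitude scaling is taken to mean z-normalization, and the trend is removed with an ordinary least-squares line; the slides do not say which variants the authors used.

```python
import math

def offset_translate(ts):
    """Subtract the mean so the series is centered on zero."""
    mean = sum(ts) / len(ts)
    return [x - mean for x in ts]

def amplitude_scale(ts):
    """Center the series, then divide by its standard deviation
    (z-normalization, assumed here as the scaling variant)."""
    centered = offset_translate(ts)
    std = math.sqrt(sum(x * x for x in centered) / len(centered))
    if std == 0:
        return centered  # constant series: nothing to scale
    return [x / std for x in centered]

def remove_linear_trend(ts):
    """Subtract the least-squares line fitted against the index 0..n-1."""
    n = len(ts)
    t_mean = (n - 1) / 2
    y_mean = sum(ts) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ts))
    slope = num / denom if denom else 0.0
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(ts)]
```

A perfectly linear series such as `[1.0, 3.0, 5.0, 7.0]` is reduced to (numerically) all zeros by `remove_linear_trend`.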
Time-series Representation • Time series are generally high-dimensional data • Many techniques have been proposed: • Discrete Fourier Transformation (DFT) • Singular Value Decomposition (SVD) • Discrete Wavelet Transf. (DWT) • Piecewise Aggregate Approximation (PAA) • Adaptive Piecewise Constant Approx. (APCA) • Symbolic Aggregate approX. (SAX) • Indexable Piecewise Linear Approx. (IPLA) • Spline Representation • etc.
Similarity/distance Measure • Similarity-based retrieval is used in all of the aforementioned task types • The distance between time series needs to be carefully defined in order to reflect the underlying (dis)similarity (based on shapes and patterns) • There are a number of distance measures: • Lp distance (Lp), which is the Euclidean distance for p=2 • Dynamic Time Warping (DTW) • distance based on Longest Common Subsequence (LCS) • Edit Distance with Real Penalty (ERP) • Edit Distance on Real sequence (EDR) • Sequence Weighted Alignment model (Swale) [31], etc.
Similarity Measures • Many of these similarity measures are based on dynamic programming (DTW, LCS, ERP, EDR...) • The computational complexity of dynamic programming algorithms is quadratic • The usage of global constraints such as the Sakoe-Chiba band and the Itakura parallelogram can significantly speed up the calculation of similarities • The usage of global constraints can improve the accuracy of classification
Our Research • Dynamic Time Warping (DTW) and Longest Common Subsequence measure (LCS) • the speed-up gained from these constraints • the change of the 1-nearest neighbor graph with respect to the change of the constraint size • FAP (Framework for Analysis and Prediction) http://perun.pmf.uns.ac.rs/fap/ • UCR Time Series Repository http://www.cs.ucr.edu/~eamonn/time_series_data/
Agenda • Introduction • Related Work • Experimental Evaluation • Computational Times • The Change of 1NN Graph • Conclusions and Future Work
Euclidean Metric • The most intuitive metric for time series, and as a consequence very commonly used • Very fast: computational complexity is linear • Very brittle and sensitive to small translations along the time axis
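A minimal sketch of the Lp distance may help illustrate the brittleness mentioned above. The function name `lp_distance` and the toy series are illustrative assumptions, not taken from the slides.

```python
def lp_distance(a, b, p=2):
    """Lp distance between two equal-length series (Euclidean for p=2).
    Runs in time linear in the series length."""
    if len(a) != len(b):
        raise ValueError("Lp distance requires series of equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# A spike, the same spike shifted by one time step, and a flat line.
spike = [0, 0, 1, 0, 0]
shifted = [0, 0, 0, 1, 0]
flat = [0, 0, 0, 0, 0]

d_shifted = lp_distance(spike, shifted)  # points are compared index by index
d_flat = lp_distance(spike, flat)
```

In this toy case `d_shifted` is larger than `d_flat`: the shifted copy of the spike is judged less similar than a series with no spike at all, because the comparison is strictly point by point.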
Dynamic Time Warping (DTW) • Generalization of the Euclidean measure • Allows elastic shifting of the time axis, so that time “warps” at some points • Computes the distance by finding an optimal path in the matrix of distances between the points of two time series
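The optimal-path search can be sketched with the textbook dynamic-programming recurrence. This is a generic formulation, not necessarily the FAP implementation; the squared-difference local cost and the final square root are assumptions chosen so that the result is comparable to the Euclidean distance.

```python
import math

def dtw(a, b):
    """Unconstrained DTW distance between two numeric sequences.
    D[i][j] holds the cost of the cheapest warping path that aligns
    the first i points of a with the first j points of b."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            # Extend the cheapest of the three admissible predecessor cells.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

Both loops run over the full n-by-m matrix, which is where the quadratic cost comes from. On a spike and the same spike shifted by one step, the warping path absorbs the shift and the distance drops to zero.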
Longest Common Subsequence (LCS) • Uses a different methodology • Similarity between two time series is expressed as the length of their longest common subsequence
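For real-valued series, a common way to apply LCS is to treat two points as matching when they differ by at most a threshold. The sketch below follows that idea; the threshold `eps`, its default value, and the normalization used in `lcs_distance` are assumptions, since the slides do not specify them.

```python
def lcs_length(a, b, eps=0.5):
    """Length of the longest common subsequence of two numeric sequences,
    where two points match if they differ by at most eps (assumed rule)."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

def lcs_distance(a, b, eps=0.5):
    """Turn the similarity into a dissimilarity in [0, 1]
    (one possible normalization, assumed for illustration)."""
    return 1.0 - lcs_length(a, b, eps) / min(len(a), len(b))
```

Like DTW, the recurrence fills an n-by-m matrix, so the same global constraints can be applied to it.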
Global Constraints • DTW and LCS are based on dynamic programming – the algorithms search for the optimal path in the search matrix • Global constraints narrow the search path in the matrix which results in a significant decrease in the number of performed calculations
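The effect of a global constraint can be made concrete by counting how many matrix cells are actually evaluated. The sketch below applies a Sakoe-Chiba band to DTW; interpreting the constraint `r` as a band half-width expressed as a fraction of the series length is an assumption, as is the squared-difference local cost.

```python
import math

def dtw_banded(a, b, r):
    """DTW restricted to a Sakoe-Chiba band.
    r is the band half-width as a fraction of the series length
    (r = 1.0 is effectively unconstrained, r = 0.0 keeps only the diagonal).
    Returns (distance, number of matrix cells evaluated)."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(int(round(r * max(n, m))), abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= w are visited.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            cells += 1
    return math.sqrt(D[n][m]), cells
```

For two series of equal length n, the number of evaluated cells falls from n*n with no effective constraint to n when `r = 0.0`, in which case the measure reduces to the Euclidean distance.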
Agenda • Introduction • Related Work • Experimental Evaluation • Computational Times • The Change of 1NN Graph • Conclusions and Future Work
Quality of Similarity Measures • Quality of similarity measures is usually evaluated indirectly • By assessing the accuracy of different classifiers • The simple 1-nearest neighbor classifier (1NN) gives among the best results for time-series data • The accuracy of 1NN directly reflects the quality of a similarity measure • We report the calculation times for unconstrained and constrained DTW and LCS • We focus on the 1NN graph and its change with regard to the change of constraints
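As a rough illustration of how 1NN accuracy can serve as a proxy for the quality of a distance measure, the sketch below computes leave-one-out 1NN accuracy for an arbitrary distance function. The leave-one-out protocol, the function names, and the toy data are assumptions made for the example; the slides do not describe an evaluation protocol.

```python
def one_nn_loo_accuracy(series, labels, dist):
    """Leave-one-out accuracy of a 1NN classifier: each series is labeled
    with the class of its nearest neighbor among all the other series."""
    correct = 0
    for i, s in enumerate(series):
        others = (j for j in range(len(series)) if j != i)
        nearest = min(others, key=lambda j: dist(s, series[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(series)

def euclidean(a, b):
    """Plain Euclidean distance, used here only as a stand-in measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

toy_series = [[0, 0], [0, 1], [5, 5], [5, 6]]
toy_labels = ["a", "a", "b", "b"]
accuracy = one_nn_loo_accuracy(toy_series, toy_labels, euclidean)
```

Any of the measures discussed here could be passed as `dist`, so the same routine compares measures on equal terms.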
Experimental Evaluation • We consider the unconstrained measure and measures with the following constraints: 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the size of the time series • Smaller constraints have more interesting behavior • The experiments were conducted on 38 datasets from the UCR Time Series Repository • The length of time series varies from 24 to 1882, depending on the data set • The number of time series per data set varies from 60 to 9236
Computational Times • We measure the efficiency of calculating the distance matrix • The distance matrix for one data set is the matrix where element (i,j) contains the distance between the i-th and j-th time series • The calculation of the distance matrix is a time-consuming operation • All experiments were performed on an AMD Phenom II X4 945 with 3 GB RAM
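The distance matrix described above can be sketched as follows. The sketch assumes a symmetric distance with zero self-distance, so only the upper triangle is computed; the function name and the use of `time.perf_counter` for timing are illustrative choices, not details from the slides.

```python
import time

def distance_matrix(series, dist):
    """Return (M, elapsed_seconds), where M[i][j] is the distance between
    the i-th and j-th series. Assumes dist is symmetric and dist(x, x) == 0."""
    n = len(series)
    M = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = dist(series[i], series[j])
    return M, time.perf_counter() - start
```

With n series, `dist` is called n(n-1)/2 times, so any speed-up of the measure itself carries over directly to the matrix computation.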
Computational Times • Introduction of global constraints in both measures significantly speeds up the process of distance matrix computation • This is a direct consequence of a faster similarity measure • It is known for DTW that smaller values of constraints can give more accurate classification • The average constraint size that gives the best accuracy over all datasets is 4% of the time-series length • The LCS measure has not yet been well investigated in this respect
The Change of 1NN Graph • The nearest neighbor graph is a directed graph where each time series is connected with its nearest neighbor • Graphs are computed for the unconstrained measures (DTW and LCS) and for measures with the following constraints: 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the length of time series • The change of nearest neighbor graphs is tracked as the percentage of time series (nodes in the graph) that changed their nearest neighbor compared to the nearest neighbor in the unconstrained measure
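The tracking procedure can be sketched directly from two distance matrices, one for the unconstrained measure and one for a constrained variant. The function names are hypothetical, and breaking ties in favor of the lowest index is an assumption; the slides do not say how ties are handled.

```python
def nn_graph(M):
    """nearest[i] is the index of the nearest neighbor of series i,
    i.e. the target of the single edge leaving node i.
    Ties are broken in favor of the lowest index (assumed rule)."""
    n = len(M)
    return [min((j for j in range(n) if j != i), key=lambda j: M[i][j])
            for i in range(n)]

def graph_change_percent(M_unconstrained, M_constrained):
    """Percentage of nodes whose nearest neighbor under the constrained
    measure differs from the one under the unconstrained measure."""
    base = nn_graph(M_unconstrained)
    other = nn_graph(M_constrained)
    changed = sum(1 for x, y in zip(base, other) if x != y)
    return 100.0 * changed / len(base)
```

Calling `graph_change_percent` once per constraint size, always against the unconstrained matrix, yields the change curve for a data set.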
The Change of 1NN Graph • Both measures behave in a similar manner when the constraint is narrowed • 1NN graph remains the same until the size of the constraint is narrowed to approximately 20%, and after that the graph starts to change significantly • All datasets (for both measures) reach high percentages of difference (over 50%) for small constraint sizes (5-10%) • Constrained measures represent qualitatively different measures than the unconstrained ones
Agenda • Introduction • Related Work • Experimental Evaluation • Computational Times • The Change of 1NN Graph • Conclusions and Future Work
Conclusions • We examined the influence of global constraints on the two most representative elastic measures for time series: DTW and LCS • Through an extensive set of experiments we showed that the usage of global constraints can significantly reduce the computation time • We demonstrated that the constrained measures are qualitatively different from their unconstrained counterparts • For DTW it is known that the constrained measures are more accurate, while for LCS this issue is still open
Future Work • To investigate the accuracy of the constrained LCS measure for different values of constraints • To explore the influence of global constraints on the computation time and 1NN graphs of other elastic measures like ERP, EDR, Swale, etc. • The constrained variants of these elastic measures should also be tested with respect to classification accuracy
Thank you for your attention FAP site: http://perun.pmf.uns.ac.rs/fap/