IMPORTANT EXTREMA OF TIME SERIES: THEORY AND APPLICATIONS
Introduction • Time series • Definition: Sequence of values measured at equal intervals • Examples: stock prices, weather data, electrocardiograms, etc. • Compression using important extrema • Importance levels • Indexing based on importance • Distance computation based on indexing trees • Pattern retrieval based on indexing trees
Previous Work • Specific indexing and fast retrieval of time series [Pratt, 2001; Pratt and Fink, 2002; Fink and Pratt, 2003] • Compression • Use important points • Identify important points using the ratio between the values of points • Use the parameter R to control the compression • No direct correlation between R and the compression rate • Similarity measures and pattern retrieval • Define a leg (extended leg) as the segment between two consecutive important points • Index the legs by their ratios • Retrieve the legs that are similar to the prominent leg of the pattern • Compute the similarity between the pattern and each series that has a leg similar to the prominent leg • Output the series whose similarity is within the given threshold
Basic Concepts • Related structures and algorithms • Stacks and linked lists • Red-black trees • Order statistics • Time series representation • Series structure • full-size – number of points in the original series • cmpr-size – number of points in the compressed series • points – array or red-black tree of all points • Point structure • index – index of the point in the original series • value – value of the point • type – end-point, minimum, maximum, or non-extremal • next – next point in the compressed series • prev – previous point in the compressed series • side – strict, left, right, or flat • imp – importance of the point
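The series and point structures above map naturally onto record types. The following is a minimal sketch, assuming Python dataclasses: the field names are the slide's, rewritten as identifiers (full-size becomes full_size), `points` is a plain list rather than a red-black tree, `cmpr_size` is derived from that list instead of stored, and the `link` helper is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PointType(Enum):
    END_POINT = "end-point"
    MINIMUM = "minimum"
    MAXIMUM = "maximum"
    NON_EXTREMAL = "non-extremal"


class Side(Enum):
    STRICT = "strict"
    LEFT = "left"
    RIGHT = "right"
    FLAT = "flat"


@dataclass
class Point:
    index: int                   # position in the original series
    value: float                 # value of the point
    type: PointType              # end-point, minimum, maximum, or non-extremal
    side: Optional[Side] = None  # strict, left, right, or flat
    imp: float = 0.0             # importance of the point
    # Links within the compressed series; excluded from repr/eq to avoid recursion.
    next: Optional[Point] = field(default=None, repr=False, compare=False)
    prev: Optional[Point] = field(default=None, repr=False, compare=False)


@dataclass
class Series:
    full_size: int                                     # points in the original series
    points: List[Point] = field(default_factory=list)  # points of the compressed series

    @property
    def cmpr_size(self) -> int:
        """Number of points in the compressed series."""
        return len(self.points)


def link(points: List[Point]) -> None:
    """Set prev/next so the points form a doubly linked compressed series."""
    for a, b in zip(points, points[1:]):
        a.next, b.prev = b, a
```

Deriving `cmpr_size` keeps it from drifting out of sync with `points`; a faithful implementation would swap the list for a balanced tree when points must be inserted in order.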
Basic Concepts.. • Distance measures • Distance between real values • Definition: a two-argument function that satisfies the following properties • For every value a, dist(a,a) = 0 • For every two values a and b, dist(a,b) = dist(b,a) • For every three values a, b, and c, if a < b < c, then dist(a,b) <= dist(a,c) and dist(b,c) <= dist(a,c) • Examples: |a-b| and |a-b| / (|a|+|b|) • Distance composition • Definition: a real-valued function f(d1,…,dq) on non-negative real arguments such that f(0,…,0) = 0 and f is monotonically increasing in each of its arguments • Example: if w1,…,wq are positive weights and dist1,…,distq are distance functions, then w1·dist1 + … + wq·distq is a composition
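The two example distances and the weighted composition can be written down directly. This is a minimal sketch: the function names are hypothetical, and returning 0 for |a-b| / (|a|+|b|) when a = b = 0 is our assumption, chosen so that dist(a,a) = 0 still holds.

```python
from typing import Callable, Sequence

Dist = Callable[[float, float], float]


def abs_dist(a: float, b: float) -> float:
    """dist(a, b) = |a - b|."""
    return abs(a - b)


def rel_dist(a: float, b: float) -> float:
    """dist(a, b) = |a - b| / (|a| + |b|); 0 when a = b = 0 (assumption)."""
    denom = abs(a) + abs(b)
    return 0.0 if denom == 0 else abs(a - b) / denom


def weighted_composition(weights: Sequence[float], dists: Sequence[Dist]) -> Dist:
    """Combine distances as w1*dist1 + ... + wq*distq.

    Positive weights keep the sum monotonically increasing in each distance.
    """
    if len(weights) != len(dists) or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per distance function")

    def combined(a: float, b: float) -> float:
        return sum(w * d(a, b) for w, d in zip(weights, dists))

    return combined
```

For instance, `weighted_composition([2.0, 1.0], [abs_dist, rel_dist])(1, 3)` evaluates to 2·2 + 0.5 = 4.5.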
Basic Concepts.. • Distance between time series • Definition: For two equal-length series a1,…,an and b1,…,bn, and a distance function dist for real values, the corresponding $l_r$ distance between the series is $l_r = \left( \frac{1}{n} \sum_{i=1}^{n} \mathrm{dist}(a_i, b_i)^r \right)^{1/r}$ • Example: in the example figure, the $l_1$ distance between the series is 2.0 and the $l_2$ distance is 2.1 • Advantage of using distance functions • Flexibility of choosing different distance functions and their compositions
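The $l_r$ formula takes any real-value distance as a parameter. The following minimal sketch assumes `dist` defaults to |a-b| when none is given; `lr_distance` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence


def lr_distance(a: Sequence[float], b: Sequence[float], r: float = 2.0,
                dist: Optional[Callable[[float, float], float]] = None) -> float:
    """l_r = ((1/n) * sum_i dist(a_i, b_i)^r)^(1/r) for equal-length series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if r <= 0:
        raise ValueError("r must be positive")
    if dist is None:
        dist = lambda x, y: abs(x - y)  # default real-value distance |a - b|
    total = sum(dist(x, y) ** r for x, y in zip(a, b))
    return (total / len(a)) ** (1.0 / r)
```

With r = 1 this is the mean of the pointwise distances. With r = 2 and dist = |a-b| it is the root-mean-square difference: for [0, 0] and [0, 2] the $l_2$ distance is √2.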
Important Points • Extrema • We define an extremum as a minimum or maximum of a series • Formal definition of a minimum: the point ai of a time series a1,…,an is a strict minimum if ai < ai-1 and ai < ai+1 • Example • Strict, left, right, and flat extrema
Important Points.. • Extrema.. • Compression by extracting all extrema • Example • Algorithm • Space complexity = constant • Time complexity = O(n); n = number of points in the series • Can process live series
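Extracting extrema needs only the two previous values, which is why the space is constant and live series can be processed. The following is a minimal sketch of that idea as a Python generator. It assumes both end points are always kept, and it detects only strict extrema; the left, right, and flat cases are not handled. `all_extrema` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional, Tuple


def all_extrema(stream: Iterable[float]) -> Iterator[Tuple[int, float, str]]:
    """Yield (index, value, type) for the end points and every strict extremum.

    Only the two most recent values are stored, so memory use is constant
    and the input may be an unbounded live stream.
    """
    prev_prev: Optional[float] = None  # value two steps back
    prev: Optional[float] = None       # previous value
    idx = -1
    for idx, x in enumerate(stream):
        if prev is None:
            yield (0, x, "end-point")          # first point of the series
        elif prev_prev is not None:
            if prev < prev_prev and prev < x:  # strict minimum at idx - 1
                yield (idx - 1, prev, "minimum")
            elif prev > prev_prev and prev > x:  # strict maximum at idx - 1
                yield (idx - 1, prev, "maximum")
        prev_prev, prev = prev, x
    if idx >= 1:
        yield (idx, prev, "end-point")         # last point of the series
```

On `[1, 3, 2, 2, 5, 0]` this yields the two end points and maxima at indices 1 and 4; the minimum spread over the equal values at indices 2 and 3 is skipped because it is not strict, which is the gap the left, right, and flat definitions fill.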
Important Points.. • Important Extrema • Higher compression by selecting only certain important extrema • Control the compression rate using the parameter R • Formal definition of an important minimum: the point ai of a time series a1,…,an is a strict important minimum if there are indices il < i < ir such that • ai is the minimum among ail,…,air, and • dist(ai,ail) >= R and dist(ai,air) >= R • Examples of strict, left, right, and flat important minima
Important Points.. • Important Extrema.. • Compression by extracting important extrema • Example • Algorithm • Space complexity = constant • Time complexity = O(n); n = number of points in the series • Can process live series
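The following is a minimal one-pass sketch of selecting important extrema with constant memory. It is our own threshold-reversal ("zigzag") formulation, not necessarily the algorithm on the slide. It assumes dist = |a-b|, always keeps both end points, and confirms a candidate maximum (minimum) only after the series falls (rises) by at least R from it. By our own reasoning, each turning point it keeps after the first reversal satisfies the important-extremum definition (with "minimum among" read non-strictly), but we have not shown that it finds every important extremum. `important_extrema` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple

Point = Tuple[int, float]  # (index in the original series, value)


def important_extrema(stream: Iterable[float], R: float) -> Iterator[Point]:
    """Yield both end points plus alternating minima and maxima separated by
    rises and falls of at least R, using dist(a, b) = |a - b|."""
    if R <= 0:
        raise ValueError("R must be positive")
    lo: Point = (0, 0.0)  # running candidate minimum
    hi: Point = (0, 0.0)  # running candidate maximum
    trend = 0             # 0 = undecided, +1 = seeking a maximum, -1 = seeking a minimum
    last_out = -1         # index of the last yielded point, to avoid duplicates
    idx, x = -1, 0.0
    for idx, x in enumerate(stream):
        if idx == 0:
            lo = hi = (0, x)
            last_out = 0
            yield (0, x)                  # first end point
        elif trend == 0:
            if x < lo[1]:
                lo = (idx, x)
            if x > hi[1]:
                hi = (idx, x)
            if x - lo[1] >= R:            # rose by R above the lowest point so far
                if lo[0] != last_out:
                    yield lo
                    last_out = lo[0]
                trend, hi = 1, (idx, x)
            elif hi[1] - x >= R:          # fell by R below the highest point so far
                if hi[0] != last_out:
                    yield hi
                    last_out = hi[0]
                trend, lo = -1, (idx, x)
        elif trend > 0:
            if x > hi[1]:
                hi = (idx, x)             # still rising: move the candidate maximum
            elif hi[1] - x >= R:          # fell by R: the candidate is a turning point
                yield hi
                last_out = hi[0]
                trend, lo = -1, (idx, x)
        else:
            if x < lo[1]:
                lo = (idx, x)             # still falling: move the candidate minimum
            elif x - lo[1] >= R:          # rose by R: the candidate is a turning point
                yield lo
                last_out = lo[0]
                trend, hi = 1, (idx, x)
    if idx > 0 and last_out != idx:
        yield (idx, x)                    # last end point
```

With R = 2, the series `[0, 1, 0.5, 3, 2.5, 0]` compresses to indices 0, 3, and 5: the small wiggles at indices 1, 2, and 4 never move the series by R in the opposite direction.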
Important Points.. • Derivative series • Compression based on changes in slope • Example • Algorithm • The algorithm for important extrema can be modified for derivative series
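The following minimal sketch assumes the derivative series is the sequence of successive differences di = ai+1 - ai, which is natural for values measured at equal intervals. Extrema of that series mark the steepest rises and falls, that is, the largest changes in slope. Both function names are hypothetical, and only strict extrema are detected.

```python
from typing import List, Sequence, Tuple


def derivative_series(a: Sequence[float]) -> List[float]:
    """d_i = a_{i+1} - a_i (unit spacing between samples assumed)."""
    return [a[i + 1] - a[i] for i in range(len(a) - 1)]


def strict_extrema(d: Sequence[float]) -> List[Tuple[int, float, str]]:
    """Interior strict minima and maxima of a series."""
    out = []
    for i in range(1, len(d) - 1):
        if d[i] > d[i - 1] and d[i] > d[i + 1]:
            out.append((i, d[i], "maximum"))
        elif d[i] < d[i - 1] and d[i] < d[i + 1]:
            out.append((i, d[i], "minimum"))
    return out
```

For `a = [0, 1, 3, 6, 7, 7.5]` the derivative series is `[1, 2, 3, 1, 0.5]`, whose only strict extremum is the maximum 3 at index 2: the steepest rise is between a2 and a3. Replacing `strict_extrema` with an important-extrema selector gives compression by slope changes.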
Important Points.. • Importance Levels • Idea: assign a numerical importance to each extremum and use it for compression, indexing, and pattern retrieval • Definition: if a point is a strict (left, right, flat) important extremum for compression with some value of R, then its strict (left, right, flat) importance is the maximal value of R for which it is a strict (left, right, flat) important extremum • Example with the distance function |a-b|
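The definition can be read off directly for a single strict minimum. The following sketch assumes that "strict" means every other point in ail,…,air is strictly larger than ai, and that dist has the monotonicity property from the distance-measure slide. Under these assumptions, dist(ai, x) grows with x for x > ai, so the best il and ir are the highest points reachable on each side before a value <= ai. The importance is then the smaller of the two distances. This is a quadratic-time reading of the definition, not the one-pass O(n) algorithm the slides describe, and `strict_min_importance` is a hypothetical name.

```python
import math
from typing import Callable, Optional, Sequence


def strict_min_importance(a: Sequence[float], i: int,
                          dist: Callable[[float, float], float] = lambda x, y: abs(x - y)
                          ) -> Optional[float]:
    """Largest R for which a[i] is a strict important minimum, or None if a[i]
    is not a strict minimum (end points are never strict minima)."""
    n = len(a)
    if not (0 < i < n - 1) or not (a[i] < a[i - 1] and a[i] < a[i + 1]):
        return None
    left = -math.inf                 # highest value left of i before a value <= a[i]
    j = i - 1
    while j >= 0 and a[j] > a[i]:
        left = max(left, a[j])
        j -= 1
    right = -math.inf                # same, to the right of i
    j = i + 1
    while j < n and a[j] > a[i]:
        right = max(right, a[j])
        j += 1
    return min(dist(a[i], left), dist(a[i], right))
```

In `[5, 1, 3, 0, 4]` the minimum at index 1 has importance min(4, 2) = 2 (the drop to 0 stops the right-hand scan at 3). The minimum at index 3 has importance min(5, 4) = 4. With R = 3, only the second is an important minimum.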
Important Points.. • Importance Levels • strict, left, right, and flat importance • Algorithm to assign importance • One pass through the series • Space complexity = O(m) ; m = number of extrema in the series • Time complexity = O(n) ; n = number of points in the series
Important Points.. • Compression rate • rate = percentage of points removed during the compression • Problem: select important extrema according to a given rate • Example: the compression rate of the example series is 60% since we have selected eight of 20 points
Important Points.. • Compression rate.. • Three-pass algorithm • Makes three passes through the series • Space complexity = O(m); m = number of extrema in the series • Time complexity = O(n); n = number of points in the series • One-pass algorithm • Makes one pass and uses a red-black tree to keep only the desired number of important extrema • Space complexity = O(m); m = number of extrema • Time complexity = O(n + m lg s); n = number of points; s = number of points in the compressed series • Dependency on the distance functions
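Keeping only the s most important extrema in one pass needs an ordered structure that can evict the least important kept point. The sketch below swaps the slide's red-black tree for a binary min-heap (Python's `heapq`). This keeps the O(m lg s) selection cost, and the final sort back into series order adds O(s lg s). It assumes the input is a stream of (index, importance) pairs, for example produced by the importance computation. Both function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def top_s_extrema(pairs: Iterable[Tuple[int, float]], s: int) -> List[int]:
    """Indices of the s most important extrema, in series order.

    `pairs` yields (index, importance) in series order.
    """
    if s <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap: root = least important kept point
    for idx, imp in pairs:
        if len(heap) < s:
            heapq.heappush(heap, (imp, idx))
        elif imp > heap[0][0]:
            heapq.heapreplace(heap, (imp, idx))  # evict the least important point
    return sorted(idx for _, idx in heap)       # restore series order


def compression_rate(n: int, kept: int) -> float:
    """Fraction of the n original points removed when `kept` points remain."""
    return (n - kept) / n
```

Keeping eight of 20 points gives `compression_rate(20, 8) == 0.6`, the 60% of the earlier example. A red-black tree would additionally support the order-statistics queries mentioned under related structures.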
Indexing Trees • Idea • Index the series based on importance for fast retrieval of compressed series • Indexing based on importance • Can retrieve s points in O(s) time • Need to sort the retrieved points, which takes O(s lg s) time • Augmented indexing structure • Example series
Indexing Trees.. • Augmented indexing structure.. • Structure • Left superior: nearest extremum to the left in the original series with strictly greater importance • Right superior: nearest extremum to the right with equal or greater importance
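Both superior links can be found in a single left-to-right pass with a monotonic stack. When extremum k pops j because importance(j) <= importance(k), k is the first extremum to the right of j with equal or greater importance. After the pops, the stack top is the nearest extremum to the left with strictly greater importance. This is a minimal sketch that computes only these links in O(m) time, not the whole augmented tree (which the slides build in O(m lg m)). It assumes the extrema's importances are given in series order; `superiors` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def superiors(imp: Sequence[float]) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """For each extremum k (in series order), return its left superior (nearest
    j < k with imp[j] > imp[k]) and right superior (nearest j > k with
    imp[j] >= imp[k]); None where no such extremum exists."""
    m = len(imp)
    left: List[Optional[int]] = [None] * m
    right: List[Optional[int]] = [None] * m
    stack: List[int] = []  # indices whose importances strictly decrease bottom to top
    for k, v in enumerate(imp):
        while stack and imp[stack[-1]] <= v:
            right[stack.pop()] = k  # k is the first later extremum with >= importance
        left[k] = stack[-1] if stack else None  # nearest earlier one with > importance
        stack.append(k)
    return left, right
```

For importances `[3, 1, 2, 2, 5]` the left superiors are `[None, 0, 0, 0, None]` and the right superiors are `[4, 2, 3, 4, None]`. The tie between extrema 2 and 3 makes 3 the right superior of 2, but not 2 the left superior of 3, which matches the strict/non-strict asymmetry in the slide.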
Indexing Trees.. • Algorithms • Sorted retrieval • Space and time complexity = O(s); s = number of points to be retrieved • Building the augmented tree • Space complexity = O(m); m = number of extrema • Time complexity = O(m lg m) • Range tree • Problem: retrieve the important points of a given segment ail,…,air • Idea • Use a range tree • Index the points of the series by position as well as by importance • Space complexity of building the range tree = O(m) • Time complexity of building the range tree = O(m lg m) • Time complexity of retrieval = O(s lg s + lg m)
Distance Computation • Distance range • Problem: find the distance range between two compressed series • Idea • Each leg of the series can be bounded by a rectangle, as shown in the figure • The bounding rectangle contains all points of the original series in that segment • Find the distance range between the series by finding the distance ranges between the bounding rectangles • Algorithm • Time complexity = O(s); s = number of points in the compressed series • Space complexity = constant if the compressed series are stored in a file; O(s) if the compressed series must be retrieved from the indexing tree
Distance Computation.. • Approximate distance • Problem: find the distance between two series with a given accuracy • Algorithm • Uses precomputed indexing trees • Starts with highly compressed versions of the series; if the distance range is not within the given accuracy, doubles the number of points and repeats • Time and space complexity = O(s); s = number of points required for the given accuracy • Threshold test • Problem: determine whether the distance between two series is smaller than a given threshold • Algorithm • Same idea as the approximate distance • Checks whether the lower bound is greater than the threshold or the upper bound is less than the threshold, refining the approximation until one of these holds
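Both algorithms share a refinement loop: compute a guaranteed distance range from compressed versions, and double the number of points until the range is narrow enough or decides the question. In the sketch below, the bounds come from a hypothetical callback `bounds(s)` that returns (lower, upper) for series compressed to s points. In the slides' setting, this would be the bounding-rectangle distance range computed from the indexing trees. The starting size `s0`, the cap `s_max`, and returning `None` when the test is still undecided at the cap are our assumptions.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback: guaranteed (lower, upper) range for the true distance
# when both series are compressed to s points.
Bounds = Callable[[int], Tuple[float, float]]


def approximate_distance(bounds: Bounds, accuracy: float,
                         s0: int = 4, s_max: int = 1 << 20) -> Tuple[float, float]:
    """Double the number of points until the range is at most `accuracy` wide."""
    s = s0
    while True:
        lower, upper = bounds(s)
        if upper - lower <= accuracy or s >= s_max:
            return lower, upper
        s *= 2


def threshold_test(bounds: Bounds, threshold: float,
                   s0: int = 4, s_max: int = 1 << 20) -> Optional[bool]:
    """True if the distance is certainly below `threshold`, False if certainly
    above it, None if still undecided at s_max points."""
    s = s0
    while True:
        lower, upper = bounds(s)
        if lower > threshold:   # even the smallest possible distance is too large
            return False
        if upper < threshold:   # even the largest possible distance is small enough
            return True
        if s >= s_max:
            return None
        s *= 2
```

With a toy callback `lambda s: (3 - 8 / s, 3 + 8 / s)`, `threshold_test(..., 3.5)` returns `True` once it reaches 32 points, and `approximate_distance(..., 1.0)` stops at 16 points with the range (2.5, 3.5). Far-apart series are rejected after only a few cheap refinements, which is what makes the best case of pattern retrieval inexpensive.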
Pattern Retrieval • Range query • Fetch all the series that are within a given distance of the pattern • Nearest neighbor • Fetch the series closest to the pattern • Multiple neighbors • Fetch a given number of series closest to the pattern • Complexity of the algorithms • Time complexity • Best case = O(N); N = number of series in the database • Worst case = O(n·N); n = number of points in each series • Space complexity • Best case = O(s·N); s = number of points required to perform the threshold test
Conclusions • Distance measures • Strict, left, right, and flat important extrema • Extrema in derivative series • Importance levels • Compression, indexing, and pattern retrieval techniques using importance levels • Future work • Experiments • Investigating the practical usefulness of these techniques
Acknowledgements • Eugene Fink • Colleagues and Management at Nielsen Media Research • Colleagues at Cognizant Technology Solutions, Ltd • Family • Parents – Mohan Reddy Gandhi and Sulochana Gandhi • Brother – Sharat Gandhi • Wife – Madhuri Gandhi