Time Series Sequence Matching Jiaqin Wang CMPS 565
Papers • “Fast Subsequence Matching in Time-Series Databases,” Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos • “Skyline Index for Time Series Data,” Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon
Types of time series sequences • Financial and marketing data • Stock prices • Sales figures • Scientific databases • Weather data • Environmental data
Categories of time series sequence matching • Whole matching • The data sequences and the query sequence all have the same length • Subsequence matching • The query sequence and the data sequences have different lengths
Whole matching • Given N sequences, all of the same length l • Use a feature extraction function to map each sequence to an n-dimensional value • DFT (Discrete Fourier Transform) • n-dimensional value (Q1, Q2, …, Qn) of DFT coefficients • For typical signals, most of the energy lies in the first few coefficients • Keep only the first few coefficients • This reduces the dimensionality of each sequence
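A minimal sketch of the feature extraction step, assuming the unitary DFT (scaled by 1/sqrt(n)) so that Parseval's theorem preserves Euclidean distances. The function names `dft`, `dft_features` and `euclid` are hypothetical. Because truncation only drops non-negative terms from Parseval's sum, the distance between feature points never exceeds the distance between the original sequences, which is why keeping only the first few coefficients causes no false dismissals.

```python
import cmath
import math


def dft(x):
    """Unitary DFT: X[f] = (1/sqrt(n)) * sum_t x[t] * exp(-2j*pi*t*f/n).

    The 1/sqrt(n) factor makes Parseval's theorem hold exactly, so
    Euclidean distances are the same in the time and frequency domains.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * t * f / n) for t in range(n)) / math.sqrt(n)
        for f in range(n)
    ]


def dft_features(x, k=2):
    """Keep the first k DFT coefficients as a 2k-dimensional real point."""
    point = []
    for c in dft(x)[:k]:
        point.extend([c.real, c.imag])  # each complex coefficient gives 2 axes
    return point


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

For example, with `s1 = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0]` and a slightly perturbed `s2`, `euclid(dft_features(s1), dft_features(s2))` is at most `euclid(s1, s2)`.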
Whole matching • Map each sequence to an n-dimensional point in the feature space • Keep only the first 2 coefficients • Organize these points in an R-tree • Use the R-tree for indexing and search
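During an R-tree range search, a node is visited only if the query point could lie within tolerance e of the node's bounding rectangle. The sketch below shows that pruning test as the minimum distance (MINDIST) from a point to an axis-aligned rectangle; the `MBR` class and the `mindist` and `may_contain_match` helpers are hypothetical names, and a real R-tree library would provide its own equivalents.

```python
import math
from dataclasses import dataclass


@dataclass
class MBR:
    """Axis-aligned minimum bounding rectangle in feature space."""
    low: list   # per-dimension lower corner
    high: list  # per-dimension upper corner

    @staticmethod
    def of_points(points):
        dims = range(len(points[0]))
        return MBR([min(p[d] for p in points) for d in dims],
                   [max(p[d] for p in points) for d in dims])


def mindist(q, box):
    """Smallest Euclidean distance from point q to any point inside box."""
    total = 0.0
    for v, lo, hi in zip(q, box.low, box.high):
        if v < lo:
            total += (lo - v) ** 2
        elif v > hi:
            total += (v - hi) ** 2
        # a coordinate inside [lo, hi] contributes nothing
    return math.sqrt(total)


def may_contain_match(q, box, eps):
    """Descend into this node only if some point in it could be within eps."""
    return mindist(q, box) <= eps
```

Nodes whose MBR fails this test are skipped together with their whole subtree, which is where the index saves work compared with scanning every point.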
Whole matching • For a new query sequence • Use the DFT to convert it to a feature point • Map the query feature point into the feature space • Find the points whose distance to the query point is within tolerance e • Treat them as candidates and check their actual distance to discard false alarms
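The query steps above can be sketched as a filter-and-refine range query. This is a minimal illustration, assuming the unitary DFT from the earlier slides; a linear scan over the feature points stands in for the R-tree, and the names `features`, `dist` and `whole_match` are hypothetical.

```python
import cmath
import math


def features(x, k=2):
    """First k unitary-DFT coefficients of x, flattened to 2k real numbers."""
    n = len(x)
    point = []
    for f in range(k):
        c = sum(x[t] * cmath.exp(-2j * math.pi * t * f / n) for t in range(n)) / math.sqrt(n)
        point.extend([c.real, c.imag])
    return point


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def whole_match(sequences, query, eps, k=2):
    """Return indices of sequences within distance eps of query.

    Filter: compare feature points. This never discards a true match,
    because the feature distance lower-bounds the real distance.
    Refine: check the real distance to drop false alarms.
    """
    q_point = features(query, k)
    candidates = [i for i, s in enumerate(sequences)
                  if dist(features(s, k), q_point) <= eps]
    return [i for i in candidates if dist(sequences[i], query) <= eps]
```

With `k=1` only the scaled mean is kept, so `[1.0, 3.0]` and `[2.0, 2.0]` map to the same feature point: the first passes the filter for the query `[2.0, 2.0]` and is then removed in the refine step as a false alarm.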
Figures: time series data and the DFT • Discrete Fourier Transform (DFT) • Keep the first few (2-3) coefficients • The first few coefficients contain most of the signal's energy
Feature space • TS1(0.05,3) • TS2(0.01,12) • ……
Feature space • The distance e < minimum query distance
Subsequence matching • A collection of N sequences, each possibly of a different length • A query Q with tolerance e • Find all sequences S_i (1 ≤ i ≤ N), along with the correct offsets k, such that the subsequence S_i[k : k+Len(Q)-1] matches the query sequence: D(Q, S_i[k : k+Len(Q)-1]) ≤ e
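The problem definition above can be written directly as a sequential scan, which is the baseline an index is meant to beat: it computes a full distance at every offset of every sequence. This sketch assumes Euclidean distance for D and uses 0-based indices (the slide numbers sequences from 1); the function name is hypothetical.

```python
import math


def subsequence_matches(sequences, query, eps):
    """Sequential-scan baseline for subsequence matching.

    Returns (i, k) pairs such that the window S_i[k : k + len(query)]
    (0-based; S_i[k : k+Len(Q)-1] inclusive in the slide's notation)
    is within Euclidean distance eps of the query.
    """
    m = len(query)
    hits = []
    for i, s in enumerate(sequences):
        for k in range(len(s) - m + 1):  # every offset where the query fits
            d = math.sqrt(sum((s[k + j] - query[j]) ** 2 for j in range(m)))
            if d <= eps:
                hits.append((i, k))
    return hits
```

Sequences shorter than the query produce no offsets at all, since `range` is empty for them.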
ST-index • Assume a minimum query length w • Place a sliding window of size w on the data sequence at every possible offset • Extract the features of the window at each offset and map each feature vector as a point into the feature space
Figure • Sliding window on the sequence at offsets 0 to Len(S)-w (Len(S)-w+1 positions) • The length of the window is w
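The sliding-window step can be sketched as follows, assuming the same unitary-DFT features as in whole matching; `window_features` and `trail` are hypothetical names. Each offset yields one feature point, and the list of points in offset order is the trail of that sequence.

```python
import cmath
import math


def window_features(window, k=2):
    """First k unitary-DFT coefficients of one window, as 2k real numbers."""
    n = len(window)
    point = []
    for f in range(k):
        c = sum(window[t] * cmath.exp(-2j * math.pi * t * f / n) for t in range(n)) / math.sqrt(n)
        point.extend([c.real, c.imag])
    return point


def trail(sequence, w, k=2):
    """Slide a window of length w over offsets 0 .. len(sequence) - w and
    map each window to a feature point; consecutive points form the trail."""
    return [window_features(sequence[off:off + w], k)
            for off in range(len(sequence) - w + 1)]
```

A sequence of length 10 with w = 4 therefore produces 7 feature points, one per window position.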
Result • The points from successive offsets form a curve (a trail) in the feature space • Index the trails in an R-tree
MBRs • Storing every point of a trail individually in the R-tree is inefficient • Divide each trail into sub-trails and represent each sub-trail by its minimum bounding rectangle (MBR)
MBRs in R-tree • Combine small MBRs into larger ones at the higher levels of the R-tree • Search the tree to retrieve the index information
How to group points into MBRs • Fixed: a fixed number of points per MBR • Adaptive: a variable number of points per MBR
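I-adaptive method • A greedy algorithm • Based on the expected number of disk accesses • Cost function: the disk accesses caused by an MBR • Average cost function: the MBR's cost divided by the number of points it contains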
Algorithm • Assign the first point of the trail to a sub-trail • For each successive point • If it increases the average cost of the current sub-trail • Then start another sub-trail • Else include this point in the current sub-trail
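The greedy loop above can be sketched as follows. The cost model is an assumption made for this illustration: the expected disk accesses of an MBR with side lengths L_1..L_n are taken to be the product of (L_i + 0.5), a common R-tree estimate for range queries of side 0.5 in a unit feature space, and the average cost is that value divided by the number of points. All function names are hypothetical.

```python
from math import prod


def mbr_sides(points):
    """Side lengths L_1..L_n of the MBR enclosing the points."""
    dims = range(len(points[0]))
    return [max(p[d] for p in points) - min(p[d] for p in points) for d in dims]


def disk_accesses(points, q=0.5):
    """ASSUMED cost model: expected disk accesses of an MBR = prod(L_i + q)."""
    return prod(side + q for side in mbr_sides(points))


def average_cost(points):
    """Cost of the sub-trail's MBR spread over the points it holds."""
    return disk_accesses(points) / len(points)


def i_adaptive(trail):
    """Greedy I-adaptive split of a trail into sub-trails (lists of points)."""
    sub_trails = [[trail[0]]]              # the first point starts a sub-trail
    for point in trail[1:]:
        current = sub_trails[-1]
        if average_cost(current + [point]) > average_cost(current):
            sub_trails.append([point])     # cost would rise: start a new sub-trail
        else:
            current.append(point)          # cost stays or drops: extend current one
    return sub_trails
```

Nearby points keep lowering the average cost, so they share one MBR; a sudden jump in feature space inflates the MBR and starts a new sub-trail.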
Skyline index for time series data • “Skyline Index for Time Series Data,” Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon
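Adaptive Piecewise Constant Approximation (APCA) • What is APCA? • Approximates a time series by constant-valued segments of varying length • Each segment is stored as its constant value and its right endpoint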
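To make the representation concrete, here is a simplified APCA-style approximation: it starts with one segment per point and repeatedly merges the adjacent pair whose merge adds the least squared error, until m segments remain. This bottom-up merge is a stand-in chosen for brevity, not the algorithm from the original APCA work; the names `sse` and `apca_like` are hypothetical.

```python
def sse(values):
    """Squared error of approximating values by their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def apca_like(series, m):
    """Approximate series by m constant segments of varying length.

    Returns [(value, right_end), ...] with right_end an inclusive 0-based index.
    Simplified bottom-up merge, NOT the original APCA construction.
    """
    segments = [[v] for v in series]           # start: one segment per point
    while len(segments) > m:
        # merge the adjacent pair whose merge adds the least squared error
        best = min(range(len(segments) - 1),
                   key=lambda i: sse(segments[i] + segments[i + 1])
                   - sse(segments[i]) - sse(segments[i + 1]))
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
    result, end = [], -1
    for seg in segments:
        end += len(seg)
        result.append((sum(seg) / len(seg), end))  # (constant value, right endpoint)
    return result
```

For `[1, 1, 1, 5, 5, 9]` with m = 3 this yields three segments of lengths 3, 2 and 1: long segments where the series is flat, short ones where it changes.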
Adaptive Piecewise Constant Approximation (APCA) • Limitation of APCA • Internal overlap in MBRs
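Skyline Bounding Region (SBR) • SBR • A group of N time series data objects of length l • A 2-dimensional region (time vs. value) bounded by a top skyline and a bottom skyline • The top skyline is the pointwise maximum and the bottom skyline the pointwise minimum over the group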
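A minimal sketch of an SBR and why it is useful for search, assuming Euclidean distance. The skylines are the pointwise maximum and minimum over the group; the distance from a query to the region (hypothetically named `sbr_mindist`) only counts the parts of the query lying outside it, so it lower-bounds the distance to every series in the group and can be used to prune the whole group.

```python
import math


def skylines(group):
    """Top and bottom skylines of equal-length series: pointwise max and min."""
    top = [max(col) for col in zip(*group)]
    bottom = [min(col) for col in zip(*group)]
    return top, bottom


def sbr_mindist(query, top, bottom):
    """Lower bound on the Euclidean distance from query to ANY series in the SBR.

    At each time step, only the part of the query above the top skyline or
    below the bottom skyline contributes; inside the region it adds 0.
    """
    total = 0.0
    for q, hi, lo in zip(query, top, bottom):
        if q > hi:
            total += (q - hi) ** 2
        elif q < lo:
            total += (lo - q) ** 2
    return math.sqrt(total)
```

If `sbr_mindist` already exceeds the tolerance, no series bounded by the SBR can match, so none of them needs to be fetched.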
Approximate SBR (ASBR) • Many possible approaches • Equal-length constant-valued segments • Variable-length constant-valued segments • The ASBR must cover (enclose) the original SBR
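The equal-length option can be sketched as follows, under the simplifying assumption that the series length is a multiple of the number of segments; `equal_length_asbr` is a hypothetical name. Taking the maximum of the top skyline and the minimum of the bottom skyline within each segment guarantees that the approximation encloses the original SBR, so a lower bound computed against it remains valid, only looser.

```python
def equal_length_asbr(top, bottom, segments):
    """Approximate an SBR with equal-length constant-valued segments.

    Returns one (upper, lower) pair per segment. Using the max of the top
    skyline and the min of the bottom skyline keeps the original SBR inside.
    """
    n = len(top)
    if n % segments:
        raise ValueError("sketch assumes len(top) is a multiple of segments")
    size = n // segments
    return [(max(top[s:s + size]), min(bottom[s:s + size]))
            for s in range(0, n, size)]
```

The variable-length option would choose segment boundaries adaptively, as APCA does, to keep the enclosing region tighter.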
Indexing approximate SBRs • R-tree based Skyline index • Internal node • Approximate SBR • Pointer to child node • Leaf node • Pointer to time series data
The End Thank You