Pattern Finding and Pattern Discovery in Time Series. Trần Quốc Long College of Computing, Georgia Tech. Long Q Tran College of Computing, Georgia Tech. Contents. Pattern Finding & Pattern Discovery Pattern Finding & Pattern Discovery in Time Series Hidden Markov Models (HMMs) Summary.
Pattern Finding and Pattern Discovery in Time Series Trần Quốc Long College of Computing, Georgia Tech Long Q Tran College of Computing, Georgia Tech
Contents • Pattern Finding & Pattern Discovery • Pattern Finding & Pattern Discovery in Time Series • Hidden Markov Models (HMMs) • Summary
Pattern Finding • Problem: given observed patterns O1, O2, …, OK, determine which pattern the new data X possesses • Other names: pattern recognition, pattern classification • Examples • Recognition: matching the fingerprints of a claimant against those of authorized personnel.
Pattern Finding • Patterns are known beforehand and are observed/described by • Explicit samples • Similar samples (usually) • Modeling approaches: • Build a model for each pattern • Find the best-fitting model for new data • Usually requires training using observed samples
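The modeling approach on this slide (one model per pattern, then pick the best fit) can be sketched as follows. This is only an illustration under stated assumptions: each pattern is modeled by a single 1-D Gaussian, and the pattern names and sample values are hypothetical, not taken from the talk.

```python
import math

def fit_gaussian(samples):
    """Train a model for one pattern: estimate mean and standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(var)

def log_likelihood(x, mean, std):
    """Log density of x under a 1-D Gaussian (the 'fit' of x to a model)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def classify(x, models):
    """Return the name of the pattern whose model fits x best."""
    return max(models, key=lambda name: log_likelihood(x, *models[name]))

# Hypothetical observed samples for two known patterns.
training = {
    "pattern_A": [0.1, -0.2, 0.3, 0.0, -0.1],
    "pattern_B": [2.1, 1.8, 2.2, 1.9, 2.0],
}
models = {name: fit_gaussian(samples) for name, samples in training.items()}

print(classify(0.4, models))
print(classify(1.7, models))
```

The same structure carries over to time series by swapping the Gaussian for a sequence model and the density for a sequence likelihood.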
Pattern Discovery • Patterns are not known • But data that are believed to possess patterns are given • Examples: • Clustering: grouping similar samples into clusters • Association rule mining: discovering features that often appear together in the data
Contents • Pattern Finding & Pattern Discovery • Pattern Finding & Pattern Discovery in Time Series • Hidden Markov Models (HMMs) • Summary
Time Series • Data are sampled over time X = X1 X2 … Xt … XL • Xt : data sampled at time t • L : sequence length • Xt are NOT independent and identically distributed (NOT i.i.d.) • In other words, Xt may come from different processes that depend on each other
Pattern Finding in Time Series • Examples • In control, certain patterns of sensor signals indicate a critical point of the production process • In stocks, certain patterns (up/down) of prices indicate the trend of the market • People often have to inspect the graph by eye and act accordingly when they spot a known pattern • X. Ge & P. Smyth (2000): detecting end-point in plasma etch (semiconductor manufacturing)
Pattern Finding in Time Series • Problems: • Data may contain one or more patterns inside • Data can be multi-dimensional (i.e. look at multiple graphs at the same time) • Automated pattern finding is crucial when time series are lengthy and multi-dimensional
Pattern Discovery in Time Series • Goals: From collected data, discover • Recurring, interesting patterns • Association rules on patterns (which can be used to predict trends of the time series)
Pattern Modeling in Time Series • Both pattern finding and pattern discovery need modeling • Desired properties of the model • The model can be built or trained using observed data • The similarity of new data and the model can be easily computed
Contents • Pattern Finding & Pattern Discovery • Pattern Finding & Pattern Discovery in Time Series • Hidden Markov Models (HMMs) • Summary
Hidden Markov Models (HMMs) • One way to model time series patterns • Assumptions: • Xt is generated from a certain probability distribution Yt (called the state) • The number of states is finite (i.e. finite sources of data) • State transitions follow the Markov property
[Figure: chain of hidden states Y1, Y2, …, YL emitting observations X1, X2, …, XL; two-state transition diagram with probabilities 0.6 and 0.4]
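The three assumptions can be made concrete with a small generative sketch. It assumes two states with Gaussian emissions (means 0 and 2, unit spread), a uniform initial state, and a self-transition probability of 0.6; the slide's figure shows the values 0.6 and 0.4 but which one is the self-transition is an assumption here. The function name `sample_hmm` is hypothetical.

```python
import random

def sample_hmm(length, trans, means, stds, rng):
    """Draw a hidden state path Y and observations X from a Gaussian HMM."""
    n_states = len(trans)
    state = rng.randrange(n_states)  # assumption: uniform initial state
    states, observations = [], []
    for _ in range(length):
        states.append(state)
        # Xt depends only on the current state Yt.
        observations.append(rng.gauss(means[state], stds[state]))
        # Markov property: the next state depends only on the current state.
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return states, observations

rng = random.Random(0)
trans = [[0.6, 0.4],   # assumed: stay with 0.6, switch with 0.4
         [0.4, 0.6]]
states, xs = sample_hmm(10, trans, [0.0, 2.0], [1.0, 1.0], rng)
print(states)
print([round(x, 2) for x in xs])
```

Only `xs` would be visible in practice; `states` is the hidden part that estimation and decoding have to recover.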
Hidden Markov Models (HMMs) • Parameters to estimate: • Transition probabilities • Distribution parameters in each state • Estimation procedure: • Initialization: k-means, Viterbi training • Iterative training: forward-backward procedure (EM algorithm) • Variants of HMM: • Mixture of HMMs: allows many HMMs to be computed simultaneously • State durational HMM: allows a state to persist for a duration
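A minimal sketch of the iterative training step, assuming a single 1-D observation sequence, Gaussian emissions, and hand-picked starting values in place of the k-means or Viterbi initialization named on the slide. The data and the function names are hypothetical; the scaled forward-backward recursions are the standard ones, not necessarily the talk's exact implementation.

```python
import math

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def forward_backward(xs, init, trans, means, stds):
    """Scaled forward-backward pass: state posteriors, transition counts, log-likelihood."""
    n, T = len(init), len(xs)
    b = [[gauss_pdf(x, means[i], stds[i]) for i in range(n)] for x in xs]
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [init[i] * b[0][i] for i in range(n)]
        else:
            a = [b[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        c = sum(a)
        scale.append(c)
        alpha.append([v / c for v in a])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * b[t + 1][j] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # gamma[t][i]: posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    # xi[i][j]: expected number of i -> j transitions over the sequence.
    xi = [[sum(alpha[t][i] * trans[i][j] * b[t + 1][j] * beta[t + 1][j] / scale[t + 1]
               for t in range(T - 1)) for j in range(n)] for i in range(n)]
    return gamma, xi, sum(math.log(c) for c in scale)

def em_step(xs, init, trans, means, stds):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    gamma, xi, loglik = forward_backward(xs, init, trans, means, stds)
    n, T = len(init), len(xs)
    new_trans = [[xi[i][j] / sum(xi[i]) for j in range(n)] for i in range(n)]
    new_means, new_stds = [], []
    for i in range(n):
        w = sum(gamma[t][i] for t in range(T))
        m = sum(gamma[t][i] * xs[t] for t in range(T)) / w
        v = sum(gamma[t][i] * (xs[t] - m) ** 2 for t in range(T)) / w
        new_means.append(m)
        new_stds.append(max(math.sqrt(v), 1e-3))  # floor guards against collapse
    return gamma[0][:], new_trans, new_means, new_stds, loglik

# Hypothetical data: a run near 0, a run near 2, then both again.
xs = [0.1, -0.4, 0.3, 2.2, 1.8, 2.5, 1.9, 0.2, -0.1, 0.4, 2.1, 2.3]
params = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [-0.5, 1.5], [1.0, 1.0])
logliks = []
for _ in range(10):
    *params, loglik = em_step(xs, *params)
    logliks.append(loglik)
init, trans, means, stds = params
print([round(m, 2) for m in means])
```

Tracking `logliks` is a cheap sanity check: under EM the log-likelihood should not decrease from one iteration to the next.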
Mixture of HMMs • Assumption: • There are different processes (patterns) that generate the time series • Each process can be represented by an HMM • A mixture of HMMs allows • Packing all pattern models in one place • Identifying the processes that generate the time series • Training to be implemented efficiently
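Identifying which process generated a sequence can be sketched as a posterior over the component HMMs. The two components below are hypothetical (a "sticky" HMM that tends to stay in a state and a "switching" HMM that tends to alternate), as are the mixture weights and test sequences; the likelihood is computed with the standard scaled forward procedure.

```python
import math

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def log_likelihood(xs, init, trans, means, stds):
    """Scaled forward procedure: log P(X1 X2 ... XL | one HMM)."""
    n = len(init)
    alpha = [init[i] * gauss_pdf(xs[0], means[i], stds[i]) for i in range(n)]
    loglik = 0.0
    for t in range(len(xs)):
        if t > 0:
            alpha = [gauss_pdf(xs[t], means[j], stds[j])
                     * sum(alpha[i] * trans[i][j] for i in range(n))
                     for j in range(n)]
        c = sum(alpha)
        loglik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
    return loglik

def responsibilities(xs, models, weights):
    """Posterior probability that each component HMM generated xs."""
    scores = [math.log(w) + log_likelihood(xs, *m) for w, m in zip(weights, models)]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical components: same emissions, different transition behaviour.
sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.0, 2.0], [1.0, 1.0])
switching = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [0.0, 2.0], [1.0, 1.0])
models, weights = [sticky, switching], [0.5, 0.5]

resp_runs = responsibilities([0.1, -0.2, 0.3, 0.0, 2.1, 1.9, 2.2, 2.0], models, weights)
resp_alt = responsibilities([0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0], models, weights)
print([round(r, 3) for r in resp_runs])
print([round(r, 3) for r in resp_alt])
```

In mixture training these responsibilities would also weight each sequence's contribution to each component's EM update, which is what lets all pattern models be trained together.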
Experiment • Experiment settings • Gaussian 1: μ = 0, σ = 1 • Gaussian 2: μ = 2, σ = 1 • Generate 200 sequences for each HMM • After 200 iterations: • μ11 = -0.07, σ11 = 0.97 • μ21 = 2.01, σ21 = 0.99 • μ12 = 1.90, σ12 = 1.10 • μ22 = -0.01, σ22 = 0.98
[Figure: two-state HMM transition diagrams with probabilities 0.6 and 0.4]
Summary • Automated pattern finding and pattern discovery in time series are needed • HMMs and their variants can model time series patterns • Parameters can be efficiently initialized and estimated using observed data
Appendix: HMMs • Parameters: λ = (transition prob., distribution params.) • Recognition • Calculate P(X1X2…XL | λ) • Forward procedure • Estimation: • Maximize L(λ) = P(X1X2…XL | λ) • EM algorithm: forward-backward procedure • Clustering • Find the most likely state sequence Y1Y2…YL given X1X2…XL and λ • Viterbi algorithm
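The clustering step (assigning each observation to a state) can be sketched with Viterbi decoding in log space. The parameters below are assumptions for illustration: two Gaussian states with means 0 and 2, a self-transition probability of 0.6, and strictly positive initial and transition probabilities so that the logarithms are defined.

```python
import math

def log_gauss(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def viterbi(xs, init, trans, means, stds):
    """Most likely hidden state sequence for xs (all probabilities assumed > 0)."""
    n = len(init)
    score = [math.log(init[i]) + log_gauss(xs[0], means[i], stds[i]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for x in xs[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(trans[i][j]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][j])
                         + log_gauss(x, means[j], stds[j]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

xs = [0.1, -0.3, 0.2, 2.4, 1.9, 2.2]
path = viterbi(xs, [0.5, 0.5], [[0.6, 0.4], [0.4, 0.6]], [0.0, 2.0], [1.0, 1.0])
print(path)  # [0, 0, 0, 1, 1, 1]
```

Replacing the `max` over predecessors with a sum (and leaving log space, or using a log-sum-exp) turns the same recursion into the forward procedure used for recognition.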