Finding surprising patterns in a time series database in linear time and space • Eamonn Keogh, University of California, Riverside, CA • Stefano Lonardi, University of California, Riverside, CA • Bill 'Yuan-chi' Chiu, University of California, Riverside, CA • KDD '02 • 鍾宜珍, 黃介揚, 侯元忠, 張仲威
Outline • Introduction • TARZAN algorithm • Experimental evaluation • Conclusion
What is a time series? • A time series is a collection of observations made sequentially in time • Examples: weather patterns, commodity prices, economic activity
What is a surprising pattern? • Surprising: a pattern that is not expected • In this paper: a pattern is surprising if its frequency departs from its expected frequency by more than an acceptable threshold • Example: human electrocardiogram
Outline of TARZAN • discretize the time series (abcaabb…) • build suffix tree • Markov chain: calculate expected frequency • compute scores • surprising pattern?
STEP 1: Discretizing time series • discretize → abcaabb…
Why? • A Markov chain is a discrete-time stochastic process over discrete states, so the real-valued time series must first be converted into symbols
Discretizing time series • 0.9, 0.9, 0.6, -1.1, -0.3 • After sorting: -1.1, -0.3, 0.6, 0.9, 0.9
Discretizing time series • Result: aaacb
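The mapping from the five values to the string aaacb can be sketched as follows. This is only an illustration: the slides do not state where the breakpoints fall, so the two breakpoints used here (-0.7 and 0.15, midpoints between neighbouring sorted values) and the function name `discretize` are assumptions, as is the convention that `a` labels the highest interval.

```python
from bisect import bisect_right

def discretize(values, breakpoints, symbols="cba"):
    """Map each value to a symbol according to the interval it falls in.

    `breakpoints` must be sorted ascending; `symbols` lists one symbol per
    interval, from the lowest interval to the highest.
    """
    return "".join(symbols[bisect_right(breakpoints, v)] for v in values)

values = [0.9, 0.9, 0.6, -1.1, -0.3]
print(sorted(values))                      # sorting helps choose breakpoints
# Hypothetical breakpoints, chosen between neighbouring sorted values.
print(discretize(values, [-0.7, 0.15]))    # aaacb
```

With an alphabet of three symbols, two breakpoints are enough; a larger alphabet would need one more breakpoint per extra symbol.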
STEP 2: Build suffix tree and Markov model • build suffix tree • Markov chain: calculate expected frequency
Suffix Trees • Advantage: • space-efficient
Suffix Trees • string: cocoa • suffixes: cocoa, ocoa, coa, oa, a
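To make the role of the suffixes concrete, the sketch below lists the suffixes of cocoa and counts how often a pattern occurs by checking which suffixes start with it. It is a naive stand-in, not a suffix tree: it takes quadratic space, whereas a real suffix tree stores the same information in linear space. The function names are illustrative.

```python
def suffixes(text):
    """Return every suffix of `text`, longest first."""
    return [text[i:] for i in range(len(text))]

def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of `pattern` in `text`.

    Every occurrence of `pattern` is a prefix of exactly one suffix,
    which is the property a suffix tree exploits.
    """
    return sum(1 for s in suffixes(text) if s.startswith(pattern))

print(suffixes("cocoa"))                  # ['cocoa', 'ocoa', 'coa', 'oa', 'a']
print(count_occurrences("cocoa", "co"))   # 2
print(count_occurrences("cocoa", "a"))    # 1
```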
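Markov chain • P = transition matrix [Figure: example sequence over states 0 and 1 and its transition matrix P]
What is the probability that one piglet (0) is born on day 1 and two piglets (1) on day 4?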
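A question of this kind is answered with powers of the transition matrix: going from day 1 to day 4 takes three steps, so the conditional probability is the (0, 1) entry of P cubed. The sketch below uses a hypothetical two-state matrix, since the entries of the slide's matrix are not available; multiplying the result by the probability of starting in state 0 would give the joint probability.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, steps):
    """Return p raised to the power `steps` (steps >= 0)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result

# Hypothetical transition matrix: row = current state, column = next state.
P = [[0.5, 0.5],
     [0.25, 0.75]]

three_step = mat_pow(P, 3)            # day 1 -> day 4 is three transitions
p_state1_on_day4 = three_step[0][1]   # given state 0 on day 1
print(p_state1_on_day4)               # 0.65625
```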
We still don’t know the true model • maximum likelihood estimator • Let y be a substring of x and • then,
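The formula on this slide was not preserved. As a hedged illustration, the sketch below uses the usual maximum likelihood estimate of the expected count of y under a Markov model of order M: the product of the counts of all length-(M+1) windows of y, divided by the product of the counts of the interior length-M windows. For M = 1 and y = abc this is f(ab)·f(bc)/f(b). Treating the surprise score as the observed count minus the expected count is likewise an assumption, and all function names are illustrative.

```python
def count(x, w):
    """Number of (possibly overlapping) occurrences of w in x."""
    return sum(1 for i in range(len(x) - len(w) + 1) if x.startswith(w, i))

def expected_count(x, y, order=1):
    """Estimate the expected count of y in x under an order-`order` Markov model.

    Assumed estimator: product of counts of all (order+1)-length windows of y,
    divided by the product of counts of the interior order-length windows.
    """
    m = len(y)
    if m < order + 2:
        raise ValueError("y must have at least order + 2 symbols")
    numerator = 1.0
    for j in range(m - order):
        numerator *= count(x, y[j:j + order + 1])
    denominator = 1.0
    for j in range(1, m - order):
        denominator *= count(x, y[j:j + order])
    # A zero denominator implies a zero numerator, so the estimate is 0.
    return numerator / denominator if denominator else 0.0

def surprise(x, y, order=1):
    """Assumed score: observed count minus estimated expected count."""
    return count(x, y) - expected_count(x, y, order)

x = "cocoa"
print(expected_count(x, "coc"))   # f(co) * f(oc) / f(o) = 2 * 1 / 2 = 1.0
print(surprise(x, "coc"))         # 1 - 1.0 = 0.0
```

In TARZAN the counts would come from the suffix tree rather than from repeated scans of x; the scans here only keep the example short.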
Experiments • sensitivity • selectivity • comparison with TSA-tree and IMM
Experiment 1: sensitivity • Can the anomaly be found?
Experiment 2 • the power demand of a Dutch research facility • input: a full year of data; check whether the anomalous power demand on holidays is detected
Experiment 3: selectivity • How? • random walk data can contain any possible pattern • as the size goes to infinity, every pattern should be repeated • IMM and TSA-tree are not considered • IMM: the "self" set becomes saturated, so nothing will be surprising • TSA-tree: does not learn from experience
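Conclusion • pros • linear time and space • great sensitivity • no need to explicitly define surprising patterns • cons • the shape of the tree cannot be controlled, so it may be unbalanced • selectivity was not compared against other algorithms
Application • Networking: abnormal traffic, detecting whether an attack is under way • Biomedicine: detecting arrhythmia, observing abnormal brain-wave activity • Factory monitoring: whether power supply or conveyor speed is abnormal
Discussion • How quickly does it respond when detecting an anomaly? • Is it suitable for real-time detection? • What action is taken after an anomaly is detected... • Can the anomaly be traced back to its point of origin? • Can we tell what caused the anomaly?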
References • C. Shahabi, X. Tian, and W. Zhao. TSA-tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data. In Proc. 12th International Conference on Scientific and Statistical Database Management, 2000. • D. Dasgupta and S. Forrest. Novelty Detection in Time Series Data using Ideas from Immunology. In Proc. of the International Conference on Intelligent Systems, 1999.