Sensor data mining and forecasting Christos Faloutsos CMU christos@cs.cmu.edu
Outline
• Problem definition - motivation
• Linear forecasting - AR and AWSOM
• Coevolving series - MUSCLES
• Fractal forecasting - F4
• Other projects: graph modeling, outliers, etc.
C. Faloutsos
Problem definition
• Given: one or more sequences $x_1, x_2, \dots, x_t, \dots$ ($y_1, y_2, \dots, y_t, \dots$), …
• Find:
  • forecasts; patterns
  • clusters; outliers
Motivation - Applications
• Financial, sales, economic series
• Medical
  • ECGs, blood pressure, etc. monitoring
  • reactions to new drugs
  • elderly care
Motivation - Applications (cont’d)
• ‘Smart house’
  • sensors monitor temperature, humidity, air quality
  • video surveillance
Motivation - Applications (cont’d)
• civil/automobile infrastructure
  • bridge vibrations [Oppenheim+02]
  • road conditions / traffic monitoring
Stream Data: automobile traffic
[Figure: number of cars vs. time]
Motivation - Applications (cont’d)
• Weather, environment/anti-pollution
  • volcano monitoring
  • air/water pollutant monitoring
Stream Data: Sunspots
[Figure: number of sunspots per month vs. time]
Motivation - Applications (cont’d)
• Computer systems
  • ‘Active Disks’ (buffering, prefetching)
  • web servers (ditto)
  • network traffic monitoring
  • ...
Stream Data: Disk accesses
[Figure: number of bytes vs. time]
Settings & Applications • One or more sensors, collecting time-series data
Settings & Applications • Each sensor collects data ($x_1, x_2, \dots, x_t, \dots$)
Settings & Applications • Sensors ‘report’ to a central site
Settings & Applications • Problem #1: Finding patterns in a single time sequence
Settings & Applications • Problem #2: Finding patterns in many time sequences
Problem #1:
• Goal: given a signal (e.g., #packets over time)
• Find: patterns, periodicities, and/or compress
[Figure: count of lynx caught per year vs. year (other examples: packets per day; temperature per day)]
Problem #1’: Forecast
• Given $x_t, x_{t-1}, \dots$, forecast $x_{t+1}$
[Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
Problem #2:
• Given: a set of correlated time sequences
• Forecast ‘Sent(t)’
Differences from DSP/Stat
• Semi-infinite streams
  • we need on-line, ‘any-time’ algorithms
• Cannot afford human intervention
  • need automatic methods
• Sensors have limited memory / processing / transmitting power
  • need for (lossy) compression
Important observations
Patterns, rules, compression and forecasting are closely related:
• To do forecasting, we need to find patterns/rules
• Good rules help us compress
• To find outliers, we need to have forecasts (outlier = too far away from our forecast)
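The link between forecasts and outliers fits in a few lines of Python. This is a minimal sketch, not code from the talk: `find_outliers`, the naive `last_value` forecaster, the fixed `threshold`, and the toy series are all hypothetical, and any one-step forecaster could be plugged in instead.

```python
from typing import Callable, List, Sequence


def find_outliers(
    series: Sequence[float],
    forecast: Callable[[Sequence[float]], float],
    threshold: float,
) -> List[int]:
    """Indices t whose value is more than `threshold` away from the forecast
    made using only the values before t."""
    outliers = []
    for t in range(1, len(series)):
        predicted = forecast(series[:t])  # one-step-ahead forecast from the past only
        if abs(series[t] - predicted) > threshold:
            outliers.append(t)
    return outliers


def last_value(past: Sequence[float]) -> float:
    """Hypothetical naive forecaster: the next value equals the latest one."""
    return past[-1]


print(find_outliers([10, 11, 10, 12, 50, 12, 11], last_value, threshold=5.0))  # [4, 5]
```

With this naive forecaster both the jump (t = 4) and the return to normal (t = 5) are flagged, since each is far from the previous value; a better forecaster could reduce such false alarms.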
Pictorial outline of the talk
Outline
• Problem definition - motivation
• Linear forecasting
  • AR
  • AWSOM
• Coevolving series - MUSCLES
• Fractal forecasting - F4
• Other projects: graph modeling, outliers, etc.
Mini intro to A.R.
Forecasting: "Prediction is very difficult, especially about the future." - Niels Bohr http://www.hfac.uh.edu/MediaFutures/thoughts.html
Problem #1’: Forecast
• Example: given $x_{t-1}, x_{t-2}, \dots$, forecast $x_t$
[Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
Linear Regression: idea
[Figure: body height vs. body weight]
• express what we don’t know (= ‘dependent variable’)
• as a linear function of what we know (= ‘indep. variable(s)’)
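For a single independent variable, the least-squares line has a closed form. A minimal sketch in plain Python; `fit_line` and the three data points are hypothetical toy values, not the data in the figure.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept (one independent variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: body weight (independent) and body height (dependent).
weight = [20.0, 30.0, 40.0]
height = [50.0, 60.0, 70.0]
slope, intercept = fit_line(weight, height)
print(slope, intercept)           # 1.0 30.0
print(slope * 35.0 + intercept)   # predicted height for weight 35: 65.0
```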
Linear Auto Regression:
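Problem #1’: Forecast
[Figure: number of packets sent vs. time tick, with the next value marked ‘??’]
• Solution: try to express $x_t$ as a linear function of the past: $x_{t-1}, x_{t-2}, \dots$ (up to a window of $w$)
Formally:
$$x_t \approx a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_w x_{t-w} + \text{noise}$$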
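Fitting $x_t \approx a_1 x_{t-1} + \dots + a_w x_{t-w}$ amounts to ordinary least squares over all length-$w$ windows of the history. A minimal batch sketch with NumPy, assuming the whole history fits in memory; `fit_ar` and `forecast_next` are hypothetical helper names, and no intercept term is used, to match the model.

```python
import numpy as np


def fit_ar(x, w: int) -> np.ndarray:
    """Least-squares estimate of a_1..a_w in x_t ~ a_1 x_{t-1} + ... + a_w x_{t-w}."""
    x = np.asarray(x, dtype=float)
    # One row per target x_t, holding [x_{t-1}, x_{t-2}, ..., x_{t-w}].
    X = np.array([x[t - w:t][::-1] for t in range(w, len(x))])
    y = x[w:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a


def forecast_next(x, a: np.ndarray) -> float:
    """One-step-ahead forecast from the last w observed values."""
    w = len(a)
    recent = np.asarray(x, dtype=float)[-w:][::-1]  # [x_t, x_{t-1}, ..., x_{t-w+1}]
    return float(recent @ a)


history = [1, 2, 3, 4, 5, 6]      # toy series that satisfies x_t = 2 x_{t-1} - x_{t-2}
a = fit_ar(history, w=2)
print(a)                          # approximately [ 2. -1.]
print(forecast_next(history, a))  # approximately 7.0
```

This batch version refits from scratch on every call; for a semi-infinite stream an incremental estimator such as RLS avoids that.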
Linear Auto Regression:
[Figure: ‘lag-plot’ of number of packets sent at time t vs. at time t-1]
• lag $w = 1$
• Dependent variable = # of packets sent (S[t])
• Independent variable = # of packets sent (S[t-1])
More details:
• Q1: Can it work with window $w > 1$?
• A1: YES! (we’ll fit a hyper-plane, then!)
[Figure: 3-d plot with axes $x_t$, $x_{t-1}$, $x_{t-2}$]
Even more details
• Q2: Can we estimate the coefficients $\mathbf{a} = (a_1, \dots, a_w)$ incrementally?
• A2: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Chen+94], or [Yi+00], for details)
• Q3: Can we ‘down-weight’ older samples?
• A3: Yes (RLS does that easily!)
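A minimal sketch of RLS with an exponential forgetting factor, in Python with NumPy. The class name, the forgetting factor `lam` (values below 1 down-weight older samples, which answers Q3), and the initialization `P = delta * I` are standard textbook choices assumed here, not details taken from [Chen+94] or [Yi+00]. Each update costs O(w²), independent of how many samples have been seen.

```python
import numpy as np


class RecursiveLeastSquares:
    """Recursive least squares for x_t ~ a_1 x_{t-1} + ... + a_w x_{t-w}.

    lam in (0, 1] is the forgetting factor: a sample seen k steps ago gets
    weight lam**k, so lam < 1 down-weights older samples (lam = 1: no forgetting).
    """

    def __init__(self, w: int, lam: float = 1.0, delta: float = 1000.0):
        self.a = np.zeros(w)          # current coefficient estimate
        self.P = delta * np.eye(w)    # approx. inverse of the weighted X^T X (large initial value)
        self.lam = lam

    def update(self, phi, y: float) -> None:
        """Absorb one sample: regressor phi = [x_{t-1}, ..., x_{t-w}], target y = x_t."""
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = y - phi @ self.a      # prediction error before this update
        self.a = self.a + gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam


# Toy stream satisfying x_t = 2 x_{t-1} - x_{t-2}; coefficients are learned one sample at a time.
x = np.arange(1.0, 21.0)
rls = RecursiveLeastSquares(w=2, lam=0.99)
for t in range(2, len(x)):
    rls.update([x[t - 1], x[t - 2]], x[t])
print(rls.a)  # close to [2, -1]
```

The large initial `P` acts like a weak prior pulling the coefficients toward zero; its influence fades as samples arrive, which is why the estimate is close to, rather than exactly, [2, -1].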
How to choose ‘w’?
• goal: capture arbitrary periodicities
• with NO human intervention
• on a semi-infinite stream
Outline
• Problem definition - motivation
• Linear forecasting
  • AR
  • AWSOM
• Coevolving series - MUSCLES
• Fractal forecasting - F4
• Other projects: graph modeling, outliers, etc.
Problem:
• in a train of spikes (128 ticks apart)
• any AR with window $w < 128$ will fail
What to do, then?
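A quick numerical check of why the window matters: with spikes 128 ticks apart and $w < 128$, the $w$ values just before each spike are all zero, so any linear combination of them, whatever the coefficients, predicts 0 where the true value is 1. The sketch below uses arbitrary random coefficients to make that point; the spike height of 1 and the four periods are illustrative choices.

```python
import numpy as np

period = 128
x = np.zeros(4 * period)
x[::period] = 1.0                  # spikes at t = 0, 128, 256, 384

w = period - 1                     # any window shorter than the spike spacing
rng = np.random.default_rng(0)
a = rng.normal(size=w)             # arbitrary AR coefficients a_1..a_w (illustrative only)

predictions = []
for t in range(period, len(x), period):    # each spike after the first
    window = x[t - w:t][::-1]              # [x_{t-1}, ..., x_{t-w}]: all zeros
    predictions.append(float(window @ a))

print(predictions)                 # every prediction is 0, while each true value is 1
```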
Answer (intuition)
• Do a Wavelet transform (~ short window DFT)
• look for patterns in every frequency
Intuition
• Why NOT use the short window Fourier transform (SWFT)?
• A: how short should the window be?
[Figure: frequency vs. time, with a fixed window width w’]
main idea: variable-length window!
[Figure: wavelets, frequency (f) vs. time (t)]
Advantages of Wavelets
• Better compression (better RMSE with same number of coefficients - used in JPEG-2000)
• fast to compute (usually: O(n)!)
• very good for ‘spikes’
• mammalian eye and ear: Gabor wavelets
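To make the O(n) and ‘spikes’ claims concrete, here is a sketch of the simplest wavelet transform, the orthonormal Haar DWT, in plain Python. The talk does not say which wavelet AWSOM uses; Haar is chosen here only for brevity, and `haar_dwt`/`haar_idwt` are hypothetical names. Each level halves the data, so the total work is n + n/2 + n/4 + … < 2n.

```python
import math
from typing import List


def haar_dwt(x: List[float]) -> List[List[float]]:
    """Orthonormal Haar DWT; len(x) must be a power of 2.

    Returns [W_1, W_2, ..., W_L, V_L]: detail coefficients per level
    (finest level first), then the final smooth (scaled average) coefficient.
    Each level halves the data, so total work is n + n/2 + ... < 2n, i.e. O(n).
    """
    levels = []
    smooth = list(x)
    while len(smooth) > 1:
        pairs = range(0, len(smooth), 2)
        detail = [(smooth[i] - smooth[i + 1]) / math.sqrt(2) for i in pairs]
        smooth = [(smooth[i] + smooth[i + 1]) / math.sqrt(2) for i in pairs]
        levels.append(detail)
    levels.append(smooth)
    return levels


def haar_idwt(levels: List[List[float]]) -> List[float]:
    """Inverse of haar_dwt: rebuild each finer level from (smooth, detail) pairs."""
    smooth = list(levels[-1])
    for detail in reversed(levels[:-1]):
        finer = []
        for s, d in zip(smooth, detail):
            finer += [(s + d) / math.sqrt(2), (s - d) / math.sqrt(2)]
        smooth = finer
    return smooth


spike = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0]
coeffs = haar_dwt(spike)
nonzero = sum(1 for level in coeffs for c in level if abs(c) > 1e-12)
print(nonzero)  # 4 of 8 coefficients are non-zero
print(all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(haar_idwt(coeffs), spike)))  # True
```

A single spike touches only one coefficient per level plus the overall average, so most coefficients are zero. Because the transform is orthonormal, dropping small coefficients loses exactly their energy, which is what makes wavelets attractive for lossy compression.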
Wavelets - intuition:
• Q: baritone/silence/soprano - DWT?
[Figure: signal value vs. time; frequency (f) vs. time (t)]
Wavelets - intuition:
• Q: baritone/soprano - DWT?
[Figure: signal value vs. time; frequency (f) vs. time (t)]
AWSOM
[Figure: the signal $x_t$ and its wavelet coefficients on frequency vs. time axes: $W_{1,1}, \dots, W_{1,4}$; $W_{2,1}, W_{2,2}$; $W_{3,1}$; $V_{4,1}$]
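AWSOM - idea
[Figure: coefficients $W_{l,t-2}, W_{l,t-1}, W_{l,t}$ at level $l$ and $W_{l',t'-2}, W_{l',t'-1}, W_{l',t'}$ at level $l'$]
$$W_{l,t} \approx \beta_{l,1} W_{l,t-1} + \beta_{l,2} W_{l,t-2} + \dots$$
$$W_{l',t'} \approx \beta_{l',1} W_{l',t'-1} + \beta_{l',2} W_{l',t'-2} + \dots$$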
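A minimal sketch of the per-level linear models $W_{l,t} \approx \beta_{l,1} W_{l,t-1} + \beta_{l,2} W_{l,t-2} + \dots$, assuming the wavelet coefficients are already grouped by level: one set of coefficients per level, fit by least squares. `fit_per_level_ar`, the fixed `order`, the symbol $\beta$, and the toy coefficients are hypothetical; the sketch covers only same-level terms and omits AWSOM's incremental updating and feature selection.

```python
from typing import Dict, Sequence

import numpy as np


def fit_per_level_ar(coeffs: Dict[int, Sequence[float]], order: int) -> Dict[int, np.ndarray]:
    """For each level l, least-squares fit of
    W[l, t] ~ beta[l, 1] W[l, t-1] + ... + beta[l, order] W[l, t-order]."""
    models = {}
    for level, series in coeffs.items():
        c = np.asarray(series, dtype=float)
        if len(c) <= order:
            continue  # too few coefficients at this (coarse) level to fit anything
        # One row per target W[l, t], holding [W[l, t-1], ..., W[l, t-order]].
        X = np.array([c[t - order:t][::-1] for t in range(order, len(c))])
        beta, *_ = np.linalg.lstsq(X, c[order:], rcond=None)
        models[level] = beta
    return models


# Hypothetical coefficients: level 1 alternates in sign, level 2 decays by half.
toy = {1: [1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 2: [3.0, 1.5, 0.75, 0.375], 3: [2.0]}
models = fit_per_level_ar(toy, order=1)
print(models)  # level 1 -> about [-1.], level 2 -> about [0.5]; level 3 is skipped
```

Since each level has its own time scale, a short window at a coarse level spans a long stretch of the original signal, which is how this kind of model can pick up long periodicities without a huge $w$.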
More details…
• Update of wavelet coefficients (incremental)
• Update of linear models (incremental; RLS)
• Feature selection (single-pass)
  • Not all correlations are significant
  • Throw away the insignificant ones (“noise”)
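The incremental update of wavelet coefficients can be done with one pending value per level: a new sample is paired with the pending value at level 0, the resulting detail coefficient is emitted, and the smooth value moves up one level, possibly completing a pair there too. A sketch in plain Python, assuming the orthonormal Haar wavelet; `StreamingHaar` is a hypothetical class, not the AWSOM implementation.

```python
import math
from typing import Dict, List, Optional


class StreamingHaar:
    """Incremental orthonormal Haar transform of a semi-infinite stream.

    Keeps at most one unpaired value per level, so memory is O(log t),
    and each new sample costs O(1) amortized work.
    """

    def __init__(self) -> None:
        self.pending: List[Optional[float]] = []   # pending[k]: unpaired value at level k (0 = raw samples)
        self.details: Dict[int, List[float]] = {}  # level -> [W[level, 1], W[level, 2], ...]

    def push(self, value: float) -> None:
        level = 0
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            partner = self.pending[level]
            if partner is None:
                self.pending[level] = value  # wait for the next value at this level
                return
            self.pending[level] = None
            # A completed pair yields one detail coefficient at the next level up...
            self.details.setdefault(level + 1, []).append((partner - value) / math.sqrt(2))
            # ...and a smooth value that is paired again one level higher.
            value = (partner + value) / math.sqrt(2)
            level += 1


stream = StreamingHaar()
for sample in [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0]:
    stream.push(sample)
print(stream.details)  # same detail coefficients as a batch Haar DWT of the 8 samples
```

Each detail coefficient is emitted as soon as its pair is complete, so new coefficients can be fed straight into per-level linear models (e.g., updated with RLS) without revisiting old data.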
Results - Synthetic data
[Figure panels: AWSOM, AR, Seasonal AR]
• Triangle pulse
• Mix (sine + square)
• AR captures wrong trend (or none)
• Seasonal AR estimation fails
Results - Real data
• Automobile traffic
  • Daily periodicity
  • Bursty “noise” at smaller scales
• AR fails to capture any trend
• Seasonal AR estimation fails