An overview of F4, a method for large-scale, automated time series forecasting using fractals. The fractal dimension of the lag plot is used to set the forecasting parameters automatically: the lag length L(opt) and the number of nearest neighbors k(opt). This takes time linear in the length of the series and aims to minimize the forecasting error (NMSE). Motivating applications include financial data, physiological data, environmental monitoring, and sensor networks. On the Logistic Parabola, LORENZ, and LASER datasets, the automatically chosen lag matches or comes close to the lag with the lowest NMSE.
F4: Large-Scale Automated Forecasting Using Fractals
Deepayan Chakrabarti, Christos Faloutsos
CIKM 2002
Outline
• Introduction/Motivation
• Survey and Lag Plots
• Exact Problem Formulation
• Proposed Method
• Fractal Dimensions Background
• Our method
• Results
• Conclusions
General Problem Definition
Given a time series $\{x_t\}$, predict its future course, that is, $x_{t+1}, x_{t+2}, \ldots$
[Figure: a time series, value vs. time]
Motivation
Traditional fields:
• Financial data analysis
• Physiological data, elderly care
• Weather, environmental studies
Sensor networks (MEMS, “SmartDust”):
• Long / “infinite” series
• No human intervention: “black box”
How to forecast?
• ARIMA: but it assumes linearity
• Neural networks: but a large number of parameters and long training times [Wan/1993, Mozer/1993]
• Hidden Markov Models: but $O(N^2)$ in the number of nodes N, and fixing N is also a problem [Ge+/2000]
• Lag plots
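Lag Plots
[Figure: lag plot of $x_t$ vs. $x_{t-1}$, showing a new point and its 4 nearest neighbors (4-NN)]
Find the nearest neighbors of the new point and interpolate their successors to get the final prediction.
Open questions:
• Q0: Interpolation method?
• Q1: Lag L = ?
• Q2: Number of neighbors k = ?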
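A minimal sketch of the lag-plot forecast, assuming the simplest answer to Q0: average the successors of the k nearest delay vectors (the figure's 4-NN corresponds to k = 4). The function names `delay_vectors` and `knn_predict_next` are hypothetical. Plain averaging stands in for whatever interpolation is finally chosen. Here L and k are passed in by hand, which is exactly what Q1 and Q2 aim to avoid.

```python
import numpy as np

def delay_vectors(x, L):
    """Return (X, y): row i of X is [x_{t-1}, ..., x_{t-L}] and y[i] is x_t."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column j holds x_{t-1-j} for t = L, ..., n-1
    X = np.column_stack([x[L - 1 - j : n - 1 - j] for j in range(L)])
    y = x[L:]
    return X, y

def knn_predict_next(x, L, k):
    """One-step forecast: average the successors of the k nearest delay vectors."""
    X, y = delay_vectors(x, L)
    # Latest delay vector [x_{n-1}, ..., x_{n-L}], in the same order as the rows of X
    query = np.asarray(x, dtype=float)[::-1][:L]
    dist = np.linalg.norm(X - query, axis=1)
    nearest = np.argpartition(dist, k - 1)[:k]  # indices of the k smallest distances
    return float(y[nearest].mean())

# A period-3 series: after ..., 1, 2 the next value is 0
series = np.tile([0.0, 1.0, 2.0], 20)
print(knn_predict_next(series, L=2, k=4))  # prints 0.0
```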
Q0: Interpolation
• Using SVD (state of the art) [Sauer/1993]
[Figure: lag plot of $x_t$ vs. $x_{t-1}$]
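Here is one plausible reading of "interpolation using SVD", sketched under assumptions. The sketch fits a local linear model to the k neighbors' delay vectors by least squares. It solves that fit with a truncated SVD, dropping near-zero singular values, and evaluates the result at the query vector. This is not necessarily the exact procedure of [Sauer/1993]. The function `local_linear_svd` and the threshold `rel_tol` are hypothetical.

```python
import numpy as np

def local_linear_svd(X_nb, y_nb, query, rel_tol=1e-6):
    """Fit y ~= mean(y) + (v - mean(v)) @ w on the neighbors and evaluate it at `query`.

    w is the least-squares solution computed with a truncated SVD (pseudo-inverse):
    singular values below rel_tol * (largest singular value) are dropped.
    """
    X_nb = np.asarray(X_nb, dtype=float)
    y_nb = np.asarray(y_nb, dtype=float)
    query = np.asarray(query, dtype=float)
    mu_x, mu_y = X_nb.mean(axis=0), y_nb.mean()
    A = X_nb - mu_x                              # centered neighbor delay vectors, shape (k, L)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = np.zeros(s.shape, dtype=bool)
    if s.size and s[0] > 0:
        keep = s > rel_tol * s[0]                # s is sorted in descending order
    coeffs = (U[:, keep].T @ (y_nb - mu_y)) / s[keep]
    w = Vt[keep].T @ coeffs                      # weights in delay-vector space, shape (L,)
    return float(mu_y + (query - mu_x) @ w)

# Neighbors that satisfy x_t = 2*x_{t-1} - x_{t-2} + 0.5 exactly
X_nb = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y_nb = 2 * X_nb[:, 0] - X_nb[:, 1] + 0.5
print(local_linear_svd(X_nb, y_nb, [0.3, -0.2]))  # close to 1.3
```

Truncating small singular values keeps the fit stable when the neighbors lie close to a lower-dimensional surface, which is the usual situation in a lag plot. If all neighbors coincide, the sketch falls back to the mean of their successors.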
Why Lag Plots?
• Based on Takens’ Theorem [Takens/1981], which says that delay vectors can be used for predictive purposes
Extra: Inside Theory
Example: the Lotka-Volterra equations
$\Delta H/\Delta t = rH - aHP$
$\Delta P/\Delta t = bHP - mP$
where H is the density of prey and P is the density of predators.
Suppose only H(t) is observed. The internal state is (H, P).
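Why can the prey series alone carry the hidden state? Treat the equations as a forward-Euler step of size Δt; this is an assumption, since the slide only writes ΔH/Δt. Solving the prey equation for P gives $P_t = \left(r - \frac{H_{t+1} - H_t}{\Delta t \, H_t}\right) / a$. So the delay vector $(H_t, H_{t+1})$ determines $P_t$, and with it the next step of both populations. The sketch below checks this numerically. The parameter values and function names are hypothetical.

```python
import numpy as np

def simulate_lotka_volterra(H0, P0, r, a, b, m, dt, steps):
    """Forward-Euler steps of dH/dt = rH - aHP, dP/dt = bHP - mP."""
    H = np.empty(steps)
    P = np.empty(steps)
    H[0], P[0] = H0, P0
    for t in range(steps - 1):
        H[t + 1] = H[t] + dt * (r * H[t] - a * H[t] * P[t])
        P[t + 1] = P[t] + dt * (b * H[t] * P[t] - m * P[t])
    return H, P

def predators_from_prey(H, r, a, dt):
    """Recover P_t from the observed pair (H_t, H_{t+1}) by inverting the prey update."""
    H = np.asarray(H, dtype=float)
    return (r - (H[1:] - H[:-1]) / (dt * H[:-1])) / a

# Hypothetical parameter values, chosen only for illustration
H, P = simulate_lotka_volterra(H0=40.0, P0=9.0, r=1.0, a=0.1, b=0.02, m=0.5, dt=0.01, steps=2000)
P_hat = predators_from_prey(H, r=1.0, a=0.1, dt=0.01)
print(np.max(np.abs(P_hat - P[:-1])))  # tiny: only floating-point round-off remains
```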
Problem at Hand
• Given $\{x_1, x_2, \ldots, x_N\}$
• Automatically set the parameters
  - L(opt) (from Q1)
  - k(opt) (from Q2)
• in time linear in N
• to minimize the Normalized Mean Squared Error (NMSE) of forecasting
Previous Work/Alternatives
• Manual setting: BUT infeasible [Sauer/1992]
• Cross-validation: BUT slow; leave-one-out cross-validation costs ~$O(N^2 \log N)$ or more
• “False Nearest Neighbors”: BUT unstable [Abarbanel/1996]
Intuition
Intrinsic dimensionality ≈ degrees of freedom ≈ information about $x_t$ given $x_{t-1}$
Example, the Logistic Parabola: $x_t = a x_{t-1}(1 - x_{t-1}) + \text{noise}$
[Figure: the series $x_t$ over time, and its lag plot of $x_t$ vs. $x_{t-1}$]
Intuition
[Figure: lag plots of $x_t$, $x_{t-1}$, and $x_{t-2}$ in 2-D and 3-D]
Intuition
• To find L(opt), go further back in time (i.e., consider $x_{t-2}$, $x_{t-3}$, and so on)
• until no more information is gained about $x_t$
Fractal Dimensions
• FD = intrinsic dimensionality
[Figure: a point set with “embedding” dimensionality 3 and intrinsic dimensionality 1]
Fractal Dimensions
FD = intrinsic dimensionality [Belussi/1995]
[Figure: log(# pairs within distance r) vs. log(r); the slope of the line is the FD]
Points to note:
• FD can be a non-integer
• There are fast methods to compute it
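Below is a brute-force sketch of the correlation fractal dimension behind the log(# pairs) vs. log(r) plot. It counts the pairs of points within distance r for several radii and fits the slope on log-log axes. It is $O(N^2)$, so it is not one of the fast methods the slide refers to. The radii and test point sets are assumptions chosen for illustration. The line in 3-D mirrors the earlier example of embedding dimensionality 3 and intrinsic dimensionality 1.

```python
import numpy as np

def correlation_fd(points, radii):
    """Slope of log(#pairs within distance r) versus log(r).

    Brute force: all N(N-1)/2 pairwise distances, so O(N^2) time and memory.
    """
    P = np.asarray(points, dtype=float)
    if P.ndim == 1:
        P = P[:, None]
    sq = (P ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)   # squared pairwise distances
    iu = np.triu_indices(len(P), k=1)                  # each unordered pair once
    d = np.sqrt(np.clip(d2[iu], 0.0, None))
    radii = np.asarray(radii, dtype=float)
    counts = np.array([np.count_nonzero(d <= r) for r in radii], dtype=float)
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

# A line embedded in 3-D: embedding dimensionality 3, intrinsic dimensionality 1
t = np.linspace(0.0, 1.0, 400)
line = np.column_stack([t, 2 * t, -t])
# Points scattered over a unit square: intrinsic dimensionality 2
square = np.random.default_rng(0).random((800, 2))
radii = np.geomspace(0.02, 0.2, 8)
print(correlation_fd(line, radii), correlation_fd(square, radii))  # roughly 1 and 2
```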
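Q1: Finding L(opt)
• Use fractal dimensions to find the optimal lag length L(opt)
[Figure: fractal dimension vs. lag L; the curve flattens out at L(opt), where adding a lag raises the FD by less than epsilon; f is the FD at that point]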
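Here is a sketch of Q1 under stated assumptions. For L = 1, 2, ..., it computes the FD of the points $(x_t, x_{t-1}, \ldots, x_{t-L})$. It stops at the first L where adding one more lag raises the FD by less than a threshold `eps`, and f is then the FD at L(opt). The absolute threshold (0.3), the percentile-based choice of radii, and all function names are assumptions, not the paper's exact settings. The brute-force FD estimate here is also quadratic in N rather than linear.

```python
import numpy as np

def fractal_dim(P, q_lo=0.5, q_hi=5.0, n_radii=8):
    """Correlation FD of the rows of P: slope of log(#pairs within r) vs log(r).

    The radii span the q_lo..q_hi percentiles of the pairwise distances (an assumed heuristic).
    """
    sq = (P ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)        # squared pairwise distances
    d = np.sqrt(np.clip(d2[np.triu_indices(len(P), k=1)], 0.0, None))
    d = d[d > 0]
    radii = np.geomspace(np.percentile(d, q_lo), np.percentile(d, q_hi), n_radii)
    counts = np.array([np.count_nonzero(d <= r) for r in radii], dtype=float)
    return float(np.polyfit(np.log(radii), np.log(counts), 1)[0])

def lag_embedding(x, L):
    """Rows are (x_t, x_{t-1}, ..., x_{t-L}) for t = L, ..., n-1."""
    n = len(x)
    return np.column_stack([x[L - j : n - j] for j in range(L + 1)])

def find_lopt(x, max_lag=10, eps=0.3):
    """Smallest L whose FD grows by less than eps when one more lag is added.

    Returns (L_opt, f, fds), where f is the FD at L_opt and fds[i] is the FD at lag i + 1.
    """
    x = np.asarray(x, dtype=float)
    fds = [fractal_dim(lag_embedding(x, L)) for L in range(1, max_lag + 2)]
    for L in range(1, max_lag + 1):
        if fds[L] - fds[L - 1] < eps:          # FD(lag L+1) - FD(lag L)
            return L, fds[L - 1], fds
    return max_lag, fds[max_lag - 1], fds      # never flattened within max_lag

# Noise-free logistic parabola with a = 3.8 (hypothetical setting); drop the first 100 steps
x = np.empty(1200)
x[0] = 0.3
for t in range(1, len(x)):
    x[t] = 3.8 * x[t - 1] * (1.0 - x[t - 1])
L_opt, f, _ = find_lopt(x[100:], max_lag=5)
print(L_opt, round(f, 2))
```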
Q2: Finding k(opt)
• Conjecture: k(opt) ~ O(f), where f is the fractal dimension at L(opt)
• We choose k(opt) = 2f + 1
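A tiny sketch of the k(opt) rule follows. The slide does not say how a non-integer f becomes a neighbor count, so this sketch assumes rounding to the nearest integer, with a floor of 1. `k_opt` is a hypothetical name.

```python
def k_opt(f):
    """k(opt) = 2f + 1; rounding a non-integer f to the nearest integer is an assumption."""
    return max(1, round(2 * f + 1))

print(k_opt(1.0))   # a curve (f = 1) gives k(opt) = 3
print(k_opt(2.26))  # 2 * 2.26 + 1 = 5.52, rounded to 6
```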
Datasets
• Logistic Parabola: $x_t = a x_{t-1}(1 - x_{t-1}) + \text{noise}$; models the population of flies [R. May/1976]
• LORENZ: models convection currents in the air
• LASER: fluctuations in a laser over time (from the Santa Fe Time Series Competition, 1992)
[Figure: each series, value vs. time]
Error: $\mathrm{NMSE} = \frac{1}{N\sigma^2}\sum_{t}(\text{predicted}_t - \text{true}_t)^2$, where $\sigma^2$ is the variance of the true series and the sum runs over the $N$ forecast points
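The sketch below generates the Logistic Parabola series and scores forecasts with NMSE. The parameter a = 3.8, the Gaussian form of the noise, and the clipping to [0, 1] are assumptions, since the slide leaves them unspecified. σ² is taken as the variance of the true values and the squared errors are averaged, so a forecast that always predicts the series mean scores 1.

```python
import numpy as np

def logistic_parabola(n, a=3.8, x0=0.3, noise=0.0, seed=0):
    """x_t = a * x_{t-1} * (1 - x_{t-1}) + noise, with Gaussian noise of std `noise` (assumed)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = a * x[t - 1] * (1.0 - x[t - 1]) + noise * rng.standard_normal()
        x[t] = min(max(x[t], 0.0), 1.0)   # keep the state in [0, 1] so noise cannot make it diverge
    return x

def nmse(predicted, true):
    """Mean squared error divided by the variance of the true values."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((predicted - true) ** 2) / np.var(true))

x = logistic_parabola(500, noise=0.001)
print(nmse(x, x))                          # 0.0: a perfect forecast
print(nmse(np.full_like(x, x.mean()), x))  # 1.0 up to round-off: always predicting the mean
```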
Logistic Parabola
[Figure: the series (value vs. timesteps) and its FD vs. lag plot]
• The FD vs. L plot flattens out
• L(opt) = 1
Logistic Parabola
[Figure: value vs. timesteps, with our prediction starting from the marked point]
Logistic Parabola
[Figure: comparison of the prediction to the correct values, value vs. timesteps]
Logistic Parabola
[Figure: FD and NMSE vs. lag]
Our L(opt) = 1, which exactly minimizes NMSE
LORENZ
[Figure: the series (value vs. timesteps) and its FD vs. lag plot]
• L(opt) = 5
LORENZ
[Figure: value vs. timesteps, with our prediction starting from the marked point]
LORENZ
[Figure: comparison of the prediction to the correct values, value vs. timesteps]
LORENZ
[Figure: FD and NMSE vs. lag]
L(opt) = 5; NMSE is also optimal at lag 5
Laser
[Figure: the series (value vs. timesteps) and its FD vs. lag plot]
• L(opt) = 7
Laser
[Figure: value vs. timesteps, with our prediction starting from the marked point]
Laser
[Figure: comparison of the prediction to the correct values, value vs. timesteps]
Laser
[Figure: FD and NMSE vs. lag]
L(opt) = 7; the corresponding NMSE is close to optimal
Speed and Scalability
• Preprocessing time is linear in N
• It is proportional to the time taken to calculate the FD
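For the linear-time claim, a common way to estimate the correlation FD quickly is box counting. The method overlays grids of several cell sizes and counts the points in each occupied cell with a hash table. It then fits the slope of log(Σ c_i(c_i - 1)) against log(cell size). Each grid costs a single pass over the N points. This is a sketch under assumptions, not necessarily the authors' implementation. The cell sizes and the function name are hypothetical.

```python
import numpy as np
from collections import Counter

def box_counting_fd(points, sides):
    """Correlation FD from grid-cell occupancies: slope of log(sum_i c_i(c_i - 1)) vs log(side).

    Each grid side costs one hashing pass over the N points, so the total cost is
    linear in N for a fixed number of sides.
    """
    P = np.asarray(points, dtype=float)
    if P.ndim == 1:
        P = P[:, None]
    P = P - P.min(axis=0)                                 # grid origin at the bounding-box corner
    pair_counts = []
    for side in sides:
        cells = np.floor(P / side).astype(np.int64)
        occupancy = Counter(map(tuple, cells.tolist()))   # points per non-empty cell
        # c * (c - 1): ordered pairs of distinct points that share a cell
        pair_counts.append(sum(c * (c - 1) for c in occupancy.values()))
    slope, _intercept = np.polyfit(np.log(sides), np.log(pair_counts), 1)
    return float(slope)

sides = [2.0 ** -k for k in range(3, 9)]                  # 1/8 down to 1/256
t = np.linspace(0.0, 1.0, 5000)
print(box_counting_fd(np.column_stack([t, 2 * t, -t]), sides))  # close to 1 (a line)
```

Counting c(c - 1) rather than c² leaves out each point's pairing with itself. Otherwise those self-pairs would bias the slope downward at small cell sizes, where most cells hold only a point or two.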
Conclusions
Our method:
• Automatically sets the parameters
  - L(opt) (answers Q1)
  - k(opt) (answers Q2)
• In time linear in N
Conclusions
• Black-box, non-linear time series forecasting
• Fractal dimensions give a fast, automated method to set all parameters
• So, given any time series, we can automatically build a prediction system
• Useful in a sensor network setting
Extra: Snapshot
http://snapdragon.cald.cs.cmu.edu/TSP
Extra: Future Work
• Feature selection
• Multi-sequence prediction
Extra: Discussion, Some Other Problems
How to forecast? Given:
• $x_1, x_2, \ldots, x_N$
• L(opt)
• k(opt)
How can we find the k(opt) nearest neighbors quickly?
Extra: Motivation
Forecasting also allows us to:
• Find outliers: anything that doesn’t match our prediction!
• Find patterns: if different circumstances lead to similar predictions, they may be related.
Extra: Motivation (Examples)
Traditional:
• EEGs: patterns of electromagnetic impulses in the brain
• Intensity variations of white dwarf stars
• Highway usage over time
Sensors:
• “Active Disks” for forecasting/prefetching/buffering
• “Smart House” sensors monitor the situation in a house
• Volcano monitoring
Extra: General Method
• Store all the delay vectors $(x_{t-1}, \ldots, x_{t-L(opt)})$ and the corresponding value to predict, $x_t$
• Find the latest delay vector (the most recent L(opt) values)
• Find its nearest neighbors
• Interpolate their successors
[Figure: lag plot of $x_t$ vs. $x_{t-1}$]
Open parameters: L(opt) = ? k(opt) = ?
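A minimal sketch of the general method that also produces multi-step forecasts by feeding each prediction back in as the newest value. Iterating one-step predictions this way is an assumption about how the "our prediction from here" plots are generated. Plain averaging stands in for the SVD-based interpolation (Q0), and `forecast` is a hypothetical name.

```python
import numpy as np

def forecast(history, L, k, horizon):
    """Multi-step forecast: predict one step, feed it back as the newest value, repeat."""
    x = np.asarray(history, dtype=float)
    n = len(x)
    # Store every delay vector [x_{t-1}, ..., x_{t-L}] together with its successor x_t
    X = np.column_stack([x[L - 1 - j : n - 1 - j] for j in range(L)])
    y = x[L:]
    latest = x[::-1][:L].copy()                 # latest delay vector, most recent value first
    preds = []
    for _ in range(horizon):
        dist = np.linalg.norm(X - latest, axis=1)
        nearest = np.argpartition(dist, k - 1)[:k]
        nxt = y[nearest].mean()                 # interpolate by averaging the successors
        preds.append(nxt)
        latest = np.concatenate(([nxt], latest[:-1]))   # shift the window by one step
    return np.array(preds)

print(forecast(np.tile([0.0, 1.0, 2.0, 3.0], 30), L=3, k=3, horizon=8))
# [0. 1. 2. 3. 0. 1. 2. 3.]
```

Feeding predictions back lets errors accumulate over the horizon. Comparing the forecast to the correct values over several steps, as the result slides do, shows how well the chosen L and k hold up.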
Extra: Intuition
[Figure: fractal dimension vs. lag]
• The FD vs. L plot does flatten out
• L(opt) = 1