
Time Series Forecasting With Feed-Forward Neural Networks: Guidelines And Limitations



  1. Time Series Forecasting With Feed-Forward Neural Networks: Guidelines And Limitations Eric Plummer Computer Science Department University of Wyoming March 13, 2014

  2. Topics • Thesis Goals • Time Series Forecasting • Neural Networks • K-Nearest-Neighbor • Test-Bed Application • Empirical Evaluation • Data Preprocessing • Contributions • Future Work • Conclusion • Demonstration Eric Plummer

  3. Thesis Goals • Compare neural networks and k-nearest-neighbor for time series forecasting • Analyze the response of various configurations to data series with specific characteristics • Identify when neural networks and k-nearest-neighbor are inadequate • Evaluate the effectiveness of data preprocessing Eric Plummer

  4. Time Series Forecasting – Description • What is it? • Given an existing data series, observe or model the data series to make accurate forecasts • Example data series • Financial (e.g., stocks, rates) • Physically observed (e.g., weather, sunspots) • Mathematical (e.g., Fibonacci sequence) Eric Plummer

  5. Time Series Forecasting – Difficulties • Why is it difficult? • Limited quantity of data: observed data series are sometimes too short to partition • Noise: erroneous data points or an obscuring component; remedied with a moving average • Nonstationarity: the fundamentals change over time; a nonstationary mean yields an “ascending” data series, remedied with first-difference preprocessing • Forecasting method selection: statistics vs. artificial intelligence Eric Plummer

  6. Time Series Forecasting – Importance • Why is it important? • Preventing undesirable events by forecasting the event, identifying the circumstances preceding the event, and taking corrective action so the event can be avoided (e.g., inflationary economic period) • Forecasting undesirable, yet unavoidable, events to preemptively lessen their impact (e.g., solar maximum w/ sunspots) • Profiting from forecasting (e.g., financial markets) Eric Plummer

  7. Neural Networks – Background • Loosely based on the human brain’s neuron structure • Timeline • 1940s – McCulloch and Pitts – proposed neuron models in the form of binary threshold devices and stochastic algorithms • 1950s & 1960s – Rosenblatt – class of learning machines called perceptrons • Late 1960s – Minsky and Papert – discouraging analysis of perceptrons (limited to linearly separable classes) • 1980s – Rumelhart, Hinton, and Williams – generalized delta rule for learning by back-propagation for training multilayer perceptrons • Present – many new training algorithms and architectures, but nothing “revolutionary” Eric Plummer

  8. Neural Networks – Architecture • A feed-forward neural network can have any number of: • Layers • Units per layer • Network inputs • Network outputs • [Diagram: a network with hidden layers (A, B) and an output layer (C)] Eric Plummer

  9. Neural Networks – Units • A unit has: • Connections • Weights • Bias • Activation function • Weights and bias are randomly initialized before training • A unit’s input is the sum of the products of each connection value and its associated weight, plus the bias • The input is then fed into the unit’s activation function, and the unit’s output is the activation function’s output • Hidden layers: sigmoid • Output layer: linear Eric Plummer
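
A unit's computation is small enough to sketch directly. The following is a minimal illustrative version in Python with numpy (FORECASTER itself is written in Visual C++; the function name and signature here are hypothetical):

```python
import numpy as np

def unit_output(values, weights, bias, hidden_layer=True):
    """One unit: sum of connection values times weights, plus the bias,
    passed through the unit's activation function."""
    net_input = np.dot(values, weights) + bias
    if hidden_layer:
        return 1.0 / (1.0 + np.exp(-net_input))  # sigmoid (hidden layers)
    return net_input                              # linear (output layer)
```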

  10. Neural Networks – Training • Partition data series into: • Training set • Validation set (optional) • Test set (optional) • Typically, the training procedure is: • Perform backpropagation training with training set • After n epochs, compute total squared error on training set and validation set • If validation error consistently rises while training error falls, stop training • Overfitting: the training set has been learned too well • Generalization: the network forecasts accurately on inputs not in the training and validation sets Eric Plummer
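
In loop form, the procedure above might look like the following sketch (Python; `run_backprop_epochs` and `total_squared_error` are hypothetical helpers standing in for the thesis's implementation):

```python
def train(net, training_set, validation_set, n=10, max_epochs=10000):
    """Train until validation error rises while training error falls."""
    prev_train = prev_val = float("inf")
    for _ in range(max_epochs // n):
        run_backprop_epochs(net, training_set, epochs=n)    # hypothetical helper
        train_err = total_squared_error(net, training_set)  # hypothetical helper
        val_err = total_squared_error(net, validation_set)  # hypothetical helper
        if val_err > prev_val and train_err < prev_train:
            # Overfitting has begun; a fuller version would require this
            # to hold for several consecutive checks before stopping.
            break
        prev_train, prev_val = train_err, val_err
```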

  11. Neural Networks – Training • Backpropagation training: first, examples in the form of <input, output> pairs are extracted from the data series; then the network is trained with backpropagation on the examples: • Present an example’s input vector to the network inputs and run the network sequentially forward • Propagate the error sequentially backward from the output layer • For every connection, change the weight modifying that connection in proportion to the error • When all three steps have been performed for all examples, one epoch has occurred • Goal is to converge to a near-optimal solution based on the total squared error Eric Plummer
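
For one hidden layer of sigmoid units and a linear output layer (the configuration the deck's experiments use), one epoch can be sketched as follows. This is an illustrative translation, not the thesis's code; matrix shapes and the learning rate are assumptions:

```python
import numpy as np

def backprop_epoch(W1, b1, W2, b2, examples, lr=0.1):
    """One epoch: forward pass, backward error propagation, and weight
    changes proportional to the error, for every <input, output> example."""
    for x, target in examples:
        # 1. Run the network sequentially forward
        hidden = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # sigmoid hidden layer
        output = W2 @ hidden + b2                       # linear output layer
        # 2. Propagate the error sequentially backward
        delta_out = output - target                     # output-layer error
        delta_hid = (W2.T @ delta_out) * hidden * (1.0 - hidden)
        # 3. Change each weight in proportion to the error
        W2 -= lr * np.outer(delta_out, hidden)
        b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x)
        b1 -= lr * delta_hid
```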

  12. Neural Networks – Training • [Diagram: the backpropagation training cycle] Eric Plummer

  13. Neural Networks – Forecasting • Forecasting method depends on the examples • Examples depend on the step-ahead size • If step-ahead size is one: iterative forecasting • If step-ahead size is greater than one: direct forecasting • (Both are sketched in code after slide 15) Eric Plummer

  14. Neural Networks – Forecasting • Iterative forecasting [Chart: each one-step forecast is fed back in as an input, so forecasting can continue indefinitely] Eric Plummer

  15. Neural Networks – Forecasting • Directly forecasting n steps [Chart: the network emits all n values at once; this is the only forecast] Eric Plummer
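
Both forecasting modes are easy to state in code. A sketch (Python; `net.predict` is a hypothetical one-step or n-step model interface, not FORECASTER's API):

```python
def iterative_forecast(net, window, steps):
    """Step-ahead size of one: feed each forecast back in as an input.
    This can continue indefinitely."""
    history = list(window)
    forecasts = []
    for _ in range(steps):
        y = net.predict(history[-len(window):])  # hypothetical interface
        forecasts.append(y)
        history.append(y)                        # the forecast becomes an input
    return forecasts

def direct_forecast(net, window):
    """Step-ahead size of n: the network emits n values at once.
    This is the only forecast."""
    return net.predict(window)
```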

  16. K-Nearest-Neighbor – Forecasting • No model to train • Simple linear search • Compare the reference window to candidate windows • Select the k candidates with the lowest error • Forecast is the average of the k next values Eric Plummer
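
The whole method fits in a dozen lines. An illustrative Python version (the function name and squared-error distance are assumptions; the thesis's exact error measure may differ):

```python
import numpy as np

def knn_forecast(series, window_size, k):
    """Slide a window over the series, find the k candidate windows
    closest to the trailing reference window, and average the value
    that followed each of them."""
    series = np.asarray(series, dtype=float)
    reference = series[-window_size:]
    errors, next_values = [], []
    for i in range(len(series) - window_size):       # simple linear search
        candidate = series[i:i + window_size]
        errors.append(np.sum((candidate - reference) ** 2))
        next_values.append(series[i + window_size])  # value after the candidate
    best_k = np.argsort(errors)[:k]                  # k lowest-error candidates
    return float(np.mean(np.asarray(next_values)[best_k]))
```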

  17. Test-Bed Application – FORECASTER • Written in Visual C++ with MFC • Object-oriented • Multithreaded • Wizard-based • Easily modified • Implements feed-forward neural networks & k-nearest-neighbor • Used for time series forecasting • Eventually will be upgraded for classification problems Eric Plummer

  18. Empirical Evaluation – Data Series [Charts: the Original, Less Noisy, More Noisy, Ascending, and Sunspots data series]

  19. Empirical Evaluation – Neural Network Architectures • Number of network inputs based on data series • Need to make unambiguous examples • For “sawtooths”: • 24 inputs are necessary • Test networks with 25 & 35 inputs • Test networks with 1 hidden layer with 2, 10, & 20 hidden layer units • One output layer unit • For sunspots: • 30 inputs • 1 hidden layer with 30 units • For real-world data series, selection may be trial-and-error! Eric Plummer

  20. Empirical Evaluation – Neural Network Training • Heuristic method: • Start with an aggressive learning rate • Gradually lower the learning rate as validation error increases • Stop training when the learning rate cannot be lowered anymore • Simple method: • Use a conservative learning rate • Training stops when the number of training epochs equals the epochs limit, or the training error is less than or equal to the error limit Eric Plummer
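
A sketch of the heuristic method's learning-rate schedule (Python; the starting rate, decay factor, epochs-per-check, and the two helper functions are illustrative assumptions, not the thesis's values):

```python
def heuristic_train(net, training_set, validation_set,
                    lr=0.5, decay=0.5, min_lr=1e-4):
    """Aggressive learning rate, lowered whenever validation error
    rises; training stops when the rate cannot go any lower."""
    prev_val = float("inf")
    while lr >= min_lr:
        run_backprop_epochs(net, training_set, epochs=10, lr=lr)  # hypothetical
        val_err = total_squared_error(net, validation_set)        # hypothetical
        if val_err > prev_val:
            lr *= decay  # validation error rose: back off the learning rate
        prev_val = val_err
```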

  21. Empirical Evaluation – Neural Network Forecasting • Metric to compare forecasts: coefficient of determination • Value may be (-∞, 1] • Want a value between 0 and 1, where 0 is forecasting the mean of the data series and 1 is forecasting the actual value • Must have actual values to compare with forecasted values • For networks trained on the original, less noisy, and more noisy data series, the forecast is compared to the original series • For networks trained on the ascending data series, the forecast is compared to the continuation of the ascending series • For networks trained on the sunspots data series, the forecast is compared to the test set Eric Plummer
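
The coefficient of determination compares the forecast's error against the error of always forecasting the series mean; a direct Python translation:

```python
import numpy as np

def coefficient_of_determination(actual, forecast):
    """1 = perfect forecast, 0 = no better than forecasting the mean,
    negative = worse than forecasting the mean."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    sse = np.sum((actual - forecast) ** 2)        # forecast error
    sst = np.sum((actual - actual.mean()) ** 2)   # error of the mean forecast
    return 1.0 - sse / sst
```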

  22. Empirical Evaluation – K-Nearest-Neighbor • Choosing the window size is analogous to choosing the number of neural network inputs • For sawtooth data series: • k = 2 • Test window sizes of 20, 24, and 30 • For sunspots data series: • k = 3 • Window size of 10 • Compare forecasts via the coefficient of determination Eric Plummer

  23. Empirical Evaluation – Candidate Selection • Neural networks: • For each training method, data series, and architecture, 3 candidates were trained • Also, the average of the 3 candidates’ forecasts was taken: forecasting by committee • The best forecast was selected based on the coefficient of determination • K-nearest-neighbor: • For each data series, k, and window size, only one search was performed (only one is needed) Eric Plummer

  24. Empirical Evaluation – Original Data Series [Charts: forecasts from the Heuristic NN, Simple NN, Smaller NN, and K-N-N]

  25. Empirical Evaluation – Less Noisy Data Series [Charts: forecasts from the Heuristic NN, Simple NN, and K-N-N]

  26. Empirical Evaluation – More Noisy Data Series [Charts: forecasts from the Heuristic NN, Simple NN, and K-N-N]

  27. Empirical Evaluation – Ascending Data Series [Charts: forecasts from the Heuristic NN and Simple NN]

  28. Empirical Evaluation – Longer Forecast [Chart: the Heuristic NN forecast]

  29. Empirical Evaluation – Sunspots Data Series [Chart: the Simple NN and K-N-N forecasts]

  30. Empirical Evaluation – Discussion • Heuristic training method observations: • Networks train longer (more epochs) on smoother data series like the original and ascending data series • The total squared error and unscaled error are higher for noisy data series • Neither the number of epochs nor the errors appear to correlate well with the coefficient of determination • In most cases, the committee forecast is worse than the best candidate's forecast • When actual values are unavailable, choosing the best candidate is difficult! Eric Plummer

  31. Empirical Evaluation – Discussion • Simple training method observations: • The total squared error and unscaled error are higher for noisy data series, with the exception of the 35:10:1 network trained on the more noisy data series • The errors do not appear to correlate well with the coefficient of determination • In most cases, the committee forecast is worse than the best candidate's forecast • There are four networks whose coefficient of determination is negative, compared with two for the heuristic training method Eric Plummer

  32. Empirical Evaluation – Discussion • General observations: • One training method did not appear to be clearly better • Increasingly noisy data series increasingly degraded the forecasting performance • Nonstationarity in the mean degraded the performance • Networks with too few hidden units (e.g., 35:2:1) forecasted well on simpler data series, but failed on more complex ones • Excessive numbers of hidden units (e.g., 35:20:1) did not hurt performance • Twenty-five network inputs were not sufficient • K-nearest-neighbor was consistently better than the neural networks • Feed-forward neural networks are extremely sensitive to architecture and parameter choices, and making such choices is currently more art than science, more trial and error than fixed rule, more practice than theory! Eric Plummer

  33. Data Preprocessing • First-difference • For ascending data series, a neural network trained on first-difference can forecast near perfectly • In that case, it is better to train and forecast on first-difference • FORECASTER reconstitutes forecast from its first-difference • Moving average • For noisy data series, moving average would eliminate much of the noise • But would also smooth out peaks and valleys • Series may then be easier to learn and forecast • But in some series, the “noise” may be important data (e.g., utility load forecasting) Eric Plummer
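
Both preprocessing steps, and the reconstitution step FORECASTER performs, can be sketched in a few lines of Python (numpy assumed; the function names are illustrative, not FORECASTER's):

```python
import numpy as np

def first_difference(series):
    """d[t] = x[t+1] - x[t]: removes a nonstationary (ascending) mean."""
    return np.diff(series)

def reconstitute(start_value, diffs):
    """Rebuild a series from its first-difference and a starting value."""
    return start_value + np.concatenate(([0.0], np.cumsum(diffs)))

def moving_average(series, width):
    """Smooth noise with a box filter; peaks and valleys smooth out too."""
    return np.convolve(series, np.ones(width) / width, mode="valid")
```

For an ascending series, the network would train and forecast on `first_difference(series)`, and `reconstitute` would turn the forecast differences back into series values.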

  34. Contributions • Filled a void in the feed-forward neural network time series forecasting literature: showed how networks respond to various data series characteristics in a controlled environment • Showed that k-nearest-neighbor is a better forecasting method for the data series used in this research • Reaffirmed that neural networks are very sensitive to architecture, parameter, and learning method changes • Presented some insight into neural network architecture selection: selecting the number of network inputs based on the data series • Presented a neural network training heuristic that produced good results Eric Plummer

  35. Future Work • Upgrade FORECASTER to work with classification problems • Add more complex network types, including wavelet networks for time series forecasting • Investigate k-nearest-neighbor further • Add other forecasting methods (e.g., decision trees for classification) Eric Plummer

  36. Conclusion • Presented: • Time series forecasting • Neural networks • K-nearest-neighbor • Empirical evaluation • Learned a lot about the implementation details of the forecasting techniques • Learned a lot about MFC programming Thank You Eric Plummer

  37. Demonstration Various files can be found at: http://w3.uwyo.edu/~eplummer Eric Plummer

  38. Unit Output, Error, and Weight Change Formulas
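
The formulas on this slide are an image that did not survive the transcript. A standard reconstruction, consistent with the sigmoid hidden units, linear output unit, and generalized delta rule described on slides 9 and 11 (a reconstruction, not the slide's verbatim content):

\[
o_j = f\Big(\sum_i w_{ij}\, o_i + b_j\Big), \qquad
f(x) = \frac{1}{1 + e^{-x}} \ \text{(hidden)}, \qquad
f(x) = x \ \text{(output)}
\]
\[
\delta_j = t_j - o_j \ \text{(output unit)}, \qquad
\delta_j = o_j\,(1 - o_j) \sum_k w_{jk}\, \delta_k \ \text{(hidden unit)}
\]
\[
\Delta w_{ij} = \eta\, \delta_j\, o_i, \qquad \Delta b_j = \eta\, \delta_j
\]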

  39. Forecast Error Formulas
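
This slide's formulas are likewise missing from the transcript. The total squared error used during training and the coefficient of determination described on slide 21 take the standard forms (again a reconstruction):

\[
E = \sum_t \big(y_t - \hat{y}_t\big)^2, \qquad
R^2 = 1 - \frac{\sum_t \big(y_t - \hat{y}_t\big)^2}{\sum_t \big(y_t - \bar{y}\big)^2}
\]

where \(y_t\) is the actual value, \(\hat{y}_t\) the forecast, and \(\bar{y}\) the mean of the actual values, so that \(R^2 \in (-\infty, 1]\).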

  40. Related Work • Drossu and Obradovic (1996): a hybrid stochastic and neural network approach to time series forecasting • Zhang and Thearling (1994): parallel implementations of neural networks and memory-based reasoning • Geva (1998): a multiscale fast wavelet transform with an array of feed-forward neural networks • Lawrence, Tsoi, and Giles (1996): encoding the series with a self-organizing map and forecasting with recurrent neural networks • Kingdon (1997): an automated intelligent system for financial forecasting using neural networks and genetic algorithms Eric Plummer
