
Ensembles of Nearest Neighbor Forecasts

Dragomir Yankov, Eamonn Keogh, Dept. of Computer Science & Eng., University of California Riverside; Dennis DeCoste, Yahoo! Research.


Presentation Transcript


  1. Ensembles of Nearest Neighbor Forecasts Dragomir Yankov, Eamonn Keogh Dept. of Computer Science & Eng. University of California Riverside Dennis DeCoste Yahoo! Research

  2. Outline • Problem formulation • NN forecasting framework • Stability of the forecasts • Ensembles of NN forecasts • Experimental evaluation

  3. Problem formulation • Predict the number of impressions to be observed for a specific website • Data specifics – many patterns present in the data

  4. Forecasting framework – overview

  5. Forecasting framework – formalization • Direct forecasts: given a query and its k nearest neighbors, estimate the query's continuation • Other approaches: iterative forecasts, mutually validating forecasts
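The direct forecasting step can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: it assumes the query is the last `m` points of the series, uses plain Euclidean distance in place of the standardized distance from slide 6, and averages the neighbors' continuations with uniform weights. The function names and parameters are hypothetical.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def direct_nn_forecast(series, m, k, h):
    """Hypothetical direct k-NN forecast of the next h values of `series`.

    Assumption: the query is the last m observations, and every earlier
    window of length m whose h-step continuation is fully observed is a
    candidate neighbor.
    """
    query = series[-m:]
    candidates = []
    # A window starting at s spans s..s+m-1; its continuation spans
    # s+m..s+m+h-1, which must lie inside the observed series.
    for s in range(len(series) - m - h + 1):
        dist = euclidean(query, series[s:s + m])
        candidates.append((dist, series[s + m:s + m + h]))
    if len(candidates) < k:
        raise ValueError("series too short for the requested m, k and h")
    candidates.sort(key=lambda c: c[0])
    nearest = [cont for _, cont in candidates[:k]]
    # Uniform weights: each forecast step is the mean of the k continuations.
    return [sum(step) / k for step in zip(*nearest)]

series = [0, 1, 2, 3] * 5
print(direct_nn_forecast(series, m=4, k=2, h=2))  # [0.0, 1.0]
```

The periodic toy series makes every aligned window an exact match, so the forecast simply repeats the pattern.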

  6. Forecasting framework – components • Similarity measure: standardized Euclidean distance • Prediction accuracy: prediction root mean square error (RMSE) • Weighting function: uniform weights
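The components can be sketched in Python as below. The slide's formulas are not preserved, so this sketch assumes that "standardized Euclidean distance" means z-normalizing each subsequence (zero mean, unit standard deviation) before taking the Euclidean distance; another common reading scales each dimension by its variance instead. The RMSE of an h-step forecast is the usual RMSE = sqrt((1/h) Σ (y_i − ŷ_i)²). Function names are hypothetical.

```python
import math
import statistics

def z_normalize(seq, eps=1e-12):
    """Rescale a sequence to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(seq)
    sigma = statistics.pstdev(seq)
    if sigma < eps:
        # A flat sequence has no shape; map it to all zeros.
        return [0.0] * len(seq)
    return [(x - mu) / sigma for x in seq]

def standardized_euclidean(a, b):
    """Euclidean distance between z-normalized a and b (assumed definition)."""
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))

def rmse(actual, predicted):
    """Root mean square error of an h-step forecast."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Same shape at different scales gives a distance close to 0.
print(standardized_euclidean([1, 2, 3], [10, 20, 30]))
print(rmse([1, 2, 3], [1, 2, 5]))  # sqrt(4/3)
```

With uniform weights, the forecast is just the step-wise mean of the k neighbors' continuations, so no separate weighting function is needed in this sketch.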

  7. Stability of the forecasts • Stability with respect to the training data • NN is stable in the case of classification and majority voting (Breiman ’96) • Here the task is extrapolation plus regression, so changing a single neighbor can change the forecast significantly • Stability with respect to the input parameters • Parameters: k, weights of the individual neighbors, query length, prediction horizon • Different combinations lead to different forecasts
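A toy illustration of this instability, with invented numbers: under uniform averaging, swapping one of k = 3 neighbors shifts each forecast step by one third of the difference between the old and new continuations, and that difference can grow with the horizon when the new neighbor follows a different trend.

```python
def average_forecast(continuations):
    """Uniform-weight forecast: step-wise mean of the neighbors' continuations."""
    k = len(continuations)
    return [sum(step) / k for step in zip(*continuations)]

# Invented continuations of three neighbors over a 3-step horizon.
neighbors = [[10, 11, 12], [10, 12, 14], [9, 10, 11]]
base = average_forecast(neighbors)

# Replace the third neighbor with one that follows a steeper trend.
perturbed = average_forecast(neighbors[:2] + [[9, 15, 21]])

# The gap is zero at step 1 and grows to (21 - 11) / 3 at step 3.
gaps = [p - b for p, b in zip(perturbed, base)]
print([round(g, 3) for g in gaps])  # [0.0, 1.667, 3.333]
```

A majority vote over class labels would often be unaffected by such a swap, whereas the averaged extrapolation moves with every changed neighbor.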

  8. Ensembles of NN forecasts • Main idea: rather than tuning a single best parameter setting for the entire dataset, select for each query the model that will predict it best • Issues • Which base models to use • How to select among them

  9. Ensembles of NN forecasts • Base models to use • We focus on pairs of NN learners whose base models differ in the number of neighbors used • The optimal single predictors and the suitable ensembles are determined on a validation set using an oracle • Figure panels: Optimal Single Predictor; Optimal Ensemble (Using Oracle)
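The oracle-based comparison on the validation set can be sketched as follows, assuming each candidate model's per-query validation error (for example RMSE) is already available and indexed by its number of neighbors k. The optimal single predictor is the k with the lowest mean error; the oracle ensemble of a pair picks, for every query, whichever of the two models did better, so its error bounds from below any selector choosing between the same two models on those queries. Function names and the toy errors are hypothetical.

```python
from itertools import combinations
import statistics

def best_single(errors):
    """errors maps k -> per-query validation errors; return the k with lowest mean error."""
    return min(errors, key=lambda k: statistics.fmean(errors[k]))

def oracle_pair_error(errors, k1, k2):
    """Mean error when an oracle picks the better of models k1 and k2 for every query."""
    return statistics.fmean([min(e1, e2) for e1, e2 in zip(errors[k1], errors[k2])])

def best_oracle_pair(errors):
    """Pair of neighbor counts whose oracle ensemble has the lowest validation error."""
    return min(combinations(sorted(errors), 2),
               key=lambda pair: oracle_pair_error(errors, *pair))

# Hypothetical per-query validation errors for three values of k.
errors = {1: [1.0, 5.0, 1.0], 5: [3.0, 2.0, 3.0], 10: [2.5, 2.5, 2.5]}
print(best_single(errors))       # 1
print(best_oracle_pair(errors))  # (1, 5)
```

In this toy case the two models in the best pair are complementary: k = 1 wins on queries 1 and 3, k = 5 on query 2.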

  10. Ensembles of NN forecasts • Selecting among the base models: • Learn a classifier (SVM with Gaussian kernel) to select the more suitable model for each individual query • Note: the classifier does not need to be perfect; what matters is that it identifies the “bad” cases for each base learner
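A minimal sketch of the learned selector, assuming scikit-learn's `SVC` with an RBF (Gaussian) kernel and NumPy. The labeling rule (1 where model B had the lower validation error), the toy features and the helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC

def make_labels(err_a, err_b):
    """Label 1 where base model B had the lower validation error, else 0 (model A)."""
    return (np.asarray(err_b) < np.asarray(err_a)).astype(int)

def train_selector(features, err_a, err_b, C=1.0, gamma="scale"):
    """Fit an RBF-kernel SVM that predicts which base model to trust per query."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    clf.fit(np.asarray(features), make_labels(err_a, err_b))
    return clf

def ensemble_forecast(clf, features, forecasts_a, forecasts_b):
    """Route each query's forecast to the base model chosen by the selector.

    forecasts_a and forecasts_b have shape (n_queries, horizon).
    """
    choose_b = clf.predict(np.asarray(features)).astype(bool)
    return np.where(choose_b[:, None], forecasts_b, forecasts_a)

# Toy validation data: model A wins near (0, 0), model B wins near (5, 5).
X = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.9, 5.2]]
err_a = [1, 1, 1, 3, 3, 3]
err_b = [2, 2, 2, 1, 1, 1]
clf = train_selector(X, err_a, err_b)
print(ensemble_forecast(clf, [[0, 0], [5, 5]], np.zeros((2, 3)), np.ones((2, 3))))
```

`SVC` needs both labels present in the training data; in practice `C` and `gamma` would be tuned on held-out queries.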

  11. Ensembles of NN forecasts • Selecting among the base models: • Extracted features: • Statistics from the query and its nearest neighbors: Mean, Median, Variance, Amplitude • Statistics from the models’ forecasts: Mean, Median, Variance, Amplitude • Distances between the forecasts of the individual neighbors • Performance of the models on the query’s nearest neighbors • Step-back forecasts (good for short horizons)
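Part of this feature vector can be computed as sketched below. Assumptions: "amplitude" is taken as max minus min, the neighbors' statistics are pooled over all neighbor values, and the distances between the individual neighbors' forecasts are summarized as the mean pairwise Euclidean distance between their continuations. The performance and step-back features are omitted because they require re-running the base models. All names are hypothetical.

```python
import math
import statistics

def summary_stats(seq):
    """Mean, median, variance and amplitude (assumed: max - min) of a sequence."""
    return [statistics.fmean(seq), statistics.median(seq),
            statistics.pvariance(seq), max(seq) - min(seq)]

def mean_pairwise_distance(seqs):
    """Mean Euclidean distance over all pairs of sequences (0.0 if fewer than two)."""
    pairs = [(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:]]
    if not pairs:
        return 0.0
    return statistics.fmean([math.dist(a, b) for a, b in pairs])

def selection_features(query, neighbors, continuations, forecast_a, forecast_b):
    """Hypothetical feature vector for choosing between base models A and B.

    query: the query window; neighbors: its nearest-neighbor windows;
    continuations: each neighbor's continuation; forecast_a/b: the two
    base models' forecasts for the query.
    """
    pooled = [x for nb in neighbors for x in nb]
    return (summary_stats(query)
            + summary_stats(pooled)
            + summary_stats(forecast_a)
            + summary_stats(forecast_b)
            + [mean_pairwise_distance(continuations)])

feats = selection_features(
    query=[1, 2, 3, 10],
    neighbors=[[1, 2, 3, 9], [2, 2, 4, 11]],
    continuations=[[0, 0], [3, 4]],
    forecast_a=[1.5, 2.0],
    forecast_b=[1.0, 1.5],
)
print(len(feats))  # 17
```

Since an RBF kernel is sensitive to feature scales, standardizing these features before training the selector is usually advisable.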

  12. Experimental evaluation • Website impressions

  13. Experimental evaluation • Website impressions • Computing the optimal single predictors • Comparison with the accuracy of the ensemble approach

  14. Experimental evaluation • Website impressions

  15. Experimental evaluation • Bias-Variance improvement • We compute the bias² and variance terms of the error decomposition for h = 100 steps ahead • The statistics are recorded over 50 random subsamples of the original training set
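The two terms can be estimated from repeated runs as sketched here, assuming each row of `predictions` holds one model's forecasts for the same test targets after training on one random subsample (50 rows in the slide's setting). With population variance, the mean squared error over runs and targets equals bias² plus variance exactly; with a single observed value per target, noise cannot be separated and ends up in the bias term. The function name is hypothetical.

```python
import statistics

def bias_variance(predictions, truth):
    """Estimate (bias^2, variance) averaged over targets.

    predictions[r][i] is the forecast for target i from the model trained
    on subsample r; truth[i] is the observed value of target i.
    """
    bias2, variance = [], []
    for i, y in enumerate(truth):
        preds_i = [run[i] for run in predictions]
        mean_i = statistics.fmean(preds_i)
        bias2.append((mean_i - y) ** 2)
        variance.append(statistics.pvariance(preds_i))
    return statistics.fmean(bias2), statistics.fmean(variance)

# Two hypothetical runs, two targets.
preds = [[1.0, 2.0], [3.0, 4.0]]
truth = [1.0, 1.0]
b2, var = bias_variance(preds, truth)
mse = statistics.fmean([(p - y) ** 2 for run in preds for p, y in zip(run, truth)])
print(b2, var, mse)  # 2.5 1.0 3.5
```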

  16. Conclusions and future directions • The proposed technique significantly improves the prediction accuracy of the single NN forecasting models • It outlines a principled solution to the bias-variance problem of NN forecasts • It is a data-specific rather than a generic approach • Combining more models and varying other parameters would require selecting different features • Thank you!
