Ensembles of Nearest Neighbor Forecasts
Dragomir Yankov, Eamonn Keogh (Dept. of Computer Science & Eng., University of California, Riverside)
Dennis DeCoste (Yahoo! Research)
Outline • Problem formulation • NN forecasting framework • Stability of the forecasts • Ensembles of NN forecasts • Experimental evaluation
Problem formulation • Predict the number of impressions to be observed for a specific website • Data specifics – many patterns present in the data
Forecasting framework – formalization • Direct forecasts: Given a query $x_q$ (the most recent window of the series) and its k nearest neighbors $x_1, \dots, x_k$ with known continuations $y_1, \dots, y_k$, estimate the query continuation as $\hat{y}_q = \frac{1}{k} \sum_{i=1}^{k} y_i$ (uniform weights; see the next slide) • Other approaches: iterative forecasts, mutually validating forecasts
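A minimal sketch of the direct forecast step, assuming the training series has been cut into fixed-length windows with known continuations. The function name and array layout are illustrative, not from the paper, and plain Euclidean distance is used for brevity (the standardized variant appears in the next sketch):

```python
import numpy as np

def knn_direct_forecast(query, train_windows, train_continuations, k):
    """Direct k-NN forecast: average the known continuations of the
    query's k nearest training windows (uniform weights)."""
    # Distance from the query to every training window
    dists = np.linalg.norm(train_windows - query, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k nearest windows
    # Uniform-weight direct forecast: mean of the neighbors' continuations
    return train_continuations[nearest].mean(axis=0)
```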
Forecasting framework – components • Similarity measure: standardized Euclidean distance $d(x_q, x_i) = \lVert \tilde{x}_q - \tilde{x}_i \rVert_2$, where $\tilde{x} = (x - \mu_x) / \sigma_x$ is the z-normalized window • Prediction accuracy: prediction root mean square error over the horizon h, $\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{t=1}^{h} (y_t - \hat{y}_t)^2}$ • Weighting function: uniform weights
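A sketch of the two components, under one common reading of "standardized Euclidean distance" as the Euclidean distance between z-normalized windows:

```python
import numpy as np

def standardize(x):
    """Z-normalize a window: subtract its mean, divide by its std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def std_euclidean(q, x):
    """Standardized Euclidean distance between two windows."""
    return np.linalg.norm(standardize(q) - standardize(x))

def rmse(y_true, y_pred):
    """Prediction root mean square error over the forecast horizon."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```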
Stability of the forecasts • Stability with respect to the training data • NN is stable for classification with majority voting (Breiman '96) • Here the task is extrapolation plus regression: changing a single neighbor can change the forecast significantly (see the toy example below) • Stability with respect to the input parameters • Parameters: k, the weights of the different neighbors, the query length, the prediction horizon • Different combinations lead to different forecasts
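A toy illustration (not from the paper) of why regression-style NN forecasts are sensitive to the training data: with uniform weights and k = 5, replacing a single neighbor shifts every forecast step by one fifth of the change:

```python
import numpy as np

rng = np.random.default_rng(0)
continuations = rng.normal(size=(5, 10))      # 5 neighbors, horizon of 10 steps
forecast = continuations.mean(axis=0)         # uniform-weight direct forecast

perturbed = continuations.copy()
perturbed[0] += 3.0                           # one neighbor changes substantially
new_forecast = perturbed.mean(axis=0)
print(np.abs(new_forecast - forecast).max())  # every step moves by 3/5 = 0.6
```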
Ensembles of NN forecasts • Main idea: rather than tuning the best parameters for the entire dataset, select for each query the model that is expected to predict it best • Issues • Which base models to use • How to select among them
Ensembles of NN forecasts • Base models to use • We focus on pairs of NN learners, in which the base models differ in the number of neighbors used • The optimal single predictors and the suitable ensembles are determined on a validation set using an oracle (a labeling sketch follows below) [table: Optimal Single Predictor vs. Optimal Ensemble (Using Oracle)]
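A minimal oracle-labeling sketch, assuming each base model's validation RMSE per query is already available; the helper name is hypothetical:

```python
import numpy as np

def oracle_labels(errors_a, errors_b):
    """Label each validation query with the base NN model (e.g. small-k
    vs. large-k) that achieved the lower forecast RMSE on its true
    continuation: 0 -> model A, 1 -> model B."""
    return (np.asarray(errors_b) < np.asarray(errors_a)).astype(int)
```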
Ensembles of NN forecasts • Selecting among the base models • Learn a classifier that selects the more suitable model for each individual query (an SVM with a Gaussian kernel; see the sketch below) • Note: the classifier does not need to be perfect; what matters is identifying the "bad" cases for each base learner
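A sketch of the selector using scikit-learn's RBF-kernel SVM. The feature matrix and oracle labels here are random stand-ins for the per-query features and validation labels described on the next slide:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 8))        # hypothetical per-query features
best_model_labels = rng.integers(0, 2, size=200)  # oracle labels from validation

# Gaussian (RBF) kernel SVM, as named in the slides
selector = SVC(kernel="rbf", C=1.0, gamma="scale")
selector.fit(train_features, best_model_labels)

query_features = rng.normal(size=(1, 8))
choice = selector.predict(query_features)[0]      # which base NN model to apply
```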
Ensembles of NN forecasts • Selecting among the base models • Extracted features (a sketch computing a few of them follows below): • Statistics of the query and its nearest neighbors: mean, median, variance, amplitude • Statistics of the models' forecasts: mean, median, variance, amplitude • Distances between the forecasts of the individual neighbors • Performance of the models on the query's nearest neighbors • Step-back forecasts (good for short horizons)
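A sketch computing a few of the listed features; the paper's full feature set is richer, and the function names are illustrative:

```python
import numpy as np

def stats(v):
    """Mean, median, variance, amplitude of a series."""
    v = np.asarray(v, dtype=float)
    return [v.mean(), np.median(v), v.var(), v.max() - v.min()]

def selection_features(query, neighbor_forecasts):
    """Feature vector for the model-selection classifier.
    `neighbor_forecasts` holds one row per nearest neighbor."""
    feats = stats(query)                              # query statistics
    feats += stats(neighbor_forecasts.mean(axis=0))   # model-forecast statistics
    # Mean pairwise distance between the individual neighbors' forecasts:
    # a large spread means the neighbors disagree on the continuation
    d = neighbor_forecasts[:, None, :] - neighbor_forecasts[None, :, :]
    feats.append(np.sqrt((d ** 2).sum(axis=-1)).mean())
    return np.array(feats)
```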
Experimental evaluation • Website impressions
Experimental evaluation • Website impressions • Computing the optimal single predictors • Comparison with the accuracy of the ensemble approach
Experimental evaluation • Bias-variance improvement • We compute the bias² and variance terms in the error decomposition for forecasts h = 100 steps ahead • The statistics are recorded over 50 random subsamples of the original training set (see the decomposition sketch below)
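A decomposition sketch under the usual convention: for a fixed query, collect the h-step forecasts produced by models trained on each random subsample, then average bias² and variance over the horizon:

```python
import numpy as np

def bias_variance(forecasts, y_true):
    """`forecasts`: array of shape (n_subsamples, h), one h-step forecast
    per random training subsample; `y_true`: the true continuation (h,)."""
    forecasts = np.asarray(forecasts, dtype=float)
    mean_f = forecasts.mean(axis=0)
    bias2 = np.mean((mean_f - np.asarray(y_true)) ** 2)  # squared bias over horizon
    variance = np.mean(forecasts.var(axis=0))            # variance over horizon
    return bias2, variance
```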
Conclusions and future directions • The proposed technique significantly improves the prediction accuracy of the single NN forecasting models • It outlines a principled solution to the bias-variance problem of NN forecasts • It is a data-specific rather than a generic approach • Combining more models and varying other parameters would require selecting different features Thank you!