Ensembles of Nearest Neighbor Forecasts
Dragomir Yankov, Eamonn Keogh (Dept. of Computer Science & Eng., University of California, Riverside)
Dennis DeCoste (Yahoo! Research)
Outline • Problem formulation • NN forecasting framework • Stability of the forecasts • Ensembles of NN forecasts • Experimental evaluation
Problem formulation • Predict the number of impressions to be observed for a specific website • Data specifics – many patterns present in the data
Forecasting framework – formalization • Direct forecasts: Given a query $x_q$ (the most recent window of the series) and its k nearest neighbors $x_1, \dots, x_k$ with known continuations $y_1, \dots, y_k$, estimate the query continuation as $\hat{y}_q = \frac{1}{k} \sum_{i=1}^{k} y_i$ (uniform weights; see the next slide) • Other approaches: iterative forecasts, mutually validating forecasts
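A minimal sketch of the direct forecast step, assuming the training series has been cut into fixed-length windows with known continuations. The function name and array layout are illustrative, not from the paper, and plain Euclidean distance is used for brevity (the standardized variant appears in the next sketch):

```python
import numpy as np

def knn_direct_forecast(query, train_windows, train_continuations, k):
    """Direct k-NN forecast: average the known continuations of the
    query's k nearest training windows (uniform weights)."""
    # Distance from the query to every training window
    dists = np.linalg.norm(train_windows - query, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k nearest windows
    # Uniform-weight direct forecast: mean of the neighbors' continuations
    return train_continuations[nearest].mean(axis=0)
```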
Forecasting framework – components • Similarity measure: standardized Euclidean distance $d(x_q, x_i) = \lVert \tilde{x}_q - \tilde{x}_i \rVert_2$, where $\tilde{x} = (x - \mu_x) / \sigma_x$ is the z-normalized window • Prediction accuracy: prediction root mean square error over the horizon h, $\mathrm{RMSE} = \sqrt{\frac{1}{h} \sum_{t=1}^{h} (y_t - \hat{y}_t)^2}$ • Weighting function: uniform weights
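A sketch of the two components, under one common reading of "standardized Euclidean distance" as the Euclidean distance between z-normalized windows:

```python
import numpy as np

def standardize(x):
    """Z-normalize a window: subtract its mean, divide by its std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def std_euclidean(q, x):
    """Standardized Euclidean distance between two windows."""
    return np.linalg.norm(standardize(q) - standardize(x))

def rmse(y_true, y_pred):
    """Prediction root mean square error over the forecast horizon."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```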
Stability of the forecasts • Stability with respect to the training data • NN is stable for classification with majority voting (Breiman '96) • Here the task is extrapolation plus regression: changing a single neighbor can change the forecast significantly (see the toy example below) • Stability with respect to the input parameters • Parameters: k, the weights of the different neighbors, the query length, the prediction horizon • Different combinations lead to different forecasts
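A toy illustration (not from the paper) of why regression-style NN forecasts are sensitive to the training data: with uniform weights and k = 5, replacing a single neighbor shifts every forecast step by one fifth of the change:

```python
import numpy as np

rng = np.random.default_rng(0)
continuations = rng.normal(size=(5, 10))      # 5 neighbors, horizon of 10 steps
forecast = continuations.mean(axis=0)         # uniform-weight direct forecast

perturbed = continuations.copy()
perturbed[0] += 3.0                           # one neighbor changes substantially
new_forecast = perturbed.mean(axis=0)
print(np.abs(new_forecast - forecast).max())  # every step moves by 3/5 = 0.6
```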
Ensembles of NN forecasts • Main idea: rather than tuning the best parameters for the entire dataset, select for each query the model that is expected to predict it best • Issues • Which base models to use • How to select among them
Ensembles of NN forecasts • Base models to use • We focus on pairs of NN learners, in which the base models differ in the number of neighbors used • The optimal single predictors and the suitable ensembles are determined on a validation set using an oracle (a labeling sketch follows below) [table: Optimal Single Predictor vs. Optimal Ensemble (Using Oracle)]
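A minimal oracle-labeling sketch, assuming each base model's validation RMSE per query is already available; the helper name is hypothetical:

```python
import numpy as np

def oracle_labels(errors_a, errors_b):
    """Label each validation query with the base NN model (e.g. small-k
    vs. large-k) that achieved the lower forecast RMSE on its true
    continuation: 0 -> model A, 1 -> model B."""
    return (np.asarray(errors_b) < np.asarray(errors_a)).astype(int)
```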
Ensembles of NN forecasts • Selecting among the base models • Learn a classifier that selects the more suitable model for each individual query (an SVM with a Gaussian kernel; see the sketch below) • Note: the classifier does not need to be perfect; what matters is identifying the "bad" cases for each base learner
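A sketch of the selector using scikit-learn's RBF-kernel SVM. The feature matrix and oracle labels here are random stand-ins for the per-query features and validation labels described on the next slide:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 8))        # hypothetical per-query features
best_model_labels = rng.integers(0, 2, size=200)  # oracle labels from validation

# Gaussian (RBF) kernel SVM, as named in the slides
selector = SVC(kernel="rbf", C=1.0, gamma="scale")
selector.fit(train_features, best_model_labels)

query_features = rng.normal(size=(1, 8))
choice = selector.predict(query_features)[0]      # which base NN model to apply
```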
Ensembles of NN forecasts • Selecting among the base models • Extracted features (a sketch computing a few of them follows below): • Statistics of the query and its nearest neighbors: mean, median, variance, amplitude • Statistics of the models' forecasts: mean, median, variance, amplitude • Distances between the forecasts of the individual neighbors • Performance of the models on the query's nearest neighbors • Step-back forecasts (good for short horizons)
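A sketch computing a few of the listed features; the paper's full feature set is richer, and the function names are illustrative:

```python
import numpy as np

def stats(v):
    """Mean, median, variance, amplitude of a series."""
    v = np.asarray(v, dtype=float)
    return [v.mean(), np.median(v), v.var(), v.max() - v.min()]

def selection_features(query, neighbor_forecasts):
    """Feature vector for the model-selection classifier.
    `neighbor_forecasts` holds one row per nearest neighbor."""
    feats = stats(query)                              # query statistics
    feats += stats(neighbor_forecasts.mean(axis=0))   # model-forecast statistics
    # Mean pairwise distance between the individual neighbors' forecasts:
    # a large spread means the neighbors disagree on the continuation
    d = neighbor_forecasts[:, None, :] - neighbor_forecasts[None, :, :]
    feats.append(np.sqrt((d ** 2).sum(axis=-1)).mean())
    return np.array(feats)
```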
Experimental evaluation • Website impressions
Experimental evaluation • Website impressions • Computing the optimal single predictors • Comparison with the accuracy of the ensemble approach
Experimental evaluation • Bias-variance improvement • We compute the bias² and variance terms in the error decomposition for forecasts h = 100 steps ahead • The statistics are recorded over 50 random subsamples of the original training set (see the decomposition sketch below)
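A decomposition sketch under the usual convention: for a fixed query, collect the h-step forecasts produced by models trained on each random subsample, then average bias² and variance over the horizon:

```python
import numpy as np

def bias_variance(forecasts, y_true):
    """`forecasts`: array of shape (n_subsamples, h), one h-step forecast
    per random training subsample; `y_true`: the true continuation (h,)."""
    forecasts = np.asarray(forecasts, dtype=float)
    mean_f = forecasts.mean(axis=0)
    bias2 = np.mean((mean_f - np.asarray(y_true)) ** 2)  # squared bias over horizon
    variance = np.mean(forecasts.var(axis=0))            # variance over horizon
    return bias2, variance
```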
Conclusions and future directions • The proposed technique significantly improves the prediction accuracy of the single NN forecasting models • It outlines a principled solution to the bias-variance problem of NN forecasts • It is a data-specific rather than a generic approach • Combining more models and varying other parameters would require selecting different features Thank you!