Model Selection and Assessment Using Cross-indexing
Juha Reunanen, ABB, Web Imaging Systems, Finland
Model Selection Using Cross-Validation
• Choose a search algorithm – for example: hill-climbing, grid search, genetic algorithm
• Evaluate the models using cross-validation
• Select the model that gives the best CV score
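A minimal sketch of the procedure on this slide, assuming scikit-learn, a synthetic data set, and a decision-tree depth as the hyper-parameter being compared (all illustrative choices, not from the original slides); the grid over depths merely stands in for whatever search algorithm is used:

    # Pick the candidate model (here: tree depth) with the best cross-validation score.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    candidate_depths = [1, 2, 3, 5, 8, 12]
    cv_score = {d: cross_val_score(DecisionTreeClassifier(max_depth=d, random_state=0),
                                   X, y, cv=5).mean()
                for d in candidate_depths}
    best_depth = max(cv_score, key=cv_score.get)   # model with the best CV score
    print(best_depth, cv_score[best_depth])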
Multiple-Comparison Procedure (D. D. Jensen and P. R. Cohen: Multiple Comparisons in Induction Algorithms, Machine Learning, volume 38, pages 309–338, 2000)
• Example: Choosing an investment advisor
• Criterion: Predict the stock market change (+/–) correctly for 11 out of 14 days
• You evaluate 10 candidates; your friend evaluates 30 candidates
• If everyone is just guessing, the probability that you accept someone is 0.253; for your friend it is 0.583
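The slide's acceptance probabilities can be reproduced (up to rounding) with a few lines of arithmetic; the sketch below assumes each advisor guesses independently with probability 0.5 per day:

    # Probability that at least one purely guessing advisor passes the
    # "at least 11 correct out of 14 days" test.
    from math import comb

    p_pass = sum(comb(14, k) for k in range(11, 15)) / 2**14  # one advisor: ~0.0287
    p_accept_you = 1 - (1 - p_pass)**10     # 10 candidates: ~0.2525
    p_accept_friend = 1 - (1 - p_pass)**30  # 30 candidates: ~0.5824
    print(p_pass, p_accept_you, p_accept_friend)

Evaluating more candidates makes it more likely that a useless advisor looks good purely by chance – the multiple-comparison effect.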
The Problem
• Overfitting on the first level of inference: increasing model complexity may decrease the training error while the test error goes up
• Overfitting on the second level of inference: making the search more intensive may decrease the CV error estimate even while the actual test error goes up
Overfitting Visualized
[Figure not reproduced; x-axis: Model Complexity, or Number of Models Evaluated]
Solutions
• First level of inference:
  • Regularization – penalize complex models
  • Model selection – welcome to the second level...
• Second level of inference:
  • Regularization! (G. C. Cawley and N. L. C. Talbot: Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841–861, 2007)
  • Another layer of (cross-)validation...
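As a small illustration of the first bullet only (penalizing complex models on the first level of inference), the sketch below uses a ridge penalty in scikit-learn; the data set and penalty strength are arbitrary, and the second-level Bayesian regularisation of Cawley and Talbot is not shown here:

    # First-level regularization: a ridge penalty shrinks the coefficients of an
    # otherwise overly flexible linear model.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge

    X, y = make_regression(n_samples=100, n_features=50, noise=5.0, random_state=0)
    model = Ridge(alpha=10.0).fit(X, y)   # larger alpha = stronger complexity penalty
    print(abs(model.coef_).mean())        # penalized coefficients stay comparatively small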
Another Layer of Validation
• The validation estimates have a lot of variance, so the estimate for the winning model still gets biased (in the MCP sense)
• Cross-validation makes the estimates smoother, but does not remove the problem
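A minimal sketch of such an extra layer of (cross-)validation, assuming scikit-learn: the inner loop performs the selection, and the outer loop assesses whatever the inner loop picked, so the winner is never scored on data that influenced its selection. The grid of tree depths is again only an illustrative stand-in for the search:

    # Nested cross-validation: inner loop selects, outer loop assesses.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    inner_search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                                param_grid={"max_depth": [1, 2, 3, 5, 8, 12]},
                                cv=4)                         # inner loop: selection
    outer_scores = cross_val_score(inner_search, X, y, cv=5)  # outer loop: assessment
    print(outer_scores.mean(), outer_scores.std())            # the variance the slide mentions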
The Cross-indexing Trick
• Assume an outer loop of cross-validation using five folds
• Use (for example) three folds to determine the best depth, and the remaining two to assess it
• This essentially removes the multiple-comparison effect
• Revolve the fold roles, and average (or create an ensemble)
• Previously shown to work in feature selection (Juha Reunanen: Less Biased Measurement of Feature Selection Benefits, SLSFS 2005, LNCS 3940, pages 198–208, 2006)
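The following sketch illustrates the cross-indexing idea as stated on this slide (five folds; three for selection, two for assessment; revolve and average). It assumes scikit-learn, a synthetic data set, and tree depth as the quantity being chosen, and it is an illustration rather than the author's competition implementation:

    # One 5-fold CV run scores every candidate depth on every fold; three fold
    # indices pick the best depth, the other two assess it, and the roles revolve.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    depths = [1, 2, 3, 5, 8, 12]
    folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

    # scores[i, f]: accuracy on fold f of a depth-depths[i] tree trained on the other folds
    scores = np.array([[DecisionTreeClassifier(max_depth=d, random_state=0)
                        .fit(X[tr], y[tr]).score(X[te], y[te])
                        for tr, te in folds] for d in depths])

    estimates = []
    for shift in range(5):                                 # revolve the fold roles
        sel = [(shift + i) % 5 for i in range(3)]          # selection folds
        ass = [(shift + i) % 5 for i in range(3, 5)]       # assessment folds
        best = scores[:, sel].mean(axis=1).argmax()        # depth chosen on sel folds only
        estimates.append(scores[best, ass].mean())         # assessed on the remaining folds
    print(np.mean(estimates))

Because the assessment folds never take part in choosing the winner, their scores are not inflated by the multiple-comparison effect.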
Competition Entries
• Stochastic search guided by cross-validation
• Several candidate models, with the corresponding search processes running in pseudo-parallel: Prepro+naiveBayes, PCA+kernelRidge, GS+kernelRidge, Prepro+linearSVC, Prepro+nonlinearSVC, Relief+neuralNet, RF, and Boosting (with neuralNet, SVC and kernelRidge)
• Final selection and assessment using the cross-indexing criterion
Milestone Results
• Agnostic learning ranks as of December 1st, 2006 (ranking table not reproduced; yellow entries marked CLOP models)
• CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER)
• Best ave. BER held by Reference (Gavin Cawley) with "the bad"
Conclusions
• Because of multiple-comparison procedures (MCPs) on the different levels of inference, a separate validation step is often used to estimate the final performance
• On the second level, the cross-indexing trick may give estimates that are less biased than those from a straightforward outer loop of cross-validation