
Understanding Bayesian Model Comparison in Regression



Presentation Transcript


  1. Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by Yung-Kyun Noh, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/

  2. Contents • 3.4 Bayesian Model Comparison • 3.5 The Evidence Approximation • 3.5.1 Evaluation of the evidence function • 3.5.2 Maximizing the evidence function • 3.5.3 Effective number of parameters • 3.6 Limitations of Fixed Basis Functions (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  3. Bayesian Model Comparison (1/3) • The problem of model selection from a Bayesian perspective. • Over-fitting associated with maximum likelihood can be avoided by marginalizing over the model parameters instead of making point estimates of their values. • It also allows multiple complexity parameters to be determined simultaneously as part of the training process (as in the relevance vector machine). • The Bayesian view of model comparison simply involves the use of probabilities to represent uncertainty in the choice of model. • Posterior: p(M_i|D) ∝ p(M_i) p(D|M_i). • p(M_i): prior, expressing a preference for different models. • p(D|M_i): model evidence (marginal likelihood), expressing the preference shown by the data for different models; the parameters have been marginalized out.

  4. Bayesian Model Comparison (2/3) • Bayes factor: the ratio of model evidences for two models, p(D|M_i) / p(D|M_j). • Predictive distribution: a mixture distribution, obtained by averaging the per-model predictive distributions weighted by the posterior model probabilities: p(t|x, D) = Σ_i p(t|x, M_i, D) p(M_i|D). • Model evidence: p(D|M_i) = ∫ p(D|w, M_i) p(w|M_i) dw. • Sampling perspective: the marginal likelihood can be viewed as the probability of generating the data set D from a model whose parameters are sampled at random from the prior. • Posterior distribution over parameters: p(w|D, M_i) = p(D|w, M_i) p(w|M_i) / p(D|M_i). • The evidence is the normalizing term that appears in the denominator when evaluating this posterior distribution over parameters.

  5. Bayesian Model Comparison (3/3) • Assume that the posterior distribution is sharply peaked around the most probable value w_MAP, with width Δw_posterior, and that the prior is flat with width Δw_prior; then ln p(D) ≃ ln p(D|w_MAP) + ln(Δw_posterior / Δw_prior). • For a model having a set of M parameters, ln p(D) ≃ ln p(D|w_MAP) + M ln(Δw_posterior / Δw_prior), so the (negative) Occam penalty grows with M. • A simple model has little variability and so will generate data sets that are fairly similar to each other. • A complex model spreads its predictive probability over too broad a range of data sets and so assigns relatively small probability to any one of them.
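The trade-off on this slide can be illustrated with a small numeric sketch of the approximation ln p(D) ≃ ln p(D|w_MAP) + M ln(Δw_posterior/Δw_prior). The best-fit log likelihoods and the width ratio below are made-up illustrative values, not taken from the book:

```python
import numpy as np

# Hypothetical best-fit log likelihoods for models with M = 1, 5, 20 parameters:
# richer models fit the data better, but the improvement saturates.
best_fit_log_lik = {1: -50.0, 5: -20.0, 20: -18.0}
width_ratio = 0.1  # assumed posterior-to-prior width ratio per parameter

# Occam penalty: M * ln(dw_post / dw_prior) < 0, so it grows with M.
approx_log_evidence = {M: ll + M * np.log(width_ratio)
                       for M, ll in best_fit_log_lik.items()}
best_M = max(approx_log_evidence, key=approx_log_evidence.get)
```

The model of intermediate complexity wins: the simplest model pays in data fit, the most complex one pays in Occam penalty.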

  6. The Evidence Approximation (1/2) • Fully Bayesian treatment of the linear basis function model. • Hyperparameters: α, β. • Prediction: marginalize w.r.t. the hyperparameters as well as w. • Predictive distribution: p(t|t) = ∫∫∫ p(t|w, β) p(w|t, α, β) p(α, β|t) dw dα dβ. • If the posterior distribution p(α, β|t) is sharply peaked around values α̂, β̂, the predictive distribution is obtained simply by marginalizing over w with α, β fixed to the values α̂, β̂.

  7. The Evidence Approximation (2/2) • If the prior p(α, β) is relatively flat, then in the evidence framework the values α̂, β̂ are obtained by maximizing the marginal likelihood function p(t|α, β). • The hyperparameters can thus be determined from the training data alone (without recourse to cross-validation). • Recall that the ratio α/β is analogous to a regularization parameter. • Maximizing the evidence: set the evidence function's derivatives to zero and iterate the resulting re-estimation equations for α, β. • Alternatively, use the expectation-maximization (EM) algorithm.

  8. Evaluation of the Evidence Function • Marginal likelihood (model evidence): p(t|α, β) = ∫ p(t|w, β) p(w|α) dw. • In closed form: ln p(t|α, β) = (M/2) ln α + (N/2) ln β − E(m_N) − (1/2) ln|A| − (N/2) ln(2π), where A = αI + βΦᵀΦ, m_N = βA⁻¹Φᵀt, and E(m_N) = (β/2)‖t − Φm_N‖² + (α/2) m_Nᵀm_N.
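The closed-form evaluation on this slide is straightforward to code. The sketch below is a minimal NumPy version; the function name is mine, but the formula follows the slide:

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) for the linear-Gaussian model:
    (M/2) ln a + (N/2) ln b - E(m_N) - (1/2) ln|A| - (N/2) ln(2*pi)."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi       # A = alpha*I + beta*Phi^T Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)       # posterior mean of w
    E_mN = (0.5 * beta * np.sum((t - Phi @ m_N) ** 2)
            + 0.5 * alpha * m_N @ m_N)               # E(m_N)
    _, logdetA = np.linalg.slogdet(A)                # A is positive definite
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E_mN - 0.5 * logdetA - 0.5 * N * np.log(2 * np.pi))
```

Evaluating log_evidence for design matrices of different sizes ranks models as in Section 3.4, without a held-out validation set.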

  9. Maximizing the Evidence Function • Maximization of ln p(t|α, β): set the derivatives w.r.t. α, β to zero. • w.r.t. α: α = γ / (m_Nᵀm_N), where γ = Σ_i λ_i / (α + λ_i). • u_i and λ_i are the eigenvectors and eigenvalues defined by (βΦᵀΦ) u_i = λ_i u_i. • These are implicit equations in the hyperparameters, solved by iterating the re-estimation formulas. • w.r.t. β: 1/β = (1/(N − γ)) Σ_n (t_n − m_Nᵀφ(x_n))².
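Since the stationarity conditions are implicit in α and β, they are solved by iteration. The sketch below alternates the γ, α, and β updates for a fixed design matrix; the function name and the fixed iteration count are my own choices:

```python
import numpy as np

def maximize_evidence(Phi, t, alpha=1.0, beta=1.0, n_iter=100):
    """Iterate the re-estimation equations
    gamma = sum_i lam_i/(alpha + lam_i),  alpha = gamma / (m_N^T m_N),
    1/beta = sum_n (t_n - m_N^T phi(x_n))^2 / (N - gamma)."""
    N, M = Phi.shape
    eig = np.linalg.eigvalsh(Phi.T @ Phi)         # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        lam = beta * eig                          # eigenvalues of beta * Phi^T Phi
        gamma = np.sum(lam / (alpha + lam))       # effective number of parameters
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)
        alpha = gamma / (m_N @ m_N)
        beta = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)
    return alpha, beta, m_N
```

On data with known noise level, the recovered β should approach the true noise precision, which is how the method sets the regularization strength from the training data alone.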

  10. Effective Number of Parameters (1/2) • γ = Σ_i λ_i / (α + λ_i): the effective total number of well-determined parameters. • Directions with λ_i ≫ α are well determined by the data and contribute approximately 1 to γ; directions with λ_i ≪ α are dominated by the prior and contribute approximately 0.
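A quick numeric illustration of how γ interpolates between 0 and M as α varies; the eigenvalues below are made-up values, not from the text:

```python
import numpy as np

# hypothetical eigenvalues of beta * Phi^T Phi for a model with M = 3 parameters
lam = np.array([10.0, 1.0, 0.01])

def gamma(alpha):
    # each well-determined direction (lam_i >> alpha) contributes ~1 to gamma,
    # each poorly determined direction (lam_i << alpha) contributes ~0
    return np.sum(lam / (alpha + lam))
```

For α → 0 all M parameters are well determined (γ → M); for large α the prior dominates every direction and γ → 0.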

  11. Effective Number of Parameters (2/2) • [Figure: log evidence and test error plotted against α, each marked at the optimal α.]

  12. Limitations of Fixed Basis Functions • Models comprising a linear combination of fixed, nonlinear basis functions: • have closed-form solutions to the least-squares problem; • have a tractable Bayesian treatment. • The difficulty: the basis functions are fixed before the training data set is observed; this is a manifestation of the curse of dimensionality. • Properties of real data sets that alleviate this problem: • the data vectors {x_n} typically lie close to a nonlinear manifold whose intrinsic dimensionality is smaller than that of the input space; • the target variables may have significant dependence on only a small number of possible directions within the data manifold.
