
Simple Bayesian Supervised Models

Simple Bayesian Supervised Models. Saskia Klein & Steffen Bollmann. Content: Recap from last week. Bayesian Linear Regression: What is linear regression? Application of Bayesian theory to linear regression. Example. Comparison to conventional linear regression.

Presentation Transcript


  1. Simple Bayesian Supervised Models. Saskia Klein & Steffen Bollmann

  2. Content • Recap from last week • Bayesian Linear Regression • What is linear regression? • Application of Bayesian theory to linear regression • Example • Comparison to conventional linear regression • Bayesian Logistic Regression • Naive Bayes classifier • Source: Bishop (ch. 3, 4); Barber (ch. 10)

  3. Maximum a posteriori estimation • The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution. • It allows prior information to be taken into account. • posterior = likelihood × prior / evidence
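
The relation behind the labels on this slide is Bayes' theorem. As a sketch, with $\theta$ for the parameters and $\mathcal{D}$ for the observations (both symbols are notation assumed here, not taken from the slides), the posterior and the resulting MAP estimate can be written as:

```latex
\underbrace{p(\theta \mid \mathcal{D})}_{\text{posterior}}
  = \frac{\overbrace{p(\mathcal{D} \mid \theta)}^{\text{likelihood}}\;
          \overbrace{p(\theta)}^{\text{prior}}}
         {\underbrace{p(\mathcal{D})}_{\text{evidence}}},
\qquad
\theta_{\text{MAP}} = \arg\max_{\theta}\; p(\mathcal{D} \mid \theta)\, p(\theta)
```

The evidence does not depend on $\theta$, so it can be dropped when only the maximizer is needed.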

  4. Conjugate prior • In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. • For any member of the exponential family, there exists a conjugate prior that can be written in the form $p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{T} \chi\}$ • Important conjugate pairs (likelihood – prior) include: Binomial – Beta; Multinomial – Dirichlet; Gaussian – Gaussian (for mean); Gaussian – Gamma (for precision); Exponential – Gamma
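
To make the Binomial – Beta pair concrete, here is a minimal sketch of a conjugate update. The prior parameters and the counts are hypothetical numbers chosen for illustration; the function names are not from the slides.

```python
# Beta-Binomial conjugate update: a Beta(a, b) prior on the success
# probability combined with binomial data gives a Beta posterior.

def beta_binomial_update(a, b, successes, failures):
    """Return the parameters of the Beta posterior."""
    return a + successes, b + failures

def beta_mode(a, b):
    """Mode of Beta(a, b), valid for a > 1 and b > 1 (the MAP estimate)."""
    return (a - 1) / (a + b - 2)

# Hypothetical numbers: prior Beta(2, 2), then 7 successes and 3 failures.
a_post, b_post = beta_binomial_update(2, 2, successes=7, failures=3)
map_estimate = beta_mode(a_post, b_post)   # (9 - 1) / (9 + 5 - 2) = 8 / 12
ml_estimate = 7 / (7 + 3)                  # maximum likelihood ignores the prior

print(a_post, b_post)            # 9 5
print(round(map_estimate, 3))    # 0.667
print(ml_estimate)               # 0.7
```

The posterior is again a Beta distribution, which is what conjugacy means here, and the MAP estimate is pulled from the maximum likelihood value towards the prior mode of 0.5.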

  5. Linear Regression • goal: predict the value of a target variable $t$ given the value of a D-dimensional vector $\mathbf{x}$ of input variables • linear regression models: linear functions of the adjustable parameters

  6. Linear Regression • Training • $\{\mathbf{x}_n\}$ … training data set comprising $N$ observations, where $n = 1, \dots, N$ • $\{t_n\}$ … corresponding target values • compute the weights $\mathbf{w}$ • Prediction • goal: predict the value of $t$ for a new value of $\mathbf{x}$ • = model the predictive distribution $p(t \mid \mathbf{x})$ • and make predictions of $t$ in such a way as to minimize the expected value of a loss function

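  7. Examples of linear regression models • simplest linear regression model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D$ • linear function of the weights/parameters and the data • linear regression models using basis functions $\phi_j(\mathbf{x})$: $y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x})$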
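
A small sketch of what "linear in the weights" means once basis functions are used. The polynomial and Gaussian bases, the weights, and all function names below are illustrative assumptions, not part of the slides.

```python
import math

def polynomial_basis(x, degree):
    """phi_j(x) = x**j for j = 0..degree; phi_0 = 1 acts as the bias term."""
    return [x ** j for j in range(degree + 1)]

def gaussian_basis(x, centres, width):
    """Bias term plus Gaussian bumps exp(-(x - mu)^2 / (2 * width^2))."""
    return [1.0] + [math.exp(-(x - mu) ** 2 / (2 * width ** 2)) for mu in centres]

def predict(w, phi):
    """y(x, w) = w^T phi(x): linear in w even when phi is non-linear in x."""
    return sum(w_j * phi_j for w_j, phi_j in zip(w, phi))

w = [1.0, -2.0, 0.5]                          # hypothetical weights
y = predict(w, polynomial_basis(2.0, 2))      # 1 - 4 + 2 = -1.0
features = gaussian_basis(0.0, [0.0], 1.0)    # [1.0, 1.0]
```

Swapping the basis changes the shape of the fitted function, while the model stays linear in `w`, which is what keeps the Bayesian treatment in closed form.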

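  8. Bayesian Linear Regression • model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$ • $t$ … target variable • $y$ … model • $\mathbf{x}$ … data • $\mathbf{w}$ … weights/parameters • $\epsilon$ … additive Gaussian noise: $p(\epsilon) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})$ with zero mean and precision (inverse variance) $\beta$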

  9. Maximum a posteriori estimation • The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution. • It allows prior information to be taken into account. • posterior = likelihood × prior / evidence

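  10. Bayesian Linear Regression - Likelihood • likelihood function: $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$ • observation of N training data points with inputs $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ and target values $\mathbf{t} = \{t_1, \dots, t_N\}$ (independently drawn from the distribution)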
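
As a sketch, the logarithm of this likelihood can be evaluated directly for a straight-line model $y = w_0 + w_1 x$. The data, the weights, and the noise precision below are hypothetical; the model with $\boldsymbol{\phi}(x) = (1, x)$ is an assumption made to keep the example short.

```python
import math

def log_likelihood(w, beta, inputs, targets):
    """ln p(t | X, w, beta) for t_n = w0 + w1 * x_n + Gaussian noise."""
    n = len(targets)
    sq_err = sum((t - (w[0] + w[1] * x)) ** 2 for x, t in zip(inputs, targets))
    return (0.5 * n * math.log(beta)
            - 0.5 * n * math.log(2 * math.pi)
            - 0.5 * beta * sq_err)

inputs = [0.0, 1.0, 2.0]
targets = [1.0, 3.0, 5.0]          # generated without noise from t = 1 + 2x

ll_true = log_likelihood([1.0, 2.0], 1.0, inputs, targets)
ll_other = log_likelihood([0.0, 2.0], 1.0, inputs, targets)
```

Because the observations are independent, the product over data points becomes a sum of squared errors in the log domain, so maximizing the likelihood over `w` is the same as least squares.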

  11. Maximum a posteriori estimation • The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution. • It allows prior information to be taken into account. • posterior = likelihood × prior / evidence

  12. Conjugate prior • In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. • For any member of the exponential family, there exists a conjugate prior that can be written in the form $p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{T} \chi\}$ • Important conjugate pairs (likelihood – prior) include: Binomial – Beta; Multinomial – Dirichlet; Gaussian – Gaussian (for mean); Gaussian – Gamma (for precision); Exponential – Gamma

  13. Bayesian Linear Regression - Prior • prior probability distribution over the model parameters $\mathbf{w}$ • conjugate prior: Gaussian distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$ • mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$

  14. Maximum a posteriori estimation • The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution. • It allows prior information to be taken into account. • posterior = likelihood × prior / evidence

  15. Bayesian Linear Regression – Posterior Distribution • due to the conjugate prior, the posterior will also be Gaussian (derivation: Bishop p.112)
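
A minimal sketch of the Gaussian posterior for a straight-line model. It assumes the standard result $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{T} \boldsymbol{\Phi}$ and $\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^{T} \mathbf{t})$, specialised to a zero-mean isotropic prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$ and $\boldsymbol{\phi}(x) = (1, x)$. The prior choice, the data, and the values of `alpha` and `beta` are assumptions for illustration.

```python
def posterior(inputs, targets, alpha, beta):
    """Posterior N(w | m_N, S_N) for y = w0 + w1 * x with prior N(w | 0, I / alpha)."""
    # The design matrix Phi has rows (1, x_n); accumulate Phi^T Phi and Phi^T t.
    n = len(inputs)
    sx = sum(inputs)
    sxx = sum(x * x for x in inputs)
    st = sum(targets)
    sxt = sum(x * t for x, t in zip(inputs, targets))
    # S_N^{-1} = alpha * I + beta * Phi^T Phi  (2 x 2 and symmetric)
    a = alpha + beta * n
    b = beta * sx
    d = alpha + beta * sxx
    det = a * d - b * b
    s_n = [[d / det, -b / det], [-b / det, a / det]]
    # m_N = beta * S_N Phi^T t, because the prior mean is zero
    m_n = [beta * (s_n[0][0] * st + s_n[0][1] * sxt),
           beta * (s_n[1][0] * st + s_n[1][1] * sxt)]
    return m_n, s_n

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]     # hypothetical data from t = 1 + 2x

m_weak, s_weak = posterior(inputs, targets, alpha=1e-6, beta=25.0)
m_strong, s_strong = posterior(inputs, targets, alpha=100.0, beta=25.0)
```

With a nearly flat prior the posterior mean is close to the least squares solution (1, 2); with a strong prior it is shrunk towards the prior mean of zero.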

  16. Example Linear Regression • MATLAB

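  17. Predictive Distribution • making predictions of $t$ for new values of $\mathbf{x}$ • predictive distribution: $p(t \mid \mathbf{x}, \mathbf{t}) = \mathcal{N}(t \mid \mathbf{m}_N^{T} \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$, where $\mathbf{m}_N$ and $\mathbf{S}_N$ are the mean and covariance of the Gaussian posterior over the weights • variance of the distribution: $\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^{T} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$ • first term represents the noise in the data • second term reflects the uncertainty associated with the parameters • optimal prediction, for a new value of $\mathbf{x}$, would be the conditional mean of the target variable: $\mathbb{E}[t \mid \mathbf{x}] = \int t\, p(t \mid \mathbf{x})\, \mathrm{d}t$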
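
The two variance terms can be sketched numerically. The posterior mean `m_n`, the covariance `s_n`, and the precision `beta` below are hypothetical values, and the model $\boldsymbol{\phi}(x) = (1, x)$ is an assumption made for brevity.

```python
def predictive(x, m_n, s_n, beta):
    """Mean and variance of the Gaussian predictive distribution for phi(x) = (1, x)."""
    phi = [1.0, x]
    mean = m_n[0] * phi[0] + m_n[1] * phi[1]
    # phi^T S_N phi: the part of the variance caused by uncertain weights
    param_var = sum(phi[i] * s_n[i][j] * phi[j] for i in range(2) for j in range(2))
    # 1 / beta: the part of the variance caused by noise in the data
    return mean, 1.0 / beta + param_var

m_n = [1.0, 2.0]                              # hypothetical posterior mean
s_n = [[0.028, -0.012], [-0.012, 0.008]]      # hypothetical posterior covariance
beta = 25.0                                   # hypothetical noise precision

mean_near, var_near = predictive(1.5, m_n, s_n, beta)    # inside the data range
mean_far, var_far = predictive(10.0, m_n, s_n, beta)     # far outside it
```

The noise term is the same everywhere, while the parameter term grows as the query point moves away from the region where data was observed, so `var_far` is larger than `var_near`.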

  18. Common Problem in Linear Regression: Overfitting / Model Complexity • Least squares approach (maximizing the likelihood): • point estimate of the weights • Regularization: the regularization term and its coefficient need to be chosen • Cross-validation: requires large datasets and high computational power • Bayesian approach: • distribution over the weights • requires a good prior • model comparison: computationally demanding, but validation data is not required
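
The link between the two approaches can be sketched for one specific case. Assuming a zero-mean isotropic Gaussian prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$ and noise precision $\beta$ (the prior choice and the symbols $\alpha$, $\lambda$ are assumptions here), maximizing the log posterior is the same as minimizing a sum-of-squares error with a quadratic regularizer:

```latex
\ln p(\mathbf{w} \mid \mathbf{t})
  = -\frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\}^{2}
    - \frac{\alpha}{2}\, \mathbf{w}^{T} \mathbf{w} + \text{const}
\quad\Longrightarrow\quad
\mathbf{w}_{\text{MAP}}
  = \arg\min_{\mathbf{w}} \; \frac{1}{2} \sum_{n=1}^{N} \bigl\{ t_n - \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x}_n) \bigr\}^{2}
    + \frac{\lambda}{2}\, \mathbf{w}^{T} \mathbf{w},
\qquad \lambda = \frac{\alpha}{\beta}
```

In this reading the regularization coefficient is not a free tuning knob but the ratio of prior precision to noise precision.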

  19. From Regression to Classification • for regression problems: • the target variable $\mathbf{t}$ was a vector of real numbers whose values we wish to predict • in case of classification: • target values represent class labels • two-class problem: $t \in \{0, 1\}$ • K > 2 classes: 1-of-K coding, e.g. a pattern from class 2 has $t_2 = 1$ and all other elements of $\mathbf{t}$ equal to 0

  20. Classification • goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes $C_k$ • decision boundary

  21. Bayesian Logistic Regression • model the class-conditional densities $p(\mathbf{x} \mid C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem: $p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$
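
A sketch of why this leads to the logistic function. For two classes with one-dimensional Gaussian class-conditional densities that share a variance (an assumption made for this example, as are all numbers below), the posterior from Bayes' theorem equals a logistic sigmoid of a linear function of the input.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def posterior_c1(x, mean1, mean2, var, prior1):
    """p(C1 | x) from Bayes' theorem for two classes."""
    joint1 = gaussian_pdf(x, mean1, var) * prior1
    joint2 = gaussian_pdf(x, mean2, var) * (1.0 - prior1)
    return joint1 / (joint1 + joint2)

# Hypothetical class means, shared variance, and class prior.
mean1, mean2, var, prior1 = 1.0, -1.0, 1.0, 0.5
x = 0.3

p_bayes = posterior_c1(x, mean1, mean2, var, prior1)

# With a shared variance the log-odds are linear in x: a = w * x + w0.
w = (mean1 - mean2) / var
w0 = (mean2 ** 2 - mean1 ** 2) / (2 * var) + math.log(prior1 / (1.0 - prior1))
p_sigmoid = sigmoid(w * x + w0)
```

Both routes give the same probability, which is why logistic regression can model the posterior class probability directly as a sigmoid of a linear function.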

  22. Bayesian Logistic Regression • exact Bayesian inference for logistic regression is intractable • Laplace approximation • aims to find a Gaussian approximation to a probability density defined over a set of continuous variables • the posterior distribution is approximated around its mode $\mathbf{w}_{\text{MAP}}$
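
A one-dimensional sketch of the Laplace approximation: find the mode of the log density with Newton's method, then use the negative inverse curvature at the mode as the variance of the Gaussian. The target density below is a hypothetical unnormalised example, not the logistic regression posterior, where the same steps are applied to the weight vector and its Hessian.

```python
import math

def laplace_approximation(dlogp, d2logp, z0, steps=50):
    """Return (mode, variance) of the Gaussian fitted at the mode of ln p(z)."""
    z = z0
    for _ in range(steps):
        z = z - dlogp(z) / d2logp(z)    # Newton step towards the stationary point
    variance = -1.0 / d2logp(z)         # inverse of the negative curvature at the mode
    return z, variance

# Hypothetical unnormalised target: ln p(z) = a * ln(z) - b * z for z > 0.
a, b = 4.0, 2.0
mode, variance = laplace_approximation(
    dlogp=lambda z: a / z - b,
    d2logp=lambda z: -a / z ** 2,
    z0=1.0,
)

# Analytic check for this target: mode = a / b, variance = a / b**2.
expected_mode, expected_variance = a / b, a / b ** 2
```

The approximation only uses local information at the mode, so it can be poor for skewed or multimodal densities.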

  23. Example • Barber: DemosExercises\demoBayesLogRegression.m

  24. Example • Barber: DemosExercises\demoBayesLogRegression.m

  25. Naive Bayes classifier • Why naive? • strong independence assumptions • assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable • ignores relations between features and assumes that all features contribute independently to the probability of a class [http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
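
A minimal sketch of a naive Bayes classifier for binary features, to show where the independence assumption enters. The tiny data set, the class names, and the use of Laplace smoothing are assumptions for illustration.

```python
from collections import defaultdict

def train(samples, labels, smoothing=1.0):
    """Estimate p(class) and p(feature_i = 1 | class) with Laplace smoothing."""
    n_features = len(samples[0])
    counts = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for x, c in zip(samples, labels):
        counts[c] += 1
        for i, v in enumerate(x):
            ones[c][i] += v
    priors = {c: counts[c] / len(labels) for c in counts}
    cond = {c: [(ones[c][i] + smoothing) / (counts[c] + 2 * smoothing)
                for i in range(n_features)]
            for c in counts}
    return priors, cond

def posterior(x, priors, cond):
    """p(class | x), treating the features as independent given the class."""
    scores = {}
    for c in priors:
        p = priors[c]
        for i, v in enumerate(x):
            # The naive step: multiply per-feature probabilities.
            p *= cond[c][i] if v else 1.0 - cond[c][i]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical data: two binary features per sample.
samples = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = ["spam", "spam", "ham", "ham"]

priors, cond = train(samples, labels)
result = posterior([1, 0], priors, cond)
```

The product inside `posterior` is the naive assumption: each feature contributes its own factor, and no interaction between features is modelled.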

  26. Thank you for your attention
