Simple Bayesian Supervised Models Saskia Klein & Steffen Bollmann
Content • Recap from last week • Bayesian Linear Regression • What is linear regression? • Application of Bayesian theory to linear regression • Example • Comparison to conventional linear regression • Bayesian Logistic Regression • Naive Bayes classifier • Sources: Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation • The Bayesian approach to estimating the parameters $\theta$ of a distribution, given a set of observations $X$, is to maximize the posterior distribution: $p(\theta|X) = \frac{p(X|\theta)\, p(\theta)}{p(X)}$, i.e. posterior = likelihood $\times$ prior / evidence • The MAP estimate is $\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta\, p(X|\theta)\, p(\theta)$ • This allows prior information to be taken into account
Conjugate prior • In general, for a given probability distribution $p(x|\eta)$, we can seek a prior $p(\eta)$ that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior • For any member of the exponential family, there exists a conjugate prior that can be written in the form $p(\eta|\chi,\nu) = f(\chi,\nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathrm{T}} \chi\}$ • Important conjugate pairs include: Binomial – Beta, Multinomial – Dirichlet, Gaussian – Gaussian (for the mean), Gaussian – Gamma (for the precision), Exponential – Gamma
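As a concrete illustration of conjugacy, here is a minimal MATLAB sketch (our own; the hyperparameters $a = b = 2$ and the coin-flip counts are assumed, not from the slides). A Beta prior on a coin's heads probability combined with a Binomial likelihood gives a Beta posterior whose parameters are simple count updates:

```matlab
% Beta-Binomial conjugacy: the posterior stays in the Beta family.
a = 2; b = 2;                 % Beta(a, b) prior hyperparameters (assumed)
N = 10; m = 7;                % observed data: m heads in N flips (assumed)
a_post = a + m;               % posterior is Beta(a + m, b + N - m)
b_post = b + N - m;
mu   = linspace(0, 1, 200);
bpdf = @(mu, a, b) exp((a-1).*log(mu) + (b-1).*log(1-mu) - betaln(a, b));
plot(mu, bpdf(mu, a, b), '--', mu, bpdf(mu, a_post, b_post), '-');
legend('prior Beta(2,2)', 'posterior Beta(9,5)'); xlabel('\mu');
```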
Linear Regression • goal: predict the value of a target variable $t$ given the value of a D-dimensional vector $\mathbf{x}$ of input variables • linear regression models: linear functions of the adjustable parameters, for example: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \ldots + w_D x_D$
Linear Regression • Training • $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ … training data set comprising $N$ observations • $\mathbf{t} = \{t_1, \ldots, t_N\}$ … corresponding target values • compute the weights $\mathbf{w}$ • Prediction • goal: predict the value of $t$ for a new value of $\mathbf{x}$ • = model the predictive distribution $p(t|\mathbf{x})$ • and make predictions of $t$ in such a way as to minimize the expected value of a loss function
Examples of linear regression models • simplest linear regression model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \ldots + w_D x_D$ • linear function of the weights/parameters $\mathbf{w}$ and of the data $\mathbf{x}$ • linear regression models using basis functions $\phi_j(\mathbf{x})$: $y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$
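To make the basis-function form concrete, a minimal MATLAB sketch (the Gaussian basis centres and width below are our own assumed values): the model is nonlinear in $x$ but stays linear in the weights $\mathbf{w}$, since the prediction is just the design matrix times $\mathbf{w}$:

```matlab
% Linear-in-the-parameters model: y = Phi * w, with Gaussian basis functions.
x  = linspace(-1, 1, 50)';                  % N = 50 scalar inputs
mu = linspace(-1, 1, 9);                    % 9 basis centres (assumed)
s  = 0.2;                                   % basis width (assumed)
Phi = [ones(size(x)), exp(-(x - mu).^2 ./ (2*s^2))];  % bias + 9 Gaussian bases
w = randn(10, 1);                           % any weight vector
y = Phi * w;                                % predictions are linear in w
```

(`x - mu` relies on MATLAB's implicit expansion, R2016b or later, to produce the 50 x 9 matrix of differences.)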
Bayesian Linear Regression • model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$ • $t$ … target variable • $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$ … model • $\mathbf{x}$ … data • $\mathbf{w}$ … weights/parameters • $\epsilon$ … additive Gaussian noise with zero mean and precision (inverse variance) $\beta$: $p(\epsilon) = \mathcal{N}(\epsilon\,|\,0, \beta^{-1})$
Bayesian Linear Regression – Likelihood • likelihood function: $p(\mathbf{t}\,|\,\mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n\,|\,\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$ • observation of $N$ training pairs of inputs $\mathbf{x}_n$ and target values $t_n$ (drawn independently from the distribution)
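Numerically, the log-likelihood is a sum of Gaussian log-densities of the residuals. A tiny self-contained sketch (the design matrix, weights and precision are assumed toy values, not from the slides):

```matlab
% Log-likelihood of targets under the model t_n ~ N(w'*phi(x_n), 1/beta).
beta = 25;                                   % noise precision (assumed)
Phi  = [ones(5,1), (1:5)'];                  % toy design matrix (assumed)
w    = [0.5; 1.0];                           % candidate weights (assumed)
t    = Phi*w + randn(5,1)/sqrt(beta);        % toy targets drawn from the model
resid  = t - Phi*w;
loglik = (5/2)*log(beta/(2*pi)) - (beta/2)*(resid'*resid)
```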
Bayesian Linear Regression – Prior • prior probability distribution over the model parameters $\mathbf{w}$ • conjugate prior: Gaussian distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,\mathbf{m}_0, \mathbf{S}_0)$ • with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
Bayesian Linear Regression – Posterior Distribution • due to the conjugate prior, the posterior will also be Gaussian (derivation: Bishop p. 112): $p(\mathbf{w}\,|\,\mathbf{t}) = \mathcal{N}(\mathbf{w}\,|\,\mathbf{m}_N, \mathbf{S}_N)$ • with $\mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t})$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}$
Example: Linear Regression • MATLAB
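The original MATLAB demo is not reproduced here; below is our own minimal reconstruction of such an example (all constants, i.e. $\alpha$, $\beta$ and the true weights, are assumed). It fits a straight line Bayesianly using the posterior equations above and evaluates the predictive distribution discussed on the next slide:

```matlab
% Bayesian fit of t = w0 + w1*x + noise on synthetic data (our own
% reconstruction, not the original course demo; all constants assumed).
rng(0);
beta   = 25;                     % noise precision, assumed known
alpha  = 2.0;                    % prior precision: p(w) = N(w | 0, alpha^-1 I)
w_true = [-0.3; 0.5];            % ground-truth weights

N = 20;
x = 2*rand(N,1) - 1;                         % inputs in [-1, 1]
t = w_true(1) + w_true(2)*x + randn(N,1)/sqrt(beta);

Phi = [ones(N,1), x];                        % design matrix
SN  = inv(alpha*eye(2) + beta*(Phi'*Phi));   % posterior covariance (Bishop 3.51)
mN  = beta * SN * (Phi' * t);                % posterior mean (Bishop 3.50, m0 = 0)

% Predictive distribution: mean mN'*phi(x), variance 1/beta + phi'*SN*phi
xs     = linspace(-1, 1, 100)';
Phis   = [ones(100,1), xs];
m_pred = Phis * mN;
v_pred = 1/beta + sum((Phis * SN) .* Phis, 2);   % diag(Phis*SN*Phis')
errorbar(xs, m_pred, sqrt(v_pred)); hold on; plot(x, t, 'o'); hold off;
```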
Predictive Distribution • making predictions of $t$ for new values of $\mathbf{x}$ • predictive distribution: $p(t\,|\,\mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}(t\,|\,\mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$ • variance of the distribution: $\sigma_N^2(\mathbf{x}) = \beta^{-1} + \boldsymbol{\phi}(\mathbf{x})^{\mathrm{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$ • the first term represents the noise in the data • the second term reflects the uncertainty associated with the parameters $\mathbf{w}$ • the optimal prediction for a new value of $\mathbf{x}$ would be the conditional mean of the target variable: $\mathbb{E}[t\,|\,\mathbf{x}] = \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$
Common Problem in Linear Regression: Overfitting / Model Complexity • least-squares approach (maximizing the likelihood): point estimate of the weights, prone to overfitting • regularization: the regularization term and its weighting value need to be chosen • cross-validation: requires large data sets and high computational power • Bayesian approach: • distribution over the weights • needs a good prior • model comparison: computationally demanding, but validation data not required
From Regression to Classification • for regression problems: the target variable $\mathbf{t}$ was a vector of real numbers whose values we wish to predict • in case of classification: target values represent class labels • two-class problem: binary target $t \in \{0, 1\}$ • $K > 2$ classes: 1-of-$K$ coding, e.g. for $K = 5$, a pattern from class 2 has the target vector $\mathbf{t} = (0, 1, 0, 0, 0)^{\mathrm{T}}$
Classification • goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes $C_k$ • the input space is divided into decision regions separated by decision boundaries
Bayesian Logistic Regression • model the class-conditional densities $p(\mathbf{x}\,|\,C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem: $p(C_k\,|\,\mathbf{x}) = \frac{p(\mathbf{x}\,|\,C_k)\, p(C_k)}{p(\mathbf{x})}$
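For two classes this posterior is a logistic sigmoid of the log odds, $p(C_1|\mathbf{x}) = \sigma(a)$ with $a = \ln\frac{p(\mathbf{x}|C_1)\,p(C_1)}{p(\mathbf{x}|C_2)\,p(C_2)}$ (Bishop eq. 4.57/4.58). A minimal MATLAB sketch with two assumed 1-D Gaussian class-conditionals and equal priors:

```matlab
% Class posterior from Bayes' theorem for two 1-D Gaussian class-conditionals
% (class means, variance and priors are assumed toy values).
gauss = @(x, m, s) exp(-(x - m).^2 ./ (2*s^2)) ./ sqrt(2*pi*s^2);
x  = linspace(-5, 5, 200);
j1 = gauss(x, -1, 1) * 0.5;      % joint p(x|C1) p(C1)
j2 = gauss(x,  2, 1) * 0.5;      % joint p(x|C2) p(C2)
post1 = j1 ./ (j1 + j2);         % p(C1|x): a logistic sigmoid in x
plot(x, post1); xlabel('x'); ylabel('p(C_1|x)');
```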
Bayesian Logistic Regression • exact Bayesian inference for logistic regression is intractable • Laplace approximation • aims to find a Gaussian approximation to a probability density defined over a set of continuous variables • the posterior distribution is approximated around its mode $\mathbf{w}_{\mathrm{MAP}}$: a Gaussian centred at the mode, with covariance given by the inverse Hessian of the negative log posterior
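A minimal sketch of the Laplace approximation for logistic regression (our own illustration, not the Barber demo referenced on the next slide; the synthetic data and the prior precision alpha are assumed). Newton-Raphson locates the mode of the posterior, and the inverse Hessian of the negative log posterior at the mode serves as the covariance of the approximating Gaussian:

```matlab
% Laplace approximation for Bayesian logistic regression (a sketch).
rng(0);
N = 40;
X = [randn(20,2) - 1; randn(20,2) + 1];      % two 2-D Gaussian clusters
y = [zeros(20,1); ones(20,1)];               % class labels
Phi = [ones(N,1), X];                        % add bias column
alpha = 1.0;                                 % prior: p(w) = N(0, alpha^-1 I)

w = zeros(3,1);
for it = 1:20                                % Newton-Raphson to find w_MAP
    p = 1 ./ (1 + exp(-Phi*w));              % sigmoid predictions
    g = Phi' * (p - y) + alpha * w;          % gradient of neg. log posterior
    R = diag(p .* (1 - p));
    H = Phi' * R * Phi + alpha * eye(3);     % Hessian of neg. log posterior
    w = w - H \ g;
end
SN = inv(H);    % Laplace: posterior approx. N(w | w_MAP, SN) with SN = H^-1
```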
Example • Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier • Why naive? • strong independence assumptions • assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable • ignores relations between features and assumes that all features contribute independently to a class [http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
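Under this assumption the class posterior factorises as $p(C_k\,|\,\mathbf{x}) \propto p(C_k) \prod_i p(x_i\,|\,C_k)$. A minimal Bernoulli naive Bayes sketch in MATLAB (the toy binary data set is our own assumption):

```matlab
% Bernoulli naive Bayes on a toy data set: each binary feature is
% modelled independently given the class.
Xtr = [1 1 0; 1 0 0; 0 1 1; 0 0 1];   % 4 samples, 3 binary features (assumed)
ytr = [1; 1; 2; 2];                   % class labels
prior = zeros(1,2); theta = zeros(2,3);
for k = 1:2
    idx = (ytr == k);
    prior(k)   = mean(idx);
    theta(k,:) = (sum(Xtr(idx,:), 1) + 1) ./ (sum(idx) + 2);  % Laplace smoothing
end
xnew = [1 1 0];
logpost = zeros(1,2);
for k = 1:2      % log p(C_k) + sum_i log p(x_i | C_k)
    logpost(k) = log(prior(k)) + ...
        sum(xnew.*log(theta(k,:)) + (1 - xnew).*log(1 - theta(k,:)));
end
[~, predictedClass] = max(logpost)    % predicts class 1 for this xnew
```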
Thank you for your attention Saskia Klein & Steffen Bollmann