Simple Bayesian Supervised Models Saskia Klein & Steffen Bollmann
Content • Recap from last week • Bayesian Linear Regression • What is linear regression? • Application of Bayesian theory to linear regression • Example • Comparison to conventional linear regression • Bayesian Logistic Regression • Naive Bayes classifier • Sources: Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation • The Bayesian approach to estimating the parameters $\theta$ of a distribution, given a set of observations $X$, is to maximize the posterior distribution: $p(\theta|X) = \frac{p(X|\theta)\, p(\theta)}{p(X)}$, i.e. posterior = likelihood $\times$ prior / evidence • The MAP estimate is $\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta\, p(X|\theta)\, p(\theta)$ • This allows prior information to be taken into account
Conjugate prior • In general, for a given probability distribution $p(x|\eta)$, we can seek a prior $p(\eta)$ that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior • For any member of the exponential family, there exists a conjugate prior that can be written in the form $p(\eta|\chi,\nu) = f(\chi,\nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathrm{T}} \chi\}$ • Important conjugate pairs include: Binomial – Beta, Multinomial – Dirichlet, Gaussian – Gaussian (for the mean), Gaussian – Gamma (for the precision), Exponential – Gamma
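As a concrete illustration of conjugacy, here is a minimal MATLAB sketch (our own; the hyperparameters $a = b = 2$ and the coin-flip counts are assumed, not from the slides). A Beta prior on a coin's heads probability combined with a Binomial likelihood gives a Beta posterior whose parameters are simple count updates:

```matlab
% Beta-Binomial conjugacy: the posterior stays in the Beta family.
a = 2; b = 2;                 % Beta(a, b) prior hyperparameters (assumed)
N = 10; m = 7;                % observed data: m heads in N flips (assumed)
a_post = a + m;               % posterior is Beta(a + m, b + N - m)
b_post = b + N - m;
mu   = linspace(0, 1, 200);
bpdf = @(mu, a, b) exp((a-1).*log(mu) + (b-1).*log(1-mu) - betaln(a, b));
plot(mu, bpdf(mu, a, b), '--', mu, bpdf(mu, a_post, b_post), '-');
legend('prior Beta(2,2)', 'posterior Beta(9,5)'); xlabel('\mu');
```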
Linear Regression • goal: predict the value of a target variable $t$ given the value of a D-dimensional vector $\mathbf{x}$ of input variables • linear regression models: linear functions of the adjustable parameters, for example: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \ldots + w_D x_D$
Linear Regression • Training • $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ … training data set comprising $N$ observations • $\mathbf{t} = \{t_1, \ldots, t_N\}$ … corresponding target values • compute the weights $\mathbf{w}$ • Prediction • goal: predict the value of $t$ for a new value of $\mathbf{x}$ • = model the predictive distribution $p(t|\mathbf{x})$ • and make predictions of $t$ in such a way as to minimize the expected value of a loss function
Examples of linear regression models • simplest linear regression model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \ldots + w_D x_D$ • linear function of the weights/parameters $\mathbf{w}$ and of the data $\mathbf{x}$ • linear regression models using basis functions $\phi_j(\mathbf{x})$: $y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$
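To make the basis-function form concrete, a minimal MATLAB sketch (the Gaussian basis centres and width below are our own assumed values): the model is nonlinear in $x$ but stays linear in the weights $\mathbf{w}$, since the prediction is just the design matrix times $\mathbf{w}$:

```matlab
% Linear-in-the-parameters model: y = Phi * w, with Gaussian basis functions.
x  = linspace(-1, 1, 50)';                  % N = 50 scalar inputs
mu = linspace(-1, 1, 9);                    % 9 basis centres (assumed)
s  = 0.2;                                   % basis width (assumed)
Phi = [ones(size(x)), exp(-(x - mu).^2 ./ (2*s^2))];  % bias + 9 Gaussian bases
w = randn(10, 1);                           % any weight vector
y = Phi * w;                                % predictions are linear in w
```

(`x - mu` relies on MATLAB's implicit expansion, R2016b or later, to produce the 50 x 9 matrix of differences.)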
Bayesian Linear Regression • model: $t = y(\mathbf{x}, \mathbf{w}) + \epsilon$ • $t$ … target variable • $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$ … model • $\mathbf{x}$ … data • $\mathbf{w}$ … weights/parameters • $\epsilon$ … additive Gaussian noise with zero mean and precision (inverse variance) $\beta$: $p(\epsilon) = \mathcal{N}(\epsilon\,|\,0, \beta^{-1})$
Bayesian Linear Regression – Likelihood • likelihood function: $p(\mathbf{t}\,|\,\mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n\,|\,\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$ • observation of $N$ training pairs of inputs $\mathbf{x}_n$ and target values $t_n$ (drawn independently from the distribution)
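Numerically, the log-likelihood is a sum of Gaussian log-densities of the residuals. A tiny self-contained sketch (the design matrix, weights and precision are assumed toy values, not from the slides):

```matlab
% Log-likelihood of targets under the model t_n ~ N(w'*phi(x_n), 1/beta).
beta = 25;                                   % noise precision (assumed)
Phi  = [ones(5,1), (1:5)'];                  % toy design matrix (assumed)
w    = [0.5; 1.0];                           % candidate weights (assumed)
t    = Phi*w + randn(5,1)/sqrt(beta);        % toy targets drawn from the model
resid  = t - Phi*w;
loglik = (5/2)*log(beta/(2*pi)) - (beta/2)*(resid'*resid)
```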
Bayesian Linear Regression – Prior • prior probability distribution over the model parameters $\mathbf{w}$ • conjugate prior: Gaussian distribution $p(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,\mathbf{m}_0, \mathbf{S}_0)$ • with mean $\mathbf{m}_0$ and covariance $\mathbf{S}_0$
Bayesian Linear Regression – Posterior Distribution • due to the conjugate prior, the posterior will also be Gaussian (derivation: Bishop p. 112): $p(\mathbf{w}\,|\,\mathbf{t}) = \mathcal{N}(\mathbf{w}\,|\,\mathbf{m}_N, \mathbf{S}_N)$ • with $\mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t})$ and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}$
Example: Linear Regression • MATLAB
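The original MATLAB demo is not reproduced here; below is our own minimal reconstruction of such an example (all constants, i.e. $\alpha$, $\beta$ and the true weights, are assumed). It fits a straight line Bayesianly using the posterior equations above and evaluates the predictive distribution discussed on the next slide:

```matlab
% Bayesian fit of t = w0 + w1*x + noise on synthetic data (our own
% reconstruction, not the original course demo; all constants assumed).
rng(0);
beta   = 25;                     % noise precision, assumed known
alpha  = 2.0;                    % prior precision: p(w) = N(w | 0, alpha^-1 I)
w_true = [-0.3; 0.5];            % ground-truth weights

N = 20;
x = 2*rand(N,1) - 1;                         % inputs in [-1, 1]
t = w_true(1) + w_true(2)*x + randn(N,1)/sqrt(beta);

Phi = [ones(N,1), x];                        % design matrix
SN  = inv(alpha*eye(2) + beta*(Phi'*Phi));   % posterior covariance (Bishop 3.51)
mN  = beta * SN * (Phi' * t);                % posterior mean (Bishop 3.50, m0 = 0)

% Predictive distribution: mean mN'*phi(x), variance 1/beta + phi'*SN*phi
xs     = linspace(-1, 1, 100)';
Phis   = [ones(100,1), xs];
m_pred = Phis * mN;
v_pred = 1/beta + sum((Phis * SN) .* Phis, 2);   % diag(Phis*SN*Phis')
errorbar(xs, m_pred, sqrt(v_pred)); hold on; plot(x, t, 'o'); hold off;
```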
Predictive Distribution • making predictions of $t$ for new values of $\mathbf{x}$ • predictive distribution: $p(t\,|\,\mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}(t\,|\,\mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$ • variance of the distribution: $\sigma_N^2(\mathbf{x}) = \beta^{-1} + \boldsymbol{\phi}(\mathbf{x})^{\mathrm{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$ • the first term represents the noise in the data • the second term reflects the uncertainty associated with the parameters $\mathbf{w}$ • the optimal prediction for a new value of $\mathbf{x}$ would be the conditional mean of the target variable: $\mathbb{E}[t\,|\,\mathbf{x}] = \mathbf{m}_N^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$
Common Problem in Linear Regression: Overfitting / Model Complexity • least-squares approach (maximizing the likelihood): point estimate of the weights, prone to overfitting • regularization: the regularization term and its weighting value need to be chosen • cross-validation: requires large data sets and high computational power • Bayesian approach: • distribution over the weights • needs a good prior • model comparison: computationally demanding, but validation data not required
From Regression to Classification • for regression problems: the target variable $\mathbf{t}$ was a vector of real numbers whose values we wish to predict • in case of classification: target values represent class labels • two-class problem: binary target $t \in \{0, 1\}$ • $K > 2$ classes: 1-of-$K$ coding, e.g. for $K = 5$, a pattern from class 2 has the target vector $\mathbf{t} = (0, 1, 0, 0, 0)^{\mathrm{T}}$
Classification • goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ discrete classes $C_k$ • the input space is divided into decision regions separated by decision boundaries
Bayesian Logistic Regression • model the class-conditional densities $p(\mathbf{x}\,|\,C_k)$ and the prior probabilities $p(C_k)$ and apply Bayes' theorem: $p(C_k\,|\,\mathbf{x}) = \frac{p(\mathbf{x}\,|\,C_k)\, p(C_k)}{p(\mathbf{x})}$
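For two classes this posterior is a logistic sigmoid of the log odds, $p(C_1|\mathbf{x}) = \sigma(a)$ with $a = \ln\frac{p(\mathbf{x}|C_1)\,p(C_1)}{p(\mathbf{x}|C_2)\,p(C_2)}$ (Bishop eq. 4.57/4.58). A minimal MATLAB sketch with two assumed 1-D Gaussian class-conditionals and equal priors:

```matlab
% Class posterior from Bayes' theorem for two 1-D Gaussian class-conditionals
% (class means, variance and priors are assumed toy values).
gauss = @(x, m, s) exp(-(x - m).^2 ./ (2*s^2)) ./ sqrt(2*pi*s^2);
x  = linspace(-5, 5, 200);
j1 = gauss(x, -1, 1) * 0.5;      % joint p(x|C1) p(C1)
j2 = gauss(x,  2, 1) * 0.5;      % joint p(x|C2) p(C2)
post1 = j1 ./ (j1 + j2);         % p(C1|x): a logistic sigmoid in x
plot(x, post1); xlabel('x'); ylabel('p(C_1|x)');
```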
Bayesian Logistic Regression • exact Bayesian inference for logistic regression is intractable • Laplace approximation • aims to find a Gaussian approximation to a probability density defined over a set of continuous variables • the posterior distribution is approximated around its mode $\mathbf{w}_{\mathrm{MAP}}$: a Gaussian centred at the mode, with covariance given by the inverse Hessian of the negative log posterior
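A minimal sketch of the Laplace approximation for logistic regression (our own illustration, not the Barber demo referenced on the next slide; the synthetic data and the prior precision alpha are assumed). Newton-Raphson locates the mode of the posterior, and the inverse Hessian of the negative log posterior at the mode serves as the covariance of the approximating Gaussian:

```matlab
% Laplace approximation for Bayesian logistic regression (a sketch).
rng(0);
N = 40;
X = [randn(20,2) - 1; randn(20,2) + 1];      % two 2-D Gaussian clusters
y = [zeros(20,1); ones(20,1)];               % class labels
Phi = [ones(N,1), X];                        % add bias column
alpha = 1.0;                                 % prior: p(w) = N(0, alpha^-1 I)

w = zeros(3,1);
for it = 1:20                                % Newton-Raphson to find w_MAP
    p = 1 ./ (1 + exp(-Phi*w));              % sigmoid predictions
    g = Phi' * (p - y) + alpha * w;          % gradient of neg. log posterior
    R = diag(p .* (1 - p));
    H = Phi' * R * Phi + alpha * eye(3);     % Hessian of neg. log posterior
    w = w - H \ g;
end
SN = inv(H);    % Laplace: posterior approx. N(w | w_MAP, SN) with SN = H^-1
```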
Example • Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier • Why naive? • strong independence assumptions • assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable • ignores relations between features and assumes that all features contribute independently to a class [http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
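Under this assumption the class posterior factorises as $p(C_k\,|\,\mathbf{x}) \propto p(C_k) \prod_i p(x_i\,|\,C_k)$. A minimal Bernoulli naive Bayes sketch in MATLAB (the toy binary data set is our own assumption):

```matlab
% Bernoulli naive Bayes on a toy data set: each binary feature is
% modelled independently given the class.
Xtr = [1 1 0; 1 0 0; 0 1 1; 0 0 1];   % 4 samples, 3 binary features (assumed)
ytr = [1; 1; 2; 2];                   % class labels
prior = zeros(1,2); theta = zeros(2,3);
for k = 1:2
    idx = (ytr == k);
    prior(k)   = mean(idx);
    theta(k,:) = (sum(Xtr(idx,:), 1) + 1) ./ (sum(idx) + 2);  % Laplace smoothing
end
xnew = [1 1 0];
logpost = zeros(1,2);
for k = 1:2      % log p(C_k) + sum_i log p(x_i | C_k)
    logpost(k) = log(prior(k)) + ...
        sum(xnew.*log(theta(k,:)) + (1 - xnew).*log(1 - theta(k,:)));
end
[~, predictedClass] = max(logpost)    % predicts class 1 for this xnew
```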
Thank you for your attention Saskia Klein & Steffen Bollmann