Explore the concept of Bayesian Decision Theory focusing on Normal Density, Covariance Matrix, Discriminant Functions, and Decision Surfaces in multivariate distributions. Understand how to optimize classification with minimum error rates.
Lecture 2. Bayesian Decision Theory. Outline: multivariate normal distribution; discriminant functions for normal distributions; discriminant functions for discrete distributions.
Normal density Reminder: the covariance matrix is symmetric and positive semidefinite. Entropy is a measure of uncertainty; among all distributions with a given mean and variance, the normal distribution has the maximum entropy.
Normal density Let Σ be the covariance matrix of k-dimensional data; it has k pairs of eigenvalues and eigenvectors (λ_i, φ_i). Σ can be decomposed as Σ = Φ Λ Φ^T, where the columns of Φ are the eigenvectors and Λ = diag(λ_1, …, λ_k). Since Σ is positive semidefinite, every λ_i ≥ 0. A zero eigenvalue occurs when the data doesn't occupy the entire k-dimensional space.
Normal density Whitening transform: A_w = Φ Λ^{-1/2}, so that A_w^T Σ A_w = I; applying A_w to the data yields identity covariance.
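The whitening transform above can be sketched numerically. This is an illustrative example with an arbitrary 2-D covariance matrix (the specific numbers are made up for the demo):

```python
import numpy as np

# Whitening transform A_w = Phi @ Lambda^{-1/2}, where Phi holds the
# eigenvectors of Sigma and Lambda its eigenvalues (toy covariance assumed).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[4.0, 1.5], [1.5, 1.0]],
                            size=5000)

Sigma = np.cov(X, rowvar=False)            # sample covariance matrix
eigvals, Phi = np.linalg.eigh(Sigma)       # Sigma = Phi diag(eigvals) Phi^T
A_w = Phi @ np.diag(eigvals ** -0.5)       # whitening matrix

X_white = X @ A_w                          # cov(X_white) = A_w^T Sigma A_w
Sigma_white = np.cov(X_white, rowvar=False)
print(np.round(Sigma_white, 2))            # identity, up to numerical precision
```

Because A_w is built from the same sample covariance it whitens, the transformed covariance is the identity matrix to machine precision.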
Discriminant function Features -> discriminant functions gi(x), i = 1, …, c. Assign class i if gi(x) > gj(x) for all j ≠ i. The decision surface is defined by gi(x) = gj(x).
Normal density To make a minimum error rate classification (zero-one loss), we use the discriminant functions gi(x) = ln p(x | ωi) + ln P(ωi). This is the log of the numerator in the Bayes formula; the log is used because we only compare the gi's, and log is monotone. When a normal density p(x | ωi) ~ N(μi, Σi) is assumed, we have: gi(x) = -(1/2)(x - μi)^T Σi^{-1} (x - μi) - (d/2) ln 2π - (1/2) ln|Σi| + ln P(ωi).
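The discriminant for a normal class density can be written directly from the log of the Gaussian. A minimal sketch, with made-up means and equal priors for the two classes:

```python
import numpy as np

# Minimum-error-rate discriminant for a normal class density:
#   g_i(x) = -1/2 (x - mu_i)^T Sigma_i^{-1} (x - mu_i)
#            - d/2 ln(2 pi) - 1/2 ln|Sigma_i| + ln P(omega_i)
def g(x, mu, Sigma, prior):
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(Sigma)
    return (-0.5 * diff @ inv @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two classes; assign x to the class with the larger discriminant.
x = np.array([1.0, 1.0])
g1 = g(x, np.array([0.0, 0.0]), np.eye(2), prior=0.5)
g2 = g(x, np.array([3.0, 3.0]), np.eye(2), prior=0.5)
print("class 1" if g1 > g2 else "class 2")  # x lies closer to mu_1
```

With identity covariances and equal priors this reduces to a nearest-mean rule, which is why the point near the origin is assigned to class 1.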
Discriminant function for normal density Case 1: Σi = σ²I. Linear discriminant function: gi(x) = wi^T x + wi0, with wi = μi / σ² and wi0 = -μi^T μi / (2σ²) + ln P(ωi). Note: blue boxes in the slides mark the terms dropped because they don't depend on i.
Discriminant function for normal density The decision surface is where gi(x) = gj(x), i.e. w^T (x - x0) = 0 with w = μi - μj and x0 = (1/2)(μi + μj) - (σ² / ||μi - μj||²) ln(P(ωi)/P(ωj)) (μi - μj). With equal priors, x0 is the middle point between the two means. The decision surface is a hyperplane, perpendicular to the line between the means.
Discriminant function for normal density “Linear machine”: the decision surfaces are hyperplanes.
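A small sketch of the linear machine for the Σi = σ²I case, with illustrative means and priors chosen for the demo:

```python
import numpy as np

# "Linear machine" for Sigma_i = sigma^2 I:
#   g_i(x) = w_i^T x + w_i0, with
#   w_i  = mu_i / sigma^2
#   w_i0 = -mu_i^T mu_i / (2 sigma^2) + ln P(omega_i)
sigma2 = 1.0
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
priors = [1 / 3, 1 / 3, 1 / 3]

def linear_g(x, mu, prior):
    w = mu / sigma2
    w0 = -(mu @ mu) / (2 * sigma2) + np.log(prior)
    return w @ x + w0

x = np.array([1.5, 0.2])
scores = [linear_g(x, mu, p) for mu, p in zip(means, priors)]
print(int(np.argmax(scores)))  # with equal priors, the nearest mean wins
```

Since each gi is linear in x, every pairwise boundary gi(x) = gj(x) is a hyperplane, which is exactly the "linear machine" structure.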
Discriminant function for normal density With unequal prior probabilities, the decision boundary shifts toward the less likely mean.
Discriminant function for normal density Case 2: Σi = Σ (all classes share the same covariance matrix).
Discriminant function for normal density Set: w = Σ^{-1}(μi - μj) and x0 = (1/2)(μi + μj) - [ln(P(ωi)/P(ωj)) / ((μi - μj)^T Σ^{-1} (μi - μj))] (μi - μj). The decision boundary is: w^T (x - x0) = 0.
Discriminant function for normal density The hyperplane is generally not perpendicular to the line between the means.
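The shared-covariance case can be checked numerically. A sketch with a made-up covariance matrix and equal priors:

```python
import numpy as np

# Shared-covariance case (Sigma_i = Sigma):
#   w  = Sigma^{-1} (mu_i - mu_j)
#   x0 = 1/2 (mu_i + mu_j)
#        - [ln(P_i/P_j) / ((mu_i - mu_j)^T Sigma^{-1} (mu_i - mu_j))] (mu_i - mu_j)
# The boundary w^T (x - x0) = 0 passes through x0, but w is generally
# not parallel to mu_i - mu_j, so the hyperplane is tilted.
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy shared covariance
mu_i, mu_j = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
P_i, P_j = 0.5, 0.5

diff = mu_i - mu_j
inv = np.linalg.inv(Sigma)
w = inv @ diff
x0 = 0.5 * (mu_i + mu_j) - (np.log(P_i / P_j) / (diff @ inv @ diff)) * diff

print(np.round(w, 3), x0)  # w has a nonzero second component, diff = [2, 0]
```

Here diff points along the x-axis, yet w has a nonzero y-component: the hyperplane through x0 is not perpendicular to the line between the means.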
Discriminant function for normal density Case 3: Σi is arbitrary. The decision boundaries are hyperquadrics (hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids).
Discriminant function for normal density Decision boundary: gi(x) = x^T Wi x + wi^T x + wi0, where Wi = -(1/2) Σi^{-1}, wi = Σi^{-1} μi, and wi0 = -(1/2) μi^T Σi^{-1} μi - (1/2) ln|Σi| + ln P(ωi); the boundary gi(x) = gj(x) is quadratic in x.
Discriminant function for normal density Extension to multi-class.
Discriminant function for discrete features Discrete features: x = [x1, x2, …, xd]^t, xi ∈ {0, 1}. Let pi = P(xi = 1 | ω1) and qi = P(xi = 1 | ω2). Assuming the features are conditionally independent, the likelihood is P(x | ω1) = ∏i pi^{xi} (1 - pi)^{1 - xi}, and similarly with qi for ω2.
Discriminant function for discrete features The discriminant function, from the log likelihood ratio, is g(x) = Σi wi xi + w0, with wi = ln[pi (1 - qi) / (qi (1 - pi))] and w0 = Σi ln[(1 - pi)/(1 - qi)] + ln[P(ω1)/P(ω2)].
Discriminant function for discrete features So the decision surface is again a hyperplane.
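The binary-feature discriminant is easy to evaluate directly. A sketch with made-up per-feature probabilities pi, qi and equal priors:

```python
import numpy as np

# Discriminant for binary features x_i in {0, 1}:
#   g(x) = sum_i w_i x_i + w_0
#   w_i = ln( p_i (1 - q_i) / (q_i (1 - p_i)) )
#   w_0 = sum_i ln((1 - p_i)/(1 - q_i)) + ln(P(omega_1)/P(omega_2))
# Decide omega_1 when g(x) > 0; the surface g(x) = 0 is a hyperplane.
p = np.array([0.8, 0.7, 0.6])   # P(x_i = 1 | omega_1), toy values
q = np.array([0.2, 0.3, 0.4])   # P(x_i = 1 | omega_2), toy values
P1 = P2 = 0.5

w = np.log(p * (1 - q) / (q * (1 - p)))
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P1 / P2)

x = np.array([1, 1, 0])
print("omega_1" if w @ x + w0 > 0 else "omega_2")
```

Because g is linear in the xi, the decision surface is a hyperplane even though the features are discrete; here the observed pattern [1, 1, 0] falls on the ω1 side.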