Bayesian Decision Theory (Classification)
Lecturer: 虞台文
Contents • Introduction • Generalized Bayesian Decision Rule • Discriminant Functions • The Normal Distribution • Discriminant Functions for the Normal Populations • Minimax Criterion • Neyman-Pearson Criterion
Bayesian Decision Theory (Classification) Introduction
What is Bayesian Decision Theory? • A mathematical foundation for decision making. • It uses a probabilistic approach to make decisions (e.g., classification) so as to minimize the risk (cost).
Preliminaries and Notations • ωi: a state of nature • P(ωi): prior probability • x: feature vector • p(x|ωi): class-conditional density • P(ωi|x): posterior probability
Decision By Bayes rule, P(ωi|x) = p(x|ωi)P(ωi) / p(x), where p(x) = Σj p(x|ωj)P(ωj). The evidence p(x) is the same for every class, so it is unimportant in making the decision.
Decision Decide ωi if P(ωi|x) > P(ωj|x) ∀j ≠ i, or equivalently, decide ωi if p(x|ωi)P(ωi) > p(x|ωj)P(ωj) ∀j ≠ i. • Special cases: • P(ω1) = P(ω2) = … = P(ωc): the decision depends only on the class-conditional densities. • p(x|ω1) = p(x|ω2) = … = p(x|ωc): the decision depends only on the priors.
Two Categories Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Equivalently, decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2. • Special cases: • 1. P(ω1) = P(ω2): decide ω1 if p(x|ω1) > p(x|ω2); otherwise decide ω2. • 2. p(x|ω1) = p(x|ω2): decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
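To make the rule concrete, here is a minimal Python sketch of the two-category decision rule. The Gaussian class-conditional densities and the priors are illustrative assumptions, not values from the slides.

```python
# A minimal sketch of the two-category Bayes decision rule.
# The class-conditional densities and priors below are made-up examples.
from scipy.stats import norm

prior = {1: 0.5, 2: 0.5}                   # P(ω1), P(ω2)
cond = {1: norm(loc=-1.0, scale=1.0),      # p(x|ω1)
        2: norm(loc=+2.0, scale=1.5)}      # p(x|ω2)

def decide(x):
    """Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise ω2."""
    s1 = cond[1].pdf(x) * prior[1]
    s2 = cond[2].pdf(x) * prior[2]
    return 1 if s1 > s2 else 2

print(decide(0.0))   # -> 1 here: the scaled likelihood of ω1 is larger
```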
Example (figure): class-conditional densities and the resulting decision regions R1 and R2 for equal priors P(ω1) = P(ω2).
Example (figure): decision regions R1 and R2 for priors P(ω1) = 2/3, P(ω2) = 1/3. Decide ω1 if p(x|ω1)P(ω1) > p(x|ω2)P(ω2); otherwise decide ω2.
Classification Error Consider two categories: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. Whichever we decide, the probability of error given x is P(error|x) = min[P(ω1|x), P(ω2|x)], and the average error is P(error) = ∫ P(error|x) p(x) dx.
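As a sketch, the average error can be evaluated numerically as the integral of min[p(x|ω1)P(ω1), p(x|ω2)P(ω2)]; the densities and priors below are illustrative assumptions, not from the slides.

```python
# Sketch: numerically evaluate
#   P(error) = ∫ min[p(x|ω1)P(ω1), p(x|ω2)P(ω2)] dx
# for made-up Gaussian class-conditional densities and priors.
import numpy as np
from scipy.stats import norm

p1, p2 = 0.5, 0.5                          # priors P(ω1), P(ω2)
f1, f2 = norm(-1.0, 1.0), norm(2.0, 1.5)   # class-conditional densities

xs = np.linspace(-10.0, 12.0, 20001)
integrand = np.minimum(f1.pdf(xs) * p1, f2.pdf(xs) * p2)
p_error = integrand.sum() * (xs[1] - xs[0])   # simple Riemann sum
print(f"P(error) ~ {p_error:.4f}")
```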
Bayesian Decision Theory (Classification) Generalized Bayesian Decision Rule
The Generalization • Ω = {ω1, …, ωc}: a set of c states of nature. • A = {α1, …, αa}: a set of a possible actions. • λ(αi|ωj): the loss incurred for taking action αi when the true state of nature is ωj. The loss can be zero. We want to minimize the expected loss in making decisions.
Conditional Risk Given x, the expected loss (risk) associated with taking action αi: R(αi|x) = Σj λ(αi|ωj) P(ωj|x).
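The conditional risk is just a loss-matrix/posterior product. A minimal sketch, with a made-up loss matrix and posterior:

```python
# Sketch: conditional risk R(αi|x) = Σj λ(αi|ωj) P(ωj|x)
# as a loss-matrix / posterior product. All numbers are illustrative.
import numpy as np

lam = np.array([[0.0, 2.0],        # row i: action αi, column j: state ωj
                [1.0, 0.0]])
posterior = np.array([0.7, 0.3])   # P(ω1|x), P(ω2|x)

R = lam @ posterior                # R[i] = R(α(i+1) | x)
best_action = np.argmin(R)         # Bayes rule: take the minimum-risk action
print(R, best_action)
```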
Decision Bayesian Decision Rule: α(x) = argmin_i R(αi|x), i.e., take the action whose conditional risk is minimum.
Overall Risk R = ∫ R(α(x)|x) p(x) dx, where α(·) is the decision function. • Bayesian decision rule: the optimal one to minimize the overall risk. • Its resulting overall risk is called the Bayes risk.
Two-Category Classification Loss function λij = λ(αi|ωj), with action αi and state of nature ωj: α1 incurs λ11 under ω1 and λ12 under ω2; α2 incurs λ21 under ω1 and λ22 under ω2.
Two-Category Classification Perform α1 if R(α2|x) > R(α1|x); otherwise perform α2, where R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x) and R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x).
Two-Category Classification The rule becomes: perform α1 if (λ21 − λ11) P(ω1|x) > (λ12 − λ22) P(ω2|x). Both (λ21 − λ11) and (λ12 − λ22) are positive (a wrong action costs more than a correct one), so the posterior probabilities are scaled before comparison.
Two-Category Classification By Bayes rule, the evidence p(x) is irrelevant: perform α1 if (λ21 − λ11) p(x|ω1) P(ω1) > (λ12 − λ22) p(x|ω2) P(ω2).
Two-Category Classification (this slide will be recalled later) Likelihood ratio vs. threshold: perform α1 if p(x|ω1) / p(x|ω2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)], where the right-hand side is a threshold θ that does not depend on x.
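A sketch of the likelihood-ratio rule, with the threshold θ built from assumed losses and priors (all numbers illustrative):

```python
# Sketch of the likelihood-ratio rule: perform α1 iff
#   p(x|ω1)/p(x|ω2) > θ = [(λ12-λ22)/(λ21-λ11)] * [P(ω2)/P(ω1)].
# Losses, priors, and densities are made-up examples.
import numpy as np
from scipy.stats import norm

lam = np.array([[0.0, 2.0],        # λij = λ(αi|ωj)
                [1.0, 0.0]])
p1, p2 = 0.6, 0.4                  # priors P(ω1), P(ω2)
f1, f2 = norm(-1.0, 1.0), norm(2.0, 1.5)

theta = (lam[0, 1] - lam[1, 1]) / (lam[1, 0] - lam[0, 0]) * (p2 / p1)

def action(x):
    ratio = f1.pdf(x) / f2.pdf(x)  # likelihood ratio
    return 1 if ratio > theta else 2

print(theta, action(0.0))
```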
Bayesian Decision Theory (Classification) Discriminant Functions
The Multicategory Classification (figure: a network computes g1(x), g2(x), …, gc(x) from x and takes the action, e.g., classification, with the maximum output.) The gi(x)'s are called the discriminant functions. Assign x to ωi if gi(x) > gj(x) for all j ≠ i. How to define discriminant functions?
Simple Discriminant Functions If f(·) is a monotonically increasing function, then the f(gi(·))'s are also discriminant functions. • Minimum Risk case: gi(x) = −R(αi|x). • Minimum Error-Rate case: gi(x) = P(ωi|x).
Decision Regions (figure: two-category example) The discriminant functions partition the feature space into decision regions, separated by decision boundaries.
Bayesian Decision Theory (Classification) The Normal Distribution
Basics of Probability • Discrete random variable X (assume integer-valued): probability mass function (pmf) P(x) = Pr[X = x]; cumulative distribution function (cdf) F(x) = Pr[X ≤ x] = Σ(k ≤ x) P(k). • Continuous random variable X: probability density function (pdf) p(x), which is not a probability (it may exceed 1); cdf F(x) = ∫ p(t) dt over t ≤ x.
Expectations Let g be a function of a random variable X: E[g(X)] = Σx g(x)P(x) (discrete) or ∫ g(x)p(x) dx (continuous). • The kth moment: E[X^k]. • The 1st moment: E[X] (the mean). • The kth central moment: E[(X − E[X])^k].
Important Expectations • Mean: μ = E[X]. • Variance: Var[X] = σ² = E[(X − μ)²]. Fact: Var[X] = E[X²] − (E[X])².
Entropy H[p] = −∫ p(x) ln p(x) dx. The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.
Univariate Gaussian Distribution X ~ N(μ, σ²): p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)), with E[X] = μ and Var[X] = σ². (Figure: the bell-shaped density with the regions μ ± σ, μ ± 2σ, μ ± 3σ marked.) • Properties: • Maximizes the entropy among all distributions with a given mean and variance. • Central limit theorem: sums of many independent random variables tend toward a Gaussian.
Random Vectors A d-dimensional random vector X = (X1, …, Xd)ᵀ. • Mean vector: μ = E[X] = (E[X1], …, E[Xd])ᵀ. • Covariance matrix: Σ = E[(X − μ)(X − μ)ᵀ].
Multivariate Gaussian Distribution X ~ N(μ, Σ) for a d-dimensional random vector: p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)), with E[X] = μ and E[(X − μ)(X − μ)ᵀ] = Σ.
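A quick sketch evaluating this density with scipy; the mean and covariance are illustrative assumptions:

```python
# Sketch: evaluate the multivariate Gaussian pdf with scipy
# (mean and covariance are made-up examples).
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

p = multivariate_normal(mean=mu, cov=Sigma)
print(p.pdf(np.array([0.5, 0.5])))   # density at a single point
```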
Properties of N(μ, Σ) Let X ~ N(μ, Σ) be a d-dimensional random vector, and let Y = AᵀX, where A is a d × k matrix. Then Y ~ N(Aᵀμ, AᵀΣA): any linear transform of a Gaussian vector is Gaussian.
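This property can be checked empirically by sampling; the dimensions, μ, Σ, and A below are arbitrary choices for illustration:

```python
# Sketch: empirically check that Y = AᵀX has mean Aᵀμ and covariance AᵀΣA.
# μ, Σ, and A are arbitrary; d = 3, k = 2.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
A = rng.standard_normal((3, 2))                  # d × k

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A                                        # each row is y = Aᵀx

print(np.allclose(Y.mean(axis=0), A.T @ mu, atol=0.05))       # -> True
print(np.allclose(np.cov(Y.T), A.T @ Sigma @ A, atol=0.05))   # -> True
```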
On Parameters of N(μ, Σ) For X ~ N(μ, Σ): μi = E[Xi], σij = Cov(Xi, Xj) = E[(Xi − μi)(Xj − μj)], and σii = Var(Xi).
More On Covariance Matrix Σ is symmetric and positive semidefinite, so it admits the spectral decomposition Σ = ΦΛΦᵀ. • Φ: orthonormal matrix whose columns are the eigenvectors of Σ. • Λ: diagonal matrix of the eigenvalues.
Whitening Transform X ~ N(μ, Σ); a linear transform Y = AᵀX gives Y ~ N(Aᵀμ, AᵀΣA). Let Aw = ΦΛ^(−1/2). Then AwᵀΣAw = Λ^(−1/2)Φᵀ(ΦΛΦᵀ)ΦΛ^(−1/2) = I, so Y = AwᵀX ~ N(Awᵀμ, I): Φᵀ projects onto the eigenvector directions, Λ^(−1/2) rescales each coordinate, and the result is whitened.
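A minimal numpy sketch of the whitening transform Aw = ΦΛ^(−1/2), using an assumed covariance matrix:

```python
# Sketch of the whitening transform A_w = Φ Λ^(-1/2):
# after Y = A_wᵀ X the covariance becomes the identity.
import numpy as np

Sigma = np.array([[2.0, 0.8],     # an assumed covariance matrix
                  [0.8, 1.0]])

eigvals, Phi = np.linalg.eigh(Sigma)   # Σ = Φ diag(eigvals) Φᵀ
A_w = Phi @ np.diag(eigvals ** -0.5)   # A_w = Φ Λ^(-1/2)

print(np.allclose(A_w.T @ Sigma @ A_w, np.eye(2)))   # -> True
```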
Mahalanobis Distance For X ~ N(μ, Σ), r² = (x − μ)ᵀΣ⁻¹(x − μ) is the squared Mahalanobis distance from x to μ. The loci of points of constant density are hyperellipsoids of constant r²; their size depends on the value of r².
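A small sketch of the squared Mahalanobis distance, with assumed μ and Σ:

```python
# Sketch: squared Mahalanobis distance r² = (x - μ)ᵀ Σ⁻¹ (x - μ).
# μ and Σ are made-up examples.
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

def mahalanobis_sq(x):
    d = x - mu
    # Solving Σz = d avoids forming the explicit inverse.
    return float(d @ np.linalg.solve(Sigma, d))

print(mahalanobis_sq(np.array([1.0, 1.0])))
```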
Bayesian Decision Theory (Classification) Discriminant Functions for the Normal Populations
Minimum-Error-Rate Classification Assume Gaussian class-conditional densities, Xi ~ N(μi, Σi). Then gi(x) = ln p(x|ωi) + ln P(ωi) = −½ (x − μi)ᵀΣi⁻¹(x − μi) − (d/2) ln 2π − ½ ln |Σi| + ln P(ωi).
Minimum-Error-Rate Classification Three Cases: • Case 1: Σi = σ²I — classes are centered at different means, and their feature components are pairwise independent with the same variance. • Case 2: Σi = Σ — classes are centered at different means but share the same covariance matrix. • Case 3: Σi arbitrary.
Case 1. Σi = σ²I Then gi(x) = −‖x − μi‖² / (2σ²) + ln P(ωi), dropping the constant terms (irrelevant). Expanding ‖x − μi‖² = xᵀx − 2μiᵀx + μiᵀμi, the xᵀx term is the same for every class (irrelevant), leaving the linear discriminant gi(x) = (μiᵀx)/σ² − (μiᵀμi)/(2σ²) + ln P(ωi).
Case 1. Σi = σ²I Boundary between ωi and ωj: setting gi(x) = gj(x) gives the hyperplane wᵀ(x − x0) = 0, where w = μi − μj and x0 = ½(μi + μj) − [σ² / ‖μi − μj‖²] ln [P(ωi)/P(ωj)] (μi − μj).
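A sketch of the Case-1 linear discriminant with made-up means, variance, and priors:

```python
# Sketch of the Case-1 (Σi = σ²I) linear discriminant
#   gi(x) = (μiᵀx)/σ² - (μiᵀμi)/(2σ²) + ln P(ωi).
# Means, variance, and priors are illustrative assumptions.
import numpy as np

sigma2 = 1.0
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]   # μ1, μ2
priors = [0.6, 0.4]                                  # P(ω1), P(ω2)

def g(i, x):
    return (mus[i] @ x) / sigma2 \
           - (mus[i] @ mus[i]) / (2 * sigma2) \
           + np.log(priors[i])

x = np.array([1.5, 0.5])
print(1 if g(0, x) > g(1, x) else 2)   # assign x to the larger discriminant
```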