BAYES OPTIMAL CLASSIFIER Session 17 CO2
Motivation Two distinct questions arise in Bayesian learning: "what is the most probable hypothesis given the training data?" and "what is the most probable classification of the new instance given the training data?" As the following slides show, the answers can differ.
Bayes Optimal Classifier In general, the most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities. If the possible classification of the new example can take on any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is

P(vj|D) = Σ_{hi ∈ H} P(vj|hi) P(hi|D)
Bayes Optimal Classifier The optimal classification of the new instance is the value vj for which P(vj|D) is maximum:

argmax_{vj ∈ V} Σ_{hi ∈ H} P(vj|hi) P(hi|D)
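The rule above can be written out in a few lines of Python. This is only a sketch, not part of the slides: the function name and the representation of each hypothesis as a (posterior, predict) pair are assumptions for illustration.

```python
# Minimal sketch of a Bayes optimal classifier (illustrative).
# Each hypothesis is a pair (posterior, predict): `posterior` is
# P(hi|D), and `predict(x)` returns a class label from `classes`.
# P(vj|hi) is taken as 1 if hi predicts vj for x and 0 otherwise,
# so the sum over hypotheses reduces to adding up posterior weights.

def bayes_optimal_classify(x, hypotheses, classes):
    scores = {v: 0.0 for v in classes}
    for posterior, predict in hypotheses:
        scores[predict(x)] += posterior      # accumulate P(vj|hi) P(hi|D)
    best = max(scores, key=scores.get)       # argmax over vj in V
    return best, scores
```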
Example To develop some intuition, consider a hypothesis space containing three hypotheses, h1, h2, and h3. Suppose that the posterior probabilities of these hypotheses given the training data are 0.4, 0.3, and 0.3 respectively; thus, h1 is the MAP hypothesis. Suppose a new instance x is encountered, which is classified positive by h1 but negative by h2 and h3. Taking all hypotheses into account, the probability that x is positive is 0.4 (the posterior probability associated with h1), and the probability that it is negative is therefore 0.6. The most probable classification (negative) in this case is different from the classification generated by the MAP hypothesis.
Example To illustrate in terms of the above example, the set of possible classifications of the new instance is V = {⊕, ⊖}, and therefore

P(h1|D) = 0.4, P(⊖|h1) = 0, P(⊕|h1) = 1
P(h2|D) = 0.3, P(⊖|h2) = 1, P(⊕|h2) = 0
P(h3|D) = 0.3, P(⊖|h3) = 1, P(⊕|h3) = 0

which gives

Σ_{hi ∈ H} P(⊕|hi) P(hi|D) = 0.4
Σ_{hi ∈ H} P(⊖|hi) P(hi|D) = 0.6

and hence

argmax_{vj ∈ V} Σ_{hi ∈ H} P(vj|hi) P(hi|D) = ⊖
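The sketch above reproduces these numbers; the constant lambdas standing in for h1, h2, and h3 are, again, just an illustration:

```python
# Posteriors and predictions from the example: h1 votes positive,
# h2 and h3 vote negative.
hypotheses = [
    (0.4, lambda x: "+"),  # h1, the MAP hypothesis
    (0.3, lambda x: "-"),  # h2
    (0.3, lambda x: "-"),  # h3
]

label, scores = bayes_optimal_classify(None, hypotheses, ["+", "-"])
print(label, scores)  # - {'+': 0.4, '-': 0.6}
```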
Example As another example: after production, an electrical circuit is given a quality score of A, B, C, or D. Over a certain period of time, 77% of the circuits were given a quality score A, 11% a B, 7% a C, and 5% a D. Furthermore, it was found that 2% of the circuits given a quality score A eventually failed, and the failure rate was 10% for circuits scored B, 14% for circuits scored C, and 25% for circuits scored D. Will a newly produced circuit fail or pass if it is categorised as B? Since P(fail|B) = 0.10 and P(pass|B) = 0.90, the most probable classification is pass.
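A quick numerical check of this example, again only a sketch; all the probabilities are taken from the slide:

```python
# Priors over quality scores and conditional failure rates,
# as stated in the example.
p_score = {"A": 0.77, "B": 0.11, "C": 0.07, "D": 0.05}
p_fail_given = {"A": 0.02, "B": 0.10, "C": 0.14, "D": 0.25}

# Overall failure probability via the law of total probability.
p_fail = sum(p_score[s] * p_fail_given[s] for s in p_score)
print(round(p_fail, 4))  # 0.0487

# For a circuit already graded B, compare P(fail|B) with P(pass|B).
verdict = "fail" if p_fail_given["B"] > 1 - p_fail_given["B"] else "pass"
print(verdict)  # pass
```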
Property One curious property of the Bayes optimal classifier is that the predictions it makes can correspond to a hypothesis not contained in H. Imagine using the Bayes optimal classifier equation above to classify every instance in X. The labeling of instances defined in this way need not correspond to the labeling produced by any single hypothesis h from H.
Demerit Although the Bayes optimal classifier obtains the best performance that can be achieved from the given training data, it can be quite costly to apply: it computes the posterior probability of every hypothesis in H and then combines the predictions of all hypotheses to classify each new instance.