Chapter 2 (part 3) Bayesian Decision Theory (Sections 2-6,2-9)

Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Presentation Transcript


  1. Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

  2. Chapter 2 (part 3): Bayesian Decision Theory (Sections 2-6, 2-9) • Discriminant Functions for the Normal Density • Bayes Decision Theory – Discrete Features

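  3. Discriminant Functions for the Normal Density • We saw that minimum error-rate classification can be achieved by the discriminant function $g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$ • Case of multivariate normal: $g_i(x) = -\tfrac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$ Pattern Classification, Chapter 2 (Part 3)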

  4. Case $\Sigma_i = \sigma^2 I$ ($I$ stands for the identity matrix) • What does “$\Sigma_i = \sigma^2 I$” say about the dimensions? • What about the variance of each dimension?

  5. We can further simplify by recognizing that the quadratic term $x^t x$ implicit in the Euclidean norm is the same for all $i$.
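
The equations on this slide did not survive in the transcript. As a sketch of the simplification it refers to, the standard textbook derivation for the case $\Sigma_i = \sigma^2 I$ runs as follows; the symbols $w_i$ and $w_{i0}$ are restored from the textbook, not from the transcript, and terms that are the same for every class are dropped along the way.

```latex
\begin{align*}
g_i(x) &= -\frac{\lVert x - \mu_i \rVert^2}{2\sigma^2} + \ln P(\omega_i) \\
       &= -\frac{1}{2\sigma^2}\left[ x^t x - 2\mu_i^t x + \mu_i^t \mu_i \right] + \ln P(\omega_i) \\
\intertext{Dropping $x^t x$, which is identical for every class, leaves a linear discriminant:}
g_i(x) &= w_i^t x + w_{i0}, \qquad
w_i = \frac{\mu_i}{\sigma^2}, \qquad
w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i)
\end{align*}
```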

  6. A classifier that uses linear discriminant functions is called “a linear machine” • The decision surfaces for a linear machine are pieces of hyperplanes defined by $g_i(x) = g_j(x)$ • The equation can be written as $w^t(x - x_0) = 0$

  7. The hyperplane separating $R_i$ and $R_j$ is always orthogonal to the line linking the means!
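
A minimal NumPy sketch of the linear machine for the case $\Sigma_i = \sigma^2 I$ may make the geometry concrete. It assumes the textbook expressions $w = \mu_i - \mu_j$ and $x_0 = \tfrac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\lVert \mu_i - \mu_j \rVert^2} \ln\frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)$, which are not spelled out in the transcript; the function names and the example numbers are illustrative only.

```python
import numpy as np

def linear_discriminants(x, means, priors, sigma2):
    """Evaluate g_i(x) = w_i^t x + w_i0 for every class (Sigma_i = sigma^2 I)."""
    means = np.asarray(means, dtype=float)    # shape (c, d)
    priors = np.asarray(priors, dtype=float)  # shape (c,)
    w = means / sigma2                        # w_i = mu_i / sigma^2
    w0 = -np.sum(means**2, axis=1) / (2.0 * sigma2) + np.log(priors)
    return w @ np.asarray(x, dtype=float) + w0

def pairwise_boundary(mu_i, mu_j, p_i, p_j, sigma2):
    """Return (w, x0) of the hyperplane w^t (x - x0) = 0 between two classes."""
    mu_i = np.asarray(mu_i, dtype=float)
    mu_j = np.asarray(mu_j, dtype=float)
    diff = mu_i - mu_j
    w = diff                                  # normal vector is the mean difference
    x0 = 0.5 * (mu_i + mu_j) - (sigma2 / (diff @ diff)) * np.log(p_i / p_j) * diff
    return w, x0

# Hypothetical two-class example in two dimensions.
means = [[0.0, 0.0], [4.0, 0.0]]
priors = [0.8, 0.2]
w, x0 = pairwise_boundary(means[0], means[1], priors[0], priors[1], sigma2=1.0)
print("normal vector w:", w)
print("point on boundary x0:", x0)
print("discriminants at x0:", linear_discriminants(x0, means, priors, sigma2=1.0))
```

Because $w$ is the mean difference itself, the hyperplane is orthogonal to the line linking the means; unequal priors only slide $x_0$ along that line, away from the more probable class.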

  11. Case $\Sigma_i = \Sigma$ (the covariance matrices of all classes are identical but arbitrary!) • Hyperplane separating $R_i$ and $R_j$ (the hyperplane separating $R_i$ and $R_j$ is generally not orthogonal to the line between the means!)
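
The following sketch illustrates why the hyperplane tilts when the shared covariance is not a multiple of the identity. It assumes the textbook form $w = \Sigma^{-1}(\mu_i - \mu_j)$ and $x_0 = \tfrac{1}{2}(\mu_i + \mu_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} (\mu_i - \mu_j)$, which the transcript does not reproduce; the function name and the covariance values are made up for illustration.

```python
import numpy as np

def shared_cov_boundary(mu_i, mu_j, p_i, p_j, cov):
    """Return (w, x0) of w^t (x - x0) = 0 when all classes share covariance `cov`."""
    mu_i = np.asarray(mu_i, dtype=float)
    mu_j = np.asarray(mu_j, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = mu_i - mu_j
    w = np.linalg.solve(cov, diff)   # Sigma^{-1} (mu_i - mu_j) without an explicit inverse
    maha2 = diff @ w                 # squared Mahalanobis distance between the means
    x0 = 0.5 * (mu_i + mu_j) - (np.log(p_i / p_j) / maha2) * diff
    return w, x0

# Hypothetical example with correlated features.
mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])
w, x0 = shared_cov_boundary(mu_i, mu_j, 0.5, 0.5, cov)

# Cosine of the angle between the hyperplane normal and the line linking the means.
diff = mu_i - mu_j
cos_angle = (w @ diff) / (np.linalg.norm(w) * np.linalg.norm(diff))
print("normal vector w:", w)
print("cosine between w and mean difference:", cos_angle)
```

A cosine of exactly 1 would mean the hyperplane is orthogonal to the line between the means; with the correlated covariance above it falls below 1, and it returns to 1 if `cov` is replaced by the identity.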

  14. Case $\Sigma_i$ = arbitrary • The covariance matrices are different for each category • The decision surfaces are hyperquadrics (hyperquadrics are: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids)
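
For arbitrary covariances the discriminant becomes quadratic in $x$. The sketch below assumes the textbook form $g_i(x) = x^t W_i x + w_i^t x + w_{i0}$ with $W_i = -\tfrac{1}{2}\Sigma_i^{-1}$, $w_i = \Sigma_i^{-1}\mu_i$ and $w_{i0} = -\tfrac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$; these expressions, the function names and the example parameters are not taken from the transcript.

```python
import numpy as np

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for one Gaussian class."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    cov_inv = np.linalg.inv(cov)
    W = -0.5 * cov_inv                        # quadratic term
    w = cov_inv @ mu                          # linear term
    _, logdet = np.linalg.slogdet(cov)        # ln|Sigma_i|, computed stably
    w0 = -0.5 * (mu @ cov_inv @ mu) - 0.5 * logdet + np.log(prior)
    return x @ W @ x + w @ x + w0

def classify(x, mus, covs, priors):
    """Pick the class index with the largest discriminant value."""
    scores = [quadratic_discriminant(x, m, c, p) for m, c, p in zip(mus, covs, priors)]
    return int(np.argmax(scores))

# Hypothetical two-class problem with different covariance matrices.
mus = [[0.0, 0.0], [3.0, 3.0]]
covs = [np.eye(2), np.array([[2.0, 0.0], [0.0, 0.5]])]
priors = [0.5, 0.5]
print(classify([0.1, -0.2], mus, covs, priors))
print(classify([3.0, 3.0], mus, covs, priors))
```

The term $-\tfrac{d}{2}\ln 2\pi$ is omitted because it is identical for every class and does not affect the decision.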

  17. Bayes Decision Theory – Discrete Features • Components of $x$ are binary or integer valued; $x$ can take only one of $m$ discrete values $v_1, v_2, \ldots, v_m$ • We are therefore concerned with probabilities rather than probability densities in Bayes formula: $P(\omega_j \mid x) = \frac{P(x \mid \omega_j) P(\omega_j)}{P(x)}$, where $P(x) = \sum_{j=1}^{c} P(x \mid \omega_j) P(\omega_j)$

  18. Bayes Decision Theory – Discrete Features • Conditional risk is defined as before: $R(\alpha \mid x)$ • Approach is still to minimize risk: $\alpha^* = \arg\min_i R(\alpha_i \mid x)$
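
As a small sketch of the discrete case, the code below computes posteriors from probability masses and then picks the action with the smallest conditional risk $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$. The loss matrices, the likelihood values and the function names are hypothetical, chosen only to show how the decision can change with the loss.

```python
import numpy as np

def posteriors(likelihoods, priors):
    """P(omega_j | x) for one observed discrete value of x.

    likelihoods[j] is the probability mass P(x | omega_j), not a density.
    """
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    return joint / joint.sum()                # divide by the evidence P(x)

def min_risk_action(post, loss):
    """Return (best action index, conditional risks); loss[i, j] = lambda(alpha_i | omega_j)."""
    risks = np.asarray(loss, dtype=float) @ post
    return int(np.argmin(risks)), risks

# Hypothetical numbers: two classes, one observed value of x.
post = posteriors(likelihoods=[0.6, 0.2], priors=[0.3, 0.7])

zero_one = [[0.0, 1.0],
            [1.0, 0.0]]                       # zero-one loss gives minimum error rate
costly_miss = [[0.0, 5.0],
               [1.0, 0.0]]                    # choosing action 0 when omega_2 is true is expensive

print(post)
print(min_risk_action(post, zero_one))
print(min_risk_action(post, costly_miss))
```

With zero-one loss the rule reduces to choosing the class with the largest posterior; with the asymmetric loss the same posteriors lead to the other action.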

  19. Bayes Decision Theory – Discrete Features • Case of independent binary features in a 2-category problem • Let $x = [x_1, x_2, \ldots, x_d]^t$, where each $x_i$ is either 0 or 1, with probabilities: $p_i = P(x_i = 1 \mid \omega_1)$, $q_i = P(x_i = 1 \mid \omega_2)$

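  20. Bayes Decision Theory – Discrete Features • Assuming conditional independence, $P(x \mid \omega_i)$ can be written as a product of component probabilities: $P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}$ and $P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}$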

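  21. Bayes Decision Theory – Discrete Features • Taking our likelihood ratio: $\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}$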

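  22. The discriminant function in this case is: $g(x) = \sum_{i=1}^{d} w_i x_i + w_0$, where $w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$ for $i = 1, \ldots, d$ and $w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}$ • Decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) \le 0$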
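
A short sketch can check that the linear discriminant for independent binary features agrees with the log posterior ratio computed directly from the product form. It assumes the textbook weights $w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}$ and $w_0 = \sum_i \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$, and that every $p_i$ and $q_i$ lies strictly between 0 and 1; the probabilities and function names below are invented for illustration.

```python
import numpy as np

def binary_linear_discriminant(p, q, prior1, prior2):
    """Weights (w, w0) of g(x) = w^t x + w0 for independent binary features."""
    p = np.asarray(p, dtype=float)            # p_i = P(x_i = 1 | omega_1)
    q = np.asarray(q, dtype=float)            # q_i = P(x_i = 1 | omega_2)
    w = np.log(p * (1.0 - q) / (q * (1.0 - p)))
    w0 = np.sum(np.log((1.0 - p) / (1.0 - q))) + np.log(prior1 / prior2)
    return w, w0

def log_posterior_ratio(x, p, q, prior1, prior2):
    """ln[P(x|omega_1)P(omega_1)] - ln[P(x|omega_2)P(omega_2)] from the product form."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    ll1 = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
    ll2 = np.sum(x * np.log(q) + (1.0 - x) * np.log(1.0 - q))
    return ll1 - ll2 + np.log(prior1 / prior2)

# Hypothetical three-feature example.
p = [0.8, 0.5, 0.7]
q = [0.3, 0.5, 0.2]
x = np.array([1.0, 0.0, 1.0])

w, w0 = binary_linear_discriminant(p, q, 0.5, 0.5)
g = w @ x + w0
print("weights:", w, "bias:", w0)
print("g(x):", g, "direct log ratio:", log_posterior_ratio(x, p, q, 0.5, 0.5))
print("decide omega_1" if g > 0 else "decide omega_2")
```

The second feature has $p_i = q_i$, so its weight is zero: a feature that is equally likely under both classes carries no information for the decision.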
