
Pattern Recognition: Bayesian Decision Theory



Presentation Transcript


  1. Pattern Recognition: Bayesian Decision Theory Charles Tappert Seidenberg School of CSIS, Pace University

  2. Pattern Classification Most of the material in these slides was taken from the figures in Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2001

  3. Bayesian Decision Theory • Fundamental, purely statistical approach • Assumes the relevant probabilities are known perfectly • Makes theoretically optimal decisions

  4. Bayesian Decision Theory • Based on the Bayes formula P(ωj | x) = p(x | ωj) P(ωj) / p(x), which is easily derived by writing the joint probability density in two ways • p(ωj, x) = P(ωj | x) p(x) • p(ωj, x) = p(x | ωj) P(ωj) Note: uppercase P(·) denotes a probability mass function and lowercase p(·) a density function

  5. Bayes Formula • The Bayes formula P(ωj | x) = p(x | ωj) P(ωj) / p(x) can be expressed informally in English as posterior = likelihood × prior / evidence, and the Bayes decision chooses the class ωj with the greatest posterior probability

  6. Bayes Formula • Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x) • The Bayes decision chooses the class ωj with the greatest P(ωj | x) • Since p(x) is the same for all classes, the greatest P(ωj | x) corresponds to the greatest p(x | ωj) P(ωj) • Special case: if all classes are equally likely, i.e. have the same P(ωj), we get a further simplification – the greatest P(ωj | x) corresponds to the greatest likelihood p(x | ωj)
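A minimal sketch of this decision rule for the two-class fish example follows. The Gaussian lightness parameters, priors, and the names `priors`, `likelihood_params`, and `bayes_decide` are illustrative assumptions, not values from the slides.

```python
# Sketch of the Bayes decision rule: choose the class with the greatest
# p(x | w_j) P(w_j), since the evidence p(x) is common to all classes.
# The parameter values below are made up for illustration.
from scipy.stats import norm

priors = {"sea_bass": 0.5, "salmon": 0.5}          # P(w_j)
likelihood_params = {"sea_bass": (14.0, 2.0),       # (mean, std) of lightness
                     "salmon":   (10.0, 2.0)}

def bayes_decide(x):
    """Return the class with the greatest posterior P(w_j | x)."""
    scores = {c: norm.pdf(x, *likelihood_params[c]) * priors[c] for c in priors}
    return max(scores, key=scores.get)

print(bayes_decide(13.0))   # lightness above the crossover point
```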

  7. Bayesian Decision Theory • Now, let’s look at the fish example of two classes – sea bass and salmon – and one feature – lightness • Let p(x | ω1) and p(x | ω2) describe the difference in lightness between populations of sea bass and salmon (see next slide)

  8. Bayesian Decision Theory • In the previous slide, if the two classes are equally likely, we get the simplification – greatest posterior means greatest likelihood – and the Bayes decision is to choose class ω1 when p(x | ω1) > p(x | ω2), i.e. when lightness is greater than approximately 12.4 • However, if the two classes are not equally likely, we get a case like the next slide

  9. Bayesian Parameter Estimation • Because the actual probabilities are rarely known, they are usually estimated after assuming the form of the distributions • The most commonly assumed form of distribution is the multivariate normal

  10. Bayesian Parameter Estimation • Assuming multivariate normal probability density functions, it is necessary to estimate, for each pattern class (see the sketch below) • Feature means • Feature covariance matrices
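A minimal sketch of these estimates, assuming labeled training samples stored as NumPy arrays; the function name `estimate_gaussian_params` is hypothetical.

```python
# Sketch: maximum-likelihood estimates of the per-class mean vector and
# covariance matrix from labeled samples assumed to be multivariate normal.
import numpy as np

def estimate_gaussian_params(X, y):
    """X: (n_samples, n_features) array; y: class label per sample.
    Returns {class: (mean vector, covariance matrix)}."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)                 # feature means
        sigma = np.cov(Xc, rowvar=False)     # feature covariance matrix
        params[c] = (mu, sigma)
    return params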

  11. Multivariate Normal Densities • Simplifying assumptions can be made for multivariate normal density functions • Statistically independent features with equal variances yield hyperplane decision surfaces • Equal covariance matrices for each class also yield hyperplane decision surfaces • Arbitrary normal distributions yield hyperquadric decision surfaces
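A sketch of the standard Gaussian discriminant function that gives rise to these decision surfaces, assuming per-class parameters and priors shaped like those in the previous sketch; `discriminant` and `classify` are illustrative names.

```python
# Sketch of the Gaussian discriminant g_i(x); decision surfaces are the sets
# where g_i(x) = g_j(x).  With equal covariance matrices the quadratic terms
# cancel and the boundaries are hyperplanes; for arbitrary covariances they
# are hyperquadrics.
import numpy as np

def discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(sigma) @ diff
            - 0.5 * np.log(np.linalg.det(sigma))
            + np.log(prior))

def classify(x, params, priors):
    """Choose the class with the largest discriminant value."""
    return max(params, key=lambda c: discriminant(x, *params[c], priors[c]))
```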

  12. Nonparametric Techniques • Probabilities are not known • Two approaches • Estimate the density functions from sample patterns • Bypass probability estimation entirely • Use a nonparametric method such as k-Nearest-Neighbor

  13. k-Nearest-Neighbor

  14. k-Nearest-Neighbor (k-NN) Method • Used where probabilities are not known • Bypasses probability estimation entirely • Easy to implement • Asymptotic error rate is never worse than twice the Bayesian error • Computationally intensive, therefore slow

  15. Simple PR System with k-NN • Good for feasibility studies – easy to implement • Typical procedural steps (sketched below) • Extract feature measurements • Normalize features to the 0-1 range • Classify by k-nearest-neighbor • Using Euclidean distance
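A minimal sketch of the normalization and classification steps; the function names and the choice of k are illustrative assumptions.

```python
# Sketch of the simple PR system: min-max normalize each feature to the 0-1
# range, then classify by a Euclidean-distance k-nearest-neighbor vote.
import numpy as np
from collections import Counter

def normalize_01(X):
    """Scale each feature column of X to the 0-1 range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def knn_classify(x, X_train, y_train, k=3):
    """Label x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```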

  16. Simple PR System with k-NN (cont.): Two Modes of Operation • Leave-one-out procedure (see the sketch below) • One input file of training/test patterns • Repeatedly train on all samples except one, which is left out for testing • Good for a feasibility study with little data • Train and test on separate files • One input file for training and one for testing • Good for measuring performance change when varying an independent variable (e.g., different keyboards for keystroke biometrics)
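A minimal sketch of the leave-one-out mode, reusing the hypothetical knn_classify function from the previous sketch.

```python
# Sketch of the leave-one-out procedure: each sample is held out in turn,
# the classifier uses all remaining samples, and the held-out sample is
# tested.  Assumes knn_classify from the sketch above.
import numpy as np

def leave_one_out_accuracy(X, y, k=3):
    """Fraction of samples correctly classified when each is left out once."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i            # every sample except i
        if knn_classify(X[i], X[mask], y[mask], k) == y[i]:
            correct += 1
    return correct / len(X)
```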

  17. Simple PR System with k-NN (cont.) • Used in keystroke biometric studies • Feasibility study – Dr. Mary Curtin • Different keyboards/modes – Dr. Mary Villani • Used in other studies based on keystroke data • Study of procedures for handling incomplete and missing data – e.g., fallback procedures in the keystroke biometric system – Dr. Mark Ritzmann • New kNN-ROC procedures – Dr. Robert Zack • Used in other biometric studies • Mouse movement – Larry Immohr • Stylometry + keystroke study – John Stewart

  18. Conclusions • The Bayes decision method is best if the probabilities are known • The Bayes method is okay if you are good with statistics and the form of the probability distributions can be assumed, especially if there is justification for simplifying assumptions like independent features • Otherwise, stay with easier-to-implement methods that provide reasonable results, like k-NN
