1 / 49

Pattern Recognition

Pattern Recognition. Probability Review. Why Bother About Probabilities?. Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements. Probability theory is the proper mechanism for accounting for uncertainty .

edan-suarez
Download Presentation

Pattern Recognition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pattern Recognition Probability Review

  2. Why Bother About Probabilities? • Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements. • Probability theory is the proper mechanism for accounting for uncertainty. • Need to take into account reasonable preferences about the state of the world, for example: "If the fish was caught in the Atlantic ocean, then it is more likely to be salmon than sea-bass

  3. Definitions • Random experiment • An experiment whose result is not certain in advance (e.g., throwing a die) • Outcome • The result of a random experiment • Sample space • The set of all possible outcomes (e.g., {1,2,3,4,5,6}) • Event • A subset of the sample space (e.g., obtain an odd number in the experiment of throwing a die = {1,3,5})

  4. Intuitive Formulation • Intuitively, the probability of an event a could be defined as: • Assumes that all outcomes in the sample space are equally likely (Laplacian definition) Where N(a) is the number that event a happens in n trials

  5. Axioms of Probability S A1 A3 A2 A4 B

  6. Prior (Unconditional) Probability • This is the probability of an event prior to arrival of any evidence. P(Cavity)=0.1 means that “in the absence of any other information, there is a 10% chance that the patient is having a cavity”.

  7. Posterior (Conditional) Probability • This is the probability of an event given some evidence. P(Cavity/Toothache)=0.8 means that “there is an 80% chance that the patient is having a cavity given that he is having a toothache”

  8. Posterior (Conditional) Probability (cont’d) • Conditional probabilities can be defined in terms of unconditional probabilities: • Conditional probabilities lead to the chain rule: P(A,B)=P(A/B)P(B)=P(B/A)P(A)

  9. Law of Total Probability • If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then • Special case : • Using the chain rule, we can rewrite the law of total probability using conditional probabilities: S A1 A3 B A2 A4

  10. Law of Total Probability: Example • My mood can take one of two values • Happy, Sad • The weather can take one of three values • Rainy, Sunny, Cloudy • We can compute P(Happy) and P(Sad) as follows: P(Happy)=P(Happy/Rainy)+P(Happy/Sunny)+P(Happy/Cloudy) P(Sad)=P(Sad/Rainy)+P(Sad/Sunny)+P(Sad/Cloudy)

  11. Bayes’ Theorem • Conditional probabilities lead to the Bayes’ rule: where • Example: consider the probability of Disease given Symptom: where

  12. Bayes’ Theorem Example • Meningitis causes a stiff neck 50% of the time. • A patient comes in with a stiff neck – what is the probability that he has meningitis? • Need to know two things: • The prior probability of a patient having meningitis (1/50,000) • The prior probability of a patient having a stiff neck (1/20) P(M/S)=0.0002

  13. General Form of Bayes’ Rule • If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then the Bayes’ rule is given by: where

  14. Independence • Two events A and B are independent iff: P(A,B)=P(A)P(B) • From the above formula, we can show: P(A/B)=P(A) and P(B/A)=P(B) • A and B are conditionally independent given C iff: P(A/B,C)=P(A/C) e.g., P(WetGrass/Season,Rain)=P(WetGrass/Rain)

  15. Random Variables • In many experiments, it is easier to deal with a summary variable than with the original probability structure. • Example: in an opinion poll, we ask 50 people whether agree or disagree with a certain issue. • Suppose we record a "1" for agree and "0" for disagree. • The sample space for this experiment has 250 elements. • Suppose we are only interested in the number of people who agree. • Define the variable X=number of "1“ 's recorded out of 50. • Easier to deal with this sample space (has only 51 elements).

  16. Random Variables (cont’d) • A random variable (r.v.) is the value we assign to the outcome of a random experiment (i.e., a function that assigns a real number to each event). X(j)

  17. Random Variables (cont’d) • How is the probability function of the random variable is being defined from the probability function of the original sample space? • Suppose the sample space is S=<s1, s2, …, sn> • Suppose the range of the random variable X is <x1,x2,…,xm> • We observe X=xj iff the outcome of the random experiment is an such that X(sj)=xj P(X=xj)=P( : X(sj)=xj) e.g., What is P(X=2) in the previous example?

  18. Discrete/Continuous Random Variables • A discrete r.v. can assume only a countable number of values. • Consider the experiment of throwing a pair of dice • A continuous r.v. can assume a range of values (e.g., sensor readings).

  19. Probability mass (pmf) and density function (pdf) • The pmf /pdf of a r.v. X assigns a probability for each possible value of X. • Warning:given two r.v.'s, X and Y, their pmf/pdf are denoted as pX(x) and pY(y); for convenience, we will drop the subscripts and denote them as p(x) and p(y), however, keep in mind that these functions are different !

  20. Probability mass (pmf) and density function (pdf) (cont’d)_ • Some properties of the pmf and pdf:

  21. Probability Distribution Function (PDF) • With every r.v., we associate a function called probability distribution function (PDF) which is defined as follows: • Some properties of the PDF are: • (1) • (2) F(x) is a non-decreasing function of x • If X is discrete, its PDF can be computed as follows:

  22. Probability Distribution Function (PDF) (cont’d)

  23. Probability mass (pmf) and density function (pdf) (cont’d)_ • If X is continuous, its PDF can be computed as follows: • Using the above formula, it can be shown that:

  24. Probability mass (pmf) and density function (pdf) (cont’d)_ • Example: the Gaussian pdf and PDF 0

  25. Joint pmf (discrete r.v.) • For n random variables, the joint pmf assigns a probability for each possible combination of values: p(x1,x2,…,xn)=P(X1=x1, X2=x2, …, Xn=xn) Warning:the joint pmf /pdf of the r.v.'s X1, X2, ..., Xn and Y1, Y2, ..., Yn are denoted as pX1X2...Xn(x1,x2,...,xn) and pY1Y2...Yn(y1,y2,...,yn); for convenience, we will drop the subscripts and denote them p(x1,x2,...,xn) and p(y1,y2,...,yn), keep in mind, however, that these are two different functions.

  26. Joint pmf(discrete r.v.) (cont’d) • Specifying the joint pmf requires an enormous number of values • kn assuming n random variables where each one can assume one of k discrete values. • things much simpler if we assume independence or conditional independence … P(Cavity, Toothache) is a 2 x 2 matrix Joint Probability Sum of probabilities = 1.0

  27. Joint pdf (continuous r.v.) For n random variables, the joint pdf assigns a probability for each possible combination of values:

  28. Discrete/Continuous Probability Distributions

  29. Interesting Properties • The conditional pdf can be derived from the joint pdf: • Conditional pdfs lead to the chain rule (general form):

  30. Interesting Properties (cont’d) • Knowledge about independence between r.v.'s is very powerful since it simplifies things a lot, e.g., if X and Y are independent, then: • The law of total probability:

  31. Marginalization • From a joint probability, we can compute the probability of any subset of the variables by marginalization: • Example - case of joint pmf : • Examples - case of joint pdf :

  32. Why is the joint pmf (or pdf) useful?

  33. Why is the joint pmf (or pdf) useful (cont’d)?

  34. Probabilistic Inference • If we could define all possible values for the probability distribution, then we could read off any probability we were interested in. • In general, it is not practical to define all possible entries for the joint probability function. • Probabilistic inference consists of computing probabilities that are not explicitly stored by the reasoning system (e.g., marginals, conditionals).

  35. Normal (Gaussian) Distribution

  36. Normal (Gaussian) Distribution (cont’d) • Shape and parameters of Gaussian distribution • number of parameters is • shape determined by Σ

  37. Normal (Gaussian) Distribution (cont’d) • Mahalanobis distance: • If variables are independent, the multivariate normal distribution becomes:

  38. Expected Value

  39. Expected Value (cont’d)

  40. Properties of the Expected Value

  41. Variance and Standard Deviation

  42. Covariance

  43. Covariance Matrix

  44. Uncorrelated r.v.’s

  45. Properties of the covariance matrix

  46. Covariance Matrix Decomposition Φ-1= ΦT

  47. Linear Transformations

  48. Linear Transformations (cont’d) Whitening transform

  49. Moments of r.v.’s

More Related