
Bayes Rule



Presentation Transcript


  1. Bayes Rule Rev. Thomas Bayes (1702-1761) • How is this rule derived? • Using Bayes rule for probabilistic inference: P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence) • P(Cause | Evidence): diagnostic probability • P(Evidence | Cause): causal probability
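A minimal numeric sketch of this inference direction (not from the slides; the disease/test numbers are invented): the diagnostic probability P(Cause | Evidence) is recovered from the causal probability P(Evidence | Cause) and the prior.

    # Toy example: infer P(disease | positive test) from causal probabilities.
    # All numbers below are invented for illustration.
    p_disease = 0.01                      # prior P(Cause)
    p_pos_given_disease = 0.95            # causal probability P(Evidence | Cause)
    p_pos_given_no_disease = 0.05         # false-positive rate

    # Normalizer P(Evidence), by the law of total probability
    p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * (1 - p_disease)

    # Bayes rule: diagnostic probability P(Cause | Evidence)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(p_disease_given_pos)            # ~0.16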

  2. Bayesian decision theory • Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e • Partially observable, stochastic, episodic environment • Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features • The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise • What is the agent's optimal estimate of the value of X? • Maximum a posteriori (MAP) decision: the value of X that minimizes expected loss is the one with the greatest posterior probability P(X = x | e)

  3. MAP decision • X = x: value of query variable • E = e: evidence • MAP decision: x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) ∝ argmax_x P(e | x) P(x), where P(x | e) is the posterior, P(e | x) the likelihood, and P(x) the prior • Maximum likelihood (ML) decision: x* = argmax_x P(e | x), i.e., ignore the prior
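A small sketch contrasting the two decision rules under assumed toy priors and likelihoods (the numbers are invented):

    # Toy MAP vs. ML decision; all numbers are invented for illustration.
    priors = {"zebra": 0.8, "giraffe": 0.1, "hippo": 0.1}          # P(x)
    likelihoods = {"zebra": 0.10, "giraffe": 0.30, "hippo": 0.05}  # P(e | x) for the observed e

    # ML decision: class with the highest likelihood P(e | x)
    ml = max(likelihoods, key=likelihoods.get)

    # MAP decision: class with the highest P(e | x) P(x), proportional to the posterior
    map_estimate = max(priors, key=lambda x: likelihoods[x] * priors[x])

    print(ml, map_estimate)   # ML picks "giraffe"; MAP picks "zebra" because of the strong prior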

  4. Example: Spam Filter • We have X = {spam, ¬spam}, E = email message. • What should be our decision criterion? • Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability

  5. Example: Spam Filter • We have X = {spam, ¬spam}, E = email message. • What should be our decision criterion? • Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability • P(spam | message) ∝ P(message | spam) P(spam) • P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)

  6. Example: Spam Filter • We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam) • How do we represent the message? • Bag of words model: • The order of the words is not important • Each word is conditionally independent of the others given the message class • If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)? • Naïve Bayes assumption: each word is conditionally independent of the others given the message class, so P(w1, …, wn | spam) = P(w1 | spam) P(w2 | spam) … P(wn | spam)
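A tiny sketch of this factorization (the per-word likelihoods below are invented placeholders, not estimates from real data):

    # Factorized likelihood under the naive Bayes assumption.
    # The per-word likelihoods P(w | spam) are invented for illustration.
    p_word_given_spam = {"win": 0.02, "money": 0.03, "now": 0.01}

    message = ["win", "money", "now"]

    # P(w1, ..., wn | spam) = product of P(wi | spam)
    p_message_given_spam = 1.0
    for w in message:
        p_message_given_spam *= p_word_given_spam[w]

    print(p_message_given_spam)   # 0.02 * 0.03 * 0.01 = 6e-06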

  7. Example: Spam Filter • Our filter will classify the message as spam if P(spam) P(w1 | spam) … P(wn | spam) > P(¬spam) P(w1 | ¬spam) … P(wn | ¬spam) • In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow: classify as spam if log P(spam) + Σi log P(wi | spam) > log P(¬spam) + Σi log P(wi | ¬spam) • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
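A minimal sketch of the decision rule in log space; the parameters are placeholder values assumed to have been estimated already, and ¬spam is written "ham" for convenience:

    import math

    # Assumed, already-estimated parameters (placeholder values).
    log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}
    log_likelihood = {
        "spam": {"win": math.log(0.02), "money": math.log(0.03), "now": math.log(0.01)},
        "ham":  {"win": math.log(0.001), "money": math.log(0.005), "now": math.log(0.01)},
    }

    def classify(words):
        # Classify as spam if log P(spam) + sum_i log P(wi | spam)
        # exceeds log P(ham) + sum_i log P(wi | ham).
        scores = {
            c: log_prior[c] + sum(log_likelihood[c][w] for w in words)
            for c in ("spam", "ham")
        }
        return max(scores, key=scores.get)

    print(classify(["win", "money", "now"]))   # -> "spam" with these numbers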

  8. Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • This happens to be the parameter estimate that maximizes the likelihood of the training data (where d indexes training documents and i indexes words)

  9. Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • Parameter smoothing: dealing with words that were never seen or were seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did, i.e. P(wi | spam) = (# of occurrences of wi in spam messages + 1) / (total # of words in spam messages + vocabulary size)
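A rough sketch of how these estimates could be computed from a labeled sample (the tiny training set below is made up):

    from collections import Counter

    # Made-up toy training set: (message words, label); "ham" stands for ¬spam.
    training = [
        (["win", "money", "now"], "spam"),
        (["meeting", "at", "noon"], "ham"),
        (["win", "a", "prize", "now"], "spam"),
        (["lunch", "at", "noon"], "ham"),
    ]

    # Priors: fraction of training messages in each class
    labels = [label for _, label in training]
    prior = {c: labels.count(c) / len(labels) for c in set(labels)}

    # Word counts per class
    counts = {c: Counter() for c in prior}
    for words, label in training:
        counts[label].update(words)

    vocab = {w for words, _ in training for w in words}

    def likelihood(word, c):
        # Laplacian (add-one) smoothing:
        # (count of word in class c + 1) / (total words in class c + vocabulary size)
        return (counts[c][word] + 1) / (sum(counts[c].values()) + len(vocab))

    print(prior["spam"], likelihood("win", "spam"), likelihood("win", "ham"))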

  10. Bayesian decision making: Summary • Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E • Inference problem: given some evidence E = e, what is P(X | e)? • Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}

  11. Bag-of-word models for images Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

  12. Bag-of-word models for images • Extract image features

  13. Bag-of-word models for images • Extract image features

  14. Bag-of-word models for images • Extract image features • Learn “visual vocabulary”

  15. Bag-of-word models for images • Extract image features • Learn “visual vocabulary” • Map image features to visual words
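A rough sketch of the three steps under common assumptions (local features are stood in by random descriptors, and the visual vocabulary is learned with k-means via scikit-learn; none of these choices are prescribed by the slides):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # 1. Extract image features: in practice, local descriptors from interest regions;
    #    random vectors stand in for them here.
    train_descriptors = rng.normal(size=(1000, 128))   # descriptors pooled from training images
    image_descriptors = rng.normal(size=(50, 128))     # descriptors from one new image

    # 2. Learn a "visual vocabulary" by clustering the training descriptors;
    #    each cluster center is one visual word.
    vocab_size = 20
    kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(train_descriptors)

    # 3. Map each image feature to its nearest visual word, then summarize the
    #    image as a bag-of-words histogram over the vocabulary.
    words = kmeans.predict(image_descriptors)
    histogram = np.bincount(words, minlength=vocab_size)
    print(histogram)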
