Bayes Rule Rev. Thomas Bayes (1702-1761) • How is this rule derived? (see the derivation below) • Using Bayes rule for probabilistic inference: • P(Cause | Evidence): diagnostic probability • P(Evidence | Cause): causal probability
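The rule and its one-line derivation from the product rule, which factors the joint probability both ways:

```latex
P(A, B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```

In the diagnostic setting, this reads P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence).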
Bayesian decision theory • Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e • Partially observable, stochastic, episodic environment • Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features • The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise • What is the agent's optimal estimate of the value of X? • Maximum a posteriori (MAP) decision: the value of X that minimizes the expected loss is the one that has the greatest posterior probability P(X = x | e), as the computation below shows
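Why the MAP decision is optimal under this 0-1 loss: the expected loss of guessing a value x̂ is one minus its posterior probability, so the guess with the highest posterior minimizes it:

```latex
\mathbb{E}[\ell(\hat{x})]
= \sum_{x} \ell(x, \hat{x})\, P(x \mid e)
= \sum_{x \neq \hat{x}} P(x \mid e)
= 1 - P(\hat{x} \mid e)
```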
MAP decision • X = x: value of query variable • E = e: evidence • MAP decision: x* = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x), since P(e) does not depend on x • Here P(x | e) is the posterior, P(e | x) is the likelihood, and P(x) is the prior • Maximum likelihood (ML) decision: x* = argmax_x P(e | x), i.e., the MAP decision with a uniform prior
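A toy numeric sketch of the MAP vs. ML decisions for the animal example; all probabilities below are made up purely for illustration:

```python
# Hypothetical priors P(x) and likelihoods P(e | x) for one observed e.
prior = {"zebra": 0.10, "giraffe": 0.10, "hippo": 0.80}
likelihood = {"zebra": 0.70, "giraffe": 0.20, "hippo": 0.10}

# MAP: maximize P(e | x) * P(x); the constant P(e) can be ignored.
map_decision = max(prior, key=lambda x: likelihood[x] * prior[x])
# ML: maximize P(e | x) alone (MAP with a uniform prior).
ml_decision = max(likelihood, key=likelihood.get)

print(map_decision)  # hippo: 0.10 * 0.80 = 0.08 beats zebra's 0.70 * 0.10 = 0.07
print(ml_decision)   # zebra
```

With a strong enough prior, the two decisions can disagree, as they do here.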
Example: Spam Filter • We have X = {spam, ¬spam}, E = email message. • What should be our decision criterion? • Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability • By Bayes rule: P(spam | message) ∝ P(message | spam) P(spam) and P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)
Example: Spam Filter • We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam) • How do we represent the message? • Bag of words model: • The order of the words is not important • Each word is conditionally independent of the others given the message class • If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)? • Naïve Bayes assumption: each word is conditionally independent of the others given the message class, so P(w1, …, wn | spam) = P(w1 | spam) · … · P(wn | spam) = ∏i P(wi | spam)
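A minimal sketch of the bag-of-words representation: a message reduces to per-word counts, discarding word order (the whitespace tokenizer here is a naive stand-in, just for illustration):

```python
from collections import Counter

def bag_of_words(message: str) -> Counter:
    """Represent a message as word counts, ignoring word order."""
    return Counter(message.lower().split())

counts = bag_of_words("Buy now! Buy cheap meds now")
print(counts["buy"])  # 2 -- only the count survives, not the position
```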
Example: Spam Filter • Our filter will classify the message as spam if P(spam) ∏i P(wi | spam) > P(¬spam) ∏i P(wi | ¬spam) • In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow: classify as spam if log P(spam) + Σi log P(wi | spam) > log P(¬spam) + Σi log P(wi | ¬spam) • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
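A minimal sketch of the log-space decision rule, assuming the priors and per-word log-likelihoods have already been estimated (the dictionary formats and function name are assumptions for illustration):

```python
import math

def classify(words, log_prior, log_likelihood):
    """Naive Bayes MAP decision: pick the class with the higher log posterior.

    log_prior:      {class: log P(class)}
    log_likelihood: {class: {word: log P(word | class)}}
    """
    scores = {}
    for c in log_prior:
        # Sums of logs replace products of probabilities to avoid underflow.
        # Words missing from the table get a tiny floor probability here;
        # principled smoothing is covered on the next slide.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c].get(w, math.log(1e-10)) for w in words)
    return max(scores, key=scores.get)
```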
Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • This happens to be the parameter estimate that maximizes the likelihood of the training data, ∏d ∏i P(wd,i | classd), where d indexes training documents and i indexes the words in each document
Parameter estimation • Model parameters: • Priors P(spam), P(¬spam) • Likelihoods P(wi | spam), P(wi | ¬spam) • Estimation by empirical word frequencies in the training set: P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages) • Parameter smoothing: dealing with words that were never seen or seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did: P(wi | spam) = (# of occurrences of wi in spam messages + 1) / (total # of words in spam messages + V), where V is the vocabulary size
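A minimal sketch of Laplacian-smoothed parameter estimation, assuming the training data comes as (word list, label) pairs with labels "spam"/"ham" (the data format and function name are assumptions for illustration):

```python
import math
from collections import Counter

def estimate_parameters(training_set):
    """Estimate Naive Bayes parameters with Laplacian smoothing.

    training_set: list of (word_list, label) pairs, label in {"spam", "ham"}.
    Returns log priors and smoothed log likelihoods.
    """
    doc_counts = Counter(label for _, label in training_set)
    word_counts = {"spam": Counter(), "ham": Counter()}
    for words, label in training_set:
        word_counts[label].update(words)

    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    V = len(vocabulary)
    totals = {c: sum(counts.values()) for c, counts in word_counts.items()}

    # Priors: fraction of training documents in each class.
    log_prior = {c: math.log(doc_counts[c] / len(training_set))
                 for c in word_counts}
    # Laplacian smoothing: add 1 to every word count, add V to the denominator.
    log_likelihood = {
        c: {w: math.log((word_counts[c][w] + 1) / (totals[c] + V))
            for w in vocabulary}
        for c in word_counts}
    return log_prior, log_likelihood
```

The +1/+V correction gives every vocabulary word a small nonzero probability, so a single unseen word no longer zeroes out an entire class.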
Bayesian decision making: Summary • Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E • Inference problem: given some evidence E = e, what is P(X | e)? • Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}
Bag-of-words models for images Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Bag-of-words models for images • Extract image features • Learn “visual vocabulary” • Map image features to visual words
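A minimal sketch of the last two steps, assuming local descriptors (e.g., SIFT-like vectors) have already been extracted from each image; clustering with scikit-learn's KMeans and k = 100 are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors_per_image, k=100):
    """Learn a 'visual vocabulary': cluster all local descriptors into k visual words.

    descriptors_per_image: list of (n_i, d) arrays, one per training image.
    """
    all_descriptors = np.vstack(descriptors_per_image)
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bag_of_visual_words(descriptors, vocabulary):
    """Map an image's descriptors to their nearest visual words and count them."""
    words = vocabulary.predict(descriptors)
    return np.bincount(words, minlength=vocabulary.n_clusters)
```

The resulting histogram plays the same role as the word counts in the spam filter: the image becomes a bag of visual words, and order (spatial layout) is discarded.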