Probabilistic inference • Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e • Partially observable, stochastic, episodic environment • Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features • Bayes decision theory: • The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise • The estimate of X that minimizes expected loss is the one that has the greatest posterior probability P(X = x | e) • This is the Maximum a Posteriori (MAP) decision
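To make the 0-1 loss argument concrete, here is a minimal Python sketch with made-up posterior values: under 0-1 loss the expected loss of guessing a value x is 1 - P(x | e), so the guess that minimizes expected loss is exactly the one with the highest posterior.

```python
# A minimal sketch (hypothetical numbers): under 0-1 loss, the expected loss of
# guessing x_hat is 1 - P(x_hat | e), so minimizing expected loss is the same
# as picking the class with the highest posterior (the MAP decision).

posterior = {"spam": 0.8, "not spam": 0.2}   # assumed example values of P(X = x | e)

expected_loss = {x: 1.0 - p for x, p in posterior.items()}
map_decision = max(posterior, key=posterior.get)
min_loss_decision = min(expected_loss, key=expected_loss.get)

print(expected_loss)                     # {'spam': 0.2, 'not spam': 0.8}
print(map_decision, min_loss_decision)   # both 'spam'
```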
MAP decision • Value x of the query variable that has the highest posterior probability given the evidence e: x_MAP = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) ∝ argmax_x P(e | x) P(x) (posterior ∝ likelihood × prior) • Maximum likelihood (ML) decision: x_ML = argmax_x P(e | x), i.e., the prior is ignored
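A small illustrative sketch (the prior and likelihood values below are invented, not from the slides) showing how the MAP and ML decisions can differ when the prior is skewed:

```python
# A minimal sketch with made-up numbers: MAP weighs the likelihood P(e | x) by
# the prior P(x), while ML ignores the prior. The two can disagree when the
# prior is strongly skewed.

prior      = {"zebra": 0.05, "giraffe": 0.15, "hippo": 0.80}   # assumed P(x)
likelihood = {"zebra": 0.40, "giraffe": 0.35, "hippo": 0.25}   # assumed P(e | x)

ml_decision  = max(likelihood, key=likelihood.get)
map_decision = max(prior, key=lambda x: likelihood[x] * prior[x])

print(ml_decision)   # zebra  (highest likelihood)
print(map_decision)  # hippo  (the prior outweighs the likelihood)
```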
Example: Naïve Bayes model • Suppose we have many different types of observations (symptoms, features) F1, …, Fn that we want to use to obtain evidence about an underlying hypothesis H • MAP decision involves estimating P(H | F1, …, Fn) ∝ P(F1, …, Fn | H) P(H) • If each feature can take on k values, how many entries are in the joint probability table? • We can make the simplifying assumption that the different features are conditionally independent given the hypothesis: P(F1, …, Fn | H) = P(F1 | H) · … · P(Fn | H) • If each feature can take on k values, what is the complexity of storing the resulting distributions? (see the sketch below)
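A minimal sketch of the storage comparison posed on this slide: the full joint table needs on the order of k^n entries per hypothesis, while the conditionally independent (naïve Bayes) factorization needs only n tables of k entries each. The n = 20, k = 2 numbers below are just an example.

```python
# A minimal sketch of the storage argument: the full joint P(F1, ..., Fn | H)
# needs on the order of k**n entries per hypothesis, while the naive Bayes
# factorization needs only n tables of k entries each.

def joint_table_size(n_features: int, k_values: int) -> int:
    return k_values ** n_features

def naive_bayes_size(n_features: int, k_values: int) -> int:
    return n_features * k_values

print(joint_table_size(20, 2))   # 1048576 entries
print(naive_bayes_size(20, 2))   # 40 entries
```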
Naïve Bayes Spam Filter • MAP decision: to minimize the probability of error, we should classify a message as spam if P(spam | message) > P(¬spam | message) • We have P(spam | message) ∝ P(message | spam) P(spam) and P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)
Naïve Bayes Spam Filter • We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam) • The message is a sequence of words (w1, …, wn) • Bag of words representation: the order of the words in the message is not important, and each word is conditionally independent of the others given the message class (spam or not spam), so P(message | class) = P(w1 | class) · … · P(wn | class) • Our filter will classify the message as spam if P(spam) P(w1 | spam) … P(wn | spam) > P(¬spam) P(w1 | ¬spam) … P(wn | ¬spam) (see the sketch below)
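A minimal sketch of this decision rule, assuming the prior and per-word likelihoods are already available as dictionaries (the toy parameter values below are invented for illustration); log-probabilities are used so that products over many words do not underflow:

```python
import math

# A minimal sketch of the naive Bayes decision rule: classify as spam if
# log P(spam) + sum_i log P(w_i | spam) exceeds the corresponding score
# for 'not spam'.

def classify(words, prior, likelihood):
    scores = {}
    for label in ("spam", "not spam"):
        score = math.log(prior[label])
        for w in words:
            score += math.log(likelihood[label].get(w, 1e-9))  # tiny floor for unseen words
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy parameters:
prior = {"spam": 0.33, "not spam": 0.67}
likelihood = {
    "spam":     {"win": 0.05,  "money": 0.06,  "meeting": 0.001},
    "not spam": {"win": 0.002, "money": 0.004, "meeting": 0.03},
}
print(classify(["win", "money"], prior, likelihood))      # spam
print(classify(["meeting", "money"], prior, likelihood))  # not spam
```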
Bag of words illustration: US Presidential Speeches Tag Cloud (http://chir.ag/projects/preztags/) [tag cloud figures]
Naïve Bayes Spam Filter • Posterior ∝ prior × likelihood: P(spam | w1, …, wn) ∝ P(spam) P(w1 | spam) … P(wn | spam) and P(¬spam | w1, …, wn) ∝ P(¬spam) P(w1 | ¬spam) … P(wn | ¬spam)
Parameter estimation • In order to classify a message, we need to know the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam) • These are the parameters of the probabilistic model • How do we obtain the values of these parameters? [figure: tables of the model parameters: the prior (spam: 0.33, ¬spam: 0.67) and the likelihoods P(word | spam), P(word | ¬spam)]
Parameter estimation • How do we obtain the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)? • Empirically: use training data • This is the maximum likelihood (ML) estimate, or the estimate that maximizes the likelihood of the training data: P(word | spam) = (# of occurrences of the word in spam messages) / (total # of words in spam messages), where the counts range over training documents d and word positions i (see the sketch below)
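A minimal sketch of the ML estimates on a tiny invented training set (the messages and counts below are hypothetical): the prior is the fraction of spam messages, and P(word | spam) is the word's share of all word occurrences in spam messages.

```python
from collections import Counter

# A minimal sketch of maximum-likelihood parameter estimation from a toy,
# made-up training set of labeled messages.

training = [
    ("spam",     "win money now".split()),
    ("spam",     "win a prize".split()),
    ("not spam", "meeting at noon".split()),
]

n_spam = sum(1 for label, _ in training if label == "spam")
prior_spam = n_spam / len(training)

spam_counts = Counter(w for label, words in training if label == "spam" for w in words)
total_spam_words = sum(spam_counts.values())

p_win_given_spam = spam_counts["win"] / total_spam_words
print(prior_spam)         # 2 / 3 ≈ 0.667
print(p_win_given_spam)   # 2 / 6 ≈ 0.333
```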
Parameter estimation • How do we obtain the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)? • Empirically: use training data • Parameter smoothing: dealing with words that were never seen or seen too few times • Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did, so P(word | spam) = (# of occurrences of the word in spam messages + 1) / (total # of words in spam messages + V), where V is the vocabulary size (see the sketch below)
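A minimal sketch of Laplacian (add-one) smoothing, continuing the toy counts from the previous sketch: each vocabulary word gets one extra pretend occurrence, so words never seen in spam no longer receive zero probability.

```python
# A minimal sketch of Laplacian (add-one) smoothing on the toy counts above.

vocabulary = {"win", "money", "now", "a", "prize", "meeting", "at", "noon"}
spam_counts = {"win": 2, "money": 1, "now": 1, "a": 1, "prize": 1}
total_spam_words = sum(spam_counts.values())

def p_word_given_spam(word):
    # Every vocabulary word is counted one extra time.
    return (spam_counts.get(word, 0) + 1) / (total_spam_words + len(vocabulary))

print(p_word_given_spam("win"))      # (2 + 1) / (6 + 8) ≈ 0.214
print(p_word_given_spam("meeting"))  # (0 + 1) / (6 + 8) ≈ 0.071, not zero
```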
Summary of model and parameters • Naïve Bayes model: P(spam | w1, …, wn) ∝ P(spam) P(w1 | spam) … P(wn | spam), and P(¬spam | w1, …, wn) ∝ P(¬spam) P(w1 | ¬spam) … P(wn | ¬spam) • Model parameters: prior P(spam), P(¬spam); likelihood of spam P(w1 | spam), P(w2 | spam), …, P(wn | spam); likelihood of ¬spam P(w1 | ¬spam), P(w2 | ¬spam), …, P(wn | ¬spam)
Bag-of-word models for images Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
Bag-of-word models for images • Extract image features • Learn “visual vocabulary” • Map image features to visual words (see the sketch below)
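A minimal sketch of this pipeline using k-means clustering (scikit-learn's KMeans) on random stand-in descriptors; in a real system the descriptors would come from a local feature extractor such as SIFT, and the vocabulary would be much larger than 10 words.

```python
import numpy as np
from sklearn.cluster import KMeans
from collections import Counter

# A minimal sketch of the bag-of-visual-words pipeline on random stand-in data:
# cluster training descriptors into a "visual vocabulary", then represent a new
# image by the histogram of the visual words its descriptors map to.

rng = np.random.default_rng(0)
training_descriptors = rng.normal(size=(500, 128))   # stand-in for extracted image features

vocabulary = KMeans(n_clusters=10, n_init=10, random_state=0).fit(training_descriptors)

image_descriptors = rng.normal(size=(40, 128))       # features from one new image
words = vocabulary.predict(image_descriptors)        # map each feature to a visual word
histogram = Counter(words)                           # bag-of-visual-words representation
print(histogram)
```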
Bayesian decision making: Summary • Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E • Inference problem: given some evidence E = e, what is P(X | e)? • Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}