CPSC 7373: Artificial Intelligence, Lecture 5: Probabilistic Inference. Jiang Bian, Fall 2012, University of Arkansas at Little Rock
Overview and Example • The alarm (A) might go off because of either a burglary (B) and/or an earthquake (E), and when the alarm goes off, either John (J) and/or Mary (M) will call to report it. • Possible questions: given evidence about B or E, what is the probability that J or M will call? • Answer to this type of question: the posterior distribution P(Q1, Q2, … | E1 = e1, E2 = e2), i.e., the probability distribution of one or more query variables given the values of the evidence variables. [Network: B, E (evidence) → A (hidden) → J, M (query)]
Overview and Example • Another type of question: out of all the possible values for all the query variables, which combination of values has the highest probability? • Answer to these questions: argmax_q P(Q1 = q1, Q2 = q2, … | E1 = e1, …), i.e., which assignment of values to the query variables is most probable given the evidence? [Same network: B, E → A → J, M]
Overview and Example Imagine that Mary has called to report that the alarm is going off, and we want to know whether or not there has been a burglary. For each of the nodes, is it an evidence node, a hidden node, or a query node? Answer: Evidence: M; Query: B; Hidden: E, A, J.
Inference through enumeration P(+b | +j, +m) = ? Imagine that both John and Mary have called to report that the alarm is going off, and we want to know the probability of a burglary. By the definition of conditional probability, P(Q | E) = P(Q, E) / P(E), so P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m).
Inference through enumeration To obtain P(+b, +j, +m), sum the joint distribution over the hidden variables E and A: P(+b, +j, +m) = Σ_e Σ_a P(+b, +j, +m, e, a) = Σ_e Σ_a P(+b) P(e) P(a | +b, e) P(+j | a) P(+m | a).
Inference through enumeration Enumerating the four (e, a) combinations and summing gives P(+b, +j, +m); enumerating over all (b, e, a) combinations likewise gives P(+j, +m). (Enumeration tables from the slides not shown.)
Inference through enumeration P(+b | +j, +m) = P(+b, +j, +m) / P(+j, +m) = 0.0005922376 / 0.0020841 ≈ 0.284.
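A minimal sketch of this enumeration in Python. The CPT values below are the standard ones from Russell & Norvig's textbook for this network; the slides do not list them, so treat them as an assumption. With these numbers the code reproduces the 0.284 above.

```python
from itertools import product

# Assumed CPTs (standard AIMA values; not given on the slides).
P_B = {True: 0.001, False: 0.999}                    # P(B)
P_E = {True: 0.002, False: 0.998}                    # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m): the product of one entry per CPT."""
    pa = P_A[(b, e)]
    return (P_B[b] * P_E[e]
            * (pa if a else 1 - pa)
            * (P_J[a] if j else 1 - P_J[a])
            * (P_M[a] if m else 1 - P_M[a]))

# Sum the joint over the hidden variables E and A (and B for the denominator).
num = sum(joint(True, e, a, True, True)
          for e, a in product((True, False), repeat=2))
den = sum(joint(b, e, a, True, True)
          for b, e, a in product((True, False), repeat=3))
print(num / den)   # ~0.284, matching the slide
```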
Enumeration • So far we assumed binary events/Boolean variables. • With only 5 variables there are 2^5 = 32 rows in the full joint distribution table. • Practically, what if we have a large network?
Example: Car diagnosis Initial evidence: the engine won't start. Testable variables (thin ovals); diagnosis variables (thick ovals); hidden variables (shaded) ensure a sparse structure and reduce the number of parameters. (Network diagram not shown.)
Example: Car insurance Predict claim costs (medical, liability, property) given the data on the application form (the other unshaded nodes). If all variables were Boolean, the full joint table would have 2^27 rows; in reality the variables are NOT all Boolean.
Speed Up Enumeration Pulling out terms: P(+b, +j, +m) = P(+b) Σ_e P(e) Σ_a P(a | +b, e) P(+j | a) P(+m | a). Factors that do not mention a summation variable move outside that sum.
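The pulled-out form can be computed directly as nested sums. A sketch, using the same assumed CPT values as the enumeration sketch above:

```python
# Same assumed CPT values as in the enumeration sketch above.
P_B, P_E = 0.001, 0.002                               # P(+b), P(+e)
P_A = {(True, True): 0.95, (True, False): 0.94}       # P(A=true | +b, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

# P(+b) is pulled outside both sums, P(e) outside the inner sum over a.
p_bjm = P_B * sum(
    (P_E if e else 1 - P_E)
    * sum((P_A[(True, e)] if a else 1 - P_A[(True, e)]) * P_J[a] * P_M[a]
          for a in (True, False))
    for e in (True, False))
print(p_bjm)   # ~0.000592 = P(+b, +j, +m)
```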
Speed up enumeration • Maximize independence: the structure of the Bayes network determines how efficiently we can calculate the probability values. • A linear chain X1 → X2 → … → Xn needs O(n) parameters; a network where each node depends on all of its predecessors needs O(2^n).
Bayesian networks: definition • A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions • Syntax: • a set of nodes, one per variable • a directed, acyclic graph (link = "directly influences") • a conditional distribution for each node given its parents: P(Xi | Parents(Xi)) • In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
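As a concrete illustration of this syntax, here is one way the alarm network might be written down as data. The structure follows the slides; the numbers are again the standard AIMA values and are an assumption.

```python
# variable -> (parents, CPT rows giving P(X = true | parent values)).
network = {
    'B': ([],         {(): 0.001}),
    'E': ([],         {(): 0.002}),
    'A': (['B', 'E'], {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    'J': (['A'],      {(True,): 0.90, (False,): 0.05}),
    'M': (['A'],      {(True,): 0.70, (False,): 0.01}),
}
# One CPT row per combination of parent values: 1 + 1 + 4 + 2 + 2 = 10
# numbers specify the full joint over 2^5 = 32 assignments.
```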
Constructing Bayesian Networks Recall: the alarm (A) might go off because of either a burglary (B) and/or an earthquake (E), and when the alarm goes off, either John (J) and/or Mary (M) will call to report it. Suppose we choose the ordering M, J, A, B, E. • Start with M and add J: dependent or independent? Is P(J | M) = P(J)?
Add A: is P(A | J, M) = P(A | J)? Is P(A | J, M) = P(A)?
Add B: is P(B | A, J, M) = P(B | A)? Is P(B | A, J, M) = P(B)?
Add E: is P(E | B, A, J, M) = P(E | A)? Is P(E | B, A, J, M) = P(E | A, B)?
• Deciding conditional independence is hard in non-causal directions. • (Causal models and conditional independence seem hardwired for humans!) • Assessing conditional probabilities is hard in non-causal directions. • The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed, versus 1 + 1 + 4 + 2 + 2 = 10 for the causal ordering.
Variable Elimination • Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid re-computation: P(B | +j, +m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(+j | a) P(+m | a). First sum out A, storing a factor over (B, E); then sum out E.
Variable Elimination • Summing out a variable from a product of factors: • move any constant factors outside the summation • add up the submatrices in the pointwise product of the remaining factors • Still NP-complete, but faster than enumeration. • Pointwise product of factors f1 and f2: f(X1…Xj, Y1…Yk, Z1…Zl) = f1(X1…Xj, Y1…Yk) × f2(Y1…Yk, Z1…Zl).
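A rough sketch of these two factor operations, storing a factor as a dict from tuples of Boolean values to numbers. The variable names and the toy factors at the bottom are hypothetical, chosen to match the R → T → L example that follows.

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """f(X..., Y..., Z...) = f1(X..., Y...) * f2(Y..., Z...)."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for vals in product((True, False), repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        k1 = tuple(assign[v] for v in vars1)
        k2 = tuple(assign[v] for v in vars2)
        out[vals] = f1[k1] * f2[k2]
    return out, out_vars

def sum_out(f, vars_, var):
    """Sum a variable out of a factor: f'(rest) = sum over var of f."""
    i = vars_.index(var)
    out = {}
    for vals, p in f.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out, [v for v in vars_ if v != var]

# Toy factors (hypothetical numbers): P(R) and P(T | R), keyed as (R,) / (R, T).
f_R = {(True,): 0.1, (False,): 0.9}
f_TR = {(True, True): 0.8, (True, False): 0.2,
        (False, True): 0.1, (False, False): 0.9}
f_RT, rt_vars = pointwise_product(f_R, ['R'], f_TR, ['R', 'T'])
f_T, t_vars = sum_out(f_RT, rt_vars, 'R')
print(f_T)   # approximately {(True,): 0.17, (False,): 0.83}
```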
Variable Elimination Consider the chain R → T → L with factors P(R), P(T | R), and P(L | T). 1) Joining factors: multiply P(R) and P(T | R) to form the joint P(R, T).
Variable Elimination 2) Marginalize out the variable R to get a table over just the variable T: P(R, T) → P(T).
Variable Elimination 3) Join P(T) and P(L | T) to form the joint probability P(T, L).
Variable Elimination 4) Marginalize out T: P(T, L) → P(L). The choice of elimination ordering is important!
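Working the R → T → L chain end to end as a sketch. The CPT numbers below are made up for illustration, since the slides do not give them.

```python
# Hypothetical CPTs for the Rain -> Traffic -> Late chain.
P_R = 0.1                              # P(+r)
P_T_given_R = {True: 0.8, False: 0.1}  # P(+t | R)
P_L_given_T = {True: 0.3, False: 0.1}  # P(+l | T)

# Steps 1-2: join P(R) with P(T | R) into P(R, T), then sum out R.
P_T = {t: sum((P_R if r else 1 - P_R)
              * (P_T_given_R[r] if t else 1 - P_T_given_R[r])
              for r in (True, False))
       for t in (True, False)}

# Steps 3-4: join P(T) with P(L | T) into P(T, L), then sum out T.
p_l = sum(P_T[t] * P_L_given_T[t] for t in (True, False))
print(P_T[True], p_l)   # 0.17 and 0.134 with these numbers
```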
Approximate Inference: Sampling • Example: estimate the joint probability of heads/tails for a 1-cent coin and a 5-cent coin by flipping them repeatedly and counting outcomes. • Advantages: • computationally easier • works even without knowing the CPTs.
Sampling Example The cloudy network: Cloudy (C) with P(C); Sprinkler (S) with P(S | C); Rain (R) with P(R | C); Wet grass (W) with P(W | S, R). Example sample: +c, ¬s, +r. • Sampling is consistent if we want to compute the full joint probability of the network or of individual variables. • What about conditional probabilities, e.g., P(+w | ¬c)? • Rejection sampling: reject the samples that do not match the evidence we are interested in.
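A sketch of prior sampling plus rejection sampling for P(+w | ¬c). The slide only names the distributions; the CPT values below are the standard AIMA ones for this network and are an assumption.

```python
import random

def prior_sample():
    """Sample (C, S, R, W) top-down through the network."""
    c = random.random() < 0.5                              # P(+c) = 0.5 (assumed)
    s = random.random() < (0.1 if c else 0.5)              # P(+s | C)
    r = random.random() < (0.8 if c else 0.2)              # P(+r | C)
    p_w = 0.99 if (s and r) else 0.90 if (s or r) else 0.0 # P(+w | S, R)
    w = random.random() < p_w
    return c, s, r, w

def rejection_sample_w_given_not_c(n=100_000):
    """Estimate P(+w | -c): throw away every sample where C is true."""
    kept = [w for c, s, r, w in (prior_sample() for _ in range(n)) if not c]
    return sum(kept) / len(kept)

print(rejection_sample_w_given_not_c())   # ~0.55 with the assumed CPTs;
                                          # about half the samples get rejected
```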
Rejection sampling • Too many rejected samples make it inefficient; when the evidence is unlikely, almost every sample is thrown away. • Likelihood sampling, which fixes the evidence variables instead of rejecting, is inconsistent on its own; weighting the samples, as shown below, restores consistency.
Likelihood weighting Same network: Cloudy (C), Sprinkler (S), Rain (R), Wet grass (W). Query: P(R | +s, +w). Fix the evidence variables (S = +s, W = +w) and weight each sample by the probability of the evidence given its parents. Example: sample +c; fix +s with weight P(+s | +c) = 0.1; sample +r; fix +w with weight P(+w | +s, +r) = 0.99; so the sample (+c, +s, +r, +w) gets weight 0.1 × 0.99. • Weighted samples make this consistent for P(R | +s, +w); but what about P(C | +s, +r)?
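A sketch of likelihood weighting for P(+r | +s, +w) on the same network, with the same assumed AIMA CPTs: evidence variables are fixed rather than sampled, and each sample carries the weight of the evidence it fixed.

```python
import random

def weighted_sample():
    """Fix S = +s and W = +w; sample C and R; return (r, weight)."""
    c = random.random() < 0.5                  # sample C from P(C)
    weight = 0.1 if c else 0.5                 # fix S = +s: weight by P(+s | C)
    r = random.random() < (0.8 if c else 0.2)  # sample R from P(R | C)
    weight *= 0.99 if r else 0.90              # fix W = +w: weight by P(+w | +s, R)
    return r, weight

def likelihood_weighting(n=100_000):
    num = den = 0.0
    for _ in range(n):
        r, weight = weighted_sample()
        den += weight
        num += weight * r
    return num / den

print(likelihood_weighting())   # ~0.32 = P(+r | +s, +w) with the assumed CPTs
```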
Gibbs Sampling • Markov chain Monte Carlo (MCMC): sample one variable at a time, conditioning on the current values of all the others. • Example chain, changing one variable per step: (+c, +s, ¬r, ¬w) → (+c, ¬s, ¬r, ¬w) → (+c, ¬s, +r, ¬w).
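A minimal Gibbs sampler, sketched for the query that defeats likelihood weighting, P(C | +s, +r), on the same network with the same assumed CPTs. Each step resamples one non-evidence variable conditioned on the current values of all the others.

```python
import random

def gibbs_c_given_s_r(n=100_000, burn_in=1_000):
    c, w = True, True          # arbitrary start; S and R stay fixed at +s, +r
    hits = total = 0
    for i in range(n):
        # Resample C given its Markov blanket: proportional to P(c)P(+s|c)P(+r|c).
        p_c, p_not_c = 0.5 * 0.1 * 0.8, 0.5 * 0.5 * 0.2
        c = random.random() < p_c / (p_c + p_not_c)
        # Resample W given its parents: P(+w | +s, +r) = 0.99.
        w = random.random() < 0.99
        if i >= burn_in:
            total += 1
            hits += c
    return hits / total

print(gibbs_c_given_s_r())   # ~0.444, the exact P(+c | +s, +r) for these CPTs
```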
Monty Hall Problem • Suppose you're on a game show, and you're given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say No. 2 (but the door is not opened), and the host, who knows what's behind the doors, opens another door, say No. 1, which has a goat. He then says to you, "Do you want to pick door No. 3?" Is it to your advantage to switch your choice? P(C=3 | S=2) = 1/3, but P(C=3 | H=1, S=2) = 2/3. Why?
Monty Hall Problem • By Bayes' rule: P(C=3 | H=1, S=2) = P(H=1 | C=3, S=2) P(C=3 | S=2) / Σ_i P(H=1 | C=i, S=2) P(C=i | S=2). • The prior is uniform: P(C=1 | S=2) = P(C=2 | S=2) = P(C=3 | S=2) = 1/3. • The host never opens the selected door or the car door, so P(H=1 | C=1, S=2) = 0, P(H=1 | C=2, S=2) = 1/2, and P(H=1 | C=3, S=2) = 1. • Hence P(C=3 | H=1, S=2) = (1 × 1/3) / (0 × 1/3 + 1/2 × 1/3 + 1 × 1/3) = (1/3) / (1/2) = 2/3.
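A quick enumeration check of this computation; door numbering follows the slide, with our pick S = 2 and the host opening H = 1.

```python
from fractions import Fraction

posterior = {}
for car in (1, 2, 3):                    # uniform prior: P(C = car | S=2) = 1/3
    # The host may open any door that is neither our pick (2) nor the car door.
    options = [d for d in (1, 2, 3) if d not in (2, car)]
    p_h1 = Fraction(int(1 in options), len(options))   # P(H=1 | C=car, S=2)
    posterior[car] = Fraction(1, 3) * p_h1
norm = sum(posterior.values())
print({car: p / norm for car, p in posterior.items()})
# {1: Fraction(0, 1), 2: Fraction(1, 3), 3: Fraction(2, 3)}:
# switching from door 2 to door 3 doubles the chance of winning.
```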