CPSC 7373: Artificial Intelligence
Lecture 4: Uncertainty
Jiang Bian, Fall 2012
University of Arkansas at Little Rock
Chapter 13: Uncertainty • Outline • Uncertainty • Probability • Syntax and Semantics • Inference • Independence and Bayes' Rule
Uncertainty
Let action At = leave for airport t minutes before flight
Will At get me there on time?
Problems:
• partial observability (road state, other drivers' plans, etc.)
• noisy sensors (traffic reports)
• uncertainty in action outcomes (flat tire, etc.)
• immense complexity of modeling and predicting traffic
Hence a purely logical approach either
• risks falsehood: "A25 will get me there on time", or
• leads to conclusions that are too weak for decision making: "A25 will get me there on time, if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc."
(A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport ...)
Methods for handling uncertainty
• Default or nonmonotonic logic:
• Assume my car does not have a flat tire
• Assume A25 works unless contradicted by evidence
• Issues: What assumptions are reasonable? How to handle contradiction?
• Rules with fudge factors:
• A25 |→0.3 get there on time
• Sprinkler |→0.99 WetGrass
• WetGrass |→0.7 Rain
• Issues: Problems with combination, e.g., does Sprinkler cause Rain?
• Probability
• Model the agent's degree of belief: given the available evidence, A25 will get me there on time with probability 0.04
Probability
Probabilistic assertions summarize effects of
• laziness: failure to enumerate exceptions, qualifications, etc.
• ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability:
• Probabilities relate propositions to the agent's own state of knowledge, e.g., P(A25 | no reported accidents) = 0.06
• These are not assertions about the world
• Probabilities of propositions change with new evidence, e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15
Bayes Network: Example
[Diagram: a car-diagnostic Bayes network. Causes such as battery age, fan-belt broken, and alternator broken lead through intermediate nodes (battery dead, not charging, battery flat, no oil, no gas, fuel line blocked, starter broken) to observables (battery meter, lights, oil light, gas gauge, dip stick) and the query node "car won't start".]
Probabilities: Coin Flip • Suppose the probability for heads is 0.5. What's the probability for it coming up tails? • P(H) = 1/2 • P(T) = __?__
Probabilities: Coin Flip • Suppose the probability for heads is 1/4. What's the probability for it coming up tails? • P(H) = 1/4 • P(T) = __?__
Probabilities: Coin Flip
• Suppose the probability for heads is 1/2, and each coin flip is independent. What's the probability of it coming up three heads in a row?
• P(H) = 1/2
• P(H, H, H) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi = {H, T}; and
• Pi(H) = 1/2 ∀i
• P(X1=X2=X3=X4) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi = {H, T}; and
• Pi(H) = 1/2 ∀i
• P(X1=X2=X3=X4) = __?__
• P(X1=X2=X3=X4=H) + P(X1=X2=X3=X4=T) = 1/16 + 1/16 = 1/8
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi = {H, T}; and
• Pi(H) = 1/2 ∀i
• P({X1,X2,X3,X4} contains at least 3 H) = __?__
Probabilities: Coin Flip
• Xi = result of the i-th coin flip;
• Xi = {H, T}; and
• Pi(H) = 1/2 ∀i
• P({X1,X2,X3,X4} contains at least 3 H) = __?__
• P(HHHH) + P(HHHT) + P(HHTH) + P(HTHH) + P(THHH) = 5 * 1/16 = 5/16
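Both of the last two quizzes can be checked by brute force. A minimal sketch in Python (the code and variable names are ours, not the lecture's): enumerate all 16 equally likely outcomes of four fair, independent flips and count.

```python
from itertools import product

# All 16 equally likely outcomes of four fair, independent coin flips.
flips = list(product("HT", repeat=4))

# P(X1=X2=X3=X4): every flip shows the same face.
p_all_equal = sum(1 for f in flips if len(set(f)) == 1) / len(flips)

# P(at least 3 heads among the four flips).
p_at_least_3h = sum(1 for f in flips if f.count("H") >= 3) / len(flips)

print(p_all_equal)    # 0.125  = 1/8
print(p_at_least_3h)  # 0.3125 = 5/16
```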
Probabilities: Summary
• Complementary probability:
• P(A) = p; then P(¬A) = 1 − p
• Independence:
• X⊥Y; then P(X)P(Y) = P(X,Y), i.e., the product of the marginals equals the joint probability
Dependence
• Given: P(X1=H) = 1/2
• if X1=H: P(X2=H|X1=H) = 0.9
• if X1=T: P(X2=T|X1=T) = 0.8
• P(X2=H) = __?__
Dependence
• Given: P(X1=H) = 1/2
• if X1=H: P(X2=H|X1=H) = 0.9
• if X1=T: P(X2=T|X1=T) = 0.8
• P(X2=H) = __?__
• P(X2=H|X1=H) * P(X1=H) + P(X2=H|X1=T) * P(X1=T)
• = 0.9 * 1/2 + (1 − 0.8) * 1/2 = 0.55
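The same total-probability computation as a short script (a sketch; the variable names are ours):

```python
# Total probability: P(X2=H) = P(X2=H|X1=H) P(X1=H) + P(X2=H|X1=T) P(X1=T)
p_x1_h = 0.5        # P(X1=H)
p_h_given_h = 0.9   # P(X2=H | X1=H)
p_t_given_t = 0.8   # P(X2=T | X1=T), so P(X2=H | X1=T) = 1 - 0.8

p_x2_h = p_h_given_h * p_x1_h + (1 - p_t_given_t) * (1 - p_x1_h)
print(p_x2_h)  # 0.55
```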
What have we learned?
• Total probability: P(Y) = Σi P(Y|X=i) P(X=i)
• Negation of probabilities: P(¬X|Y) = 1 − P(X|Y)
• What about P(X|¬Y)?
What have we learned?
• Negation of probabilities: P(¬X|Y) = 1 − P(X|Y)
• What about P(X|¬Y)? There is no such shortcut:
• You can negate the event (X), but you can never negate the conditioning variable (Y).
Example: Weather
• Given,
• P(D1=Sunny) = 0.9
• P(D2=Sunny|D1=Sunny) = 0.8
• P(D2=Rainy|D1=Sunny) = ??
Example: Weather
• Given,
• P(D1=Sunny) = 0.9
• P(D2=Sunny|D1=Sunny) = 0.8
• P(D2=Rainy|D1=Sunny) = ??
• 1 − P(D2=Sunny|D1=Sunny) = 0.2
• P(D2=Sunny|D1=Rainy) = 0.6
• P(D2=Rainy|D1=Rainy) = ??
• 1 − P(D2=Sunny|D1=Rainy) = 0.4
• Assume the transition probabilities from D2 to D3 are the same as from D1 to D2:
• P(D2=Sunny) = ??; and P(D3=Sunny) = ??
Example: Weather
• Given,
• P(D1=Sunny) = 0.9
• P(D2=Sunny|D1=Sunny) = 0.8
• P(D2=Sunny|D1=Rainy) = 0.6
• Assume the transition probabilities from D2 to D3 are the same as from D1 to D2:
• P(D2=Sunny) = 0.78
• P(D2=Sunny|D1=Sunny) * P(D1=Sunny) + P(D2=Sunny|D1=Rainy) * P(D1=Rainy) = 0.8*0.9 + 0.6*0.1 = 0.78
• P(D3=Sunny) = 0.756
• P(D3=Sunny|D2=Sunny) * P(D2=Sunny) + P(D3=Sunny|D2=Rainy) * P(D2=Rainy) = 0.8*0.78 + 0.6*0.22 = 0.756
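This forward propagation is easy to iterate. A small sketch (names ours), assuming the slide's stationary transition probabilities:

```python
# Propagate P(Dn=Sunny) forward one day at a time.
p_sunny = 0.9       # P(D1=Sunny)
p_s_given_s = 0.8   # P(D(n+1)=Sunny | Dn=Sunny)
p_s_given_r = 0.6   # P(D(n+1)=Sunny | Dn=Rainy)

for day in (2, 3):
    p_sunny = p_s_given_s * p_sunny + p_s_given_r * (1 - p_sunny)
    print(f"P(D{day}=Sunny) = {p_sunny:.4g}")  # 0.78, then 0.756
```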
Example: Cancer
• Suppose there is a type of cancer that 1% of the population carries.
• P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There is a test for the cancer.
• P(+|C) = 0.9; P(-|C) = 0.1
• P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = ??
• Joint probabilities:
• P(+, C) = ??; P(-, C) = ??
• P(+, ¬C) = ??; P(-, ¬C) = ??
Example: Cancer
• Suppose there is a type of cancer that 1% of the population carries.
• P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There is a test for the cancer.
• P(+|C) = 0.9; P(-|C) = 0.1
• P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = ??
• Joint probabilities:
• P(+, C) = 0.009; P(-, C) = 0.001
• P(+, ¬C) = 0.198; P(-, ¬C) = 0.792
Example: Cancer
• Suppose there is a type of cancer that 1% of the population carries.
• P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99
• There is a test for the cancer.
• P(+|C) = 0.9; P(-|C) = 0.1
• P(+|¬C) = 0.2; P(-|¬C) = 0.8
• P(C|+) = 0.043
• P(+, C) / (P(+, C) + P(+, ¬C))
• = 0.009 / (0.009 + 0.198) = 0.043
• Joint probabilities:
• P(+, C) = 0.009; P(-, C) = 0.001
• P(+, ¬C) = 0.198; P(-, ¬C) = 0.792
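The same calculation in code, building the joint probabilities first and then conditioning on the positive result (a sketch; names ours):

```python
# Cancer test: joint probabilities, then Bayes rule by normalization.
p_c = 0.01            # prior P(C)
p_pos_given_c = 0.9   # P(+|C)
p_pos_given_nc = 0.2  # P(+|¬C)

joint_pos_c = p_pos_given_c * p_c          # P(+, C)  = 0.009
joint_pos_nc = p_pos_given_nc * (1 - p_c)  # P(+, ¬C) = 0.198

p_c_given_pos = joint_pos_c / (joint_pos_c + joint_pos_nc)
print(round(p_c_given_pos, 3))  # 0.043
```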
Bayes Rule
P(A|B) = P(B|A) P(A) / P(B)
• posterior = likelihood × prior / marginal likelihood
• The marginal likelihood P(B) is obtained by total probability: P(B) = P(B|A) P(A) + P(B|¬A) P(¬A)
Bayes Rule: Cancer Example
P(C|+) = P(+|C) P(C) / P(+)
• posterior = likelihood × prior / marginal likelihood
• P(+) = P(+|C) P(C) + P(+|¬C) P(¬C) = 0.9 × 0.01 + 0.2 × 0.99 = 0.207, so P(C|+) = 0.009 / 0.207 ≈ 0.043
Bayes Network
• Graphically: A → B, where A (e.g., cancer) is not observable, with prior P(A), and B (e.g., a test) is observable, with P(B|A) and P(B|¬A)
• Diagnostic reasoning: P(A|B) or P(A|¬B)
• How many parameters?? 3: P(A), P(B|A), and P(B|¬A)
Two test cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|T1=+, T2=+) = P(C|++) = ??
Two test cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|T1=+, T2=+) = P(C|++) = 0.1698
• P(C|++) = P(+|C) P(+|C) P(C) / [P(+|C) P(+|C) P(C) + P(+|¬C) P(+|¬C) P(¬C)] = 0.0081 / (0.0081 + 0.0396) = 0.1698
Two test cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|+-) = ??
Two test cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(C|+-) = 0.0056
• P(C|+-) = P(+|C) P(-|C) P(C) / [P(+|C) P(-|C) P(C) + P(+|¬C) P(-|¬C) P(¬C)] = 0.0009 / (0.0009 + 0.1584) = 0.0056
Conditional Independence
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• We assume not only that T1 and T2 are identically distributed, but also that they are conditionally independent given C:
• P(T2|C,T1) = P(T2|C)
Conditional Independence
• Network: A → B, A → C, so B⊥C | A
• Does B⊥C | A imply B⊥C?
Conditional Independence
• Does B⊥C | A imply B⊥C? No.
• Intuitively, getting a positive test result for cancer gives us information about whether you have cancer or not.
• A positive test result raises the probability of having cancer relative to the prior probability.
• With that increased probability, we predict that another test will give a positive response with higher likelihood than if we hadn't taken the first test.
Two test cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• P(T2=+|T1=+) = ??
Conditional independence: cancer example
• 2-Test Cancer Example: network C → T1, C → T2
• P(C) = 0.01; P(¬C) = 0.99
• P(+|C) = 0.9; P(-|C) = 0.1
• P(-|¬C) = 0.8; P(+|¬C) = 0.2
• Conditional independence: given that I know C, knowledge of the first test gives me no more information about the second test. It only gives me information if C is unknown.
• P(T2=+|T1=+) = P(+2|+1,C) P(C|+1) + P(+2|+1,¬C) P(¬C|+1)
• = P(+2|C) × 0.043 + P(+2|¬C) × (1 − 0.043)
• = 0.9 × 0.043 + 0.2 × 0.957 = 0.2301
• using conditional independence: P(+2|+1,C) = P(+2|C)
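A sketch that reproduces the three two-test results by exploiting conditional independence (the function name and structure are ours; exact arithmetic gives 0.2304 here, while the slide's 0.2301 comes from rounding P(C|+1) to 0.043):

```python
# Two-test cancer network: C -> T1, C -> T2, tests conditionally
# independent given C.
p_c = 0.01
p_pos = {True: 0.9, False: 0.2}  # P(+|C), P(+|¬C)

def posterior(results):
    """P(C | test results), where results is a string like '+-'."""
    like_c = like_nc = 1.0
    for r in results:
        like_c *= p_pos[True] if r == "+" else 1 - p_pos[True]
        like_nc *= p_pos[False] if r == "+" else 1 - p_pos[False]
    num = like_c * p_c
    return num / (num + like_nc * (1 - p_c))

print(round(posterior("++"), 4))  # 0.1698
print(round(posterior("+-"), 4))  # 0.0056

# P(T2=+ | T1=+), conditioning on C:
p_c_1 = posterior("+")
print(round(p_pos[True] * p_c_1 + p_pos[False] * (1 - p_c_1), 4))  # 0.2304
```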
Absolute and Conditional
[Diagram 1: A → C ← B, with A⊥B. Diagram 2: A ← C → B, with A⊥B | C.]
• A⊥B => A⊥B | C ??
• A⊥B | C => A⊥B ??
Absolute and Conditional
• A⊥B => A⊥B | C ?? No: in the common-effect network A → C ← B, A and B are independent but become dependent once C is observed.
• A⊥B | C => A⊥B ?? No: in the common-cause network A ← C → B, A and B are conditionally independent given C but dependent absolutely.
Confounding Cause
• Network: S → H ← R (sunny weather and a raise both cause happiness)
• P(S) = 0.7; P(R) = 0.01
• P(H|S, R) = 1
• P(H|¬S, R) = 0.9
• P(H|S, ¬R) = 0.7
• P(H|¬S, ¬R) = 0.1
• P(R|S) = ??
Confounding Cause
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01
• P(H|S, R) = 1
• P(H|¬S, R) = 0.9
• P(H|S, ¬R) = 0.7
• P(H|¬S, ¬R) = 0.1
• P(R|S) = 0.01
• = P(R), since R⊥S: with H unobserved, a raise is independent of the weather
Explaining Away
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01
• P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• P(R|H,S) = ??
Explaining away means that if we know we are happy, then sunny weather can explain away the cause of the happiness:
• If I know that I'm happy, then learning that it's sunny makes it less likely that I received a raise.
• If we see an effect that could be produced by multiple causes, seeing one of those causes can explain away any other potential cause of that effect.
Explaining Away
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01
• P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• P(R|H,S) = 0.0142
• P(R|H,S) = P(H|R,S) P(R|S) / P(H|S)
• = P(H|R,S) P(R) / [P(H|R,S) P(R) + P(H|¬R,S) P(¬R)]
• = 1 × 0.01 / (1 × 0.01 + 0.7 × (1 − 0.01)) = 0.0142
• P(R|H) = ??
Explaining Away
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01; P(H) = 0.5245
• P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• P(R|H,S) = 0.0142
• P(R|H) = 0.01849
• P(R|H) = P(H|R) P(R) / P(H)
• = (P(H|R,S) P(S) + P(H|R,¬S) P(¬S)) P(R) / P(H)
• = 0.97 × 0.01 / 0.5245 = 0.01849
• P(R|H, ¬S) = ??
Explaining Away
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01; P(H) = 0.5245
• P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• P(R|H,S) = 0.0142; P(R|H) = 0.01849
• P(R|H, ¬S) = 0.0833
• P(R|H,¬S) = P(H|R,¬S) P(R|¬S) / P(H|¬S)
• = P(H|R,¬S) P(R) / [P(H|R,¬S) P(R) + P(H|¬R,¬S) P(¬R)]
• = 0.9 × 0.01 / (0.9 × 0.01 + 0.1 × 0.99) = 0.0833
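All three explaining-away quantities can be checked by enumerating the joint distribution over S and R with H = true (a sketch; the helper name is ours):

```python
from itertools import product

# Explaining away: S -> H <- R. Enumerate P(S, R, H=True) and condition.
p_s, p_r = 0.7, 0.01
p_h = {(True, True): 1.0, (False, True): 0.9,
       (True, False): 0.7, (False, False): 0.1}  # P(H | S, R)

def joint(s, r):
    """P(S=s, R=r, H=True)."""
    return (p_s if s else 1 - p_s) * (p_r if r else 1 - p_r) * p_h[(s, r)]

p_h_total = sum(joint(s, r) for s, r in product((True, False), repeat=2))

p_r_given_hs = joint(True, True) / (joint(True, True) + joint(True, False))
p_r_given_h = (joint(True, True) + joint(False, True)) / p_h_total
p_r_given_h_ns = joint(False, True) / (joint(False, True) + joint(False, False))

print(round(p_h_total, 4))       # 0.5245
print(round(p_r_given_hs, 4))    # 0.0142
print(round(p_r_given_h, 5))     # 0.01849
print(round(p_r_given_h_ns, 4))  # 0.0833
```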
Conditional Dependence
• Network: S → H ← R
• P(S) = 0.7; P(R) = 0.01
• P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1
• P(R|S) = 0.01 = P(R), since R⊥S
• But P(R|H,S) = 1.42% ≠ P(R|H) = 1.849%, and P(R|H, ¬S) = 8.33%
• So R⊥S does NOT give R⊥S | H: independence does not imply conditional independence!
Absolute and Conditional
• A⊥B => A⊥B | C ?? No (explaining away: S⊥R, but not S⊥R | H)
• A⊥B | C => A⊥B ?? No (two-test cancer example: T1⊥T2 | C, but not T1⊥T2)
Bayes Networks
• Bayes networks define probability distributions over graphs of random variables.
• Instead of enumerating all combinations of the random variables, a Bayes network is defined by probability distributions inherent to each individual node.
• The joint probability represented by a Bayes network is the product of probabilities defined over the individual nodes, where each node's probability is conditioned only on its incoming arcs.
• Network: A → C, B → C, C → D, C → E, with distributions P(A), P(B), P(C|A,B), P(D|C), P(E|C)
• P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|C) P(E|C)
• 1 + 1 + 4 + 2 + 2 = 10 probability values, versus 2^5 − 1 = 31 for the full joint distribution
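A minimal sketch of this factored joint in code; the CPT numbers are made-up placeholders (not from the lecture), only the structure matters:

```python
# Factored joint for the network A -> C <- B, C -> D, C -> E.
# P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|C) P(E|C)
p_a, p_b = 0.3, 0.6                              # priors (placeholder values)
p_c = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.4, (False, False): 0.1}  # P(C | A, B)
p_d = {True: 0.8, False: 0.2}                    # P(D | C)
p_e = {True: 0.7, False: 0.1}                    # P(E | C)

def joint(a, b, c, d, e):
    """Joint probability as a product of the five node distributions."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c[(a, b)] if c else 1 - p_c[(a, b)]
    pd = p_d[c] if d else 1 - p_d[c]
    pe = p_e[c] if e else 1 - p_e[c]
    return pa * pb * pc * pd * pe

# One specific assignment; 10 stored values cover all 2**5 = 32 of these.
print(joint(True, False, True, True, False))
```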