Uncertainty in Environments Comp3710 Artificial Intelligence Computing Science Thompson Rivers University
Course Outline • Part I – Introduction to Artificial Intelligence • Part II – Classical Artificial Intelligence • Part III – Machine Learning • Introduction to Machine Learning • Neural Networks • Probabilistic Reasoning and Bayesian Belief Networks • Artificial Life: Learning through Emergent Behavior • Part IV – Advanced Topics
Learning Outcomes • Relate a proposition to probability, i.e., random variables. • Give the sample space for a set of random variables. • Use Kolmogorov's 3 axioms, e.g., P(A ∨ B) = P(A) + P(B) − P(A ∧ B). • Give the joint probability distribution for a set of random variables. • Compute probabilities given a joint probability distribution. • Give the joint probability distribution for a set of random variables that have conditional independence relations. • Compute the diagnostic probability from causal probabilities.
Outline • How to deal with uncertainty in environments • Probability • Syntax and Semantics • Inference • Independence and Bayes' Rule
1. Uncertainty in environments • An example of a decision-making problem: • Let action At = leave for airport t minutes before the flight. • Will At get me there on time? • [Q] How to decide? • Problems: application environments are nondeterministic and unpredictable, with many hidden factors: • partial observability (road state, other drivers' plans, etc.) • noisy sensors (traffic reports) • uncertainty in action outcomes (flat tire, etc.) • immense complexity of modeling and predicting traffic • Hence a purely logical approach either • risks falsehood: "A25 will get me there on time", or • leads to conclusions that are too weak for decision making: • "A25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc." • (A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport …)
Methods for handling uncertainty • [Q] How to handle uncertainty? • [Q] Default logic? • Default assumptions: my car does not have a flat tire, … • A25 works unless contradicted by evidence. • But there are issues: which assumptions are reasonable? • [Q] Can we use fuzzy logic? • Probability – uncertainty/certainty • A model of degree of belief • Given the available evidence or knowledge about the environment: • A25 will get me there on time with probability 0.04. • Fuzzy logic – ambiguity/clearness • Degree of truth of a value (observed data/facts) • E.g., my age
2. Probability • Probability is a way of expressing knowledge or belief that an event will occur or has occurred. E.g., … • Probabilistic assertions summarize the effects of hidden factors, i.e., • laziness: failure to enumerate exceptions, qualifications, etc. • ignorance: lack of relevant facts, initial conditions, etc. • Subjective or Bayesian probability: • Probabilities can relate propositions to the agent's own state of knowledge (or conditions), e.g., P(A25 | no reported accidents) = 0.06 • Probabilities of propositions change with new evidence, e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15 • Hence probability theory can be a good tool.
Making decisions under uncertainty • Suppose I believe the following: P(A25 gets me there on time | …) = 0.04 P(A90 gets me there on time | …) = 0.70 P(A120 gets me there on time | …) = 0.95 P(A1440 gets me there on time | …) = 0.9999 • [Q] Which action to choose? • It depends on my preferences for missing the flight vs. time spent waiting, etc. • Utility theory (utility: the quality of being useful) is used to represent and infer preferences; see the sketch below. • Decision theory = probability theory + utility theory
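To make the choice concrete, here is a minimal sketch in Python of picking the action with the maximum expected utility. The on-time probabilities are the ones above; the utility numbers (the cost of missing the flight and of waiting at the airport) are invented purely for illustration, not from the slides.

# P(action gets me to the airport on time | evidence), from the slide
p_on_time = {"A25": 0.04, "A90": 0.70, "A120": 0.95, "A1440": 0.9999}

# Hypothetical utilities: missing the flight is very bad; long waits
# at the airport also cost some utility.
U_MISS = -1000.0
wait_cost = {"A25": -5.0, "A90": -30.0, "A120": -50.0, "A1440": -500.0}

def expected_utility(action):
    p = p_on_time[action]
    # On time: pay only the waiting cost; late: miss the flight.
    return p * wait_cost[action] + (1 - p) * U_MISS

for a in p_on_time:
    print(f"{a}: EU = {expected_utility(a):.1f}")
print("Best action:", max(p_on_time, key=expected_utility))  # A120 under these assumed utilities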
3. Syntax and Semantics • Sample space, event, probability, probability distribution • Random variable and proposition • Prior probability (aka unconditional probability) • Posterior probability (aka conditional probability)
Sample Space and Probability • Definition. Begin with a set Ω – the sample space – whose elements are atomic events of the same type. • E.g., 6 possible rolls of a die -> Ω = {???} • ω in Ω is a sample point / possible world / atomic event. • Definition. A probability space or probability distribution or probability model is a sample space with an assignment P(ω) for every ω in Ω s.t. • 0 ≤ P(ω) ≤ 1 • Σω P(ω) = 1 • E.g., P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6. [Q] Sample space? [Q] Probability distribution? • Definition. An event A can be considered as a subset of Ω. • P(A) = Σω∈A P(ω) • E.g., P(die roll < 4) = P(1) + P(2) + P(3) = 1/6 + 1/6 + 1/6 = 1/2
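A minimal sketch of the die example in Python, treating an event as a subset of the sample space (the function names are illustrative, not from the slides):

omega = {1, 2, 3, 4, 5, 6}                 # sample space
P = {w: 1 / 6 for w in omega}              # uniform probability distribution

def prob(event):
    """P(A) = sum of P(w) over sample points w in the event A."""
    return sum(P[w] for w in event)

roll_lt_4 = {w for w in omega if w < 4}    # the event "die roll < 4"
print(prob(roll_lt_4))                     # 0.5 (up to floating-point error)
assert abs(sum(P.values()) - 1) < 1e-9     # the P(w) sum to 1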
3. Axioms of probability • Kolmogorov's axioms: for any propositions (events) A, B • 0 ≤ P(A) ≤ 1 • P(true) = 1 and P(false) = 0 • P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Random Variable and Proposition • Definition. Basic element: random variable, representing a sample space • A function from sample points to some range, e.g., the reals or Booleans • E.g., Odd: N -> {true, false}; Odd(1) = true • Boolean random variables • E.g., Cavity (do I have a cavity?); [Q] What is the sample space? • Discrete random variables (finite or infinite) • E.g., Weather is one of <sunny, rainy, cloudy, snow>; [Q] What is the sample space? • Continuous random variables (bounded or unbounded) • E.g., RoomTemperature < 22; [Q] What is the sample space?
3. Proposition (how to relate to probability?) • Think of a proposition as the event where the proposition is true. • E.g., I have a toothache; the weather is sunny and there is no cavity. • [Q] How to relate to probability, i.e., random variables? • Similar to propositional logic: possible worlds are defined by assignments of values to random variables. • A sample point is a value of a set of random variables. • A sample space is the Cartesian product of the ranges of the random variables. • [Q] What is the sample space for "Cavity and Weather"? • Elementary propositions are constructed by assigning a value to a random variable, e.g., Weather = sunny; Cavity = false (can be abbreviated as ¬cavity) • Complex propositions are formed from elementary propositions and standard logical connectives, e.g., (Weather = sunny) ∨ (Cavity = false) • Proposition = conjunction/disjunction of atomic events
Prior and Posterior Probability • Prior (aka unconditional) probability • Posterior (aka conditional) probability
Prior probability • Prior or unconditional probabilities of propositions, e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72, correspond to belief prior to the arrival of any (new) evidence. • A probability distribution gives values for all possible assignments: P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1) • A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(Weather, Cavity) = a 4 × 2 matrix of values:

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08

[Q] What is the sum of all the values above? • Every question about a domain can be answered with the joint probability distribution. [Q] Any problem? The size of the table.
P(Weather, Cavity) = a 4 × 2 matrix of values:

Weather =        sunny   rainy   cloudy   snow
Cavity = true    0.144   0.02    0.016    0.02
Cavity = false   0.576   0.08    0.064    0.08

• [Q] What is the probability of Weather = cloudy? • [Q] What is the probability of snow and no cavity? • [Q] What is the probability of sunny or cavity? P(Weather = sunny ∨ Cavity = true) = P(Weather = sunny) + P(Cavity = true) − ??? = (.144 + .576) + (.144 + .02 + .016 + .02) − ???
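A sketch of answering such questions directly from the joint table, assuming we store P(Weather, Cavity) as a Python dict keyed by (weather, cavity) pairs:

joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def prob(pred):
    """Sum the joint entries where the predicate holds."""
    return sum(p for (w, c), p in joint.items() if pred(w, c))

print(prob(lambda w, c: w == "cloudy"))            # P(cloudy) = 0.08
print(prob(lambda w, c: w == "snow" and not c))    # P(snow, no cavity) = 0.08
# Inclusion-exclusion: P(sunny or cavity) = P(sunny) + P(cavity) - P(sunny and cavity)
p_or = (prob(lambda w, c: w == "sunny") + prob(lambda w, c: c)
        - prob(lambda w, c: w == "sunny" and c))
print(p_or)                                        # 0.72 + 0.2 - 0.144 = 0.776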
Prior probability – continuous variables • In general, express the probability distribution as a parameterized function of value, called a probability density function (pdf) • E.g., P(X = x) = U[18, 26](x) = 0.125 • P is a density; it integrates to 1: ∫[18, 26] 0.125 dx = 1 [Figure: uniform density of height 0.125 over the interval [18, 26]]
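A small sketch of the uniform density U[18, 26], with a midpoint-rule numerical integration as a sanity check that the density integrates to 1:

def pdf(x, a=18.0, b=26.0):
    """Uniform density on [a, b]: 1 / (b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

n = 10_000
width = (26.0 - 18.0) / n
total = sum(pdf(18.0 + (i + 0.5) * width) * width for i in range(n))
print(pdf(22.0))   # 0.125
print(total)       # ~1.0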
Example of the normal distribution, also called the Gaussian distribution. [Figure from Wikipedia.org; y-axis: probabilities, x-axis: events]
3. • Interesting facts: • P(A) = 1 − P(¬A) • P(A) = P(A ∧ B) + P(A ∧ ¬B) for any B
Conditional probability • Posterior or conditional probabilities, e.g., P(cavity | toothache) = 0.6, i.e., given that toothache is all I know (not "if toothache, then cavity"), there is a 60% chance of cavity. Another example: P(toothache | cavity) = 0.8 • New evidence/observations/facts may be irrelevant, allowing simplification; e.g., when Cavity and Weather are irrelevant to each other, P(cavity | toothache, sunny) = P(cavity | toothache) = 0.6 • This kind of inference, sanctioned by domain knowledge, is crucial. • Very interesting examples: [Q] A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck? [Q] A car won't start. What is the probability of a bad starter? What is the probability of a bad battery?
• Definition of conditional probability: P(a | b) = P(a ∧ b) / P(b) if P(b) > 0 • The product rule gives an alternative formulation: P(a ∧ b) = P(a) P(b | a) = P(b) P(a | b) • The chain rule is derived by successive application of the product rule: P(a ∧ b ∧ c) = ??? = P(a ∧ b) P(c | a ∧ b) = P(a) P(b | a) P(c | a ∧ b) P(X1, …, Xn) = P(X1, …, Xn−1) P(Xn | X1, …, Xn−1) = P(X1, …, Xn−2) P(Xn−1 | X1, …, Xn−2) P(Xn | X1, …, Xn−1) = … = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3) … = Πi=1..n P(Xi | X1, …, Xi−1) This will be used in Inference later!
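A sketch verifying the product and chain rules numerically on an arbitrary, made-up joint distribution over three Boolean variables; nothing here comes from the slides except the rules themselves:

import itertools, random

random.seed(0)
weights = {abc: random.random() for abc in itertools.product([True, False], repeat=3)}
z = sum(weights.values())
joint = {abc: w / z for abc, w in weights.items()}   # normalize so values sum to 1

def P(pred):
    return sum(p for abc, p in joint.items() if pred(*abc))

def P_cond(pred, given):
    """P(pred | given) = P(pred and given) / P(given)."""
    return P(lambda a, b, c: pred(a, b, c) and given(a, b, c)) / P(given)

lhs = P(lambda a, b, c: a and b and c)
# Chain rule: P(a, b, c) = P(a) P(b | a) P(c | a, b)
rhs = (P(lambda a, b, c: a)
       * P_cond(lambda a, b, c: b, lambda a, b, c: a)
       * P_cond(lambda a, b, c: c, lambda a, b, c: a and b))
print(lhs, rhs)                  # equal up to floating-point error
assert abs(lhs - rhs) < 1e-9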
3. • Interesting fact: • P(A | B) = 1 − P(¬A | B)
4. Inference by Enumeration • Start with the joint probability distribution over Toothache (I have a toothache), Cavity (I have a cavity), and Catch (the dentist's probe catches in my tooth):

                 toothache            ¬toothache
                 catch    ¬catch      catch    ¬catch
Cavity = true    0.108    0.012       0.072    0.008
Cavity = false   0.016    0.064       0.144    0.576

• [Q] What are the random variables here? • [Q] What is the sum of the probabilities of all the atomic events? • [Q] How can you obtain the above probability distribution? • For any proposition φ, the probability is the sum of the atomic events where it is true: P(φ) = Σω: ω ⊨ φ P(ω)
Start with the joint probability distribution over Toothache, Cavity, Catch (the table above). • For any proposition φ, sum the atomic events where it is true: P(φ) = Σω: ω ⊨ φ P(ω) • [Q] P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2 • [Q] P(cavity ∨ toothache) = ??? • [Q] P(cavity ∧ toothache) = ???
Start with the joint probability distribution over Toothache, Cavity, Catch (the table above). • We can also compute conditional probabilities: very important! P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4 [Q] P(cavity | toothache) = ??? P(cavity | ¬toothache) = ??? [Q] P(cavity | catch) = ??? [Q] P(cavity | toothache ∧ catch) = ???
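A minimal sketch of inference by enumeration over the dental joint distribution above, with entries keyed by (toothache, catch, cavity) truth values:

joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(pred):
    """P(phi) = sum over the atomic events where phi is true."""
    return sum(p for event, p in joint.items() if pred(*event))

p_toothache = P(lambda t, c, cav: t)                 # 0.2
p_cav_or_t  = P(lambda t, c, cav: cav or t)          # 0.28
# Conditional probability from the definition P(a | b) = P(a and b) / P(b):
p_cav_given_t = P(lambda t, c, cav: cav and t) / p_toothache   # 0.6
print(p_toothache, p_cav_or_t, p_cav_given_t)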
The denominator can be viewed as a normalization constant α: P(Cavity | toothache) = P(Cavity, toothache) / P(toothache) = α P(Cavity, toothache) = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)] = α [<0.108, 0.016> + <0.012, 0.064>] = α <0.12, 0.08> = <0.6, 0.4> [Q] What is α? General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
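And a sketch of the normalization step itself: compute the unnormalized values for each value of the query variable Cavity, then scale them so they sum to 1:

unnormalized = {
    True:  0.108 + 0.012,   # P(cavity, toothache), summing out Catch
    False: 0.016 + 0.064,   # P(not cavity, toothache)
}
alpha = 1 / sum(unnormalized.values())          # alpha = 1 / P(toothache)
posterior = {v: alpha * p for v, p in unnormalized.items()}
print(alpha, posterior)   # 5.0, {True: 0.6, False: 0.4}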
• Obvious problems: for n random variables, • Worst-case time complexity O(d^n), where d is the largest arity • Space complexity O(d^n) to store the joint distribution • How to find the numbers for the O(d^n) entries? • In other words, too much. • [Q] How to reduce?
5. Independence & Bayes' Rule • Independence • Conditional independence • Bayes' rule
5. Independence • A and B are independent iff P(A|B) = P(A), or P(B|A) = P(B), or P(A, B) = P(A) P(B) P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) × P(Weather) • In the above example, 32 (2×2×2×4) entries are reduced to 12 (2×2×2 + 4). • Absolute independence is powerful but rare. • Dentistry is a large field with hundreds of variables, none of which are independent. [Q] What do we have to do?
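A sketch of the storage saving: with absolute independence we keep two smaller tables (12 numbers) and reconstruct any of the 32 joint entries on demand. The dental probabilities here are uniform placeholders for illustration; only the weather marginal comes from the earlier slide.

import itertools

# 8 stored entries (placeholder uniform values, just to show the structure)
p_tcc = {e: 1 / 8 for e in itertools.product([True, False], repeat=3)}
# 4 stored entries, from the earlier Weather distribution
p_weather = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}

def joint(t, c, cav, w):
    # Independence lets us rebuild any of the 32 joint entries from 12 numbers.
    return p_tcc[(t, c, cav)] * p_weather[w]

print(len(p_tcc) + len(p_weather))          # 12 stored entries instead of 32
print(joint(True, True, True, "sunny"))     # one reconstructed joint entry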
Conditional Independence • If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch | toothache ∧ cavity) = P(catch | cavity) • The same independence holds if I haven't got a cavity: (2) P(catch | toothache ∧ ¬cavity) = P(catch | ¬cavity) • Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity) • When Catch is conditionally independent of Toothache given Cavity, equivalent statements are: P(Toothache | Catch, Cavity) = P(Toothache | Cavity) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity) (similarly, P(A, B) = P(A) P(B) when A and B are independent)
Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) by the product rule = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) by ??? = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) by ??? 2 × 2 × 2 = 8 -> 2 + 2 + 1. How ??? [Diagram: Cavity with arrows to Toothache and Catch] [Q] P(toothache) from the tables below ??? = P(t ∧ cavity) + P(t ∧ ¬cavity) = P(t|c) P(c) + P(t|¬c) P(¬c) = ??? [Q] P(¬t | cavity) = ??? = 1 − P(t|c) = ???
Write out the full joint distribution using the chain rule: P(Toothache, Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch, Cavity) = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity) • In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. • Conditional independence is our most basic and robust form of knowledge about uncertain environments. [Diagram: Cavity with arrows to Toothache and Catch]
5. • Interesting fact: • P(A, B | C) = P(A | C) P(B | C) when A and B are conditionally independent of each other given C. • Actually, it is a definition. • P(toothache ∧ catch | cavity) = .108 / (.108 + .012 + .072 + .008) = .54 • P(toothache | cavity) = (.108 + .012) / .2 = .12 / .2 = .6 • P(catch | cavity) = (.108 + .072) / .2 = .18 / .2 = .9 • P(toothache | cavity) P(catch | cavity) = .6 × .9 = .54 • P(toothache ∧ catch | cavity) = P(toothache | cavity) P(catch | cavity) • … • Therefore Toothache and Catch are conditionally independent given Cavity. This is how to show conditional independence.
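The same check in code, reading the three conditional probabilities off the joint table from the inference slides:

joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def P(pred):
    return sum(p for e, p in joint.items() if pred(*e))

p_cavity = P(lambda t, c, cav: cav)                          # 0.2
both  = P(lambda t, c, cav: t and c and cav) / p_cavity      # 0.54
tooth = P(lambda t, c, cav: t and cav) / p_cavity            # 0.6
catch = P(lambda t, c, cav: c and cav) / p_cavity            # 0.9
print(both, tooth * catch)   # both sides equal 0.54, confirming the independence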
Bayes' Rule • Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a) Bayes' rule, also called Bayes' theorem: P(a | b) = P(b | a) P(a) / P(b) • or in distribution form: P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y) • Useful for assessing a diagnostic probability from causal probabilities: • P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect) • [Q] A stiff neck is one symptom of meningitis; what is the probability of having meningitis given a stiff neck? • [Q] A car won't start. What is the probability of a bad starter? What is the probability of a bad battery?
Useful for assessing a diagnostic probability from causal probabilities: • P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect) • [Q] When there is a cavity, there is usually a toothache. P(cavity | toothache)? • [Q] E.g., let M be meningitis and S be a stiff neck; a stiff neck is one symptom of meningitis. What is the probability of having meningitis given a stiff neck? P(s) = 0.1; P(m) = 0.0001; P(s|m) = 0.8 P(m|s) = ??? • [Q] How to find P(s|m), P(m) and P(s)? • Note: the posterior probability of meningitis is still very small!
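A minimal sketch of Bayes' rule applied to the meningitis numbers above (the helper function is illustrative):

def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)."""
    return p_effect_given_cause * p_cause / p_effect

p_m_given_s = bayes(p_effect_given_cause=0.8, p_cause=0.0001, p_effect=0.1)
print(p_m_given_s)   # 0.0008 -- still very small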
Useful for assessing a diagnostic probability from causal probabilities: • P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect) • [Q] E.g., let B be 'bad battery' and N be 'no starting on a cold morning'; N is one symptom of B. What is the probability of a bad battery when a car won't start on a cold morning? P(n) = 0.1; P(b) = 0.05; P(n|b) = 0.8 P(b|n) = ???
5. Bayes' rule and conditional independence P(Cavity | toothache ∧ catch) = α P(toothache ∧ catch | Cavity) P(Cavity) = α P(toothache | Cavity) P(catch | Cavity) P(Cavity) • This is an example of a naïve Bayes model: P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause), where the Effects are conditionally independent of each other given the Cause. • We will use this again later!
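A sketch of the naïve Bayes computation for P(Cavity | toothache ∧ catch), using conditional probabilities derived from the joint table earlier (P(toothache | ¬cavity) = 0.1 and P(catch | ¬cavity) = 0.2 also follow from that table):

priors      = {True: 0.2, False: 0.8}   # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch     = {True: 0.9, False: 0.2}   # P(catch | Cavity)

# Unnormalized: P(Cavity | toothache, catch) is proportional to
# P(toothache | Cavity) * P(catch | Cavity) * P(Cavity)
unnorm = {cav: p_toothache[cav] * p_catch[cav] * priors[cav] for cav in (True, False)}
alpha = 1 / sum(unnorm.values())
posterior = {cav: alpha * p for cav, p in unnorm.items()}
print(posterior)   # {True: ~0.871, False: ~0.129}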
6. Summary • Probability is a rigorous formalism for uncertain knowledge. • A joint probability distribution for discrete random variables specifies the probability of every atomic event. • Queries can be answered by summing over atomic events. • For nontrivial domains, we must find a way to reduce the size of the joint distribution. • Independence and conditional independence provide the tools.