Intro to AI Uncertainty

Intro to AI Uncertainty. Ruth Bergman, Fall 2002

Why Not Use Logic?
• Suppose I want to write down rules about medical diagnosis:
  • Diagnostic rules: ∀x has(x, sorethroat) ⇒ has(x, cold)
  • Causal rules: ∀x has(x, cold) ⇒ has(x, sorethroat)
• Clearly, this isn't right.
• Diagnostic case:
  • we may not know exactly which collections of symptoms or tests allow us to infer a diagnosis (qualification problem)
  • even if we did, we may not have that information
  • even if we do, how do we know it is correct?
• Causal rules:
  • Symptoms are not guaranteed to appear; note that the logical case would use the contrapositive
  • There are lots of causes for symptoms; if we miss one we might get an incorrect inference
  • How do we reason backwards?

Uncertainty
• The problem with pure FOL is that it deals only in black and white
• The world isn't black and white because of uncertainty:
  • Uncertainty due to imprecision or noise
  • Uncertainty because we don't know everything about the domain
  • Uncertainty because in practice we often cannot acquire all the information we'd like
• As a result, we'd like to assign a degree of belief (or plausibility or possibility) to any statement we make
  • note this is different from a degree of truth!

Ways of Handling Uncertainty
• MYCIN: operationalize uncertainty with the rules:
  • a ⇒ b with certainty 0.7
  • we know a with certainty 1
  • ergo, we know b with certainty 0.7
• but if we also know
  • a ⇒ c with certainty 0.6
  • b v c ⇒ d with certainty 1
  • do we know d with certainty .7, .6, .88, 1, ....?
  • suppose a ⇒ ~e and ~e ⇒ ~d ....
• In a rule-based system, such non-local dependencies are hard to catch
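
One possible reading of where the candidate certainties for d come from is sketched below. This is an interpretation, not something the slide states: b and c hold with certainty 0.7 and 0.6, and different ways of combining them for "b v c" give different answers. All function names are hypothetical.

```python
# Certainties obtained by chaining the rules from a (known with certainty 1).
cf_b = 0.7  # from a => b with certainty 0.7
cf_c = 0.6  # from a => c with certainty 0.6

def combine_max(x, y):
    """Take the stronger of the two supports."""
    return max(x, y)

def combine_min(x, y):
    """Take the weaker of the two supports."""
    return min(x, y)

def combine_independent(x, y):
    """Treat the two supports as independent: x + y - x*y."""
    return x + y - x * y

def combine_capped_sum(x, y):
    """Add the supports and cap the result at 1."""
    return min(1.0, x + y)

for rule in (combine_max, combine_min, combine_independent, combine_capped_sum):
    print(rule.__name__, round(rule(cf_b, cf_c), 2))
```

None of these choices is obviously right, because b and c both depend on a; that hidden dependency is the difficulty the slide points at.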

Probability
• Problems such as this have led people to invent lots of calculi for uncertainty; probability still dominates
• Basic idea:
  • I have some degree of belief (DoB), a prior probability, about some proposition p
  • I receive evidence about p; the evidence is related to p by a conditional probability
  • From these two quantities, I can compute an updated DoB about p: a posterior probability
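
The prior-to-posterior update can be sketched with Bayes' rule. The numbers below are made up for illustration and do not come from the slides; `posterior` is a hypothetical helper name.

```python
def posterior(prior, p_e_given_p, p_e_given_not_p):
    """Return P(p | e) from P(p), P(e | p) and P(e | ~p) using Bayes' rule."""
    # Total probability of the evidence: P(e) = P(e|p) P(p) + P(e|~p) P(~p)
    p_e = p_e_given_p * prior + p_e_given_not_p * (1.0 - prior)
    return p_e_given_p * prior / p_e

# Assumed, illustrative numbers: prior P(cold) = 0.1,
# P(sorethroat | cold) = 0.8, P(sorethroat | ~cold) = 0.05.
updated = posterior(0.1, 0.8, 0.05)
print(round(updated, 2))  # 0.64
```

With these assumed numbers, observing a sore throat raises the degree of belief in a cold from 0.1 to 0.64.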

Probability Review
• Basic probability is on propositions or propositional statements:
  • P(A) (A is a proposition)
    • P(Accident), P(phonecall), P(Cold)
  • P(X = v) (X is a random variable; v a value)
    • P(card = JackofClubs), P(weather = sunny), ....
  • P(A v B), P(A ^ B), P(~A) ...
  • Referred to as the prior or unconditional probability
• The conditional probability of A given B: P(A | B) = P(A,B) / P(B)
  • the product rule: P(A,B) = P(A | B) * P(B)
• Independence: P(A | B) = P(A)
  • A is independent of B

Probability Review
• The joint distribution of A and B
  • P(A,B) = x (equivalent to P(A ^ B) = x)
[Table: joint distribution P(A, B)]
• Reading values off the joint table:
  • P(A=1, B=1) = .1
  • P(A=1) = .1 + .2 = .3
  • P(A=1 | B=1) = .1/.4 = .25
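
A small sketch of marginalizing and conditioning on a joint table. The entries for A=1 and the marginal P(B=1) = .4 are taken from the figures quoted above; the A=0 row is inferred so that the table sums to 1, which is an assumption since the original table is not shown. The function names are hypothetical.

```python
# Joint distribution P(A, B), keyed by (a, b).
# (1, 1) and (1, 0) come from the slide; the A=0 row is inferred from
# P(B=1) = 0.4 and the requirement that all entries sum to 1.
joint = {
    (1, 1): 0.1, (1, 0): 0.2,
    (0, 1): 0.3, (0, 0): 0.4,
}

def marginal_a(a):
    """P(A=a): sum the joint over all values of B."""
    return sum(p for (ai, _), p in joint.items() if ai == a)

def marginal_b(b):
    """P(B=b): sum the joint over all values of A."""
    return sum(p for (_, bi), p in joint.items() if bi == b)

def conditional_a_given_b(a, b):
    """P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return joint[(a, b)] / marginal_b(b)

print(round(marginal_a(1), 2))                # 0.3
print(round(conditional_a_given_b(1, 1), 2))  # 0.25
```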

Alarm System Example
• A burglary alarm system is fairly reliable at detecting burglary
• It may also respond to minor earthquakes
• Neighbors John and Mary will call when they hear the alarm
  • John always calls when he hears the alarm
  • He sometimes confuses the telephone with the alarm and calls
  • Mary sometimes misses the alarm
• Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

Alarm System Example
• A burglary alarm system is fairly reliable at detecting burglary: P(Alarm | Burglary)
• It may also respond to minor earthquakes: P(Alarm | Earthquake)
• Neighbors John and Mary will call when they hear the alarm: P(JohnCalls | Alarm), P(MaryCalls | Alarm)
  • John always calls when he hears the alarm
  • He sometimes confuses the telephone with the alarm and calls: P(JohnCalls | ~Alarm)
  • Mary sometimes misses the alarm
• Given the evidence of who has or has not called, we would like to estimate the probability of a burglary: P(Burglary | JohnCalls, MaryCalls)

Influence Diagrams
• Another way to present this information is an influence diagram
[Figure: influence diagram with nodes earthquake, burglary, alarm, John calls, Mary calls]

Influence Diagrams
• A set of random variables
• A set of directed arcs
  • An arc from X to Y means that X has influence on Y
• Each node has an associated conditional probability table
• The graph has no directed cycles

Conditional Probability Tables
• Each row contains the conditional probability for a possible combination of values of the parent nodes
• Each row must sum to 1

Belief Network for the Alarm
• Priors: P(E) = 0.002 (earthquake), P(B) = 0.001 (burglary)
• Alarm:
  B  E  P(A | B, E)
  T  T  0.95
  T  F  0.94
  F  T  0.29
  F  F  0.001
• John calls:
  A  P(J | A)
  T  0.90
  F  0.05
• Mary calls:
  A  P(M | A)
  T  0.70
  F  0.01

The Semantics of Belief Networks
• The probability that the alarm sounded but neither a burglary nor an earthquake has occurred and both John and Mary call:
  P(J ^ M ^ A ^ ~B ^ ~E)
    = P(J | A) P(M | A) P(A | ~B ^ ~E) P(~B) P(~E)
    = 0.9 * 0.7 * 0.001 * 0.999 * 0.998
    ≈ 0.000628
• More generally, we can write this as
  P(x1, ..., xn) = Π_i P(xi | Parents(Xi))
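
The product formula can be checked with a few lines of Python. This is a minimal sketch that hard-codes the alarm network's conditional probability tables as given in these slides; the helper names `pr` and `joint` are hypothetical.

```python
# CPTs of the alarm network; each entry is the probability that the
# variable is True given its parents' values.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (burglary, earthquake)
P_J = {True: 0.90, False: 0.05}  # keyed by alarm
P_M = {True: 0.70, False: 0.01}  # keyed by alarm

def pr(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of each node's CPT entry."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

# Alarm sounds, no burglary, no earthquake, both neighbors call.
print(round(joint(b=False, e=False, a=True, j=True, m=True), 6))  # 0.000628
```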

Constructing Belief Networks
• Choose the set of variables Xi that describe the domain
• Choose an ordering for the variables
  • Ideally, work backward from observables to root causes
• While there are variables left:
  • Pick a variable Xi and add it to the network
  • Set Parents(Xi) to the minimal set of nodes already in the network such that conditional independence holds
  • Define the conditional probability table for Xi
• Once you're done, it's likely you'll realize you need to fiddle a little bit!

Node Ordering
• The correct order to add nodes is
  • Add the “root causes” first
  • Then the variables they influence
  • And so on…
• Alarm example: consider the orderings
  • MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
  • MaryCalls, JohnCalls, Earthquake, Burglary, Alarm
[Figure: network over the nodes John calls, Mary calls, earthquake, burglary, alarm]

Probabilistic Inference
• Diagnostic inference (from effects to causes)
  • Given that JohnCalls, infer that P(B | J) = 0.016
• Causal inference (from causes to effects)
  • Given Burglary, P(J | B) = 0.86 and P(M | B) = 0.67
• Intercausal inference (between causes of a common effect)
  • Given Alarm, P(B | A) = 0.376
  • If Earthquake is also true, P(B | A ^ E) = 0.003
• Mixed inference (combining two or more of the above)
  • P(A | J ^ ~E) = 0.03
  • P(B | J ^ ~E) = 0.017
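
Queries like these can be answered by brute-force enumeration of the joint distribution. The sketch below assumes the alarm network's conditional probability tables as given in these slides; it is not the polytree algorithm, and the names `joint` and `query` are hypothetical. Exact enumeration gives roughly 0.85 and 0.66 for P(J | B) and P(M | B), slightly below the rounded figures quoted above.

```python
from itertools import product

# CPTs of the alarm network (probability that each variable is True).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}  # keyed by A
P_M = {True: 0.70, False: 0.01}  # keyed by A
VARS = ("B", "E", "A", "J", "M")

def pr(p_true, value):
    """Probability of a Boolean value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(w):
    """Joint probability of one complete assignment w (dict of Booleans)."""
    return (pr(P_B, w["B"]) * pr(P_E, w["E"])
            * pr(P_A[(w["B"], w["E"])], w["A"])
            * pr(P_J[w["A"]], w["J"]) * pr(P_M[w["A"]], w["M"]))

def query(target, evidence):
    """P(target=True | evidence), summing the joint over consistent worlds."""
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        w = dict(zip(VARS, values))
        if any(w[k] != v for k, v in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(w)
        den += p
        if w[target]:
            num += p
    return num / den

print(round(query("B", {"J": True}), 3))             # diagnostic
print(round(query("J", {"B": True}), 3))             # causal
print(round(query("B", {"A": True, "E": True}), 3))  # intercausal
```

Enumeration is exponential in the number of variables, so it only serves as a check on small networks such as this one.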

Conditional Independence: D-separation
• If every undirected path from a set of nodes X to a set of nodes Y is d-separated by E, then X and Y are conditionally independent given E
• A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
[Figure: node sets X, E, Y with three intermediate nodes Z]

Conditional Independence
• An undirected path from X to Y is blocked given E if there is a node Z on the path s.t. one of the following holds:
  • Z is in E and the path has one arrow leading into Z and one arrow leading out
  • Z is in E and both path arrows lead out of Z
  • Neither Z nor any descendant of Z is in E and both path arrows lead into Z
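
The three blocking conditions can be sketched as a check on a single path. This assumes the graph is stored as a map from each node to its set of parents and that the path passed in is a valid undirected path in that graph; the alarm network is used as the example, and the function names are hypothetical.

```python
# Alarm network as node -> set of parents.
PARENTS = {
    "Burglary": set(), "Earthquake": set(),
    "Alarm": {"Burglary", "Earthquake"},
    "JohnCalls": {"Alarm"}, "MaryCalls": {"Alarm"},
}

def descendants(node, parents):
    """All nodes reachable from node by following arcs forward."""
    children = {c for c, ps in parents.items() if node in ps}
    result = set(children)
    for child in children:
        result |= descendants(child, parents)
    return result

def path_blocked(path, evidence, parents):
    """True if some interior node Z of the path blocks it given evidence."""
    for prev, z, nxt in zip(path, path[1:], path[2:]):
        # Both path arrows lead into Z when both neighbors are parents of Z.
        both_arrows_in = prev in parents[z] and nxt in parents[z]
        if both_arrows_in:
            # Blocks only if neither Z nor any descendant of Z is observed.
            if z not in evidence and not (descendants(z, parents) & evidence):
                return True
        elif z in evidence:
            # One arrow in and one out, or both arrows out: blocks when observed.
            return True
    return False

# Burglary and Earthquake are independent until the alarm (or a call) is observed.
print(path_blocked(["Burglary", "Alarm", "Earthquake"], set(), PARENTS))      # True
print(path_blocked(["Burglary", "Alarm", "Earthquake"], {"Alarm"}, PARENTS))  # False
```

Checking full d-separation between two node sets would repeat this test over every undirected path between them.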

An Inference Algorithm for Belief Networks
• In order to develop an algorithm, we will assume our networks are singly connected
  • A network is singly connected if there is at most a single undirected path between any two nodes in the network
  • note this means that any two nodes can be d-separated by removing a single node
  • These are also known as polytrees
• We will then consider a generic node X with parents U1 ... Um and children Y1 ... Yn
  • the other parents of Yi are Zij
  • Evidence above X is Ex+; below is Ex-

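Singly Connected Network
[Figure: generic node X with parents U1 … Um (evidence Ex+ above), children Y1 … Yn (evidence Ex- below), and the other parents Zij of each child Yi]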

Inference in Belief Networks
• P(X | Ex) = P(X | Ex+, Ex-)
            = k P(Ex- | X, Ex+) P(X | Ex+)
            = k P(Ex- | X) P(X | Ex+)
  • the last step follows by noting that X d-separates its parents and children
• Now, we note that we can apply the product rule to the second term:
  P(X | Ex+) = Σ_u P(X | u, Ex+) P(u | Ex+)
             = Σ_u P(X | u) Π_i P(ui | E_{Ui\X})
  • again, these last facts follow from conditional independence
• Note that we now have a recursive algorithm: the first term in the sum is just a table lookup; the second is what we started with on a smaller set of nodes.

The Algorithm
Evidence-Except(X, V) returns P(E-_{X\V} | X)
  Y ← Children[X] − V
  if Y is empty then
    return a uniform distribution
  else
    for each Yi in Y do
      calculate P(E-_{Yi} | yi) = Evidence-Except(Yi, null)
      Zi ← Parents[Yi] − X
      for each Zij in Zi do
        calculate P(Zij | E_{Zij\Yi}) = Support-Except(Zij, Yi)
    return k2 Π_i Σ_yi P(E-_{Yi} | yi) Σ_zi P(yi | X, zi) Π_j P(zij | E_{Zij\Yi})

The Call • For a node X, call Support-Except(X,null)

PathFinder
• Diagnostic system for lymph node disease
• Pathfinder IV is a Bayesian model
  • 8 hrs devising vocabulary
  • 35 hrs defining topology
  • 40 hrs to make 14,000 probability assessments
• most recent version appears to outperform the experts who designed it!

Other Uncertainty Calculi
• Dempster-Shafer Theory
  • Ignorance: there are sets which have no probability
  • In such cases, the best you can do is bound the probability
  • D-S theory is one way of doing this
• Fuzzy Logic
  • Suppose we introduce a fuzzy membership function (a degree of membership)
  • Logical semantics are based on set membership
  • Thus, we get a logic with degrees of truth
  • e.g. John is a big man ⇒ bigman(John) w. truth value 0.
