Probabilistic Reasoning

Probabilistic Reasoning ECE457 Applied Artificial Intelligence Spring 2008 Lecture #9

Outline • Bayesian networks • D-separation and independence • Inference • Russell & Norvig, sections 14.1 to 14.4 ECE457 Applied Artificial Intelligence R. Khoury (2008) Page 2

Recall the Story from FOL • Anyone passing their 457 exam and winning the lottery is happy. Anyone who studies or is lucky can pass all their exams. Bob did not study but is lucky. Anyone who’s lucky can win the lottery. • Is Bob happy?

Add Probabilities • Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance, while someone only winning the lottery has a 60% chance of being happy, and someone who does neither has a 20% chance of being happy. Anyone who studies has a 90% chance of passing their exams. Anyone who’s lucky has a 50% chance of passing their exams. Anyone who’s both lucky and studied has a 99% chance of passing, but someone who didn’t study and is unlucky has a 1% chance of passing. There’s a 20% chance that Bob studied, but a 75% chance that he’s lucky. Anyone who’s lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning. • What’s the probability of Bob being happy?

Probabilities in the Story • Example of probabilities in the story • P(Lucky) = 0.75 • P(Study) = 0.2 • P(PassExam|Study) = 0.9 • P(PassExam|Lucky) = 0.5 • P(Win|Lucky) = 0.4 • P(Happy|PassExam,Win) = 0.99 • Some variables directly affect others! • Graphical representation of dependencies and conditional independencies between variables?

Bayesian Network • Also called a belief network • Directed acyclic graph • Nodes represent random variables • Edges represent direct conditional dependencies between variables • Can represent any full joint probability distribution, usually far more concisely than a full table • Story network: Study → PassExam ← Lucky → Win, with PassExam → Happy ← Win

Bayesian Network • Nodes with no parents have prior probabilities • Nodes with parents have conditional probability tables (CPTs) • One entry for every combination of truth values of their parents
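Because every variable in the story is Boolean, a node with k parents needs 2^k numbers in its CPT (one probability of being true per parent combination), while a full joint table over n Boolean variables needs 2^n − 1. The following is a minimal Python sketch that compares the two counts. It assumes the story structure Study → PassExam ← Lucky → Win and PassExam → Happy ← Win (read off the chain rule example), and the `PARENTS` dictionary and helper names are hypothetical.

```python
# Parent lists for the story network (structure read off the chain rule example).
PARENTS = {
    "Study": [],
    "Lucky": [],
    "PassExam": ["Study", "Lucky"],
    "Win": ["Lucky"],
    "Happy": ["PassExam", "Win"],
}


def cpt_entries(parents):
    """Numbers needed for all CPTs when every variable is Boolean."""
    # One P(node = true | ...) per truth-value combination of the node's parents.
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_entries(parents):
    """Independent numbers in a full joint table over the same Boolean variables."""
    # 2^n rows, minus one because the rows must sum to 1.
    return 2 ** len(parents) - 1


print(cpt_entries(PARENTS), full_joint_entries(PARENTS))  # 12 31
```

For this five-node network that is 12 numbers instead of 31. The gap grows exponentially as nodes are added, as long as each node keeps only a few parents.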

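Bayesian Network • Story network (Study, Lucky, PassExam, Win, Happy) • Priors: P(L) = 0.75, P(S) = 0.2 • PassExam (E) given Study and Lucky: P(E|S,L) = 0.99, P(E|S,¬L) = 0.9, P(E|¬S,L) = 0.5, P(E|¬S,¬L) = 0.01 • Win (W) given Lucky: P(W|L) = 0.4, P(W|¬L) = 0.01 • Happy (H) given PassExam and Win: P(H|E,W) = 0.99, P(H|E,¬W) = 0.8, P(H|¬E,W) = 0.6, P(H|¬E,¬W) = 0.2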

Bayesian Network • [Figure: a larger example network with 26 nodes, a to z]

Chain Rule • A node is conditionally independent of its predecessors given its parents • If we know the value of a node’s parents, we don’t care about more distant ancestors • Their influence is included through the parents • More generally, a node is conditionally independent of its non-descendants given its parents • Updated chain rule • $P(A_1, A_2, \ldots, A_n) = \prod_{i=1}^{n} P(A_i \mid \mathrm{parents}(A_i))$ • $\mathrm{parents}(A_i) \subseteq \{A_{i+1}, \ldots, A_n\}$

Chain Rule Example • Probability that Bob is happy, won the lottery, passed his exam, is lucky, and did not study • $P(H,W,E,L,\neg S) = P(H \mid W,E)\, P(W \mid L)\, P(E \mid L,\neg S)\, P(L)\, P(\neg S)$ • $P(H,W,E,L,\neg S) = 0.99 \times 0.4 \times 0.5 \times 0.75 \times 0.8$ • $P(H,W,E,L,\neg S) = 0.1188 \approx 0.12$
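The factored chain rule is easy to check numerically. The following minimal Python sketch assumes CPTs read directly from the story, and the names `P_E`, `P_W`, `P_H`, `prob` and `joint` are hypothetical. It multiplies each node's probability given its parents, reproduces the example above, and confirms that all 32 products sum to 1, as a joint distribution must.

```python
from itertools import product

# CPTs from the story; each value is P(variable = True | parent values).
P_S = 0.2                                          # P(Study)
P_L = 0.75                                         # P(Lucky)
P_E = {(True, True): 0.99, (True, False): 0.9,     # P(PassExam | Study, Lucky)
       (False, True): 0.5, (False, False): 0.01}
P_W = {True: 0.4, False: 0.01}                     # P(Win | Lucky)
P_H = {(True, True): 0.99, (True, False): 0.8,     # P(Happy | PassExam, Win)
       (False, True): 0.6, (False, False): 0.2}


def prob(p_true, value):
    """Probability of a Boolean outcome, given P(outcome = True)."""
    return p_true if value else 1.0 - p_true


def joint(happy, win, passed, lucky, study):
    """P(H, W, E, L, S) as the product of each node given its parents."""
    return (prob(P_H[(passed, win)], happy) * prob(P_W[lucky], win)
            * prob(P_E[(study, lucky)], passed)
            * prob(P_L, lucky) * prob(P_S, study))


# Bob is happy, won, passed, is lucky, and did not study.
print(round(joint(True, True, True, True, False), 4))  # 0.1188

# Sanity check: the 32 joint probabilities form a distribution.
total = sum(joint(*values) for values in product((True, False), repeat=5))
print(round(total, 10))  # 1.0
```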

Constructing Bayesian Nets • Build top-down • Start with root nodes (causes) • Add their children (direct effects) • Continue down to the leaves

Constructing Bayesian Nets • What happens if we build in the wrong order (for example, adding effects such as Happy before their causes)? • The network needs extra edges and larger CPTs to capture the same dependencies, so it becomes needlessly complicated • Node ordering is important!

Connections • We can understand dependence in a network by considering how evidence is transmitted through it • Information entered at one node • Propagates to descendants and ancestors through connected nodes • Unless a node on the path blocks it: evidence at a serial or diverging node stops the propagation, while a converging node only passes it on if that node or one of its descendants has evidence

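Serial Connection (Study → PassExam → Happy) • Study and Happy are dependent • Along this path, Study and Happy are independent given PassExam • Intuitively, the only way Study can affect Happy along this path is through PassExam • In the full network, observing PassExam also opens the converging connection Study → PassExam ← Lucky, and Lucky reaches Happy through Win, so PassExam alone does not make Study and Happy independent; observing Win as well does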

Diverging Connection (Win ← Lucky → PassExam) • Win and PassExam are dependent • Win and PassExam are independent given Lucky • Intuitively, Lucky can explain both Win and PassExam, so Win and PassExam can affect each other by changing the belief in Lucky

Converging Connection (Study → PassExam ← Lucky) • Lucky and Study are independent • Lucky and Study are dependent given PassExam (or given its descendant Happy) • Intuitively, once we know Bob passed, learning that he is lucky explains away the pass and lowers our belief that he studied
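The story's numbers make the explaining-away effect concrete. The following minimal Python sketch computes P(Study | PassExam) and P(Study | PassExam, Lucky), assuming the CPTs from the story. The names `P_E`, `weight` and `p_study_given_pass` are hypothetical. Win and Happy are left out because they are unobserved descendants and sum out to 1.

```python
# CPTs from the story that involve Study, Lucky and PassExam.
P_S, P_L = 0.2, 0.75
P_E = {(True, True): 0.99, (True, False): 0.9,     # P(PassExam | Study, Lucky)
       (False, True): 0.5, (False, False): 0.01}


def weight(study, lucky):
    """P(Study=study, Lucky=lucky, PassExam=True)."""
    p_study = P_S if study else 1 - P_S
    p_lucky = P_L if lucky else 1 - P_L
    return p_study * p_lucky * P_E[(study, lucky)]


def p_study_given_pass(lucky=None):
    """P(Study | PassExam=True), optionally also conditioning on Lucky=lucky."""
    lucky_values = (True, False) if lucky is None else (lucky,)
    num = sum(weight(True, lk) for lk in lucky_values)
    den = sum(weight(st, lk) for st in (True, False) for lk in lucky_values)
    return num / den


print(round(p_study_given_pass(), 3))            # 0.391
print(round(p_study_given_pass(lucky=True), 3))  # 0.331
```

Observing the pass raises the belief in Study from its prior of 0.2 to about 0.39. Learning that Bob is also lucky then pulls it back to about 0.33, because Lucky already accounts for much of the pass.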

D-Separation • Determine if two variables are independent given some other variables • X is independent of Y given a set of evidence variables E if X and Y are d-separated given E • X is d-separated from Y given E if every (undirected) path between X and Y is blocked, i.e. contains a node Z for which: • The connection at Z is serial or diverging and Z is in E • The connection at Z is converging and neither Z nor any of its descendants is in E

D-Separation • [Figure: the three connection types between X and Y through Z] • Serial (X → Z → Y): Z blocks the path if it is in evidence • Diverging (X ← Z → Y): Z blocks the path if it is in evidence • Converging (X → Z ← Y): Z blocks the path if neither Z nor its descendant Z2 is in evidence

D-Separation • Can be computed in time linear in the size of the network, using a depth-first search • Fast algorithm to know if two nodes are independent • Allows us to infer whether learning the value of a variable might give us information about another variable given what we already know • All d-separated variables are independent, but not all independent variables are d-separated
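The slides do not say which algorithm they have in mind. One standard linear-time formulation is a depth-first search over (node, direction) pairs, in the spirit of the Bayes-ball algorithm. The following Python sketch assumes that formulation, and the function name `d_separated` and the `STORY` parent map are hypothetical.

```python
STORY = {
    "Study": [],
    "Lucky": [],
    "PassExam": ["Study", "Lucky"],
    "Win": ["Lucky"],
    "Happy": ["PassExam", "Win"],
}


def d_separated(parents, x, y, evidence):
    """True if x and y are d-separated given the set `evidence`."""
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Evidence nodes and their ancestors: a converging node in this set has
    # itself or a descendant observed, so it lets the path through.
    anc, stack = set(), list(evidence)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    # DFS over (node, direction): "up" = reached from a child,
    # "down" = reached from a parent.
    visited, stack = set(), [(x, "up")]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y and node not in evidence:
            return False  # found an active path from x to y
        if direction == "up" and node not in evidence:
            # Serial (going up) or diverging at node: pass to parents and children.
            stack.extend((p, "up") for p in parents[node])
            stack.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in evidence:
                # Serial connection going down: continue to children.
                stack.extend((c, "down") for c in children[node])
            if node in anc:
                # Converging connection with evidence at node or a descendant.
                stack.extend((p, "up") for p in parents[node])
    return True


print(d_separated(STORY, "Lucky", "Study", set()))                # True
print(d_separated(STORY, "Lucky", "Study", {"PassExam"}))         # False
print(d_separated(STORY, "Win", "PassExam", {"Lucky"}))           # True
print(d_separated(STORY, "Study", "Happy", {"PassExam"}))         # False
print(d_separated(STORY, "Study", "Happy", {"PassExam", "Win"}))  # True
```

The last two checks show that, in the full story network, observing PassExam alone does not separate Study from Happy. It opens the converging path Study → PassExam ← Lucky → Win → Happy, and observing Win as well blocks that path again.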

D-Separation Exercise • [Figure: example network over nodes a to j] • If we observe a value for node e • Nodes that are not d-separated from e need to be updated • The graph becomes split into two independent, d-separated areas

D-Separation Exercise • If we observe a value for node g, what other nodes are updated? • Nodes f, h and i • If we observe a value for node a, what other nodes are updated? • Nodes b, c, d, e, f

D-Separation Exercise • Given an observation of c, are nodes a and f independent? • Yes • Given an observation of i, are nodes g and j independent? • No

Other Independence Criteria • [Figure: example network around node m] • A node is conditionally independent of its non-descendants given its parents • Recall from the updated chain rule

Other Independence Criteria • A node is conditionally independent of all others in the network given its parents, children, and children’s parents • This set is called the node’s Markov blanket
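You can read the Markov blanket directly off the parent lists. The following minimal Python sketch assumes the story network and uses a hypothetical `markov_blanket` helper.

```python
STORY = {
    "Study": [],
    "Lucky": [],
    "PassExam": ["Study", "Lucky"],
    "Win": ["Lucky"],
    "Happy": ["PassExam", "Win"],
}


def markov_blanket(parents, node):
    """Parents, children, and children's other parents of `node`."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(node)               # a child's parents include the node itself
    return blanket


print(sorted(markov_blanket(STORY, "Lucky")))  # ['PassExam', 'Study', 'Win']
print(sorted(markov_blanket(STORY, "Win")))    # ['Happy', 'Lucky', 'PassExam']
```

For Lucky, the blanket consists of its children PassExam and Win, plus Study as a co-parent of PassExam.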

Inference in Bayesian Networks • Compute the posterior probability of a query variable given an observed event • $P(A_1, A_2, \ldots, A_n) = \prod_{i=1}^{n} P(A_i \mid \mathrm{parents}(A_i))$ • Observed evidence variables $\mathbf{E} = E_1, \ldots, E_m$ • Query variable $X$ • Remaining nonevidence (hidden) variables $\mathbf{Y} = Y_1, \ldots, Y_l$ • The network’s variables are $\{X\} \cup \mathbf{E} \cup \mathbf{Y}$
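Putting these pieces together, inference by enumeration (Russell & Norvig, section 14.4) sums the factored joint over the hidden variables and normalizes over the values of $X$, with $\alpha = 1/P(\mathbf{e})$:

$$
P(X \mid \mathbf{e}) = \alpha\, P(X, \mathbf{e}) = \alpha \sum_{\mathbf{y}} P(X, \mathbf{e}, \mathbf{y}) = \alpha \sum_{\mathbf{y}} \prod_{i=1}^{n} P(A_i \mid \mathrm{parents}(A_i))
$$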

Inference Example • Story network (Study, Lucky, PassExam, Win, Happy) • P(L) = 0.75, P(S) = 0.2

Inference Example #1 • With only the information from the network (and no observations), what’s the probability that Bob won the lottery? • $P(W) = \sum_{l} P(W, l) = \sum_{l} P(W \mid l)\, P(l)$ • $P(W) = P(W \mid L)\, P(L) + P(W \mid \neg L)\, P(\neg L)$ • $P(W) = 0.4 \times 0.75 + 0.01 \times 0.25$ • $P(W) = 0.3025$

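Inference Example #2 • What is the probability that Bob won the lottery, given that he is happy? • $P(W \mid H) = \alpha\, P(W, H) = \alpha \sum_{e} \sum_{l} \sum_{s} P(H \mid W, e)\, P(W \mid l)\, P(e \mid l, s)\, P(l)\, P(s)$ • $P(W \mid H) = \alpha \times 0.2516493$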

Inference Example #2 • P(W|H) = α 0.328878 ECE457 Applied Artificial Intelligence R. Khoury (2008) Page 33

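Inference Example #2 • $\mathbf{P}(W \mid H) = \alpha \langle 0.2516493, 0.328878 \rangle = \langle 0.4335, 0.5665 \rangle$ • Note that $P(\neg W \mid H) > P(W \mid H)$ because winning the lottery is unlikely whether or not Bob is lucky: $P(W \mid L) = 0.4 < P(\neg W \mid L) = 0.6$ and $P(W \mid \neg L) = 0.01 < P(\neg W \mid \neg L) = 0.99$ • The probability of Bob having won the lottery has increased by 13.1 percentage points (from 0.3025 to 0.4335) thanks to our knowledge that he is happy!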
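A brute-force enumeration in Python reproduces both inference examples. The following minimal sketch assumes the CPTs from the story. The names `enumerate_query`, `joint` and the others are hypothetical. The approach sums over all 2^5 worlds, so it is only practical for small networks.

```python
from itertools import product

# CPTs from the story; each value is P(variable = True | parent values).
P_S, P_L = 0.2, 0.75
P_E = {(True, True): 0.99, (True, False): 0.9,     # P(PassExam | Study, Lucky)
       (False, True): 0.5, (False, False): 0.01}
P_W = {True: 0.4, False: 0.01}                     # P(Win | Lucky)
P_H = {(True, True): 0.99, (True, False): 0.8,     # P(Happy | PassExam, Win)
       (False, True): 0.6, (False, False): 0.2}
VARS = ("S", "L", "E", "W", "H")


def prob(p_true, value):
    """Probability of a Boolean outcome, given P(outcome = True)."""
    return p_true if value else 1.0 - p_true


def joint(study, lucky, passed, win, happy):
    """Chain-rule product of each node given its parents."""
    return (prob(P_S, study) * prob(P_L, lucky)
            * prob(P_E[(study, lucky)], passed)
            * prob(P_W[lucky], win) * prob(P_H[(passed, win)], happy))


def enumerate_query(query, evidence):
    """Return (normalized, unnormalized) distributions of `query` given `evidence`."""
    unnormalized = {}
    for q in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(VARS)):
            world = dict(zip(VARS, values))
            # Keep worlds consistent with the query value and the evidence;
            # summing over the rest marginalizes out the hidden variables.
            if world[query] == q and all(world[v] == val for v, val in evidence.items()):
                total += joint(*values)
        unnormalized[q] = total
    alpha = 1.0 / sum(unnormalized.values())
    return {q: alpha * p for q, p in unnormalized.items()}, unnormalized


prior, _ = enumerate_query("W", {})
print(round(prior[True], 4))                                  # 0.3025
posterior, raw = enumerate_query("W", {"H": True})
print(round(raw[True], 7), round(raw[False], 6))              # 0.2516493 0.328878
print(round(posterior[True], 4), round(posterior[False], 4))  # 0.4335 0.5665
```

The unnormalized values match the two sums computed for P(W|H) and P(¬W|H). Normalizing them gives the posterior ⟨0.4335, 0.5665⟩.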

Expert Systems • Bayesian networks are used to implement expert systems • Diagnostic systems that contain subject-specific knowledge • Knowledge (nodes, relationships, probabilities) typically provided by human experts • System observes evidence by asking the user questions, then infers the most likely conclusion

Pathfinder • Expert system for medical diagnosis of lymph-node diseases • Very large Bayesian network • Over 60 diseases • Over 100 features of lymph nodes • Over 30 features for clinical information • A lot of work from medical experts • 8 hours to define features and diseases • 35 hours to build network topology • 40 hours to assess probabilities

Pathfinder • One node for each disease • Assumes the diseases are mutually exclusive and exhaustive • Large domain, hard to handle • Several small networks for diagnostic tasks built individually • Then combined into a single large network

Pathfinder • Testing the network • 53 test cases (real diagnostics) • Diagnostic accuracy as good as a medical expert

Assumptions • Learning agent • Environment • Fully observable / Partially observable • Deterministic / Strategic / Stochastic • Sequential • Static / Semi-dynamic • Discrete / Continuous • Single agent / Multi-agent

Assumptions Updated • We can handle a new combination! • Fully observable & Deterministic • No uncertainty (map of Romania) • Fully observable & Stochastic • Games of chance (Monopoly, Backgammon) • Partially observable & Deterministic • Logic (Wumpus World) • Partially observable & Stochastic • Probabilistic reasoning with Bayesian networks (this lecture)
