Bayesian Networks VISA Hyoungjune Yi
BN – Intro • Introduced by Pearl (1986) • Resembles human reasoning • Causal relationships • Decision support systems / expert systems
Common Sense Reasoning about Uncertainty • June is waiting for Larry and Jacobs, who are both late for the VISA seminar • June is worried that if the roads are icy, one or both of them may have crashed his car • Suddenly June learns that Larry has crashed • June thinks: “If Larry has crashed then probably the roads are icy. So Jacobs has also crashed” • June then learns that it is warm outside and the roads are salted • June thinks: “Larry was unlucky; Jacobs should still make it”
Causal Relationships • Diagram: State of Road (icy / not icy) → Larry (crash / no crash) and State of Road → Jacobs (crash / no crash)
Larry Crashed! • Diagram: information flows from Larry (crash) up to State of Road (icy / not icy) and on to Jacobs (crash / no crash)
But the Roads are Dry • Diagram: State of Road is observed (not icy), so the information flow from Larry (crash) to Jacobs (crash / no crash) is blocked
Wet Grass • To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC • One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he has left his sprinkler on or it has rained • Glancing over at Jacobs’ lawn, he notices that it is also wet • Larry thinks: “Since Jacobs’ lawn is wet, it probably rained last night” • Larry then thinks: “If it rained, then that explains why my lawn is wet, so probably the sprinkler is off”
Larry’s Grass is Wet • Diagram: Sprinkler (on / off) → Larry’s grass (wet) ← Rain (yes / no), and Rain → Jacobs’ grass (wet / dry); information flows from Larry’s wet grass to its possible causes
Jacobs’ Grass is Also Wet • Diagram: information flows from Jacobs’ grass (wet) through Rain (yes / no) back to Larry’s grass (wet), explaining away the Sprinkler (on / off)
Bayesian Network • A data structure that represents the dependence between variables • Gives a concise specification of the joint probability distribution • A Bayesian belief network is a graph in which: • Nodes are a set of random variables • Each node has a conditional probability table (CPT) • Edges denote conditional dependencies • DAG: no directed cycles • Markov condition • (a minimal code sketch of this structure follows below)
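To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the slides) that stores a BN as parent lists plus CPTs and checks the no-directed-cycle condition; the icy-roads variable names are from the earlier story, the probabilities are invented.

```python
# Minimal sketch of a Bayesian network data structure (illustrative only).
# A network is: node -> list of parents, plus a CPT per node that maps a
# tuple of parent values to P(node = True).

def is_dag(parents):
    """Return True iff the graph given by node -> parent list has no directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}

    def visit(v):
        color[v] = GRAY
        for p in parents[v]:                  # walk each edge p -> v backwards
            if color[p] == GRAY:              # node already on the current path: cycle
                return False
            if color[p] == WHITE and not visit(p):
                return False
        color[v] = BLACK
        return True

    return all(color[v] == BLACK or visit(v) for v in parents)

# The icy-roads example: Icy -> LarryCrash, Icy -> JacobsCrash (numbers made up).
parents = {"Icy": [], "LarryCrash": ["Icy"], "JacobsCrash": ["Icy"]}
cpt = {
    "Icy":         {(): 0.3},                      # P(Icy = True)
    "LarryCrash":  {(True,): 0.8, (False,): 0.1},  # P(crash | Icy)
    "JacobsCrash": {(True,): 0.8, (False,): 0.1},
}
assert is_dag(parents)
```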
Bayesian Network – Markov Assumption • Diagram: Y1, Y2 → X • Each random variable X is independent of its non-descendants given its parents Pa(X) • Formally: Ind(X; NonDesc(X) | Pa(X)) if G is an I-Map of P (I-Map is defined later) • (a small helper that computes non-descendants is sketched below)
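The Markov statement can be read off mechanically once descendants are known. This is a hedged helper sketch of mine (not from the slides), reusing the node → parent-list encoding introduced above and the burglary structure from the next slide.

```python
# Sketch: compute descendants and non-descendants of a node from parent lists.

def children(parents):
    ch = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            ch[p].append(v)
    return ch

def descendants(parents, x):
    ch, seen, stack = children(parents), set(), [x]
    while stack:
        for c in ch[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def non_descendants(parents, x):
    return set(parents) - descendants(parents, x) - {x}

# Burglary-alarm structure used on the next slide.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
print(non_descendants(parents, "A"))   # {'B', 'E', 'R'} (in some order)
# Markov condition for A: Ind(A; NonDesc(A) | Pa(A)), i.e. Ind(A; R | B, E).
```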
Markov Assumption – Example • Diagram: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call • In this example: • Ind(E; B) • Ind(B; E, R) • Ind(R; A, B, C | E) • Ind(A; R | B, E) • Ind(C; B, E, R | A)
I-Maps • A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P • Examples: two-node graphs over X and Y (an independence check is sketched below)
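To illustrate what an I-Map asserts, here is a hedged sketch (my own, with invented numbers) that tests Ind(X; Y) in a small joint table: the fully disconnected two-node graph is an I-Map only of the first distribution, while a graph with an X–Y edge is an I-Map of both.

```python
import itertools

# Sketch: test Ind(X; Y) in a discrete joint distribution given as {(x, y): prob}.

def independent(joint, tol=1e-9):
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in itertools.product(xs, ys))

# X and Y independent: the empty graph satisfies its (only) Markov assumption.
p_indep = {(0, 0): 0.42, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.12}
# X and Y dependent: only graphs containing an X-Y edge are I-Maps.
p_dep   = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(independent(p_indep))   # True
print(independent(p_dep))     # False
```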
Minimal I-Map • G is a minimal I-Map of P iff: • G is an I-Map of P • if G′ ⊂ G (G′ obtained by removing edges from G) then G′ is not an I-Map of P • A minimal I-Map is not unique
Factorization • Diagram: X and Y with no edge • Given that G is an I-Map of P, can we simplify the representation of P? • Example: • Since Ind(X; Y), we have that P(X|Y) = P(X) • Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y) • Thus, we have a simpler representation of P(X,Y)
Factorization Theorem • Diagram: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call • Thm: if G is an I-Map of P, then the full chain-rule expansion P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E) simplifies to P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A) • (a worked parameter count follows)
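To make the savings concrete, here is a small worked count of my own, assuming all five variables are binary and one free parameter per parent configuration.

```python
# Parameters needed for each factorization with binary variables.
# Full chain rule P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E):
full = sum(2 ** k for k in [0, 1, 2, 3, 4])                             # 1+2+4+8+16 = 31
# BN factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A):
bn = sum(2 ** len(pa) for pa in [[], [], ["E"], ["B", "E"], ["A"]])     # 1+1+2+4+2 = 10
print(full, bn)   # 31 10
```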
So What? • We can write P in terms of “local” conditional probabilities • If G is sparse, that is, |Pa(Xi)| < k, each conditional probability can be specified compactly; e.g. for binary variables, each requires O(2^k) parameters • The representation of P is then compact: linear in the number of variables
Formal Definition of BN • A Bayesian network specifies a probability distribution via two components: • A DAG G • A collection of conditional probability distributions P(Xi|Pai) • The joint distribution P is defined by the factorization P(X1, …, Xn) = ∏i P(Xi | Pai) • Additional requirement: G is a minimal I-Map of P
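Here is a hedged sketch of that factorization: the probability of a full assignment is the product of the local conditional probabilities. The burglary-network CPT values below are invented for illustration; the encoding matches the earlier sketches.

```python
# Sketch of the BN factorization P(x1..xn) = product_i P(xi | pa_i).
# CPTs map a tuple of parent values to P(node = True).

def joint_probability(parents, cpt, assignment):
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(True,): 0.9, (False,): 0.01},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "C": {(True,): 0.7, (False,): 0.05},
}
print(joint_probability(parents, cpt,
      {"B": False, "E": False, "R": False, "A": True, "C": True}))
```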
Bayesian Network – Example
• Diagram: Pneumonia → Lung Infiltrates ← Tuberculosis; Lung Infiltrates → XRay, Sputum Smear
• CPT P(I | P, T):
  P   T   | P(i)  P(¬i)
  p   t   | 0.8   0.2
  p   ¬t  | 0.6   0.4
  ¬p  t   | 0.2   0.8
  ¬p  ¬t  | 0.01  0.99
• Each node Xi has a conditional probability distribution P(Xi|Pai)
• If variables are discrete, P is usually multinomial
• P can be linear Gaussian, a mixture of Gaussians, …
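The CPT above can be written as a simple lookup table. Note that the assignment of rows to parent combinations is my reading of the flattened slide table, so treat it as illustrative.

```python
# P(Lung Infiltrates = True | Pneumonia, Tuberculosis), keyed by (P, T).
p_infiltrates = {
    (True,  True):  0.8,
    (True,  False): 0.6,
    (False, True):  0.2,
    (False, False): 0.01,
}
print(p_infiltrates[(True, False)])        # P(i  | p, ~t) = 0.6
print(1 - p_infiltrates[(True, False)])    # P(~i | p, ~t) = 0.4
```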
BN Semantics • Diagram: the Pneumonia network (P, T, I, X, S) • Compact & natural representation: if nodes have ≤ k parents, O(2^k · n) parameters vs. 2^n for the full joint • conditional independencies in BN structure + local probability models = full joint distribution over domain
d-separation • d-sep(X; Y | Z, G) • X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z • Meaning? On the blackboard • Path • Active: dependency between the end nodes of the path • Blocked: no dependency • Three cases: common cause, intermediate, common effect (on the blackboard) • (a code sketch of an equivalent test follows)
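Since the path-blocking details are worked out on the blackboard, here is a hedged sketch of one standard, equivalent test (my own, not the lecture's construction): d-sep(X; Y | Z) holds iff X and Y are disconnected after taking the ancestral subgraph of X ∪ Y ∪ Z, moralizing it, and deleting Z.

```python
# Sketch: d-sep(X; Y | Z) via the moralized ancestral graph (assumes X, Y, Z disjoint).
# Graph encoding: node -> list of parents, as in the earlier sketches.

def d_separated(parents, X, Y, Z):
    # 1. Keep only X, Y, Z and their ancestors.
    relevant, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # 2. Moralize: marry co-parents, then drop edge directions.
    neighbors = {v: set() for v in relevant}
    for v in relevant:
        ps = parents[v]
        for p in ps:
            neighbors[v].add(p)
            neighbors[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                neighbors[ps[i]].add(ps[j])
                neighbors[ps[j]].add(ps[i])
    # 3. Delete Z; X and Y are d-separated iff no path connects them.
    seen, stack = set(), [v for v in X if v not in Z]
    while stack:
        v = stack.pop()
        if v in Y:
            return False                      # an active path exists
        if v in seen or v in Z:
            continue
        seen.add(v)
        stack.extend(neighbors[v])
    return True

# Burglary network: B and E are marginally independent, but not given the Alarm.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
print(d_separated(parents, {"B"}, {"E"}, set()))   # True  -> Ind(B; E)
print(d_separated(parents, {"B"}, {"E"}, {"A"}))   # False -> "explaining away"
```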
BN – Belief, Evidence and Query • A BN is (partly) for answering queries • A query involves evidence • Evidence is an assignment of values to a set of variables in the domain • A query asks for the a posteriori belief in a variable given the evidence • For an observed (evidence) variable the belief is certain: P(x) = 1 or P(x) = 0
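A hedged sketch of answering such a query by brute-force enumeration of the joint (fine only for tiny networks; the CPT numbers are the same invented ones as above): the a posteriori belief is the renormalized sum of joint probabilities consistent with the evidence.

```python
import itertools

def posterior(parents, cpt, query_var, evidence):
    """P(query_var | evidence) by summing the factored joint over hidden variables."""
    def joint(assignment):
        p = 1.0
        for node, pa in parents.items():
            p_true = cpt[node][tuple(assignment[q] for q in pa)]
            p *= p_true if assignment[node] else 1.0 - p_true
        return p

    hidden = [v for v in parents if v != query_var and v not in evidence]
    weight = {True: 0.0, False: 0.0}
    for q in (True, False):
        for values in itertools.product([True, False], repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), query_var: q}
            weight[q] += joint(assignment)
    total = weight[True] + weight[False]
    return {q: w / total for q, w in weight.items()}

parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
cpt = {"B": {(): 0.01}, "E": {(): 0.02}, "R": {(True,): 0.9, (False,): 0.01},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "C": {(True,): 0.7, (False,): 0.05}}
print(posterior(parents, cpt, "B", {"C": True}))   # belief in Burglary given a Call
```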
Learning Structure • Problem definition: • Given: data D • Return: a directed graph (BN structure) that expresses D • Issues: • Superfluous edges • Missing edges • Very difficult • See http://robotics.stanford.edu/people/nir/tutorial/
BN Learning • Diagram: Data → Inducer → BN (P, T, I, X, S) • BN models can be learned from empirical data: • parameter estimation via numerical optimization • structure learning via combinatorial search • The BN hypothesis space is biased towards distributions with independence structure
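In the simplest case (complete data and a fixed structure) the parameter-estimation half reduces to counting: the maximum-likelihood CPT entries are relative frequencies. This hedged sketch shows that case with an invented toy dataset over the icy-roads variables; the combinatorial structure-search part is not shown.

```python
from collections import Counter

def estimate_cpt(data, node, pa):
    """Estimate P(node = True | parent values) from a list of dict-records."""
    positives, totals = Counter(), Counter()
    for record in data:
        key = tuple(record[p] for p in pa)
        totals[key] += 1
        if record[node]:
            positives[key] += 1
    return {key: positives[key] / n for key, n in totals.items()}

# Toy dataset (values invented for illustration).
data = [
    {"Icy": True,  "LarryCrash": True,  "JacobsCrash": True},
    {"Icy": True,  "LarryCrash": True,  "JacobsCrash": False},
    {"Icy": False, "LarryCrash": False, "JacobsCrash": False},
    {"Icy": False, "LarryCrash": False, "JacobsCrash": False},
    {"Icy": False, "LarryCrash": True,  "JacobsCrash": False},
]
print(estimate_cpt(data, "LarryCrash", ["Icy"]))
# {(True,): 1.0, (False,): 0.333...}
```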