Bayesian Networks VISA Hyoungjune Yi
BN – Intro • Introduced by Pearl (1986) • Resembles human reasoning • Causal relationships • Decision support systems / expert systems
Common Sense Reasoning about Uncertainty • June is waiting for Larry and Jacobs, who are both late for the VISA seminar • June is worried that if the roads are icy, one or both of them may have crashed his car • Suddenly June learns that Larry has crashed • June thinks: “If Larry has crashed then probably the roads are icy. So Jacobs has also crashed” • June then learns that it is warm outside and the roads are salted • June thinks: “Larry was unlucky; Jacobs should still make it”
Causal Relationships • Diagram: State of Road (icy / not icy) → Larry (crash / no crash) and State of Road → Jacobs (crash / no crash)
Larry Crashed! • Diagram: information flows from Larry (crash) up to State of Road (icy / not icy) and on to Jacobs (crash / no crash)
But the Roads are Dry • Diagram: State of Road is observed (not icy), so the information flow from Larry (crash) to Jacobs (crash / no crash) is blocked
Wet Grass • To avoid icy roads, Larry moves to UCLA; Jacobs moves to USC • One morning as Larry leaves for work, he notices that his grass is wet. He wonders whether he has left his sprinkler on or it has rained • Glancing over at Jacobs’ lawn, he notices that it is also wet • Larry thinks: “Since Jacobs’ lawn is wet, it probably rained last night” • Larry then thinks: “If it rained, then that explains why my lawn is wet, so probably the sprinkler is off”
Larry’s Grass is Wet • Diagram: Sprinkler (on / off) → Larry’s grass (wet) ← Rain (yes / no), and Rain → Jacobs’ grass (wet / dry); information flows from Larry’s wet grass to its possible causes
Jacobs’ Grass is Also Wet • Diagram: information flows from Jacobs’ grass (wet) through Rain (yes / no) back to Larry’s grass (wet), explaining away the Sprinkler (on / off)
Bayesian Network • A data structure that represents the dependence between variables • Gives a concise specification of the joint probability distribution • A Bayesian belief network is a graph in which: • Nodes are a set of random variables • Each node has a conditional probability table (CPT) • Edges denote conditional dependencies • DAG: no directed cycles • Markov condition • (a minimal code sketch of this structure follows below)
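To make the definition concrete, here is a minimal Python sketch (my own illustration, not from the slides) that stores a BN as parent lists plus CPTs and checks the no-directed-cycle condition; the icy-roads variable names are from the earlier story, the probabilities are invented.

```python
# Minimal sketch of a Bayesian network data structure (illustrative only).
# A network is: node -> list of parents, plus a CPT per node that maps a
# tuple of parent values to P(node = True).

def is_dag(parents):
    """Return True iff the graph given by node -> parent list has no directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in parents}

    def visit(v):
        color[v] = GRAY
        for p in parents[v]:                  # walk each edge p -> v backwards
            if color[p] == GRAY:              # node already on the current path: cycle
                return False
            if color[p] == WHITE and not visit(p):
                return False
        color[v] = BLACK
        return True

    return all(color[v] == BLACK or visit(v) for v in parents)

# The icy-roads example: Icy -> LarryCrash, Icy -> JacobsCrash (numbers made up).
parents = {"Icy": [], "LarryCrash": ["Icy"], "JacobsCrash": ["Icy"]}
cpt = {
    "Icy":         {(): 0.3},                      # P(Icy = True)
    "LarryCrash":  {(True,): 0.8, (False,): 0.1},  # P(crash | Icy)
    "JacobsCrash": {(True,): 0.8, (False,): 0.1},
}
assert is_dag(parents)
```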
Bayesian Network – Markov Assumption • Diagram: Y1, Y2 → X • Each random variable X is independent of its non-descendants given its parents Pa(X) • Formally: Ind(X; NonDesc(X) | Pa(X)) if G is an I-Map of P (I-Map is defined later) • (a small helper that computes non-descendants is sketched below)
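The Markov statement can be read off mechanically once descendants are known. This is a hedged helper sketch of mine (not from the slides), reusing the node → parent-list encoding introduced above and the burglary structure from the next slide.

```python
# Sketch: compute descendants and non-descendants of a node from parent lists.

def children(parents):
    ch = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            ch[p].append(v)
    return ch

def descendants(parents, x):
    ch, seen, stack = children(parents), set(), [x]
    while stack:
        for c in ch[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def non_descendants(parents, x):
    return set(parents) - descendants(parents, x) - {x}

# Burglary-alarm structure used on the next slide.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
print(non_descendants(parents, "A"))   # {'B', 'E', 'R'} (in some order)
# Markov condition for A: Ind(A; NonDesc(A) | Pa(A)), i.e. Ind(A; R | B, E).
```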
Markov Assumption – Example • Diagram: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call • In this example: • Ind(E; B) • Ind(B; E, R) • Ind(R; A, B, C | E) • Ind(A; R | B, E) • Ind(C; B, E, R | A)
I-Maps • A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P • Examples: two-node graphs over X and Y (an independence check is sketched below)
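To illustrate what an I-Map asserts, here is a hedged sketch (my own, with invented numbers) that tests Ind(X; Y) in a small joint table: the fully disconnected two-node graph is an I-Map only of the first distribution, while a graph with an X–Y edge is an I-Map of both.

```python
import itertools

# Sketch: test Ind(X; Y) in a discrete joint distribution given as {(x, y): prob}.

def independent(joint, tol=1e-9):
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in itertools.product(xs, ys))

# X and Y independent: the empty graph satisfies its (only) Markov assumption.
p_indep = {(0, 0): 0.42, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.12}
# X and Y dependent: only graphs containing an X-Y edge are I-Maps.
p_dep   = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

print(independent(p_indep))   # True
print(independent(p_dep))     # False
```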
Minimal I-Map • G is a minimal I-Map of P iff: • G is an I-Map of P • if G′ ⊂ G (G′ obtained by removing edges from G) then G′ is not an I-Map of P • A minimal I-Map is not unique
Factorization • Diagram: X and Y with no edge • Given that G is an I-Map of P, can we simplify the representation of P? • Example: • Since Ind(X; Y), we have that P(X|Y) = P(X) • Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y) • Thus, we have a simpler representation of P(X,Y)
Factorization Theorem • Diagram: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call • Thm: if G is an I-Map of P, then the full chain-rule expansion P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E) simplifies to P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A) • (a worked parameter count follows)
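To make the savings concrete, here is a small worked count of my own, assuming all five variables are binary and one free parameter per parent configuration.

```python
# Parameters needed for each factorization with binary variables.
# Full chain rule P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E):
full = sum(2 ** k for k in [0, 1, 2, 3, 4])                             # 1+2+4+8+16 = 31
# BN factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A):
bn = sum(2 ** len(pa) for pa in [[], [], ["E"], ["B", "E"], ["A"]])     # 1+1+2+4+2 = 10
print(full, bn)   # 31 10
```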
So What? • We can write P in terms of “local” conditional probabilities • If G is sparse, that is, |Pa(Xi)| < k, each conditional probability can be specified compactly; e.g. for binary variables, each requires O(2^k) parameters • The representation of P is then compact: linear in the number of variables
Formal Definition of BN • A Bayesian network specifies a probability distribution via two components: • A DAG G • A collection of conditional probability distributions P(Xi|Pai) • The joint distribution P is defined by the factorization P(X1, …, Xn) = ∏i P(Xi | Pai) • Additional requirement: G is a minimal I-Map of P
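Here is a hedged sketch of that factorization: the probability of a full assignment is the product of the local conditional probabilities. The burglary-network CPT values below are invented for illustration; the encoding matches the earlier sketches.

```python
# Sketch of the BN factorization P(x1..xn) = product_i P(xi | pa_i).
# CPTs map a tuple of parent values to P(node = True).

def joint_probability(parents, cpt, assignment):
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(True,): 0.9, (False,): 0.01},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "C": {(True,): 0.7, (False,): 0.05},
}
print(joint_probability(parents, cpt,
      {"B": False, "E": False, "R": False, "A": True, "C": True}))
```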
Bayesian Network – Example
• Diagram: Pneumonia → Lung Infiltrates ← Tuberculosis; Lung Infiltrates → XRay, Sputum Smear
• CPT P(I | P, T):
  P   T   | P(i)  P(¬i)
  p   t   | 0.8   0.2
  p   ¬t  | 0.6   0.4
  ¬p  t   | 0.2   0.8
  ¬p  ¬t  | 0.01  0.99
• Each node Xi has a conditional probability distribution P(Xi|Pai)
• If variables are discrete, P is usually multinomial
• P can be linear Gaussian, a mixture of Gaussians, …
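The CPT above can be written as a simple lookup table. Note that the assignment of rows to parent combinations is my reading of the flattened slide table, so treat it as illustrative.

```python
# P(Lung Infiltrates = True | Pneumonia, Tuberculosis), keyed by (P, T).
p_infiltrates = {
    (True,  True):  0.8,
    (True,  False): 0.6,
    (False, True):  0.2,
    (False, False): 0.01,
}
print(p_infiltrates[(True, False)])        # P(i  | p, ~t) = 0.6
print(1 - p_infiltrates[(True, False)])    # P(~i | p, ~t) = 0.4
```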
BN Semantics • Diagram: the Pneumonia network (P, T, I, X, S) • Compact & natural representation: if nodes have ≤ k parents, O(2^k · n) parameters vs. 2^n for the full joint • conditional independencies in BN structure + local probability models = full joint distribution over domain
d-separation • d-sep(X; Y | Z, G) • X is d-separated from Y given Z if all paths from a node in X to a node in Y are blocked given Z • Meaning? On the blackboard • Path • Active: dependency between the end nodes of the path • Blocked: no dependency • Three cases: common cause, intermediate, common effect (on the blackboard) • (a code sketch of an equivalent test follows)
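Since the path-blocking details are worked out on the blackboard, here is a hedged sketch of one standard, equivalent test (my own, not the lecture's construction): d-sep(X; Y | Z) holds iff X and Y are disconnected after taking the ancestral subgraph of X ∪ Y ∪ Z, moralizing it, and deleting Z.

```python
# Sketch: d-sep(X; Y | Z) via the moralized ancestral graph (assumes X, Y, Z disjoint).
# Graph encoding: node -> list of parents, as in the earlier sketches.

def d_separated(parents, X, Y, Z):
    # 1. Keep only X, Y, Z and their ancestors.
    relevant, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # 2. Moralize: marry co-parents, then drop edge directions.
    neighbors = {v: set() for v in relevant}
    for v in relevant:
        ps = parents[v]
        for p in ps:
            neighbors[v].add(p)
            neighbors[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                neighbors[ps[i]].add(ps[j])
                neighbors[ps[j]].add(ps[i])
    # 3. Delete Z; X and Y are d-separated iff no path connects them.
    seen, stack = set(), [v for v in X if v not in Z]
    while stack:
        v = stack.pop()
        if v in Y:
            return False                      # an active path exists
        if v in seen or v in Z:
            continue
        seen.add(v)
        stack.extend(neighbors[v])
    return True

# Burglary network: B and E are marginally independent, but not given the Alarm.
parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
print(d_separated(parents, {"B"}, {"E"}, set()))   # True  -> Ind(B; E)
print(d_separated(parents, {"B"}, {"E"}, {"A"}))   # False -> "explaining away"
```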
BN – Belief, Evidence and Query • A BN is (partly) for answering queries • A query involves evidence • Evidence is an assignment of values to a set of variables in the domain • A query asks for the a posteriori belief in a variable given the evidence • For an observed (evidence) variable the belief is certain: P(x) = 1 or P(x) = 0
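A hedged sketch of answering such a query by brute-force enumeration of the joint (fine only for tiny networks; the CPT numbers are the same invented ones as above): the a posteriori belief is the renormalized sum of joint probabilities consistent with the evidence.

```python
import itertools

def posterior(parents, cpt, query_var, evidence):
    """P(query_var | evidence) by summing the factored joint over hidden variables."""
    def joint(assignment):
        p = 1.0
        for node, pa in parents.items():
            p_true = cpt[node][tuple(assignment[q] for q in pa)]
            p *= p_true if assignment[node] else 1.0 - p_true
        return p

    hidden = [v for v in parents if v != query_var and v not in evidence]
    weight = {True: 0.0, False: 0.0}
    for q in (True, False):
        for values in itertools.product([True, False], repeat=len(hidden)):
            assignment = {**evidence, **dict(zip(hidden, values)), query_var: q}
            weight[q] += joint(assignment)
    total = weight[True] + weight[False]
    return {q: w / total for q, w in weight.items()}

parents = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}
cpt = {"B": {(): 0.01}, "E": {(): 0.02}, "R": {(True,): 0.9, (False,): 0.01},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "C": {(True,): 0.7, (False,): 0.05}}
print(posterior(parents, cpt, "B", {"C": True}))   # belief in Burglary given a Call
```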
Learning Structure • Problem definition: • Given: data D • Return: a directed graph (BN structure) that expresses D • Issues: • Superfluous edges • Missing edges • Very difficult • See http://robotics.stanford.edu/people/nir/tutorial/
BN Learning • Diagram: Data → Inducer → BN (P, T, I, X, S) • BN models can be learned from empirical data: • parameter estimation via numerical optimization • structure learning via combinatorial search • The BN hypothesis space is biased towards distributions with independence structure
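In the simplest case (complete data and a fixed structure) the parameter-estimation half reduces to counting: the maximum-likelihood CPT entries are relative frequencies. This hedged sketch shows that case with an invented toy dataset over the icy-roads variables; the combinatorial structure-search part is not shown.

```python
from collections import Counter

def estimate_cpt(data, node, pa):
    """Estimate P(node = True | parent values) from a list of dict-records."""
    positives, totals = Counter(), Counter()
    for record in data:
        key = tuple(record[p] for p in pa)
        totals[key] += 1
        if record[node]:
            positives[key] += 1
    return {key: positives[key] / n for key, n in totals.items()}

# Toy dataset (values invented for illustration).
data = [
    {"Icy": True,  "LarryCrash": True,  "JacobsCrash": True},
    {"Icy": True,  "LarryCrash": True,  "JacobsCrash": False},
    {"Icy": False, "LarryCrash": False, "JacobsCrash": False},
    {"Icy": False, "LarryCrash": False, "JacobsCrash": False},
    {"Icy": False, "LarryCrash": True,  "JacobsCrash": False},
]
print(estimate_cpt(data, "LarryCrash", ["Icy"]))
# {(True,): 1.0, (False,): 0.333...}
```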