Bayesian Networks Tamara Berg CS 590-133 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan
Announcements • Some students in the back are having trouble hearing the lecture due to talking. • Please respect your fellow students. If you have a question or comment relevant to the course, please share it with all of us. Otherwise, don't talk during lecture. • Also, if you are having trouble hearing in the back, there are plenty of seats further forward.
Reminder • HW3 was released 2/27 • Written questions only (no programming) • Due Tuesday, 3/18, 11:59pm
Random Variables • A random variable X is some aspect of the world about which we (may) have uncertainty; let x be a realization (an observed value) of X. • Random variables can be: binary (e.g. {true, false}, {spam, ham}), take on a discrete set of values (e.g. {Spring, Summer, Fall, Winter}), or be continuous (e.g. [0, 1]).
Joint Probability Distribution • Random variables X1, …, Xn; let x1, …, xn be a realization of X1, …, Xn. • Joint probability distribution: P(X1 = x1, X2 = x2, …, Xn = xn), also written P(x1, x2, …, xn). • Gives a real value for every possible assignment of values to the variables.
Queries • Given a joint distribution, we can reason about unobserved variables given observations (evidence): P(Y | E = e), where Y is the stuff you care about and E = e is the stuff you already know.
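As a concrete illustration of answering a query directly from a joint distribution, here is a minimal Python sketch (not from the slides; the Cavity/Toothache numbers are made up): select the entries consistent with the evidence, then normalize.

```python
# A minimal sketch: answering P(Cavity | Toothache) from a small joint table.
# The probability values below are invented for illustration.

joint = {  # P(Cavity, Toothache)
    (True,  True):  0.12,
    (True,  False): 0.08,
    (False, True):  0.08,
    (False, False): 0.72,
}

def query(toothache_observed):
    """Return P(Cavity = true | Toothache = toothache_observed)."""
    # Keep entries consistent with the evidence, then normalize.
    p_true  = joint[(True,  toothache_observed)]
    p_false = joint[(False, toothache_observed)]
    return p_true / (p_true + p_false)

print(query(True))   # P(cavity | toothache)    = 0.6
print(query(False))  # P(cavity | no toothache) = 0.1
```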
Main kinds of models • Undirected (also called Markov Random Fields) - links express constraints between variables. • Directed (also called Bayesian Networks) - have a notion of causality -- one can regard an arc from A to B as indicating that A "causes" B.
Syntax • Directed Acyclic Graph (DAG) • Nodes: random variables • Can be assigned (observed) or unassigned (unobserved) • Arcs: interactions • An arrow from one variable to another indicates direct influence • Encode conditional independence • In the example network (Weather, Cavity, Toothache, Catch): Weather is independent of the other variables; Toothache and Catch are conditionally independent given Cavity • Must form a directed, acyclic graph
Bayes Nets • Directed graph G = (X, E) with nodes X = {X1, …, Xn} and edges E • Each node is associated with a random variable
Joint Distribution • By the chain rule (using the usual arithmetic ordering): P(x1, x2, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) … P(xn | x1, …, xn−1)
Directed Graphical Models • Directed graph G = (X, E) with nodes X = {X1, …, Xn} and edges E • Each node is associated with a random variable • Definition of the joint probability in a graphical model: P(x1, …, xn) = ∏i P(xi | parents(Xi)), where parents(Xi) are the parents of node Xi
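The factored definition is straightforward to compute. Below is a minimal Python sketch for an assumed two-node Rain → WetGrass network (both the structure and the numbers are invented for illustration):

```python
# Joint probability of a full assignment = product over nodes of P(x_i | parents(x_i)).

# Hypothetical two-node network: Rain -> WetGrass
parents = {"Rain": [], "WetGrass": ["Rain"]}

# CPTs: cpt[node][(value, parent_values)] = P(node = value | parents = parent_values)
cpt = {
    "Rain":     {(True, ()): 0.2, (False, ()): 0.8},
    "WetGrass": {(True,  (True,)): 0.9, (False, (True,)): 0.1,
                 (True,  (False,)): 0.1, (False, (False,)): 0.9},
}

def joint_probability(assignment):
    """P(x1, ..., xn) as a product of one CPT entry per node."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][(value, parent_values)]
    return p

print(joint_probability({"Rain": True, "WetGrass": True}))  # 0.2 * 0.9 = 0.18
```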
Example • Joint probability of a full assignment, computed as the product of the corresponding CPT entries. [Slides show a small example network over binary variables together with its CPT tables.]
Size of a Bayes’ Net • How big is a joint distribution over N Boolean variables? 2^N entries • How big is an N-node net if nodes have up to k parents? O(N · 2^(k+1)) entries • Both give you the power to calculate any query P(X1, …, Xn) • BNs: huge space savings! • Also easier to elicit local CPTs • Also turns out to be faster to answer queries
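A quick back-of-the-envelope comparison (illustrative numbers, not from the slides):

```python
# Table sizes for N Boolean variables with at most k parents each.
N, k = 30, 3
full_joint = 2 ** N            # entries in the full joint distribution
bayes_net  = N * 2 ** (k + 1)  # upper bound on total CPT entries in the Bayes net
print(full_joint)  # 1_073_741_824
print(bayes_net)   # 480
```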
The joint probability distribution • For example, in the burglary/alarm network (Burglary and Earthquake cause Alarm; Alarm causes JohnCalls and MaryCalls): P(j, m, a, ¬b, ¬e) = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
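A minimal sketch of the product above; the CPT values are the standard textbook numbers for the alarm network (assumed here, since the slides' tables are not reproduced in this text):

```python
# P(j, m, a, ¬b, ¬e) = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
p_b, p_e = 0.001, 0.002
p_a_given = {(False, False): 0.001, (False, True): 0.29,   # P(a | b, e)
             (True,  False): 0.94,  (True,  True): 0.95}
p_j_given_a, p_m_given_a = 0.90, 0.70

p = (1 - p_b) * (1 - p_e) * p_a_given[(False, False)] * p_j_given_a * p_m_given_a
print(p)  # ≈ 0.000628
```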
Independence in a BN • Important question about a BN: are two nodes independent given certain evidence? • If yes, can prove using algebra (tedious in general) • If no, can prove with a counterexample • Example (chain X → Y → Z): are X and Z necessarily independent? • Answer: no. Example: low pressure (X) causes rain (Y), which causes traffic (Z) • X can influence Z, Z can influence X (via Y) • Addendum: they could still be independent: how?
Causal Chains • This configuration, X → Y → Z, is a “causal chain” (e.g. X: project due, Y: no office hours, Z: students panic) • Is Z independent of X given Y? Yes! • Evidence along the chain “blocks” the influence
Common Cause • Another basic configuration, X ← Y → Z: two effects of the same cause (e.g. Y: homework due, X: full attendance, Z: students sleepy) • Are X and Z independent? Not in general • Are X and Z independent given Y? Yes! • Observing the cause blocks influence between the effects.
Common Effect • Last configuration: two causes of one effect (a v-structure), X → Y ← Z (e.g. X: raining, Z: ballgame, Y: traffic) • Are X and Z independent? • Yes: the ballgame and the rain cause traffic, but they are not correlated • (Still need to prove they must be independent: try it!) • Are X and Z independent given Y? • No: seeing traffic puts the rain and the ballgame in competition as explanations • This is backwards from the other cases • Observing an effect activates influence between possible causes.
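The “explaining away” effect can be seen with a tiny numeric example. The sketch below uses invented CPT values for Rain → Traffic ← Ballgame (they are not in the slides); observing the ballgame lowers the posterior probability of rain:

```python
# Explaining away in the v-structure Rain -> Traffic <- Ballgame (made-up numbers).
p_rain, p_ball = 0.1, 0.1
p_traffic = {(True, True): 0.95, (True, False): 0.80,   # P(traffic | rain, ballgame)
             (False, True): 0.70, (False, False): 0.05}

def posterior_rain(ballgame=None):
    """P(rain | traffic) if ballgame is None, else P(rain | traffic, ballgame)."""
    games = [True, False] if ballgame is None else [ballgame]
    num = den = 0.0
    for r in (True, False):
        for b in games:
            p = ((p_rain if r else 1 - p_rain)
                 * (p_ball if b else 1 - p_ball)
                 * p_traffic[(r, b)])
            den += p
            if r:
                num += p
    return num / den

print(posterior_rain())      # P(rain | traffic)           ≈ 0.44
print(posterior_rain(True))  # P(rain | traffic, ballgame) ≈ 0.13  (explained away)
```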
The General Case • Any complex example can be analyzed using the three canonical cases: causal chain, common cause, and (unobserved) common effect • General question: in a given BN, are two variables independent (given evidence)? • Solution: analyze the graph
Bayes Ball • Shade all observed nodes. Place balls at the starting node, let them bounce around according to some rules, and ask whether any ball reaches the goal node. • We need to know what happens when a ball arrives at a node on its way to the goal node.
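Here is a sketch of the underlying d-separation check that Bayes ball implements (a path-enumeration version, not the slides' ball-bouncing rules verbatim). The graph is given as a parent map; evidence is a set of observed node names:

```python
def descendants(node, children):
    """All strict descendants of `node` in the DAG given by `children`."""
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def d_separated(x, z, evidence, parents):
    """True iff x and z are d-separated given the observed set `evidence`."""
    children = {}
    for c, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(c)

    def paths(a, goal, visited):
        # Enumerate all simple undirected paths from a to goal.
        if a == goal:
            yield [a]
            return
        nbrs = set(parents.get(a, [])) | set(children.get(a, []))
        for n in nbrs - visited:
            for rest in paths(n, goal, visited | {n}):
                yield [a] + rest

    def blocked(path):
        # A path is blocked if some intermediate node blocks it.
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev in parents.get(node, []) and
                        nxt in parents.get(node, []))
            if collider:
                # A collider blocks unless it (or a descendant) is observed.
                if node not in evidence and not (descendants(node, children) & evidence):
                    return True
            elif node in evidence:
                # A chain or fork blocks when its middle node is observed.
                return True
        return False

    return all(blocked(p) for p in paths(x, z, {x}))

# Causal chain LowPressure -> Rain -> Traffic:
parents = {"Rain": ["LowPressure"], "Traffic": ["Rain"]}
print(d_separated("LowPressure", "Traffic", set(), parents))     # False: dependent
print(d_separated("LowPressure", "Traffic", {"Rain"}, parents))  # True: blocked by Rain
```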
Example • [Bayes-ball example on a small network with nodes R, B, T, T′; the answer to the independence query is “Yes”]
Bayesian decision making • Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E • Inference problem: given some evidence E = e, what is P(X | e)? • Learning problem: estimate the parameters of the probabilistic model P(X | E) given training samples {(x1, e1), …, (xn, en)}
Probabilistic inference • A general scenario: • Query variables: X • Evidence (observed) variables: E = e • Unobserved variables: Y • If we know the full joint distribution P(X, E, Y), how can we perform inference about X?
Inference • Inference: calculating some useful quantity from a joint probability distribution • Examples: • Posterior probability: P(Q | E1 = e1, …, Ek = ek) • Most likely explanation: argmax_q P(Q = q | e1, …, ek) • (Running example: the burglary network over B, E, A, J, M)
Inference – computing conditional probabilities • Conditional probabilities: P(X | e) = P(X, e) / P(e) • Marginalization: P(X, e) = Σ_y P(X, e, y), summing the joint over the unobserved variables Y
Inference by Enumeration • Given unlimited time, inference in BNs is easy • Recipe: • State the marginal probabilities you need • Figure out ALL the atomic probabilities you need • Calculate and combine them • Example: the burglary network (B, E, A, J, M)
Example: Enumeration • In this simple method, we only need the BN to synthesize the joint entries
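A minimal enumeration sketch on the alarm network follows; it reuses the standard textbook CPT values assumed earlier (the slides' tables are not reproduced here):

```python
# Inference by enumeration: sum the Bayes-net joint over all hidden variables.
from itertools import product

parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}

def prob(node, value, assignment):
    """P(node = value | parents(node)), read off the CPT."""
    p = cpt[node][tuple(assignment[par] for par in parents[node])]
    return p if value else 1 - p

def joint(assignment):
    """Product of one CPT entry per node (the Bayes-net joint)."""
    result = 1.0
    for node in parents:
        result *= prob(node, assignment[node], assignment)
    return result

def enumerate_query(query_var, evidence):
    """P(query_var = true | evidence) by summing over the hidden variables."""
    hidden = [v for v in parents if v != query_var and v not in evidence]
    dist = {}
    for q in (True, False):
        total = 0.0
        for values in product([True, False], repeat=len(hidden)):
            assignment = {query_var: q, **evidence, **dict(zip(hidden, values))}
            total += joint(assignment)
        dist[q] = total
    return dist[True] / (dist[True] + dist[False])

print(enumerate_query("B", {"J": True, "M": True}))   # ≈ 0.284
```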
Probabilistic inference • A general scenario: • Query variables: X • Evidence (observed) variables: E = e • Unobserved variables: Y • If we know the full joint distribution P(X, E, Y), how can we perform inference about X? • Problems • Full joint distributions are too large • Marginalizing out Y may involve too many summation terms
Variable Elimination • Why is inference by enumeration on a Bayes Net inefficient? • You join up the whole joint distribution before you sum out the hidden variables • You end up repeating a lot of work! • Idea: interleave joining and marginalizing! • Called “Variable Elimination” • Choosing the order to eliminate variables that minimizes work is NP-hard, but *anything* sensible is much faster than inference by enumeration
General Variable Elimination • Query: P(Q | E1 = e1, …, Ek = ek) • Start with initial factors: • Local CPTs (but instantiated by evidence) • While there are still hidden variables (not Q or evidence): • Pick a hidden variable H • Join all factors mentioning H • Eliminate (sum out) H • Join all remaining factors and normalize
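A sketch of this recipe in Python follows. It is a simplified illustration under assumptions (binary variables, a caller-supplied elimination order), not the slides' implementation; a factor is a pair (list of variable names, table mapping value tuples to numbers):

```python
from itertools import product

def restrict(factor, evidence):
    """Instantiate a factor with the observed values in `evidence`."""
    vars_, table = factor
    keep = [v for v in vars_ if v not in evidence]
    new = {}
    for values, p in table.items():
        row = dict(zip(vars_, values))
        if all(row[v] == val for v, val in evidence.items() if v in vars_):
            key = tuple(row[v] for v in keep)
            new[key] = new.get(key, 0.0) + p
    return keep, new

def multiply(f1, f2):
    """Pointwise product of two factors (the 'join' step)."""
    v1, t1 = f1
    v2, t2 = f2
    vars_ = v1 + [v for v in v2 if v not in v1]
    table = {}
    for values in product([True, False], repeat=len(vars_)):
        row = dict(zip(vars_, values))
        k1 = tuple(row[v] for v in v1)
        k2 = tuple(row[v] for v in v2)
        if k1 in t1 and k2 in t2:
            table[values] = t1[k1] * t2[k2]
    return vars_, table

def sum_out(factor, var):
    """Marginalize `var` out of a factor (the 'eliminate' step)."""
    vars_, table = factor
    keep = [v for v in vars_ if v != var]
    new = {}
    for values, p in table.items():
        row = dict(zip(vars_, values))
        key = tuple(row[v] for v in keep)
        new[key] = new.get(key, 0.0) + p
    return keep, new

def variable_elimination(factors, query_var, evidence, elimination_order):
    """Interleave joining and summing out over the hidden variables, then normalize."""
    factors = [restrict(f, evidence) for f in factors]
    for h in elimination_order:                       # hidden variables only
        related = [f for f in factors if h in f[0]]
        if not related:
            continue
        factors = [f for f in factors if h not in f[0]]
        joined = related[0]
        for f in related[1:]:
            joined = multiply(joined, f)
        factors.append(sum_out(joined, h))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    vars_, table = result
    z = sum(table.values())
    return {values: p / z for values, p in table.items()}

# Hypothetical usage: factors would be built from the network's CPTs, e.g.
#   factors = [(["B"], {(True,): 0.001, (False,): 0.999}), ...]
#   variable_elimination(factors, "B", {"J": True, "M": True}, ["E", "A"])
```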
Example: Variable elimination • Network over attend, study, fair, prepared, pass, with priors P(attend) = 0.8, P(study) = 0.6, P(fair) = 0.9 • Query: What is the probability that a student attends class, given that they pass the exam? [based on slides taken from UMBC CMSC 671, 2005]
Elimination steps (working toward P(attend | pass)): • Join the factors that mention study → a factor over (prepared, study) • Marginalize out study → a factor over prepared • Join the factors that mention fair → a factor over (pass, fair) • Marginalize out fair → a factor over pass • Join the factors that mention prepared → a factor over (pass, prepared), then marginalize out prepared → a factor over pass • Join the remaining factors → a factor over (attend, pass), then normalize to obtain P(attend | pass)
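To make the example concrete, here is a brute-force follow-through. The network structure assumed below (attend, study → prepared; prepared, fair → pass) is inferred from the elimination steps, and the non-prior CPT values are invented, since the slides give only the priors. Any elimination order yields the same answer as this direct sum over the joint:

```python
from itertools import product

p_attend, p_study, p_fair = 0.8, 0.6, 0.9
p_prepared = {(True, True): 0.9, (True, False): 0.6,    # P(prepared | attend, study)
              (False, True): 0.5, (False, False): 0.1}  #   (invented values)
p_pass = {(True, True): 0.9, (True, False): 0.4,        # P(pass | prepared, fair)
          (False, True): 0.3, (False, False): 0.1}      #   (invented values)

def joint(at, st, fa, pr, pa):
    """Joint probability of one full assignment, as a product of CPT entries."""
    p = ((p_attend if at else 1 - p_attend)
         * (p_study if st else 1 - p_study)
         * (p_fair if fa else 1 - p_fair))
    p *= p_prepared[(at, st)] if pr else 1 - p_prepared[(at, st)]
    p *= p_pass[(pr, fa)] if pa else 1 - p_pass[(pr, fa)]
    return p

num = den = 0.0
for at, st, fa, pr in product([True, False], repeat=4):
    p = joint(at, st, fa, pr, True)      # condition on pass = true
    den += p
    if at:
        num += p
print(num / den)   # P(attend | pass = true)
```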
Bayesian network inference: Big picture • Exact inference is intractable • There exist techniques to speed up computations, but worst-case complexity is still exponential except in some classes of networks • Approximate inference • Sampling, variational methods, message passing / belief propagation…
Approximate Inference • Sampling (particle-based methods)