BAYESIAN NETWORKS, CHAPTER 4. Book: Modeling and Reasoning with Bayesian Networks. Author: Adnan Darwiche. Publisher: Cambridge University Press, 2009.
Introduction
• A joint probability distribution (JPD) can be used to model uncertain beliefs and to update them in the face of hard and soft evidence.
• The problem with a JPD is that its size grows exponentially with the number of variables (2^n entries for n binary variables), which introduces both modeling and computational difficulties.
Need for BN
• A Bayesian network (BN) is a graphical modeling tool for compactly specifying a JPD.
• A BN relies on the basic insight that independence forms a significant aspect of beliefs, and that it can be elicited relatively easily using the language of graphs.
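Example
• A BN is a directed acyclic graph (DAG).
• Nodes are propositional variables: Earthquake (E), Burglary (B), Radio (R), Alarm (A), Call (C).
• Edges are direct causal influences.
[Figure: DAG over E, B, R, A, C. The parent sets used in the independence statements for this graph imply the edges E → R, E → A, B → A, A → C.]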
Example
• We would expect our belief in C to be influenced by evidence on R.
• For example, if we get a radio report that an earthquake took place, our belief that the alarm triggered would increase, which in turn would increase our belief in receiving a call from a neighbor.
• However, we would not change our belief in C if we knew for sure that the alarm did not trigger.
• Thus C is independent of R given ¬A.
Formal Representation of Independence
Given a variable V in a DAG G:
• Parents(V) are the parents of V in G [direct causes of V].
• Descendants(V) are the variables N with a directed path from V to N [effects of V].
• Non_Descendants(V) are all variables other than V, its parents, and its descendants.
Independence Statement / Markovian Assumption
I(V, Parents(V), Non_Descendants(V)) ..... (4.1)
• That is, every variable is conditionally independent of its non-descendants given its parents. These statements are known as the Markovian assumptions of G, denoted Markov(G).
• (4.1) can also be read as: given the direct causes of a variable, our beliefs in that variable are no longer influenced by any other variable, except possibly by its effects.
Examples of Independence Statements
Markov(G) for the example DAG over Earthquake (E), Burglary (B), Radio (R), Alarm (A), Call (C):
• I(C, A, {B, E, R})
• I(R, E, {A, B, C})
• I(A, {B, E}, R)
• I(B, ∅, {E, R})
• I(E, ∅, B)
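The statements above can be reproduced mechanically from the parent sets. The following is a minimal sketch, not code from the book: the `PARENTS` dictionary and the helper names are assumptions made for illustration, and the edges are read off the parent sets that appear in the statements themselves.

```python
# Parent sets of the example DAG (assumed from the independence statements,
# since the original figure is not available as data).
PARENTS = {
    "E": set(),
    "B": set(),
    "R": {"E"},
    "A": {"B", "E"},
    "C": {"A"},
}

def descendants(parents, v):
    """All nodes reachable from v by a directed path."""
    children = {n: {c for c, ps in parents.items() if n in ps} for n in parents}
    seen, stack = set(), list(children[v])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen

def non_descendants(parents, v):
    """All nodes other than v, its parents, and its descendants."""
    return set(parents) - {v} - parents[v] - descendants(parents, v)

def markov(parents):
    """Map each variable V to the pair (Parents(V), Non_Descendants(V))."""
    return {v: (parents[v], non_descendants(parents, v)) for v in parents}

if __name__ == "__main__":
    for v, (ps, nd) in markov(PARENTS).items():
        print(f"I({v}, {sorted(ps)}, {sorted(nd)})")
```

Each printed triple corresponds to one statement I(V, Parents(V), Non_Descendants(V)) of Markov(G).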
Parameterizing the Independence Structure
• Parameterizing means quantifying the dependencies between nodes and their parents, in other words constructing a conditional probability table (CPT) for each node.
• For every variable X in the DAG G with parents U, we need to provide the probability Pr(x|u) for every value x of X and every instantiation u of U.
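To see how much smaller the CPTs are than the full joint table, one can count entries. This is a rough sketch under stated assumptions: all five variables of the example are treated as binary, the parent sets are those implied by the independence statements, and the names `CARD`, `PARENTS`, `cpt_entries`, and `cpt_free_parameters` are hypothetical.

```python
from math import prod

# Assumed setup: five binary variables and the parent sets of the example DAG.
CARD = {"E": 2, "B": 2, "R": 2, "A": 2, "C": 2}
PARENTS = {"E": [], "B": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

def cpt_entries(var):
    """Number of entries Pr(x|u) in the CPT of var (one per x and per u)."""
    return CARD[var] * prod(CARD[p] for p in PARENTS[var])

def cpt_free_parameters(var):
    """Entries that remain free once each distribution Pr(.|u) must sum to 1."""
    return (CARD[var] - 1) * prod(CARD[p] for p in PARENTS[var])

network_entries = sum(cpt_entries(v) for v in CARD)
network_free = sum(cpt_free_parameters(v) for v in CARD)
joint_entries = prod(CARD.values())

if __name__ == "__main__":
    print("CPT entries:", network_entries)
    print("free CPT parameters:", network_free)
    print("joint table entries:", joint_entries)
```

The gap widens quickly as variables are added, because the joint table grows with the total number of variables while each CPT grows only with the number of parents of its variable.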
Formal Definition of Bayesian Network
• A Bayesian network for variables Z is a pair $(G, \Theta)$ where:
• G is a directed acyclic graph over variables Z, called the network structure.
• $\Theta$ is a set of CPTs, one for each variable in Z, called the network parameterization.
• $\Theta_{X|U}$ denotes the CPT for variable X and its parents U, and the set XU is called a network family.
Def (continued)
• $\theta_{x|u}$ denotes the value assigned by the CPT $\Theta_{X|U}$ (the CPT for variable X and its parents U) to the conditional probability Pr(x|u); it is called a network parameter.
• An instantiation of all the network variables is called a network instantiation.
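Chain Rule for Bayesian Networks
• The probability of a network instantiation z is simply the product of all network parameters compatible with z:
$\Pr(z) = \prod_{\theta_{x|u} \sim z} \theta_{x|u}$
where $\theta_{x|u}$ is the CPT value for Pr(x|u), and $\theta_{x|u} \sim z$ means that z agrees with both x and u.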
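The chain rule can be sketched in a few lines for the example network. Everything numeric here is made up for illustration: the CPT values in `THETA` are hypothetical, the parent sets are those implied by the independence statements, and `parameter` and `pr` are illustrative helper names.

```python
from itertools import product

# Parent sets of the example DAG (order matters for the CPT keys below).
PARENTS = {"E": (), "B": (), "R": ("E",), "A": ("B", "E"), "C": ("A",)}

# THETA[var][u] = Pr(var = True | parents = u). All numbers are hypothetical.
THETA = {
    "E": {(): 0.1},
    "B": {(): 0.2},
    "R": {(True,): 0.9, (False,): 0.0},
    "A": {(True, True): 0.95, (True, False): 0.9,
          (False, True): 0.3, (False, False): 0.01},
    "C": {(True,): 0.7, (False,): 0.05},
}

def parameter(var, z):
    """Network parameter theta_{x|u} compatible with instantiation z."""
    u = tuple(z[p] for p in PARENTS[var])
    p_true = THETA[var][u]
    return p_true if z[var] else 1.0 - p_true

def pr(z):
    """Chain rule: product of all network parameters compatible with z."""
    result = 1.0
    for var in PARENTS:
        result *= parameter(var, z)
    return result

names = list(PARENTS)
# Summing the chain-rule product over all instantiations should give 1.
total = sum(pr(dict(zip(names, values)))
            for values in product([True, False], repeat=len(names)))

if __name__ == "__main__":
    z = {"E": False, "B": True, "R": False, "A": True, "C": True}
    print("Pr(z) =", pr(z))
    print("sum over all instantiations =", total)
```

For this network the product has five factors, one per family: the parameters for E, B, R given E, A given B and E, and C given A.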
Properties of Probabilistic Independence
• Recall: I(X, Z, Y) holds if and only if Pr(x|z,y) = Pr(x|z) or Pr(y,z) = 0 for all instantiations x, y, z.
• Graphoid axioms:
• Symmetry
• Decomposition
• Weak Union
• Contraction
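The definition recalled above can be checked numerically on a small joint distribution. The sketch below is an illustration under assumptions: it uses a hypothetical three-variable chain E → A → C with made-up numbers, and `marginal`, `is_independent`, and `bern` are illustrative names, not functions from any library.

```python
from itertools import product

def marginal(joint, names, assignment):
    """Sum the joint over all instantiations that agree with `assignment`."""
    idx = {n: i for i, n in enumerate(names)}
    return sum(p for inst, p in joint.items()
               if all(inst[idx[n]] == v for n, v in assignment.items()))

def is_independent(joint, names, X, Z, Y, tol=1e-9):
    """Check I(X, Z, Y): Pr(x|z,y) = Pr(x|z) or Pr(y,z) = 0 for all x, y, z."""
    nx, nz = len(X), len(Z)
    for vals in product([True, False], repeat=nx + nz + len(Y)):
        x = dict(zip(X, vals[:nx]))
        z = dict(zip(Z, vals[nx:nx + nz]))
        y = dict(zip(Y, vals[nx + nz:]))
        p_yz = marginal(joint, names, {**y, **z})
        if p_yz < tol:
            continue  # Pr(y,z) = 0: the condition holds trivially
        lhs = marginal(joint, names, {**x, **y, **z}) / p_yz
        rhs = marginal(joint, names, {**x, **z}) / marginal(joint, names, z)
        if abs(lhs - rhs) > tol:
            return False
    return True

def bern(p, value):
    """Probability of `value` for a binary variable with Pr(True) = p."""
    return p if value else 1.0 - p

# Hypothetical joint for a chain E -> A -> C (numbers are made up).
NAMES = ("E", "A", "C")
JOINT = {
    (e, a, c): bern(0.1, e) * bern(0.8 if e else 0.1, a) * bern(0.7 if a else 0.05, c)
    for e, a, c in product([True, False], repeat=3)
}

if __name__ == "__main__":
    print(is_independent(JOINT, NAMES, ["C"], ["A"], ["E"]))
    print(is_independent(JOINT, NAMES, ["C"], [], ["E"]))
```

In this chain, C depends on E only through A, so the check succeeds when A is given and fails when nothing is given. The brute-force enumeration is exponential in the number of variables and is only meant for tiny examples.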
Symmetry
I_Pr(X, Z, Y) if and only if I_Pr(Y, Z, X)
• If learning y does not influence our belief in x, then learning x does not influence our belief in y.
• By Markov(G) for the example DAG we know that I(A, {B, E}, R).
• Using Symmetry: I(R, {B, E}, A).
Decomposition
I_Pr(X, Z, Y∪W) only if I_Pr(X, Z, Y) and I_Pr(X, Z, W)
• If learning yw does not influence our belief in x, then learning y alone or learning w alone does not influence our belief in x.
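As a small worked instance (an illustration, not taken from the slides), Decomposition applied to the Markovian statement for C in the example DAG yields the independence of C and R given A that was argued informally in the earlier example:

```latex
I(C, A, \{B, E, R\}) \;\Rightarrow\; I(C, A, R)
\quad \text{(Decomposition with } Y = \{R\},\ W = \{B, E\}\text{)}
```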
Weak Union
I_Pr(X, Z, Y∪W) only if I_Pr(X, Z∪Y, W)
• If the information yw is not relevant to our belief in x, then the partial information y will not make the rest of the information, w, relevant.
Contraction
I_Pr(X, Z, Y) and I_Pr(X, Z∪Y, W) only if I_Pr(X, Z, Y∪W)
• If, after learning the irrelevant information y, the information w is found to be irrelevant to our belief in x, then the combined information yw must have been irrelevant from the beginning.