This deck covers the fundamentals of Bayesian networks: directed acyclic graphs with probability tables, inference methods, and clique tree propagation, including applications, the difference between probability and fuzzy measures, and the complexity of inference.
Bayesian Networks. Speaker: 虞台文, Tatung University, Institute of Computer Science and Engineering, Intelligent Multimedia Lab
Contents • Introduction • Probability Theory (skipped) • Inference • Clique Tree Propagation • Building the Clique Tree • Inference by Propagation
Bayesian Networks: Introduction
What Are Bayesian Networks? • Bayesian networks are directed acyclic graphs (DAGs) with an associated set of probability tables. • The nodes are random variables. • Certain independence relations are induced by the topology of the graph.
Why Use a Bayesian Network? • Deal with uncertainty in inference via probability (Bayes' rule). • Handle incomplete data sets, e.g., in classification and regression. • Model domain knowledge, e.g., causal relationships.
Example: use a DAG to model causality. [Figure: a DAG with nodes Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late, Boss Failure-in-Love, Project Delay, Office Dirty, Norman Untidy, and Boss Angry.]
Example: attach prior probabilities to all root nodes.
Example: attach conditional probability tables (CPTs) to all non-root nodes. Each column of a CPT sums to 1.
Question: what is the difference between probability and fuzzy measurements?
Medical Knowledge Example
Definition of Bayesian Networks. A Bayesian network is a directed acyclic graph with the following properties: • Each node represents a random variable. • Each node representing a variable A with parent nodes representing variables B1, B2, …, Bn is assigned a conditional probability table (CPT) specifying P(A | B1, B2, …, Bn).
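As a concrete sketch of this definition, a small network such as the train-strike example can be stored as a dictionary mapping each variable to its parent list and CPT. The structure and numbers below are illustrative, not taken from the slides:

```python
# Illustrative sketch: store each node's parents and a CPT mapping a tuple of
# parent values to P(node = True). All numbers are hypothetical.
bn = {
    "TrainStrike": {"parents": [], "cpt": {(): 0.1}},
    "MartinLate":  {"parents": ["TrainStrike"], "cpt": {(True,): 0.6, (False,): 0.5}},
    "NormanLate":  {"parents": ["TrainStrike"], "cpt": {(True,): 0.8, (False,): 0.1}},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from `assignment`)."""
    parent_values = tuple(assignment[p] for p in bn[node]["parents"])
    p_true = bn[node]["cpt"][parent_values]
    return p_true if value else 1.0 - p_true
```

Since only P(node = True) is stored per parent configuration, each CPT column sums to 1 by construction.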
Problems • How do we perform inference? • How do we learn the probabilities from data? • How do we learn the structure from data? • What applications are there? Bad news: all of these problems are NP-hard.
Bayesian Networks: Inference
Example (Train Strike → Martin Late, Train Strike → Norman Late). Questions: • P(Martin Late, Norman Late, Train Strike) = ? (joint distribution) • P(Martin Late) = ? (marginal distribution) • P(Martin Late | Norman Late) = ? (conditional distribution)
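All three kinds of queries can be answered by brute-force enumeration over the joint distribution. The sketch below assumes the labels C = Train Strike, A = Martin Late, B = Norman Late, and uses illustrative CPT numbers:

```python
from itertools import product

# Brute-force inference by enumeration. Assumed labels: C = Train Strike,
# A = Martin Late, B = Norman Late; CPT numbers are illustrative.
p_c = {True: 0.1, False: 0.9}            # P(C)
p_a_given_c = {True: 0.6, False: 0.5}    # P(A = True | C)
p_b_given_c = {True: 0.8, False: 0.1}    # P(B = True | C)

def joint(a, b, c):
    """P(A = a, B = b, C = c) via the factorization P(C) P(A|C) P(B|C)."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# Marginal P(A = True): sum the joint over B and C.
p_a = sum(joint(True, b, c) for b, c in product([True, False], repeat=2))

# Conditional P(A = True | B = True) = P(A = True, B = True) / P(B = True).
p_ab = sum(joint(True, True, c) for c in [True, False])
p_b = sum(joint(a, True, c) for a, c in product([True, False], repeat=2))
p_a_given_b = p_ab / p_b
```

Enumeration is exponential in the number of variables, which is why the clique tree method of the later sections matters.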
Inference Methods • Exact Algorithms: • Probability propagation • Variable elimination • Cutset Conditioning • Dynamic Programming • Approximation Algorithms • Variational methods • Sampling (Monte Carlo) methods • Loopy belief propagation • Bounded cutset conditioning • Parametric approximation methods
Independence Assertions • Bayesian networks have built-in independence assertions. • An independence assertion is a statement of the form "X and Y are independent given Z," that is, P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z). • The given variables Z are called evidence. • We say that X and Y are d-separated by Z.
d-Separation [Figure: a DAG with nodes Y1–Y4, X1–X3, W1–W2, and a central node Z.]
Types of Connections (through the central node Z) • Serial connections: Yi → Z → Xj • Converging connections: Y3 → Z ← Y4 • Diverging connections: Xi ← Z → Xj
d-Separation [Figure: the three connection types, serial (X → Z → Y), converging (X → Z ← Y), and diverging (X ← Z → Y).]
Joint Distribution (X1, …, X11). By the chain rule, P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1); by the independence assertions, this simplifies to P(X1, …, Xn) = ∏i P(Xi | parents(Xi)). With this factorization we can compute all probabilities. Consider n binary random variables: storing the JPT (joint probability table) of all r.v.'s takes 2^n − 1 table entries; storing the CPTs (conditional probability tables) of all r.v.'s takes how many entries?
Joint Distribution. For the 11-node example, storing the JPT of all (binary) random variables takes 2^11 − 1 = 2047 entries, while storing the CPTs takes only 1 + 1 + 1 + 2 + 2 + 2 + 8 + 2 + 2 + 4 + 4 = 29 entries (a node with k parents needs a table of 2^k entries).
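The comparison can be checked in a few lines; the per-node parent counts below are read off the table sizes shown on the slide (1, 1, 1, 2, 2, 2, 8, 2, 2, 4, 4):

```python
# Storage comparison for n binary variables. The full joint needs 2**n - 1
# independent entries; each CPT needs 2**k entries for a node with k parents
# (storing only P(X = 1 | parents)).
num_parents = [0, 0, 0, 1, 1, 1, 3, 1, 1, 2, 2]  # X1 .. X11

n = len(num_parents)
joint_entries = 2**n - 1
cpt_entries = sum(2**k for k in num_parents)
print(joint_entries, cpt_entries)  # prints: 2047 29
```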
More on d-Separation. A path from X to Y is d-connecting w.r.t. evidence nodes E if every interior node N on the path has the property that either: • it is serial or diverging and not a member of E; or • it is converging, and either N or one of its descendants is in E.
Exercise: identify the d-connecting and non-d-connecting paths from X to Y in the figure.
More on d-Separation. Two nodes are d-separated if there is no d-connecting path between them. Exercise: remove the minimum number of edges such that X and Y are d-separated.
More on d-Separation. Two sets of nodes, say X = {X1, …, Xm} and Y = {Y1, …, Yn}, are d-separated w.r.t. evidence nodes E if every pair Xi, Yj is d-separated w.r.t. E. In this case, P(X | Y, E) = P(X | E).
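A minimal d-separation checker can be written directly from the d-connecting-path definition above by enumerating simple paths (practical only for small graphs; the helper names are our own, not from the slides):

```python
# d-separation by exhaustive enumeration of simple undirected paths.
# The DAG maps each node to its list of children.
dag = {"C": ["A", "B"], "A": [], "B": []}  # e.g., Train Strike -> Martin/Norman Late

def descendants(dag, node):
    """All nodes reachable from `node` along directed edges."""
    out, stack = set(), list(dag[node])
    while stack:
        v = stack.pop()
        if v not in out:
            out.add(v)
            stack.extend(dag[v])
    return out

def d_separated(dag, x, y, evidence):
    """True iff no d-connecting path exists between x and y given `evidence`."""
    nbrs = {v: set(dag[v]) for v in dag}          # undirected adjacency
    for v in dag:
        for c in dag[v]:
            nbrs[c].add(v)

    def simple_paths(node, visited):
        if node == y:
            yield visited
            return
        for n in nbrs[node] - set(visited):
            yield from simple_paths(n, visited + [n])

    for path in simple_paths(x, [x]):
        active = True
        for i in range(1, len(path) - 1):
            prev, n, nxt = path[i - 1], path[i], path[i + 1]
            if n in dag[prev] and n in dag[nxt]:  # converging: prev -> n <- nxt
                ok = n in evidence or bool(descendants(dag, n) & evidence)
            else:                                 # serial or diverging
                ok = n not in evidence
            if not ok:
                active = False
                break
        if active:
            return False                          # found a d-connecting path
    return True
```

In the diverging example, observing C blocks the path A – C – B, whereas in a converging connection A → C ← B the path is blocked until C (or a descendant of C) is observed.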
Bayesian Networks: Clique Tree Propagation
References • Developed by Lauritzen and Spiegelhalter and refined by Jensen et al.: Lauritzen, S. L., and Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Stat. Soc. B, 50, 157-224, 1988. Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian updating in causal probabilistic networks by local computations, Comp. Stat. Quart., 4, 269-282, 1990. Shenoy, P., and Shafer, G., Axioms for probability and belief-function propagation, in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer, and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 169-198, 1990.
Clique Tree Propagation (CTP) • Given a Bayesian network, build a secondary structure, called the clique tree: an undirected tree. • Inference proceeds by propagating belief potentials among the tree nodes. • It is an exact algorithm.
Definition: Family of a Node. The family of a node V, denoted FV, is defined by FV = {V} ∪ parents(V). Examples: for a root node A, FA = {A}; for a node F with parents D and E, FF = {D, E, F}.
Potentials and Distributions. We will model the probability tables as potential functions. All of these tables map a set of random variables to a real value: a prior probability P(a) is a function of a; a conditional probability P(b | a) is a function of a and b; a conditional probability P(f | d, e) is a function of d, e, and f.
Potential. Potentials are used to implement matrices or tables. Two operations: 1. Marginalization: sum a potential φX down onto a subset Y of its variables, φY = Σ_{X\Y} φX. 2. Multiplication: combine two potentials pointwise, φZ = φX · φY.
Marginalization Example:
Multiplication. The product does not necessarily sum to one. In φZ(z) = φX(x) · φY(y), the assignments x and y are those consistent with z. Example:
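Both operations can be sketched with potentials stored as dictionaries from assignment tuples (over a fixed variable order) to nonnegative reals. This is an illustrative implementation, assuming binary variables in `multiply`:

```python
from itertools import product

def marginalize(phi, variables, keep):
    """Sum `phi` down onto the variables in `keep` (order preserved)."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, val in phi.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

def multiply(phi1, vars1, phi2, vars2):
    """Pointwise product over the union of the two scopes: each factor is
    looked up at the sub-assignment consistent with the joint assignment."""
    all_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for assignment in product([True, False], repeat=len(all_vars)):
        a = dict(zip(all_vars, assignment))
        out[assignment] = (phi1[tuple(a[v] for v in vars1)]
                           * phi2[tuple(a[v] for v in vars2)])
    return out

# Example potential over (A, B) and its marginal over A (values illustrative).
phi_ab = {(True, True): 0.2, (True, False): 0.3,
          (False, True): 0.1, (False, False): 0.4}
phi_a = marginalize(phi_ab, ["A", "B"], ["A"])
```

Multiplying `phi_ab` by a potential over B alone scales each entry by the factor matching that entry's B value, which is exactly the "consistent assignments" rule above.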
The Secondary Structure. Given a Bayesian network over a set of variables U = {V1, …, Vn}, its secondary structure contains a graphical and a numerical component. Graphical component: an undirected clique tree that satisfies the join tree property. Numerical component: belief potentials on nodes and edges.
[Figure: clique tree with clusters ABD, ADE, ACE, CEG, DEF, EGH and sepsets AD, AE, CE, DE, EG.] How do we build a clique tree? The Clique Tree T. The clique tree T for a belief network over a set of variables U = {V1, …, Vn} satisfies the following properties: • Each node in T is a cluster, or clique (a nonempty set of variables). • The clusters satisfy the join tree property: given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y. • For each variable V ∈ U, the family FV is included in at least one cluster. • Sepsets: each edge in T is labeled with the intersection of the adjacent clusters.
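The join tree property can be verified mechanically. The sketch below encodes the clique tree shown on the slide (clusters ABD, ADE, ACE, CEG, DEF, EGH) and checks every pair of clusters:

```python
from itertools import combinations

# The clique tree from the slide: six clusters and the edges between them.
clusters = {
    "ABD": {"A", "B", "D"}, "ADE": {"A", "D", "E"}, "ACE": {"A", "C", "E"},
    "CEG": {"C", "E", "G"}, "DEF": {"D", "E", "F"}, "EGH": {"E", "G", "H"},
}
tree = {"ABD": ["ADE"], "ADE": ["ABD", "ACE", "DEF"], "ACE": ["ADE", "CEG"],
        "CEG": ["ACE", "EGH"], "DEF": ["ADE"], "EGH": ["CEG"]}

def find_path(tree, x, y, seen=None):
    """The unique path between two nodes of a tree, as a list of node names."""
    seen = seen or []
    if x == y:
        return seen + [y]
    for n in tree[x]:
        if n not in seen + [x]:
            p = find_path(tree, n, y, seen + [x])
            if p:
                return p
    return None

def join_tree_property(tree, clusters):
    """Every cluster on the path between X and Y must contain X & Y."""
    for x, y in combinations(clusters, 2):
        inter = clusters[x] & clusters[y]
        if not all(inter <= clusters[c] for c in find_path(tree, x, y)):
            return False
    return True
```

Rewiring the tree so ABD hangs off DEF instead of ADE breaks the property, since the path from ABD to ADE then passes through DEF, which does not contain A.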
The Numeric Component. How do we assign belief functions? Clusters and sepsets are attached with belief functions (potentials). • Local consistency: for each cluster X and neighboring sepset S, φS = Σ_{X\S} φX. • Global consistency: P(U) = ∏clusters X φX / ∏sepsets S φS.
The Numeric Component (cont.). The key step to satisfying these constraints: assign each CPT to exactly one cluster containing its family, initialize each cluster potential to the product of the CPTs assigned to it, φX = ∏ P(V | parents(V)), and initialize each sepset potential to unity, φS = 1. If so, ∏X φX / ∏S φS = ∏V P(V | parents(V)) = P(U).
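Local consistency is easy to check numerically. The sketch below uses uniform potentials on the cluster {A, D, E} and sepset {A, E}; the values are illustrative, not from the slides:

```python
# Local-consistency check: summing a cluster potential over the variables
# outside the sepset must reproduce the sepset potential.
phi_ade = {(a, d, e): 0.125                       # cluster {A, D, E}
           for a in (True, False) for d in (True, False) for e in (True, False)}
phi_ae = {(a, e): 0.25                            # sepset {A, E}
          for a in (True, False) for e in (True, False)}

def project(phi, variables, keep):
    """Sum `phi` down onto the variables in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, val in phi.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out

projected = project(phi_ade, ["A", "D", "E"], ["A", "E"])
consistent = all(abs(projected[k] - v) < 1e-12 for k, v in phi_ae.items())
```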
Bayesian Networks: Building the Clique Tree
The Steps: Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
Moral Graph (Belief Network → Moral Graph) • Convert the directed graph to an undirected graph. • Connect ("marry") each pair of parent nodes of each node.
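A sketch of moralization, applied to an eight-node DAG whose parent structure is assumed here for illustration (it is consistent with the clusters shown earlier, but the slides do not spell it out):

```python
from itertools import combinations

# Moralization: drop edge directions and "marry" the parents of each node.
# Parent structure assumed for illustration.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"],
           "E": ["C"], "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}

def moralize(parents):
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # undirected version of each arc
        for u, v in combinations(ps, 2):
            edges.add(frozenset((u, v)))       # marry co-parents
    return edges

moral = moralize(parents)
```

Here the only marriages are D–E (the parents of F) and E–G (the parents of H), so the moral graph has the 9 original arcs plus 2 marriage edges.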
Triangulation (Moral Graph → Triangulated Graph) • Triangulate: add edges so that every cycle of length greater than three has a chord. There are many ways to do this. In practice, this step is done by incorporating it with the next step (clique-set selection).
Select Clique Set • Copy GM to GM'. • While GM' is not empty: select a node V from GM' according to a criterion; node V and its neighbors form a cluster; connect all the nodes in the cluster, and for each edge added to GM', add the same edge to GM; remove V from GM'.
Criterion for selecting the next node: • The weight of a node V is the number of values of V. • The weight of a cluster is the product of the weights of its constituent nodes. • Choose the node that causes the fewest edges to be added, breaking ties by choosing the node that induces the cluster with the smallest weight.
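The whole selection loop with this criterion can be sketched as follows, operating on the (assumed) moral graph of the eight-node example with all variables binary, so every node has weight 2:

```python
# Node elimination with the slide's criterion: fewest fill-in edges first,
# ties broken by smallest induced-cluster weight. Eliminating a node adds
# fill-in edges among its neighbors; the node plus its neighbors form a cluster.
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E", "G"},
       "D": {"B", "E", "F"}, "E": {"C", "D", "F", "G", "H"},
       "F": {"D", "E"}, "G": {"C", "E", "H"}, "H": {"E", "G"}}
weights = {v: 2 for v in adj}

def triangulate(adj, weights):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    fill, clusters = set(), []
    while adj:
        def cost(v):
            ns = adj[v]
            added = sum(1 for u in ns for w in ns if u < w and w not in adj[u])
            weight = weights[v]
            for u in ns:
                weight *= weights[u]
            return (added, weight)                # fewest edges, then least weight
        v = min(adj, key=cost)
        clusters.append({v} | adj[v])
        for u in adj[v]:                          # connect the cluster (fill-in)
            for w in adj[v]:
                if u != w and w not in adj[u]:
                    adj[u].add(w)
                    fill.add(frozenset((u, w)))
        for u in adj[v]:                          # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return clusters, fill

clusters, fill = triangulate(adj, weights)
```

The clique set then consists of the maximal clusters produced by the loop (clusters subsumed by an earlier, larger cluster are dropped), and the fill-in edges make the moral graph triangulated.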