
Bayesian Networks

Explore the fundamentals of Bayesian networks: directed acyclic graphs with associated probability tables, inference methods, and Clique Tree Propagation. Learn about applications, how probability differs from fuzzy measurements, and why inference is hard.


Presentation Transcript


  1. Bayesian Networks. Speaker: 虞台文, Intelligent Multimedia Research Lab, Institute of Computer Science and Engineering, Tatung University.

  2. Contents • Introduction • Probability Theory (skipped) • Inference • Clique Tree Propagation • Building the Clique Tree • Inference by Propagation

  3. Bayesian Networks: Introduction (Intelligent Multimedia Research Lab, Institute of Computer Science and Engineering, Tatung University)

  4. What Are Bayesian Networks? • Bayesian networks are directed acyclic graphs (DAGs) with an associated set of probability tables. • The nodes are random variables. • Certain independence relations are induced by the topology of the graph.

  5. Why Use a Bayesian Network? • To deal with uncertainty in inference via probability (Bayes' rule). • To handle incomplete data sets, e.g., in classification and regression. • To model domain knowledge, e.g., causal relationships.

  6. Example: use a DAG to model causality. [Figure: a DAG over the nodes Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late, Boss Failure-in-Love, Project Delay, Office Dirty, and Boss Angry]

  7. Example (continued): attach prior probabilities to all root nodes. [Figure: the same DAG with prior probability tables attached to its root nodes]

  8. Example (continued): attach conditional probability tables (CPTs) to all non-root nodes; each column of a CPT sums to 1. [Figure: the same DAG with a CPT attached to a non-root node, Norman Untidy]

  9. Example (continued). [Figure: same as the previous slide] Question: what is the difference between probability and fuzzy measurements?

  10. Medical Knowledge Example

  11. Definition of Bayesian Networks. A Bayesian network is a directed acyclic graph with the following properties: • Each node represents a random variable. • Each node representing a variable A with parent nodes representing variables B1, B2, ..., Bn is assigned a conditional probability table (CPT) specifying P(A | B1, B2, ..., Bn).
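
To make the definition concrete, here is a minimal Python sketch of one node's CPT, assuming the node "MartinLate" has parents "TrainStrike" and "MartinOversleep" as in the earlier example; all numbers are illustrative assumptions, not values from the presentation.

```python
# One node's CPT: a mapping from parent assignments to a distribution
# over the node's values. Numbers are made up for illustration.
cpt_martin_late = {
    # (train_strike, martin_oversleep) -> P(MartinLate = True | parents)
    (True,  True):  0.95,
    (True,  False): 0.80,
    (False, True):  0.70,
    (False, False): 0.10,
}

def p_martin_late(late, train_strike, oversleep):
    """P(MartinLate = late | TrainStrike, MartinOversleep)."""
    p_true = cpt_martin_late[(train_strike, oversleep)]
    return p_true if late else 1.0 - p_true

# Each "column" of the CPT sums to 1: P(True | .) + P(False | .) = 1.
print(p_martin_late(True, False, False))   # 0.1
```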

  12. Problems • How do we perform inference? • How do we learn the probabilities from data? • How do we learn the structure from data? • What applications might we have? Bad news: all of these problems are NP-hard.

  13. Bayesian Networks: Inference (Intelligent Multimedia Research Lab, Institute of Computer Science and Engineering, Tatung University)

  14. Inference

  15. Example. [Figure: the diverging network Train Strike → Martin Late and Train Strike → Norman Late] Questions: • P("Martin Late", "Norman Late", "Train Strike") = ? (joint distribution) • P("Martin Late") = ? (marginal distribution) • P("Martin Late" | "Norman Late") = ? (conditional distribution)

  16. Example (joint distribution). Writing C = "Train Strike", A = "Martin Late", B = "Norman Late", the joint distribution factors as P(A, B, C) = P(C) P(A | C) P(B | C).

  17. Example (marginal distribution). P("Martin Late", "Norman Late") = ? E.g., P(A, B) = Σ_c P(A, B, C = c).

  18. Example (marginal distribution). P("Martin Late") = ? E.g., P(A) = Σ_b Σ_c P(A, B = b, C = c).

  19. Example (conditional distribution). P("Martin Late" | "Norman Late") = ? E.g., P(A | B) = P(A, B) / P(B). A brute-force enumeration sketch follows.
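
All three questions on slides 15-19 can be answered by enumeration over the joint distribution. The sketch below assumes the diverging structure Train Strike → {Martin Late, Norman Late} and made-up CPT numbers.

```python
from itertools import product

# Illustrative CPTs for C = "Train Strike", A = "Martin Late",
# B = "Norman Late"; the network is C -> A and C -> B.
P_c = {True: 0.1, False: 0.9}          # P(C)
P_a_given_c = {True: 0.8, False: 0.3}  # P(A = True | C)
P_b_given_c = {True: 0.9, False: 0.2}  # P(B = True | C)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(a, b, c):
    """P(A = a, B = b, C = c) via the factorization P(C) P(A|C) P(B|C)."""
    return P_c[c] * bern(P_a_given_c[c], a) * bern(P_b_given_c[c], b)

# Marginal: sum the joint over the unwanted variables.
p_a = sum(joint(True, b, c) for b, c in product([True, False], repeat=2))

# Conditional: P(A | B) = P(A, B) / P(B).
p_ab = sum(joint(True, True, c) for c in [True, False])
p_b = sum(joint(a, True, c) for a, c in product([True, False], repeat=2))

print(p_a)          # 0.35
print(p_ab / p_b)   # P(Martin Late | Norman Late)
```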

  20. Inference Methods • Exact algorithms: probability propagation, variable elimination, cutset conditioning, dynamic programming. • Approximation algorithms: variational methods, sampling (Monte Carlo) methods, loopy belief propagation, bounded cutset conditioning, parametric approximation methods. A sampling sketch follows.
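
As a taste of the approximation algorithms listed above, here is a hedged sketch of the simplest sampling method, forward (ancestral) sampling, on the same small network; the numbers match the illustrative CPTs assumed earlier.

```python
import random

# Forward sampling: sample each node after its parents, then estimate
# P(MartinLate = True) by a frequency count over the samples.
def sample_once(rng):
    c = rng.random() < 0.1                   # TrainStrike ~ P(C)
    a = rng.random() < (0.8 if c else 0.3)   # MartinLate  ~ P(A | C)
    b = rng.random() < (0.9 if c else 0.2)   # NormanLate  ~ P(B | C)
    return a, b, c

rng = random.Random(0)
n = 100_000
estimate = sum(sample_once(rng)[0] for _ in range(n)) / n
print(estimate)   # close to the exact value 0.35
```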

  21. Independence Assertions • Bayesian networks have built-in independence assertions. • An independence assertion is a statement of the form "X and Y are independent given Z"; the given variables Z are called evidence. • That is, P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z). • We say that X and Y are d-separated by Z.

  22. d-Separation. [Figure: a network over the nodes Y1, Y2, Y3, Y4, X1, X2, X3, W1, W2, and Z, illustrating d-separation]

  23. Types of Connections. [Figure: the same network as on the previous slide] • Serial connections: Yi → Z → Xj. • Converging connections: Yi → Z ← Yj (e.g., Y3 → Z ← Y4). • Diverging connections: Xi ← Z → Xj.

  24. d-Separation. [Figure: the three connection types side by side: serial (X → Z → Y), converging (X → Z ← Y), and diverging (X ← Z → Y)]

  25. Joint Distribution. [Figure: an 11-node network over X1, ..., X11] By the chain rule, P(X1, ..., Xn) = Π_i P(Xi | X1, ..., Xi−1); by the independence assertions, this reduces to P(X1, ..., Xn) = Π_i P(Xi | Parents(Xi)). With this factorization we can compute all probabilities. (JPT: joint probability table; CPT: conditional probability table.) Consider binary random variables: • To store the JPT of all r.v.'s: 2^n − 1 table entries. • To store the CPTs of all r.v.'s: ? table entries.

  26. Joint Distribution (continued). [Figure: the same 11-node network] Consider binary random variables: • To store the JPT of all r.v.'s: 2^n − 1 table entries. • To store the CPTs of all r.v.'s: ? table entries.

  27. Joint Distribution (continued). [Figure: the 11-node network annotated with per-node CPT sizes 1, 1, 1, 2, 2, 2, 8, 2, 2, 4, 4] To store the JPT of all random variables: 2^11 − 1 = 2047 entries. To store the CPTs of all random variables: 1 + 1 + 1 + 2 + 2 + 2 + 8 + 2 + 2 + 4 + 4 = 29 entries.
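
The counting can be checked mechanically. The parent sets below are assumptions chosen to reproduce the per-node entry counts shown on the slide (1, 1, 1, 2, 2, 2, 8, 2, 2, 4, 4); the actual edges of the 11-node example may differ.

```python
# A binary node with k parents needs 2^k independent CPT entries;
# the full JPT over n binary variables needs 2^n - 1.
parents = {
    "X1": [], "X2": [], "X3": [],
    "X4": ["X1"], "X5": ["X2"], "X6": ["X3"],
    "X7": ["X4", "X5", "X6"],
    "X8": ["X7"], "X9": ["X7"],
    "X10": ["X7", "X8"], "X11": ["X7", "X9"],
}

n = len(parents)
jpt_entries = 2 ** n - 1
cpt_entries = sum(2 ** len(ps) for ps in parents.values())
print(jpt_entries, cpt_entries)   # 2047 vs 29
```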

  28. More on d-Separation. [Figure: a network with evidence nodes E lying between X and Y] A path from X to Y is d-connecting w.r.t. evidence nodes E if every interior node N on the path has the property that either • it is serial (linear) or diverging and not a member of E; or • it is converging, and either N or one of its descendants is in E.

  29. More on d-Separation (continued). [Figure: the same network] Exercise: using the definition above, identify the d-connecting and non-d-connecting paths from X to Y.

  30. More on d-Separation. [Figure: the same network] Two nodes are d-separated if there is no d-connecting path between them. Exercise: remove the minimum number of edges such that X and Y are d-separated. A sketch of an automated d-separation test follows.
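
A d-separation test can be implemented directly from this definition. The sketch below uses the standard "reachable" formulation of the same criterion (as in Koller and Friedman's textbook) rather than enumerating paths; the graph and query names are hypothetical.

```python
from collections import deque

# `parents` maps each node to its list of parents; `evidence` is the
# set E. Returns True iff no d-connecting path joins x and y.
def d_separated(parents, x, y, evidence):
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # Evidence nodes and their ancestors: a converging node N is
    # active exactly when N or one of its descendants is in E.
    anc = set()
    stack = list(evidence)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    # Breadth-first search over (node, direction of arrival) pairs.
    visited = set()
    queue = deque([(x, "up")])
    while queue:
        v, d = queue.popleft()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v == y:
            return False                        # reached y: d-connected
        if d == "up" and v not in evidence:     # arrived from a child
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif d == "down":                       # arrived from a parent
            if v not in evidence:               # serial/diverging: pass through
                queue.extend((c, "down") for c in children[v])
            if v in anc:                        # converging node is active
                queue.extend((p, "up") for p in parents[v])
    return True

g = {"TrainStrike": [], "MartinLate": ["TrainStrike"], "NormanLate": ["TrainStrike"]}
print(d_separated(g, "MartinLate", "NormanLate", {"TrainStrike"}))  # True
print(d_separated(g, "MartinLate", "NormanLate", set()))            # False
```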

  31. More on d-Separation. Two sets of nodes, say X = {X1, ..., Xm} and Y = {Y1, ..., Yn}, are d-separated w.r.t. evidence nodes E if every pair Xi, Yj is d-separated w.r.t. E. In this case, we have P(X | Y, E) = P(X | E).

  32. Bayesian Networks: Clique Tree Propagation (Intelligent Multimedia Research Lab, Institute of Computer Science and Engineering, Tatung University)

  33. References • Developed by Lauritzen and Spiegelhalter and refined by Jensen et al. • Lauritzen, S. L., and Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Stat. Soc. B, 50, 157-224, 1988. • Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian updating in causal probabilistic networks by local computations, Comp. Stat. Quart., 4, 269-282, 1990. • Shenoy, P., and Shafer, G., Axioms for probability and belief-function propagation, in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 169-198, 1990.

  34. Clique Tree Propagation (CTP) • Given a Bayesian network, build a secondary structure called a clique tree: an undirected tree. • Inference works by propagating belief potentials among the tree nodes. • It is an exact algorithm.

  35. Notations

  36. Definition: Family of a Node. [Figure: the eight-node network A, B, C, D, E, F, G, H] The family of a node V, denoted F_V, is defined by F_V = {V} ∪ Parents(V). Examples: for a root node A, F_A = {A}; for a node F with parents D and E, F_F = {D, E, F}.

  37. Potentials and Distributions. [Figure: the eight-node network with its probability tables] We model the probability tables as potential functions. All of these tables map a set of random variables to a real value: a prior probability P(A) is a function of a; a conditional probability P(B | A) is a function of a and b; a conditional probability P(F | D, E) is a function of d, e, and f.

  38. Potentials. Potentials are used to implement matrices or tables. Two operations: 1. Marginalization: given a potential φ_X over a set of variables X and Y ⊆ X, φ_Y = Σ_{X \ Y} φ_X. 2. Multiplication: φ_Z = φ_X φ_Y, where Z = X ∪ Y.

  39. Marginalization. Example: [the worked table is omitted in the transcript]

  40. Multiplication. φ_Z(z) = φ_X(x) φ_Y(y), where the assignments x and y are consistent with z. The resulting potential does not necessarily sum to one. Example: [the worked table is omitted in the transcript]. A sketch of both operations follows.
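
A possible implementation of the two potential operations, storing a potential as a list of variable names plus a NumPy array with one axis per variable; the variable names and table values are illustrative.

```python
import numpy as np

def marginalize(phi_vars, phi, keep):
    """Sum out every variable not in `keep`."""
    axes = tuple(i for i, v in enumerate(phi_vars) if v not in keep)
    return [v for v in phi_vars if v in keep], phi.sum(axis=axes)

def multiply(vars1, phi1, vars2, phi2):
    """Pointwise product over the union of the two variable sets."""
    out_vars = list(vars1) + [v for v in vars2 if v not in vars1]

    def expand(vs, phi):
        # Permute phi's axes into out_vars order, then insert size-1
        # axes for missing variables so NumPy broadcasting applies.
        phi = np.transpose(phi, [vs.index(v) for v in out_vars if v in vs])
        sizes = iter(phi.shape)
        return phi.reshape([next(sizes) if v in vs else 1 for v in out_vars])

    return out_vars, expand(vars1, phi1) * expand(vars2, phi2)

# Example: multiply P(A | C) by P(C), then sum out C to get P(A).
# Values are the illustrative CPTs from the earlier sketches.
p_c = (["C"], np.array([0.1, 0.9]))                    # columns: C = T, F
p_a_c = (["A", "C"], np.array([[0.8, 0.3],             # rows: A = T, F
                               [0.2, 0.7]]))           # each column sums to 1
vars_j, phi_j = multiply(*p_a_c, *p_c)
print(marginalize(vars_j, phi_j, {"A"})[1])            # [0.35, 0.65]
```

The result agrees with the brute-force enumeration from the earlier inference sketch, which is the point: marginalization and multiplication of potentials are the only primitives clique tree propagation needs.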

  41. The Secondary Structure. Given a Bayesian network over a set of variables U = {V1, ..., Vn}, its secondary structure contains a graphical and a numerical component. • Graphical component: an undirected clique tree satisfying the join tree property. • Numerical component: belief potentials on nodes and edges.

  42. The Clique Tree T. [Figure: a clique tree with clusters ABD, ADE, ACE, CEG, DEF, EGH and sepsets AD, AE, CE, DE, EG] How do we build a clique tree? The clique tree T for a belief network over a set of variables U = {V1, ..., Vn} satisfies the following properties: • Each node in T is a cluster, or clique (a nonempty set of variables). • The clusters satisfy the join tree property: given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y. • For each variable V ∈ U, the family F_V is included in at least one cluster. • Sepsets: each edge in T is labeled with the intersection of the adjacent clusters.
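
The join tree property can be verified programmatically. This sketch reads the clusters and tree edges off the figure above (an assumption about the drawing) and checks that every cluster on the path between any two clusters contains their intersection.

```python
from itertools import combinations

clusters = {"ABD", "ADE", "ACE", "CEG", "DEF", "EGH"}
edges = {("ABD", "ADE"), ("ADE", "ACE"), ("ACE", "CEG"),
         ("CEG", "EGH"), ("ADE", "DEF")}

def path(tree_edges, src, dst, seen=None):
    """Return the unique path from src to dst in a tree (DFS)."""
    seen = seen or {src}
    if src == dst:
        return [src]
    for a, b in tree_edges:
        for u, v in ((a, b), (b, a)):
            if u == src and v not in seen:
                rest = path(tree_edges, v, dst, seen | {v})
                if rest:
                    return [src] + rest
    return None

# Join tree property: X ∩ Y is contained in every cluster between X and Y.
ok = all(set(x) & set(y) <= set(c)
         for x, y in combinations(clusters, 2)
         for c in path(edges, x, y))
print(ok)   # True
```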

  43. The Numerical Component. [Figure: the same clique tree] Clusters and sepsets are attached with belief potentials. How do we assign belief potentials? Two constraints must hold: • Local consistency: for each cluster X and neighboring sepset S, Σ_{X \ S} φ_X = φ_S. • Global consistency: for any two clusters X and Y, Σ_{X \ Y} φ_X = Σ_{Y \ X} φ_Y.

  44. The Numerical Component (continued). [Figure: the same clique tree] The key step is to satisfy these constraints by initializing all potentials to unity and then, for each variable V, multiplying its CPT into exactly one cluster X that contains F_V: φ_X ← φ_X · P(V | Parents(V)), with φ_S = 1 for every sepset S. If so, the joint distribution factors as P(U) = Π_X φ_X / Π_S φ_S.

  45. Bayesian Networks: Building the Clique Tree (Intelligent Multimedia Research Lab, Institute of Computer Science and Engineering, Tatung University)

  46. The Steps: Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree.

  47. Moral Graph. [Figure: the eight-node belief network and its moral graph, with the pipeline Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree alongside] Belief Network → Moral Graph: • Convert the directed graph to an undirected one. • Connect ("marry") each pair of parent nodes of every node.
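
A minimal sketch of the moralization step. The parent sets below assume the eight-node example graph used in the figures (A→B, A→C, B→D, C→E, C→G, D→F, E→F, E→H, G→H); this structure is inferred from the cluster labels, not stated on the slide.

```python
from itertools import combinations

def moral_graph(parents):
    """Drop edge directions and marry the parents of every node."""
    edges = set()
    for v, ps in parents.items():
        edges.update(frozenset((p, v)) for p in ps)                # undirect
        edges.update(frozenset(pq) for pq in combinations(ps, 2))  # marry parents
    return edges

g = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"],
     "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}
print(sorted(tuple(sorted(e)) for e in moral_graph(g)))
# Adds the "marriage" edges D-E (parents of F) and E-G (parents of H).
```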

  48. Triangulation. [Figure: the moral graph and its triangulated graph] Moral Graph → Triangulated Graph: add chords so that every cycle of length four or more has a chord. There are many ways to do this; in practice, this step is done by incorporating it into the next step (clique selection).

  49. Select Clique Set. [Figure: the pipeline with Clique Set highlighted] • Copy G_M to G_M'. • While G_M' is not empty: select a node V from G_M' according to a criterion (next slide); node V and its neighbors form a cluster; connect all the nodes in the cluster, and for each edge added to G_M', add the same edge to G_M; remove V from G_M'.

  50. Select Clique Set (continued). The selection criterion used in the loop above: • The weight of a node V is the number of values of V. • The weight of a cluster is the product of the weights of its constituent nodes. • Choose the node that causes the least number of edges to be added. • Break ties by choosing the node that induces the cluster with the smallest weight. A sketch of this loop follows.
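
A sketch of the selection loop under the stated criterion, run on the assumed moral graph from the earlier sketch. Because ties can break differently, the clusters it produces need not match the figure's clique tree; non-maximal clusters (subsets of other clusters) are typically discarded before building the join tree.

```python
from itertools import combinations

def select_cliques(adj, weight):
    """adj: adjacency of the moral graph G_M; weight: node -> number of
    values. Returns the clusters and the fill-in edges, which must also
    be added to G_M to triangulate it."""
    adj = {v: set(ns) for v, ns in adj.items()}   # working copy (G_M')
    cliques, fill_in = [], set()
    while adj:
        def cost(v):
            missing = sum(1 for a, b in combinations(adj[v], 2)
                          if b not in adj[a])     # edges to be added
            w = weight[v]
            for n in adj[v]:
                w *= weight[n]                    # induced cluster weight
            return (missing, w)                   # least fill-in, then weight
        v = min(adj, key=cost)
        cliques.append(adj[v] | {v})
        for a, b in combinations(adj[v], 2):      # connect the cluster
            if b not in adj[a]:
                adj[a].add(b); adj[b].add(a)
                fill_in.add(frozenset((a, b)))
        for n in adj[v]:                          # remove V from G_M'
            adj[n].discard(v)
        del adj[v]
    return cliques, fill_in

# Adjacency of the assumed eight-node moral graph, all nodes binary.
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E", "G"},
       "D": {"B", "E", "F"}, "E": {"C", "D", "F", "G", "H"},
       "F": {"D", "E"}, "G": {"C", "E", "H"}, "H": {"E", "G"}}
cliques, added = select_cliques(adj, {v: 2 for v in adj})
print([sorted(c) for c in cliques])
print(sorted(tuple(sorted(e)) for e in added))
```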
