An introduction to Bayesian networks Stochastic Processes Course Hossein Amirkhani Spring 2011
Outline • Introduction • Bayesian Networks • Probabilistic Graphical Models • Conditional Independence • I-equivalence
Introduction • Our goal is to represent a joint distribution over some set of random variables X = {X1, …, Xn}. • Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of 2^n - 1 numbers. • The explicit representation of the joint distribution is unmanageable from every perspective: • Computationally, Cognitively, and Statistically.
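To make the blow-up concrete, here is a minimal Python sketch (not from the slides) contrasting the 2^n - 1 numbers of a full joint table with the parameter count of a factored Bayesian network; the chain-shaped parent sets are hypothetical, chosen only to show the savings.

```python
# A rough comparison (hypothetical parent sets): size of a full joint
# table over n binary variables vs. the number of parameters a Bayesian
# network factorization needs.

def full_joint_params(n):
    """A full joint over n binary variables needs 2^n - 1 numbers."""
    return 2 ** n - 1

def bn_params(parents_of):
    """A binary node with k parents needs 2^k numbers: one
    P(X = 1 | parent assignment) per parent configuration."""
    return sum(2 ** len(pa) for pa in parents_of.values())

# A chain X0 -> X1 -> ... -> X19: each node has at most one parent.
chain = {i: ([i - 1] if i > 0 else []) for i in range(20)}

print(full_joint_params(20))  # 1048575
print(bn_params(chain))       # 39 (1 root + 19 nodes with one parent each)
```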
Bayesian Networks • Bayesian networks exploit conditional independence properties of the distribution in order to allow a compact and natural representation. • They are a specific type of probabilistic graphical model. • BNs are directed acyclic graphs (DAGs).
Probabilistic Graphical Models • Nodes are the random variables in our domain. • Edges correspond, intuitively, to direct influence of one node on another.
Probabilistic Graphical Models • Graphs are an intuitive way of representing and visualising the relationships between many variables. • A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. • Thus we can answer questions like: “Is A dependent on B given that we know the value of C?” just by looking at the graph. • Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently. Graphical models = statistics × graph theory × computer science.
Conditional Independence: Example 1 • Tail-to-tail at c: two arrows leave a common node c, as in a ← c → b. • Example: Smoking is a common cause of both Yellow Teeth and Lung Cancer; the two effects are dependent, but become conditionally independent given Smoking.
Conditional Independence: Example 2 • Head-to-tail at c: a chain a → c → b. • Example: Type of Car → Speed → Amount of Speeding Fine; the type of car and the fine are dependent, but become conditionally independent given the speed.
Conditional Independence: Example 3 • Head-to-head at c, also called a v-structure: a → c ← b. • Example: Ability of Team A → Outcome of A vs. B Game ← Ability of Team B; the two abilities are independent a priori, but become dependent once the outcome of the game is observed (explaining away).
D-separation • A, B, and C are non-intersecting subsets of nodes in a directed graph. • A path from A to B is blocked if it contains a node such that either • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or • the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. • If all paths from A to B are blocked, A is said to be d-separated from B by C. • If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥ B | C.
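As an illustration of the definition, the following sketch enumerates the simple paths between two nodes and applies the two blocking rules above. It is a naive check meant to mirror the definition, not the linear-time reachability procedure described by Koller and Friedman; the graph encoding (node to children) is our own.

```python
# Naive d-separation test that mirrors the definition: enumerate simple
# undirected paths from a to b and check that each one is blocked by c.
# Graphs are dicts mapping each node to a list of its children.

def descendants(g, x):
    seen, stack = set(), [x]
    while stack:
        for child in g.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_d_separated(g, a, b, c):
    parents = {v: [u for u in g if v in g[u]] for v in g}

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            if prev in parents[node] and nxt in parents[node]:
                # head-to-head: blocked unless node or a descendant is in c
                if node not in c and not (descendants(g, node) & c):
                    return True
            elif node in c:
                return True          # head-to-tail or tail-to-tail, observed
        return False

    def paths(node, visited):
        if node == b:
            yield visited
            return
        for nbr in set(g[node]) | set(parents[node]):
            if nbr not in visited:
                yield from paths(nbr, visited + [nbr])

    return all(blocked(p) for p in paths(a, [a]))

# The v-structure A -> C <- B: A, B d-separated by {} but not by {C}.
g = {'A': ['C'], 'B': ['C'], 'C': []}
print(is_d_separated(g, 'A', 'B', set()))  # True
print(is_d_separated(g, 'A', 'B', {'C'}))  # False
```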
I-equivalence • Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (A ⊥ B | C) that hold in P. • Two graph structures G1 and G2 over X are I-equivalent if I(G1) = I(G2). • The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes.
The skeleton of a Bayesian network • The skeleton of a Bayesian network graph G over X is an undirected graph over X that contains an edge {X, Y} for every edge (X, Y) in G.
Immorality • A v-structure X → Z ← Y is an immorality if there is no direct edge between X and Y.
Relationship between immorality, skeleton and I-equivalence • Let G1 and G2 be two graphs over X. Then G1 and G2 have the same skeleton and the same set of immoralities if and only if they are I-equivalent. • We can use this theorem to recognize whether two BNs are I-equivalent, as sketched below. • In addition, this theorem can be used for learning the structure of the Bayesian network related to a distribution. • We can construct the I-equivalence class for a distribution by determining its skeleton and its immoralities from the independence properties of the given distribution. • We then use both of these components to build a representation of the equivalence class.
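A minimal sketch of this test, assuming graphs are encoded as dicts from each node to its list of children (our representation, not the slides'):

```python
# I-equivalence test via the theorem: same skeleton + same immoralities.
from itertools import combinations

def skeleton(g):
    return {frozenset((u, v)) for u in g for v in g[u]}

def immoralities(g):
    parents = {v: {u for u in g if v in g[u]} for v in g}
    skel = skeleton(g)
    return {(x, z, y)
            for z, pa in parents.items()
            for x, y in combinations(sorted(pa), 2)
            if frozenset((x, y)) not in skel}   # x -> z <- y, x-y missing

def i_equivalent(g1, g2):
    return (skeleton(g1) == skeleton(g2)
            and immoralities(g1) == immoralities(g2))

# X -> Y -> Z and X <- Y <- Z are I-equivalent; X -> Y <- Z is not.
chain1 = {'X': ['Y'], 'Y': ['Z'], 'Z': []}
chain2 = {'X': [], 'Y': ['X'], 'Z': ['Y']}
collider = {'X': ['Y'], 'Y': [], 'Z': ['Y']}
print(i_equivalent(chain1, chain2))    # True
print(i_equivalent(chain1, collider))  # False
```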
Identifying the Undirected Skeleton • The basic idea is to use independence queries of the form (X ⊥ Y | U) for different sets of variables U. • If X and Y are adjacent in G, we cannot separate them with any set of variables. • Conversely, if X and Y are not adjacent in G, we would hope to be able to find a set of variables that makes these two variables conditionally independent: we call this set a witness of their independence.
Identifying the Undirected Skeleton • Let G be an I-map of a distribution P, and let X and Y be two variables that are not adjacent in G. Then either (X ⊥ Y | Pa(X)) ∈ I(P) or (X ⊥ Y | Pa(Y)) ∈ I(P). • Thus, if X and Y are not adjacent in G, then we can find a witness of bounded size. • In particular, if we assume that G has bounded indegree, say less than or equal to d, then we do not need to consider witness sets larger than d.
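A sketch of the resulting skeleton-recovery loop. The independence oracle indep(x, y, u) is a hypothetical stand-in (in practice, a statistical test on data), and d bounds the witness size as above:

```python
# Skeleton recovery with bounded witness sets. indep(x, y, u) is a
# hypothetical independence oracle; d bounds the indegree and hence
# the size of the witness sets we need to search.
from itertools import combinations

def build_skeleton(variables, indep, d):
    """Return the undirected edges and, for each non-adjacent pair,
    the witness set found (needed later to identify immoralities)."""
    edges, witnesses = set(), {}
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        found = None
        for size in range(d + 1):
            for u in combinations(rest, size):
                if indep(x, y, set(u)):
                    found = set(u)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((x, y)))     # inseparable: keep the edge
        else:
            witnesses[frozenset((x, y))] = found
    return edges, witnesses
```

With a perfect oracle (for instance, d-separation in the true graph) this recovers the true skeleton; with statistical tests it inherits their errors.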
Identifying Immoralities • At this stage we have reconstructed the undirected skeleton. Now, we want to reconstruct edge directions. • Our goal is to consider potential immoralities in the skeleton and for each one determine whether it is indeed an immorality. • A triplet of variables X, Z, Y is a potential immorality if the skeleton contains the edges X-Z and Z-Y but does not contain an edge between X and Y. • A potential immorality is an immorality if and only if Z is not in the witness set(s) for X and Y.
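Continuing the sketch, a Mark-Immoralities pass over the recovered skeleton uses the stored witnesses; the encoding matches build_skeleton above and is likewise our own:

```python
# Mark-Immoralities over the output of build_skeleton above: orient
# X - Z - Y as X -> Z <- Y exactly when Z is absent from the witness
# set recorded for the non-adjacent pair (X, Y).
from itertools import combinations

def mark_immoralities(variables, edges, witnesses):
    directed = set()
    for z in variables:
        neighbors = [v for v in variables if frozenset((v, z)) in edges]
        for x, y in combinations(neighbors, 2):
            pair = frozenset((x, y))
            if pair not in edges and z not in witnesses[pair]:
                directed.add((x, z))         # x -> z
                directed.add((y, z))         # y -> z
    return directed
```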
Representing Equivalence Classes • An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph, or PDAG.
Representing Equivalence Classes • Let G be a DAG. A chain graph K is the class PDAG of the equivalence class of G if K shares the same skeleton as G, and K contains a directed edge X → Y if and only if all DAGs G′ that are I-equivalent to G contain the edge X → Y. • If the edge X → Y is directed in K, then all the members of the equivalence class agree on the orientation of the edge. • If the edge is undirected in K, there are two DAGs in the equivalence class that disagree on the orientation of the edge.
Representing Equivalence Classes • Is the output of Mark-Immoralities the class PDAG? • Clearly, edges involved in immoralities must be directed in K. • The obvious question is whether K can contain directed edges that are not involved in immoralities. • In other words, can there be additional edges whose direction is necessarily the same in every member of the equivalence class?
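In Koller and Friedman's treatment the answer is yes: some directions are forced even outside immoralities. A small example (our own encoding): in the skeleton X-Z, Y-Z, Z-W with the single immorality X → Z ← Y, orienting W → Z would create the new immorality (X, Z, W), so every I-equivalent DAG orients Z → W. The rule sketched below, one of Meek's orientation rules, propagates exactly this kind of forced direction.

```python
# One of Meek's orientation rules: if a -> b and b - c with a, c
# non-adjacent, then b - c must be oriented b -> c (otherwise a new
# immorality a -> b <- c would appear). Applied to the example above.

def propagate(directed, undirected):
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for e in list(undirected):
                if b not in e or e not in undirected:
                    continue
                c = next(v for v in e if v != b)
                adjacent = ((a, c) in directed or (c, a) in directed
                            or frozenset((a, c)) in undirected)
                if c != a and not adjacent:
                    undirected.remove(e)
                    directed.add((b, c))     # forced: b -> c
                    changed = True
    return directed, undirected

directed = {('X', 'Z'), ('Y', 'Z')}          # immorality X -> Z <- Y
undirected = {frozenset(('Z', 'W'))}
print(propagate(directed, undirected))       # Z -> W becomes directed
```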
References • D. Koller and N. Friedman: Probabilistic Graphical Models. MIT Press, 2009. • C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.