An introduction to Bayesian networks Stochastic Processes Course Hossein Amirkhani Spring 2011
Outline • Introduction • Bayesian Networks • Probabilistic Graphical Models • Conditional Independence • I-equivalence
Introduction • Our goal is to represent a joint distribution over some set of random variables X = {X1, …, Xn}. • Even in the simplest case where these variables are binary-valued, a joint distribution requires the specification of 2^n - 1 numbers. • The explicit representation of the joint distribution is unmanageable from every perspective: • Computationally, Cognitively, and Statistically.
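To make the blow-up concrete, here is a minimal Python sketch (not from the slides) contrasting the 2^n - 1 numbers of a full joint table with the parameter count of a factored Bayesian network; the chain-shaped parent sets are hypothetical, chosen only to show the savings.

```python
# A rough comparison (hypothetical parent sets): size of a full joint
# table over n binary variables vs. the number of parameters a Bayesian
# network factorization needs.

def full_joint_params(n):
    """A full joint over n binary variables needs 2^n - 1 numbers."""
    return 2 ** n - 1

def bn_params(parents_of):
    """A binary node with k parents needs 2^k numbers: one
    P(X = 1 | parent assignment) per parent configuration."""
    return sum(2 ** len(pa) for pa in parents_of.values())

# A chain X0 -> X1 -> ... -> X19: each node has at most one parent.
chain = {i: ([i - 1] if i > 0 else []) for i in range(20)}

print(full_joint_params(20))  # 1048575
print(bn_params(chain))       # 39 (1 root + 19 nodes with one parent each)
```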
Bayesian Networks • Bayesian networks exploit conditional independence properties of the distribution in order to allow a compact and natural representation. • They are a specific type of probabilistic graphical model. • BNs are directed acyclic graphs (DAGs).
Probabilistic Graphical Models • Nodes are the random variables in our domain. • Edges correspond, intuitively, to direct influence of one node on another.
Probabilistic Graphical Models • Graphs are an intuitive way of representing and visualising the relationships between many variables. • A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. • Thus we can answer questions like: “Is A dependent on B given that we know the value of C?” just by looking at the graph. • Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently. Graphical models = statistics × graph theory × computer science.
Conditional Independence: Example 1 • Tail-to-tail at c: two arrows leave a common node c, as in a ← c → b. • Example: Smoking is a common cause of both Yellow Teeth and Lung Cancer; the two effects are dependent, but become conditionally independent given Smoking.
Conditional Independence: Example 2 • Head-to-tail at c: a chain a → c → b. • Example: Type of Car → Speed → Amount of Speeding Fine; the type of car and the fine are dependent, but become conditionally independent given the speed.
Conditional Independence: Example 3 • Head-to-head at c, also called a v-structure: a → c ← b. • Example: Ability of Team A → Outcome of A vs. B Game ← Ability of Team B; the two abilities are independent a priori, but become dependent once the outcome of the game is observed (explaining away).
D-separation • A, B, and C are non-intersecting subsets of nodes in a directed graph. • A path from A to B is blocked if it contains a node such that either • the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or • the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. • If all paths from A to B are blocked, A is said to be d-separated from B by C. • If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥ B | C.
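As an illustration of the definition, the following sketch enumerates the simple paths between two nodes and applies the two blocking rules above. It is a naive check meant to mirror the definition, not the linear-time reachability procedure described by Koller and Friedman; the graph encoding (node to children) is our own.

```python
# Naive d-separation test that mirrors the definition: enumerate simple
# undirected paths from a to b and check that each one is blocked by c.
# Graphs are dicts mapping each node to a list of its children.

def descendants(g, x):
    seen, stack = set(), [x]
    while stack:
        for child in g.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def is_d_separated(g, a, b, c):
    parents = {v: [u for u in g if v in g[u]] for v in g}

    def blocked(path):
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            if prev in parents[node] and nxt in parents[node]:
                # head-to-head: blocked unless node or a descendant is in c
                if node not in c and not (descendants(g, node) & c):
                    return True
            elif node in c:
                return True          # head-to-tail or tail-to-tail, observed
        return False

    def paths(node, visited):
        if node == b:
            yield visited
            return
        for nbr in set(g[node]) | set(parents[node]):
            if nbr not in visited:
                yield from paths(nbr, visited + [nbr])

    return all(blocked(p) for p in paths(a, [a]))

# The v-structure A -> C <- B: A, B d-separated by {} but not by {C}.
g = {'A': ['C'], 'B': ['C'], 'C': []}
print(is_d_separated(g, 'A', 'B', set()))  # True
print(is_d_separated(g, 'A', 'B', {'C'}))  # False
```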
I-equivalence • Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (A ⊥ B | C) that hold in P. • Two graph structures G1 and G2 over X are I-equivalent if I(G1) = I(G2). • The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes.
The skeleton of a Bayesian network • The skeleton of a Bayesian network graph G over X is an undirected graph over X that contains an edge {X, Y} for every edge (X, Y) in G.
Immorality • A v-structure X → Z ← Y is an immorality if there is no direct edge between X and Y.
Relationship between immorality, skeleton and I-equivalence • Let G1 and G2 be two graphs over X. Then G1 and G2 have the same skeleton and the same set of immoralities if and only if they are I-equivalent. • We can use this theorem to recognize whether two BNs are I-equivalent, as sketched below. • In addition, this theorem can be used for learning the structure of the Bayesian network related to a distribution. • We can construct the I-equivalence class for a distribution by determining its skeleton and its immoralities from the independence properties of the given distribution. • We then use both of these components to build a representation of the equivalence class.
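A minimal sketch of this test, assuming graphs are encoded as dicts from each node to its list of children (our representation, not the slides'):

```python
# I-equivalence test via the theorem: same skeleton + same immoralities.
from itertools import combinations

def skeleton(g):
    return {frozenset((u, v)) for u in g for v in g[u]}

def immoralities(g):
    parents = {v: {u for u in g if v in g[u]} for v in g}
    skel = skeleton(g)
    return {(x, z, y)
            for z, pa in parents.items()
            for x, y in combinations(sorted(pa), 2)
            if frozenset((x, y)) not in skel}   # x -> z <- y, x-y missing

def i_equivalent(g1, g2):
    return (skeleton(g1) == skeleton(g2)
            and immoralities(g1) == immoralities(g2))

# X -> Y -> Z and X <- Y <- Z are I-equivalent; X -> Y <- Z is not.
chain1 = {'X': ['Y'], 'Y': ['Z'], 'Z': []}
chain2 = {'X': [], 'Y': ['X'], 'Z': ['Y']}
collider = {'X': ['Y'], 'Y': [], 'Z': ['Y']}
print(i_equivalent(chain1, chain2))    # True
print(i_equivalent(chain1, collider))  # False
```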
Identifying the Undirected Skeleton • The basic idea is to use independence queries of the form (X ⊥ Y | U) for different sets of variables U. • If X and Y are adjacent in G, we cannot separate them with any set of variables. • Conversely, if X and Y are not adjacent in G, we would hope to be able to find a set of variables that makes these two variables conditionally independent: we call this set a witness of their independence.
Identifying the Undirected Skeleton • Let G be an I-map of a distribution P, and let X and Y be two variables that are not adjacent in G. Then either (X ⊥ Y | Pa(X)) ∈ I(P) or (X ⊥ Y | Pa(Y)) ∈ I(P). • Thus, if X and Y are not adjacent in G, then we can find a witness of bounded size. • In particular, if we assume that G has bounded indegree, say less than or equal to d, then we do not need to consider witness sets larger than d.
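A sketch of the resulting skeleton-recovery loop. The independence oracle indep(x, y, u) is a hypothetical stand-in (in practice, a statistical test on data), and d bounds the witness size as above:

```python
# Skeleton recovery with bounded witness sets. indep(x, y, u) is a
# hypothetical independence oracle; d bounds the indegree and hence
# the size of the witness sets we need to search.
from itertools import combinations

def build_skeleton(variables, indep, d):
    """Return the undirected edges and, for each non-adjacent pair,
    the witness set found (needed later to identify immoralities)."""
    edges, witnesses = set(), {}
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        found = None
        for size in range(d + 1):
            for u in combinations(rest, size):
                if indep(x, y, set(u)):
                    found = set(u)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((x, y)))     # inseparable: keep the edge
        else:
            witnesses[frozenset((x, y))] = found
    return edges, witnesses
```

With a perfect oracle (for instance, d-separation in the true graph) this recovers the true skeleton; with statistical tests it inherits their errors.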
Identifying Immoralities • At this stage we have reconstructed the undirected skeleton. Now, we want to reconstruct edge directions. • Our goal is to consider potential immoralities in the skeleton and for each one determine whether it is indeed an immorality. • A triplet of variables X, Z, Y is a potential immorality if the skeleton contains the edges X-Z and Z-Y but does not contain an edge between X and Y. • A potential immorality is an immorality if and only if Z is not in the witness set(s) for X and Y.
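Continuing the sketch, a Mark-Immoralities pass over the recovered skeleton uses the stored witnesses; the encoding matches build_skeleton above and is likewise our own:

```python
# Mark-Immoralities over the output of build_skeleton above: orient
# X - Z - Y as X -> Z <- Y exactly when Z is absent from the witness
# set recorded for the non-adjacent pair (X, Y).
from itertools import combinations

def mark_immoralities(variables, edges, witnesses):
    directed = set()
    for z in variables:
        neighbors = [v for v in variables if frozenset((v, z)) in edges]
        for x, y in combinations(neighbors, 2):
            pair = frozenset((x, y))
            if pair not in edges and z not in witnesses[pair]:
                directed.add((x, z))         # x -> z
                directed.add((y, z))         # y -> z
    return directed
```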
Representing Equivalence Classes • An acyclic graph containing both directed and undirected edges is called a partially directed acyclic graph, or PDAG.
Representing Equivalence Classes • Let G be a DAG. A chain graph K is the class PDAG of the equivalence class of G if K shares the same skeleton as G, and K contains a directed edge X → Y if and only if all DAGs G′ that are I-equivalent to G contain the edge X → Y. • If the edge X → Y is directed in K, then all the members of the equivalence class agree on the orientation of the edge. • If the edge is undirected in K, there are two DAGs in the equivalence class that disagree on the orientation of the edge.
Representing Equivalence Classes • Is the output of Mark-Immoralities the class PDAG? • Clearly, edges involved in immoralities must be directed in K. • The obvious question is whether K can contain directed edges that are not involved in immoralities. • In other words, can there be additional edges whose direction is necessarily the same in every member of the equivalence class?
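In Koller and Friedman's treatment the answer is yes: some directions are forced even outside immoralities. A small example (our own encoding): in the skeleton X-Z, Y-Z, Z-W with the single immorality X → Z ← Y, orienting W → Z would create the new immorality (X, Z, W), so every I-equivalent DAG orients Z → W. The rule sketched below, one of Meek's orientation rules, propagates exactly this kind of forced direction.

```python
# One of Meek's orientation rules: if a -> b and b - c with a, c
# non-adjacent, then b - c must be oriented b -> c (otherwise a new
# immorality a -> b <- c would appear). Applied to the example above.

def propagate(directed, undirected):
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for e in list(undirected):
                if b not in e or e not in undirected:
                    continue
                c = next(v for v in e if v != b)
                adjacent = ((a, c) in directed or (c, a) in directed
                            or frozenset((a, c)) in undirected)
                if c != a and not adjacent:
                    undirected.remove(e)
                    directed.add((b, c))     # forced: b -> c
                    changed = True
    return directed, undirected

directed = {('X', 'Z'), ('Y', 'Z')}          # immorality X -> Z <- Y
undirected = {frozenset(('Z', 'W'))}
print(propagate(directed, undirected))       # Z -> W becomes directed
```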
References • D. Koller and N. Friedman: Probabilistic Graphical Models. MIT Press, 2009. • C. M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.