Here we start with a presentation of a wonderful graphical model called Bayesian Networks, ask ourselves what a multi-agent distributed computation setting would look like for these, and see where it brings us.
On Distributing Probabilistic Inference Thor Whalen, Data Scientist, December 2005 https://www.linkedin.com/in/thorwhalen/
Outline • Inference and distributed inference • Conditional independence (CI) • Graphical models, CI, and separability • Using CI: Sufficient information
Outline • Inference and distributed inference • Probabilistic inference • Distributed probabilistic inference • Use of distributed probabilistic inference • Conditional independence (CI) • Graphical models, CI, and separability • Using CI: Sufficient information
Probabilistic Inference • “(Probabilistic) inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand.” (research.microsoft.com) • Let V = {X1, ..., Xn} be the set of variables of interest • We are given a (prior) probability space P(V) on V • Bayesian inference: given some evidence e, we compute the posterior probability space P' = P(V|e)
Probabilistic Inference • A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V • Such a table is commonly called a “potential” • A receives evidence e: we have P(e|A) and want P(A,B,C|e) • Assuming evidence e depends only on variable A, we have P(A,B,C|e) = P(A,B,C) P(e|A) / P(e), where the normalizing constant P(e) is the sum of the products P(A,B,C) P(e|A) over all states:

A   B   C   P(A,B,C)   P(e|A)   P(A,B,C)P(e|A)   P(A,B,C|e)
a1  b1  c1  0.0120     0.3000   0.0036           0.0046
a1  b1  c2  0.0080     0.3000   0.0024           0.0031
a1  b2  c1  0.0540     0.3000   0.0162           0.0208
a1  b2  c2  0.1260     0.3000   0.0378           0.0485
a2  b1  c1  0.2880     0.9000   0.2592           0.3323
a2  b1  c2  0.1920     0.9000   0.1728           0.2215
a2  b2  c1  0.0960     0.9000   0.0864           0.1108
a2  b2  c2  0.2240     0.9000   0.2016           0.2585
Sum         1.0000              0.7800 = P(e)
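The update on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the rows of the potential are ordered with A varying slowest and C fastest (consistent with the blocks of P(e|A) above) and coding the two states of each variable as 0 and 1; the function name `posterior` is hypothetical.

```python
from itertools import product

# Potential P(A,B,C) from the slide; rows ordered with A slowest and C fastest.
# States are coded 0 and 1 (an assumption about the original labels).
states = list(product((0, 1), repeat=3))  # tuples (a, b, c)
joint = dict(zip(states, [0.0120, 0.0080, 0.0540, 0.1260,
                          0.2880, 0.1920, 0.0960, 0.2240]))
likelihood_a = {0: 0.3, 1: 0.9}  # P(e | A) for the two states of A


def posterior(joint, likelihood_a):
    """Return P(e) and P(A,B,C | e) for evidence e that depends only on A."""
    unnormalized = {s: p * likelihood_a[s[0]] for s, p in joint.items()}
    p_e = sum(unnormalized.values())  # P(e) = sum of P(A,B,C) P(e|A)
    return p_e, {s: p / p_e for s, p in unnormalized.items()}


p_e, post = posterior(joint, likelihood_a)
print(f"P(e) = {p_e:.4f}")
for s in states:
    print(s, f"{post[s]:.4f}")
```

The division by P(e) is what turns the product column of the table into a proper probability distribution.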
Distributed Probabilistic Inference • Several components, each inferring on a given subspace of the same probability space • Two components may communicate information about variables they have in common • We wish to be able to fuse evidence received throughout the system (figure: components over A, B and over B, C sharing B, with evidence entering at different components)
Use of Distributed Probabilistic Inference • Be able to implement probabilistic inference • Implement multi-agent systems: • agents sense their environment and take actions intelligently • they can observe given variables of the probability space • they need to infer on other variables in order to take action • they cooperate by exchanging information about the probability space
Conditional independence • Marginalize out C from P(A,B,C) to get P(A,B) (e.g. 0.0120 + 0.0080 = 0.0200 for the first entry) • P(C|A,B) = P(A,B,C)/P(A,B) • P(C|A,B) = P(C|B), i.e. P(C|A,B) is insensitive to A: its eight entries take only the four values of P(C|B), namely 0.6 and 0.4 for one state of B, 0.3 and 0.7 for the other • Wait a minute Dr. Watson! P(C|A,B) is a table of size 8, whereas P(C|B) is of size 4! This is entropicaly impossible! Why you little!! Eh, but I see only four distinct numbers here, doc... ... and “entropicaly” is not a word! • We say that A and C are conditionally independent given B
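This conditional independence can be checked mechanically on the same potential. The sketch below again assumes rows ordered with C varying fastest and states coded 0/1; the helper name `marginalize` is illustrative only.

```python
from itertools import product

# Same potential P(A,B,C): A varies slowest, C fastest, states coded 0/1.
states = list(product((0, 1), repeat=3))
joint = dict(zip(states, [0.0120, 0.0080, 0.0540, 0.1260,
                          0.2880, 0.1920, 0.0960, 0.2240]))


def marginalize(table, keep):
    """Sum a table keyed by state tuples down to the positions listed in keep."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


p_ab = marginalize(joint, (0, 1))  # P(A,B): C summed out
p_bc = marginalize(joint, (1, 2))  # P(B,C): A summed out
p_b = marginalize(joint, (1,))     # P(B)

p_c_given_ab = {(a, b, c): joint[(a, b, c)] / p_ab[(a, b)] for a, b, c in states}
p_c_given_b = {(b, c): p_bc[(b, c)] / p_b[(b,)] for b, c in p_bc}

# A and C are independent given B iff P(C|A,B) = P(C|B) for every state.
independent = all(abs(p_c_given_ab[(a, b, c)] - p_c_given_b[(b, c)]) < 1e-9
                  for a, b, c in states)
print(independent)
print({k: round(v, 4) for k, v in p_c_given_b.items()})
```

A small tolerance is used because the ratios are computed in floating point.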
Outline • Inference and distributed inference • Conditional independence (CI) • Graphical models, CI, and separability • Bayes nets • Markov networks • Using CI: Sufficient information
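Bayes Nets • Example DAG on A, B, C, D, E with edges A → C, B → D, C → D and C → E. By the chain rule (which holds for any distribution), and then by the independencies encoded in this DAG, note that
$$
\begin{aligned}
P(A,B,C,D,E) &= P(A)\,P(B \mid A)\,P(C \mid A,B)\,P(D \mid A,B,C)\,P(E \mid A,B,C,D) \\
&= P(A)\,P(B)\,P(C \mid A)\,P(D \mid B,C)\,P(E \mid C)
\end{aligned}
$$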
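To make the factorization concrete, the sketch below builds the joint table of the five variables from CPDs. All variables are assumed binary and the CPD values are invented for illustration. The factorized form needs only 1 + 1 + 2 + 4 + 2 = 10 numbers, against the 31 free entries of a full table on five binary variables.

```python
from itertools import product

# Hypothetical CPDs for binary variables, each given as P(X = 1 | parents).
p_a1 = 0.3
p_b1 = 0.6
p_c1_given_a = {0: 0.2, 1: 0.7}
p_d1_given_bc = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}
p_e1_given_c = {0: 0.25, 1: 0.8}


def bern(p1, x):
    """P(X = x) for a binary variable with P(X = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1


def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A) P(D|B,C) P(E|C)."""
    return (bern(p_a1, a) * bern(p_b1, b) * bern(p_c1_given_a[a], c)
            * bern(p_d1_given_bc[(b, c)], d) * bern(p_e1_given_c[c], e))


table = {s: joint(*s) for s in product((0, 1), repeat=5)}
print(len(table), round(sum(table.values()), 12))
```

Because every CPD is normalized, the product is automatically a valid joint distribution, and independencies such as P(A,B) = P(A) P(B) hold by construction.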
Bayes Nets • A Bayes Net (BN) is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables • A BN consists of • A Directed Acyclic Graph (DAG) • Nodes: variables of V • Edges: causal relations • A list of conditional probability distributions (CPDs), one for every node of the DAG, each giving the distribution of the node given its parents • A DAG is a directed graph with no directed cycles (figure: a DAG on X1, ..., X13, and a graph that is not a DAG because it contains a directed cycle) • The DAG characterizes the (in)dependency structure of P(V); for example, in the chain A → B → C, A and C are independent given B, i.e. P(A,C|B) = P(A|B) P(C|B), or equivalently P(C|A,B) = P(C|B), and we say that B separates A and C • The CPDs characterize the probabilistic and/or deterministic relations between parent states and children states
Bayes Nets • The prior distributions on the variables of parentless nodes, along with the CPDs of the BN, induce prior distributions (called “beliefs” in the literature) on all the variables • If the system receives evidence on a variable: • this evidence impacts its belief, • along with the beliefs of the other variables that depend on it (figure: evidence entering at one node of the DAG on X1, ..., X13)
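The effect of evidence on beliefs can be illustrated by brute-force enumeration on the five-variable example (A → C, B → D, C → D, C → E), with invented CPDs and all variables assumed binary. Enumeration is exponential in the number of variables and is used here only for clarity; practical BN engines rely on message passing or junction trees instead. The function name `beliefs` is hypothetical.

```python
from itertools import product

# Hypothetical binary BN: A -> C, B -> D, C -> D, C -> E; values are P(X = 1 | parents).
p_a1, p_b1 = 0.3, 0.6
p_c1_given_a = {0: 0.2, 1: 0.7}
p_d1_given_bc = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}
p_e1_given_c = {0: 0.25, 1: 0.8}
NAMES = "ABCDE"


def bern(p1, x):
    return p1 if x == 1 else 1.0 - p1


def joint(a, b, c, d, e):
    return (bern(p_a1, a) * bern(p_b1, b) * bern(p_c1_given_a[a], c)
            * bern(p_d1_given_bc[(b, c)], d) * bern(p_e1_given_c[c], e))


def beliefs(evidence=None):
    """P(X = 1 | evidence) for every variable, by enumerating all states.

    evidence maps a variable name to its observed value, e.g. {"E": 1}.
    """
    evidence = evidence or {}
    totals = dict.fromkeys(NAMES, 0.0)
    norm = 0.0
    for s in product((0, 1), repeat=5):
        if any(s[NAMES.index(v)] != x for v, x in evidence.items()):
            continue  # state inconsistent with the evidence
        p = joint(*s)
        norm += p
        for i, name in enumerate(NAMES):
            if s[i] == 1:
                totals[name] += p
    return {name: totals[name] / norm for name in NAMES}


prior = beliefs()
post = beliefs({"E": 1})
for name in NAMES:
    print(name, round(prior[name], 4), round(post[name], 4))
```

In this example, evidence on E changes the belief in A but leaves the belief in B unchanged: with D unobserved, B and E are d-separated.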
Markov networks • The edges of a Markov network exhibit direct dependencies between variables • The absence of an edge means absence of direct dependency • If a set B of nodes separates the graph into several components then these components are independent given B
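Separation in an undirected graph can be tested with a breadth-first search that is never allowed to enter the separating set. A minimal sketch on a hypothetical five-node network; the function names are illustrative.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map {node: set of neighbors} from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def separates(graph, b, x, y):
    """True if every path from a node of x to a node of y goes through b."""
    blocked = set(b)
    frontier = deque(n for n in x if n not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in y:
            return False
        for nbr in graph[node]:
            if nbr not in blocked and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return True


# Hypothetical Markov network: A - B - C - D and A - E - D.
g = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")])
print(separates(g, {"B", "E"}, {"A"}, {"D"}))
print(separates(g, {"B"}, {"A"}, {"D"}))
```

Here {B, E} separates A from D, while {B} alone does not, since the path A - E - D avoids it.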
Outline • Inference and distributed inference • Conditional independence (CI) • Graphical models, CI, and separability • Using CI: Sufficient information • Specifications for a distributed inference system • A naive solution • Using separation
Using CI: Sufficient information • Specifications: • A probability space (figure: a network on variables A to M) • A number of agents, each having • query variables • evidence variables • What variables must agent 1 represent so that it may fuse the evidence received by other agents? (figure: agents 1 to 4 with their query and evidence variables)
Using CI: Sufficient information • Specifications: • A probability space • A number of agents, each having • query variables • evidence variables • A naïve solution: • Agents contain their own query and evidence variables • In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M (figure: the groups {E, F, G}, {H, I, J} and {K, L, M} of agents 2, 3 and 4, all copied into agent 1 alongside A and B)
Using CI: Sufficient information • Agent 1 must represent many variables! How else could the other agents communicate their evidence? • Note that Z separates the variables X = {A, B} of agent 1 from Y, whether Y is {K, L, M}, {H, I, J}, or {E, F, G}
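Using CI: Sufficient information • Z separates X and Y, so $P(Y \mid X,Z) = P(Y \mid Z)$ • Hence, for evidence $e_Y$ that depends only on Y, the likelihood given X and Z of the evidence on Y equals its likelihood given Z: $P(e_Y \mid X,Z) = P(e_Y \mid Z)$ • It is therefore sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X • Example (network on A to G): agent 2 holds $P(Y,Z)$ with $Y = \{E,F,G\}$ and $Z = \{C,D\}$; it receives evidence $e_Y$, computes $P(Y,Z \mid e_Y)$, sums out Y and sends $P(Z \mid e_Y)$. Agent 1 holds $P(X,Z)$ with $X = \{A,B\}$ and updates it as $P(X,Z \mid e_Y) = P(X,Z)\,P(Z)^{-1}\,P(Z \mid e_Y)$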
Using CI: Sufficient information • Agent 1's update $P(X,Z)\,P(Z)^{-1}\,P(Z \mid e_Y) = P(X,Z \mid e_Y)$ is correct because
$$
\begin{aligned}
P(X,Z)\,P(Z)^{-1}\,P(Z \mid e_Y) &= P(X,Z)\,P(Z)^{-1}\,P(Z,e_Y)\,P(e_Y)^{-1} \\
&= P(X,Z)\,P(e_Y \mid Z)\,P(e_Y)^{-1} \\
&= P(X,Z)\,P(e_Y \mid X,Z)\,P(e_Y)^{-1} \\
&= P(X,Z,e_Y)\,P(e_Y)^{-1} \\
&= P(X,Z \mid e_Y)
\end{aligned}
$$
where the third equality uses $P(e_Y \mid Z) = P(e_Y \mid X,Z)$, which holds because Z separates X and Y.
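The derivation can be checked numerically. The sketch below uses single binary variables X, Z and Y in a hypothetical chain X → Z → Y (so Z separates X and Y) in place of the sets {A,B}, {C,D} and {E,F,G}; the probabilities and the likelihood P(e_Y|Y) are invented. Agent 2 sends P(Z|e_Y), agent 1 applies P(X,Z) P(Z)^{-1} P(Z|e_Y), and the result is compared with conditioning the full joint directly.

```python
from itertools import product

# Hypothetical chain X -> Z -> Y of binary variables, so Z separates X and Y.
p_x = {0: 0.4, 1: 0.6}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}
lik_y = {0: 0.1, 1: 0.95}  # P(e_Y | Y): the evidence depends only on Y

full = {(x, z, y): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
        for x, z, y in product((0, 1), repeat=3)}


def marginal(table, keep):
    """Sum out every position of the key tuple that is not listed in keep."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def normalize(table):
    total = sum(table.values())
    return {k: v / total for k, v in table.items()}


# Agent 2 holds P(Z,Y) (keys (z, y)), absorbs e_Y and sends P(Z | e_Y).
agent2 = marginal(full, (1, 2))
agent2_post = normalize({(z, y): p * lik_y[y] for (z, y), p in agent2.items()})
message = marginal(agent2_post, (0,))

# Agent 1 holds P(X,Z) (keys (x, z)) and computes P(X,Z) P(Z)^-1 P(Z | e_Y).
agent1 = marginal(full, (0, 1))
prior_z = marginal(agent1, (1,))
agent1_post = {(x, z): p * message[(z,)] / prior_z[(z,)]
               for (x, z), p in agent1.items()}

# Reference: condition the full joint on e_Y, then sum out Y.
reference = marginal(normalize({k: p * lik_y[k[2]] for k, p in full.items()}), (0, 1))
print(all(abs(agent1_post[k] - reference[k]) < 1e-12 for k in reference))
```

Agent 1 never sees Y or the likelihood on Y; the two numbers of the message P(Z|e_Y) are all it needs.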
Using CI: Sufficient information • Specifications: • A Bayes net • A number of agents, each having • query variables • evidence variables • A naïve solution: • Agents contain their own query and evidence variables • In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M • Using separation: • Agent 1 only needs to represent two extra variables, C and D • Agent 1 may compute its posterior on its query variables faster from C and D than from E, F, G, H, I, J, K, L and M • Communication lines need to transmit two variables instead of three (figure: the naïve and the separation-based configurations of the agents)
Using CI: Sufficient information (figure legend: query variable, evidence variable, other variable)
MATLAB Tool • Main functionality: • Insert evidence into given agents and propagate its impact inside the subnet • Initiate communication between agents, followed by the propagation of new information • View the marginal distributions of the different agents at every step • Step forward and backward • Save eye-friendly logs to a file • Command menu: i + [Enter] to initialize, v + [Enter] to view all variables (even those containing no information), e + [Enter] to enter evidence, c + [Enter] to perform an inter-agent communication, p + [Enter] to go to the previous step, n + [Enter] to go to the next step, a + [Enter] to add a sensor, r + [Enter] to remove a sensor, t + [Enter] to turn true marginals view ON, m + [Enter] to turn discrepancy marking OFF, s + [Enter] to save to a file, q + [Enter] to quit. Enter Command:
MATLAB Tool: Display • Indicates the step number and the last action that was taken • Shows the marginal distributions that would have been obtained by inferring on the entire Bayes Net • Shows the marginal distributions of the variables represented in each subnet • Prompts for a new action
Communication Graph Considerations • In this communication graph (figure: agents 1 to 6), agent 6 receives information from agent 1 through both agent 4 and agent 5. How should subnet 6 deal with this possible redundancy? • One (often adopted) solution is to impose a tree structure on the communication graph
Communication Graph Considerations • When choosing the communication graph, one should take into consideration • the quality of the possible communication lines • the processing speed of the agents • the importance of given queries • (figure) If this is the key decision-making agent, then this communication graph is more appropriate than this one
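The slides do not say how to pick the tree. One hypothetical way, shown only as a sketch, is to score each possible line by its quality and keep a maximum-weight spanning tree using Kruskal's algorithm. This is a standard technique swapped in here, not the author's method, and it ignores agent speed and query importance. The six agents, their lines and the weights below are invented and are not the graph in the figure.

```python
def max_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm on edges taken in decreasing order of weight.

    weighted_edges: iterable of (weight, u, v). Returns the kept (u, v, weight).
    """
    parent = {n: n for n in nodes}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the line joins two separate parts: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


agents = [1, 2, 3, 4, 5, 6]
lines = [(0.9, 1, 4), (0.6, 1, 5), (0.8, 4, 6), (0.7, 5, 6),
         (0.5, 1, 2), (0.4, 2, 3), (0.3, 3, 5)]
tree = max_spanning_tree(agents, lines)
print(tree)
```

With these weights the line 1 - 5 is dropped, so agent 6 hears from agent 1 only through agent 4 and the redundancy disappears.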
Problem Specification Given: • A prob. space on V={X1, ..., Xn} • A number of agents, each having: • Qi: a set of query variables • Ei: a set of evidence variables
Problem Specification Given: • A BN on V={X1, ..., Xn} • A number of agents, each having: • Qi: a set of query variables • Ei: a set of evidence variables Determine: • An agent communication graph • A subset Si of V for each agent • An inference protocol that specifies • How to fuse evidence and messages received from other agents • The content of messages between agents
Distributed Inference Design • A communication graph: • Nodes represent agents • Edges represent communication lines • Each agent i has: • Qi: a set of query variables • Ei: a set of evidence variables • Pi(Si): a probability space on a subset Si of V • An inference protocol. This includes a specification of • What to do with received evidence or messages? • What messages must be sent to other agents?
Distributed Bayesian Inference Problem • Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme such that, for any evidence e = e1, ..., ek (where ei is the evidence received by agent i), the agents can compute the correct posterior on their query variables • That is, for all i, Pi (the probability space of agent i) must become consistent with P on its query variables: agent i must be able to compute, for every query variable Q in Qi, the probability Pi(Q|e) = P(Q|e)
More Definitions • Let X, Y and Z be subsets of V: • If P is a probability space on V, $I(X,Y \mid Z)_P$ is the statement “X is independent of Y given Z,” i.e. $P(X \mid Y,Z) = P(X \mid Z)$ • If D is a DAG, $I(X,Y \mid Z)_D$ is the statement “X and Y are d-separated by Z” • If G is an undirected graph, $I(X,Y \mid Z)_G$ is the statement “X and Y are separated by Z” (removing Z disconnects X from Y) • Theorem: if D is a Bayes Net for P and G is the moral graph (parents of each node linked, edge directions dropped) of the ancestor hull of $X \cup Y \cup Z$, then $I(X,Y \mid Z)_G \Leftrightarrow I(X,Y \mid Z)_D \Rightarrow I(X,Y \mid Z)_P$
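The theorem yields a simple d-separation test: collect the ancestor hull of X ∪ Y ∪ Z, moralize it, and check undirected separation by Z. Below is a sketch applied to the five-variable DAG of the earlier Bayes net example (A → C, B → D, C → D, C → E); the function name `d_separated` is hypothetical.

```python
from collections import deque


def d_separated(parents, x, y, z):
    """Test I(X,Y|Z)_D via separation in the moral graph of the ancestor hull.

    parents maps each node of the DAG to the set of its parents.
    """
    # 1. Ancestor hull of X u Y u Z.
    anc, stack = set(), list(set(x) | set(y) | set(z))
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    # 2. Moralize: link each node to its parents, marry co-parents, drop directions.
    nbrs = {n: set() for n in anc}
    for n in anc:
        ps = list(parents[n])
        for p in ps:
            nbrs[n].add(p)
            nbrs[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Breadth-first search from X that may not enter Z.
    blocked = set(z)
    frontier = deque(n for n in x if n not in blocked)
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in y:
            return False
        for m in nbrs[n] - blocked - seen:
            seen.add(m)
            frontier.append(m)
    return True


# DAG with edges A -> C, B -> D, C -> D, C -> E.
dag = {"A": set(), "B": set(), "C": {"A"}, "D": {"B", "C"}, "E": {"C"}}
print(d_separated(dag, {"A"}, {"B"}, set()))
print(d_separated(dag, {"A"}, {"B"}, {"D"}))  # conditioning on the collider D
print(d_separated(dag, {"A"}, {"E"}, {"C"}))
```

Conditioning on the collider D brings D into the ancestor hull, so moralization marries its parents B and C and creates the path A - C - B; that is why A and B become dependent given D.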
Use of Conditional Independence in the Distributed Inference Problem • What should S1 send to S2 so that Q2 may “feel” the effect of the evidence received by S1 on E1? • We want S2 to be able to update its probability space so that P2(Q2|e1) = P(Q2|e1) • Claim: if I(E1, Q2 | A,B)_P, then P1(A,B|e1) is sufficient information for S2 to update its probability space • “Proof”: P(Q2|E1,A,B) = P(Q2|A,B), so the evidence on E1 reaches Q2 only through A and B (figure: subnet S1 containing E1 and subnet S2 containing Q2, both containing A and B)
Distributed Bayesian Inference: Inference Protocol • A message between two subnets is a joint distribution on a subset of their common variables, computed from the probability space of the sender • Each subnet remembers the last message that each neighboring subnet sent to it • On receiving a new message, a subnet divides it by the old one and absorbs the result into its probability space (i.e. multiplies its potential by this ratio)
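A minimal sketch of this protocol for two agents sharing Z, under assumptions not stated in the slides: all variables are binary, messages are marginals on the shared variables, 0/0 is taken as 0, and each agent keeps one stored table per communication line that is overwritten by every message sent *or* received on that line (the Hugin-style convention). If only received messages were stored, an agent would count its own earlier evidence twice when the neighbor's reply includes it. The class and function names are hypothetical, and the full joint `world` is used only to initialize the agents and to check the result.

```python
from itertools import product


class Subnet:
    """An agent holding a probability table over its own (binary) variables."""

    def __init__(self, name, variables, table):
        self.name = name
        self.vars = tuple(variables)
        self.table = dict(table)  # state tuple (ordered as self.vars) -> probability
        self.last = {}  # neighbor name -> last message exchanged on that line

    def marginal(self, sep):
        idx = [self.vars.index(v) for v in sep]
        out = {}
        for key, p in self.table.items():
            sub = tuple(key[i] for i in idx)
            out[sub] = out.get(sub, 0.0) + p
        return out

    def _normalize(self):
        total = sum(self.table.values())
        self.table = {k: p / total for k, p in self.table.items()}

    def enter_evidence(self, var, likelihood):
        """Absorb evidence that depends only on var, given as P(e | var)."""
        i = self.vars.index(var)
        self.table = {k: p * likelihood[k[i]] for k, p in self.table.items()}
        self._normalize()

    def send(self, other, sep):
        """Send this agent's current marginal on the shared variables sep."""
        msg = self.marginal(sep)
        self.last[other.name] = msg
        other.receive(self.name, sep, msg)

    def receive(self, sender, sep, msg):
        """Divide the new message by the stored one and absorb the ratio."""
        old = self.last[sender]
        ratio = {k: (msg[k] / old[k] if old[k] > 0 else 0.0) for k in msg}
        idx = [self.vars.index(v) for v in sep]
        self.table = {k: p * ratio[tuple(k[i] for i in idx)]
                      for k, p in self.table.items()}
        self._normalize()
        self.last[sender] = msg


def link(a, b, sep):
    """Open a communication line; both sides start from their prior on sep."""
    a.last[b.name] = a.marginal(sep)
    b.last[a.name] = b.marginal(sep)


# Hypothetical chain X -> Z -> Y with invented numbers.
cpd_z = {0: (0.9, 0.1), 1: (0.3, 0.7)}   # P(Z | X)
cpd_y = {0: (0.8, 0.2), 1: (0.25, 0.75)}  # P(Y | Z)
full = {(x, z, y): (0.4, 0.6)[x] * cpd_z[x][z] * cpd_y[z][y]
        for x, z, y in product((0, 1), repeat=3)}
world = Subnet("world", "XZY", full)

agent1 = Subnet("agent1", "XZ", world.marginal("XZ"))
agent2 = Subnet("agent2", "ZY", world.marginal("ZY"))
link(agent1, agent2, "Z")

# Evidence arrives at alternating agents; each agent then messages the other.
evidence = [(agent2, "Y", (0.1, 0.95)), (agent1, "X", (0.7, 0.2)),
            (agent2, "Y", (0.5, 0.9))]
for agent, var, lik in evidence:
    agent.enter_evidence(var, lik)
    world.enter_evidence(var, lik)  # reference: inference on the whole space
    agent.send(agent2 if agent is agent1 else agent1, "Z")

for agent in (agent1, agent2):
    exact = world.marginal(agent.vars)
    print(agent.name, all(abs(agent.table[k] - exact[k]) < 1e-12 for k in exact))
```

After the three rounds, both agents agree with inference on the whole space, even though agent 2 received evidence twice and its second message had to be divided by the message agent 1 had sent in between.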
Communication Graph Considerations • In a tree communication graph, every edge is the only communication line between two parts of the network • Hence it must deliver enough information for the evidence received in one part to convey its impact to the query variables of the other part • We restrict ourselves to the case where every node represented by an agent can be queried or receive evidence • In this case it is sufficient that the set of variables Z represented on any communication line separates the set X of variables on one side of the network from the set Y of variables on the other side (figure: Z between X and Y)