Learn the fundamentals of Bayesian networks, including formal definitions, advantages, reasoning under uncertainty, and the importance of conditional independence for efficient reasoning. Explore how to build and interpret Bayesian networks for effective decision-making.
BASICS + EXACT AND APPROXIMATE INFERENCE
Pedro Larrañaga, Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
Bayesian Networks: From Theory to Practice
International Black Sea University, Autumn School on Machine Learning, 3-11 October 2019, Tbilisi, Georgia
Basics of Bayesian networks
• Reasoning under uncertainty
• Conditional independence
• u-separation
• Bayesian networks: formal definition
• Building BNs
Reasoning under uncertainty: Advantages of BNs
• Explicit representation of uncertain knowledge
• Graphical and intuitive, closer to a representation of the world
• Deal with uncertainty for reasoning and decision-making
• Founded on probability theory, which provides clear semantics and a sound theoretical foundation
• Manage many variables
• Both data and experts can be used to construct the model
• Under active and extensive development
• Support the expert; do not try to replace them
Reasoning under uncertainty: Modularity
The joint probability distribution (global model) is specified via marginal and conditional distributions (local models), taking into account the conditional independence relationships among variables.
This modularity:
• Makes maintenance easy
• Reduces the number of parameters needed for the global model, so that estimation/elicitation is easier, storage requirements are lower, and reasoning (inference) is efficient
Conditional independence: The joint probability distribution
• Dealing with a joint probability distribution over n diseases D1,…,Dn and m symptoms S1,…,Sm
• Representing P(D1,…,Dn,S1,…,Sm) with all variables binary requires 2^(n+m) - 1 parameters
• E.g., with m = 30 and n = 10, we need 2^40 - 1 ≈ 10^12 parameters
That is complete dependence: intractable in practice
Conditional independence: Independence
• With mutual independence, we only specify P(X1),…,P(Xn): n parameters (linear) instead of 2^n - 1 (exponential)
• Unfortunately, mutual independence rarely holds in real domains
• Fortunately, some conditional independences usually hold. Exploit them (for representation and inference)
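The parameter counts quoted on the two slides above can be checked with a few lines of arithmetic. This is only a sketch that assumes all variables are binary; the function names are made up for the illustration.

```python
def full_joint_params(num_binary_vars: int) -> int:
    """Free parameters of an unrestricted joint over binary variables."""
    return 2 ** num_binary_vars - 1


def independent_params(num_binary_vars: int) -> int:
    """Free parameters when all binary variables are mutually independent."""
    return num_binary_vars


n_diseases, m_symptoms = 10, 30
total_vars = n_diseases + m_symptoms

print(full_joint_params(total_vars))   # on the order of 10**12
print(independent_params(total_vars))  # grows only linearly
```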
Conditional independence: Definition
• (Marginal) independence of sets of variables X and Y: P(x | y) = P(x) for all possible values x, y
• Conditional independence of X and Y given Z, for 3 disjoint sets of variables: P(x | y, z) = P(x | z) for all possible values x, y, z
• Intuitively, whenever Z = z, the information Y = y does not influence the probability of x
• Notation: X ⊥ Y | Z
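As an illustration of the definition, the following sketch checks P(x | y, z) = P(x | z) numerically on a small joint table over three binary variables. The probabilities are hypothetical and chosen only so that the joint is built as P(z) P(x | z) P(y | z); the helper name is likewise an assumption.

```python
from itertools import product

# Hypothetical local distributions; the inner key is the value of X or Y.
p_z = {0: 0.7, 1: 0.3}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}

# Joint table indexed by (x, y, z), conditionally independent by construction.
joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def is_cond_indep(table, tol=1e-12):
    """Check P(x | y, z) == P(x | z) wherever P(y, z) > 0."""
    for x, y, z in product((0, 1), repeat=3):
        p_yz = sum(table[(a, y, z)] for a in (0, 1))
        p_only_z = sum(table[(a, b, z)] for a in (0, 1) for b in (0, 1))
        p_xz = sum(table[(x, b, z)] for b in (0, 1))
        if p_yz > 0 and abs(table[(x, y, z)] / p_yz - p_xz / p_only_z) > tol:
            return False
    return True


print(is_cond_indep(joint))
```

A joint in which X simply copies Y would fail the same check, since knowing Y then changes the probability of X even when Z is fixed.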
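Further factorizing the JPD
• Chain rule: P(X1,…,Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1,…,Xn-1)
• Factorization via c.i.: each conditional factor can drop the conditioning variables of which Xi is conditionally independent, which yields the factorized joint distribution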
BNs: Informal definition
A BN has 2 components:
• Qualitative part: a directed acyclic graph (DAG)
   • Nodes = variables
   • Arcs = direct dependence relations (the absence of an arc indicates the absence of a direct dependence; there may still be indirect dependences and independences)
   • Arcs do not necessarily represent causality
• Quantitative part: a set of conditional probabilities that determine a unique JPD
BNs: nodes
[Figure: for a target node, the graph distinguishes its parents, ancestors, children, descendants, family, and the rest of the nodes]
BNs: arcs (types of independence)
A BN represents a set of independences. Distinguish:
• Basic independences: we should take care to verify them when constructing the network
• Derived independences: obtained from the basic ones by using the properties of independence relations. Check them by means of the u-separation criterion
Basic independences: Markov condition
Each Xi is c.i. of its non-descendants, given its parents Pa(Xi)
Basic independences: Example
Fever is conditionally independent of Jaundice given Malaria and Flu
Quantitative part: Factorizing the JPD
• Now, with the quantitative part of the network, specify the JPD intelligently: use the chain rule and the Markov condition
• Let X1,…,Xn be an ancestral ordering (parents appear before their children in the sequence). One always exists because the graph is a DAG
• Using that ordering in the chain rule, {Xi-1,…,X1} contains only non-descendants of Xi (among them its parents), and so we have P(Xi | Xi-1,…,X1) = P(Xi | Pa(Xi))
Quantitative part: Factorizing the JPD
Therefore, we can recover the JPD by using the following factorization: P(X1,…,Xn) = ∏i P(Xi | Pa(Xi))
MODEL CONSTRUCTION EASIER:
• Only store local distributions at each node
• Fewer parameters to assign, and assigned more naturally
• Inference is easier
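A minimal sketch of this factorization, assuming binary variables: the joint probability of a full assignment is the product of one local conditional probability per node. The three-node structure and every number in `cpt` are hypothetical, and `joint_prob` is an illustrative helper name.

```python
from itertools import product

# Hypothetical DAG: B and E are roots, A has parents B and E.
parents = {"B": (), "E": (), "A": ("B", "E")}

# cpt[var][parent_values] = P(var = 1 | parents); all numbers are made up.
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
}


def joint_prob(assignment):
    """P(x1,...,xn) as the product of the local P(xi | pa(xi))."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob


# The factorized joint is a proper distribution: it sums to 1.
total = sum(
    joint_prob(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
print(total)
```

Only 1 + 1 + 4 = 6 numbers are stored here, yet any of the 8 joint entries can be recovered on demand.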
Quantitative part: Example
Parameters per node: B: 1, E: 1, A: 4, N: 2, W: 2
With all binary variables: 31 = 2^5 - 1 probabilities for the JPD vs. 10 with the factorization in the BN
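The comparison on this slide can be reproduced by counting one free parameter per parent configuration of each binary node. The arc structure below is an assumption (the figure is not available in this transcript); it is the usual burglary-alarm layout and is consistent with the per-node counts 1, 1, 4, 2, 2.

```python
# Assumed structure: B -> A <- E, E -> N, A -> W.
parents = {"B": [], "E": [], "A": ["B", "E"], "N": ["E"], "W": ["A"]}


def bn_param_count(parent_map, cardinality=2):
    """Free parameters of the factorized model: (r - 1) * r**|Pa| per node."""
    return sum(
        (cardinality - 1) * cardinality ** len(pa) for pa in parent_map.values()
    )


def joint_param_count(num_vars, cardinality=2):
    """Free parameters of the unrestricted joint distribution."""
    return cardinality ** num_vars - 1


print(bn_param_count(parents))          # factorized model
print(joint_param_count(len(parents)))  # full joint
```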
Quantitative part: BN Alarm for monitoring ICU patients
2^37 probabilities for the JPD vs. 509 in the BN
Independences derived from u-separation: u-separation
To check whether Z u-separates X and Y:
• Obtain the minimum subgraph containing X, Y, Z and their ancestors (ancestral graph)
• Moralize the subgraph obtained (add a link between parents with children in common) and remove the direction of the arcs
• Z u-separates X and Y whenever Z is in all paths between X and Y
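The three steps above can be sketched directly on a DAG stored as a child-to-parents dictionary. This is an illustrative implementation under the assumption that X, Y and Z are disjoint sets of node names; the function names and the small example graph are not from the slides.

```python
def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def u_separated(parents, xs, ys, zs):
    """True if zs blocks every path between xs and ys in the moral ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # Step 1: ancestral graph of X, Y and Z.
    keep = ancestors_of(xs | ys | zs, parents)
    # Step 2: moralize (marry co-parents) and drop arc directions.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(pa):
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Step 3: search from X without passing through Z; reaching Y means not separated.
    seen, stack = set(), list(xs)
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        if v in ys:
            return False
        seen.add(v)
        stack.extend(adj[v] - seen)
    return True


# Hypothetical DAG: A -> C <- B, C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(u_separated(dag, {"A"}, {"B"}, set()))   # common effect not observed
print(u_separated(dag, {"A"}, {"B"}, {"C"}))   # observing C links A and B
```

Note how the common child C only enters the ancestral graph, and therefore only triggers the moral link A-B, when it is part of the query.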
Independences derived from u-separation: Example
[Figure: DAG over the nodes R, S, T, W, X, Y, Z]
Are the blue nodes u-separated by the red ones?
• W ⊥ S | {Y,Z}?
• W ⊥ T | Y?
Joining the two parts
• u-separation is defined by G; c.i. is defined by P
• u-separation Theorem [Verma and Pearl'90, Neapolitan'90]: Let P be a probability distribution of the variables in V and G = (V,E) a DAG. (G,P) satisfies the Markov condition iff, for all disjoint sets X, Y, Z of variables in V, whenever Z u-separates X and Y in G, X and Y are c.i. given Z in P
• Graph G represents all dependences of P
• Some independences of P may not be identified by u-separation in G
Definition of BN: Formal definition
Let P be a JPD over V = {X1,…,Xn}. A BN is a tuple (G,P), where G = (V,E) is a DAG such that:
• Each node of G represents a variable of V
• The Markov condition holds
• Each node has an associated local probability distribution (quantitative part) such that, taking an ancestral ordering, P(X1,…,Xn) = ∏i P(Xi | Pa(Xi))
• u-separated variables in the graph are (conditionally) independent in P
Definition of BN: A property
Set of nodes that makes X c.i. of the rest of the network: a node is c.i. of all other nodes in the BN, given its parents, children and children's parents (its Markov blanket)
Definition of BN: Example
Malaria is conditionally independent of Aches given ExoticTrip, Jaundice, Fever and Flu
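A Markov blanket is easy to read off a child-to-parents dictionary: collect the parents, the children, and the children's other parents. The arcs below are an assumption, since the figure is missing from this transcript; they were chosen to agree with the two independence statements in the Malaria example.

```python
# Assumed structure: ExoticTrip -> Malaria -> {Jaundice, Fever}, Flu -> {Fever, Aches}.
parents = {
    "ExoticTrip": [],
    "Flu": [],
    "Malaria": ["ExoticTrip"],
    "Jaundice": ["Malaria"],
    "Fever": ["Malaria", "Flu"],
    "Aches": ["Flu"],
}


def markov_blanket(node, parent_map):
    """Parents, children and children's other parents of `node`."""
    children = [c for c, pa in parent_map.items() if node in pa]
    blanket = set(parent_map.get(node, ()))
    blanket.update(children)
    for child in children:
        blanket.update(parent_map[child])
    blanket.discard(node)
    return blanket


print(sorted(markov_blanket("Malaria", parents)))
```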
What about continuous variables?
GAUSSIAN BNs:
• All variables are continuous
• All conditional distributions are (linear) Gaussians
• Together they define the JPD, a multivariate Gaussian (inference in closed form)
• Other models: MTE (mixtures of truncated exponentials), MoP (mixtures of polynomials)
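To make the "linear Gaussians define the JPD" point concrete, here is a sketch for the smallest possible case, X1 → X2, with X1 ~ N(mu1, var1) and X2 | X1 = x1 ~ N(b0 + b1·x1, var2_given_1). The parameter names and values are hypothetical; the formulas are the standard ones for a two-node linear Gaussian model.

```python
def two_node_gaussian_joint(mu1, var1, b0, b1, var2_given_1):
    """Mean vector and covariance matrix of (X1, X2) for X1 -> X2."""
    mean = (mu1, b0 + b1 * mu1)
    cov12 = b1 * var1                      # Cov(X1, X2)
    var2 = var2_given_1 + b1 ** 2 * var1   # marginal variance of X2
    cov = ((var1, cov12), (cov12, var2))
    return mean, cov


mean, cov = two_node_gaussian_joint(mu1=1.0, var1=4.0, b0=0.5, b1=2.0, var2_given_1=1.0)
print(mean)
print(cov)
```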
What about dynamic systems?
DYNAMIC BNs:
• Time slices (with identical BNs)
• Transition arcs toward the future
• Defined by a prior BN and a transition BN, which are unrolled over time
• Stationarity and first-order Markov assumptions
Building a BN
A BN can be built with an expert, from data, or with both:
• Manually, with the aid of an expert in the domain: causal mechanisms → (modeling) → causal graph → (probabilities) → Bayesian net. Build it in the causal direction: the resulting BNs are simpler and more efficient
• Learning from a database: database → algorithm → Bayesian net
• A combination (experts → structure; database → probabilities)
Building a BN: Summary
Building a BN: Example
Asia BN [Lauritzen & Spiegelhalter'88]
Textbooks (more recent)
• A. Darwiche (2009) Modeling and Reasoning with Bayesian Networks. Cambridge University Press
• U. Kjaerulff, A. Madsen (2008) Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Springer
• D. Koller, N. Friedman (2009) Probabilistic Graphical Models. The MIT Press
• T. Koski, J. Noble (2009) Bayesian Networks: An Introduction. Wiley
Inference in Bayesian networks
• Types of queries
• Exact inference: brute-force computation; variable elimination algorithm; message passing algorithm
• Approximate inference: probabilistic logic sampling
Types of queries: posterior probabilities
[Figure: BN over Burgl., Earth., Alarm, News, WCalls]
Given some evidence e (observations), answer queries about P:
• Posterior probability of a target variable(s) X: P(X | e), a vector with one entry per value of X
• Other names: probability propagation, belief updating or revision…
Types of queries: kinds of reasoning
[Figure: two copies of the BN over Burgl., Earth., Alarm, News, WCalls]
Semantically, for any kind of reasoning:
• Predictive or deductive reasoning (causal inference): predict effects, e.g. P(Symptoms | Disease). The target variable is usually a descendant of the evidence
• Diagnostic reasoning (diagnostic inference): diagnose the causes, e.g. P(Disease | Symptoms). The target variable is usually an ancestor of the evidence
Types of queries: maximum a posteriori (MAP)
[Figure: two copies of the BN over Burgl., Earth., Alarm, News, WCalls]
Most likely configurations (abductive inference): the event that best explains the evidence
• Total abduction: search over all the unobserved variables. In general, it cannot be computed component-wise, with max P(xi | e)
• Partial abduction: search over a subset of the unobserved variables (explanation set)
• Use MAP for classification: find the most likely label, given the evidence
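A tiny numeric case shows why total abduction cannot, in general, be done component-wise. The posterior table below over two unobserved binary variables is hypothetical, picked so that the most probable joint configuration differs from the tuple of individually most probable values.

```python
# Hypothetical posterior P(a, b | e) over two unobserved binary variables.
posterior = {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.3}

# Total abduction: the single most probable joint configuration.
joint_map = max(posterior, key=posterior.get)


def marginal_argmax(index):
    """Most probable value of one variable after summing out the other."""
    totals = {0: 0.0, 1: 0.0}
    for config, prob in posterior.items():
        totals[config[index]] += prob
    return max(totals, key=totals.get)


componentwise = (marginal_argmax(0), marginal_argmax(1))

print(joint_map, posterior[joint_map])
print(componentwise, posterior[componentwise])
```

Here the component-wise answer has a lower joint posterior than the true MAP configuration, which is exactly the pitfall the slide warns about.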
Exact inference [Pearl'88; Lauritzen & Spiegelhalter'88]: Brute-force computation of P(X | e)
• First, consider P(Xi), without observed evidence e. Conceptually simple but computationally complex
• Brute-force approach: for a BN with n variables, each with its P(Xj | Pa(Xj)), compute P(Xi) by summing the product ∏j P(Xj | Pa(Xj)) over all the variables other than Xi
• But this amounts to computing the JPD, which is often very inefficient and even computationally intractable
CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations)
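The brute-force approach can be sketched as plain enumeration: visit every full assignment, keep those compatible with the evidence, and accumulate the factorized joint. The network structure and all CPT values below are hypothetical stand-ins for the burglary example, and `posterior` is an illustrative helper, not an API from any library.

```python
from itertools import product

# Assumed structure: B -> A <- E, E -> N, A -> W (all variables binary).
parents = {"B": (), "E": (), "A": ("B", "E"), "N": ("E",), "W": ("A",)}

# cpt[var][parent_values] = P(var = 1 | parents); numbers are made up.
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
    "N": {(0,): 0.001, (1,): 0.8},
    "W": {(0,): 0.05, (1,): 0.9},
}


def joint_prob(assignment):
    """Product of local conditional probabilities for a full assignment."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob


def posterior(target, evidence):
    """P(target | evidence) by summing the joint over all 2**n assignments."""
    order = list(parents)
    scores = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(order)):
        assignment = dict(zip(order, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[assignment[target]] += joint_prob(assignment)
    norm = scores[0] + scores[1]
    return {value: score / norm for value, score in scores.items()}


print(posterior("B", {}))          # prior belief in a burglary
print(posterior("B", {"W": 1}))    # diagnostic query after observing W = 1
```

The loop touches all 2^5 assignments, which is fine for five nodes and hopeless for networks of realistic size.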
Exact inference: Improving brute-force
• Use the JPD factorization and the distributive law
• Brute force works with a table of 32 entries (the JPD, if the variables are binary)
Exact inference: Improving brute-force
• Arrange computations effectively, moving some additions inside the products (over X4; over X5 and X3)
• The biggest table then has 8 entries (like the tables in the BN)
Exact inference: Variable elimination algorithm (ONE target variable)
Wanted: the distribution of one target variable Xi
• Start with a list of all the functions of the problem
• Select an elimination order of all variables (except Xi)
• For each Xk in that order, if F is the set of functions that involve Xk:
   • Delete F from the list
   • Eliminate Xk: combine all the functions that contain this variable and marginalize out Xk, i.e. compute f' = Σ over Xk of the product of the functions in F
   • Add f' to the list
• Output: combination (multiplication) of all functions in the current list. Normalize
Repeat the algorithm for each target variable
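The algorithm can be sketched for binary variables with factors stored as (variables, table) pairs. This is an illustrative implementation without evidence handling; the chain A → B → C and its numbers are hypothetical, and the function names are assumptions.

```python
from itertools import product


def multiply(f, g):
    """Combine two factors into one over the union of their variables."""
    f_vars, f_table = f
    g_vars, g_table = g
    all_vars = tuple(dict.fromkeys(f_vars + g_vars))
    table = {}
    for values in product((0, 1), repeat=len(all_vars)):
        a = dict(zip(all_vars, values))
        table[values] = (f_table[tuple(a[v] for v in f_vars)]
                         * g_table[tuple(a[v] for v in g_vars)])
    return all_vars, table


def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    f_vars, f_table = f
    keep = tuple(v for v in f_vars if v != var)
    table = {}
    for values, prob in f_table.items():
        key = tuple(x for v, x in zip(f_vars, values) if v != var)
        table[key] = table.get(key, 0.0) + prob
    return keep, table


def variable_elimination(factors, order):
    """Eliminate the variables in `order`; return the normalized remaining factor."""
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]
        if not involved:
            continue
        factors = [f for f in factors if var not in f[0]]  # delete F from the list
        combined = involved[0]
        for f in involved[1:]:
            combined = multiply(combined, f)
        factors.append(sum_out(combined, var))              # add f' to the list
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    norm = sum(result[1].values())
    return result[0], {k: v / norm for k, v in result[1].items()}


# Hypothetical chain A -> B -> C; target variable C.
f_a = (("A",), {(0,): 0.6, (1,): 0.4})
f_b = (("A", "B"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7})
f_c = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

scope, p_c = variable_elimination([f_a, f_b, f_c], order=["A", "B"])
print(scope, p_c)
```

With this ordering no intermediate factor ever mentions more than two variables, whereas brute force would build the full three-variable joint first.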
Exact inference: Example with the Asia network
Nodes: Visit to Asia (A), Smoking (S), Tuberculosis (T), Lung Cancer (L), Bronchitis (B), Tub. or Lung Canc. (E), X-Ray (X), Dyspnea (D)
Exact inference: Brute-force approach
Compute P(D) by brute force: sum the JPD over all the other variables
Complexity is exponential in the number of variables (for n variables with r states each, the JPD has r^n entries)
Exact inference: Variable elimination on the Asia network
Note: a function obtained by eliminating a variable is not necessarily a probability term
Exact inference: Variable elimination algorithm
• Local computations (due to moving the additions); in the example, the largest table has size 8
• Complexity is exponential in the maximum number of variables in the factors of the summation
• The elimination ordering matters, but finding an optimal (minimum-cost) ordering is NP-hard [Arnborg et al.'87] (heuristics are used to find good sequences)
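The effect of the elimination ordering can be seen without any probabilities by tracking only the scopes of the factors. The sketch below reports the largest number of variables that ever appear together in a combined factor; the star-shaped example (a hub H with four children) and the function name are hypothetical.

```python
def max_factor_scope(scopes, order):
    """Largest number of variables in any combined factor for this ordering."""
    scopes = [set(s) for s in scopes]
    largest = 0
    for var in order:
        involved = [s for s in scopes if var in s]
        if not involved:
            continue
        merged = set().union(*involved)   # scope of the combined factor
        largest = max(largest, len(merged))
        # Replace the involved factors by the new one with `var` summed out.
        scopes = [s for s in scopes if var not in s] + [merged - {var}]
    return largest


# Factor scopes for a hub H with children L1..L4: P(H), P(L1|H), ..., P(L4|H).
scopes = [{"H"}, {"H", "L1"}, {"H", "L2"}, {"H", "L3"}, {"H", "L4"}]

hub_first = max_factor_scope(scopes, ["H", "L1", "L2", "L3", "L4"])
leaves_first = max_factor_scope(scopes, ["L1", "L2", "L3", "L4", "H"])
print(hub_first, leaves_first)
```

For binary variables a scope of k variables means a table with 2^k entries, so eliminating the hub first costs a 32-entry table while eliminating the leaves first never needs more than 4 entries.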