Bayesian Networks – Principles and Application to Modelling water, governance and human development indicators in Developing Countries
Jorge López Puga (jpuga@ual.es)
Área de Metodología de las Ciencias del Comportamiento, Universidad de Almería
www.ual.es/personal/jpuga
February 2012
The Content of the Sections
• What is probability?
• The Bayes Theorem
  • Deduction of the theorem
  • The Balls problem
• Introduction to Bayesian Networks
  • Historical background
  • Qualitative and quantitative dimensions
  • Advantages and disadvantages of Bayes nets
  • Software
What is Probability?
• Etymology: a measure of the authority of a witness in a legal case (Europe)
• Interpretations of probability
  • Objective probability
    • A priori (aprioristic) or classical
    • Frequentist or empirical
  • Subjective probability
    • Degree of belief
Objective Probability
• Frequentist
  • Random experiment
  • Well-defined sample space
  • A posteriori (empirical) probability
  • Randomness
• Classical (Laplace, 1812–1814)
  • A priori (aprioristic)
  • Equiprobability
  • Full knowledge of the sample space
Subjective Probability
• It is simply an individual's degree of belief, which is updated in the light of experience
• Probability axioms
  • p(S) = 1, where S is the sample space
  • p(A) ≥ 0 for any event A
  • If two events are mutually exclusive (A ∩ B = Ø), then p(A ∪ B) = p(A) + p(B)
Card Game
Let me show you the idea of probability with a card game: classical vs. frequentist vs. subjective.
What is the probability of getting an ace?
• As you probably know…
What is the probability of getting an ace?
• Given that there are 52 cards and 4 aces in a French deck, we could say:
  • A priori: p(ace) = 4/52 ≈ 0.077
  • Frequentist: the proportion of aces we would observe if we repeated the draw a large number of times
  • Bayesian: the probability I subjectively assign to drawing an ace
• (See the simulation sketch below)
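As an aside not in the original slides, the following minimal Python sketch contrasts the classical value 4/52 with a frequentist estimate obtained by simulating repeated draws from the deck:

```python
import random

# Classical (a priori) value: 4 aces out of 52 cards
p_classical = 4 / 52

# Frequentist estimate: repeat the draw many times and count how often an ace appears
deck = ["ace"] * 4 + ["other"] * 48
n_draws = 100_000
hits = sum(random.choice(deck) == "ace" for _ in range(n_draws))
p_frequentist = hits / n_draws

print(f"classical:   {p_classical:.4f}")    # 0.0769
print(f"frequentist: {p_frequentist:.4f}")  # close to 0.0769 when n_draws is large
```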
What is the probability of getting an ace?
• Why is a Bayesian interpretation of probability useful? Let's play.
• Probability estimates depend on our state of knowledge (Dixon, 1964)
The Bayes Theorem
Getting Evidence and Updating Probabilities
Joint and Conditional Probability
• Joint probability (distributions of variables)
  • It represents the likelihood of two events occurring at the same time
  • It is the same as the intersection of the events
  • Notation: p(A ∩ B), p(A,B), p(AB)
  • Estimation
    • Independent events
    • Dependent events
• Independent events
  • p(A ∩ B) = p(A) × p(B), or equivalently p(B ∩ A) = p(B) × p(A)
  • Example: what is the probability of obtaining two tails (T) after tossing two coins?
    p(TT) = p(T) × p(T) = 0.5 × 0.5 = 0.25
• Dependent events
  • Conditional probability and the symbol "|"
  • p(A ∩ B) = p(A|B) × p(B), or equivalently p(B ∩ A) = p(B|A) × p(A)
  • Example: what is the probability of suffering from bronchitis (B) and being a smoker (S) at the same time?
    • p(B) = 0.25
    • p(S|B) = 0.6
    • p(S ∩ B) = p(S|B) × p(B) = 0.6 × 0.25 = 0.15
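Not part of the original slides, but a short Python sketch that reproduces both examples above:

```python
# Independent events: two fair coins
p_tail = 0.5
p_two_tails = p_tail * p_tail            # p(TT) = 0.5 * 0.5 = 0.25

# Dependent events: bronchitis (B) and smoking (S), using the slide's figures
p_B = 0.25           # p(B)
p_S_given_B = 0.6    # p(S|B)
p_S_and_B = p_S_given_B * p_B            # p(S,B) = p(S|B) * p(B) = 0.15

print(p_two_tails, p_S_and_B)            # 0.25 0.15
```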
The Bayes Theorem
• It follows from applying the definition of conditional probability to the joint probability in both directions:
  p(A ∩ B) = p(A|B) × p(B)
  p(B ∩ A) = p(B|A) × p(A)
• Since p(A ∩ B) = p(B ∩ A):
  p(A|B) × p(B) = p(B|A) × p(A)
  p(A|B) = p(B|A) × p(A) / p(B)
• Example: what is the probability that a person suffers from bronchitis (B) given that s/he smokes (S)?
  • p(B) = 0.25
  • p(S|B) = 0.6
  • p(S) = 0.40
  • p(B|S) = p(S|B) × p(B) / p(S) = 0.6 × 0.25 / 0.40 = 0.375
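Again as a sketch rather than part of the talk, the same inversion in Python:

```python
# Bayes Theorem with the slide's figures: p(B|S) = p(S|B) * p(B) / p(S)
p_B = 0.25          # prior probability of bronchitis
p_S_given_B = 0.6   # probability of smoking given bronchitis
p_S = 0.40          # overall probability of smoking

p_B_given_S = p_S_given_B * p_B / p_S
print(round(p_B_given_S, 3))             # 0.375
```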
The Total Probability Theorem
• If A = {A1, A2, A3, …, An} is a set of mutually exclusive events whose probabilities sum to one (a partition of the sample space), then the probability of an arbitrary event B equals
  p(B) = Σi p(B|Ai) × p(Ai)
• which means:
  p(B) = p(B|A1) × p(A1) + p(B|A2) × p(A2) + … + p(B|An) × p(An)
• If A = {A1, A2, A3, …, An} is a set of mutually exclusive events whose probabilities sum to one, then the Bayes Theorem becomes:
  p(Ai|B) = p(B|Ai) × p(Ai) / [p(B|A1) × p(A1) + p(B|A2) × p(A2) + … + p(B|An) × p(An)]
• Let's use a typical example to see how it works
The Balls problem
• Situation: we have three boxes (B1, B2, B3) with the following proportions of balls:

  Colour | Box 1 | Box 2 | Box 3
  White  |  30%  |  40%  |  10%
  Red    |  60%  |  30%  |  70%
  Yellow |  10%  |  30%  |  20%

• Experiment: extract a ball, look at its colour and determine from which box it was extracted
• Let's consider that the probability of selecting each box is the same: p(Bi) = 1/3
• Imagine someone gives you a white ball; what is the probability that the ball was extracted from Box 2? p(B2|W) = ?
• By definition (from the table) we know that:
  p(W|B1) = 0.3, p(W|B2) = 0.4, p(W|B3) = 0.1
• But we do not know p(W)
• But we can use the total probability theorem to obtain p(W):
  p(W) = p(W|B1) × p(B1) + p(W|B2) × p(B2) + p(W|B3) × p(B3)
       = 0.3 × 1/3 + 0.4 × 1/3 + 0.1 × 1/3 ≈ 0.267
• And then, by the Bayes Theorem:
  p(B2|W) = p(W|B2) × p(B2) / p(W) = (0.4 × 1/3) / 0.267 = 0.5
• The following table shows how the beliefs change once the white ball is observed:

  Box   | Prior | Posterior
  Box 1 | 0.333 | 0.375
  Box 2 | 0.333 | 0.500
  Box 3 | 0.333 | 0.125

• Imagine we were given a red ball instead: what would be the updated probability for each box?
• Finally, what would be the probability of each box if we were told that a yellow ball was extracted?
• But is there another way to solve this problem?
  • Yes, there is: using a Bayesian network
  • Let's use the Balls network
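The same updating can also be scripted. The sketch below (not from the slides) applies the generalized Bayes Theorem to all three colours; the white likelihoods are given on the slides, while the red and yellow rows follow the reconstructed table above:

```python
# p(colour | box), one entry per box, taken from the table of ball proportions.
# The white row is given on the slides; the red and yellow rows are assumed
# from the reconstructed table.
likelihood = {
    "white":  [0.3, 0.4, 0.1],
    "red":    [0.6, 0.3, 0.7],
    "yellow": [0.1, 0.3, 0.2],
}
prior = [1 / 3, 1 / 3, 1 / 3]            # p(box): each box is equally likely

def posterior(colour):
    """p(box_i | colour) = p(colour | box_i) * p(box_i) / p(colour)."""
    joint = [l * p for l, p in zip(likelihood[colour], prior)]
    p_colour = sum(joint)                # total probability theorem
    return [j / p_colour for j in joint]

for colour in ("white", "red", "yellow"):
    print(colour, [round(p, 3) for p in posterior(colour)])
# white -> [0.375, 0.5, 0.125], so p(B2|W) = 0.5 as computed above;
# red and yellow update the beliefs in the same way
```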
Bayesian Networks
A Brief Introduction
Brief Historical Background
• Late 1970s – early 1980s
  • Artificial intelligence
  • Machine learning and reasoning
• Expert system = knowledge base + inference engine
• Diagnostic decision trees, classification trees, flowcharts or algorithms
(Adapted from Cowell et al., 1999)
Rule-based expert systems, or production systems
• If… then rules
  • IF headache & fever THEN influenza
  • IF influenza THEN sneezing
  • IF influenza THEN weakness
• Certainty factors
  • IF headache & fever THEN influenza (certainty 0.7)
  • IF influenza THEN sneezing (certainty 0.9)
  • IF influenza THEN weakness (certainty 0.6)
(Example adapted from Cowell et al., 1999)
What is a Bayesian Network?
• There are several names for it, among others: Bayes net, belief network, causal network, influence diagram, probabilistic expert system
• "A set of related uncertainties" (Edwards, 1998)
• For Xiang (2002), it is a triad (V, G, P) where:
  • V is a set of variables
  • G is a directed acyclic graph (DAG) over V
  • P is a set of probability distributions
• To make things practical, we can distinguish:
  • A qualitative dimension (the graph)
  • A quantitative dimension (the probabilities)
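As a purely illustrative sketch (not from the slides), the triad can be written down as plain Python data structures, here using the burglar-alarm story introduced on the next slides:

```python
# V: the variables of the model
V = ["Burglary", "Earthquake", "Alarm"]

# G: a directed acyclic graph, written as a mapping from each node to its parents
G = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
}

# P: one (conditional) probability distribution per node, e.g. priors for the
# root nodes and p(Alarm | Burglary, Earthquake) for the child node; the
# numerical tables belong to the quantitative dimension discussed below.
```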
Qualitative Structure
• Graph: a set of vertices (V) and a set of links (L)
• Directed acyclic graph (DAG)
• The meaning of a connection: A → B
• The principle of conditional independence
• Three basic types of connection, and how evidence propagates through them:
  • Serial connection (causal-chain model): A → B → C
  • Diverging connection (common-cause model): A ← B → C
  • Converging connection (common-effect model): A → C ← B
A Classic Example
Mr. Holmes is working in his office when he receives a phone call from his neighbour Dr. Watson, who tells him that Holmes' burglar alarm has gone off. Convinced that a burglar has broken into his house, Holmes rushes to his car and heads for home. On his way, he listens to the radio, and in the news it is reported that there has been a small earthquake in the area. Knowing that earthquakes have a tendency to turn burglar alarms on, he returns to his work.
Quantitative Structure
• Probability as a degree of belief (Cox, 1946; Dixon, 1970)
• The Bayes Theorem
• Each variable (node) in the model is given by a conditional probability function of other variables (its parents in the graph)
• Conditional Probability Tables (CPTs)
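To make the quantitative side concrete, here is a minimal sketch (with made-up numbers, not taken from the talk) of the Holmes story as a three-node network queried by brute-force enumeration; note how the earthquake report "explains away" the burglary, which is the behaviour of a converging connection:

```python
from itertools import product

# Hypothetical CPTs: Burglary (B) and Earthquake (E) are parents of Alarm (A)
p_B = 0.01                                        # p(Burglary = true)
p_E = 0.02                                        # p(Earthquake = true)
p_A = {(True, True): 0.95, (True, False): 0.94,   # p(Alarm = true | B, E)
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """p(B=b, E=e, A=a), factorized according to the DAG."""
    pb = p_B if b else 1 - p_B
    pe = p_E if e else 1 - p_E
    pa = p_A[(b, e)] if a else 1 - p_A[(b, e)]
    return pb * pe * pa

def prob(query, evidence):
    """p(query | evidence) by brute-force enumeration over all worlds."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(b, e, a)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Watson's call: the alarm is on, so the belief in a burglary rises sharply
print(prob({"B": True}, {"A": True}))              # ~0.58 with these numbers
# The radio reports an earthquake: the burglary is "explained away"
print(prob({"B": True}, {"A": True, "E": True}))   # ~0.03 with these numbers
```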
Pros and cons of Bayes nets
• Hybrid nets
• Time series
• Software
• Qualitative – quantitative
• Missing data
• Non-parametric models
• Interaction – non-linearity
• Inference – scenarios
• Local computations
• Easy interpretation
Software
• Netica Application (Norsys Software Corp.): www.norsys.com
• Hugin (Hugin Expert A/S): www.hugin.com
• Ergo (Noetic Systems Inc.): www.noeticsystems.com
• Elvira (academic development): http://www.ia.uned.es/~elvira
• Tetrad (CMU, NASA, ONR): http://www.phil.cmu.edu/projects/tetrad/
• R
• MATLAB