Learn about Bayesian networks, probabilistic reasoning, exact inference, variable elimination, and compact representation in AI. Study the syntax and semantics with examples for a better grasp.
Artificial Intelligence: Probabilistic reasoning. Fall 2008. Professor: Luigi Ceccaroni
Bayesian networks • A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions. • Syntax: • a set of nodes, one per variable • a directed, acyclic graph (links ≈ "directly influences") • a conditional distribution for each node given its parents: P (Xi | Parents (Xi)) • In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
Example • Topology of network encodes conditional independence assertions: • Weather is independent of the other variables. • Toothache and Catch are conditionally independent given Cavity.
Example • What is the probability of having a heart attack? • This probability depends on four variables: • Sport • Diet • Blood pressure • Smoking • Knowing the dependencies among these variables lets us build a Bayesian network.
Constructing Bayesian networks • 1. Choose an ordering of variables X1, …, Xn • 2. For i = 1 to n • add Xi to the network • select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)
This choice of parents guarantees:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1) (chain rule)
= ∏_{i=1}^{n} P(Xi | Parents(Xi)) (by construction)
Example: Bayesian network for the heart-attack problem
Structure: Sport → Blood pressure ← Diet; Blood pressure → Heart attack ← Smoking

P(Sp):
Sport   P(Sp)
yes     0.1
no      0.9

P(Di):
Diet         P(Di)
balanced     0.4
unbalanced   0.6

P(Sm):
Smoking   P(Sm)
yes       0.4
no        0.6

P(Bp | Di, Sp):
Diet         Sport   P(Bp = high)   P(Bp = normal)
balanced     yes     0.01           0.99
unbalanced   yes     0.2            0.8
balanced     no      0.25           0.75
unbalanced   no      0.7            0.3

P(Ha | Bp, Sm):
Blood pressure   Smoking   P(Ha = yes)   P(Ha = no)
high             yes       0.8           0.2
normal           yes       0.6           0.4
high             no        0.7           0.3
normal           no        0.3           0.7
Compactness • A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values. • Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p). • If each of the n variables has no more than k parents (k << n), the complete network requires O(n · 2^k) numbers.
Representation cost • The network grows linearly with n, vs. O(2^n) for the full joint distribution. • Examples: • With 10 variables and at most 3 parents: • 10 · 2^3 = 80 vs. 2^10 = 1024 • With 100 variables and at most 5 parents: • 100 · 2^5 = 3200 vs. 2^100 ≈ 10^30
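The counts above can be reproduced with a few lines of Python. This is a minimal sketch assuming all variables are Boolean; the function names `network_size` and `full_joint_size` are illustrative and do not come from the slides.

```python
def network_size(n: int, k: int) -> int:
    """Upper bound on the numbers stored in the CPTs of a network with
    n Boolean variables, each with at most k Boolean parents."""
    return n * 2 ** k


def full_joint_size(n: int) -> int:
    """Number of rows in the full joint table over n Boolean variables."""
    return 2 ** n


# The two cases quoted on the slide: (n, k) = (10, 3) and (100, 5).
for n, k in [(10, 3), (100, 5)]:
    print(f"n={n}, k={k}: network {network_size(n, k)}, "
          f"full joint {full_joint_size(n):.3e}")
```

The network bound grows linearly in n for fixed k, while the full joint table doubles with every added variable.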
Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
Example:
P(sp ∧ Di=balanced ∧ Bp=high ∧ ¬sm ∧ ¬ha) = P(sp) P(Di=balanced) P(Bp=high | sp, Di=balanced) P(¬sm) P(¬ha | Bp=high, ¬sm)
Bayesian networks – Joint distribution – Example
P(ha ∧ Bp = high ∧ sm ∧ sp ∧ Di = balanced)
= P(ha | Bp = high, sm) P(Bp = high | sp, Di = balanced) P(sm) P(sp) P(Di = balanced)
= 0.8 × 0.01 × 0.4 × 0.1 × 0.4 = 0.000128
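The product above can be checked numerically. The sketch below stores the CPT values of the heart-attack example in plain dictionaries; the layout and the names (`P_SP`, `P_BP_HIGH`, `joint`, and so on) are illustrative assumptions, not part of the original slides.

```python
# CPT values from the heart-attack example (True = yes, False = no).
P_SP = {True: 0.1, False: 0.9}                 # P(Sport)
P_DI = {"balanced": 0.4, "unbalanced": 0.6}    # P(Diet)
P_SM = {True: 0.4, False: 0.6}                 # P(Smoking)
# P(Bp = high | Diet, Sport); P(Bp = normal | ...) is the complement.
P_BP_HIGH = {
    ("balanced", True): 0.01, ("unbalanced", True): 0.2,
    ("balanced", False): 0.25, ("unbalanced", False): 0.7,
}
# P(Ha = yes | Bp, Smoking); P(Ha = no | ...) is the complement.
P_HA_YES = {
    ("high", True): 0.8, ("normal", True): 0.6,
    ("high", False): 0.7, ("normal", False): 0.3,
}


def joint(sp, di, bp, sm, ha):
    """P(Sp, Di, Bp, Sm, Ha) as the product of the local conditionals."""
    p_bp = P_BP_HIGH[(di, sp)] if bp == "high" else 1 - P_BP_HIGH[(di, sp)]
    p_ha = P_HA_YES[(bp, sm)] if ha else 1 - P_HA_YES[(bp, sm)]
    return P_SP[sp] * P_DI[di] * p_bp * P_SM[sm] * p_ha


# P(ha, Bp = high, sm, sp, Di = balanced), approximately 0.000128
print(joint(sp=True, di="balanced", bp="high", sm=True, ha=True))
```

Summing `joint` over all 32 assignments should give 1, which is a quick sanity check that the tables define a proper distribution.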
Exact inference in Bayesian networks: example • Inference by enumeration: P(X | e) = α P(X, e) = α Σ_y P(X, e, y) • Let’s calculate: P(Smoking | Heart attack = yes, Sport = no) • The full joint distribution of the network is: P(Sp, Di, Bp, Sm, Ha) = P(Sp) P(Di) P(Bp | Sp, Di) P(Sm) P(Ha | Bp, Sm) • We want to calculate: P(Sm | ha, ¬sp).
Exact inference in Bayesian networks: example
P(Sm | ha, ¬sp) = α P(Sm, ha, ¬sp)
= α Σ_{Di ∈ {b, ¬b}} Σ_{Bp ∈ {h, n}} P(Sm, ha, ¬sp, Di, Bp)
= α P(¬sp) P(Sm) Σ_{Di ∈ {b, ¬b}} P(Di) Σ_{Bp ∈ {h, n}} P(Bp | ¬sp, Di) P(ha | Bp, Sm)
= α <0.9 * 0.4 * (0.4 * (0.25 * 0.8 + 0.75 * 0.6) + 0.6 * (0.7 * 0.8 + 0.3 * 0.6)), 0.9 * 0.6 * (0.4 * (0.25 * 0.7 + 0.75 * 0.3) + 0.6 * (0.7 * 0.7 + 0.3 * 0.3))>
= α <0.253, 0.274> = <0.48, 0.52>
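The enumeration above can be sketched in Python by summing the joint distribution over the hidden variables Di and Bp and then normalizing. The dictionary layout and the names `joint` and `smoking_given` are illustrative assumptions; only the CPT values come from the example.

```python
from itertools import product

# CPT values from the heart-attack example (True = yes, False = no).
P_SP = {True: 0.1, False: 0.9}
P_DI = {"balanced": 0.4, "unbalanced": 0.6}
P_SM = {True: 0.4, False: 0.6}
P_BP_HIGH = {  # P(Bp = high | Diet, Sport)
    ("balanced", True): 0.01, ("unbalanced", True): 0.2,
    ("balanced", False): 0.25, ("unbalanced", False): 0.7,
}
P_HA_YES = {  # P(Ha = yes | Bp, Smoking)
    ("high", True): 0.8, ("normal", True): 0.6,
    ("high", False): 0.7, ("normal", False): 0.3,
}


def joint(sp, di, bp, sm, ha):
    """P(Sp, Di, Bp, Sm, Ha) as the product of the local conditionals."""
    p_bp = P_BP_HIGH[(di, sp)] if bp == "high" else 1 - P_BP_HIGH[(di, sp)]
    p_ha = P_HA_YES[(bp, sm)] if ha else 1 - P_HA_YES[(bp, sm)]
    return P_SP[sp] * P_DI[di] * p_bp * P_SM[sm] * p_ha


def smoking_given(ha, sp):
    """Return (unnormalized, normalized) distributions over Smoking."""
    unnormalized = {}
    for sm in (True, False):
        # Sum out the hidden variables Diet and Blood pressure.
        unnormalized[sm] = sum(
            joint(sp, di, bp, sm, ha)
            for di, bp in product(P_DI, ("high", "normal"))
        )
    alpha = 1.0 / sum(unnormalized.values())
    normalized = {sm: alpha * p for sm, p in unnormalized.items()}
    return unnormalized, normalized


unnormalized, posterior = smoking_given(ha=True, sp=False)
print({sm: round(p, 3) for sm, p in unnormalized.items()})  # {True: 0.253, False: 0.274}
print({sm: round(p, 2) for sm, p in posterior.items()})     # {True: 0.48, False: 0.52}
```

Every term of the double sum is recomputed for each value of Sm, which is the repetition that variable elimination avoids.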
Variable elimination algorithm • The variable elimination algorithm lets us avoid the repeated calculations of inference by enumeration. • Each variable is represented by a factor. • Intermediate results are saved so they can be reused later. • Variables that contribute only a constant factor are not computed explicitly.
Variable elimination algorithm • CALCULA-FACTOR ("compute factor") generates the factor corresponding to variable var in the expression of the joint probability distribution. • PRODUCTO-Y-SUMA ("product and sum") multiplies factors and sums over the hidden variable. • PRODUCTO ("product") multiplies a set of factors.
Variable elimination algorithm - Example
α P(¬sp) P(Sm) Σ_{Di ∈ {b, ¬b}} P(Di) Σ_{Bp ∈ {h, n}} P(Bp | ¬sp, Di) P(ha | Bp, Sm)
• Factor for variable Heart attack, P(ha | Bp, Sm), fHa(Bp, Sm):
Bp       Sm    fHa(Bp, Sm)
high     yes   0.8
high     no    0.7
normal   yes   0.6
normal   no    0.3
Variable elimination algorithm - Example • Factor for variable Blood pressure, P(Bp | ¬sp, Di), fBp(Bp, Di):
Bp       Di           fBp(Bp, Di)
high     balanced     0.25
high     unbalanced   0.7
normal   balanced     0.75
normal   unbalanced   0.3
• To combine the factors just obtained, we calculate the product fHa(Bp, Sm) × fBp(Bp, Di) = fHaBp(Bp, Sm, Di)
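Variable elimination algorithm - Example
fHaBp(Bp, Sm, Di) = fHa(Bp, Sm) × fBp(Bp, Di)
Bp       Sm    Di           fHaBp(Bp, Sm, Di)
high     yes   balanced     0.8 × 0.25 = 0.2
high     yes   unbalanced   0.8 × 0.7 = 0.56
high     no    balanced     0.7 × 0.25 = 0.175
high     no    unbalanced   0.7 × 0.7 = 0.49
normal   yes   balanced     0.6 × 0.75 = 0.45
normal   yes   unbalanced   0.6 × 0.3 = 0.18
normal   no    balanced     0.3 × 0.75 = 0.225
normal   no    unbalanced   0.3 × 0.3 = 0.09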
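Variable elimination algorithm - Example • We sum over the values of variable Bp to obtain factor fHaBp(Sm, Di):
Sm    Di           fHaBp(Sm, Di)
yes   balanced     0.2 + 0.45 = 0.65
yes   unbalanced   0.56 + 0.18 = 0.74
no    balanced     0.175 + 0.225 = 0.4
no    unbalanced   0.49 + 0.09 = 0.58
• Factor for variable Di, fDi(Di):
Di           fDi(Di)
balanced     0.4
unbalanced   0.6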
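Variable elimination algorithm - Example • fHaDiBp(Sm, Di) = fDi(Di) × fHaBp(Sm, Di):
Sm    Di           fHaDiBp(Sm, Di)
yes   balanced     0.4 × 0.65 = 0.26
yes   unbalanced   0.6 × 0.74 = 0.444
no    balanced     0.4 × 0.4 = 0.16
no    unbalanced   0.6 × 0.58 = 0.348
• We sum over the values of variable Di to obtain factor fHaDiBp(Sm):
Sm    fHaDiBp(Sm)
yes   0.26 + 0.444 = 0.704
no    0.16 + 0.348 = 0.508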
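Variable elimination algorithm - Example • Factor for variable Sm, fSm(Sm):
Sm    fSm(Sm)
yes   0.4
no    0.6
• fHaSmDiBp(Sm) = fSm(Sm) × fHaDiBp(Sm):
Sm    fHaSmDiBp(Sm)
yes   0.4 × 0.704 = 0.2816
no    0.6 × 0.508 = 0.3048
• Normalizing, we obtain: P(Sm | ha, ¬sp) = <0.48, 0.52>. The constant P(¬sp) = 0.9 multiplies both entries and is absorbed by the normalization.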
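The elimination steps of this example can be sketched in Python with factors stored as (variables, table) pairs. This is a minimal illustration under stated assumptions, not the CALCULA-FACTOR / PRODUCTO-Y-SUMA pseudocode from the slides: the helper names `multiply`, `sum_out` and `normalize`, the value labels ("h", "n", "b", "nb", "sm", "nsm") and the `DOMAIN` dictionary are all hypothetical. The evidence ha and ¬sp is already fixed in the initial factors.

```python
from itertools import product

# Values each remaining variable can take (labels are illustrative).
DOMAIN = {"Bp": ("h", "n"), "Di": ("b", "nb"), "Sm": ("sm", "nsm")}


def multiply(f, g):
    """Pointwise product of two factors given as (variables, table) pairs."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for values in product(*(DOMAIN[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        out_tab[values] = (f_tab[tuple(row[v] for v in f_vars)]
                           * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out_tab


def sum_out(var, f):
    """Sum a factor over all values of one of its variables."""
    f_vars, f_tab = f
    idx = f_vars.index(var)
    out_tab = {}
    for values, p in f_tab.items():
        key = values[:idx] + values[idx + 1:]
        out_tab[key] = out_tab.get(key, 0.0) + p
    return f_vars[:idx] + f_vars[idx + 1:], out_tab


def normalize(f):
    """Rescale a factor so that its entries sum to 1."""
    f_vars, f_tab = f
    z = sum(f_tab.values())
    return f_vars, {k: p / z for k, p in f_tab.items()}


# Initial factors with evidence ha and not-sp already applied.
f_ha = (("Bp", "Sm"), {("h", "sm"): 0.8, ("h", "nsm"): 0.7,
                       ("n", "sm"): 0.6, ("n", "nsm"): 0.3})
f_bp = (("Bp", "Di"), {("h", "b"): 0.25, ("h", "nb"): 0.7,
                       ("n", "b"): 0.75, ("n", "nb"): 0.3})
f_di = (("Di",), {("b",): 0.4, ("nb",): 0.6})
f_sm = (("Sm",), {("sm",): 0.4, ("nsm",): 0.6})

f_ha_bp = sum_out("Bp", multiply(f_ha, f_bp))        # factor over (Sm, Di)
f_ha_di_bp = sum_out("Di", multiply(f_di, f_ha_bp))  # factor over (Sm,)
posterior = normalize(multiply(f_sm, f_ha_di_bp))    # P(Sm | ha, not-sp)

print({k[0]: round(p, 2) for k, p in posterior[1].items()})  # {'sm': 0.48, 'nsm': 0.52}
```

Each intermediate factor is computed once and reused, and the constant P(¬sp) is never multiplied in because normalization would cancel it anyway.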
Summary • Bayesian networks provide a natural representation for (causally induced) conditional independence. • Topology + CPTs = compact representation of joint distribution. • Generally easy for domain experts to construct.