Chap 6 Bayesian Learning (3)
Bayesian Belief Networks (6)
• Inference
• A Bayes net is used to infer the (probabilities of) values of one or more network variables, given observed values of the others.
• In the general case, this problem is NP-hard.
• In practice, approximate inference methods work well for some network structures.
• Monte Carlo methods randomly simulate the network to compute approximate solutions, as in the sketch below.
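As a concrete illustration of the Monte Carlo idea, here is a minimal rejection-sampling sketch on the Storm/BusTourGroup/Campfire fragment that appears later in these slides. All CPT numbers are invented for illustration; only the sampling procedure is the point.

```python
import random

# Assumed CPTs (not from the slides): two independent parents, one child.
P_STORM = 0.2                               # P(Storm = True)
P_BUS = 0.1                                 # P(BusTourGroup = True)
P_CAMPFIRE = {                              # P(Campfire = True | Storm, BusTourGroup)
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def sample_net():
    """Draw one joint sample by sampling each node given its parents."""
    storm = random.random() < P_STORM
    bus = random.random() < P_BUS
    campfire = random.random() < P_CAMPFIRE[(storm, bus)]
    return storm, bus, campfire

def estimate_storm_given_campfire(n=200_000):
    """Estimate P(Storm = True | Campfire = True): simulate the network
    repeatedly and discard every sample that contradicts the evidence."""
    kept = hits = 0
    for _ in range(n):
        storm, _, campfire = sample_net()
        if campfire:                        # keep only evidence-consistent samples
            kept += 1
            hits += storm
    return hits / kept

print(estimate_storm_given_campfire())
```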
Bayesian Belief Networks (7)
• Learning Bayesian belief nets
• The network structure might be known or unknown.
• Training examples might provide values of all network variables, or just some.
Bayesian Belief Networks (8)
• Learning methods
• Gradient ascent (hill climbing)
• Expectation-Maximization (EM) algorithm
• …
Bayesian Belief Networks (9)
• If the structure is known and all variables are observed, learning is as easy as training a Naïve Bayes classifier.
• Suppose the structure is known but the variables are only partially observable.
• This is similar to training a neural net with hidden units.
• Search for the maximum likelihood hypothesis $h_{ML} = \arg\max_h P(D|h)$.
Bayesian Belief Networks (10)
• Gradient ascent for Bayes nets
• Let $w_{ijk}$ denote the conditional probability that the network variable $Y_i$ takes on the value $y_{ij}$ given that its immediate parents $U_i$ take on the values $u_{ik}$.
• Example: $Y_i$ = Campfire, $U_i$ = <Storm, BusTourGroup>, $y_{ij}$ = True, $u_{ik}$ = <False, False>, so $w_{ijk}$ = P(Campfire = True | Storm = False, BusTourGroup = False).
Bayesian Belief Networks (11)
• Maximizing $\ln P(D|h)$ is equivalent to maximizing $P(D|h)$.
• The gradient of $\ln P(D|h)$ is given by its derivatives with respect to each of the $w_{ijk}$.
• We can prove:
$$\frac{\partial \ln P(D|h)}{\partial w_{ijk}} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Bayesian Belief Networks (12)
• Proof
• To simplify notation, write $P_h(D) = P(D|h)$.
• Assuming the training examples $d$ in the data set $D$ are drawn independently:
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \frac{\partial}{\partial w_{ijk}} \ln \prod_{d \in D} P_h(d) = \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}}$$
Bayesian Belief Networks (13)
• Proof (cont.)
• We can now introduce the values of the variables $Y_i$ and $U_i = Parents(Y_i)$ by summing over their possible values $y_{ij'}$ and $u_{ik'}$:
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}} \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'} \mid u_{ik'})\, P_h(u_{ik'})$$
Bayesian Belief Networks (14)
• Given that $w_{ijk} \equiv P_h(y_{ij} \mid u_{ik})$, the only term in this sum whose derivative is nonzero is the one with $j' = j$ and $k' = k$. Therefore
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)}\, P_h(d \mid y_{ij}, u_{ik})\, P_h(u_{ik})$$
Bayesian Belief Networks (15)
• Applying Bayes' theorem to rewrite $P_h(d \mid y_{ij}, u_{ik})$, we have
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{P_h(y_{ij} \mid u_{ik})} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Bayesian Belief Networks (16)
• Initialize the CPTs with randomly generated values.
• Perform gradient ascent by repeatedly:
• Updating all $w_{ijk}$ using the training data $D$:
$$w_{ijk} \leftarrow w_{ijk} + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
• Then renormalizing the $w_{ijk}$ to assure $0 \le w_{ijk} \le 1$ and $\sum_j w_{ijk} = 1$.
• A code sketch of one such pass follows.
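A minimal sketch of this loop, assuming the CPTs are stored as nested dicts with w[i][k][j] = P(Y_i = y_ij | U_i = u_ik). The helper posterior(i, j, k, d) is a placeholder for a Bayes-net inference routine returning P_h(y_ij, u_ik | d); computing it is itself an inference problem and is not shown.

```python
def gradient_ascent_step(w, data, posterior, eta=0.01):
    """One gradient-ascent update of every CPT entry, then renormalization.

    w[i][k][j] holds P(Y_i = y_ij | U_i = u_ik). `posterior(i, j, k, d)`
    is an assumed helper returning P_h(y_ij, u_ik | d) for example d.
    """
    for i in w:
        for k in w[i]:
            # ascend the log-likelihood gradient derived above
            for j in w[i][k]:
                grad = sum(posterior(i, j, k, d) for d in data) / w[i][k][j]
                w[i][k][j] += eta * grad
            # renormalize so each conditional distribution sums to 1
            total = sum(w[i][k].values())
            for j in w[i][k]:
                w[i][k][j] /= total
```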
Bayesian Belief Networks (17)
• When the structure is unknown…
• Algorithms use greedy search to add or subtract edges and nodes.
• Active research area:
  - Extending from Boolean to real-valued variables
  - Parameterizing the distribution instead of using tables
Content
• Review
• Minimum Description Length Principle
• Bayesian Belief Networks
• The EM Algorithm
• Summary
The EM Algorithm (1)
• Expectation-Maximization (Dempster et al., 1977)
• When to use?
• Data is only partially observable.
• Unsupervised clustering (target value unobservable)
• Supervised learning (some instance attributes unobservable)
The EM Algorithm (2)
• EM for estimating k means
• Given:
• Instances from X generated by a mixture of k Gaussian distributions with the same variance $\sigma^2$.
• The means $\langle \mu_1, \ldots, \mu_k \rangle$ of the k Gaussians are unknown.
• We don't know which instance $x_i$ was generated by which Gaussian.
• Example: k = 2.
The EM Algorithm (3)
[Figure: plot of the density P(x) against x for the mixture of two Gaussians]
The EM Algorithm (4)
• To determine:
• The maximum likelihood hypothesis $h = \langle \mu_1, \ldots, \mu_k \rangle$ that maximizes $p(D|h)$.
• Think of each full instance as $y_i = \langle x_i, z_{i1}, z_{i2} \rangle$:
• $z_{ij}$ is 1 if $x_i$ was generated by the jth Gaussian, otherwise 0.
• $x_i$ is observable; the $z_{ij}$ are unobservable.
• A code sketch of the resulting algorithm appears below.
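For concreteness, a minimal sketch of EM for this two-Gaussian case (k = 2, shared known variance $\sigma^2$), assuming 1-D data and a crude initialization at the data extremes. The E step computes the expected memberships E[z_ij]; the M step re-estimates each mean as a membership-weighted average.

```python
import math
import random

def em_two_means(x, sigma2=1.0, iters=50):
    """EM for a mixture of two 1-D Gaussians with known shared variance.
    Returns the estimated means <mu_1, mu_2>."""
    mu = [min(x), max(x)]                    # crude initialization (assumed)
    for _ in range(iters):
        # E step: expected membership E[z_ij] of each point in each Gaussian
        resp = []
        for xi in x:
            w = [math.exp(-(xi - m) ** 2 / (2 * sigma2)) for m in mu]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M step: each mean becomes a membership-weighted average of the data
        for j in range(2):
            den = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / den
    return mu

# Toy usage: two clusters centered near -2 and +3.
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(3, 1) for _ in range(200)]
print(em_two_means(data))
```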
The EM Algorithm (7)
• General statement of the EM algorithm
• Given:
• Observed data X = {x_1, …, x_m}
• Unobserved data Z = {z_1, …, z_m}
• A parameterized probability distribution P(Y|h), where:
  - Y = {y_1, …, y_m} is the full data, with $y_i = x_i \cup z_i$
  - h are the parameters
The EM Algorithm (8)
• Determine h that (locally) maximizes $E[\ln P(Y|h)]$.
• Define $Q(h'|h) = E[\ln P(Y|h') \mid h, X]$.
• Estimation (E) step: calculate $Q(h'|h)$ using the current hypothesis h and the observed data X to estimate the distribution over Y.
• Maximization (M) step: replace h by the h' that maximizes $Q(h'|h)$.
The EM Algorithm (10)
• Uses of EM:
• Training Bayesian belief networks
• Unsupervised clustering
• Learning Hidden Markov Models
• …
Exercises
• Textbook 6.2, 6.3
• Additional exercise (see the following slide)
Additional Exercise (1)
Suppose there are 5 variables A, B, C, D, E, whose relationship is depicted by the following Bayesian network: A and B are the parents of C, and C is the parent of both D and E.

Prior probabilities:
P(A=T) = 0.1, P(A=F) = 0.9
P(B=T) = 0.2, P(B=F) = 0.8

Conditional probability tables:

P(C | A, B):
A B | P(C=T) P(C=F)
T T | 0.9    0.1
T F | 0.6    0.4
F T | 0.3    0.7
F F | 0.2    0.8

P(D | C):
C | P(D=T) P(D=F)
T | 0.9    0.1
F | 0.2    0.8

P(E | C):
C | P(E=T) P(E=F)
T | 0.8    0.2
F | 0.1    0.9

If we already know the observed values A = F, B = F, D = F, E = T, can you predict the value of variable C? (A code sketch for checking your answer follows.)
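For checking your hand calculation: a short sketch of exact inference for this query. Since C's Markov blanket {A, B, D, E} is fully observed, P(C | a, b, d, e) is proportional to P(C | a, b) · P(d | C) · P(e | C). The CPT values are copied from the tables above; note that running this effectively solves the exercise.

```python
# Exact inference for the exercise: with C's Markov blanket {A, B, D, E}
# fully observed, P(C | a, b, d, e) is proportional to
# P(C | a, b) * P(d | C) * P(e | C).
P_C_GIVEN_AB = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.3, (False, False): 0.2}   # P(C=T | A, B)
P_D_GIVEN_C = {True: 0.9, False: 0.2}                      # P(D=T | C)
P_E_GIVEN_C = {True: 0.8, False: 0.1}                      # P(E=T | C)

a, b, d, e = False, False, False, True                     # observed evidence

def prob(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

unnormalized = {
    c: (prob(P_C_GIVEN_AB[(a, b)], c)
        * prob(P_D_GIVEN_C[c], d)
        * prob(P_E_GIVEN_C[c], e))
    for c in (True, False)
}
z = sum(unnormalized.values())
posterior = {c: s / z for c, s in unnormalized.items()}
print(posterior)   # posterior distribution over C given the evidence
```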