Chap 6 Bayesian Learning (3)
Bayesian Belief Networks (6)
• Inference
• A Bayes net is used to infer the (probabilities of) values of one or more network variables, given observed values of the others.
• In the general case, this problem is NP-hard.
• In practice, approximate inference methods work well for some network structures.
• Monte Carlo methods randomly simulate the network to compute approximate solutions, as in the sketch below.
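As a concrete illustration of the Monte Carlo idea, here is a minimal rejection-sampling sketch on the Storm/BusTourGroup/Campfire fragment that appears later in these slides. All CPT numbers are invented for illustration; only the sampling procedure is the point.

```python
import random

# Assumed CPTs (not from the slides): two independent parents, one child.
P_STORM = 0.2                               # P(Storm = True)
P_BUS = 0.1                                 # P(BusTourGroup = True)
P_CAMPFIRE = {                              # P(Campfire = True | Storm, BusTourGroup)
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def sample_net():
    """Draw one joint sample by sampling each node given its parents."""
    storm = random.random() < P_STORM
    bus = random.random() < P_BUS
    campfire = random.random() < P_CAMPFIRE[(storm, bus)]
    return storm, bus, campfire

def estimate_storm_given_campfire(n=200_000):
    """Estimate P(Storm = True | Campfire = True): simulate the network
    repeatedly and discard every sample that contradicts the evidence."""
    kept = hits = 0
    for _ in range(n):
        storm, _, campfire = sample_net()
        if campfire:                        # keep only evidence-consistent samples
            kept += 1
            hits += storm
    return hits / kept

print(estimate_storm_given_campfire())
```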
Bayesian Belief Networks (7)
• Learning Bayesian belief nets
• The network structure might be known or unknown.
• Training examples might provide values of all network variables, or just some.
Bayesian Belief Networks (8)
• Learning methods
• Gradient ascent (hill climbing)
• Expectation-Maximization (EM) algorithm
• …
Bayesian Belief Networks (9)
• If the structure is known and all variables are observed, learning is as easy as training a Naïve Bayes classifier.
• Suppose the structure is known but the variables are only partially observable.
• This is similar to training a neural net with hidden units.
• Search for the maximum likelihood hypothesis $h_{ML} = \arg\max_h P(D|h)$.
Bayesian Belief Networks (10)
• Gradient ascent for Bayes nets
• Let $w_{ijk}$ denote the conditional probability that the network variable $Y_i$ takes on the value $y_{ij}$ given that its immediate parents $U_i$ take on the values $u_{ik}$.
• Example: $Y_i$ = Campfire, $U_i$ = <Storm, BusTourGroup>, $y_{ij}$ = True, $u_{ik}$ = <False, False>, so $w_{ijk}$ = P(Campfire = True | Storm = False, BusTourGroup = False).
Bayesian Belief Networks (11)
• Maximizing $\ln P(D|h)$ is equivalent to maximizing $P(D|h)$.
• The gradient of $\ln P(D|h)$ is given by its derivatives with respect to each of the $w_{ijk}$.
• We can prove:
$$\frac{\partial \ln P(D|h)}{\partial w_{ijk}} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Bayesian Belief Networks (12)
• Proof
• To simplify notation, write $P_h(D) = P(D|h)$.
• Assuming the training examples $d$ in the data set $D$ are drawn independently:
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \frac{\partial}{\partial w_{ijk}} \ln \prod_{d \in D} P_h(d) = \sum_{d \in D} \frac{\partial \ln P_h(d)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial P_h(d)}{\partial w_{ijk}}$$
Bayesian Belief Networks (13)
• Proof (cont.)
• We can now introduce the values of the variables $Y_i$ and $U_i = Parents(Y_i)$ by summing over their possible values $y_{ij'}$ and $u_{ik'}$:
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)} \frac{\partial}{\partial w_{ijk}} \sum_{j',k'} P_h(d \mid y_{ij'}, u_{ik'})\, P_h(y_{ij'} \mid u_{ik'})\, P_h(u_{ik'})$$
Bayesian Belief Networks (14)
• Given that $w_{ijk} \equiv P_h(y_{ij} \mid u_{ik})$, the only term in this sum whose derivative is nonzero is the one with $j' = j$ and $k' = k$. Therefore
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{1}{P_h(d)}\, P_h(d \mid y_{ij}, u_{ik})\, P_h(u_{ik})$$
Bayesian Belief Networks (15)
• Applying Bayes' theorem to rewrite $P_h(d \mid y_{ij}, u_{ik})$, we have
$$\frac{\partial \ln P_h(D)}{\partial w_{ijk}} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)\, P_h(u_{ik})}{P_h(y_{ij}, u_{ik})} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{P_h(y_{ij} \mid u_{ik})} = \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
Bayesian Belief Networks (16)
• Initialize the CPTs with randomly generated values.
• Perform gradient ascent by repeatedly:
• Updating all $w_{ijk}$ using the training data $D$:
$$w_{ijk} \leftarrow w_{ijk} + \eta \sum_{d \in D} \frac{P_h(y_{ij}, u_{ik} \mid d)}{w_{ijk}}$$
• Then renormalizing the $w_{ijk}$ to assure $0 \le w_{ijk} \le 1$ and $\sum_j w_{ijk} = 1$.
• A code sketch of one such pass follows.
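A minimal sketch of this loop, assuming the CPTs are stored as nested dicts with w[i][k][j] = P(Y_i = y_ij | U_i = u_ik). The helper posterior(i, j, k, d) is a placeholder for a Bayes-net inference routine returning P_h(y_ij, u_ik | d); computing it is itself an inference problem and is not shown.

```python
def gradient_ascent_step(w, data, posterior, eta=0.01):
    """One gradient-ascent update of every CPT entry, then renormalization.

    w[i][k][j] holds P(Y_i = y_ij | U_i = u_ik). `posterior(i, j, k, d)`
    is an assumed helper returning P_h(y_ij, u_ik | d) for example d.
    """
    for i in w:
        for k in w[i]:
            # ascend the log-likelihood gradient derived above
            for j in w[i][k]:
                grad = sum(posterior(i, j, k, d) for d in data) / w[i][k][j]
                w[i][k][j] += eta * grad
            # renormalize so each conditional distribution sums to 1
            total = sum(w[i][k].values())
            for j in w[i][k]:
                w[i][k][j] /= total
```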
Bayesian Belief Networks (17)
• When the structure is unknown…
• Algorithms use greedy search to add or subtract edges and nodes.
• Active research area:
  - Extending from Boolean to real-valued variables
  - Parameterizing the distribution instead of using tables
Content
• Review
• Minimum Description Length Principle
• Bayesian Belief Networks
• The EM Algorithm
• Summary
The EM Algorithm (1)
• Expectation-Maximization (Dempster et al., 1977)
• When to use?
• Data is only partially observable.
• Unsupervised clustering (target value unobservable)
• Supervised learning (some instance attributes unobservable)
The EM Algorithm (2)
• EM for estimating k means
• Given:
• Instances from X generated by a mixture of k Gaussian distributions with the same variance $\sigma^2$.
• The means $\langle \mu_1, \ldots, \mu_k \rangle$ of the k Gaussians are unknown.
• We don't know which instance $x_i$ was generated by which Gaussian.
• Example: k = 2.
The EM Algorithm (3)
[Figure: plot of the density P(x) against x for the mixture of two Gaussians]
The EM Algorithm (4)
• To determine:
• The maximum likelihood hypothesis $h = \langle \mu_1, \ldots, \mu_k \rangle$ that maximizes $p(D|h)$.
• Think of each full instance as $y_i = \langle x_i, z_{i1}, z_{i2} \rangle$:
• $z_{ij}$ is 1 if $x_i$ was generated by the jth Gaussian, otherwise 0.
• $x_i$ is observable; the $z_{ij}$ are unobservable.
• A code sketch of the resulting algorithm appears below.
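For concreteness, a minimal sketch of EM for this two-Gaussian case (k = 2, shared known variance $\sigma^2$), assuming 1-D data and a crude initialization at the data extremes. The E step computes the expected memberships E[z_ij]; the M step re-estimates each mean as a membership-weighted average.

```python
import math
import random

def em_two_means(x, sigma2=1.0, iters=50):
    """EM for a mixture of two 1-D Gaussians with known shared variance.
    Returns the estimated means <mu_1, mu_2>."""
    mu = [min(x), max(x)]                    # crude initialization (assumed)
    for _ in range(iters):
        # E step: expected membership E[z_ij] of each point in each Gaussian
        resp = []
        for xi in x:
            w = [math.exp(-(xi - m) ** 2 / (2 * sigma2)) for m in mu]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M step: each mean becomes a membership-weighted average of the data
        for j in range(2):
            den = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / den
    return mu

# Toy usage: two clusters centered near -2 and +3.
data = [random.gauss(-2, 1) for _ in range(200)] + \
       [random.gauss(3, 1) for _ in range(200)]
print(em_two_means(data))
```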
The EM Algorithm (7)
• General statement of the EM algorithm
• Given:
• Observed data X = {x_1, …, x_m}
• Unobserved data Z = {z_1, …, z_m}
• A parameterized probability distribution P(Y|h), where:
  - Y = {y_1, …, y_m} is the full data, with $y_i = x_i \cup z_i$
  - h are the parameters
The EM Algorithm (8)
• Determine h that (locally) maximizes $E[\ln P(Y|h)]$.
• Define $Q(h'|h) = E[\ln P(Y|h') \mid h, X]$.
• Estimation (E) step: calculate $Q(h'|h)$ using the current hypothesis h and the observed data X to estimate the distribution over Y.
• Maximization (M) step: replace h by the h' that maximizes $Q(h'|h)$.
The EM Algorithm (10)
• Uses of EM:
• Training Bayesian belief networks
• Unsupervised clustering
• Learning Hidden Markov Models
• …
Exercises
• Textbook 6.2, 6.3
• Additional exercise (see the following slide)
Additional Exercise (1)
Suppose there are 5 variables A, B, C, D, E, whose relationship is depicted by the following Bayesian network: A and B are the parents of C, and C is the parent of both D and E.

Prior probabilities:
P(A=T) = 0.1, P(A=F) = 0.9
P(B=T) = 0.2, P(B=F) = 0.8

Conditional probability tables:

P(C | A, B):
A B | P(C=T) P(C=F)
T T | 0.9    0.1
T F | 0.6    0.4
F T | 0.3    0.7
F F | 0.2    0.8

P(D | C):
C | P(D=T) P(D=F)
T | 0.9    0.1
F | 0.2    0.8

P(E | C):
C | P(E=T) P(E=F)
T | 0.8    0.2
F | 0.1    0.9

If we already know the observed values A = F, B = F, D = F, E = T, can you predict the value of variable C? (A code sketch for checking your answer follows.)
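For checking your hand calculation: a short sketch of exact inference for this query. Since C's Markov blanket {A, B, D, E} is fully observed, P(C | a, b, d, e) is proportional to P(C | a, b) · P(d | C) · P(e | C). The CPT values are copied from the tables above; note that running this effectively solves the exercise.

```python
# Exact inference for the exercise: with C's Markov blanket {A, B, D, E}
# fully observed, P(C | a, b, d, e) is proportional to
# P(C | a, b) * P(d | C) * P(e | C).
P_C_GIVEN_AB = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.3, (False, False): 0.2}   # P(C=T | A, B)
P_D_GIVEN_C = {True: 0.9, False: 0.2}                      # P(D=T | C)
P_E_GIVEN_C = {True: 0.8, False: 0.1}                      # P(E=T | C)

a, b, d, e = False, False, False, True                     # observed evidence

def prob(p_true, value):
    """P(X = value) given P(X = True)."""
    return p_true if value else 1.0 - p_true

unnormalized = {
    c: (prob(P_C_GIVEN_AB[(a, b)], c)
        * prob(P_D_GIVEN_C[c], d)
        * prob(P_E_GIVEN_C[c], e))
    for c in (True, False)
}
z = sum(unnormalized.values())
posterior = {c: s / z for c, s in unnormalized.items()}
print(posterior)   # posterior distribution over C given the evidence
```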