
Chap 6 Bayesian Learning (3)



  1. Chap 6 Bayesian Learning (3) (Bayesian Learning)

  2. Bayesian Belief Networks(6) • Inference • A Bayes net is used to infer the (probabilities of) values of one or more network variables, given the observed values of others. • In the general case, the problem is NP-hard. • In practice, approximate inference methods work well for some network structures. • Monte Carlo methods simulate the network randomly to compute approximate solutions (see the sketch below).
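A minimal sketch of the Monte Carlo idea from slide 2: sample the network forward from the priors, keep only the samples consistent with the evidence, and read the query probability off the kept samples (rejection sampling). The Storm/BusTourGroup/Campfire network and its CPT numbers below are illustrative placeholders, not values from the slides.

```python
import random

# Hypothetical two-parent network Storm -> Campfire <- BusTourGroup
# with made-up CPT numbers, used only to illustrate sampling-based inference.
P_STORM = 0.3                      # P(Storm = True)
P_BUS = 0.4                        # P(BusTourGroup = True)
P_CAMPFIRE = {                     # P(Campfire = True | Storm, BusTourGroup)
    (True, True): 0.2, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.3,
}

def sample_network():
    """Draw one joint sample by sampling each node given its parents."""
    storm = random.random() < P_STORM
    bus = random.random() < P_BUS
    campfire = random.random() < P_CAMPFIRE[(storm, bus)]
    return storm, bus, campfire

def estimate(query_is_true, evidence_ok, n=100_000):
    """Estimate P(query | evidence) from the samples that match the evidence."""
    kept = hits = 0
    for _ in range(n):
        s = sample_network()
        if evidence_ok(s):
            kept += 1
            hits += query_is_true(s)
    return hits / kept if kept else float("nan")

# Example query: P(Campfire = True | Storm = False)
print(estimate(lambda s: s[2], lambda s: s[0] is False))
```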

  3. Bayesian Belief Networks(7) • Learning Bayesian belief nets • The network structure might be known or unknown. • Training examples might provide values of all network variables, or just some.

  4. Bayesian Belief Networks(8) • Learning Methods • Gradient Ascent (Hill-climbing) • Expectation-Maximization (EM) Algorithm • …

  5. Bayesian Belief Networks(9) • If the structure is known and we observe all variables, then learning is as easy as training a Naïve Bayes classifier (see the counting sketch below). • Suppose the structure is known and the variables are partially observable. • This is similar to training neural nets with hidden units. • Search for the maximum likelihood hypothesis hML = argmaxh P(D|h).
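For the fully observed, known-structure case mentioned on slide 5, each CPT entry is just a relative frequency in the training data, exactly as in Naïve Bayes training. The toy data and variable names in this sketch are assumptions for illustration, not values from the slides.

```python
from collections import Counter

# Each record is fully observed: (Storm, BusTourGroup, Campfire).
data = [
    (False, False, False), (False, True, True),
    (True, False, False), (False, True, True), (True, True, False),
]

joint = Counter((s, b, c) for s, b, c in data)        # counts of (parents, child)
parent_counts = Counter((s, b) for s, b, _ in data)   # counts of parent configurations

def cpt_entry(storm, bus, campfire):
    """Maximum-likelihood estimate of P(Campfire=campfire | Storm=storm, Bus=bus)."""
    denom = parent_counts[(storm, bus)]
    return joint[(storm, bus, campfire)] / denom if denom else 0.0

print(cpt_entry(False, True, True))   # -> 1.0 on this toy data
```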

  6. Bayesian Belief Networks(10) • Gradient ascent for Bayes nets • Let wijk denote the conditional probability that the network variable Yi will take on the value yij given that its immediate parents Ui take on the values uik. For example: Yi = Campfire, Ui = <Storm, BusTourGroup>, yij = True, uik = <False, False>.

  7. Bayesian Belief Networks(11) • Maximizing lnP(D|h) is equivalent to maximizing P(D|h). • The gradient of lnP(D|h) is given by the derivatives ∂lnP(D|h)/∂wijk for each of the wijk. • We can prove: ∂lnPh(D)/∂wijk = Σ_{d∈D} Ph(yij, uik | d) / wijk

  8. Bayesian Belief Networks(12) • Proof • To simplify notation, write Ph(D) = P(D|h). • Assuming the training examples d in the data set D are drawn independently, Ph(D) = Π_{d∈D} Ph(d), so ∂lnPh(D)/∂wijk = Σ_{d∈D} ∂lnPh(d)/∂wijk = Σ_{d∈D} (1/Ph(d)) ∂Ph(d)/∂wijk

  9. Bayesian Belief Networks(13) • Proof (cont.) • We can now introduce the values of the variables Yi and Ui = Parents(Yi) by summing over their possible values yij' and uik': ∂lnPh(D)/∂wijk = Σ_{d∈D} (1/Ph(d)) ∂/∂wijk Σ_{j',k'} Ph(d | yij', uik') Ph(yij' | uik') Ph(uik')

  10. Bayesian Belief Networks(14) • Given that wijk = Ph(yij | uik), the only term in this sum with a nonzero derivative is the one for which j' = j and k' = k. Therefore ∂lnPh(D)/∂wijk = Σ_{d∈D} (1/Ph(d)) ∂/∂wijk [Ph(d | yij, uik) wijk Ph(uik)] = Σ_{d∈D} Ph(d | yij, uik) Ph(uik) / Ph(d)

  11. Bayesian Belief Networks(15) • Applying Bayes theorem to rewrite Ph(d | yij, uik), we have ∂lnPh(D)/∂wijk = Σ_{d∈D} Ph(yij, uik | d) Ph(d) Ph(uik) / (Ph(yij, uik) Ph(d)) = Σ_{d∈D} Ph(yij, uik | d) / Ph(yij | uik) = Σ_{d∈D} Ph(yij, uik | d) / wijk

  12. Bayesian Belief Networks(16) • Initialize the CPTs with randomly generated values. • Perform gradient ascent by repeatedly: • updating all wijk using the training data D: wijk ← wijk + η Σ_{d∈D} Ph(yij, uik | d) / wijk • then renormalizing the wijk to assure that Σ_j wijk = 1 and 0 ≤ wijk ≤ 1.
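A sketch of the training loop summarized on slide 12. It assumes the CPTs are stored as nested dictionaries w[i][k][j] = P(Yi = yij | Ui = uik), and that a hypothetical helper posterior(i, j, k, d) returns Ph(yij, uik | d) computed by some inference routine (exact or Monte Carlo); both the storage layout and the helper are illustrative placeholders, not the slides' notation.

```python
def gradient_ascent_step(w, data, posterior, eta=0.01):
    """One pass of the gradient-ascent update followed by renormalization."""
    # 1. Gradient step: w_ijk <- w_ijk + eta * sum_d Ph(y_ij, u_ik | d) / w_ijk
    for i in w:
        for k in w[i]:
            for j in w[i][k]:
                grad = sum(posterior(i, j, k, d) for d in data) / w[i][k][j]
                w[i][k][j] += eta * grad
    # 2. Renormalize so that, for every parent configuration k, sum_j w_ijk = 1
    #    and each entry remains a valid probability.
    for i in w:
        for k in w[i]:
            total = sum(w[i][k].values())
            for j in w[i][k]:
                w[i][k][j] /= total
    return w
```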

  13. Bayesian Belief Networks(17) • When the structure is unknown… • Algorithms use greedy search to add or subtract edges and nodes. • This is an active research area: extending from Boolean to real-valued variables, and parameterizing the distributions instead of using tables.

  14. Content • Review • Minimum Description Length Principle • Bayesian Belief Networks • The EM Algorithm • Summary

  15. The EM Algorithm(1) • Expectation-Maximization (Dempster et al., 1977) • When to use? • Data is only partially observable. • Unsupervised clustering (target value unobservable) • Supervised learning (some instance attributes unobservable)

  16. The EM Algorithm(2) • EM for estimating k means • Given • Instances from X generated by a mixture of k Gaussian distributions with the same variance σ². • Unknown means <μ1, …, μk> of the k Gaussians. • We don't know which instance xi was generated by which Gaussian. • Example: k = 2

  17. The EM Algorithm(3) [Figure: the mixture density P(x) plotted against x]

  18. The EM Algorithm(4) • To determine • The maximum likelihood hypothesis h = <μ1, …, μk> which maximizes p(D|h). • yi = <xi, zi1, zi2> • zij is 1 if xi was generated by the jth Gaussian, otherwise 0. • xi is observable • zij is unobservable

  19. The EM Algorithm(5) • E-step: calculate the expected value E[zij] of each hidden variable zij, assuming the current hypothesis h = <μ1, …, μk> holds: E[zij] = p(x = xi | μ = μj) / Σ_{n=1..k} p(x = xi | μ = μn) = exp(-(xi-μj)²/2σ²) / Σ_{n=1..k} exp(-(xi-μn)²/2σ²)

  20. The EM Algorithm(6) • M-step: calculate a new maximum-likelihood hypothesis h' = <μ1', …, μk'>, assuming each zij takes its expected value E[zij], then replace h by h': μj ← Σ_{i=1..m} E[zij] xi / Σ_{i=1..m} E[zij]
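Putting slides 16-20 together, here is a minimal sketch of EM for the means of k equal-variance Gaussians; the initialization, iteration count, and example data are illustrative choices, not from the slides.

```python
import math
import random

def em_k_means(xs, k=2, sigma2=1.0, iters=50):
    """Estimate the k Gaussian means by EM, with shared known variance sigma2."""
    mus = random.sample(xs, k)                      # initial guesses for mu_1..mu_k
    for _ in range(iters):
        # E-step: E[z_ij] = p(x_i | mu_j) / sum_n p(x_i | mu_n)
        resp = []
        for x in xs:
            ps = [math.exp(-(x - mu) ** 2 / (2 * sigma2)) for mu in mus]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: mu_j <- sum_i E[z_ij] * x_i / sum_i E[z_ij]
        mus = [
            sum(resp[i][j] * xs[i] for i in range(len(xs)))
            / sum(resp[i][j] for i in range(len(xs)))
            for j in range(k)
        ]
    return mus

# Example: data drawn from two Gaussians with means 0 and 5.
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]
print(em_k_means(data, k=2))
```

Each iteration can only increase (or leave unchanged) the data likelihood, which is why the procedure converges to a local maximum rather than being guaranteed to find the global one.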

  21. The EM Algorithm(7) • General statement of the EM algorithm • Given • Observed data X = {x1, …, xm} • Unobserved data Z = {z1, …, zm} • Parameterized probability distribution P(Y|h), where Y = {y1, …, ym} is the full data, yi = xi ∪ zi, and h are the parameters.

  22. The EM Algorithm(8) • Determine h that (locally) maximizes E[lnP(Y|h)]

  23. The EM Algorithm(9)
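Slides 22 and 23 refer to the general EM loop, but the formulas were on the lost slide images; for reference, the usual textbook formulation (which these slides appear to follow) defines a Q function over candidate hypotheses h' and alternates an estimation and a maximization step:

```latex
% Q function: expected log-likelihood of the full data Y under candidate h',
% where the expectation is over the unobserved Z given the observed X and current h.
Q(h' \mid h) \;=\; E\bigl[\ln P(Y \mid h') \,\bigm|\, h, X\bigr]

% Estimation (E) step: compute Q(h' \mid h) using the current hypothesis h.
% Maximization (M) step: replace h by the hypothesis that maximizes Q:
h \;\leftarrow\; \operatorname*{arg\,max}_{h'} \, Q(h' \mid h)
```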

  24. The EM Algorithm(10) • Train Bayesian belief networks • Unsupervised clustering • Learn Hidden Markov Model • …

  25. Exercises • Textbook 6.2, 6.3 • Additional exercise (see the following slide)

  26. A A 0.1 0.9 Additional Exercise (1) Suppose there are 5 variables A, B, C, D, E, the relationship among the 5 variables is depicted by a Bayesian Network as follows. If we already know the value of variable A, B, E, D as shown in the graph, Could you predict the value of variable C ? B B 0.2 0.8 A B C D E F F ? F T A B A B C C T T 0.9 0.1 T F 0.6 0.4 F T 0.3 0.7 F F 0.2 0.8 C D E C E E T 0.8 0.2 F 0.1 0.9 C D D T 0.9 0.1 F 0.2 0.8
