
Advances in Bayesian Learning: Learning and Inference in Bayesian Networks


Presentation Transcript


  1. Advances in Bayesian Learning: Learning and Inference in Bayesian Networks. Irina Rish, IBM T.J. Watson Research Center, rish@us.ibm.com

  2. “Road map”
     • Introduction and motivation: What are Bayesian networks and why use them?
     • How to use them: probabilistic inference
     • How to learn them: learning parameters, learning graph structure
     • Summary

  3. Bayesian Networks. Example: a network over Smoking (S), Lung Cancer (C), Bronchitis (B), X-ray (X), and Dyspnoea (D). Query: P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?

  4. What are they good for?
     • Diagnosis: P(cause | symptom) = ?
     • Prediction: P(symptom | cause) = ?
     • Classification: P(class | data)
     • Decision-making (given a cost function)
     Application areas: text classification, medicine, bio-informatics, speech recognition, computer troubleshooting, the stock market.

  5. Bayesian Networks: Representation
     P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)
     Conditional independencies yield an efficient representation: one CPD (conditional probability distribution) per node, e.g. P(D|C,B):
     C B | D=0  D=1
     0 0 | 0.1  0.9
     0 1 | 0.7  0.3
     1 0 | 0.8  0.2
     1 1 | 0.9  0.1
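As a concrete illustration of this factored representation, here is a minimal Python sketch; only the P(D|C,B) table above comes from the slide, and every other probability value is a made-up placeholder:

```python
# Joint distribution of the five-node network, factored into one CPD per node.
P_S1 = 0.2                                   # P(S=1), assumed
P_C1 = {0: 0.01, 1: 0.3}                     # P(C=1 | S), assumed
P_B1 = {0: 0.1, 1: 0.4}                      # P(B=1 | S), assumed
P_X1 = {(0, 0): 0.05, (0, 1): 0.1,           # P(X=1 | C, S), assumed
        (1, 0): 0.9,  (1, 1): 0.95}
P_D1 = {(0, 0): 0.9, (0, 1): 0.3,            # P(D=1 | C, B), from the slide's table
        (1, 0): 0.2, (1, 1): 0.1}

def bern(p1, v):
    """Probability of binary outcome v given P(V=1) = p1."""
    return p1 if v == 1 else 1.0 - p1

def joint(s, c, b, x, d):
    """P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)."""
    return (bern(P_S1, s) * bern(P_C1[s], c) * bern(P_B1[s], b)
            * bern(P_X1[(c, s)], x) * bern(P_D1[(c, b)], d))

# 13 free numbers specify the whole joint, vs. 2**5 - 1 = 31 for a full table.
print(joint(s=0, c=1, b=0, x=1, d=1))
```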

  6. Example: Printer Troubleshooting

  7. Bayesian networks: inference. P(X | evidence) = ?
     Example: P(s | d=1) ∝ Σ_{c,b,x} P(s) P(c|s) P(b|s) P(x|c,s) P(d=1|c,b) = P(s) Σ_c P(c|s) [Σ_x P(x|c,s)] [Σ_b P(b|s) P(d=1|c,b)]
     Pushing the sums inside the product in this way is Variable Elimination. Its complexity is exponential in w*, the “induced width” (max clique size) of the “moral” graph; here w* = 4.
     Efficient inference: variable orderings, conditioning, approximations.
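A small sketch of this elimination for P(S | D=1); only P(D|C,B) is from the slide's table, and the other CPD numbers are again illustrative assumptions:

```python
# Variable elimination for P(S | D=1): push the sums inside the product.
# P(s, d=1) = P(s) * sum_c P(c|s) * [sum_x P(x|c,s)] * [sum_b P(b|s) P(d=1|c,b)]
# sum_x P(x|c,s) = 1, so the X factor vanishes.

def p_s(s):    return 0.2 if s else 0.8                             # P(S), assumed
def p_c(c, s): q = 0.3 if s else 0.01; return q if c else 1 - q    # P(C|S), assumed
def p_b(b, s): q = 0.4 if s else 0.1;  return q if b else 1 - q    # P(B|S), assumed
def p_d(d, c, b):                                                   # P(D|C,B), slide's table
    q = {(0, 0): 0.9, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}[(c, b)]
    return q if d else 1 - q

def p_s_and_d1(s):
    total = 0.0
    for c in (0, 1):                                                # eliminate C last
        lam_b = sum(p_b(b, s) * p_d(1, c, b) for b in (0, 1))       # eliminate B first
        total += p_c(c, s) * lam_b
    return p_s(s) * total

z = p_s_and_d1(0) + p_s_and_d1(1)                                   # normalize over S
print("P(S=1 | D=1) =", p_s_and_d1(1) / z)
```

Each intermediate factor here involves at most three variables, instead of the 2**5 terms a brute-force sum over the joint would touch.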

  8. “Road map”
     • Introduction and motivation: What are Bayesian networks and why use them?
     • How to use them: probabilistic inference
     • Why and how to learn them: learning parameters, learning graph structure
     • Summary

  9. Why learn Bayesian networks?
     • Efficient representation and inference
     • Combining domain expert knowledge with data: priors P(H)
     • Incremental learning: update the model as new samples arrive
     • Handling missing data, e.g. the sample <1.3 2.8 ?? 0 1>
     • Learning causal relationships, e.g. S → C
     Data: samples such as <9.7 0.6 8 14 18>, <0.2 1.3 5 ?? ??>, <1.3 2.8 ?? 0 1>, <?? 5.6 0 10 ??>, ... (?? = missing values)

  10. Learning Bayesian Networks
     • Known graph – learn parameters:
       – Complete data: parameter estimation (ML, MAP)
       – Incomplete data: non-linear parametric optimization (gradient descent, EM)
     • Unknown graph – learn graph and parameters:
       – Complete data: optimization (search in the space of graphs)
       – Incomplete data: structural EM, mixture models

  11. Learning Parameters: complete data
     • ML-estimate: maximize the log-likelihood. With complete data the likelihood is decomposable, so each CPD is fit independently from multinomial counts: θ̂(x|pa) = N(x, pa) / N(pa)
     • MAP-estimate (Bayesian statistics): conjugate priors – Dirichlet, with an equivalent sample size α encoding prior knowledge: θ̂(x|pa) = (N(x, pa) + α(x, pa)) / (N(pa) + α(pa))
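A minimal sketch of these count-based estimates for a single CPD, P(D | C, B); the samples and the Dirichlet pseudo-count `alpha` are made up for illustration:

```python
from collections import Counter

# ML and MAP estimates for P(D | C, B) from complete data.
# The likelihood decomposes per node, so counts are all that is needed.
data = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0)]  # (c, b, d)
alpha = 1.0                                        # equivalent sample size per cell

n_dcb = Counter(((c, b), d) for c, b, d in data)   # N(d, c, b)
n_cb = Counter((c, b) for c, b, d in data)         # N(c, b)

def ml(d, c, b):
    """Maximum-likelihood estimate N(d, c, b) / N(c, b)."""
    return n_dcb[((c, b), d)] / n_cb[(c, b)]

def map_est(d, c, b, k=2):
    """MAP estimate with a symmetric Dirichlet prior (k = number of values D takes)."""
    return (n_dcb[((c, b), d)] + alpha) / (n_cb[(c, b)] + k * alpha)

print(ml(1, 0, 0), map_est(1, 0, 0))   # 2/3 vs. (2+1)/(3+2)
```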

  12. Learning Parameters: incomplete data
     Non-decomposable marginal likelihood (hidden nodes). Data over (S, X, D, C, B), with ? marking missing entries:
     <? 0 1 0 1>, <1 1 ? 0 1>, <0 0 0 ? ?>, <? ? 0 ? 1>, ...
     EM-algorithm: start from initial parameters, then iterate until convergence:
     • Expectation: run inference in the current model, e.g. P(S | X=0, D=1, C=0, B=1), to compute expected counts for the missing entries
     • Maximization: update the parameters (ML, MAP) from the expected counts
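A minimal EM sketch on a two-node miniature S → D rather than the full five-node network; the toy data and the miniature itself are assumptions, but the E-step (posterior over the hidden S) and M-step (count-based update) mirror the slide's loop:

```python
# EM for BN parameters with missing data: S -> D, both binary,
# with S unobserved (None) in some samples. Toy data, illustrative only.
data = [(1, 1), (None, 1), (0, 0), (None, 0), (1, 1), (0, 1), (None, 0)]

p_s = 0.5                   # initial P(S=1)
p_d = {0: 0.5, 1: 0.5}      # initial P(D=1 | S=s)

for _ in range(50):         # iterate until (approximate) convergence
    # E-step: expected counts, inferring P(S=1 | D=d) for the missing entries
    exp_s, n = 0.0, len(data)
    counts = {(s, d): 0.0 for s in (0, 1) for d in (0, 1)}
    for s, d in data:
        if s is None:       # Bayes' rule under the current model
            num = p_s * (p_d[1] if d else 1 - p_d[1])
            den = num + (1 - p_s) * (p_d[0] if d else 1 - p_d[0])
            w = num / den
        else:
            w = float(s)
        exp_s += w
        counts[(1, d)] += w
        counts[(0, d)] += 1 - w
    # M-step: ML update from the expected counts
    p_s = exp_s / n
    p_d = {s: counts[(s, 1)] / (counts[(s, 0)] + counts[(s, 1)]) for s in (0, 1)}

print("P(S=1) =", p_s, " P(D=1|S) =", p_d)
```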

  13. Learning graph structure
     • Finding the highest-scoring graph is an NP-hard optimization; local moves: add S→B, delete S→B, reverse S→B
     • Heuristic search: greedy local search, best-first search, simulated annealing
     • Complete data: decomposable score – local computations only
     • Incomplete data (score non-decomposable): structural EM
     • Constraint-based methods: the data impose independence relations (constraints)
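A skeleton of greedy local search using the three moves above; the `score(edges, data)` function is a pluggable placeholder (e.g., the MDL/BIC score on the next slide), and the graph encoding as a frozenset of (parent, child) edges is an assumption of this sketch:

```python
from itertools import permutations

def has_cycle(nodes, edges):
    """DFS cycle check so candidate graphs stay DAGs."""
    adj = {v: [c for p, c in edges if p == v] for v in nodes}
    seen, on_path = set(), set()
    def dfs(v):
        seen.add(v); on_path.add(v)
        for w in adj[v]:
            if w in on_path or (w not in seen and dfs(w)):
                return True
        on_path.discard(v)
        return False
    return any(dfs(v) for v in nodes if v not in seen)

def neighbors(nodes, edges):
    """All acyclic graphs one add / delete / reverse move away."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                    # delete a->b
            rev = edges - {(a, b)} | {(b, a)}
            if not has_cycle(nodes, rev):
                yield rev                             # reverse a->b
        elif (b, a) not in edges:
            cand = edges | {(a, b)}
            if not has_cycle(nodes, cand):
                yield cand                            # add a->b

def greedy_search(nodes, data, score):
    edges = frozenset()                               # start from the empty graph
    best = score(edges, data)
    improved = True
    while improved:                                   # hill-climb to a local optimum
        improved = False
        for cand in neighbors(nodes, edges):
            s = score(frozenset(cand), data)
            if s > best:
                edges, best, improved = frozenset(cand), s, True
    return edges, best
```

Restarting from several initial graphs, or swapping the inner loop for best-first search or simulated annealing, addresses the local optima this greedy loop can get stuck in.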

  14. Scoring functions: Minimum Description Length (MDL)
     Learning as data compression: MDL = DL(Model) + DL(Data | Model), evaluated on samples such as <9.7 0.6 8 14 18>, <0.2 1.3 5 ?? ??>, <1.3 2.8 ?? 0 1>, <?? 5.6 0 10 ??>, ...
     • MDL = −BIC (Bayesian Information Criterion)
     • Other: Bayesian score (BDe) – asymptotically equivalent to MDL
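A sketch of the BIC form of this score for binary variables, computed family-by-family since the score decomposes; the data layout (list of dicts) and variable names are assumptions, and since MDL = −BIC, minimizing description length is maximizing this score:

```python
from collections import Counter
from math import log

def family_score(var, parents, data):
    """BIC contribution of one node: log-likelihood minus a parameter penalty."""
    n = len(data)
    n_pa = Counter(tuple(row[p] for p in parents) for row in data)
    n_vpa = Counter((row[var], tuple(row[p] for p in parents)) for row in data)
    loglik = sum(cnt * log(cnt / n_pa[pa]) for (v, pa), cnt in n_vpa.items())
    n_params = 2 ** len(parents)              # one free parameter per parent config
    return loglik - 0.5 * log(n) * n_params   # penalty = DL(Model) term

def bic(graph, data):
    """graph: dict mapping each variable to a tuple of its parents."""
    return sum(family_score(v, ps, data) for v, ps in graph.items())

# Compare the empty graph against S -> C on made-up complete samples:
data = [{"S": 1, "C": 1}, {"S": 1, "C": 1}, {"S": 0, "C": 0},
        {"S": 0, "C": 0}, {"S": 1, "C": 0}, {"S": 0, "C": 0}]
print(bic({"S": (), "C": ()}, data), bic({"S": (), "C": ("S",)}, data))
```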

  15. Summary
     • Bayesian Networks – graphical probabilistic models
     • Efficient representation and inference
     • Expert knowledge + learning from data
     • Learning: parameters (parameter estimation, EM) and structure (optimization with score functions, e.g. MDL)
     • Applications/systems: collaborative filtering (MSBN), fraud detection (AT&T), classification (AutoClass (NASA), TAN-BLT (SRI))
     • Future directions: causality, time, model evaluation criteria, approximate inference/learning, on-line learning, etc.
