
A Tutorial On Learning With Bayesian Networks

This tutorial provides an introduction to learning with Bayesian Networks, including their construction, algorithms for probabilistic inference, and learning probabilities and structure. It also explores the relationship between Bayesian Network techniques and methods for supervised and unsupervised learning.


Presentation Transcript


  1. A Tutorial On Learning With Bayesian Networks. David Heckerman. Presented by Haimonti Dutta, Department of Computer and Information Science.

  2. Outline
  • Introduction
  • Bayesian interpretation of probability and a review of methods
  • Bayesian networks and their construction from prior knowledge
  • Algorithms for probabilistic inference
  • Learning probabilities and structure in a Bayesian network
  • Relationships between Bayesian network techniques and methods for supervised and unsupervised learning
  • Conclusion

  3. Introduction. A Bayesian network is a graphical model for probabilistic relationships among a set of variables.

  4. What do Bayesian networks and Bayesian methods have to offer?
  • Handling of incomplete data sets
  • Learning about causal networks
  • Facilitating the combination of domain knowledge and data
  • An efficient and principled approach for avoiding the overfitting of data

  5. The Bayesian Approach to Probability and Statistics. Bayesian probability: the degree of belief in an event. Classical probability: the true or physical probability of an event.

  6. Some Criticisms of Bayesian Probability
  • Why should degrees of belief satisfy the rules of probability?
  • On what scale should probabilities be measured?
  • What probabilities are to be assigned to beliefs that are not at the extremes?

  7. Some Answers... Researchers have suggested different sets of properties that degrees of belief should satisfy; each such set leads to the same rules, the rules of probability.

  8. The Scaling Problem. The probability wheel: a tool for assessing probabilities. What is the probability that the wheel stops in the shaded region?

  9. Probability Assessment. One evident problem: SENSITIVITY. How can we say that the probability of an event is 0.601 and not 0.599? Another problem: ACCURACY. Methods for improving accuracy are available in decision analysis techniques.

  10. Learning with Data: the thumbtack problem. When tossed, a thumbtack can come to rest on either heads or tails.

  11. The Problem. From N observations we want to determine the probability of heads on the (N+1)th toss.

  12. Two Approaches. The classical approach:
  • Assert some physical probability of heads (unknown)
  • Estimate this physical probability from the N observations
  • Use this estimate as the probability of heads on the (N+1)th toss

  13. The other approach, the Bayesian approach:
  • Assert some physical probability
  • Encode the uncertainty about this physical probability using Bayesian probabilities
  • Use the rules of probability to compute the required probability

  14. Some basic probability formulas. Bayes' theorem gives the posterior probability of θ given data D and background knowledge ξ:

  p(θ | D, ξ) = p(θ | ξ) p(D | θ, ξ) / p(D | ξ), where p(D | ξ) = ∫ p(D | θ, ξ) p(θ | ξ) dθ

  Note: θ is an uncertain variable whose value corresponds to the possible true values of the physical probability.

  15. The Likelihood Function. How good is a particular value of θ? That depends on how likely it is to generate the observed data: L(θ : D) = p(D | θ). Hence the likelihood of the sequence H, T, H, T, T is L(θ : D) = θ · (1−θ) · θ · (1−θ) · (1−θ) = θ²(1−θ)³.
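
A minimal sketch of this likelihood in Python; the toss sequence is the one from the slide:

```python
# Likelihood of the thumbtack data: L(theta : D) = theta^h * (1 - theta)^t,
# where h and t count the heads and tails in the observed sequence.

def likelihood(theta, tosses):
    h = tosses.count("H")            # number of heads
    t = tosses.count("T")            # number of tails
    return theta**h * (1 - theta)**t

# The sequence H, T, H, T, T from the slide: theta^2 * (1 - theta)^3
print(likelihood(0.4, "HTHTT"))      # 0.4**2 * 0.6**3 = 0.03456
```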

  16. Sufficient Statistics. To compute the likelihood in the thumbtack problem we require only h and t, the number of heads and the number of tails; h and t are called sufficient statistics for the binomial distribution. A sufficient statistic is a function that summarizes, from the data, the information relevant to the likelihood.

  17. Finally... We average over the possible values of θ to determine the probability that the (N+1)th toss of the thumbtack will come up heads:

  P(X_{N+1} = heads | D, ξ) = ∫ θ p(θ | D, ξ) dθ

  This value is the expectation of θ with respect to the distribution p(θ | D, ξ).

  18. To remember: we need a method to assess the prior distribution for θ. A common approach is to assume that the prior is a beta distribution, which is conjugate to the binomial likelihood.
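
A sketch of the resulting update under the Beta-prior assumption: with h heads and t tails, a Beta(a, b) prior becomes a Beta(a + h, b + t) posterior, and the integral on slide 17 reduces to the posterior mean. The hyperparameter values in the example call are illustrative:

```python
# Conjugate Beta-binomial update for the thumbtack; the predictive
# probability of heads on toss N+1 is the posterior mean of theta.

def beta_update(a, b, h, t):
    a_post, b_post = a + h, b + t            # posterior is Beta(a+h, b+t)
    p_heads = a_post / (a_post + b_post)     # E[theta | D] = P(X_{N+1} = heads | D)
    return a_post, b_post, p_heads

# Uniform prior Beta(1, 1) and the data from slide 15 (h = 2, t = 3)
print(beta_update(1, 1, 2, 3))               # (3, 4, 0.42857...)
```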

  19. Maximum Likelihood Estimation. The MLE principle: learn the parameters that maximize the likelihood function. It is one of the most commonly used estimators in statistics and is intuitively appealing.
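
For the thumbtack the MLE has a closed form, the observed fraction of heads; a minimal sketch:

```python
# MLE for the thumbtack: the theta maximizing theta^h * (1 - theta)^t
# is the observed fraction of heads, h / (h + t).

def mle_theta(h, t):
    return h / (h + t)

print(mle_theta(2, 3))   # 0.4 for the sequence H, T, H, T, T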

  20. What is a Bayesian Network? A graphical model that efficiently encodes the joint probability distribution for a large set of variables.

  21. Definition. A Bayesian network for a set of variables X = {X1, ..., Xn} consists of:
  • a network structure S encoding conditional independence assertions about X
  • a set P of local probability distributions
  The network structure S is a directed acyclic graph whose nodes are in one-to-one correspondence with the variables in X. The lack of an arc denotes a conditional independence assertion.

  22. Some conventions:
  • Variables are depicted as nodes
  • Arcs represent probabilistic dependence between variables
  • Conditional probabilities encode the strength of the dependencies

  23. An Example: Detecting Credit-Card Fraud. [Network diagram with nodes Fraud, Age, Sex, Gas, and Jewelry.]
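
Assuming the arcs from Heckerman's version of this example (Fraud → Gas, and Fraud, Age, Sex → Jewelry), here is a sketch of the joint factorization the structure encodes; all numeric CPT values below are invented for illustration:

```python
# Joint factorization under the assumed structure:
# p(f, a, s, g, j) = p(f) p(a) p(s) p(g | f) p(j | f, a, s)
# All probability values are illustrative, not from the tutorial.

p_fraud = {True: 0.001, False: 0.999}
p_age_young = 0.25                                  # P(Age = "<30")
p_sex_male = 0.5                                    # P(Sex = "male")
p_gas = {True: 0.2, False: 0.01}                    # P(Gas = yes | Fraud)
p_jewelry = {(True, True, True): 0.05,              # P(Jewelry = yes | Fraud, Age<30, Male)
             (False, True, True): 0.0001}           # ...remaining rows omitted

def joint(fraud, young, male, gas, jewelry):
    pf = p_fraud[fraud]
    pa = p_age_young if young else 1 - p_age_young
    ps = p_sex_male if male else 1 - p_sex_male
    pg = p_gas[fraud] if gas else 1 - p_gas[fraud]
    pj = p_jewelry[(fraud, young, male)]
    pj = pj if jewelry else 1 - pj
    return pf * pa * ps * pg * pj

print(joint(True, True, True, gas=True, jewelry=True))   # 1.25e-06
```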

  24. Tasks
  • Correctly identify the goals of modeling
  • Identify the many possible observations that may be relevant to the problem
  • Determine what subset of those observations is worthwhile to model
  • Organize the observations into variables having mutually exclusive and collectively exhaustive states
  Finally, build a directed acyclic graph that encodes the assertions of conditional independence.

  25. A technique for constructing a Bayesian network. The approach is based on the following observations:
  • People can often readily assert causal relationships among variables
  • Causal relations typically correspond to assertions of conditional dependence
  To construct a Bayesian network we simply draw arcs, for a given set of variables, from cause variables to their immediate effects. In the final step we determine the local probability distributions.

  26. Problems
  • The steps are often intermingled in practice
  • Judgments of conditional independence and/or cause and effect can influence problem formulation
  • Assessments of probability may lead to changes in the network structure

  27. Bayesian Inference. Once a Bayesian network is constructed, we need to determine the various probabilities of interest from the model. Computation of a probability of interest given a model is probabilistic inference. [Diagram: parameter θ with observed data x1, ..., x[m] and a query about x[m+1].]
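
A minimal sketch of one brute-force way to perform such inference, by enumeration over a generic joint distribution; `joint_fn` and `domains` are placeholders for a concrete network, and practical systems use algorithms that exploit the network structure instead, since exact inference is hard in general:

```python
import itertools

# Inference by enumeration: P(query | evidence) computed by summing the
# joint distribution over every complete configuration of the variables.
# `joint_fn` maps a dict of variable assignments to a probability.

def infer(joint_fn, domains, query_var, query_val, evidence):
    num = den = 0.0
    for values in itertools.product(*domains.values()):
        x = dict(zip(domains.keys(), values))
        if any(x[v] != val for v, val in evidence.items()):
            continue                     # skip configurations contradicting the evidence
        p = joint_fn(x)
        den += p                         # accumulates P(evidence)
        if x[query_var] == query_val:
            num += p                     # accumulates P(query, evidence)
    return num / den                     # P(query | evidence)
```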

  28. Learning Probabilities in a Bayesian Network. Problem: use data to update the probabilities of a given network structure. In the thumbtack problem we do not learn "the probability of heads"; rather, we update the posterior distribution for the variable θ that represents the physical probability of heads. The problem restated: given a random sample D, compute the posterior probability p(θ | D).

  29. Assumptions made to compute the posterior probability:
  • There is no missing data in the random sample D
  • The parameters are independent

  30. But... data may be missing. How do we proceed then?

  31. Obvious concerns. Why was the data missing?
  • Missing values
  • Hidden variables
  Is the absence of an observation dependent on the actual states of the variables? Here we deal with missing data that are independent of state.

  32. Incomplete Data (contd.). For any interesting set of local likelihoods and priors, exact computation of the posterior distribution is intractable. We therefore require approximations for incomplete data.

  33. Methods of approximation for incomplete data:
  • Monte Carlo sampling methods
  • The Gaussian approximation
  • MAP and ML approximations and the EM algorithm

  34. Gibbs Sampling. The steps involved:
  Start:
  • Choose an initial state for each of the variables in X at random
  Iterate:
  • Unassign the current state of X1
  • Compute the probability of this state given the states of the other n−1 variables
  • Repeat this procedure for all variables in X, creating a new sample of X
  After a "burn-in" phase, the possible configurations of X will be sampled with probability p(x). A sketch of this loop appears below.
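
A generic sketch of the loop just described, for binary variables; `full_conditional` is an assumed stand-in for the network-specific distribution p(X_v | all other variables):

```python
import random

# Gibbs sampling sketch. `full_conditional(v, state)` is an assumed
# callback returning P(X_v = 1 | the states of all other variables).

def gibbs_sample(variables, full_conditional, n_iter=10_000, burn_in=1_000):
    state = {v: random.choice([0, 1]) for v in variables}   # random initial state
    samples = []
    for it in range(n_iter):
        for v in variables:
            # Unassign X_v and resample it given the other n-1 variables
            p_one = full_conditional(v, state)
            state[v] = 1 if random.random() < p_one else 0
        if it >= burn_in:                                   # keep post-burn-in samples
            samples.append(dict(state))
    return samples
```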

  35. A problem with the Monte Carlo method: it is intractable when the sample size is large. The Gaussian approximation. Idea: given large amounts of data, the posterior distribution can be approximated by a multivariate Gaussian.
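
A one-parameter sketch of this idea, assuming the Beta posterior from slide 18: fit a Gaussian at the posterior mode, with variance given by the curvature of the log posterior there (a Laplace approximation). The prior and counts in the example call are illustrative:

```python
import math

# Laplace (Gaussian) approximation for the thumbtack posterior Beta(a+h, b+t):
# a Gaussian centered at the posterior mode, with variance equal to the
# inverse curvature of the negative log posterior at that mode.

def laplace_approximation(a, b, h, t):
    A, B = a + h, b + t                        # posterior hyperparameters
    mode = (A - 1) / (A + B - 2)               # mode of Beta(A, B), valid for A, B > 1
    curvature = (A - 1) / mode**2 + (B - 1) / (1 - mode)**2
    return mode, math.sqrt(1.0 / curvature)    # mean and std of the Gaussian

print(laplace_approximation(2, 2, 40, 60))     # approx. (0.402, 0.0486)
```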

  36. Criteria for Model Selection. Some criterion must be used to determine the degree to which a network structure fits the prior knowledge and data. Such criteria include:
  • Relative posterior probability
  • Local criteria

  37. Relative Posterior Probability. One criterion for model selection is the logarithm of the relative posterior probability:

  log p(D, S^h) = log p(S^h) + log p(D | S^h)

  where log p(S^h) is the log prior and log p(D | S^h) is the log marginal likelihood.
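
For a single binary node with a Beta(a, b) parameter prior, the marginal likelihood has the closed form B(a+h, b+t) / B(a, b); a sketch of the resulting score, where the structure-prior value in the example call is an illustrative assumption:

```python
import math

# Log relative posterior score for one binary node with a Beta(a, b) prior:
# log p(D, S^h) = log p(S^h) + log p(D | S^h), where
# p(D | S^h) = B(a + h, b + t) / B(a, b) for h heads and t tails.

def log_beta_fn(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def log_marginal_likelihood(a, b, h, t):
    return log_beta_fn(a + h, b + t) - log_beta_fn(a, b)

def log_score(log_structure_prior, a, b, h, t):
    return log_structure_prior + log_marginal_likelihood(a, b, h, t)

# Uniform Beta(1, 1) prior, data h = 2, t = 3, structure prior 0.5 (illustrative)
print(log_score(math.log(0.5), 1, 1, 2, 3))    # log(0.5) + log(1/60) = -4.787...
```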

  38. Local Criteria. An example: a Bayesian network structure for medical diagnosis. [Diagram: an Ailment node with arcs to Finding 1, Finding 2, ..., Finding n.]

  39. Priors. To compute the relative posterior probability we assess:
  • structure priors p(S^h)
  • parameter priors p(θs | S^h)

  40. Priors on Network Parameters. Key concepts:
  • Independence equivalence
  • Distribution equivalence

  41. Illustration of Independence Equivalence. The independence assertion: X and Z are conditionally independent given Y. [Diagram: the three structures X → Y → Z, X ← Y → Z, and X ← Y ← Z all encode exactly this assertion and are therefore independence equivalent.]
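
A small numeric check of the shared assertion, using invented CPT values: build a joint as p(y) p(x|y) p(z|y) (the middle structure) and verify that p(z | x, y) does not depend on x:

```python
# Verify X ⊥ Z | Y for a joint built as p(y) p(x|y) p(z|y).
# All probability values are invented for illustration.

p_y = {0: 0.3, 1: 0.7}
p_x1_given_y = {0: 0.2, 1: 0.9}     # P(X = 1 | Y)
p_z1_given_y = {0: 0.5, 1: 0.1}     # P(Z = 1 | Y)

def joint(x, y, z):
    px = p_x1_given_y[y] if x else 1 - p_x1_given_y[y]
    pz = p_z1_given_y[y] if z else 1 - p_z1_given_y[y]
    return p_y[y] * px * pz

for y in (0, 1):
    for z in (0, 1):
        p_z_given_y = sum(joint(x, y, z) for x in (0, 1)) / p_y[y]
        for x in (0, 1):
            p_xy = joint(x, y, 0) + joint(x, y, 1)   # P(X = x, Y = y)
            assert abs(joint(x, y, z) / p_xy - p_z_given_y) < 1e-12
print("p(z | x, y) = p(z | y): X and Z are conditionally independent given Y")
```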

  42. Priors on Structures. Various methods:
  • Assume that every hypothesis is equally likely (usually for convenience)
  • Order the variables and assume that the presence or absence of arcs are mutually independent
  • Use prior networks
  • Use imaginary data obtained from domain experts

  43. Benefits of Learning Structures
  • Efficient learning: more accurate models with less data
  • For example, learning P(A) and P(B) separately requires less data than learning the joint P(A,B)
  • Discover structural properties of the domain
  • Helps to order events that occur sequentially, and in sensitivity analysis and inference
  • Predict the effects of actions

  44. Search Methods. Problem: find the best network from the set of all networks in which each node has no more than k parents. Search techniques (a greedy sketch follows below):
  • Greedy search
  • Greedy search with restarts
  • Best-first search
  • Monte Carlo methods
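
A structural sketch of plain greedy (hill-climbing) search; `score` and `neighbours` are assumed callbacks standing in for a scoring criterion such as the one on slide 37 and a generator of legal structures reachable by one arc change:

```python
# Greedy hill-climbing over network structures: repeatedly move to the
# best-scoring neighbour (one arc added, deleted, or reversed) until no
# single change improves the score.

def greedy_search(initial_dag, neighbours, score, max_parents=3):
    current, current_score = initial_dag, score(initial_dag)
    while True:
        best, best_score = None, current_score
        for candidate in neighbours(current, max_parents):
            s = score(candidate)
            if s > best_score:                 # strictly better neighbour
                best, best_score = candidate, s
        if best is None:                       # local optimum reached
            return current
        current, current_score = best, best_score
```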

  45. Bayesian Networks for Supervised and Unsupervised Learning. Supervised learning: a Bayesian network provides a natural representation in which to encode prior knowledge. Unsupervised learning:
  • Apply the learning technique to select a model with no hidden variables
  • Look for sets of mutually dependent variables in the model
  • Create a new model with a hidden variable
  • Score the new models, possibly finding one better than the original

  46. What is all this good for, anyway? Implementations in real life:
  • Microsoft products (Microsoft Office)
  • Medical applications and biostatistics (BUGS)
  • NASA's AutoClass project for data analysis
  • Collaborative filtering (Microsoft MSBN)
  • Fraud detection (AT&T)
  • Speech recognition (UC Berkeley)

  47. Limitations of Bayesian Networks
  • They typically require initial knowledge of many probabilities; the quality and extent of prior knowledge play an important role
  • Significant computational cost (learning and inference are NP-hard tasks)
  • An unanticipated event, assigned no probability in advance, is not handled

  48. Conclusion. [Diagram: Data + prior knowledge → Inducer → Bayesian network.]

  49. Some Comments
  • Cross-fertilization with other techniques? For example, with decision trees, R-trees, and neural networks
  • Improvements in search techniques using classical search methods?
  • Applications in other areas, such as estimating population birth and death rates, or financial applications?
