Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor
Martin Pelikan, David E. Goldberg, and Kumara Sastry
IlliGAL Report No. 2000020, May 2000
Abstract • The use of various scoring metrics for Bayesian networks. • The use of decision graphs in Bayesian networks to improve the performance of the BOA. • BDe metric for Bayesian networks with decision graphs.
Bayesian Networks • Two basic components of learning Bayesian networks • A scoring metric that discriminates among candidate networks • A search algorithm that finds a network with the best metric value • BOA (in previous work) • The complexity of the considered models was bounded by the maximum number of incoming edges into any node. • To search the space of networks, a simple greedy algorithm was used because of its efficiency.
Bayesian-Dirichlet Metric • The BDe metric combines prior knowledge about the problem with the statistical data from a given data set. • It follows from Bayes’ theorem (see below). • The higher p(B|D), the more likely the network B is a correct model of the data; p(B|D) is called the Bayesian scoring metric, or the posterior probability of B. • Moreover, the data set D is fixed, so the denominator p(D) is the same for all networks.
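The posterior probability used as the score follows directly from Bayes’ theorem; because the data set D is fixed, the denominator is constant across networks:

```latex
% Bayes' theorem for a network B and data D; p(D) does not depend on B,
% so maximizing p(B|D) amounts to maximizing p(D|B) p(B).
p(B \mid D) = \frac{p(D \mid B)\, p(B)}{p(D)}
```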
Bayesian-Dirichlet Metric • p(B): the prior probability of the network B. • Through this prior, the BDe metric can give preference to simpler networks. • But this preference alone is not enough to keep the learned models from becoming overly complex.
Bayesian-Dirichlet Metric • Assumptions behind the closed form of p(B|D): • The data is a multinomial sample. • The parameters are independent: • the parameters associated with each variable are independent (global parameter independence), • the parameters associated with each instance of the parents of a variable are independent (local parameter independence). • The parameter priors follow a Dirichlet distribution. • There is no missing data (the data set is complete).
Bayesian-Dirichlet Metric • With an uninformative prior (all prior counts equal to one), the BDe metric is often referred to as the K2 metric (see the formula below).
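For reference, the closed form of the BDe likelihood under the assumptions above is sketched below. The notation is a paraphrase of the report’s: m(π_i) counts the data instances with the parents of x_i in a given configuration, m(x_i, π_i) additionally fixes the value of x_i, and the primed quantities are the corresponding prior counts; the exact symbols in the report may differ.

```latex
% BDe likelihood (sketch); the products run over variables x_i, over the
% configurations \pi_i of their parents, and over the values of x_i.
p(D \mid B) = \prod_{i} \prod_{\pi_i}
    \frac{\Gamma\bigl(m'(\pi_i)\bigr)}{\Gamma\bigl(m'(\pi_i) + m(\pi_i)\bigr)}
    \prod_{x_i}
    \frac{\Gamma\bigl(m'(x_i,\pi_i) + m(x_i,\pi_i)\bigr)}{\Gamma\bigl(m'(x_i,\pi_i)\bigr)}

% The K2 metric is the special case with all prior counts m'(x_i,\pi_i) = 1,
% which turns the Gamma functions into factorials.
```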
Minimum Description Length Metric • Unlike the BDe metric, the MDL metric is not well suited to incorporating prior information about the problem.
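For contrast, a generic MDL-style score trades the fit of the data against an explicit penalty on the number of free parameters; the sketch below is a common formulation, not necessarily the exact one used in the report.

```latex
% Generic MDL-style score for a network B with |\Theta_B| free parameters,
% maximum-likelihood estimates \hat{\Theta}_B, and N data points; lower is better.
MDL(B, D) = -\log_2 p\bigl(D \mid B, \hat{\Theta}_B\bigr) + \frac{|\Theta_B|}{2} \log_2 N
```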
Constructing a Network • Constructing the best network is NP-complete. • Most commonly used metrics can be decomposed into independent terms, each of which corresponds to one variable; this makes a simple greedy search practical (a sketch follows). • Empirical results show that more sophisticated search algorithms do not improve the obtained results significantly.
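The following is a minimal sketch of such a greedy search over edge additions, assuming a hypothetical `local_score(node, parents, data)` that returns the metric term for one variable (e.g. its BDe contribution); it is not the exact implementation from the report.

```python
from itertools import product

def creates_cycle(parents, child, new_parent):
    """Return True if adding the edge new_parent -> child would close a cycle."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_construct(nodes, data, local_score, max_parents=2):
    """Greedily add the single edge that most improves the decomposable score."""
    parents = {x: set() for x in nodes}
    while True:
        best_gain, best_edge = 0.0, None
        for src, dst in product(nodes, repeat=2):
            if src == dst or src in parents[dst]:
                continue
            if len(parents[dst]) >= max_parents or creates_cycle(parents, dst, src):
                continue
            # decomposability: only the term of dst changes when adding src -> dst
            gain = (local_score(dst, parents[dst] | {src}, data)
                    - local_score(dst, parents[dst], data))
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:          # no edge improves the score: stop
            break
        src, dst = best_edge
        parents[dst].add(src)
    return parents
```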
Decision Graphs in Bayesian Networks • The use of local structures such as decision trees, decision graphs, and default tables to represent equalities among parameters has been proposed. • The network construction algorithm takes advantage of decision graphs by manipulating the network structure directly through the graphs.
Decision Graphs • A decision graph is an extension of a decision tree in which each non-root node can have multiple parents.
Advantages of Decision Graphs • Far fewer parameters can be needed to represent a model. • A more complex class of models, called Bayesian multinets, can be learned. • The construction algorithm performs smaller and more specific steps, which results in better models with respect to their likelihood. • A measure of network complexity can be incorporated into the scoring metric.
Operators on Decision Graphs • Split: a leaf is split on some variable, creating one new leaf for each value of that variable. • Merge: two leaves are merged into a single leaf shared by their parents (a sketch of both operators follows).
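A minimal sketch of the two operators on the decision graph of a single variable, with illustrative class and function names (assumptions for this sketch, not the report’s code):

```python
class Leaf:
    """A leaf of the decision graph; stores frequency counts for one variable."""
    def __init__(self, counts):
        self.counts = counts              # e.g. {0: n0, 1: n1} for a binary x_i

class Split:
    """An internal node that tests one parent variable."""
    def __init__(self, variable, children):
        self.variable = variable          # parent variable tested at this node
        self.children = children          # one child per value of `variable`

def split_leaf(variable, n_values):
    """Split: replace a leaf by a node testing `variable`, with one fresh leaf
    per value; the counts in the new leaves must be recomputed from the data."""
    return Split(variable, [Leaf({}) for _ in range(n_values)])

def merge_leaves(leaf_a, leaf_b):
    """Merge: join two leaves into one shared leaf (both former parents then
    point to the same leaf, which is what turns the tree into a graph)."""
    counts = dict(leaf_a.counts)
    for value, n in leaf_b.counts.items():
        counts[value] = counts.get(value, 0) + n
    return Leaf(counts)
```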
Constructing BN with DG
1. Initialize the decision graph Gi for each node xi to a graph containing only a single leaf.
2. Initialize the network B to an empty network.
3. Choose the best split or merge that does not result in a cycle in B.
4. If the best operator does not improve the score, finish.
5. Execute the chosen operator.
6. If the operator was a split, update the network B by adding the corresponding edge.
7. Go to step 3.
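A sketch of this loop, reusing the `Leaf` class from the previous sketch; `all_splits_and_merges`, `score_gain`, and `would_create_cycle` are hypothetical helpers standing in for operator enumeration, metric evaluation, and the acyclicity check:

```python
def construct_with_decision_graphs(nodes, data, score_gain,
                                   all_splits_and_merges, would_create_cycle):
    graphs = {x: Leaf({}) for x in nodes}        # step 1: single-leaf graphs
    edges = set()                                 # step 2: empty network B
    while True:
        # step 3: enumerate splits/merges that keep B acyclic and score them
        candidates = [(op, score_gain(op, graphs, data))
                      for op in all_splits_and_merges(graphs)
                      if not would_create_cycle(op, edges)]
        if not candidates:
            break
        best_op, best_gain = max(candidates, key=lambda c: c[1])
        if best_gain <= 0:                        # step 4: no improvement, finish
            break
        best_op.apply(graphs)                     # step 5: execute the operator
        if best_op.kind == "split":               # step 6: a split adds an edge
            edges.add((best_op.variable, best_op.target))
        # step 7: repeat
    return graphs, edges
```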
Experiments • One-max • 3-deceptive • Spin-glass • Graph bisection
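Illustrative definitions of the first two test functions (the spin-glass and graph-bisection instances depend on external problem data and are omitted); the trap values below are those commonly used for the fully deceptive function of order 3 and may differ slightly from the report:

```python
def onemax(bits):
    """OneMax: the fitness is the number of ones in the string."""
    return sum(bits)

def three_deceptive(bits):
    """3-deceptive: sum of fully deceptive order-3 traps over consecutive,
    non-overlapping 3-bit blocks (assumes the length is a multiple of 3).
    Any block that is not all ones pulls search away from the optimum."""
    trap = {0: 0.9, 1: 0.8, 2: 0.0, 3: 1.0}      # commonly used trap values
    return sum(trap[sum(bits[i:i + 3])] for i in range(0, len(bits), 3))
```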