Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor
Martin Pelikan, David E. Goldberg, and Kumara Sastry
IlliGAL Report No. 2000020, May 2000
Abstract • The use of various scoring metrics for Bayesian networks. • The use of decision graphs in Bayesian networks to improve the performance of the BOA. • BDe metric for Bayesian networks with decision graphs.
Bayesian Networks • Two basic components of learning Bayesian networks • A scoring metric that discriminates among candidate networks • A search algorithm that finds a network with the best metric value • BOA (in previous work) • The complexity of the considered models was bounded by the maximum number of incoming edges into any node. • To search the space of networks, a simple greedy algorithm was used because of its efficiency.
Bayesian-Dirichlet Metric • The BDe metric combines prior knowledge about the problem with the statistical data from a given data set. • It follows from Bayes’ theorem (see below). • The higher p(B|D), the more likely the network B is a correct model of the data; p(B|D) is called the Bayesian scoring metric, or the posterior probability of B. • Moreover, the data set D is fixed, so the denominator p(D) is the same for all networks.
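The posterior probability used as the score follows directly from Bayes’ theorem; because the data set D is fixed, the denominator is constant across networks:

```latex
% Bayes' theorem for a network B and data D; p(D) does not depend on B,
% so maximizing p(B|D) amounts to maximizing p(D|B) p(B).
p(B \mid D) = \frac{p(D \mid B)\, p(B)}{p(D)}
```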
Bayesian-Dirichlet Metric • p(B): the prior probability of the network B. • Through this prior, the BDe metric can give preference to simpler networks. • But this preference alone is not enough to keep the learned models from becoming overly complex.
Bayesian-Dirichlet Metric • Assumptions behind the closed form of p(B|D): • The data is a multinomial sample. • The parameters are independent: • the parameters associated with each variable are independent (global parameter independence), • the parameters associated with each instance of the parents of a variable are independent (local parameter independence). • The parameter priors follow a Dirichlet distribution. • There is no missing data (the data set is complete).
Bayesian-Dirichlet Metric • With an uninformative prior (all prior counts equal to one), the BDe metric is often referred to as the K2 metric (see the formula below).
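For reference, the closed form of the BDe likelihood under the assumptions above is sketched below. The notation is a paraphrase of the report’s: m(π_i) counts the data instances with the parents of x_i in a given configuration, m(x_i, π_i) additionally fixes the value of x_i, and the primed quantities are the corresponding prior counts; the exact symbols in the report may differ.

```latex
% BDe likelihood (sketch); the products run over variables x_i, over the
% configurations \pi_i of their parents, and over the values of x_i.
p(D \mid B) = \prod_{i} \prod_{\pi_i}
    \frac{\Gamma\bigl(m'(\pi_i)\bigr)}{\Gamma\bigl(m'(\pi_i) + m(\pi_i)\bigr)}
    \prod_{x_i}
    \frac{\Gamma\bigl(m'(x_i,\pi_i) + m(x_i,\pi_i)\bigr)}{\Gamma\bigl(m'(x_i,\pi_i)\bigr)}

% The K2 metric is the special case with all prior counts m'(x_i,\pi_i) = 1,
% which turns the Gamma functions into factorials.
```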
Minimum Description Length Metric • Unlike the BDe metric, the MDL metric is not well suited to incorporating prior information about the problem.
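For contrast, a generic MDL-style score trades the fit of the data against an explicit penalty on the number of free parameters; the sketch below is a common formulation, not necessarily the exact one used in the report.

```latex
% Generic MDL-style score for a network B with |\Theta_B| free parameters,
% maximum-likelihood estimates \hat{\Theta}_B, and N data points; lower is better.
MDL(B, D) = -\log_2 p\bigl(D \mid B, \hat{\Theta}_B\bigr) + \frac{|\Theta_B|}{2} \log_2 N
```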
Constructing a Network • Constructing the best network is NP-complete. • Most commonly used metrics can be decomposed into independent terms, each of which corresponds to one variable; this makes a simple greedy search practical (a sketch follows). • Empirical results show that more sophisticated search algorithms do not improve the obtained results significantly.
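The following is a minimal sketch of such a greedy search over edge additions, assuming a hypothetical `local_score(node, parents, data)` that returns the metric term for one variable (e.g. its BDe contribution); it is not the exact implementation from the report.

```python
from itertools import product

def creates_cycle(parents, child, new_parent):
    """Return True if adding the edge new_parent -> child would close a cycle."""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_construct(nodes, data, local_score, max_parents=2):
    """Greedily add the single edge that most improves the decomposable score."""
    parents = {x: set() for x in nodes}
    while True:
        best_gain, best_edge = 0.0, None
        for src, dst in product(nodes, repeat=2):
            if src == dst or src in parents[dst]:
                continue
            if len(parents[dst]) >= max_parents or creates_cycle(parents, dst, src):
                continue
            # decomposability: only the term of dst changes when adding src -> dst
            gain = (local_score(dst, parents[dst] | {src}, data)
                    - local_score(dst, parents[dst], data))
            if gain > best_gain:
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:          # no edge improves the score: stop
            break
        src, dst = best_edge
        parents[dst].add(src)
    return parents
```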
Decision Graphs in Bayesian Networks • The use of local structures such as decision trees, decision graphs, and default tables to represent equalities among parameters has been proposed. • The network construction algorithm takes advantage of decision graphs by manipulating the network structure directly through the graphs.
Decision Graphs • A decision graph is an extension of a decision tree in which each non-root node can have multiple parents.
Advantages of Decision Graphs • Far fewer parameters can be needed to represent a model. • A more complex class of models, called Bayesian multinets, can be learned. • The construction algorithm performs smaller and more specific steps, which results in better models with respect to their likelihood. • A measure of network complexity can be incorporated into the scoring metric.
Operators on Decision Graphs • Split: a leaf is split on some variable, creating one new leaf for each value of that variable. • Merge: two leaves are merged into a single leaf shared by their parents (a sketch of both operators follows).
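A minimal sketch of the two operators on the decision graph of a single variable, with illustrative class and function names (assumptions for this sketch, not the report’s code):

```python
class Leaf:
    """A leaf of the decision graph; stores frequency counts for one variable."""
    def __init__(self, counts):
        self.counts = counts              # e.g. {0: n0, 1: n1} for a binary x_i

class Split:
    """An internal node that tests one parent variable."""
    def __init__(self, variable, children):
        self.variable = variable          # parent variable tested at this node
        self.children = children          # one child per value of `variable`

def split_leaf(variable, n_values):
    """Split: replace a leaf by a node testing `variable`, with one fresh leaf
    per value; the counts in the new leaves must be recomputed from the data."""
    return Split(variable, [Leaf({}) for _ in range(n_values)])

def merge_leaves(leaf_a, leaf_b):
    """Merge: join two leaves into one shared leaf (both former parents then
    point to the same leaf, which is what turns the tree into a graph)."""
    counts = dict(leaf_a.counts)
    for value, n in leaf_b.counts.items():
        counts[value] = counts.get(value, 0) + n
    return Leaf(counts)
```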
Constructing BN with DG
1. Initialize the decision graph Gi for each node xi to a graph containing only a single leaf.
2. Initialize the network B to an empty network.
3. Choose the best split or merge that does not result in a cycle in B.
4. If the best operator does not improve the score, finish.
5. Execute the chosen operator.
6. If the operator was a split, update the network B by adding the corresponding edge.
7. Go to step 3.
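A sketch of this loop, reusing the `Leaf` class from the previous sketch; `all_splits_and_merges`, `score_gain`, and `would_create_cycle` are hypothetical helpers standing in for operator enumeration, metric evaluation, and the acyclicity check:

```python
def construct_with_decision_graphs(nodes, data, score_gain,
                                   all_splits_and_merges, would_create_cycle):
    graphs = {x: Leaf({}) for x in nodes}        # step 1: single-leaf graphs
    edges = set()                                 # step 2: empty network B
    while True:
        # step 3: enumerate splits/merges that keep B acyclic and score them
        candidates = [(op, score_gain(op, graphs, data))
                      for op in all_splits_and_merges(graphs)
                      if not would_create_cycle(op, edges)]
        if not candidates:
            break
        best_op, best_gain = max(candidates, key=lambda c: c[1])
        if best_gain <= 0:                        # step 4: no improvement, finish
            break
        best_op.apply(graphs)                     # step 5: execute the operator
        if best_op.kind == "split":               # step 6: a split adds an edge
            edges.add((best_op.variable, best_op.target))
        # step 7: repeat
    return graphs, edges
```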
Experiments • One-max • 3-deceptive • Spin-glass • Graph bisection
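Illustrative definitions of the first two test functions (the spin-glass and graph-bisection instances depend on external problem data and are omitted); the trap values below are those commonly used for the fully deceptive function of order 3 and may differ slightly from the report:

```python
def onemax(bits):
    """OneMax: the fitness is the number of ones in the string."""
    return sum(bits)

def three_deceptive(bits):
    """3-deceptive: sum of fully deceptive order-3 traps over consecutive,
    non-overlapping 3-bit blocks (assumes the length is a multiple of 3).
    Any block that is not all ones pulls search away from the optimum."""
    trap = {0: 0.9, 1: 0.8, 2: 0.0, 3: 1.0}      # commonly used trap values
    return sum(trap[sum(bits[i:i + 3])] for i in range(0, len(bits), 3))
```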