Explore the linkage problem, distribution estimation, and Bayesian networks in evolutionary computation. Learn about solutions, algorithms, capabilities, and difficulties of these concepts.
Linkage Problem, Distribution Estimation, and Bayesian Networks • Evolutionary Computation 8(3) • Martin Pelikan, David E. Goldberg, and Erick Cantu-Paz
Linkage problem • The problem of building block disruption • Due to crossover • Solutions • Changing the representation of solutions • Evolving the recombination operators • Extracting some information from the entire set of promising solutions in order to generate new solutions
Evolving Representation or Operators • The representation of solutions is adapted so that the interacting components of partial solutions are less likely to be broken by recombination. • Various reordering and mapping operators. • Too slow, not sufficiently powerful • Premature convergence. • Messy Genetic Algorithm • Linkage Learning Genetic Algorithm
Probabilistic Modeling • Estimation of Distribution Algorithms • No crossover • New solutions are generated by using the information extracted from the entire set of promising solutions. • How to extract the information?
No Interaction • Population Based Incremental Learning (PBIL) (1994) • Compact Genetic Algorithm (cGA) (1998) • Univariate Marginal Distribution Algorithm (UMDA) (1997)
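The no-interaction family can be illustrated with a minimal UMDA-style sketch: estimate one marginal probability per bit from the selected strings, then sample each bit independently. The OneMax fitness (`sum`), truncation selection, and all parameter values below are assumptions chosen for illustration, not settings taken from the slides.

```python
import random

def umda_onemax(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """UMDA-style loop on OneMax (fitness = number of ones); illustrative only."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the n_select fittest strings.
        pop.sort(key=sum, reverse=True)
        selected = pop[:n_select]
        # Model: one independent marginal P(bit i = 1) per position.
        p = [sum(s[i] for s in selected) / n_select for i in range(n_bits)]
        # Sample a whole new population from the product of marginals.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
    return max(pop, key=sum)

best = umda_onemax()
```

Because every bit is modeled on its own, a model of this kind cannot represent which bits belong to the same building block.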
Pairwise Interaction • Dependency tree (1997) • Mutual-Information-Maximization Input Clustering (MIMIC) (1997) • Bivariate Marginal Distribution Algorithm (BMDA) (1999)
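Pairwise models need a measure of how strongly two variables depend on each other. As a hedged illustration, the sketch below computes the empirical mutual information between two positions of the selected strings, the kind of statistic that dependency trees and MIMIC build on; the function name and data layout are assumptions for this example.

```python
from math import log
from collections import Counter

def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between positions i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    marg_i = Counter(row[i] for row in data)
    marg_j = Counter(row[j] for row in data)
    # Sum over observed value pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    return sum((c / n) * log(c * n / (marg_i[a] * marg_j[b]))
               for (a, b), c in joint.items())

# Position 1 copies position 0; position 2 is independent of both.
data = [[a, a, b] for a in (0, 1) for b in (0, 1)] * 10
mi_linked = mutual_information(data, 0, 1)
mi_independent = mutual_information(data, 0, 2)
```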
Multivariate Interactions • Factorized Distribution Algorithm (FDA) (1998) • Extended Compact Genetic Algorithm (ECGA) (1999) • Bayesian Optimization Algorithm (BOA) (1999)
Multivariate Interactions • Iterative Density Estimation Evolutionary Algorithm (IDEA) (2000) • Bayesian Network (1999) • Gaussian Network (1999) • Bayesian Evolutionary Optimization (Helmholtz Machine) (2000) • Probabilistic Principal Component Analysis (PPCA) (2001)
Capabilities & Difficulties • No interactions • Capability: efficient on linear problems. • Difficulty: higher-order building blocks (BBs). • Pairwise • Capability: efficient with BBs of order 2. • Difficulty: higher-order BBs.
Capabilities & Difficulties • FDA • Capability: efficient on decomposable problems. • Difficulty: prior information is essential. • ECGA • Capability: efficient on separable problems. • Difficulty: highly overlapping BBs. • BOA • General.
The Bayesian Optimization Algorithm (BOA) • BOA uses the same class of distributions as the FDA. • It does not require a valid distribution factorization as input. • It is able to learn the distribution on the fly without using any problem-specific information. • Prior information can be incorporated.
BOA • (1) Set t ← 0 and randomly generate the initial population P(0). • (2) Select a set of promising strings S(t) from P(t). • (3) Construct the network B using a chosen metric and constraints. • (4) Generate a set of new strings O(t) according to the joint distribution encoded by B. • (5) Create a new population P(t+1) by replacing some strings from P(t) with O(t). Set t ← t+1. • (6) If the termination criteria are not met, go to (2).
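The loop above can be sketched as a skeleton with the model-building and sampling steps passed in as functions. Everything named here (`boa_loop`, `learn_model`, `sample_model`, the parameter defaults, truncation selection, replace-worst, and a fixed generation budget as the termination criterion) is an assumption made for illustration. The univariate stand-in model at the bottom only exercises the skeleton; it is not a Bayesian network.

```python
import random

def boa_loop(fitness, n_bits, learn_model, sample_model,
             pop_size=100, n_select=50, n_offspring=50, max_gens=30, seed=0):
    rng = random.Random(seed)
    # Step 1: t <- 0, random initial population P(0).
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(max_gens):  # Step 6: assumed termination = generation budget.
        # Step 2: select promising strings S(t) (truncation selection here).
        pop.sort(key=fitness, reverse=True)
        selected = pop[:n_select]
        # Step 3: build the model B from S(t).
        model = learn_model(selected)
        # Step 4: generate new strings O(t) from the model.
        offspring = [sample_model(model, rng) for _ in range(n_offspring)]
        # Step 5: P(t+1) replaces the worst strings of P(t) with O(t).
        pop = pop[:pop_size - n_offspring] + offspring
    return max(pop, key=fitness)

# Stand-in model (independent marginals) so the skeleton can run on its own.
def learn_marginals(selected):
    n = len(selected)
    return [sum(s[i] for s in selected) / n for i in range(len(selected[0]))]

def sample_marginals(p, rng):
    return [1 if rng.random() < pi else 0 for pi in p]

best = boa_loop(sum, 16, learn_marginals, sample_marginals)
```

Swapping the two stand-in functions for a network learner and a network sampler would give the structure the slide describes.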
Bayesian Networks • The Bayesian Dirichlet metric (BDe) • Parametric learning • Greedy algorithms • Structure learning
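As one concrete scoring option, the sketch below computes the logarithm of the K2 metric for a single binary variable given a candidate parent set. K2 is the special case of Bayesian Dirichlet scoring with uninformative priors; using it here, along with the function name and data layout, is an assumption for illustration.

```python
from math import lgamma
from collections import Counter

def log_k2_node(data, child, parents):
    """Log K2 score contribution of binary `child` given `parents`."""
    counts = Counter()
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
    score = 0.0
    for config in {c for c, _ in counts}:
        n0 = counts[(config, 0)]
        n1 = counts[(config, 1)]
        # For r = 2 values: (r-1)! / (N + r - 1)! * n0! * n1!, in log form,
        # using lgamma(k + 1) = log(k!).
        score += lgamma(2) - lgamma(n0 + n1 + 2) + lgamma(n0 + 1) + lgamma(n1 + 1)
    return score

# Position 1 copies position 0, so adding 0 as a parent should raise the score.
data = [[0, 0]] * 10 + [[1, 1]] * 10
with_parent = log_k2_node(data, 1, [0])
without_parent = log_k2_node(data, 1, [])
```

The score of a whole network is the sum of these per-variable terms, which is what lets a search procedure evaluate a single edge change locally.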
Greedy algorithm for network construction • (1) Initialize the network B. • (2) Choose all simple graph operations that can be performed on the network without violating the constraints. • (3) Pick the operation that increases the score of the network the most. • (4) Perform the operation picked in the previous step. • (5) If the network can no longer be improved under the given constraints on its complexity, or a maximal number of iterations has been reached, finish. • (6) Go to (2).
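A minimal sketch of the greedy construction follows, under several assumptions: the network starts empty, the only graph operation is edge addition, the constraints are acyclicity and a cap on parents per node, and the score is a compact log K2 term for binary data. All names are hypothetical.

```python
from math import lgamma
from collections import Counter
from itertools import permutations

def node_score(data, child, parents):
    """Compact log K2 score of binary `child` given `parents` (assumed metric)."""
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    total = 0.0
    for cfg in {c for c, _ in counts}:
        n0, n1 = counts[(cfg, 0)], counts[(cfg, 1)]
        total += lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)
    return total

def creates_cycle(parents, src, dst):
    """True if dst is already an ancestor of src, so src -> dst closes a cycle."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_network(data, n_vars, max_parents=2, max_iters=100):
    parents = {i: [] for i in range(n_vars)}             # step 1: empty network
    for _ in range(max_iters):
        best_gain, best_edge = 0.0, None
        for src, dst in permutations(range(n_vars), 2):  # step 2: legal additions
            if src in parents[dst] or len(parents[dst]) >= max_parents:
                continue
            if creates_cycle(parents, src, dst):
                continue
            gain = (node_score(data, dst, parents[dst] + [src])
                    - node_score(data, dst, parents[dst]))
            if gain > best_gain:                         # step 3: best improvement
                best_gain, best_edge = gain, (src, dst)
        if best_edge is None:                            # step 5: no improvement
            break
        parents[best_edge[1]].append(best_edge[0])       # step 4: apply it
    return parents

# Position 1 copies position 0; position 2 is independent.
data = [[a, a, b] for a in (0, 1) for b in (0, 1)] * 10
net = greedy_network(data, 3)
```

On this toy data the search is expected to link positions 0 and 1 and leave position 2 without parents.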
Generation of a new instance • (1) Mark all variables as unprocessed. • (2) Pick an unprocessed variable Xi whose parents have all been processed already. • (3) Set Xi to xi with probability p(Xi = xi | ΠXi = πXi), where ΠXi are the parents of Xi in the network and πXi are the values already assigned to them. • (4) Mark Xi as processed. • (5) If there are unprocessed variables left, go to (2).
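The generation procedure can be sketched as follows for binary variables. The data layout is an assumption: `parents` maps each variable to its list of parents, and `cpt` maps each variable to a table from parent-value tuples to the probability that the variable equals 1. The network is assumed to be acyclic, otherwise no eligible variable would be found.

```python
import random

def sample_instance(parents, cpt, rng):
    """Draw one string from the network by processing parents before children."""
    values = {}
    unprocessed = set(parents)                       # step 1
    while unprocessed:                               # step 5
        # Step 2: an unprocessed variable whose parents all have values.
        var = next(v for v in sorted(unprocessed)
                   if all(p in values for p in parents[v]))
        config = tuple(values[p] for p in parents[var])
        # Step 3: sample from P(var = 1 | parent values).
        values[var] = 1 if rng.random() < cpt[var][config] else 0
        unprocessed.remove(var)                      # step 4
    return [values[v] for v in sorted(values)]

# Toy network 0 -> 1 in which variable 1 always copies variable 0.
parents = {0: [], 1: [0]}
cpt = {0: {(): 0.5}, 1: {(0,): 0.0, (1,): 1.0}}
rng = random.Random(0)
samples = [sample_instance(parents, cpt, rng) for _ in range(20)]
```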
Additively Decomposable Functions • Additively decomposable functions (ADF) • Can be decomposed into smaller subproblems • Order-k decomposable function • There exists a set of l functions fi over subsets of variables Si for i = 0, …, l-1, each of size at most k, such that f(X) = Σ_{i=0}^{l-1} fi(Si).
ADF, the Interactions • ADFs that can be decomposed by using only nonoverlapping sets. • Subfunctions are independent. • Overlapping sets.
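To make the two cases concrete, the sketch below evaluates an additively decomposable function as a sum of subfunctions over index subsets, once with nonoverlapping subsets and once with subsets that share a variable. The order-3 trap subfunction `trap3` and the chosen subsets are hypothetical examples, not functions taken from the slides.

```python
def adf_value(x, subsets, subfunctions):
    """Sum of subfunctions, each applied to its own subset of positions in x."""
    return sum(f(tuple(x[i] for i in s)) for s, f in zip(subsets, subfunctions))

def trap3(bits):
    """Hypothetical order-3 subfunction: best at all ones, deceptive elsewhere."""
    u = sum(bits)
    return 3 if u == 3 else 2 - u

# Nonoverlapping subsets: the two subfunctions are independent.
separable = adf_value([1, 1, 1, 0, 0, 0], [(0, 1, 2), (3, 4, 5)], [trap3, trap3])

# Overlapping subsets: position 2 contributes to both subfunctions.
overlapping = adf_value([1, 1, 1, 0, 0], [(0, 1, 2), (2, 3, 4)], [trap3, trap3])
```

In the overlapping case, changing the shared position alters both subfunction values at once, so the subproblems can no longer be solved independently.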
Future Works • Bayesian Optimization Algorithm, Population Sizing, and Time to Convergence • Hierarchical Problem Solving by the Bayesian Optimization Algorithm • Genetic Algorithms, Clustering, and Breaking of Symmetry (PPSN 2000) • Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor