Explicit Modelling in Metaheuristic Optimization. Dr Marcus Gallagher School of Information Technology and Electrical Engineering University of Queensland Q. 4072 marcusg@itee.uq.edu.au.
Talk outline: • Optimization, heuristics and metaheuristics. • “Estimation of Distribution” (optimization) algorithms (EDAs): a brief overview. • A framework for describing EDAs. • Other modelling approaches in metaheuristics. • Summary Marcus Gallagher - MASCOS Symposium, 26/11/04
“Hard” Optimization Problems • Goal: Find $x^* = \arg\min_{x \in S} f(x)$, where S is often multi-dimensional; real-valued or binary • Many classes of optimization problems (and algorithms) exist. • When might it be worthwhile to consider metaheuristic or machine learning approaches?
Finding an “exact” solution is intractable. • Limited knowledge of f() • No derivative information. • May be discontinuous, noisy,… • Evaluating f() is expensive in terms of time or cost. • f() is known or suspected to contain nasty features • Many local minima, plateaus, ravines. • The search space is high-dimensional.
What is the “practical” goal of (global) optimization? • “There exists a goal (e.g. to find as small a value of f() as possible), there exist resources (e.g. some number of trials), and the problem is how to use these resources in an optimal way.” • A. Torn and A. Zilinskas, Global Optimization. Springer-Verlag, 1989. Lecture Notes in Computer Science, Vol. 350.
Heuristics • Heuristic (or approximate) algorithms aim to find a good solution to a problem in a reasonable amount of computation time – but with no guarantee of “goodness” or “efficiency” (cf. exact or complete algorithms). • Broad classes of heuristics: • Constructive methods • Local search methods
Metaheuristics • Metaheuristics are (roughly) high-level strategies that combine lower-level techniques for exploration and exploitation of the search space. • An overarching term to refer to algorithms including Evolutionary Algorithms, Simulated Annealing, Tabu Search, Ant Colony, Particle Swarm, Cross-Entropy,… • C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys, 35(3), 2003, pp. 268-308.
Learning/Modelling for Optimization • Most optimization algorithms make some (explicit or implicit) assumptions about the nature of f(). • Many algorithms vary their behaviour during execution (e.g. simulated annealing). • In some optimization algorithms the search is adaptive: future search points evaluated depend on previous points searched (and/or their f() values, derivatives of f(), etc.). • Learning/modelling can be implicit (e.g., adapting the step-size in gradient descent, or the population in an EA) • …or explicit; examples from the optimization literature: • Nelder-Mead simplex algorithm. • Response surfaces (metamodelling, surrogate functions).
EDAs: Probabilistic Modelling for Optimization • Based on the use of (unsupervised) density estimators/generative statistical models. • Idea is to convert the optimization problem into a search over probability distributions. • P. Larranaga and J. A. Lozano (eds.). Estimation of Distribution Algorithms: a new tool for evolutionary computation. Kluwer Academic Publishers, 2002. • The probabilistic model is in some sense an explicit model of (currently) promising regions of the search space.
GAs and EDAs compared • GA pseudocode: 1. Initialize the population, X(t); 2. Evaluate the objective function for each point; 3. Selection(); 4. Crossover(); 5. Mutation(); 6. Form new population X(t+1); 7. While !(terminate()) Goto 2;
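The GA loop above can be sketched in Python as follows. This is a minimal illustration, not the talk's implementation: the OneMax objective, tournament selection, one-point crossover, bit-flip mutation, and all parameter values are assumptions chosen only to make the pseudocode concrete.

```python
import random

def onemax(bits):
    """Toy objective (assumed for illustration): number of 1-bits, to be maximized."""
    return sum(bits)

def tournament(pop, fitness, rng, k=2):
    """Selection: return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]

def run_ga(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # Initialize the population X(t) with random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=onemax)
    for _ in range(generations):  # terminate() is a fixed generation budget here
        # Evaluate the objective function for each point.
        fitness = [onemax(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            a = tournament(pop, fitness, rng)   # Selection()
            b = tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_bits)      # Crossover(): one-point
            child = a[:cut] + b[cut:]
            # Mutation(): flip each bit independently with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop                           # Form new population X(t+1)
        best = max(pop + [best], key=onemax)    # remember the best point seen so far
    return best
```

Calling `run_ga()` returns the best bit string found within the generation budget. Note that the population itself is the only record of what has been learned about f().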
GAs and EDAs compared • EDA pseudocode: 1. Initialize a probability model, Q(x); 2. Create a population of points by sampling from Q(x); 3. Evaluate the objective function for each point; 4. Update Q(x) using selected population and f() values; 5. While !(terminate()) Goto 2;
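For comparison, the EDA loop can be sketched with a very simple continuous model. The choices below are assumptions for illustration only: Q(x) is a product of independent Gaussians, selection keeps the best `n_select` points (truncation selection), the model update is a maximum-likelihood refit, and the sphere function stands in for f().

```python
import math
import random

def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def run_gaussian_eda(dim=3, pop_size=50, n_select=15, iterations=40, seed=0):
    rng = random.Random(seed)
    # Initialize the probability model Q(x): one independent Gaussian per dimension.
    mean = [1.0] * dim
    std = [2.0] * dim
    for _ in range(iterations):  # terminate() is a fixed iteration budget here
        # Create a population of points by sampling from Q(x).
        pop = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
               for _ in range(pop_size)]
        # Evaluate the objective for each point and keep the best n_select.
        pop.sort(key=sphere)
        selected = pop[:n_select]
        # Update Q(x): refit mean and standard deviation to the selected points.
        for d in range(dim):
            vals = [x[d] for x in selected]
            mean[d] = sum(vals) / n_select
            var = sum((v - mean[d]) ** 2 for v in vals) / n_select
            std[d] = max(math.sqrt(var), 1e-6)  # floor avoids a degenerate model
    return mean, std
```

Unlike the GA, the state carried between iterations is the model (`mean`, `std`) rather than a population, which is what makes the model explicit.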
EDA Example 1 • Population-based Incremental Learning (PBIL) • S. Baluja, R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. ICML’95. • Model: a vector of independent bit probabilities, p_i = Pr(x_i = 1) for i = 1, …, n.
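A minimal PBIL-style sketch is given below. It keeps only the core idea (a probability vector nudged toward the best sample of each generation) and leaves out refinements such as mutating the probability vector. The OneMax objective, the learning rate `lr`, and the other parameter values are assumptions for illustration.

```python
import random

def onemax(bits):
    """Toy objective (assumed for illustration): number of 1-bits, to be maximized."""
    return sum(bits)

def run_pbil(n_bits=20, pop_size=30, iterations=60, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Model: p[i] = Pr(x_i = 1), starting from the uniform distribution.
    p = [0.5] * n_bits
    for _ in range(iterations):
        # Sample a population from the current probability vector.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
        best = max(pop, key=onemax)
        # Move each probability a small step toward the best sample's bit.
        p = [(1 - lr) * p[i] + lr * best[i] for i in range(n_bits)]
    return p
```

On OneMax the returned probabilities should drift toward 1, so that sampling from the final model tends to produce near-optimal strings.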
EDA Example 2 • Mutual Information Maximization for Input Clustering (MIMIC) • J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, vol.9, 1997.
EDA Example 3 • Combining Optimizers with Mutual Information Trees (COMIT) • S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial optimization: learning the structure of the search space. Proc. ICML’97. • Uses a tree-structured graphical model • Model can be constructed in O(n^2) time using a variant of the minimum spanning tree algorithm. • Model is optimal, given the restrictions, in the sense that the Kullback-Leibler divergence between the model and a full joint distribution is minimized.
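The tree-construction step can be illustrated with a small sketch. This is not the full COMIT algorithm: it only estimates pairwise mutual information from binary samples and builds a maximum-weight spanning tree with Prim's algorithm, which is the usual way such dependency trees are obtained. The function names and the `parent` list representation are assumptions for illustration.

```python
import math

def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between binary variables i and j."""
    n = len(samples)
    joint = {}
    for s in samples:
        key = (s[i], s[j])
        joint[key] = joint.get(key, 0) + 1
    # Marginal probabilities obtained by summing the joint counts.
    pi = {a: sum(c for (u, _), c in joint.items() if u == a) / n for a in (0, 1)}
    pj = {b: sum(c for (_, v), c in joint.items() if v == b) / n for b in (0, 1)}
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        mi += pab * math.log(pab / (pi[a] * pj[b]))
    return mi

def dependency_tree(samples):
    """Maximum-weight spanning tree over mutual information (Prim's algorithm).

    Returns parent[i] for every variable; the root (variable 0) has parent None.
    """
    n_vars = len(samples[0])
    mi = [[mutual_information(samples, i, j) if i != j else 0.0
           for j in range(n_vars)] for i in range(n_vars)]
    parent = [None] * n_vars
    in_tree = [False] * n_vars
    best_w = [-1.0] * n_vars
    in_tree[0] = True
    for j in range(1, n_vars):
        best_w[j] = mi[0][j]
        parent[j] = 0
    for _ in range(n_vars - 1):
        # Attach the outside variable most strongly tied to the current tree.
        k = max((j for j in range(n_vars) if not in_tree[j]),
                key=lambda j: best_w[j])
        in_tree[k] = True
        for j in range(n_vars):
            if not in_tree[j] and mi[k][j] > best_w[j]:
                best_w[j] = mi[k][j]
                parent[j] = k
    return parent
```

Given the pairwise mutual information values, the Prim loop itself visits each pair of variables a constant number of times, which is where the quadratic cost in the number of variables comes from.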
EDA Example 4 • Bayesian Optimization Algorithm (BOA) • M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In Proc. GECCO’99. • Bayesian network model where nodes can have at most k parents. • Greedy search over the Bayesian Dirichlet equivalence metric to find the network structure.
Further work on EDAs • EDAs have also been developed • For problems with continuous and mixed variables. • That use mixture models and kernel estimators - allowing for the modelling of multi-modal distributions. • …and more!
A framework to describe building and adapting a probabilistic model for optimization • See: M. Gallagher and M. Frean. Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift. To appear, Evolutionary Computation, 2005. • Consider a continuous EDA with model Q(x). • Consider a Boltzmann distribution over f(x): $P(x) = \frac{1}{Z} \exp(-f(x)/T)$, with temperature T and normalizing constant Z.
As T→0, P(x) tends towards a set of impulse spikes over the global optima. • Now, we have a probability distribution that we know the form of, Q(x), and we would like to modify it to be close to P(x). KL divergence: $K = \int Q(x) \log \frac{Q(x)}{P(x)} \, dx$ • Let Q(x) be a Gaussian; try to minimize K via gradient descent with respect to the mean parameter of Q(x).
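The limiting behaviour of the Boltzmann distribution as T→0 can be checked numerically on a finite set of points. The sketch below assumes minimization, so that probabilities are proportional to exp(-f/T); the function name and the example values are illustrative only.

```python
import math

def boltzmann(f_values, T):
    """Boltzmann probabilities proportional to exp(-f_i / T) over a finite set of points."""
    f_min = min(f_values)
    # Subtracting f_min leaves the distribution unchanged but avoids overflow.
    weights = [math.exp(-(f - f_min) / T) for f in f_values]
    z = sum(weights)
    return [w / z for w in weights]

# Four candidate points, two of which share the global minimum f = 1.
f_values = [3.0, 1.0, 2.0, 1.0]
for T in (10.0, 1.0, 0.01):
    print(T, [round(p, 3) for p in boltzmann(f_values, T)])
```

At high temperature the distribution is close to uniform; at low temperature essentially all of the mass is shared between the two global minima.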
The gradient becomes • An approximation to the integral is to use a sample of x from Q(x)
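The slide's formulas are not reproduced in the text, so the following is a sketch of how the gradient can be worked out under stated assumptions: Q(x) is an isotropic Gaussian with mean \mu and fixed variance \sigma^2, P(x) is the Boltzmann distribution for minimizing f, and K is the KL divergence from Q to P. The exact form used in the talk may differ.

```latex
\begin{align*}
P(x) &= \frac{1}{Z}\exp\!\left(-\frac{f(x)}{T}\right), \qquad
Q(x) = \mathcal{N}(x;\,\mu,\,\sigma^2 I),\\[4pt]
K &= \int Q(x)\log\frac{Q(x)}{P(x)}\,dx
   = \int Q(x)\log Q(x)\,dx \;+\; \frac{1}{T}\int Q(x)\,f(x)\,dx \;+\; \log Z,\\[4pt]
\frac{\partial K}{\partial \mu}
  &= \frac{1}{T}\int f(x)\,\frac{\partial Q(x)}{\partial \mu}\,dx
   = \frac{1}{T\sigma^2}\int Q(x)\,f(x)\,(x-\mu)\,dx,\\[4pt]
\frac{\partial K}{\partial \mu}
  &\approx \frac{1}{T\sigma^2 n}\sum_{i=1}^{n} f(x_i)\,(x_i-\mu),
  \qquad x_i \sim Q(x).
\end{align*}
```

The first term of K is the negative entropy of Q, which does not depend on \mu when \sigma is fixed, and \log Z is a constant, so only the middle term contributes to the gradient. The last line replaces the integral by an average over n samples drawn from Q(x).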
The algorithm update rule is then • Similar ideas can be found in: • A. Berny. Statistical Machine Learning and Combinatorial Optimization. In L. Kallel et al. eds, Theoretical Aspects of Evolutionary Computation, pp. 287-306. Springer. 2001. • M. Toussaint. On the evolution of phenotypic exploration distributions. In C. Cotta et al. eds, Foundations of Genetic Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann. 2003.
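A hedged sketch of such an update rule follows. It assumes Q(x) is an isotropic Gaussian with fixed standard deviation `sigma`, in which case differentiating the KL divergence with respect to the mean and replacing the integral by a sample average gives a step proportional to the average of f(x_i)(x_i - mean). The sphere objective, step size, temperature, and sample count are illustrative assumptions and not values from the talk.

```python
import random

def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def kl_mean_update(mean, sigma, f, n_samples, step, T, rng):
    """One gradient-descent step on the sampled KL gradient w.r.t. the Gaussian mean."""
    dim = len(mean)
    grad = [0.0] * dim
    for _ in range(n_samples):
        x = [rng.gauss(m, sigma) for m in mean]   # x_i ~ Q(x)
        fx = f(x)
        for d in range(dim):
            grad[d] += fx * (x[d] - mean[d])      # accumulate f(x_i) (x_i - mean)
    scale = 1.0 / (T * sigma ** 2 * n_samples)
    return [m - step * scale * g for m, g in zip(mean, grad)]

def run_kl_descent(start=(2.0, -1.5), sigma=0.5, iterations=200,
                   n_samples=50, step=0.05, T=1.0, seed=0):
    rng = random.Random(seed)
    mean = list(start)
    for _ in range(iterations):
        mean = kl_mean_update(mean, sigma, sphere, n_samples, step, T, rng)
    return mean
```

Every sampled point contributes to the step with a weight given by its f() value, so there is no separate selection stage in this sketch.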
Some insights • The derived update rule is closely related to those found in Evolution Strategies and a version of PBIL for continuous spaces. • It is possible to view these existing algorithms as approximately doing KL minimization. • The objective function appears explicitly in this update rule (no selection).
Other Research in Learning/Modelling for Optimization • J. A. Boyan and A. W. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research 1:2, 2000. • B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to Noisy and Costly Optimization. International Conference on Machine Learning, 2000. • D. R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization 21(4):345-383, 2001. • Reinforcement learning • R. J. Williams (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256. • V. V. Miagkikh and W. F. Punch III, An Approach to Solving Combinatorial Optimization Problems Using a Population of Reinforcement Learning Agents, Genetic and Evolutionary Computation Conf.(GECCO-99), p.1358-1365, 1999.
Summary • The field of metaheuristics (including Evolutionary Computation) has produced • A large variety of optimization algorithms • Demonstrated good performance on a range of real-world problems. • Metaheuristics are considerably more general: • Can even be applied when there isn’t a “true” objective function (coevolution). • Can evolve non-numerical objects.
Summary • EDAs take an explicit modelling approach to optimization. • Existing statistical models and model-fitting algorithms can be employed. • Potential for solving challenging problems. • Model can be more easily visualized/interpreted than a dynamic population in a conventional EA. • Although the field is highly active, it is still relatively immature • Improve quality of experimental results. • Make sure research goals are well-defined. • Lots of preliminary ideas, but lack of comparative/followup research. • Difficult to keep up with the literature and see connections with other fields.
