Optimization by Model Fitting

Chapter 9 of Luke, Essentials of Metaheuristics, 2011
Byung-Hyun Ha
R1

Outline
• Introduction
• Model fitting by classification
• Model fitting with a distribution
• Summary

Introduction
• Exploring and/or exploiting the solution space
  • Construction or composition
  • Tweak or mutation
  • Recombination or crossover
  • ... other ways?
• From the perspective of statistics
  • Population and sampling
    • e.g., the set of all students as the population, and a sample of students whose heights are measured
  • Tweaking in search (metaheuristics)
    • Sampling the space of candidate solutions to select high-quality ones
• An alternative to selecting and tweaking: fitting a (statistical) model
  • Classification model
    • Graduate students ;-), decision trees, neural networks, …
  • Probability distribution

Introduction
• Example: T-problem with 5 jobs
  • Training by sampling 15 solutions from a population of 120 (= 5!) solutions,
  • and the question: what is the quality of 2-5-3-1-0?
• How?
  • By classification or by using a probability distribution
• Something we can do?

[Figure: the solution space as the population, listing job sequences with their quality in parentheses, e.g., 0-2-3-4-1 (23), 4-1-0-3-2 (15), 1-2-3-4-0 (12), ...; sampling yields a sample that serves as the representatives of the population.]

Model Fitting by Classification
• Classification problem
  • Given a collection of records, find a model for the class attribute as a function of the values of the other attributes
  • Fitting a model, or
  • model induction, machine learning

[Figure: a training set is generated, a classification model is induced from it, and the model is then queried or tested ("Is 2-5-3-1-0 a good solution?") or asked to generate ("Give me a good solution!").
Training set (quality in parentheses): 0-2-3-4-1 (23), 4-1-0-3-2 (15), 1-2-3-4-0 (12), 1-2-4-3-0 (11), 1-3-4-0-2 (15), 2-1-4-3-0 (10), 3-4-2-1-0 (15), 0-1-3-2-4 (19), 2-0-3-1-4 (16), 4-3-2-1-0 (15), 0-4-2-3-1 (20), 3-4-0-1-2 (15).]
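One way to make the query step concrete is a nearest-neighbor predictor over the twelve training records listed on this slide. The sketch below is only an illustration: the Hamming distance between job sequences, the value of k, the function names, and the query 2-4-3-1-0 are assumptions (the slide's own query contains a job label 5 that does not occur in the training records, so a different sequence is queried here).

```python
# Training records copied from the slide: (job sequence, quality value).
TRAINING_SET = [
    ((0, 2, 3, 4, 1), 23), ((4, 1, 0, 3, 2), 15), ((1, 2, 3, 4, 0), 12),
    ((1, 2, 4, 3, 0), 11), ((1, 3, 4, 0, 2), 15), ((2, 1, 4, 3, 0), 10),
    ((3, 4, 2, 1, 0), 15), ((0, 1, 3, 2, 4), 19), ((2, 0, 3, 1, 4), 16),
    ((4, 3, 2, 1, 0), 15), ((0, 4, 2, 3, 1), 20), ((3, 4, 0, 1, 2), 15),
]


def hamming(a, b):
    """Number of positions at which two sequences differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def predict_quality(query, records=TRAINING_SET, k=2):
    """Average the quality of the k training records closest to the query."""
    nearest = sorted(records, key=lambda rec: hamming(query, rec[0]))[:k]
    return sum(quality for _, quality in nearest) / k


# Hypothetical query; the prediction is the mean quality of its 2 neighbors.
print(predict_quality((2, 4, 3, 1, 0)))
```

A different distance (for example one based on adjacent job pairs) may suit a sequencing problem better; Hamming distance is used here only because it is short to state.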

Model Fitting by Classification
• Examples of classification algorithms
  • Graduate students, trained by the nagging of professors ;-)
  • Decision trees, induced by C4.5 and ID3
    • cf. http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf
  • k-nearest-neighbor (kNN), by the kNN algorithm
  • Neural networks, trained by the backpropagation algorithm

[Figure: records (training set) and a decision tree for classification (or prediction) induced from them.]

Model Fitting by Classification
• Classification problem (revisited)
  • Given a collection of records, find a model for the class attribute as a function of the values of the other attributes
• Application of classification to search
  • Given a collection of solutions, find a model for fitness as a function of the values of the components of solutions
• Generating children from the model
  • Rejection sampling with discriminative models
    • Algorithms 115 and 117
  • Region-based sampling with generative models
    • Algorithm 116
• Learnable Evolution Model (LEM)
  • Algorithm 114

[Figure: a classification model used as a discriminative model with rejection sampling ("Is 2-5-3-1-0 a good solution?") and as a generative model with region-based sampling ("Give me a good solution!").]
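Rejection sampling with a discriminative model can be sketched as follows: candidates are proposed at random and kept only when the model labels them good. This is a minimal sketch under stated assumptions, not a transcription of Algorithm 115 or 117; `propose`, `is_good`, the unit-square search space, and the threshold rule standing in for a trained classifier are all hypothetical.

```python
import random


def rejection_sample(propose, is_good, n_children, max_tries=100_000):
    """Keep proposing candidates until n_children pass the classifier."""
    children = []
    tries = 0
    while len(children) < n_children and tries < max_tries:
        candidate = propose()
        tries += 1
        if is_good(candidate):  # ask the discriminative model
            children.append(candidate)
    return children


rng = random.Random(0)


def propose():
    """Hypothetical proposal: a uniform random point in the unit square."""
    return (rng.random(), rng.random())


def is_good(point):
    """Hypothetical stand-in for a trained classifier."""
    x, y = point
    return x > 0.5 and y < 0.6


children = rejection_sample(propose, is_good, n_children=20)
print(len(children), children[0])
```

The `max_tries` cap is a design choice for the sketch: if the model accepts only a tiny part of the space, plain rejection sampling can take very long, which is one motivation for sampling from the model's regions directly.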

Model Fitting by Classification
• Examples
  • Inducing a decision tree
  • Generating children from a decision tree

[Figure: a decision tree with splits on x (0.3, 0.5, 0.7) and y (0.4, 0.6), and the corresponding partition of the square 0.0 ≤ x, y ≤ 1.0 into regions labeled good and bad.]

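Model Fitting by Classification
• Example (cont’d)
  • A model that specifies a probability for each region

[Figure: the decision tree with splits on x (0.3, 0.5, 0.7) and y (0.4, 0.6), and the partition of the square 0.0 ≤ x, y ≤ 1.0, with regions annotated by the probabilities 0.80, 0.25, 0.00, 0.75, 0.14, 0.00, and 1.00.]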
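Generating children from such a model can be sketched as region-based sampling: pick a region, then draw a point uniformly inside it. The rectangles and probabilities below are hypothetical (the exact geometry of the slide's figure is not reproduced), and weighting each region by area times its probability of being good is an assumption of this sketch, not necessarily the rule used in Algorithm 116.

```python
import random

# Hypothetical leaves of a decision tree over the unit square:
# (x_min, x_max, y_min, y_max, probability that a point in the leaf is good)
REGIONS = [
    (0.0, 0.3, 0.0, 1.0, 0.00),
    (0.3, 0.7, 0.0, 0.6, 0.80),
    (0.3, 0.7, 0.6, 1.0, 0.25),
    (0.7, 1.0, 0.0, 1.0, 0.75),
]


def region_weight(region):
    """Assumed selection weight: area of the region times p(good)."""
    x0, x1, y0, y1, p_good = region
    return (x1 - x0) * (y1 - y0) * p_good


def region_based_sample(regions, n_children, rng):
    """Choose a region by weight, then sample uniformly inside it."""
    weights = [region_weight(r) for r in regions]
    children = []
    for _ in range(n_children):
        x0, x1, y0, y1, _p = rng.choices(regions, weights=weights)[0]
        children.append((rng.uniform(x0, x1), rng.uniform(y0, y1)))
    return children


children = region_based_sample(REGIONS, n_children=10, rng=random.Random(0))
print(children[:3])
```

Unlike rejection sampling, no candidate is thrown away here; regions with probability 0.00 simply receive zero weight and are never chosen.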

Model Fitting by Classification
• Example (Talbi, 2009)
  • Application of a rule-based classifier to the crossover operator
  • Rules
    • If X4 = 5 and X5 < 2, then class = best
    • ...
  • Patterns matching the rules
    • * * * 5 * 1 * * *
    • ...
  • Possible crossover?
    • 2 1 7 2 1 3 4 3 | 2 1 7 5 8 1 7 4
    • 3 2 5 7 8 0 7 4 | 3 2 5 5 1 1 4 3

Model Fitting with a Distribution
• An alternative form of model
  • A distribution of an infinite-sized population
  • A set of candidate solutions: a sample from the population
  • Working with the sample distribution
• Estimation of Distribution Algorithm
  • Representing the distribution of an infinite population with a number of samples
  • Loop: sample a set of individuals → assess them → adjust the distribution to reflect the new fitness results
  • Algorithm 118: An Abstract Estimation of Distribution Algorithm (EDA)
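The sample, assess, adjust loop can be sketched with the distribution, the sampler, and the update rule passed in as functions. This is a hedged outline in the spirit of the abstract EDA rather than a transcription of Algorithm 118; every name below, the single-gene histogram, and the toy fitness function are assumptions made for illustration.

```python
import random


def eda(init_dist, sample, fitness, update, pop_size, generations):
    """Abstract loop: sample individuals, assess them, adjust the distribution."""
    dist = init_dist
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [sample(dist) for _ in range(pop_size)]
        scored = [(fitness(ind), ind) for ind in population]
        for fit, ind in scored:
            if fit > best_fit:
                best, best_fit = ind, fit
        dist = update(dist, scored)
    return best, dist


# Toy instance: one gene with values 0..9, distribution stored as a histogram.
rng = random.Random(0)
VALUES = list(range(10))


def sample(weights):
    return rng.choices(VALUES, weights=weights)[0]


def fitness(v):
    return -(v - 7) ** 2  # hypothetical objective, maximized at v = 7


def update(weights, scored):
    """Refit the histogram to the better half of the sample."""
    top = sorted(scored, reverse=True)[: len(scored) // 2]
    counts = [1e-3] * len(VALUES)  # small floor keeps every value reachable
    for _, v in top:
        counts[v] += 1
    total = sum(counts)
    return [c / total for c in counts]


best, dist = eda([0.1] * 10, sample, fitness, update, pop_size=100, generations=10)
print(best, [round(w, 2) for w in dist])
```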

Model Fitting with a Distribution
• Representing distributions for a genotype with n genes
  • Using an n-dimensional histogram
    • A fairly high-resolution grid is needed to represent the distribution accurately
      • cf. kd-tree or quadtree
    • A fairly large number of grid points
      • a^n when the distribution of each gene is discretized into a pieces
  • Using a parametric distribution
    • e.g., m Gaussian curves
      • How many Gaussian curves?
    • n-dimensional Gaussian: a mean vector of size n and a covariance matrix of size n^2
      • 1,000 genes? 1,000,000 numbers

Model Fitting with a Distribution
• Representing distributions (cont’d)
  • Using marginal distributions
    • Projecting the full distribution onto a single dimension for each gene
  • Representing a single distribution, again
    • 1-dimensional array as a histogram
    • 1-dimensional Gaussians as a parametric representation
  • Size of representation?
  • Problems (very big)?
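The size question can be answered with a few lines of arithmetic. The sketch below compares the full representations with their marginal counterparts; the function names are hypothetical, and the counts follow the slides' own conventions (a pieces per gene, n genes, a full covariance matrix counted as n^2 numbers, and one mean plus one variance per marginal Gaussian).

```python
def full_histogram_cells(a, n):
    """Grid points of an n-dimensional histogram with a pieces per gene."""
    return a ** n


def full_gaussian_numbers(n):
    """Mean vector (n numbers) plus covariance matrix (n * n numbers)."""
    return n + n * n


def marginal_histogram_cells(a, n):
    """One 1-dimensional histogram with a pieces for each of the n genes."""
    return a * n


def marginal_gaussian_numbers(n):
    """One mean and one variance for each of the n genes."""
    return 2 * n


n = 1000
print(full_gaussian_numbers(n))         # dominated by the n * n covariance terms
print(marginal_gaussian_numbers(n))     # grows linearly in n
print(marginal_histogram_cells(10, n))  # grows linearly in n
print(len(str(full_histogram_cells(10, n))))  # digits in 10**1000
```

The marginal representations grow linearly in n, which is why the univariate algorithms are practical for long genotypes; the price is that any dependency between genes is dropped.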

Model Fitting with a Distribution
• Univariate Estimation of Distribution Algorithms
  • Population-Based Incremental Learning (PBIL)
    • Genes have finite discrete values
    • n marginal distributions for n genes, initially uniform
      • Representation?
    • Truncation selection of the good solutions sampled from the distribution
    • Gradual update of the marginal distributions
    • Algorithm 119: Population-Based Incremental Learning
  • Univariate Marginal Distribution Algorithm (UMDA)
    • A variation on PBIL
    • Any selection procedure is allowed
    • Entirely replaces the distribution D each time around (α = 1)
    • A large sample is required (why?)
  • Compact Genetic Algorithm (cGA)
    • Genes have boolean values
    • Updates each marginal distribution by pairwise comparison of individuals
    • cf. modeling a finite population instead of an infinite one
    • Algorithm 120: The Compact Genetic Algorithm
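The PBIL bullets above can be sketched as follows. This is a minimal sketch, not a transcription of Algorithm 119: the parameter names (`alpha` for the learning rate, `n_best` for the truncation size), the list-of-lists histogram representation, and the OneMax test problem are assumptions made for illustration.

```python
import random


def pbil(fitness, n_genes, n_values, pop_size, n_best, alpha, generations, seed=0):
    """Univariate EDA over genes that take values 0 .. n_values - 1."""
    rng = random.Random(seed)
    values = list(range(n_values))
    # One marginal distribution per gene, initially uniform.
    dist = [[1.0 / n_values] * n_values for _ in range(n_genes)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [
            [rng.choices(values, weights=dist[j])[0] for j in range(n_genes)]
            for _ in range(pop_size)
        ]
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
        elite = pop[:n_best]  # truncation selection
        for j in range(n_genes):
            for v in values:
                freq = sum(1 for ind in elite if ind[j] == v) / n_best
                # Gradual update: blend old marginal with the elite frequencies.
                dist[j][v] = (1 - alpha) * dist[j][v] + alpha * freq
    return best, dist


# Hypothetical test problem (OneMax): maximize the number of 1s in 10 bits.
best, dist = pbil(sum, n_genes=10, n_values=2, pop_size=50,
                  n_best=10, alpha=0.2, generations=50)
print(best, [round(d[1], 2) for d in dist])
```

Setting `alpha=1` makes each marginal a plain frequency count of the selected individuals, which matches the UMDA bullet about replacing the distribution entirely; with a small sample those counts are noisy, and a value that happens to be absent from the selected set gets probability zero.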

Model Fitting with a Distribution
• Univariate Estimation of Distribution Algorithms (cont’d)
  • Real-valued representations
    • By discretization of each marginal distribution
      • Histogram approach
      • Using PBIL directly
    • By a parametric approach
      • e.g., using a single Gaussian
        • Unbiased estimators of mean and variance for parameter estimation
        • Updating each marginal distribution by linear combination
      • Using multiple distributions
        • cf. Expectation Maximization (EM) algorithm
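The single-Gaussian parametric approach can be sketched as follows: each gene keeps one mean and one variance, estimated from the selected individuals with the unbiased estimators and blended into the old values by linear combination. The function name, the parameters, the initial distribution N(0, 1) for every gene, and the toy objective are assumptions for this sketch.

```python
import random
import statistics


def gaussian_univariate_eda(fitness, n_genes, pop_size, n_best, alpha,
                            generations, seed=0):
    """One Gaussian marginal per gene, updated by linear combination."""
    rng = random.Random(seed)
    mean = [0.0] * n_genes  # assumed initial means
    var = [1.0] * n_genes   # assumed initial variances
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [
            [rng.gauss(mean[j], var[j] ** 0.5) for j in range(n_genes)]
            for _ in range(pop_size)
        ]
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best, best_fit = pop[0], fitness(pop[0])
        elite = pop[:n_best]
        for j in range(n_genes):
            column = [ind[j] for ind in elite]
            # statistics.variance is the unbiased (n - 1) sample variance.
            mean[j] = (1 - alpha) * mean[j] + alpha * statistics.mean(column)
            var[j] = (1 - alpha) * var[j] + alpha * statistics.variance(column)
    return best, mean, var


def sphere(ind):
    """Hypothetical objective, maximized at (1, 1)."""
    return -sum((x - 1.0) ** 2 for x in ind)


best, mean, var = gaussian_univariate_eda(sphere, n_genes=2, pop_size=100,
                                          n_best=20, alpha=0.5, generations=30)
print([round(x, 3) for x in best], [round(m, 3) for m in mean])
```

Truncation selection shrinks the variance every generation, so a small `alpha` (or a floor on the variance) may be needed on harder problems to avoid converging before the optimum is reached.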

Model Fitting with a Distribution
• Multivariate Estimation of Distribution Algorithms
  • Problems in univariate estimation (using marginal distributions)
    • Assumption of no linkage between genes
    • cf. cooperative coevolution
  • An alternative
    • Using bivariate distributions
      • One distribution for every pair of genes
    • Using triples of genes per distribution, using quadruples, …
  • A better way
    • Multivariate distributions for strongly linked genes, selectively
    • e.g., Bayes Network
      • cf. not only about how good, but also about why it is good
    • (Hierarchical) Bayesian Optimization Algorithm
    • Algorithm 121: An Abstract Version of the Bayesian Optimization Algorithm (BOA)
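The no-linkage assumption can be illustrated with two perfectly linked boolean genes. In the hypothetical selected set below only 0-0 and 1-1 occur, yet the product of the two marginal distributions assigns a quarter of the probability to each of the four combinations, including the two that never occurred. The function names and the data are assumptions for this sketch.

```python
from collections import Counter
from itertools import product


def joint_distribution(population):
    """Empirical joint distribution over whole individuals."""
    counts = Counter(population)
    return {ind: c / len(population) for ind, c in counts.items()}


def marginal_distributions(population):
    """One empirical distribution per gene, ignoring all other genes."""
    n_genes = len(population[0])
    return [
        {v: c / len(population)
         for v, c in Counter(ind[j] for ind in population).items()}
        for j in range(n_genes)
    ]


def product_of_marginals(population):
    """Distribution a univariate EDA effectively samples from."""
    margs = marginal_distributions(population)
    result = {}
    for combo in product(*(m.keys() for m in margs)):
        p = 1.0
        for j, v in enumerate(combo):
            p *= margs[j][v]
        result[combo] = p
    return result


# Hypothetical selected individuals: the two genes always agree.
selected = [(0, 0), (1, 1)] * 5
print(joint_distribution(selected))    # mass only on (0, 0) and (1, 1)
print(product_of_marginals(selected))  # mass spread over all four combinations
```

A bivariate distribution over this pair of genes would reproduce the joint distribution exactly, which is the motivation for modeling strongly linked genes together.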

Hybrid Metaheuristics (Talbi, 2009)
• Combining with X
  • Mathematical programming approaches
    • Enumeration algorithms
    • Relaxation and decomposition methods
    • Branch and cut and price algorithms
  • Constraint programming
  • Data mining techniques
  • Multiobjective optimization
• Classical hybrid approaches
  • Low-level relay hybrids
  • Low-level teamwork hybrids
  • High-level relay hybrids
  • High-level teamwork hybrids

Summary
• Exploring and/or exploiting the solution space
  • From the perspective of statistics
• Model fitting by classification
  • Employing decision trees, kNN, neural networks
  • Generating children from the model
• Model fitting with a distribution
  • Estimation of Distribution Algorithm
  • Representing distributions
    • n-dimensional histogram, parametric distributions, marginal distributions
  • Univariate Estimation of Distribution Algorithms
    • Problems
  • Multivariate Estimation of Distribution Algorithms
    • Bayes Network
