Self-Adaptation in Evolutionary Algorithms The Seventh EU/MEeting on Adaptive, Self-Adaptive, and Multi-Level Metaheuristics Jim Smith, University of the West of England
Overview • What is Self-Adaptation? • The origins … • Interbreeding … • Self-adapting crossover • Self-adapting multiple operators • Extensions to MAs • Summary
What is Self-Adaptation? Informally, and rather vaguely: • A property of natural and artificial systems that allows them to control the way in which they adapt to changing environments. Let's start with another question: why should we be interested?
Motivation An EA has many strategy parameters, e.g. • mutation operator and mutation rate • crossover operator and crossover rate • selection mechanism and selective pressure (e.g. tournament size) • population size Good parameter values facilitate good performance Q1: How can we find good parameter values?
Motivation 2 • EA parameters are static (constant during a run) BUT • an EA is a dynamic, adaptive process THUS • optimal parameter values may vary during a run Q2: How to vary parameter values?
An example – Magic Squares Software by M. Herdy, TU Berlin The object is to arrange the numbers 1–100 in a 10×10 grid so that all row and column sums are equal Representation: permutation Fitness: sum of squared errors Mutation: • picks a value at random, • then another within a window, • and swaps their positions • the window size controls mutation strength • Exit: click on the TU Berlin logo
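A minimal sketch of the window-based swap mutation just described (the function name and the choice of a positional window are illustrative assumptions, not Herdy's actual implementation):

```python
import random

def window_swap_mutation(perm, window):
    """Swap a randomly chosen element with another at most `window`
    positions away; the window size controls mutation strength."""
    n = len(perm)
    i = random.randrange(n)
    # second position drawn from a window around i (may equal i: a no-op)
    j = random.randrange(max(0, i - window), min(n, i + window + 1))
    perm[i], perm[j] = perm[j], perm[i]
    return perm
```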
Parameter control Parameter control: setting values on-line, during the actual run, e.g. • a predetermined time-varying schedule p = p(t) • using feedback from the search process • encoding parameters in chromosomes and relying on natural selection Problems: • finding optimal p is hard, finding optimal p(t) is harder • the feedback mechanism is still user-defined: how do we "optimize" it? • when would natural selection work for strategy parameters?
Scope/level The parameter may take effect at different levels: • environment (fitness function) • population • individual • sub-individual Note 1: the given component (parameter) determines which levels are possible Note 2: self-adaptation relies on the implicit action of selection acting on different strategies in the same population → there can be no population-level self-adaptation
Global taxonomy
Features of Self-Adaptation Automatic control of operators or their parameters • Each individual encodes for its own parameters The range of values present will depend on: • the action of selection on the combined genotypes • the action of genetic operators (e.g. mutation) on those encoded values
Some implicit Assumptions There need to be links between: • the encoded parameter and the change made to the problem encoding (this link may be stochastic, e.g. step sizes) • hence the CMA interest in derandomised mutations • the action of the operator and the change in fitness • individual fitness and the action of selection
The Origins Various historical papers The first major implementation was self-adaptation of mutation step sizes in ESs • Continuous variables are mutated by adding noise from a N(0,C) distribution • The step size should ideally be correlated with the distance to the optimum • The (1+1) ES used Rechenberg's 1/5 success rule, but the move to λ > 1 offspring enabled other methods
ES typical representation
Chromosomes consist of three parts:
• Object variables: x1,…,xn
• Strategy parameters:
 • Mutation step sizes: σ1,…,σn
 • Rotation angles: α1,…,αk
• Not every component is always present
• Full size: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩ where k = n·(n−1)/2 (the number of (i,j) pairs)
Mutation
Main mechanism: changing a value by adding random noise drawn from a normal distribution
• x′i = xi + N(0,σ)
Key idea:
• σ is part of the chromosome ⟨x1,…,xn, σ⟩
• σ is itself also mutated into σ′
Thus: the mutation step size coevolves with the solution x
Mutate first
Net mutation effect: ⟨x, σ⟩ → ⟨x′, σ′⟩
Order is important:
• first σ → σ′, then x → x′ = x + N(0,σ′)
The fitness of ⟨x′, σ′⟩ provides two lots of information:
• Primary: x′ is good if f(x′) is good
• Secondary: σ′ is good if x′ is
Reversing the mutation order would not work: σ′ would then have played no part in creating x′, so f(x′) would carry no information about the quality of σ′
Uncorrelated mutation: one σ
• Chromosomes: ⟨x1,…,xn, σ⟩
• σ′ = σ · exp(τ · N(0,1))
• x′i = xi + σ′ · Ni(0,1)
• Typically the "learning rate" τ ∝ 1/n½
• And we have a boundary rule: σ′ < ε0 ⇒ σ′ = ε0
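A minimal Python sketch of this scheme, assuming a simple list representation (the function name and the value of ε0 are illustrative):

```python
import math
import random

def mutate_one_sigma(x, sigma, eps0=1e-6):
    """Self-adaptive mutation with one step size.
    Note the order: sigma is mutated first, then used to perturb x."""
    tau = 1.0 / math.sqrt(len(x))                    # learning rate ~ 1/n^0.5
    sigma_new = sigma * math.exp(tau * random.gauss(0.0, 1.0))
    sigma_new = max(sigma_new, eps0)                 # boundary rule
    x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
    return x_new, sigma_new
```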
Mutants with equal likelihood Circle: mutants having the same chance to be created
Uncorrelated mutation: n σ's
• Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
• σ′i = σi · exp(τ′ · N(0,1) + τ · Ni(0,1))
• x′i = xi + σ′i · Ni(0,1)
• Two learning rate parameters:
 • τ′: overall learning rate
 • τ: coordinate-wise learning rate
• τ′ ∝ 1/(2n)½ and τ ∝ 1/(2n½)½
• Boundary rule: σ′i < ε0 ⇒ σ′i = ε0
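The same sketch with one step size per coordinate; note the single shared draw for the overall learning rate (again, ε0 and the names are assumptions):

```python
import math
import random

def mutate_n_sigmas(x, sigmas, eps0=1e-6):
    """Self-adaptive mutation with a step size per coordinate."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)             # overall learning rate
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))        # coordinate-wise learning rate
    common = random.gauss(0.0, 1.0)                  # one N(0,1) draw shared by all i
    sig_new = [max(s * math.exp(tau_prime * common + tau * random.gauss(0.0, 1.0)), eps0)
               for s in sigmas]
    x_new = [xi + s * random.gauss(0.0, 1.0) for xi, s in zip(x, sig_new)]
    return x_new, sig_new
```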
Mutants with equal likelihood Ellipse: mutants having the same chance to be created
Correlated mutations
• Chromosomes: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩ where k = n·(n−1)/2
• The covariance matrix C is defined as:
 • cii = σi²
 • cij = 0 if i and j are not correlated
 • cij = ½·(σi² − σj²)·tan(2αij) if i and j are correlated
• Note the numbering / indices of the α's
Correlated mutations cont'd
The mutation mechanism is then:
• σ′i = σi · exp(τ′ · N(0,1) + τ · Ni(0,1))
• α′j = αj + β · Nj(0,1)
• x′ = x + N(0, C′)
 • x stands for the vector ⟨x1,…,xn⟩
 • C′ is the covariance matrix C after mutation of the σ's and α's
• τ′ ∝ 1/(2n)½, τ ∝ 1/(2n½)½ and β ≈ 5°
• Boundary rules: σ′i < ε0 ⇒ σ′i = ε0 and |α′j| > π ⇒ α′j = α′j − 2π · sign(α′j)
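A sketch of the correlated case. Rather than assembling C′ and sampling N(0, C′) directly, the step is generated by drawing an uncorrelated vector and rotating it through the mutated angles, a standard way of realising the same distribution; numpy usage, names, and constants are assumptions:

```python
import math
import numpy as np

def mutate_correlated(x, sigmas, alphas, rng, eps0=1e-6, beta=0.0873):
    """Correlated self-adaptive mutation (sketch).  x, sigmas, alphas are
    numpy arrays; alphas holds the k = n(n-1)/2 angles in pair order
    (1,2), (1,3), ..., (n-1,n); beta is about 5 degrees in radians;
    rng comes from np.random.default_rng()."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = rng.standard_normal()
    sig = np.maximum(sigmas * np.exp(tau_prime * common + tau * rng.standard_normal(n)), eps0)
    alp = alphas + beta * rng.standard_normal(len(alphas))
    # keep angles in (-pi, pi]
    alp = np.where(np.abs(alp) > np.pi, alp - 2.0 * np.pi * np.sign(alp), alp)
    dx = sig * rng.standard_normal(n)            # uncorrelated step
    k = 0
    for i in range(n - 1):                       # rotate each (i, j) plane by alpha_k
        for j in range(i + 1, n):
            c, s = math.cos(alp[k]), math.sin(alp[k])
            dx[i], dx[j] = c * dx[i] - s * dx[j], s * dx[i] + c * dx[j]
            k += 1
    return x + dx, sig, alp
```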
Mutants with equal likelihood Ellipse: mutants having the same chance to be created
Self-adaptation illustrated • Given a dynamically changing fitness landscape (optimum location shifted every 200 generations) • A self-adaptive ES is able to • follow the optimum and • adjust the mutation step size after every shift!
Self-adaptation illustrated Changes in the fitness values (left) and the mutation step sizes (right)
Prerequisites for self-adaptation
• μ > 1, to carry different strategies
• λ > μ, to generate an offspring surplus
• Not "too" strong selection, e.g. λ/μ ≈ 7
• (μ,λ)-selection, to get rid of misadapted σ's
• Mixing strategy parameters by (intermediary) recombination on them
A minimal loop putting these together is sketched below.
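A minimal (μ,λ)-ES respecting these prerequisites, assuming minimisation; μ = 15, λ = 105 gives the λ/μ ≈ 7 ratio above (initial ranges, σ bound, and generation count are illustrative):

```python
import math
import random

def es_mu_comma_lambda(f, n, mu=15, lam=105, gens=200):
    """(mu, lambda)-ES with one self-adapted step size per individual and
    intermediary recombination of sigma; parents are discarded every
    generation, so misadapted sigmas cannot survive indefinitely."""
    tau = 1.0 / math.sqrt(n)
    pop = [([random.uniform(-5, 5) for _ in range(n)], 1.0) for _ in range(mu)]
    for _ in range(gens):
        offspring = []
        for _ in range(lam):
            (x1, s1), (x2, s2) = random.sample(pop, 2)
            # intermediary recombination of sigma, then log-normal mutation
            sigma = max(0.5 * (s1 + s2) * math.exp(tau * random.gauss(0, 1)), 1e-8)
            x = [xi + sigma * random.gauss(0, 1) for xi in random.choice((x1, x2))]
            offspring.append((x, sigma))
        pop = sorted(offspring, key=lambda ind: f(ind[0]))[:mu]  # (mu,) truncation
    return min(pop, key=lambda ind: f(ind[0]))

# e.g. best_x, best_sigma = es_mu_comma_lambda(lambda v: sum(vi * vi for vi in v), n=10)
```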
Similar Schemes • Fogel introduced similar methods into "Meta-EP" • Some authors use Cauchy rather than Gaussian distributions • Fast EP creates offspring using both • Closely related is Hansen and Ostermeier's Covariance Matrix Adaptation ES (CMA-ES) • seeks to remove some of the stochasticity between step sizes and the actual moves made • uses cumulation of steps rather than single steps
Do we understand what is happening? • Continuous search spaces are relatively amenable to theoretical analysis • Most search spaces can be locally modelled by simple functions (sphere, ridge etc.) • Lots of theoretical results focussing on the progress rate • Ongoing research, esp. by Schwefel's and Beyer's groups at Dortmund/Vorarlberg
Interbreeding: ES meets GAs (Bäck, 1991) The mutation rate is encoded as a binary string: • decode to [0,1] to give the mutation rate P(m) • the bits encoding the mutation rate are themselves changed at rate P(m) • decode again: the new P(m)′ is applied to the problem encoding Variants: one mutation rate vs. one per problem variable; fitness-proportionate vs. truncation selection Results were promising • needed high selection pressure • multiple rates were better for complex problems
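A sketch of the core of this scheme, much simplified (the exact decoding Bäck used is not reproduced; `decode_rate` is an illustrative stand-in):

```python
import random

def decode_rate(bits):
    """Map a bitstring to a mutation rate in (0, 1]."""
    value = int("".join(str(b) for b in bits), 2)
    return (value + 1) / float(2 ** len(bits))

def flip(bits, p):
    """Flip each bit independently with probability p."""
    return [b ^ 1 if random.random() < p else b for b in bits]

def self_adaptive_bit_mutation(problem_bits, rate_bits):
    # 1. decode the current rate and mutate its own encoding at that rate
    rate_bits = flip(rate_bits, decode_rate(rate_bits))
    # 2. decode again and apply the new rate to the problem encoding
    problem_bits = flip(problem_bits, decode_rate(rate_bits))
    return problem_bits, rate_bits
```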
Extensions of evolving P(m) Smith (1996) – Steady-State GAs • significantly outperformed fixed mutation rates • also needed high selection pressure: • create λ copies of the offspring • select the best for inclusion if it is better than the member being replaced • Gray coding of mutation rates preferable Hinterding et al. (1996) – used a continuous variable with log-normal adaptation Glickman and Sycara (2001) found problems with premature convergence on some problems
Some theoretical Results Bäck (1993) • showed self-adapted mutation rates were close to the theoretical optimum for OneMax Stephens, Olmedo, Vargas, and Waelbroeck (1998) • expanded on the concept of neutrality in the mapping • showed the optimal mutation rate is not only problem- but population-dependent • adaptation of mutation rates can arise from asymmetry in the genotype-to-phenotype redundancy
A simpler model: Smith (2001) Analytic dynamical-systems approach • a set of n fixed mutation rates • when changing mutation rate, pick another one uniformly at random • very different from log-normal adaptation, or even bit-flipping • accurately predicted the behaviour of the real algorithm, • especially around transitions on a dynamic function
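The strategy-mutation step of this simplified model is tiny (a sketch; `p_change` is an assumed external parameter):

```python
import random

def new_rate_index(idx, n_rates, p_change):
    """With probability p_change, replace the current rate by a
    different one chosen uniformly at random from the fixed set."""
    if random.random() < p_change:
        idx = random.choice([i for i in range(n_rates) if i != idx])
    return idx
```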
Self-Adaptive vs. Static mutation [Figure: predicted mean fitness vs. time (generations 0–500) for the adaptive algorithm and fixed mutation rates from 0.0005 to 0.1]
Evolution of mutation rates following a transition [Figure: proportion of the population in each mutation-rate class (0.0005, 0.025, 0.05, 0.075, 0.1), model predictions vs. observed values, over generations 1000–1050]
Markov Chain model: Smith (2002) Extended the model to include binary-coded rates Two types of control modelled: • external "Innovation Rate" • internal – mutation acts on its own encoding Results showed "internal" control will often get stuck on a local optimum • premature convergence of mutation rates • observed by several others Refined prerequisite: need for diversity of strategies
But does it work in practice? • Experimental results: Smith & Stone (2002) • Simple model has higher success rates than log-normal adaptation • Much better at escaping from local optima • Explanation in terms of greater diversity of mutation rates
Self-adapting Recombination: Schaffer and Morishima (1987) Self-adapt the form of crossover by encoding "punctuation marks" between loci Recombination: • start copying from parent 1 until you reach a punctuation mark, • then switch to copying from parent 2 … Two ways the punctuation marks adapt: • they are subject to mutation • the way they are inherited during crossover • Works OK, but it is possible for one child to inherit all the crossover points
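A sketch of punctuated crossover producing one child (the mirror-image child takes the complementary segments; details such as mark placement are simplified relative to the original paper):

```python
def punctuated_crossover(genes1, marks1, genes2, marks2):
    """Copy from one parent, switching at each punctuation mark.
    marks[i] == 1 means there is a punctuation mark after locus i;
    marks are inherited along with the genes they follow."""
    parents = [(genes1, marks1), (genes2, marks2)]
    child_genes, child_marks = [], []
    src = 0                              # parent currently being copied
    for i in range(len(genes1)):
        g, m = parents[src]
        child_genes.append(g[i])
        child_marks.append(m[i])
        if m[i] == 1:                    # punctuation mark: switch source
            src = 1 - src
    return child_genes, child_marks
```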
Self-adapting Recombination: Spears (1995) Self-adapt the choice between predefined crossovers • an extra bit on the genome encodes uniform vs. 1X • Individual-level model: • use the crossover encoded by each pair of parents • Population-level model: • measure the proportion of parents encoding for 1X • use this as the probability of using 1X for any pair • Statistically the same (at a coarse level) Better results with individual-level adaptation
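A sketch of the individual-level version (the tie-break when the parents' bits disagree is an assumption here; Spears' exact rule may differ):

```python
import random

def crossover_by_choice_bit(p1, p2):
    """Each genome carries a trailing bit: 1 = uniform crossover, 0 = 1X."""
    *g1, b1 = p1
    *g2, b2 = p2
    use_uniform = bool(b1) if b1 == b2 else random.random() < 0.5
    if use_uniform:
        child = [random.choice(pair) for pair in zip(g1, g2)]
    else:
        cut = random.randrange(1, len(g1))      # one-point crossover
        child = g1[:cut] + g2[cut:]
    child.append(random.choice((b1, b2)))       # child inherits a choice bit
    return child
```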
Linkage Evolving Genetic Operator: Smith (1996, 1998, 2002) Model gene linkage as a boolean array: A(i,j) = 1 ⇒ i and j are inherited together • Recombination R: R(A) → A′ • different operators are specified by a random vector x The LEGO algorithm: • a generalisation of Schaffer and Morishima • encode A in the genome and self-adapt it • can create any common crossover • So far only done for adjacent linkage (see the sketch below)
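A much-simplified sketch of offspring assembly under adjacent linkage (real LEGO lets competing parental blocks be selected by fitness; here a parent is simply picked at random for each block):

```python
import random

def lego_offspring(parents):
    """Each parent is (genes, links); links[i] == 1 means loci i and i+1
    are linked and must be inherited together.  The child is built left
    to right, one whole block at a time."""
    n = len(parents[0][0])
    child_genes, child_links = [], []
    pos = 0
    while pos < n:
        genes, links = random.choice(parents)
        end = pos
        while end < n - 1 and links[end] == 1:   # extend to the end of the block
            end += 1
        child_genes.extend(genes[pos:end + 1])
        child_links.extend(links[pos:end + 1])   # linkage is inherited with the genes
        pos = end + 1
    return child_genes, child_links
```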
LEGO process
Some results Series of experiments comparing: • fixed crossover strategies • population-level LEGO (3 variants) • component-level LEGO Results show: • LEGO outperforms 1X, UX on most problems • Getting the scope right is vital
Results on concatenated traps
Some counter-examples Tuson and Ross compared adaptive vs. self-adaptive P(c) • Adaptive (using the COBRA heuristic) was better One possible reason (noted by several authors): • self-adaptation rewards better strategies • because they produce better offspring • BUT if the parents are similar, many different crossover masks will produce the same offspring No selective pressure → no evolution → no self-adaptation
Crossover = Self-Adaptive mutation? Deb and Beyer (2001), followed by others, proposed: • the effect of an operator is described by its transmission function • selection takes care of changes in mean fitness • variation operators should increase diversity So self-adaptation should have the properties that: • children are more likely to be created close to their parents • mean population fitness is unchanged • variance in population fitness should increase exponentially with time on flat landscapes
Crossover in continuous spaces: Implicit Self-Adaptation They (and others) showed that: • for continuous variables, and other cases where crossover can produce new values, • appropriately defined crossover operators, e.g. SBX and others, • can demonstrate these properties • "implicit self-adaptation"
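SBX (Simulated Binary Crossover, Deb & Agrawal) is well documented, so a concrete sketch is possible; the key property is that the spread of the children scales with the distance between the parents, which is where the implicit self-adaptation comes from:

```python
import random

def sbx(x1, x2, eta=2.0):
    """Simulated Binary Crossover; larger eta keeps children closer to
    their parents.  Basic form, no variable-bounds handling."""
    c1, c2 = [], []
    for a, b in zip(x1, x2):
        u = random.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        # children symmetric about the parents' midpoint, spread ~ |a - b|
        c1.append(0.5 * ((1 + beta) * a + (1 - beta) * b))
        c2.append(0.5 * ((1 - beta) * a + (1 + beta) * b))
    return c1, c2
```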
Combinations of operators Smith (1996, 1998) • self-adaptive P(m) rate for each block in LEGO • results better than either on its own Hinterding et al. (1996) • added a population-size adaptation heuristic and self-adaptation of crossover to mutation Eiben, Bäck et al. (1998, ongoing) • tested lots of combinations of adaptive operators • concluded that the population resizing mechanism from Arabas was the most important factor
Population sizing: Arabas (1995) • Each population member is given a lifespan when it is created, and has an age • The lifespan is a function of fitness relative to the population mean and range • Steady-state model • Each time a member is created, all ages are incremented • A member is deleted if age ≥ lifespan Is this really self-adaptation? (One such lifespan rule is sketched below.)
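One plausible lifespan rule in this spirit (Arabas proposed several allocation strategies; the linear form and the constants here are illustrative, and maximisation is assumed):

```python
def lifespan(fitness, f_min, f_max, min_life=1, max_life=11):
    """Linear lifespan allocation: fitter members live longer, so
    fitness acts on the population only through survival time."""
    if f_max == f_min:
        return (min_life + max_life) // 2
    frac = (fitness - f_min) / (f_max - f_min)   # relative fitness in [0, 1]
    return round(min_life + frac * (max_life - min_life))
```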
But why stop there… So far we have looked at the standard EA operators: • mutation rates / step sizes • recombination operator probabilities or definition • survivor selection (debatable) But we know that EA + Local Search often gives good results (Memetic Algorithms) • How do we choose which local search to use? • Is it best to choose from a fixed set?
Multi-memetic algorithms Krasnogor and Smith (2001) • an extra gene encodes the choice of local-search operator • different moves and depths • with a small probability, the choice is changed • compared with MAs using a fixed strategy • Results showed that: • the "best" fixed memes changed as the search progressed • the multi-meme tracked the performance of the current best fixed meme • final results were better over a range of TSP instances
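A sketch of the meme gene and its "innovation" mutation (the names and the probability are illustrative assumptions):

```python
import random

def mutate_meme_gene(meme_idx, n_memes, p_innovate=0.02):
    """With a small probability, reset the local-search choice gene to a
    uniformly random operator; otherwise it is inherited unchanged."""
    if random.random() < p_innovate:
        return random.randrange(n_memes)
    return meme_idx

def apply_meme(meme_idx, individual, local_searchers):
    # local_searchers: list of local-search functions (different moves/depths)
    return local_searchers[meme_idx](individual)
```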
Grammar for memes: Krasnogor and Gustafson (2004) Specified a grammar for memes describing: • what local-search method is used • when in the EA cycle it is used • etc. Proposed that memes could be self-adapted as words in this grammar Some promising initial results on bioinformatics problems
Self-adapting Memes: Smith (2002, 2003, 2005, 2007) Co-evolution of Memetic Algorithms (COMA) • a general framework for coevolution • populations of memes and genes • memes encoded as tuples ⟨depth, pivot, pairing, condition, action⟩ • condition/action: patterns to match • If pairing = linked: • memes are inherited, recombined, and mutated with genes • the system is effectively self-adapting its local search