Evolutionary Algorithms
LIACS Natural Computing Group, Leiden University
Overview
• Introduction: Optimization and EAs
• Genetic Algorithms
• Evolution Strategies
• Theory
• Examples
Background I
Biology = Engineering (Daniel Dennett)
Introduction
• Modeling: input and output are known (measured); the interrelation between them is unknown.
• Simulation: the model already exists; the input will be given; what is the resulting output for this input?
• Optimization: the model exists; the objective will be given; how (i.e., with which parameter settings) can this objective be achieved?
Simulation vs. Optimization
• Simulation answers: … what happens if? Input → simulator → result; improvement only by trial & error.
• Optimization answers: … how do I achieve the best result? An optimizer drives the simulator towards the optimal result.
• Optimization means maximization or minimization, possibly with multiple objectives.
Introduction: Optimization and Evolutionary Algorithms
Optimization
• Objective function f : M → ℝ.
• High-dimensional.
• Non-linear, multimodal.
• Discontinuous, noisy, dynamic.
• M = M1 × M2 × … × Mn, possibly heterogeneous.
• Restrictions over the search space are possible.
• A good local, robust optimum is desired.
• Realistic landscapes are like that!
(Figure: a rugged landscape contrasting a local, robust optimum with the global minimum.)
Optimization Creating Innovation
• Illustrative example: optimize efficiency.
• Initial design vs. evolved design (figures omitted).
• Result: 32% improvement in efficiency!
Dynamic Optimization
• Dynamic function, 30-dimensional.
• (Figure: 3D projection of the changing landscape.)
Classification of Optimization Algorithms
• Direct optimization algorithms: e.g., Evolutionary Algorithms.
• First-order optimization algorithms: e.g., gradient methods.
• Second-order optimization algorithms: e.g., Newton's method.
Iterative Optimization Methods
General description: x(t+1) = x(t) + s(t) · v(t), where x(t) is the actual (current) point, x(t+1) the new point, v(t) a directional vector, and s(t) a scalar step size.
• At every iteration: choose a direction, then determine a step size.
• Direction: gradient-based or random.
• Step size: by one-dimensional optimization, random, or self-adaptive.
A minimal sketch of this scheme follows below.
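As an illustration only, here is a small Python sketch of the iteration x(t+1) = x(t) + s(t) · v(t), instantiated with a gradient direction and a fixed scalar step size; the sphere objective and all parameter values are assumptions for the example.

import numpy as np

def sphere_gradient(x):
    return 2.0 * x              # gradient of f(x) = sum(x_i^2)

x = np.array([3.0, -2.0])       # actual (current) point
s = 0.1                         # scalar step size
for t in range(100):
    v = -sphere_gradient(x)     # direction: negative gradient
    x = x + s * v               # new point
print(x)                        # approaches the optimum at the origin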
The Fundamental Challenge
• Global convergence with probability one: general, but for practical purposes useless.
• Convergence velocity: local analysis only, for specific (convex) functions.
Theoretical Statements
• Global convergence (with probability 1):
  • General statement (holds for all functions).
  • Useless for practical situations: time plays a major role in practice, and not all objective functions are relevant in practice.
An Infinite Number of Pathological Cases!
(Figure: an objective function f(x1, x2) with an isolated optimum f(x*1, x*2) at the point (x*1, x*2).)
• NFL theorem: all optimization algorithms perform equally well iff their performance is averaged over all possible optimization problems.
• Fortunately: we are not interested in "all possible problems".
Theoretical Statements
• Convergence velocity:
  • Very specific statements, for convex objective functions.
  • Describes convergence towards local optima.
  • Very extensive analysis exists for Evolution Strategies.
Evolutionary Algorithm Principles
Model-Optimization-Action
(Diagram: an optimizer iteratively evaluates candidate solutions against a model; the model may be a model built from data, a simulation, one or more functions, an experiment, a subjective evaluation, or a business process model.)
Evolutionary Algorithms Taxonomy
Evolution Strategies:
• Mixed-integer capabilities
• Emphasis on mutation
• Self-adaptation
• Small population sizes
• Deterministic selection
• Developed in Germany
• Theory focused on convergence velocity
Genetic Algorithms:
• Discrete representations
• Emphasis on crossover
• Constant parameters
• Larger population sizes
• Probabilistic selection
• Developed in the USA
• Theory focused on schema processing
Other: Evolutionary Programming, Differential Evolution, GP, PSO, EDA, real-coded GAs, …
Generalized Evolutionary Algorithm

t := 0;
initialize(P(t));
evaluate(P(t));
while not terminate do
    P'(t) := mating_selection(P(t));
    P''(t) := variation(P'(t));
    evaluate(P''(t));
    P(t+1) := environmental_selection(P''(t) ∪ Q);
    t := t + 1;
od

Here Q denotes the set of individuals that may additionally survive, e.g., Q = P(t) for plus selection and Q = ∅ for comma selection.
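As a hedged illustration, the following Python sketch instantiates this loop for bit strings with a counting-ones objective; the concrete operator choices (uniform mating selection, bit-flip mutation with pm = 1/n, plus-style survival with Q = P(t)) are assumptions for the example, not fixed by the pseudocode.

import random

N, LAMBDA, GENS = 20, 30, 100     # string length, population size, generations

def evaluate(x):                  # counting-ones fitness
    return sum(x)

def mating_selection(pop):        # uniform random mating selection (assumed)
    return [random.choice(pop) for _ in range(LAMBDA)]

def variation(parents):           # bit-flip mutation with p_m = 1/N (assumed)
    return [[b ^ (random.random() < 1 / N) for b in p] for p in parents]

P = [[random.randint(0, 1) for _ in range(N)] for _ in range(LAMBDA)]
for t in range(GENS):
    Q = P                                          # survivors: plus selection
    offspring = variation(mating_selection(P))
    P = sorted(offspring + Q, key=evaluate, reverse=True)[:LAMBDA]
print(evaluate(P[0]))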
Genetic Algorithms
Genetic Algorithms: Mutation
• Mutation by bit inversion with probability pm.
• pm identical for all bits.
• pm small (e.g., pm = 1/n for bit strings of length n).
Example (two bits flipped):
0 1 1 1 0 1 0 1 0 0 0 0 0 1 0  →  0 1 1 1 0 0 0 1 0 1 0 0 0 1 0
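A minimal Python sketch of this bit-inversion operator (names and the sample parent are illustrative):

import random

def mutate(bits, pm=None):
    # flip each bit independently with probability pm (default: 1/n)
    pm = pm if pm is not None else 1.0 / len(bits)
    return [b ^ (random.random() < pm) for b in bits]

parent = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(mutate(parent))    # on average one of the 15 bits is flipped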
Genetic Algorithms: Crossover
• Crossover is applied with probability pc.
• pc identical for all individuals.
• k-point crossover: k cut points are chosen randomly.
• Example: 2-point crossover (figure omitted); a sketch follows below.
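A minimal Python sketch of k-point crossover (the function name and test strings are illustrative assumptions):

import random

def k_point_crossover(p1, p2, k=2):
    # choose k distinct cut points, then exchange every second segment
    cuts = sorted(random.sample(range(1, len(p1)), k))
    bounds = [0] + cuts + [len(p1)]
    c1, c2 = [], []
    for i in range(len(bounds) - 1):
        seg1 = p1[bounds[i]:bounds[i + 1]]
        seg2 = p2[bounds[i]:bounds[i + 1]]
        if i % 2 == 0:
            c1 += seg1
            c2 += seg2
        else:                     # swap the segment between two cut points
            c1 += seg2
            c2 += seg1
    return c1, c2

a, b = [0] * 10, [1] * 10
print(k_point_crossover(a, b, k=2))   # e.g. ([0,0,0,1,1,1,0,0,0,0], …)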
Genetic Algorithms: Selection
• Fitness-proportional selection: individual i is chosen with probability p_i = f_i / Σ_j f_j, where f denotes fitness and λ the population size.
• Tournament selection:
  • Randomly select q << λ individuals.
  • Copy the best of these q into the next generation.
  • Repeat λ times.
  • q is the tournament size (often q = 2).
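A minimal Python sketch of both selection schemes; the counting-ones fitness is a placeholder assumption.

import random

def fitness(x):                    # placeholder objective: counting ones
    return sum(x)

def proportional_selection(pop):
    # p_i = f_i / sum_j f_j (assumes strictly positive fitness values)
    total = sum(fitness(x) for x in pop)
    weights = [fitness(x) / total for x in pop]
    return random.choices(pop, weights=weights, k=len(pop))

def tournament_selection(pop, q=2):
    # pick q individuals at random, copy the best; repeat lambda times
    return [max(random.sample(pop, q), key=fitness) for _ in range(len(pop))]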
Some Theory
Convergence Velocity Analysis
• (1+1)-GA, (1,λ)-GA, (1+λ)-GA.
• For the counting-ones function f(a) = Σ_i a_i.
• Convergence velocity φ: the expected fitness gain per generation.
Convergence Velocity Analysis II
• What is the optimal mutation rate?
• Absorption times are obtained from the transition matrix in block form.
Convergence Velocity Analysis III
• p too large: absorption time grows exponentially.
• p too small: absorption time almost constant (but large).
• Optimal mutation rate: absorption time of order O(ℓ ln ℓ), where ℓ is the bit-string length.
Convergence Velocity Analysis IV
• The convergence velocities of the (1,λ)-GA (k_min = −f_a) and the (1+λ)-GA (k_min = 0) differ only in the lower bound k_min on the admissible fitness change.
Convergence Velocity Analysis V
• The (1,λ)-GA and (1+λ)-GA behave analogously to the (1,λ)-ES and (1+λ)-ES.
• Conclusion: a unifying, search-space-independent theory?
Convergence Velocity Analysis VI
• (μ,λ)-GA, (μ+λ)-GA.
• Theory vs. experiment (plots omitted).
Evolution Strategies
Evolution Strategy – Basics
• Mostly real-valued search space ℝ^n; also mixed-integer and discrete spaces.
• Emphasis on mutation: n-dimensional normal distribution with zero expectation.
• Different recombination operators.
• Deterministic selection (a sketch of both schemes follows below):
  • (μ, λ)-selection: deterioration possible.
  • (μ + λ)-selection: only accepts improvements.
• λ >> μ, i.e., creation of an offspring surplus.
• Self-adaptation of strategy parameters.
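A minimal Python sketch of the two deterministic selection schemes; minimization and the function names are assumptions for the example.

def comma_selection(offspring, mu, f):
    # (mu, lambda): only offspring compete, so quality may deteriorate
    return sorted(offspring, key=f)[:mu]

def plus_selection(parents, offspring, mu, f):
    # (mu + lambda): parents compete too, so only improvements survive
    return sorted(parents + offspring, key=f)[:mu]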
Representation of Search Points
• Simple ES with 1/5 success rule:
  • Individual: a = (x1, …, xn).
  • Exogenous adaptation of the step size σ.
  • Mutation: x_i' = x_i + σ · N_i(0, 1).
• Self-adaptive ES with a single step size:
  • Individual: a = (x1, …, xn, σ).
  • One σ controls the mutation of all x_i.
  • Mutation: σ is mutated first, then x_i' = x_i + σ' · N_i(0, 1).
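A minimal sketch of the exogenous 1/5 success rule in Python: if more than 1/5 of recent mutations improved the parent, σ is increased, otherwise decreased. The factor 0.85 and the adaptation window of n trials follow a common recommendation attributed to Schwefel; the sphere objective is an illustrative assumption.

import random

def one_fifth_es(n=10, sigma=1.0, generations=1000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    f = sum(xi * xi for xi in x)                   # sphere function
    successes, trials = 0, 0
    for _ in range(generations):
        y = [xi + sigma * random.gauss(0, 1) for xi in x]
        fy = sum(yi * yi for yi in y)
        trials += 1
        if fy < f:                                 # (1+1)-selection
            x, f = y, fy
            successes += 1
        if trials == n:                            # adapt sigma every n steps
            if successes / trials > 1 / 5:
                sigma /= 0.85                      # too easy: enlarge steps
            else:
                sigma *= 0.85                      # too hard: shrink steps
            successes, trials = 0, 0
    return x, f, sigma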
Representation of Search Points
• Self-adaptive ES with individual step sizes:
  • Individual: a = (x1, …, xn, σ1, …, σn).
  • One σ_i per x_i.
  • Mutation: x_i' = x_i + σ_i' · N_i(0, 1).
• Self-adaptive ES with correlated mutations:
  • Individual: a = (x1, …, xn, σ1, …, σn, α1, …, α_{n(n−1)/2}).
  • Individual step sizes and one correlation angle per coordinate pair.
  • Mutation according to the covariance matrix: x' = x + N(0, C).
Evolution Strategy: Algorithms – Mutation
Operators: Mutation – One σ
• Self-adaptive ES with one step size: one σ controls the mutation of all x_i.
• Individual before mutation: a = (x1, …, xn, σ).
• Step 1, mutation of the step size: σ' = σ · exp(τ0 · N(0, 1)).
• Step 2, mutation of the objective variables: x_i' = x_i + σ' · N_i(0, 1). Note: the new σ' is used here!
• Individual after mutation: a' = (x1', …, xn', σ').
Operators: Mutation – One σ (continued)
• τ0 is the so-called learning rate:
  • It affects the speed of the σ-adaptation.
  • τ0 larger: faster, but more imprecise.
  • τ0 smaller: slower, but more precise.
• How to choose τ0? Schwefel* recommends τ0 = 1/√n.
*H.-P. Schwefel: Evolution and Optimum Seeking, Wiley, NY, 1995.
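A minimal Python sketch of this one-σ mutation, assuming the log-normal update and τ0 = 1/√n as recommended above (the function name is illustrative):

import math, random

def mutate_one_sigma(x, sigma):
    n = len(x)
    tau0 = 1.0 / math.sqrt(n)                     # Schwefel's recommendation
    sigma_new = sigma * math.exp(tau0 * random.gauss(0, 1))
    # the NEW sigma is used for the objective variables
    x_new = [xi + sigma_new * random.gauss(0, 1) for xi in x]
    return x_new, sigma_new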
Operators: Mutation – One σ
(Figure: positions of five parents on the contour lines of the objective function; for n > 10, the offspring of a parent lie approximately on a hypersphere around it, with uniformly distributed position.)
Pros and Cons: One σ
• Advantages:
  • Simple adaptation mechanism.
  • Self-adaptation is usually fast and precise.
• Disadvantages:
  • Poor adaptation in case of complicated contour lines.
  • Poor adaptation in case of very differently scaled object variables, e.g., −100 < x_i < 100 but −1 < x_j < 1.
Operators: Mutation – Individual σ_i
• Self-adaptive ES with individual step sizes: one σ_i per x_i.
• Individual before mutation: a = (x1, …, xn, σ1, …, σn).
• Step 1, mutation of the individual step sizes: σ_i' = σ_i · exp(τ' · N(0, 1) + τ · N_i(0, 1)).
• Step 2, mutation of the object variables: x_i' = x_i + σ_i' · N_i(0, 1). The new σ_i' are used here!
• Individual after mutation: a' = (x1', …, xn', σ1', …, σn').
Operators: Mutation – Individual σ_i (continued)
• τ and τ' are again learning rates:
  • τ': global learning rate; N(0, 1) is realised only once per individual.
  • τ: local learning rate; N_i(0, 1) is realised anew for each of the n components.
• Suggested by Schwefel*: τ' = 1/√(2n) and τ = 1/√(2√n).
*H.-P. Schwefel: Evolution and Optimum Seeking, Wiley, NY, 1995.
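A minimal Python sketch of this mutation, assuming the learning rates just cited: one global log-normal factor (a single N(0,1) draw, scaled by τ') plus one local factor per coordinate (n independent draws, scaled by τ).

import math, random

def mutate_individual_sigmas(x, sigmas):
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)           # global learning rate
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))      # local learning rate
    global_factor = math.exp(tau_prime * random.gauss(0, 1))
    new_sigmas = [s * global_factor * math.exp(tau * random.gauss(0, 1))
                  for s in sigmas]
    # the NEW sigma_i are used for the object variables
    x_new = [xi + si * random.gauss(0, 1) for xi, si in zip(x, new_sigmas)]
    return x_new, new_sigmas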
Operators: Mutation – Individual σ_i
(Figure: positions of five parents on the contour lines; for n > 10, offspring lie approximately on a hyperellipsoid around the parent, with uniformly distributed position.)
Pros and Cons: Individual σ_i
• Advantages:
  • Individual scaling of the object variables.
  • Increased global convergence reliability.
• Disadvantages:
  • Slower convergence due to the increased learning effort.
  • No rotation of the coordinate system is possible, which is required for badly conditioned objective functions.
Operators: Correlated Mutations
• Self-adaptive ES with correlated mutations: individual step sizes and one rotation angle for each pair of coordinates.
• Individual before mutation: a = (x1, …, xn, σ1, …, σn, α1, …, α_{n(n−1)/2}).
• Step 1: mutation of the individual step sizes.
• Step 2: mutation of the rotation angles.
• Step 3: mutation of the object variables according to the covariance matrix: x' = x + N(0, C'). The new covariance matrix C' is used here!
• Individual after mutation: a'.
Operators: Correlated Mutations
(Figure: mutation ellipsoid in the (Δx1, Δx2) plane, with principal axes σ1, σ2 rotated by the angle α12.)
• Interpretation of the rotation angles α_ij.
• Mapping onto covariances according to tan(2α_ij) = 2c_ij / (σ_i² − σ_j²), i.e., c_ij = ½ (σ_i² − σ_j²) · tan(2α_ij).
Operators: Correlated Mutations (continued)
• τ, τ' and β are again learning rates; τ and τ' as before.
• β = 0.0873 (corresponds to 5 degrees).
• Angles are mutated additively: α_j' = α_j + β · N_j(0, 1).
• Out-of-boundary correction: if |α_j'| > π, then α_j' is replaced by α_j' − 2π · sign(α_j').
Correlated Mutations for ES
(Figure: positions of five parents on the contour lines; for n > 10, offspring lie approximately on a rotatable hyperellipsoid around the parent, with uniformly distributed position.)
Operators: Correlated Mutations
• How to create the correlated mutation vector?
• Multiply the uncorrelated mutation vector by the n(n−1)/2 rotation matrices.
• This generates only feasible (positive definite) correlation matrices.
Operators: Correlated Mutations
• Structure of the rotation matrix R(α_ij): equal to the identity matrix, except for the four entries r_ii = r_jj = cos(α_ij), r_ij = −sin(α_ij), and r_ji = sin(α_ij).
Operators: Correlated Mutations
Implementation of correlated mutations:

nq := n*(n-1)/2;
{ generation of the uncorrelated mutation vector }
for i := 1 to n do
    su[i] := s[i] * Ni(0,1);
{ rotations }
for k := 1 to n-1 do
    n1 := n-k;
    n2 := n;
    for i := 1 to k do
        d1 := su[n1];
        d2 := su[n2];
        su[n2] := d1*sin(a[nq]) + d2*cos(a[nq]);
        su[n1] := d1*cos(a[nq]) - d2*sin(a[nq]);
        n2 := n2-1;
        nq := nq-1;
    od
od
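For readers who prefer runnable code, here is a direct Python translation of the pseudocode above, using 0-based indexing; the function name is an illustrative choice.

import math, random

def correlated_mutation_vector(sigma, alpha):
    n = len(sigma)
    nq = n * (n - 1) // 2 - 1                     # index into alpha (0-based)
    # uncorrelated mutation vector: su[i] = sigma[i] * N_i(0,1)
    su = [s * random.gauss(0, 1) for s in sigma]
    # apply the n(n-1)/2 rotations, consuming the angles in reverse order
    for k in range(1, n):
        n1, n2 = n - k - 1, n - 1
        for _ in range(k):
            d1, d2 = su[n1], su[n2]
            su[n2] = d1 * math.sin(alpha[nq]) + d2 * math.cos(alpha[nq])
            su[n1] = d1 * math.cos(alpha[nq]) - d2 * math.sin(alpha[nq])
            n2 -= 1
            nq -= 1
    return su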