
Problem Solving by Evolution: One of Nature’s UPPs
Thomas Bäck
UPP 2004, Le Mont Saint Michel, September 15, 2004

Full Professor for "Natural Computing"
Leiden Institute for Advanced Computer Science (LIACS)
Niels Bohrweg 1, NL-2333 CA Leiden
baeck@liacs.nl
Tel.: +31 (0) 71 527 7108, Fax: +31 (0) 71 527 6985

Managing Director / CTO
NuTech Solutions GmbH / Inc.
Martin-Schmeißer-Weg 15, D-44227 Dortmund
baeck@nutechsolutions.de
Tel.: +49 (0) 231 72 54 63-10, Fax: +49 (0) 231 72 54 63-29

Overview • Optimization and Evolutionary Computation • Genetic Algorithms, Evolution Strategies, Self-Adaptation • Convergence Velocity Theory • Applications: Some Examples • Applications: Programming of CA • Links to Bio- and Pharminformatics • Drug Design • Classification • Evolutionary DNA-Computing

Natural Computing • Computing Paradigms after Natural Models • NN, EC, Simulated Annealing, Swarm & Ant Algorithms, DNA Computing, Quantum Computing, CA, ... • Journals • Journal of Natural Computing (Kluwer). • Theoretical Computer Science C (Elsevier). • Book Series on Natural Computing (Springer). • Leiden Center of Natural Computing (NL).

Unifying Evolutionary Algorithm
t := 0;
initialize(P(t));
evaluate(P(t));
while not terminate do
    P'(t) := mating_selection(P(t));
    P''(t) := variation(P'(t));
    evaluate(P''(t));
    P(t+1) := environmental_selection(P''(t) ∪ Q);
    t := t+1;
od
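The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation: the function names, the uniform mating selection, the truncation-based environmental selection, and the reading of Q as either the empty set or the old population P(t) are all assumptions made for the example.

```python
import random

def evolve(fitness, init, vary, mu=10, lam=40, plus=False, generations=50, rng=None):
    """Generic EA loop (maximisation). Q is P(t) if `plus` is set, else empty (assumption)."""
    rng = rng or random.Random(0)
    population = [init(rng) for _ in range(mu)]                    # initialize(P(t))
    for _ in range(generations):
        parents = [rng.choice(population) for _ in range(lam)]     # mating selection (uniform)
        offspring = [vary(p, rng) for p in parents]                # variation
        pool = offspring + (population if plus else [])            # P''(t) united with Q
        population = sorted(pool, key=fitness, reverse=True)[:mu]  # environmental selection
    return max(population, key=fitness)

# Hypothetical problem used only for illustration: maximise the number of ones.
def onemax(bits):
    return sum(bits)

def init_bits(rng, n=20):
    return [rng.randint(0, 1) for _ in range(n)]

def flip_bits(bits, rng):
    # Invert each bit with probability 1/n.
    return [b ^ (rng.random() < 1.0 / len(bits)) for b in bits]

best = evolve(onemax, init_bits, flip_bits, plus=True)
```

With `plus=True` the best fitness can never decrease from one generation to the next, because the old population competes with its offspring.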

Genetic Algorithms vs. Evolution Strategies
Genetic Algorithm:
• Binary representation
• Fixed mutation rate pm (= 1/n)
• Fixed crossover rate pc
• Probabilistic selection
• Identical population size
• No self-adaptation
Evolution Strategies:
• Real-valued representation
• Normally distributed mutations
• Fixed recombination rate (= 1)
• Deterministic selection
• Creation of offspring surplus
• Self-adaptation of strategy parameters: Variance(s), Covariances

Genetic Algorithms: Mutation
Before: 0 1 1 1 0 1 0 1 0 0 0 0 0 1 0
After:  0 1 1 1 0 0 0 1 0 1 0 0 0 1 0 (bits 6 and 10 inverted)
• Mutation by bit inversion with probability pm. • pm identical for all bits. • pm small (e.g., pm = 1/n).
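A minimal sketch of bit-inversion mutation as described on this slide. The function name `mutate` and the list-of-ints representation are assumptions for illustration.

```python
import random

def mutate(bits, pm=None, rng=random):
    """Invert each bit independently with probability pm (default: 1/n)."""
    pm = 1.0 / len(bits) if pm is None else pm
    return [1 - b if rng.random() < pm else b for b in bits]

parent = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
child = mutate(parent, rng=random.Random(0))
```

With pm = 1/n, one bit is inverted per offspring on average, although any single call may invert none or several.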

Genetic Algorithms: Crossover • Crossover applied with probability pc. • pc identical for all individuals. • k-point crossover: k points chosen randomly. • Example: 2-point crossover.
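The crossover bullets can be illustrated with the sketch below. The function name, the choice of distinct cut points, and the convention of returning two children are assumptions; the slide itself only states that k points are chosen randomly and that crossover is applied with probability pc.

```python
import random

def k_point_crossover(a, b, k=2, pc=1.0, rng=random):
    """Swap alternating segments of two equal-length parents at k random cut points."""
    if len(a) != len(b):
        raise ValueError("parents must have equal length")
    child1, child2 = list(a), list(b)
    if rng.random() >= pc:                 # crossover not applied: return plain copies
        return child1, child2
    cuts = sorted(rng.sample(range(1, len(a)), k))   # k distinct interior cut points
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        if swap:                           # every second segment is exchanged
            child1[prev:cut], child2[prev:cut] = b[prev:cut], a[prev:cut]
        swap, prev = not swap, cut
    return child1, child2

c1, c2 = k_point_crossover([0] * 10, [1] * 10, k=2, rng=random.Random(1))
```

For k = 2 the children keep the outer segments of one parent and receive the middle segment of the other.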

Genetic Algorithms: Selection • Fitness proportional: p_i = f_i / (f_1 + ... + f_λ) • f: fitness • λ: population size • Tournament selection: • Randomly select q ≪ λ individuals. • Copy the best of these q into the next generation. • Repeat λ times. • q is the tournament size (often: q = 2).
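Both selection schemes can be sketched in a few lines. The function names are assumptions, the proportional variant assumes non-negative fitness values that are not all zero, and the tournament variant draws its q competitors without replacement, which the slide does not specify.

```python
import random

def proportional_selection(pop, fitness, rng=random):
    """Draw len(pop) individuals with probability proportional to fitness."""
    weights = [fitness(x) for x in pop]    # assumed non-negative, not all zero
    return rng.choices(pop, weights=weights, k=len(pop))

def tournament_selection(pop, fitness, q=2, rng=random):
    """Repeat len(pop) times: sample q individuals and copy the best one."""
    return [max(rng.sample(pop, q), key=fitness) for _ in range(len(pop))]

population = [1, 2, 3, 4, 5]
next_generation = tournament_selection(population, fitness=lambda x: x, q=2,
                                       rng=random.Random(0))
```

A larger q raises the selection pressure; with q equal to the population size, every tournament returns the current best individual.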

Evolution Strategies: Mutation • Creation of a new solution: • σ-adaptation by means of • 1/5-success rule. • Self-adaptation. • More complex / powerful strategies: • Individual step sizes σ_i. • Covariances. • Convergence speed: ⇒ Approx. 10·n down to 5·n is possible.
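As an illustration of σ-adaptation by the 1/5-success rule, here is a minimal (1+1)-ES sketch for minimisation. The adaptation window of 20 steps and the factor 0.85 are common textbook choices and are assumptions here, as are all names; the slide's own mutation formula did not survive extraction.

```python
import random

def one_plus_one_es(f, x, sigma=1.0, iters=2000, window=20, c=0.85, rng=None):
    """Minimise f with a (1+1)-ES; sigma is rescaled every `window` steps."""
    rng = rng or random.Random(0)
    fx, successes = f(x), 0
    for t in range(1, iters + 1):
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]   # normally distributed mutation
        fy = f(y)
        if fy <= fx:                                          # offspring replaces parent
            x, fx, successes = y, fy, successes + 1
        if t % window == 0:                                   # 1/5-success rule
            rate = successes / window
            if rate > 0.2:
                sigma /= c                                    # too many successes: larger steps
            elif rate < 0.2:
                sigma *= c                                    # too few successes: smaller steps
            successes = 0
    return x, fx, sigma

def sphere(x):
    return sum(v * v for v in x)

x_best, f_best, sigma_final = one_plus_one_es(sphere, [5.0] * 5)
```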

Self-Adaptation: • Motivation: General search algorithm • Geometric convergence: Arbitrarily slow, if σ is wrongly controlled! • No deterministic / adaptive scheme for arbitrary functions exists. • Self-adaptation: On-line evolution of strategy parameters. • Various schemes: • Schwefel: one σ, n σ's, covariances; Rechenberg: MSA. • Ostermeier, Hansen: Derandomized, Covariance Matrix Adaptation. • EP variants (meta EP, Rmeta EP). • Bäck: Application to p in GAs. • Figure labels: Step size, Direction.

Evolution Strategies: Self-Adaptation • Learning while searching: Intelligent Method. • Different algorithmic approaches, e.g.: • Pure self-adaptation: • Mutational step size control (MSC): • Derandomized step size adaptation • Covariance adaptation
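One common form of pure self-adaptation mutates the step size log-normally and then uses the new step size to mutate the object variables. The sketch below shows the single-σ case; the learning rate τ = 1/√n is a frequently used setting and an assumption here, since the slide's formulas are not preserved.

```python
import math
import random

def self_adaptive_mutation(x, sigma, rng=random):
    """Mutate sigma first (log-normal), then mutate x with the new sigma."""
    tau = 1.0 / math.sqrt(len(x))                      # assumed learning rate
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    new_x = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return new_x, new_sigma

child_x, child_sigma = self_adaptive_mutation([1.0, 2.0, 3.0], 0.5, random.Random(0))
```

Because σ travels with the individual, selection on the objective value indirectly favours step sizes that produce good offspring.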

Evolution Strategies: Self-Adaptive Mutation • Figure panels: n = 2, n_σ = 1, n_α = 0; n = 2, n_σ = 2, n_α = 0; n = 2, n_σ = 2, n_α = 1.

Self-Adaptation: Dynamic Sphere • Optimum σ: • Transition time proportionate to n. • Optimum σ learned by self-adaptation.

Evolution Strategies: Selection • (μ,λ) • (μ+λ)

Possible Selection Operators • (1+1)-strategy: one parent, one offspring. • (1,λ)-strategies: one parent, λ offspring. • Example: (1,10)-strategy. • Derandomized / self-adaptive / mutative step size control. • (μ,λ)-strategies: μ > 1 parents, λ > μ offspring. • Example: (2,15)-strategy. • Includes recombination. • Can overcome local optima. • (μ+λ)-strategies: elitist strategies.
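The difference between comma and plus selection can be shown directly. This sketch assumes minimisation and uses hypothetical function names.

```python
def comma_selection(parents, offspring, mu, f):
    """Comma strategy: the mu best offspring survive; parents are always discarded."""
    if len(offspring) < mu:
        raise ValueError("comma selection needs at least mu offspring")
    return sorted(offspring, key=f)[:mu]

def plus_selection(parents, offspring, mu, f):
    """Plus strategy (elitist): the mu best of parents and offspring survive."""
    return sorted(parents + offspring, key=f)[:mu]

parents = [0.1, 0.2]
offspring = [0.5, 0.3, 0.9, 0.4]
survivors_comma = comma_selection(parents, offspring, mu=2, f=abs)
survivors_plus = plus_selection(parents, offspring, mu=2, f=abs)
```

In this example the comma strategy accepts a temporary deterioration (all offspring are worse than the parents), which is what lets it leave local optima, while the plus strategy keeps the old parents.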

Vision: Self-adaptive software • Self-adaptation is the ability of an algorithm to iteratively make the solution of a problem more likely. • Software that monitors its performance, improves itself, learns while it interacts with its user(s). [Robertson, Shrobe, Laddaga, 2001] • Self-adaptation in ES: Evolution of solutions and solution search algorithms.

Robust vs. Fast Optimization: • Global convergence with probability one: General, but for practical purposes useless. • Convergence velocity: Local analysis only, specific functions only.

GA Convergence Velocity Analysis: • (1+1)-GA, (1,λ)-GA, (1+λ)-GA. • For counting ones function: • Convergence velocity: • Mutation rate p, q = 1 - p, k_max = l - f_a (l: bit string length).

Convergence Velocity Analysis: • Optimum mutation rate ? • Absorption times from transition matrix in block form, using where

Convergence Velocity Analysis (plotted over mutation rate p): • p too large: Exponential. • p too small: Almost constant. • Optimal: O(l ln l).
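The effect of the mutation rate can be explored empirically with a small (1+1)-GA on the counting ones function. This is an illustrative simulation under assumed names and settings, not the analysis from the talk; it returns the number of steps until the all-ones string is reached, capped at `max_steps`.

```python
import random

def one_plus_one_ga(l, p, max_steps=100_000, rng=None):
    """(1+1)-GA on counting ones: steps until the optimum is hit, or max_steps."""
    rng = rng or random.Random(0)
    parent = [rng.randint(0, 1) for _ in range(l)]
    for step in range(max_steps):
        if sum(parent) == l:                       # optimum reached
            return step
        child = [1 - b if rng.random() < p else b for b in parent]
        if sum(child) >= sum(parent):              # elitist acceptance
            parent = child
    return max_steps

steps_at_one_over_l = one_plus_one_ga(20, 1.0 / 20)
```

Calling the function with several values of p (for example 0.01, 1/l, and 0.3) and averaging over different seeds gives a rough empirical picture of how the run time depends on the mutation rate.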

Convergence Velocity Analysis: • (1,λ)-GA (k_min = -f_a), (1+λ)-GA (k_min = 0):

Convergence Velocity Analysis: • (1,λ)-GA, (1+λ)-GA: • (1,λ)-ES, (1+λ)-ES: • Conclusion: A unifying, search-space independent theory?

Convergence Velocity Analysis: • (μ,λ)-GA (k_min = -f_a), (μ+λ)-GA (k_min = 0): • Theory • Experiment

Convergence Velocity for Bimodal Function: • A generalized Trap Function (u = number of ones):

Transition Probabilities for Bimodal Function: • Probability to mutate u1 ones into u2 ones: • Probability that one step of the algorithm changes parent (u1 -> u2):
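The slide's formulas are not preserved, but for independent bit inversion with rate p the first probability has a standard closed form: to go from u1 to u2 ones in a string of length l, i of the ones and j of the zeros must flip with u2 = u1 - i + j. The sketch below sums over all such pairs; the function name and argument order are assumptions.

```python
from math import comb

def mutation_transition(l, u1, u2, p):
    """Probability that bit-flip mutation with rate p turns u1 ones into u2 ones."""
    total = 0.0
    for i in range(u1 + 1):                # i ones flipped to zero
        j = u2 - u1 + i                    # j zeros flipped to one
        if 0 <= j <= l - u1:
            total += (comb(u1, i) * comb(l - u1, j)
                      * p ** (i + j) * (1 - p) ** (l - i - j))
    return total

row = [mutation_transition(10, 4, u2, 0.1) for u2 in range(11)]
```

Each row of the resulting transition matrix sums to one, which is a quick consistency check before using it to derive acceptance probabilities for a particular selection scheme.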

Convergence Velocity for Bimodal Function: • Plots of convergence velocity: • (1+1), z2 = 100, current position varies (5, 20, ...). • (1+λ), z2 = 100, position 20, λ varies (1, 2, ...). • (1+λ), z2 = 100, position 35, λ varies (1, 2, ...). • Plot annotations: Global max.; Jump to local max.

Convergence Velocity for Bimodal Function: • New Algorithm: Several mutation rates. • Expands theory to all counting ones functions (including moving ones). • Optimal lower mutation rate: 1/l. • Currently further analyzed / tested on NP-complete problems.

Optimization Problem: • f: Objective function, can be • Multimodal, with many local optima • Discontinuous • Stochastically perturbed • High-dimensional • Varying over time. • can be heterogeneous. • Constraints can be defined over

Optimization Algorithms: • Direct optimization algorithm: Evolutionary Algorithms • First order optimization algorithm: e.g., gradient method • Second order optimization algorithm: e.g., Newton method

Applications: General Aspects • Diagram: EA-Optimizer coupled to an evaluation source. Diagram labels: Function Model from Data, Simulation, Function(s), Subjective Evaluation, Experiment, Business Process Model.

Overview of Examples • Dielectric filter design (40-dimensional). • Quality improvement by factor 2. • Car safety optimization (10-30 dim.) • 10% improvement. • Traffic control (elevators, planes, cars) • 3-10% improvement. • Telecommunication • Metal stamping • Nuclear reactors,...

Unconventional Programming? • "Normal" EA application: EA → Task • EA as Programming Paradigm: EA → Other Algorithm → Task

UP of CAs (= Inverse Design of CAs) • 1D CAs: Earlier work by Mitchell et al., Koza, ... • Transition rule: Assigns each neighborhood configuration a new state. • One rule can be expressed by 2^(2r+1) bits. • There are 2^(2^(2r+1)) rules for a binary 1D CA. • Figure: cell row 1 0 0 0 0 1 1 0 1 0 1 0 1 0 0 with one neighborhood (radius r = 2) marked.

UP of CAs (rule encoding)
• Assume r = 1: Rule length is 8 bits.
• Corresponding neighborhoods:
Neighborhood: 000 001 010 011 100 101 110 111
Rule bit:       1   0   0   0   0   1   1   0
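The rule encoding can be made concrete with a short sketch: the neighborhood, read as a binary number, indexes into the rule bit string. Periodic boundary conditions and the function name are assumptions for this example; the rule bits are the ones shown on the slide.

```python
def ca_step(cells, rule, r=1):
    """One synchronous update of a binary 1D CA (periodic boundaries assumed)."""
    n = len(cells)
    if len(rule) != 2 ** (2 * r + 1):
        raise ValueError("rule length must be 2^(2r+1)")
    nxt = []
    for i in range(n):
        index = 0
        for d in range(-r, r + 1):                 # read neighborhood left to right
            index = (index << 1) | cells[(i + d) % n]
        nxt.append(rule[index])                    # neighborhood value selects the rule bit
    return nxt

RULE = [1, 0, 0, 0, 0, 1, 1, 0]   # bits for neighborhoods 000, 001, ..., 111

row = [0, 1, 0, 0, 1, 1, 0, 1]
next_row = ca_step(row, RULE)
```

An EA searching for a CA rule then simply treats the list `RULE` as the genotype and scores it by running `ca_step` repeatedly on test configurations.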

Inverse Design of CAs: 1D • Time evolution diagram:

Inverse Design of CAs: 1D • Majority problem: • Particle-based rules. • Fitness values: 0.76, 0.75, 0.76, 0.73

Inverse Design of CAs: 1D • Block expanding rules • "Don't care about initial state" rules • Particle communication based rules

Inverse Design of CAs: 1D Majority Records • Gacs, Kurdyumov, Levin 1978 (hand-written): 81.6% • Davis 1995 (hand-written): 81.8% • Das 1995 (hand-written): 82.178% • Andre, Bennett, Koza 1996 (GP): 82.326%

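Inverse Design of CAs: 2D • Generalization to 2D (nD) CAs? • Von Neumann vs. Moore neighborhood (r = 1) • Generalization to r > 1 possible (straightforward) • Search space size for a GA: 2^(2^5) = 2^32 rules (von Neumann, 5 cells) vs. 2^(2^9) = 2^512 rules (Moore, 9 cells). • Figure: example von Neumann and Moore neighborhoods.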

Inverse Design of CAs • Learning an AND rule. • Input boxes are defined. • Some evolution plots:

Inverse Design of CAs • Learning an XOR rule. • Input boxes are defined. • Some evolution plots:

Inverse Design of CAs • Learning the majority task. • 84/169 in a), 85/169 in b). • Fitness value: 0.715

Inverse Design of CAs • Learning pattern compression tasks.

Current Drug Targets: http://www.gpcr.org/ GPCR

Goals (in Cooperation with LACDR): • CI Methods: • Automatic knowledge extraction from biological databases. • Automatic optimisation of structures: evolution strategies. • Exploration for • Drug Discovery, • De novo Drug Design. • Figure panels: Initialisation; Final (optimized).

Clustering GPCRs: New Ways • SOM based on sequence homology, family clusters marked. • Overlay with phylogenetic (sub-)tree. • Figure legend: Class A: amine, dopamine, trace amine, peptide, angiotensin, chemokine CC, other, melanocortin, viral, (rhod)opsin, vertebrate, other, unclassified; Class B: corticotropic releasing factor.

Evolutionary DNA-Computing (with IMB): • DNA molecule = solution candidate! • Potential advantage: > 10^12 candidate solutions in parallel. • Biological operators: • Cutting, Splicing. • Ligating. • Amplification. • Mutation. • Current approaches very limited. • Our approach: • Suitable NP-complete problem. • Modern technology. • Scalability (n > 30).

Evolutionary DNA-Computing: • Example: Maximum Clique Problem • Problem Instance: Graph G = (V, E) (figure: example graph with vertices 1 to 8) • Feasible Solution: V' ⊆ V such that every two vertices in V' are joined by an edge in E • Objective Function: Size |V'| of clique V' • Optimal Solution: Clique V' that maximizes |V'|. • Example: {2,3,6,7}: Maximum Clique (01100110); {4,5,8}: Clique (00011001).
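The bit-string encoding of vertex subsets can be checked with a few lines of code. The slide's graph did not survive extraction, so the edge set below is a hypothetical one chosen only so that {2,3,6,7} and {4,5,8} are cliques as in the example; the function names are assumptions as well.

```python
from itertools import combinations

# Hypothetical edge set (NOT the graph from the slide).
EDGES = {frozenset(e) for e in [
    (2, 3), (2, 6), (2, 7), (3, 6), (3, 7), (6, 7),   # clique {2,3,6,7}
    (4, 5), (4, 8), (5, 8),                           # clique {4,5,8}
    (1, 2), (3, 4),                                   # extra edges
]}

def is_clique(bits, edges):
    """bits[i-1] == '1' means vertex i is in V'; all chosen pairs must be adjacent."""
    chosen = [i + 1 for i, b in enumerate(bits) if b == "1"]
    return all(frozenset(pair) in edges for pair in combinations(chosen, 2))

def max_clique(n, edges):
    """Brute force over all 2^n bit strings; only feasible for small n."""
    candidates = (format(i, f"0{n}b") for i in range(2 ** n))
    return max((c for c in candidates if is_clique(c, edges)),
               key=lambda c: c.count("1"))

best = max_clique(8, EDGES)
```

The brute-force search enumerates all 2^n candidates, which mirrors why a filtering approach stops scaling as n grows.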

DNA-Computing: Classical Approach
1: X := randomly generate DNA strands representing all candidates;
2: Remove the set Y of all non-cliques from X: C := X - Y;
3: Identify the strand in C with smallest length (largest clique);
• Based on filtering out the optimal solution. • Fails for large n (exponential growth). • Applied in the lab for n = 6 (Ouyang et al., 1997); limited to n = 36 (nanomole operations).

DNA-Computing: Evolutionary Approach
1: Generate an initial random population P;
2: while not terminate do
3:     P := amplify and mutate P;
4:     Remove the set Y of all non-cliques from P: P := P - Y;
5:     P' := select shortest DNA strands from P;
6: od
• Based on evolving a (near-)optimal solution. • Also applicable for large n. • Currently tested in the lab (Leiden, IMB).
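An in-silico analogue of the evolutionary loop is sketched below. It is only a software illustration of the steps, not the laboratory protocol: bit lists stand in for DNA strands, "shortest strand" is modelled as "most vertices" (following the classical approach, where the shortest strand encodes the largest clique), and the graph, parameter values, and names are assumptions.

```python
import random
from itertools import combinations

# Hypothetical edge set (NOT the graph from the slides).
EDGES = {frozenset(e) for e in [
    (2, 3), (2, 6), (2, 7), (3, 6), (3, 7), (6, 7),
    (4, 5), (4, 8), (5, 8), (1, 2), (3, 4),
]}

def is_clique(bits, edges):
    chosen = [i + 1 for i, b in enumerate(bits) if b]
    return all(frozenset(pair) in edges for pair in combinations(chosen, 2))

def evolve_clique(n, edges, pop_size=40, copies=5, generations=50, rng=None):
    rng = rng or random.Random(0)
    pm = 1.0 / n
    # Step 1: random initial population.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = [0] * n                                   # the empty set is trivially a clique
    for _ in range(generations):
        # Step 3: amplify (copy) and mutate.
        amplified = [[1 - b if rng.random() < pm else b for b in ind]
                     for ind in population for _ in range(copies)]
        # Step 4: remove all non-cliques.
        cliques = [ind for ind in amplified if is_clique(ind, edges)]
        if not cliques:
            continue                                 # nothing feasible yet: keep old population
        # Step 5: keep the largest cliques (the shortest strands in the DNA encoding).
        population = sorted(cliques, key=sum, reverse=True)[:pop_size]
        best = max(population + [best], key=sum)
    return best

result = evolve_clique(8, EDGES)
```

Unlike the filtering approach, the population size here stays fixed, so the memory cost does not grow exponentially with n; whether a maximum clique is found depends on the run.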
