Introduction to Evolutionary Algorithms: Lecture 2. Jim Smith, University of the West of England, UK. May/June 2012.
Overview • Recap of EC metaphor • Recap of basic behaviour • Role of fitness function • Dealing with constraints • Representation as key to problem solving • Integer Representations • Permutation Representations • Continuous Representations • Tree-based Representations
Recap of EC metaphor • A population of individuals exists in an environment with limited resources • Competition for those resources causes selection of the fitter individuals, which are better adapted to the environment • These individuals act as seeds for the generation of new individuals through recombination and mutation • The new individuals have their fitness evaluated and compete (possibly also with their parents) for survival • Over time, natural selection causes a rise in the fitness of the population
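The cycle in this metaphor can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the lecture's own code: the toy fitness `one_max` (count of 1s in a bit string), the parameter values, and all function names are invented for the example.

```python
import random

def one_max(genome):
    """Toy fitness, assumed for illustration: number of 1s in the bit string."""
    return sum(genome)

def evolve(fitness, length=20, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    # Environment with limited resources: a fixed-size population.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def pick_parent():
        # Competition: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best_history = []
    for _ in range(generations):
        p1, p2 = pick_parent(), pick_parent()
        # Recombination: one-point crossover of the two parents.
        cut = rng.randrange(1, length)
        child = p1[:cut] + p2[cut:]
        # Mutation: flip each bit independently with probability 1/length.
        child = [1 - g if rng.random() < 1.0 / length else g for g in child]
        # Survival: the child replaces the least fit individual if it is no worse.
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child
        best_history.append(max(fitness(ind) for ind in pop))
    return pop, best_history
```

Because the child only ever replaces the least fit individual, the best fitness recorded in `best_history` never decreases from one generation to the next.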
Typical behaviour of an EA Phases in optimising on a 1-dimensional fitness landscape Early phase: quasi-random population distribution Mid-phase: population arranged around/on hills Late phase: population concentrated on high hills
Typical run: progression of fitness [Plot: best fitness in population against time (number of generations)] • A typical run of an EA shows so-called “anytime behaviour”
Are long runs beneficial? [Plot: best fitness in population against time (number of generations), marking progress in 1st half and progress in 2nd half] • Answer: • how much do you want the last bit of progress? • it may be better to do more, shorter runs
Evolutionary Algorithms in Context • There are many views on the use of EAs as robust problem solving tools. • For most problems a problem-specific tool may: • perform better than a generic search algorithm on most instances, • have limited utility, • not do well on all instances • Goal is to provide robust tools that provide: • evenly good performance • over a range of problems and instances
What are the different types of EAs? • Historically, different flavours of EAs have been associated with different representations • Binary strings: Genetic Algorithms • Real-valued vectors: Evolution Strategies • Finite state machines: Evolutionary Programming • LISP trees: Genetic Programming • These differences are largely irrelevant; the best strategy is to: • choose a representation to suit the problem • choose variation operators to suit the representation • Selection operators only use fitness and so are independent of representation
Role of fitness function • The more fitness levels you have available, the more information is potentially available to guide search • EAs can cope with fitness functions that are: • Noisy • Time-dependent • Discontinuous • and have multiple optima
Problems with Constraints • “Constrained Optimisation Problems” (COPs) • Some problems inherently have constraints as well as fitness functions • Can incorporate constraints into the fitness function (indirect) • Can also incorporate them into the representation (direct) • Constraint Satisfaction Problems (CSPs) • Seek a solution which meets a set of constraints • Transform to a COP by minimising constraint violations (indirect method) • Might be able to use good representations (direct)
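One common way to realise the indirect method is a penalty term subtracted from the raw fitness. The sketch below is only an illustration: the function name, the convention that a constraint value above zero measures how badly it is violated, and the weight of 10 are all assumptions, not taken from the lecture.

```python
def penalised_fitness(raw_fitness, constraint_values, weight=10.0):
    """Indirect constraint handling (illustrative sketch).

    constraint_values: one number per constraint; values <= 0 mean the
    constraint is satisfied, values > 0 measure the amount of violation
    (an assumed convention).
    """
    total_violation = sum(max(0.0, v) for v in constraint_values)
    return raw_fitness - weight * total_violation
```

For example, `penalised_fitness(5.0, [0.0, -1.0])` leaves a feasible solution's fitness at 5.0, while `penalised_fitness(5.0, [0.5])` reduces it to 0.0. Choosing the weight is problem dependent.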
Feasible & Unfeasible Regions [Figure: search space S containing feasible regions F and unfeasible regions U] • Space will be split into two disjoint sets of spaces: • F (the feasible regions): may be connected • U (the unfeasible regions)
Methods for constraint handling • Direct • Indirect
Example: N Queens • Place N queens on a chess board so that no queen can take another • (64*63*62*61*60*59*58*57) / 8! candidate placements for N=8 • = 64! / (56! * 8!) • ≈ 4.4 * 10^9
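The size of this search space can be checked directly. The snippet below is a small arithmetic check using Python's standard `math` module (Python 3.8 or later); the variable names are illustrative.

```python
import math

# Ordered ways to put 8 distinguishable queens on 64 cells: 64*63*...*57.
ordered_placements = math.perm(64, 8)

# Queens are interchangeable, so divide by 8! to count sets of 8 cells.
candidate_boards = math.comb(64, 8)

print(ordered_placements // math.factorial(8) == candidate_boards)
print(f"{candidate_boards:.2e}")
```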
Designing an EA • Fitness function: N – num_vulnerable_queens • Transforms CSP to COP • Population and Selection? • Whatever we like, e.g.: • Population size 100 • Tournament selection of 2 parents • Replace the two least fit in the population if the offspring are better • Representation?
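The fitness function above can be sketched before a representation is chosen, by working on a plain list of board cells. This is an illustrative sketch: the `(row, column)` tuple format and the helper names are assumptions, and two queens on the same cell are simply counted as vulnerable.

```python
def num_vulnerable_queens(queens):
    """Count queens that share a row, column or diagonal with another queen.

    queens: list of (row, column) tuples (an assumed format).
    """
    vulnerable = 0
    for i, (r1, c1) in enumerate(queens):
        for j, (r2, c2) in enumerate(queens):
            if i == j:
                continue
            if r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2):
                vulnerable += 1
                break  # one attacker is enough to make this queen vulnerable
    return vulnerable

def queens_fitness(queens):
    """N minus the number of vulnerable queens; N means a valid solution."""
    return len(queens) - num_vulnerable_queens(queens)
```

For N=8 the result ranges over the nine values 0 to 8, with 8 reached only by a valid placement.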
Possible Representations • Method 1: Based on the board • 64-bit binary string: 0/1 = empty/occupied • 2^64 possibilities: more than the problem itself! • Introduces an extra constraint that only 8 cells are occupied • Repair function or specialised operators? • Or fractional penalty function • Based on the pieces • More natural and problem focussed • Avoids the extra constraint
Operators for binary representations • Recombination • One-point, N-point • Uniform • Randomly choose parent 1 or 2 for each gene • Mutation • Independent flip 0 <-> 1 for each gene
Integer Representations • Label cells 1-64 • Method 2: one gene per piece encodes its cell • 64^N ≈ 2.8 x 10^14 for N=8 (including placements with duplicate cells) • 1-point crossover, extended random mutation • OK, but a huge space with only 9 fitness levels • Could make us think about constraints • Rows, columns, diagonals • Indirect: penalise all • Direct: can we avoid some?
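The binary operators listed above can be sketched as follows. This is a minimal illustration assuming genomes are Python lists of 0s and 1s of equal length; the function names and the explicit `rng` argument are choices made for the example.

```python
import random

def one_point_crossover(p1, p2, rng):
    """Swap the tails of the two parents after a random cut point."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2, rng):
    """For each gene, randomly choose which parent supplies which child."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2

def bit_flip_mutation(genome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]
```

A typical call would be `one_point_crossover([0] * 8, [1] * 8, random.Random(1))`, which returns two children whose genes are complementary at every position.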
Integer representations • Some problems naturally have integer variables, e.g. image processing parameters • Others take categorical values from a fixed set, e.g. {blue, green, yellow, pink} • N-point / uniform crossover operators work • Extend bit-flipping mutation to make • “creep”, i.e. more likely to move to a similar value • Random choice (esp. categorical variables) • For ordinal problems, it is hard to know the correct range for creep, so two mutation operators are often used in tandem
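The two mutation styles above might look like this in code. It is a sketch under assumptions: the function names, the fixed step size of 1, and the clamping of values to `[low, high]` are illustrative choices rather than anything specified in the lecture.

```python
import random

def random_reset_mutation(genome, values, rate, rng):
    """Replace a gene with a uniformly chosen allowed value (suits categorical genes)."""
    return [rng.choice(values) if rng.random() < rate else g for g in genome]

def creep_mutation(genome, low, high, rate, rng, step=1):
    """Move a gene up or down by `step`, staying within [low, high] (suits ordinal genes)."""
    child = []
    for g in genome:
        if rng.random() < rate:
            g = g + rng.choice([-step, step])
            g = max(low, min(high, g))  # clamp to the allowed range
        child.append(g)
    return child
```

Using the two in tandem could mean applying creep with a relatively high rate for fine tuning and random reset with a low rate to allow occasional large jumps.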
Partially direct representations • Method 3 • Row constraint <=> each queen on a different row • Let value of gene i = column of queen in row i • Solution space size 8^N ≈ 1.68 x 10^7 for N=8 • One-point crossover, extended random mutation • Method 4 • As above but also meet column constraints • Permutation: N! = 40320 for N=8 • Now need specialised crossover and mutation
Permutation Representations • Ordering/sequencing problems form a special type • Solution = arrangement of objects in a certain order • Example: sort algorithm: the important thing is which elements occur before others (order) • Example: Travelling Salesman Problem (TSP): the important thing is which elements occur next to each other (adjacency) • These problems are generally expressed as a permutation: • if there are n variables then the representation is a list of n integers, each of which occurs exactly once
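With the permutation encoding of Method 4, row and column constraints hold by construction, so only diagonals need checking. The sketch below illustrates this; the function name and the 0-based column numbering are assumptions made for the example.

```python
import math

def diagonal_conflicts(perm):
    """Count pairs of queens on a shared diagonal.

    perm[i] is the column of the queen in row i; because perm is a
    permutation, no two queens share a row or a column.
    """
    n = len(perm)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(perm[i] - perm[j]) == j - i
    )

# Search space sizes for N=8 under Methods 3 and 4.
method3_size = 8 ** 8             # one column value per row
method4_size = math.factorial(8)  # permutations of the columns
```

A permutation with zero diagonal conflicts, such as `[0, 4, 7, 5, 2, 6, 1, 3]`, is a valid solution.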
Variation operators for permutations • Normal mutation operators don’t work: • e.g. bit-wise mutation : let gene i have value j • changing to some other value k would mean that k occurred twice and j no longer occurred • Therefore must change at least two values • Various mechanisms exist (swap, invert, ...). • Similar arguments mean specialised crossovers are needed.
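Swap and inversion, the two mechanisms named above, can be sketched as below. The function names and the use of Python lists are illustrative assumptions; both operators change at least two positions and always return a valid permutation.

```python
import random

def swap_mutation(perm, rng):
    """Exchange the values at two randomly chosen positions."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(perm, rng):
    """Reverse the order of the values between two randomly chosen positions."""
    child = list(perm)
    i, j = sorted(rng.sample(range(len(child)), 2))
    child[i:j + 1] = reversed(child[i:j + 1])
    return child
```

Swap disturbs order information only slightly, whereas inversion keeps most adjacency links intact, which is why inversion is often preferred for adjacency problems such as the TSP.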
Crossover operators for permutations [Figure: crossing parents 1 2 3 4 5 and 5 4 3 2 1 after the third gene gives 1 2 3 2 1 and 5 4 3 4 5, neither of which is a permutation] • “Normal” crossover operators will often lead to inadmissible solutions • Many specialised operators have been devised which focus on combining order or adjacency information from the two parents
Another Example of Binary Encoding: Feature Selection for Machine Learning • Many successful Machine Learning / Data Mining algorithms use greedy search: • Decision trees add most informative nodes • Rule Induction: add most useful next rule • Bayesian networks: to identify correlated features • Distance-based methods measure difference along each axis • All these can be improved by using global search in the feature selection process • Use a binary coded GA: 0/1 : use/don’t use feature • M. Tahir and J.E. Smith. Creating Diverse Nearest Neighbour Ensembles using Simultaneous Metaheuristic Feature Selection. 2010. Pattern Recognition Letters, 31(11):1470--1480. • Smith, M. & Bull, L. (2005) Genetic Programming with a Genetic Algorithm for Feature Construction and Selection. Genetic Programming and Evolvable Machines 6(3): 265-281.
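As one example of a specialised operator, the sketch below implements order crossover, a well-known operator that preserves relative order; the lecture does not say which operators it has in mind, so this choice is an assumption. It also shows a naive one-point crossover producing an inadmissible child.

```python
import random

def naive_one_point(p1, p2, cut):
    """Ordinary one-point crossover; usually breaks the permutation property."""
    return p1[:cut] + p2[cut:]

def order_crossover(p1, p2, rng):
    """Copy a random segment from p1, then fill the rest in the order found in p2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    used = set(p1[i:j + 1])
    # Read p2 starting just after the segment, wrapping around, skipping used values.
    fill = [g for g in p2[j + 1:] + p2[:j + 1] if g not in used]
    # Fill the empty positions in the same wrap-around order.
    positions = list(range(j + 1, n)) + list(range(0, i))
    for pos, g in zip(positions, fill):
        child[pos] = g
    return child
```

For instance, `naive_one_point([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 3)` gives `[1, 2, 3, 2, 1]`, in which 4 and 5 are lost and 1 and 2 occur twice, while `order_crossover` always returns each value exactly once.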
Example of Integer Encoding • Protein Structure Prediction: • Proteins are created as strings of amino acid “residues” • Behaviour of a protein is determined by its 3-D structure • Proteins naturally “fold” to the lowest energy structure • Model as a fixed-length path through a 3D grid • Representation: sequence of “up/down/L/R/forward” to specify a path • Fitness based on pairwise interactions between residues • N. Krasnogor, W. Hart, J.E. Smith and D. Pelta. Protein Structure Prediction With Evolutionary Algorithms. Proc. GECCO 1999, pages 1596--1601. Morgan Kaufmann. • R. Santana, P. Larrañaga, and J. A. Lozano. Protein folding in simplified models with estimation of distribution algorithms. IEEE Transactions on Evolutionary Computation. Vol. 12. No. 4. Pp. 418-438.
Example of 2D HP model Dark boxes represent hydrophobic residues (H): H-H contacts add +1 to fitness
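A fitness of this kind can be sketched for the 2D case as follows. The encoding here is an assumption made for illustration: absolute moves U/D/L/R on a square grid rather than the relative moves listed on the earlier slide, with self-intersecting paths reported as infeasible by returning `None`. Only contacts between residues that are not neighbours along the chain are counted.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_path(moves):
    """Turn a string of moves into the list of grid cells visited."""
    x, y = 0, 0
    path = [(x, y)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

def hp_fitness(sequence, moves):
    """Count H-H contacts; sequence is a string of 'H' and 'P' residues."""
    path = decode_path(moves)
    if len(path) != len(sequence):
        raise ValueError("need exactly one move fewer than residues")
    if len(set(path)) != len(path):
        return None  # path crosses itself: infeasible fold
    contacts = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dist = abs(path[i][0] - path[j][0]) + abs(path[i][1] - path[j][1])
                if dist == 1:
                    contacts += 1
    return contacts
```

For example, folding the sequence `"HPPH"` with moves `"RUL"` brings the two H residues next to each other and scores 1, while the straight path `"RRR"` scores 0.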
Example 2: Microprocessor Design Verification • Need to “drive” the processor into a variety of states • to make sure it does the right thing in each • Test = sequence of assembly code instructions • Traditional methods generated millions of random tests but weren’t reaching all states • UWE solution: evolve sequences of tests • Integer encoding (fixed number of instructions) • Specialised mutation: group instructions into classes, • more likely to move to a similar type of instruction • J.E. Smith, M. Bartley and T.C. Fogarty. Microprocessor Design Verification by Two-Phase Evolution of Variable Length Tests. Proc. 1997 IEEE Conference on Evolutionary Computation, pages 453--458. IEEE Press.
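A class-biased mutation of this kind might be sketched as below. Everything specific here is hypothetical: the instruction names, the three classes, and the probability parameters are invented for illustration and are not taken from the cited work. Mnemonics are used instead of integer codes purely for readability.

```python
import random

# Hypothetical instruction classes, for illustration only.
CLASSES = {
    "arith": ["ADD", "SUB", "MUL"],
    "memory": ["LOAD", "STORE"],
    "branch": ["JMP", "BEQ", "BNE"],
}
CLASS_OF = {op: name for name, ops in CLASSES.items() for op in ops}
ALL_OPS = sorted(CLASS_OF)

def class_biased_mutation(test, rate, same_class_prob, rng):
    """Mutate each instruction with probability `rate`; when mutating,
    stay within the instruction's own class with probability `same_class_prob`."""
    child = []
    for op in test:
        if rng.random() < rate:
            if rng.random() < same_class_prob:
                pool = CLASSES[CLASS_OF[op]]
            else:
                pool = ALL_OPS
            op = rng.choice(pool)
        child.append(op)
    return child
```

Setting `same_class_prob` close to 1 makes the operator behave like creep mutation over instruction types, while lower values allow occasional jumps to a different class.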
Summary • Fitness function should provide as much information as possible • Could penalise infeasible solutions • Selection / Population management is independent of representation • Representation should suit the problem • Can take constraints into account (direct) • Recombination/Mutation defined by representation • Could be problem specific (direct constraint handling)