BCTCS 2005. Geometric Interpretation of Crossover. Alberto Moraglio, amoragn@essex.ac.uk
Contents
I – Quick Preliminaries
II – Geometric Interpretation of Crossover
Extremely quick overview of its implications:
III – Unification of Major Representations
IV – Crossover Principled Design
V – Is Biological Recombination Geometric?
VI – Unity of Evolutionary Search
Evolutionary Algorithms…
• Are function optimizers
• Mimic biological evolution
• Are robust, hence preferred for real-world problems
• Have little theory to explain how and why they work
• Come in various flavours
Evolutionary Algorithm Template Problem & representation independent
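The template figure is not reproduced here, so the following is only a minimal sketch of what a problem- and representation-independent template could look like. The function name `evolve`, its parameters, binary tournament selection and single-individual elitism are all assumptions made for illustration; the slide does not specify them. Everything representation-specific (initialisation, crossover, mutation) and problem-specific (fitness) is passed in from outside.

```python
import random

def evolve(init, fitness, crossover, mutate, pop_size=20, generations=50, seed=0):
    """Generic generational EA: the loop never inspects a solution's internals."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]

    def select(pop):
        # Binary tournament selection (an assumed choice, not from the slides).
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            child = crossover(select(population), select(population), rng)
            offspring.append(mutate(child, rng))
        # Elitism (also assumed): the best parent replaces the last offspring.
        elite = max(population, key=fitness)
        population = offspring[:-1] + [elite]
    return max(population, key=fitness)

# Plugging in one representation: 20-bit strings, maximising the number of ones.
def init(rng):
    return [rng.randint(0, 1) for _ in range(20)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def bit_flip(x, rng):
    i = rng.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]

best = evolve(init, sum, one_point, bit_flip)
```

Swapping in permutations or real vectors would only mean passing different `init`, `crossover` and `mutate` functions; the loop itself stays the same.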
Standard representations & EA flavours/dialects
• Binary strings (genetic algorithms, the classic)
• Real-coded vectors (evolution strategies, continuous optimization)
• Permutations (order-based GAs, combinatorial optimization)
• Parse trees (genetic programming, evolution of computer programs)
Algorithmically irrelevant differences: name / authorship / solution interpretation / domain of application
Algorithmically relevant differences: solution representation / genetic operators
What is crossover?
[Figure: crossover on the binary strings 100000011101000, 100110011101000, 100111100011100, 100001100011100]
Is there any common aspect? Is it possible to give a representation-independent definition of crossover and mutation?
Mutation & Crossover for binary strings
• Mutation = bit flip at a random position: 101001 → 101101
• Crossover = select a crossover point at random and swap tails: 1010|01 and 1110|00 → 101000 and 111001
• Common schema of the parents: 1*10|0* → 1*100*
• All offspring match the parent schema
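The two operators and the schema property above can be sketched in a few lines. The helper names (`bit_flip`, `one_point_crossover`, `common_schema`, `matches`) are illustrative only; the parents and cut point are the ones shown on the slide.

```python
import random

def bit_flip(parent, rng):
    """Mutation: flip the bit at one random position."""
    i = rng.randrange(len(parent))
    flipped = "1" if parent[i] == "0" else "0"
    return parent[:i] + flipped + parent[i + 1:]

def one_point_crossover(p1, p2, cut):
    """Crossover: swap the tails of the parents after position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def common_schema(p1, p2):
    """Keep the positions where the parents agree, '*' elsewhere."""
    return "".join(a if a == b else "*" for a, b in zip(p1, p2))

def matches(schema, s):
    return all(c == "*" or c == b for c, b in zip(schema, s))

c1, c2 = one_point_crossover("101001", "111000", 4)
schema = common_schema("101001", "111000")
mutant = bit_flip("101001", random.Random(1))
```

Whatever cut point is chosen, each offspring position is copied from one of the two parents, so positions where the parents agree are always preserved; that is why every offspring matches the schema.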
Genetic operators & Neighbourhood structure
• Forget the representation and consider the neighbourhood structure (= search space structure)
• Mutation: offspring are “close to” their parent, in its direct neighbourhood
Direct Neighbour Mutation
Representation: binary string
Move: bit flip
Neighbourhood: Hamming
Representation + Move = Neighbourhood
[Figure: the Hamming cube on the 3-bit strings 000, 001, 010, 011, 100, 101, 110, 111]
Mutation: offspring in the direct neighbourhood
What is crossover?
Neighbourhood and Crossover
Crossover idea: combine the parents' genotypes to get children genotypes “somewhere in between” them
Topologically speaking, “somewhere in between” = somewhere on a shortest path
Why on a shortest path?
Shortest Path Crossover
[Figure: strings at distance D0, D1 and D2 from Parent1 (D0: P1, D2: P2): 011001, 010001, 011101, 011011, 010101, 011111, 010011, 010111]
Parent1: 011101
Parent2: 010111
Children: 01*1*1
Children are on shortest paths
More than one shortest path in general
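As a quick check of the example above, the sketch below enumerates every 6-bit string that lies on some shortest path between the two parents under Hamming distance. A string z is on a shortest path exactly when d(P1, z) + d(z, P2) = d(P1, P2). The function names are illustrative.

```python
from itertools import product

def hamming(x, y):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))

def on_shortest_path(x, y):
    """All binary strings z with d(x, z) + d(z, y) == d(x, y)."""
    n = len(x)
    return sorted(
        z for z in map("".join, product("01", repeat=n))
        if hamming(x, z) + hamming(z, y) == hamming(x, y)
    )

children = on_shortest_path("011101", "010111")
# children == ['010101', '010111', '011101', '011111'], i.e. the schema 01*1*1
```

The two intermediate strings 010101 and 011111 correspond to the two different shortest paths between the parents.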
Interpretation & Generalization
• Traditional mutation & crossover have a natural interpretation in the neighbourhood structure in terms of closeness and betweenness
• Given any representation plus a notion of neighbourhood (move), mutation & crossover operators are well-defined
From graphs to geometry
• Forget the neighbourhood structure and consider the metric space (= space with a notion of distance)
• The distance in the neighbourhood is the length of the shortest path connecting two solutions
• Mutation → direct neighbourhood → ball
• Crossover → all shortest paths → line segment
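The step from neighbourhood graph to metric space can be illustrated with a breadth-first search: the length of the shortest path in the bit-flip neighbourhood graph coincides with Hamming distance. This is a small sketch with illustrative function names; any other move could be supplied through the `neighbours` argument.

```python
from collections import deque
from itertools import product

def bit_flip_neighbours(s):
    """Direct neighbourhood of a binary string under the bit-flip move."""
    return [s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:] for i in range(len(s))]

def graph_distance(start, goal, neighbours):
    """Length of a shortest path from start to goal, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in neighbours(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("goal is not reachable from start")

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

strings = ["".join(bits) for bits in product("01", repeat=4)]
agree = all(
    graph_distance(x, y, bit_flip_neighbours) == hamming(x, y)
    for x in strings for y in strings
)
```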
Balls & Segments
In a metric space (S, d) the closed ball is the set of the form $B(x; r) = \{ y \in S \mid d(x, y) \le r \}$, where x belongs to S and r is a positive real number called the radius of the ball.
In a metric space (S, d) the line segment or closed interval is the set of the form $[x; y] = \{ z \in S \mid d(x, z) + d(z, y) = d(x, y) \}$, where x and y belong to S, are called the extremes of the segment, and identify the segment.
Squared balls & Chunky segments
[Figure: balls and line segments in three spaces]
Balls: B((3, 3); 1) in Euclidean space; B((3, 3); 1) in Manhattan space; B(000; 1) in Hamming space
Line segments: [(1, 1); (3, 2)] in Euclidean space (1 geodesic); [(1, 1); (3, 2)] = [(1, 2); (3, 1)] in Manhattan space (infinitely many geodesics); [000; 011] = [001; 010] in Hamming space (2 geodesics)
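The examples on this slide can be checked with a direct translation of the ball and segment definitions over a finite set of points. This is a sketch under one explicit assumption: the Manhattan example is evaluated on a small integer grid rather than on the continuous plane of the slide, so the segment is a finite rectangle of grid points instead of a filled rectangle.

```python
from itertools import product

def ball(space, d, x, r):
    """Closed ball: all points within distance r of x."""
    return {y for y in space if d(x, y) <= r}

def segment(space, d, x, y):
    """Metric segment: all points z with d(x, z) + d(z, y) == d(x, y)."""
    return {z for z in space if d(x, z) + d(z, y) == d(x, y)}

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

cube = {"".join(bits) for bits in product("01", repeat=3)}
grid = set(product(range(5), repeat=2))  # assumed discrete stand-in for the plane

hamming_ball = ball(cube, hamming, "000", 1)
same_hamming_segment = (
    segment(cube, hamming, "000", "011") == segment(cube, hamming, "001", "010")
)
same_manhattan_segment = (
    segment(grid, manhattan, (1, 1), (3, 2)) == segment(grid, manhattan, (1, 2), (3, 1))
)
```

Both comparisons come out equal: in these spaces a segment is not identified by a unique pair of extremes, which is what makes it "chunky".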
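Uniform Mutation & Uniform Crossover
Uniform topological crossover: offspring are drawn uniformly at random from the segment between the two parents.
Uniform topological ε-mutation: offspring are drawn uniformly at random from the ball of radius ε around the parent.
Genetic operators have a geometric nature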
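The formulas for these two operators are not reproduced in this transcript, so the sketch below rests on an assumption: that "uniform" means every point of the segment [x; y] (for crossover) or of the ball B(x; ε) (for mutation) is equally likely to be the offspring. The function names and the 4-bit Hamming space are illustrative choices.

```python
import random
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def uniform_crossover(space, d, x, y, rng):
    """Pick an offspring uniformly at random from the segment [x; y]."""
    seg = sorted(z for z in space if d(x, z) + d(z, y) == d(x, y))
    return rng.choice(seg)

def uniform_mutation(space, d, x, eps, rng):
    """Pick an offspring uniformly at random from the ball B(x; eps)."""
    b = sorted(z for z in space if d(x, z) <= eps)
    return rng.choice(b)

space = ["".join(bits) for bits in product("01", repeat=4)]
rng = random.Random(0)
kids = [uniform_crossover(space, hamming, "0011", "0101", rng) for _ in range(200)]
mutants = [uniform_mutation(space, hamming, "0011", 1, rng) for _ in range(200)]
```

Only the distance function `d` ties these operators to a representation; replacing `hamming` and `space` is enough to obtain the corresponding operators for another finite search space.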
Representation-independent and rigorous definition of crossover and mutation in the neighbourhood seen as a geometric space…
This is cheating! I have generalized from a single example of solution representation!
Minkowski spaces – real vectors
[Figure: balls B((2, 2); 1) in Euclidean, Manhattan and Chessboard space; segments [(1, 1); (3, 2)] in Euclidean space (1 geodesic), [(1, 1); (3, 2)] = [(1, 2); (3, 1)] in Manhattan space (infinitely many geodesics), [(1, 1); (3, 2)] in Chessboard space (infinitely many geodesics)]
Representation: real vectors
Neighbourhoods: continuous (3 types)
Distances: Minkowski distances
Implementation: algebraic manipulation of real vectors (equation of the line passing through two points)
Pre-existing recombination operators:
- both blend crossovers and discrete crossovers fit the geometric definition
- extended blend crossovers do not fit
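The claims about pre-existing operators can be spot-checked numerically. In this sketch, "blend" is taken to mean a point x + a(y − x) with a in [0, 1], "extended" to mean the same formula with a outside [0, 1], and "discrete" to mean copying each coordinate from one parent or the other; the function names and the parameter `a` are assumptions used for illustration.

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def on_segment(d, x, y, z, tol=1e-9):
    """True if z lies on the metric segment [x; y] under distance d."""
    return abs(d(x, z) + d(z, y) - d(x, y)) <= tol

def blend(x, y, a):
    """Point on the line through x and y; a in [0, 1] stays between the parents."""
    return tuple(xi + a * (yi - xi) for xi, yi in zip(x, y))

def discrete(x, y, mask):
    """Each coordinate is copied from x (mask True) or from y (mask False)."""
    return tuple(xi if m else yi for xi, yi, m in zip(x, y, mask))

x, y = (1.0, 1.0), (3.0, 2.0)
blend_child = blend(x, y, 0.25)                  # between the parents
extended_child = blend(x, y, 1.5)                # beyond parent y
discrete_child = discrete(x, y, (True, False))   # (1.0, 2.0)
```

With these parents, the blend child lies on the Euclidean segment, the discrete child lies on the Manhattan segment but not on the Euclidean one, and the extended child lies on neither. Which distance an operator is geometric under therefore depends on the operator.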
Hamming spaces – binary strings
[Figure: balls B(00; 1) in Hamming space H(2,3) and B(000; 1) in Hamming space H(3,2); segments [00; 11] = [01; 10] in H(2,3) (2 geodesics) and [000; 011] = [001; 010] in H(3,2) (2 geodesics)]
Representation: binary/multary strings
Neighbourhoods: bit-flip/site substitution
Distances: Hamming distances
Implementation: symbolic manipulation of multary strings (mask-based crossovers)
Pre-existing recombination operators:
- all binary crossovers fit the geometric definition
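A mask-based crossover copies each site from one parent or the other, so its offspring always land in the Hamming segment between the parents, for any alphabet. The sketch below checks this on random strings over the alphabet {0, 1, 2}; the function names and the random test harness are illustrative.

```python
import random
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def mask_crossover(p1, p2, mask):
    """Take site i from p1 where mask[i] is True, otherwise from p2."""
    return "".join(a if m else b for a, b, m in zip(p1, p2, mask))

def in_segment(x, y, z):
    return hamming(x, z) + hamming(z, y) == hamming(x, y)

# Every mask applied to the parents 00 and 11 from the H(2,3) example.
offspring = {mask_crossover("00", "11", m) for m in product([True, False], repeat=2)}

# Random multary parents and random masks: offspring never leave the segment.
rng = random.Random(0)
always_geometric = True
for _ in range(500):
    p1 = "".join(rng.choice("012") for _ in range(8))
    p2 = "".join(rng.choice("012") for _ in range(8))
    mask = [rng.random() < 0.5 for _ in range(8)]
    always_geometric &= in_segment(p1, p2, mask_crossover(p1, p2, mask))
```

One-point, two-point and uniform crossover are all special cases of this operator with different ways of choosing the mask.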
Cayley spaces – permutations
[Figure: balls B(abc; 1) in swap space & reversal space, in adjacent swap space, and in insertion space; segments [abc; bca] in swap space & reversal space (3 geodesics), in adjacent swap space (1 geodesic), and in insertion space (1 geodesic)]
Representation: permutations
Neighbourhoods: adjacent swap, swap, reversal, insertion
Distances: corresponding distances
Implementation: “minimal permutation sorting by X move” algorithms:
- adjacent swap = bubble sort
- swap = selection sort
- insertion = insertion sort
- reversal = approximated minimal permutation sorting by reversals (NP-hard)
Pre-existing recombination operators: various pre-existing crossover operators are sorting algorithms in disguise (because sorting permutations is easier than sorting vectors of other items)
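The link between sorting and geometric crossover can be made concrete for the adjacent-swap move. Bubble-sorting one parent into the order of the other visits a sequence of permutations, each one adjacent swap closer to the target, so every permutation visited lies on a shortest path between the parents. The sketch below uses the slide's example [abc; bca]; the function names are illustrative, and the adjacent-swap distance is computed as the number of item pairs that the two permutations order differently.

```python
from itertools import combinations

def adjacent_swap_distance(p, q):
    """Number of item pairs that appear in opposite relative order in p and q."""
    pos_p = {v: i for i, v in enumerate(p)}
    pos_q = {v: i for i, v in enumerate(q)}
    return sum(
        (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
        for a, b in combinations(p, 2)
    )

def bubble_sort_path(p, q):
    """Permutations visited while bubble-sorting p into the order given by q."""
    rank = {v: i for i, v in enumerate(q)}
    current = list(p)
    path = [tuple(current)]
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(current) - 1):
            if rank[current[i]] > rank[current[i + 1]]:
                current[i], current[i + 1] = current[i + 1], current[i]
                path.append(tuple(current))
                swapped = True
    return path

path = bubble_sort_path("abc", "bca")
# path == [('a', 'b', 'c'), ('b', 'a', 'c'), ('b', 'c', 'a')]
```

A crossover built this way would return some permutation from `path`, for example one chosen at random; how that choice is made is left open here.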
Syntactic tree spaces
[Figure: Parent 1 and Parent 2 (expression trees over *, +, sin, x, y); alignment; crossover point; swap; Offspring 1 and Offspring 2]
Representation: syntactic tree (Lisp expression)
Neighbourhood: weighted sub-tree neighbourhood
Distance: structural distance
Implementation:
- sub-tree swap crossover
- common-region mask-based crossover
Pre-existing recombination operators:
- traditional crossover (non-geometric)
- homologous crossover
- the geometric framework can help clarify the landscape and distance associated with homologous crossover, and identify a distance whose geometric crossover is approximated by traditional crossover
Significance of Unification
• Most of the pre-existing crossover operators for the major representations fit the geometric definition
• Established pre-existing operators have emerged from experimental work done by generations of practitioners over decades
• Geometric crossover compresses an empirical phenomenon into a simple formula
Crossover Principled Design
• Domain-specific solution representation is effective
• Problem: for non-standard representations it is not clear what crossover should look like
• But: given a combinatorial problem, you may already know a good neighbourhood structure
• Geometric interpretation of crossover: give me your neighbourhood definition and I will give you a crossover definition
Crossover Design Example
[Figure: parent + parent = ?]
Non-labelled graph neighbourhood
Move: insert/remove an edge
Fixed number of nodes
[Figure: parent + parent, with their offspring]
Levenshtein spaces – sequences
Representation: multary sequences (DNA/amino acids)
Neighbourhood: insertion + deletion + substitution (compound edit move)
Distance: Levenshtein distance
Implementation: inexact sequence alignment (dynamic programming) and site exchange (crossover mask)
Pre-existing recombination operators:
- none
- it could be a good crossover for linear GP
- it could be a better model of biological crossover for studying molecular evolution, because it takes into account the inexact alignment due to molecular annealing of DNA strands, which produces evolution of size variation
Example:
Parent1 = AGCACACA
Parent2 = ACACACTA
Best inexact alignment (with gaps):
AGCA|CAC-A
A-CA|CACTA
Child1 = AGCACACTA
Child2 = ACACACA
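The worked example can be replayed in code. To keep the sketch short, the gapped alignment is taken directly from the slide rather than computed, which is an assumption of this example; only the tail exchange and the distance check are implemented. The function names are illustrative.

```python
def levenshtein(s, t):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]

def aligned_crossover(row1, row2, cut):
    """Swap the tails of two gap-aligned rows at `cut`, then drop the gaps."""
    assert len(row1) == len(row2)
    child1 = (row1[:cut] + row2[cut:]).replace("-", "")
    child2 = (row2[:cut] + row1[cut:]).replace("-", "")
    return child1, child2

p1, p2 = "AGCACACA", "ACACACTA"
row1, row2 = "AGCACAC-A", "A-CACACTA"   # alignment as given on the slide
child1, child2 = aligned_crossover(row1, row2, 4)

def between(z):
    """True if z lies on the Levenshtein segment between the two parents."""
    return levenshtein(p1, z) + levenshtein(z, p2) == levenshtein(p1, p2)
```

Both children reproduce the slide's result and satisfy the segment condition, even though one is longer and one is shorter than the parents.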
A simple model of (homologous) biological recombination fits the geometric definition under a DNA distance used in bioinformatics
Abstract convex evolutionary search
Main result: an evolutionary algorithm using geometric crossover, with any probability distribution, any kind of representation, any problem, and any selection and replacement mechanism, does the same search: convex search
Proof: based on abstract convexity (axiomatic geodesic convexity) and an axiomatization of the search process (abstract search process)
Future work
THEORY: generalizing and accommodating pre-existing theories into the geometric framework (schema theorem, fitness landscapes, representation theories…)
PRACTICE: testing crossover principled design on important problems with non-standard representations (problem domain representations)