Part 1 - Natural Genetics Ben Paechter with thanks to the EvoNet Training Committee and its “Flying Circus”
Natural Genetics • The information required to build a living organism is coded in the DNA and other genetic material found in the cells of that organism • Within a species, most of the genetic material is the same • Small changes in the genetic material give rise to small changes in the organism • e.g. height, hair colour
DNA and Genes • DNA is a large molecule made up of fragments. There are several fragment types, each one acting like a letter in a long coded message: -A-B-A-D-C-B-B-C-C-A-D-B-C-C-A- • Certain groups of letters are meaningful together - a bit like words. • These groups are called genes • The DNA is made up of genes and rubbish
Example: Human Reproduction • Human DNA is organised into chromosomes • Most human cells contain 23 pairs of chromosomes, which together define the physical attributes of the person
Reproductive Cells • Sperm and egg cells contain 23 individual chromosomes rather than 23 pairs • Reproductive cells are formed by one cell splitting into two • During this process the pairs of chromosomes undergo an operation called crossover
Crossover During crossover the chromosome pairs link up and swap parts of themselves: [Figure: a chromosome pair before and after crossover] After crossover, one chromosome of each pair goes into each new cell
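The swap of parts during crossover can be mimicked on letter strings like the one on the DNA slide. A minimal Python sketch, assuming a single cut point (one-point crossover, a common GA operator; real chromosome pairs can exchange material at several points); `one_point_crossover` is a hypothetical name.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes after a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("need two chromosomes of the same length (at least 2)")
    # Cut strictly inside the chromosome so each child mixes material from both parents.
    cut = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

print(one_point_crossover("AAAAAAAA", "BBBBBBBB", random.Random(0)))
```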
Fertilisation [Figure: a sperm cell from the father and an egg cell from the mother combine to form the cell of a new person]
Mutation • Occasionally some of the genetic material changes very slightly during this process • This means that the child might have genetic information not inherited from either parent • Such a change is most likely to be catastrophic
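Point mutation on such a letter string can be sketched as follows. The per-letter mutation rate and the name `mutate` are illustrative assumptions, not values taken from biology.

```python
import random

def mutate(chromosome, alphabet, rate, rng=random):
    """Return a copy in which each letter is replaced, with probability `rate`,
    by a different letter drawn from `alphabet`."""
    result = []
    for letter in chromosome:
        if rng.random() < rate:
            # Choose a different letter so that a mutation always changes something.
            choices = [c for c in alphabet if c != letter]
            result.append(rng.choice(choices))
        else:
            result.append(letter)
    return "".join(result)

parent = "ABADCBBCCADBCCA"
child = mutate(parent, "ABCD", rate=0.05, rng=random.Random(42))
print(parent)
print(child)
```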
Theory of Evolution • From time to time, reproduction, crossover and mutation produce new genetic material or new combinations of genes • Usually this reduces the organism’s ability to survive and so reproduce • Occasionally the new genetic material increases the organism’s ability to survive and so reproduce • If it allows the organism to reproduce more, this leads to more and more organisms having the “new improved” genetic make-up • “Good” sets of genes get reproduced more • “Bad” sets of genes get reproduced less
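In a genetic algorithm, "good sets of genes get reproduced more" is implemented by a selection step. A minimal sketch, assuming fitness-proportional (roulette-wheel) selection, one standard GA scheme among several, and non-negative fitness values; `roulette_select` is a hypothetical name.

```python
import random

def roulette_select(population, fitness, k, rng=random):
    """Pick k parents with replacement; each individual's chance is proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    if any(w < 0 for w in weights):
        raise ValueError("roulette selection needs non-negative fitness values")
    # random.choices samples with replacement using the given relative weights.
    return rng.choices(population, weights=weights, k=k)

population = ["0000", "0011", "1111"]
parents = roulette_select(population, lambda s: s.count("1"), k=1000, rng=random.Random(0))
print({ind: parents.count(ind) for ind in population})
```

With fitness equal to the number of 1s, "1111" should be picked roughly twice as often as "0011", and "0000" (fitness 0) never.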
Theory of Evolution (2) • The organisms as a whole get better and better at surviving in their environment • Evolutionists claim that all the species of plants and animals have been produced by this slow changing of genetic material - with organisms becoming better and better at surviving in their niche, and new organisms evolving to fill any vacant niche • They agree that evolution requires reproduction, selection and mutation • Some say evolution also requires crossover
Evolution as Search • We can think of evolution as a search through the enormous space of possible genetic make-ups for the one that best allows an organism to reproduce in its changing environment • Since evolution seems pretty good at this job, we can borrow ideas from nature to help us solve problems that have equally large search spaces or similarly changing environments
Dr. Eick’s Transparencies: Genetics and What EC Algorithm Designers Can Learn from It
More Genetics: Diploidy and Dominance • Diploidy: most organisms in biological systems are diploid rather than haploid, i.e. they carry pairs of chromosomes, each chromosome of a pair containing information for the same functions. • The primary mechanism for selecting which genotypic information is expressed in the phenotype is dominance: • AbCDe + aBCde → ABCDe (upper-case alleles are dominant) • Diploidy provides a mechanism for remembering alleles and allele combinations that were useful in the past; dominance provides a mechanism for shielding those remembered alleles from harmful selection in a currently hostile environment (implicitly increasing the genetic richness of the current population by protecting against overselection). • Dominance relationships in biological systems frequently adapt when the need arises. • Hollstien (1971) simulated dominance using a three-letter alphabet instead of a binary one, consisting of dominant 1, recessive 1, and 0, with 1dom > 0 and 1rec < 0.
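Both dominance rules on this slide can be written down directly. A minimal sketch; the integer encoding of Hollstien's three alleles and all function names are our own hypothetical choices. `express_letters` reproduces the AbCDe + aBCde example by treating upper-case letters as dominant.

```python
# Hypothetical integer encoding of the three alleles in Hollstien's scheme.
ZERO = 0
REC_ONE = 1   # recessive 1
DOM_ONE = 2   # dominant 1

def express_locus(allele_x, allele_y):
    """Phenotypic bit for one locus: dominant 1 beats 0, and 0 beats recessive 1."""
    pair = {allele_x, allele_y}
    if DOM_ONE in pair:
        return 1
    if ZERO in pair:
        return 0
    return 1  # both alleles are recessive 1s

def express_triallelic(chrom_x, chrom_y):
    """Phenotype of a diploid individual, locus by locus."""
    return [express_locus(x, y) for x, y in zip(chrom_x, chrom_y)]

def express_letters(chrom_x, chrom_y):
    """Letter version: an upper-case (dominant) allele masks a lower-case (recessive) one."""
    return "".join(x if x.isupper() else y for x, y in zip(chrom_x, chrom_y))

print(express_letters("AbCDe", "aBCde"))
print(express_triallelic([DOM_ONE, REC_ONE, ZERO, REC_ONE], [ZERO, ZERO, REC_ONE, REC_ONE]))
```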
Dominance and Diploidy (Continued) • Other research represents the dominance information separately from the genes and lets it undergo evolution as well, a kind of co-evolution approach. • In the late 80s, Smith and Goldberg explored the use of redundancy for a standard knapsack problem with a dynamically changing weight constraint: • Hollstien’s triallelic scheme showed improvement over a static dominance scheme. • the diploid approach turned out to cope better with oscillations in the weight constraint. • diploidy decreases the probability that desired schemata are lost “forever”. • In summary, there is some evidence that exploiting diploidy can be beneficial for GAs in dynamically changing environments, especially if scenarios encountered in the past tend to recur in the future; on the other hand, diploidy is quite expensive, and relatively little research exploring its use for GAs has been performed in the last 15 years.
What can GA designers learn from plant genetics and horticulture? • polyploidy and dominance • gametogenesis used as the crossover operator • use of selfing (self-fertilization) • unusual ways to prevent self-fertilization • use of intercrossing (creating Cartesian products of good initial solutions) • preference for heterozygous sources and rich gene pools • plant breeders employ complex search strategies to breed the best possible plant (such as recurrent selection, which will be the topic of this talk) • mutation is not very important, because it is hard to control; large population sizes are difficult to handle for pragmatic reasons.
Polyploidy Polyploidy: using two or more complete sets of chromosomes; the phenotype of an organism is determined through dominance of alleles. Advantages: adaptation to changing environments, “memorizing” alleles that worked successfully in the past, a richer gene pool. Previous Research on Polyploidy: there are two major approaches to simulating polyploidy in GAs: • using an extra chromosome to represent dominance information [Brindle, this talk] • extending the alphabet to distinguish between dominant and recessive elements [Hollstien, Smith & Goldberg, Ng & Wong]
Features of our Approach • uses at least 2 sets of chromosomes • uses a dominance vector as a tie breaker • uses a crossover control vector to restrict the possible crossover points • dominance vectors and crossover control vectors take part in the evolution • gametogenesis is used as the crossover operator
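The slide does not spell out how the vectors are applied, so the following is one plausible reading, an assumption rather than Hadad and Eick's published procedure: each locus expresses the majority allele across the chromosome sets, the dominance vector decides ties, and a gamete is formed by crossing two homologs only at cut points the crossover control vector permits. All function names are hypothetical, binary alleles are assumed, and how the vectors themselves evolve is left out.

```python
import random

def express_polyploid(chromosomes, dominance):
    """Per-locus majority vote over all chromosome sets; on a tie, the
    dominance vector supplies the expressed bit."""
    phenotype = []
    for i, d in enumerate(dominance):
        ones = sum(c[i] for c in chromosomes)
        zeros = len(chromosomes) - ones
        if ones > zeros:
            phenotype.append(1)
        elif zeros > ones:
            phenotype.append(0)
        else:
            phenotype.append(d)
    return phenotype

def gamete(chromosomes, control, rng=random):
    """Gametogenesis as crossover: cross two randomly chosen homologs at a cut
    point i > 0 that the control vector allows (control[i] == 1)."""
    x, y = rng.sample(chromosomes, 2)
    allowed = [i for i, bit in enumerate(control) if bit == 1 and i > 0]
    if not allowed:
        return list(rng.choice((x, y)))  # no permitted cut: pass one homolog on intact
    cut = rng.choice(allowed)
    return list(x[:cut]) + list(y[cut:])

rng = random.Random(7)
parent_1 = [[1, 0, 1, 0], [1, 1, 0, 0]]
parent_2 = [[0, 0, 1, 1], [0, 1, 1, 1]]
control = [0, 1, 0, 1]
child = [gamete(parent_1, control, rng), gamete(parent_2, control, rng)]
print(child, express_polyploid(child, dominance=[0, 1, 0, 1]))
```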
3. Experiments • Benchmarks: • Knapsack problem with dynamically changing weight constraints • Schwefel function • Evaluation is performed with respect to the following measure: $M_2 = \frac{1}{G}\sum_{i=1}^{G} (T_i - X_i)^2$, where $T_i$ is the true optimum for generation $i$, $X_i$ is the best solution found in generation $i$, and $G$ is the number of generations.
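A direct transcription of the M2 measure as the mean squared gap over the G generations, assuming X_i denotes the objective value of the best solution in generation i (so that T_i - X_i is a number); the function name `m2` is hypothetical.

```python
def m2(true_optima, best_found):
    """Average over generations of the squared gap between the true optimum T_i
    and the value X_i of the best solution found in generation i."""
    if len(true_optima) != len(best_found) or not true_optima:
        raise ValueError("need one (T_i, X_i) pair per generation")
    g = len(true_optima)
    return sum((t - x) ** 2 for t, x in zip(true_optima, best_found)) / g

# Three generations in an environment whose optimum moves from 10 to 12.
print(m2([10, 10, 12], [8, 10, 9]))
```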
4. Summary • proposed an approach to support polyploidy that uses dominance vectors • demonstrated the benefits of the approach in oscillating environments which cycle among several different states. • crossover control vectors are employed to provide linkage between the dominance vector and the chromosomes themselves. • approach facilitates maintaining diversity in relatively small populations • our experiments at least partially explain why diploidy and polyploidy exist in biological systems.
Literature • Ben S. Hadad and Christoph F. Eick: Using Recurrent Selection to Improve GA-performance, ISMIS, Charlotte, October 1997. • Ben S. Hadad and Christoph F. Eick: Supporting Polyploidy in Genetic Algorithms Using Dominance Vectors, EP’97, Indianapolis, April 1997. • Ben S. Hadad: Extending Genetic Algorithms Using Ideas Borrowed from Plant Genetics and Horticulture, Master’s Thesis, University of Houston, December 1996.
Inversion and Other Reordering Operators • Reordering operators change the position of genes in a chromosome, but do not change the composition of the chromosome: • consequently, reordering operators do not directly affect the fitness. • however, crossover is affected: reordering changes the defining length of a schema, which increases or decreases the probability that instances of that schema survive crossover and recur in the future. • after reordering, genes are no longer lined up correctly between parents, which, in many applications, causes problems for the crossover operator: • necessary genes might be missing, so incomplete gene combinations can occur. • duplicated genes can occur, which is usually not desirable. • The most popular reordering operators are inversion and swapping; for the segment marked in 1 2 3 | 4 5 6 7 | 8, inversion yields 1 2 3 7 6 5 4 8 and swapping the segment's end genes yields 1 2 3 7 5 6 4 8. • Empirical evidence seems to indicate that, at least in some applications, reordering operators are useful “secondary” operators whose use yields slight improvements in overall performance.
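The inversion and swap examples can be reproduced with list slicing. To show why reordering need not change fitness, the sketch also assumes that each gene carries its locus as a tag, so decoding sorts by tag; that encoding and the function names are illustrative assumptions.

```python
def invert(chromosome, i, j):
    """Reverse the genes in positions i..j-1; the multiset of genes is unchanged."""
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]

def swap(chromosome, i, j):
    """Exchange the genes at positions i and j (returns a new list)."""
    genes = list(chromosome)
    genes[i], genes[j] = genes[j], genes[i]
    return genes

def decode(tagged):
    """Genes are (locus, allele) pairs, so decoding sorts by locus and ignores order."""
    return [allele for _, allele in sorted(tagged)]

genes = [1, 2, 3, 4, 5, 6, 7, 8]
print(invert(genes, 3, 7))  # [1, 2, 3, 7, 6, 5, 4, 8]
print(swap(genes, 3, 6))    # [1, 2, 3, 7, 5, 6, 4, 8]

tagged = list(enumerate([1, 0, 1, 1, 0, 0, 1, 1]))
print(decode(invert(tagged, 2, 6)) == decode(tagged))  # True
```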
Niche and Speciation • We can view a niche as an organism’s job or role in an environment, and we can think of a species as a class of organisms with common characteristics. • Niche Methods in Genetic Search: • crowding (De Jong (1975)) and sharing functions (Goldberg (1987)). • external schemata (Perry (1984)), which are similarity templates defining species membership that have to be provided by the GA developer. • Mating restrictions in genetic search: • line breeding (breed the champion repeatedly with others) • Hollstien’s inbreeding with intermittent crossbreeding (close individuals continue to breed as long as their family’s average fitness keeps improving; otherwise, crossbreeding between different families is used). • Booker introduced mating templates, mate selection mechanisms that become part of the individual (and themselves undergo evolution), and proposed different mating rules: • bidirectional match • unidirectional match • best partial matches • disallow breeding of similar individuals (e.g. incest)
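A minimal sketch of fitness sharing in its commonly used form: each raw fitness is divided by a niche count, the sum of sh(d) over the whole population, with sh(d) = 1 - (d/sigma_share)^alpha for d < sigma_share and 0 otherwise. The Hamming distance, the default alpha = 1 and the function names are assumptions for illustration.

```python
def sharing(distance, sigma_share, alpha=1.0):
    """Sharing function: 1 at distance 0, falling to 0 at sigma_share and beyond."""
    if distance < sigma_share:
        return 1.0 - (distance / sigma_share) ** alpha
    return 0.0

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, fitness, sigma_share, alpha=1.0, distance=hamming):
    """Divide each raw fitness by its niche count, so crowded niches are penalised.
    The niche count includes the individual itself, so it is always at least 1."""
    result = []
    for ind in population:
        niche_count = sum(sharing(distance(ind, other), sigma_share, alpha)
                          for other in population)
        result.append(fitness(ind) / niche_count)
    return result

# Three copies crowd one niche; the lone "1111" keeps its full fitness.
print(shared_fitness(["0000", "0000", "0000", "1111"], lambda s: 1.0, sigma_share=2))
```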
Example of a Booker Mating Template • Assume we have chromosomes over alphabet A with chromosome length n, and let A’ = A ∪ {#}. Extend each chromosome to three times its length: ind = a1...an b1...bn c1...cn with ai ∈ A and bi, ci ∈ A’ (i = 1, ..., n), with the meaning: ind is allowed to mate with ind’ if the functional part a’1...a’n of ind’ is an instance of Schema(b1...bn) or of Schema(c1...cn). • Example: Let n = 4 and A be the binary alphabet: ind1 = 0010 0000 1111, ind2 = 0000 1### 0111, ind3 = 0111 001# 1111 • Bidirectional match requires that “a must want b” and “b must want a”, whereas in unidirectional match it is sufficient that one partner wants the other. • Many other matching schemes are possible, e.g. more complicated ones that operate on scores and thresholds.
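The Booker example can be checked mechanically. A minimal sketch, assuming the three parts are separated by spaces as in the example and that '#' matches any symbol; the function names are hypothetical.

```python
def matches(value, schema):
    """True if `value` is an instance of `schema` ('#' matches any symbol)."""
    return len(value) == len(schema) and all(s == "#" or s == v for v, s in zip(value, schema))

def wants(ind, other):
    """ind accepts other if other's functional part fits either of ind's two templates."""
    _, b, c = ind.split()
    a_other, _, _ = other.split()
    return matches(a_other, b) or matches(a_other, c)

def bidirectional(x, y):
    return wants(x, y) and wants(y, x)

def unidirectional(x, y):
    return wants(x, y) or wants(y, x)

ind1 = "0010 0000 1111"
ind2 = "0000 1### 0111"
ind3 = "0111 001# 1111"
for x, y in [(ind1, ind2), (ind2, ind3), (ind3, ind1)]:
    print(x, "|", y, "| bidirectional:", bidirectional(x, y), "| unidirectional:", unidirectional(x, y))
```

With these three individuals every pair is only a unidirectional match: ind1 wants ind2, ind2 wants ind3 and ind3 wants ind1, but no two of them want each other.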
Artificial Mating Tags • The problem with Booker’s approach is that the mating templates have the same length as the chromosomes themselves, producing significant overhead. To reduce this overhead, Holland proposed using three-part strings consisting of: • a short mating template (used to test the suitability of other mates) • a short mating tag (used by others for matching; characterizes the string) • the functional substring Example: #10#:1010:111111000011 #0##:1100:011111110001 • mating tags affect compatibility with other strings, but do not affect fitness. • usually, the whole three-part string is evolved. • Holland’s scheme of artificial mating tags can also be used to define mating niches abstractly, similar to Perry’s external schema approach, by freezing particular positions in templates and tags. For example, mating can easily be restricted to particular subsets of the population. Mating tags can also be used to simulate distributed GAs.
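A sketch of tag matching for the example strings, assuming the ':'-separated template:tag:functional layout shown and assuming that each string's template must accept the other's tag (a one-sided check is an equally plausible variant). The function names are hypothetical.

```python
def matches(tag, template):
    """True if `tag` fits `template`, where '#' matches any symbol."""
    return len(tag) == len(template) and all(t == "#" or t == g for g, t in zip(tag, template))

def parse(s):
    """Split a three-part string into (template, tag, functional substring)."""
    template, tag, functional = s.split(":")
    return template, tag, functional

def compatible(x, y):
    """x and y may mate if each one's template accepts the other's tag.
    The functional substrings play no part in the check."""
    tx, gx, _ = parse(x)
    ty, gy, _ = parse(y)
    return matches(gy, tx) and matches(gx, ty)

print(compatible("#10#:1010:111111000011", "#0##:1100:011111110001"))
```

For the two example strings, #10# accepts the tag 1100 and #0## accepts 1010, so they are compatible.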