CS 478 - Machine Learning Genetic Algorithms (II)
Schema (I)
• A schema H is a string over the extended alphabet {0, 1, *}, where * stands for "don't care" (i.e., a wild card)
• A schema represents, or matches, a set of strings: e.g., 1*0* matches 1000, 1001, 1100, and 1101
• There are 3^L schemata over strings of length L
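To make the matching relation concrete, here is a minimal Python sketch; the helper names matches and expand are illustrative, not from any library:

    from itertools import product

    def matches(schema: str, string: str) -> bool:
        # A string is matched if it agrees with the schema at every non-* position
        return all(s == '*' or s == c for s, c in zip(schema, string))

    def expand(schema: str):
        # Enumerate every string the schema represents
        options = ['01' if s == '*' else s for s in schema]
        return [''.join(bits) for bits in product(*options)]

    print(matches('1*0*', '1100'))   # True
    print(expand('1*0*'))            # ['1000', '1001', '1100', '1101']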
Schema (II)
• Since each position in a string may take on either its actual value or a *, each binary string in a GA population contains, or is a representative of, 2^L schemata
• Hence, a population of n members contains between 2^L and min(n·2^L, 3^L) schemata, depending on population diversity (the upper bound is not strictly n·2^L because there are at most 3^L schemata in total)
• Geometrically, strings of length L can be viewed as points in a discrete L-dimensional space (i.e., the vertices of a hypercube); schemata can then be viewed as hyperplanes (i.e., the hyper-edges and hyper-faces of the hypercube)
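A quick sketch of the counting argument above, enumerating the 2^L schemata that a single string contains (the helper name schemata_of is illustrative):

    from itertools import product

    def schemata_of(string: str):
        # Each position keeps its actual value or becomes '*'
        return {''.join(s) for s in product(*((c, '*') for c in string))}

    print(len(schemata_of('101')))    # 2**3 = 8
    print(sorted(schemata_of('101')))
    # ['***', '**1', '*0*', '*01', '1**', '1*1', '10*', '101']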
Schema Order
• The order of a schema H is the number of non-* symbols in H
• It is denoted by o(H): e.g., o(0*1**) = 2
• A schema of order o over strings of length L represents 2^(L−o) strings
Schema Defining Length
• The defining length of a schema H is the distance between the first and last non-* symbols in H
• It is denoted by δ(H): e.g., δ(0*1**) = 2 and δ(1***0) = 4
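Both quantities are straightforward to compute; a minimal sketch, with the illustrative names order and defining_length:

    def order(H: str) -> int:
        # o(H): number of fixed (non-*) positions
        return sum(c != '*' for c in H)

    def defining_length(H: str) -> int:
        # delta(H): distance between the outermost fixed positions
        # (0 when the schema has at most one fixed position)
        fixed = [i for i, c in enumerate(H) if c != '*']
        return fixed[-1] - fixed[0] if fixed else 0

    print(order('0*1**'), defining_length('0*1**'))   # 2 2
    print(order('1***0'), defining_length('1***0'))   # 2 4
    print(order('**11*'), defining_length('**11*'))   # 2 1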
Intuitive Approach
• Schemata encode useful/promising characteristics found in the population
• What do selection, crossover, and mutation do to schemata?
• Since more highly fit strings have a higher probability of selection, on average an ever-increasing number of samples is given to the observed best schemata
• Crossover cuts strings at arbitrary sites and swaps the resulting pieces. It leaves a schema unscathed if the cut falls outside the schema, but may disrupt it otherwise. For example, 1***0 is more likely to be disrupted than **11* (the sketch below quantifies this). In general, schemata of short defining length are less likely to be disrupted by crossover
• Mutation at normal, low rates does not disrupt a particular schema very frequently
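The crossover and mutation claims can be quantified with the standard survival bounds; this sketch reuses the order and defining_length helpers defined above:

    def crossover_disruption_bound(H: str, p_c: float) -> float:
        # One-point crossover disrupts H only if the cut (one of L-1 sites)
        # falls inside the defining length: bound is p_c * delta(H) / (L - 1)
        return p_c * defining_length(H) / (len(H) - 1)

    def mutation_survival(H: str, p_m: float) -> float:
        # H survives bitwise mutation iff all o(H) fixed bits are untouched
        return (1 - p_m) ** order(H)

    print(crossover_disruption_bound('1***0', 0.7))   # 0.7
    print(crossover_disruption_bound('**11*', 0.7))   # 0.175
    print(mutation_survival('1***0', 0.01))           # ~0.98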
Intuitive Conclusion
• Highly fit, short-defining-length schemata (called building blocks) are propagated from generation to generation by giving exponentially increasing samples to the observed best
• All of this takes place in parallel, with no memory other than the population itself. This parallelism has been termed implicit, as n strings of length L actually allow min(n·2^L, 3^L) schemata to be processed
Formal Account
• See the PDF document containing a formal account of the effect of selection, crossover, and mutation, culminating in the Schema Theorem
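For reference, the theorem's usual statement (as in Holland's and Goldberg's treatments) bounds the expected number m(H, t) of instances of schema H at generation t + 1:

    E[m(H, t+1)] ≥ m(H, t) · (f(H)/f̄) · [1 − p_c · δ(H)/(L−1) − o(H) · p_m]

where f(H) is the average fitness of the strings matching H in the current population, f̄ is the population's average fitness, and p_c and p_m are the crossover and mutation probabilities. Short, low-order schemata of above-average fitness thus receive exponentially increasing numbers of trials, in agreement with the intuitive account above.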
Prototypical Steady-State GA
• P ← p randomly generated hypotheses
• For each h in P, compute fitness(h)
• While max_h fitness(h) < threshold:
  • Ps ← r·p individuals selected from P (e.g., FPS, RS, tournament)
  • Apply crossover to random pairs in Ps and add all offspring to Po
  • Select m% of the individuals in Po with uniform probability and apply mutation (i.e., flip one of their bits at random)
  • Pw ← the r·p weakest individuals in P
  • P ← P − Pw + Po
  • For each h in P, compute fitness(h)
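A runnable Python sketch of this loop, assuming a bit-string encoding and a user-supplied fitness function; tournament selection and one-point crossover stand in for the selection and crossover operators, and all names are illustrative:

    import random

    def tournament(P, fitness, k=2):
        # Return the fitter of k randomly sampled individuals
        return max(random.sample(P, k), key=fitness)

    def one_point_crossover(a, b):
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def point_mutation(h):
        # Flip one bit chosen uniformly at random
        i = random.randrange(len(h))
        return h[:i] + ('1' if h[i] == '0' else '0') + h[i + 1:]

    def steady_state_ga(fitness, L=20, p=50, r=0.4, m=0.1, threshold=1.0):
        P = [''.join(random.choice('01') for _ in range(L)) for _ in range(p)]
        while max(fitness(h) for h in P) < threshold:
            Ps = [tournament(P, fitness) for _ in range(int(r * p))]
            Po = []
            for a, b in zip(Ps[::2], Ps[1::2]):
                Po.extend(one_point_crossover(a, b))
            Po = [point_mutation(h) if random.random() < m else h for h in Po]
            # Replace the len(Po) weakest individuals with the offspring
            P = sorted(P, key=fitness)[len(Po):] + Po
        return max(P, key=fitness)

    # Example: maximize the fraction of 1s ("one-max")
    print(steady_state_ga(lambda h: h.count('1') / len(h)))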
Influence of Learning
• Baldwinian evolution: learned behaviour changes only the fitness landscape
• Lamarckian evolution: learned behaviour also changes the parents' genotypes
• Example:
… calculating fitness involves two steps, namely k-means clustering and NAP classification. The effect of k-means clustering is to refine the starting positions of the centroids to more "representative" final positions. At the individual's level, this may be viewed as a form of learning, since NAP classification based on the final centroid positions is likely to yield better results than NAP classification based on their starting positions. Hence, through k-means clustering, an individual improves its performance. As fitness is computed after learning, GA-RBF makes implicit use of the Baldwin effect. (Here, we view the result of k-means clustering, namely the improved positions of the centroids, as the learned "traits".) A straightforward way of implementing Lamarckian evolution consists of coding the new centroid positions back onto the chromosomes of the individuals of the current generation, prior to genetic recombination.
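A schematic contrast of the two evaluation strategies, assuming a learn step (such as the k-means refinement above) that locally improves an individual's decoded parameters; all names here are illustrative:

    def baldwinian_fitness(genome, decode, learn, evaluate):
        # Learning reshapes the fitness landscape only: the genome itself
        # is left untouched, so learned traits are not inherited
        params = learn(decode(genome))
        return evaluate(params)

    def lamarckian_fitness(genome, decode, learn, evaluate, encode):
        # Learned traits are written back onto the chromosome before
        # recombination, so offspring inherit the refined parameters
        params = learn(decode(genome))
        return evaluate(params), encode(params)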
Conclusion
• Genetic algorithms are used primarily for:
  • Optimization problems (e.g., TSP)
  • Hybrid systems (e.g., NN evolution)
  • Artificial life
  • Learning in classifier systems