Genetic Algorithms CIS 488/588 Bruce R. Maxim UM-Dearborn
Genetic Algorithms • What are they? • Evolutionary algorithms that make use of operations like mutation, recombination, and selection • Uses? • Difficult search problems • Optimization problems • Machine learning • Adaptive rule-bases
Theory of Evolution • Every organism has unique attributes that can be transmitted to its offspring • Offspring are unique and have attributes from each parent • Selective breeding can be used to manage changes from one generation to the next • Nature applies certain pressures that cause individuals to evolve over time
Evolutionary Pressures • Environment • Creatures must work to survive by finding resources like food and water • Competition • Creatures within the same species compete with each other on similar tasks (e.g. finding a mate) • Rivalry • Different species affect each other by direct confrontation (e.g. hunting) or indirectly by fighting for the same resources
Natural Selection • Creatures that are not good at completing tasks like hunting or mating have fewer chances of having offspring • Creatures that are successful in completing basic tasks are more likely to transmit their attributes to the next generation since there will be more creatures born that can survive and pass on these attributes
Genetics • Genome (class) • Sequence of genes describing the overall structure of the genetic material for a particular species • Genomics • Study of the meaning of the genes for a particular species • Alleles • Values that can be assigned to a given gene • Genotype (instance) • Sequence of alleles
Physical Properties • Phenetics • Study of physical properties and morphology of creatures independent of genetic information • Phenome • General structure of a creature's body and attributes • Phenotype • Particular instance of a phenome realized as a unique creature • Product of genotype and environmental forces
Conversions • In the real world, mapping between genotypes and phenotypes is hard • In AI work it can be done by defining a convenient function or even designing encodings by hand • It is often easier to adapt genetic operators to work with the evolutionary data structure used to represent the phenotype than to encode and decode phenotypes
Genetic Algorithmic Process • Potential solutions for problem domains are encoded using a machine representation (e.g. bit strings) that supports variation and selection operations • Mating and mutation operations produce a new generation of solutions from parent encodings • Fitness function judges the individuals that are “best” suited (e.g. most appropriate problem solution) for “survival”
Initialization • Initial population must be a representative sample of the search space • Random initialization can be a good idea (if the sample is large enough) • Random number generator must not be biased (e.g. use the Mersenne Twister) • Can reuse or seed population with existing genotypes based on algorithms, expert opinion, or previous evolutionary cycles
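Random initialization could be sketched as follows, assuming a bit-string genotype. The names `Genotype` and `randomPopulation` are illustrative and not from the slides; `std::mt19937` is the standard library's Mersenne Twister.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical genotype: a fixed-length string of bits.
using Genotype = std::vector<bool>;

// Build `size` random genotypes of `length` bits each.
std::vector<Genotype> randomPopulation(std::size_t size, std::size_t length,
                                       std::mt19937& rng) {
    assert(length > 0 && "genotypes need at least one gene");
    std::bernoulli_distribution coin(0.5);  // each bit is 0 or 1 with equal chance
    std::vector<Genotype> population(size, Genotype(length));
    for (auto& genotype : population)
        for (std::size_t i = 0; i < length; ++i)
            genotype[i] = coin(rng);
    return population;
}
```

Seeding the generator with a fixed value (e.g. `std::mt19937 rng(42);`) makes a run repeatable, which helps when comparing parameter settings.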
Evaluation • Each member of the population can be seen as a candidate solution to a problem • The fitness function determines the quality of each solution • The fitness function takes a phenotype and returns a floating point number as its score • It is problem dependent, so it can be very simple • It can be a bottleneck if it is not carefully thought out (there are no magic ways to create one)
Selection • Want to give preference to “better” individuals when adding to the mating pool • Mating can be sexual or asexual • If the entire population ends up being selected it may be desirable to conduct a tournament to order individuals in the population • Would like to keep the best in the mating pool and drop the worst (elitism) • Elitism is a trade-off with search space completeness
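One common way to run a tournament and to find the elite individual is sketched below. The function names and the convention that higher fitness is better are assumptions, not something the slides specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Tournament: draw `rounds` random individuals and return the index of the fittest.
std::size_t tournamentSelect(const std::vector<double>& fitness,
                             std::size_t rounds, std::mt19937& rng) {
    assert(!fitness.empty() && rounds > 0);
    std::uniform_int_distribution<std::size_t> pick(0, fitness.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t i = 1; i < rounds; ++i) {
        const std::size_t challenger = pick(rng);
        if (fitness[challenger] > fitness[best]) best = challenger;
    }
    return best;
}

// Elitism: index of the best individual, which is kept in the pool unchanged.
std::size_t eliteIndex(const std::vector<double>& fitness) {
    assert(!fitness.empty());
    return static_cast<std::size_t>(
        std::max_element(fitness.begin(), fitness.end()) - fitness.begin());
}
```

Larger `rounds` values raise the selection pressure, which moves the search toward the elitism end of the trade-off.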
Crossover - 1 • In sexual reproduction the genetic codes of both parents are combined to create offspring • Asexual crossover has no impact on the mating pool • Would like to keep 60/40 split between parent contributions • 95/5 splits negate the benefits of crossover (too much like asexual reproduction)
Crossover - 2 • If we have selected two strings A = 11111 and B = 00000 • We might choose a uniformly random site (e.g. position 3) and trade bits • This would create two new strings A’ = 11100 and B’ = 00011 • These new strings might then be added to the mating pool if they are “fit”
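The bit trade in this example can be sketched as a one-point crossover on strings. The function name `onePointCrossover` is illustrative, and the site is passed in rather than drawn at random so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Swap the tails of two equal-length bit strings after the first `site` bits.
std::pair<std::string, std::string> onePointCrossover(const std::string& a,
                                                      const std::string& b,
                                                      std::size_t site) {
    assert(a.size() == b.size() && site <= a.size());
    return {a.substr(0, site) + b.substr(site),
            b.substr(0, site) + a.substr(site)};
}
```

Calling `onePointCrossover("11111", "00000", 3)` gives the pair 11100 and 00011, matching the strings on the slide.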
Mutation • Mutations happen at the genome level (rarely and not good) and the genotype level (better for the GA process) • Mutation is important for maintaining diversity in the genetic code • In humans, mutation was responsible for the evolution of intelligence • Example: The occasional (low probability) alteration of a bit position in a string
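A minimal sketch of bit-flip mutation on a string of '0' and '1' characters follows. The name `mutate` and the per-bit `rate` parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <random>
#include <string>

// Flip each bit of `genes` independently with probability `rate`.
std::string mutate(std::string genes, double rate, std::mt19937& rng) {
    assert(rate >= 0.0 && rate <= 1.0);
    std::bernoulli_distribution flip(rate);
    for (char& bit : genes)
        if (flip(rng)) bit = (bit == '0') ? '1' : '0';
    return genes;
}
```

A small rate keeps most offspring close to their parents while still reintroducing bit values that selection may have removed from the population.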
Operators • Selection and mutation • When used together give us a genetic algorithm equivalent to a parallel, noise-tolerant hill-climbing algorithm • Selection, crossover, and mutation • Provide an insurance policy against losing population diversity and help avoid some of the pitfalls of ordinary “hill climbing”
Replacement • Determine when to insert new offspring into the mating pool and which individuals to drop out based on fitness • Steady state evolution calls for the same number of individuals in the population, so each new offspring is processed one at a time and fit individuals can remain a long time • In generational evolution, the offspring are placed into a new population with all other offspring (genetic code only survives in kids)
Genetic Algorithm
Set time t = 0
Initialize population P(t)
While termination condition not met
    Evaluate fitness of each member of P(t)
    Select members from P(t) based on fitness
    Produce offspring from the selected pairs
    Replace members of P(t) with better offspring
    Set time t = t + 1
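The loop above might be sketched as follows. Everything specific here is an assumption chosen to keep the example short: a toy "count the 1 bits" fitness, binary tournament selection, one-point crossover, a mutation rate of 1/length, generational replacement with one elite individual, and a fixed number of generations as the termination condition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Genotype = std::vector<int>;  // assumed encoding: one 0/1 value per gene

// Toy fitness (OneMax): the number of 1 bits; stands in for a real problem.
int fitness(const Genotype& g) {
    return static_cast<int>(std::count(g.begin(), g.end(), 1));
}

// Run the loop for a fixed number of generations and return the best individual.
Genotype evolve(std::size_t popSize, std::size_t length, int generations,
                std::mt19937& rng) {
    assert(popSize >= 2 && length >= 2);
    std::uniform_int_distribution<int> bit(0, 1);
    std::uniform_int_distribution<std::size_t> member(0, popSize - 1);
    std::uniform_int_distribution<std::size_t> site(1, length - 1);
    std::bernoulli_distribution mutateGene(1.0 / static_cast<double>(length));

    // Initialize population P(0) at random.
    std::vector<Genotype> pop(popSize, Genotype(length));
    for (auto& g : pop)
        for (auto& gene : g) gene = bit(rng);

    auto lessFit = [](const Genotype& a, const Genotype& b) {
        return fitness(a) < fitness(b);
    };
    // Select: binary tournament between two random members of P(t).
    auto select = [&]() -> const Genotype& {
        const Genotype& a = pop[member(rng)];
        const Genotype& b = pop[member(rng)];
        return fitness(a) >= fitness(b) ? a : b;
    };

    for (int t = 0; t < generations; ++t) {
        std::vector<Genotype> next;
        // Elitism: carry the current best over unchanged.
        next.push_back(*std::max_element(pop.begin(), pop.end(), lessFit));
        while (next.size() < popSize) {
            // Produce offspring: one-point crossover, then bit-flip mutation.
            Genotype child = select();
            const Genotype& other = select();
            const auto cut = static_cast<std::ptrdiff_t>(site(rng));
            std::copy(other.begin() + cut, other.end(), child.begin() + cut);
            for (auto& gene : child)
                if (mutateGene(rng)) gene = 1 - gene;
            next.push_back(std::move(child));
        }
        pop = std::move(next);  // replace P(t) with the new generation
    }
    return *std::max_element(pop.begin(), pop.end(), lessFit);
}
```

Because the elite individual is always copied forward, the best fitness in the population never decreases from one generation to the next in this sketch.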
Why use genetic algorithms? • They can solve hard problems • Easy to interface genetic algorithms to existing simulations and models • GAs are extensible • GAs are easy to hybridize • GAs work by sampling, so populations can be sized to detect differences with specified error rates • Use little problem specific code
TSP • To use a genetic algorithm to solve the traveling salesman problem we could begin by creating a population of candidate solutions • We need to define mutation, crossover, and selection methods to aid in evolving a solution from this population
TSP • For crossover we might take two paths (P1 and P2), break them at arbitrary points, and define new solutions Left1+Right2 and Left2+Right1 • For mutation we might randomly switch two cities in an existing path
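These two operators could be sketched as below, assuming a path is stored as a list of city indices 0..n-1 and that both parents are cut at the same position. The names are illustrative. The `isValidTour` helper is an assumed example of a constraint check: splicing two paths can repeat or drop cities, so children need to be checked (or repaired) before they join the population.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Path = std::vector<int>;  // assumed encoding: city indices in visiting order

// Mutation: swap two randomly chosen cities in a path.
void swapMutation(Path& path, std::mt19937& rng) {
    assert(!path.empty());
    std::uniform_int_distribution<std::size_t> pick(0, path.size() - 1);
    std::swap(path[pick(rng)], path[pick(rng)]);
}

// Crossover: Left1 + Right2, with both parents cut at position `cut`.
Path spliceCrossover(const Path& p1, const Path& p2, std::size_t cut) {
    assert(p1.size() == p2.size() && cut <= p1.size());
    const auto offset = static_cast<std::ptrdiff_t>(cut);
    Path child(p1.begin(), p1.begin() + offset);
    child.insert(child.end(), p2.begin() + offset, p2.end());
    return child;
}

// Constraint check: a valid tour visits every city 0..n-1 exactly once.
bool isValidTour(Path path) {
    std::sort(path.begin(), path.end());
    for (std::size_t i = 0; i < path.size(); ++i)
        if (path[i] != static_cast<int>(i)) return false;
    return true;
}
```

For example, splicing 0 1 2 3 with 3 2 1 0 at position 2 yields 0 1 1 0, which visits two cities twice and fails the check, whereas swap mutation always keeps a valid tour valid.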
Evolve Algorithm for TSP • Set up initial population • For G generations • Create M mutations and add them to the population • Subject mutations to population constraints and determine their relative fitness • Create C crossovers and add them to the population • Subject crossovers to population constraints and determine their relative fitness
Premature Convergence - 1 • Occasionally a gene takes over because it is so much fitter than all others (genetic drift) • If this is the best solution, that may be OK (if not, you may never find the optimal solution if this happens too soon) • In large populations genetic drift is less likely to happen • Using higher mutation rates can combat genetic drift
Premature Convergence - 2 • High levels of randomness are not always helpful to GA • To prevent genetic drift • You might have several small populations and cross-breed individuals from them • Take game of life approach, pretend individuals live on 2D grid and only allow breeding between neighbors (spatial organizational structure)
Slow Convergence • Some GAs will simply fail to converge • Similar to plateau problem in hill climbing (need to add noise to fitness values to make them converge) • Can increase elitism to encourage fitter individuals to spread their genes (at the risk of premature convergence) • Increasing the level of random mutations sometimes helps
Parameters • Require lots of parameters (mutation rate, crossover type, population size, fitness scaling policy) • Can make use of a hierarchy of GAs with a master GA setting the parameters for an ordinary GA • Parameterless GAs have default values chosen for parameters so that human interaction is not needed for fine tuning
Domain Knowledge • GAs do not exploit domain knowledge unless the knowledge engineer (KE) designs special policies and operators • During initialization there can be a bias toward certain genotypes selected by the domain expert • Can use gene dependent mutation rates and heuristic crossover split points • The choice of representation can affect the size and search efficiency of the problem space
GA Strengths • Do well at avoiding local minima and can often find near optimal solutions since search is not restricted to small search areas • Easy to extend by creating custom operators • Perform well for global optimizations • Work required to choose representations and conversion routines is acceptable
GA Weaknesses • Do not take advantage of domain knowledge • Not very efficient at local optimization (fine tuning solutions) • Randomness inherent in GAs makes them hard to predict (solutions can take a long time to stumble upon) • Require entire populations to work (takes lots of time and memory) and may not work well for real-time applications
Evolvee • Uses existing representations (like NN) • Realism is relatively poor • Simple tasks (e.g. attack behaviors) do not pose any problems for it • (not found in current archive)
Actions and Parameters • Limited action set needed • Look parameter: direction • Single value: up, ahead, down • Move parameter: weights • Vector (projectile, collision point, impact location) • Fire parameter: • Jump parameter:
Sequences • Contained in simple arrays of actions and times • Times can be associated with actions in two ways • Time offset relative to previous action • Absolute time since start of sequence • The order of sequences in an array is not important (this allows symmetric solutions but avoids the cost of sorting actions before evolution is complete)
Random Generation - 1 • Time offset will be a randomly generated value within the maximum sequence length • Action type can be encoded as a symbol randomly chosen from the set of all possible actions • Parameter values are action specific and need to be chosen after the action is selected, within that action's range of values
Random Generation - 2 • The length of all action sequences can also be generated randomly (with a maximum upper bound) • The sequences of actions will be housed in a dynamic array • Start time of first action in a sequence can be reset to zero
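Random generation of an action sequence might look like the sketch below. The `ActionType` values, the `TimedAction` layout, the single numeric parameter, and its range of -1 to 1 are all placeholders invented for illustration; the slides do not give the actual data layout.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical action set, loosely following the look/move/fire/jump actions.
enum class ActionType { Look, Move, Fire, Jump };

struct TimedAction {
    double timeOffset;  // time since the previous action
    ActionType type;
    double parameter;   // placeholder for the action-specific parameter
};

using Sequence = std::vector<TimedAction>;  // dynamic array of actions

// Generate a random sequence holding between 1 and `maxLength` actions.
Sequence randomSequence(std::size_t maxLength, double maxDuration,
                        std::mt19937& rng) {
    assert(maxLength > 0 && maxDuration > 0.0);
    std::uniform_int_distribution<std::size_t> lengthDist(1, maxLength);
    std::uniform_real_distribution<double> offsetDist(0.0, maxDuration);
    std::uniform_int_distribution<int> typeDist(0, 3);
    std::uniform_real_distribution<double> paramDist(-1.0, 1.0);

    Sequence seq(lengthDist(rng));
    for (auto& action : seq) {
        action.timeOffset = offsetDist(rng);
        action.type = static_cast<ActionType>(typeDist(rng));  // type first
        action.parameter = paramDist(rng);  // then a parameter for that type
    }
    seq.front().timeOffset = 0.0;  // the first action starts at time zero
    return seq;
}
```

A fuller version would pick the parameter distribution per action type, for example a direction symbol for looking and a weight vector for moving.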
Crossover • Simple one point crossover • Randomly split two move sequences from parents and swap subarrays to create two new children • Fairly easy to program using arrays
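A sketch of this one-point crossover for arrays of possibly different lengths follows. The template name `crossoverSequences` is illustrative; the split points are passed in so the caller can draw them at random.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split parent `a` at `cutA` and parent `b` at `cutB`, then swap the tails
// to make two children.
template <typename Action>
std::pair<std::vector<Action>, std::vector<Action>> crossoverSequences(
    const std::vector<Action>& a, const std::vector<Action>& b,
    std::size_t cutA, std::size_t cutB) {
    assert(cutA <= a.size() && cutB <= b.size());
    const auto splitA = a.begin() + static_cast<std::ptrdiff_t>(cutA);
    const auto splitB = b.begin() + static_cast<std::ptrdiff_t>(cutB);

    std::vector<Action> first(a.begin(), splitA);
    first.insert(first.end(), splitB, b.end());    // head of a + tail of b
    std::vector<Action> second(b.begin(), splitB);
    second.insert(second.end(), splitA, a.end());  // head of b + tail of a
    return {first, second};
}
```

The children can differ in length from their parents, but the total number of actions across the two children equals the total across the two parents.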
Mutation • A low probability mutation might be to change the length of a sequence • Empty spaces can be filled with random actions • Excess actions are simply ignored • A low probability mutation might be to replace individual actions within existing sequences • Gene storage time follows a normal (Gaussian) distribution
Evolution • Population size will remain constant • Evolution happens on request • If an individual with unassigned fitness exists, choose it; otherwise choose two parents with probabilities proportional to their fitness for crossover/mutation • Individuals are removed from the population using random selection based on inverse fitness • To diversify the population, remove the poorer of two similar behaviors
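Fitness-proportional parent choice and inverse-fitness removal could be sketched with the standard library's `std::discrete_distribution`. The function names are illustrative, and the removal step assumes every fitness value is strictly positive so that 1 / fitness is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Choose a parent with probability proportional to its fitness.
std::size_t pickParent(const std::vector<double>& fitness, std::mt19937& rng) {
    assert(!fitness.empty());
    std::discrete_distribution<std::size_t> wheel(fitness.begin(), fitness.end());
    return wheel(rng);
}

// Choose an individual to remove with probability proportional to 1 / fitness.
// Assumes every fitness value is strictly positive.
std::size_t pickVictim(const std::vector<double>& fitness, std::mt19937& rng) {
    assert(!fitness.empty());
    std::vector<double> inverse;
    for (double f : fitness) {
        assert(f > 0.0);
        inverse.push_back(1.0 / f);
    }
    std::discrete_distribution<std::size_t> wheel(inverse.begin(), inverse.end());
    return wheel(rng);
}
```

Removing one individual for each offspring inserted keeps the population size constant, as the slide requires.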
GA Module Interfaces • Exported GA interface
void ReportFitness(const float f);
• Evolvable interface methods
void Crossover(const Individual& a, const Individual& b, const Individual& c);
void Mutate(const Individual& a, const Individual& b, const Individual& c);
void Randomize(const Individual& a);
void Allocate(vector<Individual>& population);
void Deallocate(vector<Individual>& population);
void Evaluate(const Individual& a);
Computing Fitness: Rocket Jumping • Assign rewards only for upward movement when the animat is not touching the floor, to avoid rewarding running up the stairs • Reward high jumps a lot more than lower jumps
Computing Fitness: Dodging Fire • Provide 0 reward when hit and high reward when the animat escapes with no damage • Must include distance of dodging movement away from point of impact to avoid rewarding “standing still” • Damage to the animat must also be measured and subtracted from fitness value • Use time as a 4th dimension to resolve ties
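One possible reading of these rules as a scoring function is sketched below. The `DodgeResult` fields, the escape bonus of 100, the unit weights on distance and damage, and the choice to let faster dodges win ties are all assumptions made for illustration, not values from the slides.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical measurements collected for one dodge attempt.
struct DodgeResult {
    double damage;          // damage taken by the animat (0 if it escaped)
    double distanceMoved;   // distance moved away from the point of impact
    double secondsElapsed;  // time taken, used only to break ties
};

// Score one dodge; the result is never negative.
double dodgeFitness(const DodgeResult& r) {
    assert(r.damage >= 0.0 && r.distanceMoved >= 0.0 && r.secondsElapsed >= 0.0);
    const double escapeBonus = (r.damage == 0.0) ? 100.0 : 0.0;  // assumed weight
    const double score = escapeBonus + r.distanceMoved - r.damage;
    const double tieBreak = 0.001 * r.secondsElapsed;  // small, so it only splits ties
    return std::max(0.0, score - tieBreak);
}
```

With these weights a clean dodge that moves 10 units scores higher than standing still undamaged, and an attempt that takes 50 damage after moving 10 units scores zero.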
Kanga • Makes use of a genetic algorithm • Learns its jumping and dodging behaviors during the game • Fitness function provides rewards on a per jump or per dodge basis
Evaluation - 1 • Learns to jump fairly quickly • Multiple jumps are no problem • Dodging behavior is also learned quickly • Any balanced combination of vector weights (estimated point of impact, closest collision point, projectile attributes) that causes movement to safety works well • Approach is sub-optimal but acceptable
Evaluation - 2 • Continuous fitness values are more helpful to the genetic algorithm than Boolean success indicators • Scheme reveals how well it is possible to evolve behaviors using genetic operators • The representation is better suited to modeling sequences than either decision trees or fuzzy rules • Representation is incompatible with rule-based schemes