1. CC282 Genetic Algorithm Lecture 6 slides for CC282 Machine Learning, R. Palaniappan, 2008
2. Lecture 06 Outline Introduction
GA terminology
GA basic description
Encoding of chromosomes
Selection operator in GA
Crossover and mutation operators in GA
Applications
Evolving ANN
Genetic Programming
Toy example
Advantages and disadvantages of GA
3. Genetic Algorithm (GA) - Introduction GA is a part of evolutionary computation (EC)
GA is inspired by Darwin's theory of evolution - problems are solved by an evolutionary process resulting in the survival of the fittest
EC was introduced in the 1960s by Rechenberg
J. Holland invented GA in the 70s
J. Koza used GA to evolve programs (GP) in 1992
4. Genetic Algorithm (GA) - Terminology Living organisms consist of cells. Cells contain DNA carrying the genetic material of the organism, which defines its traits
Chromosomes are strings of DNA and serve as a model for the whole organism (genetic material)
Genes - blocks of DNA of which the chromosomes consist. It can be said that each gene encodes a trait or feature
Alleles are possible values for a trait (i.e. the gene)
Genome - a complete set of genetic material (i.e. all chromosomes); in GA this is called a population
Crossover is the operation in which genes from the parents combine during reproduction to form a whole new chromosome, i.e. the offspring
Mutation is when some elements of the genetic material are changed (normally through a random procedure)
Fitness of an organism is measured by its degree of success/failure in survival
5. Hypothesis/search space - revisited Each point is a possible solution and has a fitness value
Fitness measures how good the solution is
Fitness in this case is the opposite of an error measure: the lower the error, the higher the fitness
GA searches for the best/optimal solution, though there is no guarantee that it will find it
GA finds a solution in an evolutionary manner
Other similar methods are hill climbing, tabu search, simulated annealing
6. GA Basic description Steps in brief:
GA begins with an initial population, i.e. a set of solutions/chromosomes
Fitness of each chromosome is computed
Selection operators are applied that favour fitter chromosomes
Crossover: chromosomes recombine to produce offspring, in the hope that the offspring may be fitter than the parents
Mutation operator is applied
Assess the fitness of the new population; stop if the optimal solution is achieved or if the maximum generation number is reached
Else, repeat for the next generation with the selection, crossover and mutation operators
7. The GA algorithm GA(Fitness, Fitness_threshold, max_generation, popsize, Pc, Pm)
Fitness: A function that assigns an evaluation score, given a hypothesis
Fitness_threshold: A threshold specifying the termination criterion
Max_generation: The maximum generation number to terminate GA
popsize: The size of the population
Pc: Crossover probability, i.e. the fraction of the population to be replaced by crossover operator at each generation
Pm: Mutation probability, i.e. the fraction of the population to be replaced by mutation operator at each generation
Initialise population: P ← Generate popsize random hypotheses
Evaluate: for each h in P, compute Fitness(h)
While [max_h Fitness(h)] < Fitness_threshold and generation < max_generation
1. Selection: Select popsize members of P (with replacement) to add to Pnext
2. Crossover: Pairs of hypotheses are randomly selected using Pc. For each pair, <h1,h2>, produce two offspring by applying the crossover operator. Add all offspring to Pnext
3. Mutate: Invert a randomly selected bit in random members of Pnext using probability Pm
4. Update: P ← Pnext
5. Evaluate: for each h in P, compute Fitness(h)
Return the hypothesis from P that has the highest fitness
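The loop above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own implementation: it assumes bit-string hypotheses, non-negative fitness values, roulette wheel selection and single-point crossover, and it replaces the crossed-over parents in place instead of adding offspring to Pnext. The names run_ga, n_bits and rng are hypothetical.

```python
import random

def run_ga(fitness, fitness_threshold, max_generation, popsize, pc, pm, n_bits, rng=None):
    """Minimal bit-string GA (illustrative sketch; n_bits must be at least 2)."""
    if rng is None:
        rng = random.Random()
    # Initialise: popsize random bit-string hypotheses, then evaluate them
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(popsize)]
    scores = [fitness(h) for h in pop]
    generation = 0
    while max(scores) < fitness_threshold and generation < max_generation:
        # 1. Selection: fitness-proportionate, with replacement
        weights = [s + 1e-12 for s in scores]  # assumption: avoids an all-zero wheel
        nxt = [list(h) for h in rng.choices(pop, weights=weights, k=popsize)]
        # 2. Crossover: single point, applied to about pc * popsize / 2 pairs
        rng.shuffle(nxt)
        for i in range(0, int(pc * popsize / 2) * 2, 2):
            cut = rng.randint(1, n_bits - 1)
            nxt[i][cut:], nxt[i + 1][cut:] = nxt[i + 1][cut:], nxt[i][cut:]
        # 3. Mutation: flip each bit independently with probability pm
        for h in nxt:
            for j in range(n_bits):
                if rng.random() < pm:
                    h[j] ^= 1
        # 4-5. Update the population and re-evaluate it
        pop = nxt
        scores = [fitness(h) for h in pop]
        generation += 1
    best = max(range(popsize), key=lambda i: scores[i])
    return pop[best], scores[best], generation
```

For instance, run_ga(sum, 8, 50, 20, 0.5, 0.01, 8) tries to evolve an 8-bit string of all ones, using the number of ones as the fitness.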
8. GA Some preliminary design questions Encoding
GA operates on the coding of parameters rather than the parameter itself
These parameters are called chromosomes and are a string of values which represent potential solutions to the given problem
The encoding could be binary, decimal or continuous - which to use?
Constraints - Any constraint to the gene values?
Fitness - How to obtain the fitness for each chromosome?
Selection - How to select candidate chromosomes?
The other two operators - How to perform Crossover and Mutation?
9. Chromosomes binary representation Chromosomes are mostly represented by a string of bits
Each bit/group of bits represents some characteristic/attribute/feature
Check the values each feature can take, and represent each feature with enough bits to cover all possible values
Recall the play-tennis example:
Wind : {strong, weak} can be represented by two bits
Example:
Wind=strong → {10}, Wind=weak → {01}, Wind=strong or weak → {11}
Outlook: {cloudy, rainy, sunny} can be represented by three bits
eg: Outlook=cloudy or rainy is represented as 110
So, for a rule such as (Outlook=cloudy ∨ rainy) ∧ (Wind=strong), the chromosome representation is 11010
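As a small sketch of this encoding, the snippet below builds the bit string for a rule using one bit per attribute value. The ordering of values inside ATTRIBUTES is an assumption chosen to reproduce the bit patterns on the slide, and the names ATTRIBUTES and encode_rule are hypothetical.

```python
# Assumption: value order within each attribute matches the slide's bit positions.
ATTRIBUTES = {
    "Outlook": ["cloudy", "rainy", "sunny"],
    "Wind": ["strong", "weak"],
}

def encode_rule(rule):
    """One bit per attribute value; a 1 means the rule allows that value."""
    bits = ""
    for name, values in ATTRIBUTES.items():
        allowed = rule.get(name, values)  # an unconstrained attribute allows every value
        bits += "".join("1" if v in allowed else "0" for v in values)
    return bits

print(encode_rule({"Outlook": {"cloudy", "rainy"}, "Wind": {"strong"}}))  # 11010
```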
10. Binary and decimal coding chromosomes Let us consider a more general situation
Assume we have three variables, x, y and z
Decimal coding is simply the integer values for genes, eg: x=35, y=191, z=5
Binary coding: the genes are coded in binary form
Let us assume that these variables can take integer values from 0 to 255
So, we need 8 bits for each variable (i.e. gene)
If x =35, y=191, z=5, we have
x=00100011, y=10111111, z=00000101
And the chromosome → 001000111011111100000101
But why go through the hassle of representing integers using binary coding?
Answer (see Exercise 6, question 4)
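The conversion between decimal genes and the binary chromosome can be sketched as below, assuming 8 bits per gene as on the slide. The helper names encode and decode are hypothetical.

```python
def encode(values, bits=8):
    """Concatenate fixed-width binary genes into one chromosome string."""
    for v in values:
        if not 0 <= v < 2 ** bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
    return "".join(format(v, f"0{bits}b") for v in values)

def decode(chromosome, bits=8):
    """Split the chromosome into genes and convert each back to an integer."""
    return [int(chromosome[i:i + bits], 2) for i in range(0, len(chromosome), bits)]

print(encode([35, 191, 5]))                # 001000111011111100000101
print(decode("001000111011111100000101"))  # [35, 191, 5]
```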
11. Continuous coding chromosomes But what if we want genes to represent continuous values eg: x=0.67, y=1.56, z=3.45
Solution: use binary chromosome with approximation or use continuous valued chromosomes
We will not cover continuous-valued chromosomes in this course, as they require special types of GA operators
Binary chromosome with approximation eg: x=0.145 (assume 8 bits per gene)
Use the general equation:
With 8 bits, xmax=255 and xmin=0
0.145*255=36.975, round this to 37, so x =00100101
So, x=00100101 is an approximation of x=0.145
More bits will improve the approximation, but computation becomes more time-consuming
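The slide's general equation is not reproduced in this text, so the sketch below uses a common linear mapping as an assumption: a real value in [lo, hi] is scaled onto the integers 0 to 2^bits - 1 and rounded. With lo=0 and hi=1 it reproduces the slide's 0.145 example. The names encode_real and decode_real are hypothetical.

```python
def encode_real(x, bits=8, lo=0.0, hi=1.0):
    """Approximate a real value in [lo, hi] by a fixed-width binary gene."""
    top = 2 ** bits - 1                        # 255 for 8 bits
    level = round((x - lo) / (hi - lo) * top)  # 0.145 * 255 = 36.975, rounded to 37
    return format(level, f"0{bits}b")

def decode_real(gene, lo=0.0, hi=1.0):
    """Map a binary gene back to the real value it approximates."""
    top = 2 ** len(gene) - 1
    return lo + int(gene, 2) / top * (hi - lo)

print(encode_real(0.145))       # 00100101
print(decode_real("00100101"))  # roughly 0.1451
```

The decoded value differs from 0.145 by less than one quantisation step (1/255 here), which is the approximation error the slide refers to.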
12. Fitness function and gene constraints - an example Let us consider a linear programming problem of the kind that arises naturally in production planning:
Suppose a particular Ford plant can build Escorts at the rate of one per minute, Explorer at the rate of one every 2 minutes, and Lincoln Navigators at the rate of one every 3 minutes. The vehicles get 8, 5, and 4 miles per litre, respectively, and Parliament mandates that the average fuel economy of vehicles produced be at least 6 miles per litre. Ford loses 1000 on each Escort, but makes a profit of 5000 on each Explorer and 15,000 on each Navigator. What is the maximum profit this Ford plant can make in one 8-hour day?
The fitness function here is the cost function, i.e. the profit Ford can make by building x Escorts, y Explorers, and z Navigators
And we want to maximize it
The fitness function is f=-1000x+5000y+15000z
13. Gene constraints Using the same example in the previous slide:
The constraints arise from the production times and Parliament mandate on fuel economy
There are 480 minutes in an 8-hour day, and so the production times for the vehicles lead to the following limit:
x+2y+3z ≤ 480
The average fuel economy restriction can be written:
8x+5y+4z ≥ 6(x+y+z), which simplifies to 2x-y-2z ≥ 0
There is an additional implicit constraint that the variables are all non-negative:
x, y, z ≥ 0
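One way to combine the fitness function with these constraints is sketched below. Giving infeasible chromosomes the worst possible fitness is an assumption made for this illustration (the slides do not say how constraint violations are handled), and the function names are hypothetical.

```python
def profit(x, y, z):
    """Profit from building x Escorts, y Explorers and z Navigators."""
    return -1000 * x + 5000 * y + 15000 * z

def feasible(x, y, z):
    """Check non-negativity, the 480-minute limit and the fuel economy mandate."""
    return (
        min(x, y, z) >= 0
        and x + 2 * y + 3 * z <= 480
        and 8 * x + 5 * y + 4 * z >= 6 * (x + y + z)
    )

def fitness(x, y, z):
    # Assumption: infeasible chromosomes are rejected outright
    return profit(x, y, z) if feasible(x, y, z) else float("-inf")

print(fitness(100, 0, 100))  # 1400000
print(fitness(0, 0, 160))    # -inf: violates the fuel economy constraint
```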
14. Selection Selection (aka reproduction) operator is applied many times to produce a mating pool of the new population
There are a number of ways to do selection to ensure that the members of the population are drawn with the correct probability
Roulette wheel (fitness proportionate) selection
Tournament selection
Steady-state selection
Rank selection
Elitism
15. Roulette wheel (fitness proportionate) selection Chromosomes are selected according to their proportionate fitness
The higher their fitness, the more chances they have of being selected
Sampling can be viewed as playing a game of roulette where the pocket sizes are proportional to the probability of selecting a particular individual
Each new member of the population is drawn independently when the roulette wheel is spun randomly
On a computer, this spin is done using a random number generated in [0,1]
But the best solution found so far may be lost, eg: Pnext={B,B,C}
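A single spin of the roulette wheel can be sketched as follows, assuming non-negative fitness values with a positive sum. The function name roulette_select is hypothetical.

```python
import random
from itertools import accumulate

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability fitness / total fitness."""
    cumulative = list(accumulate(fitnesses))  # right-hand edge of each pocket
    spin = rng.random() * cumulative[-1]      # random number in [0, 1) scaled to the wheel
    for individual, edge in zip(population, cumulative):
        if spin < edge:
            return individual
    return population[-1]  # guard against floating-point round-off

# Draw a new population of three members, each spin independent of the others
pnext = [roulette_select(["A", "B", "C"], [0.5, 0.3, 0.2]) for _ in range(3)]
```

Because every draw is independent, the fittest individual is not guaranteed to appear in the new population.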
16. Selection (ctd) Tournament selection
Pick a few chromosomes (say, popsize/4 chromosomes) at random from the population
From these few, select the fittest one (i.e. the one with the highest fitness), put the rest back and repeat the process popsize times
This method can retain some good chromosomes while giving weaker chromosomes a chance to take part in mating
Steady-state selection
A few good (with high fitness) chromosomes are selected to replace the few bad (with low fitness) chromosomes
The rest of population (the in-between fitness ones) are selected by other methods or all are selected to remain in Pnext
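Tournament selection as described above can be sketched like this. The default tournament size of popsize/4 follows the slide; clamping it to at least 2 and at most the population size is an assumption, and the function names are hypothetical.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    """Pick k chromosomes at random and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

def build_mating_pool(population, fitness, k=None, rng=random):
    """Run popsize tournaments; losers stay in the population for later rounds."""
    if k is None:
        k = min(len(population), max(2, len(population) // 4))
    return [tournament_select(population, fitness, k, rng) for _ in range(len(population))]

pool = build_mating_pool(list(range(20)), fitness=lambda v: v)
```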
17. Selection (ctd) Rank selection
The other selection methods will have problems if the fitness differs a lot
For example, if the best chromosome's fitness is 90% of the total fitness, then using roulette wheel, the other chromosomes will have very few chances to be selected
Rank selection first ranks the population and then every chromosome receives fitness from this ranking (i.e. probability of selection is proportional to rank)
The worst will have fitness 1, second worst 2 etc and the best will have fitness N (number of chromosomes in population)
Then, using these new fitness values, roulette wheel selection method is performed
Using this, all the chromosomes have a fair chance to be selected
But this method can lead to slower convergence, because the best chromosomes do not differ so much from other ones
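Rank selection can be sketched as a re-weighting step followed by an ordinary roulette wheel draw. The function names are hypothetical, and ties are broken by position, which is an assumption.

```python
import random

def rank_weights(fitnesses):
    """Replace each fitness by its rank: the worst gets 1, the best gets N."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_select(population, fitnesses, k, rng=random):
    """Roulette wheel selection on the ranks instead of the raw fitness."""
    return rng.choices(population, weights=rank_weights(fitnesses), k=k)

print(rank_weights([90, 2, 5, 3]))  # [4, 1, 3, 2]
```

In this example the best chromosome holds 90% of the raw fitness, but only 4/10 of the wheel after ranking.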
18. Crossover Even though reproduction increases the percentage of better fitness chromosomes, the procedure is considerably sterile; it cannot create new and better chromosomes
This role is left to crossover and, to a lesser but critical extent, to mutation
Crossover process simulates the exchange of genetic material that occurs during biological reproduction
In this process pairs in the breeding population are mated randomly with a crossover rate, Pc
Typically, an offspring inherits the features its parents have in common, and may also inherit completely different features from each parent
Popular crossover techniques: one point, two point and uniform crossover
19. Crossover (ctd) First, randomly select a pair of parents (i.e. two chromosomes)
Perform crossover (swapping of bits) to obtain offspring; repeat this process Pc*popsize/2 times, excluding parent chromosomes that have already been used
Example: if Pc=0.5 and popsize=20, then do crossover 5 times
Single point and two-point crossover:
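Since the slide's figure is not reproduced here, the two schemes can be sketched on bit strings as follows. The function names and the example cut points are hypothetical.

```python
def crossover_count(pc, popsize):
    """Number of parent pairs to cross over in one generation."""
    return int(pc * popsize / 2)

def one_point(p1, p2, cut):
    """Swap everything after a single cut point."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    """Swap only the segment between cut points a and b."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

print(crossover_count(0.5, 20))                # 5
print(one_point("11111111", "00000000", 3))    # ('11100000', '00011111')
print(two_point("11111111", "00000000", 2, 5)) # ('11000111', '00111000')
```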
20. Crossover (ctd) The uniform crossover scheme works as follows
A randomly generated bit string called the crossover mask generalises the process
A bit value of 1 in this bit string indicates that corresponding bits in the parents are to be exchanged while a 0 bit indicates no bit interchange
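A minimal sketch of uniform crossover with a mask, on bit strings of equal length, could look like this. The function names are hypothetical.

```python
import random

def random_mask(n_bits, rng=random):
    """Randomly generated crossover mask as a bit string."""
    return "".join(rng.choice("01") for _ in range(n_bits))

def uniform_crossover(p1, p2, mask):
    """Exchange the parents' bits wherever the mask holds a 1."""
    c1 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    c2 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    return c1, c2

print(uniform_crossover("11111111", "00000000", "10011010"))
# ('01100101', '10011010')
```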
21. Mutation Mutation consists of making small alterations to the values of one or more genes in a chromosome
Mutation randomly perturbs the population's characteristics, and prevents evolutionary dead ends
Most mutations are damaging rather than beneficial and hence mutation rate must be low to avoid the destruction of species
It works by randomly selecting a bit with a certain mutation rate in the string and reversing its value
Mutation is applied to the randomly chosen bit in a chromosome chosen randomly
If Pm is 0.01, with a popsize of 20 with 18 bits each, then the mutation is repeated for 0.01 x 18 x 20 = 3.6 ≈ 4 times
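The mutation count and the bit flips can be sketched as follows, assuming chromosomes stored as lists of 0/1 integers. The function names are hypothetical; note that the same bit may be picked twice, in which case it flips back.

```python
import random

def n_mutations(pm, popsize, n_bits):
    """Number of single-bit mutations per generation, rounded to a whole number."""
    return round(pm * popsize * n_bits)

def mutate(population, pm, rng=random):
    """Flip a randomly chosen bit in a randomly chosen chromosome, repeatedly."""
    n_bits = len(population[0])
    for _ in range(n_mutations(pm, len(population), n_bits)):
        chrom = rng.randrange(len(population))
        bit = rng.randrange(n_bits)
        population[chrom][bit] ^= 1  # reverse the bit's value
    return population

print(n_mutations(0.01, 20, 18))  # 4
```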
22. Applications The possible applications of genetic algorithm are immense
Any problem that has a large search domain could be suitably tackled by GA
We shall explore (very briefly) the use of GA to evolve neural network weights and to evolve functions/programs in genetic programming
We'll also look at a simple toy example
23. Evolving NN weights using GA a simple example GA has been used successfully to evolve NN weights
GA is suitable for evolving the weights of a neural network: standard learning techniques such as backpropagation can take thousands upon thousands of iterations to converge
But GA could (given the appropriate direction) evolve suitable weights within a hundred or so iterations
Example
Obtain the weights for perceptron unit for learning the OR function (we saw this in the previous lecture)
But rather than using backpropagation to update the weights, we can use GA
24. Evolving NN weights using GA a simple example Initial parameters
Fitness function: 1/MSE of desired to actual output, GA will maximise this fitness function
Coding, binary approximation: w1, w2 and w0 weights, say with each 6 bits, so chromosome length is 18
Popsize=20, i.e. 20 chromosomes, initially generated randomly
Pc=0.5, Pm=0.01
MSE_limit=0.1, so, fitness_threshold=10; max_generation=100
Gene constraints, w1, w2 and w0 in the range [-1,1]
Apply selection (say, tournament selection), crossover (say one point) and mutation to produce a new population
Repeat the previous step until convergence to an acceptable solution (fitness>fitness_threshold or generation>max_generation)
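The decoding and fitness evaluation for this example might be sketched as below. The gene order (w1, w2, w0), the linear mapping of each 6-bit gene onto [-1,1] and the step activation with w0 as bias are assumptions made for the illustration, and the function names are hypothetical.

```python
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def decode_weights(chromosome, bits=6, lo=-1.0, hi=1.0):
    """Split the bit string into genes and map each linearly onto [lo, hi]."""
    top = 2 ** bits - 1
    genes = [chromosome[i:i + bits] for i in range(0, len(chromosome), bits)]
    return [lo + int(g, 2) / top * (hi - lo) for g in genes]

def fitness(chromosome):
    """1/MSE of the perceptron's output on the OR truth table."""
    w1, w2, w0 = decode_weights(chromosome)
    # Assumption: step activation, output 1 when the weighted sum is positive
    errors = [(t - (1 if w1 * x1 + w2 * x2 + w0 > 0 else 0)) ** 2
              for (x1, x2), t in OR_DATA]
    mse = sum(errors) / len(errors)
    return float("inf") if mse == 0 else 1.0 / mse

print(decode_weights("000000111111000000"))  # [-1.0, 1.0, -1.0]
```

With a step activation the MSE can only take the values 0, 0.25, 0.5, 0.75 or 1, so in this sketch a fitness above the threshold of 10 means all four patterns are classified correctly.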
25. Genetic programming (GP) An example In programming languages such as LISP, mathematical expressions are written not in standard (infix) notation but in prefix notation
Examples:
+ 1 2 : 1+2
* + 1 2 2 : (1+2)*2
* + - 2 1 4 9 : ((2-1)+4)*9
Notice the difference between the left-hand side and the right? Apart from the order being different, there is no use of parentheses
The prefix method makes life a lot easier for programmers and compilers alike, because order precedence is not an issue
You can build expression trees out of these strings that can then be easily evaluated. [Figure: expression trees for the previous three expressions]
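Building and evaluating such a tree can be sketched as follows, assuming binary operators and space-separated tokens. The tuple representation (op, left, right) and the function names are hypothetical choices for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def parse(tokens):
    """Consume tokens from the front and build a nested (op, left, right) tree."""
    token = tokens.pop(0)
    if token in OPS:
        left = parse(tokens)
        right = parse(tokens)
        return (token, left, right)
    return float(token)

def evaluate(tree):
    """Evaluate a tree recursively; leaves are plain numbers."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

def eval_prefix(text):
    return evaluate(parse(text.split()))

print(eval_prefix("+ 1 2"))          # 3.0
print(eval_prefix("* + 1 2 2"))      # 6.0
print(eval_prefix("* + - 2 1 4 9"))  # 45.0
```

No precedence rules are needed: the position of each operator alone determines the shape of the tree.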
26. Genetic programming (GP) An example (ctd) Having numerical data and primitive functions, but no expression to conjoin the data with the primitive functions, a genetic algorithm can be used to evolve an expression tree to create a very close fit to the data
By splicing and grafting the trees and evaluating the resulting expression with the data and testing it to the primitive functions, the fitness function can return how close the expression is
The limitations of genetic programming lie in the huge space the GA has to search - an infinite number of equations
Therefore, normally before running a GA to search for an equation, the user tells the program which primitive functions to search under
27. Genetic programming (GP) An example (ctd) Assume we have data like the following and we wish to obtain the function that maps z using x and y
Assume the only available primitive functions are sin,?, sqr, sqrt
GP will splice and graft the trees using these primitive functions with the fitness function to minimise prediction error of z using x and y data as above
28. Genetic programming (GP) example (ctd) Crossover example in GP ->
Mutation randomly changes the primitive function
The actual function is [equation not recoverable from the slide text]
29. Toy example Consider: a + 2b + 3c + 4d = 30, where a, b, c, d are positive integers
Use GA to find a, b, c and d
Assume decimal coding is used
Choose, say, 5 random initial solution sets (i.e. popsize=5) forming the initial population, with the constraint 1 ≤ a, b, c, d ≤ 30
30. Example (ctd) Calculate the fitness value for each chromosome, i.e. calculate the absolute difference of each expression to 30, take inverse, this will be our fitness value
Eg: Chromosome 1, expression=1+2*28+3*15+4*3=114
Since a smaller absolute difference means the expression value is closer to the desired answer (30), smaller differences are more desirable
So, take the inverse of the absolute difference as fitness value
Now, GA will try to maximise the fitness value
In order to create a system where chromosomes with more desirable fitness values are more likely to be chosen as parents, we have to do selection
Assume we use the roulette wheel (fitness proportionate) method
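The fitness calculation for this toy problem can be sketched as follows. Treating an exact solution as having infinite fitness is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
def expression(chromosome):
    """Left-hand side a + 2b + 3c + 4d for a chromosome (a, b, c, d)."""
    a, b, c, d = chromosome
    return a + 2 * b + 3 * c + 4 * d

def fitness(chromosome, target=30):
    """Inverse of the absolute difference between the expression and the target."""
    diff = abs(expression(chromosome) - target)
    return float("inf") if diff == 0 else 1.0 / diff

print(expression((1, 28, 15, 3)))  # 114
print(fitness((1, 28, 15, 3)))     # 1/84, about 0.0119
```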
31. Example (ctd) Calculate the fitness proportion (likelihood) for each chromosome to be picked/selected as parent, e.g. take the sum of all the fitness values (0.135266), and calculate the percentages from there
Use: likelihood of chromosome i = (fitness of chromosome i) / (sum of all fitness values)
32. Example (ctd) Spin the roulette wheel 5 times
Assume the result was
Since chromosome 4 had a poor fitness, its chance of survival was slim and it died out in the selection process
33. Example (ctd) Do crossover, say single point
The offspring of each of these parents contains the genetic information of both father and mother
For example, if a father has the solution set a1, b1, c1, d1, and a mother has the solution set a2, b2, c2, d2, then there can be three pairs of possible crossed-over offspring (| = crossover point):
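The three pairs can be enumerated with a short sketch, one pair per possible crossover point between the four genes. The function name single_point_offspring is hypothetical.

```python
def single_point_offspring(father, mother):
    """Return one pair of offspring for every possible single crossover point."""
    pairs = []
    for cut in range(1, len(father)):
        pairs.append((father[:cut] + mother[cut:], mother[:cut] + father[cut:]))
    return pairs

father = ("a1", "b1", "c1", "d1")
mother = ("a2", "b2", "c2", "d2")
for child1, child2 in single_point_offspring(father, mother):
    print(child1, child2)
```

The first pair, for the crossover point after gene a, is (a1 | b2, c2, d2) and (a2 | b1, c1, d1).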
34. Example (ctd) Assume that through random parent selections, we have the following parent chromosomes
Applying crossover to our example to produce one offspring for each pair of parents (assuming the crossover points are chosen randomly):
Note: normally, there would be two offspring from parents but for simplicity of discussion, assume only one offspring is produced here
35. Example (ctd) Apply mutation to a randomly chosen chromosome, say gene a in chromosome 1
Mutation here replaces the randomly selected gene's value with a random value in the allowed range (1 to 30)
(13, 28, 15, 3) → (8, 28, 15, 3)
Recalculate the fitness value for the offspring representing the new generation:
36. Example - Commentary The average fitness value for the offspring chromosomes was 0.026, while the average fitness value for the parent chromosomes was 0.017
Progressing at this rate, one chromosome should eventually reach a very high fitness value (i.e. when the absolute difference is close to 0), that is when an optimal solution is found
If you simulate this yourself, you may actually get a lower average fitness in some generations, but in the long run the fitness levels will increase
For systems where the population is larger (say 50, instead of 5), the fitness levels should approach the desired level more steadily and stably, i.e. nearly every generation will have better solutions than previous ones
37. GA strengths and weaknesses Advantages
Often achieves good results
In most cases, fitness function can be designed easily to fit the hypothesis (solution)
Can be easily hybridised with many other ML algorithms to yield improved results
There are no hard and fast rules; many users use variations freely in their applications
Disadvantages
There is no guarantee that GA converges to the optimal solution
Because of incomplete searches
Because of hypothesis crowding, i.e. most chromosomes become similar, the fitness is high but not the best, and GA cannot progress further due to lack of variety
38. Lecture 6: Study guide At the end of this section, you should be able to
Define chromosome, gene, allele, crossover, mutation, fitness function
Describe how GA works using a flowchart or an algorithm
Explain how chromosomes and hypotheses are represented in GA, i.e. coding in GA
Estimate the fitness function of a given population
Describe chromosome selection mechanisms
Perform crossover between two chromosomes using single-point, two-point and uniform (mask) crossover
Perform mutation
Explain how GA can be used to evolve NN weights
State the main advantages and disadvantages of GA