STOCHASTIC GRAMMAR-BASED GENETIC PROGRAMMING SC-LAB Lee YunGeun
What is SG-GP Stochastic Grammar-Based Genetic Programming = distribution-based evolution + grammar-based genetic programming
Why is SG-GP used SG-GP remedies two main limitations of GP. It allows powerful background knowledge, such as dimensional consistency, to be exploited. It successfully resists the bloat phenomenon, avoiding intron growth.
Example S = a*t*t/2+V0*t • N = {< E >, < Op >, < V >} • T = {+, −, ×, /, a, t, V0} • P = • S := < E > ; • < E > := < Op > < E > < E > | < V > ; • < Op > := + | − | × | / ; • < V > := a | t | V0 ;
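The BNF grammar above can be sketched as a small generator that expands the start symbol into a random prefix expression. This is a minimal illustration, not the paper's implementation; the dict encoding and the depth bound are assumptions.

```python
import random

# The example grammar as a dict: each non-terminal maps to its derivations.
GRAMMAR = {
    "<E>": [["<Op>", "<E>", "<E>"], ["<V>"]],
    "<Op>": [["+"], ["-"], ["*"], ["/"]],
    "<V>": [["a"], ["t"], ["V0"]],
}

def derive(symbol, rng, depth=0, max_depth=6):
    """Expand a symbol into a list of terminal tokens by random derivation."""
    if symbol not in GRAMMAR:          # terminal symbol: emit as-is
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:             # force the shortest derivation near the bound
        rules = [min(rules, key=len)]
    out = []
    for s in rng.choice(rules):
        out.extend(derive(s, rng, depth + 1, max_depth))
    return out

rng = random.Random(0)
expr = derive("<E>", rng)
print(" ".join(expr))  # a prefix expression over {+, -, *, /, a, t, V0}
```

Because every operator is binary, any generated prefix expression has an odd number of tokens (n operators, n+1 operands).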
Dimensional constraints • The solution is a displacement (dimension length¹), • so the start symbol is defined as: • S := < E+0+1+0 > ; • < E+0-2-2 > := < Op > < E > < E > | < V > ; : : : < E+0+2+2 > := < Op > < E > < E > | < V > ; (the length and time exponents each range over −2..+2, giving 5² = 25 typed < E > rules in this problem) • < Op > := + | − | × | / ; • < V > := a+0+1-2 | t+0+0+1 | V0+0+1-1 ;
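The typed-non-terminal bookkeeping can be sketched as follows, assuming (as the slide suggests) that each <E> carries (mass, length, time) exponents, mass is fixed at 0, and the length/time exponents each range over −2..+2 (names are illustrative):

```python
from itertools import product

# Each typed <E> non-terminal carries (mass, length, time) exponents.
# Mass is fixed at 0 in this problem; length and time each range over
# -2..+2, giving 5**2 = 25 typed copies of the <E> rule.
EXP_RANGE = range(-2, 3)
typed_E = [("E", 0, length, time) for length, time in product(EXP_RANGE, EXP_RANGE)]

# Start symbol: displacement has dimension length^1, i.e. <E+0+1+0>.
START = ("E", 0, 1, 0)
print(len(typed_E))  # 25
```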
Stochastic Grammar-Based GP • 1. Representation of the Distribution • 2. Generation of the Population • 3. Updating the Distribution
1. Representation of the Distribution • Each derivation di in a production rule carries a weight wi. • All wi are initialized to 1.
2. Generation of the Population • For each occurrence of a non-terminal symbol, all admissible derivations are determined from the maximum tree size allowed and the position of the current non-terminal symbol. • The selection of derivation di is done with probability pi = wi / Σj wj, where the sum runs over the admissible derivations.
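The sampling step above can be sketched as weighted roulette-wheel selection over the admissible derivations only; function and parameter names here are illustrative, not from the paper:

```python
import random

def select_derivation(weights, admissible, rng):
    """Pick index i from `admissible` with probability p_i = w_i / sum_j w_j,
    the sum running over the admissible derivations only."""
    total = sum(weights[i] for i in admissible)
    r = rng.random() * total
    acc = 0.0
    for i in admissible:
        acc += weights[i]
        if r < acc:
            return i
    return admissible[-1]  # guard against floating-point rounding

# Rough frequency check: with weights 0.8 / 0.2 the first derivation
# should be chosen about 80% of the time.
rng = random.Random(0)
counts = [0, 0]
for _ in range(10_000):
    counts[select_derivation([0.8, 0.2], [0, 1], rng)] += 1
```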
Example S = a*t*t/2+V0*t • < E > := < Op > < E > < E >, 0.8 | < V >, 0.2 ; • < Op > := +, 1.2 | −, 1.4 | ×, 0.6 | /, 0.4 ; • < V > := a, 1 | t, 1 | V0, 1 ;
3. Updating the Distribution • All individuals in the current population have been evaluated. • The probability distribution is updated from the Nb best and Nw worst individuals according to the following rules: • – Let b denote the number of individuals among the Nb best individuals that carry derivation di; weight wi is multiplied by (1 + ε)^b; • – Let w denote the number of individuals among the Nw worst individuals that carry derivation di; weight wi is divided by (1 + ε)^w; • – Last, weight wi is mutated with probability pm; the mutation either multiplies or divides wi by the factor (1 + εm).
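The three update rules can be sketched in a few lines; parameter names (eps, eps_m, p_m) mirror the slide's ε, εm, pm but the function itself is an illustrative assumption:

```python
import random

def update_weights(weights, best_counts, worst_counts,
                   eps=0.001, eps_m=0.01, p_m=0.05, rng=None):
    """Multiply w_i by (1+eps)**b, divide it by (1+eps)**w, then with
    probability p_m mutate w_i by multiplying or dividing by (1+eps_m)."""
    rng = rng or random.Random()
    updated = []
    for w_i, b, w in zip(weights, best_counts, worst_counts):
        w_i = w_i * (1 + eps) ** b / (1 + eps) ** w
        if rng.random() < p_m:  # occasional mutation of the weight itself
            w_i = w_i * (1 + eps_m) if rng.random() < 0.5 else w_i / (1 + eps_m)
        updated.append(w_i)
    return updated
```

With mutation switched off and the example's numbers (ε = 0.001, b = 2, w = 1), a unit weight becomes 1 · 1.001² / 1.001 = 1.001.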
Example S = a*t*t/2+V0*t • Initial wi: < E > := < Op > < E > < E >,1 | < V >,1 ; < Op > := +,1 | −,1 | ×,1 | /,1 ; < V > := a+0+1-2,1 | t+0+0+1,1 | V0+0+1-1,1 ; • If ε=0.001, Nb=3, Nw=3, and the + derivation is carried by b=2 of the best and w=1 of the worst individuals: wi <- 1 * (1+0.001)^2 / (1+0.001)^1 = 1.001
Vectorial SG-GP vs Scalar SG-GP • A distribution vector wi is attached to the i-th level of the GP trees (i ranging from 1 to Dmax). • This scheme is referred to as Vectorial SG-GP, as opposed to the previous scheme, referred to as Scalar SG-GP.
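The contrast between the two schemes can be sketched as follows (sizes and names are illustrative): Scalar SG-GP shares one weight vector across all depths, while Vectorial SG-GP keeps a separate vector per tree level, so an update at one depth does not disturb the distribution used at another.

```python
D_MAX = 6
N_DERIV = 4  # e.g. the four <Op> derivations

scalar_w = [1.0] * N_DERIV                          # shared by every depth
vector_w = [[1.0] * N_DERIV for _ in range(D_MAX)]  # one row per level 1..Dmax

def weight_at(depth, deriv, vectorial=True):
    """Weight consulted when expanding a symbol at a given depth (1-based)."""
    return vector_w[depth - 1][deriv] if vectorial else scalar_w[deriv]

vector_w[0][0] = 2.0  # an update at depth 1 leaves the other depths untouched
```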
Result • GP vs SG-GP
Result • Better results are obtained with a low learning rate and a sufficiently large mutation amplitude. • This can be interpreted as a pressure toward the preservation of diversity in the population.
Result • The maximum derivation depth, Dmax • If Dmax is too small, the solution will be missed. • If it is too large, the search will take a prohibitively long time.
Result • Vectorial SG-GP vs Scalar SG-GP
Result • Resisting the Bloat