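Evolutionary Optimization Algorithms with Tree-Structured Encoding. University of Jinan, Computational Intelligence Lab. Yuehui Chen, yhchen@ujn.edu.cn, http://cilab.ujn.edu.cn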
Genetic Programming • Developed: USA in the 1990s • Early names: J. Koza • Typically applied to: • machine learning tasks (prediction, classification …) • Attributed features: • competes with neural nets and the like • needs huge populations (thousands) • slow • Special: • non-linear chromosomes: trees, graphs • mutation possible but not necessary (disputed!)
Introductory example: credit scoring • Bank wants to distinguish good from bad loan applicants • Model needed that matches historical data
Introductory example: credit scoring • A possible model: • IF (NOC = 2) AND (S > 80000) THEN good ELSE bad • In general: • IF formula THEN good ELSE bad • The only unknown is the right formula, hence • Our search space (phenotypes) is the set of formulas • Natural fitness of a formula: percentage of correctly classified cases of the model it stands for • Natural representation of formulas (genotypes) is: parse trees
Introductory example: credit scoring • IF (NOC = 2) AND (S > 80000) THEN good ELSE bad can be represented by the following tree: AND( =(NOC, 2), >(S, 80000) )
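The parse tree and the "percentage of correctly classified cases" fitness can be sketched in a few lines. This is only an illustration: the nested-tuple encoding, the `evaluate` and `fitness` helpers, and the four historical records are assumptions made up for the example, not part of the slides.

```python
import operator

# Parse tree for (NOC = 2) AND (S > 80000), stored as nested tuples
# (operator, left, right); leaves are variable names or constants.
TREE = ("AND", ("=", "NOC", 2), (">", "S", 80000))

OPS = {
    "AND": lambda a, b: a and b,
    "=": operator.eq,
    ">": operator.gt,
}

def evaluate(node, applicant):
    """Recursively evaluate a parse tree for one applicant record."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, applicant), evaluate(right, applicant))
    if isinstance(node, str):      # variable: look it up in the record
        return applicant[node]
    return node                    # numeric constant

def fitness(tree, records):
    """Percentage of records whose predicted label matches the known one."""
    hits = sum(evaluate(tree, r) == r["good"] for r in records)
    return 100.0 * hits / len(records)

# Made-up historical data, for illustration only.
RECORDS = [
    {"NOC": 2, "S": 90000, "good": True},
    {"NOC": 2, "S": 50000, "good": False},
    {"NOC": 0, "S": 95000, "good": True},
    {"NOC": 1, "S": 30000, "good": False},
]
print(fitness(TREE, RECORDS))  # 3 of 4 records classified correctly: 75.0
```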
Tree based representation • Trees are a universal form, e.g. consider • Arithmetic formula • Logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y))) • Program: i = 1; while (i < 20) { i = i + 1 }
Tree based representation • (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
Tree based representation • i = 1; while (i < 20) { i = i + 1 }
Tree based representation • In GA, ES, EP chromosomes are linear structures (bit strings, integer strings, real-valued vectors, permutations) • Tree shaped chromosomes are non-linear structures • In GA, ES, EP the size of the chromosomes is fixed • Trees in GP may vary in depth and width
Tree based representation • Symbolic expressions can be defined by • Terminal set T • Function set F (with the arities of function symbols) • Adopting the following general recursive definition: • Every t ∈ T is a correct expression • f(e1, …, en) is a correct expression if f ∈ F, arity(f) = n and e1, …, en are correct expressions • There are no other forms of correct expressions • In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument)
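The recursive definition translates almost literally into a checker. The sketch below assumes expressions are nested tuples `(f, e1, …, en)` and uses a small made-up function set and terminal set; `is_correct` is a hypothetical helper name.

```python
# Function set F with arities, and terminal set T (both illustrative).
F = {"+": 2, "-": 2, "sin": 1}
T = {"x", "y", 1.0}

def is_correct(expr):
    """Apply the recursive definition of a correct expression."""
    if not isinstance(expr, tuple):
        return expr in T                        # rule 1: every t in T
    if not expr:
        return False                            # an empty tuple is no expression
    f, *args = expr
    return (f in F and F[f] == len(args)        # rule 2: f in F, arity(f) = n
            and all(is_correct(e) for e in args))
    # rule 3 (no other forms) holds because everything else returns False
```

For example, `is_correct(("+", "x", ("sin", "y")))` holds, while `("sin", "x", "y")` is rejected because `sin` has arity 1.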
Offspring creation scheme • Compare • GA scheme using crossover AND mutation sequentially (each applied with some probability) • GP scheme using crossover OR mutation (chosen probabilistically)
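The difference between the two schemes fits in two small functions. This is a schematic sketch: the `crossover` and `mutate` operators are passed in as plain callables, and the function and parameter names are assumptions for illustration.

```python
import random

def ga_offspring(p1, p2, crossover, mutate, pc, pm, rng):
    """GA scheme: crossover (with prob. pc) AND THEN mutation (with prob. pm)."""
    child = crossover(p1, p2) if rng.random() < pc else p1
    return mutate(child) if rng.random() < pm else child

def gp_offspring(p1, p2, crossover, mutate, pm, rng):
    """GP scheme: mutation (with prob. pm) OR crossover (with prob. 1 - pm)."""
    return mutate(p1) if rng.random() < pm else crossover(p1, p2)
```

In the GA scheme a child may pass through both operators; in the GP scheme each child is produced by exactly one of them.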
Flowchart • [Figure: GA flowchart and GP flowchart]
Mutation • Most common mutation: replace randomly chosen subtree by randomly generated tree
Mutation cont’d • Mutation has two parameters: • Probability pm to choose mutation vs. recombination • Probability to choose an internal point as the root of the subtree to be replaced • Remarkably pm is advised to be 0 (Koza ’92) or very small, like 0.05 (Banzhaf et al. ’98) • The size of the child can exceed the size of the parent
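Subtree mutation can be sketched as follows, assuming trees are nested tuples `(function, child, …)`. The function set, terminal set, the 0.3 early-stop probability in `random_tree`, and all helper names are illustrative assumptions; for simplicity every node is picked with equal probability rather than favouring internal points.

```python
import random

FUNCS = {"+": 2, "*": 2}      # illustrative function set with arities
TERMS = ["x", "y", 1]         # illustrative terminal set

def random_tree(depth, rng):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    f = rng.choice(sorted(FUNCS))
    return (f,) + tuple(random_tree(depth - 1, rng) for _ in range(FUNCS[f]))

def points(tree, path=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from points(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, rng, max_depth=3):
    """Replace a randomly chosen subtree by a randomly generated tree."""
    path = rng.choice(list(points(tree)))
    return replace(tree, path, random_tree(max_depth, rng))
```

Because the new subtree is generated independently of the one it replaces, the child can be larger than its parent.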
Recombination = Crossover • Most common recombination: exchange two randomly chosen subtrees among the parents • Recombination has two parameters: • Probability pc to choose recombination vs. mutation • Probability to choose an internal point within each parent as crossover point • The size of offspring can exceed that of the parents
Crossover • [Figure: Parent 1, Parent 2, Child 1, Child 2]
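A minimal sketch of subtree crossover, again assuming nested-tuple trees. The helper names (`points`, `get`, `replace`, `size`) are hypothetical, and crossover points are drawn uniformly over all nodes, which is a simplification of the internal-point bias mentioned on the previous slide.

```python
import random

def points(tree, path=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from points(child, path + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in points(tree))

def crossover(p1, p2, rng):
    """Swap one randomly chosen subtree of p1 with one of p2."""
    a = rng.choice(list(points(p1)))
    b = rng.choice(list(points(p2)))
    return replace(p1, a, get(p2, b)), replace(p2, b, get(p1, a))
```

The two children together contain exactly as many nodes as the two parents, but a single child may be larger than either parent.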
Selection • Parent selection typically fitness proportionate • Over-selection in very large populations • rank population by fitness and divide it into two groups: • group 1: best x% of population, group 2: other (100-x)% • 80% of selection operations choose from group 1, 20% from group 2 • for pop. size = 1000, 2000, 4000, 8000: x = 32%, 16%, 8%, 4%, respectively • motivation: to increase efficiency; the percentages come from a rule of thumb • Survivor selection: • Typical: generational scheme (thus none) • Recently steady-state is becoming popular for its elitism
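Over-selection can be sketched as below. The function name and signature are assumptions, fitness is taken to be maximised, and the pick inside each group is uniform purely to keep the example short (the slide does not say how an individual is chosen within a group).

```python
import random

def over_select(population, fitness, x_percent, rng):
    """Pick one parent: 80% of the time from the best x%, else from the rest."""
    ranked = sorted(population, key=fitness, reverse=True)
    cut = max(1, len(ranked) * x_percent // 100)
    group1, group2 = ranked[:cut], ranked[cut:]
    if rng.random() < 0.8 or not group2:
        return rng.choice(group1)      # uniform within the group (assumption)
    return rng.choice(group2)
```

With a population of 1000 and x = 32, roughly four out of five selected parents come from the 320 fittest individuals.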
Initialization • Maximum initial depth of trees Dmax is set • Full method (each branch has depth = Dmax): • nodes at depth d < Dmax randomly chosen from function set F • nodes at depth d = Dmax randomly chosen from terminal set T • Grow method (each branch has depth ≤ Dmax): • nodes at depth d < Dmax randomly chosen from F ∪ T • nodes at depth d = Dmax randomly chosen from T • Common GP initialisation: ramped half-and-half, where grow & full method each deliver half of initial population
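The full and grow methods differ only in the pool a node is drawn from above depth Dmax, which the sketch below makes explicit. The function set, terminal set and helper names are illustrative assumptions; the half-and-half split follows the slide, and varying the depth limit across the population is not shown.

```python
import random

F = {"+": 2, "-": 2, "sin": 1}   # illustrative function set with arities
T = ["x", 1.0]                   # illustrative terminal set

def make_tree(d, dmax, rng, full):
    """Create the node at depth d and, recursively, everything below it."""
    if d == dmax:
        return rng.choice(T)                       # d = Dmax: terminals only
    pool = sorted(F) if full else sorted(F) + T    # full: F, grow: F ∪ T
    sym = rng.choice(pool)
    if sym not in F:
        return sym                                 # grow picked a terminal early
    return (sym,) + tuple(make_tree(d + 1, dmax, rng, full)
                          for _ in range(F[sym]))

def leaf_depths(tree, d=0):
    """Depths of all leaves, used to check the shape of a tree."""
    if not isinstance(tree, tuple):
        return [d]
    return [x for child in tree[1:] for x in leaf_depths(child, d + 1)]

def ramped_half_and_half(pop_size, dmax, rng):
    """Alternate between the full and the grow method."""
    return [make_tree(0, dmax, rng, full=(i % 2 == 0)) for i in range(pop_size)]
```

Every leaf of a full tree sits at depth exactly Dmax, while grow trees have leaves at any depth up to Dmax.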
Bloat • Bloat = “survival of the fattest”, i.e., the tree sizes in the population are increasing over time • Ongoing research and debate about the reasons • Needs countermeasures, e.g. • Prohibiting variation operators that would deliver “too big” children • Parsimony pressure: penalty for being oversized
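Both countermeasures are easy to express once tree size is measured. The sketch assumes nested-tuple trees and a fitness that is maximised; the linear penalty, the coefficient `alpha` and the helper names are assumptions, since the slide does not prescribe a particular penalty form.

```python
def size(tree):
    """Number of nodes in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def penalized_fitness(raw_fitness, tree, alpha=0.01):
    """Parsimony pressure: subtract a size-proportional penalty (assumed linear)."""
    return raw_fitness - alpha * size(tree)

def too_big(tree, max_size):
    """Size check that can be used to reject oversized children."""
    return size(tree) > max_size
```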
Problems involving “physical” environments • Trees for data fitting vs. trees (programs) that are “really” executable • Execution can change the environment → the calculation of fitness • Example: robot controller • Fitness calculations mostly by simulation, ranging from expensive to extremely expensive (in time) • But evolved controllers are often good to very good
Example application: symbolic regression • Given some points in R², (x1, y1), … , (xn, yn) • Find function f(x) s.t. ∀ i = 1, …, n : f(xi) = yi • Possible GP solution: • Representation by F = {+, -, /, sin, cos}, T = R ∪ {x} • Fitness is the error • All operators standard • pop.size = 1000, ramped half-and-half initialisation • Termination: n “hits” or 50000 fitness evaluations reached (where “hit” is if | f(xi) – yi | < 0.0001)
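The fitness evaluation and the "hit" count can be sketched as follows. The nested-tuple tree encoding and the helper names are assumptions. The slide does not say which error measure is used, so the sum of squared errors here is an assumption, as is returning 1.0 on division by zero.

```python
import math

def evaluate(node, x):
    """Evaluate a tree over F = {+, -, /, sin, cos} and T = R ∪ {x}."""
    if node == "x":
        return x
    if not isinstance(node, tuple):
        return node                                   # real-valued constant
    op, *args = node
    v = [evaluate(a, x) for a in args]
    if op == "+":
        return v[0] + v[1]
    if op == "-":
        return v[0] - v[1]
    if op == "/":
        return v[0] / v[1] if v[1] != 0 else 1.0      # assumption: guard against /0
    if op == "sin":
        return math.sin(v[0])
    if op == "cos":
        return math.cos(v[0])
    raise ValueError(f"unknown function {op!r}")

def error_and_hits(tree, points, tol=0.0001):
    """Return (sum of squared errors, number of points with |f(xi) - yi| < tol)."""
    errs = [abs(evaluate(tree, x) - y) for x, y in points]
    return sum(e * e for e in errs), sum(e < tol for e in errs)
```

A run would stop as soon as the hit count equals n, or when the budget of fitness evaluations is used up.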
Discussion • Is GP: • The art of evolving computer programs? • A means of automated programming of computers? • GA with another representation?
CREATING RANDOM PROGRAMS • Available functions F = {+, -, *, %, IFLTE} • IFLTE – if arg1 <= arg2 return arg3 else return arg4 • Available terminals T = {X, Y, Random-Constants} • The random programs are: • Of different sizes and shapes • Syntactically valid • Executable
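Any program built from this function and terminal set is executable as long as every function returns a value for every input. A minimal interpreter sketch follows, assuming nested-tuple programs. IFLTE follows the definition on the slide, while treating `%` as division that returns 1 for a zero divisor (Koza's protected division) is an assumption about the intended meaning.

```python
def run(prog, X, Y):
    """Execute a program over F = {+, -, *, %, IFLTE} and T = {X, Y, constants}."""
    if prog == "X":
        return X
    if prog == "Y":
        return Y
    if not isinstance(prog, tuple):
        return prog                                   # random constant
    op, *a = prog
    if op == "IFLTE":                                 # if arg1 <= arg2: arg3 else arg4
        branch = a[2] if run(a[0], X, Y) <= run(a[1], X, Y) else a[3]
        return run(branch, X, Y)
    left, right = run(a[0], X, Y), run(a[1], X, Y)
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    if op == "%":
        return left / right if right != 0 else 1.0    # assumption: protected division
    raise ValueError(f"unknown function {op!r}")

PROG = ("IFLTE", "X", "Y", ("+", "X", "Y"), ("%", "X", ("-", "Y", "Y")))
print(run(PROG, 1, 2))   # X <= Y, so the third argument is evaluated: 3
```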
MUTATION OPERATION • Select 1 parent probabilistically based on fitness • Pick point from 1 to NUMBER-OF-POINTS • Delete subtree at the picked point • Grow new subtree at the mutation point in same way as generated trees for initial random population (generation 0) • The result is a syntactically valid executable program • Put the offspring into the next generation of the population
CROSSOVER OPERATION • Select 2 parents probabilistically based on fitness • Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent • Independently randomly pick a number for 2nd parent • Identify the subtrees rooted at the two picked points • Swap the two subtrees • The result is a syntactically valid executable program • Put the offspring into the next generation of the population
Architecture-Altering Operations • 1. Subroutine duplication
Architecture-Altering Operations • 2. Argument duplication
Architecture-Altering Operations • 3. Subroutine creation
Architecture-Altering Operations • 4. Subroutine deletion
Architecture-Altering Operations • 5. Argument deletion
FIVE MAJOR PREPARATORY STEPS • Determining the set of terminals • Determining the set of functions • Determining the fitness measure • Determining the parameters for the run • Determining the method for designating a result and the criterion for terminating a run
Probabilistic Incremental Program Evolution (PIPE) • Salustowicz & Schmidhuber (1997) • Model: • Probabilistic prototype tree (PPT) • Each node: distribution over instruction set • Can grow and shrink (variable size) • Update algorithm • Similar to PBIL • Elitism is incorporated
Probabilistic Prototype Tree • Complete n-ary tree • Each node N_{d,w} contains • Random constant R_{d,w} • Variable probability vector • l + k components (instructions) • d: node’s depth, w: horizontal position • p_{d,w}(i): probability of choosing instruction i
Program Generation • Start with root node: d = w = 0 • Depth first, left-to-right traversal • Choose instruction i with probability p_{d,w}(i) • If i is a random constant: • If p_{d,w}(i) > T_R, use R_{d,w} • Otherwise use a uniformly random number
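Program generation from a prototype tree can be sketched as a recursive sampler. The instruction set, the `Node` class, the uniform starting probabilities, the threshold value `t_r=0.3` and the `max_depth` cap (added only to guarantee termination) are all assumptions for illustration.

```python
import random

# Illustrative instruction set with arities; "R" stands for the random constant.
ARITY = {"+": 2, "*": 2, "x": 0, "R": 0}

class Node:
    """One PPT node: stored constant R, probability vector p, lazy children."""
    def __init__(self, rng):
        self.R = rng.random()
        self.p = {i: 1 / len(ARITY) for i in ARITY}   # uniform, for the sketch only
        self.children = {}                            # created on demand

def generate(node, rng, t_r=0.3, depth=0, max_depth=4):
    """Sample a program, depth first and left to right, starting at node."""
    instrs = list(node.p)
    if depth == max_depth:                            # hypothetical depth cap
        instrs = [i for i in instrs if ARITY[i] == 0]
    i = rng.choices(instrs, weights=[node.p[k] for k in instrs])[0]
    if i == "R":
        # use the stored constant only if its probability exceeds the threshold
        return node.R if node.p["R"] > t_r else rng.random()
    if ARITY[i] == 0:
        return i
    args = []
    for k in range(ARITY[i]):
        if k not in node.children:
            node.children[k] = Node(rng)              # grow the PPT when needed
        args.append(generate(node.children[k], rng, t_r, depth + 1, max_depth))
    return (i, *args)
```

Calling `generate(Node(rng), rng)` with `rng = random.Random(0)` yields one program as a nested tuple; repeated calls on the same root build up a population.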
PIPE Algorithm • Initialize probabilistic prototype tree • Repeat until termination criterion is met • Create population of programs • Grow PPT if required • Evaluate population • Favor smaller programs if all else is equal • Update & mutate PPT • Prune PPT
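Flowchart • [Figure: flowchart of the PIPE algorithm. Box labels: initialization; iteration count = 0 (yes/no); iteration count != 0; population-based learning; elitist learning; iteration count + 1; satisfactory solution found (yes/no); stop]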
PPT Initialization • Random constant R_{d,w} drawn from U[0,1) • p_t = probability of using the terminal set • For all terminal instructions: • p_{d,w}(i) = p_t / l • For all function instructions: • p_{d,w}(i) = (1 − p_t) / k
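Node initialization is a direct transcription of the two formulas, with l terminals and k functions. The dictionary layout and the name `init_node` are assumptions made for this sketch.

```python
import random

def init_node(terminals, functions, p_t, rng):
    """Create one PPT node with the initial probabilities from the slide."""
    l, k = len(terminals), len(functions)
    probs = {t: p_t / l for t in terminals}                   # terminals share p_t
    probs.update({f: (1 - p_t) / k for f in functions})       # functions share 1 - p_t
    return {"R": rng.random(), "p": probs}                    # R drawn from U[0, 1)
```

The probabilities sum to 1 by construction, so no normalization is needed at this point.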
PPT Growth • Growth “on demand”
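Updating & Mutating PPT • Want the probability of the best tree to reach a target value P_T • p_{d,w}(i) updated iteratively: p_{d,w}(i) ← p_{d,w}(i) + c^{lr} · lr · (1 − p_{d,w}(i)), with learning rate lr and constant c^{lr} • Mutation: p_{d,w}(i) ← p_{d,w}(i) + m · (1 − p_{d,w}(i)) • Normalize probabilities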
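The update, mutation and normalization steps can be sketched on plain dictionaries of instruction probabilities. Here `step` is a hypothetical name for the update coefficient, `choices` lists each visited node's probability vector together with the instruction the best program used there, and the loop assumes 0 < step ≤ 1 and a target below 1.

```python
def prob_of_program(choices):
    """Product of p(i) over all (probability vector, chosen instruction) pairs."""
    prob = 1.0
    for p, i in choices:
        prob *= p[i]
    return prob

def adapt_towards(choices, target, step):
    """Raise the chosen instructions' probabilities until the program's
    probability reaches the target."""
    while prob_of_program(choices) < target:
        for p, i in choices:
            p[i] += step * (1 - p[i])

def mutate(p, i, m):
    """Mutation: push the probability of instruction i towards 1 by a factor m."""
    p[i] += m * (1 - p[i])

def normalize(p):
    """Rescale a probability vector so that it sums to 1 again."""
    total = sum(p.values())
    for k in p:
        p[k] /= total
```

Both the update and the mutation only increase single entries, which is why the vectors have to be normalized afterwards.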
PPT Pruning • Prune if any p_{d,w}(i) > T_P • T_P = 0.9
Summary: PIPE • PBIL-like algorithm for evolving programs • Probabilistic prototype tree • Variable length • PBIL-style update rule • Results better than GP • Many user-defined constants • Effects are not understood
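Example 1: curve fitting • sin(x) can be expanded as the standard Taylor series $\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$, for $x \in \mathbb{R}$ • The operator set can then be chosen accordingly • Design a fitness function to compute the fitness of each individual (in this experiment, the sum of absolute errors between the expected output and the model output)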
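The fitness function described here, the sum of absolute errors against sin(x), can be sketched as follows. The sample points and the truncated Taylor polynomial used as a stand-in "model" are assumptions for illustration; in the experiment the model would be an evolved program.

```python
import math

def taylor_sin(x, terms):
    """Truncated Taylor series of sin(x) with the given number of terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def fitness(model, xs):
    """Sum of absolute errors between expected output sin(x) and model output."""
    return sum(abs(math.sin(x) - model(x)) for x in xs)

# 21 evenly spaced sample points on [-pi, pi] (illustrative choice).
XS = [i * math.pi / 10 for i in range(-10, 11)]
print(fitness(lambda x: taylor_sin(x, 2), XS))   # coarse model, large error
print(fitness(lambda x: taylor_sin(x, 4), XS))   # better model, smaller error
```

Lower values are better, so the four-term polynomial scores better than the two-term one on these points.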
Result • [Figure: curve fitting of the sine function]