Tree-Structure-Encoded Evolutionary Optimization Algorithms

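Tree-Structure-Encoded Evolutionary Optimization Algorithms • Computational Intelligence Lab, University of Jinan • Yuehui Chen (陈月辉) • yhchen@ujn.edu.cn • http://cilab.ujn.edu.cn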

Genetic Programming • Developed: USA in the 1990’s • Early names: J. Koza • Typically applied to: • machine learning tasks (prediction, classification …) • Attributed features: • competes with neural nets and alike • needs huge populations (thousands) • slow • Special: • non-linear chromosomes: trees, graphs • mutation possible but not necessary (disputed!)

GP technical summary tableau

Introductory example: credit scoring • Bank wants to distinguish good from bad loan applicants • Model needed that matches historical data

Introductory example: credit scoring • A possible model: • IF (NOC = 2) AND (S > 80000) THEN good ELSE bad • In general: • IF formula THEN good ELSE bad • Only unknown is the right formula, hence • Our search space (phenotypes) is the set of formulas • Natural fitness of a formula: percentage of well classified cases of the model it stands for • Natural representation of formulas (genotypes) is: parse trees

Introductory example: credit scoring • IF (NOC = 2) AND (S > 80000) THEN good ELSE bad can be represented by the following tree: AND at the root, with the two subtrees =(NOC, 2) and >(S, 80000)
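The credit-scoring rule and its "natural fitness" can be sketched in a few lines. This is a minimal illustration, assuming parse trees are stored as nested tuples; the names `RULE`, `FUNCTIONS`, `evaluate`, `fitness` and the four records in `DATA` are invented for the example and do not come from the slides.

```python
import operator

# Parse tree as nested tuples: (function, child, child); leaves are variable names or constants.
RULE = ("AND", ("=", "NOC", 2), (">", "S", 80000))

FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "=": operator.eq,
    ">": operator.gt,
}

def evaluate(node, record):
    """Recursively evaluate a parse tree against one applicant record (a dict)."""
    if isinstance(node, tuple):
        name, *children = node
        return FUNCTIONS[name](*(evaluate(c, record) for c in children))
    if isinstance(node, str):      # variable: look it up in the record
        return record[node]
    return node                    # numeric constant

def fitness(tree, data):
    """Percentage of records classified correctly by IF tree THEN good ELSE bad."""
    hits = sum(("good" if evaluate(tree, r) else "bad") == label for r, label in data)
    return 100.0 * hits / len(data)

# Hypothetical historical data, invented for illustration only.
DATA = [
    ({"NOC": 2, "S": 90000}, "good"),
    ({"NOC": 2, "S": 50000}, "bad"),
    ({"NOC": 0, "S": 95000}, "bad"),
    ({"NOC": 2, "S": 85000}, "bad"),
]
```

On this toy data the rule classifies three of the four records correctly, so `fitness(RULE, DATA)` is 75.0.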

Tree based representation • Trees are a universal form, e.g. consider • Arithmetic formula • Logical formula: (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y))) • Program: i = 1; while (i < 20) { i = i + 1 }

Tree based representation

Tree based representation • (x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))

Tree based representation i =1; while (i < 20) { i = i +1 }

Tree based representation • In GA, ES, EP chromosomes are linear structures (bit strings, integer string, real-valued vectors, permutations) • Tree shaped chromosomes are non-linear structures • In GA, ES, EP the size of the chromosomes is fixed • Trees in GP may vary in depth and width

Tree based representation • Symbolic expressions can be defined by • Terminal set T • Function set F (with the arities of function symbols) • Adopting the following general recursive definition: • Every t ∈ T is a correct expression • f(e1, …, en) is a correct expression if f ∈ F, arity(f) = n and e1, …, en are correct expressions • There are no other forms of correct expressions • In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument)
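The recursive definition translates almost directly into a checker. The sketch below assumes expressions are nested tuples `(f, e1, …, en)`; the sets `ARITY` and `TERMINALS` are small illustrative examples, not the sets used in the lecture.

```python
# Assumed example sets: the arities in F and the members of T are illustrative.
ARITY = {"+": 2, "-": 2, "sin": 1}
TERMINALS = {"x", "y", 1.0}

def is_correct(expr):
    """True if expr follows the recursive definition of a correct expression."""
    if not isinstance(expr, tuple):
        return expr in TERMINALS                  # rule 1: every t in T is correct
    if not expr or not isinstance(expr[0], str) or expr[0] not in ARITY:
        return False                              # head must be a function symbol in F
    f, *args = expr
    # rule 2: f(e1, ..., en) is correct if arity(f) = n and every ei is correct
    return len(args) == ARITY[f] and all(is_correct(a) for a in args)
```

For example, `("+", "x", ("sin", "y"))` passes, while `("sin", "x", "y")` fails because `sin` has arity 1.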

Offspring creation scheme • Compare • GA scheme using crossover AND mutation sequentially (be it probabilistically) • GP scheme using crossover OR mutation (chosen probabilistically)

Flowchart GA flowchart GP flowchart

Mutation • Most common mutation: replace randomly chosen subtree by randomly generated tree

Mutation cont’d • Mutation has two parameters: • Probability pm to choose mutation vs. recombination • Probability to choose an internal point as the root of the subtree to be replaced • Remarkably, pm is advised to be 0 (Koza ’92) or very small, like 0.05 (Banzhaf et al. ’98) • The size of the child can exceed the size of the parent

Recombination = Crossover • Most common recombination: exchange two randomly chosen subtrees among the parents • Recombination has two parameters: • Probability pc to choose recombination vs. mutation • Probability to choose an internal point within each parent as crossover point • The size of offspring can exceed that of the parents
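Subtree mutation, subtree crossover and the GP-style choice between them (crossover OR mutation) can be sketched together. This is a minimal sketch, assuming trees are nested tuples `(function, child, …)`; all helper names are hypothetical, `p_internal=0.9` is an assumed default, and `pm=0.05` is the Banzhaf et al. value quoted in the mutation slide.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node, root first."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))

def pick_point(tree, p_internal, rng):
    """Pick an internal node with probability p_internal, otherwise a leaf."""
    all_paths = list(paths(tree))
    internal = [p for p in all_paths if isinstance(get(tree, p), tuple)]
    leaves = [p for p in all_paths if not isinstance(get(tree, p), tuple)]
    pool = internal if internal and rng.random() < p_internal else leaves
    return rng.choice(pool)

def mutate(parent, random_tree, p_internal=0.9, rng=random):
    """Subtree mutation: replace a random subtree by a freshly generated one."""
    return replace(parent, pick_point(parent, p_internal, rng), random_tree(rng))

def crossover(mum, dad, p_internal=0.9, rng=random):
    """Subtree crossover: exchange two randomly chosen subtrees."""
    pt_mum = pick_point(mum, p_internal, rng)
    pt_dad = pick_point(dad, p_internal, rng)
    return (replace(mum, pt_mum, get(dad, pt_dad)),
            replace(dad, pt_dad, get(mum, pt_mum)))

def make_offspring(mum, dad, random_tree, pm=0.05, rng=random):
    """GP scheme: apply mutation OR crossover, never both in sequence."""
    if rng.random() < pm:
        return [mutate(mum, random_tree, rng=rng)]
    return list(crossover(mum, dad, rng=rng))
```

Because whole subtrees are swapped, the two children together contain exactly as many nodes as the two parents, but either child alone may be larger than its parent.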

Crossover (figure: Parent 1 and Parent 2 exchange subtrees to produce Child 1 and Child 2)

Selection • Parent selection typically fitness proportionate • Over-selection in very large populations • rank population by fitness and divide it into two groups: • group 1: best x% of population, group 2: other (100-x)% • 80% of selection operations choose from group 1, 20% from group 2 • for pop. size = 1000, 2000, 4000, 8000: x = 32%, 16%, 8%, 4% • motivation: to increase efficiency; the percentages come from a rule of thumb • Survivor selection: • Typical: generational scheme (thus none) • Recently steady-state is becoming popular for its elitism
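Over-selection as described above can be sketched as follows. The percentages are the ones listed on the slide; the function name `over_select`, the assumption that higher fitness is better, and the uniform choice inside each group are simplifications made for this example.

```python
import random

# Group-1 share (best x%) per population size, as listed on the slide.
TOP_PERCENT = {1000: 32, 2000: 16, 4000: 8, 8000: 4}

def over_select(population, fitness, rng=random, p_top=0.8):
    """Pick one parent: 80% of the time from group 1, 20% from group 2.

    Assumes higher fitness is better and that len(population) is a key of
    TOP_PERCENT. The choice inside a group is uniform here (a simplification).
    """
    ranked = sorted(population, key=fitness, reverse=True)
    cut = len(ranked) * TOP_PERCENT[len(ranked)] // 100
    group1, group2 = ranked[:cut], ranked[cut:]
    return rng.choice(group1 if rng.random() < p_top else group2)
```

Note that with the listed percentages, group 1 always holds 320 individuals regardless of the population size.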

Initialization • Maximum initial depth of trees Dmax is set • Full method (each branch has depth = Dmax): • nodes at depth d < Dmax randomly chosen from function set F • nodes at depth d = Dmax randomly chosen from terminal set T • Grow method (each branch has depth ≤ Dmax): • nodes at depth d < Dmax randomly chosen from F ∪ T • nodes at depth d = Dmax randomly chosen from T • Common GP initialisation: ramped half-and-half, where grow & full method each deliver half of initial population
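The full and grow methods differ only in which set they draw from above depth Dmax, which a short sketch makes visible. The function and terminal sets below are assumed examples, and the way `ramped_half_and_half` cycles the depth limit over 1..Dmax is one possible ramp, not necessarily the one used in the lecture.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # assumed function set F with arities
TERMINALS = ["x", "y", 1.0]            # assumed terminal set T

def full(d_max, rng=random, depth=0):
    """Full method: functions at depth < d_max, terminals exactly at d_max."""
    if depth == d_max:
        return rng.choice(TERMINALS)
    f = rng.choice(list(FUNCTIONS))
    return (f,) + tuple(full(d_max, rng, depth + 1) for _ in range(FUNCTIONS[f]))

def grow(d_max, rng=random, depth=0):
    """Grow method: draw from F and T at depth < d_max, from T at d_max."""
    if depth == d_max:
        return rng.choice(TERMINALS)
    symbol = rng.choice(list(FUNCTIONS) + TERMINALS)
    if symbol not in FUNCTIONS:
        return symbol                  # a terminal ends this branch early
    return (symbol,) + tuple(grow(d_max, rng, depth + 1)
                             for _ in range(FUNCTIONS[symbol]))

def depth_of(tree):
    """Depth of a tree; a single terminal has depth 0."""
    return 1 + max(map(depth_of, tree[1:])) if isinstance(tree, tuple) else 0

def ramped_half_and_half(pop_size, d_max, rng=random):
    """Alternate full and grow; the depth limit cycles over 1..d_max (assumed ramp)."""
    population = []
    for i in range(pop_size):
        method = full if i % 2 == 0 else grow
        population.append(method(1 + (i // 2) % d_max, rng))
    return population
```

Trees from `full` always reach depth Dmax on every branch, while trees from `grow` can be as small as a single terminal, which is why mixing the two gives a more varied initial population.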

Bloat • Bloat = “survival of the fattest”, i.e., the tree sizes in the population are increasing over time • Ongoing research and debate about the reasons • Needs countermeasures, e.g. • Prohibiting variation operators that would deliver “too big” children • Parsimony pressure: penalty for being oversized
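Both countermeasures can be expressed in a few lines. This is only a sketch: the linear penalty, the coefficient `alpha=0.01` and the cap `max_size=50` are assumptions chosen for illustration, and trees are again taken to be nested tuples.

```python
def size(tree):
    """Count the nodes of a tree stored as nested tuples (function, child, ...)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def penalised_error(raw_error, tree, alpha=0.01):
    """Parsimony pressure: add a size-proportional penalty to an error being minimised.

    The linear form and alpha=0.01 are illustrative assumptions.
    """
    return raw_error + alpha * size(tree)

def too_big(tree, max_size=50):
    """Size cap for rejecting oversized children (max_size is an assumed limit)."""
    return size(tree) > max_size
```

With equal raw error, the smaller of two trees gets the better (lower) penalised error, so selection pushes against growth that does not pay for itself.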

Problems involving “physical” environments • Trees for data fitting vs. trees (programs) that are “really” executable • Execution can change the environment → the calculation of fitness • Example: robot controller • Fitness calculations mostly by simulation, ranging from expensive to extremely expensive (in time) • But evolved controllers are often very good

Example application: symbolic regression • Given some points in R^2, (x1, y1), … , (xn, yn) • Find function f(x) s.t. ∀i = 1, …, n : f(xi) = yi • Possible GP solution: • Representation by F = {+, -, /, sin, cos}, T = R ∪ {x} • Fitness is the error • All operators standard • pop.size = 1000, ramped half-and-half initialisation • Termination: n “hits” or 50000 fitness evaluations reached (where “hit” is if | f(xi) – yi | < 0.0001)
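The error measure and the "hit" criterion can be sketched as below. The slide does not say which error is used, so the sum of absolute errors here is an assumption, as is the protected division that returns 1.0 on a zero divisor; `evaluate` and `error_and_hits` are hypothetical names.

```python
import math

def evaluate(tree, x):
    """Evaluate a nested-tuple expression over F = {+, -, /, sin, cos} at point x."""
    if tree == "x":
        return x
    if not isinstance(tree, tuple):
        return float(tree)                       # numeric constant from R
    op, *args = tree
    vals = [evaluate(a, x) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "-":
        return vals[0] - vals[1]
    if op == "/":                                # protected division (assumption)
        return vals[0] / vals[1] if vals[1] != 0 else 1.0
    if op == "sin":
        return math.sin(vals[0])
    if op == "cos":
        return math.cos(vals[0])
    raise ValueError(f"unknown function {op!r}")

def error_and_hits(tree, points, tol=0.0001):
    """Return (sum of absolute errors, number of hits) over the sample points."""
    errors = [abs(evaluate(tree, x) - y) for x, y in points]
    return sum(errors), sum(e < tol for e in errors)
```

A run would stop as soon as the number of hits equals the number of sample points, or when the evaluation budget is used up.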

Discussion • Is GP: • The art of evolving computer programs ? • Means to automated programming of computers? • GA with another representation?

CREATING RANDOM PROGRAMS • Available functions F = {+, -, *, %, IFLTE} • IFLTE – if arg1 <= arg2 return arg3 else return arg4 • Available terminals T = {X, Y, Random-Constants} • The random programs are: • Of different sizes and shapes • Syntactically valid • Executable
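To make "syntactically valid and executable" concrete, here is a small interpreter for this function set. It is a sketch under assumptions: programs are nested tuples, `%` is read as protected division returning 1 for a zero divisor (Koza's usual convention, assumed to apply here), and the names `run`, `OPS` and `protected_div` are invented for the example.

```python
import operator

def protected_div(a, b):
    """'%' as protected division: returns 1.0 when the divisor is 0 (assumed convention)."""
    return a / b if b != 0 else 1.0

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "%": protected_div}

def run(program, env):
    """Interpret a program over F = {+, -, *, %, IFLTE} with terminals looked up in env."""
    if isinstance(program, str):
        return env[program]                      # terminal X or Y
    if not isinstance(program, tuple):
        return program                           # random constant
    op, *args = program
    if op == "IFLTE":                            # if arg1 <= arg2 return arg3 else arg4
        a, b, then_branch, else_branch = args
        chosen = then_branch if run(a, env) <= run(b, env) else else_branch
        return run(chosen, env)                  # only the selected branch is evaluated
    return OPS[op](*(run(arg, env) for arg in args))
```

Because every function accepts any numeric result as input and division never fails, any tree built from these symbols with the right arities can be executed.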

CREATING RANDOM PROGRAMS

MUTATION OPERATION • Select 1 parent probabilistically based on fitness • Pick point from 1 to NUMBER-OF-POINTS • Delete subtree at the picked point • Grow new subtree at the mutation point in same way as generated trees for initial random population (generation 0) • The result is a syntactically valid executable program • Put the offspring into the next generation of the population

MUTATION OPERATION

CROSSOVER OPERATION • Select 2 parents probabilistically based on fitness • Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent • Independently randomly pick a number for 2nd parent • Identify the subtrees rooted at the two picked points and exchange them • The result is a syntactically valid executable program • Put the offspring into the next generation of the population

CROSSOVER OPERATION

Architecture-Altering Operations • 1. Subroutine duplication

Architecture-Altering Operations • 2. Argument duplication

Architecture-Altering Operations • 3. Subroutine creation

Architecture-Altering Operations • 4. Subroutine deletion

Architecture-Altering Operations • 5. Argument deletion

FIVE MAJOR PREPARATORY STEPS • Determining the set of terminals • Determining the set of functions • Determining the fitness measure • Determining the parameters for the run • Determining the method for designating a result and the criterion for terminating a run

Probabilistic Incremental Program Evolution (PIPE) • Salustowicz & Schmidhuber (1997) • Probabilistic incremental program evolution (PIPE) • Model: • Probabilistic prototype tree (PPT) • Each node: distribution over instruction set • Can grow and shrink (variable size) • Update algorithm • Similar to PBIL • Elitism is incorporated

Probabilistic Prototype Tree • Complete n-ary tree • Each node N_{d,w} contains • Random constant R_{d,w} • Variable probability vector • l+k components (instructions) • d: node’s depth, w: horizontal position • p_{d,w}(i): probability of choosing instruction i

Program Generation • Start with root node: d = w = 0 • Depth-first, left-to-right traversal • Choose instruction i with probability p_{d,w}(i) • If i is a random constant • If p_{d,w}(i) > T_R, use R_{d,w} • Otherwise, use a new uniformly random number
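Sampling a program from the prototype tree can be sketched as follows. This is a simplified illustration, not the reference PIPE implementation: the instruction sets, the values `P_T = 0.8` and `T_R = 0.3`, and the names `Node` and `generate` are assumptions, children are indexed by argument position only, and new nodes are created on demand with terminal probabilities p_t / l and function probabilities (1 - p_t) / k.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "sin": 1}   # assumed function set (k = 3) with arities
TERMINALS = ["x", "R"]                   # assumed terminal set (l = 2); "R" = random constant
P_T, T_R = 0.8, 0.3                      # assumed terminal probability and constant threshold

class Node:
    """PPT node: a random constant, a probability vector, and lazily created children."""
    def __init__(self, rng):
        self.R = rng.random()                                    # R ~ U[0, 1)
        self.p = {t: P_T / len(TERMINALS) for t in TERMINALS}
        self.p.update({f: (1 - P_T) / len(FUNCTIONS) for f in FUNCTIONS})
        self.children = {}                                       # grown on demand

def generate(node, rng):
    """Sample a program from the PPT, depth first and left to right."""
    instr = rng.choices(list(node.p), weights=list(node.p.values()))[0]
    if instr == "R":
        # Use the stored constant if its probability exceeds T_R, else a fresh number.
        return node.R if node.p["R"] > T_R else rng.random()
    if instr in TERMINALS:
        return instr
    args = []
    for w in range(FUNCTIONS[instr]):
        if w not in node.children:                               # grow the PPT when needed
            node.children[w] = Node(rng)
        args.append(generate(node.children[w], rng))
    return (instr,) + tuple(args)
```

A typical call is `generate(Node(rng), rng)` with `rng = random.Random(seed)`; keeping the terminal probability well above one half makes very deep programs unlikely.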

Example: PPT & Generation

PIPE Algorithm • Initialize probabilistic prototype tree • Repeat until termination criterion is met • Create population of programs • Grow PPT if required • Evaluate population • Favor smaller programs if all else is equal • Update & mutate PPT • Prune PPT

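PIPE algorithm flowchart (figure). Node labels: initialize, iteration count = 0; iteration count != 0? (yes / no); population-based learning; elitist learning; iteration count + 1; satisfactory solution found? (yes: stop; no: continue)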

PPT Initialization • Random constant R_{d,w} = U[0,1) • p_t = probability of using the terminal set • For all terminal instructions: p_{d,w}(i) = p_t / l • For all function instructions: p_{d,w}(i) = (1 - p_t) / k

PPT Growth Growth “on demand”

Updating & Mutating PPT • Want best tree probability to be P_T (target probability) • p_{d,w}(i) updated iteratively: p_{d,w}(i) = p_{d,w}(i) + η (1 - p_{d,w}(i)), with η a learning-rate factor • Mutation: p_{d,w}(i) = p_{d,w}(i) + m (1 - p_{d,w}(i)) • Normalize probabilities
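The update and mutation steps share one form: move a single probability toward 1, then renormalise the vector. The sketch below shows this for one node only; the real algorithm targets the probability of the whole best program, so treating a single node against a target is a simplification, and `nudge`, `update_towards_target`, `rate=0.1` and `max_steps` are assumed names and values.

```python
def nudge(p, instr, rate):
    """Move p[instr] toward 1 by rate * (1 - p[instr]), then renormalise the vector."""
    p = dict(p)                                  # leave the caller's vector untouched
    p[instr] += rate * (1.0 - p[instr])
    total = sum(p.values())
    return {k: v / total for k, v in p.items()}

def update_towards_target(p, instr, p_target, rate=0.1, max_steps=1000):
    """Repeat the update until p[instr] reaches p_target (single-node simplification)."""
    for _ in range(max_steps):
        if p[instr] >= p_target:
            break
        p = nudge(p, instr, rate)
    return p
```

Mutation uses the same `nudge` with a mutation-sized `rate` on a randomly chosen instruction, which is why normalisation is listed as a separate final step.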

PPT Pruning • Prune if any p_{d,w}(i) > T_P • T_P = 0.9

Summary: PIPE • PBIL-like algorithm for evolving programs • Probabilistic prototype tree • Variable length • PBIL update rule • Results better than GP • Many user-defined constants • Effects are not understood

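Example 1: Curve fitting • sin(x) can be expanded as the standard Taylor series: sin(x) = x - x^3/3! + x^5/5! - x^7/7! + …, for x ∈ R • The operator set can therefore be chosen as • Design a fitness function to compute the fitness of each individual (in this experiment, the fitness is the sum of absolute errors between the expected output and the model output)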
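The Taylor expansion and the sum-of-absolute-errors fitness can be checked numerically with a short sketch. The sample points `XS` and the function names are assumptions made for this illustration; the slides do not state which inputs were used in the experiment.

```python
import math

def taylor_sin(x, terms=4):
    """Partial sum x - x^3/3! + x^5/5! - ... with the given number of terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def sum_abs_error(model, xs):
    """Fitness as on the slide: sum of |expected - model output| (lower is better)."""
    return sum(abs(math.sin(x) - model(x)) for x in xs)

XS = [i * 0.1 for i in range(-10, 11)]   # assumed sample points in [-1, 1]
```

On these points the four-term partial sum already has a far smaller error than the one-term model f(x) = x, which is the kind of difference the fitness function is meant to reward.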

Parameter Setting

Result • Curve fitting of the sine function
