Evolution of Numeric Constants in Genetic Programming Florida Atlantic University Thomas Fernandez
Genetic Programming (GP) • How can computers solve problems without being explicitly programmed? • GP is a domain independent method for inducing programs by searching the space of S-expressions.
Motivation • One of the weaknesses of GP is the difficulty it suffers in discovering useful numeric constants for the terminal nodes of the S-expression trees. • The goal of this work is to improve the effectiveness of GP by creating hybrid GP systems that use hill climbing and other local search techniques to improve these numeric constants.
Contributions • A cross-platform, object-oriented, multi-processor-enabled GP/GA system implemented in C++. • A simple experimental design useful for analyzing different GP parameters, strategies and enhancements.
Contributions (continued) • Three hybrid GP systems using local search algorithms • Multi-Dimensional Hill Climbing (MDHC) • Vector Hill Climbing (VHC) • Numeric Mutation (NM)
Contributions (continued) • We analyze the performance of each of the three hybrid systems when applied to three problems of differing difficulty. • We discuss future directions for hybrid GP systems.
Charles Darwin’s principle of Natural Selection • Individuals having any advantage, however slight, over others, have the best chance of surviving and of procreating their kind. • Over time, the population as a whole will gradually evolve to have more combinations of these good attributes. • GP simulates this process of natural selection in a computer. • “…from so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved.”
Genetic Algorithms (GA) • Forerunner of GP • Evolve binary strings that represent the solutions to problems • Start with a population of random strings • During each generation a new population is formed. • Hopefully the solutions will improve.
Three steps to setting up a GA • 1) Devise a binary encoding representing the potential solutions to a problem. • 2) Define a fitness function. • 3) Set control parameters. • population size • maximum generations • probability of mutation and crossover • others
Running a GA • Generating an initial population of random binary strings • Create next generation • Randomly select elements for direct inclusion • Randomly select pairs of elements for mating • Offspring will have a combination of randomly selected parts of the binary strings of both parents • Select some elements for mutation. • Typically one or two random bits will be flipped
Running a GA (continued) • Repeatedly create new generations. • Terminate when an acceptable solution has been found or when the specified maximum number of generations is reached.
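The GA loop described on the last two slides can be summarized in a short C++ sketch. This is illustrative only: it uses a simple tournament selection, one-point crossover, and a per-bit mutation probability, and the placeholder fitness function (counting 1 bits), population size, and other parameters are assumptions rather than values from this work.

#include <algorithm>
#include <random>
#include <vector>

using Individual = std::vector<int>;              // a binary string stored as 0/1 values

double evaluate(const Individual& ind) {          // placeholder fitness: number of 1 bits
    return static_cast<double>(std::count(ind.begin(), ind.end(), 1));
}

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> bit(0, 1);
    std::uniform_real_distribution<double> coin(0.0, 1.0);

    const int popSize = 50, bits = 32, maxGenerations = 100;   // control parameters (assumed)
    const double pCrossover = 0.9, pMutation = 0.01;

    // Step 1: generate an initial population of random binary strings.
    std::vector<Individual> pop(popSize, Individual(bits));
    for (auto& ind : pop)
        for (auto& b : ind) b = bit(rng);

    std::uniform_int_distribution<int> pick(0, popSize - 1);
    std::uniform_int_distribution<int> cut(1, bits - 1);

    // Step 2: repeatedly create new generations.
    for (int gen = 0; gen < maxGenerations; ++gen) {
        std::vector<Individual> next;
        auto selectParent = [&]() {                // simple 2-way tournament selection
            const Individual& a = pop[pick(rng)];
            const Individual& b = pop[pick(rng)];
            return evaluate(a) > evaluate(b) ? a : b;
        };
        while (static_cast<int>(next.size()) < popSize) {
            Individual child = selectParent();
            if (coin(rng) < pCrossover) {          // mate: one-point crossover with a second parent
                Individual other = selectParent();
                int c = cut(rng);
                std::copy(other.begin() + c, other.end(), child.begin() + c);
            }
            for (auto& b : child)                  // mutation: flip occasional bits
                if (coin(rng) < pMutation) b = 1 - b;
            next.push_back(child);
        }
        pop = std::move(next);
        // Termination on an acceptable solution would be checked here.
    }
    return 0;
}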
Genetic Programming Elements • The individual elements in the GA's population are binary strings. The elements in the population of a GP are trees. • This is an example of an S-expression tree: (sqrt ((a + b) / 2.0)) • terminal set = {a, b, c, 0, 1, 2} • function set = {+, -, *, /, SQRT}
Representation of a Program Structure as an S-expression Tree
float treeFunc(float a) {
    if (a > 10.0) {
        return 20.0;
    } else {
        return a / 2.0;
    }
}
Looping constructs and subroutine calls are also possible.
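As a concrete illustration of how such trees can be represented, here is a minimal C++ sketch of an S-expression node and its recursive evaluation. The data structure is an assumption made for illustration, not the representation used by the system described in this work; division is "protected" (a common GP convention) so that the function set has closure.

#include <cmath>
#include <memory>
#include <vector>

struct Node {
    // A node is either a function (with children) or a terminal:
    // a variable (identified by index) or a numeric constant.
    enum class Kind { Add, Sub, Mul, Div, Sqrt, Var, Const } kind;
    double value = 0.0;    // used when kind == Const
    int varIndex = 0;      // used when kind == Var
    std::vector<std::unique_ptr<Node>> children;
};

// Recursively evaluate an S-expression tree for a given set of variable values.
double eval(const Node& n, const std::vector<double>& vars) {
    switch (n.kind) {
        case Node::Kind::Const: return n.value;
        case Node::Kind::Var:   return vars[n.varIndex];
        case Node::Kind::Add:   return eval(*n.children[0], vars) + eval(*n.children[1], vars);
        case Node::Kind::Sub:   return eval(*n.children[0], vars) - eval(*n.children[1], vars);
        case Node::Kind::Mul:   return eval(*n.children[0], vars) * eval(*n.children[1], vars);
        case Node::Kind::Div: {
            double d = eval(*n.children[1], vars);
            return d == 0.0 ? 1.0 : eval(*n.children[0], vars) / d;   // protected division
        }
        case Node::Kind::Sqrt:  return std::sqrt(std::fabs(eval(*n.children[0], vars)));
    }
    return 0.0;
}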
Three steps to setting up a GP • Define an appropriate set of functions and terminals. • Functions must have closure. • Functions and terminals must be sufficient. • Define a fitness function. • Set control parameters. • population size (as in a GA) • maximum size or depth of the individual trees • size and shape of the original trees • others
Running a GP • Generating an initial population of random S-expression trees. • Create next generation • Randomly select elements for direct inclusion • Randomly select pairs of elements for mating • Different from GA. • Select some elements for mutation. • Also different from GA.
Running a GP (continued) • Repeatedly create new generations. • Terminate when an acceptable solution has been found or when the specified maximum number of generations is reached. • The termination criterion is based on the number of hits, where a hit is defined as the successful completion of some subgoal.
Mutation with GP • Elements that are selected for mutation will have some randomly selected node (and any subtree under it) replaced with a randomly generated subtree. • Often mutation is not used in the GP process. • New research has indicated that it may be beneficial.
Early work with numeric constants • Arithmetic genesis • A terminal variable can be divided by itself in an S-expression, resulting in the number 1.0. • 2.0 can evolve via an S-expression that adds 1.0 to itself. • 0.5 can evolve via an S-expression that divides 1.0 by 2.0. • An arbitrary number of constants can be created this way. • Arithmetic combination • A number of numeric constants can be included in the terminal set. • They can be combined in S-expressions. • Any subtree with only numeric constants for terminal nodes can itself be thought of as a numeric constant node.
Arithmetic Genesis of Numeric Constants • [Tree diagrams of S-expressions that evaluate to 1.0, 2.0, and 0.5]
Arithmetic Combination • Terminal Set = {x, y, 0, 1, 2, 3} • [Tree diagram of a constant-only subtree that evaluates to -1.5]
The Ephemeral Random Constant • Each time the ephemeral random constant is selected as a terminal in the creation of the initial population, it is replaced with a randomly generated number within some specified range. • Even with the use of the ephemeral random constant and/or the presence of predefined constants in the terminal set, GP still has difficulty generating sufficient numeric constants.
The Ephemeral Random Constant • Terminal Set = {a, 1, 2, R} • R is replaced with random numbers
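A minimal sketch of how R might be instantiated follows; the range shown is an assumption, since the original only says "some specified range".

#include <random>

// Each time R is chosen as a terminal while the initial population is built,
// it is replaced by a value drawn once, uniformly at random, from the range.
double makeEphemeralConstant(std::mt19937& rng, double lo = -1.0, double hi = 1.0) {
    std::uniform_real_distribution<double> dist(lo, hi);
    return dist(rng);
}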
Numeric constants are a problem for GP • A problem consisting of discovering just a single numeric constant required 14 generations and produced an S-expression covering almost half a page. • John Koza, the inventor of GP, said: “The finding of numeric constants is a skeleton in the GP closet... [and an] area of research that requires more investigation.”
A Simple Symbolic Regression Problem • We will try to evolve a function which passes through eleven given target points. The target points all lie on the curve defined by the function y = x² + 3.141592654. • [Plot of the eleven target points]
A More Difficult Symbolic Regression Problem • We will again try to evolve a function which passes through eleven given target points. This time the target points all lie on the curve defined by the function y = x³ - 0.3x² - 0.4x - 0.6. • [Plot of the eleven target points]
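A typical fitness measure for this kind of problem is the sum of absolute errors of a candidate function over the eleven target points. The sketch below uses that measure with eleven equally spaced points on [-1, 1]; both the error measure and the choice of x values are assumptions made for illustration, not the exact setup of these experiments.

#include <cmath>
#include <functional>

// Sum of absolute errors over eleven target points taken from
// y = x^3 - 0.3x^2 - 0.4x - 0.6; lower is better. A "hit" could be counted
// whenever the error at a point falls below some small tolerance.
double regressionError(const std::function<double(double)>& candidate) {
    double total = 0.0;
    for (int i = 0; i <= 10; ++i) {
        double x = -1.0 + 0.2 * i;   // eleven equally spaced points (assumed)
        double target = x * x * x - 0.3 * x * x - 0.4 * x - 0.6;
        total += std::fabs(candidate(x) - target);
    }
    return total;
}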
Financial Symbolic Regression Problem • Our third problem is taken from the domain of financial analysis. The goal is to do symbolic regression, where the target points are a financial time series. • In this case we are using a target time series that is derived from the daily closing prices of the S&P 500 for the years 1994 and 1995. • Instead of using just one independent variable as in the last two problems, we use 33 independent variables taken from time series that are derived from the S&P 500 itself and from the daily closing prices of 32 Fidelity Select mutual funds.
Preprocessing the Financial Data • The first step in preprocessing the data is to take the 21-day moving average. • This reduces the effect of day-to-day fluctuations. • The second step is to take the percent change in the financial time series between the current day and 21 days prior. • This normalizes the data within each series and between different series. • The target data points are taken from the S&P 500 and preprocessed in the same way, but are also shifted 10 days into the past. • The result is that the system is trying to predict the general trend of this time series two weeks into the future.
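The three preprocessing steps can be sketched as follows; this is illustrative only, and details such as how the shortened series are aligned afterwards are assumptions.

#include <vector>

// Step 1: 21-day simple moving average (days before a full window exists are dropped).
std::vector<double> movingAverage21(const std::vector<double>& series) {
    std::vector<double> out;
    for (size_t i = 20; i < series.size(); ++i) {
        double sum = 0.0;
        for (size_t j = i - 20; j <= i; ++j) sum += series[j];
        out.push_back(sum / 21.0);
    }
    return out;
}

// Step 2: percent change between the current day and 21 days prior.
std::vector<double> percentChange21(const std::vector<double>& series) {
    std::vector<double> out;
    for (size_t i = 21; i < series.size(); ++i)
        out.push_back(100.0 * (series[i] - series[i - 21]) / series[i - 21]);
    return out;
}

// Step 3 (target only): shift the preprocessed S&P 500 series 10 days into the
// past, so the value the system must predict lies roughly two weeks ahead.
std::vector<double> shiftTargetBack(const std::vector<double>& series, int days = 10) {
    return std::vector<double>(series.begin() + days, series.end());
}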
The Target Data for the Financial Problem • The top line in the graph is the daily closing price of the S&P 500. The solid line below it is the graph of the target time series after the preprocessing. • The dotted line is a function evolved using GP. It is included here only as an example to illustrate that the criterion for success does not require a great deal of accuracy.
The Example Evolved Function • y = (((0.38)-((-0.20923)-(FSPTX-(((-0.79706)/(0.38))*((FSUTX-FSCSX)*(FSCGX-(-0.34247)))))))*(SPX*((0.82794)/(0.54431)))) • The independent variables used by this evolved function are derived from the following time series. • FSPTX Fidelity Select Technology Portfolio • FSUTX Fidelity Select Utility Portfolio • FSCSX Fidelity Select Software Portfolio • FSCGX Fidelity Select Capital Goods Portfolio • SPX S&P 500 Index
Related Work • Adaptive Restrictive Tournament Selection and a Local Hill Climbing hybrid for the Identification of Multiple Good Design Solutions • R. Roy and I.C. Parmee • Application of a Hybrid Genetic Algorithm to Airline Crew Scheduling • David Levine • Hybridized Crossover-Based Search Techniques for Program Discovery • Una-May O’Reilly and Franz Oppacher • Exploring Alternative Operators and Search Strategies in Genetic Programming • Kim Harries and Peter Smith
Overview of Local Search • GP is adept at finding the general form of an S-expression, but less efficient at determining the appropriate numeric constants. • The three techniques for assisting GP in finding numeric constants are: • Multi-Dimensional Hill Climbing (MDHC) • Vector Hill Climbing (VHC) • Numeric Mutation (NM)
Features That They Have in Common • Selection of Individuals for Local Search • SelectionCount elements are randomly selected for the operation from the top SelectionGroup elements • Control of the Temperature Factor • temperature factor = bestElementFitnessScore × TemperatureScoreParm • temperature factor = ((maxHits - hitsFromBestElement) / maxHits) × TemperatureHitsParm
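Both temperature formulas and the selection step translate directly into code. The short sketch below follows the parameter names on this slide; everything else (the best-first ordering of the population, the return of raw indices) is an assumption made for illustration.

#include <random>
#include <vector>

// Temperature factor based on the best element's fitness score.
double temperatureFromScore(double bestElementFitnessScore, double temperatureScoreParm) {
    return bestElementFitnessScore * temperatureScoreParm;
}

// Temperature factor based on how many hits the best element is still missing.
double temperatureFromHits(int hitsFromBestElement, int maxHits, double temperatureHitsParm) {
    return (static_cast<double>(maxHits - hitsFromBestElement) / maxHits) * temperatureHitsParm;
}

// Randomly pick SelectionCount indices from the top SelectionGroup elements
// (assumes the population is already sorted best-first).
std::vector<int> selectForLocalSearch(int selectionCount, int selectionGroup, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, selectionGroup - 1);
    std::vector<int> chosen;
    for (int i = 0; i < selectionCount; ++i) chosen.push_back(pick(rng));
    return chosen;
}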
Multi-Dimensional Hill Climbing • For each selected numeric constant a delta value is randomly selected between zero and the current temperature factor. • For each selected constant three values will be considered: the original value, the original value plus the delta, and the original value minus the delta. • All combinations of these three values across the selected constants are then evaluated using the fitness function. • The best of these combinations is then selected to replace the original constants.
Multi-Dimensional Hill Climbing Example • Select only two constants, so the hypercuboid has two dimensions and is a rectangle. • The original values of the constants are 10.0 and 20.0. • The temperature factor is currently 4.0. • Deltas for the two constants will be randomly selected from 0.0 to 4.0. Let us say that the delta for the first constant is 3.0 and the delta for the second constant is 1.0. • The drawing below shows all combinations of the original values plus and minus the deltas and the unchanged original values. • The best of all these combinations would replace the original values.
Multi-Dimensional Hill Climbing (3D) • Each point represents an S-expression with the same form but different constants.
The Problem with MDHC • Unless HillClimbConstantCount is very small, multi-dimensional hill climbing will require many calls to the fitness function. • The number of calls to the fitness function is calculated by raising 3 to the power of the number of numeric constants selected (HillClimbConstantCount) and then multiplying this by the number of elements that are selected. • For example, if 5 constants are selected and the operation is applied to 10 elements, 2430 (3⁵ × 10) calls to the fitness function are required. • For this reason, in our experiments we only applied MDHC to the highest scoring element. We also limited the number of constants selected to a maximum of 4.
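One way the MDHC step could be written is sketched below. It assumes that the selected constants have been extracted from the tree into a vector, that higher fitness scores are better, and that a fitness callback re-evaluates the S-expression with trial constants; none of these details are taken from the original implementation.

#include <cmath>
#include <functional>
#include <random>
#include <vector>

std::vector<double> mdhcStep(const std::vector<double>& constants,
                             double temperature,
                             const std::function<double(const std::vector<double>&)>& fitness,
                             std::mt19937& rng) {
    const size_t k = constants.size();           // HillClimbConstantCount
    std::uniform_real_distribution<double> deltaDist(0.0, temperature);

    // One random delta per selected constant, between zero and the temperature factor.
    std::vector<double> delta(k);
    for (auto& d : delta) d = deltaDist(rng);

    // Enumerate all 3^k combinations of {original, original + delta, original - delta}.
    std::vector<double> best = constants;
    double bestScore = fitness(constants);
    const long combos = static_cast<long>(std::pow(3.0, static_cast<double>(k)));
    for (long c = 0; c < combos; ++c) {
        std::vector<double> trial = constants;
        long code = c;
        for (size_t i = 0; i < k; ++i) {
            int choice = static_cast<int>(code % 3);   // 0: unchanged, 1: +delta, 2: -delta
            code /= 3;
            if (choice == 1) trial[i] += delta[i];
            else if (choice == 2) trial[i] -= delta[i];
        }
        double score = fitness(trial);
        if (score > bestScore) { bestScore = score; best = trial; }
    }
    return best;   // the winning combination replaces the original constants
}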
Vector Hill Climbing • For each numeric constant, a very small epsilon is added to it and the fitness function is applied. The change in the fitness score is called the Delta for that constant. • All of the Deltas together are called the Delta Vector. • The Delta Vector is normalized by dividing each value in it by the largest absolute value in the vector. • A new position in the search space is determined by multiplying the Delta Vector by a temperature factor and adding it to the current constants. • If the fitness score of the new position is better than that of the original position, all constants are replaced by the constants from the new position, and this is then considered the current position in the search space. • The Delta Vector is then repeatedly added to the current position until the fitness function shows no further improvement or a maximum of 100 such additions is reached.
Vector Hill Climbing Example • Let us consider a simple two-dimensional example of an S-expression with two constants, 50.0 and 12.5, where the current fitness score is 100 and the temperature factor is 2.0. • We add a small epsilon, say 0.01, to 50.0 and obtain a fitness score of 103.0, so the delta for 50.0 is 3.0. We restore the value to 50.0 and add the epsilon to 12.5; this time we get a fitness score of 98.5, so the delta for 12.5 is -1.5. • The delta vector (3, -1.5) is then normalized by dividing it by the largest absolute value in it, resulting in (1, -0.5). We can now determine the new position by multiplying the delta vector by the temperature factor (2.0) and adding it to the current position. • The delta vector times the temperature equals (2, -1). When we add this to the current position, our new position is (52.0, 11.5). • The fitness function is called after replacing the constants with 52.0 and 11.5. If the fitness score is better than 100.0, we keep the new constants. • Then the delta vector is added to the constants again. This is repeated up to 100 times or until the fitness score shows no further improvement.
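The same procedure can be followed in code. The sketch below makes the same assumptions as the MDHC sketch (constants in a vector, higher scores are better, a fitness callback) and, after the first temperature-scaled move, keeps adding the same scaled vector, which is one reasonable reading of the description above rather than the definitive implementation.

#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

std::vector<double> vhcStep(std::vector<double> constants,
                            double temperature,
                            const std::function<double(const std::vector<double>&)>& fitness) {
    const double epsilon = 0.01;                 // small probe used to measure each delta
    const size_t k = constants.size();
    double baseScore = fitness(constants);

    // Build the delta vector: change in fitness when epsilon is added to each constant.
    std::vector<double> delta(k);
    double maxAbs = 0.0;
    for (size_t i = 0; i < k; ++i) {
        std::vector<double> probe = constants;
        probe[i] += epsilon;
        delta[i] = fitness(probe) - baseScore;
        maxAbs = std::max(maxAbs, std::fabs(delta[i]));
    }
    if (maxAbs == 0.0) return constants;         // flat landscape, nothing to climb

    // Normalize by the largest absolute value, then scale by the temperature factor.
    std::vector<double> step(k);
    for (size_t i = 0; i < k; ++i) step[i] = (delta[i] / maxAbs) * temperature;

    // Repeatedly move along the vector while the fitness keeps improving (at most 100 moves).
    double bestScore = baseScore;
    for (int iter = 0; iter < 100; ++iter) {
        std::vector<double> next = constants;
        for (size_t i = 0; i < k; ++i) next[i] += step[i];
        double score = fitness(next);
        if (score <= bestScore) break;           // no further improvement
        constants = next;
        bestScore = score;
    }
    return constants;
}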
Numeric Mutation • NM replaces all of the numeric constants with new numeric constants, chosen at random from a uniform distribution within a specific selection range. • The selection range for each numeric constant is the old value of that constant plus or minus the temperature factor. • NM is the simplest of the three techniques described here. • NM does not attempt to verify an improvement in the fitness score, and so it requires only one call to the fitness function. • It is therefore feasible to apply the method to many elements during each generation of the GP.
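Because NM simply perturbs every constant, its core is only a few lines; as before, the vector-of-constants interface is an assumption made for illustration.

#include <random>
#include <vector>

// Replace every numeric constant with a value drawn uniformly from
// [old - temperature, old + temperature]; no fitness check is made here.
void numericMutation(std::vector<double>& constants, double temperature, std::mt19937& rng) {
    for (auto& c : constants) {
        std::uniform_real_distribution<double> dist(c - temperature, c + temperature);
        c = dist(rng);
    }
}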
Measuring GP Systems • Can these three local search techniques improve the GP process? • We need a means for measuring the effectiveness of the hybrid and non-hybrid GP systems. • A GP run is not always successful. We can estimate the success ratio for a GP system by making repeated runs and dividing the number of successful runs by the total number of runs.
The Experimental Design • Using the three problems we described, we will compare the success ratios of the three hybrid GP systems with the success ratios of a non-hybrid GP system. • We generally make multiple sets of runs with the hybrid systems, using different values for some of the parameters. • To determine whether the differences are statistically significant at the 95% confidence level, we use the two-tailed large-sample statistical test for comparing two binomial proportions.
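For reference, the standard large-sample test for two binomial proportions pools the two success ratios and compares the resulting z statistic against 1.96 for a two-tailed test at the 95% level; whether this sketch matches every detail of the procedure used in the experiments is an assumption.

#include <cmath>

// x1 successes out of n1 runs versus x2 successes out of n2 runs.
// Returns true if the two success ratios differ significantly (|z| > 1.96).
bool successRatiosDiffer(int x1, int n1, int x2, int n2) {
    double p1 = static_cast<double>(x1) / n1;
    double p2 = static_cast<double>(x2) / n2;
    double pooled = static_cast<double>(x1 + x2) / (n1 + n2);    // pooled proportion
    double se = std::sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2));
    double z = (p1 - p2) / se;
    return std::fabs(z) > 1.96;
}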
MDHC and Simple Symbolic Regression Success Ratios