The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
Kim Kaminsky, Kaminsky@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html
The 2006 IEEE International Conference on Information Reuse and Integration
About the Author: Gary D. Boetticher
• Ph.D. in Machine Learning and Software Engineering: A neural network-based software reuse economic model
• Executive member of IEEE Reuse Standard Committees (1990s)
• Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
• Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA, boetticher@uhcl.edu
• Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
• Does chromosome lineage information within a Genetic Program (GP) provide any insight into how effectively the GP solves problems?
• If so, how could these insights be used to make better breeding decisions?
Genetic Program Overview
Inputs: X, Y, and Z. Target: RESULT?
Example equations: X + Y, (Z - X) + Y, (Z - X) * Y + X, X * Y + X
1) Create a population of equations
2) Determine the fitness of each equation (1 / standard error)
3) Breed equations
4) Generate new populations and breed until a solution is found
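The four steps above can be sketched as a minimal Python GP. Everything concrete here is an assumption for illustration, not the authors' implementation: equations are nested tuples over X, Y, and Z using only +, -, and *; "standard error" is taken to be the root-mean-square error; selection is simple truncation with the best equation carried over; and the population (200) and generation (20) counts are far smaller than the 1000 and 50 used in the experiments.

```python
import math
import random

# Hypothetical operator set and terminals for this sketch (the slides do not list them).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["X", "Y", "Z"]


def random_tree(depth, rng):
    """Step 1: grow a random equation, stored as nested (op, left, right) tuples."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, row):
    """Compute an equation's value for one row of inputs, e.g. {"X": 1, "Y": 2, "Z": 3}."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def fitness(tree, rows, targets):
    """Step 2: fitness = 1 / standard error (assumed here to be the RMS error)."""
    sse = sum((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets))
    return 1.0 / (math.sqrt(sse / len(rows)) + 1e-12)  # epsilon guards a perfect fit


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices 1 or 2."""
    yield path, tree
    if not isinstance(tree, str):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(mom, dad, rng, max_nodes=31):
    """Step 3: graft a random subtree of dad onto a random point of mom."""
    path, _ = rng.choice(list(subtrees(mom)))
    _, donor = rng.choice(list(subtrees(dad)))
    child = replace_at(mom, path, donor)
    return child if len(list(subtrees(child))) <= max_nodes else mom  # crude bloat cap


def run_gp(rows, targets, pop_size=200, generations=20, seed=0):
    """Step 4: breed new generations until a near-exact fit or the budget runs out."""
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, rows, targets), reverse=True)
        if fitness(ranked[0], rows, targets) > 1e6:  # treat a tiny error as "solution found"
            break
        parents = ranked[: pop_size // 2]  # truncation selection, chosen for brevity
        children = [crossover(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pop_size - 1)]
        population = [ranked[0]] + children  # keep the current best unchanged
    return max(population, key=lambda t: fitness(t, rows, targets))


# Usage: try to recover RESULT = X * Y + X from a small grid of inputs.
rows = [{"X": x, "Y": y, "Z": z} for x in range(1, 4) for y in range(1, 4) for z in range(1, 4)]
targets = [r["X"] * r["Y"] + r["X"] for r in rows]
best = run_gp(rows, targets)
print(best, fitness(best, rows, targets))
```

Carrying the best equation forward unchanged guarantees the best fitness never drops between generations, which keeps this small example well behaved.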
Genetic Program Overview (continued)
[Figure: Generation N to Generation N+1]
A conventional GP keeps only the newest generation. Why discard legacy information?
Goal: Examine fitness patterns over time
[Figure: fitness across Generation 1, Generation 2, and Generation 3]
Are the patterns localized? Volatile?
Proof of Concept Experiments - 1
5 experiments using synthetic equations:
• Z = W + X + Y
• Z = 2 * X + Y - W
• Z = X / Y
• Z = X^3
• Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program: 1000 chromosomes (equations), 50 generations, breeding based on fitness rank
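A sketch of the data setup and rank-based breeding in Python, under stated assumptions: the slides say only that the data were "slightly perturbed" and that breeding was "based on fitness rank", so the 1% multiplicative noise, the [1, 10] sampling range, and the linear rank weights below are illustrative choices, not the authors' settings. Sampling Y from [1, 10] also keeps Z = X / Y well defined.

```python
import random

# The five synthetic targets listed on the slide, written as functions of (W, X, Y).
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}


def make_dataset(target, n=100, noise=0.01, seed=0):
    """Sample W, X, Y uniformly from [1, 10] and perturb each Z by at most +/-noise (relative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w, x, y = rng.uniform(1, 10), rng.uniform(1, 10), rng.uniform(1, 10)
        z = target(w, x, y) * (1 + rng.uniform(-noise, noise))
        rows.append((w, x, y, z))
    return rows


def rank_select(population, fitnesses, rng):
    """Linear rank selection: the worst chromosome gets weight 1, the best gets len(population)."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights, k=1)[0]


data = make_dataset(TARGETS["Z = X / Y"])
rng = random.Random(1)
picks = [rank_select(["worst", "middle", "best"], [0.1, 0.5, 0.9], rng) for _ in range(3000)]
print(picks.count("best"), picks.count("worst"))  # weights are 1 (worst), 2 (middle), 3 (best)
```

Selecting by rank rather than by raw fitness keeps one unusually fit equation from dominating the mating pool, which fits the slide's aim of avoiding premature convergence.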
Proof of Concept Experiments - 2
For the 1000 chromosomes:
• Divide into 5 groups of 200 (by fitness)
• Focus on the best, middle, and worst groups
• See where each group's offspring land in the next generation
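The grouping procedure above might be implemented roughly as follows. This sketch assumes each offspring's parent indices were recorded when it was bred; the slides do not say how a child whose parents come from two different groups is counted, so here it is credited once to each distinct parent group. The helper names are hypothetical.

```python
from collections import Counter


def fitness_groups(fitnesses, n_groups=5):
    """Map each chromosome index to a fitness group: 0 = best, n_groups - 1 = worst."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    size = len(fitnesses) // n_groups  # 1000 // 5 = 200 chromosomes per group
    return {idx: min(rank // size, n_groups - 1) for rank, idx in enumerate(order)}


def offspring_placement(parent_fitness, child_fitness, child_parents, n_groups=5):
    """Count, per parent group, the next-generation groups its offspring land in.

    child_parents[j] holds the parent-generation indices of child j's parents; a child
    is counted once for each distinct group its parents come from (an assumption)."""
    parent_group = fitness_groups(parent_fitness, n_groups)
    child_group = fitness_groups(child_fitness, n_groups)
    table = {g: Counter() for g in range(n_groups)}
    for child, parents in enumerate(child_parents):
        for g in {parent_group[p] for p in parents}:
            table[g][child_group[child]] += 1
    return table


# Toy check with 10 chromosomes (5 groups of 2): each pair of parents breeds two children.
parents_fit = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
children_fit = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
lineage = [(0, 1), (0, 1), (2, 3), (2, 3), (4, 5), (4, 5), (6, 7), (6, 7), (8, 9), (8, 9)]
table = offspring_placement(parents_fit, children_fit, lineage)
for name, g in (("best", 0), ("middle", 2), ("worst", 4)):
    print(name, dict(table[g]))
```

Running this per generation and summing the counters gives, for the best, middle, and worst groups, a distribution of where their offspring end up.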
Results for Z = W + X + Y
[Figure panels: Best, Middle, Worst]
Results for Z = 2 * X + Y - W
[Figure panels: Best, Middle, Worst]
Results for Z = X / Y
[Figure panels: Best, Middle, Worst]
Results for Z = X^3
[Figure panels: Best, Middle, Worst]
Results for Z = W^2 + W * X - Y
[Figure panels: Best, Middle, Worst]
Applied Experiments
The best class produces the best offspring. Now what?
Compare 2 Genetic Programs (GPs):
1) Use a vanilla GP
2) Use a GP that breeds only the top 20% of a population and replicates them 5 times
Equations to model:
• Z = Sin(W) + Sin(X) + Sin(Y)
• Z = log10(W^X) + (Y * Z)
Genetic Program: 1000 chromosomes (equations), 50 generations, 20 trials
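One way to read the two breeding strategies, as a hedged Python sketch: the vanilla GP draws mates from the whole population, while the lineage-informed GP draws them only from the top 20%, copied five times so the mating pool keeps the original population size. This reading of "replicates 5 times", the pairing scheme, and the toy numeric "chromosomes" with an averaging crossover are assumptions, not details from the slides.

```python
import random


def vanilla_pool(population, fitnesses):
    """Baseline GP: every chromosome is eligible to breed."""
    return list(population)


def top_fraction_pool(population, fitnesses, fraction=0.2):
    """Keep only the top `fraction` by fitness and replicate it to refill the pool.

    With fraction = 0.2 each elite appears round(1 / 0.2) = 5 times, so a
    1000-member population yields a 1000-slot pool built from its best 200."""
    n = len(population)
    keep = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: fitnesses[i], reverse=True)
    elite = [population[i] for i in ranked[:keep]]
    return elite * round(1 / fraction)


def next_generation(pool, crossover, rng):
    """Shuffle the mating pool and breed consecutive pairs, two children per pair."""
    pool = list(pool)
    rng.shuffle(pool)
    children = []
    for a, b in zip(pool[0::2], pool[1::2]):
        children.append(crossover(a, b, rng))
        children.append(crossover(b, a, rng))
    return children


def average(a, b, rng):
    """Hypothetical stand-in crossover for numeric toy chromosomes."""
    return (a + b) / 2


# Toy usage: "chromosomes" are numbers and fitness is the number itself.
population = list(range(1000))
fitnesses = [float(c) for c in population]
rng = random.Random(0)
elite_children = next_generation(top_fraction_pool(population, fitnesses), average, rng)
vanilla_children = next_generation(vanilla_pool(population, fitnesses), average, rng)
print(min(elite_children), min(vanilla_children))
```

Both strategies produce a full-size next generation; the only difference is which chromosomes are allowed to become parents, so the comparison isolates the effect of restricting breeding to the best class.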
Results for Z = Sin(W) + Sin(X) + Sin(Y)
Results for Z = log10(W^X) + (Y * Z)
Conclusions
• Proof-of-concept experiments demonstrate the viability of considering lineage in GPs
• Applied experiments show that lineage-based GP modeling produces better results faster