Genetic Programming. CSCE155 Fall 2004 Leen-Kiat Soh Department of Computer Science and Engineering University of Nebraska. Acknowledgments. The materials in this presentation are based on http://www.genetic-programming.org http://www.genetic-programming.com/gpanimatedtutorial.html.
Introduction • One of the central challenges of computer science is to get a computer to do what needs to be done without telling it how to do it • Genetic programming addresses this challenge by providing a method for automatically creating a working computer program from a high-level statement of the problem • This is also known as automatic programming (a.k.a. program synthesis or program induction)
Basic Steps • GP • A domain-independent method • Iteratively transforms a population of computer programs into a new generation of programs • Two sets of steps: • Preparatory steps • Executional steps
Preparatory Steps • The human user communicates the high-level statement of the problem to the genetic programming system by performing certain well-defined preparatory steps: • The set of terminals • The set of primitive functions • The fitness measure • Certain parameters for controlling the run • The termination criterion and method for designating the result of the run
Preparatory Steps • The first two preparatory steps specify the ingredients that are available to create the computer programs • A run of GP is a competitive search among a diverse population of programs composed of the available functions and terminals • [Figure labels: Terminal Set, Function Set, Fitness Measure, Parameters, Termination Criterion & Result Designation, GP, Computer Program]
Preparatory Steps: Terminal and Function Sets • The identification of the function set and terminal set for a particular problem is usually a straightforward process • The function set may consist of merely the arithmetic functions (+, -, *, /) and a conditional branching operator • The terminal set may consist of the program’s external inputs (independent variables) and numerical constants • Together, the two sets define the search space
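The function and terminal sets described above can be sketched in Python. This is only an illustration under stated assumptions: programs are represented as nested tuples, the names `FUNCTIONS`, `TERMINALS`, `protected_div`, `if_positive`, and `evaluate` are hypothetical, and returning 1.0 on division by zero is an assumed convention rather than something the slides specify.

```python
import operator

def protected_div(a, b):
    """Division that returns 1.0 when the divisor is zero (assumed convention)."""
    return a / b if b != 0 else 1.0

def if_positive(cond, then_val, else_val):
    """Conditional branching operator; both branches are evaluated eagerly here."""
    return then_val if cond > 0 else else_val

# Function set: name -> (callable, number of arguments)
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (protected_div, 2),
    "if>0": (if_positive, 3),
}

# Terminal set: one external input "x" plus a few numerical constants
TERMINALS = ["x", 1.0, 2.0]

def evaluate(tree, x):
    """Evaluate a program tree such as ("+", ("*", "x", "x"), 1.0) at input x."""
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return float(tree)
    name, *args = tree
    func, _arity = FUNCTIONS[name]
    return func(*(evaluate(arg, x) for arg in args))
```

Every tree that can be built from these two sets is a point in the search space; changing either set changes which programs GP can ever find.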
Preparatory Steps: Terminal and Function Sets • Robot mopping floor example • Function set: moving, turning, swishing the mop, etc. • Controller example • Function set: signal processing functions that operate on time-domain signals, including integrators, differentiators, leads, lags, gains, adders, subtractors, etc. • Terminal set: reference signal and plant output • Analog electrical circuit synthesis example • Function set: building transistors, capacitors, resistors, etc. • Terminal set: wire, a circuit’s placement and routing, etc.
Preparatory Steps: Fitness Measure • Specifies what needs to be done • The primary mechanism for communicating the high-level statement of the problem’s requirements to the GP system • E.g., if the goal is to get GP to automatically synthesize an amplifier, the fitness function is the mechanism for telling GP that a circuit which amplifies an incoming signal is rewarded • Defines the search’s desired goal
Preparatory Steps: Control Parameters • Specifies the control parameters for the run • Population size, probabilities of performing the genetic operations, the maximum size for programs, etc. • Defines the search’s administrative details
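One way to keep the control parameters together is a small configuration object. The sketch below is an assumption-laden illustration: the class name `GPParameters`, its field names, and every default value are placeholders chosen for the example, not settings prescribed by the presentation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class GPParameters:
    """Administrative settings for one run (all defaults are placeholders)."""
    population_size: int = 500
    max_generations: int = 50
    max_program_size: int = 100          # e.g. maximum number of nodes per program
    p_reproduction: float = 0.10
    p_crossover: float = 0.885
    p_mutation: float = 0.01
    p_architecture_altering: float = 0.005

    def __post_init__(self):
        # The operation probabilities must form a distribution.
        total = (self.p_reproduction + self.p_crossover
                 + self.p_mutation + self.p_architecture_altering)
        if not math.isclose(total, 1.0):
            raise ValueError(f"operation probabilities sum to {total}, expected 1.0")
        if self.population_size < 1 or self.max_generations < 0:
            raise ValueError("population_size must be >= 1 and max_generations >= 0")
```

Validating the probabilities at construction time catches a mistyped rate before a long run starts.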
Preparatory Steps: Termination • Specifies the termination criterion and the method of designating the result of the run • Termination criterion: a maximum number of generations to be run, a problem-specific success predicate, etc. • E.g., when the fitness values of numerous successive best-of-generation individuals appear to have reached a plateau • The single best-so-far individual is then harvested and designated as the result of the run • Defines the search’s administrative details
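The three kinds of termination criterion mentioned above (a generation limit, a success predicate, and a fitness plateau) can be combined in one check. This is a minimal sketch that assumes fitness is an error value where lower is better; the function name `should_terminate` and all of its thresholds are hypothetical.

```python
def should_terminate(best_history, max_generations=50, target_error=0.0,
                     plateau_window=10, tolerance=1e-9):
    """Decide whether to stop the run.

    best_history holds the best-of-generation error for each generation
    evaluated so far (lower is better).
    """
    if not best_history:
        return False
    # Criterion 1: maximum number of generations reached.
    if len(best_history) >= max_generations:
        return True
    # Criterion 2: problem-specific success predicate (error small enough).
    if best_history[-1] <= target_error:
        return True
    # Criterion 3: the best-of-generation error has stopped changing.
    if len(best_history) >= plateau_window:
        recent = best_history[-plateau_window:]
        if max(recent) - min(recent) <= tolerance:
            return True
    return False
```

When the check returns true, the result of the run is the individual with the lowest error seen in any generation, not necessarily one from the final generation.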
Executional Steps • GP typically • Starts with a population of randomly generated computer programs composed of the available programmatic ingredients (function and terminal sets) • Iteratively transforms a population of programs into a new generation of the population by applying analogs of naturally occurring genetic operations • Operations are applied to individual(s) selected from the population • Individual(s) are probabilistically selected to participate in the genetic operations based on their fitness
Executional Steps • Steps are: • Randomly create an initial population (generation 0) of individual computer programs composed of the available functions and terminals • Iteratively perform the “genetic evolution” sub-steps (called a generation) on the population until the termination criterion is satisfied • After the termination criterion is satisfied, harvest the single best program in the population produced during the run (the best-so-far individual) and designate it as the result of the run • If the run is successful, the result may be a solution (or approximate solution) to the problem
Executional Steps • “Genetic Evolution” steps are: • Execute each program in the population and ascertain its fitness using the problem’s fitness measure • Select one or two individual program(s) from the population with a probability based on fitness (with re-selection allowed) to participate in the genetic operations • Create new individual program(s) using genetic operations
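The executional steps above can be sketched as a generic generational loop. This is a skeleton under stated assumptions, not a full GP system: the caller supplies the problem-specific pieces, the names `run_gp`, `create`, `error_of`, `select`, and `breed` are hypothetical, and fitness is assumed to be an error where zero is perfect. The `toy_*` functions are stand-ins in which a "program" is just a number, included only so the loop can be exercised.

```python
import random

def run_gp(create, error_of, select, breed, pop_size, max_generations, rng):
    """Return (best_so_far, its_error) after running the generational loop."""
    population = [create(rng) for _ in range(pop_size)]       # generation 0
    best, best_error = None, float("inf")
    for generation in range(max_generations + 1):
        # Execute each program and ascertain its fitness (here: an error).
        errors = [error_of(ind) for ind in population]
        for ind, err in zip(population, errors):              # track best-so-far
            if err < best_error:
                best, best_error = ind, err
        # Termination: success predicate or generation limit.
        if best_error == 0.0 or generation == max_generations:
            break
        # Select parents by fitness and create offspring; the offspring
        # replace the old generation.
        population = [breed(select(population, errors, rng), rng)
                      for _ in range(pop_size)]
    return best, best_error

# Toy stand-ins: individuals are plain numbers, the target value is 3.0.
def toy_create(rng):
    return rng.uniform(-10.0, 10.0)

def toy_error(value):
    return abs(value - 3.0)

def toy_select(population, errors, rng):
    weights = [1.0 / (1.0 + e) for e in errors]                # lower error, higher weight
    return rng.choices(population, weights=weights, k=2)      # re-selection allowed

def toy_breed(parents, rng):
    return (parents[0] + parents[1]) / 2.0 + rng.gauss(0.0, 0.1)
```

A call such as `run_gp(toy_create, toy_error, toy_select, toy_breed, 20, 30, random.Random(0))` returns the best-so-far individual, which by construction is never worse than the best member of generation 0.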
Genetic Operations • Reproduction Operation • Simply allow the selected program to survive to the next generation without any changes • This reproduction is typically performed quite frequently (say, 10%-15% during each generation of the run)
Genetic Operations • Mutation Operation • Only one parental program is needed • A mutation point is randomly chosen in the selected program; the subtree rooted at that point is deleted, and a new subtree is grown using the same random growth process that was used to generate the initial population • This asexual mutation is typically performed sparingly (say, 1% during each generation of the run)
Genetic Operations • Crossover (Sexual Recombination) Operation • Two parental programs are needed • A crossover point is randomly chosen in the first parent and a crossover point is randomly chosen in the second parent. Then the subtree rooted at the crossover point of the first, or receiving, parent is deleted and replaced by the subtree from the second, or contributing, parent • Crossover is the predominant operation in GP (say, 85% to 90%)
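The subtree mutation and one-offspring crossover described above can be sketched on programs stored as nested tuples. This is an illustrative sketch, not the presentation's implementation: the tuple representation, the helper names (`grow`, `paths`, `subtree_at`, `replace_at`, `mutate`, `crossover`), the 0.3 chance of stopping growth early, and the depth limit are all assumptions.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2}    # name -> number of arguments
TERMINALS = ["x", 1.0]

def grow(rng, max_depth):
    """Randomly grow a tree; also usable to create the initial population."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(grow(rng, max_depth - 1) for _ in range(FUNCTIONS[name]))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def subtree_at(tree, path):
    """Return the subtree rooted at the given path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def mutate(tree, rng, max_depth=2):
    """Delete the subtree at a random point and grow a new one in its place."""
    point = rng.choice(list(paths(tree)))
    return replace_at(tree, point, grow(rng, max_depth))

def crossover(receiving, contributing, rng):
    """Replace a random subtree of the receiving parent with one from the contributor."""
    point = rng.choice(list(paths(receiving)))
    donor = subtree_at(contributing, rng.choice(list(paths(contributing))))
    return replace_at(receiving, point, donor)
```

Because every replacement is a whole subtree built from the same function and terminal sets, the offspring are always syntactically valid programs, and the parents (immutable tuples) are left unchanged.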
Genetic Operations • Architecture-Altering Operations • Based on gene duplication and gene deletion in nature • For problems involving computer programs: • Dynamically add and delete subroutines, arguments, iterations, loops, recursions, and memory, and also different hierarchical arrangements of these elements • Programs with architectures that are well-suited to the problem at hand will tend to grow and prosper in the competitive evolutionary process, while inadequate ones wither away • These operations are applied sparingly during the run (say, 0.5% to 1% on each generation)
Genetic Operations • Architecture-Altering Operations, Cont’d • Subroutine duplication • Duplicates a pre-existing subroutine in an individual program, gives a new name to the copy, and randomly divides the pre-existing calls to the old subroutine between the two • Broadens the hierarchy and may lead to later divergence of the two subroutines, sometimes yielding specialization • Argument duplication • Duplicates one argument of a subroutine, randomly divides internal references to it, and preserves overall program semantics by adjusting all calls to the subroutine • Enlarges the dimensionality of the subspace on which the subroutine operates
Genetic Operations • Architecture-Altering Operations, Cont’d • Subroutine creation • Creates a new subroutine from part of a main result-producing branch • Deepens the hierarchy of references in the overall program • Subroutine deletion • Deletes a pre-existing subroutine • Narrows or makes shallower the hierarchy of subroutines • Argument deletion • Deletes an argument from a subroutine • Reduces the amount of information available to the subroutine • Generalization
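Of the architecture-altering operations listed above, subroutine duplication is the simplest to sketch. The example assumes a program is a dictionary of named branches stored as nested tuples, with `"main"` as the result-producing branch; the names `duplicate_subroutine` and `retarget_calls` are hypothetical, and as a simplification only calls made from `"main"` are divided between the original and the copy.

```python
import random

def retarget_calls(tree, old, new, rng):
    """Copy tree, renaming each call to `old` into a call to `new` with probability 0.5."""
    if not isinstance(tree, tuple):
        return tree
    name = tree[0]
    if name == old and rng.random() < 0.5:
        name = new
    return (name,) + tuple(retarget_calls(child, old, new, rng) for child in tree[1:])

def duplicate_subroutine(program, old, new, rng):
    """Return a new program in which subroutine `old` also exists under the name `new`."""
    if old not in program or new in program:
        raise ValueError("old must exist and new must be an unused name")
    result = dict(program)
    result[new] = program[old]           # identical body; tuples are immutable
    result["main"] = retarget_calls(program["main"], old, new, rng)
    return result
```

Immediately after duplication the two subroutines have identical bodies, so the program computes the same result as before; they can only diverge through later mutation or crossover.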
Tidbits • Each individual program in the population is executed so that each can be measured in terms of how well it performs the task at hand • This translates into a single explicit numerical value, called fitness • E.g., the amount of error between an individual program’s output and the desired output, the amount of time, the accuracy, the number of lines, the payoff that a game-playing program produces, etc. • The creation of the initial random population is a blind random search of the search space of the problem • Typically, the individual programs in generation 0 all have exceedingly poor fitness; but some are (usually) more fit than others and are selected for the next generation
Tidbits • With probabilistic selection, better individuals are favored over inferior individuals • The best individual in the population is not necessarily selected • The worst individual in the population is not necessarily passed over • After each generation, the population of offspring replaces the now-old generation • All programs in the initial random population (generation 0) of a run of GP are syntactically valid, executable programs • The genetic operations that are performed are also designed to produce offspring that are syntactically valid, executable programs
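The selection behavior described above can be checked with a short simulation. This sketch assumes fitness is an error (lower is better) and converts it to a selection weight with 1 / (1 + error); that weighting, and the names `select` and `selection_counts`, are assumptions made for illustration rather than the scheme used in the presentation.

```python
import random
from collections import Counter

def select(population, errors, k, rng):
    """Pick k individuals with replacement; lower error gives a higher chance."""
    weights = [1.0 / (1.0 + e) for e in errors]
    return rng.choices(population, weights=weights, k=k)

def selection_counts(population, errors, draws, seed=0):
    """Count how often each individual is selected over many draws."""
    rng = random.Random(seed)
    return Counter(select(population, errors, draws, rng))
```

Running `selection_counts(["a", "b", "c", "d"], [0.67, 1.0, 1.67, 2.67], 10000)` shows the individual with the lowest error being picked most often while the one with the highest error is still picked some of the time, which is the sense in which better individuals are favored but not guaranteed.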
Example of a GP Run: Symbolic Regression of a Quadratic Polynomial • Goal: automatically create a computer program whose output is equal to the values of the quadratic polynomial x*x + x + 1 in the range from -1 to 1 • Preparatory Steps: • Terminal Set: independent variable x • Function Set: flexible, say: +, -, *, % (protected division) • Fitness measure: compare the result of an individual program with the result of x*x + x + 1 • A fitness (error) of zero would indicate a perfect fit
Example of a GP Run: Symbolic Regression of a Quadratic Polynomial • Executional Steps • Figure 1: Initial population of four randomly created individuals of generation 0
Example of a GP Run: Symbolic Regression of a Quadratic Polynomial • Executional Steps • Figure 2: The fitness of each of the four randomly created individuals of generation 0 is equal to the area between two curves (the individual’s output and the target polynomial): (a) 0.67, (b) 1.0, (c) 1.67, and (d) 2.67
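The area-between-curves fitness used in this example can be approximated numerically. The sketch below uses a midpoint rule over the range -1 to 1; the function names `target` and `error_area`, the step count, and the candidate program x + 1 used in the note afterwards are illustrative assumptions, since the individuals shown in the figures are not reproduced in the text.

```python
def target(x):
    """The quadratic polynomial that the evolved program should match."""
    return x * x + x + 1

def error_area(candidate, lo=-1.0, hi=1.0, steps=2000):
    """Approximate the area between candidate and target on [lo, hi].

    Uses the midpoint rule; an area of zero indicates a perfect fit.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += abs(candidate(x) - target(x)) * width
    return total
```

For a hypothetical candidate `lambda x: x + 1`, the difference from the target is x*x, whose integral over the range is 2/3, so `error_area` returns approximately 0.67; passing `target` itself returns 0.0.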
Example of a GP Run: Symbolic Regression of a Quadratic Polynomial • Executional Steps • Figure 3: Population of generation 1 (after one reproduction, one mutation, and one two-offspring crossover operation)
Human-Competitive Results • An automatically created result is “human-competitive” if it satisfies one or more of the eight criteria below: • (A) The result was patented as an invention in the past, is an improvement over a patented invention, or would qualify today as a patentable new invention • (B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal • (C) The result is equal to or better than a result that was placed into a database or archive of results maintained by an internationally recognized panel of scientific experts • (D) The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created
Human-Competitive Results • An automatically created result is “human-competitive” if it satisfies one or more of the eight criteria below, cont’d: • (E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions • (F) The result is equal to or better than a result that was considered an achievement in its field at the time it was first discovered • (G) The result solves a problem of indisputable difficulty in its field • (H) The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs)
36 Instances of GP-Generated Human-Competitive Results • 15 instances where GP has created an entity that either infringes or duplicates the functionality of a previously patented 20th-century invention • 6 instances where GP has done the same with respect to a 21st-century invention • 2 instances where GP has created a patentable new invention • Fields include • Computational molecular biology, cellular automata, sorting networks, and the synthesis of the design of both the topology and component sizing for complex structures, such as analog electrical circuits, controllers, and antennas
Web and Literature • The home page of Genetic Programming Inc. at www.genetic-programming.com • For information about the field of genetic programming in general, visit www.genetic-programming.org • The home page of John R. Koza at Genetic Programming Inc. (including online versions of most papers) and the home page of John R. Koza at Stanford University • Information about the 1992 book Genetic Programming: On the Programming of Computers by Means of Natural Selection, the 1994 book Genetic Programming II: Automatic Discovery of Reusable Programs, the 1999 book Genetic Programming III: Darwinian Invention and Problem Solving, and the 2003 book Genetic Programming IV: Routine Human-Competitive Machine Intelligence
Web and Literature • For information on 3,198 papers (many on-line) on genetic programming (as of June 27, 2003) by over 900 authors, see William Langdon’s bibliography on genetic programming • Information on the Genetic Programming and Evolvable Machines journal, published by Kluwer Academic Publishers • Important Conferences: • Genetic and Evolutionary Computation Conference (GECCO) • NASA/DoD Conference on Evolvable Hardware (EH) • Euro-Genetic-Programming Conference