380 likes | 504 Views
Genetic Programming. Dinesh Dharme Prateek Srivastav Pankhil Chheda Harshad Kanhere. Outline. Introduction Motivation Theory Symbolic Regression using GP Applications Summary References. Inspiration. Based on Darwin’s concept of “Survival of Fittest”
E N D
Genetic Programming • Dinesh Dharme • Prateek Srivastav • Pankhil Chheda • Harshad Kanhere
Outline • Introduction • Motivation • Theory • Symbolic Regression using GP • Applications • Summary • References
Inspiration • Based on Darwin’s concept of “Survival of Fittest” • Fitter individuals survive and reproduce at a higher rate. • Structure of individuals in population changes because of natural selection.
What is Genetic Programming? • Part of larger discipline called Machine Learning. Machine learning can be best described as "the study of computer algorithms that improve automatically through experience" (Mitchell 1996). • It attempts to solve the problem - How can computers be made to do what needs to be done without being told exactly how to do it? • This is where the aspect of Artificial Intelligence comes into play.
Why Genetic Programming? • It saves time by freeing the human from having to design complex algorithms. Not only designing the algorithms but creating ones that give optimal solutions. • “It combines genetic algorithms with the basic thrust of AI, which was to get computers to do things automatically – to evolve a population of programs” - John R. Koza
Algorithm • Randomly create an initial population of programs from the available primitives • Repeat • Execute each program and ascertain its fitness • Select one or two program(s) from the population with a probability based on fitness to participate in genetic operations • Create new individual program(s) by applying genetic operations with specified probabilities • until an acceptable solution is found or some other stopping condition is met • return the best-so-far individual
Tree Structure • Programs are represented as tree. • (+ 1 2 (IF (> TIME 10) 3 4))
Basics • Terminal: The variables and constants in the program (x, y and 3) which forms leaves of the tree. • Funtion: The arithmetic operations (+, * and max) are internal nodes called functions. • Primitive set : Functions and terminals together from primitive set.
Initialization of population • Full Method: Selecting internal nodes from function set untill maximum depth is reached. At the end, select node from terminal set. All leaves are at same level. • Grow Method: Select nodes from the primitive set until maximum depth is reached. • Ramped half-and-half: Half the initial population is constructed using full and half is constructed using grow. Depth limits of trees not fixed.
Initialization of population(Contd.) • Initialization need not be entirely random. • If properties of the desired solution is known, produce trees accordingly. This creates intial bias in population for faster convergence.
Selection Strategies • Tournament Selection: Best individual from a set of randomly selected individuals is chosen to be parent. • Advantage: It rescales fitness among the population. A best program doesn't populate the next generation with its children reducing diversity. • Disadvantage: A small difference in fitness gets amplified in next generation. Even though it is just marginally better than its competitior.
Selection Strategies(Contd.) • Fitness-Proportionate Selection: Select individuals based on their fitness values.
Breeding Next Generation • Crossover, mutation and reproduction are main operators applied on parents to generate offsprings. • Subtree Crossover: Create the offspring by replacing the subtree rooted at the crossover point in a copy of the first parent with a copy of the subtree rooted at the crossover point in the second parent. • Reproduction: Simply copy an elite individual to next generation. Ensures passing of good genes.
Crossover contd. • Since most of the nodes of tree are leaves.Hence crossover results in replacement/exchange of small genetic material. • John R. Koza suggested 90 times functions and 10 times terminals should be selected for crossover.
Mutation • Subtree Mutation: Replace the mutation point by randomly generated tree. • Point Mutation: Randomly select a node and replace it with another primitive of same arity.
Contraints on Operators:closure • Type Consistency: Subtree crossover can mix and join nodes arbitrarily. Hence all functions used must be type consistent. i.e they return values of same type. • Evaluation safety: Subtree after evaluation should produce a valid value to be used. For this, protected versions of functions used. (e.g. “/” with 2nd argument 0 gives 1 as its output).
Sufficiency • Sufficiency means solution to the problem can be expressed through combinations of different primitives used. • In general sufficiency cannot be guaranteed. In areas where theory is well developed, we can decide on primitives to be used.
Fitness Function • Gives a measure of how good a computer program is at solving the given problem. • Proper encoding of problem through fitness functions is important for GP's success. • Measuring fitness of a given program means executing the program. • Evaluation of program tree is done in depth first search order.
GP Parameters Control parameters: • Population Size, N • Maximum number of generations, Gmax • Probability Pc of crossover • Probability Pm of mutation • Probability Pr of reproduction
Five Preparatory steps to set up GP • Determining the set T of terminals. • Determining the set F of functions. • Determining the fitness measures. • Determining the GP parameters. • Determining the Termination criteria and result designation
Symbolic Regression using GP • Symbolic regression: The process of mechanically creating a computer program that fit certain numerical data. • We want to evolve an expression whose values match those of the quadratic polynomial x2 + x + 1 in the range [−1,1].
SYMBOLIC REGRESSION Independent variable X Dependent variable Y -1.00 1.00 -0.80 0.84 -0.60 0.76 -0.40 0.76 -0.20 0.84 0.00 1.00 0.20 1.24 0.40 1.56 0.60 1.96 0.80 2.44 1.00 3.00
Preparatory Steps Objective: Find a computer program with one input (independent variable X) whose output equals the given data 1 Terminal set: T = {X, Random-Constants} 2 Function set: F = {+, -, *, %} 3 Fitness: The sum of the absolute value of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable x from –1.0 to +1.0) 4 Parameters: Population size M = 4 5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
Symbolic Regression (initialization) Generation 0 with 4 individuals • Two parental chromosomes exchange part of their genetic information to create new hybrid combinations (recombinant). • No loss of genes, but an exchange of genes between two previous chromosomes. • No new genes created, preexisting old ones mixed together.
Symbolic regression(fitness) x + 1 x2 + 1 2 x 0.67 1.00 1.70 2.67
Symbolic Regression (Generation1) Second offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points First offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points Mutation of (c) : Picking “2” as mutation point Reproduction:Copy of (a)
Areas where GP will do well • The interrelationships among the relevant variables is unknown or poorly understood. • Finding the size and shape of the ultimate solution is a major part of the problem. • Search domain is very large and solutions are sparsely distributed. • There are good simulators to test the performance of tentative solutions to a problem, but poor methods to directly obtain good solutions.
Applications • GP was used in designing of an antenna for deployment on NASA’s Space Technology 5 Mission. • Designs generated are complex, non-intuitive, and 100% compliant with the mission antenna performance requirements.
Applications contd. • Result of Genetic Programming which was allowed to do branching. • Below we show the genotype for this antenna. Ref. [4]
forward(L,R): creates a rod of length L and radius R. • rotate-x(rad): rotates the rod by angle rad around x axis. Similarly others are defined.
Applications contd. • Result of genetic algorithm. • Branching not allowed. Ref. [4]
Summary • Genetic programming now routinely delivers high-return human-competitive machine intelligence. • The AI ratio (the “artificial-to-intelligence” ratio) of a problem-solving method as the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem. • GP has very high AI ratio.
Summary • Routine:A problem solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain. • Human-Competitiveness: Previously patented, an improvement over a patented invention, or patentable today. Publishable in its own right as a new scientific result.
References [1]Book: “A Field Guide to Genetic Programming” by Poli, Langdon, McPhee. 2008. [2]www.genetic-pragramming.com [3]http://en.wikipedia.org/wiki/Genetic_programming [4]Figures: “An Evolved Antenna for Deployment on Nasa’s ST5 Mission” by J. Lohn, G. Hornby, D. Linden, GTPT, May-2004
Suggested Readings Book: “Genetic Programming: On the programming of computers by means of natural selection” by John R. Koza, MIT Press Book: “Adaptation in Natural and Artificial Systems” by John H. Holland, MIT Press