
Biologically Inspired Intelligent Systems


Presentation Transcript


  1. Biologically Inspired Intelligent Systems Lecture 10 Dr. Roger S. Gaborski

  2. WEEKLY UPDATES • PRESENTATIONS START Tuesday, 10/25 • Updates continue even after you present • Final project material due 11/8 (no exceptions) • Random selection • Volunteers?

  3. Free Textbook • Essentials of Metaheuristics • Sean Luke, George Mason University • Available online at no cost: http://cs.gmu.edu/~sean/book/metaheuristics/ • Some lecture material is taken from pp. 1-49

  4. General Evolution • Darwin, 1859: Charles Darwin proposed a model explaining evolution

  5. Principles of Evolution • Diversity of the population is critical for adaptability to the environment: the population is a heterogeneous collection of structures, and individual phenotypes express different traits. • Populations evolve over many generations. Reproduction of a specific individual depends on the specific conditions of the environment and the organism's ability to survive and produce offspring. • Offspring inherit their parents' fitness, but the genetic material of the parents merges, and mutation results in slight variations.

  6. Learning • Rote Learning: No inference, direct implantation of knowledge • Learning by Instruction: Knowledge acquired from a teacher or organized source • Learning by Deduction: Deductive, truth-preserving inferences and memorization of useful conclusions • Learning by Analogy: Transformation of existing knowledge that bears similarity to the new desired concept • Learning by Induction: • Learning from examples (concept acquisition): Based on a set of examples and counterexamples, induce a general concept description that explains the examples and counterexamples. • Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining observations, without a teacher (Michalski, Carbonell, Mitchell, …)

  7. Evolutionary Algorithms • Inductive learning by observation and discovery • No teacher exists who presents examples; the system develops examples on its own • Creation of new examples (search points) by the algorithm is an inductive guess on the basis of existing knowledge • The population serves as the knowledge base • If a new example is good, it is added to the population (knowledge is added to the knowledge base)

  8. Fitness • Evaluate every solution in the population and determine its fitness • Fitness is a measure of how closely the solution matches the problem's objective • Fitness is calculated by a fitness function • Fitness functions are problem dependent • Here, fitness values are usually positive, with zero being a perfect score (the larger the fitness value, the worse the solution)
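As an illustrative sketch (not from the lecture), a fitness function of this error-style kind can be as simple as the total absolute error between a candidate's outputs and a target vector, so zero is a perfect score; the function and argument names here are hypothetical:

% Hypothetical error-style fitness function: 0 is perfect, larger is worse
function fitness = evalFitness(outputs, targets)
    fitness = sum(abs(targets - outputs));   % total absolute error
end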

  9. [Figure: a small feed-forward network with inputs In1 and In2, a +1 bias, and output y] If we know the structure, can we evolve the weights? There are 7 weights; the solution is a point in a 7-dimensional space

  10. Example Problem • Write a Matlab function that accepts a weight matrix W, two input vectors, and a target vector. The function should return the total error value • Implement XOR with a known weight matrix W • Use the following variables: • Weight matrix: W, size 2x5 • Input values: Iin1 = [1 1 0 0] and Iin2 = [1 0 1 0] • Target outputs: Y = [0 1 1 0] • 2 neurons with outputs x1 and x2

  11. Problem, continued • When the program executes with the correct W matrix, performance should be similar to the example given in the previous lecture. • W = [-2.19 -2.20 .139 0 0; -2.81 -2.70 3.90 -31.8 0] • Use a sigmoid function, not tanh: 1/(1+exp(-5*x)); the 5 controls the slope • Total error should be approximately 0 • Test on AND and OR target vectors – are the results correct (using the given W matrix)? • Experiment with random W matrices (but with 0's in the same locations) • Can you find another weight matrix that solves the problem?
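A minimal sketch of the requested function (not the official solution): it assumes each row of W is laid out as [weight from Iin1, weight from Iin2, bias weight, weight from x1, weight from x2], which is consistent with the zero pattern of the given W:

% Sketch of the XOR error function under the assumed weight layout
% W(i,:) = [w_Iin1, w_Iin2, w_bias, w_from_x1, w_from_x2]
function totalError = xorError(W, Iin1, Iin2, Y)
    sig = @(x) 1 ./ (1 + exp(-5*x));                 % sigmoid, slope 5
    x1 = sig(W(1,1)*Iin1 + W(1,2)*Iin2 + W(1,3));    % neuron 1
    x2 = sig(W(2,1)*Iin1 + W(2,2)*Iin2 + W(2,3) + W(2,4)*x1);  % neuron 2 = output
    totalError = sum(abs(Y - x2));                   % summed over all 4 patterns
end

With the given W, Iin1 = [1 1 0 0], Iin2 = [1 0 1 0], and Y = [0 1 1 0], this returns a total error close to 0.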

  12. Gradient Ascent or Gradient Descent • Find the maximum (or minimum) of a function http://en.wikipedia.org/wiki/File:Gradient_ascent_%28surface%29.png http://en.wikipedia.org/wiki/Gradient_descent

  13. Gradient Ascent • Find the maximum of a function • Start with an arbitrary value x • Add to this value a small fraction of the slope: x = x + a*f'(x), where a < 1 and f'(x) is the derivative • Positive slope: x increases • Negative slope: x decreases • x will continue towards the maximum; at the maximum of f(x) the slope is zero and x will no longer change
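A minimal Matlab sketch of this update rule; the objective f(x) = -(x-3)^2 + 9 (maximum at x = 3) and the step size are made up for illustration:

% Gradient ascent on an example function with a known maximum at x = 3
fprime = @(x) -2*(x - 3);    % derivative of f(x) = -(x-3)^2 + 9
a = 0.1;                     % small fraction of the slope, a < 1
x = 0;                       % arbitrary starting value
for k = 1:100
    x = x + a*fprime(x);     % positive slope raises x, negative slope lowers it
end
% x ends very close to 3, where the slope is zero
% (gradient descent: subtract instead, x = x - a*fprime(x))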

  14. Gradient Descent • Gradient descent is the same algorithm except the slope is subtracted to find the minimum of the function

  15. Issues • Convergence time – how large should 'a' be? Too large and we may overshoot the maximum; too small and we may run out of time before the maximum (or minimum) is found • Local maxima (minima) • Saddle points • Newton's Method: Use both the first and second derivatives: x = x - a*( f'(x)/f''(x) )
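Continuing the same illustrative example with a Newton-style step: near a maximum f''(x) < 0, so dividing by it means the subtraction still moves uphill:

% Newton's method on the example f(x) = -(x-3)^2 + 9
fprime  = @(x) -2*(x - 3);           % first derivative
fprime2 = @(x) -2;                   % second derivative (constant here)
a = 1; x = 0;                        % a = 1 gives the classic Newton step
for k = 1:10
    x = x - a*fprime(x)/fprime2(x);  % exact in one step for a quadratic
end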

  16. Gradient Ascent is a local optimization method • Instead of starting at only one x, randomly select several x's and apply gradient ascent at each x allowing a wider exploration of the solution space

  17. Multidimensional Functions • Replace the scalar x with a vector x • The slope f'(x) is now the gradient of f at x, ∇f(x) • The gradient is a vector where each element is the slope of f(x) along that dimension
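The same illustrative update in two dimensions, where the gradient supplies a slope per dimension (the example function is an assumption):

% Gradient ascent in 2-D on f(x) = -(x1-1)^2 - (x2+2)^2, maximum at [1; -2]
grad = @(x) [-2*(x(1) - 1); -2*(x(2) + 2)];  % gradient: slope in each dimension
a = 0.1; x = [0; 0];
for k = 1:200
    x = x + a*grad(x);                       % same rule, applied element-wise
end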

  18. We are assuming that we can calculate the first and possibly the second derivative in each dimension. • What if we can't? • We don't know what the function is • We can: • Create inputs • Test the inputs • Assess the results

  19. Metaheuristic Algorithm • Initialization Procedure: Provide one or more initial candidate solutions • Assessment Procedure: Assess the quality of a solution • Modification Procedure: Make a copy of a candidate solution and produce a candidate that is slightly, randomly different from it (derivatives are not calculated)

  20. Hill-Climbing • Somehow create an initial candidate solution • Randomly modify the candidate solution • If the modified candidate is better than the initial solution, replace the initial candidate with the modified candidate • Continue this process until a solution is found or time runs out
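A sketch of this loop in Matlab, reusing the bounded uniform modification described later on slides 23-25; the quality function, bounds, and iteration budget are illustrative assumptions:

% Basic hill climbing over a real-valued vector
quality = @(v) -sum((v - 0.5).^2);    % illustrative objective, higher is better
L = 5; low = 0; high = 1; r = 0.1;    % vector length, bounds, mutation range
s = low + (high - low).*rand(L,1);    % initial candidate solution
for iter = 1:10000
    c = s + (2*rand(L,1) - 1)*r;      % randomly modified copy
    c = min(max(c, low), high);       % keep the modification within bounds
    if quality(c) > quality(s)        % better than the current candidate?
        s = c;                        % replace it
    end
end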

  21. Variations on Basic Hill Climbing • Make several modifications to the candidate instead of a single modification • Always keep the modified candidate (don't compare), but keep a separate variable called 'best' that always retains the best discovered solution – at the end of the program, return 'best'

  22. Candidate Solutions • What does a solution candidate look like? • A single number • A vector of numbers (or a matrix) • A tree or graph • Etc. • Here, assume a fixed-length vector containing real numbers

  23. Individual Creation • Assume vector length L and a range of valid entries, low and high • Use a random number generator (uniform distribution) and scale values to be between low and high: >> low = 10; high = 20; >> v = low + (high-low).*rand(5,1) v = 18.2415 12.1823 10.9964 16.1951 11.0381

  24. Check Range: >> low = 10; >> high = 20; >> test = low + (high-low).*rand(10000,1); >> min(test) ans = 10.0012 >> max(test) ans = 19.9991

  25. Modification of Individual • Add a small amount of uniformly distributed random noise to each component of vector v: u = (2*rand(10000,1))-1; % u ranges from -1 to +1 • Simply scale u to the desired range r. For a desired range of -r to r, let r = .1, giving -.1 to +.1: >> u1 = r*u; >> min(u1) ans = -0.1000 >> max(u1) ans = 0.1000 • v(i) = v(i) + u1(i); check that v(i) is within the bounds low, high

  26. >> u1(1:5) ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000 >> low = 10; high = 20; v = low + (high-low).*rand(5,1) v = 14.7211 11.7077 17.7977 19.9797 16.2644 Modified v: >> v = v + u1(1:5) v = 14.6735 11.6780 17.8503 19.9158 16.1644

  27. Effects of Range Value • If r is very small, hill climbing will explore only the local region and can get caught at a local optimum. • If r is very large, hill climbing will bounce around, and if it is near the peak of the function it may miss it, because it may overshoot the peak. • r controls the degree of Exploration (randomly explore the space) versus Exploitation (exploit the local gradient) in the hill climbing algorithm

  28. Hill Climbing with Random Restarts • Extreme Exploration – random search • Extreme Exploitation – very small r • Combination: • Randomly select a starting place x • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best' so far • At the end of that time, randomly select a new starting point x • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best' so far • Repeat until a solution is found
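A sketch combining the two extremes as the slide describes; the quality function and the time budget per restart are illustrative assumptions:

% Hill climbing with random restarts
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.01;            % small r: exploitation
best = []; bestQ = -Inf;
for restart = 1:20
    s = low + (high - low).*rand(L,1);         % random starting place x
    T = randi([100 1000]);                     % random amount of time
    for iter = 1:T
        c = min(max(s + (2*rand(L,1)-1)*r, low), high);
        if quality(c) > quality(s), s = c; end
    end
    if quality(s) > bestQ, best = s; bestQ = quality(s); end  % save if 'Best'
end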

  29. Effect of Time Interval • If the random time interval is long, the algorithm effectively becomes a Hill Climbing algorithm • If the random time interval is short, the algorithm effectively becomes a random search • The random time interval drives the algorithm from one extreme to the other • Which is best? It depends…

  30. [Figure]

  31. [Figure: a function landscape with regions annotated 'random search', 'hill climbing', and 'hill climbing leads away from maximum']

  32. Previously, we required a bounded uniform distribution. The range of values was specified. A Gaussian distribution usually generates small numbers, but numbers of any magnitude are possible. Large numbers result in exploration

  33. GAUSSIAN DISTRIBUTION (note the occasional larger values): >> g1(1:5) ans = 0.0280 -0.1634 -0.1019 1.0370 0.1884 PREVIOUSLY, UNIFORM: >> u1(1:5) ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000 >> low = 10; high = 20; v = low + (high-low).*rand(5,1) v = 14.7211 11.7077 17.7977 19.9797 16.2644 Modified v: >> v = v + u1(1:5) v = 14.6735 11.6780 17.8503 19.9158 16.1644 >> v = v + g1(1:5) v = 14.7491 11.5443 17.6958 21.0167 16.4528
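A sketch of Gaussian mutation in this style; the scale sigma is an assumption, since the slide does not say how g1 was generated:

% Gaussian mutation: mostly small perturbations, occasional large jumps
low = 10; high = 20; sigma = 0.2;     % sigma is an assumed scale
v  = low + (high - low).*rand(5,1);   % individual, as on slide 23
g1 = sigma*randn(5,1);                % normally distributed noise
v  = v + g1;                          % mutate
v  = min(max(v, low), high);          % optionally clamp back into bounds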

  34. Simulated Annealing • Differs from Hill Climbing in its decision of when to replace the original individual (the parent S) with the modified individual (the child R) • In Hill Climbing, check if the modified individual is better; if it is, replace the original • In simulated annealing, if the child is better, replace the parent • If the child is NOT better, still replace the parent with the child with a certain probability P(t,R,S): • P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )

  35. P(t,R,S) = exp( (Quality(R) - Quality(S)) / t ) • Recall, R is worse than S • First, t ≥ 0 • (Quality(R) – Quality(S)) is negative • If R is much worse than S, the fraction is a large negative number, and the probability is close to 0 • If R is very close to S, the fraction is close to 0, the probability is close to 1, and we will select R with reasonable probability • t is selectable: with t close to 0, the fraction is a large negative number and the probability is close to 0 • If t is large, the probability is close to 1

  36. Example • R (child) = 5, S (parent) = 8, t = 2 • P(t,R,S) = exp((R-S)/t) = 0.2231 • Raise t to t = 8: • P(t,R,S) = exp((R-S)/t) = 0.6873 • The probability of replacing S with R increases when t increases • Initially set t high, causing the algorithm to move to the newly created solution even if it is worse than the current position (a random walk) • Slowly decrease t as the algorithm proceeds, eventually to zero (then it's simple Hill Climbing)
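Putting slides 34-36 together as a sketch; the cooling schedule, quality function, and bounds are illustrative assumptions:

% Simulated annealing: accept a worse child with probability exp(dQ/t)
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.1;
S = low + (high - low).*rand(L,1);     % parent
t = 10;                                % start t high: near-random walk
while t > 1e-3
    R = min(max(S + (2*rand(L,1)-1)*r, low), high);   % child
    if quality(R) > quality(S) || rand < exp((quality(R) - quality(S))/t)
        S = R;                         % better, or worse but accepted by chance
    end
    t = 0.999*t;                       % slowly decrease t (the schedule)
end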

  37. Schedule • The rate at which we decrease t is called the algorithm's schedule • The longer the schedule is, the longer the algorithm resembles a random walk and the more exploration it does

  38. Tabu Search • Keep a history of recently considered candidate solutions (the tabu list) • Do not return to solutions on the tabu list until they are sufficiently far in the past • Keep a list of previous candidates of length k; after the list is full, remove the oldest candidates as new candidates are added • Tabu Search operates in discrete spaces

  39. Tabu Search – real-valued numbers? • It is unlikely you will visit the same real-valued location twice • Consider a candidate to be on the list if it is sufficiently similar to a member of the list • A similarity measure needs to be determined
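A sketch of this real-valued variant; the list length k and the similarity threshold are assumptions:

% Tabu search for real-valued vectors: a child counts as 'on the list'
% if it is within distance epsSim of any recent candidate
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.1;
k = 50; epsSim = 0.05;                        % list length, similarity threshold
s = low + (high - low).*rand(L,1);
tabu = s;                                     % columns are recent candidates
for iter = 1:5000
    c = min(max(s + (2*rand(L,1)-1)*r, low), high);
    d = sqrt(sum((tabu - c).^2, 1));          % distance to each list member
    if all(d >= epsSim) && quality(c) > quality(s)
        s = c;
        tabu = [tabu, c];                     % add the new candidate
        if size(tabu,2) > k, tabu(:,1) = []; end  % drop the oldest
    end
end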

  40. Tabu List • Instead of candidate solutions, keep a list of the changes you made to specific features

  41. Population Methods • Keep a collection of candidate solutions and not just a single candidate (as in Hill Climbing) • Candidate solutions interact

  42. Evolutionary Computation (EC) – A Set of Techniques • Based on population biology, genetics and evolution

  43. Evolutionary Algorithm (EA) • Generational algorithms – update the entire population once per iteration • Steady-state algorithms – update the population a few samples at a time • Common EAs include Genetic Algorithms (GA) and Evolution Strategies (ES) • Both generational and steady-state methods are used

  44. Common Terms (from S. Luke) • Individual – a candidate solution • Child and Parent – a child is a modified copy of the candidate solution (its parent) • Population – set of candidate solutions • Fitness – quality of a solution • Selection – picking individuals based on their fitness • Mutation – a simple modification to an individual • Recombination or Crossover – takes two parents and swaps sections, resulting in 2 children

  45. Terms, continued • Genotype or genome – an individual's data structure used during breeding • Chromosome – a genotype • Gene – a particular position in the chromosome • Phenotype – how the individual operates during fitness assessment • Generation – one cycle of fitness assessment and breeding

  46. Generational Evolutionary Computation Algorithm • First, construct an initial population • Iterate: • Assess the fitness of the individuals in the population • Use the fitness results to breed a new population of children • Join the parents and children to form the new population • Continue until a solution is found or time runs out

  47. How Do Algorithms Differ? • Breeding has two parts: • Select parents from the population • Modify them (mutation and/or recombination) to form children • Join operation: • Completely replace the parents with the children, or • Keep the fit parents and the fit children

  48. Evolution Strategies (ES) • Truncation Selection Method (TSM) • Uses only mutation • The simplest ES algorithm is the (µ,λ) algorithm • Population of λ individuals • ITERATE: • Find the fitness of all individuals • Delete all but the µ fittest individuals (TSM) • Each of the µ fittest individuals produces λ/µ children through mutation, resulting in λ new children • The children replace all the parents • Repeat a fixed number of times, or until the goal is met

  49. ES(5,20) • µ = 5 and λ = 20 • Find the 5 fittest individuals • Each individual produces λ/µ = 20/5 = 4 children through mutation • Total number of children: 4*5 = 20 • Replace all parents with the new children
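A sketch of the (µ,λ) loop, one concrete instance of the generational algorithm from slide 46; the quality function and mutation scale sigma are illustrative:

% ES(mu,lambda): truncation selection plus Gaussian mutation only
quality = @(v) -sum((v - 0.5).^2);            % illustrative fitness, higher = better
mu = 5; lambda = 20; L = 5; sigma = 0.1;
pop = rand(L, lambda);                        % lambda individuals as columns
for gen = 1:100
    fit = arrayfun(@(i) quality(pop(:,i)), 1:lambda);  % assess all individuals
    [~, order] = sort(fit, 'descend');
    parents = pop(:, order(1:mu));            % delete all but the mu fittest
    pop = repmat(parents, 1, lambda/mu) + sigma*randn(L, lambda);  % lambda children
end
% ES(mu+lambda), next slide: use [parents, children] as the next generation instead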

  50. ES(µ+λ) • In ES(µ,λ), all the parents are replaced with the children in the next generation • In the ES(µ+λ) algorithm, the next generation consists of the µ parents plus the λ new children • The parents and children compete • All successive generations are µ+λ in size
