Biologically Inspired Intelligent Systems
Lecture 05
Dr. Roger S. Gaborski
HW#1 Answer: Matlab Function

function [wts, tar, results] = HebbTrainingHW1(inp, tar)
% inp is the input data
% tar is the target data
wts = tar'*inp;            % wts contains the weight matrix
% Check if response is correct
results = sign(wts*inp')';
Quiz on Thursday
• Review lecture material
• Review recommended videos (see lecture notes for links)
• Closed book except for one page of handwritten notes
• Turn in your one page of notes with your quiz (include your name on your notes)
Textbook
• Essentials of Metaheuristics
• Sean Luke, George Mason University
• Available online at no cost: http://cs.gmu.edu/~sean/book/metaheuristics/
Some lecture material taken from pp. 1-49
General Evolution
• In 1859, Charles Darwin proposed a model explaining evolution
Principles of Evolution
• Diversity of the population is critical for adaptability to the environment. The population of structures is heterogeneous; individual phenotypes express different traits.
• Populations evolve over many generations. Reproduction of a specific individual depends on the specific conditions of the environment and the organism's ability to survive and produce offspring.
• Offspring inherit their parents' fitness, but the genetic material of the parents merges, and mutation introduces slight variations.
Learning (Michalski, Carbonell, Mitchell, …)
• Rote Learning: No inference, direct implantation of knowledge
• Learning by Instruction: Knowledge acquired from a teacher or an organized source
• Learning by Deduction: Deductive, truth-preserving inferences and memorization of useful conclusions
• Learning by Analogy: Transformation of existing knowledge that bears similarity to the new desired concept
• Learning by Induction:
  • Learning from examples (concept acquisition): Based on a set of examples and counterexamples, induce a general concept description that explains the examples and counterexamples
  • Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining observations, without a teacher
Evolutionary Algorithms
• Inductive learning by observation and discovery
• No teacher exists who presents examples; the system develops examples on its own
• Creation of new examples (search points) by the algorithm is an inductive guess made on the basis of existing knowledge
• Knowledge base – the population
• If a new example is good, it is added to the population (knowledge is added to the population)
Fitness
• Evaluate every solution in the population and determine its fitness
• Fitness is a measure of how closely the solution matches the problem's objective
• Fitness is calculated by a fitness function
• Fitness functions are problem dependent
• Fitness values are usually positive, with zero being a perfect score (the larger the fitness value, the worse the solution)
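As a minimal sketch (an illustration, not from the lecture), a fitness function for the HW#1 classifier could count misclassified entries, so that zero is a perfect score and larger values are worse:

% Hypothetical fitness: number of entries where the response misses the target
fitness = @(results, tar) sum(results(:) ~= tar(:));   % 0 = perfect, larger = worse
f = fitness(results, tar);                             % e.g., using HW#1's results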
[Figure: a small network with inputs In1 and In2, a +1 bias, and output y]
If we know the structure, can we evolve the weights? There are 7 weights. The solution is a point in a 7-dimensional space.
Gradient Ascent or Gradient Descent
• Find the maximum (or minimum) of a function
http://en.wikipedia.org/wiki/File:Gradient_ascent_%28surface%29.png
http://en.wikipedia.org/wiki/Gradient_descent
Gradient Ascent
• Find the maximum of a function
• Start with an arbitrary value x
• Add to this value a small fraction of the slope: x = x + h*f'(x), where h < 1 and f'(x) is the derivative
• Positive slope: x increases
• Negative slope: x decreases
• x will continue toward the maximum; at the maximum of f(x) the slope is zero and x will no longer change
Gradient Descent
• Gradient descent is the same algorithm except the slope is subtracted, to find the minimum of the function
Gradient descent: algorithm
• Start with a point (guess)
• Repeat:
  • Determine a descent direction
  • Choose a step size
  • Update the point
• Until a stopping criterion is satisfied
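A minimal MATLAB sketch of this loop (the function, starting point, and step size are assumptions for illustration):

% Gradient descent on f(x) = (x-3)^2, whose derivative is 2*(x-3)
fprime = @(x) 2*(x - 3);   % analytic derivative of the assumed function
x = 0;                     % arbitrary starting guess
h = 0.1;                   % step size (h < 1)
for iter = 1:100
    x = x - h*fprime(x);   % subtract a small fraction of the slope (descent)
end
% x is now very close to the minimizer at x = 3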
Issues
• Convergence time – how large should 'h' be? Too large and we may overshoot the maximum; too small and we may run out of time before the maximum (or minimum) is found
• Local maxima (minima)
• Saddle points
• Newton's Method: use both the first and second derivatives: x = x - h*( f'(x)/f''(x) )
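A hedged one-dimensional sketch of a Newton step on the same assumed function f(x) = (x-3)^2:

fprime  = @(x) 2*(x - 3);         % first derivative
fdouble = @(x) 2;                 % second derivative (constant here)
x = 0; h = 1;
x = x - h*fprime(x)/fdouble(x);   % one Newton step lands exactly on x = 3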
Gradient Ascent is a local optimization method
• Instead of starting at only one x, randomly select several x's and apply gradient ascent at each one, allowing a wider exploration of the solution space
Multidimensional Functions
Replace the scalar x with a vector x. The slope f'(x) is now the gradient ∇f(x). The gradient is a vector where each element is the slope of f(x) in that dimension.
• We are making the assumption that we can calculate the first and possibly the second derivative in each dimension
• What if we can't? We don't know what the function is
• We can still:
  • Create inputs
  • Test the inputs
  • Assess the results
Metaheuristic Algorithm
• Initialization Procedure: Provide one or more initial candidate solutions
• Assessment Procedure: Assess the quality of a solution
• Modification Procedure: Make a copy of a candidate solution and produce a candidate that is slightly, randomly different from the candidate solution (derivatives are not calculated)
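A minimal MATLAB sketch of these three procedures as function handles (the handle names, the [10, 20] range, and the toy quality function are assumptions for illustration; later sketches reuse them):

% Three problem-dependent procedures, expressed as function handles
initialize = @() 10 + 10*rand(1,5);               % random candidate with entries in [10, 20]
assess     = @(s) -sum((s - 15).^2);              % toy quality measure: higher is better
modify     = @(s) s + 0.1*(2*rand(size(s)) - 1);  % slightly, randomly different copy; no derivatives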
Hill-Climbing
• Somehow create an initial candidate solution
• Randomly modify the candidate solution
• If the modified candidate is better than the initial solution, replace the initial candidate with the modified candidate
• Continue the process until a solution is found or we run out of time
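Continuing the sketch above (same assumed handles), hill climbing becomes a short loop:

S = initialize();                % initial candidate solution
for iter = 1:1000                % until a solution is found or time runs out
    R = modify(S);               % randomly modified copy of the candidate
    if assess(R) > assess(S)     % modified candidate is better
        S = R;                   % replace the candidate
    end
end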
Hill Climbing
Pick initial (x,y) location: (54, 113)
Simple Implementation
• Compare the value of the function at the four adjacent locations – N, S, E and W of the current position:
  Current location: (54, 113)
  N: (54, 114)  S: (54, 112)  E: (55, 113)  W: (53, 113)
• Move to the location with the largest function value
• If no neighbor is better, done
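A hedged sketch of this neighbor comparison in MATLAB, assuming the surface is stored in a matrix F indexed as F(x, y) (the matrix name and indexing are assumptions; bounds checking at the edges is omitted):

x = 54; y = 113;                        % initial location
while true
    % Function values at the current point and its N, S, E, W neighbors
    vals = [F(x,y), F(x,y+1), F(x,y-1), F(x+1,y), F(x-1,y)];
    [~, k] = max(vals);                 % index of the largest value
    if k == 1, break; end               % no neighbor is better: done
    steps = [0 1; 0 -1; 1 0; -1 0];     % N, S, E, W offsets
    x = x + steps(k-1, 1);
    y = y + steps(k-1, 2);
end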
BIIShillClimbing1.m
Variations on Basic Hill Climbing
• Make several modifications to the candidate instead of a single modification
• Always keep the modified candidate (don't compare), but keep a separate variable called 'best' that always retains the best solution discovered – at the end of the program, return 'best'
Individual Creation
• Assume a vector of length L and a range of valid entries, low and high
• Use a random number generator (uniform distribution) and scale the values to lie between low and high:

>> low = 10; high = 20;
>> v = low + (high-low).*rand(5,1)
v =
   18.2415
   12.1823
   10.9964
   16.1951
   11.0381
Check Range:
>> low = 10;
>> high = 20;
>> test = low + (high-low).*rand(10000,1);
>> min(test)
ans =
   10.0012
>> max(test)
ans =
   19.9991
Modification of Individual
• Add a small amount of uniformly distributed random noise to each component of vector v:

u = (2*rand(10000,1))-1;   % u ranges from -1 to +1

Simply scale u to the desired range r. For a range of -r to +r, let r = 0.1, giving -0.1 to +0.1:

>> u1 = r*u;
>> min(u1)
ans =
   -0.1000
>> max(u1)
ans =
    0.1000

v(i) = v(i) + u1(i); then check that v(i) is within the bounds low, high.
>> u1(1:5)
ans =
   -0.0476
   -0.0297
    0.0526
   -0.0639
   -0.1000
>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v =
   14.7211
   11.7077
   17.7977
   19.9797
   16.2644
Modified v:
>> v = v + u1(1:5)
v =
   14.6735
   11.6780
   17.8503
   19.9158
   16.1644
Effects of Range Value
• If r is very small, hill climbing will only explore the local region and can get caught at a local optimum
• If r is very large, hill climbing will bounce around, and if it is near the peak of the function it may miss it because it overshoots the peak
• r controls the degree of Exploration (randomly explore the space) versus Exploitation (exploit the local gradient) in the hill climbing algorithm
Function with Local Maxima
Another view
BIIShillClimbing2.m
Hill Climbing with Random Restarts
• Extreme Exploration – random search
• Extreme Exploitation – very small r
• Combination:
  • Randomly select a starting place x
  • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best'
  • At the end of that time, randomly select a new starting point x
  • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best'
  • Repeat until a solution is found
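A hedged MATLAB sketch of random restarts, reusing the assumed initialize/assess/modify handles from the earlier sketch (measuring the random time budget in iterations is an assumption for illustration):

best = [];  bestQ = -Inf;
for restart = 1:20                      % each pass is one random restart
    S = initialize();                   % randomly select a starting place
    T = randi([50 500]);                % random amount of time (iterations)
    for iter = 1:T                      % small-r hill climbing
        R = modify(S);
        if assess(R) > assess(S), S = R; end
    end
    if assess(S) > bestQ                % save the result if it is the best so far
        best = S;  bestQ = assess(S);
    end
end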
Effect of Time Interval
• If the random time interval is long, the algorithm effectively becomes a Hill Climbing algorithm
• If the random time interval is short, the algorithm effectively becomes a random search
• The random time interval drives the algorithm from one extreme to the other
• Which is best? It depends…
Random Restarts
>> BIIShillClimbing3
[Figure: annotated runs contrasting random search and hill climbing; some random-search steps lead away from the maximum]
Previously, we required a bounded uniform distribution; the range of values was specified. A Gaussian distribution usually generates small numbers, but numbers of any magnitude are possible. Large numbers result in exploration.
GAUSSIAN DISTRIBUTION
>> g1(1:5)
ans =
    0.0280
   -0.1634
   -0.1019
    1.0370     (note the occasional larger value)
    0.1884

PREVIOUSLY, UNIFORM
>> u1(1:5)
ans =
   -0.0476
   -0.0297
    0.0526
   -0.0639
   -0.1000

>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v =
   14.7211
   11.7077
   17.7977
   19.9797
   16.2644
Modified v:
>> v = v + u1(1:5)
v =
   14.6735
   11.6780
   17.8503
   19.9158
   16.1644
>> v = v + g1(1:5)
v =
   14.7491
   11.5443
   17.6958
   21.0167
   16.4528
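A minimal sketch of generating such Gaussian noise in MATLAB (the standard deviation 0.2 is an assumption for illustration):

sigma = 0.2;              % small standard deviation: mostly small tweaks
g1 = sigma*randn(5,1);    % Gaussian noise; values of any magnitude are possible
v  = v + g1;              % occasional large draws provide exploration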
Simulated Annealing
• Differs from Hill Climbing in its decision about when to replace the original individual (the parent S) with the modified individual (the child R)
• In Hill Climbing, check whether the modified individual is better; if it is, replace the original
• In simulated annealing, if the child is better, replace the parent
• If the child is NOT better, still replace the parent with the child with a certain probability P(t,R,S):
• P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )
P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )
• Recall, R is worse than S
• First, t ≥ 0
• (Quality(R) - Quality(S)) is negative
• If R is much worse than S, the exponent is a large negative number and the probability is close to 0
• If R is very close to S, the exponent is close to 0, so the probability is close to 1 and we will select R with reasonable probability
• t is selectable: as t approaches 0, the exponent becomes a large negative number and the probability approaches 0
• If t is large, the exponent is close to 0 and the probability is close to 1
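A hedged MATLAB sketch of this acceptance rule, reusing the assumed handles from earlier (the cooling schedule is an assumption; assess follows the slides' higher-is-better Quality convention):

S = initialize();  t = 1.0;                        % initial parent and temperature
for iter = 1:1000
    R = modify(S);                                 % child: perturbed copy of the parent
    if assess(R) > assess(S)                       % child is better: always replace
        S = R;
    elseif rand < exp((assess(R) - assess(S))/t)   % worse child: accept with P(t,R,S)
        S = R;
    end
    t = 0.99*t;                                    % cool: t shrinks toward 0 over time
end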