Biologically Inspired Intelligent Systems
Lecture 05
Dr. Roger S. Gaborski
HW#1 Answer: Matlab Function

function [wts, tar, results] = HebbTrainingHW1(inp, tar)
% inp is the input data
% tar is the target data
wts = tar'*inp;            % wts contains the weight matrix
% Check if response is correct
results = sign(wts*inp')';
Quiz on Thursday
• Review lecture material
• Review recommended videos (see lecture notes for links)
• Closed book except for one page of handwritten notes
• Turn in your one page of notes with your quiz (include your name on your notes)
Textbook
• Essentials of Metaheuristics
• Sean Luke, George Mason University
• Available online at no cost: http://cs.gmu.edu/~sean/book/metaheuristics/
Some lecture material taken from pp. 1-49
General Evolution
• In 1859, Charles Darwin proposed a model explaining evolution
Principles of Evolution
• Diversity of the population is critical for adaptability to the environment. The population of structures is heterogeneous; individual phenotypes express different traits.
• Populations evolve over many generations. Reproduction of a specific individual depends on the specific conditions of the environment and the organism's ability to survive and produce offspring.
• Offspring inherit their parents' fitness, but the genetic material of the parents merges, and mutation introduces slight variations.
Learning (Michalski, Carbonell, Mitchell, …)
• Rote Learning: No inference, direct implantation of knowledge
• Learning by Instruction: Knowledge acquired from a teacher or an organized source
• Learning by Deduction: Deductive, truth-preserving inferences and memorization of useful conclusions
• Learning by Analogy: Transformation of existing knowledge that bears similarity to the new desired concept
• Learning by Induction:
  • Learning from examples (concept acquisition): Based on a set of examples and counterexamples, induce a general concept description that explains the examples and counterexamples
  • Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining observations, without a teacher
Evolutionary Algorithms
• Inductive learning by observation and discovery
• No teacher exists who presents examples; the system develops examples on its own
• Creation of new examples (search points) by the algorithm is an inductive guess made on the basis of existing knowledge
• Knowledge base – the population
• If a new example is good, it is added to the population (knowledge is added to the population)
Fitness
• Evaluate every solution in the population and determine its fitness
• Fitness is a measure of how closely the solution matches the problem's objective
• Fitness is calculated by a fitness function
• Fitness functions are problem dependent
• Fitness values are usually positive, with zero being a perfect score (the larger the fitness value, the worse the solution)
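As a minimal sketch (an illustration, not from the lecture), a fitness function for the HW#1 classifier could count misclassified entries, so that zero is a perfect score and larger values are worse:

% Hypothetical fitness: number of entries where the response misses the target
fitness = @(results, tar) sum(results(:) ~= tar(:));   % 0 = perfect, larger = worse
f = fitness(results, tar);                             % e.g., using HW#1's results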
[Figure: a small network with inputs In1 and In2, a +1 bias, and output y]
If we know the structure, can we evolve the weights? There are 7 weights. The solution is a point in a 7-dimensional space.
Gradient Ascent or Gradient Descent
• Find the maximum (or minimum) of a function
http://en.wikipedia.org/wiki/File:Gradient_ascent_%28surface%29.png
http://en.wikipedia.org/wiki/Gradient_descent
Gradient Ascent
• Find the maximum of a function
• Start with an arbitrary value x
• Add to this value a small fraction of the slope: x = x + h*f'(x), where h < 1 and f'(x) is the derivative
• Positive slope: x increases
• Negative slope: x decreases
• x will continue toward the maximum; at the maximum of f(x) the slope is zero and x will no longer change
Gradient Descent
• Gradient descent is the same algorithm except the slope is subtracted, to find the minimum of the function
Gradient descent: algorithm
• Start with a point (guess)
• Repeat:
  • Determine a descent direction
  • Choose a step size
  • Update the point
• Until a stopping criterion is satisfied
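A minimal MATLAB sketch of this loop (the function, starting point, and step size are assumptions for illustration):

% Gradient descent on f(x) = (x-3)^2, whose derivative is 2*(x-3)
fprime = @(x) 2*(x - 3);   % analytic derivative of the assumed function
x = 0;                     % arbitrary starting guess
h = 0.1;                   % step size (h < 1)
for iter = 1:100
    x = x - h*fprime(x);   % subtract a small fraction of the slope (descent)
end
% x is now very close to the minimizer at x = 3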
Issues
• Convergence time – how large should 'h' be? Too large and we may overshoot the maximum; too small and we may run out of time before the maximum (or minimum) is found
• Local maxima (minima)
• Saddle points
• Newton's Method: use both the first and second derivatives: x = x - h*( f'(x)/f''(x) )
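A hedged one-dimensional sketch of a Newton step on the same assumed function f(x) = (x-3)^2:

fprime  = @(x) 2*(x - 3);         % first derivative
fdouble = @(x) 2;                 % second derivative (constant here)
x = 0; h = 1;
x = x - h*fprime(x)/fdouble(x);   % one Newton step lands exactly on x = 3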
Gradient Ascent is a local optimization method
• Instead of starting at only one x, randomly select several x's and apply gradient ascent at each one, allowing a wider exploration of the solution space
Multidimensional Functions
Replace the scalar x with a vector x. The slope f'(x) is now the gradient ∇f(x). The gradient is a vector where each element is the slope of f(x) in that dimension.
• We are making the assumption that we can calculate the first and possibly the second derivative in each dimension
• What if we can't? We don't know what the function is
• We can still:
  • Create inputs
  • Test the inputs
  • Assess the results
Metaheuristic Algorithm
• Initialization Procedure: Provide one or more initial candidate solutions
• Assessment Procedure: Assess the quality of a solution
• Modification Procedure: Make a copy of a candidate solution and produce a candidate that is slightly, randomly different from the candidate solution (derivatives are not calculated)
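A minimal MATLAB sketch of these three procedures as function handles (the handle names, the [10, 20] range, and the toy quality function are assumptions for illustration; later sketches reuse them):

% Three problem-dependent procedures, expressed as function handles
initialize = @() 10 + 10*rand(1,5);               % random candidate with entries in [10, 20]
assess     = @(s) -sum((s - 15).^2);              % toy quality measure: higher is better
modify     = @(s) s + 0.1*(2*rand(size(s)) - 1);  % slightly, randomly different copy; no derivatives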
Hill-Climbing
• Somehow create an initial candidate solution
• Randomly modify the candidate solution
• If the modified candidate is better than the initial solution, replace the initial candidate with the modified candidate
• Continue the process until a solution is found or we run out of time
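Continuing the sketch above (same assumed handles), hill climbing becomes a short loop:

S = initialize();                % initial candidate solution
for iter = 1:1000                % until a solution is found or time runs out
    R = modify(S);               % randomly modified copy of the candidate
    if assess(R) > assess(S)     % modified candidate is better
        S = R;                   % replace the candidate
    end
end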
Hill Climbing
Pick initial (x,y) location: (54, 113)
Simple Implementation
• Compare the value of the function at the four adjacent locations – N, S, E and W of the current position:
  Current location: (54, 113)
  N: (54, 114)  S: (54, 112)  E: (55, 113)  W: (53, 113)
• Move to the location with the largest function value
• If no neighbor is better, done
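A hedged sketch of this neighbor comparison in MATLAB, assuming the surface is stored in a matrix F indexed as F(x, y) (the matrix name and indexing are assumptions; bounds checking at the edges is omitted):

x = 54; y = 113;                        % initial location
while true
    % Function values at the current point and its N, S, E, W neighbors
    vals = [F(x,y), F(x,y+1), F(x,y-1), F(x+1,y), F(x-1,y)];
    [~, k] = max(vals);                 % index of the largest value
    if k == 1, break; end               % no neighbor is better: done
    steps = [0 1; 0 -1; 1 0; -1 0];     % N, S, E, W offsets
    x = x + steps(k-1, 1);
    y = y + steps(k-1, 2);
end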
BIIShillClimbing1.m
Variations on Basic Hill Climbing
• Make several modifications to the candidate instead of a single modification
• Always keep the modified candidate (don't compare), but keep a separate variable called 'best' that always retains the best solution discovered – at the end of the program, return 'best'
Individual Creation
• Assume a vector of length L and a range of valid entries, low and high
• Use a random number generator (uniform distribution) and scale the values to lie between low and high:

>> low = 10; high = 20;
>> v = low + (high-low).*rand(5,1)
v =
   18.2415
   12.1823
   10.9964
   16.1951
   11.0381
Check Range:
>> low = 10;
>> high = 20;
>> test = low + (high-low).*rand(10000,1);
>> min(test)
ans =
   10.0012
>> max(test)
ans =
   19.9991
Modification of Individual
• Add a small amount of uniformly distributed random noise to each component of vector v:

u = (2*rand(10000,1))-1;   % u ranges from -1 to +1

Simply scale u to the desired range r. For a range of -r to +r, let r = 0.1, giving -0.1 to +0.1:

>> u1 = r*u;
>> min(u1)
ans =
   -0.1000
>> max(u1)
ans =
    0.1000

v(i) = v(i) + u1(i); then check that v(i) is within the bounds low, high.
>> u1(1:5)
ans =
   -0.0476
   -0.0297
    0.0526
   -0.0639
   -0.1000
>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v =
   14.7211
   11.7077
   17.7977
   19.9797
   16.2644
Modified v:
>> v = v + u1(1:5)
v =
   14.6735
   11.6780
   17.8503
   19.9158
   16.1644
Effects of Range Value
• If r is very small, hill climbing will only explore the local region and can get caught at a local optimum
• If r is very large, hill climbing will bounce around, and if it is near the peak of the function it may miss it because it overshoots the peak
• r controls the degree of Exploration (randomly explore the space) versus Exploitation (exploit the local gradient) in the hill climbing algorithm
Function with Local Maxima
Another view
BIIShillClimbing2.m
Hill Climbing with Random Restarts
• Extreme Exploration – random search
• Extreme Exploitation – very small r
• Combination:
  • Randomly select a starting place x
  • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best'
  • At the end of that time, randomly select a new starting point x
  • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best'
  • Repeat until a solution is found
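A hedged MATLAB sketch of random restarts, reusing the assumed initialize/assess/modify handles from the earlier sketch (measuring the random time budget in iterations is an assumption for illustration):

best = [];  bestQ = -Inf;
for restart = 1:20                      % each pass is one random restart
    S = initialize();                   % randomly select a starting place
    T = randi([50 500]);                % random amount of time (iterations)
    for iter = 1:T                      % small-r hill climbing
        R = modify(S);
        if assess(R) > assess(S), S = R; end
    end
    if assess(S) > bestQ                % save the result if it is the best so far
        best = S;  bestQ = assess(S);
    end
end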
Effect of Time Interval
• If the random time interval is long, the algorithm effectively becomes a Hill Climbing algorithm
• If the random time interval is short, the algorithm effectively becomes a random search
• The random time interval drives the algorithm from one extreme to the other
• Which is best? It depends…
Random Restarts
>> BIIShillClimbing3
[Figure: annotated runs contrasting random search and hill climbing; some random-search steps lead away from the maximum]
Previously, we required a bounded uniform distribution; the range of values was specified. A Gaussian distribution usually generates small numbers, but numbers of any magnitude are possible. Large numbers result in exploration.
GAUSSIAN DISTRIBUTION
>> g1(1:5)
ans =
    0.0280
   -0.1634
   -0.1019
    1.0370     (note the occasional larger value)
    0.1884

PREVIOUSLY, UNIFORM
>> u1(1:5)
ans =
   -0.0476
   -0.0297
    0.0526
   -0.0639
   -0.1000

>> low = 10; high = 20; v = low + (high-low).*rand(5,1)
v =
   14.7211
   11.7077
   17.7977
   19.9797
   16.2644
Modified v:
>> v = v + u1(1:5)
v =
   14.6735
   11.6780
   17.8503
   19.9158
   16.1644
>> v = v + g1(1:5)
v =
   14.7491
   11.5443
   17.6958
   21.0167
   16.4528
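A minimal sketch of generating such Gaussian noise in MATLAB (the standard deviation 0.2 is an assumption for illustration):

sigma = 0.2;              % small standard deviation: mostly small tweaks
g1 = sigma*randn(5,1);    % Gaussian noise; values of any magnitude are possible
v  = v + g1;              % occasional large draws provide exploration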
Simulated Annealing
• Differs from Hill Climbing in its decision about when to replace the original individual (the parent S) with the modified individual (the child R)
• In Hill Climbing, check whether the modified individual is better; if it is, replace the original
• In simulated annealing, if the child is better, replace the parent
• If the child is NOT better, still replace the parent with the child with a certain probability P(t,R,S):
• P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )
P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )
• Recall, R is worse than S
• First, t ≥ 0
• (Quality(R) - Quality(S)) is negative
• If R is much worse than S, the exponent is a large negative number and the probability is close to 0
• If R is very close to S, the exponent is close to 0, so the probability is close to 1 and we will select R with reasonable probability
• t is selectable: as t approaches 0, the exponent becomes a large negative number and the probability approaches 0
• If t is large, the exponent is close to 0 and the probability is close to 1
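A hedged MATLAB sketch of this acceptance rule, reusing the assumed handles from earlier (the cooling schedule is an assumption; assess follows the slides' higher-is-better Quality convention):

S = initialize();  t = 1.0;                        % initial parent and temperature
for iter = 1:1000
    R = modify(S);                                 % child: perturbed copy of the parent
    if assess(R) > assess(S)                       % child is better: always replace
        S = R;
    elseif rand < exp((assess(R) - assess(S))/t)   % worse child: accept with P(t,R,S)
        S = R;
    end
    t = 0.99*t;                                    % cool: t shrinks toward 0 over time
end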