
Biologically Inspired Intelligent Systems


Presentation Transcript


  1. Biologically Inspired Intelligent Systems Lecture 10 Dr. Roger S. Gaborski

  2. WEEKLY UPDATES • PRESENTATIONS START Tuesday, 10/25 • Updates continue even after you present • Final project material due 11/8 (no exceptions) • Random selection • Volunteers?

  3. Free Textbook • Essentials of Metaheuristics • Sean Luke, George Mason University • Available online at no cost: http://cs.gmu.edu/~sean/book/metaheuristics/ • Some lecture material is taken from pp. 1-49

  4. General Evolution • Darwin, 1859: Charles Darwin proposed a model explaining evolution

  5. Principles of Evolution • Diversity of the population is critical for adaptability to the environment: the population is a heterogeneous collection of structures, and individual phenotypes express different traits. • Populations evolve over many generations. Reproduction of a specific individual depends on the specific conditions of the environment and the organism's ability to survive and produce offspring. • Offspring inherit their parents' fitness, but the genetic material of the parents merges, and mutation results in slight variations.

  6. Learning • Rote Learning: No inference, direct implantation of knowledge • Learning by Instruction: Knowledge acquired from a teacher or organized source • Learning by Deduction: Deductive, truth-preserving inferences and memorization of useful conclusions • Learning by Analogy: Transformation of existing knowledge that bears similarity to the new desired concept • Learning by Induction: • Learning from examples (concept acquisition): Based on a set of examples and counterexamples, induce a general concept description that explains the examples and counterexamples. • Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining observations, without a teacher (Michalski, Carbonell, Mitchell, …)

  7. Evolutionary Algorithms • Inductive learning by observation and discovery • No teacher exists who presents examples; the system develops examples on its own • Creation of new examples (search points) by the algorithm is an inductive guess on the basis of existing knowledge • The population serves as the knowledge base • If a new example is good, it is added to the population (knowledge is added to the knowledge base)

  8. Fitness • Evaluate every solution in the population and determine its fitness • Fitness is a measure of how closely the solution matches the problem's objective • Fitness is calculated by a fitness function • Fitness functions are problem dependent • Here, fitness values are usually positive, with zero being a perfect score (the larger the fitness value, the worse the solution)
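As an illustrative sketch (not from the lecture), a fitness function of this error-style kind can be as simple as the total absolute error between a candidate's outputs and a target vector, so zero is a perfect score; the function and argument names here are hypothetical:

% Hypothetical error-style fitness function: 0 is perfect, larger is worse
function fitness = evalFitness(outputs, targets)
    fitness = sum(abs(targets - outputs));   % total absolute error
end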

  9. [Figure: a small feed-forward network with inputs In1 and In2, a +1 bias, and output y] If we know the structure, can we evolve the weights? There are 7 weights; the solution is a point in a 7-dimensional space

  10. Example Problem • Write a Matlab function that accepts a weight matrix W, two input vectors, and a target vector. The function should return the total error value • Implement XOR with a known weight matrix W • Use the following variables: • Weight matrix: W, size 2x5 • Input values: Iin1 = [1 1 0 0] and Iin2 = [1 0 1 0] • Target outputs: Y = [0 1 1 0] • 2 neurons with outputs x1 and x2

  11. Problem, continued • When the program executes with the correct W matrix, performance should be similar to the example given in the previous lecture. • W = [-2.19 -2.20 .139 0 0; -2.81 -2.70 3.90 -31.8 0] • Use a sigmoid function, not tanh: 1/(1+exp(-5*x)); the 5 controls the slope • Total error should be approximately 0 • Test on AND and OR target vectors – are the results correct (using the given W matrix)? • Experiment with random W matrices (but with 0's in the same locations) • Can you find another weight matrix that solves the problem?
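A minimal sketch of the requested function (not the official solution): it assumes each row of W is laid out as [weight from Iin1, weight from Iin2, bias weight, weight from x1, weight from x2], which is consistent with the zero pattern of the given W:

% Sketch of the XOR error function under the assumed weight layout
% W(i,:) = [w_Iin1, w_Iin2, w_bias, w_from_x1, w_from_x2]
function totalError = xorError(W, Iin1, Iin2, Y)
    sig = @(x) 1 ./ (1 + exp(-5*x));                 % sigmoid, slope 5
    x1 = sig(W(1,1)*Iin1 + W(1,2)*Iin2 + W(1,3));    % neuron 1
    x2 = sig(W(2,1)*Iin1 + W(2,2)*Iin2 + W(2,3) + W(2,4)*x1);  % neuron 2 = output
    totalError = sum(abs(Y - x2));                   % summed over all 4 patterns
end

With the given W, Iin1 = [1 1 0 0], Iin2 = [1 0 1 0], and Y = [0 1 1 0], this returns a total error close to 0.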

  12. Gradient Ascent or Gradient Descent • Find the maximum (or minimum) of a function http://en.wikipedia.org/wiki/File:Gradient_ascent_%28surface%29.png http://en.wikipedia.org/wiki/Gradient_descent

  13. Gradient Ascent • Find the maximum of a function • Start with an arbitrary value x • Add to this value a small fraction of the slope: x = x + a*f'(x), where a < 1 and f'(x) is the derivative • Positive slope: x increases • Negative slope: x decreases • x will continue towards the maximum; at the maximum of f(x) the slope is zero and x will no longer change
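A minimal Matlab sketch of this update rule; the objective f(x) = -(x-3)^2 + 9 (maximum at x = 3) and the step size are made up for illustration:

% Gradient ascent on an example function with a known maximum at x = 3
fprime = @(x) -2*(x - 3);    % derivative of f(x) = -(x-3)^2 + 9
a = 0.1;                     % small fraction of the slope, a < 1
x = 0;                       % arbitrary starting value
for k = 1:100
    x = x + a*fprime(x);     % positive slope raises x, negative slope lowers it
end
% x ends very close to 3, where the slope is zero
% (gradient descent: subtract instead, x = x - a*fprime(x))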

  14. Gradient Descent • Gradient descent is the same algorithm except the slope is subtracted to find the minimum of the function

  15. Issues • Convergence time – how large should 'a' be? Too large and we may overshoot the maximum; too small and we may run out of time before the maximum (or minimum) is found • Local maxima (minima) • Saddle points • Newton's Method: Use both the first and second derivatives: x = x - a*( f'(x)/f''(x) )
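Continuing the same illustrative example with a Newton-style step: near a maximum f''(x) < 0, so dividing by it means the subtraction still moves uphill:

% Newton's method on the example f(x) = -(x-3)^2 + 9
fprime  = @(x) -2*(x - 3);           % first derivative
fprime2 = @(x) -2;                   % second derivative (constant here)
a = 1; x = 0;                        % a = 1 gives the classic Newton step
for k = 1:10
    x = x - a*fprime(x)/fprime2(x);  % exact in one step for a quadratic
end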

  16. Gradient Ascent is a local optimization method • Instead of starting at only one x, randomly select several x's and apply gradient ascent at each x allowing a wider exploration of the solution space

  17. Multidimensional Functions • Replace the scalar x with a vector x • The slope f'(x) is now the gradient of f at x, ∇f(x) • The gradient is a vector where each element is the slope of f(x) along that dimension
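The same illustrative update in two dimensions, where the gradient supplies a slope per dimension (the example function is an assumption):

% Gradient ascent in 2-D on f(x) = -(x1-1)^2 - (x2+2)^2, maximum at [1; -2]
grad = @(x) [-2*(x(1) - 1); -2*(x(2) + 2)];  % gradient: slope in each dimension
a = 0.1; x = [0; 0];
for k = 1:200
    x = x + a*grad(x);                       % same rule, applied element-wise
end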

  18. We are assuming that we can calculate the first and possibly the second derivative in each dimension. • What if we can't? • We don't know what the function is • We can: • Create inputs • Test the inputs • Assess the results

  19. Metaheuristic Algorithm • Initialization Procedure: Provide one or more initial candidate solutions • Assessment Procedure: Assess the quality of a solution • Modification Procedure: Make a copy of a candidate solution and produce a candidate that is slightly, randomly different from it (derivatives are not calculated)

  20. Hill-Climbing • Somehow create an initial candidate solution • Randomly modify the candidate solution • If the modified candidate is better than the initial solution, replace the initial candidate with the modified candidate • Continue this process until a solution is found or time runs out
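A sketch of this loop in Matlab, reusing the bounded uniform modification described later on slides 23-25; the quality function, bounds, and iteration budget are illustrative assumptions:

% Basic hill climbing over a real-valued vector
quality = @(v) -sum((v - 0.5).^2);    % illustrative objective, higher is better
L = 5; low = 0; high = 1; r = 0.1;    % vector length, bounds, mutation range
s = low + (high - low).*rand(L,1);    % initial candidate solution
for iter = 1:10000
    c = s + (2*rand(L,1) - 1)*r;      % randomly modified copy
    c = min(max(c, low), high);       % keep the modification within bounds
    if quality(c) > quality(s)        % better than the current candidate?
        s = c;                        % replace it
    end
end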

  21. Variations on Basic Hill Climbing • Make several modifications to the candidate instead of a single modification • Always keep the modified candidate (don't compare), but keep a separate variable called 'best' that always retains the best discovered solution – at the end of the program, return 'best'

  22. Candidate Solutions • What does a solution candidate look like? • A single number • A vector of numbers (or a matrix) • A tree or graph • Etc. • Here, assume a fixed-length vector containing real numbers

  23. Individual Creation • Assume vector length L and a range of valid entries, low and high • Use a random number generator (uniform distribution) and scale values to be between low and high: >> low = 10; high = 20; >> v = low + (high-low).*rand(5,1) v = 18.2415 12.1823 10.9964 16.1951 11.0381

  24. Check Range: >> low = 10; >> high = 20; >> test = low + (high-low).*rand(10000,1); >> min(test) ans = 10.0012 >> max(test) ans = 19.9991

  25. Modification of Individual • Add a small amount of uniformly distributed random noise to each component of vector v: u = (2*rand(10000,1))-1; % u ranges from -1 to +1 • Simply scale u to the desired range r. For a desired range of -r to r, let r = .1, giving -.1 to +.1: >> u1 = r*u; >> min(u1) ans = -0.1000 >> max(u1) ans = 0.1000 • v(i) = v(i) + u1(i); check that v(i) is within the bounds low, high

  26. >> u1(1:5) ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000 >> low = 10; high = 20; v = low + (high-low).*rand(5,1) v = 14.7211 11.7077 17.7977 19.9797 16.2644 Modified v: >> v = v + u1(1:5) v = 14.6735 11.6780 17.8503 19.9158 16.1644

  27. Effects of Range Value • If r is very small, hill climbing will explore only the local region and can get caught at a local optimum. • If r is very large, hill climbing will bounce around, and if it is near the peak of the function it may miss it, because it may overshoot the peak. • r controls the degree of Exploration (randomly explore the space) versus Exploitation (exploit the local gradient) in the hill climbing algorithm

  28. Hill Climbing with Random Restarts • Extreme Exploration – random search • Extreme Exploitation – very small r • Combination: • Randomly select a starting place x • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best' so far • At the end of that time, randomly select a new starting point x • Using small r, perform hill climbing for a random amount of time; save the result if it is the 'Best' so far • Repeat until a solution is found
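A sketch combining the two extremes as the slide describes; the quality function and the time budget per restart are illustrative assumptions:

% Hill climbing with random restarts
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.01;            % small r: exploitation
best = []; bestQ = -Inf;
for restart = 1:20
    s = low + (high - low).*rand(L,1);         % random starting place x
    T = randi([100 1000]);                     % random amount of time
    for iter = 1:T
        c = min(max(s + (2*rand(L,1)-1)*r, low), high);
        if quality(c) > quality(s), s = c; end
    end
    if quality(s) > bestQ, best = s; bestQ = quality(s); end  % save if 'Best'
end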

  29. Effect of Time Interval • If the random time interval is long, the algorithm effectively becomes a Hill Climbing algorithm • If the random time interval is short, the algorithm effectively becomes a random search • The random time interval drives the algorithm from one extreme to the other • Which is best? It depends…

  30. [Figure]

  31. [Figure: a function landscape with regions annotated 'random search', 'hill climbing', and 'hill climbing leads away from maximum']

  32. Previously, we required a bounded uniform distribution. The range of values was specified. A Gaussian distribution usually generates small numbers, but numbers of any magnitude are possible. Large numbers result in exploration

  33. GAUSSIAN DISTRIBUTION (note the occasional larger values): >> g1(1:5) ans = 0.0280 -0.1634 -0.1019 1.0370 0.1884 PREVIOUSLY, UNIFORM: >> u1(1:5) ans = -0.0476 -0.0297 0.0526 -0.0639 -0.1000 >> low = 10; high = 20; v = low + (high-low).*rand(5,1) v = 14.7211 11.7077 17.7977 19.9797 16.2644 Modified v: >> v = v + u1(1:5) v = 14.6735 11.6780 17.8503 19.9158 16.1644 >> v = v + g1(1:5) v = 14.7491 11.5443 17.6958 21.0167 16.4528
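A sketch of Gaussian mutation in this style; the scale sigma is an assumption, since the slide does not say how g1 was generated:

% Gaussian mutation: mostly small perturbations, occasional large jumps
low = 10; high = 20; sigma = 0.2;     % sigma is an assumed scale
v  = low + (high - low).*rand(5,1);   % individual, as on slide 23
g1 = sigma*randn(5,1);                % normally distributed noise
v  = v + g1;                          % mutate
v  = min(max(v, low), high);          % optionally clamp back into bounds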

  34. Simulated Annealing • Differs from Hill Climbing in its decision of when to replace the original individual (the parent S) with the modified individual (the child R) • In Hill Climbing, check if the modified individual is better; if it is, replace the original • In simulated annealing, if the child is better, replace the parent • If the child is NOT better, still replace the parent with the child with a certain probability P(t,R,S): • P(t,R,S) = exp( (Quality(R) - Quality(S)) / t )

  35. P(t,R,S) = exp( (Quality(R) - Quality(S)) / t ) • Recall, R is worse than S • First, t ≥ 0 • (Quality(R) – Quality(S)) is negative • If R is much worse than S, the fraction is a large negative number, and the probability is close to 0 • If R is very close to S, the fraction is close to 0, the probability is close to 1, and we will select R with reasonable probability • t is selectable: with t close to 0, the fraction is a large negative number and the probability is close to 0 • If t is large, the probability is close to 1

  36. Example • R (child) = 5, S (parent) = 8, t = 2 • P(t,R,S) = exp((R-S)/t) = 0.2231 • Raise t to t = 8: • P(t,R,S) = exp((R-S)/t) = 0.6873 • The probability of replacing S with R increases when t increases • Initially set t high, causing the algorithm to move to the newly created solution even if it is worse than the current position (a random walk) • Slowly decrease t as the algorithm proceeds, eventually to zero (then it's simple Hill Climbing)
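Putting slides 34-36 together as a sketch; the cooling schedule, quality function, and bounds are illustrative assumptions:

% Simulated annealing: accept a worse child with probability exp(dQ/t)
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.1;
S = low + (high - low).*rand(L,1);     % parent
t = 10;                                % start t high: near-random walk
while t > 1e-3
    R = min(max(S + (2*rand(L,1)-1)*r, low), high);   % child
    if quality(R) > quality(S) || rand < exp((quality(R) - quality(S))/t)
        S = R;                         % better, or worse but accepted by chance
    end
    t = 0.999*t;                       % slowly decrease t (the schedule)
end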

  37. Schedule • The rate at which we decrease t is called the algorithm's schedule • The longer the schedule is, the longer the algorithm resembles a random walk and the more exploration it does

  38. Tabu Search • Keep a history of recently considered candidate solutions (the tabu list) • Do not return to solutions on the tabu list until they are sufficiently far in the past • Keep a list of previous candidates of length k; after the list is full, remove the oldest candidates as new candidates are added • Tabu Search operates in discrete spaces

  39. Tabu Search – real-valued numbers? • It is unlikely you will visit the same real-valued location twice • Consider a candidate to be on the list if it is sufficiently similar to a member of the list • A similarity measure needs to be determined
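A sketch of this real-valued variant; the list length k and the similarity threshold are assumptions:

% Tabu search for real-valued vectors: a child counts as 'on the list'
% if it is within distance epsSim of any recent candidate
quality = @(v) -sum((v - 0.5).^2);
L = 5; low = 0; high = 1; r = 0.1;
k = 50; epsSim = 0.05;                        % list length, similarity threshold
s = low + (high - low).*rand(L,1);
tabu = s;                                     % columns are recent candidates
for iter = 1:5000
    c = min(max(s + (2*rand(L,1)-1)*r, low), high);
    d = sqrt(sum((tabu - c).^2, 1));          % distance to each list member
    if all(d >= epsSim) && quality(c) > quality(s)
        s = c;
        tabu = [tabu, c];                     % add the new candidate
        if size(tabu,2) > k, tabu(:,1) = []; end  % drop the oldest
    end
end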

  40. Tabu List • Instead of candidate solutions, keep a list of the changes you made to specific features

  41. Population Methods • Keep a collection of candidate solutions and not just a single candidate (as in Hill Climbing) • Candidate solutions interact

  42. Evolutionary Computation (EC) – A Set of Techniques • Based on population biology, genetics and evolution

  43. Evolutionary Algorithm (EA) • Generational algorithms – update the entire population once per iteration • Steady-state algorithms – update the population a few samples at a time • Common EAs include Genetic Algorithms (GA) and Evolution Strategies (ES) • Both generational and steady-state methods are used

  44. Common Terms (from S. Luke) • Individual – a candidate solution • Child and Parent – a child is a modified copy of the candidate solution (its parent) • Population – set of candidate solutions • Fitness – quality of a solution • Selection – picking individuals based on their fitness • Mutation – a simple modification to an individual • Recombination or Crossover – takes two parents and swaps sections, resulting in 2 children

  45. Terms, continued • Genotype or genome – an individual's data structure used during breeding • Chromosome – a genotype • Gene – a particular position in the chromosome • Phenotype – how the individual operates during fitness assessment • Generation – one cycle of fitness assessment and breeding

  46. Generational Evolutionary Computation Algorithm • First, construct an initial population • Iterate: • Assess the fitness of the individuals in the population • Use the fitness results to breed a new population of children • Join the parents and children to form the new population • Continue until a solution is found or time runs out

  47. How Do Algorithms Differ? • Breeding has two parts: • Select parents from the population • Modify them (mutation and/or recombination) to form children • Join operation: • Completely replace the parents with the children, or • Keep the fit parents and the fit children

  48. Evolution Strategies (ES) • Truncation Selection Method (TSM) • Uses only mutation • The simplest ES algorithm is the (µ,λ) algorithm • Population of λ individuals • ITERATE: • Find the fitness of all individuals • Delete all but the µ fittest individuals (TSM) • Each of the µ fittest individuals produces λ/µ children through mutation, resulting in λ new children • The children replace all the parents • Repeat a fixed number of times, or until the goal is met

  49. ES(5,20) • µ = 5 and λ = 20 • Find the 5 fittest individuals • Each individual produces λ/µ = 20/5 = 4 children through mutation • Total number of children: 4*5 = 20 • Replace all parents with the new children
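A sketch of the (µ,λ) loop, one concrete instance of the generational algorithm from slide 46; the quality function and mutation scale sigma are illustrative:

% ES(mu,lambda): truncation selection plus Gaussian mutation only
quality = @(v) -sum((v - 0.5).^2);            % illustrative fitness, higher = better
mu = 5; lambda = 20; L = 5; sigma = 0.1;
pop = rand(L, lambda);                        % lambda individuals as columns
for gen = 1:100
    fit = arrayfun(@(i) quality(pop(:,i)), 1:lambda);  % assess all individuals
    [~, order] = sort(fit, 'descend');
    parents = pop(:, order(1:mu));            % delete all but the mu fittest
    pop = repmat(parents, 1, lambda/mu) + sigma*randn(L, lambda);  % lambda children
end
% ES(mu+lambda), next slide: use [parents, children] as the next generation instead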

  50. ES(µ+λ) • In ES(µ,λ), all the parents are replaced with the children in the next generation • In the ES(µ+λ) algorithm, the next generation consists of the µ parents plus the λ new children • The parents and children compete • All successive generations are µ+λ in size
