Stochastic Optimization and Simulated Annealing. Psychology 85-419/719, January 25, 2001
In Previous Lecture... • Discussed constraint satisfaction networks, having: • Units, weights, and a “goodness” function • Updating states involves computing input from other units • Guaranteed to locally increase goodness • Not guaranteed to globally increase goodness
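To make the recap concrete, here is a minimal Python sketch of such a network. The quadratic goodness function and the greedy update rule follow the standard formulation; the function names, and the assumption of symmetric weights with a zero diagonal, are my own choices rather than anything specified in the lecture.

```python
import numpy as np

def goodness(state, W, bias):
    """Quadratic goodness of a binary state vector: pairwise terms w_ij * s_i * s_j
    plus bias terms (the 0.5 corrects for counting each pair twice)."""
    return 0.5 * state @ W @ state + bias @ state

def greedy_update(state, W, bias, rng=np.random.default_rng()):
    """Deterministic update: each unit computes its net input from the other units
    and turns on iff that input is positive. With symmetric weights and a zero
    diagonal, this never decreases goodness, so it climbs to a *local* optimum."""
    for i in rng.permutation(len(state)):
        net = W[i] @ state + bias[i]
        state[i] = 1.0 if net > 0 else 0.0
    return state
```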
The General Problem: Local Optima • [Figure: goodness plotted against activation state; hill climbing can get stuck at a local optimum instead of reaching the true optimum]
How To Solve the Problem of Local Optima? • Exhaustive search? • Nah. Takes too long: a network of n binary units has 2 to the nth power possible states • Random re-starts? • Seems wasteful. • How about something that generally goes in the right direction, with some randomness?
Sometimes It Isn’t Best To Always Go Straight Towards The Goal • Rubik’s Cube: Undo some moves in order to make progress • Baseball: sacrifice fly • Navigation: move away from goal, to get around obstacles
Randomness Can Help Us Escape Bad Solutions • [Figure: goodness landscape over activation state; a random jump can knock the state out of a shallow local optimum]
So, How Random Do We Want to Be? • We can take a cue from physical systems • In metallurgy, a metal can reach a very strong (stable) state by: • Melting it, which scrambles its molecular structure • Gradually cooling it • The resulting molecular structure is very stable • New terminology: reduce energy (which is roughly the negative of goodness)
Simulated Annealing • The odds that a unit is on are a function of: • The input to the unit, net • The temperature, T • Probability that the output is 1: p = 1 / (1 + e^(-net/T))
Picking it Apart... • As net increases, the probability that the output is 1 increases • e is raised to the negative of net/T; so as net gets big, e^(-net/T) goes to zero, and the probability goes to 1/(1+0) = 1.
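Here is a minimal sketch of this rule in code, reusing the toy network layout from the earlier sketch. The function names are mine, and the unit-by-unit sweep is one common way of applying the rule, not necessarily the exact procedure used in the lecture.

```python
import numpy as np

def p_on(net, T):
    """Probability that a unit's output is 1: 1 / (1 + e^(-net/T))."""
    return 1.0 / (1.0 + np.exp(-net / T))

def stochastic_update(state, W, bias, T, rng=np.random.default_rng()):
    """Like the greedy update, but probabilistic: a strongly positive net input
    makes the unit almost certainly turn on, a strongly negative one almost
    certainly turn off, and a high temperature T flattens both toward a coin flip."""
    for i in rng.permutation(len(state)):
        net = W[i] @ state + bias[i]
        state[i] = 1.0 if rng.random() < p_on(net, T) else 0.0
    return state
```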
The Temperature Term • When T is big, the exponent for e goes to zero • e (or any nonzero number) to the zero power is 1 • So, the odds that the output is 1 go to 1/(1+1) = 0.5
The Temperature Term (2) • When T gets small, exponent gets big. • Effect of net becomes amplified.
Different Temperatures... • [Figure: probability that the output is 1 (from 0 to 1) plotted against net input for low, medium, and high temperatures; the lower the temperature, the steeper the curve]
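The figure's point can be checked numerically with the p_on sketch above. The three temperature values here are arbitrary illustrative choices, not values from the lecture.

```python
# Same net inputs, pushed through p_on at low, medium, and high temperature.
for T in (0.2, 1.0, 5.0):
    probs = [f"{p_on(net, T):.2f}" for net in (-4, -1, 0, 1, 4)]
    print(f"T = {T}:", probs)
# Low T: probabilities hug 0 and 1 (nearly a step function of net input).
# High T: probabilities are pulled toward 0.5, so net input matters much less.
```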
OK, So At What Rate Do We Reduce Temperature? • [Figure: an annealing schedule, temperature T falling over the course of the run] • In general, we must decrease it very slowly to guarantee convergence to the global optimum • In practice, we can get away with a more aggressive annealing schedule...
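One common "aggressive" schedule is geometric cooling, sketched below on top of the stochastic_update sketch above. The starting temperature, floor, and cooling factor are illustrative guesses rather than course values; the theoretically safe schedules that guarantee the global optimum cool far more slowly (on the order of T proportional to 1/log of the number of updates).

```python
def anneal(state, W, bias, T_start=10.0, T_min=0.1, cooling=0.95):
    """Geometric cooling: run one full sweep of stochastic updates, then
    multiply the temperature by a constant factor, until it reaches a floor."""
    T = T_start
    while T > T_min:
        state = stochastic_update(state, W, bias, T)
        T *= cooling
    return state
```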
Putting it Together... • We can represent facts, etc. as units • Knowledge about these facts is encoded as weights • Network processing fills in gaps, makes inferences, forms interpretations • Stable attractors form; the weights and the input sculpt these attractors • Stability (and goodness) is enhanced by randomness in the updating process (a toy run is sketched below)
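To tie these pieces together, here is a toy pattern-completion run, entirely my own construction, reusing the anneal and stochastic_update sketches above: one binary "fact" is stored in the weights with a simple Hebbian-style outer product, part of it is corrupted, and annealed updating pulls the state back toward the stored attractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store one binary pattern in the weights (Hebbian-style outer product).
pattern = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=float)
signed = 2 * pattern - 1                  # map {0,1} -> {-1,+1} for the weight rule
W = np.outer(signed, signed)
np.fill_diagonal(W, 0.0)                  # no self-connections
bias = np.zeros(len(pattern))

# Corrupt part of the pattern, as if some of the input were missing or noisy.
probe = pattern.copy()
probe[2:5] = rng.integers(0, 2, size=3)

# A modest starting temperature preserves the surviving units' influence; the
# state usually settles back into the stored pattern (its complement is also
# an attractor of these weights, so occasionally it lands there instead).
completed = anneal(probe, W, bias, T_start=1.0, T_min=0.05, cooling=0.9)
print("stored   :", pattern.astype(int))
print("probe    :", probe.astype(int))
print("completed:", completed.astype(int))
```

The randomness in the updates is what lets a probe that starts near a poor local optimum escape and settle into a deeper attractor sculpted by the weights and the input.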
Stable Attractors Can Be Thought Of As Memories • How many stable patterns can be remembered by a network with N units? • There are 2 to the N possible patterns… • … but only about 0.15*N will be stable • To remember 100 things, we need 100/0.15 ≈ 667 units! • (then again, the brain has about 10 to the 12th power neurons…)
Human Performance, When Damaged (some examples) • Category coordinate errors • Naming a CAT as a DOG • Superordinate errors • Naming a CAT as an ANIMAL • Visual errors (deep dyslexics) • Naming SYMPATHY as SYMPHONY • or, naming SYMPATHY as ORCHESTRA
The Attractors We’ve Talked About Can Be Useful In Understanding This • [Figure: attractor basins for word patterns; under normal performance the input “CAT” settles to CAT, while a visual error sends “CAT” to the nearby attractor COT] • (see Plaut, Hinton, Shallice)
Properties of Human Memory • Details tend to go first, more general things next. Not all-or-nothing forgetting. • Things tend to be forgotten, based on • Salience • Recency • Complexity • Age of acquisition?
Do These Networks Have These Properties? • Sort of. • Graceful degradation. Features vanish as a function of strength of input to them. • Complexity: more complex / arbitrary patterns can be more difficult to retain • Salience, recency, age of acquisition? • Depends on learning rule. Stay tuned
Next Time: Psychological Implications: The IAC Model of Word Perception • Optional reading: McClelland and Rumelhart ‘81 (handout) • Rest of this class: Lab session. Help installing software, help with homework.