Recurrent Networks Psych 419/719 Feb 22, 2001
Extending Backprop to Recurrent Networks • Consider the activation of units at a given time t. • Instead of applying the backprop equations to previous unit activity in space (e.g., the previous layer), apply them to previous activity in time.
“Unrolling” a Network • [Figure: units a and b connected by weights Waa, Wab, Wba, and Wbb, with the same weights copied at each time slice t0, t1, t2, ..., tn of the unrolled network]
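To make the unrolling concrete, here is a minimal Python/NumPy sketch (not from the course; the weight values and inputs are made up) of two units a and b whose activity at each time slice is computed from their activity at the previous time slice through the same four weights Waa, Wab, Wba, Wbb:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Recurrent weights between the two units a and b (hypothetical values).
# W[i, j] is the weight from unit j at time t-1 to unit i at time t.
W = np.array([[0.5, -0.3],   # Waa, Wab
              [0.8,  0.2]])  # Wba, Wbb

# External input to each unit at time samples t0..t3 (hypothetical values).
inputs = np.array([[1.0, 0.0],
                   [0.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])

# "Unroll" the network: the same weights W apply at every step,
# but activity at time t depends on activity at time t-1.
act = np.zeros(2)            # activations of [a, b] before t0
for t, x in enumerate(inputs):
    act = sigmoid(W @ act + x)
    print(f"t{t}: a={act[0]:.3f}, b={act[1]:.3f}")
```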
The Math is Similar to Normal Backprop • The gradient term at time t is the sum of the error injected at time t (the difference between output and target at time t) and the error propagated back to that unit from the next time slice of the unrolled network. • Some units still have no targets, or have targets only at some time samples. • Some units have both injected targets and error propagated backwards.
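A rough sketch of the corresponding gradient computation, continuing the two-unit example above (hypothetical targets; NaN marks time samples with no target). The key line adds the error injected at time t to the error arriving from the next time slice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W = np.array([[0.5, -0.3],
              [0.8,  0.2]])
inputs  = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
targets = np.array([[np.nan, np.nan],   # no target at t0
                    [0.0, 1.0],
                    [1.0, 0.0],
                    [1.0, 1.0]])

# Forward pass: store the activations at every time slice.
acts = [np.zeros(2)]                     # state before t0
for x in inputs:
    acts.append(sigmoid(W @ acts[-1] + x))

# Backward pass through the unrolled network.
dW = np.zeros_like(W)
delta_next = np.zeros(2)                 # dE/dnet at time t+1 (none after the last step)
for t in reversed(range(len(inputs))):
    a = acts[t + 1]                      # activation at time t
    err = np.where(np.isnan(targets[t]), 0.0, a - targets[t])  # skip missing targets
    dE_da = err + W.T @ delta_next       # injected error + error from the next time slice
    delta = dE_da * a * (1.0 - a)        # through the sigmoid derivative
    dW += np.outer(delta, acts[t])       # gradient w.r.t. the shared weights
    delta_next = delta

W -= 0.1 * dW                            # one gradient-descent step (learning rate 0.1)
```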
Useful Applications • Time-varying behavior • Pressure for speed: inject error on early samples. • Attractor networks: settle to a stable state. • Maintaining a “memory” of events: activity is a result of the current input and of computations on previous inputs.
Time-Varying Behavior • Can build oscillators • Rhythmic behavior: walking, chewing… • Sequences of actions • Motor plans: reaching for a cup of coffee, writing, speech… • Higher-level plans: making a cup of coffee, going to a movie...
Inputs for Time-Varying Behavior • Can be static (e.g., a plan) • Input is some code meaning “make a cup of coffee” • Output: t0 get cup; t1 grind coffee; t2 get water • Can instead make the input the last step completed • Input: get cup; output: grind coffee • Input: grind coffee; output: get water….
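As a toy illustration (the task and step names are placeholders, not the course's stimuli), the two input schemes amount to building the training pairs differently:

```python
# Hypothetical "make coffee" task encoded two ways.
steps = ["get cup", "grind coffee", "get water"]

# Scheme 1: static "plan" input; the target changes at each time slice.
plan_trial = {
    "input":  ["make coffee"] * len(steps),   # same input on every time step
    "target": steps,                          # t0: get cup, t1: grind coffee, t2: get water
}

# Scheme 2: the input at each step is the step just completed;
# the target is the next step in the sequence.
chained_trials = [
    {"input": "make coffee",  "target": "get cup"},
    {"input": "get cup",      "target": "grind coffee"},
    {"input": "grind coffee", "target": "get water"},
]
```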
Pressure for Speed • Suppose the network is run for 10 time samples • Inject error on samples 2-10. • The network is penalized (gets error) not only for producing the wrong answer, but for not producing the right answer rapidly • Works well with frequency-weighted stimuli: • Frequent items are strongly pressured to go faster
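A minimal sketch of how such an error signal might be computed (the outputs, target, and frequency weight are hypothetical): error is summed over samples 2-10 only, then scaled by an item-frequency weight.

```python
import numpy as np

n_samples = 10

# Hypothetical outputs for one item over 10 time samples (each row is one sample).
outputs = np.random.rand(n_samples, 3)
target  = np.array([0.0, 1.0, 0.0])          # the single correct answer
frequency_weight = 2.5                        # e.g., a high-frequency word

# Inject error on samples 2-10: at every one of those samples the network
# is penalized for not yet producing the right answer.
error = 0.0
for t in range(1, n_samples):                 # samples 2..10 (0-indexed 1..9)
    error += 0.5 * np.sum((outputs[t] - target) ** 2)

# Frequency weighting: frequent items contribute more error,
# so they are under more pressure to reach the target quickly.
error *= frequency_weight
```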
Example • [Figure: error plotted against time for the words “the”, “dog”, and “anvil”]
Attractor Networks: Some Background (Autoencoder) • Input and output representations have the same semantics • Train the network to reproduce its input on its output • Hidden units compress the representation • Can do pattern completion and repair noisy or incomplete data
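A minimal autoencoder sketch, using random untrained weights just to show the shapes involved: the hidden layer is smaller than the input, and the network is scored on how well the output reproduces the input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hidden = 8, 3         # hidden layer smaller than input: compression
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.5, size=(n_hidden, n_in))   # input  -> hidden
W_dec = rng.normal(scale=0.5, size=(n_in, n_hidden))   # hidden -> output

pattern = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)

# The training signal: how far the output is from reproducing the input.
hidden = sigmoid(W_enc @ pattern)
output = sigmoid(W_dec @ hidden)
reconstruction_error = np.sum((output - pattern) ** 2)
```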
Attractor Networks: A Recurrent Implementation • [Figure: a Representation layer recurrently connected to “cleanup” units] • Can repair partial patterns over time
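One possible settling loop, sketched with random (untrained) weights purely to show the dynamics: activity cycles between the representation layer and the cleanup units until it stops changing. In a trained network, that stable state would be the repaired pattern.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n_rep, n_cleanup = 6, 4
W_rc = rng.normal(scale=0.5, size=(n_cleanup, n_rep))  # representation -> cleanup
W_cr = rng.normal(scale=0.5, size=(n_rep, n_cleanup))  # cleanup -> representation

# A partial (noisy or incomplete) pattern placed on the representation layer.
rep = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 1.0])

# Settle: cycle activity through the cleanup units until the
# representation reaches a stable state (an attractor).
for step in range(50):
    cleanup = sigmoid(W_rc @ rep)
    new_rep = sigmoid(W_cr @ cleanup)
    if np.max(np.abs(new_rep - rep)) < 1e-4:
        break
    rep = new_rep
```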
Attractor Networks Usually Used in Larger Networks • Plaut & Shallice (1991) deep dyslexia model: attractor in semantic space • Damage to attractor -> semantic errors • Harm & Seidenberg (1999) phonological dyslexia model: attractor in phonological space • Damage to attractor -> errors in generalization
“Memory” of Sequential Events • The “Elman” network • For sentences: present words one at a time; the target at each step is the next word. • Has “context” units, which are copies of the hidden unit activity on the previous time step • Learns to predict what kinds of words can follow, based on the sequence seen so far • [Figure: “Current Input” and “context” units feed the hidden layer, which produces the “Prediction”]
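A minimal sketch of an Elman-style forward pass (toy vocabulary and random weights, not the course's implementation): the context units are simply a copy of the previous step's hidden activity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

vocab = ["the", "dog", "chases", "cat", "."]   # toy vocabulary
n_vocab, n_hidden = len(vocab), 5
rng = np.random.default_rng(2)
W_in  = rng.normal(scale=0.5, size=(n_hidden, n_vocab))   # current input -> hidden
W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context -> hidden
W_out = rng.normal(scale=0.5, size=(n_vocab, n_hidden))   # hidden -> prediction

sentence = ["the", "dog", "chases", "the", "cat"]
context = np.zeros(n_hidden)              # context units start empty

# Present words one at a time; the target at each step is the next word.
for t, word in enumerate(sentence[:-1]):
    x = np.zeros(n_vocab)
    x[vocab.index(word)] = 1.0            # one-hot code for the current input
    hidden = sigmoid(W_in @ x + W_ctx @ context)
    prediction = softmax(W_out @ hidden)  # distribution over possible next words
    target = sentence[t + 1]
    error = -np.log(prediction[vocab.index(target)])  # prediction error at this step
    context = hidden.copy()               # context = copy of hidden activity for next step
```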
Extension to Continuous Time • Activity changes slowly in response to input, not instantly. • Approximate continuous time by taking small discrete samples • [Figure: input and output plotted against time]
Two Ways to do Continuous Time • Time-averaged outputs: the output of a unit is a weighted sum of its previous output and the value it is being driven to. • Time-averaged inputs: the effective input to a unit is a weighted sum of its previous effective input and the value it is being driven to. The output is the instantaneous result of the current effective input.
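Both schemes, sketched for a single unit driven by a constant net input (the step size dt and the input value are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dt  = 0.2          # size of each discrete sample (smaller = closer to continuous time)
net = 2.0          # constant net input driving the unit (hypothetical)

# Scheme 1: time-averaged outputs.
# The output is a weighted sum of the previous output
# and the value the unit is being driven to.
out = 0.0
for _ in range(20):
    out = (1 - dt) * out + dt * sigmoid(net)

# Scheme 2: time-averaged inputs.
# The effective input is time-averaged; the output is the
# instantaneous result of the current effective input.
eff_in = 0.0
for _ in range(20):
    eff_in = (1 - dt) * eff_in + dt * net
    out2 = sigmoid(eff_in)
```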