ICO Learning Gerhard Neumann Seminar A, SS06
Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields ([Kulvicius06])
Comparison of ISO Learning to Other Methods • Comparison for classical conditioning learning problems (open-loop control) • Relating RL to classical conditioning • Classical conditioning: the pairing of two subsequent stimuli is learned, such that the presentation of the first stimulus is taken as a predictor of the second one • RL: maximization of rewards: v(t) = Σ_{k≥0} γ^k r(t+k) • v … predictor of the future reward
RL for Classical Conditioning • TD error: δ(t) = r(t) + γ v(t) − v(t−1) • The term γ v(t) − v(t−1) acts as a (discounted) derivative of v • Weight change: Δw_i = μ δ(t) x_i(t−1) • => Nothing new so far… • Goal: after learning, the output v should react to the onset of the CS x_n and remain active until the reward terminates • Represent the CS internally by a chain of n+1 delayed pulses x_i • Replace the states of traditional RL with time steps
RL for Classical Conditioning • A special kind of eligibility trace • Serial compound representation • Learning proceeds in steps, ending in a rectangular response of v • No special treatment of the reward is necessary • x0 can replace the reward by setting w0 to 1 at the beginning
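A minimal Python sketch of this scheme; all parameter values and the pulse timing are illustrative assumptions, only the structure (pulse chain, TD rule) is from the slides.

```python
import numpy as np

# Toy TD learning with a serial-compound stimulus representation:
# the CS onset triggers a chain of n+1 delayed unit pulses x_0..x_n,
# so the "states" of traditional RL become time steps.
n, T = 10, 30               # chain length and trial length (assumed)
gamma, mu = 0.9, 0.1        # discount factor and learning rate (assumed)
cs_onset, reward_t = 5, 15  # the reward follows the CS onset (assumed)

def x(t):
    """Serial-compound input: x_i(t) = 1 exactly when t == cs_onset + i."""
    out = np.zeros(n + 1)
    i = t - cs_onset
    if 0 <= i <= n:
        out[i] = 1.0
    return out

w = np.zeros(n + 1)
for trial in range(200):
    for t in range(1, T):
        r = 1.0 if t == reward_t else 0.0
        delta = r + gamma * (w @ x(t)) - (w @ x(t - 1))  # TD error
        w += mu * delta * x(t - 1)                       # weight change

# After learning, v(t) = w @ x(t) rises at the CS onset and stays
# elevated until the reward arrives.
```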
Comparison for Classical Conditioning • Correlation-based learning: the "reward" x0 is not an independent term; it enters the output like any other input • TD learning: the reward r appears only in the TD error, as an independent term
Comparison for Classical Conditioning • TD learning: weights are changed along eligibility traces of the inputs • ISO learning: dw_i/dt = μ u_i dv/dt • Uses another form of trace: band-pass filters • Applied to all input pathways • -> also used for calculating the output, v = Σ_i w_i u_i
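As a sketch, such a band-pass trace can be implemented as a convolution with a damped-oscillator impulse response h(t) = e^(a·t)·sin(b·t); the concrete frequency and Q values below are assumptions.

```python
import numpy as np

# Band-pass "trace": convolve an input x with the impulse response of a
# damped oscillator (resonator). f, Q and the response length are assumed.
def resonator_trace(x, f=0.01, Q=0.6, length=500):
    t = np.arange(length)
    a = -np.pi * f / Q                        # decay rate (needs Q > 0.5)
    b = np.sqrt((2 * np.pi * f) ** 2 - a**2)  # oscillation frequency
    h = np.exp(a * t) * np.sin(b * t)         # impulse response
    return np.convolve(x, h)[: len(x)]        # filtered signal u = h * x

# ISO learning filters ALL pathways this way and computes the output
# from the filtered signals: v(t) = sum_i w_i * u_i(t).
```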
Comparison for the Closed Loop • Closed loop: actions of the agent affect its future sensory input • The comparison is no longer straightforward, because the behavior of the algorithms now differs considerably • Reward-Based Architectures • Actor-critic architecture • Use evaluative feedback • Reward maximization • A good reward signal is very often hard to find • In nature: found by evolution • Can in principle be applied to any learning problem • Resolution in the state space: • Only applicable to low-dimensional state spaces • -> Curse of dimensionality!
Comparison for the Closed Loop • Correlation-Based Architectures • Non-evaluative feedback, all signals are value-free • Minimize the disturbance • Valid regions are usually much bigger than for reward maximization • Better convergence!! • Restricted solutions • Evaluations are implicitly built into the sign of the reaction behavior • Actor and critic are the same architectural building block • Only for a restricted set of learning problems • Hard to apply to complex tasks • Resolution in time: • Only looks at the temporal correlation of the input variables • Can be applied to high-dimensional state spaces
Comparison of ISO Learning and STDP • ISO learning generically produces a bimodal weight-change curve • Similar to the STDP (spike-timing-dependent plasticity) weight-change curve • ISO learning as an STDP rule: • Potential at the synapse: a filtered version of a spike • Gradient-dependent model • STDP operates on a much faster time scale • Different kinds of synapses can easily be modeled with different filters
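A sketch of how the bimodal weight-change curve can be measured in the open loop (filter parameters and pulse timing are illustrative assumptions): two pulses are applied at a varying temporal offset T and the resulting ISO weight change is accumulated.

```python
import numpy as np

def bandpass(x, f=0.05, Q=0.7, length=200):
    """Damped-oscillator filter, as used for the ISO traces (assumed values)."""
    t = np.arange(length)
    a = -np.pi * f / Q
    b = np.sqrt((2 * np.pi * f) ** 2 - a**2)
    return np.convolve(x, np.exp(a * t) * np.sin(b * t))[: len(x)]

def weight_change(T, mu=0.01, N=400):
    """Total ISO weight change for two pulses separated by offset T."""
    x0, x1 = np.zeros(N), np.zeros(N)
    x0[N // 2] = 1.0                 # "post" pulse (reflex input)
    x1[N // 2 + T] = 1.0             # "pre" pulse, shifted by T
    u0, u1 = bandpass(x0), bandpass(x1)
    v = 1.0 * u0 + 0.0 * u1          # w0 = 1, w1 = 0 initially
    dv = np.diff(v, prepend=0.0)
    return mu * np.sum(u1 * dv)      # ISO rule: dw1/dt = mu * u1 * v'

curve = [weight_change(T) for T in range(-50, 51)]
# curve is positive when x1 precedes x0 (T < 0) and negative when it
# follows, mirroring the bimodal STDP weight-change curve.
```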
Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields ([Kulvicius06])
ICO (Input Correlation Only) Learning • Drawback of Hebbian learning: the auto-correlation can cause divergence even if x0 = 0 • ISO learning: • Relies on the filters of the different inputs being orthogonal • Each filtered input must be orthogonal to the derivative of the output • This only holds if a steady state is assumed • The auto-correlation no longer vanishes if the weights are changed during the impulse response of the filters • -> cannot be applied with large learning rates • => Can only be used with small learning rates; otherwise the auto-correlation causes the weights to diverge
ICO & ISO Learning • ISO learning: dw_j/dt = μ u_j dv/dt (inputs correlated with the derivative of the output) • ICO learning: dw_j/dt = μ u_j du_0/dt (inputs correlated with the derivative of the reflex input only)
ICO Learning • A simple adaptation of the ISO learning rule • Correlates only the inputs with each other • No correlation with the output • -> No auto-correlation • Define one input as the reflex input x0 • Drawback: • Loss of generality: no longer isotropic • Not all inputs are treated equally any more • Advantages: • Much higher learning rates can be used (up to 100x faster) • Almost arbitrary types of filters can be used • No divergence of the weights any more (see the sketch below)
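A minimal sketch of the two update rules side by side, assuming discrete time and band-pass-filtered inputs u_0..u_n, with u_0 the reflex pathway and w_0 fixed:

```python
import numpy as np

# One learning step for both rules. u and u_prev hold the filtered
# pathway signals at the current and previous time step; mu is the
# learning rate; w[0] (reflex weight) stays fixed.
def iso_step(w, u, u_prev, mu):
    dv = w @ u - w @ u_prev    # derivative of the OUTPUT
    w[1:] += mu * u[1:] * dv   # dw_i/dt = mu * u_i * v' -> auto-correlation,
    return w                   # since v itself contains u_i

def ico_step(w, u, u_prev, mu):
    du0 = u[0] - u_prev[0]     # derivative of the reflex INPUT only
    w[1:] += mu * u[1:] * du0  # dw_i/dt = mu * u_i * u_0' -> no auto-correlation
    return w
```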
ICO Learning • Weight-change curve (open loop, just one filter bank) • Same as for ISO learning • Development of the weights: • ISO learning exhibits an exponential instability • Even after setting x0 to 0 after 100,000 time steps
ICO Learning: Closing the Loop • The output of the learner v feeds back to its inputs xj after being modified by the environment • Reactive pathway: fixed reactive feedback control • Learning goal: • Learn an earlier reaction that keeps x0 (the disturbance or error signal) at 0 • One can prove that, under simplified conditions, one-shot learning is possible • With one filter bank and impulse signals • Using the Z-transform
ICO Learning: Applications • Simulated robot experiment: • The robot has to find food (disks in the environment) • Sensors for the unconditioned stimulus: • 2 touch sensors (left + right) • Reflex: the robot elicits a sharp turn as it touches a disk • Pulls the robot into the centre of the disk • Sensors for the predictive stimulus: • 2 sound (distance) sensors (left + right) • Can measure the distance to the disks • Stimulus: difference between the left and right sound signals (see the sketch below) • 5 filters (resonators) are used in the filter bank • Output v: steering angle of the robot
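A sketch of the predictive stimulus; the 1/distance amplitude model below is an assumption, only the left-right difference is taken from the slides.

```python
# Predictive stimulus from the two sound (distance) sensors: the input
# to the learner is the difference of the left and right signals.
def sound_stimulus(d_left, d_right, eps=1e-6):
    s_left = 1.0 / (d_left + eps)    # assumed amplitude ~ 1/distance
    s_right = 1.0 / (d_right + eps)
    return s_left - s_right          # signed: points towards the nearer disk
```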
ICO Learning: Simulated Robot • A single experience was sufficient to produce adapted behavior • Only possible with ICO learning
Simulated Robot • Comparison of different learning rates • ICO learning vs. ISO learning • Learning counted as successful if the reflex was no longer needed over a sequence of four contacts • Both rules are equivalent for small learning rates • Small auto-correlation term
Simulated Robot • Two different learning rates • Divergent behavior of ISO learning at high learning rates • The robot then shows avoidance behavior, steering away from the food disks
Applications Continued • More complex task: • Three food disks simultaneously • No simple relationship between the reflex input and the predictive input any more • Superimposed sound fields • Only learned by ICO learning, not by ISO learning
ICO: Real Robot Application • Real robot: • Targets a white disk from a distance • Reflex: pulls the robot onto the white disk just at the moment the robot drives over it • Achieved by analysing the bottom scanline of the camera image • Predictive input: • Analysing a scanline from the top of the image • Filter bank: • 5 FIR filters of different lengths • All coefficients set to 1 -> smear out the signal • Narrow viewing angle of the camera • The robot therefore has to start more or less in front of the disk
ICO: Real Robot Experiment • Processing the input: • Calculate the deviation of the positions of all white points in a scanline from the centre of the scanline • Yields a 1D signal (see the sketch below) • Results: • A: before learning • B & C: after learning • 14 contacts • The weights oscillate around their best values, but do not diverge
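A sketch of this preprocessing step (the white threshold is an assumed value):

```python
import numpy as np

# 1D input signal from one camera scanline: the mean deviation of all
# "white" pixels from the centre of the scanline.
def scanline_signal(scanline, white_thresh=200):
    cols = np.flatnonzero(scanline > white_thresh)  # white pixel positions
    if cols.size == 0:
        return 0.0                                  # no disk visible
    centre = (len(scanline) - 1) / 2.0
    return float(np.mean(cols - centre))            # signed deviation

# bottom scanline -> reflex input x0, top scanline -> predictive input x1
```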
ICO Learning: Other Applications • Mechanical arm • The arm is always driven by a PI controller towards a specified set point • Input of the PI controller: the motor position • The PI controller is used as the reactive filter (see the sketch below) • Disturbance: • The pushing force of a second small arm mounted on the main arm • A fast-reacting touch sensor measures the disturbance D • 10 resonator filters are used in the filter bank
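A minimal discrete PI controller of the kind used as the reactive filter here (the gains are illustrative assumptions):

```python
# Discrete PI controller acting as the fixed reflex pathway.
class PI:
    def __init__(self, kp=1.0, ki=0.1):
        self.kp, self.ki, self.acc = kp, ki, 0.0

    def step(self, error):
        self.acc += error                   # accumulated (integrated) error
        return self.kp * error + self.ki * self.acc

# error = deviation of the motor position from the set point; learning
# shifts this corrective reaction backwards in time.
```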
ICO Learning: Other Applications • Result: • Control is shifted backwards in time • The error signal (the deviation from the set point) almost vanishes • Other example: temperature control • Predict temperature changes caused by another heater
Overview • Short Overview of different control methods • Correlation Based Learning • ISO Learning • Comparison to other Methods ([Wörgötter05]) • TD Learning • STDP • ICO Learning ([Porr06]) • Learning Receptive Fields ([Kulvicius06])
Development of Receptive Fields through Temporal Sequence Learning [Kulvicius06] • Develop receptive fields by ICO learning • Learn behavior and receptive fields simultaneously • Usually these two learning processes are considered separately • First approach where the receptive field and the behavior are trained simultaneously!! • Shows the application of ICO learning to high-dimensional input spaces
Line Following • System: • The robot should learn to follow a line painted on the ground more smoothly • Reactive input: • x0 … pixels at the bottom of the image • Predictive input: • x1 … pixels in the middle of the image • 10 different filters (resonators) are used in the filter bank • Reflexive output: • Brings the robot back to the line • Not a smooth behavior • Motor output: • S … constant speed • v modifies the speed and steering of the robot (see the sketch below) • Uses left-right symmetry
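A differential-drive reading of this motor output, as a sketch (the slide does not give the exact mapping, so the wheel-speed split is an assumption):

```python
# Motor command: the constant speed S drives the robot forward, while the
# learner's output v is split antisymmetrically between the wheels
# (left-right symmetry), so it steers and modulates the speed.
def motor_command(v, S=1.0):
    left, right = S + v, S - v
    return left, right
```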
Line Following • Simple System • Fixed sensor banks, all pixels are summed up • Input x1 predicts x0
Line Following • Three different tracks • Steep, shallow, sharp • Within one learning experiment, always the same track is used • The robot steers much more smoothly • Usually 1 trial is enough for learning • Videos: • Without learning • Steep • Sharp
Line Following: Receptive Fields • Receptive fields: • 225 pixels are used for the far sensors • Individual filter banks for each pixel • 10 filters per pixel • Left-right symmetry: • The left receptive field is a mirror image of the right one
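A sketch of the resulting weight structure and its ICO update (shapes taken from the slides; the per-pixel filtering itself is assumed to happen elsewhere):

```python
import numpy as np

n_pixels, n_filters = 225, 10
w = np.zeros((n_pixels, n_filters))  # one filter bank per far pixel

def rf_update(w, u, du0, mu=1e-5):
    """One ICO step. u: filtered pixel signals, shape (n_pixels, n_filters);
    du0: derivative of the filtered reflex input; mu: assumed learning rate."""
    return w + mu * u * du0

# Receptive field of each pixel: the sum of its filter weights; the left
# field is obtained by mirroring the right one (left-right symmetry).
receptive_field = w.sum(axis=1)
```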
Line Following: Receptive Fields • Results: • Lower learning rates have to be used • More trials are needed (3 to 6) • Different RFs are learned for different tracks • For the steep and sharp tracks, the plots show the sum of all filter weights for each pixel
Conclusion • Correlation-based learning • Tries to minimize the influence of disturbances • Easier to learn than reinforcement learning • The framework is less general • Questions: • When to apply correlation-based learning, and when reinforcement learning? • How is it done by animals/humans? • How can the two methods be combined? • Correlation learning in an early learning stage • RL for fine-tuning • ICO learning • An improvement of ISO learning • More stable; higher learning rates can be used • One-shot learning is possible
Literature: • [Wörgötter05]: F. Wörgötter and B. Porr, Temporal Sequence Learning, Prediction and Control: A Review of Different Control Methods and Their Relation to Biological Mechanisms • [Porr03]: B. Porr and F. Wörgötter, Isotropic Sequence Order Learning • [Porr06]: B. Porr and F. Wörgötter, Strongly Improved Stability and Faster Convergence of Temporal Sequence Learning by Utilising Input Correlations Only • [Kulvicius06]: T. Kulvicius, B. Porr and F. Wörgötter, Behaviourally Guided Development of Primary and Secondary Receptive Fields through Temporal Sequence Learning