Universal Learning Models

Cognitive Neuroscience and Embodied Intelligence. Universal Learning Models. Based on courses taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika, and on http://wikipedia.org/.

Presentation Transcript


  1. Cognitive Neuroscience and Embodied Intelligence. Universal Learning Models. Based on courses taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika, and on http://wikipedia.org/. http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page Janusz A. Starzyk

  2. Task learning. We want to combine Hebbian learning and error-correction learning, using hidden units and biologically justified models. Hebbian networks model the states of the world but not perception-action relations. Error correction can learn input-output mappings. Unfortunately, the delta rule works only for output units, not for hidden units, because it must be given a target value, and hidden units have none. Backpropagation of errors can train hidden units, but there is no good biological justification for this method. The idea of backpropagation is simple, but the detailed algorithm requires many calculations. Main idea: we look for the minimum of an error function that measures the difference between the desired behavior and the behavior realized by the network.

  3. Error function. The error function E(w) depends on all network parameters w and is the sum of the errors E(X;w) over all images X: $E(w) = \sum_X E(X;w)$. Here $o_k(X;w)$ is the value obtained on output k of the network for image X, and $t_k(X)$ is the value desired on output k for image X. For one image X and one parameter w: $E(X;w) = \frac{1}{2}\sum_k \big(t_k(X) - o_k(X;w)\big)^2$. An error value of 0 is not always attainable: the network may not have enough parameters to learn the desired behavior, so we can only aim for the smallest error. At a minimum of E(X;w) with respect to the parameter w, the derivative dE(X;w)/dw = 0. For many parameters we need all the derivatives $\partial E/\partial w_i$, i.e. the gradient.
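For one image X and one parameter w, the minimum condition dE/dw = 0 can be illustrated with plain gradient descent. This is a minimal Python sketch, assuming a single linear output $o = w\,s$ with input signal s and the squared error $E = \frac{1}{2}(t - o)^2$; the function names, starting point and learning rate are illustrative choices, not part of the original course material.

```python
def error(w: float, s: float, t: float) -> float:
    """Squared error E(X; w) = 0.5 * (t - o)^2 for one linear output o = w * s."""
    o = w * s
    return 0.5 * (t - o) ** 2


def d_error(w: float, s: float, t: float) -> float:
    """Derivative dE/dw = -(t - o) * s."""
    o = w * s
    return -(t - o) * s


def gradient_descent(s: float, t: float, w0: float = 0.0,
                     lr: float = 0.1, steps: int = 200) -> float:
    """Repeatedly step against the gradient until dE/dw is (numerically) zero."""
    w = w0
    for _ in range(steps):
        w -= lr * d_error(w, s, t)
    return w


if __name__ == "__main__":
    w = gradient_descent(s=2.0, t=1.0)
    print(f"w = {w:.4f}, E = {error(w, 2.0, 1.0):.2e}")
```

Here the minimum is at w = t/s, where the output matches the target exactly; with a real network and many images, E usually cannot reach 0 and gradient descent only finds the smallest reachable error.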

  4. Error propagation. The delta rule minimizes the error for one neuron, e.g. an output neuron that receives signals $s_i$: $\Delta w_{ik} = \varepsilon\,(t_k - o_k)\,s_i$. What signals should we use for hidden neurons? First we propagate the signals through the network, computing the activations h of the hidden neurons layer by layer up to the outputs $o_k$ (forward step). We then calculate the output errors $\delta_k = t_k - o_k$ and the corrections for the output weights $\Delta w_{ik} = \varepsilon\,\delta_k\,h_i$. The error for hidden neurons is $\delta_j = h_j(1 - h_j)\sum_k w_{jk}\,\delta_k$, and their input weights change by $\Delta w_{ij} = \varepsilon\,\delta_j\,x_i$ (backward step, i.e. backpropagation of error). The correction is strongest for undecided units, whose activation is near 0.5, because $h_j(1 - h_j)$ is largest there.
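The forward and backward steps can be sketched in plain Python for one hidden layer. This is a minimal sketch, not the course's simulator: it assumes sigmoid hidden and output units, a bias implemented as an extra unit fixed at 1, and output errors $\delta_k = t_k - o_k$ (for sigmoid outputs this is the gradient of the cross-entropy error rather than of the squared error). The class, method names and XOR example are hypothetical.

```python
import math
import random


def sigmoid(a: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-a))


class TwoLayerNet:
    """Input -> sigmoid hidden layer -> sigmoid outputs, trained by backpropagation."""

    def __init__(self, n_in: int, n_hid: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        # w_in[i][j]: input i (last index = bias unit) -> hidden j
        self.w_in = [[rng.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
        # w_out[j][k]: hidden j (last index = bias unit) -> output k
        self.w_out = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]

    def forward(self, x):
        """Forward step: returns input, hidden (both with bias unit) and output activations."""
        xb = list(x) + [1.0]
        n_hid, n_out = len(self.w_in[0]), len(self.w_out[0])
        h = [sigmoid(sum(xb[i] * self.w_in[i][j] for i in range(len(xb)))) for j in range(n_hid)]
        hb = h + [1.0]
        o = [sigmoid(sum(hb[j] * self.w_out[j][k] for j in range(len(hb)))) for k in range(n_out)]
        return xb, hb, o

    def train_step(self, x, t, eps: float = 0.5):
        """One forward + backward step on a single pattern (online learning)."""
        xb, hb, o = self.forward(x)
        n_hid = len(hb) - 1
        # Output errors: delta_k = t_k - o_k
        d_out = [tk - ok for tk, ok in zip(t, o)]
        # Hidden errors, using the weights before they change:
        # delta_j = h_j (1 - h_j) * sum_k w_jk delta_k
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(self.w_out[j][k] * d_out[k] for k in range(len(o)))
                 for j in range(n_hid)]
        # Output weights: dw_jk = eps * delta_k * h_j
        for j in range(len(hb)):
            for k in range(len(o)):
                self.w_out[j][k] += eps * d_out[k] * hb[j]
        # Input weights: dw_ij = eps * delta_j * x_i
        for i in range(len(xb)):
            for j in range(n_hid):
                self.w_in[i][j] += eps * d_hid[j] * xb[i]


def sse(net: TwoLayerNet, data) -> float:
    """Summed squared error over a list of (input, target) pairs."""
    total = 0.0
    for x, t in data:
        o = net.forward(x)[2]
        total += sum((tk - ok) ** 2 for tk, ok in zip(t, o))
    return total


if __name__ == "__main__":
    xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
    net = TwoLayerNet(2, 3, 1, seed=1)
    before = sse(net, xor)
    for _ in range(2000):
        for x, t in xor:
            net.train_step(x, t)
    print(f"SSE before: {before:.3f}, after: {sse(net, xor):.3f}")
```

The hidden errors are computed before any weight is changed, so the backward step uses the same weights as the forward step.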

  5. GeneRec. Although most models used in psychology train multilayer perceptrons with variants of backpropagation (which can learn any function), the idea of transmitting error information backwards has no biological justification. GeneRec (General Recirculation, O'Reilly 1996) uses bidirectional signal propagation with asymmetric weights, $w_{kj} \neq w_{jk}$. In the first (minus) phase, the network responds to the input activation $x^-$ with the output $y^-$; then the desired result $y^+$ is observed and propagated back towards the input, giving $x^+$ (plus phase). The weight change requires information about the signals from both phases.

  6. GeneRec learning. The learning rule has the same form as the delta rule: $\Delta w_{ij} = \varepsilon\,x_i^-\,(y_j^+ - y_j^-)$. Compared with backpropagation, the difference of signals $(y^+ - y^-)$ replaces the aggregate error: (difference of output signals) ≈ (difference of net inputs) × (derivative of the activation function), so it is a gradient rule. For the bias weights b, $x_i = 1$, so $\Delta b_j = \varepsilon\,(y_j^+ - y_j^-)$. Bidirectional information transfer is almost simultaneous and is responsible for the formation of attractor states, constraint satisfaction, and pattern completion. The P300 wave, which appears about 300 ms after a stimulus, reflects expectations that result from external activation. Errors result from activity in the whole network; slightly better results are obtained by using the average $(x^+ + x^-)/2$ in place of $x^-$ and keeping the weights symmetric, which gives the CHL rule (Contrastive Hebbian Learning): $\Delta w_{ij} = \varepsilon\,(x_i^+ y_j^+ - x_i^- y_j^-)$.
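The GeneRec and CHL updates need only the activations recorded in the two phases: GeneRec uses $\Delta w_{ij} = \varepsilon\,x_i^-(y_j^+ - y_j^-)$, and CHL uses $\Delta w_{ij} = \varepsilon\,(x_i^+ y_j^+ - x_i^- y_j^-)$. A minimal sketch, assuming activations are given as plain Python lists and weights as nested lists indexed `w[i][j]` (sending unit i, receiving unit j); settling the network to obtain the minus- and plus-phase activations is not shown, and the function names are hypothetical.

```python
def generec_update(w, x_minus, y_minus, y_plus, eps=0.1):
    """GeneRec: dw_ij = eps * x_i^- * (y_j^+ - y_j^-).

    w[i][j] is the weight from sending unit i to receiving unit j.
    """
    return [[w[i][j] + eps * x_minus[i] * (y_plus[j] - y_minus[j])
             for j in range(len(y_minus))]
            for i in range(len(x_minus))]


def generec_bias_update(b, y_minus, y_plus, eps=0.1):
    """Bias weights: the sending activation is fixed at 1, so db_j = eps * (y_j^+ - y_j^-)."""
    return [b[j] + eps * (y_plus[j] - y_minus[j]) for j in range(len(b))]


def chl_update(w, x_minus, y_minus, x_plus, y_plus, eps=0.1):
    """CHL: dw_ij = eps * (x_i^+ y_j^+ - x_i^- y_j^-).

    The change depends only on the product of the two units' activations,
    so it is the same in both directions of a connection, which keeps
    symmetric weights symmetric.
    """
    return [[w[i][j] + eps * (x_plus[i] * y_plus[j] - x_minus[i] * y_minus[j])
             for j in range(len(y_minus))]
            for i in range(len(x_minus))]


if __name__ == "__main__":
    # One sending and one receiving unit; the output was too weak in the minus phase.
    print(generec_update([[0.0]], x_minus=[1.0], y_minus=[0.2], y_plus=[1.0], eps=0.5))
    print(chl_update([[0.0]], x_minus=[1.0], y_minus=[0.2], x_plus=[1.0], y_plus=[1.0], eps=0.5))
```

When the two phases agree, both rules leave the weights unchanged: learning is driven only by the mismatch between expectation and outcome.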

  7. Two phases. Where does the error signal for correcting synaptic connections come from? In the figure, the layer on the right side is the middle layer at time t+1. Examples: a) word pronunciation: correction by an external action; b) external expectations compared with someone else's pronunciation; c) expecting the results of an action and then observing them; d) reconstruction (expecting the input).

  8. GeneRec properties. Hebbian learning creates a model of the world by remembering correlations, but it cannot learn to execute tasks. Hidden layers allow a problem to be transformed, and error correction makes it possible to learn difficult tasks, i.e. the relationships between inputs and outputs. The combination of Hebbian learning (correlations x·y) and error-based learning can learn everything in a biologically plausible manner: CHL leads to symmetric weights, an approximate symmetry suffices, and connections are generally bidirectional. In the table, Err = CHL. No Ca²⁺ means no learning; a little Ca²⁺ gives LTD, and a lot of Ca²⁺ gives LTP. LTD corresponds to unfulfilled expectations: activity only in the minus phase, without reinforcement from the plus phase.

  9. Combination of Hebb + errors. It is good to combine Hebbian learning and CHL error correction. • CHL is like socialism: it tries to correct the errors of the whole, limits the motivation of individual units, relies on common responsibility, has low effectiveness, and plans activity. • Hebbian learning is like capitalism: it is based on greed and local interests, favors individualism and efficient activity, but lacks monitoring of the whole.

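  10. Combination of Hebb + errors. Correlations and errors are combined in a single weight update: $\Delta w_{ij} = \varepsilon\,[\,k_{hebb}\,\Delta_{hebb} + (1 - k_{hebb})\,\Delta_{err}\,]$, where $\Delta_{hebb} = y_j^+(x_i^+ - w_{ij})$ and $\Delta_{err} = x_i^+ y_j^+ - x_i^- y_j^-$. Additionally, inhibition within layers is necessary: it creates economical internal representations, because units compete with each other and only the best ones remain active and specialize, and it makes self-organized learning possible.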
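Combining correlations and errors can be sketched as a weighted mix of a Hebbian term and an error term for each connection. This sketch assumes the mixing form $\Delta w_{ij} = \varepsilon\,[k_{hebb}\Delta_{hebb} + (1 - k_{hebb})\Delta_{err}]$ with $\Delta_{hebb} = y_j^+(x_i^+ - w_{ij})$ and $\Delta_{err} = x_i^+ y_j^+ - x_i^- y_j^-$; the name `k_hebb` stands in for the simulator's hebb parameter, the function name is hypothetical, and details of the full Leabra rule (such as soft weight bounding) are left out.

```python
def leabra_style_dw(w, x_minus, y_minus, x_plus, y_plus, k_hebb=0.01, eps=0.01):
    """Weight change for one connection i -> j mixing Hebbian and error terms.

    Assumed form: dw = eps * [k_hebb * d_hebb + (1 - k_hebb) * d_err]
    """
    if not 0.0 <= k_hebb <= 1.0:
        raise ValueError("k_hebb must lie in [0, 1]")
    # Hebbian (correlation) term: pulls w toward x_i^+ when the receiving unit is active
    d_hebb = y_plus * (x_plus - w)
    # Error-driven term: CHL difference between plus- and minus-phase correlations
    d_err = x_plus * y_plus - x_minus * y_minus
    return eps * (k_hebb * d_hebb + (1.0 - k_hebb) * d_err)


if __name__ == "__main__":
    for k in (0.0, 0.01, 1.0):
        dw = leabra_style_dw(w=0.3, x_minus=1.0, y_minus=0.2, x_plus=1.0,
                             y_plus=1.0, k_hebb=k, eps=1.0)
        print(f"k_hebb={k}: dw={dw:.4f}")
```

Setting `k_hebb=0` gives pure error correction (PURE_ERR) and `k_hebb=1` pure Hebbian learning; small values keep error correction dominant while the Hebbian term shapes the internal representations.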

  11. Simulation of a difficult problem: Genrec.proj.gz, Chapt. 5.9, with 3 hidden units. Learning stops after 5 epochs without error. The error during learning fluctuates substantially: networks with recurrent connections are sensitive to small weight changes and explore different solutions. Compare this with learning easy and difficult tasks using Hebbian learning only.

  12. Inhibition. Inhibitory competition acts as a constraint. • It leads to sparse distributed representations (many representations, of which only some are useful in a concrete situation). • Competition and specialization: survival of the best adapted. • Self-organized learning, often more important than Hebbian learning. • Inhibition was also used in the mixture-of-experts framework: the gating units are subject to winner-take-all (WTA) competition and control the outputs of the experts.
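Inhibitory competition within a layer can be sketched as a k-winners-take-all (kWTA) function. This simplified sketch, loosely following the Leabra idea of placing one layer-wide inhibition level between the k-th and (k+1)-th strongest units, works directly on net inputs; the parameter `q = 0.25`, the rectified output and the function name are assumptions of the sketch, not the exact Leabra equations.

```python
def kwta(net_inputs, k, q=0.25):
    """Return rectified activations after shared inhibition lets about k units win.

    The inhibition level g lies between the k-th and (k+1)-th largest net inputs:
    g = theta_{k+1} + q * (theta_k - theta_{k+1}), with 0 <= q < 1.
    """
    n = len(net_inputs)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of units")
    ranked = sorted(net_inputs, reverse=True)
    theta_k, theta_k1 = ranked[k - 1], ranked[k]
    g_inh = theta_k1 + q * (theta_k - theta_k1)
    # Units below the shared inhibition level are silenced
    return [max(0.0, x - g_inh) for x in net_inputs]


if __name__ == "__main__":
    print(kwta([0.1, 0.9, 0.5, 0.3], k=2))
```

Because one inhibition level is shared by the whole layer, the winners keep graded activations that still reflect how strongly they are driven. If the k-th and (k+1)-th net inputs are tied, this simple version silences both.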

  13. Comparison of weight change in learning. View of hidden-layer weights in Hebbian learning: the weights become tuned to particular inputs. View of hidden-layer weights in error-correction learning: the weights look fairly random compared with Hebbian learning.

  14. Comparison of weight change in learning. Charts comparing a) the training error and b) the number of cycles, as functions of the number of training epochs, for three learning methods: Hebbian (Pure Hebb), error correction (Pure Err), and the combination (Hebb & Err), which attained the best results.

  15. Full Leabra model. Inhibition within layers, with Hebbian learning + error correction for the weights between layers. Six principles of intelligent system construction: • Biological realism • Distributed representations • Inhibitory competition • Bidirectional activation propagation • Error-driven learning • Hebbian learning

  16. Generalization in attractor networks. GeneRec by itself does not give good generalization. Simulation: Chapt. 6, model_and_task.proj, with learn_rule = PURE_ERR, PURE_HEBB or HEBB_AND_ERR. There are 35 training patterns, and every 5 epochs the network is tested on the remaining 10 patterns. Learning by error correction only does not give good results. The parameter hebb controls the proportion of CHL and Hebbian correlation learning; PURE_ERR implements only CHL. Check the plus and minus learning phases. Generalization requires good internal representations, i.e. strong correlations; error correction by itself does not lead to sufficiently strong internal representations, only Hebb + kWTA does.

  17. Generalization: plots. Black line = cnt_err, the error on the training data. Red = unq_pats, how many of the input lines are uniquely represented by the hidden layer (max = 10). Blue = gen_Cnt, generalization on 10 new lines: 8 errors at the end. The weights appear random. Generalization based only on error correction is poor, since there are no constraints forcing good internal representations. Batch Run repeats the run 5 times (slow).

  18. Generalization: error correction + Hebb. Fast convergence; good internal representations are formed.

  19. Generalization. How do we deal with things we have never seen? Every time we enter the classroom, at every meeting, with every sentence we hear, etc., we encounter new situations, and we generalize from them reasonably. How do we do this?

  20. Good representations. Internal distributed representations: new concepts are combinations of existing properties. Hebbian learning + competition based on inhibition constrain error correction so that it creates good representations.

  21. Generalization in attractor networks. The GeneRec rule by itself does not lead to good generalization. Simulation: model_and_task.proj.gz, Chapt. 6. The hebb parameter controls the proportion of CHL and Hebbian learning; PURE_ERR implements only CHL (check the minus and plus phases). Compare the internal representations produced by the different types of learning.

  22. Deep networks. Learning difficult problems requires many transformations that strongly change the representation of the problem. In deep networks the error signals become weak and learning is difficult, so we must add constraints and self-organizing learning. Analogy: balancing several connected sticks is difficult, but adding self-organizing learning between the fragments simplifies it significantly, like adding a gyroscope to each element.

  23. Sequential learning. Besides object and relationship recognition and task execution, sequential learning is important, e.g. the order of words in sentences: The dog bit the man. The man bit the dog. The child lifted up the toy. I drove through the intersection because the car on the right was just approaching. The meaning of words, gestures, and behaviors depends on the sequence and the context. Time plays a fundamental role: the consequences of the appearance of image X may become visible only after a delay, e.g. the consequences of the positions of pieces in a game become evident only after a few moves. Network models react immediately; how do brains do this?

  24. Family tree. Example simulation: family_trees.proj.gz, Chapt. 6.4.1. 24 people (agents) and the relations husband, wife, son, daughter, father, mother, brother, sister, aunt, uncle, cousin. To generalize, the network has to discover the relations between people. What is still missing? Temporal and sequential relationships!

  25. Family tree. How do we learn family relations? Enter all the relations according to the tree below. The network needs 40 epochs to learn them, while BP needs 80. Hebbian learning alone is not sufficient. Cluster plots: Init_cluster + Cluster run.

  26. Cluster plot showing the representations of the hidden-layer neurons a) before learning and b) after learning with the combined Hebbian and error-correction method. The trained network has two branches corresponding to the two families.

  27. Sequential learning. Categories of temporal relationships: • sequences with a given structure • relationships delayed in time • continuous trajectories. The context is represented in the frontal lobes of the cortex, and it should affect the hidden layer. We need recurrent networks that can hold context information for a period of time. Simple Recurrent Network (SRN), or Elman network: the context layer is a copy of the hidden layer.
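The Elman idea that the context layer is a copy of the hidden layer can be sketched in a few lines. In this hypothetical forward-only sketch (no training), the hidden layer receives the current input plus the context copied from the previous step, so the same input produces different hidden states depending on what came before; layer sizes, random weights and names are illustrative assumptions.

```python
import math
import random


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


class ElmanSRN:
    """Forward pass of a simple recurrent network; the context layer copies the hidden layer."""

    def __init__(self, n_in: int, n_hid: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = rand_matrix(n_in, n_hid)    # input -> hidden
        self.w_ctx = rand_matrix(n_hid, n_hid)  # context -> hidden
        self.w_out = rand_matrix(n_hid, n_out)  # hidden -> output
        self.context = [0.0] * n_hid

    def reset(self):
        """Clear the context at the start of a new sequence."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        n_hid = len(self.context)
        h = [
            sigmoid(
                sum(x[i] * self.w_in[i][j] for i in range(len(x)))
                + sum(self.context[c] * self.w_ctx[c][j] for c in range(n_hid))
            )
            for j in range(n_hid)
        ]
        o = [sigmoid(sum(h[j] * self.w_out[j][k] for j in range(n_hid)))
             for k in range(len(self.w_out[0]))]
        # Elman's copy: the next step sees this step's hidden state as context
        self.context = list(h)
        return h, o


if __name__ == "__main__":
    net = ElmanSRN(2, 4, 2, seed=3)
    net.step([1.0, 0.0])
    h_ab, _ = net.step([0.0, 1.0])   # input B after A
    net.reset()
    net.step([0.0, 1.0])
    h_bb, _ = net.step([0.0, 1.0])   # input B after B
    print([round(v, 3) for v in h_ab])
    print([round(v, 3) for v in h_bb])
```

In an Elman network the context units are treated as ordinary extra inputs during learning, so the weights can be trained with the same error-driven rules as a feedforward network.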

  28. Sequential learning: biological justification for context representation. The frontal lobes of the cortex are responsible for planning and performing temporally extended activities. People with damaged frontal lobes have trouble performing an activity as a sequence, even though they have no problem with its individual steps. The frontal lobes are responsible for temporal representations; for example, words such as “fly” or “pole” acquire their meaning from the context, and context is a function of previously acquired information. People with schizophrenia can use the context immediately before an ambiguous word, but not context from a previous sentence. Context representations not only lead to sequential behavior but are also necessary for understanding sequentially presented information such as speech.

  29. Examples of sequential learning. Can we discover the rules by which sequences are created? Examples: BTXSE, BPVPSE, BTSXXTVVE, BPTVPSE. Are these sequences acceptable? BTXXTTVVE, TSXSE, VVSXE, BSSXSE. A finite-state machine that emits a letter at each transition between states produces these behaviors. Studies have shown that people learn to recognize letter strings produced according to a specific pattern more quickly, even if they do not know the rules being used.
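A grammar that generates all four example strings can be written as a finite-state machine and used to answer the slide's question mechanically. The transition table below is an assumption: it is the standard Reber grammar, which is consistent with every example on the slide, but the course's own machine diagram is not reproduced here. The names `REBER`, `is_grammatical` and `generate` are hypothetical.

```python
import random

# Assumed transition table (standard Reber grammar): state -> {letter: next state}.
REBER = {
    0: {"T": 1, "P": 2},
    1: {"S": 1, "X": 3},
    2: {"T": 2, "V": 4},
    3: {"X": 2, "S": 5},
    4: {"P": 3, "V": 5},
}
FINAL = 5


def is_grammatical(seq: str) -> bool:
    """Accept strings of the form B ... E whose inner letters follow the machine."""
    if len(seq) < 2 or seq[0] != "B" or seq[-1] != "E":
        return False
    state = 0
    for letter in seq[1:-1]:
        nxt = REBER.get(state, {}).get(letter)
        if nxt is None:
            return False
        state = nxt
    return state == FINAL


def generate(rng: random.Random) -> str:
    """Walk the machine, choosing one of the two possible transitions at random."""
    state, out = 0, ["B"]
    while state != FINAL:
        letter, state = rng.choice(sorted(REBER[state].items()))
        out.append(letter)
    out.append("E")
    return "".join(out)


if __name__ == "__main__":
    for s in ["BTXXTTVVE", "TSXSE", "VVSXE", "BSSXSE"]:
        print(s, is_grammatical(s))
    rng = random.Random(0)
    print([generate(rng) for _ in range(3)])
```

Under this assumed grammar, BTXXTTVVE is acceptable, while TSXSE and VVSXE (no initial B) and BSSXSE (S cannot follow B) are not. Every state offers exactly two transitions, which is why a network can only predict the pair of possible next letters, not the letter itself.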

  30. Network realization. The machine randomly chooses one of two possible next states. The hidden/context neurons learn to recognize the states of the machine, not only the letter labels. Behavior modeling: the same observations but different internal states lead to different decisions and next states. Project fsa.proj.gz, Chapt. 6.6.3.

  31. Temporal delay and reinforcement. The reward (reinforcement) often comes with a delay, e.g. when learning a game or behavioral strategies. Idea: we have to foresee early enough which events lead to a reward. This is done by the temporal differences algorithm (Temporal Differences, TD; Sutton). Where does the reward signal come from in the brain? The midbrain dopaminergic system modulates the activity of the basal ganglia (BG) through the substantia nigra (SN), and of the frontal cortex through the ventral tegmental area (VTA). It is a rather complicated system whose actions are related to evaluating stimuli and actions in terms of value and reward.

  32. Temporal delay and reinforcement. The ventral tegmental area (VTA) is part of the reward system. VTA neurons deliver the neurotransmitter dopamine (DA) to the frontal lobes and the basal ganglia, modulating learning in these areas, which are responsible for planning and action. More advanced regions of the brain are responsible for producing this global learning signal. Studies of patients with damage in the VTA area indicate its role in predicting reward and punishment.

  33. Anticipation of reward and result: anticipation of reward and the reaction to the decision (Knutson et al., 2001).

  34. Basal ganglia (BG). VTA neurons first learn to react to a reward and then to predict the appearance of a reward ahead of time.

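  35. Formulation sketch: the TD algorithm. We need to determine a value function, the sum of all future rewards, in which the further away in time a reward is, the less important it is: $V(t) = \langle \gamma^0 r(t) + \gamma^1 r(t+1) + \gamma^2 r(t+2) + \dots \rangle$, with discount factor $0 \le \gamma \le 1$. The adaptive critic (AC) learns to estimate the value function V(t): at every point in time, the AC tries to predict the value of the reward. This can be done recursively: $\hat{V}(t) = r(t) + \gamma\,\hat{V}(t+1)$. The error of the predicted reward is $\delta(t) = \big(r(t) + \gamma\,\hat{V}(t+1)\big) - \hat{V}(t)$, and the network tries to reduce this error. The name of the algorithm, TD (temporal difference), refers to this error in the estimate of the value function between consecutive points in time.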
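The TD error $\delta(t) = r(t) + \gamma\,\hat V(t+1) - \hat V(t)$ can be simulated directly. The following Python sketch is a tabular TD(0) critic, assuming a CSC-style input in which one unit is active for each time step after CS onset, so that $\hat V(t)$ is simply the weight of the currently active unit; the trial length T = 20, the learning rate, the number of trials and the name `run_td` are illustrative assumptions, while the timing (CS at t = 2, reward at t = 16) follows the rl_cond example. With this indexing, δ(t) compares the prediction at t with what is observed at t+1, so after learning the error appears at t = 1, on the transition into the CS, instead of at the reward time.

```python
def run_td(n_trials=300, T=20, cs_time=2, us_time=16, gamma=1.0, alpha=0.2, reward=1.0):
    """Tabular TD(0) critic with a CSC-style input.

    One input unit per time step since CS onset; V_hat(t) is the weight of the
    unit active at time t (0 before the CS and after the end of the trial).
    Returns the learned weights and the TD errors delta(t) of every trial.
    """
    w = [0.0] * (T - cs_time)

    def value(t):
        return w[t - cs_time] if cs_time <= t < T else 0.0

    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(T):
            r = reward if t == us_time else 0.0
            # delta(t) = r(t) + gamma * V_hat(t+1) - V_hat(t)
            delta = r + gamma * value(t + 1) - value(t)
            if cs_time <= t < T:
                w[t - cs_time] += alpha * delta  # only the active CSC unit learns
            deltas.append(delta)
        history.append(deltas)
    return w, history


if __name__ == "__main__":
    w, history = run_td()
    print("first trial:", [round(d, 2) for d in history[0]])
    print("last trial: ", [round(d, 2) for d in history[-1]])
```

In the first trial the only error is at t = 16, where the reward is unexpected; over training the error moves back step by step until only the onset of the CS is surprising, which mirrors the shift in VTA responses from the reward to its predictor.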

  36. Adaptive critic (AC): network implementation. Prediction of activity and error, with the conditioned stimulus (CS) at t = 2 and the unconditioned stimulus (reward, US) at t = 16 (rl_cond.proj.gz). Initially there is a large error at time 16, because the reward r(16) is unexpected.

  37. Two-phase implementation (CS at t = 2, US at t = 16). In the plus phase, the network computes the expected reward at time t+1 (value r). In the minus phase, at step $t-k$ it predicts the value at step $t-k+1$, and at the end the reward $r(t_k)$. The value V(t+1) computed in the plus phase is carried over as the value V(t) in the minus phase, so learning progresses backwards in time, affecting the value at the previous step.

  38. Two-phase implementation. The system learns that the stimulus (a tone) predicts the reward. The input uses a CSC (Complete Serial Compound) representation, with a unique element for each stimulus at each point in time. Chapt. 6.7.3, project rl_cond.proj.gz. This is not a very realistic model of classical conditioning.
