This paper explores enhancements to motivated learning through the creation of abstract goals and motivations. It discusses bias calculation, probabilistic goal selection, and determining desired resource levels. A comparison to reinforcement learning algorithms is also presented.
Advancing Motivated Learning with Goal Creation
James Graham¹, Janusz A. Starzyk¹,², Zhen Ni³ and Haibo He³
¹ School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
² University of Information Technology and Management, Rzeszow, Poland
³ Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Overview
• Introduction
• Enhancements to Motivated Learning
  • Bias calculation
  • Use of desirability and availability
  • Probabilistic goal selection
• Desired Resource Levels
  • Resource level as an optimization problem
  • Resource dependencies
  • Desirability calculations
• Comparison to RL algorithms
• Conclusions
Motivated Learning
• Controlled by underlying "primitive" motivations
• Builds on these motivations to create additional "abstract" motivations
• Unlike RL, the focus is not on maximizing externally set rewards, but on intrinsic rewards and on creating mission-related new goals and motivations
(Figure: motivation hierarchy, with layers of intrinsic motivations built above an extrinsic base.)
Improvements to ML
• Bias/pain calculations
• Resource availability
• Learning to select actions
• Probabilistic goal selection
• Determining desired resource levels
Significance of bias signals
• Initially we only have primitive needs (no biases)
• Bias is a foundation for the creation of new needs
• Bias is a preference for or aversion to something (a resource or an action)
• Bias results from an existing need being helped or hurt by a resource or action
• The level of bias is measured relative to the availability of a resource or the likelihood of an action
Bias based on availability and desirability
• Availability-based bias is computed from the following quantities:
  • Rd is the desired resource value (at a sensory input si)
  • Rc is the current resource value
  • A is the availability calculation
  • dc is the current distance to another agent
  • dd is the desired (comfortable) distance to another agent
Bias based on availability and desirability (continued)
• Bias for a desired resource
• Bias for a desired action
• Bias for an undesired action
• Bias for an undesired resource
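The bias formulas on these two slides were shown as equation images and are not reproduced here. The following Python sketch illustrates one plausible reading under stated assumptions: availability is taken as the ratio of the current to the desired resource level (the same idea would apply to the current vs. desired distance dc/dd), and the bias sign flips between desired and undesired items. The function names and the specific ratio/clipping forms are illustrative assumptions, not the paper's equations.

```python
def availability(r_current, r_desired, eps=1e-6):
    """Assumed availability measure: fraction of the desired level that is
    currently available, clipped to the range [0, 1]."""
    return max(0.0, min(1.0, r_current / (r_desired + eps)))


def bias(r_current, r_desired, desirable=True):
    """Assumed bias calculation:
    - for a desired resource/action, bias grows as availability drops (attraction);
    - for an undesired one, bias grows (negatively) as availability rises (aversion)."""
    a = availability(r_current, r_desired)
    return (1.0 - a) if desirable else -a


if __name__ == "__main__":
    print(bias(2.0, 10.0, desirable=True))    # scarce desired resource -> ~0.8
    print(bias(8.0, 10.0, desirable=False))   # plentiful undesired resource -> ~-0.8
```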
Probabilistic goal selection
• Uses normalized wPG weights to select goals probabilistically.
• However, the previous wPG calculation could lead to weight saturation at αg, so a modified update is used instead (a sketch of one possible form appears after these slides).
• The modified update causes the weights to saturate at (3/π)·atan(ds/dŝ).
• The ratio ds/dŝ measures how useful an action is at restoring the resource.
Probabilistic goal selection (continued)
• wPG weights either saturate at a level determined by ds/dŝ or tend toward zero.
(Figure: evolution of the wPG weights over time.)
Probabilistic goal selection (continued)
• Here we show how the wbp weights are affected by the different goal selection approaches.
(Figure: wbp weight trajectories without probabilistic selection vs. with probabilistic selection.)
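The modified wPG update itself was shown as an equation image. Below is a minimal Python sketch of the scheme described on these slides, assuming the weight is moved a fixed fraction of the way toward the quoted saturation level (3/π)·atan(ds/dŝ) and that goals are then sampled with probability proportional to the normalized weights; the exact form of the paper's update may differ.

```python
import math
import random


def saturation_level(ds, ds_hat, eps=1e-6):
    """Saturation level quoted on the slide: (3/pi) * atan(ds / ds_hat),
    where ds/ds_hat measures how useful the action is at restoring the resource."""
    return (3.0 / math.pi) * math.atan(ds / (ds_hat + eps))


def update_wpg(w, ds, ds_hat, mu=0.1):
    """Assumed update: move the weight a fraction mu of the way toward its
    saturation level, so it converges there instead of at the fixed ceiling alpha_g."""
    return w + mu * (saturation_level(ds, ds_hat) - w)


def select_goal(weights):
    """Probabilistic goal selection: sample a goal index with probability
    proportional to its (non-negative) normalized weight."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return random.randrange(len(weights))
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(clipped):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1


if __name__ == "__main__":
    w = 0.0
    for _ in range(50):
        w = update_wpg(w, ds=2.0, ds_hat=1.0)
    print(round(w, 3), "vs saturation", round(saturation_level(2.0, 1.0), 3))
    print("selected goal:", select_goal([0.1, 0.7, 0.2]))
```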
Determining desired resource levels
• Desired values should be set according to the agent's needs.
• To begin, the agent is given the initial "primitive" resource level, Rdp.
• The agent must learn the rate at which "desired" resources are used (∆p).
• The agent can use its knowledge of the environment to set the desired resource levels.
• Resource levels are established only for resources that the agent cares about.
• The frequency of performing tasks cannot be too great, since the agent's time is limited and it also needs time to learn.
Determining desired resource levels (continued)
• To establish the optimum levels of desired resources, we solve an optimization problem subject to constraints, including the requirement that the sum of all restoration frequencies is less than 1 (the agent's time budget).
(The objective, the constraints, and the restoration-frequency definition were given as equations on the original slide.)
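As a toy illustration only, the sketch below assumes the restoration frequency of each resource is its learned usage rate divided by its desired level, that the objective is simply to keep the desired levels small, and that the time budget caps the total frequency below 1. The numbers, objective, and frequency formula are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

usage = np.array([0.05, 0.20, 0.10])   # hypothetical learned usage rates (delta_p)
F_MAX = 0.8                            # time budget: total restoration frequency < 1


def restoration_freq(levels):
    """Assumed form: a resource consumed at rate usage[i] and topped up to
    levels[i] must be restored roughly usage[i] / levels[i] of the time."""
    return usage / levels


def objective(levels):
    """Toy objective: keep the desired levels (stockpiling effort) small."""
    return np.sum(levels)


constraints = [{"type": "ineq",        # scipy convention: fun(x) >= 0
                "fun": lambda lv: F_MAX - np.sum(restoration_freq(lv))}]
bounds = [(1e-3, None)] * len(usage)   # desired levels must stay positive

result = minimize(objective, x0=np.ones_like(usage), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print("desired levels:", np.round(result.x, 3))   # each settles at a different level
print("total restoration frequency:",
      round(float(np.sum(restoration_freq(result.x))), 3))
```

Note that under these assumptions each resource equilibrates to a different desired level, consistent with the example on the next slide.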
Determining desired resource levels: example
• The agent starts with the levels of multiple resources set to the initially observed environment state.
• As it learns to use specific resources, it adjusts the levels at which it wants to maintain them.
• Each resource equilibrates to a different level.
Reinforcement Learning
• Maximizes an external reward
• Learns by approximating value functions, usually a single function
• May include "subgoal" generation and "curiosity"
• Primarily reactive
• Objectives are set by the designer
Motivated Learning
• Controlled by underlying motivations
• Uses existing motivations to create additional "abstract" motivations
• The focus is not on maximizing externally set objectives (as in RL), but on learning new motivations and on building and supporting an internal reward system
• Minimax: minimize pain
• Primarily deliberative
Comparison to other RL algorithms
• Algorithms tested:
  • Q-learning
  • SARSA
  • Hierarchical RL (MAXQ)
  • Neural Fitted Q Iteration (NFQ)
  • TD-FALCON
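For reference, the simplest of these baselines is tabular Q-learning. The textbook update below is a generic illustration of that baseline, not the implementation used in the paper's comparison.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Textbook tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    actions = [0, 1]
    Q = defaultdict(float)
    q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)   # one illustrative step
    print(dict(Q), epsilon_greedy(Q, 0, actions))
```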
Comparison to other RL algorithms: test environment
• The testing environment is a simplified version of what we use in NeoAxis.
• In NeoAxis we have pains, tasks, triggering pains, and (possibly) NACs.
• The comparison test is a "black box" that has no NACs and runs as a simplified environment, making the RL algorithms more compatible and easier to interface.
Comparison to other RL algorithms: results
• Algorithms tested: Q-learning, SARSA, HRL, and ML.
(Figure: reward curves for ML, HRL, Q-learning, and SARSA, with NFQ and TD-FALCON results shown separately.)
NFQ results
• Note the highlighted lines: observe both when they occur and their general profile.
Conclusion
• Designed and implemented several enhancements to the Motivated Learning architecture:
  • Bias calculations
  • Goal selection
  • Setting desired resource levels
• Compared ML to several RL algorithms using a basic test environment and a simple reward scenario.
• ML achieved a higher average reward faster than the other algorithms tested.
Bias signal calculation for resources
• For resource-related pain:
  • Rd is the desired resource value (at a sensory input si)
  • Rc is the current resource value
  • ε is a small positive number
  • γ regulates how quickly the pain increases
  • δr = 1 when the resource is desired, δr = -1 when it is not, and δr = 0 otherwise
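The pain formula itself appeared as an equation image on the slide. The sketch below shows one assumed form that matches the listed roles of Rd, Rc, ε, γ, and δr (pain grows as a desired resource falls below its desired level, with γ shaping the growth); the actual equation in the paper may differ.

```python
def resource_pain(r_current, r_desired, delta_r, gamma=0.5, eps=1e-3):
    """Assumed resource-related pain signal (the slide's equation is not reproduced).

    delta_r = +1 for a desired resource, -1 for an undesired one, 0 otherwise.
    For a desired resource the pain grows as r_current drops below r_desired
    (and vice versa for an undesired one); eps keeps the ratio finite and
    gamma regulates how quickly the pain increases with the normalized deficit.
    """
    deficit = delta_r * (r_desired - r_current) / (r_desired + eps)
    return max(0.0, deficit) ** gamma


if __name__ == "__main__":
    print(resource_pain(r_current=2.0, r_desired=10.0, delta_r=1))   # desired and scarce -> high pain
    print(resource_pain(r_current=2.0, r_desired=10.0, delta_r=0))   # irrelevant resource -> no pain
```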
Learning and selecting actions
• Goals are selected based on pain-goal weights, where:
  • δp indicates how the associated pain changed
  • ∆a, outside of μg, ensures the weights stay below the ceiling of αg = 1
  • μg determines the rate of change
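The weight-update equation was likewise shown as an image. A minimal sketch of one rule consistent with the description, in which a (αg − w) factor keeps the weights below the ceiling αg = 1 and μg sets the learning rate, is given below; the exact rule used in the paper may differ.

```python
def update_pain_goal_weight(w, delta_p, mu_g=0.1, alpha_g=1.0):
    """Assumed pain-goal weight update (the slide's equation is not reproduced).

    delta_p reflects how the associated pain changed after pursuing the goal
    (positive when the pain was reduced); mu_g sets the rate of change; the
    (alpha_g - w) factor plays the role of the slide's "delta_a outside of mu_g",
    keeping the weight below the ceiling alpha_g = 1 whenever mu_g * delta_p <= 1.
    """
    return w + mu_g * delta_p * (alpha_g - w)


if __name__ == "__main__":
    w = 0.0
    for _ in range(30):                       # goal keeps reducing pain -> w climbs toward 1
        w = update_pain_goal_weight(w, delta_p=1.0)
    print(round(w, 3))
```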
Comparing Reinforcement Learning to Motivated Learning
• Compare ML and RL