This paper explores enhancements to motivated learning through the creation of abstract goals and motivations. It discusses bias calculation, probabilistic goal selection, and determining desired resource levels. A comparison to reinforcement learning algorithms is also presented.
Advancing Motivated Learning with Goal Creation
James Graham¹, Janusz A. Starzyk¹,², Zhen Ni³ and Haibo He³
¹ School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
² University of Information Technology and Management, Rzeszow, Poland
³ Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA
Overview
• Introduction
• Enhancements to Motivated Learning
  • Bias calculation
  • Use of desirability and availability
  • Probabilistic goal selection
• Desired Resource Levels
  • Resource level as an optimization problem
  • Resource dependencies
  • Desirability calculations
• Comparison to RL algorithms
• Conclusions
Motivated Learning
• Controlled by underlying "primitive" motivations
• Builds on these motivations to create additional "abstract" motivations
• Unlike RL, the focus is not on maximizing externally set rewards, but on intrinsic rewards and on creating mission-related new goals and motivations
(Figure: motivation hierarchy, with layers of intrinsic motivations built above an extrinsic base.)
Improvements to ML
• Bias/pain calculations
• Resource availability
• Learning to select actions
• Probabilistic goal selection
• Determining desired resource levels
Significance of bias signals
• Initially we only have primitive needs (no biases)
• Bias is a foundation for the creation of new needs
• Bias is a preference for or aversion to something (a resource or an action)
• Bias results from an existing need being helped or hurt by a resource or action
• The level of bias is measured relative to the availability of a resource or the likelihood of an action
Bias based on availability and desirability
• Availability-based bias is computed from the following quantities:
  • Rd is the desired resource value (at a sensory input si)
  • Rc is the current resource value
  • A is the availability calculation
  • dc is the current distance to another agent
  • dd is the desired (comfortable) distance to another agent
Bias based on availability and desirability (continued)
• Bias for a desired resource
• Bias for a desired action
• Bias for an undesired action
• Bias for an undesired resource
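The bias formulas on these two slides were shown as equation images and are not reproduced here. The following Python sketch illustrates one plausible reading under stated assumptions: availability is taken as the ratio of the current to the desired resource level (the same idea would apply to the current vs. desired distance dc/dd), and the bias sign flips between desired and undesired items. The function names and the specific ratio/clipping forms are illustrative assumptions, not the paper's equations.

```python
def availability(r_current, r_desired, eps=1e-6):
    """Assumed availability measure: fraction of the desired level that is
    currently available, clipped to the range [0, 1]."""
    return max(0.0, min(1.0, r_current / (r_desired + eps)))


def bias(r_current, r_desired, desirable=True):
    """Assumed bias calculation:
    - for a desired resource/action, bias grows as availability drops (attraction);
    - for an undesired one, bias grows (negatively) as availability rises (aversion)."""
    a = availability(r_current, r_desired)
    return (1.0 - a) if desirable else -a


if __name__ == "__main__":
    print(bias(2.0, 10.0, desirable=True))    # scarce desired resource -> ~0.8
    print(bias(8.0, 10.0, desirable=False))   # plentiful undesired resource -> ~-0.8
```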
Probabilistic goal selection
• Uses normalized wPG weights to select goals probabilistically.
• However, the previous wPG calculation could lead to weight saturation at αg, so a modified update is used instead (a sketch of one possible form appears after these slides).
• The modified update causes the weights to saturate at (3/π)·atan(ds/dŝ).
• The ratio ds/dŝ measures how useful an action is at restoring the resource.
Probabilistic goal selection (continued)
• wPG weights either saturate at a level determined by ds/dŝ or tend toward zero.
(Figure: evolution of the wPG weights over time.)
Probabilistic goal selection (continued)
• Here we show how the wbp weights are affected by the different goal selection approaches.
(Figure: wbp weight trajectories without probabilistic selection vs. with probabilistic selection.)
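The modified wPG update itself was shown as an equation image. Below is a minimal Python sketch of the scheme described on these slides, assuming the weight is moved a fixed fraction of the way toward the quoted saturation level (3/π)·atan(ds/dŝ) and that goals are then sampled with probability proportional to the normalized weights; the exact form of the paper's update may differ.

```python
import math
import random


def saturation_level(ds, ds_hat, eps=1e-6):
    """Saturation level quoted on the slide: (3/pi) * atan(ds / ds_hat),
    where ds/ds_hat measures how useful the action is at restoring the resource."""
    return (3.0 / math.pi) * math.atan(ds / (ds_hat + eps))


def update_wpg(w, ds, ds_hat, mu=0.1):
    """Assumed update: move the weight a fraction mu of the way toward its
    saturation level, so it converges there instead of at the fixed ceiling alpha_g."""
    return w + mu * (saturation_level(ds, ds_hat) - w)


def select_goal(weights):
    """Probabilistic goal selection: sample a goal index with probability
    proportional to its (non-negative) normalized weight."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return random.randrange(len(weights))
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(clipped):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1


if __name__ == "__main__":
    w = 0.0
    for _ in range(50):
        w = update_wpg(w, ds=2.0, ds_hat=1.0)
    print(round(w, 3), "vs saturation", round(saturation_level(2.0, 1.0), 3))
    print("selected goal:", select_goal([0.1, 0.7, 0.2]))
```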
Determining desired resource levels
• Desired values should be set according to the agent's needs.
• To begin, the agent is given the initial "primitive" resource level, Rdp.
• The agent must learn the rate at which "desired" resources are used (∆p).
• The agent can use its knowledge of the environment to set the desired resource levels.
• Resource levels are established only for resources that the agent cares about.
• The frequency of performing tasks cannot be too great, since the agent's time is limited and it also needs time to learn.
Determining desired resource levels (continued)
• To establish the optimum levels of desired resources, we solve an optimization problem subject to constraints, including the requirement that the sum of all restoration frequencies is less than 1 (the agent's time budget).
(The objective, the constraints, and the restoration-frequency definition were given as equations on the original slide.)
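As a toy illustration only, the sketch below assumes the restoration frequency of each resource is its learned usage rate divided by its desired level, that the objective is simply to keep the desired levels small, and that the time budget caps the total frequency below 1. The numbers, objective, and frequency formula are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

usage = np.array([0.05, 0.20, 0.10])   # hypothetical learned usage rates (delta_p)
F_MAX = 0.8                            # time budget: total restoration frequency < 1


def restoration_freq(levels):
    """Assumed form: a resource consumed at rate usage[i] and topped up to
    levels[i] must be restored roughly usage[i] / levels[i] of the time."""
    return usage / levels


def objective(levels):
    """Toy objective: keep the desired levels (stockpiling effort) small."""
    return np.sum(levels)


constraints = [{"type": "ineq",        # scipy convention: fun(x) >= 0
                "fun": lambda lv: F_MAX - np.sum(restoration_freq(lv))}]
bounds = [(1e-3, None)] * len(usage)   # desired levels must stay positive

result = minimize(objective, x0=np.ones_like(usage), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print("desired levels:", np.round(result.x, 3))   # each settles at a different level
print("total restoration frequency:",
      round(float(np.sum(restoration_freq(result.x))), 3))
```

Note that under these assumptions each resource equilibrates to a different desired level, consistent with the example on the next slide.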
Determining desired resource levels: example
• The agent starts with the levels of multiple resources set to the initially observed environment state.
• As it learns to use specific resources, it adjusts the levels at which it wants to maintain them.
• Each resource equilibrates to a different level.
Reinforcement Learning
• Maximizes an external reward
• Learns by approximating value functions, usually a single function
• May include "subgoal" generation and "curiosity"
• Primarily reactive
• Objectives are set by the designer
Motivated Learning
• Controlled by underlying motivations
• Uses existing motivations to create additional "abstract" motivations
• The focus is not on maximizing externally set objectives (as in RL), but on learning new motivations and on building and supporting an internal reward system
• Minimax: minimize pain
• Primarily deliberative
Comparison to other RL algorithms
• Algorithms tested:
  • Q-learning
  • SARSA
  • Hierarchical RL (MAXQ)
  • Neural Fitted Q Iteration (NFQ)
  • TD-FALCON
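For reference, the simplest of these baselines is tabular Q-learning. The textbook update below is a generic illustration of that baseline, not the implementation used in the paper's comparison.

```python
import random
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Textbook tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily on Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


if __name__ == "__main__":
    actions = [0, 1]
    Q = defaultdict(float)
    q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)   # one illustrative step
    print(dict(Q), epsilon_greedy(Q, 0, actions))
```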
Comparison to other RL algorithms: test environment
• The testing environment is a simplified version of what we use in NeoAxis.
• In NeoAxis we have pains, tasks, triggering pains, and (possibly) NACs.
• The comparison test is a "black box" that has no NACs and runs as a simplified environment, making the RL algorithms more compatible and easier to interface.
Comparison to other RL algorithms: results
• Algorithms tested: Q-learning, SARSA, HRL, and ML.
(Figure: reward curves for ML, HRL, Q-learning, and SARSA, with NFQ and TD-FALCON results shown separately.)
NFQ results
• Note the highlighted lines: observe both when they occur and their general profile.
Conclusion
• Designed and implemented several enhancements to the Motivated Learning architecture:
  • Bias calculations
  • Goal selection
  • Setting desired resource levels
• Compared ML to several RL algorithms using a basic test environment and a simple reward scenario.
• ML achieved a higher average reward faster than the other algorithms tested.
Bias signal calculation for resources
• For resource-related pain:
  • Rd is the desired resource value (at a sensory input si)
  • Rc is the current resource value
  • ε is a small positive number
  • γ regulates how quickly the pain increases
  • δr = 1 when the resource is desired, δr = -1 when it is not, and δr = 0 otherwise
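The pain formula itself appeared as an equation image on the slide. The sketch below shows one assumed form that matches the listed roles of Rd, Rc, ε, γ, and δr (pain grows as a desired resource falls below its desired level, with γ shaping the growth); the actual equation in the paper may differ.

```python
def resource_pain(r_current, r_desired, delta_r, gamma=0.5, eps=1e-3):
    """Assumed resource-related pain signal (the slide's equation is not reproduced).

    delta_r = +1 for a desired resource, -1 for an undesired one, 0 otherwise.
    For a desired resource the pain grows as r_current drops below r_desired
    (and vice versa for an undesired one); eps keeps the ratio finite and
    gamma regulates how quickly the pain increases with the normalized deficit.
    """
    deficit = delta_r * (r_desired - r_current) / (r_desired + eps)
    return max(0.0, deficit) ** gamma


if __name__ == "__main__":
    print(resource_pain(r_current=2.0, r_desired=10.0, delta_r=1))   # desired and scarce -> high pain
    print(resource_pain(r_current=2.0, r_desired=10.0, delta_r=0))   # irrelevant resource -> no pain
```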
Learning and selecting actions
• Goals are selected based on pain-goal weights, where:
  • δp indicates how the associated pain changed
  • ∆a, outside of μg, ensures the weights stay below the ceiling of αg = 1
  • μg determines the rate of change
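The weight-update equation was likewise shown as an image. A minimal sketch of one rule consistent with the description, in which a (αg − w) factor keeps the weights below the ceiling αg = 1 and μg sets the learning rate, is given below; the exact rule used in the paper may differ.

```python
def update_pain_goal_weight(w, delta_p, mu_g=0.1, alpha_g=1.0):
    """Assumed pain-goal weight update (the slide's equation is not reproduced).

    delta_p reflects how the associated pain changed after pursuing the goal
    (positive when the pain was reduced); mu_g sets the rate of change; the
    (alpha_g - w) factor plays the role of the slide's "delta_a outside of mu_g",
    keeping the weight below the ceiling alpha_g = 1 whenever mu_g * delta_p <= 1.
    """
    return w + mu_g * delta_p * (alpha_g - w)


if __name__ == "__main__":
    w = 0.0
    for _ in range(30):                       # goal keeps reducing pain -> w climbs toward 1
        w = update_pain_goal_weight(w, delta_p=1.0)
    print(round(w, 3))
```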
Comparing Reinforcement Learning to Motivated Learning
• Compare ML and RL