IJCNN, International Joint Conference on Neural Networks, San Jose, 2011
Motivated Learning in Autonomous Systems
Pawel Raif, Silesian University of Technology, Poland
Janusz A. Starzyk, Ohio University, USA
Outline
• Reinforcement Learning (RL)
• Goal Creation System (GCS): yields a self-organizing, pain-based network
• Motivated Learning (ML) as a combination of RL + GCS
• Simulation results
• Possible applications of ML
Machine Learning Methods
[Diagram: machine learning branches into supervised learning, unsupervised learning, reinforcement learning, and corrective learning.]
Problems in „real world” applications, such as autonomous systems:
• the „curse of dimensionality”
• lack of motivation for development
Approaches named on the slide: hierarchical RL, intrinsic motivation, the „top-down approach”, and the „bottom-up approach”.
Reinforcement Learning: learning through interaction with the environment
[Diagram: the RL agent sends an action a to the ENVIRONMENT and receives a state s and a reward r.]
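The state, action, reward loop can be illustrated with a minimal sketch. The slides do not name a specific RL algorithm or environment, so tabular Q-learning, the five-cell corridor, the `step` function, and every parameter value below are assumptions chosen only for illustration.

```python
import random

def step(state, action, n_states=5):
    """Hypothetical corridor: action 0 moves left, action 1 moves right.
    The agent gets reward 1.0 when it reaches the right-most cell."""
    move = 1 if action == 1 else -1
    next_state = max(0, min(n_states - 1, state + move))
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=200, n_states=5, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            # Explore with probability epsilon, or when both actions look equal.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next, r, done = step(s, a, n_states)
            # Bootstrapped target: reward plus discounted best next value.
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q

q_table = q_learning()
# Greedy action in every non-terminal cell (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

Here the reward is fixed by whoever wrote `step`, which matches the deck's later remark that in RL the objectives are set by the designer.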
Motivated Learning
• Motivated learning (ML) is need-based motivation, goal creation, and learning in an embodied agent.
• The agent creates a hierarchy of goals based on primitive need signals.
• It receives internal rewards for satisfying its goals (both primitive and abstract).
• ML applies to embodied intelligence (EI) working in a hostile environment.
• ML can combine an internal goal creation system (GCS) with reinforcement learning (RL).
Motivated Learning: the main idea
Intrinsic motivations are created by the learning machine itself.
[Diagram: ML combines RL and GC; the labels are state, action, reward, and GOALS (motivations).]
How to motivate a machine?
We suggest that the hostility of the environment is the most effective motivational factor. An intelligent agent learns how to survive in a hostile environment.
Assumptions
1. The ML agent is independent: it can act autonomously in its environment and is able to choose its own way of development.
2. The ML agent’s interface to the environment is the same as the RL agent’s.
3. The environment is hostile to the agent.
4. Hostility may be active or passive (depleted resources).
5. The environment is fully observable.
Goal Creation System: neural self-organizing pain-based structures
[Diagram: pain-based network with nodes labeled S, B, P, G, M, and UA, connection weights wBP and wPG, and two WTA blocks.]
Motivations and selection of a goal
• Motivations act like desires in a BDI agent
• A winner-take-all (WTA) competition selects the motivation
• Another WTA selects the goal
Goal creation scheme
• A primitive pain is directly sensed
• An abstract pain is introduced by solving a lower-level pain
• Thresholded curiosity-based pain Pp
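The two-stage selection described above (one WTA over motivations, another over goals) can be sketched as follows. This is not the authors' neural implementation: the pain names, the weight table, the threshold value, and the choice to apply the threshold to every pain signal are all assumptions made for illustration.

```python
def winner_take_all(signals):
    """Return the key with the largest value (first key wins on ties)."""
    return max(signals, key=signals.get)

def select_goal(pains, pain_to_goal_weights, threshold=0.1):
    """First WTA picks the dominant pain (motivation); a second WTA picks
    the goal most strongly associated with that pain."""
    # Assumption: pains at or below the threshold do not compete at all.
    active = {name: level for name, level in pains.items() if level > threshold}
    if not active:
        return None, None  # nothing hurts enough, so no goal is selected
    dominant_pain = winner_take_all(active)
    goal = winner_take_all(pain_to_goal_weights[dominant_pain])
    return dominant_pain, goal

# Hypothetical pain levels and learned pain-to-goal weights.
pains = {"hunger": 0.7, "lack_of_food_supply": 0.4, "curiosity": 0.2}
weights = {
    "hunger": {"eat_food": 0.9, "go_to_grocery": 0.1},
    "lack_of_food_supply": {"eat_food": 0.0, "go_to_grocery": 0.8},
    "curiosity": {"explore": 1.0},
}

print(select_goal(pains, weights))
```

If the hunger signal falls below the other pains, the same call switches to a different motivation and goal, which is the kind of goal switching the experiment slides report.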
Internal goals: a simple linear hierarchy between different goals
Hierarchy of resources (and possible agent’s goals):
4. Office (the most abstract)
3. Bank
2. Grocery
1. Food (the least abstract)
Resources are distributed all over the „grid world”.
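A linear hierarchy like this one can be represented as an ordered list. The sketch below assumes that each resource is replenished by using the next, more abstract one; the slide only states that the hierarchy is linear, so that dependency and the helper names are hypothetical.

```python
# Resources ordered from least abstract (level 1) to most abstract (level 4).
HIERARCHY = ["Food", "Grocery", "Bank", "Office"]

def next_more_abstract(resource):
    """Return the resource one level up, or None at the top of the hierarchy."""
    index = HIERARCHY.index(resource)
    return HIERARCHY[index + 1] if index + 1 < len(HIERARCHY) else None

def resources_to_restore(resource):
    """Chain of increasingly abstract resources the agent may have to visit
    when `resource` runs out (assumed dependency, see the lead-in)."""
    chain = []
    current = next_more_abstract(resource)
    while current is not None:
        chain.append(current)
        current = next_more_abstract(current)
    return chain

print(resources_to_restore("Food"))
```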
Modified „gridworld”
The agent must localize resources and learn how to utilize them.
This environment is:
• complex,
• dynamically changing,
• fully observable.
Environment
Resources present in the environment can be used to satisfy the agent’s needs. Resources are distributed all over the „grid world”.
[Diagram: resources 1 to 4 in the environment, the agent’s perception of these resources, and the corresponding internal need signals, i.e. the subjective sense of „lack of resources”.]
By discovering useful resources and their dependencies, the agent learns a hierarchy of internal goals that expresses the complexity of the environment.
Relationships between internal goals
Relationships between internal goals do not have to form a linear hierarchy. They may constitute a tree structure or a complex network of resource dependencies.
[Diagram: need1 and need2 are the designer’s specified needs; need3 depends on top-level resources.]
By discovering subsequent resources and their dependencies, the agent grows the complexity of its internal goal network. However, each system may have unique experiences, reflecting its personal history of development.
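A network of resource dependencies can be stored as a mapping from each need to the resources that satisfy it. The sketch below assigns every node an abstraction level equal to its longest distance from a designer-specified need. The graph, the node names `a`, `b`, `c`, and the level definition are illustrative assumptions, and the loop only terminates for acyclic networks.

```python
def abstraction_levels(satisfied_by, primitive_needs):
    """Level 0 for designer-specified needs; every other node gets
    1 + the highest level among the needs it helps to satisfy."""
    levels = {need: 0 for need in primitive_needs}
    changed = True
    while changed:  # repeat until no level can be raised (acyclic graphs only)
        changed = False
        for need, resources in satisfied_by.items():
            if need not in levels:
                continue  # not yet reachable from a primitive need
            for resource in resources:
                candidate = levels[need] + 1
                if levels.get(resource, -1) < candidate:
                    levels[resource] = candidate
                    changed = True
    return levels

# Hypothetical network: two primitive needs share resource "a".
satisfied_by = {
    "need1": ["a"],
    "need2": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": [],
}

print(abstraction_levels(satisfied_by, ["need1", "need2"]))
```

Two agents that discover the same resources in a different order would end up with the same levels here; the "unique experiences" mentioned on the slide would show up in the learned weights, which this sketch does not model.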
Experiment that combines ML & RL
Every resource discovered by the agent becomes a potential goal and is assigned a value function „level”. The Goal Creation System establishes new goals and switches the agent’s activity between them. The RL algorithm learns the value functions on the different levels.
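One way to picture "a value function per goal" is a separate Q-table for each discovered resource, with the goal creation system deciding which table is currently being used and updated. The slides do not say which RL algorithm was used, so the Q-learning update, the class name `MultiGoalQLearner`, and the state and action labels below are assumptions.

```python
from collections import defaultdict

class MultiGoalQLearner:
    """Keeps one independent Q-table per goal (illustrative sketch)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q = {}  # goal -> {(state, action): value}

    def add_goal(self, goal):
        """Called when a newly discovered resource becomes a potential goal."""
        self.q.setdefault(goal, defaultdict(float))

    def best_action(self, goal, state):
        table = self.q[goal]
        return max(self.actions, key=lambda a: table[(state, a)])

    def update(self, goal, state, action, reward, next_state, done):
        """Q-learning update applied only to the table of the active goal."""
        table = self.q[goal]
        best_next = 0.0 if done else max(table[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        table[(state, action)] += self.alpha * (target - table[(state, action)])

learner = MultiGoalQLearner(actions=["left", "right"])
learner.add_goal("Food")
learner.add_goal("Grocery")

# While "Food" is the active goal, only its value function changes.
learner.update("Food", "s0", "right", reward=1.0, next_state="s1", done=True)
print(learner.q["Food"][("s0", "right")], learner.q["Grocery"][("s0", "right")])
```

Keeping the tables separate means that switching goals does not overwrite what was learned for another goal, at the cost of memory that grows with the number of goals.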
Experiment results: switching between goals
[Figures: goal switching at the beginning of the experiment and at the end.]
Initially the agent uses many iterations to reach a goal (red dots). Sometimes it abandons the goal when another pain dominates. Final runs are shorter and more successful.
Experiment results: comparing primitive pain levels of RL and ML
[Figure: moving average of the primitive pain signal.]
Initially the RL agent learns better. Its performance deteriorates as the resources are depleted.
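For readers who want to reproduce this kind of plot, a trailing moving average of a pain signal can be computed as sketched below. The window size used in the original experiment is not given in the slides, so `window` and the sample values are placeholders.

```python
def moving_average(signal, window):
    """Trailing moving average; the first few points average over
    however many samples are available so far."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(signal):
        running_sum += value
        if i >= window:
            running_sum -= signal[i - window]  # drop the sample leaving the window
        averages.append(running_sum / min(i + 1, window))
    return averages

# Placeholder pain samples, not data from the experiment.
smoothed = moving_average([1, 2, 3, 4], window=2)
print(smoothed)
```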
Experiment results: effectiveness in terms of cumulative reward
[Figure: cumulative reward.]
The reward is determined by the designer of the experiment.
Reinforcement Learning vs. Motivated Learning
• RL: single value function (various objectives). ML: multiple value functions (one for each goal).
• RL: measurable rewards. ML: internal rewards.
• RL: predictable. ML: unpredictable.
• RL: objectives set by the designer. ML: sets its own objectives.
• RL: maximizes the reward. ML: solves a minimax problem.
• RL: potentially unstable. ML: always stable.
• RL: learning effort increases with complexity. ML: learns better than RL in a complex environment.
• RL: always active. ML: acts when needed.
Image source: http://www.bradfordvts.co.uk/images/goal.jpg
Conclusions
• The motivated learning method, based on a goal creation system, can improve the learning of autonomous agents in a special class of problems.
• ML is especially useful in complex, dynamic environments, where it works according to a learned hierarchy of goals.
• Individual goals use well-known reinforcement learning algorithms to learn their corresponding value functions.
• ML concerns building internal representations of useful environment percepts through interaction with the environment.
• ML switches the machine’s attention and sets intended goals, which makes it an important mechanism for a cognitive system.
„The real danger is not that computers will begin to think like men, but that men will begin to think like computers.”
Sydney J. Harris