Exploration of Unknown Environments with Motivational Agents Luis Macedo1,2, Amilcar Cardoso2 1Department of Informatics and Systems Engineering, Engineering Institute, Coimbra Polytechnic Institute, 3030 Coimbra, Portugal 2Artificial Intelligence Lab, Centre of Informatics and Systems, University of Coimbra, 3000 Coimbra, Portugal Ravinder Reddy Siriseni
Abstract • The architecture is based on the belief-desire-intention (BDI) approach. • Beliefs: Information about the environment; informative • Desires: Objectives to be accomplished, possibly with an associated priority/payoff for each objective; motivational • Intentions: The currently chosen course of action; deliberative
Environment • Virtual environment populated with entities/objects • Contains a variety of objects located at different positions • Each entity comprises three distinct and fundamental components • Structure - the visual part of the object, which may comprise several sub-objects • Function - the role of the object in the environment • Behavior - the way the object acts in the environment
Sensors, Effectors & Motivation • Sensors: two simulated sensors • An optical sensor • An infrared sensor • Information about the structure, function and behavior of an object is collected from the environment through the optical sensor • The infrared sensor provides the distance to the object • A user-defined parameter sets the range of the visual field • Functions are not accessible (or cannot be inferred from the visual information) unless the agent is at the same location as the object • Effectors: simulated legs to move around the environment • Motivation: exploring the environment populated with entities/objects
Goal • Exploration of unknown, uncertain and dynamic environments by motivational agents whose decisions are based on the motivations they “feel”. • Acquisition of a model of the environment, including models of the entities that populate it. • The agent performs exploration using an action-selection method based on maximizing the intensity of positive feelings and minimizing that of negative ones.
Exploration • The process of selecting and executing actions so that maximum knowledge of the environment is acquired • The result is the acquisition of a model of the physical environment • Exploring an unknown environment requires resources from the agent, such as time and power • The goal is to gain maximum knowledge of the environment at minimum cost • Many strategies have been pursued • Minimize the amount of resources required • Maximize knowledge acquisition • Strategies fall into 2 categories • Undirected • Directed
Cont … • Undirected exploration • e.g. random-walk exploration, Boltzmann-distributed exploration (sketched below) • Uses no exploration-specific knowledge • Ensures exploration by merging randomness into action selection • Directed exploration • Uses exploration-specific knowledge to guide the learning process • These techniques are in line with psychological studies showing that novelty and new stimuli incite exploration • Curiosity is the psychological construct most closely related to this behavior • “Curiosity - a primary emotion consisting of the simple impulse to know, controlling and sustaining attention and evoking the bodily movements that allow one to acquire information about an object” --- A. Shand • These approaches are closely related to the emotion concept of interest/excitement, proposed by different emotion theories to account for exploration, adventure, problem solving, creativity and the acquisition of skills
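A minimal sketch of Boltzmann-distributed (softmax) action selection, one of the undirected strategies named above. The Q-values and temperature are illustrative assumptions, not values from the paper.

```python
import math
import random

def boltzmann_select(q_values, temperature=1.0):
    """Pick an action index with probability proportional to exp(Q/T).

    Higher temperature -> more random exploration;
    lower temperature -> greedier, exploitation-like behaviour.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]

# Illustrative Q-values for three candidate actions.
action = boltzmann_select([0.2, 0.5, 0.1], temperature=0.5)
```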
Cont … • In addition to novelty, other variables such as change, surprisingness, complexity and uncertainty also determine the kind of behavior related to exploration and investigation activities • A biologically fundamental emotion, surprise may play an important role in cognitive activities, especially in attention focusing and learning • Exploration is an activity that involves decision-making • Physiological and neuroscience research suggests that motivations play a crucial role in decision-making, action and performance, influencing a variety of cognitive processes (e.g. attention, perception, planning, etc.) • Next: • The architecture of the motivational agent • How the motivational agent performs exploration
Architecture • [Figure: Architecture of an Agent - the agent's Sensors, Memory, Motivations, Goals/Desires, Deliberative Reasoning/Decision-Making module and Effectors, interacting with the world]
Cont … • The deliberative reasoning/decision-making module is the core of the architecture • It receives internal information (from memory) and environmental information (through the sensors) and outputs an action selected for execution • The process of action selection takes into account the states of the environment the agent would like to bring about (desires), i.e. it selects an action that leads to the states of the environment the agent prefers • This preference is implicitly represented in a mathematical function that evaluates states of the world in terms of the positive feelings they elicit in the agent • The intensity of the feelings (motivations) is computed by the motivations module, taking into account both past experience (information stored in memory) and the present description of the environment • The agent continuously performs the deliberative reasoning/decision-making algorithm
Memory • The memory of an agent stores information about the world • It includes the configuration of the surrounding world, such as the positions of the objects that inhabit it, descriptions of the entities themselves and sequences of actions (plans) - in general, beliefs about the environment • This information is stored in different memory structures • A matrix map (grid-based) spatially models the agent's surrounding physical environment • Descriptions of the entities (physical structure and function) are stored in episodic memory • Plans are stored in semantic memory
Matrix map • The matrix map of the world is a three-dimensional grid in which each cell contains the set of entities that may alternatively occupy the cell, together with the probability of each occupancy • For each cell <x, y> of the matrix map of an agent i there is a set of pairs {<E_j^{x,y}, P(E_j^{x,y})>: j = 1, 2, ...} • where E_j^{x,y} is the identifier of the jth entity that may occupy cell <x, y> and P(E_j^{x,y}) is the probability that it does so
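A minimal sketch of how a matrix-map cell could hold the alternative occupying entities and their occupancy probabilities. The dictionary keyed by 2-D coordinates and the function names are illustrative assumptions, not the authors' implementation.

```python
# Matrix map: each cell <x, y> maps to a list of
# (entity_id, occupancy_probability) pairs.
matrix_map = {}

def update_cell(x, y, entity_id, probability):
    """Record that entity_id may occupy cell <x, y> with the given probability."""
    pairs = matrix_map.setdefault((x, y), [])
    # Replace an existing estimate for this entity, if any.
    pairs[:] = [(e, p) for (e, p) in pairs if e != entity_id]
    pairs.append((entity_id, probability))

update_cell(3, 7, "obj_12", 0.8)   # obj_12 probably occupies <3, 7>
update_cell(3, 7, "obj_5", 0.2)    # obj_5 is a less likely alternative
```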
Memory for entities • Descriptions of the entities perceived from the environment are stored in the episodic memory of entities • Each description has the form <ID, PS, F>, where • ID uniquely identifies the entity in the environment • PS is its physical structure • F is the function of the entity • The sensors may provide incomplete information about an entity; in that case the missing information is estimated from the available information and from descriptions of other entities previously perceived and already stored in the episodic memory of entities • The physical structure of an entity may be described analogically or propositionally • Analogical representation directly reflects the physical structure • Propositional representation uses semantic features or attributes, much like in semantic networks • The function is a description of the role or category of the entity in the environment
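A minimal sketch of the <ID, PS, F> entity description as a Python dataclass. The concrete field types (a grid-like structure for the analogical representation, a feature dictionary for the propositional one) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EntityDescription:
    """An entity as stored in the episodic memory of entities: <ID, PS, F>."""
    entity_id: str                                   # ID: unique identifier in the environment
    analogical_structure: Optional[list] = None      # PS (analogical): e.g. a 2-D grid of cells
    propositional_structure: dict = field(default_factory=dict)  # PS (propositional): attribute -> value
    function: Optional[str] = None                   # F: role/category; may be unknown until visited

house = EntityDescription(
    entity_id="obj_12",
    propositional_structure={"shape": "rectangular", "colour": "red"},
    function=None,   # not yet visited, so the function is still a gap to be estimated
)
```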
Memory for plans • Plans are represented as hierarchies of tasks • This structure takes the form of a planning tree, i.e. a kind of AND/OR tree that expresses all the possible ways to decompose a goal task • A task t is both conditional and probabilistic • That is, each task has a set of mutually exclusive and exhaustive conditions, and for each of these conditions there is a set of alternative effects • Each effect contains information about the changes produced in the world by achieving the goal task • e.g. an effect may give information about the amount of power consumed, the new location of the agent, the emotions felt, etc.
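A minimal sketch of a conditional, probabilistic task node in an AND/OR planning tree. The concrete structure (decompositions as lists of subtask lists, conditions mapping to lists of (probability, effect) alternatives) is one illustrative reading of the description above, not the paper's data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A node of the planning tree (AND/OR tree of goal-task decompositions)."""
    name: str
    # OR-choice over decompositions; AND over the subtasks inside each decomposition.
    decompositions: list = field(default_factory=list)   # list[list[Task]]
    # For each mutually exclusive condition, a set of alternative effects,
    # each tagged with its probability and the changes it produces
    # (power consumed, new location, emotions felt, ...).
    effects: dict = field(default_factory=dict)           # condition -> list[(prob, effect_dict)]

visit = Task(
    name="visitEntity(obj_12)",
    effects={
        "path_clear": [(0.9, {"power": -2, "location": (3, 7)}),
                       (0.1, {"power": -4, "location": (3, 6)})],
        "path_blocked": [(1.0, {"power": -1, "location": "unchanged"})],
    },
)
```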
Motivation Module • This module receives information about the current state of the environment and outputs the intensities of emotions, drives and other motivations • The agent is continuously presented with an input proposition corresponding to sensorial information about an entity; in response, the surprise and curiosity units output the intensity of these motivations • The surprise felt by an agent Agt and elicited by an object Objk is given by the degree of unexpectedness of Objk • Curiosity is the desire to know or learn about an object that arouses interest by being novel; novel objects stimulate actions intended to acquire knowledge about them • The curiosity induced in an agent Agt by an object Objk depends on the novelty or difference of Objk • The drive hunger is defined as the need for a source of energy. Given the capacity C of the storage of that source and the amount of energy L left, the hunger elicited in the agent is computed as HUNGER(Agt) = C - L
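A minimal sketch of the three motivation intensities. Only HUNGER(Agt) = C - L is given explicitly above; reading unexpectedness as 1 minus the object's probability and novelty as 1 minus the similarity to the closest entity in memory are assumptions made for this sketch, as is the attribute-matching similarity measure.

```python
def surprise(prob_of_object):
    """Surprise as degree of unexpectedness; here taken as 1 - P(Obj_k) (assumed reading)."""
    return 1.0 - prob_of_object

def curiosity(obj_features, memory_features, similarity):
    """Curiosity grows with novelty: 1 minus the similarity of Obj_k to the most
    similar entity already stored in memory (assumed measure)."""
    if not memory_features:
        return 1.0   # everything is novel to an empty memory
    return 1.0 - max(similarity(obj_features, m) for m in memory_features)

def hunger(capacity, energy_left):
    """HUNGER(Agt) = C - L, as defined above."""
    return capacity - energy_left

def attr_similarity(a, b):
    """Illustrative similarity: fraction of attribute values the two descriptions share."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys) if keys else 1.0

c = curiosity({"shape": "rectangular"}, [{"shape": "round"}], attr_similarity)
```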
Deliberative reasoning/decision-making • This module receives information from the internal/external world and outputs an action selected for execution • The agent starts by computing the current world state • It generates expectations or assumptions for the gaps in the environment information provided by the sensors • Then new intentions/goals are generated and their Expected Utility (EU) is computed • The agent's set of goals is ranked according to this EU • The goal with the Maximum Expected Utility (MEU) is taken first, and a HTN (Hierarchical Task Network) plan is generated for it and executed
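A minimal sketch of one pass of that cycle, paraphrasing the steps above. All the functions passed in are caller-supplied placeholders, not the paper's routines.

```python
def deliberative_step(sense, compute_state, generate_expectations,
                      generate_goals, expected_utility, plan_htn, execute):
    """One iteration of the deliberative reasoning/decision-making cycle."""
    percepts = sense()                                 # optical + infrared readings
    state = compute_state(percepts)                    # current world state
    state = generate_expectations(state)               # fill sensory gaps (Bayes' rule)
    goals = generate_goals(state)                      # candidate goals/intentions
    ranked = sorted(goals, key=expected_utility, reverse=True)   # rank by EU
    if ranked:
        execute(plan_htn(ranked[0]))                   # HTN plan for the MEU goal
```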
Generation of assumptions/expectations • It is very difficult for an agent to obtain all the information about the surrounding environment, mainly because of the incompleteness of its perceptual information • Taking the available information as evidence, it is possible to generate expectations/assumptions for the missing information using Bayes' rule: P(Hi | E1, E2, ..., En) = P(E1, E2, ..., En | Hi) P(Hi) / Σj P(E1, E2, ..., En | Hj) P(Hj) • E1, E2, ... are pieces of evidence and the Hi form a mutually exclusive and collectively exhaustive set of hypotheses for a specific piece of missing information
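A minimal sketch of that computation: the posterior over a set of mutually exclusive, exhaustive hypotheses given observed evidence. Treating the evidence pieces as conditionally independent given each hypothesis is a simplifying assumption of this sketch, as is the house/tree example.

```python
def posterior(priors, likelihoods, evidence):
    """P(H_i | E_1..E_n) via Bayes' rule.

    priors:      {hypothesis: P(H_i)}
    likelihoods: {hypothesis: {evidence_item: P(E_j | H_i)}}
    evidence:    list of observed evidence items
    """
    unnorm = {}
    for h, prior in priors.items():
        p = prior
        for e in evidence:
            p *= likelihoods[h].get(e, 0.0)   # conditional independence assumed
        unnorm[h] = p
    total = sum(unnorm.values()) or 1.0       # avoid division by zero
    return {h: p / total for h, p in unnorm.items()}

# Illustrative: guess an unseen entity's function from a visible feature.
print(posterior(
    priors={"house": 0.6, "tree": 0.4},
    likelihoods={"house": {"rectangular": 0.9}, "tree": {"rectangular": 0.1}},
    evidence=["rectangular"],
))
```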
Generation and ranking of the agent's goals/intentions • The motivational system plays an important role in the generation and ranking of goals/intentions • The algorithm for the generation and ranking of goals/intentions: • A set of goal tasks is retrieved from the memory of plans • Similar goals are generated by adapting past goals to situations of the present state of the world • The adaptation strategies used are mainly substitutions • The Expected Utility (EU) of each goal task is computed. The computation of EU is performed by predicting the motivations that could be elicited by achieving/executing the goal task • That is, the motivations the agent would feel when an effect takes place are predicted and estimated (anticipated) from the information available in that effect about the changes produced in the environment. Based on the computed EU, the goal with the maximum EU is ranked first
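A minimal sketch of ranking goal tasks by EU, reusing the Task sketch given earlier. The particular weighting of positive feelings (surprise, curiosity) against negative ones (hunger), and the caller-supplied condition probabilities and anticipation function, are assumptions of this sketch rather than the paper's formula.

```python
def expected_utility(task, anticipate_motivations, condition_prob):
    """EU of a goal task: probability-weighted sum, over its alternative effects,
    of anticipated positive feelings minus negative ones (assumed weighting)."""
    eu = 0.0
    for condition, alternatives in task.effects.items():
        pc = condition_prob(condition)
        for prob, effect in alternatives:
            feelings = anticipate_motivations(effect)   # e.g. {"curiosity": 0.7, "hunger": 0.2}
            positive = feelings.get("curiosity", 0.0) + feelings.get("surprise", 0.0)
            negative = feelings.get("hunger", 0.0)
            eu += pc * prob * (positive - negative)
    return eu

def rank_goals(goals, anticipate_motivations, condition_prob):
    """Return the goals ordered so that the maximum-EU goal comes first."""
    return sorted(goals,
                  key=lambda t: expected_utility(t, anticipate_motivations, condition_prob),
                  reverse=True)
```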
Exploration with motivational agents • The goal of exploration is twofold: • Acquisition of maps of the environment - matrix maps - to be stored in memory, in which the cells occupied by the entities that populate the environment are represented • Construction of models of those entities • The agent continuously performs the deliberative reasoning/decision-making algorithm • At a given time the agent senses the environment looking for entities and computes the current state of the world based on • Sensorial information and • Generation of expectations for the missing information • Then, a goal of kind visitEntity is generated for each unvisited entity within the visual range • In addition, a goal of kind visitLoc is generated for each frontier cell • Finally, these goals are ranked according to their EU, which is based on the intensities of the predicted motivations
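A minimal sketch of the goal-generation step just described. Taking a "frontier cell" to be a known cell adjacent to an unknown one is an assumed definition, and the function and variable names are illustrative; the ranking of the resulting goals would then use the EU routine sketched earlier.

```python
def generate_exploration_goals(entities_in_view, visited_ids, known_cells, unknown_cells):
    """Generate visitEntity goals for unvisited entities in the visual range and
    visitLoc goals for frontier cells (known cells bordering unknown ones)."""
    goals = [("visitEntity", e) for e in entities_in_view if e not in visited_ids]
    for (x, y) in known_cells:
        neighbours = {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}
        if neighbours & unknown_cells:
            goals.append(("visitLoc", (x, y)))
    return goals

goals = generate_exploration_goals(
    entities_in_view={"obj_12", "obj_5"},
    visited_ids={"obj_5"},
    known_cells={(3, 7), (3, 8)},
    unknown_cells={(4, 7)},
)
# goals -> [('visitEntity', 'obj_12'), ('visitLoc', (3, 7))]
```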