Cooperation, Negotiation and Reconsideration on the basis of Environmental Cues O. Tripp and J.S. Rosenschein
Agenda • The BDI control loop and the reconsideration problem • Educated reconsideration • Cooperation and state-changing rules • Conditional cooperation • Cooperation through exploration • Conclusions
The Reconsideration Problem • The reconsideration function behaves optimally if, and only if, whenever it chooses to deliberate, the agent changes its intentions (Wooldridge and Parsons, 1995). • The logic that controls when the reconsideration function is activated is hardwired into the agent.
Educated Reconsideration • The agent’s frequency of ‘seeing’ is proportional to the degree of dynamism of the environment. • When the agent senses that its environment is changing rapidly, it reconsiders quite often. • When the agent senses that its environment is relatively static, it rarely reconsiders.
A Simple Learning Algorithm • If you are in the middle of a plan, then carry on executing it. Otherwise, select a new plan. • If ReconsiderationInterval is not set, then set it to the number of steps in the current plan. • After ReconsiderationInterval steps, stop to reconsider. • Compare the expected state of the environment with its actual state (as registered by your sensors). • Update ReconsiderationInterval accordingly. • Feed ReconsiderationInterval into a buffer maintaining your history, and set ReconsiderationInterval to a weighted average of the values stored in the buffer. • Jump to the first step.
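A minimal Python sketch of this loop may help pin it down. Everything beyond the bulleted steps is an assumption: the halve-on-surprise/grow-otherwise update rule, the uniform weighting of the history buffer, and names such as ReconsiderationAgent and history_depth are illustrative, not the authors' design.

```python
from collections import deque

class ReconsiderationAgent:
    """Sketch of the interval-learning loop; details beyond the slides are assumptions."""

    def __init__(self, history_depth=5):
        self.interval = None                     # ReconsiderationInterval
        self.history = deque(maxlen=history_depth)
        self.plan = []                           # remaining steps of the current plan

    def step(self, select_plan, execute, expected_state, sense):
        if not self.plan:                        # not mid-plan: select a new one
            self.plan = select_plan()
        if self.interval is None:                # initialize to the plan's length
            self.interval = len(self.plan)

        for _ in range(self.interval):           # run until it is time to reconsider
            if not self.plan:
                break
            execute(self.plan.pop(0))

        # Compare the expected state with the sensed state.
        mismatch = expected_state() != sense()
        # Assumed update rule: halve the interval on surprise, grow it otherwise.
        self.interval = max(1, self.interval // 2) if mismatch else self.interval + 1

        self.history.append(self.interval)
        # Weighted average over the history buffer (uniform weights assumed here).
        self.interval = max(1, round(sum(self.history) / len(self.history)))
```

Each call to step runs one execute/observe/update cycle; select_plan, execute, expected_state, and sense are stand-ins for the agent's planner, actuator, world model, and sensors, which any concrete harness would supply.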
Cooperation and State-Changing Rules • State-changing rules are meta-rules that encourage agents to transform the world in a cooperative manner (Goldman and Rosenschein, 1994). • In this setting, agents do some extra work, which (ideally) saves other agents a lot of effort.
State-Changing Rules in the Tileworld Scenario one: A2 helps A1 by repositioning the tiles, and thus each agent wastes 12 moves (on average) instead of 17. Scenario two: Agents are blocked from the tiles closest to them by a barrier. As they move around the barrier, another agent prepares the target tile for them.
Weaknesses of the Existing Model • Naïve cooperation may be counterproductive in a sufficiently heterogeneous environment. • Agents that are hardwired to cooperate can easily be abused.
The Premises of Conditional Cooperation • Perception and inference are a primary means of acquiring information in the absence of direct communication. • Multiagent coordination can be based on perception even if the agents are not aware of one another.
Conditional Cooperation in Practice • The agent registers the degree to which it has found its environment to be cooperative. • The agent determines its cooperation level with the same technique it used for its reconsideration rate: a history buffer and a weighted average.
Conditional Cooperation in the Tileworld • The agent expects the tiles it comes across to have many degrees of freedom. • If this expectation is met, the agent observes its environment to be cooperative and raises its cooperation level accordingly. • Otherwise, the agent expresses its disappointment by lowering its cooperation level. • The higher the agent’s cooperation level, the more inclined it is to allocate resources to tasks that were not assigned to it (see the sketch below).
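As a hedged sketch of the update just described, reusing the history-buffer-and-weighted-average technique from the reconsideration algorithm above: the ratio-of-degrees-of-freedom cue, the [0, 1] scale of the cooperation level, and the class name ConditionalCooperator are assumptions made for illustration.

```python
from collections import deque

class ConditionalCooperator:
    """Sketch: adjust the cooperation level from observed environmental cues."""

    def __init__(self, history_depth=5, level=0.5):
        self.level = level                       # cooperation level in [0, 1] (assumed scale)
        self.history = deque([level], maxlen=history_depth)

    def observe(self, observed_freedom, expected_freedom):
        # Assumed cue: ratio of observed to expected degrees of freedom, capped at 1.
        cue = min(1.0, observed_freedom / max(expected_freedom, 1))
        self.history.append(cue)
        # Weighted average over the history, as in the reconsideration loop
        # (uniform weights assumed here).
        self.level = sum(self.history) / len(self.history)

    def should_help(self, rng):
        # Higher levels make unassigned tasks more attractive.
        return rng.random() < self.level
```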
The Benefits of Conditional Cooperation • Agents can react dynamically to situations where they get in each other’s way, and thus prevent instances of negative cooperation (this happens without the agents being aware of one another!). • Agents are no longer exposed to the threat of being (continually) abused. They expect to see a return on their investment.
Notes on Conditional Cooperation • If the depth of the history the agent maintains is 1, then the agent’s cooperation strategy coincides with the TIT-FOR-TAT strategy (as illustrated below). • There is no one answer to the question of when the agent should update its cooperation level. • Should this happen when the agent constructs a new plan or initiates the execution of a new plan? • Should this happen every time the agent comes across aspects of the environment that may influence its cooperation level?
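Using the hypothetical ConditionalCooperator sketched above, the depth-1 case is easy to check: the cooperation level simply mirrors the last observed cue, which is the TIT-FOR-TAT pattern.

```python
coop = ConditionalCooperator(history_depth=1)
coop.observe(observed_freedom=4, expected_freedom=4)  # environment 'cooperated'
assert coop.level == 1.0                              # reciprocate fully
coop.observe(observed_freedom=0, expected_freedom=4)  # environment 'defected'
assert coop.level == 0.0                              # retaliate next round
```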
Cooperation through Exploration • So far, agents have restricted their patterns of cooperation to actions they find desirable. • This choice of actions assumes homogeneity amongst the agents occupying the environment. • In a general setting, this need not be the case, and thus exploratory actions may lead to more useful patterns of cooperation.
Cooperation through Exploration in a Natural Setting • Two robotic agents occupy some natural environment that consists of a hill and a valley. • The first robot’s task is to push stones up to the hill-top. • The second robot’s task is to push stones down the hill.
Cooperation through Exploration in a Natural Setting - Continued • The first robot is more skilled at moving small stones, but earns a higher reward for moving large stones. • The opposite is true of the second robot. • It may thus be rational for the first robot to push small stones down the hill, while the second robot pushes large stones up the hill. • Each of the robots was designed to perform a task that the other robot – designed by another person – can do better. Exploratory actions have led them to cooperate.
Formal Definition of Cooperation through Exploration • We look at pairs of the form (Action, CooperationLevel). • Assuming that the environment is fairly stable, each such pair has a certain utility associated with it. • An exploring agent thus iterates through this set of cooperation patterns in search of a pair that maximizes its utility.
Formal Definition of Cooperation through Exploration - Continued • Maximization of the agent’s utility thus reduces to the task of gradient ascent optimization. • To make this task tractable, Monte Carlo methods can be used (see the sketch below).
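A brief sketch of the Monte Carlo step, under the stability assumption stated above: the utility of a fixed (Action, CooperationLevel) pair is estimated as an average of sampled rewards. The reward callback and the sample count are illustrative assumptions, not part of the slides.

```python
def estimate_utility(action, coop_level, reward, n_samples=50):
    """Monte Carlo estimate of the utility of an (action, coop_level) pair.

    `reward` is an assumed callback returning the payoff observed when
    the pair is executed once in the (fairly stable) environment.
    """
    samples = [reward(action, coop_level) for _ in range(n_samples)]
    return sum(samples) / n_samples
```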
Exploration as a Source of Adventure • The classic distinction between bold and cautious agents finds a natural interpretation in this setting: • A bold agent expends considerable energy searching for an optimal pattern of cooperation with its environment. Such a search may lead the agent far away from its original goals. • A cautious agent is more reluctant to diverge from the goals it was designed to accomplish.
The Complexity of Exploratory Cooperation • Theoretical results concerning finite, deterministic domains cannot be applied to this setting. • There are malicious stochastic domains where any exploratory technique will take exponential time to reach a goal state [Thrun, 1992].
Exploratory Cooperation as a Form of Gradient Ascent • At time 0: • Start from an arbitrary cooperation mode. • Execute the selected cooperation mode and register its utility. • At time t+1: • With probability prob(boldness), jump to an arbitrary cooperation mode. • With probability 1-prob(boldness), choose the next cooperation mode (uniformly) from amongst the current cooperation mode’s untried neighbors.
Exploratory Cooperation as a Form of Gradient Ascent - Continued • Stop if either of the following two events occurs: • The space of possible cooperation modes has been exhausted. • ticks(boldness) clock ticks have elapsed. • Revert to the cooperation mode that yielded the highest utility.
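A compact Python rendering of this procedure, with prob(boldness) and ticks(boldness) modeled as a plain probability and a tick budget; the neighbors and utility callbacks are assumed interfaces, not part of the slides.

```python
import random

def exploratory_ascent(modes, neighbors, utility, p_bold, max_ticks):
    """Randomized gradient ascent over cooperation modes (sketch).

    modes:     the full set of cooperation modes
    neighbors: assumed function mapping a mode to its neighboring modes
    utility:   assumed callback measuring the utility of executing a mode
    p_bold:    prob(boldness), the chance of an arbitrary jump
    max_ticks: ticks(boldness), the exploration budget
    """
    tried = {}
    mode = random.choice(list(modes))           # time 0: arbitrary starting mode
    tried[mode] = utility(mode)

    for _ in range(max_ticks):                  # stop after ticks(boldness) ticks
        if len(tried) == len(modes):            # mode space exhausted: stop early
            break
        untried = [m for m in neighbors(mode) if m not in tried]
        if random.random() < p_bold or not untried:
            mode = random.choice([m for m in modes if m not in tried])
        else:
            mode = random.choice(untried)       # uniform among untried neighbors
        tried[mode] = utility(mode)

    # Revert to the cooperation mode that yielded the highest utility.
    return max(tried, key=tried.get)
```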
Notes on Cooperation through Exploration • The theoretical model of cooperation through exploration finds its counterpart in many biological systems [Axelrod, 1984]. • Many decisions remain in the hands of the designer of the agent. These include: • Deciding how bold the agent should be. • Deciding on criteria for determining whether the environment is stable. • Deciding how much time the agent should wait before switching between cooperation modes. • Deciding on the initial cooperation mode.
Negotiation through Exploration • Another perspective on exploratory behaviors is to view them as a form of implicit negotiation. • The object of negotiation is the multi-dimensional space formed by the cooperation modes available to the agents. • An agreement is a specification of the cooperation modes the agents will embrace. • Agents can always jump to the conflict deal and thus operate on their own. • This perspective is reminiscent of state-oriented domains [Rosenschein and Zlotkin, 1994].
Conclusions • Agents operating in realistic environments cannot afford to calculate their moves in isolation from the dynamics of their environment [Pollack and Horty, 1999]. • Cooperation in the dark is a viable form of interaction between agents. • Exploratory actions allow the agent to settle into patterns of interaction that were unforeseen by its designer. • Exploratory actions are a form of negotiation that necessitates minimal assumptions about the negotiation scenario.
The Déjà Vu Strategy 0. Assumptions and notations: • The utilities the agent can register with distinct cooperation modes are separable. • The agent has access to some clock. • The agent’s degree of boldness is denoted by BoldNess and is an integral value. 1. Select a cooperation mode (i.e., a combination of action and frequency of performing that action) at random. 2. Execute the selected action at the selected frequency until the inputs you register from the environment stabilize. 3. Register the utility of the current cooperation mode.
The Déjà Vu Strategy - Continued 4. Check how many times this utility has been registered so far. • If this has happened BoldNess or more times, then jump to the next step. • Otherwise, return to the first step. 5. Revert to the cooperation mode with which you registered the highest utility. • If this utility is negative, then stop cooperating. • Otherwise, remain in that cooperation mode so long as its utility is high enough and the environment has not changed too much. • If the environment has changed considerably, then jump back to the first step. • If the utility of the current cooperation mode drops below that of another cooperation mode, then delete the current cooperation mode and return to the beginning of this step.
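The following Python sketch walks through steps 1-5. The stabilization callback, the 'changed considerably' predicate, and the utility rounding used to detect a déjà vu are all assumptions filled in for illustration; the loop runs for the agent's lifetime unless the best mode pays off negatively.

```python
import random
from collections import Counter

def deja_vu(modes, run_until_stable, environment_changed, boldness):
    """Sketch of the Déjà Vu strategy; the callbacks are assumed interfaces.

    modes:               the available cooperation modes (action/frequency pairs)
    run_until_stable:    runs a mode until the registered inputs stabilize and
                         returns the utility registered for it (assumption)
    environment_changed: predicate for 'changed considerably' (assumption)
    boldness:            the integral BoldNess parameter
    """
    while True:                                    # re-entered when the world shifts
        seen, registered = Counter(), {}
        while True:                                # steps 1-3, repeated via step 4
            mode = random.choice(list(modes))      # step 1: random cooperation mode
            utility = run_until_stable(mode)       # step 2: execute until stable
            registered[mode] = utility             # step 3: register its utility
            key = round(utility, 3)                # assumed rounding to spot repeats
            seen[key] += 1
            if seen[key] >= boldness:              # step 4: déjà vu BoldNess times
                break
        current = max(registered, key=registered.get)  # step 5: revert to the best
        if registered[current] < 0:
            return None                            # negative utility: stop cooperating
        while not environment_changed():
            utility = run_until_stable(current)
            rivals = {m: u for m, u in registered.items() if m != current}
            if rivals and utility < max(rivals.values()):
                del registered[current]            # current mode lost its edge:
                current = max(rivals, key=rivals.get)  # drop it, take the next best
        # a considerable change occurred: fall through and resample from step 1
```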
The Complexity of the Déjà Vu Strategy • Cooperation modes are treated as points in a uniform probability space. • We define a geometric random variable X that counts how many steps pass between two subsequent returns to an arbitrary cooperation mode. • Its expectation is equal to the number of cooperation modes the agent supports. • Using Chebyshev’s inequality, we obtain that Pr[ |X − E[X]| ≥ BoldNess · σ(X) ] ≤ 1 / BoldNess² (where BoldNess is the agent’s degree of boldness).
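A short worked version of this bound, assuming (consistently with the uniform space above) that each step returns to a fixed mode with probability 1/n, where n is the number of cooperation modes the agent supports:

```latex
% Assumption: modes are sampled uniformly, so the return time to a fixed
% mode is X ~ Geometric(p) with p = 1/n.
\[
  \mathbb{E}[X] \;=\; \frac{1}{p} \;=\; n,
  \qquad
  \operatorname{Var}(X) \;=\; \frac{1-p}{p^{2}} \;=\; n^{2} - n .
\]
% Chebyshev's inequality with k = BoldNess then gives
\[
  \Pr\!\left[\, |X - n| \;\geq\; \mathit{BoldNess}\,\sqrt{n^{2}-n}\,\right]
  \;\leq\; \frac{1}{\mathit{BoldNess}^{2}} .
\]
```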