Towards A Formalization Of Teamwork With Resource Constraints Praveen Paruchuri, Milind Tambe, Fernando Ordonez University of Southern California Sarit Kraus Bar-Ilan University, Israel University of Maryland, College Park December 2003
Motivation: Teamwork with Resource Constraints • Agent teams: agents maximize team reward while also ensuring limited resource consumption • E.g., limited communication bandwidth, limited battery power, etc. • Example domains: • Sensor net agents – limited replenishable energy • Mars rovers – limited energy for each daily activity
Framework & Context • Framework for agent teams with resource constraints in complex and dynamic environments • Resource constraints are soft, not "hard": • Okay for a sensor to exceed its energy threshold when needed • Okay for a Mars rover to occasionally exceed the energy allocated to a regular activity • Context:
                               | Single Agent | Multi-Agent
 Without resource constraints  | MDP, POMDP   | MTDP
 With resource constraints     | CMDP         | ???
Our Contributions • Extended MTDP (EMTDP) – a distributed MDP framework • EMTDP ≠ CMDP with many agents • Policy randomization in a CMDP causes miscoordination in teams • Algorithm for transforming a conjoined EMTDP (initial formulation dealing with joint actions) into an actual EMTDP (reasoning about individual actions) • Proof of equivalence between different transformations • Solution algorithm for the actual EMTDP • Overall goal: maximize expected team reward while bounding expected resource consumption
E-MTDP: Formally Defined • An E-MTDP (for the 2-agent case) is a tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where: • S, A, P, R: as defined in MTDP. • C1 = [c1(i, a, k)]: vector of costs of resource k for joint action a in state i (for agent 1); C2 is defined analogously for agent 2. • T1 = [t1(k)]: thresholds on expected consumption of each resource k (T2 analogously for agent 2). • N = [n(i, a)]: vector of joint communication costs for joint action a in state i. • Q: threshold on communication costs. • Simplifying assumptions: • Individual observability (no POMDPs) • Two agents
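A minimal data-structure sketch may help make the tuple concrete. The class below is illustrative only; the field names and type choices are assumptions made here, not the paper's representation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = int
JointAction = Tuple[str, str]  # (agent-1 action, agent-2 action)

@dataclass
class EMTDP:
    """Illustrative container for the 2-agent E-MTDP tuple <S,A,P,R,C1,C2,T1,T2,N,Q>."""
    states: List[State]                                # S
    joint_actions: List[JointAction]                   # A
    P: Dict[Tuple[State, JointAction, State], float]   # P(s' | s, a)
    R: Dict[Tuple[State, JointAction], float]          # team reward R(s, a)
    C1: Dict[Tuple[State, JointAction, int], float]    # agent-1 cost of resource k
    C2: Dict[Tuple[State, JointAction, int], float]    # agent-2 cost of resource k
    T1: Dict[int, float]                               # agent-1 threshold per resource k
    T2: Dict[int, float]                               # agent-2 threshold per resource k
    N: Dict[Tuple[State, JointAction], float]          # joint communication cost
    Q: float                                           # communication-cost threshold
```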
Conjoined EMTDP – Simple Example • Two-agent case • [Transition diagram over states S1–S7, with joint-action transition probabilities labeled on the edges; e.g., R(S1, a2b2) = 9, C1(S1, a2b2) = 7, C2(S1, a2b2) = 7]
Linear Program: Solving the Conjoined EMTDP • Objective: the standard LP for solving an MDP, maximizing expected reward • Handling constraints: the expected cost of resource k, summed over all states and actions, must be at most the threshold t1
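The slide's equations did not survive extraction; the following is a sketch of the standard occupancy-measure LP for a constrained MDP, written under the assumption that x_{ia} denotes the expected number of times joint action a is taken in state i, alpha_j the initial-state distribution, and c^k_{ia} the cost of resource k. The exact formulation in the paper may differ (e.g., in discounting or horizon).

```latex
\begin{align}
\max_{x \ge 0} \quad & \sum_{i \in S} \sum_{a \in A} R(i,a)\, x_{ia} \\
\text{s.t.} \quad
& \sum_{a \in A} x_{ja} - \sum_{i \in S} \sum_{a \in A} P(j \mid i, a)\, x_{ia} = \alpha_j
  && \forall j \in S \quad \text{(flow conservation)} \\
& \sum_{i \in S} \sum_{a \in A} c^{k}_{ia}\, x_{ia} \le t_{1}(k)
  && \forall k \quad \text{(expected resource-}k\text{ cost below threshold)}
\end{align}
```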
Sample LP solution (solver output; X11–X14 are the occupancy variables for the four joint actions in one state)
VISITED(X11) = 0.000000 → a1b1 to be executed 0% of the time
VISITED(X12) = 0.3653846 → a1b2: 36% (= 9/25)
VISITED(X13) = 0.6346154 → a2b1: 64% (= 16/25)
VISITED(X14) = 0.000000 → a2b2: 0%
When each agent independently randomizes over its own marginal, joint actions such as a2b2 occur with nonzero probability even though they should have been 0 (miscoordination).
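A small numerical check of the miscoordination effect, using the occupancy values above; the variable names and the assumption that agents execute their marginal policies independently are for illustration only.

```python
# Joint-action probabilities from the LP solution for one state.
joint = {("a1", "b1"): 0.0, ("a1", "b2"): 0.3653846,
         ("a2", "b1"): 0.6346154, ("a2", "b2"): 0.0}

# Marginal policy of each agent, obtained by summing out the other agent's action.
p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in ("a1", "a2")}
p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in ("b1", "b2")}

# If the agents randomize independently, the joint distribution is the product
# of the marginals, which differs from the LP's joint solution.
for (a, b), p in joint.items():
    independent = p_a[a] * p_b[b]
    print(f"{a}{b}: LP joint = {p:.2f}, independent execution = {independent:.2f}")

# a1b1 and a2b2 come out near 0.23 instead of 0, i.e., miscoordination.
```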
Conjoined to Actual EMTDP: Transformation • [Diagram: state S1 expanded through an intermediate state A1c] • For each state, for each joint action: • Introduce a communication and a non-communication action for each distinct individual action, and add corresponding new (intermediate) states • Introduce transitions between the original states and the new states • Introduce transitions between the new states and the original target states. A sketch of this construction is given below.
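A rough sketch of the transformation step in Python; the representation (joint actions as pairs, intermediate states as (state, agent-1 action, comm-flag) tuples) and all names are assumptions made for illustration, not the paper's construction verbatim.

```python
def transform(states, actions1, actions2, P):
    """Expand each joint action into an agent-1 step (with or without communication)
    followed by an agent-2 step through an intermediate state.

    P maps (state, (a1, a2)) -> {next_state: probability} in the conjoined model.
    Returns the new intermediate states and the two new transition maps."""
    new_states, trans_from_orig, trans_to_orig = [], {}, {}
    for s in states:
        for a1 in actions1:
            for comm in ("comm", "no_comm"):
                mid = (s, a1, comm)                    # new intermediate state
                new_states.append(mid)
                trans_from_orig[(s, a1, comm)] = mid   # original state -> new state
                for a2 in actions2:
                    # new state -> original target states, per the conjoined model
                    trans_to_orig[(mid, a2)] = P.get((s, (a1, a2)), {})
    return new_states, trans_from_orig, trans_to_orig
```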
Non-linear Constraints • Need to introduce non-linear constraints • For each original state, for each new state introduced by a no-communication action, set the conditional probabilities of the corresponding actions equal. • With s'1, …, s'm denoting the new states reached by no-communication actions: P(b1 | s'1) = P(b1 | s'2) = … = P(b1 | s'm), and similarly P(bn | s'1) = P(bn | s'2) = … = P(bn | s'm) for every action bn. • New states reached by communication actions are observable to agent B; those reached by no-communication actions are not. (An occupancy-variable form of these constraints is sketched below.)
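In terms of the LP's occupancy variables these equalities become ratio (hence non-linear) constraints. The sketch below assumes x_{s,b} denotes the expected number of times agent B takes action b in new state s; this notation is chosen here for illustration and may not match the paper's.

```latex
% Conditional probability of b in a no-communication state s'_i,
% expressed via occupancy variables:  P(b | s'_i) = x_{s'_i,b} / \sum_{b'} x_{s'_i,b'}
\frac{x_{s'_i,\,b}}{\sum_{b'} x_{s'_i,\,b'}}
  \;=\;
\frac{x_{s'_j,\,b}}{\sum_{b'} x_{s'_j,\,b'}}
\qquad \forall\, b,\ \ \forall\, s'_i, s'_j \ \text{reached by no-communication actions}
```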
Reason for non-linear constraints • After a no-communication (NC) action, agent B has no hint of the current state. • Its action choice must therefore be made independent of the source state. • The probability of action b1 from one NC state must equal the probability of the same action (i.e., b1) from any other NC state. • Miscoordination is avoided because actions are independent of the (unobserved) state. • [Transformation example diagram]
Experimental Results • [Fig. 1: results chart; axis labels range from S1 (0) to S9 (8)]
Experiments: Example Domain 2
Domain 1: Comparing expected rewards
 Comm Threshold | Conjoined | Deterministic | Miscoordination | EMTDP
 0              | 10.55     | 0             | No reward       | 6.99
 3              | 10.55     | 0             | No reward       | 8.91
 6              | 10.55     | 0             | No reward       | 10.55
(Miscoordination resulted in violating the resource constraints.)
Domain 2: • A team of two rovers and several scientists using them • Each scientist has a daily routine of observations • A rover can use only a limited amount of energy in serving a scientist • Experiment conducted: observing Martian rocks • Rovers maximize observation output within the energy budget provided • Soft constraint – exceeding the energy budget on a given day is not catastrophic, but overusing it frequently affects the other scientists' work • Uncertainty – only a 0.75 chance of succeeding in an observation • The EMTDP had about 180 states, 1,500 variables, and 40 non-linear constraints; problems of this size were solved in under 20 seconds.
Summary and Future Work • Novel formalization of teamwork with resource constraints • Maximize expected team reward while bounding expected resource consumption • Provided an EMTDP formulation in which agents avoid miscoordination even with randomized policies • Proved equivalence of different EMTDP transformation strategies (see paper for details) • Introduced non-linear constraints • Future work: • Pin down the computational complexity • Experiment with the n-agent case • Extend the work to partially observable domains
Thank You! Any Questions?