Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping
Pradeep Varakantham, Singapore Management University
Joint work with J. Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe
Motivating Domains
• Example domains: sensor networks, disaster rescue
• Characteristics of these domains:
  • Uncertainty
  • Coordination of multiple agents
  • Sequential decision making
Meeting the challenges
• Problem: multiple agents coordinating to perform multiple tasks in the presence of uncertainty
• Solution: represent as a Distributed POMDP and solve
  • Finding the optimal solution is NEXP-complete
  • Approximate algorithm that dynamically exploits structure in interactions
• Result: vast improvement in performance over existing algorithms
Outline
• Illustrative Domain
• Model
• Approach: exploit dynamic structure in interactions
• Results
Illustrative Domain
• Multiple types of robots
• Uncertainty in movements
• Reward:
  • Saving victims
  • Collisions
  • Clearing debris
• Maximize expected joint reward
Model
• Distributed POMDPs with Coordination Locales (DPCL)
• Joint model: <S, A, Ω, P, R, O, Ag>
• Global state represents completion of tasks
• Agents are independent except in coordination locales (CLs)
• Two types of CLs:
  • Same-time CL (e.g., agents colliding with each other)
  • Future-time CL (e.g., a cleaner robot clearing debris helps a rescue robot reach the goal)
• Individual observability
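A minimal sketch of the DPCL tuple as a data structure; the class and field names are illustrative assumptions, not taken from the paper's implementation:

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    State = Tuple        # (global state, per-agent local states)
    JointAction = Tuple  # one action per agent
    JointObs = Tuple     # one observation per agent

    @dataclass
    class CoordinationLocale:
        kind: str             # "same_time" or "future_time"
        agents: List[str]     # agents involved in the interaction
        state: State          # state component where the interaction occurs
        actions: JointAction  # actions involved in the interaction

    @dataclass
    class DPCL:
        agents: List[str]                                         # Ag
        states: List[State]                                       # S
        actions: Dict[str, List[str]]                             # A (per agent)
        observations: Dict[str, List[str]]                        # Omega (per agent)
        transition: Callable[[State, JointAction, State], float]  # P(s, a, s')
        reward: Callable[[State, JointAction, State], float]      # R(s, a, s')
        observe: Callable[[State, JointAction, JointObs], float]  # O(s', a, w)
        locales: List[CoordinationLocale] = field(default_factory=list)  # CLs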
Solving DPCLs with TREMOR
• Teams REshaping of MOdels for Rapid execution
• Two steps:
  1. Branch-and-bound search over task assignments
     • MDP-based heuristics as bounds
  2. Task assignment evaluation
     • By computing policies for every agent
     • Joint policy computation is performed only at CLs
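A minimal sketch of step 1, assuming hypothetical helpers mdp_heuristic (an optimistic MDP-based bound on a partial assignment) and evaluate_assignment (step 2 below); neither name comes from the paper:

    import heapq
    import itertools

    def branch_and_bound_assignment(tasks, agents, mdp_heuristic, evaluate_assignment):
        """Search over task-to-agent assignments, pruning with an MDP-based
        upper bound and evaluating complete assignments with step 2."""
        best_value, best_assignment = float("-inf"), None
        tie = itertools.count()  # tie-breaker so the heap never compares dicts
        frontier = [(-mdp_heuristic({}, tasks), next(tie), {}, tuple(tasks))]
        while frontier:
            neg_bound, _, partial, remaining = heapq.heappop(frontier)
            if -neg_bound <= best_value:
                continue  # prune: even the optimistic bound cannot beat the incumbent
            if not remaining:
                value = evaluate_assignment(partial)  # step 2: POMDP policy evaluation
                if value > best_value:
                    best_value, best_assignment = value, dict(partial)
                continue
            task, rest = remaining[0], remaining[1:]
            for agent in agents:
                child = {**partial, task: agent}
                heapq.heappush(frontier, (-mdp_heuristic(child, rest), next(tie), child, rest))
        return best_assignment, best_value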
2. Task Assignment Evaluation
• Until policies converge or the maximum number of iterations is reached:
  1. Solve the individual POMDPs
  2. Identify potential coordination locales
  3. Based on the type and value of each coordination locale, shape P and R of the relevant individual agents
     • Capture interactions
     • Encourage/discourage interactions
  4. Go to step 1
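A minimal sketch of this shaping loop, assuming hypothetical helpers solve_pomdp, find_coordination_locales, and shape_model (the shaping step itself is sketched further below):

    def evaluate_task_assignment(individual_pomdps, max_iterations=10):
        """Iteratively solve individual POMDPs and shape their models at
        coordination locales, until policies converge or the budget runs out."""
        policies = {i: solve_pomdp(m) for i, m in individual_pomdps.items()}
        for _ in range(max_iterations):
            locales = find_coordination_locales(individual_pomdps, policies)
            for cl in locales:
                for agent in cl.agents:
                    # Shape P and R of the relevant individual model so that
                    # beneficial interactions (e.g. debris clearing) are encouraged
                    # and harmful ones (e.g. collisions) are discouraged.
                    shape_model(individual_pomdps[agent], cl, policies)
            new_policies = {i: solve_pomdp(m) for i, m in individual_pomdps.items()}
            if new_policies == policies:  # convergence of policies
                break
            policies = new_policies
        return policies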
Identifying potential CLs
• A CL is a <state, action> pair
• Compute the probability of the CL occurring at each time step T, given the starting belief
• Uses the standard belief update under the policy (a policy over belief states), weighting each branch by the probability of observing ω in belief state b
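A minimal sketch of that computation, assuming tabular numpy models (P[a] is an |S| x |S'| transition matrix, O[a] an |S'| x |Ω| observation matrix) and a hypothetical policy(belief, t) function; the real computation would reuse the policy's belief tree rather than re-enumerating observation histories:

    import numpy as np

    def belief_update(b, a, w, P, O):
        """Standard belief update: b'(s') proportional to O(s', a, w) * sum_s P(s, a, s') b(s)."""
        predicted = b @ P[a]                   # sum_s b(s) P(s, a, s')
        unnormalized = predicted * O[a][:, w]  # weight by observation likelihood
        return unnormalized / unnormalized.sum()

    def cl_probability(b0, policy, P, O, cl_state, cl_action, T):
        """Probability that the agent is in cl_state and takes cl_action at step T,
        starting from belief b0 and following the given policy over belief states."""
        total, frontier = 0.0, [(b0, 1.0, 0)]
        while frontier:
            b, prob, t = frontier.pop()
            a = policy(b, t)
            if t == T:
                if a == cl_action:
                    total += prob * b[cl_state]
                continue
            for w in range(O[a].shape[1]):
                p_w = float((b @ P[a]) @ O[a][:, w])  # Pr(w | b, a)
                if p_w > 0:
                    frontier.append((belief_update(b, a, w, P, O), prob * p_w, t + 1))
        return total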
Type of CL
• STCL: there exist s and a for which the transition or reward function is not decomposable, i.e.
  • P(s, a, s′) ≠ ∏_{i=1..N} P_i((s_g, s_i), a_i, (s_g′, s_i′)), OR
  • R(s, a, s′) ≠ Σ_{i=1..N} R_i((s_g, s_i), a_i, (s_g′, s_i′))
• FTCL: completion of a task (a change in the global state) by one agent at time t′ affects the transitions/rewards of other agents at a later time t
Shaping Model (STCL)
• Shaping the transition function: the new transition probability for agent i combines the agent's individual transition with the joint transition probability when the CL occurs
• Shaping the reward function in the same manner
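A minimal sketch of the shaping step under stated assumptions: P_i is a dict mapping (state, action) to a next-state distribution, p_cl is the CL's occurrence probability from the previous step, and P_joint_given_cl / delta_joint_reward are hypothetical marginalized quantities supplied by the caller; the exact weighting used in TREMOR may differ from this sketch:

    def shape_transition(P_i, cl_state, cl_action, p_cl, P_joint_given_cl):
        """Blend agent i's individual transition at the CL with the (marginalized)
        joint transition that applies when the CL actually occurs."""
        shaped = {sa: dict(dist) for sa, dist in P_i.items()}
        old = P_i[(cl_state, cl_action)]
        support = set(old) | set(P_joint_given_cl)
        shaped[(cl_state, cl_action)] = {
            s_next: (1.0 - p_cl) * old.get(s_next, 0.0)
                    + p_cl * P_joint_given_cl.get(s_next, 0.0)
            for s_next in support
        }
        return shaped

    def shape_reward(R_i, cl_state, cl_action, p_cl, delta_joint_reward):
        """Add the expected change in joint reward caused by the interaction
        (negative for collisions, positive for helpful interactions)."""
        shaped = dict(R_i)
        shaped[(cl_state, cl_action)] = (
            R_i.get((cl_state, cl_action), 0.0) + p_cl * delta_joint_reward
        )
        return shaped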
Results
• Benchmark algorithms:
  • Independent POMDPs
  • Memory-Bounded Dynamic Programming (MBDP)
• Criteria:
  • Decision quality
  • Run-time
• Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon
Related work
• DEC-MDPs
  • Assume individual or collective full observability
  • Task allocation and dependencies given as input
• DEC-POMDPs
  • JESP
  • MBDP
  • Exploiting independence in transition/reward/observation functions
• Model shaping
  • Guestrin and Gordon, 2002
Conclusion
• DPCL, a specialization of Distributed POMDPs
• TREMOR exploits the presence of few CLs in a domain
• TREMOR depends on single-agent POMDP solvers
• Results: TREMOR outperformed existing DisPOMDP algorithms, except on tightly coupled, small problems
Same Time CL (STCL)
• There is an STCL if:
  • The transition function is not decomposable: P(s, a, s′) ≠ ∏_{i=1..N} P_i((s_g, s_i), a_i, (s_g′, s_i′)), OR
  • The observation function is not decomposable: O(s′, a, ω) ≠ ∏_{i=1..N} O_i(ω_i, a_i, (s_g′, s_i′)), OR
  • The reward function is not decomposable: R(s, a, s′) ≠ Σ_{i=1..N} R_i((s_g, s_i), a_i, (s_g′, s_i′))
• Ex: two robots colliding in a narrow corridor
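A minimal sketch of the first of these checks for a single (s, a, s′) triple, assuming callable joint and individual transition functions (the names are illustrative); the observation and reward checks are analogous:

    def is_same_time_cl(joint_P, individual_Ps, s, a, s_next, tol=1e-9):
        """Return True if the joint transition probability differs from the product
        of the individual agents' transition probabilities at (s, a, s').
        s and s_next are (global_state, (local_1, ..., local_N)); a is per-agent."""
        (s_g, locals_now), (s_g2, locals_next) = s, s_next
        product = 1.0
        for i, P_i in enumerate(individual_Ps):
            product *= P_i((s_g, locals_now[i]), a[i], (s_g2, locals_next[i]))
        return abs(joint_P(s, a, s_next) - product) > tol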
Future Time CL (FTCL)
• Actions of one agent at time t' can affect the transitions, observations, or rewards of other agents at a later time t:
  • P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
  • R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
  • O(ω_i^t, a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ O(ω_i^t, a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
• Ex: clearing debris helps rescue robots get to victims faster