Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping


Presentation Transcript


  1. Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping Pradeep Varakantham, Singapore Management University Joint work with J. Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe

  2. Motivating Domains Sensor Networks Disaster Rescue • Characteristics of Domains: • Uncertainty • Coordinating multiple agents • Sequential decision making

  3. Meeting the challenges • Problem: • Multiple agents coordinating to perform multiple tasks in the presence of uncertainty • Solution: Represent as Distributed POMDPs and solve • Computing the optimal solution is NEXP-complete • Approximate algorithm that dynamically exploits structure in interactions • Result: Vast improvement in performance over existing algorithms

  4. Outline • Illustrative Domain • Model • Approach: Exploit dynamic structure in interactions • Results

  5. Illustrative Domain • Multiple types of robots • Uncertainty in movements • Reward • Saving victims • Collisions • Clearing debris • Maximize expected joint reward

  6. Model • DisPOMDPs with Coordination Locales, DPCL • Joint model: <S, A, Ω, P, R, O, Ag> • Global state represents completion of tasks • Agents are independent except in coordination locales, CLs • Two types of CLs: • Same-time CL (Ex: agents colliding with each other) • Future-time CL (Ex: a cleaner robot clearing debris helps a rescue robot reach its goal) • Individual observability
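
For concreteness, the DPCL tuple above can be written down as a plain container. The sketch below is only an illustrative Python rendering; every name, type, and the way coordination locales are stored are assumptions for exposition, not the authors' data structures.

    # Minimal sketch of the DPCL tuple <S, A, Omega, P, R, O, Ag>.
    # All names and container choices are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    State = Tuple        # (global task state, local state of each agent)
    JointAction = Tuple  # one action per agent
    JointObs = Tuple     # one observation per agent

    @dataclass
    class DPCL:
        states: List[State]                                              # S
        actions: List[JointAction]                                       # A
        observations: List[JointObs]                                     # Omega
        transition: Callable[[State, JointAction, State], float]         # P(s, a, s')
        reward: Callable[[State, JointAction, State], float]             # R(s, a, s')
        observation_fn: Callable[[State, JointAction, JointObs], float]  # O(s', a, o)
        agents: List[int]                                                # Ag
        # Coordination locales: the few (state, action) pairs where agents interact.
        same_time_cls: List[Tuple[State, JointAction]] = field(default_factory=list)
        future_time_cls: List[Tuple[State, JointAction]] = field(default_factory=list)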

  7. Solving DPCLs with TREMOR • Teams REshaping of MOdels for Rapid execution • Two steps: • Branch and Bound search • MDP-based heuristics • Task assignment evaluation • Computes policies for every agent • Joint policy computation is performed only at CLs
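
Read as pseudocode, the two steps compose roughly as follows. This is a hedged outline only: branch_and_bound and evaluate_assignment are hypothetical helper names standing in for steps 1 and 2, which are sketched after the next two slides; this is not the authors' code.

    # Hedged outline of how TREMOR's two steps fit together.
    def tremor(dpcl, tasks):
        # Step 1: search for a good agent-to-task assignment, pruning with
        # MDP-based heuristics and scoring candidates with step 2.
        assignment, _ = branch_and_bound(dpcl, tasks, dpcl.agents)
        # Step 2: recompute the individual policies for the chosen assignment,
        # with joint reasoning only at the coordination locales.
        value, policies = evaluate_assignment(dpcl, assignment)
        return policies, value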

  8. 1. Branch and Bound search
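
The slide itself showed the search as a diagram. As a stand-in, here is one plausible reading of step 1 under the description on the previous slide: a depth-first branch and bound over agent-to-task assignments, pruned with optimistic MDP-based values. mdp_upper_bound and evaluate_assignment are hypothetical helpers, not the authors' implementation.

    # Illustrative branch-and-bound over agent-to-task assignments (step 1).
    # mdp_upper_bound is assumed to return an optimistic value that ignores
    # observation uncertainty; evaluate_assignment is sketched after slide 9.
    def branch_and_bound(dpcl, tasks, agents):
        best = {"value": float("-inf"), "assignment": None}

        def expand(assignment, remaining_agents):
            if not remaining_agents:                      # leaf: full assignment
                value, _ = evaluate_assignment(dpcl, assignment)
                if value > best["value"]:
                    best["value"], best["assignment"] = value, dict(assignment)
                return
            agent = remaining_agents[0]
            for task in tasks:
                assignment[agent] = task
                # Prune if even the optimistic MDP value cannot beat the incumbent.
                if mdp_upper_bound(dpcl, assignment) > best["value"]:
                    expand(assignment, remaining_agents[1:])
                del assignment[agent]

        expand({}, list(agents))
        return best["assignment"], best["value"]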

  9. 2. Task Assignment Evaluation • Until the policies converge or a maximum number of iterations is reached: • Solve individual POMDPs • Identify potential coordination locales • Based on the type and value of the coordination: • Shape P and R of the relevant individual agents • Capture interactions • Encourage/discourage interactions • Go to step 1
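
The loop above can be sketched directly. solve_pomdp, find_coordination_locales, shape_models, and expected_value are hypothetical helpers named only to mirror the bullets; the structure of the iteration, not the code itself, is the point.

    # Illustrative evaluation of one task assignment (step 2).
    def evaluate_assignment(dpcl, assignment, max_iters=20):
        policies = {agent: None for agent in dpcl.agents}
        for _ in range(max_iters):
            # Solve each agent's POMDP independently under the (shaped) model.
            new_policies = {agent: solve_pomdp(dpcl, agent, assignment)
                            for agent in dpcl.agents}
            if new_policies == policies:
                break                      # policies converged
            policies = new_policies
            # Identify likely coordination locales under the current policies...
            locales = find_coordination_locales(dpcl, policies)
            # ...and reshape P and R of the agents involved, so helpful
            # interactions are encouraged and harmful ones discouraged.
            shape_models(dpcl, locales, policies)
        # Score the assignment by the summed expected value of the joint policy.
        value = sum(expected_value(dpcl, agent, policies[agent])
                    for agent in dpcl.agents)
        return value, policies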

  10. Identifying potential CLs • CL = <State, Action> • Probability of the CL occurring at a time step T, given the starting belief • Computed with the standard belief update under the policy: the policy over belief states picks action a = π(b), the belief is updated as b'(s') ∝ O(s', a, ω) Σ_s P(s, a, s') b(s), and each observation branch is weighted by the probability of observing ω in belief state b
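
Spelled out, the quantities named above combine into the usual POMDP belief recursion. The sketch below assumes small enumerable state and observation sets and a policy given as a function of the belief; it is illustrative, not the paper's implementation.

    # Standard single-agent POMDP belief update, used when estimating how likely
    # a coordination locale (state, action) is at a given time step.
    def belief_update(belief, action, obs, P, O, states):
        # b'(s') ∝ O(s', a, ω) * Σ_s P(s, a, s') b(s)
        new_belief = {}
        for s_next in states:
            new_belief[s_next] = O(s_next, action, obs) * sum(
                P(s, action, s_next) * belief.get(s, 0.0) for s in states)
        norm = sum(new_belief.values())   # Pr(ω | b, a); also weights this branch
        if norm > 0:
            for s_next in new_belief:
                new_belief[s_next] /= norm
        return new_belief, norm

    def cl_probability(cl_state, cl_action, belief, policy, P, O,
                       states, observations, t):
        # Probability that the agent is in cl_state and takes cl_action at step t,
        # found by pushing the starting belief forward through the policy.
        if t == 0:
            return belief.get(cl_state, 0.0) if policy(belief) == cl_action else 0.0
        action = policy(belief)
        total = 0.0
        for obs in observations:
            next_belief, p_obs = belief_update(belief, action, obs, P, O, states)
            if p_obs > 0:
                total += p_obs * cl_probability(cl_state, cl_action, next_belief,
                                                policy, P, O, states,
                                                observations, t - 1)
        return total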

  11. Type of CL • STCL, if there exist s and a for which the transition or reward function is not decomposable: • P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i), a_i, (s_g',s_i')) OR • R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s_g',s_i')) • FTCL, if completion of a task (global state) by an agent at t' affects the transitions/rewards of other agents at t

  12. Shaping Model (STCL) • Shaping the transition function: the new transition probability for agent i is derived from the joint transition probability when the CL occurs • Shaping the reward function analogously
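
The exact shaping formulas appeared as equations on the slide and are not preserved in this transcript. One common reading, blending the joint-CL outcome with the original individual model in proportion to the CL's probability, is sketched below purely as an assumption.

    # Hedged sketch of STCL model shaping for agent i. The mixing-by-CL-probability
    # form here is an illustrative assumption, not a transcription of the slide.
    def shape_transition(P_i, P_joint_i, p_cl, cl_state, cl_action):
        # P_i: original individual transition function of agent i
        # P_joint_i: agent i's marginal of the joint transition when the CL occurs
        # p_cl: probability that the CL actually occurs under current policies
        def shaped(s, a, s_next):
            if s == cl_state and a == cl_action:
                return p_cl * P_joint_i(s, a, s_next) + (1.0 - p_cl) * P_i(s, a, s_next)
            return P_i(s, a, s_next)
        return shaped

    def shape_reward(R_i, R_joint_i, p_cl, cl_state, cl_action):
        # Same idea for the reward: blend in agent i's share of the joint reward at
        # the coordination locale, so interactions are encouraged or discouraged.
        def shaped(s, a, s_next):
            if s == cl_state and a == cl_action:
                return p_cl * R_joint_i(s, a, s_next) + (1.0 - p_cl) * R_i(s, a, s_next)
            return R_i(s, a, s_next)
        return shaped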

  13. Results • Benchmark algorithms • Independent POMDPs • Memory Bounded Dynamic Programming (MBDP) • Criteria • Decision quality • Run-time • Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon

  14. State space

  15. Agents

  16. Coordination Locales

  17. Time Horizon

  18. Related work • Existing Research • DEC-MDPs • Assuming individual or collective full observability • Task allocation and dependencies as input • DEC-POMDPs • JESP • MBDP • Exploiting independence in transition/reward/observation. • Model Shaping • Guestrin and Gordon, 2002

  19. Conclusion • DPCL, a specialization of Distributed POMDPs • TREMOR exploits the presence of few CLs in these domains • TREMOR builds on single-agent POMDP solvers • Results: • TREMOR outperformed DisPOMDP algorithms, except on tightly coupled small problems

  20. Questions?

  21. Same Time CL (STCL) • There is an STCL if • the transition function is not decomposable: P(s,a,s') ≠ Π_{1≤i≤N} P((s_g,s_i), a_i, (s_g',s_i')), OR • the observation function is not decomposable: O(s',a,o) ≠ Π_{1≤i≤N} O(o_i, a_i, (s_g',s_i')), OR • the reward function is not decomposable: R(s,a,s') ≠ Σ_{1≤i≤N} R((s_g,s_i), a_i, (s_g',s_i')) • Ex: two robots colliding in a narrow corridor

  22. Future Time CL • Actions of one agent at t' can affect the transitions, observations, or rewards of other agents at t: • P((s_g,s_i)^t, a_i^t, (s_g',s_i')^t | a_j^{t'}) ≠ P((s_g,s_i)^t, a_i^t, (s_g',s_i')^t), for some t' < t • R((s_g,s_i)^t, a_i^t, (s_g',s_i')^t | a_j^{t'}) ≠ R((s_g,s_i)^t, a_i^t, (s_g',s_i')^t), for some t' < t • O(ω_i^t, a_i^t, (s_g',s_i')^t | a_j^{t'}) ≠ O(ω_i^t, a_i^t, (s_g',s_i')^t), for some t' < t • Ex: clearing debris helps rescue robots get to victims faster
