Exploiting Coordination Locales in DisPOMDPs via Social Model Shaping
Pradeep Varakantham, Singapore Management University
Joint work with J. Y. Kwak, M. Taylor, J. Marecki, P. Scerri, M. Tambe
Motivating Domains
• Example domains: sensor networks, disaster rescue
• Characteristics of these domains:
  • Uncertainty
  • Coordination of multiple agents
  • Sequential decision making
Meeting the challenges
• Problem: multiple agents coordinating to perform multiple tasks in the presence of uncertainty
• Solution: represent as a Distributed POMDP and solve
  • Finding the optimal solution is NEXP-complete
  • Approximate algorithm that dynamically exploits structure in interactions
• Result: vast improvement in performance over existing algorithms
Outline
• Illustrative Domain
• Model
• Approach: exploit dynamic structure in interactions
• Results
Illustrative Domain
• Multiple types of robots
• Uncertainty in movements
• Reward:
  • Saving victims
  • Collisions
  • Clearing debris
• Maximize expected joint reward
Model
• Distributed POMDPs with Coordination Locales (DPCL)
• Joint model: <S, A, Ω, P, R, O, Ag>
• Global state represents completion of tasks
• Agents are independent except in coordination locales (CLs)
• Two types of CLs:
  • Same-time CL (e.g., agents colliding with each other)
  • Future-time CL (e.g., a cleaner robot clearing debris helps a rescue robot reach the goal)
• Individual observability
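A minimal sketch of the DPCL tuple as a data structure; the class and field names are illustrative assumptions, not taken from the paper's implementation:

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    State = Tuple        # (global state, per-agent local states)
    JointAction = Tuple  # one action per agent
    JointObs = Tuple     # one observation per agent

    @dataclass
    class CoordinationLocale:
        kind: str             # "same_time" or "future_time"
        agents: List[str]     # agents involved in the interaction
        state: State          # state component where the interaction occurs
        actions: JointAction  # actions involved in the interaction

    @dataclass
    class DPCL:
        agents: List[str]                                         # Ag
        states: List[State]                                       # S
        actions: Dict[str, List[str]]                             # A (per agent)
        observations: Dict[str, List[str]]                        # Omega (per agent)
        transition: Callable[[State, JointAction, State], float]  # P(s, a, s')
        reward: Callable[[State, JointAction, State], float]      # R(s, a, s')
        observe: Callable[[State, JointAction, JointObs], float]  # O(s', a, w)
        locales: List[CoordinationLocale] = field(default_factory=list)  # CLs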
Solving DPCLs with TREMOR
• Teams REshaping of MOdels for Rapid execution
• Two steps:
  1. Branch-and-bound search over task assignments
     • MDP-based heuristics as bounds
  2. Task assignment evaluation
     • By computing policies for every agent
     • Joint policy computation is performed only at CLs
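A minimal sketch of step 1, assuming hypothetical helpers mdp_heuristic (an optimistic MDP-based bound on a partial assignment) and evaluate_assignment (step 2 below); neither name comes from the paper:

    import heapq
    import itertools

    def branch_and_bound_assignment(tasks, agents, mdp_heuristic, evaluate_assignment):
        """Search over task-to-agent assignments, pruning with an MDP-based
        upper bound and evaluating complete assignments with step 2."""
        best_value, best_assignment = float("-inf"), None
        tie = itertools.count()  # tie-breaker so the heap never compares dicts
        frontier = [(-mdp_heuristic({}, tasks), next(tie), {}, tuple(tasks))]
        while frontier:
            neg_bound, _, partial, remaining = heapq.heappop(frontier)
            if -neg_bound <= best_value:
                continue  # prune: even the optimistic bound cannot beat the incumbent
            if not remaining:
                value = evaluate_assignment(partial)  # step 2: POMDP policy evaluation
                if value > best_value:
                    best_value, best_assignment = value, dict(partial)
                continue
            task, rest = remaining[0], remaining[1:]
            for agent in agents:
                child = {**partial, task: agent}
                heapq.heappush(frontier, (-mdp_heuristic(child, rest), next(tie), child, rest))
        return best_assignment, best_value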
2. Task Assignment Evaluation
• Until policies converge or the maximum number of iterations is reached:
  1. Solve the individual POMDPs
  2. Identify potential coordination locales
  3. Based on the type and value of each coordination locale, shape P and R of the relevant individual agents
     • Capture interactions
     • Encourage/discourage interactions
  4. Go to step 1
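A minimal sketch of this shaping loop, assuming hypothetical helpers solve_pomdp, find_coordination_locales, and shape_model (the shaping step itself is sketched further below):

    def evaluate_task_assignment(individual_pomdps, max_iterations=10):
        """Iteratively solve individual POMDPs and shape their models at
        coordination locales, until policies converge or the budget runs out."""
        policies = {i: solve_pomdp(m) for i, m in individual_pomdps.items()}
        for _ in range(max_iterations):
            locales = find_coordination_locales(individual_pomdps, policies)
            for cl in locales:
                for agent in cl.agents:
                    # Shape P and R of the relevant individual model so that
                    # beneficial interactions (e.g. debris clearing) are encouraged
                    # and harmful ones (e.g. collisions) are discouraged.
                    shape_model(individual_pomdps[agent], cl, policies)
            new_policies = {i: solve_pomdp(m) for i, m in individual_pomdps.items()}
            if new_policies == policies:  # convergence of policies
                break
            policies = new_policies
        return policies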
Identifying potential CLs
• A CL is a <state, action> pair
• Compute the probability of the CL occurring at each time step T, given the starting belief
• Uses the standard belief update under the policy (a policy over belief states), weighting each branch by the probability of observing ω in belief state b
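A minimal sketch of that computation, assuming tabular numpy models (P[a] is an |S| x |S'| transition matrix, O[a] an |S'| x |Ω| observation matrix) and a hypothetical policy(belief, t) function; the real computation would reuse the policy's belief tree rather than re-enumerating observation histories:

    import numpy as np

    def belief_update(b, a, w, P, O):
        """Standard belief update: b'(s') proportional to O(s', a, w) * sum_s P(s, a, s') b(s)."""
        predicted = b @ P[a]                   # sum_s b(s) P(s, a, s')
        unnormalized = predicted * O[a][:, w]  # weight by observation likelihood
        return unnormalized / unnormalized.sum()

    def cl_probability(b0, policy, P, O, cl_state, cl_action, T):
        """Probability that the agent is in cl_state and takes cl_action at step T,
        starting from belief b0 and following the given policy over belief states."""
        total, frontier = 0.0, [(b0, 1.0, 0)]
        while frontier:
            b, prob, t = frontier.pop()
            a = policy(b, t)
            if t == T:
                if a == cl_action:
                    total += prob * b[cl_state]
                continue
            for w in range(O[a].shape[1]):
                p_w = float((b @ P[a]) @ O[a][:, w])  # Pr(w | b, a)
                if p_w > 0:
                    frontier.append((belief_update(b, a, w, P, O), prob * p_w, t + 1))
        return total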
Type of CL
• STCL: there exist s and a for which the transition or reward function is not decomposable, i.e.
  • P(s, a, s′) ≠ ∏_{i=1..N} P_i((s_g, s_i), a_i, (s_g′, s_i′)), OR
  • R(s, a, s′) ≠ Σ_{i=1..N} R_i((s_g, s_i), a_i, (s_g′, s_i′))
• FTCL: completion of a task (a change in the global state) by one agent at time t′ affects the transitions/rewards of other agents at a later time t
Shaping Model (STCL)
• Shaping the transition function: the new transition probability for agent i combines the agent's individual transition with the joint transition probability when the CL occurs
• Shaping the reward function in the same manner
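A minimal sketch of the shaping step under stated assumptions: P_i is a dict mapping (state, action) to a next-state distribution, p_cl is the CL's occurrence probability from the previous step, and P_joint_given_cl / delta_joint_reward are hypothetical marginalized quantities supplied by the caller; the exact weighting used in TREMOR may differ from this sketch:

    def shape_transition(P_i, cl_state, cl_action, p_cl, P_joint_given_cl):
        """Blend agent i's individual transition at the CL with the (marginalized)
        joint transition that applies when the CL actually occurs."""
        shaped = {sa: dict(dist) for sa, dist in P_i.items()}
        old = P_i[(cl_state, cl_action)]
        support = set(old) | set(P_joint_given_cl)
        shaped[(cl_state, cl_action)] = {
            s_next: (1.0 - p_cl) * old.get(s_next, 0.0)
                    + p_cl * P_joint_given_cl.get(s_next, 0.0)
            for s_next in support
        }
        return shaped

    def shape_reward(R_i, cl_state, cl_action, p_cl, delta_joint_reward):
        """Add the expected change in joint reward caused by the interaction
        (negative for collisions, positive for helpful interactions)."""
        shaped = dict(R_i)
        shaped[(cl_state, cl_action)] = (
            R_i.get((cl_state, cl_action), 0.0) + p_cl * delta_joint_reward
        )
        return shaped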
Results
• Benchmark algorithms:
  • Independent POMDPs
  • Memory-Bounded Dynamic Programming (MBDP)
• Criteria:
  • Decision quality
  • Run-time
• Parameters varied: (i) agents; (ii) CLs; (iii) states; (iv) horizon
Related work
• DEC-MDPs
  • Assume individual or collective full observability
  • Task allocation and dependencies given as input
• DEC-POMDPs
  • JESP
  • MBDP
  • Exploiting independence in transition/reward/observation functions
• Model shaping
  • Guestrin and Gordon, 2002
Conclusion
• DPCL, a specialization of Distributed POMDPs
• TREMOR exploits the presence of few CLs in a domain
• TREMOR depends on single-agent POMDP solvers
• Results: TREMOR outperformed existing DisPOMDP algorithms, except on tightly coupled, small problems
Same Time CL (STCL)
• There is an STCL if:
  • The transition function is not decomposable: P(s, a, s′) ≠ ∏_{i=1..N} P_i((s_g, s_i), a_i, (s_g′, s_i′)), OR
  • The observation function is not decomposable: O(s′, a, ω) ≠ ∏_{i=1..N} O_i(ω_i, a_i, (s_g′, s_i′)), OR
  • The reward function is not decomposable: R(s, a, s′) ≠ Σ_{i=1..N} R_i((s_g, s_i), a_i, (s_g′, s_i′))
• Ex: two robots colliding in a narrow corridor
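A minimal sketch of the first of these checks for a single (s, a, s′) triple, assuming callable joint and individual transition functions (the names are illustrative); the observation and reward checks are analogous:

    def is_same_time_cl(joint_P, individual_Ps, s, a, s_next, tol=1e-9):
        """Return True if the joint transition probability differs from the product
        of the individual agents' transition probabilities at (s, a, s').
        s and s_next are (global_state, (local_1, ..., local_N)); a is per-agent."""
        (s_g, locals_now), (s_g2, locals_next) = s, s_next
        product = 1.0
        for i, P_i in enumerate(individual_Ps):
            product *= P_i((s_g, locals_now[i]), a[i], (s_g2, locals_next[i]))
        return abs(joint_P(s, a, s_next) - product) > tol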
Future Time CL (FTCL)
• Actions of one agent at time t' can affect the transitions, observations, or rewards of other agents at a later time t:
  • P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ P((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
  • R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ R((s_g^t, s_i^t), a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
  • O(ω_i^t, a_i^t, (s_g'^t, s_i'^t) | a_j^t') ≠ O(ω_i^t, a_i^t, (s_g'^t, s_i'^t)), ∀ t' < t
• Ex: clearing debris helps rescue robots get to victims faster