Distributed Model Shaping for Scaling to Decentralized POMDPs with hundreds of agents
Prasanna Velagapudi, Pradeep Varakantham, Paul Scerri, Katia Sycara
D-TREMOR - AAMAS2011

Motivation
Example domains: Search & Rescue, Military C2, Convoy Planning, Disaster Response
• 100s to 1000s of robots, agents, people
• Complex, collaborative tasks
• Dynamic, uncertain environment
• Offline planning

Motivation
• Exploit three characteristics of these domains
  • Explicit Interactions
    • Specific combinations of states and actions where effects depend on more than one agent
  • Sparsity of Interactions
    • Many potential interactions could occur between agents
    • Only a few will occur in any given solution
  • Distributed Computation
    • Each agent has access to local computation
    • A centralized algorithm has access to 1 unit of computation
    • A distributed algorithm has access to N units of computation

Review: Dec-POMDP
[Figure: two agents, labeled 1 and 2, annotated with the joint transition, joint reward, and joint observation functions]

Distributed POMDP with Coordination Locales [Varakantham, et al 2009]
A coordination locale (CL) is defined by:
• Relevant region of joint state-action space
• Time constraint
• Nature of time constraint (e.g. affects only same-time, affects any future-time)

D-TREMOR (extending TREMOR [Varakantham, et al 2009])
• Task Allocation: decentralized auction
• Local Planning: EVA POMDP solver
• Interaction Exchange: policy sub-sampling and Coordination Locale (CL) messages
• Model Shaping: prioritized/randomized reward and transition shaping

D-TREMOR: Task Allocation
• Assign “tasks” using decentralized auction
• Greedy, nearest allocation
• Create a local, independent sub-problem for each agent

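The greedy, nearest allocation can be sketched as follows. This is only an illustration, not the authors' auction protocol: it runs in one process, treats each agent's "bid" as its straight-line distance to the task, and the names `greedy_nearest_allocation`, `agents`, and `tasks` are hypothetical.

```python
from math import hypot


def greedy_nearest_allocation(agents, tasks):
    """Assign each task to the agent with the lowest distance bid.

    agents, tasks: dicts mapping a name to an (x, y) position.
    Assumption: a bid is the Euclidean distance from agent to task.
    """
    allocation = {}
    for task, (tx, ty) in tasks.items():
        # Every agent bids its distance; the smallest bid wins the task.
        bids = {name: hypot(ax - tx, ay - ty) for name, (ax, ay) in agents.items()}
        allocation[task] = min(bids, key=bids.get)
    return allocation


agents = {"rescuer_1": (0, 0), "rescuer_2": (10, 0)}
tasks = {"victim_a": (1, 1), "victim_b": (9, 0)}
print(greedy_nearest_allocation(agents, tasks))
```

Each agent's allocated tasks would then define its own local sub-problem.
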
D-TREMOR: Local Planning
• Solve using off-the-shelf algorithm (EVA)
• Result: locally-optimal policies

D-TREMOR: Interaction Exchange
• Find PrCLi and ValCLi [Kearns 2002]:
  • Entered corridor in 95 of 100 runs: PrCLi = 0.95
  • No collision: +1; collision: -6; ValCLi = -7
• Send CL messages to teammates

D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on interactions
[Equation labels: probability of interaction, interaction model functions, independent model functions]

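The shaping equation itself did not survive on this slide; only its three labels did. One plausible reading, assumed here and not confirmed by the slide, is a convex blend of the interaction and independent model functions weighted by the probability of interaction. The function names below are hypothetical.

```python
def shape_value(independent, interaction, pr_cl):
    """Blend one reward (or probability) entry.

    Assumption: shaped = Pr * interaction + (1 - Pr) * independent.
    """
    return pr_cl * interaction + (1.0 - pr_cl) * independent


def shape_distribution(independent, interaction, pr_cl):
    """Blend two next-state distributions given as {state: probability}."""
    states = set(independent) | set(interaction)
    return {
        s: shape_value(independent.get(s, 0.0), interaction.get(s, 0.0), pr_cl)
        for s in states
    }


# Reward at a corridor cell: +1 if alone, -6 on collision, teammate there 95% of the time.
print(shape_value(1.0, -6.0, 0.95))
# Transition: moving succeeds if alone, but the agent stays put on collision.
print(shape_distribution({"next": 1.0}, {"stay": 1.0}, 0.95))
```

Because both inputs are valid distributions and the weights sum to one, the blended transition function stays normalized.
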
D-TREMOR: Local Planning (again)
• Re-solve shaped local models to get new policies
• Result: new locally-optimal policies, which produce new interactions

D-TREMOR: Adv. Model Shaping
• In practice, we run into three common issues faced by concurrent optimization algorithms:
  • Slow convergence
  • Oscillation
  • Local optima
• We can alter our model-shaping to mitigate these by reasoning about the types of interactions we have

D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
  • Assign priorities to agents, only model-shape collision interactions for higher priority agents
  • Can quickly resolve purely negative interactions
  • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs

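A minimal sketch of the prioritization rule, under two assumptions that the slide does not spell out: a larger number means higher priority, and an agent shapes its model only for collision messages sent by higher-priority teammates (so lower-priority agents yield). The message fields `kind` and `sender_priority` are hypothetical.

```python
def messages_to_shape(my_priority, messages):
    """Select the CL messages this agent should shape its model with.

    Assumption: collisions are shaped only when the sender outranks us;
    every other interaction type is passed through unchanged.
    """
    selected = []
    for msg in messages:
        if msg["kind"] == "collision" and msg["sender_priority"] <= my_priority:
            continue  # we outrank (or tie) the sender, so the sender must adapt
        selected.append(msg)
    return selected


inbox = [
    {"kind": "collision", "sender_priority": 5},
    {"kind": "collision", "sender_priority": 1},
    {"kind": "debris_cleared", "sender_priority": 1},
]
print(messages_to_shape(3, inbox))
```

With a strict ordering, each potential collision is resolved by exactly one of the two agents involved, which is why purely negative interactions can settle quickly.
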
D-TREMOR: Adv. Model Shaping
• Oscillation → Probabilistic shaping
  • Often caused by time dynamics between agents
    • Agent 1 shapes based on Agent 2’s old policy
    • Agent 2 shapes based on Agent 1’s old policy
  • Each agent only applies model-shaping with probability δ [Zhang 2005]
  • Breaks out of cycles between agent policies

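The probabilistic shaping step can be illustrated with a coin flip per agent per iteration. This is a sketch under the assumption that an agent which skips shaping simply keeps its current model; `maybe_shape` and its parameters are hypothetical names.

```python
import random


def maybe_shape(current_model, shaped_model, delta, rng):
    """Adopt the shaped model with probability delta, else keep the current one."""
    return shaped_model if rng.random() < delta else current_model


rng = random.Random(0)  # seeded only to make the sketch repeatable
choices = [maybe_shape("old", "shaped", 0.5, rng) for _ in range(10)]
print(choices)
```

Since two agents rarely both update in the same iteration when δ is small, one of them gets to react to the other's current policy instead of a stale one.
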
D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
  • Agents cannot detect mixed interactions (e.g. debris)
    • Rescue agent policies can only improve if debris is cleared
    • Cleaner agent policies can only worsen if they clear debris
Agent thought bubbles:
• “I’m not going near the debris”
• “I’m not clearing the debris”
• “If no one is going through debris, I won’t clear it”

D-TREMOR: Adv. Model Shaping
• Local Optima → Optimistic initialization
  • Agents cannot detect mixed interactions (e.g. debris)
    • Rescue agent policies can only improve if debris is cleared
    • Cleaner agent policies can only worsen if they clear debris
  • Let each agent solve an initial model that uses an optimistic assumption of interaction condition

Experimental Setup
• D-TREMOR policies
  • Max-joint-value
  • Last iteration
• Comparison policies
  • Independent
  • Optimistic
  • Do-nothing
  • Random
• Scaling:
  • 10 to 100 agents
  • Random maps
• Density:
  • 100 agents
  • Concentric ring maps
• 3 problems/condition
• 20 planning iterations
• 7 time step horizon
• 1 CPU per agent
D-TREMOR produces reasonable policies for 100-agent planning problems in under 6 hrs. (with some caveats)

Experimental Datasets
[Figures: Scaling Dataset, Density Dataset]

Experimental Results: Scaling
[Plot: D-TREMOR policies vs. naïve policies]

Experimental Results: Density
• D-TREMOR rescues the most victims (+10 ea.)
• D-TREMOR does not resolve every collision (-5 ea.)

Experimental Results: Time
• Increase in time related to # of CLs, not # of agents
[Plot label: # of CLs Active]

Conclusions
• D-TREMOR: Decentralized planning for sparse Dec-POMDPs with many agents
• Demonstrated complete distributability, fast heuristic interaction detection, and local message exchange to achieve high scalability
• Empirical results in simulated search and rescue domain

Future Work
• Generalized framework for distributed planning under uncertainty through iterative message exchange
• Optimality/convergence bounds
• Reduce necessary communication
• Better search over task allocations
• Scaling to larger team sizes

Questions?

Motivation
• Scaling planning to large teams is hard
  • Need to plan (with uncertainty) for each agent in team
  • Agents must consider the actions of a growing number of teammates
  • Full, joint problem has NEXP complexity [Bernstein 2002]
• Optimality is going to be infeasible
• Find and exploit structure in the problem
• Make good plans in reasonable amount of time

Experimental Results: Density
• Do-nothing does the best?
• Ignoring interactions = poor performance

Experimental Results: Time
• Why is this increasing?

Related Work
[Figure: planners placed along three axes (scalability, generality, optimality): D-TREMOR, Prioritized Planning, TREMOR, OC-Dec-MDP, Dynamic Networks, SPIDER, DPC, EDI-CR, TD-Dec-POMDP, Optimal Decoupling, JESP]
Structured Dec-(PO)MDP planners
• JESP [Nair 2003]
• TD-Dec-POMDP [Witwicki 2010]
• EDI-CR [Mostafa 2009]
• SPIDER [Marecki 2009]
• Restrict generality slightly to get scalability
• High optimality

Related Work
Heuristic Dec-(PO)MDP planners
• TREMOR [Varakantham 2009]
• OC-Dec-MDP [Beynier 2005]
• Sacrifice optimality for scalability
• High generality

Related Work
Structured multiagent path planners
• DPC [Bhattacharya 2010]
• Optimal Decoupling [Van den Berg 2009]
• Sacrifice generality further to get scalability
• High optimality

Related Work
Heuristic multiagent path planners
• Dynamic Networks [Clark 2003]
• Prioritized Planning [Van den Berg 2005]
• Sacrifice optimality to get scalability

Related Work
Our approach:
• Fix high scalability and generality
• Explore what level of optimality is possible

A Simple Rescue Domain
[Figure labels: Unsafe Cell, Rescue Agent, Clearable Debris, Narrow Corridor, Cleaner Agent, Victim]

A Simple (Large) Rescue Domain

Distributed POMDP with Coordination Locales (DPCL) [Varakantham, et al 2009]
• Often, interactions between agents are sparse
[Figure labels: Only fits one agent; Passable if cleaned]

Distributed, Iterative Planning
• Inspiration:
  • TREMOR [Varakantham 2009]
  • JESP [Nair 2003]
• Reduce the full joint problem into a set of smaller, independent sub-problems
• Solve independent sub-problems with local algorithm
• Modify sub-problems to push locally optimal solutions towards high-quality joint solution

Distributed Team REshaping of MOdels for Rapid execution (D-TREMOR)
• Reduce the full joint problem into a set of smaller, independent sub-problems (one for each agent)
• Solve independent sub-problems with existing state-of-the-art algorithms
• Modify sub-problems such that the locally optimal solution approaches a high-quality joint solution
Task Allocation → Local Planning → Interaction Exchange → Model Shaping

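The four stages can be read as an iterative loop. The skeleton below is a sequential sketch under stated assumptions, whereas D-TREMOR itself runs one planner per agent: the callables `allocate`, `plan`, `exchange`, and `shape` are hypothetical placeholders for the four stages, and the default of 20 iterations mirrors the experimental setup reported in this deck.

```python
def d_tremor_loop(agents, allocate, plan, exchange, shape, iterations=20):
    """Run allocate once, then alternate plan / exchange / shape.

    allocate(agents)       -> {agent: local_model}
    plan(local_model)      -> policy
    exchange(policies)     -> {agent: [CL messages addressed to that agent]}
    shape(model, messages) -> reshaped local_model
    """
    models = allocate(agents)
    policies = {}
    for _ in range(iterations):
        # Local planning: every agent solves its own sub-problem independently.
        policies = {a: plan(models[a]) for a in agents}
        # Interaction exchange: agents tell teammates about likely interactions.
        inbox = exchange(policies)
        # Model shaping: each agent folds the received messages into its model.
        models = {a: shape(models[a], inbox.get(a, [])) for a in agents}
    return policies
```

Policies returned this way correspond to the "last iteration" variant; keeping the best joint policy seen so far would be a separate bookkeeping step.
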
D-TREMOR: Interaction Exchange
Finding PrCLi
• Evaluate local policy
• Compute frequency of associated si, ai [Kearns 2002]:
  • Entered corridor in 95 of 100 runs: PrCLi = 0.95

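The frequency estimate can be sketched as a simple Monte Carlo count. The `rollout` callable is a hypothetical stand-in for executing the local policy once and returning the (state, action) pairs it visited; it is not part of any real solver API.

```python
import random


def estimate_pr_cl(rollout, locale, runs=100, seed=0):
    """Fraction of sampled executions whose trajectory hits the locale.

    rollout(rng) -> iterable of (state, action) pairs for one execution.
    locale       -> the (state, action) pair that defines the interaction.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(runs) if locale in rollout(rng))
    return hits / runs


def toy_rollout(rng):
    # Toy policy: takes the corridor 95% of the time, the long way otherwise.
    if rng.random() < 0.95:
        return [("start", "east"), ("corridor", "east")]
    return [("start", "north"), ("detour", "east")]


print(estimate_pr_cl(toy_rollout, ("corridor", "east")))
```

The estimate fluctuates with the number of runs; the slide's 95-of-100 example corresponds to `runs=100`.
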
D-TREMOR: Interaction Exchange
Finding ValCLi
• Sample local policy value with/without interactions
  • Test interactions independently
• Compute change in value if interaction occurred [Kearns 2002]:
  • No collision: +1; collision: -6; ValCLi = -7

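The value estimate can be sketched the same way: sample the policy's return with the interaction forced on and forced off, then take the difference of the means. The `simulate` callable and its `interacting` flag are hypothetical.

```python
import random
from statistics import mean


def estimate_val_cl(simulate, runs=100, seed=0):
    """Change in sampled policy value when the interaction occurs.

    simulate(rng, interacting) -> total reward of one sampled execution.
    """
    rng = random.Random(seed)
    with_cl = mean(simulate(rng, True) for _ in range(runs))
    without_cl = mean(simulate(rng, False) for _ in range(runs))
    return with_cl - without_cl


def toy_simulate(rng, interacting):
    # Slide example: +1 when the corridor is free, -6 when a collision happens.
    return -6 if interacting else 1


print(estimate_val_cl(toy_simulate))  # -7
```

Testing each interaction on its own keeps the number of simulations linear in the number of locales, at the cost of ignoring how interactions combine.
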
D-TREMOR: Interaction Exchange
• Send CL messages to teammates:
• Sparsity → Relatively small # of messages

D-TREMOR: Model Shaping
• Shape local model rewards/transitions based on remote interactions
[Equation labels: probability of interaction, interaction model functions, independent model functions]

D-TREMOR: Adv. Model Shaping
• Slow convergence → Prioritization
  • Majority of interactions are collisions
  • Assign priorities to agents, only model-shape collision interactions for higher priority agents
  • From DPP: prioritization can quickly resolve collision interactions
  • Similar properties for any purely negative interaction
    • Negative interaction: when every agent is guaranteed to have a lower-valued local policy if an interaction occurs