An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs

Sue Ann Hong, Geoff Gordon. Carnegie Mellon University.

Presentation Transcript


  1. An Accelerated Gradient Method for Multi-Agent Planning in Factored MDPs. Sue Ann Hong, Geoff Gordon. Carnegie Mellon University.

  2. Multi-agent planning • Optimize: individual objective • Individual constraints • Shared constraints (resources)

  3. Factored MDPs [Guestrin et al., 2002] • Want: an efficient, distributed solver • Shared constraints: piecewise-linear constraints on shared resources • Individual objective: an MDP that maximizes a linear reward • Fast individual solver: value iteration

  4. Distributed optimization: Lagrangian relaxation • Resource 1 @ $100 • Solve in a distributed fashion • How to set the prices? Gradient-based methods. [Figure: worked example with dollar amounts $300, $50, $80, $200, $100; layout not recoverable]
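
As a rough illustration of the pricing idea on this slide, the sketch below runs a projected subgradient update on a single resource price. It is a toy under stated assumptions, not the authors' algorithm: each agent is reduced to a hypothetical list of `(reward, usage)` options instead of an MDP, and the names `best_response` and `coordinate`, the step-size schedule, and the example numbers are all invented for illustration.

```python
def best_response(options, price):
    """One agent's independent subproblem: maximize reward - price * usage.

    `options` is a hypothetical list of (reward, usage) pairs standing in
    for the agent's individual planner.
    """
    return max(options, key=lambda o: o[0] - price * o[1])


def coordinate(agents, capacity, step_size=50.0, steps=100):
    """Projected subgradient ascent on the price of one shared resource."""
    price = 0.0
    for t in range(1, steps + 1):
        # Agents plan separately, given only the current price.
        choices = [best_response(opts, price) for opts in agents]
        # Subgradient of the dual: demand minus capacity.
        excess = sum(usage for _, usage in choices) - capacity
        # Raise the price when over-subscribed; keep it non-negative.
        price = max(0.0, price + (step_size / t) * excess)
    # Report the choices that correspond to the final price.
    choices = [best_response(opts, price) for opts in agents]
    return price, choices


if __name__ == "__main__":
    # Three hypothetical agents; each may skip the resource or use one unit.
    agents = [
        [(0.0, 0), (300.0, 1)],
        [(0.0, 0), (50.0, 1)],
        [(0.0, 0), (80.0, 1)],
    ]
    print(coordinate(agents, capacity=1))
```

The only information exchanged between agents is the price and the aggregate demand, which is what makes the scheme distributable. Plain subgradient steps like these can converge slowly, which is the motivation for looking at accelerated gradient methods.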

  5. FISTA for factored MDPs • Linear objective: augment with a strongly convex function, causal entropy [Ziebart et al., 2010] • Usually regularization towards a more uniform policy • Retains a fast individual planner (softmax value iteration) • Introduces smoothing error (to the linear objective) • We show that the gain in convergence can outweigh the approximation (smoothing) error.
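
To make the "softmax value iteration" and "smoothing error" bullets concrete, here is a minimal sketch for a generic discounted tabular MDP. It assumes the usual soft Bellman backup, in which the max over actions is replaced by a temperature-scaled log-sum-exp; the function names, the data layout (`P[s][a]` as a list of `(probability, next_state)` pairs), and the `temperature` parameter are illustrative choices, not the formulation used in the talk.

```python
import math


def soft_max(values, temperature):
    """Smooth maximum: temperature * log(sum(exp(v / temperature))), computed stably."""
    m = max(values)
    return m + temperature * math.log(
        sum(math.exp((v - m) / temperature) for v in values)
    )


def value_iteration(P, R, gamma, temperature=None, iters=500):
    """Tabular value iteration.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    reward. With temperature=None the backup uses a hard max; otherwise it
    uses the smoothed (softmax) backup.
    """
    V = [0.0] * len(P)
    for _ in range(iters):
        new_V = []
        for s in range(len(P)):
            q = [
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            ]
            if temperature is None:
                new_V.append(max(q))
            else:
                new_V.append(soft_max(q, temperature))
        V = new_V
    return V


if __name__ == "__main__":
    # Hypothetical two-state MDP: action 0 moves to state 0, action 1 to state 1.
    P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 0)], [(1.0, 1)]]]
    R = [[0.0, 1.0], [0.0, 2.0]]
    print(value_iteration(P, R, gamma=0.9))
    print(value_iteration(P, R, gamma=0.9, temperature=0.1))
```

In this setting the smoothed values are never below the hard-max values, and the gap is at most `temperature * log(num_actions) / (1 - gamma)`, so lowering the temperature shrinks the smoothing error at the cost of a less smooth objective.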
