MURI Review, MIT
Eric Feron
Joint work with Jan De Mot, Vishwesh Kulkarni, Sommer Gentry, Tom Schouwenaars, and Vladislav Gavrilets at the Laboratory for Information and Decision Systems, MIT.
[Figure: efficiency vs. number of UAVs, relationship marked "??".]
We view the spatial distribution of the UAVs as a key factor and present original results on UAV separations and UAV placements.
Overview
• Efficient multi-agent operations require robust, optimal coordination policies.
• UAV specifications constrain the coordination policies that can be deployed.
• How can we improve our understanding of these constraints?
• How can we use this understanding to synthesize more efficient coordination policies?
Coordinated Path Planning (CPP)
• CPP problem setting:
  • UAVs need to go from a point s to a point t.
  • The environment is dynamic and uncertain.
  • UAVs cooperate by sharing the local information they acquire.
  • UAVs have limited resources.
  • GOAL: optimize the traversal efficiency.
• Questions:
  • What is the spatial distribution of the UAVs under an optimal policy? We have characterized the separation bounds.
  • How many UAVs are needed? We do not know the full answer yet!
Related Past Works
• Multi-agent exploration of unknown environments:
  • The probabilistic map building of Burgard et al. [2002] uses deterministic value iteration to determine the next optimal observation point.
  • The market architecture of Zlot et al. [2002] auctions off the next optimal observation points, obtained by solving a TSP.
  • The end goal is spanning rather than CPP.
• CPP as multi-agent MDPs:
  • Boutilier et al. [2000]. We consider partially observable MDPs.
  • Pursuit-evasion games with greedy policies, Hespanha et al. [2002].
[Figure legend: agent, known region, unknown region, new region.]
We present new results in a coordinated target acquisition setting using DP.
Our CPP Problem
• Gray zones: obstacles. Red zones: danger.
• The terrain is mapped into regions whose payoffs indicate, for example, the distance to a target t or potential threats.
• A node is associated with each region.
• Links connect the nodes, forming a graph.
• Each link has a cost reflecting:
  • the payoff of the goal region, and
  • the cost of traversing the link.
• A cluster of agents (e.g. UAVs) needs to reach t from s.
Model of the Environment: Cylindrical Graph
• Cylindrical: eliminating the boundaries reduces the size of the state space and therefore the computational complexity.
• Infinitely long: the target sits at infinity, which also reduces the computational complexity.
• Graph Gm contains m horizontal arrays of nodes.
• The figure shows the cylinder cut open and flattened out.
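A minimal Python sketch of the cylindrical graph Gm. The deck does not spell out which links leave a node, so the pattern below (diagonal down, straight, diagonal up) is a hypothetical assumption, and `successors`, `stage_links` and `DEFAULT_OFFSETS` are illustrative names. Rows wrap modulo m, which is exactly what removes the boundaries.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row j in 0..m-1, stage k); stage index grows toward the target

# Hypothetical link pattern (assumption): diagonal down, straight, diagonal up.
DEFAULT_OFFSETS: Tuple[int, ...] = (-1, 0, 1)


def successors(node: Node, m: int, offsets: Tuple[int, ...] = DEFAULT_OFFSETS) -> List[Node]:
    """Nodes reachable from `node` in one move on the cylindrical graph G_m.

    Rows wrap around modulo m, so there is no top or bottom boundary, and
    every move advances exactly one stage toward the target.
    """
    j, k = node
    return [((j + d) % m, k + 1) for d in offsets]


def stage_links(k: int, m: int, offsets: Tuple[int, ...] = DEFAULT_OFFSETS) -> Dict[Node, List[Node]]:
    """All links leaving stage k; every stage has the same pattern."""
    return {(j, k): successors((j, k), m, offsets) for j in range(m)}


print(successors((6, 0), 7))  # [(5, 1), (6, 1), (0, 1)]: row 6 wraps to row 0
```

Because every stage has the same link pattern, one stage's links describe the whole infinitely long graph.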
Observation Zones
• Each agent observes a set of links from its current location, its observation zone, using on-board sensors.
• The observation zone used in the rest of this presentation:
  • each agent observes the costs of two links in the target direction;
  • each agent has no local information on any other link cost.
• Links not belonging to any observation zone are subject to the following assumptions:
  • their costs belong to the set {0, 0.5, …, 3}, and
  • their costs are independent and identically distributed (i.i.d.).
• For example, with two UAVs:
  • each observes the costs of the two red links;
  • there is no local information on the black links.
Cooperating Agents
• Agents share information and have memory:
  • what each agent observes is shared with all other agents;
  • once a link cost is known, it remains known.
• Therefore, the agents in the cluster cooperate.
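The observation and sharing rules can be sketched as a single map that the whole cluster writes into. This is a hypothetical illustration: the zone is taken to be the next two straight-ahead links (one reading of the state remark $x_{\text{next}} = (1, b_1, b_2, \ldots)$ in the Bellman slide), costs are drawn i.i.d. from the distribution used for the value-function plot (Pr(0) = 0.5, Pr(3) = 0.3, Pr(7) = 0.2), and `SharedMap`, `observation_zone` and `cluster_observes` are illustrative names.

```python
import random
from typing import Dict, List, Tuple

# Assumed cost distribution, taken from the value-function example in this deck.
COST_VALUES = (0.0, 3.0, 7.0)
COST_PROBS = (0.5, 0.3, 0.2)

Node = Tuple[int, int]   # (row, stage)
Link = Tuple[Node, Node]


def observation_zone(node: Node) -> List[Link]:
    """Hypothetical zone: the next two straight-ahead links from `node`."""
    j, k = node
    return [((j, k), (j, k + 1)), ((j, k + 1), (j, k + 2))]


class SharedMap:
    """Link costs known to the whole cluster: once observed, a cost stays known."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.known: Dict[Link, float] = {}

    def observe(self, link: Link) -> float:
        # The first observation draws an i.i.d. cost; later observations reuse it (memory).
        if link not in self.known:
            self.known[link] = self.rng.choices(COST_VALUES, weights=COST_PROBS)[0]
        return self.known[link]

    def cluster_observes(self, positions: List[Node]) -> Dict[Link, float]:
        """Merge every agent's observation zone into the one shared map (sharing)."""
        for node in positions:
            for link in observation_zone(node):
                self.observe(link)
        return dict(self.known)
```

Two agents on the same node add the same two links, so sharing only pays off when the agents are apart, which is the trade-off the separation results study.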
Goal for the Agents
• Goal: find a path for each agent such that, under the assumptions stated, the following is minimized:
  $E\left[\sum_{k=0}^{\infty} \alpha^k \sum_{i=1}^{N} c_{i,k}\right]$
• Remark: the discount factor α gives less weight to future costs.
• In other words: find a path for each agent so that the whole cluster moves, infinitely long, in the cheapest possible way in the direction of the target.
• Remark: the agents move synchronously, so after each move the cluster sits on the same vertical line, or stage.
• N: number of agents
• α: problem discount factor (0 < α < 1)
• $c_{i,k}$: cost of the link traversed by agent i when leaving stage k
• E[.]: expected value over the unknown costs $c_{i,k}$, given the initial position of the cluster
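For one realized set of link costs, the objective inside E[.] is just a discounted double sum. The sketch below evaluates it on a finite horizon (a hypothetical helper, `discounted_cluster_cost`); averaging it over sampled cost realizations for a fixed policy would give a Monte Carlo estimate of the expectation.

```python
from typing import Sequence


def discounted_cluster_cost(costs: Sequence[Sequence[float]], alpha: float) -> float:
    """Truncated objective: sum_k alpha^k * sum_i costs[i][k].

    costs[i][k] is the cost of the link agent i traverses when leaving stage k.
    The infinite sum is cut at the shortest row length K; with link costs at
    most c_max, the neglected tail is at most alpha**K * N * c_max / (1 - alpha).
    """
    if not 0 < alpha < 1:
        raise ValueError("discount factor must satisfy 0 < alpha < 1")
    horizon = min(len(row) for row in costs)
    return sum(alpha ** k * sum(row[k] for row in costs) for k in range(horizon))


# Two agents, two stages: (1 + 0) + 0.5 * (1 + 2) = 2.5
print(discounted_cluster_cost([[1, 1], [0, 2]], 0.5))
```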
Dynamic Programming Formulation
• To formulate the problem as a discrete dynamic program, write the total cost as
  $J_0^*(x_0) = \min_{u_0,\,\pi_1} E\Big[\sum_{i=1}^{N} c_{i,0} + \alpha \sum_{k=1}^{\infty} \alpha^{k-1} \sum_{i=1}^{N} c_{i,k}\Big]$,
  i.e. the cost to go from stage 0 to infinity, given the initial state $x_0$, is the cost to go from stage 0 to stage 1 plus the cost to go from stage 1 to infinity, discounted by the factor α.
• $x_0$ comprises, at stage 0 (the initial stage):
  • the initial cluster position, and
  • the costs of the observed links.
• $u_0$ is the decision the agents take at stage 0.
• $\pi_1$ is the set of policies $\mu_k$ the agents use at each future stage; a policy $\mu_k(x_k)$ takes the current state and returns the input to be applied.
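Dynamic Programming Formulation
• Because the future policies $\pi_1$ act on the state $x_1$, the minimization over $\pi_1$ can be moved inside the expectation:
  $J_0^*(x_0) = \min_{u_0} E\Big[\sum_{i=1}^{N} c_{i,0} + \alpha \min_{\pi_1} E\Big[\sum_{k=1}^{\infty} \alpha^{k-1} \sum_{i=1}^{N} c_{i,k} \,\Big|\, x_1\Big]\Big]$
• And this becomes
  $J_0^*(x_0) = \min_{u_0} E\Big[\sum_{i=1}^{N} c_{i,0} + \alpha J_1^*(x_1)\Big]$,
  where $x_1$ denotes the state at stage 1.
• Formulating this for a general stage k yields
  $J_k^*(x_k) = \min_{u_k} E\Big[\sum_{i=1}^{N} c_{i,k} + \alpha J_{k+1}^*(x_{k+1})\Big]$.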
Dynamic Programming Formulation
• Since the cost to go to infinity from stage k equals the cost to go to infinity from stage k+1 ($J_k^* = J_{k+1}^* = J^*$), Bellman's equation for an infinite-horizon discrete dynamic program is
  $J^*(x) = \min_{u} E\Big[\sum_{i=1}^{N} c_i(x,u) + \alpha J^*(x_{\text{next}})\Big]$.
• Remarks:
  • $x = (s, a_1, a_2, b_1, b_2)$, where the separation s = 0 when the two agents sit on one node, as in (i); in that case $(a_1, b_1) = (a_2, b_2) = (a, b)$.
  • In (ii), if both agents go straight, $x_{\text{next}} = (1, b_1, b_2, \omega_1, \omega_2)$, where $\omega_1$ and $\omega_2$ denote variables unknown at the current stage (over which E[.] is taken in Bellman's equation).
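As a sanity check on Bellman's equation, consider a hypothetical single-agent toy model (not the two-agent model of these slides): the agent sees the costs a and b of two alternative links, takes the cheaper one, and at the next stage faces two fresh i.i.d. links, so no information carries over. Assuming the cost distribution of the value-function example (Pr(0) = 0.5, Pr(3) = 0.3, Pr(7) = 0.2) and α = 0.8, the equation solves in closed form:

```latex
\begin{aligned}
J^*(a,b) &= \min(a,b) + \alpha\,\bar J, \qquad \bar J = E\left[J^*(a',b')\right],\\
\bar J &= E[\min(a,b)] + \alpha\,\bar J \;\Longrightarrow\; \bar J = \frac{E[\min(a,b)]}{1-\alpha},\\
E[\min(a,b)] &= 3\,(0.5^2 - 0.2^2) + 7\,(0.2^2) = 0.63 + 0.28 = 0.91,\\
\bar J &= \frac{0.91}{1-0.8} = 4.55, \qquad J^*(a,b) = \min(a,b) + 3.64 .
\end{aligned}
```

Here $P(\min \ge 3) = 0.5^2$ and $P(\min \ge 7) = 0.2^2$ because both links must exceed the threshold.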
Value Iteration
• We need to find $J^*(x)$, the optimal value function: the optimal costs of the discounted problem and the unique solution of Bellman's equation,
  $J^*(x) = \min_{u} E\Big[\sum_{i=1}^{N} c_i(x,u) + \alpha J^*(x_{\text{next}})\Big]$,
• so that a stationary policy μ can be computed as
  $\mu(x) = \arg\min_{u} E\Big[\sum_{i=1}^{N} c_i(x,u) + \alpha J^*(x_{\text{next}})\Big]$.
  (A stationary policy μ does not depend on the stage; it is optimal if and only if, for all states x, μ(x) attains the minimum in Bellman's equation.)
• How do we compute $J^*(x)$? With value iteration: a numerical iterative algorithm that, starting from arbitrary initial conditions, converges to $J^*(x)$.
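Value Iteration
• The algorithm:
  $J_{k+1}(x) = \min_{u} E\Big[\sum_{i=1}^{N} c_i(x,u) + \alpha J_k(x_{\text{next}})\Big]$
  This is the finite-horizon version of Bellman's equation, with the terminal cost $J_0$ as initial condition (the index k has been reversed for notational convenience). It can be proven that, as $k \to \infty$, this yields $J^*$.
• So: plug $J_0$ into the right-hand side, get $J_1$, plug it in, and so on, infinitely many times.
• Properties of value iteration:
  • the error is bounded: $\max_x |J_k(x) - J^*(x)| \le \alpha^k d$, where d is a constant (for example $d = \max_x |J_0(x) - J^*(x)|$);
  • $J_k(x) + \underline{c}_k \le J^*(x) \le J_k(x) + \overline{c}_k$, where the lower bound uses $\underline{c}_k = \frac{\alpha}{1-\alpha}\min_x [J_k(x) - J_{k-1}(x)]$ and the upper bound uses $\overline{c}_k = \frac{\alpha}{1-\alpha}\max_x [J_k(x) - J_{k-1}(x)]$.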
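The algorithm and its bounds can be sketched in Python on a hypothetical single-agent toy model, not the two-agent model of the slides: the state is the pair of observed costs (a, b) of two alternative links, the agent pays the cheaper one, and both next links are fresh i.i.d. draws with the assumed distribution Pr(0) = 0.5, Pr(3) = 0.3, Pr(7) = 0.2. The lower/upper bounds are the standard error bounds for discounted value iteration listed above; `bellman` and `value_iteration` are illustrative names.

```python
import itertools
from typing import Dict, Tuple

VALUES = (0.0, 3.0, 7.0)   # assumed link-cost values
PROBS = (0.5, 0.3, 0.2)    # assumed i.i.d. probabilities
State = Tuple[int, int]    # indices into VALUES of the two observed link costs
STATES = list(itertools.product(range(len(VALUES)), repeat=2))


def bellman(J: Dict[State, float], alpha: float) -> Dict[State, float]:
    """One backup: pay the cheaper observed link, plus the discounted expected next value."""
    # Both next links are drawn independently, so the next state is (i, j) w.p. PROBS[i]*PROBS[j].
    expected_next = sum(PROBS[i] * PROBS[j] * J[(i, j)] for i, j in STATES)
    return {(i, j): min(VALUES[i], VALUES[j]) + alpha * expected_next for i, j in STATES}


def value_iteration(alpha: float, iters: int):
    """Run `iters` >= 1 backups from J_0 = 0; return J_k and lower/upper bounds on J*."""
    J_prev = {s: 0.0 for s in STATES}  # arbitrary terminal cost J_0
    J = bellman(J_prev, alpha)
    for _ in range(iters - 1):
        J_prev, J = J, bellman(J, alpha)
    diffs = [J[s] - J_prev[s] for s in STATES]
    scale = alpha / (1 - alpha)
    lower = {s: J[s] + scale * min(diffs) for s in STATES}
    upper = {s: J[s] + scale * max(diffs) for s in STATES}
    return J, lower, upper


J10, lower, upper = value_iteration(alpha=0.8, iters=10)
```

In this toy model every state shifts by the same amount at each iteration, so the two bounds coincide with $J^*(a,b) = \min(a,b) + 3.64$; in a model where the state influences the future, the bounds differ and close geometrically as k grows.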
Two-Agent Example
[Figure: G7, infinite horizon, discount factor α = 0.8.]
Cluster Separation
• Lemma: using optimal paths for two agents, configurations C0, C1 and C2 do not evolve into configurations Cl with l > 2. The UAV separation is therefore bounded.
• Conjecture 1: the UAV separation is also bounded in graphs with more nodes per stage; extra nodes should not affect the separation adversely.
• Conjecture 2: in the n-agent setting, the UAV separation is bounded in a pair-wise sense; Conjecture 1 should hold for each pair of agents.
[Figure: The CPP separation results; G7, infinite horizon, discount factor α = 0.8.]
• Communication power, hierarchy tier sizes
Outline of the Proof of the Bounded Agent Separation
• What do we want to prove? Two agents using optimal policies never separate by more than two nodes.
• Let Cn be the configuration in which the agents are n nodes apart.
• In other words, we want to show that configuration C3 cannot be reached.
• How? We show that there is no state in which it is optimal for the agents to move to a three-node separation (C3).
How can the agents reach a three-node separation…
• Given that the agents are 0, 1 or 2 nodes apart (C0, C1 or C2), there are three situations (i, ii and iii) in which some policy leads to C3:
  • The green dots denote the positions of the agents.
  • The blue arrows denote the policy leading to C3. This holds even in case (ii), because the graph is cylindrical.
… but never actually reach C3?
• In each case, the red arrows, which do not lead to C3, denote a policy that is cheaper than the blue policy according to the optimal value function.
• Note that the red policy is not necessarily optimal; we only claim that it is better than the blue policy.
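Because the graph wraps around, "n nodes apart" has to be measured the short way round the cylinder, which is why case (ii) can still reach C3. A small sketch, with `separation` as an illustrative name:

```python
def separation(row1: int, row2: int, m: int) -> int:
    """Number of rows between two agents at the same stage of G_m.

    Rows wrap around, so the distance is taken the short way round the cylinder.
    """
    d = abs(row1 - row2) % m
    return min(d, m - d)


print(separation(0, 6, 7))  # 1: rows 0 and 6 are adjacent on G7
```

In G7 the largest possible separation is 3, so C3 is also the widest configuration there.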
Value Function J as a Function of Separation s
• Observation zone: as above (two links in the target direction per agent).
• Link costs: Pr(0) = 0.5, Pr(3) = 0.3, Pr(7) = 0.2.
• Number of nodes per stage: m = 40.
• Note:
  • The minimum is at s = 1 (adjacent nodes): s_min = 1.
  • E[J(.)|s] increases monotonically with s.
  • For large s, the cost incurred by two cooperating agents equals the sum of the costs incurred by two single, non-cooperating agents.
[Figure: E[J(.)|s] plotted against separation s.]
How can we prove this using a numerically obtained value function?
• Value iteration is a numerical iterative method that solves Bellman's equation of this problem if infinitely many iterations are performed.
• It also provides an upper and a lower bound on the optimal value function $J^*$. These bounds can be made arbitrarily close.
• In practice, after a limited number of iterations (~10), we find reasonable upper and lower bounds on $J^*$.
• Let's prove case (i):
  • For any values of a1 and b1, the red policy is better than the blue policy; that is, the agents prefer moving closer together to moving apart.
Case (i)
• Using Bellman's equation, we can calculate, for both policies, the cost to continue infinitely long on the graph: the expected cost to go from the current stage to the next stage, plus α times the expected cost to go from the next stage to infinity using an optimal policy.
• Blue: the cost to go from the next stage is given by $J_3^*$, the optimal value function for a separation of 3, a function of the four observed links (only b1 known).
• Red: the cost to go from the next stage is given by $J_1^*$, the optimal value function for a separation of 1, a function of the four observed links (only b1 known).
Case (i)
• For the red policy to be better than the blue policy, an inequality must hold for each b1: it must be cheaper to take red than to take blue.
• Rearranging gives an equivalent inequality in terms of the unknown $J_1^*$ and $J_3^*$.
• Replacing $J_1^*$ by its upper bound and $J_3^*$ by its lower bound yields a stricter inequality: if it holds for all b1, then the previous inequality also holds.
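The sufficient test can be sketched as follows: compare an upper bound on the red cost-to-go with a lower bound on the blue cost-to-go for every value b1 can take. The helper names are hypothetical, and the numbers in the usage lines are illustrative placeholders, not results from the deck.

```python
from typing import Callable, Iterable


def cost_to_go(step_cost: float, alpha: float, expected_next_value: float) -> float:
    """Bellman form used for both policies: stage cost plus discounted expected future cost."""
    return step_cost + alpha * expected_next_value


def red_beats_blue(
    b1_values: Iterable[float],
    red_upper: Callable[[float], float],
    blue_lower: Callable[[float], float],
) -> bool:
    """Sufficient condition: for every b1, the red cost's upper bound does not
    exceed the blue cost's lower bound, so red is cheaper whatever J* really is."""
    return all(red_upper(b1) <= blue_lower(b1) for b1 in b1_values)


# Illustrative numbers only: mean upper bound on J1*, mean lower bound on J3*.
ALPHA, J1_UPPER_MEAN, J3_LOWER_MEAN = 0.8, 10.0, 13.0
verified = red_beats_blue(
    (0.0, 3.0, 7.0),
    red_upper=lambda b1: cost_to_go(b1, ALPHA, J1_UPPER_MEAN),
    blue_lower=lambda b1: cost_to_go(b1, ALPHA, J3_LOWER_MEAN),
)
```

Using bounds on both sides is what makes a numerically obtained value function usable in a proof: the check stays valid even though $J^*$ itself is never computed exactly.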
Case (i)
• This sufficient inequality is verified numerically. Cases (ii) and (iii) are handled similarly.
Current Research
• Characteristics of the general value function on a graph with infinite diameter lead to properties of the optimal two-agent policy. For example:
  • If E[J(.)|s] can be shown to increase monotonically with s, the separation will be bounded, with s_bound = s_min + 1.
  • The separation bound can be shown to be independent of the particular assumptions on the link costs (pdf, cost values).
• Translating a real environment into a stochastic graph:
  • Hierarchical approach: each layer solves a mapping problem at a different scale using DP, to deal with scalability.
  • At which level should cooperation occur?
Current Research
• Performance of agent waves compared to synchronous motion:
  • Agents (with different functionalities) navigate sequentially.
  • Later agents use the information that earlier agents gather.
  • Time is a key factor in determining optimality: synchronous motion requires less time than sequential motion.
  • Between these two extremes, the 2nd agent starts before the 1st agent has reached the target. What is the optimal delay?
• Mixed worst-case/probabilistic approach:
  • The pdf of the link costs is an element of a bounded set of pdfs.
  • Compute the worst-case value function Jwc.
  • Use local information on the actual pdf of the link costs to compute a better, local value function with value iteration, using Jwc as the terminal cost.
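The mixed worst-case/probabilistic idea can be sketched on the same hypothetical single-agent toy model used for sanity checks (two observed alternative links, fresh i.i.d. links each stage, assumed cost values 0, 3, 7). In that toy model the mean cost-to-go satisfies $\bar J = E[\min(a,b)] + \alpha \bar J$, so Jwc is the largest such mean over a finite candidate set of pdfs, and the local refinement is value iteration on the mean started from Jwc. All function names and the candidate pdfs are illustrative assumptions.

```python
from typing import Iterable, Sequence

VALUES = (0.0, 3.0, 7.0)  # assumed link-cost values


def expected_min(p: Sequence[float]) -> float:
    """E[min(a, b)] for two i.i.d. link costs with pmf p over VALUES."""
    n = len(VALUES)
    return sum(p[i] * p[j] * min(VALUES[i], VALUES[j]) for i in range(n) for j in range(n))


def worst_case_mean_value(pdf_set: Iterable[Sequence[float]], alpha: float) -> float:
    """Toy-model Jwc: the largest mean cost-to-go over the candidate pdfs."""
    return max(expected_min(p) / (1 - alpha) for p in pdf_set)


def refine_with_local_pdf(p_local: Sequence[float], alpha: float,
                          terminal_mean: float, iters: int) -> float:
    """Value iteration on the mean cost-to-go under the locally estimated pdf,
    using the worst-case value as terminal cost."""
    mean = terminal_mean
    for _ in range(iters):
        mean = expected_min(p_local) + alpha * mean
    return mean


candidates = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]      # hypothetical bounded set of pdfs
j_wc = worst_case_mean_value(candidates, alpha=0.8)   # 14.6 for these candidates
j_local = refine_with_local_pdf((0.5, 0.3, 0.2), 0.8, j_wc, iters=5)
```

Starting from Jwc keeps every iterate an upper bound on the local value when the local pdf belongs to the candidate set, so the refinement can be stopped early without becoming optimistic.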
Suggestions for Future Work
• Different observation zones lead to different separation bounds. For example, add two diagonal links: the separation is most likely not bounded by a hard bound, BUT with high probability it will not exceed…
• Do all observation zones lead to better performance per agent? Is the extra cost of moving close together balanced by the cost decrease due to the extra relevant information?
Suggestions for Future Work
• Link costs that vary with time, and consequently agents that have the option to wait or return.
• An observation zone that follows the direction in which each agent moves.
• Agents that follow each other rather than staying parallel.
These extensions increase the computational complexity dramatically. Can we extract properties of the navigation strategies and separations using approximate dynamic programming?
Optimal Number of UAVs
• How many UAVs are optimal?
• Efficiency is a function of the number of UAVs: the more UAVs used, the higher the benefit of the cluster.
• Trade-off between the agent DOC (direct operating cost) and the benefit to the mission.