Scalable Real-Time Negotiation Toolkit: Organizational-Structured, Distributed Resource Allocation
PI: Victor R. Lesser, University of Massachusetts
Problem Description/Objective
Organizational-Structured Distributed Resource Allocation
• The specific technical problems we are trying to solve:
  • Development of soft real-time, distributed resource allocation protocols
  • Development of techniques for the specification, implementation, and adaptation of agent organizations
• Relevance to DoD:
  • Techniques for building large-scale, soft real-time, multi-agent applications involving complex resource allocation decisions
  • Distributed sensor networks, distributed command and control
Approach to Soft Real-Time Distributed Coordination/Resource Allocation
• Structured as a distributed optimization problem with a range of “satisficing” solutions
• Adaptable to available time and communication bandwidth (see the sketch below)
• Responsive to the dynamics of the environment
• Organizationally constrained: the range of agents and issues is limited
• Can be done at different levels of abstraction
• Does not require all resource conflicts to be resolved to be successful; resource manager agents are able to resolve some issues locally
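To make the "adaptable to available time and bandwidth" point concrete, here is a minimal sketch of an anytime, satisficing allocation loop. The `refine_step` function and the assignment object's `utility` field are assumed names for illustration, not the project's actual code:

```python
# Hypothetical sketch of an anytime, satisficing allocation loop.
# `refine_step` and the assignment's `utility` field are assumed names.
import time

def anytime_allocate(initial, refine_step, deadline_s, message_budget):
    """Refine a satisficing allocation until time or bandwidth runs out."""
    best = initial                      # any conflict-reduced starting point
    messages_sent = 0
    while time.monotonic() < deadline_s and messages_sent < message_budget:
        candidate, msgs = refine_step(best)   # one negotiation round
        messages_sent += msgs                 # bandwidth consumed this round
        if candidate.utility <= best.utility:
            break                             # no improvement: stop early
        best = candidate
    return best                               # always usable, never blocks
```

The key property is that the loop can be cut off at any point, by deadline or by message budget, and still return a usable, if suboptimal, allocation.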
Multi-level Approach to Distributed Resource Allocation and Coordination
• Organizational Design: determine appropriate agent roles and responsibilities
• Team-Based Negotiation: managers negotiate solutions to allocation conflicts
• Local Autonomy: individuals decide local unresolved allocation details
Capabilities
• Scaling: support for large-scale adaptive agent sensor networks
• Efficiency: organizationally grounded resource allocation
• Responsiveness: dynamic, soft real-time resource allocation
• Adaptability: organizational self-design and maintenance
Major Issues in Implementing this Approach
• What is an appropriate organization for agents?
  • Scalability and robustness
• What is the protocol for distributed resource allocation?
  • Soft real-time, graceful degradation, efficiency
• What is the structure of an agent architecture that supports:
  • agents functioning in an organizational context
  • agents implementing complex distributed resource allocation protocols
  • agents operating under soft real-time constraints
How domain-independent and efficient can we make these approaches?
Our Solution at the Organizational Level
• Decompose the environment to form a partitioned organization.
  • Each partition (sector) contains a set of sensor nodes, each with its own controlling agent.
  • Individual sectors are relatively autonomous.
• Specialize members of the agent population to dynamically take on multiple, different goals/roles.
  • Individual agents become “managers” of different aspects of the problem.
  • Managers form high-level plans to address their goals, and negotiate with other nodes to achieve them.
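A minimal data model for this sectored organization might look like the following. The role names come from the slides, but the classes themselves are hypothetical:

```python
# Illustrative data model for the sectored organization (hypothetical).
from dataclasses import dataclass, field
from enum import Enum, auto

class Role(Enum):
    SECTOR_MANAGER = auto()
    TRACK_MANAGER = auto()
    SCANNING_AGENT = auto()
    TRACKING_AGENT = auto()

@dataclass
class Agent:
    agent_id: int
    roles: set = field(default_factory=set)   # agents multiplex among roles

@dataclass
class Sector:
    sector_id: int
    members: list = field(default_factory=list)

    def assign_role(self, agent, role):
        """Dynamically hand a goal/role to a member agent."""
        agent.roles.add(role)
```

The essential design choice this captures is that roles are a mutable set per agent, so a single node can be, say, a scanning agent and a track manager at the same time.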
Sector-Based Agent Organization: Agents multiplex among different roles
[Figure: organization diagram showing the Sector Manager, Tracking Manager, Scanning Agent, and Tracking Agent roles.]
Organizationally-Structured Communication among Agents
[Figure: communication diagram among the Sector Manager, Tracking Manager, Scanning Agent, and Tracking Agent roles; edge labels (DrA, DrQ, DrR, TB, RR, TD, PTC, RB, PC, DA, TBU, ES) denote message types.]
Managing Conflicted Resources: Sensors, Processors, Communication
• Sensors
  • Conflicting scanning tasks from different sector managers → resolved locally by the agent connected to the sensor (SRTA agent)
  • Tracking tasks wanting the same sensor resources → negotiation among track managers (SPAM protocol)
• Communication
  • Communication degradation due to lack of locality → track manager migration among sectors
  • Communication channel overload → sector manager assignment of track manager roles
• Processors
  • Data fusion overload/knowledge locality → sector manager assignment of data fusion/track manager roles
  • Multiplexing roles → SRTA local agent control/scheduling
SRTA: Soft Real-Time Agent Architecture
• Facilitates creation of multi-resource management agents
• Basis for building complex “virtual” agent organizations
• Allows for abstract negotiation: maps abstract assignments into detailed resource allocations
• Ability to resolve conflicts locally that are not resolved through negotiation
These are key to building soft real-time distributed allocation policies.
Soft Real-Time Control Architecture
[Figure: SRTA architecture diagram showing the Problem Solver, Periodic Task Controller, Negotiation module (e.g. SPAM), TÆMS Library and plan cache, DTC Planner, Resource Modeler, Partial Order Scheduler, Conflict Resolution Module, and Parallel Execution Module (with task merging), connected by flows such as goal descriptions, commitments/decommitments, abstract views exchanged with other agents, linear plans, parallel schedules, resource uses, and schedule-failure feedback used to update expectations via learning.]
SPAM: Resource-Adaptive Family of Anytime Negotiation Strategies
• Low bandwidth or little time:
  • Single-shot: a single assignment message to the sensor agent, based on uncertain/incomplete information
  • Relaxing of objectives based on local information
• High bandwidth, plenty of time:
  • Multi-step negotiation with track managers and sensors
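As a rough illustration of how such a resource-adaptive strategy family might be dispatched at run time (the thresholds, function names, and the `best_guess_slot` method below are assumptions, not SPAM's actual interface):

```python
# Hypothetical dispatcher over SPAM's strategy family; thresholds invented.
def single_shot(track, sensor_agents):
    """One assignment message per sensor agent, no back-and-forth;
    objectives may be relaxed using only local information."""
    return {s: track.best_guess_slot(s) for s in sensor_agents}

def multi_step(track, sensor_agents):
    """Placeholder for the iterative negotiation with track managers."""
    raise NotImplementedError

def choose_strategy(time_remaining_s, bandwidth_kbps):
    # With little time or bandwidth, fall back to the cheap single-shot path.
    if time_remaining_s < 1.0 or bandwidth_kbps < 10.0:
        return single_shot
    return multi_step
```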
Mediation-Based Negotiation
[Figure: three views of the same negotiation: a Mediator View; a World View showing multi-linking of resource allocations; and an Interdependency Graph linking track managers (M0, M7, M8, M14, M20, M25, M33) through shared sensors (S2, S5, S7, S8, S12, S14, S15, S18, S20, S22, S25, S32, S53).]
Stage 2: Track Manager to Track Manager Negotiation
• The originating track manager acts as mediator:
  • Generates the solution space
  • Recommends solution quality reductions
  • Chooses the final solution
• The negotiation mediator gets partial non-local information:
  • Some or all of the sensor schedules relevant to the specific track
    • Used to find neighbors (other track managers) in the constraint graph
  • Conflicting track managers’ information:
    • Domain of acceptable assignments
    • Current solution quality
    • Number of possible sensors that can be used for tracking
    • Sensors that are in conflict (mediator to neighbor and neighbor to neighbor)
    • Additional constraints: a fuzzy notion of constraints on non-directly conflicted sensors
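Putting those bullets together, a single mediation round might look roughly like this. Every helper here (`report_view`, `enumerate_solutions`, `relax_objectives`) is a hypothetical stand-in for the real SPAM machinery:

```python
# Sketch of one Stage-2 mediation round (all helpers are hypothetical).
def mediate(originator, conflicting_tms, sensor_schedules):
    # 1. Gather partial non-local information from each conflicting manager.
    views = {tm: tm.report_view() for tm in conflicting_tms}

    # 2. Generate the space of feasible joint assignments.
    solutions = enumerate_solutions(originator, views, sensor_schedules)

    # 3. If empty, recommend solution-quality (objective-level) reductions.
    #    Relaxation bottoms out at "track nothing", so this terminates.
    while not solutions:
        views = relax_objectives(views)
        solutions = enumerate_solutions(originator, views, sensor_schedules)

    # 4. The originating track manager chooses the final solution.
    return max(solutions, key=lambda s: s.utility)
```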
Major Accomplishments/Contributions of the Project
• Development of the SPAM heuristic resource allocation protocol
  • Showed the importance of mediation-based negotiation (partial centralization) with overlapping context and extended views along critical paths for search and communication efficiency
• Development of the APO distributed constraint algorithm, based on SPAM concepts
  • Better performance than the best known algorithm, AWC
• Development of the SRTA soft real-time architecture
  • Demonstrated that a sophisticated, domain-independent agent architecture that operates in soft real-time could be built
• Demonstrated the importance of organizational structuring for distributed resource allocation
  • Showed how, using negotiation, an organization could be dynamically constructed and efficiently modified as the environment changed
Recent Accomplishments
• New results on SPAM, APO, and Optimal APO
• First results on organizationally-structured coalition formation
• Performance improvements in FARM
• Organization design framework
SPAM’s Effectiveness
• 20 randomly placed sensors.
• Between 2 and 9 randomly placed, fixed targets.
• 160 test runs (20 runs for each number of targets).
• Ran until SPAM converged.
• Optimal baseline:
  • Optimal utility was computed using a branch-and-bound search where the domain for each track was the set of possible objective levels.
  • The optimal number of tracks was computed using a branch-and-bound search where the domain for each track was either the minimal utility for tracking or nothing.
• Greedy baseline:
  • Each track manager requests 4 of the available sensors at random for every time slot.
  • Commitments override each other in the sensor schedules.
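For concreteness, the greedy baseline reads almost directly as code. The sketch below is a hypothetical transcription, not the experiment's actual implementation:

```python
# Hypothetical transcription of the greedy baseline described above.
import random

def greedy_assign(track_managers, sensors, num_slots, k=4):
    # schedule[sensor][slot] -> owning track manager; later writes override.
    schedule = {s: {} for s in sensors}
    for slot in range(num_slots):
        for tm in track_managers:
            for s in random.sample(sensors, min(k, len(sensors))):
                schedule[s][slot] = tm    # commitments override each other
    return schedule
```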
Utility Comparison as a % of Optimal
• SPAM stays closer to the optimal value and has less variance in its utility.
Tracking Comparison
• SPAM tracks nearly 100% of the optimal number of targets that can be tracked.
• Greedy ignores more targets as contention increases.
Time to Convergence
• The time to converge increases linearly with contention.
SPAM’s Scalability
• 100–800 agents.
• Each agent was either a sensor agent or a track manager.
• Fixed ratio of 2.5 sensors per target (fairly overconstrained).
• Sensors are randomly placed.
• Targets move with a random velocity, uniformly distributed from 0.0 to 1.0 ft/s.
• Environment size was fixed so that each point was covered by an expected 4 sensors.
• Twenty 3-minute runs per data point.
Utility Scalability
• SPAM consistently maintains a higher utility than a greedy assignment.
Tracking Scalability
• SPAM also tracks a higher percentage of the targets that are viewable by 3 or more sensors.
Communication Scalability
• There is no apparent increase in communication per agent as the number of agents increases.
Asynchronous Partial Overlay (APO)
• A new algorithm for distributed constraint satisfaction problems (DCSPs)
• Three basic principles:
  • Mediation-based
  • Overlapping views and exploiting local context
  • Extending views along critical paths
• Proven to be both complete and sound
How it Works
• Agents take on the role of mediator when they have a conflict
• The mediator gathers information from the other agents in the session concerning their value preferences and the effects of those values
• The mediator chooses a solution that removes local constraint violations and minimizes the effects outside of its view
• The mediator then links with agents for which it caused violations (expanding context along critical paths)
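A stripped-down version of one APO-style mediation session on graph coloring might look like the following. The real algorithm's bookkeeping (good lists, agent views, priorities) is omitted, and the helpers are assumptions:

```python
# Stripped-down APO-style mediation session for graph coloring (a sketch;
# real APO bookkeeping is omitted and all helpers are hypothetical).
from itertools import product

def mediate_session(session_agents, colors, internal_violations):
    # 1. Each agent reports, per color, how many constraints *outside*
    #    the mediator's view that choice would violate.
    outside = {a: a.conflicts_outside_view(colors) for a in session_agents}

    # 2. Choose a joint assignment with no violations inside the view
    #    that minimizes the predicted violations outside of it.
    best, best_cost = None, float("inf")
    for joint in product(colors, repeat=len(session_agents)):
        if internal_violations(session_agents, joint) > 0:
            continue
        cost = sum(outside[a][c] for a, c in zip(session_agents, joint))
        if cost < best_cost:
            best, best_cost = joint, cost

    # 3. The mediator would then link with any agents it caused violations
    #    for, extending its view along the critical path.
    return best
```

In the actual algorithm the mediator searches only over its mediation session, which keeps this exhaustive step small; that partial centralization is what the three principles above buy.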
Testing APO
• Implemented the graph coloring domain in the Farm simulator
• 3-coloring problems:
  • nodes = 15, 30, 45, 60, 75, 90
  • edges:
    • 2.0 × nodes (low, left of the phase transition)
    • 2.3 × nodes (medium, in the phase transition)
    • 2.7 × nodes (high, right of the phase transition)
• Compared APO against the Asynchronous Weak-Commitment (AWC) protocol (Yokoo ’95)
  • 10 random, solvable problems, each with 10 different starting assignments (Minton et al. ’92)
  • AWC is currently the best known method for solving DCSPs
SPAM II: Optimal APO
• Currently working on an optimization version of APO, based on the three main APO principles:
  • Mediation-based
  • Overlapping views and exploiting local context
  • Extending views along critical paths
How it Works
• Each agent computes the optimal value for its local sub-problem (an upper bound) and the current value based on its current view.
• Mediation occurs when:
  • the upper bound is greater than the current value, or
  • one of the links the agent owns has a value that is not the highest available
• The mediator gathers information from the other agents in the session concerning value preferences and their effects
• The mediator chooses a solution that is optimal and minimizes the impact outside of its view
• The mediator then links with agents whose current value was lowered as a result of the mediation (expanding context along critical paths)
Testing Optimal APO
• Preliminary testing on partial constraint satisfaction in 3-coloring
• Optimal APO appears to be sound and optimal
• It may be better than other DCOP techniques
  • More testing is needed to confirm these preliminary results
• Optimality and soundness proofs are underway
Organizationally-Structured Distributed Coalition Formation
Formal definition of the task allocation problem:
• Let R = {R1, …, Rk} be the set of resources.
• Let A = {a1, …, an} be the set of agents, where each agent ai controls a set of resources CRi = {cri,1, …, cri,k}.
• Let T = {T1, …, Tm} be the set of tasks to be undertaken, where each task Tj has a utility, a set of required resources RRj = {rrj,1, …, rrj,k}, an arrival time, a duration, and possibly a deadline.
• The goal is to maximize the total utility of the accomplished tasks.
• A task Tj is accomplished if it is assigned to a coalition Cj of agents that collectively has enough resources to accomplish Tj while satisfying its timing constraints.
The challenge: for a large number of agents, how do we construct an organization of agents and an associated allocation policy that optimizes this allocation process over an ensemble of tasks?
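The definition above can be transcribed into a small checkable model; the class and function names here are illustrative only, and the timing constraints are reduced to a simple deadline check:

```python
# The task-allocation definition above, transcribed as a sketch (timing
# constraints are reduced to a simple deadline check for illustration).
from dataclasses import dataclass

@dataclass
class Task:
    utility: float
    required: dict          # resource name -> amount required (RR_j)
    arrival: float
    duration: float
    deadline: float = None  # optional

def accomplishes(task, member_resources, start_time):
    """True if a coalition's pooled resources cover RR_j in time."""
    pooled = {}
    for cr in member_resources:          # CR_i for each coalition member a_i
        for name, amount in cr.items():
            pooled[name] = pooled.get(name, 0.0) + amount
    covered = all(pooled.get(n, 0.0) >= need
                  for n, need in task.required.items())
    on_time = (task.deadline is None
               or start_time + task.duration <= task.deadline)
    return covered and on_time

def total_utility(assignments):
    """The objective: sum of utilities over accomplished tasks."""
    return sum(t.utility for t, ok in assignments if ok)
```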
An Organization for Distributed Coalition Brokering
The only way to achieve scalability is to “organize” agents into a hierarchical structure. We can then use this structure in allocating agents (teams, coalitions, etc.) to incoming tasks.
[Figure: a hierarchy rooted at a0, with sub-managers a1 and a5 brokering for individual agents a2, a3, a4, a6, ….]
Now the organization in action…
[Figure: Task T1 (100, 50) arrives at agent a3; Task T2 (200, 400) arrives at agent a5.]
The Need for Search
[Figure: a3 successfully formed a coalition for T1; a5 failed to form a coalition and sends its task up to a0; the schedules of a3, a6, and a7 are shown.]
Can we learn a policy for deciding how to search at an agent, based on meta-level information about the resources at its children agents?
Elements of an Organization
• Organization structure
• Decision making
• Information abstraction
• Goal decomposition
The Local Decision Problem
• Each manager has the following state:
  • For each sub-cluster: its size, average resources, and standard deviation
• and is required to choose an action:
  • Serial: select the best candidate agent to ask for resources
  • Parallel: decompose the required resources over the sub-managers
• Both versions of the decision problem can be modeled as an MDP, to which reinforcement learning can be applied.
  • Q-learning with neural networks as function approximators (sketched below)
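A bare-bones version of the serial decision learner might look like this. The `q_net` interface (`value`/`fit`) is a generic stand-in for the neural-network function approximator, and the constants are illustrative:

```python
# Bare-bones Q-learning for the serial manager decision (a sketch; `q_net`
# stands in for the neural-net function approximator, constants invented).
import random

GAMMA, EPSILON = 0.95, 0.1

def choose_child(state, children, q_net):
    """Epsilon-greedy choice of which sub-cluster to ask for resources."""
    if random.random() < EPSILON:
        return random.choice(children)                          # explore
    return max(children, key=lambda c: q_net.value(state, c))   # exploit

def td_update(q_net, state, action, reward, next_state, next_actions):
    """One-step TD target; q_net.fit nudges the approximator toward it."""
    best_next = max(q_net.value(next_state, a) for a in next_actions)
    q_net.fit(state, action, target=reward + GAMMA * best_next)
```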
Experimental Setup
We tested two organizations:
• A small organization, consisting of 40 individual agents managed by 10 managers.
• A larger organization, consisting of 90 individual agents managed by 13 managers.
• Tasks were chosen randomly from a fixed pool (one pool for each organization).
• We compared learned policies (with different exploration rates) against random and heuristic policies.
We measured two quantities:
• the average utility achieved by the organization
• the average number of messages exchanged by the organization
Conclusions on the Organization for Distributed Coalition Brokering
• The learned policies outperformed both the random and heuristic policies for both the small and large organizations, achieving higher utility with less communication.
• Less exploration seems to work better, due to the interaction among learners.
• Though using neural nets threatens policy convergence, our learned policies always converged. The abstraction and decomposition functions strongly affect convergence.
• We expect further improvement in the performance of the learned policy with better abstraction and decomposition functions.
• Our next step is to study the optimization of the organizational structure and how this interacts with optimizing the decision making (different organizations have different optimal policies).
Farm
A distributed, generic multi-agent simulation environment.
• Provides:
  • Environmental state accessors
  • A controllable communication mechanism
  • A plug-in mechanism for adding functionality
• Agents run in “allocated” real time:
  • Each agent receives an amount of real CPU time to run in
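One plausible way to realize "allocated real time" is shown below; this is an assumption about the mechanism, not Farm's actual scheduler:

```python
# A plausible "allocated real time" pulse loop (an assumption, not Farm's
# actual scheduler): each agent thinks until its CPU budget is exhausted.
import time

def run_pulse(agents, cpu_budget_s):
    for agent in agents:
        start = time.process_time()          # CPU time, not wall-clock time
        while time.process_time() - start < cpu_budget_s:
            if not agent.step():             # step() returns False when idle
                break
```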
System Architecture
• Each component may be run on a separate processor
• Most components are optional, and may be added dynamically
[Figure: the Farm Core (plug-in management, control flow) coordinates multiple Meta-Agents, each providing thread scheduling and communication for many agents, plus a Driver (non-agent activity), Analyses (state/trend analysis), and GUIs (state visualization).]
Global Data
• Global data “properties” allow components to disseminate information:
  • Environmental simulation (e.g. target locations, visibility lists)
  • Statistics and instrumentation (e.g. current utility, message totals)
• Data flows among components:
  • Readers: analyses, agents, visualization, …
  • Writers: environmental drivers, agents, …
Global Data Bottleneck
• Central storage of these properties is impractical:
  • Thousands of agents may cause millions of accesses
  • This creates a potential bottleneck, as well as high communication overhead
• Solution: distribute the data across components
  • The Farm core tracks ownership of properties
  • Storage and access control are distributed
  • Data may be proactively pushed
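The ownership-tracking idea can be sketched as a small directory kept by the core, with reads routed to whichever component owns a property. The class below is illustrative, not Farm's API:

```python
# Illustrative property directory (not Farm's actual API): the core only
# tracks ownership, while storage and access control stay with the owners.
class PropertyDirectory:
    def __init__(self):
        self._owner = {}                     # property name -> owning component

    def register(self, name, component):
        self._owner[name] = component        # data stays where it is produced

    def read(self, name):
        return self._owner[name].get(name)   # routed to the owner, no copy

    def push(self, name, value, subscribers):
        for reader in subscribers:           # proactive push to readers
            reader.receive(name, value)
```

Because only the ownership table is centralized, the millions of reads go component-to-component rather than through a single store, which is what removes the bottleneck.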