Scalability Aspects of Agent-based Naming Services

This paper discusses the scalability aspects of agent-based naming services in distributed systems, focusing on the Cougaar white pages design. Experimental results show the accuracy of the model-based predictions. Conclusions and future work are also presented.

Presentation Transcript

  1. Scalability Aspects of Agent-based Naming Services • Todd Wright, Karl Kleinmann • BBN Technologies • twright@bbn.com, kkleinmann@bbn.com

  2. Outline • Introduction to Agent Naming Services • Cougaar White Pages Design • Experimental Results • Accuracy of Model-Based Predictions • Conclusions and Future Work

  3. Introduction
     • Why do agents need a naming service?
       • A White Pages (WP) maps an agent name to a physical address (e.g. agent “X” is on host “Y”)
       • A WP is required in distributed systems that contain dynamic entities (e.g. dynamically added, removed, or mobile agents)
       • The WP primarily supports the Message Transport Service (MTS)
     • Why not use a standard naming solution (LDAP, DNS, ...)?
       • Our target application has custom requirements (e.g. frequent agent mobility, integrated security policies, ...)
       • We wanted to explore the benefits of an agent-based approach
     • Why build an agent-based WP?
       • Ease of integration with the agent system
       • Leverage the capabilities of the underlying agent framework
       • Support complex, agent-like behavior

  4. Cougaar Background
     • Cougaar Agent Architecture
       • Open-Source Java-based Agent Framework
       • DARPA-funded research (8 years, multiple programs)
       • Used in several real-world applications (military logistics, inventory control, ...)
     • Significant WP requirements:
       • Support 1000+ agents distributed over 100+ hosts
       • Rapid, dynamic agent mobility
       • Focus on Security and Robustness
     • http://www.cougaar.org
     [Figure labels: TRANSCOM, 1BDE, 2BDE]

  5. Cougaar White Pages Design
     • Agent-based design
       • Implemented WP servers as agents
       • Integrated client-side WP caches
     • Two-tier design
       • Clients can talk to any server
       • Servers replicate between one another
     • Leases used throughout
       • Client-side leases on cached lookup data
       • Server-side leases on bind data
       • The lease duration “time to live” (TTL) values control the scalability and performance
     • Many additional features:
       • Server selection, bootstrap, security…
       • See AAMAS’04 “Naming Services in Multi-Agent Systems”, T. Wright.
     [Figure: Server1, Server2, Server3 with fully connected server-to-server replication; clients (Agent1, Agent2, ..) select any server]

  6. Design: Lookup
     • There are two patterns: lookup and bind. The figure shows the lookup pattern.
     • To look up data, a client can ask any server, since all servers are full replicas of one another
     • Leases are specified by the server to control client-side caching
       • Short leases ensure up-to-date data but increase traffic
       • Long leases reduce traffic but increase stale data
     • This tradeoff is explicit
     [Figure: lookup pattern among Agent1, Agent2, .., the client-side “WP Cache + Bind Renewer + Server Selector”, and Server1 (alongside Server2, Server3, ..). Message labels: 1: lookup, 2: lookup, 3: data + ttl, 4: renew after ttl]
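The lease-controlled client cache described on this slide can be illustrated with a minimal sketch. This is not Cougaar code: the class name `LeasedLookupCache`, the `server_lookup` callback, and the injectable `clock` are all hypothetical names introduced for illustration. As a simplification, the sketch refreshes an entry lazily on the next lookup after the lease expires, rather than renewing it in the background as the "renew after ttl" step in the figure suggests.

```python
import time
from typing import Callable, Dict, Tuple


class LeasedLookupCache:
    """Hypothetical client-side WP cache: an entry is served locally
    until the lease (TTL) assigned by the server has expired."""

    def __init__(
        self,
        server_lookup: Callable[[str], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Assumed callback: asks any WP server, returns (address, ttl_seconds).
        self._server_lookup = server_lookup
        self._clock = clock
        # agent name -> (address, absolute expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.server_requests = 0  # counts lookups that reached a server

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # lease still valid: no network traffic
        # Lease missing or expired: ask a server and cache under the new lease.
        address, ttl = self._server_lookup(name)
        self.server_requests += 1
        self._entries[name] = (address, now + ttl)
        return address
```

With a long lease the cache may keep returning an old address after the agent has moved, which is the stale-data side of the tradeoff; with a short lease `server_requests` grows faster, which is the traffic side.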

  7. Design: Bind
     • To bind data, a client can tell any server; that server then forwards the message to all the other servers
     • Leases are used to control server-side expiration of bound data
       • Short leases ensure quick cleanup of “dead” agent entries but increase traffic
       • Long leases reduce traffic but delay the “Garbage Collection” cleanup
     • This is another explicit tradeoff
     • The number of servers is another tradeoff
       • Fewer servers minimize traffic but increase per-server load and the risk of single points of failure
       • Many servers balance load but increase overall traffic
     [Figure: bind pattern among Agent1, Agent2, .., the client-side “WP Cache + Bind Renewer + Server Selector”, and Server1, Server2, Server3, ... Message labels: 1: bind, 2: bind, 3: ack + ttl, 4: fwd, 5: fwd, 6: renew after ttl]
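The bind pattern above (store, forward to every other server, expire on lease timeout) can be sketched as follows. This is an illustrative in-memory model, not the Cougaar implementation: `WhitePagesServer`, `connect`, and the explicit `now` parameter are hypothetical, and forwarding is shown as a direct method call where a real system would send messages.

```python
from typing import Dict, List, Optional, Tuple


class WhitePagesServer:
    """Hypothetical WP server holding leased bind entries."""

    def __init__(self, name: str, bind_ttl: float) -> None:
        self.name = name
        self.bind_ttl = bind_ttl  # server-side lease duration, in seconds
        self.peers: List["WhitePagesServer"] = []
        # agent name -> (address, absolute expiry time)
        self._binds: Dict[str, Tuple[str, float]] = {}
        self.forwards_sent = 0

    def bind(self, agent: str, address: str, now: float) -> float:
        """Client-facing bind: store locally, forward to all peers,
        and return the lease duration (the 'ack + ttl' step)."""
        self._store(agent, address, now)
        for peer in self.peers:
            peer._store(agent, address, now)  # stands in for a 'fwd' message
            self.forwards_sent += 1
        return self.bind_ttl

    def _store(self, agent: str, address: str, now: float) -> None:
        self._binds[agent] = (address, now + self.bind_ttl)

    def lookup(self, agent: str, now: float) -> Optional[str]:
        """Return the bound address, or None if absent or lease expired."""
        entry = self._binds.get(agent)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


def connect(servers: List[WhitePagesServer]) -> None:
    """Fully connect the servers so each one replicates to all others."""
    for server in servers:
        server.peers = [p for p in servers if p is not server]
```

In this sketch each bind costs one forward per peer, so adding servers raises total traffic, while an agent that stops renewing disappears from every server once `bind_ttl` has elapsed.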

  8. Experimental Results: Scalability
     [Chart: Scalability Comparison]
     • A 10x increase in the number of agents produces a 6x increase in messages and a 14x increase in byte traffic
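The three growth factors on this slide imply per-agent and per-message ratios that are worth working out. The short calculation below uses only the 10x, 6x, and 14x figures from the slide; the variable names are illustrative.

```python
# Growth factors reported on the slide.
agent_growth = 10.0
message_growth = 6.0
byte_growth = 14.0

# Derived ratios (pure arithmetic on the reported factors).
messages_per_agent_growth = message_growth / agent_growth  # 0.6
bytes_per_agent_growth = byte_growth / agent_growth        # 1.4
bytes_per_message_growth = byte_growth / message_growth    # about 2.33

print(f"messages per agent: {messages_per_agent_growth:.2f}x")
print(f"bytes per agent:    {bytes_per_agent_growth:.2f}x")
print(f"bytes per message:  {bytes_per_message_growth:.2f}x")
```

In other words, message count grew more slowly than the agent population, byte traffic grew faster, and the average message became roughly 2.3 times larger.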

  9. Experimental Results: Model Accuracy
     [Chart: Accuracy of our Predictive Model, annotated “Good Accuracy”]
     • We created a simple model based on our design, which was found to be accurate in our experiments

  10. Model Analysis
     • Our model is accurate enough to predict the overall WP message traffic
     • Model simplifications did not significantly impact the model accuracy:
       • Models steady-state costs, not startup
       • Assumes evenly balanced clients & server selections
       • Aggregates messaging across all clients
     • Confirmed design scalability
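The slides do not give the model's formulas, so the following is only a hypothetical sketch of what a steady-state, aggregated traffic model of this kind could look like. Every assumption here is mine, not the authors': each agent renews its bind once per bind lease (one bind, one ack, and one forward per additional server), and each agent renews each of its cached lookup entries once per lookup lease (one request, one response). The function and parameter names are invented for illustration.

```python
def steady_state_messages_per_second(
    num_agents: int,
    num_servers: int,
    bind_ttl_s: float,
    lookup_ttl_s: float,
    cached_names_per_agent: float,
) -> float:
    """Hypothetical steady-state WP message rate, aggregated over all clients.

    Assumptions (not taken from the source):
      * a bind renewal costs bind + ack + one forward per other server;
      * a lookup renewal costs one request + one response;
      * renewals are spread evenly, so rate = count / lease duration.
    """
    if num_servers < 1 or bind_ttl_s <= 0 or lookup_ttl_s <= 0:
        raise ValueError("need at least one server and positive lease durations")
    messages_per_bind = 2 + (num_servers - 1)
    bind_rate = num_agents * messages_per_bind / bind_ttl_s
    lookup_rate = num_agents * cached_names_per_agent * 2 / lookup_ttl_s
    return bind_rate + lookup_rate
```

Even in this simplified form the tuning levers named in the design slides are visible: longer leases lower the rate proportionally, and each additional server adds one forward per bind renewal.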

  11. Conclusions
     • Naming services can benefit from agent-based solutions
       • Autonomous client and server behaviors
       • Complex, dynamic behaviors (e.g. server selection, leases)
       • Leverage agent-internal capabilities (e.g. MTS, security)
     • Scalability estimated through models & verified through large-scale tests
       • Models clarify your design & guide system tuning
       • Verification refines the model & guides future enhancements
     • Cougaar’s naming service is scalable
       • Well-defined models
       • Can be enhanced, since it’s component-based & open source

  12. Future Work
     • Implement models for non-steady-state conditions
       • Model startup costs
       • Model bootstrap & server selection
       • Model non-uniform layouts & access patterns
     • Implement a configuration wizard based on our model
       • Input: number of agents, layout, expected rate of mobility
       • Output: number of server agents, lease durations
     • Enhance our design, based on our new insights
       • Reduce the steady-state cost (e.g. through leased server-to-client change callbacks)
       • Increase scalability through hierarchical servers
       • Implement more complex, adaptive behaviors (e.g. server-controlled load balancing)

  13. For more information …
     • BBN Technologies: http://www.bbn.com
     • Cougaar Agent Architecture: http://www.cougaar.org
     • Other Cougaar-related KIMAS’05 papers:
       • “Watching Your Own Back: Self Managing Multi-Agent Systems”, M. Thome, T. Wright, et al.
       • “Using QoS-Adaptive Coordination Artifacts to Increase Scalability of Communication in Distributed Multi-Agent Systems”, J. Zinky, S. Siracuse, et al.
       • “A Reconfigurable Multiagent Society for Transportation Scheduling and Dynamic Rescheduling”, D. Montana, G. Vidaver, et al.
       • “Adaptive Optimization of Solution Time in a Distributed Multi-Agent System”, A. Fedyk, et al.

