MASS: Multiagent Security and Survivability

MASS: Multiagent Security and Survivability V.S. Subrahmanian University of Maryland Joint work with Sarit Kraus, Cihan Tas, and Yingqian Zhang

Survivability of Multi-Agent Systems (MASs) • Problem: External events may cause an MAS to crash. Examples of such events are: • power failures, • OS crashes, • malicious attacks, etc. • Approach: Replication of agents. • Questions to ask: • When to replicate? • Where to replicate? • Which agents (who) to replicate?

Talk Outline • Architectures for multiagent survivability • Centralized probabilistic survivability • Agent-oriented probabilistic survivability • Centralized Probabilistic Survivability Details • Outline of 3 algorithms for agent oriented distributed survivability • Experimental results

Centralized approach A MAS (set of agents) is deployed over a given network of host nodes. A special “survivability program” is placed on a node selected by the MAS developer.

Agent oriented approach A special survivability agent is deployed at one or more nodes in the network. These agents automatically collaborate to increase survivability of the MAS.

MAS and Network Assumptions • Agents: • provide one or more services; are located on a host computer; require resources. • Multiagent application (MAS): • a finite set A of agents. • Memory Requirements: • Each agent requires a certain amount of memory from its host. • Each host node has some fixed amount of memory to give to the set of agents. • Network Assumptions: • fully connected; a network is written Ne = (N, Edges, mem).

Definition: Deployment μ • A deployment, μ: N → 2^A, specifies which agents are located at a given node. μ must satisfy the following: • Every agent must be deployed somewhere. • The agents deployed at a node cannot use more memory than that node makes available. • Example: μ(n1) = {a,b}, μ(n2) = {a,c}
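The two deployment conditions can be checked mechanically. The following is a minimal sketch, assuming a deployment is stored as a dictionary from node to set of agents; the function name and the memory figures are hypothetical, only the example deployment comes from the slide.

```python
def is_valid_deployment(mu, agent_mem, node_mem):
    """Return True if mu (node -> set of agents) satisfies both conditions."""
    # Condition 1: every agent is deployed on at least one node.
    deployed = set().union(*mu.values())
    if deployed != set(agent_mem):
        return False
    # Condition 2: the agents on a node fit into the memory it makes available.
    return all(
        sum(agent_mem[a] for a in agents) <= node_mem[node]
        for node, agents in mu.items()
    )


# Deployment from the slide; the memory figures are made up for illustration.
mu = {"n1": {"a", "b"}, "n2": {"a", "c"}}
agent_mem = {"a": 10, "b": 10, "c": 20}
node_mem = {"n1": 20, "n2": 30}
print(is_valid_deployment(mu, agent_mem, node_mem))
```

Dropping agent c from n2, or shrinking the memory of n1, makes the same check fail.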

Definition: Disconnect probability • A disconnect probability function for a network (N,Edges,mem) is a mapping dp: N → C[0,1]. • Sources: statistics, past experience, expert opinion. • C[0,1] is the set of all closed subintervals of [0,1]. • Example: • dp(n1)=[0.2,0.3] says that there is a 20%-30% probability that node n1 will get disconnected. • dp(n2)=[0.25,0.25] says that the disconnect probability of n2 is exactly 25%. • dp can easily be extended to include temporal projections.

Definition: Future Networks A possible future network of (N,Edges,mem) consists of a subset of the nodes and a subset of the edges involving the selected nodes.

Example of Future Networks

Related Work • Methods for agent cloning for load balancing (Sycara, Shehory, Decker) • Methods for agent replication for fault tolerance (Fedoruk, Marin, Sens, Fan) • Network reliability: studied extensively. • Fault-tolerant software systems: N-version approach (Lyu, He). • BUT: In the case of MAS, • No answer to: who to replicate, how many replicas, where these replicas should be located. • No work on: probabilistic methods of survivability

Outline • Motivation • Assumptions and definitions • Related work • Problem statement • What is survivability? • Finding optimal survivability. • Node based heuristics • Agent based heuristics • Experiments, comparison and results • Conclusion

Computing Optimal Deployment Given a network (Ne) and a disconnect probability function (dp): Find a deployment whose probability of survivability is maximal.

What is Survivability? • What is the probability with which it is guaranteed that the MAS will survive? • Survivability: at least one copy of each agent will keep functioning.

Constraints on Future Networks CONS(dp,Ne) • prob(Ne): the probability that future network Ne will arise. • Suppose Ne1,…,Nek are all possible future networks. • prob(Ne1)+…+prob(Nek) = 1; • prob(Nej) >= 0 for every j; • Suppose Ne’1,…,Ne’l are the future networks that include node n: • 1-dp(n) = prob(Ne’1)+…+prob(Ne’l) • In this talk, we assume dp returns a point probability. • (Paper allows interval probabilities.)

Probability of survival of μ • Suppose Ne1,…,Ner are all the possible future networks such that μ is a deployment w.r.t. Nej. • Minimize prob(Ne1)+…+prob(Ner) subject to CONS(dp,Ne). • The actual probability of survival is guaranteed to be greater than or equal to this minimum.
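The future networks that enter the objective are those in which every agent still has a copy. A minimal sketch of that test follows, assuming a future network is identified by its set of surviving nodes (edges are ignored, which is a simplification) and using a hypothetical three-node deployment.

```python
from itertools import combinations


def survives(mu, alive):
    """True if every agent of mu has a copy on some node in `alive`."""
    agents = set().union(*mu.values())
    return all(any(a in mu[n] for n in alive) for a in agents)


def surviving_networks(mu):
    """All node subsets on which the deployment still covers every agent."""
    nodes = sorted(mu)
    return [
        frozenset(c)
        for k in range(len(nodes) + 1)
        for c in combinations(nodes, k)
        if survives(mu, c)
    ]


# Hypothetical deployment: agent a on n1 and n3, agent b on n2 and n3.
mu = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
for net in surviving_networks(mu):
    print(sorted(net))
```

For this deployment five of the eight node subsets qualify: {n3}, {n1,n2}, {n1,n3}, {n2,n3} and {n1,n2,n3}.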

Example (Figure: a deployment, its disconnect probabilities, and the possible future networks.)
Minimize p3+p4+p5+p6+p7
Subject to:
p1+p4+p5+p7 = 0.9
p2+p4+p6+p7 = 0.8
p3+p5+p6+p7 = 0.7
p1+p2+p3+p4+p5+p6+p7+p8 = 1
pi >= 0
Solution: 0.7
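The example is small enough to solve exactly. The sketch below rebuilds the linear program and minimizes it by enumerating basic feasible solutions, a brute-force stand-in for simplex that only works for tiny instances. The disconnect probabilities 0.1, 0.2, 0.3 are read off the right-hand sides as 1 - dp(n), and the hitting sets {n1,n2} and {n3} are inferred from the objective; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Gauss-Jordan elimination over Fractions; None if the matrix is singular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        head = M[col][col]
        M[col] = [v / head for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def min_survival(nodes, dp, hitting_sets):
    # One variable per future network, identified by its surviving node set.
    nets = [frozenset(c) for k in range(len(nodes) + 1)
            for c in combinations(nodes, k)]
    # CONS(dp,Ne): networks containing n sum to 1 - dp(n); all sum to 1.
    rows = [[Fraction(int(n in net)) for net in nets] for n in nodes]
    rhs = [1 - dp[n] for n in nodes]
    rows.append([Fraction(1)] * len(nets))
    rhs.append(Fraction(1))
    # Objective: total probability of networks containing some hitting set.
    cost = [Fraction(int(any(h <= net for h in hitting_sets))) for net in nets]
    best = None
    for basis in combinations(range(len(nets)), len(rows)):
        x = solve_square([[row[j] for j in basis] for row in rows], rhs)
        if x is None or any(v < 0 for v in x):
            continue  # singular basis or infeasible vertex
        value = sum(cost[j] * v for j, v in zip(basis, x))
        if best is None or value < best:
            best = value
    return best


dp = {"n1": Fraction(1, 10), "n2": Fraction(2, 10), "n3": Fraction(3, 10)}
hitting_sets = [frozenset({"n1", "n2"}), frozenset({"n3"})]
print(min_survival(["n1", "n2", "n3"], dp, hitting_sets))  # prints 7/10
```

The minimum is attained, for instance, with p7 = 0.7, p1 = 0.2, p2 = 0.1 and all other variables 0.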

Computing the Survival of a Deployment μ • We can solve the linear program using simplex or any other method. • Problem: the linear program is enormous because the number of possible future networks is huge. • We do NOT assume that node disconnections are independent. • Otherwise, we could add a new constraint, prob({n1,n2}) = [1-dp(n1)] * [1-dp(n2)].
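To see what the independence assumption would buy, the sketch below computes the survival probability of the three-node example when node disconnections are treated as independent. This is for contrast only: it is not the method of the talk, and the dp values and hitting sets are the ones inferred from the example (an assumption).

```python
from fractions import Fraction
from itertools import combinations


def survival_if_independent(dp, hitting_sets):
    """Survival probability under the (not assumed here) independence of nodes."""
    nodes = sorted(dp)
    total = Fraction(0)
    for k in range(len(nodes) + 1):
        for alive in map(frozenset, combinations(nodes, k)):
            if not any(h <= alive for h in hitting_sets):
                continue  # some agent has no surviving copy
            p = Fraction(1)
            for n in nodes:
                p *= (1 - dp[n]) if n in alive else dp[n]
            total += p
    return total


dp = {"n1": Fraction(1, 10), "n2": Fraction(2, 10), "n3": Fraction(3, 10)}
hitting_sets = [frozenset({"n1", "n2"}), frozenset({"n3"})]
print(float(survival_if_independent(dp, hitting_sets)))  # prints 0.916
```

Under independence the example yields 0.916, well above the guaranteed value 0.7 obtained without that assumption.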

Proposition • If an agent is located in a given set of nodes and another agent is located in those nodes and some others, then nothing is gained by putting the second agent in any of the other nodes. • Example: • If a2 is not in the purple node, we do not lose. • Saves time. • Corollary: when searching for an optimal deployment, there is no need to look at deployments in which the set of locations of one agent is a strict superset of the set of locations of another agent.

Definitions • An agent a is relevant with respect to μ and Ne if there is no other agent that is deployed at a strict subset of the nodes at which a is deployed. • A node is necessary if at least one relevant agent is deployed on it; nodes on which no relevant agent is deployed are not necessary.
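Both definitions translate directly into set comparisons. A minimal sketch, assuming the deployment is a dictionary from node to set of agents; the function names and the two-node example (a1 on n1, a2 on n1 and n2) are hypothetical.

```python
def locations(mu):
    """Invert a deployment: agent -> set of nodes hosting a copy."""
    loc = {}
    for node, agents in mu.items():
        for a in agents:
            loc.setdefault(a, set()).add(node)
    return loc


def relevant_agents(mu):
    """Agents with no other agent deployed on a strict subset of their nodes."""
    loc = locations(mu)
    return {
        a for a in loc
        if not any(loc[b] < loc[a] for b in loc if b != a)  # '<' is strict subset
    }


def necessary_nodes(mu):
    """Nodes that host at least one relevant agent."""
    loc = locations(mu)
    return set().union(*(loc[a] for a in relevant_agents(mu)))


mu = {"n1": {"a1", "a2"}, "n2": {"a2"}}
print(relevant_agents(mu), necessary_nodes(mu))  # prints {'a1'} {'n1'}
```

Here a2 is not relevant because a1 lives on a strict subset of its nodes, so n2 is not necessary.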

Theorem • Suppose MAS is a multiagent application, Ne=(N,Edges,mem) is a network, dp is a disconnect probability function and μ is a deployment for MAS on Ne. Let Ne’=(N’,Edges’,mem), where N’ is the set of necessary nodes of Ne w.r.t. μ, and Edges’={(n1,n2) | n1,n2 ∈ N’}. If μ’ is the restriction of μ to Ne’, then surv(μ)=surv(μ’). • Proof: need to show the equivalence of two minimization problems with different sets of constraints. • BOTTOM LINE: When computing the survivability of a deployment, it is enough to restrict attention to necessary nodes.

CDP: compute survivability given Ne, MAS, μ, dp • Remove unnecessary nodes and create Ne’, μ’. • Compute the hitting sets for Ne’, μ’ (example: h1 = {n1,n2}, h2 = {n3}). • For each possible future network (pfn), check whether it contains at least one hitting set. • Create the corresponding minimization problem and return its result. • CDP will be used for finding an optimal deployment.
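The hitting-set step can be sketched by brute force: a node set is a hitting set if it contains a copy of every agent. The sketch assumes the sets of interest are the minimal ones, and uses a hypothetical deployment chosen so that the result matches h1 and h2 on the slide; it is exponential in the number of nodes and meant only for small examples.

```python
from itertools import combinations


def minimal_hitting_sets(mu):
    """Minimal node sets that contain at least one copy of every agent."""
    loc = {}  # agent -> set of nodes hosting it
    for node, agents in mu.items():
        for a in agents:
            loc.setdefault(a, set()).add(node)
    nodes = sorted(mu)
    found = []
    for k in range(1, len(nodes) + 1):  # smallest candidates first
        for cand in map(frozenset, combinations(nodes, k)):
            if any(h <= cand for h in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(cand & where for where in loc.values()):
                found.append(cand)
    return found


# Hypothetical deployment: agent a on n1 and n3, agent b on n2 and n3.
mu = {"n1": {"a"}, "n2": {"b"}, "n3": {"a", "b"}}
print([sorted(h) for h in minimal_hitting_sets(mu)])  # prints [['n3'], ['n1', 'n2']]
```

A possible future network then supports the MAS exactly when it contains one of these sets.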

CDP1: Adding Efficiency • Proposition 1: • Removing an agent from a node cannot add new hitting sets to the system. • Proposition 2: • Any hitting set that contains a removed node is no longer a hitting set. • Together, these suggest that hitting sets need not be recomputed from scratch: reuse those of the old network. • Application: • For each hitting set h of the old network: • if the changed node is not an element of h, • OR h – {node_changed} can support the removed agent, • then use h as a hitting set of the new network as well.
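The reuse rule can be sketched as a filter over the old hitting sets. The sketch makes two assumptions that the slide does not state: the old family contains every hitting set (not only the minimal ones), and the removed agent still has at least one copy elsewhere. Function names and the example deployment are hypothetical; the from-scratch computation is included only to check the filter.

```python
from itertools import combinations


def all_hitting_sets(mu):
    """Every node set that holds a copy of each agent (computed from scratch)."""
    agents = set().union(*mu.values())
    nodes = sorted(mu)
    return {
        frozenset(c)
        for k in range(1, len(nodes) + 1)
        for c in combinations(nodes, k)
        if all(any(a in mu[n] for n in c) for a in agents)
    }


def update_after_removal(old_sets, mu_new, node_changed, agent_removed):
    """Keep the old hitting sets that still hold a copy of the removed agent."""
    kept = set()
    for h in old_sets:
        untouched = node_changed not in h
        still_supported = any(
            agent_removed in mu_new[n] for n in h - {node_changed}
        )
        if untouched or still_supported:
            kept.add(h)
    return kept


mu_old = {"n1": {"a", "b"}, "n2": {"a"}, "n3": {"b"}}
mu_new = {"n1": {"b"}, "n2": {"a"}, "n3": {"b"}}  # agent a removed from n1
reused = update_after_removal(all_hitting_sets(mu_old), mu_new, "n1", "a")
print(reused == all_hitting_sets(mu_new))  # prints True
```

In this example the filter drops {n1} and {n1,n3}, and the three remaining sets are exactly the hitting sets of the new deployment.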

Computing Optimal Deployment Given a network and a disconnect probability function: Find a deployment whose probability of survivability is maximal.

Search for Optimal Deployment: Branch and Bound algorithm • Initial state: all agents on all nodes (if this is a valid deployment, stop). • Children of a state are all the states obtained by removing one agent from one node of the state. • At each stage: if a valid deployment is found, compute its survivability and use it to bound the search. • No need to consider deployments whose survivability is lower than the bound (proposition). • Theorem: The problem of finding an optimal deployment is NP^NP-complete. • NOT PRACTICAL TO FIND OPTIMAL DEPLOYMENTS.
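The search can be sketched as a depth-first traversal with pruning. To keep the sketch self-contained, survivability is computed by a stand-in that assumes independent node failures, which the talk explicitly does not assume; the real algorithm would plug in the linear-programming value instead. Pruning relies on survivability never increasing when an agent copy is removed. All names and the two-node instance are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def surv_independent(mu, dp):
    """Stand-in survivability: assumes independent node failures."""
    nodes = sorted(mu)
    agents = set().union(*mu.values())
    total = Fraction(0)
    for k in range(len(nodes) + 1):
        for alive in combinations(nodes, k):
            if all(any(a in mu[n] for n in alive) for a in agents):
                p = Fraction(1)
                for n in nodes:
                    p *= (1 - dp[n]) if n in alive else dp[n]
                total += p
    return total


def fits(mu, agent_mem, node_mem):
    """Memory condition of a valid deployment."""
    return all(sum(agent_mem[a] for a in mu[n]) <= node_mem[n] for n in mu)


def branch_and_bound(agent_mem, node_mem, dp, surv=surv_independent):
    start = {n: frozenset(agent_mem) for n in node_mem}  # all agents everywhere
    best_mu, best_val = None, Fraction(-1)
    stack, seen = [start], set()
    while stack:
        mu = stack.pop()
        key = frozenset(mu.items())
        if key in seen:
            continue
        seen.add(key)
        val = surv(mu, dp)
        if val <= best_val:
            continue  # bound: no descendant can beat the best deployment so far
        if fits(mu, agent_mem, node_mem):
            best_mu, best_val = mu, val  # valid; children cannot be better
            continue
        for n in mu:
            for a in mu[n]:
                # Only remove a copy if another copy of the agent remains.
                if any(a in mu[m] for m in mu if m != n):
                    child = dict(mu)
                    child[n] = mu[n] - {a}
                    stack.append(child)
    return best_mu, best_val


agent_mem = {"a": 10, "b": 10}
node_mem = {"n1": 20, "n2": 10}
dp = {"n1": Fraction(1, 10), "n2": Fraction(2, 10)}
best_mu, best_val = branch_and_bound(agent_mem, node_mem, dp)
print(best_val)  # prints 9/10
```

In this instance n2 can hold only one agent, so some agent lives on n1 alone and the best achievable value under the stand-in is 0.9.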

Node Based Heuristic • Put as many agents as possible in nodes with low disconnect probability. • Sort nodes in ascending order of disconnect probability. • For each such node, put as many agents as possible on it using a knapsack algorithm. Don’t deploy an agent twice. (A variant is to use a greedy knapsack approximation.) • If all agents are deployed and you still have available nodes, start again from the beginning. (Figure: example with agent sizes 10, 20, 10 and three nodes labeled with disconnect probabilities 0.3, 0.2, 0.35 and memory 25, 20, 30.)
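A minimal sketch of the node-based heuristic follows. It fills each node smallest agent first, which maximizes the number of agents placed on that node, and starts a new round of copies once every agent has been placed. The assignment of the figure's numbers to particular nodes and agents is an assumption, as are all names.

```python
def node_based_heuristic(agent_mem, node_mem, dp):
    """Return (deployment, flag telling whether every agent got placed)."""
    nodes = sorted(node_mem, key=lambda n: dp[n])  # most reliable node first
    mu = {n: set() for n in node_mem}
    pending = []       # agents still to be placed in the current round
    all_placed = False
    for n in nodes:
        if not pending:
            # New round: every agent may receive (another) copy.
            pending = sorted(agent_mem, key=lambda a: (agent_mem[a], a))
        free = node_mem[n]
        left_over = []
        for a in pending:  # smallest agents first
            if agent_mem[a] <= free:
                mu[n].add(a)
                free -= agent_mem[a]
            else:
                left_over.append(a)
        pending = left_over
        if not pending:
            all_placed = True
    return mu, all_placed


agent_mem = {"a1": 10, "a2": 20, "a3": 10}
node_mem = {"n1": 25, "n2": 20, "n3": 30}
dp = {"n1": 0.3, "n2": 0.2, "n3": 0.35}
mu, ok = node_based_heuristic(agent_mem, node_mem, dp)
print(mu, ok)
```

With these numbers n2 receives a1 and a3, n1 receives a2, and the second round puts further copies of a1 and a3 on n3.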

Agent Based Heuristic • Place agents with high resource requirements on nodes with low disconnect probabilities. • Sort agents in descending order of resource requirements. • Place the first agent on the node with the lowest disconnect probability; then the second, etc. (Figure: example with agent sizes 20, 10, 10 and three nodes labeled with disconnect probabilities 0.2, 0.3, 0.35 and memory 30, 25, 20.)

Experiment Settings • Goal: to compare the different algorithms and heuristics. • Evaluation of time and survivability as a function of: • Problem size: number of agents + number of nodes • Number ratio: number of agents / number of nodes • Size ratio: average memory requirement of agents / average memory available on nodes. • Varying: number of agents; number of nodes. • Random generation: • memory requirement/availability of agents and nodes • disconnect probability of nodes

Comparing Heuristics: Survivability (Figure: survivability vs. number of agents + number of nodes; solid line: agent-based heuristic, dashed line: node-based heuristic.)

Heuristics Comparison: Survivability (Figure, two panels: number of agents > number of nodes, and number of agents < number of nodes; each plots survivability vs. number of agents + number of nodes; solid line: agent-based heuristic, dashed line: node-based heuristic.)

Heuristics Comparison: Computation Time (Figure: computation time in microseconds vs. number of agents + number of nodes, for the agent-based and node-based heuristics.)

Results • As the sum of the number of agents and nodes increases, survivability decreases. • The node-based heuristic almost always gives slightly better survivability than the agent-based heuristic. • When there are more agents than nodes, the node-based heuristic requires less time; when there are more nodes than agents, the agent-based heuristic takes less time.

ASA-1: Agent Survivability Algorithm • Key idea: Add a special distributed survivability agent dsa to the MAS. Add a copy of dsa to each node on the network. dsa replicas on different nodes are designed so that they always know what the other replicas are doing. • ASSUMPTIONS: • Each agent can kill itself or copy itself to another node. • Each node has enough space for the dsa. • dsa has knowledge of the current deployment and of changes in the disconnect probabilities of nodes. (Developed network-based models to estimate the disconnect probability of nodes.) • A buffer manager can stream agents; this is pretty standard to implement.

ASA-1 algorithm sketch • Current deployment is μnow. • If a disconnect probability change occurs then: 1. Use COD to find a new deployment μnew. 2. For each node n, compute Insert(n) and Delete(n), the agents to be inserted into/deleted from n. 3. Determine how to insert/delete within the constraints of space per node. • Key step is (3). During execution of the algorithm, • at least one copy of each agent must be on the network at all times and • no host’s space may be exceeded at any time.
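Steps 2 and 3 can be sketched as follows. Insert(n) and Delete(n) are plain set differences; for step 3 the talk does not spell out a procedure, so the sketch uses a naive scheduler (an assumption, not the ASA-1 method) that repeatedly applies any copy that fits and any delete that leaves another copy, and gives up if it gets stuck. All names and numbers are hypothetical.

```python
def insert_delete(mu_now, mu_new):
    """Insert(n) and Delete(n) for every node, as set differences."""
    nodes = sorted(set(mu_now) | set(mu_new))
    insert = {n: set(mu_new.get(n, ())) - set(mu_now.get(n, ())) for n in nodes}
    delete = {n: set(mu_now.get(n, ())) - set(mu_new.get(n, ())) for n in nodes}
    return insert, delete


def plan_changes(mu_now, mu_new, agent_mem, node_mem):
    """Order the copies and deletes so both invariants hold after every step."""
    insert, delete = insert_delete(mu_now, mu_new)
    cur = {n: set(mu_now.get(n, ())) for n in insert}
    todo = [("copy", a, n) for n in insert for a in sorted(insert[n])]
    todo += [("delete", a, n) for n in delete for a in sorted(delete[n])]
    plan = []
    progress = True
    while todo and progress:
        progress = False
        for step in list(todo):
            op, a, n = step
            if op == "copy":
                # Space invariant: the node must have room for the new copy.
                used = sum(agent_mem[x] for x in cur[n])
                ok = used + agent_mem[a] <= node_mem[n]
            else:
                # Survival invariant: another copy must exist elsewhere.
                ok = any(a in cur[m] for m in cur if m != n)
            if ok:
                if op == "copy":
                    cur[n].add(a)
                else:
                    cur[n].discard(a)
                plan.append(step)
                todo.remove(step)
                progress = True
    return plan if not todo else None  # None: the naive scheduler got stuck


mu_now = {"n1": {"a"}, "n2": {"b"}}
mu_new = {"n1": {"b"}, "n2": {"a"}}
agent_mem = {"a": 10, "b": 10}
node_mem = {"n1": 10, "n2": 20}
for step in plan_changes(mu_now, mu_new, agent_mem, node_mem):
    print(step)
```

In this swap n1 has no spare room, so a is first copied to n2 and deleted from n1 before b can move the other way.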

ASA-2 algorithm • MAS’ = MAS ∪ {dsa2} • Use COD to deploy MAS’. • The coding of dsa2 differs slightly from that of the dsa used in ASA-1. • dsa2: • Arbitrarily deletes all but one copy of each agent. • Moves/copies the remaining agents to their new locations.

ASA-3: Mobility-based survivability • Each agent in MAS has a mobility capability. • Each agent in MAS is augmented with the following rules: • When any dsa sends it a message to move to a new location, it does so. • After executing the move, it informs all dsa replicas that it has done so. • Assume no other move operations in the agent interfere with the above rules.

ASA-3 • When a change in disconnect probability is detected, do the following: 1. For each node n, compute Insert(n) and Delete(n). 2. Send delete messages to all agent replicas that can be safely deleted (safe means at least one other copy of the agent exists). 3. Send a move/exchange message to various agents (agents can be streamed). 4. When an ack is received, send a move/exchange message to the next agent. 5. Continue until there are no more agents to be moved.

Experiments • Agent survivability based on agent characteristics from existing IMPACT agents (31 agent sample). • Agent memory requirements: • Between 150 KB and 250 KB with 3/31 probability. • Between 50 KB and 150 KB with 8/31 probability. • Between 0 KB and 50 KB with 20/31 probability. • Bandwidth = 100 KB/sec (actually a relatively low bandwidth assumption).
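The memory distribution can be sampled as sketched below. The three ranges and their weights come from the slide; drawing uniformly within a range is an assumption, and the helper names are hypothetical. The second helper shows what the bandwidth figure implies for moving a single agent.

```python
import random

# (low KB, high KB) ranges with their weights out of 31, from the slide.
RANGES = [((150, 250), 3), ((50, 150), 8), ((0, 50), 20)]


def sample_agent_memory_kb(rng):
    """Pick a range by weight, then a size uniformly within it (assumption)."""
    bounds = [r for r, _ in RANGES]
    weights = [w for _, w in RANGES]
    low, high = rng.choices(bounds, weights=weights, k=1)[0]
    return rng.uniform(low, high)


def transfer_seconds(size_kb, bandwidth_kb_per_sec=100.0):
    """Time to stream an agent of the given size at the stated bandwidth."""
    return size_kb / bandwidth_kb_per_sec


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sizes = [sample_agent_memory_kb(rng) for _ in range(31)]
print(round(max(sizes), 1), transfer_seconds(250))
```

At 100 KB/sec even the largest agent in the sample range (250 KB) takes 2.5 seconds to move.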

Effect of Problem Size on Compute Time

Effect of Problem Size On Compute Time

Effect of Survivability with ASA-1 and ASA-3
