Increasing Security through Communication and Policy Randomization in Multiagent Systems Praveen Paruchuri, Milind Tambe, Fernando Ordonez (University of Southern California); Sarit Kraus (Bar-Ilan University, Israel; University of Maryland, College Park)
Motivation: The Prediction Game • A UAV (Unmanned Aerial Vehicle) flies between 4 regions • Can you predict the UAV's flight pattern? • Pattern 1: 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, … • Pattern 2: 1, 4, 3, 1, 1, 4, 2, 4, 2, 3, 4, 3, … (generated by a 4-sided die) • Could you predict pattern 2 even if 100 of its numbers were given? • Randomization decreases predictability and thereby increases security
Problem Definition • Problem: Increase security by decreasing predictability for an agent team acting in adversarial environments • The policy should remain secure even if it is known to the adversary • Environment is stochastic and observable (MDP-based) • Communication is a limited resource • Goal: efficient algorithms for the reward/randomization/communication tradeoff
Assumptions • Assumptions about the agent team: • Adversary is unobservable • Adversary's actions, capabilities, and payoffs are unknown • Communication is encrypted (safe) • Assumptions about the adversary: • Knows the agents' plan/policy • Exploits action predictability • Can observe the agents' state
Solution Technique • Technique developed: intentional policy randomization • CMDP-based framework: • Sequential decision making • Limited communication resources • (CMDP = Constrained Markov Decision Process) • Increasing security => solve a multi-criteria problem for the agents (formalized below): • Maximize action unpredictability (policy randomization) • Maintain reward above a threshold (quality constraints) • Keep communication usage below a threshold (resource constraints)
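A schematic way to write this multi-criteria problem as one constrained optimization. The notation here (x(s,a) for expected action frequencies, E_min and Q for the reward and communication thresholds, n(s,a) for per-action communication cost) is introduced for illustration and is not necessarily the paper's exact formulation:

```latex
\begin{aligned}
\max_{x \ge 0} \quad & H(x) && \text{(entropy of the randomized policy)}\\
\text{s.t.} \quad & \textstyle\sum_{s,a} r(s,a)\, x(s,a) \;\ge\; E_{\min} && \text{(expected team reward above threshold)}\\
& \textstyle\sum_{s,a} n(s,a)\, x(s,a) \;\le\; Q && \text{(expected communication cost below threshold)}\\
& x \text{ satisfies the MDP flow-conservation constraints.}
\end{aligned}
```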
Domains • Scheduled activities at airports such as security checks, refueling, etc. • Can be observed by adversaries • Randomization of schedules is helpful • UAV team patrolling a humanitarian mission • Adversary disrupts the mission: can disrupt food supplies, harm refugees, shoot down UAVs, etc. • Randomize the UAV patrol policy
Our Contributions • Randomized policies for multi-agent CMDPs (MCMDPs): maximize policy randomization subject to expected team reward > threshold and communication resource usage < threshold • Solve miscoordination: randomized policies in team settings can be non-implementable (the reward constraint gets violated)
Miscoordination: Effect of Randomization • Meeting tomorrow: 9am – 40%, 10am – 60% • Agents must communicate to coordinate, but communication is limited • Without coordination, the probability that the agents pick different times should have been 0 but is not (violating the reward threshold)
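A minimal sketch of the meeting example in Python; the 40%/60% marginals come from the slide, everything else is illustrative:

```python
# Both agents randomize independently over {9am, 10am} with the same marginals.
p_a = {"9am": 0.4, "10am": 0.6}   # agent A's randomized plan
p_b = {"9am": 0.4, "10am": 0.6}   # agent B's randomized plan

# Without communication, the joint distribution is just the product of marginals.
joint = {(ta, tb): p_a[ta] * p_b[tb] for ta in p_a for tb in p_b}

miscoordination = sum(p for (ta, tb), p in joint.items() if ta != tb)
print(joint)             # e.g. ('9am', '10am') gets probability 0.24
print(miscoordination)   # 0.48 -- the mass that "should have been 0"
```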
Communication Issue • Generate randomized, implementable policies under limited communication • Problem setup: M coordination points, N units of communication • Generate the best communication policy • The communication policy itself can also be randomized • Approach: transform the MCMDP into an implementable MCMDP, then solve the transformed MCMDP
MCMDP: Formally Defined • An MCMDP (for the 2-agent case) is a tuple <S, A, P, R, C1, C2, T1, T2, N, Q> where: • S, A, R – joint states, actions, rewards • P – transition function • Ck – cost vector for resource k • Tk – threshold on expected consumption of resource k • N – joint communication cost vector • Q – threshold on communication costs • Basic terms used: • x(s,a): expected number of times action a is taken in state s • Policy (as a function of x): pi(s,a) = x(s,a) / sum over a' of x(s,a')
Entropy: Measure of Randomness • Randomness or information content is quantified using entropy (Shannon 1948) • Entropy for a CMDP: • Additive entropy – add the entropies of the individual states • Weighted entropy – weigh each state's entropy by its contribution to the total flow, where alpha_j is the initial flow of the system
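One way to write these two quantities, consistent with the definitions above and with the policy pi(s,a) = x(s,a)/sum_a' x(s,a'); this is a sketch of the notation rather than necessarily the paper's exact formulas:

```latex
\hat{x}(s,a) = \frac{x(s,a)}{\sum_{a'} x(s,a')}, \qquad
H_A(x) = -\sum_{s}\sum_{a} \hat{x}(s,a)\,\log \hat{x}(s,a)
\qquad\text{(additive entropy)}
```

```latex
H_W(x) = -\sum_{s} \frac{\sum_{a} x(s,a)}{\sum_{j} \alpha_j}
\sum_{a} \hat{x}(s,a)\,\log \hat{x}(s,a)
\qquad\text{(weighted entropy)}
```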
Issue 1: Randomized Policy Generation • Non-linear program: maximize entropy, keep reward above threshold, keep communication below threshold (a toy sketch follows) • Obtains the required randomization • Appends a communication decision to every action • Issue 2: Generate the communication policy
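A toy, single-state sketch of such a non-linear program in Python. All rewards, costs, and thresholds below are made up for illustration; the real program optimizes over x(s,a) and also carries the MDP flow-conservation constraints:

```python
import numpy as np
from scipy.optimize import minimize

rewards = np.array([10.0, 6.0, 3.0, 1.0])   # reward of each action (hypothetical)
comm    = np.array([2.0, 1.0, 0.0, 0.0])    # communication cost per action (hypothetical)
E_min, Q_max = 6.0, 1.0                     # reward / communication thresholds (hypothetical)

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(np.sum(p * np.log(p)))     # minimizing this maximizes entropy

constraints = [
    {"type": "eq",   "fun": lambda p: np.sum(p) - 1.0},       # p is a distribution
    {"type": "ineq", "fun": lambda p: rewards @ p - E_min},   # expected reward >= threshold
    {"type": "ineq", "fun": lambda p: Q_max - comm @ p},      # expected comm cost <= threshold
]
res = minimize(neg_entropy, x0=np.full(4, 0.25),
               bounds=[(0.0, 1.0)] * 4, constraints=constraints)
print(res.x)   # the most random policy that still meets both thresholds
```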
Issue 2: Transformed MCMDP • (Diagram: state S1 expanded with communication (C) and no-communication variants of actions a1 and a2, leading through new intermediate states to the joint actions a1b1, a1b2, a2b1, a2b2) • For each state and each joint action, introduce C (communicate) and NC (no-communicate) versions of the individual action and add corresponding new states • Add transitions between the original state and the new states • Add transitions between the new states and the original target states
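A minimal sketch of this transformation in Python, under simplifying assumptions: for each original state and each of agent A's individual actions, a "communicate" (C) and a "no-communicate" (NC) variant leads to a new intermediate state, from which agent B's action reaches the original targets. State/action names and the output format are illustrative, not the paper's notation:

```python
from itertools import product

states    = ["S1", "S2"]
a_actions = ["a1", "a2"]      # agent A's individual actions
b_actions = ["b1", "b2"]      # agent B's individual actions

def transform(states, a_actions, b_actions):
    new_states, transitions = [], []
    for s, a, mode in product(states, a_actions, ("C", "NC")):
        inter = f"{s}_{a}_{mode}"                       # new intermediate state
        new_states.append(inter)
        transitions.append((s, a + mode, inter))        # original state -> new state
        for b in b_actions:                             # new state -> original targets
            transitions.append((inter, b, f"target({s},{a},{b})"))
    return new_states, transitions

new_states, transitions = transform(states, a_actions, b_actions)
print(len(new_states), "new states;", len(transitions), "transitions")
```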
Non-linear Constraints • Need to introduce non-linear constraints • For each original state, and for each new state introduced by a no-communication action: • The conditional probabilities of corresponding actions must be equal • Example: P(b1 | s_C) = P(b1 | s_NC) and P(b2 | s_C) = P(b2 | s_NC), where s_C is observable (reached by a communication action) and s_NC is unobservable (reached by a no-communication action)
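In terms of the x variables, these constraints can be sketched as below (notation assumed here): s' and s'' range over states that agent B cannot distinguish, and the ratio form is what makes the program non-linear.

```latex
\frac{x(s', b)}{\sum_{b'} x(s', b')} \;=\; \frac{x(s'', b)}{\sum_{b'} x(s'', b')}
\qquad \text{for every action } b \text{ of agent B.}
```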
Non-linear Constraints: Handling Miscoordination • With NC (no-communication) actions, agent B has no hint of the current state • It is therefore necessary to make B's actions independent of the source state • The probability of action b1 from one such state must equal the probability of b1 from any other • Meeting scenario: irrespective of agent A's plan, if agent B's plan is 20% 9am & 80% 10am, then B's behavior is independent of A and miscoordination is avoided • In short: actions independent of state
Experimental Results • (3-D plot of experimental results; only the axis placeholders survived extraction)
Experimental Conclusions • As the reward threshold decreases, entropy increases • As communication increases, agents coordinate better • This coordination is invisible to the adversary • Agents coordinate better to fool the adversary • Increased communication => higher entropy!
Summary • Randomized policies in multiagent MDP settings • Developed a non-linear program (NLP) to maximize weighted entropy under reward and communication constraints • Provided a transformation algorithm to explicitly reason about communication actions • Showed that communication increases security
Thank You. Any Questions?