Partha Mukherjee & Sandip Sen Department of Math & CS University of Tulsa

Comparing Reputation Schemes for Detecting Malicious Nodes in Sensor Networks Partha Mukherjee & Sandip Sen Department of Math & CS University of Tulsa

Motivation • ASSUMPTION: A network of sensors is deployed to sense data over a region • Data sensed at different nodes are correlated • The correlation pattern may change over time • Colluding malicious nodes may attempt to subvert the data reported by the sensor network • GOAL: Compare the performance of reputation mechanisms used to detect malicious / erroneous nodes in the network

Sensor Networks • Monitor physical / environmental conditions • Resource constraints • Sensed/aggregated data reported back to Base station • Susceptible to security breaches/compromise

Sensor Network Organization • Sensor field consists of nodes laid out on a grid • Nodes organized in a hierarchy • Assumption: time-varying data sensed by different nodes are correlated • Example: Temperatures at different grid points over the day

Schemes used to detect malicious nodes • Reinforcement learning • Q-learning approach • Statistically grounded scheme: • β-reputation approach • Discount factors: weights on past / present experiences • Un-weighted • Linear • Exponential • Varying parameters: • Patterns in the sensed data • Delay of onset of malicious data

Detecting Malicious Nodes • Collect sufficient data when sensor network is operating normally for mining correlation patterns • Use neural networks to model correlation between data sensed by siblings in the sensor node hierarchy • The value sensed at any node is predicted from the values sensed by its siblings • Offline training of the nets using back-propagation • Use learning techniques to discover patterns • Each malicious node adds a random offset in the range [0,] to the reported value
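The attack model in the last bullet can be sketched as below. This is a minimal illustration, not the authors' code: the function name `reported_value` and the parameter `max_offset` (standing in for the upper bound of the offset range, which is not legible on the slide) are hypothetical.

```python
import random


def reported_value(true_value, is_malicious, max_offset, rng=random):
    """Return the value a node reports for one time step.

    ASSUMPTION: `max_offset` is a hypothetical name for the upper bound of
    the offset range; the slide does not give its value.
    """
    if is_malicious:
        # A malicious node adds a random offset drawn from [0, max_offset].
        return true_value + rng.uniform(0.0, max_offset)
    # A normal node reports what it sensed.
    return true_value
```

Passing a seeded `random.Random` instance as `rng` makes a simulated attack repeatable across runs.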

Detecting Malicious Nodes • At each reporting time step, the error between the actual and predicted data sensed by a node is calculated • This sequence of “errors” is used to incrementally update the reputation of the node • A node is labeled malicious if its reputation falls below a threshold
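The per-step error and the threshold test can be sketched as follows. The slides do not define the relative error precisely, so normalizing by the predicted value is an assumption here, and both function names are hypothetical.

```python
def relative_error(actual, predicted, eps=1e-9):
    """Relative error between the reported and the predicted value.

    ASSUMPTION: the error is normalized by the predicted value; `eps`
    guards against division by zero.
    """
    return abs(actual - predicted) / max(abs(predicted), eps)


def first_detection_step(reputations, threshold):
    """Return the first time step at which the reputation is below threshold.

    `reputations` is the sequence of a node's reputation values over time.
    Returns None if the node is never labeled malicious.
    """
    for step, reputation in enumerate(reputations):
        if reputation < threshold:
            return step
    return None
```

For example, `first_detection_step([0.9, 0.7, 0.4], 0.5)` returns `2`, the step at which the reputation first drops below the threshold.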

Detecting Malicious Nodes • Choose a reputation threshold • For each node: • Compute the relative error at time t • Compute the error statistic at time t • Update reputations: • Q-learning: reputation(t) = (1 - balance factor) · reputation(t-1) + balance factor · error statistic(t) • β-reputation: reputation(t) = (cooperative responses(t) + 1) / (cooperative responses(t) + non-cooperative responses(t) + 1) • Responses are accumulated under one of three discounting variants: • Un-weighted • Linear • Exponential, with an exponential discount factor • A node is malicious if its Q-learning reputation or its β-reputation falls below the threshold
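The two update rules on this slide can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: all function and parameter names are hypothetical, the error statistic is assumed to lie in [0, 1] with larger values meaning better agreement with the prediction, and the exponential discounting shown (multiplying the running totals by the discount factor before adding each new response) is one common form, since the slide's own discount formulas are not legible.

```python
def q_learning_update(prev_reputation, error_statistic, balance_factor):
    """Blend the previous reputation with the newest error statistic."""
    return (1 - balance_factor) * prev_reputation + balance_factor * error_statistic


def beta_reputation(cooperative, non_cooperative):
    """Reputation from accumulated responses, as written on the slide."""
    return (cooperative + 1) / (cooperative + non_cooperative + 1)


def discounted_totals(responses, discount):
    """Accumulate (cooperative, non_cooperative) totals with discounting.

    ASSUMPTION: older totals are multiplied by `discount` before each new
    response is added; discount = 1 reduces to the un-weighted case.
    `responses` is an iterable of booleans (True = cooperative).
    """
    coop = 0.0
    non_coop = 0.0
    for is_cooperative in responses:
        coop = discount * coop + (1.0 if is_cooperative else 0.0)
        non_coop = discount * non_coop + (0.0 if is_cooperative else 1.0)
    return coop, non_coop


def is_malicious(reputation, threshold):
    """A node is labeled malicious when its reputation is below threshold."""
    return reputation < threshold
```

With a discount below 1, older responses fade, so a node that turns malicious after a long error-free period loses reputation faster than in the un-weighted case.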

Experiment • Computation of sensed data • Based on a generation function: g • Model fluctuation • Add Gaussian noise: N • Variation of the sensed parameter is represented by the stochastic function ƒ • ƒ(x,y,t) = g(x,y) + h(t) + N(0,σ) • h : T → [l, u]

Experiment • Considered two generation functions g to generate data patterns over the 85-node sensor network • g1: exp(-(x² + y²)) • g2: (x + y) / 2 • Considered error-free time interval set • D = {0,10,20,30,40,50} • Considered exponential discount factor values • {0.2,0.4,0.6,0.8}
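The data model ƒ(x,y,t) = g(x,y) + h(t) + Gaussian noise with the two generation functions can be sketched as below. The functions `g1` and `g2` follow the slide; the shape of `h` (a sinusoid over a 24-step period mapped into [lower, upper]) and the default `noise_std` are assumptions made only for illustration, since the slides do not specify them.

```python
import math
import random


def g1(x, y):
    """First generation function from the slide: exp(-(x^2 + y^2))."""
    return math.exp(-(x**2 + y**2))


def g2(x, y):
    """Second generation function from the slide: (x + y) / 2."""
    return (x + y) / 2


def h(t, lower=0.0, upper=1.0, period=24):
    """HYPOTHETICAL time component with values in [lower, upper]."""
    phase = 0.5 * (1 + math.sin(2 * math.pi * t / period))
    return lower + (upper - lower) * phase


def sensed_value(g, x, y, t, noise_std=0.05, rng=random):
    """Value sensed at grid point (x, y) at time t: g + h + Gaussian noise."""
    return g(x, y) + h(t) + rng.gauss(0.0, noise_std)
```

Calling `sensed_value(g1, x, y, t)` for every grid point and time step yields one synthetic data pattern; swapping in `g2` yields the other.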

Q-learning and β-reputation Schemes with Linear and Two Extreme Discount Factors • Q-learning scheme detects the erroneous nodes earlier than β-reputation for distribution exp(-(x² + y²))

Q-learning and β-reputation Schemes with Linear and Two Extreme Discount Factors • Q-learning scheme detects the erroneous nodes earlier than β-reputation for distribution (x + y)/2

Comparison Between β-Reputation Schemes with Different Discount Factors • β-reputation schemes with lower discount factors detect the erroneous nodes earlier for distribution exp(-(x² + y²))

Comparison Between β-Reputation Schemes with Different Discount Factors • β-reputation schemes with lower discount factors detect the erroneous nodes earlier for distribution (x + y)/2

Conclusions • Q-learning is more efficient than β-reputation for higher numbers of initial error-free time steps • β-reputation is more efficient than Q-learning at detecting the first malicious node when the initial delay of attack is between 0 and 4 iterations • Among β-reputation schemes with discount factors, schemes with lower discount values exhibit higher efficiency; the un-weighted one (discount factor = 1) is least efficient • The combination of learning and reputation management makes this scheme work, with the following observations: • All faulty nodes are detected (no false negatives) • No normal node is labeled faulty (no false positives)

Future Work • Testing with different, more complex data patterns • Testing with different topologies • Exploring the possibility of developing a more robust scheme • Handling sophisticated collusion • Hierarchical structure: handling collusion among nodes at higher levels

THANK YOU
