SybilInfer: How to Win the Zombie Wars!

SybilInfer: How to Win the Zombie Wars! Prateek Mittal, George Danezis (MSRC Intern) (MSR Cambridge)

Motivation • Distributed Systems Security • Byzantine Consensus • Secure routing in DHTs • Reputation Systems • Assumption : bound on fraction of dishonest identities • A single entity/user can pretend to have multiple identities • Sybil Attack

Sybil Attack • Sybil identities can own a large fraction of all identities • Redundancy does not help! • Distributed systems security solutions fail... • Botnets • Zombie machines • Average size > 20,000

How to bound the fraction of dishonest nodes? • Trusted Central authority • Distributed Solutions? • Resource constraints • CPU, Bandwidth etc • CAPTCHAS • Social Networks

Leveraging Social Networks • Resource Constraint • bound on number of trust relationships between attackers and honest nodes • Attacker cannot create edges between honest nodes and Sybil identities Attack edges Honest Nodes Sybil Nodes

Leveraging Social Networks • Social networks are Fast Mixing • Random walks quickly convergence to stationary distribution • Sybil attacks induce a bottleneck cut • Fast mixing is disrupted • Knowledge of an apriori honest node • Breaks Symmetry Attack edges Honest Nodes Sybil Nodes

Related Work • SybilGuard[SIGCOMM 06] & SybilLimit [Oakland 08] • Assumes short random walks lie mostly in the honest region • Results in poor threshold to colluding attackers • Heuristic validation approach • Honest nodes random walks intersect • Birthday paradox • High false negatives Attack edges Honest Nodes Sybil Nodes

Our Approach: “Gap” • Design Philosophy • Optimal use of all information available in the graph • No assumptions on threshold of attack edges Compute Probability of Cut being a bottleneck Compute Marginal probabilities of nodes being honest Sample cuts from the distribution of bottleneck cuts Estimate the gap (EXX)

Formal Model • Properties of Mixing times • Depend on random walks • and where they end • Each vertex performs S random walks • length l =log(|V|) • Transition probability • Uniform stationary distribution (without attack) • Let T = set of vertex pairs <start vertex, end vertex> for each random walk

Formal Model • Assign probabilities of cuts being honest • Using Bayes Theorem, we have that : • Next Challenge: Model

Formal Model X X

Estimating EXX / probxx • We could sample EXX as well • P(X,EXX|T) • Expensive • Instead, we shall directly estimate the best EXX

Sampling • Sample from above distribution • Marginal Probabilities • P(Node j is honest) = # j appears in samples/ #samples • Can label nodes as honest/dishonest • Sampling algorithm : Metropolis-Hastings • Current State : X0 • Propose a new state X1 with probability Q(X1|X0) • Accept new state with probability

Security Evaluation • Theoretical Guarantees • Synthetic Topologies • Real Data Sets • LiveJournal • DBLP

Theoretical Guarantees • Ideal Scenario: • Without attack, the cuts obtained from model have EXX=0 • Under attack, the cuts obtained from the model have EXX >0 regardless of attacker strategy • Real World: • Without attack, we obtain cuts with EXX approx 0 (upper bounded by Emax) • Under a major Sybil attack, we obtain cuts with EXX > Emaxregardless of attacker strategy

Implementation • 2 implementations • Roughly 1000 LOC • C++ • Can handle about 50000 nodes • Python • Can handle about 10000 nodes • Performance • 20 samples • < 60 s

Synthetic Topology • Scale Free Topology • 1000 nodes • 100 nodes are malicious and colluding • Application • considers the set of nodes whose marginal probability of being honest is > 0.5 • Attacker Goal: Try to insert x addition Sybil identities. • What is the optimal x?

Security Evaluation Maximum malicious identities Maximum False Positives < 5% Optimal number of additional Sybils

1 additional Sybil identity/ entity Threshold for state of art

State of art false negatives are significantly worse

LiveJournal • Extract a social sub graph from LiveJournal • Three hop neighbourhood of a random node • Processing • Remove nodes with degree < 3 • 33170 nodes • Our model found a bottleneck cut is this topology • False positive or Sybil attack? • Remove the bottleneck cut • 31603 nodes

DBLP • Extract a social sub graph from DBLP • Two hops from IEEE S&P (Oakland) • about 10000 nodes • Process the extracted social network • Remove nodes with degree < 15 • Impose upper bound on node degree (1.2 *mean) • 2000 nodes • 250 colluding nodes/300 Sybil identities

Applications • SPAM • Facebook/Orkut • Tor • File Sharing

Future Work • Scaling • Model works well for • Local 2/3 hop neighbourhoods • Interest groups • Will it work for multi million node topologies? • Better detection of small attacks • Distributed SybilInfer • Sybil Proof DHT

Conclusions • Proposed a formal model for inferring Sybil identities in a Social Network • Proposed solution can be applied to security critical centralized/distributed applications • High tolerance to colluding adversary • Low false negatives

SybilInfer: How to Win the Zombie Wars!