Commensal Cuckoo: Secure Group Partitioning for Large-Scale Services

Siddhartha Sen and Mike Freedman, Princeton University

Scalable peer-to-peer service • Service built from untrusted participants • Data/functionality sharded across participants, accessed by clients

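Scalable peer-to-peer service: how do we make it reliable? • Mask failures with replication: organize untrusted participants into Byzantine Fault Tolerant (BFT) groups, each of which stays correct only while its faulty fraction f < 1/3 • Observe: F ≤ f in the worst group (F = global faulty fraction) • Want small groups (BFT cost grows with group size) • Example: a global F < 1/4, while every group serving clients keeps f < 1/3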
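To see why small groups are risky, here is a minimal Python sketch of the BFT threshold and of the chance that a single group crosses it. It assumes each node is faulty independently with probability F (the "uniformly random" best case), so a group's fault count is binomial; `max_faults` and `group_failure_prob` are illustrative names, not from the talk.

```python
from math import comb


def max_faults(g):
    """Largest number of Byzantine members a BFT group of size g can mask (f < 1/3)."""
    return (g - 1) // 3


def group_failure_prob(g, F):
    """P(group of g nodes has f >= 1/3), assuming each node is faulty independently w.p. F."""
    t = max_faults(g)
    # Sum the binomial tail: more than t faulty members breaks the group.
    return sum(comb(g, x) * F**x * (1 - F) ** (g - x) for x in range(t + 1, g + 1))
```

With F = 0.1, a 16-node group fails far more often than a 64-node group. That is the tension on this slide: small groups are cheap, but each one is more likely to exceed 1/3, and with many groups the system fails if any one does.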

Prior work using many small groups • Systems: • [Rampart95], [SecureRing98], [OceanStore00], [Farsite02], [CastroDGRW02], [Rosebud03], [Myrmic06], [Fireflies06], [Salsa06], [SinghNDW06], [Halo08], [Flightpath08], [Shadowwalker09], [Census09] • Theory: • [HildrumK03], [NaorW07] • Problem: assume randomly or perfectly distributed faults (i.e., static)

Rosebud [RL03] • BFT groups placed on a consistent hashing ring over [0,1)
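A minimal Python sketch of how a ring like this can map an ID to its BFT group, assuming each group owns one contiguous arc [boundaries[i], boundaries[i+1]) of [0,1). The `group_for` helper and the `boundaries` list are hypothetical, not Rosebud's actual interface.

```python
import bisect


def group_for(key, boundaries):
    """Return the index of the group whose arc contains key.

    boundaries: sorted arc start points beginning with 0.0; group i owns
    [boundaries[i], boundaries[i+1]) and the last group owns up to 1.0.
    """
    if not 0.0 <= key < 1.0:
        raise ValueError("key must lie in [0, 1)")
    # bisect_right finds the first boundary strictly greater than key.
    return bisect.bisect_right(boundaries, key) - 1
```

For example, with boundaries [0.0, 0.25, 0.5, 0.75], key 0.3 maps to group 1.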

Rosebud [RL03] • Assumes every BFT group has F = f < 1/3 • Unrealistic: • Don’t know which nodes are faulty • Best case is uniformly random placement •  (1) faults per group • Real adversary is dynamic!

Join-leave attack • Adversary repeatedly leaves and rejoins until its nodes concentrate in one group, pushing that group to f > 1/3 • Vanish system compromised by join-leave attack (2010)
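A toy Python model of the join-leave attack against static random placement (no cuckoos). It assumes equal-size groups, honest nodes that never move, and an adversary that rejoins one faulty node per round; the function and parameter names are illustrative.

```python
import random


def join_leave_attack(num_groups=16, honest_per_group=48, num_faulty=40,
                      max_rounds=100_000, seed=1):
    """Toy join-leave attack on a ring cut into num_groups equal arcs.

    A join at a uniformly random ID lands in a uniformly random group.
    The adversary keeps every faulty node that reaches the target group
    and makes one other faulty node leave and rejoin per round.
    Returns (faulty fraction of the target group, rounds used), or None.
    """
    rng = random.Random(seed)
    target = 0
    where = [rng.randrange(num_groups) for _ in range(num_faulty)]
    for rounds in range(max_rounds + 1):
        in_target = where.count(target)
        frac = in_target / (honest_per_group + in_target)
        if frac > 1 / 3:  # the target BFT group is now compromised
            return frac, rounds
        # Leave + rejoin: move one faulty node that is not yet in the target.
        i = next((j for j, g in enumerate(where) if g != target), None)
        if i is None:  # every faulty node is already in the target group
            return None
        where[i] = rng.randrange(num_groups)
    return None
```

With the defaults the global fraction is only F = 40/808 ≈ 5%, yet the target group passes f = 1/3 once 25 faulty nodes sit beside its 48 honest ones. Each rejoin hits the target with probability 1/16, so this takes a few hundred rounds in expectation.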

Prior work tolerating join-leave attacks • [FiatSY05], [AwerbuchS04], [Scheideler05] • State-of-the-art is cuckoo rule [AwerbuchS06, AwerbuchS07] • Problems: • Impractical (large constant factors) • Groups must be impractically large or F trivially low

Goal: Provably secure + practical group partitioning scheme • Contributions: • Demonstrate failures of prior work • Analyze and understand failures • Devise an algorithm that overcomes them • Assumptions: • Correct nodes are randomly distributed and stable • Adversary controls a global fraction F of nodes in the system and rejoins them maliciously • System fails when one group fails, i.e., f ≥ 1/3 in that group

Cuckoo rule (CR) [AS06] • Ring over [0,1) partitioned into groups, each keeping F < f < 1/3

Cuckoo rule (CR) [AS06] • Primary join: a new node joins at a random location in [0,1) • Nodes already in that location's k-region are cuckooed to random locations in [0,1) (secondary joins) • For poly(n) rounds, all regions of size O(log n)/n have: • O(log n) nodes • f < 1/3 • Adversary strategy: leave and rejoin from the least faulty group

Cuckoo rule (CR) [AS06] • In summary: • On a primary join to a selected random ID, cuckoo (evict) the nodes in that ID's k-region • Select new random IDs for the cuckooed nodes and join them as secondary joins (i.e., no subsequent cuckoos) • Ignore implementation issues for now: • Route messages securely • Verify messages from other groups • Bootstrap the system, handle heavy churn
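A minimal Python sketch of one CR primary join, following the summary above. It assumes, as we read [AS06], that k-regions are the fixed intervals [i·k/n, (i+1)·k/n); the names `cr_join`, `positions`, and `is_faulty` are illustrative.

```python
import random


def cr_join(positions, is_faulty, new_faulty, n, k, rng):
    """One primary join under a sketch of the cuckoo rule.

    positions: node IDs in [0, 1); is_faulty: parallel list of flags.
    n: (estimated) number of nodes; k: the cuckoo-rule parameter.
    The joiner takes a random ID x, and every node already in x's k-region
    is moved to a fresh random ID (a secondary join, which evicts no one).
    Returns the number of nodes cuckooed.
    """
    region_size = k / n
    x = rng.random()
    region = int(x / region_size)
    cuckooed = 0
    for i, p in enumerate(positions):
        if int(p / region_size) == region:
            positions[i] = rng.random()  # secondary join: new random ID
            cuckooed += 1
    positions.append(x)
    is_faulty.append(new_faulty)
    return cuckooed
```

Note that the number of nodes evicted is whatever happens to sit in the k-region, which is the source of the erratic behavior examined next.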

CR tolerates very few faults in practice (plot: group size = 64, rounds = 100,000)

What if we allow larger groups? (plot: group size increased in powers of 2)

CR: Evolution of a faulty group (plot: expected faulty fraction per group; N = 4096, F ≈ 5%, group size = 64, k = 4)

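Why does this happen? • Primary joins create holes: emptied k-regions cuckoo less on the next primary join • Closely spaced primary joins = bad news: joins landing in holes evict few nodes, so faulty joiners pile up and the group turns faulty!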

CR: Cuckoo size is erratic (plot: cuckoo size vs. expected, showing clumps and holes; N = 4096, F ≈ 5%, group size = 64, k = 4)
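To reproduce the erratic cuckoo sizes qualitatively, here is a toy Python simulation that records how many nodes each CR primary join evicts. It is an assumption-laden model (one random leave per join, fixed k-regions of width k/n, no adversary), not the experiment behind the plot; `cr_cuckoo_sizes` is a hypothetical name.

```python
import random


def cr_cuckoo_sizes(n=1024, k=4, joins=2000, seed=7):
    """Record how many nodes each CR primary join evicts (toy model).

    Starts with n nodes at random IDs. Each step, one random node leaves and
    a new node does a primary join at a random ID, evicting every node in
    that ID's k-region (width k/n) to fresh random IDs. Population stays n.
    """
    rng = random.Random(seed)
    width = k / n
    ids = [rng.random() for _ in range(n)]
    sizes = []
    for _ in range(joins):
        ids.pop(rng.randrange(len(ids)))  # a random node leaves
        x = rng.random()                  # primary join location
        region = int(x / width)
        evicted = 0
        for i, p in enumerate(ids):
            if int(p / width) == region:
                ids[i] = rng.random()     # secondary join, no cascade
                evicted += 1
        ids.append(x)
        sizes.append(evicted)
    return sizes
```

The recorded sizes swing from 0 or 1 (a join landing in a hole left by a recent primary join) to well above k (a clump that has absorbed many secondary joins).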

CR: Primary join spacing is erratic (plot: secondary joins between primary joins vs. expected; N = 4096, F ≈ 5%, group size = 64, k = 4)

Cuckoo rule is “parasitic”

New algorithm (Fixing CR) • Holes and clumpiness: • Cuckoo k nodes chosen randomly from group • Scale k relative to average group size (larger groups cuckoo more, smaller groups cuckoo less) • Inconsistently spaced primary joins: • Group vets join attempt, deny if insufficient secondary joins since last primary join
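A sketch of the first fix in Python: the group, not the ring position, decides who leaves. It assumes the eviction count scales linearly with group size and is rounded (k·size/average); the exact weighting in CCR may differ, and `weighted_cuckoo` is a hypothetical name.

```python
import random


def weighted_cuckoo(group, k, avg_group_size, rng):
    """Choose the members a group evicts on an accepted primary join.

    Members are picked uniformly at random from the whole group, so holes
    in the ID space do not matter. The count scales with group size
    (assumed linear): larger groups cuckoo more, smaller groups cuckoo less.
    """
    count = round(k * len(group) / avg_group_size)
    count = min(count, len(group))
    return rng.sample(group, count)
```

With k = 4 and an average size of 64, a 32-node group evicts 2 members and a 128-node group evicts 8, nudging group sizes back toward the average.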

“Commensal” cuckoo rule • Commensalism: a symbiotic relationship in which one organism derives benefit while causing little or no harm to the other.

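Commensal cuckoo rule (CCR) • A primary join to a group that has received too few secondary joins is rejected • Once the group has received enough secondary joins, the primary join is accepted and the group cuckoos k random nodes • Holes don’t matter (recall that in the earlier example CR cuckooed only 1 node)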

Commensal cuckoo rule (CCR) • In summary: • On a primary join to a selected random ID, if the group has received fewer than k secondary joins since its last primary join, start over with a new random ID • Otherwise, cuckoo k random nodes from the group (weighted by group size) and join them as secondary joins (i.e., no subsequent cuckoos)
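Putting both fixes together, a Python sketch of one CCR primary join. Assumptions, not from the talk: a random ID lands in a uniformly random group, the eviction count is rounded from k·size/average, and vetting counters start at zero (so a real system needs a bootstrap rule). `Group` and `ccr_primary_join` are hypothetical names.

```python
import random


class Group:
    """A BFT group's membership plus the counter CCR uses to vet primary joins."""

    def __init__(self, members):
        self.members = list(members)
        self.secondary_since_primary = 0


def ccr_primary_join(groups, node, k, rng, max_attempts=1000):
    """Sketch of one primary join under the commensal cuckoo rule.

    A group with fewer than k secondary joins since its last accepted
    primary join rejects the attempt; the joiner retries with a new random
    ID. An accepting group evicts round(k * size / average size) random
    members, each of which does a secondary join into a random group.
    Returns the number of attempts used.
    """
    avg = sum(len(g.members) for g in groups) / len(groups)
    for attempt in range(1, max_attempts + 1):
        g = rng.choice(groups)
        if g.secondary_since_primary < k:
            continue  # vetted: too few secondary joins since last primary join
        g.secondary_since_primary = 0
        count = min(round(k * len(g.members) / avg), len(g.members))
        chosen = set(rng.sample(range(len(g.members)), count))
        evicted = [m for i, m in enumerate(g.members) if i in chosen]
        g.members = [m for i, m in enumerate(g.members) if i not in chosen]
        g.members.append(node)
        for v in evicted:  # secondary joins never trigger cuckoos
            h = rng.choice(groups)
            h.members.append(v)
            h.secondary_since_primary += 1
        return attempt
    raise RuntimeError("no group accepted the primary join")
```

If only one group has its counter at k, that group must be the one to accept; every other attempt is denied and retried, no matter how many attempts came before or where they landed.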

Techniques are synergistic • Join vetting forces the adversary to join distinct groups → all groups joined (roughly) • Weighted cuckoos ensure sufficient secondary joins → O(1) join attempts needed

Cuckoo size is consistent (plots: CR vs. CCR)

CCR: Primary join spacing is consistent

CCR tolerates significantly more faults (plot: f < 1/3)

CCR tolerates significantly more faults (plot: f < 1/2) • How to use BFT with f < 1/2? • Idea: separate correctness from availability • Group remains correct, but may be unresponsive • Use other groups to revive it!

Join vetting has deeper benefits • Security vulnerability in CR: the adversary retries a primary join (without causing cuckoos) until it gets a location it likes • CCR avoids the problem: a group won’t accept a primary join without sufficient secondary joins • So it doesn’t matter how many previous attempts there were, or where

Summary • CR suffers from random bad events, which CCR avoids by derandomizing: • Cuckoos weighted by group size • Primary join attempts vetted by groups • CCR tolerates F up to ~7% for f < 1/3, and up to ~18% for f < 1/2

Extensions (a complete solution) • Route messages securely • O(1)-hop routing • Verify messages from other groups • Distributed key generation, threshold signatures → constant public/private key per group • Bootstrap the system, handle heavy churn • Choose target group size at onset (e.g., 64); split/merge locally • Handle DoS and data-layer attacks • Reactive approach, e.g., reactive replication
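The talk only says groups start at a chosen target size and split or merge locally. As an illustration, here is a hypothetical one-pass rule in Python; the 2× split and ½ merge thresholds and the `rebalance` name are our assumptions, not the paper's parameters.

```python
def rebalance(sizes, target=64):
    """One pass of a hypothetical local split/merge rule over ring-ordered groups.

    A group that reaches 2 * target members splits into two halves; a group
    that falls below target // 2 merges with its successor on the ring.
    Each decision needs only a group's own size and its neighbor's.
    """
    result = []
    i = 0
    while i < len(sizes):
        s = sizes[i]
        if s >= 2 * target:
            result += [s // 2, s - s // 2]
        elif s < target // 2 and i + 1 < len(sizes):
            result.append(s + sizes[i + 1])
            i += 1  # the successor has been absorbed
        else:
            result.append(s)
        i += 1
    return result
```

For example, `rebalance([64, 130, 20, 50, 70])` splits the 130-node group and folds the 20-node group into its successor, giving `[64, 65, 65, 70, 70]` with the total node count unchanged.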

Conclusion • Secure group membership partitioning for open P2P systems • Most previous systems assumed an (impossible) perfect distribution of faults and ignored join-leave attacks • CCR can handle much higher fractions of faulty nodes than prior algorithms
