Commensal Cuckoo: Secure Group Partitioning for Large-Scale Services
Siddhartha Sen and Mike Freedman, Princeton University
Scalable peer-to-peer service
[Figure: clients use a peer-to-peer service run by untrusted participants; data/functionality is sharded]
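Scalable peer-to-peer service: how do we make it reliable?
• Mask failures with replication: Byzantine Fault Tolerant (BFT) groups, each needing f < 1/3
• Observe:
  • F vs. f: global faulty fraction F, per-group faulty fraction f
  • Want small groups
[Figure: clients; untrusted participants with F < 1/4; BFT groups each with f < 1/3]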
Prior work using many small groups
• Systems: [Rampart95], [SecureRing98], [OceanStore00], [Farsite02], [CastroDGRW02], [Rosebud03], [Myrmic06], [Fireflies06], [Salsa06], [SinghNDW06], [Halo08], [Flightpath08], [Shadowwalker09], [Census09]
• Theory: [HildrumK03], [NaorW07]
• Problem: these assume randomly or perfectly distributed faults (i.e., static)
Rosebud [RL03]
[Figure: consistent hashing ring over [0, 1) with a BFT group marked]
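
The ring picture can be illustrated with a small Python sketch. It is only an illustration of placing nodes on a [0, 1) ring and reading off a group, not Rosebud's actual scheme: the function names `ring_id` and `group_of`, the use of SHA-256, and the assumption that groups are equal-width contiguous regions are all hypothetical choices made here.

```python
import hashlib


def ring_id(name: str) -> float:
    """Hash a name to a point in [0, 1).

    Assumption: SHA-256, keeping the top 53 bits so the result is an
    exact double strictly below 1.0.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    top53 = int.from_bytes(digest[:8], "big") >> 11
    return top53 / float(1 << 53)


def group_of(point: float, num_groups: int) -> int:
    """Map a ring point to a group index.

    Assumption: the ring is cut into num_groups equal-width regions,
    one BFT group per region.
    """
    return min(int(point * num_groups), num_groups - 1)


if __name__ == "__main__":
    for name in ("node-a", "node-b", "node-c"):
        p = ring_id(name)
        print(name, round(p, 4), "-> group", group_of(p, 8))
```

Because the mapping is a deterministic hash, every participant computes the same group for a given name without coordination.
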
Rosebud [RL03]
• Unrealistic:
  • Don't know faulty nodes
  • Best case is uniformly random
  • (1) faults per group
• Real adversary is dynamic!
[Figure: consistent hashing ring over [0, 1) with a BFT group; F = f < 1/3]
Join-leave attack
• Vanish system compromised by join-leave attack (2010)
[Figure: ring over [0, 1) with join and leave arrows; one group reaches f > 1/3]
Prior work tolerating join-leave attacks
• [FiatSY05], [AwerbuchS04], [Scheideler05]
• State of the art is the cuckoo rule [AwerbuchS06, AwerbuchS07]
• Problems:
  • Impractical (large constant factors)
  • Groups must be impractically large or F trivially low
Goal: Provably secure + practical group partitioning scheme
• Contributions:
  • Demonstrate failures of prior work
  • Analyze and understand failures
  • Devise algorithm that overcomes them
• Assumptions:
  • Correct nodes randomly distributed and stable
  • Adversary controls global fraction F of nodes in system, rejoins them maliciously
  • System fails when one group fails, i.e. f ≥ 1/3
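
The failure condition in the assumptions can be written as a small predicate. This is a sketch under one stated reading: a group is taken to fail once its faulty fraction f reaches 1/3, the complement of the BFT requirement f < 1/3 from the earlier slide. The names `group_failed` and `system_failed` and the `(faulty, size)` tuple layout are hypothetical.

```python
from fractions import Fraction


def group_failed(faulty: int, size: int,
                 threshold: Fraction = Fraction(1, 3)) -> bool:
    """True if the group's faulty fraction f is at or above the threshold.

    Exact rational arithmetic avoids floating-point error right at 1/3.
    """
    return size > 0 and Fraction(faulty, size) >= threshold


def system_failed(groups, threshold: Fraction = Fraction(1, 3)) -> bool:
    """The system fails as soon as any single group fails.

    groups: iterable of (faulty, size) pairs (assumed representation).
    """
    return any(group_failed(f, s, threshold) for f, s in groups)


if __name__ == "__main__":
    # 21/64 is below 1/3, 22/64 is above it.
    print(group_failed(21, 64), group_failed(22, 64))
    print(system_failed([(3, 64), (22, 64)]))
```

Passing `Fraction(1, 2)` as the threshold gives the relaxed condition discussed later in the talk.
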
Cuckoo rule (CR) [AS06]
[Figure: ring over [0, 1) with nodes 1 to 4; F < f < 1/3]
Cuckoo rule (CR) [AS06]
• For poly(n) rounds, all regions of size O(log n)/n have:
  • O(log n) nodes
  • f < 1/3
• Adversary strategy: rejoin from least faulty group
[Figure: a primary join lands at a random location in [0, 1); nodes in its k-region leave and make secondary joins at random locations in [0, 1)]
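
The adversary strategy named on this slide, rejoining from the least faulty group, can be sketched as a selection function. This is an illustrative reading rather than the authors' simulator: `pick_rejoin_group` and the `(faulty, size)` representation are hypothetical, and ties are broken by lowest group index as an arbitrary choice.

```python
def pick_rejoin_group(groups):
    """Pick the group the adversary withdraws a faulty node from.

    groups: list of (faulty, size) pairs (assumed representation).
    Returns the index of the group with the lowest faulty fraction that
    still holds at least one faulty node, or None if there is none.
    """
    candidates = [
        (faulty / size, index)
        for index, (faulty, size) in enumerate(groups)
        if faulty > 0 and size > 0
    ]
    if not candidates:
        return None
    # min() compares fractions first, then index (tie-break assumption).
    return min(candidates)[1]


if __name__ == "__main__":
    print(pick_rejoin_group([(5, 64), (1, 64), (0, 64)]))
```

The intuition is that a faulty node sitting in a lightly infiltrated group is doing the adversary little good, so it is the cheapest one to gamble on a new random location.
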
Cuckoo rule (CR) [AS06]
• In summary:
  • On a primary join to a selected random ID, cuckoo (evict) the nodes in the immediate k-region
  • Select new random IDs for the cuckooed nodes and join them as secondary joins (i.e., no subsequent cuckoos)
• Ignore implementation issues:
  • Route messages securely
  • Verify messages from other groups
  • Bootstrap the system, handle heavy churn
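
The summary above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `positions`, `k_region`, and `cuckoo_rule_join` are hypothetical names, k-regions are assumed to be aligned intervals of width k/n with n taken as the current node count, and the implementation issues listed on the slide (secure routing, verification, bootstrapping) are ignored here as well.

```python
import random


def k_region(x: float, k: int, n: int):
    """Return [lo, hi) of the k-region containing x.

    Assumption: regions are aligned intervals of width k/n on [0, 1).
    """
    width = k / n
    lo = int(x / width) * width
    return lo, lo + width


def cuckoo_rule_join(positions: dict, new_node, k: int, rng: random.Random):
    """Apply one primary join under the cuckoo rule.

    positions maps node -> point in [0, 1) and is updated in place.
    Returns the list of cuckooed (evicted) nodes.
    """
    n = max(len(positions), 1)
    x = rng.random()                      # primary join: random ID
    lo, hi = k_region(x, k, n)
    evicted = [node for node, p in positions.items() if lo <= p < hi]
    for node in evicted:
        # Secondary join: fresh random ID, triggers no further cuckoos.
        positions[node] = rng.random()
    positions[new_node] = x
    return evicted


if __name__ == "__main__":
    rng = random.Random(0)
    ring = {i: i / 100 for i in range(100)}
    moved = cuckoo_rule_join(ring, "new", k=4, rng=rng)
    print(len(ring), "nodes;", len(moved), "cuckooed")
```

Note that the number of evicted nodes depends entirely on how many nodes happen to sit in the chosen k-region, which is the source of the erratic cuckoo sizes discussed in the following slides.
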
CR tolerates very few faults in practice
[Figure: Group size = 64, Rounds = 100,000]
What if we allow larger groups?
[Figure: group size increased in powers of 2]
CR: Evolution of a faulty group
[Figure: faulty fraction per group against the expected value. N = 4096, F 5%, Group size = 64, k = 4]
Why does this happen?
• Primary joins create holes
• Empty k-regions cuckoo less
• Closely spaced primary joins = bad news: faulty group!
[Figure: ring over [0, 1) with nodes 1 to 4]
CR: Cuckoo size is erratic
[Figure: cuckoo size against the expected cuckoo size, annotated with clumps and holes. N = 4096, F 5%, Group size = 64, k = 4]
CR: Primary join spacing is erratic
[Figure: secondary joins against the expected secondary joins. N = 4096, F 5%, Group size = 64, k = 4]
New algorithm (Fixing CR)
• Holes and clumpiness:
  • Cuckoo k nodes chosen randomly from the group
  • Scale k relative to average group size (larger groups cuckoo more, smaller groups cuckoo less)
• Inconsistently spaced primary joins:
  • Group vets each join attempt and denies it if there have been insufficient secondary joins since the last primary join
“Commensal” cuckoo rule
Commensalism: a symbiotic relationship in which one organism derives benefit while causing little or no harm to the other.
Commensal cuckoo rule (CCR)
[Figure: ring over [0, 1) with nodes 1 to 4, annotated "too few secondary joins", "received secondary join!", "primary join accepted", "cuckoo k random nodes" (recall CR cuckooed only 1 node), and "holes don't matter"]
Commensal cuckoo rule (CCR)
• In summary:
  • On a primary join to a selected random ID, if there have been fewer than k secondary joins since the last primary join, start over with a new random ID
  • Otherwise, cuckoo k nodes weighted by group size and join them as secondary joins (i.e., no subsequent cuckoos)
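
The two CCR steps can be sketched in Python as below. This is a hedged illustration, not the authors' code. The names `Group`, `ccr_join`, and `max_attempts` are hypothetical. Groups are assumed to cover equal-width regions, so a uniformly random ID corresponds to a uniformly random group. "Weighted by group size" is read here as cuckooing `round(k * group_size / average_group_size)` nodes, capped at the group size. The secondary-join counter is assumed to reset on each accepted primary join.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Group:
    members: list = field(default_factory=list)
    secondary_since_primary: int = 0   # secondary joins since last primary


def ccr_join(groups, new_node, k: int, rng: random.Random,
             max_attempts: int = 1000):
    """Apply one primary join under the commensal cuckoo rule (sketch).

    Returns (index of the accepting group, list of cuckooed nodes).
    """
    avg = sum(len(g.members) for g in groups) / len(groups)
    for _ in range(max_attempts):
        gi = rng.randrange(len(groups))      # random ID -> its group
        group = groups[gi]
        if group.secondary_since_primary < k:
            continue                         # vetting: denied, start over
        # Cuckoo count scaled by relative group size (assumed weighting).
        count = 0
        if avg > 0:
            count = min(len(group.members),
                        round(k * len(group.members) / avg))
        evicted = rng.sample(group.members, count)
        for node in evicted:
            group.members.remove(node)
        group.members.append(new_node)
        group.secondary_since_primary = 0
        for node in evicted:
            # Secondary join at a fresh random ID; no further cuckoos.
            dest = groups[rng.randrange(len(groups))]
            dest.members.append(node)
            dest.secondary_since_primary += 1
        return gi, evicted
    raise RuntimeError("no group accepted the primary join")


if __name__ == "__main__":
    rng = random.Random(1)
    gs = [Group([f"n{g}_{i}" for i in range(8)], 4) for g in range(4)]
    where, moved = ccr_join(gs, "new", k=4, rng=rng)
    print("joined group", where, "cuckooed", moved)
```

The eviction set is drawn from the group's membership rather than from a ring interval, so its size no longer depends on where nodes happen to sit inside the region.
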
Techniques are synergistic
• Join vetting forces the adversary to join distinct groups, so (roughly) all groups are joined
• Weighted cuckoos ensure sufficient secondary joins, so O(1) join attempts are needed
Cuckoo size is consistent
[Figure: cuckoo size under CR compared with CCR]
CCR tolerates significantly more faults
• How to use BFT with f < 1/2?
• Idea: separate correctness from availability
  • Group is correct, but unresponsive
  • Use other groups to revive the group!
Join vetting has deeper benefits
• Security vulnerability in CR: the adversary retries a primary join (without causing cuckoos) until it gets a location it likes
• CCR avoids the problem: a group won't accept a primary join if there have been insufficient secondary joins
• It doesn't matter how many previous attempts there were, or where
Summary
• CR suffers from random bad events, which CCR avoids by derandomizing:
  • Cuckoos weighted by group size
  • Primary join attempts vetted by groups
• CCR tolerates F 7% for f < 1/3 and F 18% for f < 1/2
Extensions (A complete solution)
• Route messages securely
  • O(1)-hop routing
• Verify messages from other groups
  • Distributed key generation, threshold signatures: constant public/private key per group
• Bootstrap the system, handle heavy churn
  • Choose target group size at onset (e.g. 64); split/merge locally
• Handle DoS and data layer attacks
  • Reactive approach, e.g. reactive replication
Conclusion
• Secure group membership partitioning for open P2P systems
• Most previous systems assumed (impossible) perfect distribution and ignored join-leave attacks
• CCR can handle much higher fractions of faulty nodes than prior algorithms