1 / 40

Brahms

Brahms. B yzantine- R esilient R a ndom M embership S ampling. Bortnikov, Gurevich, Keidar, Kliot, and Shraer. Edward (Eddie) Bortnikov. Maxim (Max) Gurevich. Idit Keidar. Alexander (Alex) Shraer. Gabriel (Gabi) Kliot. Why Random Node Sampling. Gossip partners

amalia
Download Presentation

Brahms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer

  2. Edward (Eddie) Bortnikov Maxim (Max) Gurevich Idit Keidar Alexander (Alex) Shraer Gabriel (Gabi) Kliot

  3. Why Random Node Sampling • Gossip partners • Random choices make gossip protocols work • Unstructured overlay networks • E.g., among super-peers • Random links provide robustness, expansion • Gathering statistics • Probe random nodes • Choosing cache locations

  4. The Setting • Many nodes – n • 10,000s, 100,000s, 1,000,000s, … • Come and go • Churn • Every joining node knows some others • Connectivity • Full network • Like the Internet • Byzantine failures

  5. Byzantine Fault Tolerance (BFT) • Faulty nodes (portion f) • Arbitrary behavior: bugs, intrusions, selfishness • Choose f ids arbitrarily • No CA, but no panacea for Cybil attacks • May want to bias samples • Isolate nodes, DoS nodes • Promote themselves, bias statistics

  6. Previous Work • Benign gossip membership • Small (logarithmic) views • Robust to churn and benign failures • Empirical study [Lpbcast,Scamp,Cyclon,PSS] • Analytical study [Allavena et al.] • Never proven uniform samples • Spatial correlation among neighbors’ views [PSS] • Byzantine-resilient gossip • Full views [MMR,MS,Fireflies,Drum,BAR] • Small views, some resilience [SPSS] • We are not aware of any analytical work

  7. Our Contributions • Gossip-based BFT membership • Linear portion f of Byzantine failures • O(n1/3)-size partial views • Correct nodes remain connected • Mathematically analyzed, validated in simulations • Random sampling • Novel memory-efficient approach • Converges to proven independent uniform samples The view is not all bad Better than benign gossip

  8. Brahms Sampling - local component Gossip - distributed component Gossip view Sampler sample

  9. Sampler Building Block • Input: data stream, one element at a time • Bias: some values appear more than others • Used with stream of gossiped ids • Output: uniform random sample • of unique elements seen thus far • Independent of other Samplers • One element at a time (converging) next Sampler sample

  10. Sampler Implementation • Memory: stores one element at a time • Use random hash function h • From min-wise independent family [Broder et al.] • For each set X, and all , next init Sampler Choose random hash function Keep id with smallest hash so far sample

  11. Component S: Sampling and Validation id streamfrom gossip init next Sampler Sampler Sampler Sampler using pings sample Validator Validator Validator Validator S

  12. Gossip Process • Provides the stream of ids for S • Needs to ensure connectivity • Use a bag of tricks to overcome attacks

  13. Gossip-Based Membership Primer • Small (sub-linear) local view V • V constantly changes - essential due to churn • Typically, evolves in (unsynchronized) rounds • Push: send my id to some node in V • Reinforce underrepresented nodes • Pull: retrieve view from some node in V • Spread knowledge within the network • [Allavena et al. ‘05]: both are essential • Low probability for partitions and star topologies

  14. Brahms Gossip Rounds • Each round: • Sendpushes, pulls to random nodes from V • Wait to receive pulls, pushes • Update S with all received ids • (Sometimes) re-compute V • Tricky! Beware of adversary attacks

  15. Problem 1: Push Drowning Push Alice A E M M Push Bob Push Mallory Push Carol M B M Push Ed Push Dana Push M&M D Push Malfoy M

  16. Trick 1: Rate-Limit Pushes • Use limited messages to bound faulty pushes system-wide • E.g., computational puzzles/virtual currency • Faulty nodes can send portion p of them • Views won’t be all bad

  17. Problem 2: Quick Isolation Ha! She’s out! Now let’s move on to the next guy! Push Alice A E M Push Bob Push Carol Push Mallory Push Ed Push Dana Push M&M Push Malfoy C D

  18. Trick 2: Detection & Recovery • Do not re-compute V in rounds when too many pushes are received • Slows down isolation;does not prevent it Push Bob Push Mallory Hey! I’m swamped! I better ignore all of ‘em pushes… Push M&M Push Malfoy

  19. Trick 3: Balance Pulls & Pushes • Control contribution of push - α|V| ids versus contribution of pull - β|V| ids • Parameters α, β • Pull-only  eventually all faulty ids • Pull from faulty nodes – all faulty ids, from correct nodes – some faulty ids • Push-only  quick isolation of attacked node • Push ensures: system-wide not all bad ids • Pull slows down (does not prevent) isolation

  20. Trick 4: History Samples • Attacker influences both push and pull • Feedback γ|V| random ids from S • Parameters α + β + γ = 1 • Attacker loses control - samples are eventually perfectly uniform Yoo-hoo, is there any good process out there?

  21. View and Sample Maintenance Pushed ids Pulled ids S |V|  |V| |V| View V Sample

  22. Key Property • Samples take time to help • Assume attack starts when samples are empty • With appropriate parameters • E.g., • Time to isolation > time toconvergence Prove lower bound using tricks 1,2,3(not using samples yet) Prove upper bound until some good sample persists forever Self-healing from partitions

  23. History Samples: Rationale • Judicious use essential • Bootstrap, avoid slow convergence • Deal with churn • With a little bit of history samples (10%) we can cope with any adversary • Amplification!

  24. Analysis Sampling - mathematical analysis Connectivity - analysis and simulation Full system simulation

  25. Connectivity  Sampling • Theorem:If overlay remains connected indefinitely, samples are eventuallyuniform

  26. Sampling  Connectivity Ever After • Perfect sample of a sampler with hash h: the id with the lowest h(id) system-wide • If correct, sticks once the sampler sees it • Correct perfect sample  self-healing from partitions ever after • We analyze PSP(t) – probability of perfect sample at time t

  27. Convergence to 1st Perfect Sample • n = 1000 • f = 0.2 • 40% unique ids in stream

  28. Scalability • Analysis says: • For scalability, want small and constant convergence time • independent of system size, e.g., when

  29. Connectivity Analysis 1: Balanced Attacks • Attack all nodes the same • Maximizes faulty ids in views system-wide • in any single round • If repeated, system converges to fixed point ratio of faulty ids in views, which is < 1 if • γ=0 (no history) and p < 1/3 or • History samples are used, any p There are always good ids in views!

  30. Fixed Point Analysis: Push Local view node i Local view node 1 i Time t: lost push push push from faulty node 1 Time t+1: • x(t) – portion of faulty nodes in views at round t; • portion of faulty pushes to correct nodes : • p / ( p + ( 1 − p )( 1 − x(t) ) )

  31. Fixed Point Analysis: Pull Local view node i Local view node 1 i Time t: pull from i: faulty with probability x(t) pull from faulty Time t+1: E[x(t+1)] =  p / (p + (1 − p)(1 − x(t))) +  ( x(t) + (1-x(t))x(t) ) + γf

  32. Faulty Ids in Fixed Point Assumed perfect in analysis, real history in simulations With a few history samples, any portion of bad nodes can be tolerated Perfectly validated fixed pointsand convergence

  33. Convergence to Fixed Point • n = 1000 • p = 0.2 • α=β=0.5 • γ=0

  34. Connectivity Analysis 2:Targeted Attack – Roadmap • Step 1: analysis without history samples • Isolation in logarithmic time • … but not too fast, thanks to tricks 1,2,3 • Step 2: analysis of history sample convergence • Time-to-perfect-sample < Time-to-Isolation • Step 3: putting it all together • Empirical evaluation • No isolation happens

  35. Targeted Attack – Step 1 • Q: How fast (lower bound) can an attacker isolate one node from the rest? • Worst-case assumptions • No use of history samples ( = 0) • Unrealistically strong adversary • Observes the exact number of correct pushes and complements it to α|V| • Attacked node not represented initially • Balanced attack on the rest of the system

  36. Isolation w/out History Samples • n = 1000 • p = 0.2 • α=β=0.5 • γ=0 Isolation time for |V|=60 Depend on α,β,p

  37. Step 2: Sample Convergence • n = 1000 • p = 0.2 • α=β=0.5, γ=0 • 40% unique ids Perfect sample in 2-3 rounds Empirically verified

  38. Step 3: Putting It All TogetherNo Isolation with History Samples • n = 1000 • p = 0.2 • α=β=0.45 • γ=0.1 Works well despite small PSP

  39. Sample Convergence (Balanced) • p = 0.2 • α=β=0.45 • γ=0.1 Convergence twice as fast with

  40. Summary O(n1/3)-size views Resist Byzantine failures of linear portion Converge to proven uniform samples Precise analysis of impact of failures

More Related