
Brahms

Presentation Transcript


  1. Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer

  2. Edward (Eddie) Bortnikov Maxim (Max) Gurevich Idit Keidar Alexander (Alex) Shraer Gabriel (Gabi) Kliot

  3. Why Random Node Sampling • Gossip partners • Random choices make gossip protocols work • Unstructured overlay networks • E.g., among super-peers • Random links provide robustness, expansion • Gathering statistics • Probe random nodes • Choosing cache locations

  4. The Setting • Many nodes – n • 10,000s, 100,000s, 1,000,000s, … • Come and go • Churn • Every joining node knows some others • Connectivity • Full network • Like the Internet • Byzantine failures

  5. Adversary Attacks • Faulty nodes (portion f of ids) • Attack other nodes • May want to bias samples • Isolate nodes, DoS nodes • Promote themselves, bias statistics

  6. Previous Work • Benign gossip membership • Small (logarithmic) views • Robust to churn and benign failures • Empirical study [Lpbcast,Scamp,Cyclon,PSS] • Analytical study [Allavena et al.] • Never proven uniform samples • Spatial correlation among neighbors’ views [PSS] • Byzantine-resilient gossip • Full views [MMR,MS,Fireflies,Drum,BAR] • Small views, some resilience [SPSS] • We are not aware of any analytical work

  7. Our Contributions • Gossip-based attack-tolerant membership • Linear portion f of failures • O(n^{1/3})-size partial views • Correct nodes remain connected – the view is not all bad • Mathematically analyzed, validated in simulations • Random sampling • Novel memory-efficient approach • Converges to proven independent uniform samples – better than benign gossip

  8. Brahms • Sampling – the local component • Gossip – the distributed component • (Diagram: gossip maintains a view; the view feeds a Sampler, which outputs a sample.)

  9. Sampler Building Block • Input: data stream, one element at a time • Bias: some values appear more often than others • Used with the stream of gossiped ids • Output: uniform random sample of the unique elements seen thus far • Independent of other Samplers • One element at a time (converging) • (Diagram: next feeds the Sampler; sample reads its output.)

  10. Sampler Implementation • Memory: stores one element at a time • Uses a random hash function h from a min-wise independent family [Broder et al.] • For each set X and every x ∈ X, Pr[h(x) = min h(X)] = 1/|X| • init: choose a random hash function • next: keep the id with the smallest hash so far • sample: return that id
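The Sampler on this slide can be sketched in a few lines. This is an illustrative sketch, not the paper's code: a randomly seeded SHA-256 hash stands in for a true min-wise independent hash family.

```python
import hashlib
import os

class Sampler:
    """Min-hash sampler sketch: keeps the id whose (randomly seeded)
    hash is smallest among all ids seen so far."""

    def __init__(self):
        # init: a fresh random seed effectively picks a random hash function
        self.seed = os.urandom(16)
        self.best_id = None
        self.best_hash = None

    def _h(self, node_id: str) -> bytes:
        return hashlib.sha256(self.seed + node_id.encode()).digest()

    def next(self, node_id: str) -> None:
        # next: keep the id with the smallest hash so far
        h = self._h(node_id)
        if self.best_hash is None or h < self.best_hash:
            self.best_id, self.best_hash = node_id, h

    def sample(self):
        # sample: the current minimum-hash id (None before any input)
        return self.best_id
```

Because the winner depends only on the set of unique ids seen, repeating an id in the stream does not change the output, which is how the Sampler removes the bias mentioned on the slide.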

  11. Component S: Sampling and Validation • (Diagram: the id stream from gossip drives a bank of Samplers via init/next; each Sampler's output is checked by a Validator using pings before it is exposed as a sample of S.)

  12. Gossip Process • Provides the stream of ids for S • Needs to ensure connectivity • Use a bag of tricks to overcome attacks

  13. Gossip-Based Membership Primer • Small (sub-linear) local view V • V constantly changes - essential due to churn • Typically, evolves in (unsynchronized) rounds • Push: send my id to some node in V • Reinforce underrepresented nodes • Pull: retrieve view from some node in V • Spread knowledge within the network • [Allavena et al. ‘05]: both are essential • Low probability for partitions and star topologies

  14. Brahms Gossip Rounds • Each round: • Send pushes, pulls to random nodes from V • Wait to receive pulls, pushes • Update S with all received ids • (Sometimes) re-compute V • Tricky! Beware of adversary attacks
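A round can be sketched as follows; `send`, `recv`, and `sampler_feed` are hypothetical helpers standing in for the network layer and component S, and the α/β split follows the later slides:

```python
import random

def gossip_round(my_id, view, sampler_feed, alpha, beta, send, recv):
    """One gossip round sketch: push my id to ~alpha|V| random view
    members, request pulls from ~beta|V| members, then feed every
    received id into the sampling component."""
    k_push = min(len(view), round(alpha * len(view)))
    k_pull = min(len(view), round(beta * len(view)))
    for peer in random.sample(view, k_push):
        send(peer, ("push", my_id))
    for peer in random.sample(view, k_pull):
        send(peer, ("pull-request", my_id))
    pushed, pulled = recv()            # ids received this round
    for node_id in pushed + pulled:    # update S with all received ids
        sampler_feed(node_id)
    return pushed, pulled
```

The view recomputation step is deliberately left out here: as the following slides explain, it is skipped in some rounds.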

  15. Problem 1: Push Drowning • (Illustration: faulty nodes Mallory, M&M, and Malfoy flood Alice with pushes, drowning out the pushes from correct nodes Bob, Carol, Dana, and Ed.)

  16. Trick 1: Rate-Limit Pushes • Use limited messages to bound faulty pushes system-wide • E.g., computational puzzles/virtual currency • Faulty nodes can send portion p of them • Views won’t be all bad
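One concrete way to realize the computational puzzles mentioned here is a hashcash-style proof-of-work attached to each push. The function names and the difficulty value below are illustrative, not from the slides; the point is that sending a push costs about 2^difficulty hash evaluations while verification stays a single hash:

```python
import hashlib
import os

def make_challenge() -> bytes:
    # A fresh challenge per push prevents reuse of old solutions.
    return os.urandom(8)

def _prefix(challenge: bytes, nonce: int) -> int:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")   # first 32 bits of the hash

def solve(challenge: bytes, difficulty: int = 12) -> int:
    """Brute-force a nonce whose hash has `difficulty` leading zero bits
    (~2**difficulty expected attempts)."""
    nonce = 0
    while _prefix(challenge, nonce) >> (32 - difficulty) != 0:
        nonce += 1
    return nonce

def verify(challenge: bytes, nonce: int, difficulty: int = 12) -> bool:
    # Verification is one hash, so receivers can filter pushes cheaply.
    return _prefix(challenge, nonce) >> (32 - difficulty) == 0
```

This bounds the system-wide rate of faulty pushes by the adversary's total computational power, which is what gives the "portion p" bound on the slide.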

  17. Problem 2: Quick Isolation • (Illustration: the adversary concentrates all faulty pushes on a single node, crowding correct pushes out of its view. "Ha! She's out! Now let's move on to the next guy!")

  18. Trick 2: Detection & Recovery • Do not re-compute V in rounds when too many pushes are received ("Hey! I'm swamped! I better ignore all of 'em pushes…") • Slows down isolation; does not prevent it

  19. Problem 3: Pull Deterioration • (Illustration: starting from 50% faulty ids in views, pulls that land on faulty nodes return only faulty ids, raising the portion to 75% faulty ids in views.)

  20. Trick 3: Balance Pulls & Pushes • Control the contribution of push – α|V| ids – versus the contribution of pull – β|V| ids • Parameters α, β • Pull-only ⇒ eventually all faulty ids • Push-only ⇒ quick isolation of attacked node • Push ensures: system-wide, not all ids are bad • Pull slows down (does not prevent) isolation

  21. Trick 4: History Samples • Attacker influences both push and pull • Feedback: γ|V| random ids from S • Parameters: α + β + γ = 1 • Attacker loses control – samples are eventually perfectly uniform

  22. View and Sample Maintenance • (Diagram: the new view V is assembled from α|V| pushed ids, β|V| pulled ids, and γ|V| ids fed back from the sample S; all received ids also update S.)
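The view recomputation on this slide, with the α/β/γ split from tricks 3 and 4, can be sketched as follows. The function name is illustrative, and the 0.45/0.45/0.1 defaults are the parameter values used later in the talk's experiments:

```python
import random

def update_view(pushed, pulled, history, view_size,
                alpha=0.45, beta=0.45, gamma=0.10):
    """Assemble the next view from alpha|V| pushed ids, beta|V| pulled
    ids, and gamma|V| ids fed back from the sample (history samples)."""
    n_push = round(alpha * view_size)
    n_pull = round(beta * view_size)
    n_hist = view_size - n_push - n_pull   # the remaining gamma|V| slots
    view = random.sample(pushed, min(n_push, len(pushed)))
    view += random.sample(pulled, min(n_pull, len(pulled)))
    view += random.sample(history, min(n_hist, len(history)))
    return view
```

Per trick 2, a node would skip calling this in rounds where it received suspiciously many pushes.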

  23. Key Property • Samples take time to help • Assume the attack starts when samples are empty • With appropriate parameters, time to isolation > time to convergence • Prove a lower bound on isolation time using tricks 1, 2, 3 (not using samples yet) • Prove an upper bound on convergence time, until some good sample persists forever • Self-healing from partitions

  24. History Samples: Rationale • Judicious use is essential • Bootstrap; avoid slow convergence • Deal with churn • With a small fraction of history samples (10%) we can cope with any adversary • Amplification!

  25. Analysis • Sampling – mathematical analysis • Connectivity – analysis and simulation • Full system simulation

  26. Connectivity ⇒ Sampling • Theorem: If the overlay remains connected indefinitely, samples are eventually uniform

  27. Sampling ⇒ Connectivity Ever After • Perfect sample of a Sampler with hash h: the id with the lowest h(id) system-wide • If correct, it sticks once the Sampler sees it • Correct perfect sample ⇒ self-healing from partitions ever after • We analyze PSP(t) – the probability of a perfect sample at time t

  28. Convergence to 1st Perfect Sample • (Plot parameters: n = 1000, f = 0.2, 40% unique ids in the stream.)

  29. Scalability • Analysis says: • For scalability, want small and constant convergence time • independent of system size, e.g., when

  30. Connectivity Analysis 1: Balanced Attacks • Attack all nodes the same • Maximizes faulty ids in views system-wide in any single round • If repeated, the system converges to a fixed-point ratio of faulty ids in views, which is < 1 if • γ = 0 (no history) and p < 1/3, or • history samples are used (any p) • There are always good ids in views!

  31. Fixed Point Analysis: Push • x(t) – portion of faulty ids in views at round t • Portion of faulty pushes arriving at correct nodes: p / (p + (1 − p)(1 − x(t))) • (Diagram: between rounds t and t+1, a correct node's push can be lost while faulty nodes keep pushing.)

  32. Fixed Point Analysis: Pull • A pull from node i returns a faulty id with probability x(t); a pull that lands on a faulty node returns only faulty ids • E[x(t+1)] = α · p / (p + (1 − p)(1 − x(t))) + β · (x(t) + (1 − x(t)) x(t)) + γf

  33. Faulty Ids in the Fixed Point • History samples assumed perfect in the analysis; real history used in simulations • With a few history samples, any portion of bad nodes can be tolerated • Fixed points and convergence validated empirically

  34. Convergence to Fixed Point • n = 1000 • p = 0.2 • α=β=0.5 • γ=0
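The recurrence from slide 32 can be iterated numerically to reproduce this convergence. The sketch below assumes views start with no faulty ids (x(0) = 0) and applies the expectation as a deterministic map:

```python
def faulty_ratio(p, alpha, beta, gamma, f, rounds=200):
    """Iterate E[x(t+1)] = a*p/(p+(1-p)(1-x)) + b*(x+(1-x)x) + g*f
    and return the (approximate) fixed-point ratio of faulty ids."""
    x = 0.0
    for _ in range(rounds):
        push_term = alpha * p / (p + (1 - p) * (1 - x))
        pull_term = beta * (x + (1 - x) * x)
        x = push_term + pull_term + gamma * f
    return x
```

With p = 0.2, α = β = 0.5, γ = 0 (the plotted setting) the iteration settles at a fixed point below 1, while p > 1/3 without history samples drives the ratio to 1, matching the condition on slide 30; adding γ = 0.1 of history samples lowers the fixed point further.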

  35. Connectivity Analysis 2: Targeted Attack – Roadmap • Step 1: analysis without history samples • Isolation in logarithmic time • … but not too fast, thanks to tricks 1, 2, 3 • Step 2: analysis of history sample convergence • Time-to-perfect-sample < time-to-isolation • Step 3: putting it all together • Empirical evaluation • No isolation happens

  36. Targeted Attack – Step 1 • Q: How fast (lower bound) can an attacker isolate one node from the rest? • Worst-case assumptions: • No use of history samples (γ = 0) • Unrealistically strong adversary • Observes the exact number of correct pushes and complements it to α|V| • Attacked node not represented initially • Balanced attack on the rest of the system

  37. Isolation w/out History Samples • (Plot parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0. Annotations: isolation time for |V| = 60; curves depend on α, β, p.)

  38. Step 2: Sample Convergence • (Plot parameters: n = 1000, p = 0.2, α = β = 0.5, γ = 0, 40% unique ids. Annotation: a perfect sample in 2–3 rounds; empirically verified.)

  39. Step 3: Putting It All Together – No Isolation with History Samples • (Plot parameters: n = 1000, p = 0.2, α = β = 0.45, γ = 0.1. Annotation: works well despite small PSP.)

  40. Sample Convergence (Balanced) • p = 0.2 • α=β=0.45 • γ=0.1 Convergence twice as fast with

  41. Summary • O(n^{1/3})-size views • Resist attacks / failures of a linear portion of nodes • Converge to proven uniform samples • Precise analysis of the impact of attacks
