Gossip Algorithms and Emergent Shape


  1. Gossip Algorithms and Emergent Shape Ken Birman Gossip-Based Networking Workshop

  2. On Gossip and Shape • Why is gossip interesting? • Scalability… • Protocols are highly symmetric • Although latency often forces us to bias them • Powerful convergence properties… • Especially in support of epidemics • New forms of consistency… • Probabilistic… but this is often adequate

  3. Consistency • Captures intuition that if A and B compare their states, no contradiction is evident • In systems with “logical” consistency, we say things like “A’s history and B’s are closed under causality, and A is a prefix of B” • With probabilistic systems we seek rapidly decreasing probability (as time elapses) that A knows “x” but B doesn’t • We call this probabilistic convergent consistency

  4. Exponential convergence • A subclass of convergence behaviors • Not all gossip protocols offer exponential convergence • But epidemic protocols do have this property… and many gossip protocols implement epidemics
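
To make the exponential-convergence claim concrete, here is a minimal simulation sketch of a push epidemic. It is an illustration under stated assumptions, not a protocol from the talk: synchronous rounds, uniform random peer selection, no failures, and the function name `push_gossip_rounds` is hypothetical. The number of rounds needed to reach every node should grow roughly with log N rather than with N.

```python
import math
import random


def push_gossip_rounds(n, seed=0):
    """Count synchronous rounds until a rumor started at node 0 reaches all n nodes.

    Assumption (not from the slides): in each round every informed node
    pushes the rumor to one peer chosen uniformly at random.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Each informed node picks one random target this round.
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        # Compare the observed round count against log2(n).
        print(n, push_gossip_rounds(n), round(math.log2(n), 1))
```

Because the informed set can at most double per round, the round count can never fall below log2(N); the simulation shows how close to that bound a simple epidemic stays.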

  5. Value of exponential convergence • An exponentially convergent protocol overwhelms mishaps and even attacks • Requires that new information reach relevant nodes with at most log(N) delay • Can talk of “probability 1.0” outcomes • Even model simplifications (such as idealized network) are washed away! • Predictions rarely “off” by more than a constant • A rarity: a theory relevant to practice!

  6. Convergent consistency • To illustrate our point, contrast Cornell’s Kelips system with MIT’s Chord • Kelips is convergent. Chord isn’t.

  7. Kelips (Linga, Gupta, Birman) • Take a collection of “nodes”: 110, 230, 202, 30

  8. Kelips • Map nodes to affinity groups • Affinity groups: peer membership thru consistent hash • Groups are numbered 0, 1, 2, …, √N − 1, with √N members per affinity group [Figure: nodes 110, 230, 202, 30 placed in affinity groups]

  9. Kelips • 110 knows about other members – 230, 30… • Affinity groups: peer membership thru consistent hash [Figure: affinity groups 0, 1, 2, …, √N − 1 with √N members per affinity group; node 110’s affinity group view holds affinity group pointers]

  10. Kelips • 202 is a “contact” for 110 in group 2 • Affinity groups: peer membership thru consistent hash [Figure: affinity groups 0, 1, 2, …, √N − 1 with √N members per affinity group; node 110’s affinity group view and contacts, with contact pointers]

  11. Kelips • “cnn.com” maps to group 2. So 110 tells group 2 to “route” inquiries about cnn.com to it. • Gossip protocol replicates data cheaply • Affinity groups: peer membership thru consistent hash [Figure: affinity groups 0, 1, 2, …, √N − 1 with √N members per affinity group; node 110’s affinity group view, contacts, and resource tuples]
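
The mapping of nodes and resource names to affinity groups can be sketched as follows. This is a simplified illustration, not the Kelips implementation: it uses a SHA-1 hash reduced modulo the number of groups (a plain hash rather than a full consistent-hashing scheme), assumes about √N groups for N nodes, and the names `affinity_group` and `build_groups` are hypothetical.

```python
import hashlib
import math


def affinity_group(name, num_groups):
    """Map a node id or resource name to a group index in [0, num_groups)."""
    digest = hashlib.sha1(str(name).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def build_groups(node_ids):
    """Place every node in one of roughly sqrt(N) affinity groups."""
    num_groups = max(1, round(math.sqrt(len(node_ids))))
    groups = {g: [] for g in range(num_groups)}
    for node in node_ids:
        groups[affinity_group(node, num_groups)].append(node)
    return groups


if __name__ == "__main__":
    groups = build_groups([110, 230, 202, 30])
    print(groups)
    # A resource name is hashed the same way to find the group that stores its tuple.
    print("cnn.com ->", affinity_group("cnn.com", len(groups)))
```

With this layout a lookup needs only one hop: hash the resource name to a group, then ask a contact in that group for the resource tuple.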

  12. How it works • Kelips is entirely gossip based! • Gossip about membership • Gossip to replicate and repair data • Gossip about “last heard from” time used to discard failed nodes • Gossip “channel” uses fixed bandwidth • … fixed rate, packets of limited size

  13. How it works • Heuristic: periodically ping contacts to check liveness, RTT… swap so-so ones for better ones • Node 175 is a contact for Node 102 in some affinity group (RTT: 235 ms) • “Hmm… Node 19 looks like a much better contact in affinity group 2” (RTT: 6 ms) [Figure: Node 102 receiving the gossip data stream, with nodes 175 and 19]
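
The contact-replacement heuristic on this slide can be sketched as below. This is one plausible reading, not the Kelips code: it assumes RTTs have already been measured by pinging (with `None` meaning the ping failed), keeps a fixed number of contacts per group, and the function name `refresh_contacts` is hypothetical.

```python
def refresh_contacts(contacts, candidates, rtt_ms, max_contacts=2):
    """Return the best contacts for one affinity group.

    contacts:   node ids currently used as contacts
    candidates: node ids learned from the gossip stream
    rtt_ms:     measured round-trip time per node id; None means unreachable
    """
    # Drop nodes that failed the liveness check.
    alive = {n for n in list(contacts) + list(candidates)
             if rtt_ms.get(n) is not None}
    # Prefer the lowest RTT; break ties by node id for determinism.
    return sorted(alive, key=lambda n: (rtt_ms[n], n))[:max_contacts]


if __name__ == "__main__":
    # Numbers from the slide: node 175 at 235 ms, node 19 at 6 ms.
    print(refresh_contacts([175], [19], {175: 235, 19: 6}, max_contacts=1))
```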

  14. Work in progress… • Prakash Linga is extending Kelips to support multi-dimensional indexing, range queries, self-rebalancing • Kelips has limited incoming “info rate” • Behavior when the limit is continuously exceeded is not well understood. • Will also study this phenomenon

  15. Replication makes it robust • Kelips should work even during disruptive episodes • After all, tuples are replicated to √N nodes • Query k nodes concurrently to overcome isolated crashes; this also reduces the risk that very recent data could be missed • … we often overlook the importance of showing that systems work while recovering from a disruption

  16. Chord (MIT group) • The McDonald’s of DHTs • A data structure mapped to a network • Ring of nodes (hashed ids) • Superimposed binary lookup trees • Other cached “hints” for fast lookups • Chord is not convergently consistent

  17. Chord picture [Figure: ring of nodes with hashed ids 0, 30, 64, 108, 123, 177, 199, 202, 241, 248, 255, showing finger links and a cached link]

  18. Chord picture: Transient Network Partition [Figure: two copies of the ring, labeled USA and Europe, each showing node ids 0, 30, 64, 108, 123, 177, 199, 202, 241, 248, 255]

  19. … so, who cares? • Chord lookups can fail… and it suffers from high overheads when nodes churn • Loads surge just when things are already disrupted… quite often, because of loads • And we can’t predict how long Chord might remain disrupted once it gets that way • Worst case scenario: Chord can become inconsistent and stay that way • The Fine Print: The scenario you have been shown is of low probability. In all likelihood, Chord would repair itself after any partitioning failure that might really arise. Caveat emptor and all that.

  20. Saved by gossip! • Epidemic gossip: remedy for what ails Chord! • cf. Epichord (Liskov), Bamboo • Key insight: • Gossip-based DHTs, if correctly designed, are self-stabilizing!

  21. Connection to self-stabilization • Self-stabilization theory • Describe a system and a desired property • Assume a failure in which code remains correct but node states are corrupted • Proof obligation: show that property is reestablished within bounded time • But doesn’t bound badness while a transient disruption is occurring

  22. Beyond self-stabilization • Tardos poses a related problem • Consider behavior of the system while an endless sequence of disruptive events occurs • System never reaches a quiescent state • Under what conditions will it still behave correctly? • Results of form “if disruptions satisfy then correctness property is continuously satisfied” • Hypothesis: with convergent consistency we may be able to develop a proof framework for systems that are continuously safe.

  23. Let’s look at a second example • Astrolabe system uses a different emergent data structure – a tree • Nodes are given an initial location – each knows its “leaf domain” • Inner nodes are elected using gossip and aggregation

  24. Astrolabe • Intended as help for applications adrift in a sea of information • Structure emerges from a randomized gossip protocol • This approach is robust and scalable even under stress that cripples traditional systems • Developed at RNS, Cornell, by Robbert van Renesse, with many others helping… • Today used extensively within Amazon.com

  25. Astrolabe is a flexible monitoring overlay • Periodically, pull data from monitored systems [Figure: hosts swift.cs.cornell.edu and cardinal.cs.cornell.edu]

  26. Astrolabe in a single domain • Each node owns a single tuple, like the management information base (MIB) • Nodes discover one another through a simple broadcast scheme (“anyone out there?”) and gossip about membership • Nodes also keep replicas of one another’s rows • Periodically (uniformly at random) merge your state with someone else…

  27. State Merge: Core of Astrolabe epidemic [Figure: hosts swift.cs.cornell.edu and cardinal.cs.cornell.edu]
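
A state merge between two gossiping nodes can be sketched as follows. This is a minimal illustration under assumptions that are not stated on the slides: each replicated row carries a version number set by its owner, the newer version wins, and the row fields (`load`) and the function name `merge_state` are hypothetical.

```python
def merge_state(mine, theirs):
    """Merge two replicas of a region table.

    Each table maps an owner name to (version, row). For every owner,
    the row with the higher version is kept.
    """
    merged = dict(mine)
    for owner, (version, row) in theirs.items():
        if owner not in merged or version > merged[owner][0]:
            merged[owner] = (version, row)
    return merged


if __name__ == "__main__":
    swift = {
        "swift": (7, {"load": 2.0}),      # swift's own row, fresh
        "cardinal": (3, {"load": 4.5}),   # stale replica of cardinal's row
    }
    cardinal = {
        "swift": (5, {"load": 1.1}),      # stale replica of swift's row
        "cardinal": (4, {"load": 3.6}),   # cardinal's own row, fresh
    }
    # After exchanging state, both sides hold the same, freshest rows.
    print(merge_state(swift, cardinal))
    print(merge_state(cardinal, swift))
```

Because the merge keeps the maximum version per owner, it gives the same result in either direction, which is what lets repeated random pairwise merges converge.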


  30. Observations • Merge protocol has constant cost • One message sent, received (on avg) per unit time. • The data changes slowly, so no need to run it quickly – we usually run it every five seconds or so • Information spreads in O(log N) time • But this assumes bounded region size • In Astrolabe, we limit regions to 50-100 rows

  31. Big systems… • A big system could have many regions • Looks like a pile of spreadsheets • A node only replicates data from its neighbors within its own region

  32. Scaling up… and up… • With a stack of domains, we don’t want every system to “see” every domain • Cost would be huge • So instead, we’ll see a summary

  33. Astrolabe builds a hierarchy using a P2P protocol that “assembles the puzzle” without any servers • SQL query “summarizes” data • Dynamically changing query output is visible system-wide [Figure: leaf regions New Jersey and San Francisco]

  34. Large scale: “fake” regions • These are • Computed by queries that summarize a whole region as a single row • Gossiped in a read-only manner within a leaf region • But who runs the gossip? • Each region elects “k” members to run gossip at the next level up. • Can play with selection criteria and “k”
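
The two ideas on this slide, summarizing a region as a single row and electing k members to gossip at the next level, can be sketched like this. It is an illustration only: Astrolabe expresses summaries as SQL queries, whereas this sketch hard-codes one aggregate in Python; the `load` column, the lowest-load election criterion, and the function names are assumptions.

```python
def summarize(region_name, rows):
    """Collapse a region's table into one summary row for the parent level."""
    loads = [r["load"] for r in rows]
    return {
        "name": region_name,
        "members": len(rows),
        "min_load": min(loads),
        "avg_load": sum(loads) / len(loads),
    }


def elect_representatives(rows, k):
    """Pick the k least-loaded members to run gossip one level up."""
    ranked = sorted(rows, key=lambda r: (r["load"], r["name"]))
    return [r["name"] for r in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical leaf region; host names and loads are made up.
    region = [
        {"name": "gnu", "load": 0.2},
        {"name": "falcon", "load": 1.5},
        {"name": "swift", "load": 0.9},
    ]
    print(summarize("New Jersey", region))
    print(elect_representatives(region, k=2))
```

Changing the sort key in `elect_representatives` is the kind of "play with selection criteria" the slide mentions; raising k trades extra gossip load for resilience to representative failures.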

  35. Hierarchy is virtual… data is replicated • Yellow leaf node “sees” its neighbors and the domains on the path to the root. • Gnu runs level 2 epidemic because it has lowest load • Falcon runs level 2 epidemic because it has lowest load [Figure: leaf regions New Jersey and San Francisco]

  36. Hierarchy is virtual… data is replicated • Green node sees different leaf domain but has a consistent view of the inner domain [Figure: leaf regions New Jersey and San Francisco]

  37. Worst case load? • A small number of nodes end up participating in O(log_fanout N) epidemics • Here the fanout is something like 50 • In each epidemic, a message is sent and received roughly every 5 seconds • We limit message size so even during periods of turbulence, no message can become huge.
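
The worst-case figure is easy to work out numerically. The sketch below counts how many hierarchy levels are needed for N nodes at a given fanout, then derives the message rate for a node that participates at every level. The fanout of 50 and the 5-second period come from the slides; the function names and the example N are assumptions.

```python
def hierarchy_levels(n, fanout=50):
    """Smallest number of levels L such that fanout**L >= n (at least 1)."""
    levels, capacity = 1, fanout
    while capacity < n:
        capacity *= fanout
        levels += 1
    return levels


def worst_case_messages_per_second(n, fanout=50, period_s=5.0):
    """Send rate for a node elected to gossip at every level of the tree."""
    return hierarchy_levels(n, fanout) / period_s


if __name__ == "__main__":
    n = 100_000  # hypothetical system size
    print(hierarchy_levels(n))                 # levels of epidemics
    print(worst_case_messages_per_second(n))   # messages sent per second
```

For 100,000 nodes this gives 3 levels (50^2 = 2,500 is too small, 50^3 = 125,000 suffices), so even the busiest node sends fewer than one message per second.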

  38. Self-stabilization? • Like Kelips, it seems that Astrolabe • Is convergently consistent, self-stabilizing • And would “ride out” a large class of possible failures • But Astrolabe would be disrupted by • Incorrect aggregation (Byzantine faults) • Correlated failure of all representatives of some portion of the tree

  39. Focus on emergent shape • Kelips: Nodes start with a-priori assignment to affinity groups, end up with a superimposed pointer structure • Astrolabe: Nodes start with a-priori leaf domain assignments, build the tree • What other data structures can be constructed with emergent protocols?

  40. Emergent shape • We know a lot about a related question • Given a connected graph, cost function • Nodes have bounded degree • Use a gossip protocol to swap links until some desired graph emerges • Another related question • Given a gossip overlay, improve it by selecting “better” links (usually, lower RTT) • Example: the “Anthill” framework of Alberto Montresor, Ozalp Babaoglu, Hein Meling and Francesco Russo

  41. Example of an open problem • Given a description of a data structure (for example, a balanced tree) • … design a gossip protocol such that the system will rapidly converge towards that structure even if disrupted • Do it with bounded per-node message rates, sizes (network load less important) • Use aggregation to test tree quality?

  42. Van Renesse’s dreadful aggregation tree • An event e occurs at H • G gossips with H and learns e • P learns O(N) time units later! [Figure: aggregation tree over leaves A B C D E F G H I J K L M N O P, with inner levels labeled A C E G I K M O, then B J F N, then D L]

  43. What went wrong? • In Robbert’s horrendous tree, each node has equal “work to do” but the information-space diameter is larger! • Astrolabe benefits from “instant” knowledge because the epidemic at each level is run by someone elected from the level below

  44. Insight: Two kinds of shape • We’ve focused on the aggregation tree • But in fact should also think about the information flow tree

  45. Information space perspective • Bad aggregation graph: diameter O(n): H – G – E – F – B – A – C – D – L – K – I – J – N – M – O – P • Astrolabe version: diameter O(log(n)), built over the pairs A – B, C – D, E – F, G – H, I – J, K – L, M – N, O – P
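
The gap between the two diameters can be checked with a small breadth-first search. The chain below is the 16-node path listed on the slide. The comparison graph is only a stand-in: the slide does not give the full Astrolabe information-flow graph, so this sketch assumes a balanced binary tree with 16 leaves, and all function names are hypothetical.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path distance in a connected, unweighted graph."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)


def path_graph(names):
    """Chain in which each node talks only to its two neighbours."""
    adj = {n: set() for n in names}
    for a, b in zip(names, names[1:]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def balanced_binary_tree(num_leaves):
    """Heap-numbered complete binary tree; num_leaves must be a power of two."""
    total = 2 * num_leaves - 1
    adj = {i: set() for i in range(1, total + 1)}
    for child in range(2, total + 1):
        adj[child].add(child // 2)
        adj[child // 2].add(child)
    return adj


if __name__ == "__main__":
    chain = path_graph(list("HGEFBACDLKIJNMOP"))
    print(diameter(chain))                      # grows linearly with n
    print(diameter(balanced_binary_tree(16)))   # grows with log n
```

For 16 nodes the chain has diameter 15 while the assumed tree has diameter 8; doubling the node count adds 16 hops to the chain but only 2 to the tree.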

  46. Gossip and bias • Often useful to “bias” gossip, particularly if some links are fast and others are very slow • Demers: shows how to adjust probabilities to even the load • Ziao later showed that one must also fine-tune the gossip rate [Figure: nodes A, B, C, D, E, F and X, Y, Z; annotation: “Roughly half the gossip will cross this link!”]

  47. Gossip and bias • Idea: “shaped” gossip probabilities • Gravitational Gossip (Jenkins) • Groups carry multicast event streams reporting evolution of a continuous function (like electric power loads) • Some nodes want to watch “closely” while others only need part of the data

  48. Gravitational Gossip • Inner ring: nodes want 100% of the traffic • Middle ring: nodes want 75% of the traffic • Outer ring: nodes want 20% of the traffic • Jenkins: when a gossips to b, it includes information about topic t in a way weighted by b’s level of interest in topic t [Figure: rings with nodes a, b, c]
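
One simple way to read “weighted by b’s level of interest” is to include each update with probability equal to the receiver’s interest in its topic. The sketch below does exactly that. It is an assumption for illustration, not necessarily the weighting Jenkins uses, and the function name, the topic name, and the data layout are hypothetical.

```python
import random


def select_updates(updates, interest, rng):
    """Choose which updates a sends to b in one gossip message.

    updates:  list of (topic, payload) pairs held by the sender
    interest: receiver's interest per topic, a fraction in [0, 1]
    rng:      random.Random instance, passed in for reproducibility
    """
    return [(topic, payload) for topic, payload in updates
            if rng.random() < interest.get(topic, 0.0)]


if __name__ == "__main__":
    rng = random.Random(1)
    updates = [("power-load", i) for i in range(10_000)]
    for ring, level in (("inner", 1.0), ("middle", 0.75), ("outer", 0.20)):
        sent = select_updates(updates, {"power-load": level}, rng)
        # The delivered fraction should track the ring's interest level.
        print(ring, len(sent) / len(updates))
```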

  49. Gravitational Gossip

  50. How does bias impact information-flow graph? • Earlier, all links were the same • Now, some links carry • Less information • And may have longer delays • (Bounded capacity messages: “similar” to long links?) • Question: Model biased information flow graphs and explore implications
