Probabilistic Broadcast Presented by Keren Censor
Traditional client-server model • Central point of failure • Performance bottleneck • Heavy load on servers
Peer-to-Peer (P2P) • No central point of failure • No performance bottleneck • Load is spread across peers
Information Dissemination • Deterministic solutions • Flooding – send a message to every neighbor • #Messages = O(#edges) • Time = diameter • Deterministic routing – send according to a spanning tree • Non-resilient to failures • Time = O(#nodes)
Requirements • Reliable broadcast • Reach all nodes • Resilient to failures Considering: • Dynamic network topology • Crashes • Disconnections • Packet losses
Random information spreading • Trade reliability for scalability • Algorithm may be less reliable, but should scale well with system size • Basic gossip algorithm • Forward information to a randomly chosen subset of your neighbors. Design parameters: • Buffer capacity B • Fan-out F • Number of times a message is forwarded T
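A minimal sketch of this basic gossip step in Python, assuming a hypothetical send(q, msg) primitive and a per-node hop counter on each message; the values of B, F, and T are illustrative, not taken from any paper:

```python
import random
from dataclasses import dataclass

# Design parameters from above; the values are assumed for illustration.
B, F, T = 100, 4, 10   # buffer capacity, fan-out, forwarding rounds

@dataclass
class Message:
    payload: object
    hops: int = 0      # how many rounds this node has forwarded it

def receive(buffer, msg):
    """Buffer an incoming message, respecting capacity B (drop-new policy)."""
    if len(buffer) < B:
        buffer.append(msg)

def gossip_round(buffer, neighbors, send):
    """One gossip round: forward each buffered message to F random
    neighbors, dropping messages already forwarded T times."""
    for msg in list(buffer):
        for q in random.sample(neighbors, min(F, len(neighbors))):
            send(q, msg)
        msg.hops += 1
        if msg.hops >= T:
            buffer.remove(msg)
```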
Previous algorithms • First developed for consistency management of replicated databases [Demers et al. 1987] • Reliability in Bimodal Multicast [Birman et al. 1999]: the set of nodes that a message reaches is • Almost all of the nodes, with high probability • Almost none of the nodes, with small probability • Other subsets, with vanishingly small probability
Design constraints • Membership • Knowledge of the participants • Network awareness • Knowledge of real network topology • Buffer management • Memory usage • Message filtering • According to different interests
Design constraints • Membership • Knowledge of the participants in the system • Previous algorithms assume this knowledge • Problems: • Storage increases linearly with system size n • Maintenance imposes extra load
Design constraints • Membership • Solution: integrate membership with gossip and maintain a partial view • Uniformity: how to gossip to members chosen uniformly at random from the entire system • Adaptivity: some parameter must grow with system size; how do we estimate the system size? • Bootstrapping: how is the system initialized?
Design constraints • Network awareness • Knowledge of real network topology • Problem: a message sent by p to a nearby q may be routed through a remote w
Design constraints • Network awareness • Solution: organize processes in a hierarchy that reflects the network topology • Distributed? • Fault tolerant? • Scalable?
Related approaches • Directional gossip [Lin and Marzullo, 1999]: a weight is assigned to each neighbor according to its connectivity, and a higher gossip probability is given to neighbors with lower weight
Design constraints • Buffer management • Memory usage • Problem: limited buffers. When buffer is full: • Drop new messages? • Drop old messages? • In Bimodal Multicast: a message is gossiped by a node for a limited number of rounds, and then it is erased
Design constraints • Buffer management • Solutions: • Age-based priorities • Application semantics • Elaborated later in this talk
Design constraints • Message filtering • According to different interests • Problem: redundant traffic if processes have different topics of interest • How does a process know the interests of its neighbors? • Assuming this information is magically available, should p refrain from sending q a message q is not interested in? What if w is interested? • Solution: hierarchy of processes
LPBCAST • Lightweight Probabilistic Broadcast • [Eugster, Guerraoui, Handurukande, Kouznetsov, and Kermarrec, 2003] • Main contribution: • Scalable memory consumption for • Membership management • Message buffering
Model • Set of processes Π = {p1, p2, …} • Synchronous rounds • Complete logical network • Each process runs LPBCAST beneath its application; LPBCAST maintains partial views
Buffers • Event notifications – events • Event notification identifiers – eventIds • Unsubscriptions – unSubs • Subscriptions – subs and view • Each buffer L has a maximum size |L|max • Truncation of L: removing random elements so that |L| ≤ |L|max
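A minimal sketch of the truncation rule, assuming buffers are plain Python lists:

```python
import random

def truncate(buf, max_size):
    """Truncation of a buffer L: remove uniformly random elements
    until |L| <= |L|max."""
    while len(buf) > max_size:
        buf.pop(random.randrange(len(buf)))
```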
Receiving an event from the application • Upon LPBCAST(e) • Add e to events
Gossiping • Periodically (gossip period T ms), generate a gossip message and send it to F (fan-out) members chosen randomly from view • The gossip message carries events, eventIds, unSubs, and subs ∪ {pi}; afterwards the events buffer is emptied (events = Ø)
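A sketch of gossip emission under these rules; state is a hypothetical record holding the LPBCAST buffers and this process's id, and send is an assumed network primitive:

```python
import random

def emit_gossip(state, send, F):
    """Build one gossip message and send it to F members chosen
    randomly from view."""
    msg = {
        "events":   list(state.events),
        "eventIds": list(state.event_ids),
        "unSubs":   list(state.unsubs),
        "subs":     list(state.subs) + [state.my_id],  # subs U {pi}
    }
    state.events.clear()   # events buffer is emptied after gossiping
    for q in random.sample(list(state.view), min(F, len(state.view))):
        send(q, msg)
```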
Gossip reception • Unsubscriptions: • Remove the unsubscribing processes from view and subs • Add them to unSubs, and truncate unSubs by removing random elements
Gossip reception • Subscriptions: • Add to view and subs • Truncate view, moving the removed random elements into subs • Truncate subs by removing random elements
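A sketch of how a receiver might handle the unsubscriptions and subscriptions in a gossip message, with the buffers modeled as Python sets; VIEW_MAX, SUBS_MAX, and UNSUBS_MAX are assumed stand-ins for |view|max, |subs|max, and |unSubs|max:

```python
import random

VIEW_MAX, SUBS_MAX, UNSUBS_MAX = 30, 60, 30   # assumed buffer bounds

def truncate_set(s, max_size):
    """Remove random elements until the set fits its bound."""
    while len(s) > max_size:
        s.discard(random.choice(list(s)))

def on_unsubs(state, received_unsubs):
    """Unsubscriptions: remove from view and subs, remember in unSubs."""
    for p in received_unsubs:
        state.view.discard(p)
        state.subs.discard(p)
        state.unsubs.add(p)
    truncate_set(state.unsubs, UNSUBS_MAX)

def on_subs(state, received_subs):
    """Subscriptions: add to view and subs; view overflow is recycled
    into subs ("truncate view into subs"); then truncate subs."""
    for p in received_subs:
        if p != state.my_id:
            state.view.add(p)
            state.subs.add(p)
    while len(state.view) > VIEW_MAX:
        evicted = random.choice(list(state.view))
        state.view.discard(evicted)
        state.subs.add(evicted)
    truncate_set(state.subs, SUBS_MAX)
```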
Gossip reception • Events: • Deliver new event notifications to the application • Add to eventIds (which keeps the ids of all delivered events) and to events, and truncate both • If a received id is not in eventIds, add it to retrieveBuf
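A sketch of event handling on gossip reception, reusing the truncate helper from the buffers sketch; EVENTS_MAX stands in for |events|max, and the retrieveBuf entry format is an assumption shared with the retrieval sketch below:

```python
EVENTS_MAX = 100   # assumed bound for |events|max

def on_events(state, received_events, deliver):
    """Events: deliver unseen notifications, record their ids, buffer them."""
    for e in received_events:
        if e.id not in state.event_ids:   # eventIds keeps ids of all delivered events
            deliver(e)                    # hand the notification to the application
            state.event_ids.add(e.id)
            state.events.append(e)
    truncate(state.events, EVENTS_MAX)    # random truncation, as sketched earlier

def on_event_ids(state, received_ids, sender, current_round):
    """Ids of events not yet received are queued in retrieveBuf."""
    for eid in received_ids:
        if eid not in state.event_ids and eid not in state.retrieve_buf:
            state.retrieve_buf[eid] = {"inserted": current_round,
                                       "sender": sender, "stage": 0}
```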
Retrieving events • If more than k rounds have passed since an eventId was inserted into retrieveBuf, and the matching event has not yet been received: • Ask for the event from the process q from whom eventId was received. If no reply within r rounds: • Ask for the event from a random neighbor. If that fails too: • Ask for the event from the source of the event.
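A sketch of this escalating retrieval, called once per round; k and r are assumed values, ask(target, eid) is a hypothetical request primitive, and event ids are assumed to carry their source:

```python
import random

def retrieve_step(state, current_round, ask, k=5, r=3):
    """One round of the escalating retrieval procedure; retrieveBuf
    entries are the dicts built in the reception sketch above."""
    for eid, info in state.retrieve_buf.items():
        if info["stage"] == 0 and current_round - info["inserted"] > k:
            ask(info["sender"], eid)       # 1) process the id was received from
            info.update(stage=1, asked_at=current_round)
        elif info["stage"] == 1 and current_round - info["asked_at"] > r:
            ask(random.choice(list(state.view)), eid)   # 2) a random neighbor
            info.update(stage=2, asked_at=current_round)
        elif info["stage"] == 2 and current_round - info["asked_at"] > r:
            ask(eid.source, eid)           # 3) the event's source (assuming
            info["stage"] = 3              #    ids carry their source)
```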
Subscriptions and unsubscriptions • Subscribe: pi subscribes via some known pj, which gossips this subscription. Gossip messages will start reaching pi; otherwise it subscribes again • Unsubscribe: unsubscriptions carry timestamps, after which they become obsolete • Subscriptions are gossiped continuously to ensure uniformly distributed views: a failed process will be removed from all views with high probability
Analysis – Assumptions • n processes, Π is constant • Latency is smaller than the gossip period T • Failures are stochastically independent: • Probability of a message being lost ≤ ε • Number of crashed processes ≤ f • Event notification identifiers are unique
Analysis – Distribution of views • Assume each process has an independent, uniformly distributed random view of size l • In round r: Pr[p ∈ view] = l/(n−1) • In round r+1: Pr[p ∈ view] = Pr[p in view]·Pr[p not removed] + Pr[p not in view]·Pr[p enters view] • For l ≪ |subs|max·F, this is estimated by l/(n−1)
Analysis – Event propagation • pr = Pr[p receives a certain gossip message] ≥ (l/(n−1)) · (F/l) · (1−ε) · (1−f/n), the product of Pr[p in view], Pr[p is chosen], Pr[message not lost], and Pr[p doesn't crash]; the l cancels, so pr doesn't depend on l • sr,e = #processes that received event e by round r • Markov chain: pij = Pr[sr+1 = j | sr = i] = C(n−i, j−i) · (1−q^i)^(j−i) · (q^i)^(n−j), where q = 1−pr
Analysis – Event propagation • Markov chain: pij = Pr[sr+1 = j | sr = i] = C(n−i, j−i) · (1−q^i)^(j−i) · (q^i)^(n−j), with q = 1−pr • Distribution of sr: starting from s0 = 1, Pr[sr+1 = j] = Σi Pr[sr = i] · pij
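The distribution of sr can be computed numerically by iterating the chain; the sketch below uses assumed example parameters (n = 50, pr = 0.1, 8 rounds):

```python
from math import comb

def transition(i, j, n, p):
    """p_ij = C(n-i, j-i) * (1-q^i)^(j-i) * (q^i)^(n-j), with q = 1 - p."""
    if j < i:
        return 0.0
    q = 1.0 - p
    return comb(n - i, j - i) * (1 - q**i) ** (j - i) * (q**i) ** (n - j)

def distribution(n, p, rounds):
    """Iterate Pr[s_{r+1} = j] = sum_i Pr[s_r = i] * p_ij from s_0 = 1."""
    dist = [0.0] * (n + 1)
    dist[1] = 1.0                          # only the source knows the event
    for _ in range(rounds):
        nxt = [0.0] * (n + 1)
        for i, pi in enumerate(dist):
            if pi > 0.0:
                for j in range(i, n + 1):
                    nxt[j] += pi * transition(i, j, n, p)
        dist = nxt
    return dist

dist = distribution(50, 0.1, rounds=8)
print(sum(dist[45:]))   # probability that "almost all" processes got the event
```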
Analysis – Gossip rounds • #rounds decreases as the fan-out F increases • Claim: #rounds increases logarithmically with system size n • Compare: the diameter of a random graph is O(log n) • View size l does not influence #rounds • But we needed to assume l ≪ |subs|max·F, so the subs buffer pays the price?
Analysis – Partitioning • Pr[partition of size i] = (C(i−1, l)/C(n−1, l))^i · (C(n−i−1, l)/C(n−1, l))^(n−i) • In one set: each of the i views includes only the other i−1 members • In the other set: each of the n−i views includes only the other n−i−1 • For constant n: decreases as l increases • For constant l: decreases as n increases
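A small numeric check of this partition probability under assumed parameters (n = 100 processes, view size l = 4):

```python
from math import comb

def pr_partition(n, l, i):
    """Probability that a fixed split into sets of size i and n - i is
    closed under the views: each view of size l, drawn uniformly from
    the other n - 1 processes, stays inside its own side."""
    if i - 1 < l or n - i - 1 < l:
        return 0.0   # a side too small to hold a whole view
    inside  = comb(i - 1, l) / comb(n - 1, l)
    outside = comb(n - i - 1, l) / comb(n - 1, l)
    return inside ** i * outside ** (n - i)

for i in (5, 10, 50):
    print(i, pr_partition(100, 4, i))
```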
Analysis – Partitioning • Pr[no partition up to round r] = (1 − Σi Pr[partition of size i])^r, which decreases very slowly with r • Design: l can be chosen as a function of some required probability of not partitioning • In practice, add a hierarchy – a set of processes that are always known to everyone
Age-based message purging • Replaces the random truncation of the events buffer • Each event e has an age parameter • Initialized to 0 by the application • Incremented by every gossiping process • While |events| > |events|max: • First remove the smallest-id events among events from the same source • Then remove the oldest events, according to their age
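A sketch of this purging order, assuming events carry source, id, and age attributes; the way the two rules interleave is one plausible reading of the slide:

```python
def purge_events(events, events_max):
    """Age-based purging: while the buffer overflows, first drop the
    smallest-id event among events that share a source, then fall
    back to dropping the oldest event by age."""
    while len(events) > events_max:
        by_source = {}
        for e in events:
            by_source.setdefault(e.source, []).append(e)
        crowded = [es for es in by_source.values() if len(es) > 1]
        if crowded:
            victim = min(crowded[0], key=lambda e: e.id)
        else:
            victim = max(events, key=lambda e: e.age)
        events.remove(victim)
```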
Age-based message purging • Evaluation: • Delivery ratio: ratio between the number of messages delivered and the number of messages sent per round • Redundancy: fraction of duplicate messages received by a given process in a given round • Throughput: measured with respect to stability – a message is considered stable once delivered by 90% of the processes • Fault tolerance: delivery ratio in the presence of faults
Frequency-based membership purging • With random truncation, a new member has the same probability of being removed as a well-known member • Each element in subs has a frequency parameter • Incremented each time it is received • Truncating: let avg = the average frequency in the list, then: 1. Choose a random element from the list 2. If its frequency > k·avg, remove this element 3. Otherwise, increment its frequency and go to step 1
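A sketch of this frequency-based truncation, modeling subs as a dict from process id to frequency counter; k = 3 is an assumed threshold factor:

```python
import random

def truncate_subs(subs, max_size, k=3):
    """Frequency-based truncation: repeatedly pick a random element,
    remove it if its frequency exceeds k times the average, and
    otherwise bump its frequency and try again."""
    while len(subs) > max_size:
        avg = sum(subs.values()) / len(subs)   # average frequency in the list
        p = random.choice(list(subs))          # 1) choose a random element
        if subs[p] > k * avg:                  # 2) frequent enough to remove
            del subs[p]
        else:
            subs[p] += 1                       # 3) bump its frequency, retry
```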
Frequency-based membership purging • Evaluation: • Propagation delay: number of informed processes as a function of the round number • Membership management: number of times membership information about a process is seen by others. Measured on processes removed from the subs buffer
References • Epidemic Algorithms for Replicated Database Maintenance [Demers et al. 1987] • Bimodal Multicast [Birman et al. 1999] • Directional Gossip: Gossip in a Wide Area Network [Lin and Marzullo 1999] • Lightweight Probabilistic Broadcast [Eugster et al. 2003]