Diffusing updates without false rumors

Presented by Alex Kogan

Byzantine environment
• Uncorrupted hosts follow their protocol
• Corrupted hosts can behave arbitrarily
  • Send conflicting updates
  • Fake forwarded messages
  • Conspire and form coalitions
  • Stop sending messages for a limited or unlimited time
• A corruption may occur due to
  • hardware/software failure
  • virus/hacker attack

First shot: signatures
• Every message sent is signed
  • Prevents malicious changes in the forwarded messages
• Does not solve all problems
  • corrupted hosts can still send conflicting updates
• Computationally expensive
  • key distribution
  • signing every message

Outline
• Motivation
• Problem definition
• Performance measures
• Direct verification
  • Lower bounds
  • Random propagation
  • l-Tree propagation
• Path verification
  • Direct diffusion
  • Youngest diffusion
  • Bundle sampling
• Short-Path diffusion

Diffusion problem
• Synchronous fully-connected network
• n hosts (or replicas, or nodes)
  • α correct hosts hold an initial update u
  • up to t corrupted hosts, t < α
• In every round, a correct host h sends up to $F_{out}$ messages
• Goal: cause u to be accepted by all correct hosts
  • Safety: no correct host accepts an update other than u
  • Liveness: every correct host eventually accepts some update, with probability 1

Performance measures
• Delay (diffusion time), D
  • Expected number of rounds until all correct hosts accept u
• Fan-in, $F_{in}$
  • Expected maximum number of messages that any correct host receives from correct hosts in any single round
• Amortized fan-in is defined analogously over multiple rounds

General framework
• When h receives u:
  • h accepts u only once it has evidence of u's veracity
    • what counts as evidence?
  • if h accepts, it is called active for u
    • so all initially updated nodes are active
  • Should h forward u to other hosts?
    • If done "carelessly", forwarding may violate safety
    • Conservative approach: only active nodes forward updates
    • Liberal approach: forward an update without accepting it

Direct verification (by Malkhi et al.)
• h accepts u when it receives t+1 copies of u from different sources
• Conservative approach
  • only active nodes forward updates
• Partitions the diffusion into two phases
  • only initially updated hosts are active
    • slow!
  • other hosts become active and start forwarding updates
    • exponentially fast
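The acceptance rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming messages arrive as (sender, update) pairs; the class name `DirectVerifier` and its methods are hypothetical and not taken from the cited papers.

```python
from collections import defaultdict


class DirectVerifier:
    """Accepts an update once t + 1 distinct senders have delivered it."""

    def __init__(self, t):
        self.t = t
        self.senders = defaultdict(set)  # update -> distinct sender ids seen so far
        self.accepted = set()            # updates this host is active for

    def receive(self, sender, update):
        """Record one copy; return True if the host is now active for `update`."""
        self.senders[update].add(sender)  # repeated copies from one sender count once
        if len(self.senders[update]) >= self.t + 1:
            self.accepted.add(update)
        return update in self.accepted
```

Since at most t hosts are corrupted, any t+1 distinct senders include at least one correct host, which is why counting distinct sources (not copies) preserves safety.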

Gossip partner selection
• Two protocols:
  • Random propagation
  • l-Tree propagation
• Trade off host load (fan-in) against diffusion time (delay)

General results
For any direct verification algorithm A:
• D is $\Omega(t \log(n/\alpha) / F_{out})$
• $D \cdot F_{in} = \Omega(tn/\alpha)$, for $t \ge 2\log n$
  • $F_{in}$: D-amortized fan-in
  • $D \cdot F_{in}$: maximum number of incoming messages at h in D rounds
  • inherent tradeoff between the two measures
    • good delay must incur a high load, and vice versa
  • very discouraging
    • with crash-stop failures, epidemic diffusion achieves $D \cdot F_{in} = O(\log n)$

D is $\Omega(t \log(n/\alpha) / F_{out})$
Proof:
• $m_k$: number of times u is sent by correct hosts in the first k rounds
• $\alpha_k$: number of hosts that have accepted u by the end of the first k rounds
• Then we have:
  • $\alpha_k \le \alpha + m_k / (t+1)$
    • t+1 copies must reach any host before it accepts
  • $m_{k+1} \le m_k + F_{out}\,\alpha_k \le F_{out} \sum_{j=0}^{k} \alpha_j$
    • every correct host sends at most $F_{out}$ messages per round
• Hence, by induction on k, $\alpha_k \le \alpha\,(1 + F_{out}/(t+1))^k$
• Comparing with n: when $k < t \log(n/\alpha) / F_{out}$, we get $\alpha_k < n$ ■
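To get a feel for the bound, the growth cap from the proof can be evaluated numerically. The sketch below assumes the closed form $\alpha_k \le \alpha (1 + F_{out}/(t+1))^k$ from the slide; the function name `min_rounds_lower_bound` is illustrative only.

```python
def min_rounds_lower_bound(n, alpha, t, f_out):
    """Smallest k for which alpha * (1 + f_out / (t + 1)) ** k reaches n.

    No direct verification algorithm can finish in fewer rounds, because
    the number of active hosts can grow by at most this factor per round.
    """
    growth = 1 + f_out / (t + 1)
    k, cap = 0, float(alpha)
    while cap < n:
        cap *= growth
        k += 1
    return k


# n = 1000 hosts, 10 initially active, t = 9, one message per round:
# the cap grows by a factor of 1.1 per round and needs 49 rounds to reach n.
print(min_rounds_lower_bound(1000, 10, 9, 1))
```

In this example $t \ln(n/\alpha)/F_{out} \approx 41.4$, consistent with the asymptotic statement.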

$D \cdot F_{in} = \Omega(tn/\alpha)$
Proof sketch:
• In D rounds, with probability 0.9, h receives fewer than $10\,D\,F_{in}$ messages
  • Markov's inequality
• $I_u$: random subset of hosts that are initially active ($|I_u| = \alpha$)
• $X_h$: number of hosts in $I_u$ that send a message to h during D rounds
• Show that if $10\,D\,F_{in}$ is too small, then w.h.p. h does not become active, i.e., $X_h \le t$

$D \cdot F_{in} = \Omega(tn/\alpha)$ (cont.)
• When $D \le nk/(20 e F_{in})$, $\Pr(X_h \ge k) \le 2\,(10 e D F_{in}/(kn))^k$
• When $t \ge 2\log n$, $\Pr(X_h \ge t) < O(1/n^2)$
• For any host h, when $D \le nt/(20 e F_{in})$, $\Pr(X_h < t) \ge 1 - O(1/n)$
• With probability 0.9, $\Omega(nt/F_{in})$ rounds are required

Random propagation
• In every round, an active host h selects $F_{out}$ targets uniformly at random
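A small simulation can make the two-phase behaviour of Random propagation with direct verification visible. This is a sketch under simplifying assumptions that are not on the slides: corrupted hosts stay silent, so all n simulated hosts are correct, and a host activated in one round starts forwarding in the next. The function name and parameters are illustrative.

```python
import random


def simulate_random_propagation(n, alpha, t, f_out, seed=0, max_rounds=10_000):
    """Return the number of rounds until all n hosts are active, or None."""
    rng = random.Random(seed)
    active = set(range(alpha))               # hosts 0..alpha-1 hold u initially
    heard_from = [set() for _ in range(n)]   # distinct senders seen by each host
    if len(active) == n:
        return 0
    for round_no in range(1, max_rounds + 1):
        newly_active = set()
        for h in sorted(active):
            # Pick f_out distinct targets other than h, uniformly at random.
            for x in rng.sample(range(n - 1), f_out):
                target = x if x < h else x + 1
                if target in active:
                    continue
                heard_from[target].add(h)
                if len(heard_from[target]) >= t + 1:   # direct verification rule
                    newly_active.add(target)
        active |= newly_active               # new hosts start forwarding next round
        if len(active) == n:
            return round_no
    return None
```

For example, `simulate_random_propagation(50, 5, 2, 3, seed=1)` returns the round in which the last host accepted; varying `alpha` shows how strongly the slow first phase depends on the number of initially active hosts.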

l-Tree propagation
• Partition all hosts into blocks of size l ≥ 4t
• Arrange the blocks as nodes of a binary tree
• In every round, an active host h selects $F_{out}$ targets at random from a candidate set
  • l hosts at the root
  • l hosts at h's node
  • 2l hosts in the children of h's node
  • 4l hosts in total
• In Random propagation, l = n
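The candidate set can be computed directly from a host's index. The sketch below assumes details the slides do not specify: hosts are numbered 0..n-1, block b holds hosts b·l..(b+1)·l-1, blocks are laid out as a binary tree in heap order (children of block b are 2b+1 and 2b+2), and a host never selects itself. The function name `candidate_set` is hypothetical.

```python
def candidate_set(h, n, l):
    """Hosts that host h may select as gossip targets in l-Tree propagation."""
    num_blocks = (n + l - 1) // l
    own = h // l
    # Root block, h's own block, and the two child blocks of h's block.
    blocks = {0, own, 2 * own + 1, 2 * own + 2}
    hosts = set()
    for b in blocks:
        if b < num_blocks:                       # leaf blocks have no children
            hosts.update(range(b * l, min((b + 1) * l, n)))
    hosts.discard(h)
    return hosts
```

With l = n there is a single block, so the candidate set is every other host, which matches the remark that Random propagation is the special case l = n.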

Complexity properties: fan-in
Theorem: $F_{in}$ is $O(n F_{out}/l + \log n)$
Proof:
• r: a root host
• Root hosts have the highest $F_{in}$
• Case 1: $4t \le l \le n F_{out}/(12 \log n)$
  • Pr(r gets a message from h) ≤ $F_{out}/(2l)$
  • Expected $F_{in}$ of r ≤ $n F_{out}/(2l)$
  • By a Chernoff bound, Pr($F_{in}$ of r > $2 n F_{out}/(2l)$) ≤ $1/n^2$
  • w.h.p., any root host gets at most $2 n F_{out}/(2l)$ messages

$F_{in}$ is $O(n F_{out}/l + \log n)$ (cont.)
• Case 2: $n F_{out}/(12 \log n) < l$
  • Bound Pr(r receives at least k messages) from above
  • when $k = 2 \cdot 18 \log n$, Pr(r receives at least k messages) ≤ $1/n^2$
  • w.h.p., any root host gets at most $O(\log n)$ messages ■

$F_{in}$ is $O(n F_{out}/l + \log n)$ (cont.)
Corollary: $F_{in}$ in Random propagation is $O(F_{out} + \log n)$
Theorem: the $\log n$-amortized $F_{in}$ in Random propagation is $O(F_{out})$
Proof:
• Bound Pr(h gets at least k messages in $\log n$ rounds) from above
• For $k = 6 F_{out} \log n$, this probability is at most $1/n^2$ ■

Complexity properties: delay
Theorem: D is $O\big((t/F_{out})\,(l/\alpha)^{1-1/(3t)} + \log(l)/F_{out} + (t/F_{out}) \log(n/l)\big)$
Proof sketch:
• Split the analysis into two stages:
  • Expected delay until all root hosts are active
  • Expected delay of propagating updates down the tree

Activating root hosts
• Split into phases
  • in each phase, the number of active hosts doubles
• Count the number of messages required
  • to get the number of rounds, divide by (number of active nodes) · $F_{out}$
• Use coupon collector analysis
  • bounds the number of messages that must be received to collect t + 1 messages from distinct sources
• This stage contributes most to the total delay
  • especially its first phase
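The coupon collector step can be illustrated with the standard partial-collection formula. The sketch assumes an idealized model that the slides do not spell out: every received copy comes from one of m active senders chosen uniformly and independently. Under that assumption the expected number of copies needed to see t+1 distinct senders is $\sum_{i=0}^{t} m/(m-i)$. The function name is illustrative.

```python
from fractions import Fraction


def expected_messages(m, t):
    """Expected copies received before t + 1 distinct senders (out of m) are seen."""
    if t + 1 > m:
        raise ValueError("need at least t + 1 active senders")
    # After i distinct senders, a new copy is from a fresh sender w.p. (m - i) / m.
    return sum(Fraction(m, m - i) for i in range(t + 1))


print(expected_messages(4, 3))    # 25/3: all 4 senders must be collected
print(expected_messages(64, 3))   # barely more than 4 once many hosts are active
```

The gap between the two calls mirrors the slide's remark that the first phase, when the number of active hosts is close to t+1, dominates the delay.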

Propagating updates down the tree
• For a node p and its child s, bound Pr(s is active after $O((t + \log l)/F_{out})$ rounds | p is active)
• Each leaf node has $\log(n/l)$ nodes on the path to the root
• Split into meta-rounds, each of $O((t + \log l)/F_{out})$ rounds
  • in each meta-round, another node on the path becomes active
• This stage contributes a logarithmic factor to the total delay

Direct verification: summary of results
• 4t-Tree: low delay, high fan-in
• Random propagation: high delay, low fan-in
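The tradeoff can be checked by plugging numbers into the fan-in and delay theorems. The sketch below is only indicative: it sets every hidden constant to 1, uses base-2 logarithms, and reads the first delay term as $(t/F_{out})(l/\alpha)^{1-1/(3t)}$ with the last factor as an exponent, which is an assumption about the flattened formula. The function names are hypothetical.

```python
import math


def fan_in_bound(n, f_out, l):
    """n * F_out / l + log n, with the hidden constant set to 1."""
    return n * f_out / l + math.log2(n)


def delay_bound(n, alpha, t, f_out, l):
    """Three-term delay bound, with hidden constants set to 1."""
    activate_roots = (t / f_out) * (l / alpha) ** (1 - 1 / (3 * t))
    down_the_tree = math.log2(l) / f_out + (t / f_out) * math.log2(n / l)
    return activate_roots + down_the_tree


n, alpha, t, f_out = 1024, 8, 4, 2
for name, l in (("4t-Tree", 4 * t), ("Random propagation", n)):
    print(name, round(delay_bound(n, alpha, t, f_out, l), 1), fan_in_bound(n, f_out, l))
```

For these parameters the 4t-Tree has the smaller delay expression and the larger fan-in expression, and Random propagation the reverse.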

Path verification (by Minsky and Schneider)
• Liberal approach
  • allows forwarding u without accepting it
• Track the path through which u is gossiped
  • nodes exchange proposals <u, path>
  • accept only after receiving t+1 proposals with the same u but pairwise disjoint paths
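The acceptance condition can be written as a predicate over a set of proposals. This is a minimal sketch, assuming a proposal is an (update, path) pair and a path is a sequence of host ids; the function name `is_satisfying_subset` is illustrative and not from the cited paper.

```python
from itertools import combinations


def is_satisfying_subset(proposals, t):
    """True if `proposals` justify acceptance: at least t + 1 proposals,
    all for the same update, with pairwise disjoint paths."""
    if len(proposals) < t + 1:
        return False
    if len({update for update, _ in proposals}) != 1:
        return False
    return all(
        set(p).isdisjoint(q)
        for (_, p), (_, q) in combinations(proposals, 2)
    )
```

Because at most t hosts are corrupted and the paths share no host, at least one of the t+1 paths contains only correct hosts, so the update it carries is genuine.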

Example (figure): overlapping paths vs. disjoint paths

Proposals management
• Storing and exchanging all possible proposals is impractical
• Two sub-protocols for managing proposals
  • selection protocol
    • chooses a single distinguished proposal at each host
  • sampling protocol
    • gathers selected proposals from a set of selected hosts
    • targets are selected uniformly at random

General scheme (figure): selection, then sampling; if a satisfying subset is found, accept

Direct diffusion
Selection:
  if h accepted u then d_h = (u, ø) else d_h = ⊥
Sampling (given partner j):
  if d_j = (u, ø) then D_h = D_h ∪ {(u, [j])}
• This is Random propagation!
• Every proposal with a non-empty path is discarded

Improving performance
• Non-⊥ proposals selected by random nodes should contain disjoint paths
• Such proposals should change quickly
  • prevents "bad" distributions from persisting
• Shorter paths tend to have fewer nodes in common
• Try: select the proposal with the shortest path
  • many nodes may hold the same update with the same short path
• Better: select a proposal based on its "age"

Youngest selection
If h is a source:
  d_h = (u, ø), age_h = 0
Otherwise:
  Initially: d_h = ⊥, age_h = ∞
  Given partner j:
    if (age_j ≤ age_h) then d_h = d_j::j
    age_h = min(age_h, age_j) + 1
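The selection rule can be sketched as a small Python class. Several readings are assumptions rather than facts from the slide: `d_j::j` is taken to mean appending j to the end of the path in d_j, the age update is taken to run on every exchange (not only when the proposal is replaced), and an empty partner proposal is never adopted. The class name `YoungestSelector` is hypothetical.

```python
import math


class YoungestSelector:
    """Keeps one distinguished proposal per host, preferring younger ones."""

    def __init__(self, host_id, update=None):
        self.host_id = host_id
        self.is_source = update is not None
        # A source holds (u, empty path) at age 0; others start with no proposal.
        self.proposal = (update, ()) if self.is_source else None
        self.age = 0 if self.is_source else math.inf

    def gossip_with(self, partner):
        """Process one exchange with `partner` (another YoungestSelector)."""
        if self.is_source:
            return
        if partner.proposal is not None and partner.age <= self.age:
            update, path = partner.proposal
            self.proposal = (update, path + (partner.host_id,))
        self.age = min(self.age, partner.age) + 1
```

For instance, a host that first learns u via a two-hop path and later gossips directly with a source switches to the one-hop proposal, because the source's age 0 is the youngest possible.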

Simple sampling
• Queue y_h holds the S most recent proposals
If h is not a source:
  Initially: y_h is empty
  Given partner j with d_j ≠ ⊥:
    y_h.enqueue(d_j::j)
    if (|y_h| > S) then y_h.dequeue()
Youngest diffusion = Youngest selection + Simple sampling
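In Python, the bounded queue of the S most recent proposals maps naturally onto `collections.deque` with `maxlen`, which drops the oldest entry automatically. The sketch assumes proposals are (update, path) tuples and that `d_j::j` appends j to the path; the class name `SimpleSampler` is illustrative.

```python
from collections import deque


class SimpleSampler:
    """Keeps the S most recently sampled proposals of a non-source host."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)   # oldest sample is evicted when full

    def sample(self, partner_id, partner_proposal):
        """Store the partner's selected proposal, extended by the partner's id."""
        if partner_proposal is None:            # partner has no proposal yet
            return
        update, path = partner_proposal
        self.samples.append((update, path + (partner_id,)))
```

Each gossip round contributes at most one entry, which is the limitation that motivates bundle-based sampling.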

More efficient sampling
• Simple sampling obtains at most one proposal per round
• Hosts may share their sampled proposals
  • sampling may then obtain multiple proposals per round
• Which proposals to keep?
  • based on path length?
    • remaining samples are not likely to change
  • based on the proposal's age?
    • corrupted nodes may displace many legitimate proposals
  • based on the sample's age!

Bundle accumulation
Initially: bundle_h is empty
Given partner j:
  bundle_h = UpdateBundle(bundle_h ∪ bundle_j::j ∪ {(d_h, 0)})
UpdateBundle(bundle) = {(d, sAge+1) | (d, sAge) ∈ bundle ∧ sAge < SA}
• SA controls the age of the oldest sample
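The bundle update can be expressed with Python sets. The sketch makes assumptions beyond the slide: a bundle is a set of ((update, path), sample_age) pairs with paths stored as tuples, `bundle_j::j` appends j to every path in the partner's bundle, and an empty own proposal is simply not added. All function names are hypothetical.

```python
def update_bundle(bundle, max_sample_age):
    """Age every sample by one round and drop samples that reached the limit SA."""
    return {(d, age + 1) for d, age in bundle if age < max_sample_age}


def extend_paths(bundle, j):
    """Append partner j to the path of every proposal in its bundle."""
    return {((update, path + (j,)), age) for (update, path), age in bundle}


def accumulate(bundle_h, bundle_j, j, d_h, max_sample_age):
    """One accumulation step of host h after gossiping with partner j."""
    merged = set(bundle_h) | extend_paths(bundle_j, j)
    if d_h is not None:
        merged.add((d_h, 0))        # h's own selected proposal enters at sample age 0
    return update_bundle(merged, max_sample_age)
```

Filtering on the sample's age rather than on the proposal's age means a corrupted partner cannot keep its samples alive by misreporting how old a proposal is, since the receiving host does the ageing itself.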

Bundle accumulation (cont.)
• Samples are collected from ≤ SA hosts
• SA must be greater than t
  • otherwise, termination is impossible
• Space complexity is exponential in t (on the order of $2^t$)
• Solution: keep a queue of bundles
  • Requires $O(t \cdot 2^{SA})$ space
  • but now SA can be arbitrarily small!

Bundle sampling
Initially: y_h is empty
Given partner j:
  y_h.enqueue(bundle_j::j)
  if (|y_h| > S) then y_h.dequeue()

Final protocol
Run:
• Youngest selection
• Bundle accumulation
• Bundle sampling

Local computation time
• How do we find t+1 disjoint paths?
  • Independent set ⇔ disjoint proposals
  • NP-hard problem
  • practical only for small values of t and small numbers of proposals
Example (figure: graph on nodes 1, 2, 3, 4), with one proposal per node whose path consists of that node's edges:
  p1: (u, {(1,2), (1,3)})
  p2: (u, {(1,2), (2,3), (2,4)})
  p3: (u, {(1,3), (2,3), (3,4)})
  p4: (u, {(2,4), (3,4)})
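A brute-force search shows why this step is only practical for small inputs: it tries every subset of the required size. The sketch uses the four example proposals from this slide, represented as sets of path elements; the function name `find_disjoint` is illustrative, and exhaustive search is just the simplest possible method, not necessarily the one used by the protocol's authors.

```python
from itertools import combinations


def find_disjoint(proposals, k):
    """Return the names of k pairwise disjoint proposals, or None if none exist.

    `proposals` maps a name to the set of elements on its path. The search
    examines up to C(len(proposals), k) subsets, which grows exponentially.
    """
    for combo in combinations(proposals.items(), k):
        if all(a.isdisjoint(b) for (_, a), (_, b) in combinations(combo, 2)):
            return [name for name, _ in combo]
    return None


example = {
    "p1": {(1, 2), (1, 3)},
    "p2": {(1, 2), (2, 3), (2, 4)},
    "p3": {(1, 3), (2, 3), (3, 4)},
    "p4": {(2, 4), (3, 4)},
}
print(find_disjoint(example, 2))   # ['p1', 'p4']
print(find_disjoint(example, 3))   # None
```

The only disjoint pair is p1 and p4, matching the only non-adjacent pair of nodes (1 and 4) in the example graph.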

Direct vs. Path verification (comparison table)
* An algorithm with the same analytical bounds, but using much larger messages, exists

Short-Path diffusion (by Malkhi et al.)
• Keep and send all proposals with path length < $\log(n/(t+1))$
  • similar to Direct diffusion, but with longer paths
• Optimal analytical delay and delay · fan-in product: $O(t + \log n)$
• But message and storage size is $O\big((n/t)^{O(\log(t+\log n))}\big)$
  • non-exponential, but grows faster than any polynomial in n
  • finding disjoint paths is computationally expensive

Delay analysis outline
Denote: $b = t+1$, $b_k = b/2^k$
• gossip-circle C(p, d, r): the set of correct hosts that received u originated at p over "good" paths of length up to d in r rounds
• Initially active $b_k$ hosts create disjoint low-diameter (up to $2\log n/(b\,b_k)$) gossip-circles of size $n/(4 b_k)$ in $O(b + \log n/(b\,b_k))$ rounds

Delay analysis outline (cont.)
• Given such disjoint gossip-circles, it takes an expected $4 b_k$ rounds for a correct host h to receive u from $b_k/2$ disjoint gossip-circles
  • Coupon collector ($b_k/2$ coupons out of $b_k$)
• It takes $O(b + \log n)$ rounds to receive b disjoint-path copies of u
  • By induction on $b_k$, for $k = 0, \dots, \log b - 1$

Delay analysis outline (cont.)
• For any constant c, $(n-b)(1-1/c)$ hosts are active in expected $O(b + \log n)$ rounds
  • Markov's inequality
• Choose a particular value for c, e.g., c = 2
  • Assuming $b < n/60$, $(n-b) \cdot \tfrac{1}{2} > \tfrac{2}{5} n$
• If at least $\tfrac{2}{5} n$ correct hosts are active, then within expected $O(b + \log n)$ rounds all hosts become active
  • Chernoff bound
• The expected delay is $O(b + \log n)$

References
• D. Malkhi, Y. Mansour and M.K. Reiter, "On diffusing updates in a Byzantine environment", SRDS, 1999
• D. Malkhi, Y. Mansour and M.K. Reiter, "Diffusion without false rumors: on propagating updates in a Byzantine environment", Theoretical Computer Science, 2003
• Y.M. Minsky and F.B. Schneider, "Tolerating malicious gossip", Distributed Computing, 2003
• D. Malkhi, E. Pavlov and Y. Sella, "Optimal Unconditional Information Diffusion", SRDS, 2001

Questions? Thank you!
