Capacity of Agreement with Finite Link Capacity. Guanfeng Liang @ Infocom 2011, Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. Joint work with Prof. Nitin Vaidya.
Capacity of Agreement with Finite Link Capacity Guanfeng Liang @ Infocom 2011 Electrical and Computer Engineering University of Illinois at Urbana-Champaign Joint work with Prof. Nitin Vaidya
Motivation • Distributed systems are emerging • Cloud computing (e.g. Windows Azure), distributed file systems, data centers, multiplayer online games • Large number of distributed components • Distributed components need to be coordinated
Motivation • Distributed primitives • Clock synchronization • Mutual exclusion • Agreement • etc. • Large body of literature in Distributed Algorithms
Motivation • A networking guy asks: “How would constraints of the network affect the performance of these primitives?” • An algorithm guy replies: “……” • Network-aware distributed algorithm design
Byzantine Agreement (BA): Broadcast • A sender wants to send a message to n-1 receivers • Fault-free receivers must agree • If the sender is fault-free, they agree on its message • Up to f nodes may fail
Why agreement? • Distributed systems are failure-prone • Non-malicious: crashed nodes, buggy code • Malicious: an attacker tries to crack the system • A system that is robust against faults must maintain a consistent state
Impact of the Network • How does capacity (rate region) of the network affect agreement performance? • How to quantify the impact?
Rate Region • Defines the way “links” may share the channel • The interference that links cause each other determines whether a set of transmissions can succeed together
“Ethernet” Rate Region • Rate S1 + Rate S2 ≤ C [Figure: sender S and receivers 1, 2 on a shared channel; axes Rate S1 and Rate S2]
Point-to-Point Network Rate Region • Rate ij ≤ Capacity ij • Each directed link is independent of other links [Figure: nodes S, 1, 2]
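The two rate regions above can be read as membership tests on a vector of link rates. The following is a minimal sketch under that reading; the function names, the dictionary layout, and the numeric capacities are illustrative assumptions, not taken from the talk.

```python
def in_ethernet_region(rates, shared_capacity):
    """Shared-channel ("Ethernet") region: the sum of all link rates is at most C."""
    return all(r >= 0 for r in rates.values()) and sum(rates.values()) <= shared_capacity


def in_p2p_region(rates, link_capacity):
    """Point-to-point region: each directed link is limited only by its own capacity."""
    return all(0 <= r <= link_capacity.get(link, 0) for link, r in rates.items())


# Hypothetical example: sender S transmits to receivers 1 and 2.
rates = {("S", 1): 0.6, ("S", 2): 0.6}
p2p_caps = {("S", 1): 1.0, ("S", 2): 1.0}

print(in_ethernet_region(rates, shared_capacity=1.0))  # False: 0.6 + 0.6 exceeds C = 1
print(in_p2p_region(rates, p2p_caps))                  # True: each link stays within its own capacity
```

The same rate vector is infeasible on the shared channel but feasible on independent links, which is why the choice of rate region matters for agreement throughput.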
Capacity of Agreement • b(t) = # bits agreed in [0,t] • Capacity of agreement: supremum of achievable throughput for a given rate region
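One way to write this definition out, assuming that throughput means the long-run average rate of agreed bits (the limit form and the symbols T, C_BA, and R are assumptions; the slide only defines b(t)):

```latex
\[
  T \;=\; \liminf_{t \to \infty} \frac{b(t)}{t},
  \qquad
  C_{\mathrm{BA}}(\mathcal{R}) \;=\; \sup \bigl\{\, T : T \text{ is achievable under rate region } \mathcal{R} \,\bigr\}.
\]
```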
Upper Bound of Capacity in P2P Networks • NC1: C ≤ min-cut(S, X | f receivers removed) • NC2: C ≤ In(X | f nodes removed) • Example: upper bound = 1+ε [Figure: nodes S, 1, 2, 3 with one link of capacity ε]
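The two necessary conditions can be evaluated by brute force on a small network. The sketch below is one reading of the slide, not the authors' code: NC1 is taken as the minimum over receivers X and over sets of f other receivers of the S-to-X min-cut after removing that set, and NC2 as the minimum total incoming capacity of X after removing any f other nodes. The example topology (all links of capacity 1 except a link 1→2 of capacity ε, scaled to integers) is hypothetical and is not claimed to match the figure.

```python
from collections import deque
from itertools import combinations


def max_flow(cap, src, dst):
    """Edmonds-Karp max-flow over {(u, v): capacity}; its value equals the min-cut."""
    residual = dict(cap)
    for (u, v) in cap:
        residual.setdefault((v, u), 0)
    nodes = {n for edge in residual for n in edge}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow
        # Walk back from dst to src and push the bottleneck amount.
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        flow += push


def nc1_bound(cap, sender, receivers, f):
    """Assumed reading of NC1: min over X and removed sets of min-cut(S, X)."""
    cuts = []
    for x in receivers:
        others = [r for r in receivers if r != x]
        for removed in combinations(others, f):
            kept = {e: c for e, c in cap.items() if not set(e) & set(removed)}
            cuts.append(max_flow(kept, sender, x))
    return min(cuts)


def nc2_bound(cap, sender, receivers, f):
    """Assumed reading of NC2: min incoming capacity of X with f other nodes removed."""
    sums = []
    for x in receivers:
        others = [n for n in [sender] + list(receivers) if n != x]
        for removed in combinations(others, f):
            sums.append(sum(c for (u, v), c in cap.items() if v == x and u not in removed))
    return min(sums)


# Hypothetical network in integer units: capacity 1 -> 10, epsilon -> 1.
ONE, EPS = 10, 1
cap = {("S", 1): ONE, ("S", 2): ONE, ("S", 3): ONE,
       (1, 2): EPS, (2, 1): ONE, (1, 3): ONE, (3, 1): ONE,
       (2, 3): ONE, (3, 2): ONE}

nc1 = nc1_bound(cap, "S", [1, 2, 3], f=1)
nc2 = nc2_bound(cap, "S", [1, 2, 3], f=1)
print(nc1, nc2)  # 11 11, i.e. 1 + epsilon in these units
```

Integer capacities keep the flow arithmetic exact; with floating-point capacities the `> 0` comparison would need a tolerance.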
Classic Solution for Broadcast (fault-free source, one faulty peer) • S sends value v to receivers 1, 2, 3 • Each receiver relays the value it received to the other two receivers; the faulty peer may relay arbitrary values (?) • Each good receiver ends up with [v,v,?] • Majority vote results in the correct result at good receivers [Figure: step-by-step build over nodes S, 1, 2, 3]
Classic Solution for Broadcast (faulty source) • Faulty S sends different values v, w, x to the three receivers • Receivers relay the values they received to each other • Each good receiver ends up with [v,w,x] • Vote result is identical at good receivers [Figure: step-by-step build over nodes S, 1, 2, 3]
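The two scenarios of the classic solution (faulty peer, faulty source) can be simulated in a few lines. This is a minimal sketch for 4 nodes and at most one fault; the `DEFAULT` fallback used when no value has a strict majority, the choice of which peer is faulty, and the assignment of v, w, x to receivers are assumptions made for illustration.

```python
from collections import Counter

DEFAULT = "default"  # hypothetical fallback when no value has a strict majority


def classic_broadcast(sent_by_source, faulty_peer=None, lie="?"):
    """Simulate one run with receivers 1, 2, 3 and at most one faulty node.

    sent_by_source maps each receiver to the value the source sends to it
    (a faulty source may send different values to different receivers).
    faulty_peer, if set, relays `lie` instead of the value it received.
    Returns the decision of every fault-free receiver.
    """
    receivers = [1, 2, 3]
    decisions = {}
    for r in receivers:
        if r == faulty_peer:
            continue
        # Own value from the source plus the values relayed by the other two.
        votes = [sent_by_source[r]]
        for other in receivers:
            if other != r:
                votes.append(lie if other == faulty_peer else sent_by_source[other])
        value, count = Counter(votes).most_common(1)[0]
        decisions[r] = value if count >= 2 else DEFAULT
    return decisions


# Fault-free source, peer 2 faulty: good receivers see [v, v, ?] and decide v.
print(classic_broadcast({1: "v", 2: "v", 3: "v"}, faulty_peer=2))
# Faulty source sending v, w, x: every receiver sees the same multiset of values.
print(classic_broadcast({1: "v", 2: "w", 3: "x"}))
```

In the second case all receivers are fault-free, so they hold the same three values and fall back to the same decision, which is the agreement property the slide refers to.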
Classic Solution in P2P Networks • Whole message is sent on every link • Throughput ≤ slowest link • In the example: throughput ≤ ε, but upper bound = 1+ε [Figure: nodes S, 1, 2, 3 with one link of capacity ε]
Improving Broadcast Throughput • Observation: classic solution is in fact an “error correction code” • “Error detection codes” are more efficient
Error Detection Code • Two-bit value (a, b) • S sends one of the symbols a, b, a+b to each of the receivers 1, 2, 3 • Receivers exchange their symbols, so each holds [a,b,a+b] • Parity check passes at all nodes: agree on (a,b) • Parity check fails at a node if 1 misbehaves (that node holds [?,b,a+b]) • Check fails at a good node if S sends a bad codeword (a,b,z) • Detection alone is not what we want [Figure: step-by-step build over nodes S, 1, 2, 3]
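The parity check in this example can be sketched as follows. The code treats a+b as XOR over single bits and indexes the receivers 0, 1, 2 as the holders of a, b, and a+b respectively; that indexing and the `tamper` parameter are illustrative assumptions.

```python
def encode(a, b):
    """Single-parity code over GF(2): the codeword is (a, b, a XOR b)."""
    return (a, b, a ^ b)


def parity_ok(word):
    """A received word is consistent if its third symbol is the XOR of the first two."""
    x, y, z = word
    return (x ^ y) == z


def fast_round(codeword, tamper=None):
    """Receiver i holds symbol i and relays it to the other two receivers.

    tamper = (relayer, victim, bad_symbol) models a faulty peer that relays a
    wrong symbol to one other receiver. Returns each receiver's check result.
    """
    results = {}
    for r in range(3):
        view = list(codeword)
        if tamper is not None and tamper[1] == r and tamper[0] != r:
            view[tamper[0]] = tamper[2]
        results[r] = parity_ok(view)
    return results


word = encode(1, 0)                        # (1, 0, 1)
print(fast_round(word))                    # all checks pass: agree on (a, b)
print(fast_round(word, tamper=(0, 1, 0)))  # receiver 1 sees (0, 0, 1) and detects the lie
print(fast_round((1, 0, 0)))               # bad codeword from S fails the check everywhere
```

Each link carries a single symbol rather than the whole two-bit value, which is where the efficiency gain over the classic solution comes from; the price is that a failed check only signals misbehavior without identifying who misbehaved.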
Modification • Agree on small pieces of data in each “round” • If X misbehaves with Y in a given round, avoid using the XY link in the next round (for the next piece of data) • Repeat
Algorithm Structure • Fast round (as in the example) • Fast round … • Fast round in which failure is detected • Expensive round to learn new info about failure • Fast round • Fast round … • Expensive round to learn new info about failure • Only fast rounds hereon • After a small number of expensive rounds, failures are completely identified
Algorithm “Analysis” • Many fast rounds • Few expensive rounds • When averaged over time, the cost of expensive rounds is negligible • Average usage of link capacity depends only on the fast round, which is very efficient • Achieves capacity for 4-node networks and symmetric networks
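The averaging argument can be checked numerically. The sketch below assumes a simple cost model that is not from the talk: every round agrees on a fixed number of bits in a fixed fast-round time, and each expensive round adds a fixed extra delay. All numbers are hypothetical.

```python
def average_throughput(total_rounds, expensive_rounds, bits_per_round,
                       fast_time, expensive_time):
    """Agreed bits per unit time when a few rounds incur an extra expensive phase."""
    total_bits = total_rounds * bits_per_round
    total_time = total_rounds * fast_time + expensive_rounds * expensive_time
    return total_bits / total_time


# Hypothetical costs: 2 bits per fast round, expensive phase 100x slower, 3 of them in total.
for rounds in (10, 1_000, 1_000_000):
    print(rounds, average_throughput(rounds, 3, 2, fast_time=1, expensive_time=100))
```

Because the number of expensive rounds stays bounded while the number of fast rounds grows, the average approaches `bits_per_round / fast_time`, the throughput of the fast round alone.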
Open Problems • Capacity of agreement for general rate regions • Even the multicast problem with Byzantine nodes is unsolved (for multicast, the sources are fault-free)
Rich Problem Space • Wireless channel allows overhearing • Transmit to 2 at a high rate, or a low rate? A low rate allows reception at 1 [Figure: nodes S, 1, 2, 3]
Rich Problem Space • Similar questions are relevant for any multi-party computation • Distributed computation + communication: multi-party computing under communication constraints
How many bits are needed? • N nodes, each with a k-bit input • Check if all inputs are identical • At least 1 node “detects” if they are not identical • Intuitive guess: (N-1)k bits • Is it the best we can do? [Figure: nodes 1, 2, 3]
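One scheme that matches the intuitive guess is to have every node except one send its whole input to a single checker. The slide does not say which scheme it has in mind, so this is an assumption; the sketch only counts the bits such a scheme transmits.

```python
def naive_equality_check(inputs, k):
    """Every node except node 0 sends its k-bit input to node 0, which compares.

    Returns (detected_mismatch, bits_sent). Assumes no node is faulty.
    """
    bits_sent = 0
    detected = False
    for value in inputs[1:]:
        bits_sent += k          # one full k-bit input crosses the network
        if value != inputs[0]:
            detected = True     # node 0 is the node that "detects"
    return detected, bits_sent


print(naive_equality_check([0b1011, 0b1011, 0b1001], k=4))  # (True, 8): mismatch found, (N-1)k = 8 bits
print(naive_equality_check([0b1011, 0b1011, 0b1011], k=4))  # (False, 8)
```

The cost is (N-1)k bits regardless of the inputs; whether fewer bits suffice is the question the slide leaves open.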