Fault-Tolerant Distributed Computing: Atomic Broadcast
Outline • Why distributed computing? • Atomic Broadcast • The Atom system • Relevance for e-textiles • What’s next? • Q&A
Why Distributed Computing? • Spread and balance the computational weight of applications • Solve bigger problems • Deal with problems locally instead of centralizing all the data
Example • Space filtering vs. raw consensus • Acoustic Beam Forming: master collects information from slaves and decides according to the relevance of data • Consensus: no master, all processes decide upon one common value
Atomic Broadcast: Definition (1) • Atomic Broadcast = the same set of messages is delivered by all the processes in the same order • Consensus = all processes decide upon one common value among those proposed
Atomic Broadcast: Definition (2) • Validity: If a correct process broadcasts a message m, it will eventually deliver it • Uniform agreement: If a process delivers a message m, then every correct process will eventually deliver it • Uniform integrity: Every message m is delivered at most once, and only if it was previously broadcast by sender(m) • Total order: If two correct processes p and q both deliver messages m and m’, then p delivers m before m’ iff q delivers m before m’
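The integrity and total-order properties can be checked mechanically on recorded delivery logs. The following is a minimal sketch, assuming each process keeps a list of the messages it delivered in order; the function names and the log representation are hypothetical and are not part of any system described in these slides.

```python
def check_uniform_integrity(log, broadcast_msgs):
    """True if no message is delivered twice and every delivered
    message was actually broadcast by some sender."""
    no_duplicates = len(log) == len(set(log))
    all_broadcast = all(m in broadcast_msgs for m in log)
    return no_duplicates and all_broadcast


def check_total_order(log_p, log_q):
    """True if the messages delivered by both p and q appear in the
    same relative order in both logs (assumes no duplicates)."""
    common = set(log_p) & set(log_q)
    order_p = [m for m in log_p if m in common]
    order_q = [m for m in log_q if m in common]
    return order_p == order_q


def check_agreement(logs):
    """True if all (correct) processes delivered the same set of
    messages once the run has finished."""
    delivered_sets = [set(log) for log in logs]
    return all(s == delivered_sets[0] for s in delivered_sets)
```

Note that `check_total_order` only compares messages that both processes delivered, which matches the "if p and q both deliver m and m’" wording of the property.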
Atomic Broadcast: Bad News • Impossible to achieve deterministically in a totally asynchronous system where even one process may crash [Fischer, Lynch, Paterson 85]
Atomic Broadcast: Good News • Can be done using unreliable failure detectors • Based on a Consensus algorithm described in [Chandra, Toueg 96]
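To make "unreliable failure detector" concrete, here is a minimal sketch of a heartbeat-based detector that may suspect a live process by mistake and later trust it again. The class name, the doubling of the timeout after a false suspicion, and the explicit `now` argument are assumptions made for illustration; they are not taken from Atom or from the cited paper's pseudocode.

```python
class HeartbeatFailureDetector:
    """Suspects a peer when no heartbeat arrived within its timeout.

    Suspicions can be wrong (the peer may only be slow), which is why
    the detector is called unreliable.
    """

    def __init__(self, peers, initial_timeout=1.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_seen = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        self.last_seen[peer] = now
        if peer in self.suspected:
            # False suspicion: trust the peer again and wait longer
            # before suspecting it next time.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Update and return the set of currently suspected peers."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

Passing the time in explicitly keeps the sketch deterministic; a real deployment would read a clock and send heartbeats over the network.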
Atom • Open source Atomic Broadcast system
[Diagram: Atom architecture. Labels: Producer, Consumer, Atom, A-broadcast, A-deliver, AB task 1, AB task 2, AB task 3, RB (R-broadcast), FD (suspect, trust), transmission, do_Consensus, One_run, do_decide, start, cancel]
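The three AB tasks in the diagram suggest the usual reduction of atomic broadcast to consensus: A-broadcast hands the message to reliable broadcast, R-delivered messages are collected, and repeated consensus runs decide which batch to A-deliver next. The sketch below simulates that idea in a single Python process. It assumes this is how Atom is organized; all names are hypothetical, and `consensus` is a trivial stand-in for a real consensus algorithm.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.r_delivered = set()   # messages received via reliable broadcast
        self.a_delivered = []      # messages delivered in the agreed order

    def undelivered(self):
        return self.r_delivered - set(self.a_delivered)


def r_broadcast(processes, msg):
    # Stand-in for reliable broadcast: every process R-delivers msg.
    for p in processes:
        p.r_delivered.add(msg)


def consensus(proposals):
    # Stand-in for a real consensus instance: every caller gets the same
    # decision, and the decision is one of the proposed batches.
    for pid in sorted(proposals):
        if proposals[pid]:
            return proposals[pid]
    return frozenset()


def run_consensus_round(processes):
    # Each process proposes its batch of not-yet-A-delivered messages.
    proposals = {p.pid: frozenset(p.undelivered()) for p in processes}
    decided = consensus(proposals)
    # Everyone A-delivers the decided batch in the same deterministic order.
    for p in processes:
        for msg in sorted(decided):
            if msg not in p.a_delivered:
                p.a_delivered.append(msg)
    return decided


def simulate():
    procs = [Process(pid) for pid in (1, 2, 3)]
    # Messages "a" and "b" have reached the processes in different orders.
    procs[0].r_delivered.add("b")
    procs[1].r_delivered.add("a")
    procs[2].r_delivered.update({"a", "b"})
    run_consensus_round(procs)
    # Reliable broadcast eventually completes for both messages.
    r_broadcast(procs, "a")
    r_broadcast(procs, "b")
    run_consensus_round(procs)
    return [p.a_delivered for p in procs]
```

Although the three processes R-deliver "a" and "b" in different orders, `simulate()` leaves all of them with the same A-delivery sequence, because the order is fixed by the consensus decisions rather than by arrival order.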
Relevance to E-textiles • Synchronization of data • Coordination of decisions and actions • Light-weight process • Buffer sizes can be predicted
What’s Next? • Scalability is a problem for classic fault-tolerant distributed algorithms • Bimodal Multicast [Ken Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, Yaron Minsky, 1998] • Gossip protocol • Relaxes the “strong” reliability guarantees, replacing them with probabilistic guarantees • Converges to “strong” reliability in the absence of failures • Scalable with steady throughput
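The probabilistic flavor of gossip can be illustrated with a small simulation. This is plain push gossip, not the Bimodal Multicast protocol itself; the function name, the `fanout` and `loss` parameters, and the seeded random generator are assumptions chosen to keep the sketch short and repeatable.

```python
import random


def push_gossip(n, fanout, loss=0.0, seed=0, max_rounds=1000):
    """Simulate push gossip among n processes, starting from process 0.

    Each round, every informed process sends the message to `fanout`
    randomly chosen processes; each send is lost with probability `loss`.
    Returns the number of informed processes after each round.
    """
    rng = random.Random(seed)
    informed = {0}
    history = [1]
    while len(informed) < n and len(history) <= max_rounds:
        newly_informed = set()
        for _sender in informed:
            # Targets may include the sender or already-informed peers.
            for target in rng.sample(range(n), fanout):
                if rng.random() >= loss:
                    newly_informed.add(target)
        informed |= newly_informed
        history.append(len(informed))
    return history
```

Calling `push_gossip(50, 3, loss=0.2)` and inspecting the returned list shows how coverage grows round by round even when some sends are lost, which is the kind of probabilistic guarantee the slide refers to.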