Distributed Systems Overview • Ali Ghodsi • alig@cs.berkeley.edu
Replicated State Machine (RSM) • Distributed Systems 101 • Fault-tolerance (partial failures, byzantine failures, recovery, ...) • Concurrency (ordering, asynchrony, timing, ...) • Generic solution for distributed systems: the Replicated State Machine approach • Represent your system as a deterministic state machine • Replicate the state machine • Feed all replicas the same inputs in the same order (sketched below)
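A minimal sketch of the RSM idea in Python; the key-value machine and command format are illustrative, not from the slides. Because the machine is deterministic, replicas that apply the same commands in the same order end up in the same state.

    # A deterministic state machine: same command sequence -> same state.
    class KVStateMachine:
        def __init__(self):
            self.store = {}

        def apply(self, command):
            op, key, value = command
            if op == "put":
                self.store[key] = value
            return self.store.get(key)

    # Two replicas fed the identical ordered log end in identical states,
    # so replication reduces to agreeing on one order of commands.
    log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]
    r1, r2 = KVStateMachine(), KVStateMachine()
    for cmd in log:
        r1.apply(cmd)
        r2.apply(cmd)
    assert r1.store == r2.store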
Total Order Reliable Broadcast, aka Atomic Broadcast • Reliable broadcast • Either all correct nodes get the message or none do (even if the source fails) • Atomic Broadcast • Reliable broadcast with one extra guarantee: all messages are delivered in the same order • Replicated state machine is trivial given atomic broadcast (delivery side sketched below)
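A small sketch of the delivery side, assuming some lower layer has already agreed on a global sequence number per message (that agreement is the hard part): each node buffers out-of-order messages and delivers strictly in sequence, so all correct nodes deliver in the same order.

    # Deliver messages strictly in sequence-number order, buffering gaps.
    class TotalOrderDelivery:
        def __init__(self, deliver):
            self.deliver = deliver      # upcall into the application / RSM
            self.next_seq = 0           # next sequence number to deliver
            self.pending = {}           # seq -> message received out of order

        def on_receive(self, seq, msg):
            self.pending[seq] = msg
            while self.next_seq in self.pending:
                self.deliver(self.pending.pop(self.next_seq))
                self.next_seq += 1

    out = []
    d = TotalOrderDelivery(out.append)
    for seq, msg in [(2, "c"), (0, "a"), (1, "b")]:   # arrives out of order
        d.on_receive(seq, msg)
    assert out == ["a", "b", "c"]                     # delivered in order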
Consensus? • Consensus problem • All nodes propose a value • All correct nodes must agree on one of the proposed values • Must eventually reach a decision (availability) • Atomic Broadcast → Consensus • Broadcast your proposal; decide on the first value delivered • Consensus → Atomic Broadcast • Unreliably broadcast each message to all • Run one consensus instance per round: propose the set of messages seen but not yet delivered • Each round, deliver the decided messages in a fixed order (sketched below) • Hence, Consensus is equivalent to Atomic Broadcast
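A hedged sketch of the second reduction; consensus below is a local stand-in that deterministically picks the lowest-id node's proposal, standing in for a real distributed consensus primitive.

    # Consensus -> atomic broadcast: one consensus per round on the set of
    # messages seen but not yet delivered; deliver the decided set in a
    # deterministic order so every node delivers identically.
    def consensus(proposals):
        # Stand-in: proposals maps node id -> proposed message set.
        return proposals[min(proposals)]

    def run_rounds(rounds):
        delivered = []
        for proposals in rounds:
            decided = consensus(proposals)
            for msg in sorted(decided - set(delivered)):  # fixed delivery order
                delivered.append(msg)
        return delivered

    print(run_rounds([{1: {"a", "b"}, 2: {"b"}},
                      {1: {"c"}, 2: {"a", "c"}}]))        # -> ['a', 'b', 'c']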
Consensus impossible • No deterministic 1-crash-robust consensus algorithm exists in the asynchronous model (the FLP impossibility result) • 1-crash-robust • Up to one node may crash • Asynchronous model • No global clock • No bound on message delay • Life after the impossibility of consensus: what to do?
Solving Consensus with Failure Detectors • Failure detector: a black box that tells us whether a node has failed • Perfect failure detector • Completeness: it eventually tells us when a node has failed • Accuracy (no lying): it never tells us a node has failed if it hasn't • Perfect FD → Consensus (rotating coordinator; a runnable simulation follows):

    xp := input
    for r := 1 to N do
        if r = p then                          // my round: broadcast my estimate
            forall j do send <val, xp, r> to j
        if collect <val, x', r> from r then    // adopt r's value, unless the
            xp := x'                           // perfect FD reports r as failed
    end
    decide xp
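A runnable simulation of the algorithm above, collapsed into one process; the perfect failure detector is modeled by the crashed set, so completeness and accuracy hold trivially. Names are illustrative.

    # Rotating coordinator: in round r, node r broadcasts its estimate and
    # every correct node adopts it; nodes skip a round whose coordinator the
    # (perfect) failure detector reports as crashed. All decide after N rounds.
    def rotating_coordinator(inputs, crashed):
        x = dict(inputs)                     # node id -> current estimate
        for r in sorted(x):                  # round r: node r coordinates
            if r in crashed:
                continue                     # FD: everyone skips round r
            v = x[r]                         # coordinator r's broadcast value
            for p in x:
                if p not in crashed:
                    x[p] = v                 # correct nodes adopt r's estimate
        return {p: x[p] for p in x if p not in crashed}   # decided values

    # Node 2 crashes; all correct nodes still decide the same value.
    print(rotating_coordinator({1: "a", 2: "b", 3: "c"}, crashed={2}))
    # -> {1: 'a', 3: 'a'}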
Solving Consensus • Consensus → Perfect FD? • No: in an asynchronous system we can't tell whether a node actually failed or is just slow! • What's the weakest FD that solves consensus? • The one adding the least assumptions on top of the asynchronous model!
Enter Omega • Leader election • Eventually, every correct node trusts some correct node • Eventually, no two correct nodes trust different correct nodes • Failure detection and leader election are two views of the same thing • Failure detection captures failure behavior: detect failed nodes • Leader election also captures failure behavior: detect correct nodes (a single one, the same at every node) • Formally, a leader elector is an FD that • always suspects all nodes except one (the leader) • ensures certain properties about that node (sketched below)
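A minimal sketch of an Omega-style leader election layered on a failure detector; the rule "trust the lowest-id node you do not suspect" is one common construction, assumed here for illustration. Once suspicions stabilize, all correct nodes converge on the same correct leader.

    # Omega as an FD: suspect everyone except the lowest-id unsuspected node.
    def omega_leader(all_nodes, suspected):
        return min(n for n in all_nodes if n not in suspected)

    nodes = {1, 2, 3, 4}
    print(omega_leader(nodes, suspected={1}))      # -> 2
    print(omega_leader(nodes, suspected={1, 2}))   # -> 3 (after 2 is suspected)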
Weakest Failure Detector for Consensus • Omega is the weakest failure detector for consensus (Chandra, Hadzilacos & Toueg) • How to prove it? • Easy to implement in practice
High-Level View of Paxos • Elect a single proposer using Ω • The proposer imposes its proposal on everyone • Everyone decides • Done! • Problem with Ω • Several nodes might initially consider themselves proposer (contention) • The solution is abortable consensus (sketched below) • A proposer attempts to enforce a decision • It might abort if there is contention (safety) • Ω ensures that eventually one proposer succeeds (liveness)
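A hedged sketch of the abortable-consensus core of single-decree Paxos; message passing is collapsed into direct calls, and while the ballots, majorities, and two phases follow the standard protocol, all names here are illustrative.

    # An acceptor promises (phase 1) and accepts (phase 2) by ballot number.
    class Acceptor:
        def __init__(self):
            self.promised = -1            # highest ballot promised so far
            self.accepted = (-1, None)    # highest (ballot, value) accepted

        def prepare(self, ballot):
            if ballot > self.promised:
                self.promised = ballot
                return ("promise", self.accepted)
            return ("nack", None)         # contention: proposer must abort

        def accept(self, ballot, value):
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return "accepted"
            return "nack"

    # A proposer: abort (return None) unless a majority cooperates in both phases.
    def propose(acceptors, ballot, value):
        replies = [a.prepare(ballot) for a in acceptors]
        promises = [acc for tag, acc in replies if tag == "promise"]
        if len(promises) <= len(acceptors) // 2:
            return None                   # aborted: safety under contention
        prior = max(promises, key=lambda bv: bv[0])     # highest accepted
        chosen = prior[1] if prior[0] >= 0 else value   # must keep prior value
        acks = [a.accept(ballot, chosen) for a in acceptors]
        if sum(ack == "accepted" for ack in acks) > len(acceptors) // 2:
            return chosen                 # decided
        return None                       # aborted: retry with a higher ballot

    accs = [Acceptor() for _ in range(3)]
    print(propose(accs, ballot=1, value="A"))   # -> 'A'
    print(propose(accs, ballot=2, value="B"))   # -> 'A' (chosen value survives)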
Replicated State Machine • Paxos approach (Lamport) • Client sends its command to the leader • Leader runs a Paxos instance to agree on the command (sketched below) • Well understood: many papers and optimizations • Viewstamped replication approach (Liskov) • One leader writes commands to a quorum directly (no Paxos) • When failures happen, use Paxos to agree on the view change • Less understood (see Mazieres' tutorial)
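A sketch of the Paxos approach from the first bullets: the leader runs one consensus instance per log slot, and every replica applies decided commands in slot order. run_paxos_instance is a stand-in for the full protocol (e.g. the propose/accept sketch above).

    def run_paxos_instance(slot, command):
        # Stand-in: a real instance runs prepare/accept among the replicas.
        return command

    class PaxosLeader:
        def __init__(self):
            self.log = []                   # agreed commands, one per slot

        def handle_client(self, command):
            slot = len(self.log)            # next free log slot
            decided = run_paxos_instance(slot, command)
            self.log.append(decided)        # replicas apply the log in slot order
            return slot, decided

    leader = PaxosLeader()
    print(leader.handle_client(("put", "x", 1)))   # -> (0, ('put', 'x', 1))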
Paxos Siblings • Cheap Paxos (LM'04) • Fewer messages • Directly contact a quorum (e.g. 3 nodes out of 5) • If responses from those 3 fail to arrive, expand to all 5 • Fast Paxos (L'06) • Reduces 3 message delays to 2 (from client request to decision) • Clients optimistically write directly to a quorum • Requires a recovery step when concurrent proposals collide
Paxos Siblings • Gaios/SMARTER (Bolosky '11) • Makes logging to disk efficient for crash-recovery • Uses pipelining and batching • Generalized Paxos (LM'05) • Exploits commutative operations in the replicated state machine (commuting commands need not be ordered)
Atomic Commit • Atomic Commit • Commit iff there are no failures and everyone votes commit; otherwise abort • Consensus on Transaction Commit (LG'04) • One Paxos instance per resource manager (RM) decides its Prepared/Aborted vote • Commit only if every instance decided Prepared (sketched below)
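A hedged sketch of the commit rule in Paxos Commit: each RM's vote is decided by its own consensus instance, and the transaction commits iff every instance decides Prepared. The fault-tolerant voting itself is elided.

    # Paxos Commit outcome: votes are the decided values of the per-RM
    # Paxos instances; any non-"prepared" decision aborts the transaction.
    def paxos_commit(votes):
        return "commit" if votes and all(v == "prepared" for v in votes) else "abort"

    print(paxos_commit(["prepared", "prepared"]))   # -> 'commit'
    print(paxos_commit(["prepared", "aborted"]))    # -> 'abort'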
Reconfigurable Paxos • Change the set of nodes • Replace failed nodes • Add/remove nodes (changes the quorum size) • Lamport's idea • The set of nodes is itself part of the state machine's state, changed by ordinary commands (sketched below) • SMART (EuroSys'06) • Naive reconfiguration has many subtle problems (e.g. {A,B,C} -> {A,B,D} while A fails) • Basic idea: run multiple Paxos instances side by side, one per configuration
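A minimal sketch of Lamport's idea under simplifying assumptions: the configuration is mutated by ordinary log commands and takes effect immediately for subsequent commands (real systems delay activation by a pipeline of alpha slots).

    # The node set lives inside the replicated state and is changed by
    # ordinary state-machine commands applied in log order.
    class ReconfigurableRSM:
        def __init__(self, nodes):
            self.nodes = set(nodes)        # current configuration
            self.log = []

        def apply(self, command):
            self.log.append(command)
            op = command[0]
            if op == "add_node":
                self.nodes.add(command[1])
            elif op == "remove_node":
                self.nodes.discard(command[1])
            # other command types would mutate application state as usual

    rsm = ReconfigurableRSM({"A", "B", "C"})
    rsm.apply(("remove_node", "C"))
    rsm.apply(("add_node", "D"))
    print(rsm.nodes)   # quorums are now drawn from {'A', 'B', 'D'}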