Distributed Systems Overview

Presentation Transcript


  1. Distributed Systems Overview Ali Ghodsi alig@cs.berkeley.edu

  2. Replicated State Machine (RSM) • Distributed Systems 101 • Fault-tolerance (partial failures, Byzantine failures, recovery, ...) • Concurrency (ordering, asynchrony, timing, ...) • Generic solution for distributed systems: the Replicated State Machine approach • Represent your system as a deterministic state machine • Replicate the state machine • Feed input to all replicas in the same order (see the sketch below)
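
A minimal sketch of the slide's recipe in Python: identical deterministic replicas fed the same commands in the same order end up in the same state. The key-value machine and the in-memory "ordered log" are illustrative assumptions, not anything prescribed by the slides.

    # Hypothetical sketch: a deterministic state machine replicated by
    # feeding every replica the same commands in the same order.

    class KVStateMachine:
        """A deterministic key-value store: state depends only on input order."""
        def __init__(self):
            self.store = {}

        def apply(self, command):
            op, key, *rest = command
            if op == "put":
                self.store[key] = rest[0]
            elif op == "get":
                return self.store.get(key)

    # Three replicas of the same machine.
    replicas = [KVStateMachine() for _ in range(3)]

    # A totally ordered log of inputs (in reality produced by atomic broadcast).
    log = [("put", "x", 1), ("put", "y", 2), ("put", "x", 3)]

    # Every replica applies the same log in the same order...
    for cmd in log:
        for r in replicas:
            r.apply(cmd)

    # ...so all replicas end in the same state.
    assert all(r.store == replicas[0].store for r in replicas)
    print(replicas[0].store)   # {'x': 3, 'y': 2}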

  3. Total Order Reliable Broadcast aka Atomic Broadcast • Reliable broadcast • Either all correct nodes deliver the message or none do (even if the source fails) • Atomic Broadcast • A reliable broadcast that additionally guarantees: all messages are delivered in the same order • A replicated state machine is trivial to build on top of atomic broadcast (one simple sketch below)
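
One simple way to realize the total-order guarantee (not what Paxos does, and not fault-tolerant) is a sequencer: a single node assigns sequence numbers and every replica delivers in gap-free sequence order. The Sequencer/Replica names below are made up for illustration, and the sketch ignores the sequencer failing, which is exactly the hard part the rest of the talk addresses.

    # Toy sequencer-based total-order broadcast (illustrative only).
    import heapq

    class Sequencer:
        def __init__(self):
            self.next_seq = 0

        def order(self, msg):
            seq = self.next_seq
            self.next_seq += 1
            return seq, msg

    class Replica:
        def __init__(self):
            self.pending = []         # min-heap of (seq, msg)
            self.next_expected = 0
            self.delivered = []

        def receive(self, seq, msg):
            heapq.heappush(self.pending, (seq, msg))
            # Deliver in gap-free sequence-number order.
            while self.pending and self.pending[0][0] == self.next_expected:
                _, m = heapq.heappop(self.pending)
                self.delivered.append(m)
                self.next_expected += 1

    seqr = Sequencer()
    r1, r2 = Replica(), Replica()
    a = seqr.order("a"); b = seqr.order("b")
    r1.receive(*a); r1.receive(*b)
    r2.receive(*b); r2.receive(*a)    # arrives out of order...
    assert r1.delivered == r2.delivered == ["a", "b"]   # ...delivered in order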

  4. Consensus? • Consensus problem • All nodes propose a value • All correct nodes must agree on one of the proposed values • Must eventually reach a decision (availability) • Atomic Broadcast → Consensus • Broadcast your proposal, decide on the first value received • Consensus → Atomic Broadcast • Unreliably broadcast each message to all nodes • One consensus instance per round: propose the set of messages seen but not yet delivered • Each round, deliver one decided message (sketched below) • So Atomic Broadcast is equivalent to Consensus
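
A sketch of the Consensus → Atomic Broadcast direction described above. The per-round consensus is modeled as a shared oracle dictionary (a stand-in for a real consensus instance); each round every node proposes its set of undelivered messages and delivers one message from the decided set.

    decisions = {}                       # round -> decided proposal (shared oracle)

    def consensus(round_no, proposal):
        # First proposal for a round wins; every node gets the same decision.
        return decisions.setdefault(round_no, proposal)

    class Node:
        def __init__(self):
            self.seen = set()            # messages received via unreliable broadcast
            self.delivered = []          # atomic-broadcast delivery order
            self.round = 0

        def receive(self, msg):
            self.seen.add(msg)

        def step(self):
            # Propose the set of messages seen but not yet delivered;
            # deliver one message from the decided set each round.
            undelivered = self.seen - set(self.delivered)
            if not undelivered:
                return
            decided = consensus(self.round, frozenset(undelivered))
            self.delivered.append(min(decided - set(self.delivered)))
            self.round += 1

    n1, n2 = Node(), Node()
    for m in ["m2", "m1"]:
        n1.receive(m); n2.receive(m)
    for _ in range(2):
        n1.step(); n2.step()
    assert n1.delivered == n2.delivered      # same total order at both nodes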

  5. Consensus impossible • No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model • 1-crash-robust • Up to one node may crash • Asynchronous model • No global clock • No bounded message delay • Life after impossibility of consensus? What to do?

  6. Solving Consensus with Failure Detectors • A black box that tells us whether a node has failed • Perfect failure detector • Completeness: it will eventually tell us if a node has failed • Accuracy (no lying): it will never tell us a node has failed if it hasn't • Perfect FD → Consensus, via a rotating coordinator (simulated in the sketch after the pseudocode):
     xi := input
     for r := 1 to N do
         if r = p then
             forall j do send <val, xi, r> to j
             decide xi
         if collect <val, x', r> from process r then
             xi := x'
     end
     decide xi
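
A round-by-round simulation of that algorithm, with the perfect failure detector modeled as exact knowledge of the crashed set. The global-rounds, no-real-messages style is purely an illustrative assumption.

    def rotating_coordinator(inputs, crashed):
        n = len(inputs)
        x = dict(inputs)                  # process id -> current value
        decision = {}
        for r in range(1, n + 1):         # round r: process r is coordinator
            if r in crashed:
                continue                  # perfect FD: everyone learns r crashed
            for p in x:                   # coordinator's value reaches all...
                if p not in crashed:
                    x[p] = x[r]           # ...and they adopt it
            decision[r] = x[r]            # coordinator decides in its own round
        return decision

    # Processes 1..4 propose different values; process 1 crashes before acting.
    decided = rotating_coordinator({1: "a", 2: "b", 3: "c", 4: "d"}, crashed={1})
    print(decided)                        # every correct process decides 'b'
    assert len(set(decided.values())) == 1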

  7. Solving Consensus • Consensus → Perfect FD? • No: from consensus alone we can't tell whether a node actually failed or is just slow! • What's the weakest FD that solves consensus? • The one that adds the least assumptions on top of the asynchronous model!

  8. Enter Omega • Leader election • Eventually every correct node trusts some correct node • Eventually no two correct nodes trust different correct nodes • Failure detection and leader election are essentially the same thing • Failure detection captures failure behavior: detect failed nodes • Leader election also captures failure behavior: detect correct nodes (a single one, the same for all) • Formally, leader election is a failure detector • It always suspects all nodes except one (the leader) • And it ensures some properties regarding that node (a heartbeat-based sketch follows)
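
The slide states only Omega's abstract properties; a common way to approximate it in practice is heartbeats plus a deterministic rule such as "trust the lowest-id node heard from recently". The class below is a hypothetical sketch of that idea (real implementations also grow their timeouts when they suspect a node wrongly).

    import time

    class OmegaDetector:
        def __init__(self, my_id, all_ids, timeout=2.0):
            self.my_id = my_id
            self.timeout = timeout
            # A node counts as alive only after we hear a heartbeat from it.
            self.last_heard = {i: 0.0 for i in all_ids}

        def on_heartbeat(self, sender_id):
            self.last_heard[sender_id] = time.time()

        def leader(self):
            now = time.time()
            alive = [i for i, t in self.last_heard.items()
                     if i == self.my_id or now - t < self.timeout]
            return min(alive)     # deterministic rule: lowest id wins

    # Usage sketch: node 3 currently hears heartbeats only from node 2.
    d = OmegaDetector(my_id=3, all_ids=[1, 2, 3])
    d.on_heartbeat(2)
    print(d.leader())             # 2, as long as 2's heartbeats keep arriving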

  9. Weakest Failure Detector for Consensus • Omega is the weakest failure detector for consensus • How to prove it? • Easy to implement in practice

  10. High-Level View of Paxos • Elect a single proposer using Ω • The proposer imposes its proposal on everyone • Everyone decides • Done! • Problem with Ω • Several nodes might initially be proposers (contention) • The solution is abortable consensus • A proposer attempts to enforce a decision • It might abort if there is contention (safety) • Ω ensures that eventually one proposer succeeds (liveness); see the sketch below
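
For concreteness, a single-decree Paxos sketch in which one proposer attempt may abort under contention, roughly the "abortable consensus" building block mentioned above. Message passing is modeled as direct method calls; the Acceptor/propose names and the ballot numbering are illustrative assumptions, not the slides' notation.

    class Acceptor:
        def __init__(self):
            self.promised = 0          # highest ballot promised
            self.accepted = (0, None)  # (ballot, value) last accepted

        def prepare(self, ballot):
            # Phase 1b: promise not to accept lower ballots.
            if ballot > self.promised:
                self.promised = ballot
                return True, self.accepted
            return False, self.accepted

        def accept(self, ballot, value):
            # Phase 2b: accept unless a higher ballot was promised.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return True
            return False

    def propose(acceptors, ballot, value):
        """One proposer attempt; returns the chosen value or None (abort)."""
        majority = len(acceptors) // 2 + 1

        # Phase 1: collect promises from a majority.
        promises = [a.prepare(ballot) for a in acceptors]
        granted = [acc for ok, acc in promises if ok]
        if len(granted) < majority:
            return None  # contention: abort, retry later with a higher ballot

        # Adopt the value of the highest-ballot accepted proposal, if any.
        b, v = max(granted)
        if v is not None:
            value = v

        # Phase 2: ask a majority to accept the (possibly adopted) value.
        acks = sum(a.accept(ballot, value) for a in acceptors)
        return value if acks >= majority else None

    accs = [Acceptor() for _ in range(3)]
    print(propose(accs, ballot=1, value="cmd-A"))   # 'cmd-A'
    print(propose(accs, ballot=2, value="cmd-B"))   # still 'cmd-A': already chosen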

  11. Replicated State Machine • Paxos approach (Lamport) • Client sends its input to the Paxos leader • The leader runs a Paxos instance to agree on the command • Well understood: many papers and optimizations • Viewstamped replication approach (Liskov) • One leader writes commands directly to a quorum (no Paxos in the common case) • When failures happen, use Paxos to agree • Less well understood (see the Mazieres tutorial)

  12. Paxos Siblings • Cheap Paxos (LM'04) • Fewer messages • Directly contact a quorum (e.g. 3 nodes out of 5) • If we fail to get responses from those 3, expand to all 5 • Fast Paxos (L'06) • Reduces 3 message delays to 2 • Clients optimistically write directly to a quorum • Requires a recovery step

  13. Paxos Siblings • Gaios/SMARTER (Bolosky'11) • Makes logging to disk efficient for crash-recovery • Uses pipelining and batching • Generalized Paxos (LM'05) • Exploits commutative operations in the replicated state machine

  14. Atomic Commit • Atomic Commit • Commit iff there are no failures and everyone votes commit; otherwise abort • Consensus on Transaction Commit (LG'04) • One Paxos instance for every TM • Only commit if every instance decided Commit (sketched below)
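
A toy sketch of that commit rule: one consensus instance per participant, and the transaction commits only if every instance decides Commit. The decide_vote function is a stand-in for a real Paxos instance (which would also let an instance decide Abort if its participant fails); the participant names are made up.

    def decide_vote(instance_id, proposed_vote):
        # Placeholder for a Paxos instance run among the acceptors, so the
        # instance still decides (defaulting to 'abort') if the participant fails.
        return proposed_vote

    def transaction_outcome(votes):
        """votes: participant -> 'commit' or 'abort' (what each one proposes)."""
        decided = {p: decide_vote(p, v) for p, v in votes.items()}
        return "commit" if all(d == "commit" for d in decided.values()) else "abort"

    print(transaction_outcome({"tm1": "commit", "tm2": "commit"}))  # commit
    print(transaction_outcome({"tm1": "commit", "tm2": "abort"}))   # abort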

  15. Reconfigurable Paxos • Change the set of nodes • Replace failed nodes • Add/remove nodes (changes the quorum size) • Lamport's idea • Make the set of nodes part of the state machine's state (sketch below) • SMART (Eurosys'06) • Many tricky cases to handle (e.g. {A,B,C} -> {A,B,D} and then A fails) • Basic idea: run multiple Paxos instances side by side
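
A tiny sketch of Lamport's idea as stated above: membership is just replicated state, changed by an ordinary command in the agreed log. Immediate activation of the new configuration is a simplifying assumption; systems like SMART delay activation and manage the overlapping instances mentioned in the slide.

    # Membership as part of the replicated state: a 'reconfigure' command in
    # the agreed log replaces the node set (and hence the quorum size).

    class ReplicatedConfig:
        def __init__(self, nodes):
            self.nodes = set(nodes)

        def quorum_size(self):
            return len(self.nodes) // 2 + 1

        def apply(self, command):
            op, arg = command
            if op == "reconfigure":
                self.nodes = set(arg)      # immediate activation: a simplification
            # ... ordinary state-machine commands would be handled here ...

    cfg = ReplicatedConfig({"A", "B", "C"})
    print(cfg.quorum_size())                         # 2 out of {A, B, C}
    cfg.apply(("reconfigure", {"A", "B", "D", "E", "F"}))
    print(cfg.quorum_size())                         # 3 out of the new 5-node set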
