
Systems Research


Presentation Transcript


  1. Systems Research Barbara Liskov October 2007

  2. Replication • Goal: provide reliability and availability by storing information at several nodes

  3. Single Server [Diagram: clients send requests to a single server]

  4. Single Server [Diagram: the single server has failed (X); clients can no longer be served]

  5. Replicated Servers [Diagram: several servers; one has failed (X), but clients can still reach the others]

  6. Replication Issues • Semantics • What is being replicated • Failure assumptions

  7. Issue 1: Semantics • One-copy consistency • Or weaker [Diagram: clients and replicated servers]

  8. Issue 2: Type of Operations • Only reads and writes • General operations, e.g. acct.deposit($$); acct.withdraw($$$);
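
The distinction matters because a general operation like a deposit is a read-modify-write, not a blind write. A minimal Python sketch (illustrative names, not from the slides) of the update that can be lost when a deposit is replicated as a plain write:

```python
# Sketch: a deposit computed as read-then-write can lose a concurrent deposit.
balance = 100

def deposit_as_write(old_balance, amount):
    return old_balance + amount        # value computed from a stale read

# Two clients both read 100, each deposits 10, and each writes back 110:
write_a = deposit_as_write(balance, 10)
write_b = deposit_as_write(balance, 10)
balance = write_a                      # first write-back
balance = write_b                      # second overwrites it: one deposit lost
print(balance)                         # 110, not 120
```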

  9. Replication protocols • Data replication • Quorums and voting • Operations • State machine replication • System performs a sequence of operations

  10. Issue 3: Failure Assumptions • Network is asynchronous • Eventual delivery • Network is malicious • Corruption • Replay • Spoofing • Handled via cryptography • Nodes are failstop or Byzantine

  11. Failstop Failures • Nodes fail by crashing • A machine is either working correctly or it is doing nothing! • The assumption made in the 1980s

  12. Failstop failures • Requires 2f+1 replicas • Operations must intersect in at least one replica • In general we want availability for both reads and writes: quorums of f+1 nodes suffice • Read and write quorums
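
A small sketch of the arithmetic behind these quorum sizes, using nothing beyond what the slide states: with n = 2f+1 replicas and read/write quorums of f+1, any read quorum and any write quorum overlap in at least one replica, so a read always sees the most recent write.

```python
# Quorum intersection check for the failstop setting (n = 2f + 1, quorums of f + 1).
def min_overlap(n: int, read_q: int, write_q: int) -> int:
    """Smallest possible intersection of a read quorum and a write quorum."""
    return read_q + write_q - n

for f in range(1, 4):
    n = 2 * f + 1
    q = f + 1
    assert min_overlap(n, q, q) >= 1
    print(f"f={f}: n={n}, quorum size={q}, guaranteed overlap={min_overlap(n, q, q)}")
```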

  13. Quorums [Diagram: three replicas (2f+1 with f = 1), each with its own state; clients send write A to all; one replica (X) is down]

  14. Quorums [Diagram: A is now stored at the two reachable replicas; the failed replica (X) missed the write]

  15. Quorums [Diagram: clients send write B while one replica (X) is down; any f+1 replicas that accept B overlap the f+1 that hold A]

  16. Data Replication • R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979 • D.K. Gifford, Weighted voting for replicated data, SOSP 1979 • H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM, Jan. 1995

  17. Quorum Consensus • Each data item has a version number • A sequence of values • write(d, val, v#) • Waits for f+1 oks • read(d) returns (val, v#) • Waits for f+1 matching v#’s • Else does a write-back
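
A minimal in-memory sketch of the read/write protocol above; it is an illustration of the idea, not the protocol from the cited papers. The Replica and QuorumClient classes are hypothetical, and the first f+1 replicas stand in for "some quorum that responded" (a real system would contact any f+1 replicas over the network).

```python
# Hypothetical in-memory model of quorum consensus over one data item.
from dataclasses import dataclass

@dataclass
class Replica:
    value: object = None
    version: int = 0

    def write(self, value, version):
        if version > self.version:          # ignore stale versions
            self.value, self.version = value, version
        return "ok"

    def read(self):
        return self.value, self.version

class QuorumClient:
    def __init__(self, replicas, f):
        self.replicas = replicas            # 2f + 1 replicas
        self.f = f

    def write(self, value):
        _, version = self._read_quorum()    # pick a fresh version number
        oks = [r.write(value, version + 1) for r in self.replicas[: self.f + 1]]
        assert len(oks) >= self.f + 1       # wait for f + 1 oks

    def read(self):
        value, version = self._read_quorum()
        for r in self.replicas[: self.f + 1]:
            r.write(value, version)         # write-back so a quorum agrees
        return value

    def _read_quorum(self):
        answers = [r.read() for r in self.replicas[: self.f + 1]]
        return max(answers, key=lambda a: a[1])   # highest version number wins

replicas = [Replica() for _ in range(3)]    # 2f + 1 replicas with f = 1
client = QuorumClient(replicas, f=1)
client.write("A")
print(client.read())                        # A
```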

  18. State Machine Replication • Replicas must execute operations in the same order • Implies replicas will have the same state, assuming • replicas start in the same state • operations are deterministic
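
A tiny sketch of this property (the account operations are illustrative, not from the slides): the same initial state plus the same deterministic operations in the same order gives the same final state at every replica.

```python
# Two replicas apply the same deterministic operation log in the same order.
def apply(state: dict, op: tuple) -> dict:
    kind, amount = op
    if kind == "deposit":
        return {**state, "balance": state["balance"] + amount}
    if kind == "withdraw":
        return {**state, "balance": state["balance"] - amount}
    raise ValueError(kind)

log = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

replica_a = {"balance": 0}                   # same initial state
replica_b = {"balance": 0}
for op in log:                               # same order
    replica_a = apply(replica_a, op)
    replica_b = apply(replica_b, op)

assert replica_a == replica_b                # identical final state
print(replica_a)
```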

  19. Failstop Replication • Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988 • Thesis, May 1988 • Replication in the Harp file system, S. Ghemawat et al., SOSP 1991 • The part-time parliament, L. Lamport, TOCS 1998 • Paxos made simple, L. Lamport, Nov. 2001

  20. Approach • Use a primary • It orders the operations • Other replicas obey this order

  21. Views • System moves through a sequence of views • Primary runs the protocol • Replicas watch the primary and do a view change if it fails

  22. Normal Case • Client sends request to primary • Primary sends prepare message

  23. Normal Case • Client sends request to primary • Primary sends prepare message • Replicas receive prepare • Send prepare-ok message to the primary

  24. Normal Case • Client sends request to primary • Primary sends prepare message to all • Replicas receive prepare • Send prepare-ok message to the primary • Primary waits for f prepare-oks • Sends response to client

  25. Normal Case • A 2-phase protocol: • Prepare; commit • Only 3 message delays
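
A sketch of slides 22-25 from the primary's side, assuming a hypothetical transport stub; view changes, message formats, and the backups' log handling are omitted.

```python
# Sketch of the normal case as seen by the primary. FakeTransport and
# execute() are stand-ins, not part of the protocol.
class FakeTransport:
    """Pretends every backup immediately answers prepare-ok."""
    def __init__(self, n_backups):
        self.n_backups = n_backups
    def broadcast(self, msg):
        pass                                     # would send to all backups
    def collect(self, kind, op_number, count):
        return [kind] * min(count, self.n_backups)

def execute(request):
    return f"result of {request}"                # deterministic application code

class Primary:
    def __init__(self, transport, f):
        self.transport = transport
        self.f = f
        self.op_number = 0
        self.log = []

    def handle_request(self, request):
        self.op_number += 1                      # primary picks the order
        self.log.append((self.op_number, request))
        self.transport.broadcast(("prepare", self.op_number, request))
        oks = self.transport.collect("prepare-ok", self.op_number, count=self.f)
        assert len(oks) >= self.f                # f backups + primary = f + 1
        return execute(request)                  # response goes back to the client

primary = Primary(FakeTransport(n_backups=2), f=1)
print(primary.handle_request("acct.deposit(10)"))
```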

  26. Byzantine Failures • Nodes fail arbitrarily • They lie, they collude • Causes • Malicious attacks • Non-deterministic software errors

  27. Quorums • 3f+1 replicas are needed to survive f failures • 2f+1 replicas is a quorum • Ensures intersection • The minimum in an asynchronous network
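
The arithmetic behind these numbers, as a small check: two quorums of 2f+1 out of 3f+1 replicas share at least f+1 replicas, and since at most f replicas are faulty, at least one correct replica sits in every intersection.

```python
# Quorum intersection check for the Byzantine setting (n = 3f + 1, quorums of 2f + 1).
for f in range(1, 4):
    n = 3 * f + 1
    quorum = 2 * f + 1
    overlap = 2 * quorum - n                 # worst-case intersection of two quorums
    assert overlap >= f + 1                  # so at least one member is correct
    print(f"f={f}: n={n}, quorum={quorum}, overlap>={overlap}")
```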

  28. Quorums [Diagram: four replicas (3f+1 with f = 1); clients send write A to all; one replica (X) does not respond; A is stored at the other three, a 2f+1 quorum]

  29. Quorums [Diagram: clients send write B while a replica (X) does not respond; B reaches a 2f+1 quorum that overlaps the quorum holding A in at least f+1 replicas]

  30. BFT • M. Castro and B. Liskov, Practical Byzantine fault tolerance and proactive recovery, ACM TOCS, 2002

  31. Strategy • Primary runs the protocol in the normal case • Replicas watch the primary and do a view change if it fails • Key difference: replicas might lie • Solution: add a pre-prepare phase

  32. Normal Case • Client sends request to primary

  33. Normal Case • Client sends request to primary • Primary sends pre-prepare message to all

  34. Normal Case • Client sends request to primary • Primary sends pre-prepare message to all • Why not a prepare message? Because primary might be malicious

  35. Normal Case • Client sends request to primary • Primary sends pre-prepare message to all • Replicas check the pre-prepare and if it is ok: • Send prepare messages to all

  36. Normal Case • Replicas wait for 2f+1 matching prepares • Send commit message to all

  37. Normal Case • Replicas wait for 2f+1 matching prepares • Send commit message to all • Replicas wait for 2f+1 matching commits • Execute operation and send result to client
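
A sketch of slides 32-37 from one replica's side, assuming a hypothetical stub transport; digests, signatures, view numbers, and checkpoints in the real BFT protocol are omitted.

```python
# Sketch of one replica's normal-case phases: pre-prepare -> prepare -> commit.
class StubTransport:
    def broadcast(self, msg):
        pass                                       # would send to all 3f + 1 replicas
    def collect(self, kind, seq, count):
        return [kind] * count                      # pretend enough matching messages arrive

def execute(request):
    return f"result of {request}"                  # deterministic application code

class Replica:
    def __init__(self, transport, f):
        self.transport = transport
        self.f = f

    def on_pre_prepare(self, seq, request):
        # Accept the primary's ordering and tell the other replicas.
        self.transport.broadcast(("prepare", seq, request))
        prepares = self.transport.collect("prepare", seq, count=2 * self.f + 1)
        assert len(prepares) >= 2 * self.f + 1     # prepared

        # Make sure a quorum agrees before executing.
        self.transport.broadcast(("commit", seq, request))
        commits = self.transport.collect("commit", seq, count=2 * self.f + 1)
        assert len(commits) >= 2 * self.f + 1      # committed

        # Execute and send the result to the client.
        return execute(request)

replica = Replica(StubTransport(), f=1)
print(replica.on_pre_prepare(seq=1, request="acct.withdraw(5)"))
```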

  38. Follow-on Work • BASE: using abstraction to improve fault tolerance, R. Rodrigues et al., SOSP 2001 • R. Kotla and M. Dahlin, High throughput Byzantine fault tolerance, DSN 2004 • J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07 • Abd-El-Malek et al., Fault-scalable Byzantine fault-tolerant services, SOSP 05 • HQ replication: a hybrid quorum protocol for Byzantine fault tolerance, J. Cowling et al., OSDI 06

  39. Papers in SOSP 07 • Monday 1:30-3:30 • Zyzzyva: Speculative Byzantine fault tolerance • Tolerating Byzantine faults in database systems using commit barrier scheduling • Low-overhead Byzantine fault-tolerant storage • Attested append-only memory: making adversaries stick to their word • Tuesday: 11:00-12:00 • PeerReview: practical accountability for distributed systems
