HQ Replication: Efficient Quorum Agreement for Reliable Distributed Systems
James Cowling (1), Daniel Myers (1), Barbara Liskov (1), Rodrigo Rodrigues (2), Liuba Shrira (3)
(1) MIT CSAIL, (2) INESC-ID and Instituto Superior Técnico, (3) Brandeis University
Byzantine Fault Tolerance
• Reliable client-server distributed systems
  • Server replicated across group of replica machines
  • General operations
  • Bounded number f of Byzantine replicas
• Must ensure correct system state
  • Consistent ordering of client operations
State of the Art
• Approaches:
  • State Machine Replication – BFT
    • 3f+1 replicas
  • Byzantine Quorums – Q/U
    • 5f+1 replicas
    • Increased performance
    • Degradation when writes contend
Contributions
• Low overhead Byzantine Fault Tolerance
  • Performance of Byzantine Quorums without 5f+1 replicas or contention degradation
• Hybrid Quorum scheme for Byzantine Fault Tolerance
  • Quorum approach in normal-case
  • Use Byzantine agreement to resolve write contention
Outline
• Current Approaches
• HQ Replication
• BFT Improvements
• Performance Evaluation
• Conclusions
State Machine Replication
• BFT - Castro and Liskov TOCS ’02
• Operations ordered by primary
• Agreed upon by replicas
[Figure: message flow between Client, Primary, Replica 2, Replica 3 and Replica 4 through the phases Request, Pre-Prepare, Prepare, Commit, Reply]
Byzantine Quorums
• Q/U - Abd-El-Malek et al. SOSP ’05
• Client controlled protocol
  • Replicas order operations independently
• Optimistic
  • Best case one-phase protocol
  • Worst case unbounded
    • Randomized backoff
[Figure: Update and Reply messages between Client and Replicas 1 to 6]
Advantages/Disadvantages
Q/U
• Good
  • Best-case performance
  • One-phase write
  • Low replica load
• Bad
  • 5f+1 replicas
  • Degraded performance when writes contend
BFT
• Good
  • 3f+1 replicas
  • Bounded number of phases
• Bad
  • Higher latency
  • Quadratic communication
HQ Replication
• 3f+1 replicas
• Supports general operations
• No all-to-all communication in normal-case
• BFT used to resolve contention
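The replica counts quoted on the slides so far (3f+1 for BFT and HQ, 5f+1 for Q/U) can be tabulated with a few lines of Python. This is only an arithmetic illustration; the function name `replicas_required` and the `FACTORS` table are hypothetical and not part of any of the systems.

```python
# Multipliers taken from the slides: BFT and HQ need 3f+1 replicas, Q/U needs 5f+1.
FACTORS = {"BFT": 3, "Q/U": 5, "HQ": 3}


def replicas_required(f: int, scheme: str) -> int:
    """Number of replicas a scheme needs to tolerate f Byzantine replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return FACTORS[scheme] * f + 1


if __name__ == "__main__":
    for f in (1, 2, 3):
        row = {scheme: replicas_required(f, scheme) for scheme in FACTORS}
        print(f"f={f}: {row}")
```

For any f ≥ 1 the Q/U group is 2f replicas larger than the BFT or HQ group, which is the cost the Contributions slide refers to.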
HQ Replication
• One-phase read
• Two-phase write
[Figure: Write1, Write1 OK, Write2 and Write2 OK messages between Client and Replicas 1 to 4]
High-level Write Protocol
• Two-phase write protocol
  • Phase 1:
    • Client obtains timestamp grant from each replica
  • Phase 2:
    • Client forms certificate from 2f+1 matching grants
    • Sends to replicas to complete write
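The client-side step of phase 2, forming a certificate from 2f+1 matching grants, might look roughly like the sketch below. The `Grant` tuple and `form_certificate` are hypothetical names, signatures are left out, and "matching" is assumed to mean equal client, sequence number and operation.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional


class Grant(NamedTuple):
    # Simplified, hypothetical grant: no object ID, hash or signature here.
    client_id: int
    seq_no: int
    op: str
    replica_id: int


def form_certificate(grants: List[Grant], f: int) -> Optional[List[Grant]]:
    """Return 2f+1 matching grants from distinct replicas, or None."""
    quorum = 2 * f + 1
    groups = defaultdict(dict)
    for g in grants:
        # Keep at most one grant per replica within each matching group.
        groups[(g.client_id, g.seq_no, g.op)][g.replica_id] = g
    for by_replica in groups.values():
        if len(by_replica) >= quorum:
            ordered = sorted(by_replica.values(), key=lambda g: g.replica_id)
            return ordered[:quorum]
    return None
```

If no group reaches 2f+1 the function returns `None`, which corresponds to the cases where the client cannot go straight to phase 2.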
Grants
• Promise to execute operation at given sequence number
  • Assuming agreement from quorum
• Grant
  • Client ID
  • Object ID
  • Hash over requested operation
  • Sequence Number (timestamp)
  • Replica signature
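The grant fields listed above map naturally onto a small record type. The sketch below is one possible shape, not the authors' wire format: all names are hypothetical, SHA-256 is an assumed choice of hash, and the "signature" is a plain digest standing in for a real replica signature.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    client_id: int
    object_id: int
    op_hash: str      # hash over the requested operation
    seq_no: int       # sequence number (timestamp)
    replica_id: int
    signature: str    # placeholder only, NOT a cryptographic signature


def op_digest(operation: bytes) -> str:
    """Hash over the requested operation (SHA-256 is an assumption)."""
    return hashlib.sha256(operation).hexdigest()


def make_grant(replica_id: int, client_id: int, object_id: int,
               operation: bytes, seq_no: int) -> Grant:
    h = op_digest(operation)
    # Toy stand-in for a signature: a digest over the grant fields.
    payload = f"{replica_id}|{client_id}|{object_id}|{h}|{seq_no}".encode()
    return Grant(client_id, object_id, h, seq_no, replica_id,
                 hashlib.sha256(payload).hexdigest())


def matches(a: Grant, b: Grant) -> bool:
    """Two grants match if they agree on everything except the signer."""
    return (a.client_id, a.object_id, a.op_hash, a.seq_no) == \
           (b.client_id, b.object_id, b.op_hash, b.seq_no)
```

Grants from different replicas for the same write then compare equal under `matches` while carrying distinct signer fields.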
Certificates
• Certificate
  • Quorum (2f+1) matching grants
  • Proves quorum of replicas agree to ordering of operation
  • Uniquely identify client, operation and sequential ordering
• Existence of certificate precludes existence of conflicting certificate
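The last bullet follows from a standard quorum-intersection count. The derivation below is a sketch under two assumptions not spelled out on the slide: the group has exactly n = 3f+1 replicas, and a non-faulty replica never issues two different grants for the same sequence number of an object.

```latex
% Q_1 and Q_2 are the signer sets of two certificates, each of size 2f+1,
% drawn from n = 3f+1 replicas.
\begin{align*}
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - n \\
               &= (2f+1) + (2f+1) - (3f+1) \\
               &= f + 1 .
\end{align*}
% At most f replicas are Byzantine, so at least one replica in
% Q_1 \cap Q_2 is non-faulty. That replica would have had to sign both
% certificates' grants, so the two certificates cannot conflict.
```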
Replica State
• Multiple independent objects
• State per-object
  • Certificate supporting most recent write
  • Operation status
    • Active
      • Write in progress, outstanding grant
    • Quiescent
      • No current write operation
Write Phase 1
• Client sends write request to replicas
  • If quiescent, replica assigns new grant to client
  • If active, replica sends currently outstanding grant
• Several Possibilities
  • All grants match
  • Grants for different client
  • Grants conflict
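Both halves of this slide, the replica's quiescent/active rule and the client's three possible outcomes, can be sketched as follows. The `Replica` class, `handle_write` and `classify` are hypothetical names; grants are reduced to (client, sequence number, operation) tuples and a single object is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grant = Tuple[int, int, str]  # (client_id, seq_no, operation), simplified


@dataclass
class Replica:
    seq_no: int = 0                      # sequence number of the last write
    outstanding: Optional[Grant] = None  # None means quiescent

    def handle_write(self, client_id: int, op: str) -> Grant:
        # Quiescent: assign a new grant to this client.
        if self.outstanding is None:
            self.outstanding = (client_id, self.seq_no + 1, op)
        # Active: return the currently outstanding grant, whoever holds it.
        return self.outstanding


def classify(grants: List[Grant], client_id: int) -> str:
    """Classify the grants a client collected in phase 1."""
    if len(set(grants)) > 1:
        return "grants conflict"
    if grants[0][0] != client_id:
        return "grants for different client"
    return "all grants match"
```

A client that sees "all grants match" can move to phase 2; the other two outcomes correspond to the writeback and contention cases walked through on the later slides.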
Isolated Write
[Animation: client 1 and replicas 1 to 3; frames summarized in order]
• Initially every replica is quiescent: Client ?, Seq No 0, Operation ?
• Client 1 sends Write A to each replica
• Each replica becomes active and holds a grant: Client 1, Seq No 1, Operation A
• Replicas return Grant <1,1,A>1, Grant <1,1,A>2, Grant <1,1,A>3
• Matching grants: Phase 2 write
• Client 1 sends Cert {G1,G2,G3} to each replica
• Each replica executes A
• Each replica returns to quiescent (Client 1, Seq No 1, Operation A) and sends Result A
• Client 1 receives the results: Write Complete
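The isolated-write sequence can be replayed in a small single-process simulation. This is a sketch under simplifying assumptions: the three replicas drawn on the slides are used as the full 2f+1 quorum for f = 1, grants are unsigned tuples, and the names `Replica`, `phase1` and `phase2` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SignedGrant = Tuple[int, int, str, int]  # (client, seq_no, op, replica_id)
QUORUM = 3                               # 2f+1 with f = 1


@dataclass
class Replica:
    rid: int
    seq_no: int = 0
    grant: Optional[Tuple[int, int, str]] = None  # None means quiescent
    log: List[str] = field(default_factory=list)

    def phase1(self, client: int, op: str) -> SignedGrant:
        if self.grant is None:
            self.grant = (client, self.seq_no + 1, op)
        return self.grant + (self.rid,)

    def phase2(self, cert: List[SignedGrant]) -> str:
        bodies = {g[:3] for g in cert}
        signers = {g[3] for g in cert}
        if len(bodies) != 1 or len(signers) < QUORUM:
            raise ValueError("invalid certificate")
        _, seq, op = next(iter(bodies))
        if seq == self.seq_no + 1:   # execute each write only once
            self.log.append(op)
            self.seq_no = seq
            self.grant = None        # back to quiescent
        return f"Result {op}"


replicas = [Replica(i) for i in (1, 2, 3)]
grants = [r.phase1(1, "A") for r in replicas]   # phase 1: collect grants
results = [r.phase2(grants) for r in replicas]  # phase 2: send certificate
```

After the two rounds every replica has executed A exactly once and is quiescent again, matching the final frame of the animation.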
Incomplete Write
[Animation: clients 1 and 2 and replicas 1 to 3; frames summarized in order]
• Initially every replica is quiescent: Client ?, Seq No 0, Operation ?
• Client 1 sends Write A to each replica
• Each replica becomes active and holds a grant: Client 1, Seq No 1, Operation A
• Replicas return Grant <1,1,A>1, Grant <1,1,A>2, Grant <1,1,A>3 to client 1
• Client 1 slow or failed
• Client 2 sends Write B to each replica
• Replicas active: Return current grant (Grant <1,1,A>1, Grant <1,1,A>2, Grant <1,1,A>3)
• Grants for different client: Perform Writeback
• Client 2 sends Cert {G1,G2,G3}, Write B to each replica
• Each replica executes A and becomes quiescent (Client 1, Seq No 1, Operation A)
• Each replica then becomes active for client 2: Client 2, Seq No 2, Operation B
• Replicas return Grant <2,2,B>1, Grant <2,2,B>2, Grant <2,2,B>3
• Matching grants: Phase 2 write
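The writeback path can be replayed in the same style. The sketch assumes, as the frames suggest, that a writeback message carries both the certificate for the stalled write and the new write request, and that a replica applies the certificate before granting the new write. All names are hypothetical, and three replicas again stand in for a 2f+1 quorum with f = 1.

```python
QUORUM = 3  # 2f+1 with f = 1, matching the three replicas on the slides


class Replica:
    def __init__(self, rid):
        self.rid, self.seq_no, self.grant, self.log = rid, 0, None, []

    def write1(self, client, op):
        """Phase 1: grant if quiescent, else return the outstanding grant."""
        if self.grant is None:
            self.grant = (client, self.seq_no + 1, op)
        return self.grant + (self.rid,)

    def apply_cert(self, cert):
        """Execute the write named by a certificate of matching grants."""
        bodies = {g[:3] for g in cert}
        if len(bodies) != 1 or len({g[3] for g in cert}) < QUORUM:
            raise ValueError("invalid certificate")
        _, seq, op = next(iter(bodies))
        if seq == self.seq_no + 1:
            self.log.append(op)
            self.seq_no, self.grant = seq, None

    def writeback(self, cert, client, op):
        """Complete the stalled write, then treat the rest as a phase 1 request."""
        self.apply_cert(cert)
        return self.write1(client, op)


replicas = [Replica(i) for i in (1, 2, 3)]
grants_a = [r.write1(1, "A") for r in replicas]   # client 1 stalls after this
seen_by_2 = [r.write1(2, "B") for r in replicas]  # client 2 gets client 1's grants
grants_b = [r.writeback(seen_by_2, 2, "B") for r in replicas]
```

Client 2 ends up holding three matching grants for its own write at sequence number 2 and can proceed to phase 2 as in the isolated case.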
Write Contention
[Animation: clients 1 and 2 and replicas 1 to 3; frames summarized in order]
• Initially every replica is quiescent: Client ?, Seq No 0, Operation ?
• Client 1 sends Write A; the request reaches the replicas one at a time
• Each replica that receives Write A becomes active with Client 1, Seq No 1, Operation A; the others are still quiescent
• Client 2 sends Write B while Write A is still in flight
• All replicas are now active, but with different grants: two hold Client 1, Seq No 1, Operation A and one holds Client 2, Seq No 1, Operation B
• Replicas return Grant <1,1,A>1, Grant <1,1,A>2, Grant <2,1,B>3
• Conflicting grants: Request resolution
• Resolve Request with Cert {G1,G2,G3} is sent to each replica
• Contention Resolution runs among the replicas
• Each replica executes A
• Each replica executes B