Byzantine fault tolerance
Jinyang Li
With PBFT slides from Liskov
What we’ve learnt so far: tolerate fail-stop failures • Traditional RSM tolerates benign failures • Node crashes • Network partitions • An RSM w/ 2f+1 replicas can tolerate f simultaneous crashes
A traditional RSM • [Figure: a client sends Req-x and Req-y to primary N0; N0 assigns <vid=0, seqno=1, req-x> and <vid=0, seqno=2, req-y>, forwards them to backups N1 and N2, and returns Reply-x and Reply-y to the client] • What must N2 check before executing <vid=0, seqno=1, req>?
A reconfigurable RSM (lab 6) • [Figure: in view 0, the client’s Req-x reaches primary N0 as <vid=0, seqno=1, req-x>; the nodes then run Prepare (vid=1, myh=N0:1), Accept (vid=1, N0:1, {N0,N1}, lastseq=1), and Decide (vid=1, N0:1, {N0,N1}, lastseq=1) to form view 1 = {N0, N1}, and the retried Req-x is assigned <vid=1, seqno=2, req-x>]
Byzantine fault tolerance • Nodes fail arbitrarily • they lie • they collude • Causes • Malicious attacks • Software errors • Seminal work is PBFT • Practical Byzantine Fault Tolerance, M. Castro and B. Liskov, SOSP 1999
What does PBFT achieve? • Achieves sequential consistency (linearizability) if no more than f replicas are faulty • Tolerates f faults in a 3f+1-replica RSM • What does that mean in a practical sense?
Practical attacks that PBFT prevents (or not) • Prevents consistency attacks • E.g. a bad node fools clients into accepting a stale bank balance • Protection is achieved only when <= f nodes fail • With enough bad nodes, an attacker can fool clients into accepting an arbitrary bank balance, not just a stale one. • If an attacker hacks into one server, how likely is he to hack into the rest? • Does not prevent attacks like: • Turning a machine into a botnet node • Stealing SSNs from servers
Why doesn’t traditional RSM work with Byzantine nodes? • Cannot rely on the primary to assign seqno • Malicious primary can assign the same seqno to different requests! • Cannot use Paxos for view change • Paxos uses a majority accept-quorum to tolerate f benign faults out of 2f+1 nodes • Does the intersection of two quorums always contain one honest node? • Bad node tells different things to different quorums! • E.g. tell N1 accept=val1 and tell N2 accept=val2
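The quorum-intersection problem above can be checked with a line of arithmetic. The sketch below is my own illustration, not from the slides: with N = 2f+1 nodes, Paxos's two majority quorums of size f+1 may overlap in only a single node, and nothing stops that node from being the Byzantine one.

```python
# Minimal overlap of two Paxos accept-quorums under N = 2f+1 (illustrative).
f = 1
N = 2 * f + 1          # 3 nodes
quorum = f + 1         # majority quorum of 2

# Two quorums of size q out of N overlap in at least 2q - N nodes.
min_overlap = 2 * quorum - N
assert min_overlap == 1    # a single node -- possibly the bad one

# That lone intersection node can report accept=val1 to one quorum and
# accept=val2 to the other, so both values get "decided": the agreement
# conflict walked through on the following slides.
```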
Paxos under Byzantine faults • [Figure: N0 sends Prepare (vid=1, myn=N0:1) to N1 and N2; both reply OK, val=null, and record nh=N0:1]
Paxos under Byzantine faults • [Figure: N0 sends Accept (vid=1, myn=N0:1, val=xyz); N2 replies OK, but the message to N1 is lost (X); N0 decides Vid1=xyz]
Paxos under Byzantine faults • [Figure: N1 starts a new round with Prepare (vid=1, myn=N1:1); N0 is unreachable (X); the Byzantine node N2 replies OK, val=null, concealing the value it accepted from N0]
Paxos under Byzantine faults • Agreement conflict! • [Figure: N1 sends Accept (vid=1, myn=N1:1, val=abc); N2 replies OK and records nh=N1:1; N1 decides Vid1=abc while N0 has already decided Vid1=xyz]
PBFT main ideas • Static configuration (same 3f+1 nodes) • To deal with malicious primary • Use a 3-phase protocol to agree on sequence number • To deal with loss of agreement • Use a bigger quorum (2f+1 out of 3f+1 nodes) • Need to authenticate communications
BFT requires a 2f+1 quorum out of 3f+1 nodes • [Figure: clients send write A to four servers; server 4 has failed (X), so only servers 1–3 record state A] • For liveness, the quorum size must be at most N - f
BFT Quorums • [Figure: clients now send write B; server 1 has failed (X) and keeps state A while servers 2–4 record B] • For correctness, any two quorums must intersect in at least one honest node: (N-f) + (N-f) - N >= f+1, i.e. N >= 3f+1
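The two quorum conditions above can be checked mechanically. This sketch (my own, not from the slides) encodes the liveness bound (a quorum must be reachable despite f silent nodes) and the safety bound (two quorums must overlap in at least f+1 nodes, so at least one honest node is in both), then confirms that N = 3f+1 with quorums of 2f+1 satisfies both while N = 3f cannot.

```python
# Check the BFT quorum bounds for a given replica count n, fault bound f,
# and quorum size q (illustrative arithmetic, not a protocol implementation).
def liveness_ok(n, f, q):
    # A quorum of size q can always be assembled even if f nodes stay silent.
    return q <= n - f

def safety_ok(n, f, q):
    # Two quorums overlap in >= 2q - n nodes; we need f+1 so one is honest.
    return 2 * q - n >= f + 1

f = 1
n = 3 * f + 1            # 4 replicas
q = 2 * f + 1            # quorum of 3
assert liveness_ok(n, f, q) and safety_ok(n, f, q)

# With only 3f nodes, no quorum size satisfies both conditions at once:
assert not any(liveness_ok(3 * f, f, q) and safety_ok(3 * f, f, q)
               for q in range(1, 3 * f + 1))
```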
PBFT Strategy • Primary runs the protocol in the normal case • Replicas watch the primary and do a view change if it fails
Replica state • A replica id i (between 0 and N-1) • Replica 0, replica 1, … • A view number v#, initially 0 • Primary is the replica with id i = v# mod N • A log of <op, seq#, status> entries • Status = pre-prepared or prepared or committed
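The per-replica state listed above can be sketched in a few lines. This is a minimal illustration with names of my choosing, not code from PBFT; it captures just the id, view number, log, and the v# mod N primary rule.

```python
# Minimal sketch of PBFT per-replica state (illustrative, no networking).
from dataclasses import dataclass, field

N = 4  # 3f+1 replicas for f = 1

@dataclass
class LogEntry:
    op: str
    seq: int
    status: str            # "pre-prepared", "prepared", or "committed"

@dataclass
class Replica:
    i: int                 # replica id, 0 <= i < N
    view: int = 0          # current view number v#, initially 0
    log: list = field(default_factory=list)

    def is_primary(self) -> bool:
        # The primary of view v# is the replica with id v# mod N.
        return self.i == self.view % N

r = Replica(i=0)
assert r.is_primary()      # replica 0 leads view 0
r.view = 1
assert not r.is_primary()  # after a view change, replica 1 leads
```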
Normal Case Client sends request to primary or to all
Normal Case • Primary sends a pre-prepare message to all • Pre-prepare contains <v#, seq#, op> • Records operation in log as pre-prepared • Keep in mind that the primary might be malicious: • Send different seq# for the same op to different replicas • Use the same seq# for different ops
Normal Case • Replicas check the pre-prepare and if it is ok: • Record operation in log as pre-prepared • Send prepare messages to all • Prepare contains <i, v#, seq#, op> • All-to-all communication
Normal Case • Replicas wait for 2f+1 matching prepares • Record operation in log as prepared • Send commit message to all • Commit contains <i, v#, seq#, op> • Trust the group, not the individuals
Normal Case • Replicas wait for 2f+1 matching commits • Record operation in log as committed • Execute the operation • Send result to the client
Normal Case • Client waits for f+1 matching replies
BFT • [Figure: normal-case message flow among Client, Primary, Replica 2, Replica 3, and Replica 4 through the Request, Pre-Prepare, Prepare, Commit, and Reply phases]
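The normal-case steps above can be sketched from the viewpoint of one honest replica. This is a toy sketch of my own (no signatures, one request, reliable delivery assumed): the replica advances from pre-prepared to prepared after 2f+1 matching prepares, and from prepared to committed after 2f+1 matching commits.

```python
# Toy sketch of PBFT normal-case quorum counting at a single honest replica.
f = 1
N = 3 * f + 1

class ReplicaState:
    def __init__(self):
        self.status = None
        self.prepares = set()    # ids of replicas whose matching prepare we saw
        self.commits = set()     # ids of replicas whose matching commit we saw

    def on_pre_prepare(self):
        self.status = "pre-prepared"

    def on_prepare(self, sender):
        self.prepares.add(sender)
        if len(self.prepares) >= 2 * f + 1:
            self.status = "prepared"     # would now broadcast commit

    def on_commit(self, sender):
        self.commits.add(sender)
        if len(self.commits) >= 2 * f + 1:
            self.status = "committed"    # would now execute and reply

r = ReplicaState()
r.on_pre_prepare()
for sender in (0, 1, 2):     # 2f+1 matching prepares
    r.on_prepare(sender)
assert r.status == "prepared"
for sender in (0, 1, 2):     # 2f+1 matching commits
    r.on_commit(sender)
assert r.status == "committed"
```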
View Change • Replicas watch the primary • Request a view change • Commit point: when 2f+1 replicas have prepared
View Change • Replicas watch the primary • Request a view change • Send a do-viewchange request to all • New primary requires 2f+1 requests • Sends new-view with this certificate • Rest is similar
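The view-change trigger above reduces to another quorum count. A toy sketch of my own (no certificates or message contents): the incoming primary collects do-viewchange requests and announces new-view only once 2f+1 have arrived.

```python
# Toy sketch of the new primary collecting a view-change certificate.
f = 1

class NewPrimary:
    def __init__(self, view):
        self.view = view
        self.requests = set()      # ids of replicas requesting the change

    def on_do_viewchange(self, sender):
        self.requests.add(sender)
        # 2f+1 do-viewchange requests form the new-view certificate.
        return len(self.requests) >= 2 * f + 1

p = NewPrimary(view=1)
assert not p.on_do_viewchange(0)   # 1 request: not enough
assert not p.on_do_viewchange(1)   # 2 requests: not enough
assert p.on_do_viewchange(2)       # 2f+1 requests: send new-view
```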
Additional Issues • State transfer • Checkpoints (garbage collection of the log) • Selection of the primary • Timing of view changes
Possible improvements • Lower latency for writes (4 messages) • Replicas respond at prepare • Client waits for 2f+1 matching responses • Fast reads (one round trip) • Client sends to all; they respond immediately • Client waits for 2f+1 matching responses
BFT Performance Table 2: Andrew 100: elapsed time in seconds M. Castro and B. Liskov, Proactive Recovery in a Byzantine-Fault-Tolerant System, OSDI 2000
PBFT inspires much follow-on work • BASE: Using abstraction to improve fault tolerance, R. Rodrigues et al, SOSP 2001 • R. Kotla and M. Dahlin, High Throughput Byzantine Fault tolerance. DSN 2004 • J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07 • Abd-El-Malek et al, Fault-scalable Byzantine fault-tolerant services, SOSP 05 • J. Cowling et al, HQ replication: a hybrid quorum protocol for Byzantine Fault tolerance, OSDI 06 • Zyzzyva: Speculative Byzantine fault tolerance SOSP 07 • Tolerating Byzantine faults in database systems using commit barrier scheduling SOSP 07 • Low-overhead Byzantine fault-tolerant storage SOSP 07 • Attested append-only memory: making adversaries stick to their word SOSP 07
A2M’s main insights • The main worry in PBFT is that malicious nodes lie differently to different replicas • [Figure: one replica decides on Seq1=req-x while another decides on Seq1=req-y]
A2M’s proposal • Introduce a trusted abstraction: attested append-only memory • A2M properties: • Trusted (attacker can corrupt the RSM implementation, but not A2M itself) • Prevents malicious nodes from telling different lies to different replicas
A2M’s abstraction • A2M implements a trusted log • Append: append a value to the log • Lookup: look up the value at position i • End: look up the value at the end of the log • Truncate: garbage-collect old values • Advance: skip positions in the log
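The trusted-log interface above can be sketched as a small class. This is a plain Python stand-in of my own for what would really be a trusted, attested component; only the method names follow the slide. The key property is that each position is written at most once, so a node cannot later show a different value for the same slot to a different replica.

```python
# Illustrative sketch of the A2M trusted-log interface (untrusted Python
# stand-in; a real A2M would attest its answers cryptographically).
class A2MLog:
    def __init__(self):
        self._entries = {}   # position -> value, append-only
        self._next = 0       # next position to fill

    def append(self, value):
        # Positions are assigned monotonically; history is never rewritten.
        self._entries[self._next] = value
        self._next += 1

    def lookup(self, i):
        # Value at position i (None if truncated or skipped).
        return self._entries.get(i)

    def end(self):
        # Value at the end of the log.
        return self._entries.get(self._next - 1)

    def truncate(self, i):
        # Garbage-collect values at positions below i.
        for pos in list(self._entries):
            if pos < i:
                del self._entries[pos]

    def advance(self, i, value):
        # Skip forward to position i; skipped slots can never be filled in.
        if i >= self._next:
            self._next = i
            self.append(value)

log = A2MLog()
log.append("seq1=req-x")
assert log.lookup(0) == "seq1=req-x"
assert log.end() == "seq1=req-x"
# There is no overwrite operation: once position 0 holds req-x, the node
# cannot show a conflicting value for seq 1 to another replica.
```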
What A2M achieves • Smaller quorum size for BFT • Achieves correctness & liveness with <= f Byzantine faults out of only 2f+1 nodes, or • Achieves correctness if <= 2f nodes fail and liveness if <= f nodes fail