CS525 – In Byzantium

The Byzantine Generals Problem • Leslie Lamport, Robert Shostak, and Marshall Pease • ACM TOPLAS 1982 • Presented by Keun Soo Yim, March 19, 2009 • Dr. Lamport • Byzantine • Clock Sync. • Dist. Snapshot

Byzantine Generals Problem (BGP) • A.D. 330 • N generals • Some are traitors • Message passing • Goals • Consensus (same plan) between loyal generals • A small number of traitors cannot cause the loyal generals to adopt a bad plan • Do not have to identify the traitors • [Figure: generals commanding armies of 100K, 50K, 30K, 10K, 20K (commander), and 40K]

BGP in Distributed Systems • A thousand years later… • N computers • Some misbehave • HW fault, SW bug, security attack, misconfiguration • Message passing • Goals • All correct nodes share the same global info. • Ensure that N corrupted nodes cannot change the shared global info., and maximize N • Identification of corrupted nodes would be needed • What is the difference between BGP and consensus algorithms? • Fail-stop vs. fail-silent violation. Design goal.

Naïve Sol. & 3-General Impossibility • Naïve solution • Each general sends its value, v(i), to all others • Each general takes a majority vote over v(1), v(2), …, v(n) • Is it true that no solution with fewer than 3m+1 generals can cope with m traitors? If so, why?
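A minimal sketch of why the naïve majority vote fails, assuming three generals (loyal A and B, traitor T) and a tie-break of "retreat". The function name `naive_decide` and the specific values are illustrative assumptions, not part of the slides. The traitor simply sends a different value to each loyal general, so their vote inputs differ.

```python
from collections import Counter

def naive_decide(values_seen):
    """Majority vote over the values one general collected; ties fall back to 'retreat'."""
    ranked = Counter(values_seen).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "retreat"
    return ranked[0][0]

# Loyal generals A and B start with different opinions.
# Traitor T tells each of them a different value.
seen_by_a = ["attack", "retreat", "attack"]   # A's own value, B's value, T's message to A
seen_by_b = ["attack", "retreat", "retreat"]  # A's value, B's own value, T's message to B

decision_a = naive_decide(seen_by_a)
decision_b = naive_decide(seen_by_b)
print(decision_a, decision_b)  # the two loyal generals end up with different plans
```

The loyal generals never exchange what the traitor told them, which is the gap that the recursive oral-message algorithm closes.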

3m-General Impossibility • If there were a solution for 3m generals with m traitors, it could be reduced to a solution of the 3-general problem, which is impossible • [Figure: the two cases “3m+1<=n” and “3m+1>n”]

Solution I – Oral Messages • Formal definition of OM(m) • Commander broadcasts its value to all lieutenants • Each lieutenant acts as commander of OM(m-1) • n = 4, m = 1 • Traitor lieutenant: L1 and L2 both receive v,v,x (consensus) • L1 and L2 obey C • Traitor commander: all lieutenants receive x,y,z • Lieutenants can tell that the commander is a traitor • What is the communication complexity of this algorithm?
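The recursion above can be sketched as follows. This is an illustrative simulation, not the paper's formal definition: generals are integers with 0 as the commander, `DEFAULT` is an assumed fallback order, and the traitor behavior in `send` (flip the order for odd-numbered receivers) is a hypothetical model chosen only to make the runs deterministic.

```python
from collections import Counter

DEFAULT = "retreat"  # assumed default order when there is no strict majority

def majority(values):
    """Return the strict-majority value, or DEFAULT if none exists."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT

def send(sender, receiver, value, traitors):
    """Hypothetical traitor model: a traitor flips the order for odd-numbered receivers."""
    if sender in traitors and receiver % 2 == 1:
        return "retreat" if value == "attack" else "attack"
    return value

def om(m, commander, lieutenants, value, traitors):
    """Run OM(m) and return {lieutenant: decided value}."""
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received
    # relayed[j][i]: what lieutenant i concludes lieutenant j was told,
    # obtained by letting j act as commander of OM(m-1).
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    decisions = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return decisions

# n = 4, m = 1, loyal commander, lieutenant 3 is the traitor.
print(om(1, 0, [1, 2, 3], "attack", traitors={3}))
# n = 4, m = 1, the commander is the traitor.
print(om(1, 0, [1, 2, 3], "attack", traitors={0}))
```

In the first run the loyal lieutenants 1 and 2 both follow the commander; in the second all lieutenants settle on the same value. Running the same code with only three generals, `om(1, 0, [1, 2], "attack", traitors={2})`, leaves loyal lieutenant 1 without a majority, so it falls back to the default and disobeys a loyal commander, which matches the 3-general impossibility.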

Communication Complexity • O(n^m) invocations, i.e., exponential in m • OM(m) triggers n-1 instances of OM(m-1) • Each OM(m-1) triggers n-2 instances of OM(m-2) • … • OM(m-k) is called (n-1)…(n-k) times • … • OM(0)
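The product (n-1)…(n-k) can be checked numerically. The sketch below counts invocations per recursion level and, as an added assumption about what counts as a message, charges each OM(m-k) invocation one message per lieutenant it addresses (n-1-k of them). The function names are illustrative.

```python
def om_invocations(n, m):
    """Number of OM(m-k) invocations for k = 0..m with n generals."""
    counts = [1]  # OM(m) itself is invoked once
    for k in range(1, m + 1):
        counts.append(counts[-1] * (n - k))  # (n-1)(n-2)...(n-k)
    return counts

def om_messages(n, m):
    """Total messages, assuming an OM(m-k) invocation sends to n-1-k lieutenants."""
    return sum(c * (n - 1 - k) for k, c in enumerate(om_invocations(n, m)))

print(om_invocations(4, 1), om_messages(4, 1))  # [1, 3] 9
print(om_invocations(7, 2), om_messages(7, 2))  # [1, 6, 30] 156
```

Under this counting, the number of messages grows one factor of n faster than the number of OM(0) invocations.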

Solution II – Signed Messages • Can we cope with any number of traitors? If so, how? • Prevent traitors from lying about the commander’s order • Messages are signed by the commander • The signature can be verified by all loyal lieutenants • When a lieutenant receives no new messages, it selects the majority as the desired action • All loyal lieutenants eventually receive the same set of commands • If the commander is loyal, it works • What if the commander is not loyal?
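A rough simulation of the signed-message idea, under several stated assumptions: signatures are modeled as a tuple of signer ids that the code never fabricates (standing in for unforgeable signatures), traitorous lieutenants simply stay silent, general 0 is the commander, and `choice` uses "retreat" as an assumed default when a lieutenant has seen conflicting orders. None of these names come from the slides.

```python
def choice(values):
    """Deterministic choice: the single value if unique, else an assumed default."""
    return next(iter(values)) if len(values) == 1 else "retreat"

def sm(m, lieutenants, commander_orders, traitors):
    """commander_orders maps each lieutenant to the value general 0 signs for it."""
    seen = {l: set() for l in lieutenants}
    # Each in-flight message is (receiver, value, chain of signer ids).
    inbox = [(l, v, (0,)) for l, v in commander_orders.items()]
    while inbox:
        next_round = []
        for receiver, value, chain in inbox:
            if receiver in traitors or value in seen[receiver]:
                continue  # traitors stay silent; known values are not relayed again
            seen[receiver].add(value)
            # chain = commander plus k lieutenants; relay only while k < m
            if len(chain) <= m:
                for other in lieutenants:
                    if other != receiver and other not in chain:
                        next_round.append((other, value, chain + (receiver,)))
        inbox = next_round
    return {l: choice(seen[l]) for l in lieutenants if l not in traitors}

# Three generals, traitorous commander sends conflicting signed orders.
print(sm(1, [1, 2], {1: "attack", 2: "retreat"}, traitors={0}))
# Three generals, loyal commander, lieutenant 2 is a traitor.
print(sm(1, [1, 2], {1: "attack", 2: "attack"}, traitors={2}))
```

With a traitorous commander, both loyal lieutenants end up holding the same set of signed orders and therefore make the same choice; with a loyal commander, the loyal lieutenant sees only the one order the commander signed.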

Discussion Point • Are the assumptions realistic? • Reliable communication channel • Absence of a message can be detected (e.g., timeouts or synchronized clocks) • Failure of a communication line cannot be distinguished from failure of a node • This is acceptable since we tolerate failures of m nodes • Can we determine the origin of a message? Can anyone verify the authenticity of a signature? • Unforgeable signatures using asymmetric cryptography

PeerReview: Practical accountability for distributed systems • Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel • SOSP 2007 • (Acknowledgement: Some of these slides are borrowed from the original authors’ presentation)

Practical Use Case of BGP • Distributed file systems • Many small, latency-sensitive requests (tampering with files, lost updates) • Overlay multicast • Transfers large volumes of data (tampering with content, freeloading) • P2P email • Complex, large, decentralized (denial of service by misrouting) • Not only consensus but also identifying faulty nodes is important!

PeerReview • Providing accountability for distributed systems • Stores all I/O events as a log • Selected nodes are responsible for auditing the log • Assumptions: • System is modeled as deterministic state machines • State machines have reference implementations • Eventual communication • Signed messages

Fault Detection • How to recognize faults in a log? • Assumption • Node can be modeled as a deterministic state machine • To audit a node • Start from a snapshot in the log • Replay inputs to a trusted copy of the state machine • Check outputs against the log • [Figure: logged inputs from the network are replayed into a trusted copy of the state machine (Module A, Module B); its outputs are compared with the logged outputs, and a mismatch indicates a fault]
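The audit steps above can be sketched as a replay loop. This is a toy illustration, not PeerReview's implementation: `counter_machine` is a hypothetical reference implementation (a running sum that outputs its total), and the log is assumed to be a list of (input, output) pairs.

```python
def counter_machine(state, message):
    """Hypothetical reference implementation: a running sum that outputs its total."""
    new_state = state + message
    return new_state, new_state  # (next state, output)

def audit(log, snapshot, step=counter_machine):
    """Replay logged inputs from a snapshot.

    Returns the index of the first entry whose logged output differs from
    the reference implementation's output, or None if the log is consistent.
    """
    state = snapshot
    for index, (logged_input, logged_output) in enumerate(log):
        state, expected_output = step(state, logged_input)
        if expected_output != logged_output:
            return index
    return None

honest_log = [(1, 1), (2, 3), (4, 7)]
faulty_log = [(1, 1), (2, 3), (4, 8)]  # the last output deviates from the reference
print(audit(honest_log, snapshot=0))   # None
print(audit(faulty_log, snapshot=0))   # 2
```

Determinism is what makes this work: given the same snapshot and inputs, any correct copy of the state machine must reproduce the logged outputs exactly.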

Communication Algorithm • All nodes keep a log of their inputs & outputs • Including all messages • Each node has a set of witnesses, who audit its log periodically • If the witnesses detect misbehavior, they • generate evidence • make the evidence available to other nodes • Other nodes check the evidence and report the fault • [Figure: node A sends message M to node B; both record it in their logs, and A's witnesses C, D, and E audit A's log]

Tamper-Proofing Message • What if a node modifies its log entries? • Log entries form a hash chain • Inspired by secure histories [Maniatis02] • Signed hash is included with every message • m_i = (s_i, t_i, c_i) • h_i = H(h_{i-1} || s_i || t_i || H(c_i)) • Commitment protocol • Sender and receiver each commit to their current state • [Figure: B's log as a hash chain H0 to H4 over the entries Send(X), Recv(Y), Send(Z), Recv(M); A sends its message with Hash(log), and B returns an ACK with Hash(log)]
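The chaining rule h_i = H(h_{i-1} || s_i || t_i || H(c_i)) can be sketched with the standard library. The choices below are assumptions made for illustration: SHA-256 as H, 32 zero bytes as the initial hash h_0, an 8-byte big-endian encoding of the sequence number, and dictionary-based log entries. Signing the hash, which the slide also requires, is omitted.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def entry_hash(prev_hash: bytes, seq: int, entry_type: str, content: bytes) -> bytes:
    """h_i = H(h_{i-1} || s_i || t_i || H(c_i)), with an assumed byte encoding."""
    return H(prev_hash + seq.to_bytes(8, "big") + entry_type.encode() + H(content))

def append_entry(log, entry_type: str, content: bytes) -> bytes:
    prev_hash = log[-1]["hash"] if log else bytes(32)  # assumed h_0: 32 zero bytes
    seq = len(log) + 1
    digest = entry_hash(prev_hash, seq, entry_type, content)
    log.append({"seq": seq, "type": entry_type, "content": content, "hash": digest})
    return digest

def verify_chain(log) -> bool:
    """Recompute every hash; any edited, reordered, or dropped entry breaks the chain."""
    prev_hash = bytes(32)
    for expected_seq, entry in enumerate(log, start=1):
        recomputed = entry_hash(prev_hash, entry["seq"], entry["type"], entry["content"])
        if entry["seq"] != expected_seq or recomputed != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

log = []
for entry_type, content in [("SEND", b"X"), ("RECV", b"Y"), ("SEND", b"Z"), ("RECV", b"M")]:
    append_entry(log, entry_type, content)
print(verify_chain(log))      # True
log[1]["content"] = b"Y'"     # tamper with an old entry
print(verify_chain(log))      # False
```

Because each hash covers its predecessor, a node that has already sent out h_4 cannot later rewrite entry 2 without contradicting a hash it committed to.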

Provable Guarantees • Completeness: Faults will be detected • If a node commits a fault and has a correct witness, then the witness obtains • a proof of misbehavior (PoM), or • a challenge that the faulty node cannot answer • Accuracy: Good nodes cannot be accused • If a node is correct • there can never be a PoM, and • it can answer any challenge

Communication Overhead • [Figure: average traffic (Kbps/node, 0 to 100) versus number of witnesses (baseline, 1 to 5), broken down into baseline traffic, signatures and ACKs, and checking logs]

Discussion Point • How would you determine the number of witnesses in a practical system? How would you select them? • PeerReview is the first practically applicable technique for detecting faulty nodes. How, then, can correct nodes reach consensus in a scalable manner?

Zyzzyva: Speculative Byzantine Fault Tolerance Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement and Edmund Wong University of Texas at Austin SOSP 2007 Presented by Hui Xue, UIUC

Motivation: Byzantine Fault Tolerance • Why do we need BFT systems? • Software systems: valuable + not reliable enough • Amazon S3 crashed for hours in 2008 • Reason: one corrupted bit • Akamai central nodes • Hardware: cheaper now • Idea • Use more hardware • Make software systems more reliable

Motivation for Zyzzyva

Assumptions (System Model) • (Almost) asynchronous system • Multicast; unordered • Independent failures • Replicas: at most f are faulty, with any kind of fault • Network: unreliable; can delay, duplicate, corrupt, or drop messages • Sufficiently strong cryptographic techniques • All public keys known by everyone • Need bounded message delay in rare cases (for liveness)

Background: Practical Byzantine Fault Tolerance • PBFT: establish order before execution • Phases: Pre-Prepare, Prepare, Commit, Reply • [Figure: message flow among client, primary, two replicas, and a faulty replica; the primary proposes “Req, # n”, replicas ask “Req, # n?” and then confirm “OK, Req, # n!”] • What is the problem? • 4 network delays before execution • Many messages

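Zyzzyva: Just Do It • Speculative execution: Just do it! • Phases: Pre-Prepare, Spec-Exe, Reply • [Figure: message flow among client, primary, and three replicas; the primary proposes “Req, # n” and every replica executes immediately (“Just do it!”); the client sees matching replies (“GREAT!”)] • Who is making the difference?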

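Client Can Correct Order: Case 1 • Client’s power: confirm that the order is correct • Phases: Pre-Prepare, Spec-exe, Reply • [Figure: client, primary, two replicas, and a faulty replica; the correct replicas execute speculatively (“Just do it!”), and the client then tells them “Order correct now!” so that they commit to this state]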

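Client Can Correct Order: Case 2 • Client’s power: restart the request • Phases: Pre-Prepare, Spec-exe, Reply • [Figure: client, primary, two replicas, and a faulty replica; the correct replicas execute speculatively (“Just do it!”), and the client responds with “Restart Req!”]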

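Client Can Correct Order: Case 3 • Client’s power: change the primary • Phases: Pre-Prepare, Spec-exe, Reply • [Figure: client, primary, and three replicas; the replicas execute speculatively (“Just do it!”), and the client responds with “Change Primary!”]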
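The client-side decision behind the three cases can be sketched as a count of matching speculative replies. The thresholds (3f+1 matching replies to complete, 2f+1 to 3f to send a commit certificate, fewer to retransmit) follow the Zyzzyva paper rather than the slide text, and how they map onto the slide's three cases is an interpretation. The function name and the reply tuples are hypothetical, and the view change that replaces a faulty primary is not modeled.

```python
from collections import Counter

def client_action(f, responses):
    """Decide the client's next step from speculative replies.

    responses: one (sequence_number, history_hash, reply) tuple per replica
    that answered; replicas are assumed to be distinct.
    """
    if not responses:
        return "retransmit"
    matching = Counter(responses).most_common(1)[0][1]
    if matching >= 3 * f + 1:
        return "complete"                 # all replicas agree: fast path
    if matching >= 2 * f + 1:
        return "send-commit-certificate"  # enough agreement to fix the order
    return "retransmit"                   # resend the request to all replicas

ok = (7, "h7", "result")
bad = (7, "hX", "other")
print(client_action(1, [ok, ok, ok, ok]))    # complete
print(client_action(1, [ok, ok, ok, bad]))   # send-commit-certificate
print(client_action(1, [ok, ok, bad, bad]))  # retransmit
```

Moving this check to the client is what lets replicas execute before agreeing among themselves: the client is the party that notices when speculation went wrong.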

Design of Protocol • Other sub-protocols: • Fill hole • Sequence # received: N+4 • Sequence # expected: N+1 < N+4 (hole in between) • Send <FILL-HOLE> to: 1. the primary; 2. if the primary is slow, all replicas
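A minimal sketch of how a replica might detect the hole, assuming it tracks the next sequence number it expects. The function name and the returned tuples are hypothetical and do not reproduce the actual <FILL-HOLE> message format.

```python
def on_order_request(expected_seq, received_seq):
    """Classify an incoming ordered request by its sequence number."""
    if received_seq == expected_seq:
        return ("execute", received_seq)
    if received_seq > expected_seq:
        # Requests expected_seq .. received_seq - 1 are missing: ask for them.
        return ("fill-hole", expected_seq, received_seq - 1)
    return ("ignore", received_seq)  # already handled

N = 10
print(on_order_request(N + 1, N + 4))  # ('fill-hole', 11, 13)
print(on_order_request(N + 1, N + 1))  # ('execute', 11)
```

In the slide's example the replica expects N+1 but receives N+4, so it asks for N+1 through N+3, first from the primary and then, if the primary is slow, from all replicas.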

Optimizations • Separating agreement from execution • Batching requests • Caching out-of-order requests • Read-only operations: 2f+1 consistent replies are enough • Single full response

Performance: Throughput

Performance: Latency

Conclusion • Clever observation: • We can execute before the order is established, hoping we are right. • Pros • Practical, high throughput + low latency • Cons • BFT suffers from deterministic bugs • Malicious behavior may affect performance

Questions • Why is Zyzzyva fast? • What is the main difference between Zyzzyva and previous BFT papers? • What does “zyzzyva” mean? • Do you buy the idea of BFT at all? • Name some examples of BFT in real applications.

Thank you! • This is the end of Zyzzyva • Questions?
