High Throughput Byzantine Fault Tolerance. Ramakrishna Kotla, Mike Dahlin. Laboratory for Advanced Systems Research, The University of Texas at Austin
Summary of the talk • High throughput is achievable along with Byzantine fault tolerance • Contributions • High Throughput BFT Architecture • CBASE : Generic Prototype • CBASE-FS : High throughput replicated NFS Department of Computer Sciences, UT Austin
Outline • Overview • Architecture • Implementation • Evaluation • Conclusion
Motivation • Large-scale Internet services • High Availability : 24x7 service • High Reliability : Correctness • High Security : Data integrity/Confidentiality • High Throughput : System load • Challenges : Byzantine failures • Malicious attacks • http://www.cert.org • Software and operator errors • ROC@USITS03 • Network and hardware failures
BFT State Machine Replication
BFT state machine replication • [Figure: Clients; four Server Replicas, each with an Agreement stage and an Execution stage] • Byzantine Fault Tolerance Protocol • Tolerates f Byzantine server failures using 3f+1 replicas • Agreement stage : Orders requests from clients • Execution stage : Executes requests • Provides high availability, reliability and security • PBFT, Farsite, Oceanstore [OSDI99, OSDI01, SOSP01, SOSP03]
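The 3f+1 bound on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and are not part of PBFT or any library mentioned in the talk.

```python
def replicas_needed(f: int) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine failures (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults_tolerated(n: int) -> int:
    """Largest f such that 3f + 1 <= n, for a group of n replicas."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        print(f"f = {f}: {replicas_needed(f)} replicas")
```

For example, four replicas tolerate one Byzantine failure, and adding a fifth or sixth replica does not raise that number; a seventh does.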
BFT : Trade off throughput for fault tolerance?
Traditional BFT : Limitations • Fails to provide high throughput • Does not scale with hardware resources or application parallelism • Reason • Uses generalized state machine replication • Correctness conditions : • Agreement : Every non-faulty state machine replica receives every request • Order : Every non-faulty state machine replica processes the requests in the same relative order • BFT state machine replication : • Executes requests sequentially to ensure order
High Throughput BFT : Idea • Modify Order without compromising consistency/safety • Relaxed order : Every non-faulty replica executes dependent requests in the same relative order • Dependent requests : Two requests are dependent if the read set or write set of one intersects the write set of the other • Requests that are not dependent can be executed concurrently • Exploit application parallelism to provide high throughput • Commercial applications such as web servers, file systems, and databases have inherent data parallelism
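The dependence definition on this slide translates directly into a set-intersection test. The sketch below is illustrative only: the `Request` type and its field names are assumptions made for the example, not the data structures used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request record: a name plus the objects it reads and writes."""
    name: str
    read_set: frozenset = frozenset()
    write_set: frozenset = frozenset()


def dependent(a: Request, b: Request) -> bool:
    """True if the read or write set of one request intersects the write set of the other."""
    a_touches = a.read_set | a.write_set
    b_touches = b.read_set | b.write_set
    return bool(a_touches & b.write_set) or bool(b_touches & a.write_set)


read_a = Request("read A", read_set=frozenset({"A"}))
read_a_again = Request("read A again", read_set=frozenset({"A"}))
write_a = Request("write A", write_set=frozenset({"A"}))
write_b = Request("write B", write_set=frozenset({"B"}))
```

Under this test two reads of the same object are independent, a read and a write of the same object are dependent, and writes to different objects are independent, so only the dependent pairs need to keep their agreed relative order.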
HT BFT : Architecture • Goals : • Generic : Generic interface that exposes application parallelism • Extensible : Easily extensible to support any application • Modular : Support different fault models easily • Reuse : Reuse existing agreement protocols • [Figure: four Server Replicas, each with a Parallelizer between its Agreement and Execution stages]
Parallelizer • Application-independent module • Receives ordered requests from the agreement stage • Maintains and updates a dependency graph of requests • Two-level dependency analysis • Concurrency matrix • Schedules a request if it does not depend on any outstanding request (no outgoing edges at its node) • Requests that are not dependent are executed concurrently
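The scheduling rule on this slide can be sketched as a small dependency graph in which each request keeps the set of earlier outstanding requests it must wait for. This is a single-threaded illustration under stated assumptions: the class name, its methods, and the `conflicts` predicate are hypothetical and do not come from the CBASE code.

```python
class DependencyGraph:
    """Tracks outstanding requests in agreement order and the edges between them."""

    def __init__(self, dependent):
        self._dependent = dependent   # predicate supplied by the application
        self._outstanding = {}        # request id -> (request, ids it waits on)
        self._running = set()         # ids already handed to the execution stage
        self._next_id = 0

    def insert(self, request):
        """Add a request; it gets an edge to every earlier outstanding request it depends on."""
        req_id = self._next_id
        self._next_id += 1
        waits_on = {rid for rid, (earlier, _) in self._outstanding.items()
                    if self._dependent(earlier, request)}
        self._outstanding[req_id] = (request, waits_on)
        return req_id

    def ready(self):
        """Ids of requests with no outgoing edges that have not been started yet."""
        return [rid for rid, (_, waits_on) in self._outstanding.items()
                if not waits_on and rid not in self._running]

    def start(self, req_id):
        self._running.add(req_id)

    def complete(self, req_id):
        """Remove a finished request and the edges that pointed at it."""
        del self._outstanding[req_id]
        self._running.discard(req_id)
        for _, waits_on in self._outstanding.values():
            waits_on.discard(req_id)


def conflicts(a, b):
    """Hypothetical rule: same object and at least one of the two operations is a write."""
    return a[1] == b[1] and "write" in (a[0], b[0])


graph = DependencyGraph(conflicts)
w_x = graph.insert(("write", "x"))
r_y = graph.insert(("read", "y"))
r_x = graph.insert(("read", "x"))   # must wait for the earlier write to x
first_batch = graph.ready()         # w_x and r_y can run concurrently
graph.start(w_x)
graph.start(r_y)
graph.complete(w_x)
second_batch = graph.ready()        # r_x is released once w_x has completed
```

Because a request only ever waits on requests inserted before it, the graph stays acyclic and dependent requests run in the order fixed by agreement.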
Parallelizer : Concurrency Matrix • Definition : Square matrix whose rows and columns represent operations • 1 marks a pair of independent operations, 0 a pair of dependent operations • Exports application-level parallelism • Statically defined • Two matrices, because dependency also depends on the objects involved • Related objects • Unrelated objects • Table lookup • Low overhead
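A concurrency matrix lookup might look like the sketch below. The operation names, the matrix entries, and the choice of "same object" as the relatedness test are all assumptions made for illustration; a real application would define its own operations and rules.

```python
# Hypothetical operations, indexed by row/column position in the matrices.
OPS = {"read": 0, "write": 1}

# 1 = independent, 0 = dependent.
# Matrix used when the two requests touch related objects (assumed rules).
RELATED = [
    [1, 0],   # read  vs (read, write)
    [0, 0],   # write vs (read, write)
]

# Matrix used when the two requests touch unrelated objects (assumed rules).
UNRELATED = [
    [1, 1],
    [1, 1],
]


def same_object(obj_a, obj_b):
    """Assumed relatedness test: objects are related only if they are equal."""
    return obj_a == obj_b


def independent(op_a, obj_a, op_b, obj_b, related=same_object):
    """Pick the matrix by object relatedness, then do a constant-time table lookup."""
    matrix = RELATED if related(obj_a, obj_b) else UNRELATED
    return matrix[OPS[op_a]][OPS[op_b]] == 1
```

Since the matrices are static, each dependence check costs one relatedness test and one table lookup, which is consistent with the low overhead claimed on the slide.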
Parallelizer : Dependence Analysis • [Figure: agreement stage, input queue, dependency graph, multithreaded execution stage]
Advantages/Limitations • Advantages : • Supports high throughput applications • Simple : Minimal or no changes to the client, agreement protocol, or application • Flexible : Supports different fault models easily • Limitations : • Defining the concurrency matrix requires knowledge of the inner workings of the application • Conservative rules ensure correctness at the expense of performance • Rules can be refined incrementally to gain performance
System Model • Asynchronous system • Nodes operate at arbitrarily different speeds • Network may delay, drop or deliver messages out of order • Assumption : Bounded fair links • Fault Model : Byzantine Faults • Faulty nodes may behave arbitrarily : crash, lose/alter data, send incorrect messages • Adversary : Strong adversary • Can coordinate faulty nodes in arbitrarily bad ways • Assumption : Computationally limited
CBASE : Concurrent BASE • Uses unmodified PBFT agreement protocol [OSDI 1999] • Built upon BASE library [SOSP 2001] • Agreement stage : Single thread • Execution stage : Multithreaded • Parallelizer : Producer/Consumer queue • Figure ??
Parallelizer : Interface • Parallelizer.insert() • Parallelizer.next_request() • Parallelizer.sync()
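The slide lists only the three method names, so the sketch below fills in plausible semantics as assumptions: `insert()` is called by the agreement thread, `next_request()` blocks a worker thread until some request has no outstanding dependencies, and `sync()` waits until every outstanding request has finished. The `done()` method and the `conflicts` predicate are hypothetical additions needed to make the example run; they are not taken from the talk.

```python
import threading


class Parallelizer:
    """Producer/consumer queue that releases only requests with no outstanding dependencies."""

    def __init__(self, dependent):
        self._dependent = dependent
        self._cond = threading.Condition()
        self._outstanding = []   # [request, started] pairs in agreement order

    def insert(self, request):
        """Producer side: called with requests in the order fixed by agreement."""
        with self._cond:
            self._outstanding.append([request, False])
            self._cond.notify_all()

    def _pick(self):
        # A request is runnable if it has not started and no earlier outstanding
        # request (waiting or running) conflicts with it.
        for i, (request, started) in enumerate(self._outstanding):
            if started:
                continue
            if not any(self._dependent(earlier, request)
                       for earlier, _ in self._outstanding[:i]):
                return self._outstanding[i]
        return None

    def next_request(self):
        """Consumer side: block until a runnable request exists, then hand it out."""
        with self._cond:
            while True:
                entry = self._pick()
                if entry is not None:
                    entry[1] = True
                    return entry[0]
                self._cond.wait()

    def done(self, request):
        """Hypothetical helper (not on the slide): mark a handed-out request as finished."""
        with self._cond:
            for i, (candidate, started) in enumerate(self._outstanding):
                if started and candidate is request:
                    del self._outstanding[i]
                    break
            self._cond.notify_all()

    def sync(self):
        """Block until all outstanding requests have finished."""
        with self._cond:
            while self._outstanding:
                self._cond.wait()


def conflicts(a, b):
    """Hypothetical rule: same object and at least one of the two operations is a write."""
    return a[1] == b[1] and "write" in (a[0], b[0])


parallelizer = Parallelizer(conflicts)
log = []


def worker():
    while True:
        request = parallelizer.next_request()
        log.append(request)          # stands in for executing the request
        parallelizer.done(request)


requests = [("write", "x"), ("read", "y"), ("read", "x")]
for request in requests:
    parallelizer.insert(request)
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()
parallelizer.sync()
```

After `sync()` returns, `log` holds all three requests, and the write to `x` always appears before the read of `x`, while the read of `y` may land anywhere.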
CBASE-FS : BFT NFS • Figure • Brief description of NFS concurrency matrix rules • Related objects : Same NFS handle • Rules are conservative • Refer to the paper for more details
Evaluation • 4 server replicas that tolerate 1 Byzantine failure • Each replica runs on a separate uniprocessor machine • 933 MHz P3, 256 MB RAM • 5 client machines • Dedicated network with a 100 Mbps Ethernet hub • OS : Red Hat Linux 7.2 with NFS 2.0 • Assumption : No correlated failures due to the OS
Microbenchmark : Overhead • BASE versus CBASE
Microbenchmark : Scalability • Scalability with hardware resources • Scalability with application level parallelism
Microbenchmark : CBASE-FS/BASE-FS/NFS • Latency versus Throughput with no sleep • Latency versus Throughput with 20 ms sleep • Iozone results summary
Macrobenchmarks • Postmark : • Andrew :
Conclusions • Commercial applications have parallelism • High throughput BFT provides a simple/flexible solution to achieve high throughput
Questions? • Why don’t you have a parallelizer in the agreement stage to reduce agreement cost?