Tolerating Byzantine Faults in Database Systems using Commit Barrier Scheduling

Tolerating Byzantine Faults in Database Systems using Commit Barrier Scheduling • Ben Vandiver, Hari Balakrishnan, Barbara Liskov, and Sam Madden • CSAIL, MIT • Sponsors: Quanta Computer Inc, NSF

Non-crash faults in Databases • Over 50% of reported bugs were non-crash faults • Incorrect answers, data or index corruption, etc. • Previous focus on fail-stop faults • Better model: Byzantine faults

Failure Independence • Heterogeneous replicas • Different implementations / versions • Easiest with non-invasive solution • Requires standard interface • SQL is moderately standard

Client Interaction • Organized into Transactions • Query, Query, …, Commit / Rollback • Interactive • Strong consistency • Single-copy serializable

Database Functionality • Each Database provides • Serializable isolation • Strict (rigorous) 2-phase locking • Databases don’t execute in issue-order • Limited control over execution order • Example: statements issued as S1, S2; Replica 1 executes S1, S2; Replica 2 executes S2, S1

Replica Coordination • BFT well known solution • 3f+1 replicas • Globally order client requests • Replicas execute in order • Exhibits no concurrency • Goal: mechanism to extract concurrency in database context

Architecture • [Diagram: three clients send SQL to the Shepherd, which forwards SQL to replicas DB1, DB2, and DB3]

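Architecture • [Diagram: clients send SQL to the Shepherd and receive results; the Shepherd forwards SQL to DB1, DB2, and DB3 and votes on the results they return; one replica's result is marked "?"] • Need f+1 matching votes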
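The voting step can be sketched in a few lines. This is a minimal illustration, not HRDB's code: the function name `vote` and the representation of answers as hashable values (for example, tuples of rows) are assumptions made for the example.

```python
from collections import Counter

def vote(answers, f):
    """Return the answer reported by at least f+1 replicas, or None.

    `answers` holds one (hashable) answer per replica that has replied so far.
    With at most f faulty replicas, any answer backed by f+1 replicas has
    been reported by at least one non-faulty replica.
    """
    for answer, count in Counter(answers).most_common():
        if count >= f + 1:
            return answer
    return None  # not enough matching votes yet
```

For example, with f = 1, `vote([("alice", 10), ("alice", 10), ("alice", 99)], f=1)` returns `("alice", 10)`, while three mutually different answers return `None`.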

How to extract concurrency? • Just issue statements to replicas • Likely to get stuck • Solution: pre-determine which statements conflict • Inspecting SQL is very hard

Commit Barrier Scheduling • Primary / Secondary Scheme • Run transactions first on the primary • Duplicate primary’s ordering on the secondaries • Works best when primary is Sufficiently Blocking • Required for performance, not correctness

Commit Barrier Scheduling • [Diagram: three clients, the Shepherd, and three DB replicas, one of them marked Primary; SQL flows from clients through the Shepherd to the replicas, and results flow back]

Correct Execution • Statement Ordering Rule • Execute statements of transaction in order • Commit Ordering Rule • All replicas commit transactions in the same order • Order determined by Shepherd

Execution Trace on Primary • [Timeline: T1 runs SX, then commits (C); T2 runs SY and SZ, then commits (C)]

Extracting Conflict Info • [Timeline: T1 runs SX, C; T2 runs SY, SZ, C] • Don’t Conflict!

Avoiding Conflicts • [Timeline: T1 runs SX, C; T2 runs SY, SZ, C] • Might Conflict! • Transaction-Ordering Rule: A query from transaction T2 that was executed by the primary after the COMMIT of transaction T1 can be sent to a secondary only after it has processed all queries of T1.

Commit Barrier Scheduling • Maintain barrier for each replica • Mark statements and transactions with barriers • Issue statements and commits when replica’s barrier reaches appropriate value • Simple to implement

Analysis of CBS: Non-faulty primary • Full concurrency on the Primary • Deadlocks detected and resolved locally • Ample concurrency on Secondaries • Allows many statements to run in parallel • Secondaries hardly ever block • Latency increase

Early Return • [Diagram: the Shepherd returns a result to the client, which sends its next SQL statement while SQL is still being issued to the DB replicas, one of them marked Primary] • Pipelined Execution!

Early Return Analysis • Cut latency in half • Must vote at Commit • If a wrong answer was sent, abort the transaction • Correctness Condition • Clients receive correct answers for all transactions that commit
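The commit-time check implied by early return can be sketched as follows. This is only an illustration of the correctness condition above; the function name `commit_or_abort` and the shape of its arguments are assumptions, not part of HRDB.

```python
from collections import Counter

def commit_or_abort(sent_answers, replica_answers, f):
    """Decide a transaction's fate at commit time under early return.

    sent_answers[i]    : the answer already returned to the client for statement i
    replica_answers[i] : the answers all replicas (primary included) gave for statement i
    The transaction may commit only if every answer the client saw is
    backed by at least f+1 matching votes; otherwise it is aborted.
    """
    for sent, answers in zip(sent_answers, replica_answers):
        if Counter(answers)[sent] < f + 1:
            return "ABORT"
    return "COMMIT"
```

Under this sketch a client may briefly hold a wrong answer, but never for a transaction that commits.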

Masking Faults • Faulty Secondary not a problem • Voting resolves wrong answers • Faulty Primary is a problem • Generates invalid schedule • Goal: correct execution

Faulty Primary Scenario • T1, T2: Increment A by 1, return A • A initially 0, should end up 2 • f+1 matching votes for both answers!
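One way this scenario could arise is sketched below. The details are an assumed reconstruction, not taken from the slides: f = 1, the faulty primary runs T1 and T2 without isolating them (both see A = 0), and the two secondaries, told that the transactions do not conflict, happen to serialize them in opposite orders.

```python
from collections import Counter

def run_serial(order):
    """Execute 'increment A, return A' transactions one after another."""
    a = 0
    results = {}
    for txn in order:
        a += 1
        results[txn] = a
    return results

f = 1
primary = {"T1": 1, "T2": 1}            # faulty: both transactions read A = 0
secondary1 = run_serial(["T1", "T2"])   # T1 -> 1, T2 -> 2
secondary2 = run_serial(["T2", "T1"])   # T2 -> 1, T1 -> 2
replicas = [primary, secondary1, secondary2]

def winning_answer(txn):
    """Return the answer with at least f+1 votes for txn, or None."""
    answer, count = Counter(r[txn] for r in replicas).most_common(1)[0]
    return answer if count >= f + 1 else None

# Both transactions collect f+1 votes for the answer 1, although no
# serial execution can return 1 for both of them.
print(winning_answer("T1"), winning_answer("T2"))
```

Voting on individual answers is therefore not enough on its own when the primary produces an invalid schedule.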

Other Issues • Mechanics • Replica Repair • Shepherd crashes • Heterogeneity & SQL

Implementation • Prototype called HRDB • Implemented in Java • About 3500 semicolon-lines of code • JDBC interface to clients and databases • Works with MySQL, DB2, Derby, and SQLServer

Performance 17%

Heterogeneous Replication • Ran 2f+1=3 replica system, heterogeneous vendors • MySQL, DB2, Commercial DB X • Sufficiently Blocking holds in practice • System runs at slowest of f+1 fastest replicas, or primary
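The last bullet can be read as a simple formula. The sketch below assumes a hypothetical per-replica latency model and interprets "or primary" as "whichever is slower"; both are assumptions made for illustration.

```python
def effective_latency(primary_latency, secondary_latencies, f):
    """Latency at which the replicated system can produce a voted answer.

    An answer needs f+1 matching votes, so the system waits for the
    (f+1)-th fastest replica; it also cannot run ahead of the primary,
    whose execution order the secondaries follow.
    """
    latencies = sorted([primary_latency] + list(secondary_latencies))
    quorum_latency = latencies[f]  # the (f+1)-th fastest replica overall
    return max(primary_latency, quorum_latency)
```

With f = 1, a fast primary (10) and secondaries at 30 and 50 give 30; a slow primary (50) with secondaries at 10 and 20 gives 50.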

Fail-Stop Faults

Bugs and HRDB • Successfully masked bugs • Heterogeneous vendors & heterogeneous versions • Found a new bug in MySQL • While running TPC-C • Present since October 2001 • Patched in recent release • Starting to look for bugs actively with HRDB

Conclusion • First practical Byzantine Fault Tolerant Database • Failure independence by supporting heterogeneous replicas • Novel concurrency extraction scheme • Tool for finding new bugs in databases

Backup Slides

Snapshot Isolation • Allows read-after-write hazards • Converts fail-stop to Byzantine faults • Need write-sets to implement • Scheme called Snapshot Barrier Scheduling

Implement with Barriers • [Timeline: barrier values B=0, B=1, B=2, B=3 across three transactions; T1 runs SW, C; T2 runs SX, SY, SZ, C; T3 runs SJ, SK, C] • Primary • S – Annotate with current barrier upon completion • C – Increment barrier before issue • Secondary • S – Issue when replica barrier is at least the value of the annotation • C – Increment replica barrier after completion
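The barrier rules above can be sketched as two small classes. The class and method names are hypothetical. The rule that a commit tagged b may be issued once the replica barrier equals b-1 is an assumption made here to enforce a single commit order; the slide only says that commits are issued when the barrier reaches the appropriate value.

```python
class PrimaryTagger:
    """Tracks the barrier while transactions execute on the primary."""

    def __init__(self):
        self.barrier = 0

    def statement_completed(self):
        # S: annotate the statement with the current barrier upon completion.
        return self.barrier

    def commit_issued(self):
        # C: increment the barrier before issuing the commit.
        self.barrier += 1
        return self.barrier  # tag carried by the commit (assumption)


class SecondaryBarrier:
    """Decides when a tagged statement or commit may run on a secondary."""

    def __init__(self):
        self.barrier = 0

    def can_issue_statement(self, tag):
        # S: issue when the replica barrier is at least the annotation.
        return self.barrier >= tag

    def can_issue_commit(self, tag):
        # Assumed rule: commits run in tag order, one barrier step each.
        return self.barrier == tag - 1

    def commit_completed(self):
        # C: increment the replica barrier after completion.
        self.barrier += 1


# Assumed primary trace: SX (T1), SY (T2), commit T1, SZ (T2), commit T2.
primary = PrimaryTagger()
sx = primary.statement_completed()   # 0
sy = primary.statement_completed()   # 0
c1 = primary.commit_issued()         # 1
sz = primary.statement_completed()   # 1
c2 = primary.commit_issued()         # 2

secondary = SecondaryBarrier()
early = (secondary.can_issue_statement(sx),
         secondary.can_issue_statement(sy),
         secondary.can_issue_statement(sz))   # SX and SY may run, SZ must wait
secondary.commit_completed()                  # T1's commit finishes on the secondary
late = secondary.can_issue_statement(sz)      # SZ may now run
```

SX and SY ran concurrently on the primary, so they carry the same tag and may run in parallel on the secondary; SZ ran after T1's commit, so it waits until the secondary has committed T1.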

Heterogeneity Issues • Non-determinism in answers • Result set ordering • Non-deterministic functions in queries • Database-assigned row IDs • Query Rewriting • SQL incompatibility • Translation Engine • SQL hiding – Views and Stored Procedures
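As an illustration of the result-set ordering issue, the sketch below compares two result sets as multisets unless the query fixes an order. The function name and the `ordered` flag are assumptions for this example; the slides do not describe how HRDB compares answers.

```python
from collections import Counter

def results_match(rows_a, rows_b, ordered=False):
    """Compare two result sets, given as sequences of row tuples.

    Without ORDER BY, SQL leaves row order unspecified, so two correct
    replicas may return the same rows in different orders; compare them
    as multisets. With ORDER BY (ordered=True), positions must match too.
    """
    if ordered:
        return list(rows_a) == list(rows_b)
    return Counter(rows_a) == Counter(rows_b)
```

For example, `results_match([(1, "a"), (2, "b")], [(2, "b"), (1, "a")])` is true, but the same call with `ordered=True` is false.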

Future Work • Replicating the Shepherd • Efficient Replica Repair • Finding Bugs

Replica Recovery • Replicas • Fail-stop crashes – Shepherd replays missing transactions • Uses transaction log table in database to discover which transactions to replay • Byzantine faults – Shepherd repairs faulty state, then replays • Efficient repair mechanism under development • Shepherd • Fail-stop crashes - Maintains a write-ahead log
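Replay after a fail-stop crash can be sketched as a difference between two logs. The data shapes here are hypothetical: `shepherd_log` stands for the Shepherd's ordered list of committed transaction IDs, and `replica_log_table` for the IDs recorded in the replica's transaction log table.

```python
def transactions_to_replay(shepherd_log, replica_log_table):
    """Return the committed transactions the replica is missing, in commit order.

    The replica records each committed transaction in a log table inside
    the database, so after a crash the table shows exactly which commits
    survived; everything else is replayed in the Shepherd's order.
    """
    already_committed = set(replica_log_table)
    return [txn for txn in shepherd_log if txn not in already_committed]
```

For example, if the Shepherd committed `["t1", "t2", "t3", "t4"]` and the recovered replica's table lists `["t1", "t2"]`, the function returns `["t3", "t4"]`.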

Faulty Primary • Wrong answers result in transaction abort • Concurrency Faults • Can result in secondaries being unable to make progress • System is back to “Correct but Slow” solution • Same case as when primary is not sufficiently blocking • Can be hard to tell if primary is faulty • Replace primary by doing a view change
