350 likes | 485 Views
Tolerating Byzantine Faults in Database Systems using Commit Barrier Scheduling. Ben Vandiver , Hari Balakrishnan , Barbara Liskov , and Sam Madden CSAIL, MIT. Sponsors: Quanta Computer Inc, NSF. Non-crash faults in Databases. Over 50% of reported bugs were non-crash faults
E N D
Tolerating Byzantine Faultsin Database Systemsusing Commit Barrier Scheduling Ben Vandiver, HariBalakrishnan, Barbara Liskov, and Sam Madden CSAIL, MIT Sponsors: Quanta Computer Inc, NSF
Non-crash faults in Databases • Over 50% of reported bugs were non-crash faults • Incorrect answers, data or index corruption, etc. • Previous focus on fail-stop faults • Better model: Byzantine faults
Failure Independence • Heterogeneous replicas • Different implementations / versions • Easiest with non-invasive solution • Requires standard interface • SQL is moderately standard
Client Interaction • Organized into Transactions • Query, Query, …, Commit / Rollback • Interactive • Strong consistency • Single-copy serializable
Database Functionality • Each Database provides • Serializable isolation • Strict (rigorous) 2-phase locking • Databases don’t execute in issue-order • Limited control over execution order Issue S1 Replica 1 executes S1 Replica 2 executes S2 S2 S2 S1
Replica Coordination • BFT well known solution • 3f+1 replicas • Globally order client requests • Replicas execute in order • Exhibits no concurrency • Goal: mechanism to extract concurrency in database context
Architecture Client Client Client Shepherd SQL SQL SQL DB1 DB2 DB3
Architecture Client Client Client SQL Result Shepherd Vote Need f+1 matching votes SQL Result SQL Result SQL ? DB1 DB2 DB3
How to extract concurrency? • Just issue statements to replicas • Likely to get stuck • Solution: pre-determine which statements conflict • Inspecting SQL is very hard
Commit Barrier Scheduling • Primary / Secondary Scheme • Run transactions first on the primary • Duplicate primary’s ordering on the secondaries • Works best when primary is Sufficiently Blocking • Required for performance, not correctness
Commit Barrier Scheduling Client Client Client SQL SQL SQL Shepherd Result Result Result SQL SQL ? DB DB DB Primary
Correct Execution • Statement Ordering Rule • Execute statements of transaction in order • Commit Ordering Rule • All replicas commit transactions in the same order • Order determined by Shepherd
Execution Trace on Primary T1 SX C T2 SY SZ C Time
Extracting Conflict Info T1 SX C T2 SY SZ C Don’t Conflict!
Avoiding Conflicts T1 SX C T2 SY SZ C Might Conflict! Transaction-Ordering Rule: A query from transaction T2 that was executed by the primary after the COMMIT of transaction T1 can be sent to a secondary only after it has processed all queries of T1.
Commit Barrier Scheduling • Maintain barrier for each replica • Mark statements and transactions with barriers • Issue statements and commits when replica’s barrier reaches appropriate value • Simple to implement
Analysis of CBS:Non-faulty primary • Full concurrency on the Primary • Deadlocks detected and resolved locally • Ample concurrency on Secondaries • allows many statements to run in parallel • Secondaries hardly ever block • Latency increase
Early Return Client Client Client Result Next SQL Stmt Shepherd Pipelined Execution! SQL SQL DB DB DB Primary
Early Return Analysis • Cut latency in half • Must vote at Commit • Sent wrong answer, abort the transaction • Correctness Condition • Clients receive correct answers for all transactions that commit
Masking Faults • Faulty Secondary not a problem • Voting resolves wrong answers • Faulty Primary is a problem • Generates invalid schedule • Goal: correct execution
Faulty Primary Scenario T1 , T2 – Increment A by 1, return A A initially 0, should end up 2 f+1 matching votes for both answers!
Other Issues • Mechanics • Replica Repair • Shepherd crashes • Heterogeneity & SQL
Implementation • Prototype called HRDB • Implemented in Java • About 3500 semicolon-lines of code • JDBC interface to clients and databases • Works with MySQL, DB2, Derby, and SQLServer
Performance 17%
Heterogeneous Replication • Ran 2f+1=3 replica system, heterogeneous vendors • MySQL, DB2, Commerical DB X • Sufficiently Blockingholds in practice • System runs at slowest of f+1 fastest replicas, or primary
Bugs and HRDB • Successfully masked bugs • Heterogeneous vendors & heterogeneous versions • Found a new bug in MySQL • While running TPC-C • Present since October 2001 • Patched in recent release • Starting to look for bugs actively with HRDB
Conclusion • First practical Byzantine Fault Tolerant Database • Failure independence by supporting heterogeneous replicas • Novel concurrency extraction scheme • Tool for finding new bugs in databases
Snapshot Isolation • Allows read-after-write hazards • Converts fail-stop to Byzantine faults • Need write-sets to implement • Scheme called Snapshot Barrier Scheduling
Implement with Barriers B=0 B=1 B=2 B=3 T1 SW C T2 SX SY SZ C T3 SJ SK C • Primary • S – Annotate with current barrier upon completion • C – Increment barrier before issue • Secondary • S – Issue when replica barrier is at least the value of the annotation • C – Increment replica barrier after completion
Heterogeneity Issues • Non-determinism in answers • Result set ordering • Non-deterministic functions in queries • Database-assigned row IDs • Query Rewriting • SQL incompatibility • Translation Engine • SQL hiding – Views and Stored Procedures
Future Work • Replicating the Shepherd • Efficient Replica Repair • Finding Bugs
Replica Recovery • Replicas • Fail-stop crashes – Shepherd replays missing transactions • Uses transaction log table in database to discover which transactions to replay • Byzantine faults – Shepherd repairs faulty state, then replays • Efficient repair mechanism under development • Shepherd • Fail-stop crashes - Maintains a write-ahead log
Faulty Primary • Wrong answers result in transaction abort • Concurrency Faults • Can result in secondaries being unable to make progress • System is back to “Correct but Slow” solution • Same case as when primary is not sufficiently blocking • Can be hard to tell if primary is faulty • Replace primary by doing a view change