Calvin: Fast Distributed Transactions for Partitioned Database Systems. Thomson et al., SIGMOD 2012. Presented by Dr. Greg Speegle, April 12, 2013.
Distributed Transactions • Two-phase commit slow relative to local transaction processing • CAP Theorem • Option 1: Reduce availability • Option 2: Reduce consistency • Goal: Provide availability and consistency by changing transaction semantics
Deterministic Transactions • Normal transaction execution • Submit SQL statements • Subsequent operations dependent on results • Deterministic transaction execution • Submit all requests before start • Example: Auto-commit • Difficult for dependent execution
Architecture • Sequencing Layer • Per replica • Creates universal transaction execution order • Scheduling Layer • Per data store • Executes transactions consistently with order • Storage Layer • CRUD interface
Data Model • Dataset is partitioned • Partitions are replicated • One copy of every partition together forms a replica • All copies of one partition form a replication group • Master/slave within replication group (for asynchronous replication)
Sequencer • Requests (deterministic transactions) submitted locally • Epoch: requests collected over a 10 ms window form one batch • Asynchronous replication: master receives all requests and determines order • Synchronous replication: Paxos determines order • Batch sent to scheduler
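A minimal sketch of the epoch batching described on this slide. The function names are hypothetical, and the rule for interleaving batches from several sequencers (by epoch, then by sequencer id) is an assumption made for illustration; the slide only states that requests are grouped into 10 ms epochs and that a single order is produced.

```python
from collections import defaultdict

EPOCH_MS = 10  # epoch length taken from the slide


def batch_by_epoch(requests, epoch_ms=EPOCH_MS):
    """Group (arrival_ms, txn_id) pairs into batches keyed by epoch number."""
    batches = defaultdict(list)
    for arrival_ms, txn_id in requests:
        batches[arrival_ms // epoch_ms].append(txn_id)
    return dict(batches)


def global_order(batches_per_sequencer):
    """Merge batches deterministically: by epoch, then by sequencer id.

    The tie-break on sequencer id is an assumption of this sketch; any rule
    works as long as every replica applies the same one.
    """
    epochs = sorted({e for b in batches_per_sequencer.values() for e in b})
    order = []
    for epoch in epochs:
        for seq_id in sorted(batches_per_sequencer):
            order.extend(batches_per_sequencer[seq_id].get(epoch, []))
    return order


seq_a = batch_by_epoch([(1, "a1"), (12, "a2")])
seq_b = batch_by_epoch([(3, "b1"), (4, "b2"), (15, "b3")])
order = global_order({"A": seq_a, "B": seq_b})
```

Because the merge depends only on epoch numbers and sequencer ids, every node that receives the same batches derives the same order without further coordination.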
Scheduler • Logical concurrency control & recovery (e.g., no TIDs) • Lock manager distributed (lock only keys stored locally) • Strict 2PL with changes: • If t0 and t1 conflict and t0 precedes t1 in sequence order, t0 locks before t1 • All lock requests by transaction processed together in sequence order
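The modified locking rules above can be sketched as a lock manager with one FIFO queue per key. This is an illustration under simplifying assumptions, not Calvin's code: the class and method names are hypothetical, and only exclusive locks are modeled.

```python
from collections import defaultdict, deque


class DeterministicLockManager:
    """Grants locks strictly in sequence order (exclusive locks only)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # key -> txn ids in sequence order
        self.keys = {}                    # txn id -> keys it requested

    def request_all(self, txn_id, keys):
        """Enqueue all lock requests of one transaction together.

        Callers must invoke this in sequence order, so a conflicting
        earlier transaction is always ahead in every shared queue.
        """
        self.keys[txn_id] = sorted(set(keys))
        for key in self.keys[txn_id]:
            self.queues[key].append(txn_id)

    def is_runnable(self, txn_id):
        """A transaction may execute once it heads every queue it joined."""
        return all(self.queues[k][0] == txn_id for k in self.keys[txn_id])

    def release_all(self, txn_id):
        """Release every lock after the transaction finishes."""
        assert self.is_runnable(txn_id), "only a lock holder can release"
        for key in self.keys.pop(txn_id):
            self.queues[key].popleft()


lm = DeterministicLockManager()
lm.request_all("t0", ["x", "y"])
lm.request_all("t1", ["y", "z"])  # conflicts with t0 on y
lm.request_all("t2", ["w"])       # no conflict, can run alongside t0
```

Here `t1` waits behind `t0` on key `y`, while `t2` touches no shared key and is runnable immediately; once `t0` releases, `t1` becomes runnable.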
Scheduler II • Transaction executes after all locks acquired • Read/Write set analysis • Local vs Remote • Read-only nodes are passive participants • Write nodes are active participants • Local Reads • Distribute reads to active participants • Collect remote read results • Apply local writes
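As a rough sketch of the read/write set analysis, the functions below classify nodes as active or passive participants and compute which local read results a node forwards. The names `classify_participants`, `local_reads_to_send`, and the `partition_of` mapping are hypothetical; a real system would derive the partition from the key rather than from a lookup table.

```python
def classify_participants(read_set, write_set, partition_of):
    """Return (active, passive) node sets for one transaction.

    Nodes that store at least one written key are active; nodes that
    only store keys being read are passive.
    """
    active = {partition_of[k] for k in write_set}
    passive = {partition_of[k] for k in read_set} - active
    return active, passive


def local_reads_to_send(node, read_set, write_set, partition_of, storage):
    """Local read results that `node` forwards to each other active node."""
    active, _ = classify_participants(read_set, write_set, partition_of)
    local = {k: storage[k] for k in read_set if partition_of[k] == node}
    return {dest: dict(local) for dest in active if dest != node}


partition_of = {"a": 0, "b": 1, "c": 2}  # key -> node id (made-up layout)
reads, writes = {"a", "b"}, {"b", "c"}
active, passive = classify_participants(reads, writes, partition_of)
outgoing = local_reads_to_send(0, reads, writes, partition_of, {"a": 10})
```

In this made-up layout, node 0 only serves a read, so it is passive: it sends its value of `a` to the active nodes 1 and 2 and does no further work.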
Scheduler III • Deadlock free (acyclic waits-for graph) • Dependent transactions • Read-only reconnaissance query generates read set • Transaction executed with resulting read/write locks • Re-execute if the read/write set has changed by execution time • Maximum conflict footprint under 2PL
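The reconnaissance-and-retry pattern for dependent transactions might look like the sketch below. All names are hypothetical, and the `interleave` hook is only a stand-in for other transactions that are sequenced between the reconnaissance query and the actual execution.

```python
def run_dependent(db, recon, body, interleave=lambda db: None, max_attempts=5):
    """Run a dependent transaction; return the number of attempts needed."""
    for attempt in range(1, max_attempts + 1):
        predicted = recon(db)       # read-only reconnaissance query
        interleave(db)              # other transactions may run here
        if recon(db) == predicted:  # prediction still valid at execution
            body(db, predicted)
            return attempt
    raise RuntimeError("read/write set kept changing")


db = {"owner_of": {"alice": "acct1"}, "acct1": 100, "acct2": 50}


def recon(db):
    # The key to lock depends on data: look it up first.
    return frozenset([db["owner_of"]["alice"]])


def body(db, keys):
    for key in keys:
        db[key] += 1


pending = iter([True])  # simulate exactly one conflicting update


def interleave(db):
    if next(pending, False):
        db["owner_of"]["alice"] = "acct2"


attempts = run_dependent(db, recon, body, interleave)
```

The first attempt predicts `acct1`, but the simulated update moves the pointer to `acct2`, so the transaction is retried with the new prediction and then succeeds.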
Storage • Disk I/O problem • Pause t0 when I/O required • A t1 can “jump ahead” of t0 (get a conflicting lock before t0) • Solution: Delay t0, but request its data in advance • So t1 may precede t0 both in sequence order and in execution
Checkpointing • Logging requires only the ordered transactions to restore after failure • At checkpoint time (global epoch time) • Keep two versions of data, before and after • Transactions access the appropriate version • After all “before” transactions terminate, flush all data • Throw away “before” version if “after” exists • 20% throughput impact
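A toy, in-memory sketch of the two-version scheme on this slide. The class `TwoVersionStore` and its methods are hypothetical, and the flush is modeled as returning a snapshot dictionary rather than writing to disk.

```python
class TwoVersionStore:
    """Keeps "before" and "after" versions while a checkpoint is open."""

    def __init__(self, data):
        self.before = dict(data)
        self.after = {}
        self.checkpoint_epoch = None  # None means no checkpoint in progress

    def _is_after(self, txn_epoch):
        return (self.checkpoint_epoch is not None
                and txn_epoch >= self.checkpoint_epoch)

    def begin_checkpoint(self, epoch):
        self.checkpoint_epoch = epoch

    def write(self, txn_epoch, key, value):
        target = self.after if self._is_after(txn_epoch) else self.before
        target[key] = value

    def read(self, txn_epoch, key):
        if self._is_after(txn_epoch) and key in self.after:
            return self.after[key]
        return self.before[key]

    def finish_checkpoint(self):
        """Call once all "before" transactions have terminated."""
        snapshot = dict(self.before)    # what would be flushed to disk
        self.before.update(self.after)  # drop "before" where "after" exists
        self.after.clear()
        self.checkpoint_epoch = None
        return snapshot


store = TwoVersionStore({"x": 1, "y": 2})
store.begin_checkpoint(epoch=5)
store.write(4, "x", 10)  # transaction from before the checkpoint epoch
store.write(6, "x", 20)  # transaction from after the checkpoint epoch
```

The snapshot reflects only transactions ordered before the checkpoint epoch, while later transactions keep running against the "after" version.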
Performance • TPC-C benchmark (order placing) • Throughput scales linearly with number of machines • Per-node throughput appears asymptotic • At high contention, outperforms RDBMS • At low contention, worse performance
Conclusion • Adds ACID capability to any CRUD system • Achieves nearly linear scale-up • Requires deterministic transactions