Centiman: Scalable High Performance Transaction Processing
Bailu Ding
Cornell University
2013
Why Is Large Scale TP Important?
• Internet-scale web applications
• Multi-tenancy in the cloud
What Is a Transaction?
• A transaction is a sequence of read / write operations executed in a database
• ACID properties
  • Atomicity: no partial completion
  • Consistency: brings the database from one valid state to another
  • Isolation: concurrent execution of transactions results in the same database state as if they were executed serially (serializability)
  • Durability: committed transactions remain in the database even in case of failures
Why Is Large Scale TP Hard?
• Lots of data
  • Scale beyond a single machine
• Lots of transactions
  • High throughput
• Strong consistency
  • Serializability
• High availability
Current Solution: Key Value Store
• Scalability and high availability
• Not transactional
• Weak consistency
Current Solution: Data Partitioning
• Scalable
• Strong consistency within a partition
• Weak consistency across partitions
• Strong consistency across partitions is expensive
• Performance is sensitive to the choice of partitioning
Evan Jones et al. Low Overhead Concurrency Control for Partitioned Main Memory Databases. SIGMOD’10
Jason Baker et al. Megastore: Providing Scalable, Highly Available Storage for Interactive Services. CIDR’11
Current Solution: Google Spanner
• Scalable
• Strong consistency
• Requires special hardware: GPS and atomic clocks
James Corbett et al. Spanner: Google's Globally-Distributed Database. OSDI’12
Our Goals
• Large scale, high performance transaction processing
• Support strong consistency
• No special hardware needed
Concurrency Control
• Concurrency control: keep transactions isolated
  • Locking-based concurrency control
  • Optimistic concurrency control
Locking-based Concurrency Control
[Diagram: lock timeline for T1(W(X)) and T2(R(Y, X), W(X)). Phases: Start; Obtain Locks; Read / Execute / Write; Release Locks and Commit. Locks shown: Write Lock (X), Read Lock (Y), Read Lock (X).]
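The locking scheme on this slide can be illustrated with a minimal, single-threaded sketch. The `LockTable` class, its method names, and the sample values in `db` are hypothetical and only simulate lock requests; this is not the lock manager of any particular system. It assumes shared read locks, exclusive write locks, and locks held until commit.

```python
class LockTable:
    """Hypothetical lock manager: shared read locks, exclusive write locks."""

    def __init__(self):
        self.readers = {}   # item -> set of transaction ids holding a read lock
        self.writer = {}    # item -> transaction id holding the write lock

    def acquire_read(self, txn, item):
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False    # another transaction holds the write lock
        self.readers.setdefault(item, set()).add(txn)
        return True

    def acquire_write(self, txn, item):
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False    # another transaction holds the write lock
        if self.readers.get(item, set()) - {txn}:
            return False    # other readers block the write lock
        self.writer[item] = txn
        return True

    def release_all(self, txn):
        for holders in self.readers.values():
            holders.discard(txn)
        for item in [i for i, t in self.writer.items() if t == txn]:
            del self.writer[item]


locks = LockTable()
db = {"X": 1, "Y": 2}   # assumed sample data

# T1(W(X)) takes the write lock on X first.
t1_got_x = locks.acquire_write("T1", "X")
# T2(R(Y, X), W(X)): Y is free, but X is write-locked by T1, so T2 must wait.
t2_got_y = locks.acquire_read("T2", "Y")
t2_got_x_early = locks.acquire_read("T2", "X")

db["X"] = 10                 # T1 executes its write
locks.release_all("T1")      # T1 releases its locks and commits

# Once T1 has committed, T2 can lock X, read, and upgrade to a write lock.
t2_got_x_late = locks.acquire_read("T2", "X")
t2_read = (db["Y"], db["X"])
t2_upgraded = locks.acquire_write("T2", "X")
db["X"] = t2_read[0] + t2_read[1]
locks.release_all("T2")      # T2 releases its locks and commits
```

In this sketch a failed request simply returns `False`; a real lock manager would block or queue the waiting transaction, which is the overhead the optimistic approach tries to avoid.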
Optimistic Concurrency Control
[Diagram: OCC timeline for T1(W(X)) and T2(R(Y, X), W(X)). Phases: Start; Read; Execute / Local Write; Validate; Update and Commit / Abort; Retry (optional). Steps shown: Local Write (X), Read (Y, X), Validation, Update and Commit, Abort.]
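The read, local write, validate, and commit-or-abort phases can be sketched as follows. `OCCStore`, its integer commit counter, and the backward-validation rule (abort if any item read was overwritten after the transaction started) are assumptions chosen for illustration, not a description of Centiman's actual validator.

```python
class OCCStore:
    """Hypothetical single-node store using backward validation."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_ts = 0      # timestamp of the latest committed transaction
        self.last_write = {}    # item -> commit timestamp of its latest write

    def begin(self):
        return {"start_ts": self.commit_ts, "reads": set(), "writes": {}}

    def read(self, txn, item):
        if item in txn["writes"]:
            return txn["writes"][item]   # read your own local write
        txn["reads"].add(item)
        return self.data[item]

    def write(self, txn, item, value):
        txn["writes"][item] = value      # local write, invisible until commit

    def commit(self, txn):
        # Validate: abort if any item read was overwritten after txn started.
        for item in txn["reads"]:
            if self.last_write.get(item, 0) > txn["start_ts"]:
                return False
        # Update and commit: install the local writes.
        self.commit_ts += 1
        for item, value in txn["writes"].items():
            self.data[item] = value
            self.last_write[item] = self.commit_ts
        return True


store = OCCStore({"X": 1, "Y": 2})   # assumed sample data
t1 = store.begin()                   # T1(W(X))
t2 = store.begin()                   # T2(R(Y, X), W(X)), concurrent with T1

store.write(t1, "X", 10)
store.write(t2, "X", store.read(t2, "Y") + store.read(t2, "X"))

t1_committed = store.commit(t1)      # validates and commits
t2_committed = store.commit(t2)      # aborts: X changed after T2 started

# Optional retry: T2 re-executes against the new state.
retry = store.begin()
store.write(retry, "X", store.read(retry, "Y") + store.read(retry, "X"))
retry_committed = store.commit(retry)
```

No locks are taken while the transactions execute; the conflict between T1 and T2 is only detected at validation time, and the aborted transaction pays for it by re-executing.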
Centiman: Large Scale TP
• Approach: optimistic concurrency control
  • Avoid overhead of locking
  • Centralized validation may become a bottleneck
• Contribution: scalable validation
  • Parallel validation
  • Local timestamp generation
  • Watermark-based garbage collection
OCC: Centralized Validation
[Diagram: three Processors and three Storage nodes share a single Validator. Phases: Start; Read; Execute / Local Write; Validation; Update and Commit / Abort.]
Parallel Validation
[Diagram: components shown are three Validators, two Clients, a Processor, and Storage; two connections are labeled Read / Update Database.]
Parallel Validation
[Diagram: a Processor running transaction T, validators Vblue and Vred covering the Blue Items and Red Items, and Storage.]
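One way to picture validation split across Vblue and Vred is the sketch below. Everything in it is an assumption made for illustration: the `Validator` class, the fixed item-to-validator map `partition`, the integer timestamps supplied by the processor, and the rule that a transaction commits only if every validator it touches votes to commit. The validators are called one after another here, whereas in a real deployment they would run on separate machines in parallel.

```python
class Validator:
    """Hypothetical validator responsible for one partition of the items."""

    def __init__(self, name):
        self.name = name
        self.last_write = {}   # item -> timestamp of the latest validated write

    def validate(self, start_ts, commit_ts, reads, writes):
        # Vote to abort if any item read here was overwritten after start_ts.
        for item in reads:
            if self.last_write.get(item, 0) > start_ts:
                return False
        for item in writes:
            self.last_write[item] = commit_ts
        return True


def validate_transaction(validators, partition, start_ts, commit_ts, reads, writes):
    """Processor side: send each validator only the items it owns."""
    votes = []
    for name, validator in validators.items():
        local_reads = {i for i in reads if partition[i] == name}
        local_writes = {i for i in writes if partition[i] == name}
        if local_reads or local_writes:
            votes.append(
                validator.validate(start_ts, commit_ts, local_reads, local_writes)
            )
    return all(votes)   # commit only if every involved validator agrees


validators = {"blue": Validator("blue"), "red": Validator("red")}
partition = {"X": "blue", "Y": "red"}   # assumed: X is a blue item, Y a red item

# T1 writes X: only the blue validator is involved.
t1_ok = validate_transaction(validators, partition, 0, 1, set(), {"X"})
# T2 reads Y and X and writes X: red accepts, blue rejects the stale read of X.
t2_ok = validate_transaction(validators, partition, 0, 2, {"X", "Y"}, {"X"})
# T2 retried with a start timestamp after T1's commit.
t2_retry_ok = validate_transaction(validators, partition, 1, 3, {"X", "Y"}, {"X"})
```

Because each validator sees only its own items, no single node has to examine every transaction's full read and write set, which is the bottleneck the centralized design runs into.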
Future Work
• Fault tolerance
• Elasticity