Tashkent: Uniting Durability & Ordering in Replicated Databases
Write-Many Replicated Database
• All replicas agree on
  • which update tx commit
  • their commit order
• Total order
  • Determined by middleware
  • Followed by each replica
[Diagram: Replica 1 (Tx A), Replica 2 (Tx B), and Replica 3, each with its own durability]
Order Determined Outside DB
[Diagram: Tx A arrives at Replica 1 and Tx B at Replica 2; the replication MW (global ordering) fixes the order A B; Replicas 1, 2, and 3 each commit A B and each keep their own durability]
Enforce External Commit Order
• Middleware commit order: A B
• Task A (Tx A) and Task B (Tx B) reach the database through the proxy's SQL interface
• Cannot commit A & B concurrently!
• Must serialize
[Diagram: middleware, proxy, and replica database with durability]
Enforce Order = Serial Commit
• Middleware commit order: A B
• Serialization is slow
[Diagram: A is being made durable at the replica while B waits at the proxy]
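The proxy behavior on this slide can be sketched as follows. This is a minimal illustration, not the Tashkent proxy itself: the class name `OrderingProxy`, the method `request_commit`, and the in-memory `committed` list (standing in for a synchronous COMMIT at the database) are all assumptions made for the example.

```python
class OrderingProxy:
    """Hypothetical proxy that releases commits strictly in middleware order."""

    def __init__(self, commit_order):
        self.commit_order = list(commit_order)  # order decided by the middleware
        self.next_index = 0                     # position of the next tx allowed to commit
        self.waiting = set()                    # txs that asked to commit too early
        self.committed = []                     # order in which commits reached the database

    def request_commit(self, tx):
        """Register a commit request and release every tx whose turn has come."""
        self.waiting.add(tx)
        while (self.next_index < len(self.commit_order)
               and self.commit_order[self.next_index] in self.waiting):
            ready = self.commit_order[self.next_index]
            self.waiting.remove(ready)
            # Stand-in for a synchronous COMMIT: one tx at a time, never concurrent.
            self.committed.append(ready)
            self.next_index += 1
        return list(self.committed)


proxy = OrderingProxy(["A", "B"])
proxy.request_commit("B")  # B is held back: A must commit first
proxy.request_commit("A")  # A commits, then the waiting B is released
```

Because each commit is issued only after the previous one has finished, the database never sees two commits it could batch together.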
Commit Serialization is Slow
• Middleware order: A B C
• Proxy commit order: A B C
• Database: Commit A, Commit B, Commit C; each commit is a CPU step followed by its own durability (disk) write
• Proxy acks: Ack A, Ack B, Ack C
• Root cause: durability & ordering separated, so disk writes are serial
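A rough cost model makes the root cause concrete. The sketch below assumes a fixed CPU cost and a fixed disk-flush cost per operation; the function names and the numbers used in the example are hypothetical and are not measurements from the talk.

```python
def serial_commit_time(num_tx, cpu_ms, flush_ms):
    """Each commit pays its CPU cost and then waits for its own disk flush."""
    return num_tx * (cpu_ms + flush_ms)


def group_commit_time(num_tx, cpu_ms, flush_ms):
    """All commits pay their CPU cost, then share a single disk flush."""
    return num_tx * cpu_ms + flush_ms


# Hypothetical costs: 1 ms of CPU per commit, 8 ms per disk flush.
serial = serial_commit_time(3, cpu_ms=1, flush_ms=8)   # 3 * (1 + 8)
grouped = group_commit_time(3, cpu_ms=1, flush_ms=8)   # 3 * 1 + 8
```

Under this model the serial path grows with one flush per transaction, while the grouped path pays the flush once per batch.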
Solution: Unite Durability & Ordering
1- Pass order info to DB
  • Middleware (ordering) passes the order to the replica
  • Replica handles order and durability together
2- Move durability to MW
  • Middleware handles ordering and durability
  • Replica durability is OFF
1- Unite Dur. & Ord. in Database
• Middleware order: A B C
• Proxy commit order: A B C, issued as Commit A at 1, Commit B at 2, Commit C at 3
• Database: knows the order; CPU work, then one durability write for A B C
• Proxy acks: Ack A, Ack B, Ack C
• Solution 1: pass order info to DB
• Durability & ordering in database: group commit
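One way to picture "Commit A at 1, Commit B at 2, Commit C at 3" is a database that accepts a sequence number with each commit and flushes every commit whose turn has come in a single write. The sketch below is an assumption-laden illustration: `OrderedGroupCommitDB`, `commit_at`, and `flush` are hypothetical names, not the actual Tashkent-API or any PostgreSQL interface.

```python
class OrderedGroupCommitDB:
    """Hypothetical database that is told the commit order and group-commits in it."""

    def __init__(self):
        self.pending = {}   # sequence number -> tx waiting to become durable
        self.next_seq = 1   # next sequence number allowed into the log
        self.log = []       # durable log, always in middleware order
        self.flushes = 0    # number of disk writes performed

    def commit_at(self, tx, seq):
        """Accept a commit request tagged with its position in the total order."""
        self.pending[seq] = tx

    def flush(self):
        """Write every in-order pending commit with one disk write; return the acks."""
        batch = []
        while self.next_seq in self.pending:
            batch.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if batch:
            self.log.extend(batch)
            self.flushes += 1
        return batch


db = OrderedGroupCommitDB()
db.commit_at("B", 2)
db.commit_at("C", 3)
early = db.flush()       # nothing can be written yet: sequence 1 is missing
db.commit_at("A", 1)
acks = db.flush()        # A, B, and C become durable in one write
```

Commit requests can now arrive concurrently and out of order, since the database itself holds back anything that would violate the sequence.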
2- Unite D. & O. in Middleware
• Middleware order: A B C
• Middleware durability: A B C
• Proxy commit order: A B C
• Database: Commit A, Commit B, Commit C use CPU only; durability OFF
• Proxy acks: Ack A, Ack B, Ack C
• Solution 2: move durability to MW
• Durability & ordering in middleware: group commit
Roadmap
• Durability & ordering
  • Separated: serial commit, slow
  • United: group commit, fast
• Two Implementations
  • Tashkent-API: united in DB
  • Tashkent-MW: united in MW
• Tashkent-MW
  • Implementation
  • Recovery
  • Performance
Tashkent-MW
[Diagram: Tx A at Replica 1, Tx B at Replica 2, and Tx C at Replica 3 go through the replication MW (global ordering), which holds durability for A B C; each replica commits A B C with durability OFF]
Tashkent-MW: Durability & Ordering in Middleware
• Middleware logs tx effects
  • Durability of update tx
  • Guaranteed in middleware
  • Turn durability off at database
• Middleware performs durability & ordering
  • United: group commit, fast
• Database commits update tx serially
  • Commit = quick main memory operation
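The division of labor on this slide can be sketched in a few lines. Everything named here is hypothetical (`DurableMiddleware`, `InMemoryReplica`, `order`, `group_flush`, `apply_serially`), and plain Python lists and dicts stand in for the middleware's disk log and the database state; the real system's interfaces are not shown in the slides.

```python
class DurableMiddleware:
    """Hypothetical middleware that orders txs and makes their effects durable."""

    def __init__(self):
        self.buffer = []        # ordered txs not yet flushed
        self.durable_log = []   # stand-in for the middleware's on-disk log
        self.flushes = 0        # number of disk writes performed

    def order(self, tx, writeset):
        """Append a tx and its effects (writeset) at the next position in the total order."""
        self.buffer.append((tx, writeset))

    def group_flush(self):
        """Make all buffered txs durable with one disk write and return them in order."""
        batch, self.buffer = self.buffer, []
        if batch:
            self.durable_log.extend(batch)
            self.flushes += 1
        return batch


class InMemoryReplica:
    """Hypothetical replica with durability off: commit only touches main memory."""

    def __init__(self):
        self.data = {}
        self.commit_order = []

    def apply_serially(self, batch):
        """Commit each tx one after another, in the order the middleware fixed."""
        for tx, writeset in batch:
            self.data.update(writeset)
            self.commit_order.append(tx)


mw = DurableMiddleware()
replica = InMemoryReplica()
mw.order("A", {"x": 1})
mw.order("B", {"y": 2})
mw.order("C", {"x": 3})
replica.apply_serially(mw.group_flush())  # one flush in the MW, three in-memory commits
```

Serial commit at the database is no longer the bottleneck in this arrangement, because no commit waits for a disk write there.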
Recovery in Tashkent-MW
[Diagram: the replication MW (global ordering) holds durability; Replicas 1, 2, and 3 run with durability OFF]
Standard Database I/O
• Log flushed for
  1- Durability
  2- Allow cleaning dirty data pages (physical integrity)
[Diagram: Tx A updates the data page and the log in memory; the log record and data page for A are written to disk; Crash!]
Database I/O with Durability=off
• Middleware order: A B C
• Middleware holds durability for A
• Simple solution: recover from a data dump (checkpoint)
[Diagram: Tx A updates the data page and the log in memory; data and log for A on disk; Crash!]
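A recovery path consistent with this slide is to restore the last data dump and then reapply the effects the middleware logged after that dump was taken. The replay step is an assumption drawn from the earlier point that the middleware logs tx effects; the function `recover` and the `dump_position` marker are hypothetical names for illustration.

```python
def recover(dump_data, dump_position, middleware_log):
    """Rebuild replica state from a dump plus the middleware log entries after it.

    dump_data      -- database contents captured by the data dump (checkpoint)
    dump_position  -- number of log entries already reflected in the dump (assumed marker)
    middleware_log -- ordered list of (tx, writeset) pairs made durable by the middleware
    """
    data = dict(dump_data)  # restore the dump; never trust the crashed database files
    for _tx, writeset in middleware_log[dump_position:]:
        data.update(writeset)  # reapply effects in the original commit order
    return data


log = [("A", {"x": 1}), ("B", {"y": 2}), ("C", {"x": 3})]
state = recover({"x": 1}, dump_position=1, middleware_log=log)  # dump taken after A
```

Replaying in log order matters here: B and C are applied exactly as the middleware ordered them, so the recovered replica converges to the same state as the others.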
Performance - Setup
• Metrics:
  • Throughput
  • Response time
• Workload:
  • AllUpdates: tx = { 1 update }, mix = 100% updates
  • TPC-B: tx = { 4 updates, 1 read }, mix = 100% updates
  • TPC-W: mix of long & short txs
• System configuration:
  • Linux cluster running PostgreSQL
AllUpdates Throughput
[Figure: AllUpdates throughput]
AllUpdates Response Time
[Figure: AllUpdates response time]
In the Paper
• Design & Implementation
  • Tashkent-API
• Performance results
  • TPC-B & TPC-W
  • Recovery times
  • Another I/O subsystem
Conclusions
• Durability & ordering
  • Separated: serial commit, slow
  • United: group commit, fast
• Two Implementations
  • Tashkent-API: united in DB
  • Tashkent-MW: united in MW
• Tashkent-MW system
  • Pure middleware replication
  • Significant performance improvement