Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi Distributed Systems Lab, University of California, Santa Barbara
Cloud Application Platforms • Serve thousands of applications (tenants) • AppEngine, Azure, Force.com • Tenants are (typically) • Small • SLA sensitive • Erratic load patterns • Subject to flash crowds • e.g. the fark, digg, slashdot, reddit effect (for now) • Support for multitenancy is critical • Our focus: DBMSs serving these platforms
Multitenancy… What the tenant wants… What the service provider wants…
Cloud Infrastructure is Elastic • Static provisioning for peak is inelastic [Figure: capacity vs. demand over time; traditional infrastructures provisioned for peak leave unused resources, while cloud deployment tracks demand] Slide credits: Berkeley RAD Lab
Elasticity in a Multitenant DB [Figure: load balancer in front of the application/web/caching tier and the database tier]
Live Database Migration • Migrate a tenant's database in a live system • A critical operation to support elasticity • Different from • Migration between software versions • Migration in case of schema evolution
VM Migration for DB Elasticity • VM migration [Clark et al., NSDI 2005] • One tenant per VM • Pros: allows fine-grained load balancing • Cons: performance overhead; poor consolidation ratio [Curino et al., CIDR 2011] • Multiple tenants in a VM • Pros: good performance • Cons: migrates all tenants; coarse-grained load balancing
Problem Formulation • Multiple tenants share the same database process • Shared process multitenancy • Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more • Migrate individual tenants • VM migration cannot be used for fine-grained migration • Target architecture: shared nothing • Shared storage architectures: see our VLDB 2011 paper
Shared nothing architecture
Why is Live Migration hard? • How to ensure no downtime? • Need to migrate the persistent database image (tens of MBs to GBs) • How to guarantee correctness during failures? • Nodes can fail during migration • How to ensure transaction atomicity and durability? • How to recover migration state after failure? • Nodes recover after a failure • How to guarantee serializability? • Transaction correctness equivalent to normal operation • How to minimize migration cost? …
Migration Cost Metrics • Downtime • Time tenant is unavailable • Service Interruption • Number of operations failing/transactions aborting • Migration Overhead/Performance impact • During normal operation, migration, and after migration • Additional Data Transferred • Data transferred in addition to DB's persistent image
How did we do it? • Migration executed in phases • Starts with transfer of minimal information to the destination (“wireframe”) • Source and destination concurrently execute transactions in one migration phase • Database pages used as the granule of migration • Pages “pulled” by the destination on demand (see the sketch below) • Minimal transaction synchronization • A page is uniquely owned by either source or destination • Leverage page-level locking • Logging and handshaking protocols to tolerate failures
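The on-demand pull with unique page ownership can be sketched in Java (the prototype's language). This is a minimal illustration, not the authors' code: the names MigratingPageStore, RemoteSource, and pullPage are hypothetical, and transaction locking and error handling are elided.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of Zephyr-style unique page ownership: every page is
// owned by exactly one node, and the destination pulls an un-owned page from
// the source the first time a transaction touches it.
class MigratingPageStore {
    enum Owner { SOURCE, DESTINATION }

    private final Map<Integer, Owner> ownership = new ConcurrentHashMap<>();
    private final RemoteSource source; // assumed RPC stub to the source node

    MigratingPageStore(RemoteSource source) { this.source = source; }

    // Called at the destination when a transaction accesses a page.
    byte[] readPage(int pageId) {
        // Pull on demand; the computation blocks until the source gives up
        // the page, so ownership is transferred exactly once.
        ownership.computeIfAbsent(pageId, id -> {
            installLocally(id, source.pullPage(id));
            return Owner.DESTINATION;
        });
        return readLocally(pageId);
    }

    private byte[] readLocally(int pageId) { return new byte[0]; /* local page read */ }
    private void installLocally(int pageId, byte[] data) { /* install pulled page */ }
}

interface RemoteSource { byte[] pullPage(int pageId); }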
Simplifying Assumptions • For this talk • Small tenants • i.e. not sharded across nodes • No replication • No structural changes to indices • Extensions in the paper • Relax these assumptions
Design Overview [Figure: source owns all pages P1, P2, P3, …, Pn and runs active transactions TS1,…,TSk; destination owns nothing. Legend: page owned by node vs. page not owned by node]
Init Mode • Freeze the index wireframe and migrate it [Figure: wireframe copied to the destination; source still owns pages P1,…,Pn, destination's pages are un-owned; transactions TS1,…,TSk remain active at the source]
What is an index wireframe? [Figure: the index structure at source and destination, without the database pages it points to]
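As a rough illustration of the wireframe idea, the sketch below (an assumed structure, not H2's actual B-tree classes) shows what gets copied in Init mode: the index's routing structure and page ids, but not the page contents.

// Hypothetical sketch: a wireframe is the index structure stripped of data.
// Internal-node keys and child page ids are copied to the destination; the
// pages those ids point to stay at the source until pulled or pushed.
class IndexWireframe {
    static class Node {
        long[] keys;          // routing keys, shipped to the destination
        int[] childPageIds;   // page ids only; page contents are not shipped
        boolean[] childOwned; // sentinel per child: is the page owned locally?
    }
    Node root;
    boolean frozen = true; // no structural changes while migration is in progress
}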
Dual Mode • Requests for un-owned pages can block • Example: P3 accessed by TDi is pulled from the source • Old, still-active transactions TSk+1,…,TSl run at the source; new transactions TD1,…,TDm run at the destination • Index wireframes remain frozen [Figure: ownership of P3 transferred from source to destination]
Finish Mode • Remaining pages P1, P2, … are pushed from the source; pages can still be pulled by the destination if needed [Figure: transactions at the source completed; destination runs TDm+1,…,TDn]
Normal Operation • Index wireframe un-frozen [Figure: destination owns all pages P1,…,Pn and runs transactions TDn+1,…,TDp]
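The modes on the preceding slides form a simple linear progression. The enum below is an illustrative sketch of that state machine (mode names follow the slides; making each transition atomic is deferred to the logging and handshake protocols discussed later).

// Sketch of the migration-mode progression: Init -> Dual -> Finish, then
// back to normal operation at the destination.
enum MigrationMode {
    NORMAL, // no migration in progress
    INIT,   // wireframe frozen and copied; source still owns and serves all pages
    DUAL,   // both nodes execute transactions; destination pulls pages on demand
    FINISH; // source has finished; remaining pages are pushed to the destination

    MigrationMode next() {
        switch (this) {
            case NORMAL: return INIT;
            case INIT:   return DUAL;
            case DUAL:   return FINISH;
            default:     return NORMAL; // FINISH -> normal operation at destination
        }
    }
}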
Artifacts of this design • Once migrated, pages are never pulled back by the source • Transactions at the source accessing migrated pages are aborted • No structural changes to indices during migration • Transactions (at both nodes) that make structural changes to indices abort • Destination “pulls” pages on demand • Transactions at the destination experience higher latency compared to normal operation
Serializability (proofs in paper) • Only concern is “dual mode” • Init and Finish: only one node is executing transactions • Local predicate locking of the internal index and exclusive page-level locking between nodes ⇒ no phantoms • Strict 2PL ⇒ transactions are locally serializable • Pages transferred only once ⇒ no Tdest → Tsource conflict dependency ⇒ guaranteed serializability
Recovery (proofs in paper) • Transaction recovery • For every database page, transactions at source ordered before transactions at destination • After failure, conflicting transactions replayed in the same order • Migration recovery • Atomic transitions between migration modes • Logging and handshake protocols • Every page has exactly one owner • Bookkeeping at the index level
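One way atomic mode transitions could be realized, sketched under the assumption of a standard write-ahead log (this builds on the MigrationMode enum sketched earlier; MigrationLog and Peer are hypothetical): log the intent to transition, handshake with the other node, then log the commit. After a crash, replaying the log recovers the last agreed mode.

// Hypothetical sketch of a logged, handshaked mode transition.
class MigrationCoordinator {
    private final MigrationLog log;
    private final Peer peer; // assumed messaging stub to the other node
    private MigrationMode mode = MigrationMode.NORMAL;

    MigrationCoordinator(MigrationLog log, Peer peer) { this.log = log; this.peer = peer; }

    void advance() {
        MigrationMode next = mode.next();
        log.transitionIntent(mode, next); // forced to the WAL before any visible change
        peer.acknowledge(next);           // handshake: both nodes agree on the new mode
        log.transitionCommit(next);       // transition is now durable on this node
        mode = next;
    }
}

interface MigrationLog {
    void transitionIntent(MigrationMode from, MigrationMode to);
    void transitionCommit(MigrationMode to);
}

interface Peer { void acknowledge(MigrationMode mode); }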
Correctness (proofs in paper) • In the presence of arbitrary repeated failures, Zephyr ensures: • Updates made to database pages are consistent • A failure does not leave a page without an owner • Both source and destination are in the same migration mode • Guaranteed termination and starvation freedom
Extensions (Details in the paper) • Replicated Tenants • Sharded Tenants • Allow structural changes to the indices • Using shared lock managers in the dual mode
Implementation • Prototyped using H2, an open-source OLTP database • Supports standard SQL/JDBC API • Serializable isolation level • Tree indices • Relational data model • Modified the database engine • Added support for freezing indices • Page migration status maintained using the index • Details in the paper… • Tungsten SQL Router migrates JDBC connections during migration
Experimental Setup • Two database nodes, each with a DB instance running • Synthetic benchmark as load generator • Modified YCSB to add transactions • Small read/write transactions • Compared against Stop and Copy (S&C)
Experimental Methodology • Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts • Workload: 60 sessions, 100 transactions per session [Figure: system controller uses metadata to initiate Migrate] • Hardware: 2.4 GHz Intel Core 2 Quad, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache • Gigabit Ethernet • Default DB size: 100k rows (~250 MB)
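The default mix translates directly into a tiny workload driver. The sketch below is illustrative only, with placeholder operations standing in for the modified-YCSB calls: one transaction of 10 operations drawn as 80% reads, 15% updates, 5% inserts.

import java.util.Random;

// Sketch of the default transaction mix used in the experiments.
class WorkloadGenerator {
    private final Random rng = new Random();

    void runTransaction() {
        for (int i = 0; i < 10; i++) {   // 10 operations per transaction
            double p = rng.nextDouble();
            if (p < 0.80)      read();   // 80% reads
            else if (p < 0.95) update(); // 15% updates
            else               insert(); // 5% inserts
        }
    }

    private void read()   { /* SELECT by primary key */ }
    private void update() { /* UPDATE by primary key */ }
    private void insert() { /* INSERT a new row */ }
}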
Results Overview • Downtime (tenant unavailability) • S&C: 3–8 seconds (unavailable for updates while migrating) • Zephyr: no downtime; either source or destination is always available • Service interruption (failed operations) • S&C: hundreds to thousands of operations; all transactions with updates are aborted • Zephyr: tens to hundreds of operations; orders of magnitude less interruption
Results Overview • Average increase in transaction latency (compared to the 6,000-transaction workload without migration) • S&C: 10–15% (cold cache at the destination) • Zephyr: 10–20% (pages fetched on demand) • Data transfer • S&C: persistent database image • Zephyr: 2–3% additional data transfer (messaging overhead) • Total time taken to migrate • S&C: 3–8 seconds, unavailable for any writes • Zephyr: 10–18 seconds, no unavailability
Failed Operations [Figure: orders of magnitude fewer failed operations with Zephyr]
Contributions • Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures • The first end-to-end solution with safety, correctness, and liveness guarantees • Prototype implementation on a relational OLTP database • Low cost on a variety of workloads
More Details (backup slides) [Figures: step-by-step view of transactions at source and destination — freeze indexes; duplicate indexes with sentinels; dual mode; finish mode]
Guarantees • Either source or destination is serving the tenant • No downtime • Serializable transaction execution • Unique page ownership • Local multi-granularity locking • Safety in the presence of failures • Transactions are atomic and durable • Migration state is recovered from the log • Ensure consistency of the database state
Migration Cost Analysis • Wireframe copy • Typically orders of magnitude smaller than data • Operational overhead during migration • Extra data (in addition to database pages) transferred • Transactions aborted during migration
Effect of Inserts on Zephyr [Figure: failures due to attempted modifications of the index structure]
Average Transaction Latency [Figure: only committed transactions reported; loss of cache for both migration types; Zephyr additionally incurs remote fetches]