Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi Distributed Systems Lab, University of California, Santa Barbara
Cloud Application Platforms • Serve thousands of applications (tenants) • AppEngine, Azure, Force.com • Tenants are (typically) • Small • SLA sensitive • Erratic load patterns • Subject to flash crowds • e.g. the fark, digg, slashdot, reddit effect (for now) • Support for multitenancy is critical • Our focus: DBMSs serving these platforms
Multitenancy… What the tenant wants… What the service provider wants…
Cloud Infrastructure is Elastic • Static provisioning for peak is inelastic [Figure: capacity vs. demand over time; traditional infrastructures provisioned for peak leave unused resources, while cloud deployment tracks demand] Slide credits: Berkeley RAD Lab
Elasticity in a Multitenant DB [Figure: load balancer in front of the application/web/caching tier and the database tier]
Live Database Migration • Migrate a tenant's database in a live system • A critical operation to support elasticity • Different from • Migration between software versions • Migration in case of schema evolution
VM Migration for DB Elasticity • VM migration [Clark et al., NSDI 2005] • One tenant per VM • Pros: allows fine-grained load balancing • Cons: performance overhead; poor consolidation ratio [Curino et al., CIDR 2011] • Multiple tenants in a VM • Pros: good performance • Cons: migrates all tenants; coarse-grained load balancing
Problem Formulation • Multiple tenants share the same database process • Shared process multitenancy • Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more • Migrate individual tenants • VM migration cannot be used for fine-grained migration • Target architecture: shared nothing • Shared storage architectures: see our VLDB 2011 paper
Shared nothing architecture
Why is Live Migration hard? • How to ensure no downtime? • Need to migrate the persistent database image (tens of MBs to GBs) • How to guarantee correctness during failures? • Nodes can fail during migration • How to ensure transaction atomicity and durability? • How to recover migration state after failure? • Nodes recover after a failure • How to guarantee serializability? • Transaction correctness equivalent to normal operation • How to minimize migration cost? …
Migration Cost Metrics • Downtime • Time tenant is unavailable • Service Interruption • Number of operations failing/transactions aborting • Migration Overhead/Performance impact • During normal operation, migration, and after migration • Additional Data Transferred • Data transferred in addition to DB's persistent image
How did we do it? • Migration executed in phases • Starts with transfer of minimal information to the destination (“wireframe”) • Source and destination concurrently execute transactions in one migration phase • Database pages used as the granule of migration • Pages “pulled” by the destination on demand (see the sketch below) • Minimal transaction synchronization • A page is uniquely owned by either source or destination • Leverage page-level locking • Logging and handshaking protocols to tolerate failures
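The on-demand pull with unique page ownership can be sketched in Java (the prototype's language). This is a minimal illustration, not the authors' code: the names MigratingPageStore, RemoteSource, and pullPage are hypothetical, and transaction locking and error handling are elided.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of Zephyr-style unique page ownership: every page is
// owned by exactly one node, and the destination pulls an un-owned page from
// the source the first time a transaction touches it.
class MigratingPageStore {
    enum Owner { SOURCE, DESTINATION }

    private final Map<Integer, Owner> ownership = new ConcurrentHashMap<>();
    private final RemoteSource source; // assumed RPC stub to the source node

    MigratingPageStore(RemoteSource source) { this.source = source; }

    // Called at the destination when a transaction accesses a page.
    byte[] readPage(int pageId) {
        // Pull on demand; the computation blocks until the source gives up
        // the page, so ownership is transferred exactly once.
        ownership.computeIfAbsent(pageId, id -> {
            installLocally(id, source.pullPage(id));
            return Owner.DESTINATION;
        });
        return readLocally(pageId);
    }

    private byte[] readLocally(int pageId) { return new byte[0]; /* local page read */ }
    private void installLocally(int pageId, byte[] data) { /* install pulled page */ }
}

interface RemoteSource { byte[] pullPage(int pageId); }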
Simplifying Assumptions • For this talk • Small tenants • i.e. not sharded across nodes • No replication • No structural changes to indices • Extensions in the paper • Relax these assumptions
Design Overview [Figure: source owns all pages P1, P2, P3, …, Pn and runs active transactions TS1,…,TSk; destination owns nothing. Legend: page owned by node vs. page not owned by node]
Init Mode • Freeze the index wireframe and migrate it [Figure: wireframe copied to the destination; source still owns pages P1,…,Pn, destination's pages are un-owned; transactions TS1,…,TSk remain active at the source]
What is an index wireframe? [Figure: the index structure at source and destination, without the database pages it points to]
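As a rough illustration of the wireframe idea, the sketch below (an assumed structure, not H2's actual B-tree classes) shows what gets copied in Init mode: the index's routing structure and page ids, but not the page contents.

// Hypothetical sketch: a wireframe is the index structure stripped of data.
// Internal-node keys and child page ids are copied to the destination; the
// pages those ids point to stay at the source until pulled or pushed.
class IndexWireframe {
    static class Node {
        long[] keys;          // routing keys, shipped to the destination
        int[] childPageIds;   // page ids only; page contents are not shipped
        boolean[] childOwned; // sentinel per child: is the page owned locally?
    }
    Node root;
    boolean frozen = true; // no structural changes while migration is in progress
}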
Dual Mode • Requests for un-owned pages can block • Example: P3 accessed by TDi is pulled from the source • Old, still-active transactions TSk+1,…,TSl run at the source; new transactions TD1,…,TDm run at the destination • Index wireframes remain frozen [Figure: ownership of P3 transferred from source to destination]
Finish Mode • Remaining pages P1, P2, … are pushed from the source; pages can still be pulled by the destination if needed [Figure: transactions at the source completed; destination runs TDm+1,…,TDn]
Normal Operation • Index wireframe un-frozen [Figure: destination owns all pages P1,…,Pn and runs transactions TDn+1,…,TDp]
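The modes on the preceding slides form a simple linear progression. The enum below is an illustrative sketch of that state machine (mode names follow the slides; making each transition atomic is deferred to the logging and handshake protocols discussed later).

// Sketch of the migration-mode progression: Init -> Dual -> Finish, then
// back to normal operation at the destination.
enum MigrationMode {
    NORMAL, // no migration in progress
    INIT,   // wireframe frozen and copied; source still owns and serves all pages
    DUAL,   // both nodes execute transactions; destination pulls pages on demand
    FINISH; // source has finished; remaining pages are pushed to the destination

    MigrationMode next() {
        switch (this) {
            case NORMAL: return INIT;
            case INIT:   return DUAL;
            case DUAL:   return FINISH;
            default:     return NORMAL; // FINISH -> normal operation at destination
        }
    }
}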
Artifacts of this design • Once migrated, pages are never pulled back by the source • Transactions at the source accessing migrated pages are aborted • No structural changes to indices during migration • Transactions (at both nodes) that make structural changes to indices abort • Destination “pulls” pages on demand • Transactions at the destination experience higher latency compared to normal operation
Serializability (proofs in paper) • Only concern is “dual mode” • Init and Finish: only one node is executing transactions • Local predicate locking of the internal index and exclusive page-level locking between nodes ⇒ no phantoms • Strict 2PL ⇒ transactions are locally serializable • Pages transferred only once ⇒ no Tdest → Tsource conflict dependency ⇒ guaranteed serializability
Recovery (proofs in paper) • Transaction recovery • For every database page, transactions at source ordered before transactions at destination • After failure, conflicting transactions replayed in the same order • Migration recovery • Atomic transitions between migration modes • Logging and handshake protocols • Every page has exactly one owner • Bookkeeping at the index level
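One way atomic mode transitions could be realized, sketched under the assumption of a standard write-ahead log (this builds on the MigrationMode enum sketched earlier; MigrationLog and Peer are hypothetical): log the intent to transition, handshake with the other node, then log the commit. After a crash, replaying the log recovers the last agreed mode.

// Hypothetical sketch of a logged, handshaked mode transition.
class MigrationCoordinator {
    private final MigrationLog log;
    private final Peer peer; // assumed messaging stub to the other node
    private MigrationMode mode = MigrationMode.NORMAL;

    MigrationCoordinator(MigrationLog log, Peer peer) { this.log = log; this.peer = peer; }

    void advance() {
        MigrationMode next = mode.next();
        log.transitionIntent(mode, next); // forced to the WAL before any visible change
        peer.acknowledge(next);           // handshake: both nodes agree on the new mode
        log.transitionCommit(next);       // transition is now durable on this node
        mode = next;
    }
}

interface MigrationLog {
    void transitionIntent(MigrationMode from, MigrationMode to);
    void transitionCommit(MigrationMode to);
}

interface Peer { void acknowledge(MigrationMode mode); }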
Correctness (proofs in paper) • In the presence of arbitrary repeated failures, Zephyr ensures: • Updates made to database pages are consistent • A failure does not leave a page without an owner • Both source and destination are in the same migration mode • Guaranteed termination and starvation freedom
Extensions (Details in the paper) • Replicated Tenants • Sharded Tenants • Allow structural changes to the indices • Using shared lock managers in the dual mode
Implementation • Prototyped using H2, an open-source OLTP database • Supports standard SQL/JDBC API • Serializable isolation level • Tree indices • Relational data model • Modified the database engine • Added support for freezing indices • Page migration status maintained using the index • Details in the paper… • Tungsten SQL Router migrates JDBC connections during migration
Experimental Setup • Two database nodes, each with a DB instance running • Synthetic benchmark as load generator • Modified YCSB to add transactions • Small read/write transactions • Compared against Stop and Copy (S&C)
Experimental Methodology • Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts • Workload: 60 sessions, 100 transactions per session [Figure: system controller uses metadata to initiate Migrate] • Hardware: 2.4 GHz Intel Core 2 Quad, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache • Gigabit Ethernet • Default DB size: 100k rows (~250 MB)
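The default mix translates directly into a tiny workload driver. The sketch below is illustrative only, with placeholder operations standing in for the modified-YCSB calls: one transaction of 10 operations drawn as 80% reads, 15% updates, 5% inserts.

import java.util.Random;

// Sketch of the default transaction mix used in the experiments.
class WorkloadGenerator {
    private final Random rng = new Random();

    void runTransaction() {
        for (int i = 0; i < 10; i++) {   // 10 operations per transaction
            double p = rng.nextDouble();
            if (p < 0.80)      read();   // 80% reads
            else if (p < 0.95) update(); // 15% updates
            else               insert(); // 5% inserts
        }
    }

    private void read()   { /* SELECT by primary key */ }
    private void update() { /* UPDATE by primary key */ }
    private void insert() { /* INSERT a new row */ }
}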
Results Overview • Downtime (tenant unavailability) • S&C: 3–8 seconds (unavailable for updates while migrating) • Zephyr: no downtime; either source or destination is always available • Service interruption (failed operations) • S&C: hundreds to thousands of operations; all transactions with updates are aborted • Zephyr: tens to hundreds of operations; orders of magnitude less interruption
Results Overview • Average increase in transaction latency (compared to the 6,000-transaction workload without migration) • S&C: 10–15% (cold cache at the destination) • Zephyr: 10–20% (pages fetched on demand) • Data transfer • S&C: persistent database image • Zephyr: 2–3% additional data transfer (messaging overhead) • Total time taken to migrate • S&C: 3–8 seconds, unavailable for any writes • Zephyr: 10–18 seconds, no unavailability
Failed Operations [Figure: orders of magnitude fewer failed operations with Zephyr]
Contributions • Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures • The first end-to-end solution with safety, correctness, and liveness guarantees • Prototype implementation on a relational OLTP database • Low cost on a variety of workloads
More Details (backup slides) [Figures: step-by-step view of transactions at source and destination — freeze indexes; duplicate indexes with sentinels; dual mode; finish mode]
Guarantees • Either source or destination is serving the tenant • No downtime • Serializable transaction execution • Unique page ownership • Local multi-granularity locking • Safety in the presence of failures • Transactions are atomic and durable • Migration state is recovered from the log • Ensure consistency of the database state
Migration Cost Analysis • Wireframe copy • Typically orders of magnitude smaller than data • Operational overhead during migration • Extra data (in addition to database pages) transferred • Transactions aborted during migration
Effect of Inserts on Zephyr [Figure: failures due to attempted modifications of the index structure]
Average Transaction Latency [Figure: only committed transactions reported; loss of cache for both migration types; Zephyr additionally incurs remote fetches]