1 / 42

DOLLY: Virtualization-Driven Database Provisioning for the Cloud

DOLLY: Virtualization-Driven Database Provisioning for the Cloud. Emmanuel Cecchet Joint work with Rahul Singh, Upendra Sharma and Prashant Shenoy. THE CLOUD. Virtualization Pay as you go Elasticity. Internet. Frontend/ Load balancer. App. Servers. Databases.

eithne
Download Presentation

DOLLY: Virtualization-Driven Database Provisioning for the Cloud

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DOLLY: Virtualization-Driven Database Provisioning for the Cloud Emmanuel Cecchet Joint work with Rahul Singh, Upendra Sharma and PrashantShenoy

  2. THE CLOUD • Virtualization • Pay as you go • Elasticity Internet Frontend/ Load balancer App. Servers Databases

  3. PROVISIONING IN THE CLOUD • Based on request volume and resource usage • Reactions based on thresholds • Works for stateless tiers Internet Frontend/ Load balancer App. Servers Provisioning logic Databases

  4. WHY IS IT HARD TO ADD A DB REPLICA? 5pm Replay updates Replica ready New replica MySQL backup MySQL restore snapshot

  5. WHY IS IT HARD TO ADD A DB REPLICA? 5pm 2pm Replay updates 2pm snapshot New replica Replica ready MySQL restore

  6. WHY IS IT HARD TO ADD A DB REPLICA? snapshot snapshot RUBiS New replica RUBiS x users 14min MySQL backup MySQL restore Queries are slow… Let’s improve this! CREATE INDEX i1 on Table1 ; CREATE INDEX i2 on Table 2 RUBiS New replica RUBiS x users with indices 1H36min MySQL backup MySQL restore

  7. WHY IS IT HARD TO ADD A DB REPLICA? snapshot snapshot Warehouse 1GB Warehouse 10GB Warehouse 1GB Warehouse 10GB 24min 1H30min PostgreSQL backup PostgreSQL backup PostgresSQL restore PostgresSQL restore • apt-get update postgresql • echo 1 > /proc/sys/magic_options • CREATE USER x • GRANT PRIVILEGES TO y

  8. What are the main problems ? • When to start replica spawning? • How to predict replica spawning time? • How to make replica spawning platform independent? • When to generate new snapshots? • How can we minimize resource usage? • Power/cooling in private cloud • $ cost in public cloud

  9. VM Cloning: Backup/Restore in constant time • Filesystem snapshot/copy is OS & DB agnostic • Only depends on VM size

  10. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  11. SPAWNING A REPLICA WITH CLONING Backup & Restore replace by VM cloning VM3 VM3 DB2 DB2 OS OS 4 add replica resynchronize 1 VM1 VM2 2 3 clone clone OS OS Client SQL requests Management console Replication middleware Transactional log Load balancer DB1 DB2 3

  12. SPAWNING IN A PRIVATE CLOUD Clone entire virtual machine for backup/restore Backup server is optional VM3 VM4 VM3 VM4 DB2 DB2 DB2 DB2 OS OS OS OS VM2 DB1 DB2 B clone stop OS 1 resume VM1 VM1 start start OS OS 3 2 DB1 1 clone 3 2

  13. SPAWNING IN A PUBLIC CLOUD Storage decoupled from computing resource Starting a new instance clones the volume Vol2 Vol4 Vol2 Vol3 DB2 DB2 DB2 DB2 DB1 DB1 OS OS OS OS DB1 snapshot stop register start restart Vol1 Vol1 Vol1 OS OS OS

  14. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  15. MODELING SPAWNING TIME Predictable backup and restore times are required Replay time can be estimated from write throughput wt: current workload write throughput wmax: replay speed of the spawning replica replica spawning time backup restore replay time ri bi updates

  16. WHEN TO SNAPSHOT? Time to spawn from a live replica Time to spawn from an existing snapshot Faster to take a new snapshot j to spawn a new replica than using old snapshot i if:backupj+restorej< restorei+replayi

  17. DOLLY OVERVIEW Input capacity prediction write prediction Output schedule of snapshots schedule of replica spawning admission control if needed Dolly write predictions capacity predictions Snapshot scheduler Capacity Provisioning HA adjuster Spawning options Write throttling Paused pool cleaner Scheduler start/stop clone/ snapshot write throttling/ read throttling delete VM/ snapshot Predictors Admission Control Management API Monitoring Free pool Manager reclaim

  18. PROVISIONING REPLICAS • Dolly does not provide predictors • Dolly can work with any predictor (see [Eurosys09]) Workload prediction Capacity prediction Write prediction

  19. CLOUD COST FUNCTIONS Adapt the provisioning decisions to the cloud platform specifics Cost can be $ on public cloud or time on private cloud

  20. PROVISIONING REPLICAS Parse capacity provisioning predictions Decrease capacity by pausing VMs Increasing capacity Check if we can reuse a paused VM Check if we can spawn from an existing snapshot Choose cheapest options according to spawn_costfunction Perform admission control if all replicas cannot be provisioned in time

  21. SNAPSHOT SCHEDULING How to snapshot? Clone a paused VM Pause an active VM to clone it When to snapshot? At time j whenbackupj+restorej<restorei+replayi If new snapshot is scheduled, re-run capacity provisioning Prediction window must have minimum size

  22. Dolly • Database replication in the Cloud • Provisioning with Dolly • Prototype & Evaluation

  23. IMPLEMENTATION C-JDBC/Sequoia replication middleware OpenNebula Cloud management middleware Cost functions private cloud: minimize resource utilization time Amazon EC2: minimize cost VM4 VMclone VM5 VM3 VM2 VM1 OS OS OS OS OS OS Dolly Private EC2 Recovery Log Dump table Log table DB3snapshot New replica New replica admission control TPC-W load injector predictions Sequoia driver write throttling SQL requests add/remove replica snapshot/pause/… Sequoia controller JMX Management API Scheduler Backupers Dolly OpenNebula Loadbalancer start/stop/ clone/… DB1 DB2 DB3 OpenNebula clone clone Backup server or NAS

  24. IMPLEMENTATION – COST FUNCTIONS Private cloud: minimize resource utilization Amazon EC2: minimize cost

  25. TPC-W EVALUATION Multi-tier online bookstore benchmark 4GB Xen VM for the database Large EC2 instances from EBS volumes with CloudWatch

  26. WORKLOAD DESCRIPTION Snapshot s0 available at t0

  27. Overprovisioning with 6 replicas – 1h snapshot

  28. Reactive provisioning replicas available replica spawning triggered here

  29. Reactive provisioning – 15m snapshot

  30. Dolly – 30m Prediction Window Private Cloud Amazon EC2 s2 s1 cheaper to leave instances online

  31. VM cloning Solves administration issues by blackboxing the database Constant time backup/restore needed to predict replica spawning time New provisioning algorithm Decouples capacity provisioning from snapshot scheduling Cost functions to optimize for cloud platform specifics CONCLUSION

  32. Bonus Slides

  33. Dolly – 10m Prediction Window Private Cloud Amazon EC2 s1 s1 s2 s2

  34. Reactive provisioning – 1h snapshot

  35. BACKUP/RESTORE TECHNIQUES Database native tools Vendor specific or 3rd party ETL Understand database semantics Filesystem copy Low-level data copy Need to know what to copy VM cloning Copies database content + configuration + OS Unused space can be compressed

  36. DATABASE SIZES

  37. BACKUP/RESTORE PERFORMANCE (1/3) Performance depends on database content

  38. BACKUP/RESTORE PERFORMANCE (2/3) • File copy is the most effective for small databases

  39. BACKUP/RESTORE PERFORMANCE (3/3) • VM cloning most effective on large databases

  40. BACKUP/RESTORE SUMMARY

  41. DOLLY MAIN ALGORITHM Capacity provisioning depends on available snapshots Snapshots scheduled according to capacity demand Decouple capacity provisioning from snapshot scheduling if (predictor.capacity_changes || predictor.write_workload_changes) { do { schedule = capacity_provisioning(predictions) snapshot_schedule = snapshot_scheduling(predictions) } while (snapshot_schedule schedules new snapshots) scheduler.schedule(snapshot_schedule) scheduler.schedule(capacity_schedule) } if (time since last operation > threshold) { paused_pool_cleaner.release_old_paused_vms(); paused_pool_cleaner.delete_old_snapshots(); }

  42. RELEASING RESOURCES Paused VMs VM never re-used if cost to resume > cost to spawn from last snapshot Snapshots Old snapshots can be released based on cost to keep them around Free server pool Can reclaim servers with paused VMs when pool is empty

More Related