
Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems

Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems. Taha Rafiq, MMath Thesis Presentation, 24/04/2013. Outline: Introduction & Motivation; VoltDB & Elastic Scale-Out Mechanism; Partition Placement Problem; Workload-Aware Optimizer; Experiments & Results.


Presentation Transcript


  1. Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems • Taha Rafiq • MMath Thesis Presentation • 24/04/2013

  2. Outline • Introduction & Motivation • VoltDB & Elastic Scale-Out Mechanism • Partition Placement Problem • Workload-Aware Optimizer • Experiments & Results • Supporting Multi-Partition Transactions • Conclusion

  3. Introduction & Motivation

  4. DBMS Scalability • Replication • Partitioning

  5. Traditional (DBMS) Scalability • Expensive Downtime • Ability of a system to be enlarged to handle a growing amount of work

  6. Elastic (DBMS) Scalability • No Downtime • Use of computer resources which vary dynamically to meet a variable workload

  7. Elastically Scaling a Partition Based DBMS: Re-Partitioning (Diagram: on scale out, Partition 1 on Node 1 is split into Partition 1 on Node 1 and Partition 2 on Node 2; scale in is the reverse.)

  8. Elastically Scaling a Partition Based DBMS: Partition Migration (Diagram: on scale out, partitions P3 and P4 move from Node 1 to Node 2 while P1 and P2 stay on Node 1; scale in is the reverse.)

  9. Partition Migration for Elastic Scalability • Mechanism: How to add/remove nodes and move partitions • Policy/Strategy: Which partitions to move, when, and where during scale out/scale in

  10. Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer

  11. VoltDB & Elastic Scale-Out Mechanism

  12. What is VoltDB? • In memory, partition based DBMS • No disk access = very fast • Shared nothing architecture, serial execution • No locks • Stored procedures • No arbitrary transactions • Replication • Fault tolerance & durability

  13. VoltDB Architecture (Diagram: three nodes, each with two of the partitions P1 to P3, execution sites ES1 and ES2, initiator threads, and a client interface; four clients connect to the client interfaces.)

  14. Single-Partition Transactions (Diagram: same layout of partitions P1 to P3, execution sites, initiators, client interfaces, and clients.)

  15. Multi-Partition Transactions (Diagram: same layout of partitions P1 to P3, execution sites, initiators, client interfaces, and clients.)

  16. Elastic Scale-Out Mechanism (Diagram: a scale-out node labelled "Failed", with partitions P1 to P4, execution sites ES1 to ES4, initiators, and client interfaces.)

  17. Overcommitting Cores • VoltDB suggests: Partitions per node < Cores per node • Wasted resources when load is low or data access is skewed • Idea: Aggregate extra partitions on each node and scale out when load increases

  18. Partition Placement Problem

  19. Given… Cluster and System Specifications • Max. Number of Nodes • Number of CPU cores • Memory

  20. Given…

  21. Given…

  22. Given… Current Partition-to-Node Assignment

  23. Find… Optimal Partition-to-Node Assignment (For Next Time Interval)

  24. Optimization Objectives • Maximize Throughput: Match the performance of a static, fully provisioned system • Minimize Resources Used: Use the minimum number of nodes required to meet performance demands

  25. Optimization Objectives • Minimize Data Movement: Data movement adversely affects system performance and incurs network costs • Balance Load Effectively: Minimizes the risk of overloading a node during the next time interval

  26. Workload-Aware Optimizer

  27. System Overview

  28. Statistics Collected • α: Maximum number of transactions that can be executed on a partition per second • Max capacity of Execution Sites • β: CPU overhead of host-level tasks • How much CPU capacity the Initiator uses

  29. Effect of β

  30. Estimating CPU Load • CPU Load Generated by Each Partition • Average CPU Load of Host-Level Tasks Per Node • Average CPU Load Per Node
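
The formulas for this slide did not survive in the transcript. As a rough sketch only, assuming a partition's CPU load is its transaction rate divided by α and that host-level tasks cost a fixed fraction β of one core per transaction per second (both are assumptions, not the thesis's stated model), the three quantities could be combined like this. The function and parameter names are hypothetical.

```python
def partition_load(txn_rate, alpha):
    """Fraction of one core used by a partition (assumed: rate / max rate)."""
    return txn_rate / alpha


def node_load(partition_rates, alpha, beta, cores):
    """Average CPU load of a node as a fraction of its total core capacity.

    partition_rates: transactions/sec of each partition hosted on the node.
    alpha: max transactions/sec one partition can execute.
    beta: ASSUMED fraction of one core spent on host-level tasks per
          transaction/sec handled by the node.
    cores: number of CPU cores on the node.
    """
    exec_load = sum(partition_load(r, alpha) for r in partition_rates)
    host_load = beta * sum(partition_rates)
    return (exec_load + host_load) / cores
```

For example, two partitions at 500 and 250 transactions/sec with α = 1000 and β = 0.0002 on a 4-core node use 0.75 cores for execution and 0.15 cores for host-level tasks.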

  31. Optimizer Details • Mathematical Optimization vs. Heuristics • Mixed-Integer Linear Programming (MILP) • Can be solved using any general-purpose solver (we use IBM ILOG CPLEX) • Applicable to a wide variety of scenarios

  32. Objective Function Minimizes data movement as primary objective and balances load as secondary objective
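
The objective function itself is not in the transcript. One way to read "primary" and "secondary" is a weighted sum in which a small ε scales the load-balance term so that it only breaks ties between placements with equal data movement. The sketch below evaluates such a cost for a candidate placement; the imbalance measure (maximum minus minimum node load) and all names are assumptions made for illustration.

```python
def placement_cost(current, proposed, size_mb, load, nodes, eps):
    """Cost of a candidate placement: data moved + eps * load imbalance.

    current, proposed: {partition: node}
    size_mb: {partition: size in MB}, load: {partition: CPU load}
    nodes: all nodes available to the placement
    eps: small weight that keeps load balance a secondary objective
    """
    # Primary term: total size of partitions whose node changes.
    moved = sum(size_mb[p] for p, n in proposed.items() if current.get(p) != n)

    # Secondary term: spread between the most and least loaded node.
    node_load = {n: 0.0 for n in nodes}
    for p, n in proposed.items():
        node_load[n] += load[p]
    imbalance = max(node_load.values()) - min(node_load.values())

    return moved + eps * imbalance
```

With a small ε, moving a 200 MB partition to improve balance costs far more than it saves, so the placement stays put unless a constraint forces the move.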

  33. Effect of ε

  34. Minimizing Resources Used • Calculate the minimum number of nodes that can handle the load of all the partitions • Non-integer assignment • Explicitly tell optimizer how many nodes to use • If optimizer can’t find solution with minimum nodes, it tries again with N + 1 nodes
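
The retry loop described on this slide can be sketched as follows. The lower bound treats partition load as divisible (the "non-integer assignment"), so it may be too optimistic; the solver is then asked for a placement with exactly N nodes and N is incremented on failure. `solve` is a hypothetical callback standing in for the MILP solver call, and `node_capacity` and `max_nodes` are assumed inputs.

```python
import math


def min_nodes_lower_bound(partition_loads, node_capacity):
    """Fewest nodes that could carry the total load if load were divisible."""
    return max(1, math.ceil(sum(partition_loads) / node_capacity))


def choose_node_count(partition_loads, node_capacity, max_nodes, solve):
    """Try N, N + 1, ... nodes until solve(n) returns a placement.

    solve(n) must return a placement, or None when no feasible placement
    exists with n nodes. Returns (n, placement), or None if even
    max_nodes is not enough.
    """
    n = min_nodes_lower_bound(partition_loads, node_capacity)
    while n <= max_nodes:
        placement = solve(n)
        if placement is not None:
            return n, placement
        n += 1
    return None
```

Three partitions with load 0.6 each give a lower bound of two unit-capacity nodes, but no two partitions fit together, so the loop would settle on three.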

  35. Constraints • Replication: Replicas of a given partition must be assigned to different nodes • CPU Capacity: Sum of the load of partitions must be less than capacity of node • Memory Capacity: All the partitions assigned to a node must fit in its memory • Host-Level Tasks: The overhead of host-level tasks must not exceed capacity of single core

  36. Staggering Scale In • Fluctuating workload can result in excessive data movement • Staggering scale in mitigates this problem • Delay scaling in by s time steps • Slightly higher resources used to provide stability
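
To make the four constraints concrete, here is a small checker that validates a candidate placement outside of any solver. It is a sketch under assumptions: partition load is taken as rate / α, host-level overhead as β times the node's total rate, and node CPU capacity as its core count. None of these formulas or names come from the slides.

```python
from collections import defaultdict


def check_placement(assignment, rate, mem_mb, alpha, beta, cores, node_mem_mb):
    """Return the sorted names of violated constraints (empty if feasible).

    assignment: {(partition, replica_index): node}
    rate: {partition: transactions/sec}, mem_mb: {partition: size in MB}
    """
    violations = set()
    hosts = defaultdict(set)    # partition -> nodes already holding a replica
    placed = defaultdict(list)  # node -> partitions (one entry per replica)
    for (part, _), node in assignment.items():
        if node in hosts[part]:
            violations.add("replication")
        hosts[part].add(node)
        placed[node].append(part)

    for parts in placed.values():
        exec_load = sum(rate[p] / alpha for p in parts)  # assumed load model
        host_load = beta * sum(rate[p] for p in parts)   # assumed overhead
        if exec_load >= cores:
            violations.add("cpu")
        if sum(mem_mb[p] for p in parts) > node_mem_mb:
            violations.add("memory")
        if host_load > 1.0:  # host-level tasks limited to a single core
            violations.add("host-level tasks")
    return sorted(violations)
```

In a MILP these checks would be expressed as linear inequalities over binary assignment variables; the function above is only a way to test a placement after the fact.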
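
One simple way to realise a delayed scale in, shown here purely as an illustration, is to provision the maximum node count requested over the last s + 1 intervals: scale out takes effect immediately, while a drop in demand must persist for s intervals before nodes are released. The transcript does not say how the thesis implements the delay, so this rule and the name `staggered_node_counts` are assumptions.

```python
def staggered_node_counts(required, s):
    """Node counts actually used when scale in is delayed by s time steps.

    required: node count requested by the optimizer for each interval.
    s: number of intervals a lower requirement must persist before
       the system scales in (s = 0 disables staggering).
    """
    used = []
    for t in range(len(required)):
        # Look back over the current interval and the s preceding ones.
        window = required[max(0, t - s): t + 1]
        used.append(max(window))
    return used
```

For a requirement sequence of 2, 4, 2, 2, 2, 4 nodes and s = 2, the system holds four nodes through the brief dip and only scales in once the lower demand has lasted two intervals.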

  37. Experimental Evaluation

  38. Optimizers Evaluated • ELASCA: Our workload-aware optimizer • ELASCA-S: ELASCA with staggered scale in • OFFLINE: Offline optimizer that minimizes resources used and data movement • GREEDY: A greedy first-fit optimizer • SCO: Static, fully provisioned system (no optimization)
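
The slide gives no detail on the GREEDY baseline beyond "first-fit". For readers unfamiliar with the heuristic, a generic first-fit placement looks like the sketch below: each partition goes to the first node with enough spare capacity, and a new node is opened only when none fits. This is the textbook heuristic, not necessarily the variant evaluated in the thesis, and the names are hypothetical.

```python
def first_fit(partition_loads, node_capacity, max_nodes):
    """Place partitions on nodes with the first-fit heuristic.

    partition_loads: {partition: CPU load}, visited in insertion order.
    Returns {partition: node index}. Raises ValueError if the partitions
    do not fit on max_nodes nodes.
    """
    node_loads = []  # current load of each opened node
    placement = {}
    for part, load in partition_loads.items():
        if load > node_capacity:
            raise ValueError(f"{part} exceeds the capacity of a single node")
        for i, used in enumerate(node_loads):
            if used + load <= node_capacity:
                node_loads[i] += load
                placement[part] = i
                break
        else:
            # No opened node has room: open a new one if allowed.
            if len(node_loads) == max_nodes:
                raise ValueError("not enough nodes for first-fit placement")
            node_loads.append(load)
            placement[part] = len(node_loads) - 1
    return placement
```

First-fit ignores where partitions currently live, which is one reason a heuristic of this kind can cause more data movement than an optimizer that penalises moves.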

  39. Benchmarks Used • TPC-C: Modified to make it cleanly partitioned and fit in memory (3.6 GB) • TATP: Telecommunication Application Transaction Processing Benchmark (250 MB) • YCSB: Yahoo! Cloud Serving Benchmark with 50/50 read/write ratio (1 GB)

  40. Dynamic Workloads • Varying the aggregate request rate • Periodic waveforms • Sine, Triangle, Sawtooth • Skewing the data access • Temporal skew • Statistical distributions • Uniform, Normal, Categorical, Zipfian
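
The workload shapes listed above can be generated with a few lines of code. In this sketch, `t` is the fraction of the experiment elapsed (0 to 1) and `f` is taken to be the number of periods per experiment, which is an assumed reading of the "f = 1", "f = 4" labels on later slides. Each waveform returns a value in [0, 1] to be scaled to a request rate; `zipf_weights` gives per-partition access probabilities for a Zipfian skew with an assumed exponent parameter `s`.

```python
import math


def sine(t, f):
    """Sine wave in [0, 1] with f periods over the experiment."""
    return 0.5 * (1 + math.sin(2 * math.pi * f * t))


def triangle(t, f):
    """Triangle wave in [0, 1]: linear ramp up, then linear ramp down."""
    phase = (f * t) % 1.0
    return 2 * phase if phase < 0.5 else 2 * (1 - phase)


def sawtooth(t, f):
    """Sawtooth wave in [0, 1): linear ramp up, then an abrupt drop."""
    return (f * t) % 1.0


def zipf_weights(n, s=1.0):
    """Access probability of each of n partitions under a Zipfian skew."""
    raw = [1 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]
```

Multiplying a waveform value by a peak request rate and splitting it across partitions with `zipf_weights` yields a per-partition rate for each time interval.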

  41. Temporal Skew

  42. Experimental Setup • Each experiment run for 1 hour • 15 time intervals • Optimizer run every four minutes • Combination of simulation and actual runs • Exact numbers for data movement, resources used and load balance through simulation • Cluster has 4 nodes, 2 separate client machines

  43. Data Movement (TPC-C) Triangle Wave (f = 1)

  44. Data Movement (TPC-C) Triangle Wave (f = 1), Zipfian Skew

  45. Data Movement (TPC-C) Triangle Wave (f = 4)

  46. Computing Resources Saved (TPC-C) Triangle Wave (f = 1)

  47. Load Balance (TPC-C) Triangle Wave (f = 1)

  48. Database Throughput (TPC-C) Sine Wave (f = 2)

  49. Database Throughput (TPC-C) Sine Wave (f = 2), Normal Skew

  50. Database Throughput (TATP) Sine Wave (f = 2)
