Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA
Ioannis Konstantinou
CSLAB, National Technical University of Athens, Greece
Motivation – the story (1)
• ‘Big data’ calls for highly scalable distributed solutions
• (Web) analytics, science, business
• Store + analyze everything, no matter the size
• Traditional databases are not up to the task
• Applications suffer from highly variable and unpredictable workloads
• Social networks, internet gaming, web sites, etc.
• Over-provisioning is costly, under-provisioning leads to outages
• Cloud computing can help!
• Elastic, pay-as-you-go resource provisioning
• Suitable for distributed scalable applications
• Can we take advantage of elasticity in an automated, fine-tuned, application-agnostic manner?
Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Motivation – the story (2)
• NoSQL
• Non-relational
• Horizontally scalable
• Distributed
• Open source
• And often:
• Schema-free, easily replicated, simple API, eventually consistent (not ACID), big-data-friendly, etc.
• Many, many implementations…
• Currently around 150; see http://nosql-database.org/
NoSQLs and elasticity
• Column family
• HBase, Cassandra, …
• Document store
• CouchDB, MongoDB, …
• Key-value store
• Riak, Dynamo, Voldemort, …
• Many offer elasticity + sharding:
• Expand/contract resources according to demand
• Shared-nothing architecture allows that
• Pay-as-you-go, robustness, performance
• Manual and simple threshold-based elastic actions are suboptimal
• Important! See the April 2011 Amazon outage (Foursquare, Reddit, …)
Thus… (end of the story)
• PaaS and NoSQLs are (or should be) inherently elastic
• How efficiently do they implement elasticity?
• NoSQLs over an IaaS platform
• EC2, Eucalyptus, OpenStack, …
• We need a study that records both qualitative and quantitative results
• Can we automate the elasticity procedure?
• For any NoSQL, any user-given policy, any metric of interest?
• Can we bundle everything together?
• The Tiramola system
Contributions
• Tiramola, a generic modular system able to manage
• Any NoSQL engine
• User-defined policies
• Automatic resource provisioning using IaaS clouds
• Module for NoSQL cluster monitoring
• Real-time reporting of client, application and general-purpose metrics
• Decision-making module for automatic elasticity
• Implementation as a Markov Decision Process
• Continuously identifies the best scaling action according to the observed system state
Contribution side-effects
• Coding + infrastructure
• Open-source Python code (GFOSS + Google Code)
• http://tiramola.googlecode.com
• Uses cloud-based client tools, platform-agnostic
• Euca2ools enable execution on numerous cloud platforms
• YCSB clients
• Cassandra and HBase implementations
• Voldemort and Riak: almost done
Related work
• Elastic Nimbus [13], AutoScale [14], SmartSLA [15], iBalloon [17], Kingfisher [18]
• Do not handle NoSQL systems
• Do not address dynamic cluster resizes
• [19] elastic HDFS
• Specific HDFS implementation
• No fine-grained policy support, no support for auto-learning
• Cloudy [20], Nefeli [21], Microeconomics [23]
• [20] provides support for scaling only
• [21] requires software installation at the cloud vendor
• [22] classic microeconomics for profit maximization, no support for fine-grained policy definitions
• Systems like Autoscaling [24], RightScale [26], Scalr [27]
• Commercial: vendor lock-in
• Limited metric support, primitive decision-making policies
Tiramola architecture overview
Tiramola monitoring module
• Ganglia tool
• Suitable for clusters
• Easy to install/maintain
• Low overhead (UDP)
• Metrics
• General-purpose, like CPU and RAM
• App-specific and user metrics: gmetric spoofing
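The "gmetric spoofing" bullet can be illustrated with a minimal sketch. It assumes Ganglia's `gmetric` command-line tool is installed on the reporting host; the metric name, IP address and host name below are hypothetical placeholders, and this is not claimed to be how Tiramola itself invokes gmetric. The function only builds the command so it can be inspected before anything is sent.

```python
import shlex


def build_gmetric_command(name, value, units, spoof_ip, spoof_host,
                          metric_type="float"):
    """Return the argv list for a spoofed gmetric call (nothing is executed).

    --spoof takes "IP:hostname", so the metric is recorded as if it had been
    reported by that host rather than by the machine running gmetric.
    """
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
        "--units", units,
        "--spoof", f"{spoof_ip}:{spoof_host}",
    ]


# Hypothetical example: a YCSB client latency published on behalf of client-01.
cmd = build_gmetric_command("ycsb_read_latency_ms", 12.5, "ms",
                            "10.0.0.21", "client-01")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would actually publish the value; keeping construction and execution separate makes the command easy to log or test.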
Tiramola cloud management module
• Translates resize actions
• Contacts the IaaS provider
• Acquires or releases cloud resources
• Euca2ools
• EC2-compliant
• Support for any cloud management platform
• Precooked VM images
• Ready-to-launch AMIs
• Contain all necessary NoSQL libraries
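As a rough illustration of "translating resize actions", the sketch below maps an add/remove/no-op decision to a euca2ools command line. It assumes the standard `euca-run-instances` and `euca-terminate-instances` tools; the image ID, key name and instance type are hypothetical placeholders, and the function name is introduced here for illustration only.

```python
def build_euca_command(action, count, image_id=None, instance_ids=(),
                       instance_type="m1.large", key_name=None):
    """Translate a resize action into a euca2ools argv list (not executed)."""
    if action == "no-op":
        return None
    if action == "add":
        if image_id is None:
            raise ValueError("an image ID is required to launch instances")
        cmd = ["euca-run-instances", "-n", str(count), "-t", instance_type]
        if key_name:
            cmd += ["-k", key_name]
        cmd.append(image_id)
        return cmd
    if action == "rem":
        ids = list(instance_ids)[:count]
        if len(ids) < count:
            raise ValueError("not enough running instances to remove")
        return ["euca-terminate-instances", *ids]
    raise ValueError(f"unknown action: {action}")


# Hypothetical example: grow the cluster by two pre-built NoSQL VMs.
add_cmd = build_euca_command("add", 2, image_id="emi-00000000",
                             key_name="tiramola-key")
rem_cmd = build_euca_command("rem", 1, instance_ids=["i-0000000a", "i-0000000b"])
print(add_cmd)
print(rem_cmd)
```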
Tiramola cluster coordinator module
• Operates when cloud management finishes resource (de-)allocation
• Orchestrates the NoSQL cluster when resources change
• Implements NoSQL-specific resizing actions
• Remote execution of shell scripts
• Injection of new config files
• Current support for:
• HBase, Cassandra, Riak
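The two coordinator mechanisms named above (config injection and remote script execution) could look roughly like the following sketch, which assumes plain `scp` and `ssh` access to the new VM. All paths, the host address and the script name are hypothetical; the actual Tiramola scripts are not shown in the slides.

```python
def build_node_setup_commands(host, local_config, remote_config,
                              remote_script, user="root"):
    """Return the commands that configure one freshly allocated node.

    Step 1 copies a generated config file to the node; step 2 runs a
    NoSQL-specific shell script there (e.g. one that starts the daemon).
    """
    target = f"{user}@{host}"
    return [
        ["scp", local_config, f"{target}:{remote_config}"],
        ["ssh", target, "bash", remote_script],
    ]


# Hypothetical example for a new Cassandra node.
steps = build_node_setup_commands(
    host="10.0.0.42",
    local_config="/tmp/cassandra.yaml",
    remote_config="/etc/cassandra/cassandra.yaml",
    remote_script="/opt/tiramola/join_cluster.sh",
)
for step in steps:
    print(" ".join(step))
```

Each command list can be handed to `subprocess.run(step, check=True)` in order, so a failed copy stops the node from being started with a stale configuration.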
Tiramola core: Decision-making module
• Formulation as an MDP: $\{S, A, \{P_{i\alpha j}\}, \gamma, R_{i\alpha j}\}$
• Goal: identify the policy that maximizes the expected sum of rewards
• State = number of running VMs
• Actions = {add-n, rem-n, no-op}
• P: transition probabilities of going from state i to state j (under action α)
• Reward function R: sets the policy for the resize
• Immediate gain for going to state s
• R(s) = f(gains, costs)
• This yields the optimal value function V (Bellman's equation)
• A system of equations; the optimal policy is greedy with respect to V
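To make the MDP formulation concrete, here is a minimal value-iteration sketch. It assumes the common state-reward form of Bellman's equation, $V(s) = R(s) + \gamma \max_a \sum_{s'} P_{sa}(s') V(s')$, which may differ in detail from the equation on the original slide. The toy transition model (deterministic, one VM per step) and the toy reward (gain saturating at 5 VMs, linear VM cost) are assumptions made purely for illustration.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9,
                    tol=1e-6, max_iter=10_000):
    """Solve V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) V(s') iteratively."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {}
        for s in states:
            best = max(
                sum(p * values[s2] for s2, p in transition(s, a).items())
                for a in actions
            )
            new_values[s] = reward(s) + gamma * best
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(states, actions, transition, values):
    """Pick, in each state, the action with the highest expected next value."""
    return {
        s: max(actions,
               key=lambda a: sum(p * values[s2]
                                 for s2, p in transition(s, a).items()))
        for s in states
    }


# Toy model (assumed): states are VM counts 1..8, actions add/remove one VM.
STATES = list(range(1, 9))
ACTIONS = [-1, 0, 1]  # rem-1, no-op, add-1


def transition(s, a):
    # Deterministic move, clamped to the allowed cluster sizes.
    return {min(max(s + a, STATES[0]), STATES[-1]): 1.0}


def reward(s):
    # Gain grows with VMs until 5, after which extra VMs only add cost.
    return 10 * min(s, 5) - 3 * s


values = value_iteration(STATES, ACTIONS, transition, reward)
policy = greedy_policy(STATES, ACTIONS, transition, values)
print(policy)
```

In this toy setting the greedy policy grows small clusters, shrinks large ones and stays put at 5 VMs, which is the behaviour the "optimal policy greedy w.r.t. V" bullet describes.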
Large degree of freedom
• MDP allows for an optimal solution without previous knowledge
• The system learns from previous experience by “exploring” permissible states
• Learning is real-time and continues during execution
• Decision making is auto-tuned, reacting to environment and reward changes
• Generically applicable
• Arbitrary reward functions allow for virtually any optimization policy
Estimate R(s) for possible transitions
• How to know the exact R(s) for all s, without making the transition?
• Assume “deterministic”, reliable cluster behavior
• Similar input ⇨ similar performance metrics
• Idea: cluster measurements around the current load
• r(s) = f(latency, VMs)
• Add 2 extra dimensions: #VMs, load
• Find, for all permissible transitions, what the latency would be
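The estimation idea can be sketched as follows. As a simple stand-in for the clustering step mentioned on the slide, this sketch averages past latency measurements taken with the same VM count and a load within a relative tolerance of the current load. The measurement tuples, the tolerance and the reward function are assumptions for illustration, not values from the experiments.

```python
def estimate_latency(measurements, vms, load, load_tolerance=0.1):
    """Average past latencies observed with `vms` VMs under a similar load.

    measurements: iterable of (num_vms, load, latency) tuples.
    Returns None when no comparable measurement exists yet.
    """
    nearby = [
        lat for (n, past_load, lat) in measurements
        if n == vms and abs(past_load - load) <= load_tolerance * load
    ]
    if not nearby:
        return None
    return sum(nearby) / len(nearby)


def estimate_rewards(measurements, current_vms, load, reward,
                     max_step=1, min_vms=1, max_vms=16):
    """Estimate r(s) for every reachable cluster size that has history."""
    estimates = {}
    low = max(min_vms, current_vms - max_step)
    high = min(max_vms, current_vms + max_step)
    for vms in range(low, high + 1):
        latency = estimate_latency(measurements, vms, load)
        if latency is not None:
            estimates[vms] = reward(latency, vms)
    return estimates


# Hypothetical history: (VMs, load in reqs/sec, latency in ms).
history = [
    (4, 1000, 20.0), (4, 1050, 22.0),
    (5, 980, 15.0), (5, 2000, 40.0),
    (6, 1020, 12.0),
]
rewards = estimate_rewards(history, current_vms=5, load=1000,
                           reward=lambda lat, vms: -lat - 2 * vms)
print(rewards)
```

States with no comparable history are simply left out, which is one reason an initial training period helps the system make sensible decisions sooner.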
Architectural considerations
• Robustness
• Daemon process that checkpoints and can be restarted
• State is provided by the IaaS cloud and the monitoring module
• Applicable timeouts (not a real-time system!)
• Modularity
• Different interchangeable components
• APIs that utilize primitives (NoSQL and policies)
• Expandability
• Speed (irrelevant in most cases)
• Written in Python
Platform setup
• Cloud cluster
• Private OpenStack Cactus cluster
• A total of 16 server VMs
• Similar to an Amazon EC2 large instance
• 4-core processor, 8GB RAM, 50GB disk space
• A total of 20 client VMs
• 2-core processor, 4GB RAM, 50GB disk space
• Storage
• QCOW image: 1.6GB compressed, 4.3GB uncompressed
• VM root fs instead of EBS (Reddit outage)
Clients, data and workloads
• HBase (v. 0.20.6), Cassandra (v. 0.7.0 beta)
• Hadoop 0.20.2
• 8 initial nodes (VMs)
• Ganglia 3.1.2
• YCSB tool
• Database: 20M objects, 20GB raw (Cassandra ~60GB, HBase ~90GB)
• Loads: UNI_R, UNI_U, UNI_RMW, ZIPF_R
• Default: uniform read, 10%-50% range
• Both client (YCSB) and cluster (Ganglia) metrics reported
1. Metrics affected
• Max throughput
• HBase μmax = 80K reqs/sec @ λ = 80K reqs/sec
• Cassandra μmax = 13K reqs/sec @ λ = 20K reqs/sec
• Max total cluster CPU usage
• HBase CPUmax = 55%
• Cassandra CPUmax = 76%
• Further λ increase has no effect
• Systems are fully utilized and requests are queued
• Stress systems by increasing the client request rate λ
• UNI_R workload
• Identify critical operation points
• 8-node cluster, no resize
The setup
• λstart = 180K reqs/sec (well above the critical point)
• At time t = 370 sec, either:
• Double the cluster size (add 8 extra nodes)
• Triple the cluster size (add 16 extra nodes)
• 4 different experiments
• READ+8, READ+16, UPDATE+8 and UPDATE+16
• Measure client-side and cluster-side metrics
• Query latency, throughput and total cluster usage
• All plotted vs. time
2. HBase cluster resize
• For READ, throughput is (roughly) doubled and tripled
• More servers handle more requests
• Data is not transferred, but it is cached
• UPDATE is not affected
• I/O-bound operation
• Updates will be handled by the initial 8 nodes
• Only new regions due to compaction will be served by new nodes
Evaluating Tiramola – setup
• Initial state: S4; cluster sizes from 1 to 16 VMs allowed
• Sinusoid-like READ loads: vary peak and periodicity
• Alter YCSB clients
• Reward functions (proof of concept)
• r1(s) = -C ∙ |VMs|
• r2(s) = B ∙ thr
• r3(s) = B ∙ thr - C ∙ |VMs|²
• r4(s) = B ∙ thr - C ∙ |VMs| - A ∙ lat
• Monitor every minute, decide every 10 minutes, 5-minute backoff
• Training period for initial data points
• Not necessary, but allows quicker “correct” decisions
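The four proof-of-concept reward functions translate directly into code. In the sketch below the weights A, B and C and the sample measurements are assumed values chosen only to make the arithmetic easy to follow; the slides do not state the weights used in the experiments.

```python
def r1(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Cost only: fewer VMs is always better."""
    return -C * vms


def r2(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Gain only: higher throughput is always better."""
    return B * thr


def r3(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Throughput gain minus a quadratic VM cost."""
    return B * thr - C * vms ** 2


def r4(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Throughput gain minus a linear VM cost and a latency penalty."""
    return B * thr - C * vms - A * lat


# Assumed sample state: 8 VMs, 8000 reqs/sec throughput, 10 ms latency.
sample = dict(vms=8, thr=8000, lat=10, A=50.0, B=1.0, C=100.0)
scores = {f.__name__: f(**sample) for f in (r1, r2, r3, r4)}
print(scores)
```

Because every function shares the same signature, the decision-making module can swap policies by passing a different function, which is what makes arbitrary user-defined rewards easy to plug in.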
Evaluating Tiramola – 1
• r1, r2
• r3, r4
Evaluating Tiramola – 2
• Different amplitude
• Different periodicity
Evaluating Tiramola – 3
• Initial training set close to applied load
• Initial training set very different from applied load
• [Plot legends in both cases: Max training, Min training]
Questions?
• “TIRAMOLA: Elastic NoSQL Provisioning through A Cloud Management Platform” – SIGMOD 2012 (Demo Track)
• “Automatic Scaling of Selective SPARQL Joins Using the TIRAMOLA System” – SWIM 2012
• “On the Elasticity of NoSQL Databases over Cloud Management Platforms” – CIKM 2011
• “Elastic NoSQL databases over the Cloud” – ΕΛ/ΛΑΚ 2011
• CELAR: “Automatic, multi-grained elasticity-provisioning for the Cloud” – FP7
• http://www.celarcloud.eu/
• http://tiramola.googlecode.com