Dynamic Resource Management in Internet Data Centers
Prashant Shenoy, University of Massachusetts
Motivation • Internet applications used in a variety of domains • Online banking, online brokerage, online music stores, e-commerce • Internet usage continues to grow rapidly • Broadband deployment is accelerating • Outages of Internet applications are increasingly common: “Site not responding”, “Connection timed out”
Internet Application Outages • Holiday shopping season 2000: down for 30 minutes • Periodic outages over 4 days • Average download time ~260 sec • 9/11: site inaccessible for brief periods • Cause: too many users leading to overload
Internet Workloads Are Highly Variable (figure: Soccer World Cup ’98 workload trace) • Short-term fluctuations • “Slashdot effect” • Flash crowds • Long-term seasonal effects • Time-of-day, month-of-year • Peak difficult to predict • Static overprovisioning not effective • Manual allocation: slow • Key issue: How can we design applications to handle large workload variations?
Internet Data Centers • Internet applications run in data centers • Server farms • Provide computational and storage resources • Applications share data center resources • Problem: How should the platform allocate resources to absorb workload variations?
Talk Outline • Motivation • Internet data center model • Dynamic provisioning • Request Policing • Cataclysm Server Platform • Experimental results • Summary
Data Center Model (figure: retail, web site, and streaming applications on disjoint server subsets) • Dedicated hosting: each application runs on a subset of servers in the data center • Subsets are mutually exclusive: no server sharing • Data center hosts multiple applications • Free server pool: unused servers
Internet Application Model (figure: requests pass through a load-balancing sentry into http, J2EE, and database tiers) • Internet applications: multiple tiers • Example: 3 tiers: HTTP, J2EE app server, database • Replicable applications • Individual tiers: partially or fully replicable • Example: clustered HTTP, J2EE server, shared-nothing database • Each application employs a sentry • Each tier uses a dispatcher: load balancing
Approach • Dynamic provisioning • Allocate servers to applications on-the-fly • Request policing • Turn away excess requests • Degrade performance based on SLA • Couple provisioning and policing
Research Questions • How many servers to allocate and when? • Multi-tier apps: when and how to provision each tier? • How many requests should be turned away during overload? • Multi-tier apps: where should requests be dropped? • Can we meet SLAs during overloads? • Is it possible to predict future workloads?
Dynamic Provisioning • Key idea: increase or decrease allocated servers to handle workload fluctuations • Monitor incoming workload • Compute current or future demand • Match number of allocated servers to demand (figure: control loop: monitor workload → compute current/future demand → adjust allocation)
Single-tier Provisioning (figure: 14 req/s enter a 3-tier pipeline with per-tier capacities C=15, C=10, and C=10.1; the C=10 tier drops 4 req/s, so only 10 req/s are served) • Single-tier provisioning well studied [Muse, TACT] • Non-trivial to extend to multiple tiers • Strawman #1: use single-tier provisioning independently at each tier • Problem: independent tier provisioning may not increase goodput
Single-tier Provisioning (contd.) (figure: provisioning the overloaded tier alone raises its capacity from C=10 to C=20; all 14 req/s now reach the C=10.1 tier, which drops 3.9 req/s) • Independent provisioning merely shifts the bottleneck: goodput rises only from 10 to 10.1 req/s
Model-based Provisioning (figure: the same pipeline with capacities C=15, C=20, and C=10.1 serving 14 req/s) • Black-box approach • Treat application as a black box • Measure response time from outside • Increase allocation if response time > SLA • Use a model to determine how much to allocate • Strawman #2: use black-box approach for multi-tier apps • Problems: • Unclear which tier needs more capacity • May not increase goodput if bottleneck tier is not replicable
Provisioning Multi-tier Apps • Approach: holistic view of multi-tier application • Determine tier-specific capacity independently • Allocate capacity by looking at all tiers (and other apps) • Predictive provisioning • Long-term provisioning: time scale of hours • Maintain long-term workload statistics • Predict and provision for the next few hours • Reactive provisioning • Short-term provisioning: time scale of several minutes • React to “current” workload trends • Correct errors of long-term provisioning • Handle flash crowds (inherently unpredictable)
Workload Prediction (figure: per-hour histograms from Mon, Tue, Wed feed the forecast for today) • Long-term workload monitoring and prediction • Monitor workload for multiple days • Maintain a histogram for each hour of the day • Capture time-of-day effects • Forecast based on • Observed workload for that hour in the past • Observed workload for the past few hours of the current day • Predict a high percentile of expected workload
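A minimal sketch of the hour-of-day predictor described above; the class and method names are illustrative, not from the Cataclysm code base. Each hour of the day keeps a history of observed arrival rates across days, and the forecast is a high percentile of that history (the correction from the current day's recent hours is omitted for brevity).

```python
from collections import defaultdict
import numpy as np

class HourlyPredictor:
    """Per-hour workload history; forecasts a high percentile."""

    def __init__(self, percentile=95):
        self.percentile = percentile
        self.history = defaultdict(list)   # hour (0-23) -> [peak req/s per day]

    def record(self, hour, peak_rate):
        # Called once per hour with the peak arrival rate observed that hour.
        self.history[hour].append(peak_rate)

    def predict(self, hour):
        # Forecast the next occurrence of `hour` as a high percentile of history.
        past = self.history[hour]
        return float(np.percentile(past, self.percentile)) if past else 0.0
```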
Predictive Provisioning (figure: the predicted workload λpred is spread over the tier's servers, each modeled as a G/G/1 queue) • Queuing-theoretic application model • Each individual server is a G/G/1 queue • Derive per-tier response-time target E(r) from the end-to-end SLA • Monitor other parameters and determine λ (per-server capacity) • Use predicted workload λpred to determine # servers per tier • Assumes perfect load balancing in each tier • Alternative: model each tier as G/G/k
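A sketch of the per-tier sizing computation, assuming a standard heavy-traffic G/G/1 waiting-time bound (the talk does not spell out the exact formula; this is one common choice): a server with mean service time s and inter-arrival/service-time variances var_a and var_b meets a per-tier response-time target d as long as its arrival rate stays below 1 / (s + (var_a + var_b) / (2(d - s))).

```python
import math

def per_server_capacity(d, s, var_a, var_b):
    """Largest sustainable arrival rate (req/s) for one G/G/1 server,
    given target response time d and mean service time s (seconds)."""
    assert d > s, "target response time must exceed the mean service time"
    return 1.0 / (s + (var_a + var_b) / (2.0 * (d - s)))

def servers_needed(lambda_pred, d, s, var_a, var_b):
    """Servers a tier needs for the predicted load, assuming perfect balancing."""
    cap = per_server_capacity(d, s, var_a, var_b)
    return max(1, math.ceil(lambda_pred / cap))

# Hypothetical numbers: 150 req/s predicted, 40 ms per-tier target,
# 10 ms mean service time -> each server sustains ~81 req/s, so 2 servers.
print(servers_needed(150.0, d=0.040, s=0.010, var_a=1e-4, var_b=4e-5))
```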
Reactive Provisioning (figure: the λpred time series is compared against λactual; when the prediction error λerror exceeds a threshold τ, the reactor is invoked to allocate servers) • Idea: react to current conditions • Useful for capturing significant short-term fluctuations • Can correct errors in predictions • Track error between long-term predictions and actual workload • Allocate additional servers if error exceeds a threshold • Account for prediction errors • Can also be invoked if request drop rate exceeds a threshold • Handles sudden flash crowds • Operates over time scale of a few minutes • Pure reactive provisioning: lags workload • Reactive + predictive more effective!
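A minimal sketch of the reactive trigger described above; the threshold values are illustrative, not from the talk.

```python
def should_react(lambda_actual, lambda_pred, drop_rate,
                 err_threshold=0.25, drop_threshold=0.05):
    """Invoke the reactor if the prediction undershoots the observed
    workload by more than err_threshold (relative), or if the sentry's
    request drop rate exceeds drop_threshold."""
    rel_error = (lambda_actual - lambda_pred) / max(lambda_pred, 1e-9)
    return rel_error > err_threshold or drop_rate > drop_threshold
```

In the steady state the predictor keeps the error small and the reactor stays idle; during a flash crowd the error (or the drop rate) spikes and triggers short-term allocation within minutes.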
Talk Outline • Motivation • Internet data center model • Dynamic provisioning • Request Policing • Cataclysm Server Platform • Experimental results • Summary
Request Policing (figure: the sentry polices incoming requests and drops the excess before it reaches the G/G/1 server tiers) • Key idea: if incoming request rate > current capacity • Turn away excess requests • Degrade performance of requests • Why police when you can provision? • Provisioning is not instantaneous • Residual sessions on reallocated servers • Application and OS installation and configuration overheads • Overhead of several (5-30) minutes
Class-based Differentiation (figure: the sentry queues requests per class and drops from less important classes first) • Some requests are more important than others • Purchase versus catalog browsing • Stock trade versus viewing account balance • Overload => preferentially let in more important requests • Maximize utility during overload • Incoming requests queued up in class queues • Example: gold, silver, bronze classes • Higher priority to more important classes
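A minimal sketch of class-queue admission, assuming three classes served strictly in priority order; the names and the queue API are illustrative.

```python
from collections import deque

CLASS_ORDER = ["gold", "silver", "bronze"]   # most to least important

def admit_from_queues(queues, capacity):
    """Admit up to `capacity` queued requests, draining important
    classes first; whatever remains stays queued (or is dropped later).

    queues: dict mapping class name -> deque of pending requests
    """
    admitted = []
    for cls in CLASS_ORDER:
        q = queues[cls]
        while q and len(admitted) < capacity:
            admitted.append(q.popleft())
    return admitted
```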
Scalable Policing Techniques • Examining individual requests infeasible • Incoming rate may be an order of magnitude greater than capacity • Need to reduce overhead of policing decisions • Idea #1: batch processing • Premise: request arrivals are bursty • Admit a batch of queued-up requests • One admission control test per batch • Reduces overhead from O(n) to O(b) • Idea #2: use pre-computed thresholds • Example: capacity = 100 req/s, G=75, S=50, B=50 req/s • Admit all gold, half of silver, and no bronze • Periodically estimate λ and service times: compute per-class thresholds • O(1) overhead: trades accuracy for efficiency
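A sketch of the pre-computed-threshold idea, reproducing the slide's example; the function name is illustrative. Thresholds are recomputed periodically from the capacity and per-class rate estimates, after which each arrival needs only an O(1) comparison against its class's admission fraction.

```python
def compute_fractions(capacity, rates):
    """Per-class admission fractions; `rates` maps class -> estimated
    req/s, ordered most to least important (dicts preserve order)."""
    fractions, remaining = {}, capacity
    for cls, rate in rates.items():
        admit = min(rate, max(remaining, 0.0))   # capacity left for this class
        fractions[cls] = admit / rate if rate > 0 else 1.0
        remaining -= admit
    return fractions

# The slide's example: capacity 100 req/s; gold 75, silver 50, bronze 50
print(compute_fractions(100.0, {"gold": 75.0, "silver": 50.0, "bronze": 50.0}))
# -> {'gold': 1.0, 'silver': 0.5, 'bronze': 0.0}
```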
Cataclysm Server Platform • Prototype data center • Commodity hardware • 40+ Pentium servers • 2 TB of RAID arrays • Gigabit switches • Linux-based platform
Cataclysm Software Architecture • Two key components: control plane and nuclei (figure: the Cataclysm control plane performs provisioning, global allocation, and app placement across server nodes; each node runs apps and sentries on top of a nucleus and the OS, with the nucleus handling resource monitoring and local allocation)
Cataclysm Node Architecture (figure: a node hosts capsules inside VMs such as UML or Xen on top of QLinux; one VM is active, the others dormant) • Capsule: component of an app on a node • QLinux: proportional sharing of node resources (HSFQ CPU scheduler, proportional-share packet scheduler, Cello disk scheduler, SFVM memory manager) • Nucleus: resource allocation across capsules and VMs
Cataclysm Applications (figure: a ktcpvs load balancer with policing in front of replicated Apache, JBOSS, and mysql tiers) • Multi-tier apps: Rubis (e-auctions), Rubbos (bulletin board) • Apache, JBOSS, mysql • Tier-1 sentry • ktcpvs: kernel HTTP load balancer • Request policing and class-based differentiation • Workload monitoring • Tier-2 sentry: Apache-to-JBOSS redirector, workload monitoring • Nucleus: Linux Trace Toolkit and /proc to monitor node statistics • All system components are replicable!
Talk Outline • Motivation • Internet data center model • Dynamic provisioning • Request Policing • Cataclysm Server Platform • Experimental results • Summary
Dynamic Provisioning • RuBiS: e-auction application like eBay (graphs: workload and server allocation over time; server allocation adapts to the changing workload)
Class-based Differentiation (graphs: arrival rate and fraction admitted vs. time, 0-600 sec, for gold, silver, and bronze classes)
Scalability (graph: CPU usage vs. arrival rate, up to 20000 req/s, for batch and threshold-based policing) • Threshold-based: higher scalability
Other Research Results • OS Resource Allocation • Qlinux [ACM MM00], SFS [OSDI00], DFS [RTAS02] • SHARC cluster-based prop. sharing [TPDS03] • Shared hosting provisioning • Measurement-based [IWQOS02], Queuing-based [Sigmetrics03,IWQOS03] • Provisioning granularity [Self-manage 03] • Application placement [PDCS 2004] • Profiling and Overbooking [OSDI02] • Storage issues • iSCSI vs NFS [FAST03], Policy-managed [TR03]
Glimpse of Other Projects • Hyperion: Network processor based measurement platform • Measurement in the backbone and at the edge • NP-based measurements in the data center • RiSE: Rich Sensor Environments • Video sensor networks • Robotics sensor networks • Real-time sensor networks • Weather sensors
Concluding Remarks • Internet applications see varying workloads • Handle workload dynamics by • Dynamic capacity provisioning • Request Policing • Need to account for multi-tiered applications • Joint work: Bhuvan Urgaonkar, Abhishek Chandra and Vijay Sundaram • More at http://lass.cs.umass.edu
Predictive Provisioning • Invoked once every hour • Captures long-term variations - time of day effects • Extensions to seasonal effects (month-of-year, holidays) • How to initialize? • Needs several days of history to work well • What happens if no servers are available? • Use revenue/utility to arbitrate allocation [Muse] • Turn away excess requests • Non-replicable tiers are easy to handle • Provision other tiers until non-replicable tier is saturated
Degrade or Drop? • Depends on the application and the SLA • Degrading increases effective capacity • But also degrades the performance seen by requests • Degrade if: utility from servicing more requests at lower performance > utility from servicing fewer requests minus the penalty for dropped requests • Otherwise drop requests (figure: example SLA)
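A minimal sketch of the decision rule above; the per-request utilities and drop penalty are placeholders that would come from the SLA.

```python
def should_degrade(n_degraded, u_degraded, n_full, u_full, drop_penalty):
    """Degrade iff serving n_degraded requests at utility u_degraded each
    beats serving n_full requests at u_full each while paying a penalty
    for the (n_degraded - n_full) requests that would be dropped."""
    return (n_degraded * u_degraded >
            n_full * u_full - (n_degraded - n_full) * drop_penalty)
```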
Use of Virtual Machine Monitors • Server allocation can be slow (~5-20+ minutes) • Must wait for residual sessions to terminate • Disk scrubbing, OS and app installation, configuration • Application and system overheads • Flash crowds => need fast allocation • Use virtual machines • Each app runs inside a VM, multiple VMs per server • Only one VM is active at any time; other VMs are “hot spares” • Server allocation => idle one VM, activate another • System overhead reduces to < 1 s • Still need to account for residual sessions • An application issue, no longer a system limitation
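A sketch of the hot-spare switch described above; the node and VM objects and their pause/unpause calls are hypothetical placeholders for whatever control interface the VMM (e.g., Xen or UML) exposes.

```python
def reallocate(node, new_app):
    """Reassign a server by swapping which VM is active (< 1 s),
    instead of scrubbing the disk and reinstalling OS and app."""
    old_vm = node.active_vm
    new_vm = node.vms[new_app]   # dormant hot spare for the new app
    old_vm.pause()               # idle the currently active VM
    new_vm.unpause()             # activate the hot spare
    node.active_vm = new_vm
    # Residual sessions on old_vm must still drain; that is now an
    # application concern rather than a system limitation.
```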
(graphs: 95th-percentile response time (msec) and admission rate vs. time, 0-500 sec, for gold, silver, and bronze classes) • Threshold-based: loss of accuracy