
Presentation Transcript


  1. Performance and Availability Models for IaaS Cloud and Their Applications Rahul Ghosh Duke High Availability Assurance Lab Dept. of Electrical and Computer Engineering Duke University, Durham, NC 27708 www.ee.duke.edu/~rg51 Collaborators: Vijay K. Naik, Murthy Devarakonda (IBM), Kishor S. Trivedi, DongSeong Kim and Francesco Longo (Duke) IBM Student Workshop for Frontiers of Cloud Computing Hawthorne, NY, USA September 10, 2010

  2. Introduction. Two key quality-of-service measures for an IaaS cloud: (1) service availability and (2) provisioning response delay. Key problems of interest: characterize cloud services as a function of arrival rate, available capacity, service requirements, and failure properties; apply these characteristics in SLA analysis and management, admission control, cloud capacity planning, and cloud economics. Approach: performability (performance + availability) analysis based on interacting stochastic sub-models, which covers a large parameter space at a lower relative solution cost than measurement-based analysis.

  3. Novelty of our approach • Single monolithic model vs. interacting sub-models approach • Even in a simple case of 6 physical machines with 1 virtual machine per physical machine, a monolithic model has 126,720 states. • In contrast, our approach of interacting sub-models has only 41 states. Clearly, for a real cloud, a naïve monolithic modeling approach leads to a very large analytical model whose solution is practically intractable. The interacting sub-models approach is scalable, tractable, and of high fidelity. Moreover, adding a new feature to an interacting sub-models approach does not require reconstructing the entire model. What are the different sub-models? How do they interact?

  4. System model. Main assumptions: All requests are homogeneous; each request is for one virtual machine (VM) with a fixed number of CPU cores and fixed RAM and disk capacity. We use the term "job" to denote a user request for provisioning a VM. Submitted requests are served on an FCFS basis by the resource provisioning decision engine (RPDE). If a request can be accepted, it goes to a specific physical machine (PM) for VM provisioning. After getting the VM, the request runs in the cloud and releases the VM when it finishes. To reduce the cost of operations, PMs are grouped into multiple pools. We assume three pools: hot (running, with VMs instantiated), warm (turned on, but VMs not instantiated) and cold (turned off). All PMs in a particular pool are identical.
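
As an illustration only (not part of the original slides), a minimal Python sketch of the system-model entities described above; all class and field names are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Pool(Enum):
    HOT = "hot"    # PM running, VMs instantiated and ready
    WARM = "warm"  # PM turned on, no VM instantiated yet
    COLD = "cold"  # PM turned off


@dataclass
class PhysicalMachine:
    pool: Pool
    max_vms: int      # m: max number of VMs that can run simultaneously
    buffer_size: int  # local queue of jobs waiting for VM provisioning
    running_vms: int = 0
    queued_jobs: int = 0

    def can_accept(self) -> bool:
        # A PM can take another job as long as its local buffer is not full.
        return self.queued_jobs < self.buffer_size


@dataclass
class Job:
    """A homogeneous user request for exactly one fixed-size VM."""
    arrival_time: float
```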

  5. Life-cycle of a job inside an IaaS cloud. Provisioning and servicing steps: (i) resource provisioning decision, (ii) VM provisioning, and (iii) run-time execution. [Diagram: arrival → queuing → resource provisioning decision engine (admission control; jobs may be rejected due to a full buffer or insufficient capacity) → VM deployment (instance creation, deploy, instantiation) → run-time execution → service out; the provisioning response delay covers the stages before run-time execution.] We translate these steps into analytical sub-models.

  6. Resource provisioning decision. [Same job life-cycle diagram as the previous slide, focusing on the resource provisioning decision stage.]

  7. Resource provisioning decision. A request is provisioned on a hot PM if a pre-instantiated but unassigned VM exists. If none exists, a PM from the warm pool is used. If all warm machines are busy, a PM from the cold pool is used.
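
A hedged sketch of this hot-then-warm-then-cold search order, reusing the illustrative `PhysicalMachine` and `Pool` classes from the earlier sketch; the function name and its arguments are assumptions.

```python
from typing import Iterable, Optional


def choose_pm(pms: Iterable[PhysicalMachine]) -> Optional[PhysicalMachine]:
    """Pick a PM for the job at the head of the RPDE queue: hot pool first,
    then warm, then cold.  Return None if no pool can take the job, in
    which case the job is rejected due to insufficient capacity."""
    pms = list(pms)
    for pool in (Pool.HOT, Pool.WARM, Pool.COLD):
        for pm in pms:
            if pm.pool is pool and pm.can_accept():
                return pm
    return None
```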

  8. Resource provisioning decision model: a Continuous Time Markov Chain (CTMC) describing the provisioning decision of a single job. State labels: i = number of jobs in the queue, s = pool (hot, warm or cold).
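
The decision model itself is not reproduced here; the snippet below is only a generic CTMC steady-state solver (solve pi·Q = 0 with the probabilities summing to 1), shown on a toy two-state chain. The matrix and rates are placeholders, not the model's actual generator.

```python
import numpy as np


def ctmc_steady_state(Q: np.ndarray) -> np.ndarray:
    """Steady-state probabilities of an irreducible CTMC with generator Q:
    solve pi @ Q = 0 subject to sum(pi) = 1."""
    n = Q.shape[0]
    # Replace one balance equation with the normalization constraint.
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


# Toy example: two-state chain with rates a (state 0 -> 1) and c (1 -> 0).
a, c = 0.5, 2.0
Q = np.array([[-a,  a],
              [ c, -c]])
pi = ctmc_steady_state(Q)  # -> [c/(a+c), a/(a+c)]
```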

  9. Output measures • Job rejection probability due to buffer full (Pblock) • Job rejection probability due to insufficient capacity (Pdrop) • Total job rejection probability (Preject = Pblock + Pdrop) • Mean queuing delay (E[Tq_dec]) • Mean decision delay (E[Tdecision]). Techniques used: a reward-rate based approach (attach a reward rate to each state of the Markov chain), Little's law (connecting the mean number in the queue with the mean waiting time), and a 3-stage Coxian distribution.
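
A sketch of how such output measures are typically extracted from a steady-state vector via reward rates and Little's law; the reward vectors, the arrival rate `lam` and the two-state `pi` below are placeholders, not the model's actual values.

```python
import numpy as np

# Placeholder steady-state vector (e.g., the `pi` from the previous snippet).
pi = np.array([0.8, 0.2])


def expected_reward(pi: np.ndarray, rewards: np.ndarray) -> float:
    """Reward-rate approach: attach a reward rate to each state of the
    Markov chain and average it over the steady-state probabilities."""
    return float(pi @ rewards)


# Example: Pblock as the probability mass of 'buffer full' states
# (reward 1 in those states, 0 elsewhere; the indicator is a placeholder).
is_buffer_full = np.array([0.0, 1.0])
p_block = expected_reward(pi, is_buffer_full)

# Little's law: mean queuing delay = mean number in queue / effective
# (admitted) arrival rate.  Here the reward is the number of queued jobs.
jobs_in_queue = np.array([0.0, 1.0])
lam = 1.0  # assumed job arrival rate
mean_queue_len = expected_reward(pi, jobs_in_queue)
e_tq_dec = mean_queue_len / (lam * (1.0 - p_block))
```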

  10. VM provisioning. [Same job life-cycle diagram, focusing on the VM deployment / provisioning stage.]

  11. VM provisioning model. [Diagram: accepted jobs flow from the Resource Provisioning Decision Engine to PMs in the hot, warm and cold pools; each pool shows running VMs and idle resources on its machines, and completed jobs leave as service out.]

  12. VM provisioning model for each hot PM. States are labeled (i, j, k), where i = number of jobs in the queue, j = number of VMs being provisioned, and k = number of VMs running; Lh is the buffer size and m is the maximum number of VMs that can run simultaneously on a PM. [CTMC state-transition diagram, ranging from (0,0,0) to states with k = m, omitted.]
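
The following enumeration of the hot-PM state space is only one reading of the state labels above (it assumes a free VM slot is used immediately, so j = 0 occurs with an empty queue until all m slots are busy); it is not code from the original work.

```python
def hot_pm_states(Lh: int, m: int):
    """Enumerate states (i, j, k): i jobs queued, j VMs being provisioned
    (at most one at a time), k VMs running on this hot PM."""
    states = []
    for k in range(m):                    # some VM slots still free
        states.append((0, 0, k))          # idle: nothing queued or provisioning
        states.extend((i, 1, k) for i in range(Lh + 1))  # one VM provisioning
    states.extend((i, 0, m) for i in range(Lh + 1))      # PM full, jobs wait
    return states


print(len(hot_pm_states(Lh=5, m=4)))  # m*(Lh + 2) + Lh + 1 = 34 states
```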

  13. VM provisioning model for each warm PM. [CTMC state-transition diagram analogous to the hot-PM model, with buffer size Lw and additional states marked 1* and 1**, omitted.]

  14. Output measures from the VM provisioning models • Probability that a job can be accepted in the hot/warm/cold pool (Ph / Pw / Pc) • Weighted mean queuing delay for VM provisioning (E[Tq_vm]) • Weighted mean provisioning delay (E[Tprov]). One plausible weighting is sketched below.
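
The exact weighting used in the original models is not shown on the slide; the snippet below is one plausible scheme, assuming a job is tried on the hot pool first, then warm, then cold.

```python
def weighted_mean_delay(Ph, Pw, Pc, d_hot, d_warm, d_cold):
    """Weight per-pool delays by the probability that an accepted job ends
    up in that pool (hot tried first, then warm, then cold)."""
    w_hot = Ph
    w_warm = (1 - Ph) * Pw
    w_cold = (1 - Ph) * (1 - Pw) * Pc
    total = w_hot + w_warm + w_cold
    if total == 0:
        return 0.0  # no pool accepts any job; nothing to average
    return (w_hot * d_hot + w_warm * d_warm + w_cold * d_cold) / total
```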

  15. Run-time execution. [Same job life-cycle diagram, focusing on the run-time execution stage.]

  16. Run-time model • Model outputs: Mean job service time / resource holding time

  17. Output measures from the pure performance models • All of these models are used for pure performance analysis, since no failures are considered • Outputs of the resource provisioning decision model: job rejection probability due to buffer full (Pblock); job rejection probability due to insufficient capacity (Pdrop); mean queuing delay (E[Tq_dec]); mean decision delay (E[Tdecision]) • Outputs of the VM provisioning models: probability that at least one machine in the hot/warm/cold pool can accept a job for provisioning, denoted Ph, Pw and Pc respectively; weighted mean queuing delay for VM provisioning (E[Tq_vm]); weighted mean provisioning delay (E[Tprov]) • Output of the run-time model: mean job service time • Overall outputs of the pure performance models: total job rejection probability (Preject = Pblock + Pdrop); net mean response delay (E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov])
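
The two overall measures follow directly from the formulas on this slide; the small helpers below simply restate them in Python.

```python
def total_rejection_probability(p_block: float, p_drop: float) -> float:
    # Preject = Pblock + Pdrop
    return p_block + p_drop


def mean_response_delay(e_tq_dec: float, e_tdecision: float,
                        e_tq_vm: float, e_tprov: float) -> float:
    # E[Tresp] = E[Tq_dec] + E[Tdecision] + E[Tq_vm] + E[Tprov]
    return e_tq_dec + e_tdecision + e_tq_vm + e_tprov
```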

  18. Availability model • Model outputs: Probability that the cloud service is available, downtime in minutes per year
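
Converting a steady-state availability value into downtime per year uses a standard identity (not specific to this model):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime per year implied by a steady-state availability value."""
    return (1.0 - availability) * MINUTES_PER_YEAR


downtime_minutes_per_year(0.999)  # ~525.6 minutes per year
```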

  19. Model interactions: Performability
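
The conclusions note that the overall solution is obtained by iterating over the individual sub-model solutions. The sketch below shows one such fixed-point loop; `decision_model` and `provisioning_models` are placeholders for the real sub-model solvers, and the toy stand-ins at the end exist only to make the loop runnable.

```python
def solve_performability(decision_model, provisioning_models,
                         tol=1e-8, max_iter=100):
    """Iterate between the decision sub-model and the per-pool VM
    provisioning sub-models until the shared quantities (Ph, Pw, Pc)
    stop changing."""
    Ph, Pw, Pc = 1.0, 1.0, 1.0          # initial guess: every pool accepts
    decision_out = None
    for _ in range(max_iter):
        decision_out = decision_model(Ph, Pw, Pc)
        Ph_new, Pw_new, Pc_new = provisioning_models(decision_out)
        converged = max(abs(Ph - Ph_new), abs(Pw - Pw_new),
                        abs(Pc - Pc_new)) < tol
        Ph, Pw, Pc = Ph_new, Pw_new, Pc_new
        if converged:
            break
    return decision_out, (Ph, Pw, Pc)


# Toy usage with made-up stand-in solvers (illustrative only):
toy_decision = lambda Ph, Pw, Pc: {"rate_to_hot": 5.0 * Ph}
toy_provisioning = lambda decision_out: (0.9, 0.8, 0.5)
solve_performability(toy_decision, toy_provisioning)
```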

  20. Numerical Results

  21. Effect of increasing job arrival rate

  22. Effect of increasing job service time

  23. Effect of increasing # VMs

  24. Effect of increasing MTTF of a PM

  25. Applications of the models

  26. Admission control. Increasing the arrival rate increases the response delay; adding more PMs reduces this delay. What is the maximum job arrival rate that can be supported by the cloud service?
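
One way to use the models for admission control is a simple bisection on the arrival rate: find the largest rate whose predicted mean response delay still meets the SLA. `mean_response_delay_for` is a placeholder for a full model evaluation at a given arrival rate and is assumed to be increasing in that rate.

```python
def max_supported_arrival_rate(mean_response_delay_for, sla_delay,
                               lo=0.0, hi=1000.0, iters=60):
    """Bisection over the job arrival rate against a response-delay SLA."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_response_delay_for(mid) <= sla_delay:
            lo = mid          # still meets the SLA: try a higher rate
        else:
            hi = mid          # SLA violated: back off
    return lo


# Toy usage with a made-up delay curve (illustrative only):
max_supported_arrival_rate(lambda lam: 0.5 + 0.01 * lam ** 2, sla_delay=2.0)
```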

  27. Response time – energy trade-off. Increasing capacity reduces the gap between the actual provisioning delay and the response delay. What is the optimal number of PMs across the different pools that minimizes response time for a given energy budget?

  28. SLA-driven capacity planning. What should the size of each pool be, so that total cost is minimized and the SLA (maximum rejection probability or response delay) is upheld?
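
A brute-force sketch of this planning problem: search over pool sizes and keep the cheapest configuration whose predicted measures satisfy the SLA. `evaluate_model` is a placeholder that would return (Preject, E[Tresp]) for a given configuration, and the per-PM costs are assumptions.

```python
from itertools import product


def plan_capacity(evaluate_model, max_pms=10,
                  sla_reject=0.01, sla_delay=2.0,
                  cost=(10.0, 4.0, 1.0)):  # assumed cost per hot/warm/cold PM
    """Return (total_cost, (n_hot, n_warm, n_cold)) of the cheapest sizing
    that meets the SLA, or None if no configuration qualifies."""
    best = None
    for n_hot, n_warm, n_cold in product(range(max_pms + 1), repeat=3):
        p_reject, e_tresp = evaluate_model(n_hot, n_warm, n_cold)
        if p_reject > sla_reject or e_tresp > sla_delay:
            continue
        total_cost = cost[0] * n_hot + cost[1] * n_warm + cost[2] * n_cold
        if best is None or total_cost < best[0]:
            best = (total_cost, (n_hot, n_warm, n_cold))
    return best
```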

  29. Recent work on IaaS cloud resiliency

  30. Resiliency Analysis • Definition of resiliency: resiliency is the persistence of service delivery that can justifiably be trusted when facing changes* • Changes of interest in the context of an IaaS cloud: increase in workload or faultload; decrease in system capacity; security attacks; accidents or disasters • Our contributions: quantifying the resiliency of an IaaS cloud; a resiliency analysis approach using the performance analysis models. *[1] J. Laprie, "From Dependability to Resilience", DSN 2008; [2] L. Simoncini, "Resilient Computing: An Engineering Discipline", IPDPS 2009

  31. Effect of changing demand

  32. Effect of changing capacity

  33. Conclusions • Stochastic models can be an inexpensive alternative to measurement-based evaluation of cloud QoS • To reduce modeling complexity, we use an interacting sub-models approach; the overall solution is obtained by iterating over the individual sub-model solutions • The proposed approach is general and applicable to a variety of IaaS clouds • Results show that IaaS cloud service quality is affected by variations in workload (job arrival rate, job service rate), faultload (machine failure rate) and available system capacity • The approach can be extended to solve specific cloud problems such as capacity planning for public, private and hybrid clouds • In the future, the models will be validated using real data collected from a cloud

  34. Thanks!
