Extension to PerfCenter: A Modeling and Simulation Tool for Datacenter Application Nikhil R. Ramteke, Advisor: Prof. Varsha Apte, Department of CSA, IISc 27th May 2011
Multi-tiered Networked Applications
• Important performance metrics:
• Response time
• Utilization
• Throughput
• Waiting time
• Queue length
• Arrival rate
• Blocking probability
• Average service time
Fig: a multi-tiered application with a web server, auth servers, and a DB server
Fig: Flow of a "login" request through such a system — web server, auth server, and DB server instances deployed across nine machines (Machines 1–9)
PerfCenter
• A performance modeling tool: it builds and solves a model of the system.
• It takes the system details as input and builds the system model.
• The system model is a network of queues.
• The model is solved either by simulation or by analytical methods.
• Open source, available at:
• http://www.cse.iitb.ac.in/perfnet/softperf/cape/home/wosp2008page
Fig: PerfCenter tool structure
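To illustrate the network-of-queues approach, a single FCFS queue with exponential arrivals and service (M/M/1) can be simulated in a few lines to estimate utilization and response time. This is a minimal sketch of the idea, not PerfCenter's actual simulation engine:

```python
import random

def simulate_mm1(arrival_rate, service_rate, n_requests, seed=42):
    """Minimal M/M/1 FCFS simulation: returns (utilization, mean response time)."""
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current request
    server_free_at = 0.0   # time at which the server next becomes idle
    busy_time = 0.0
    total_response = 0.0
    for _ in range(n_requests):
        arrival += rng.expovariate(arrival_rate)
        start = max(arrival, server_free_at)        # FCFS: wait if server busy
        service = rng.expovariate(service_rate)
        server_free_at = start + service
        busy_time += service
        total_response += server_free_at - arrival  # waiting time + service time
    return busy_time / server_free_at, total_response / n_requests
```

For arrival rate 50 and service rate 100, the estimated utilization approaches the theoretical value λ/μ = 0.5, and the mean response time approaches 1/(μ − λ) = 0.02.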
PerfCenter (Input Language)
Host specification:
host machine1[2]
  ram 1000
  cpu count 1
  cpu buffer 99999
  cpu schedP fcfs
  cpu speedup 1
  disk count 1
  disk buffer 99999
  disk schedP fcfs
  disk speedup 1
  . . .
end
Server specification:
server web
  thread count 1
  thread buffer 9999
  thread schedP fcfs
  thread size 0.610
  staticsize 100
  requestsize 0.5
  task node2
  task node5
  task node9
  . . .
end
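The block-structured "keyword value ... end" format above can be read with a simple line-oriented parser. The following is a hedged sketch (function names are hypothetical, not PerfCenter's actual parser):

```python
def to_num(s):
    """Convert to float where possible, else keep the string (e.g. 'fcfs')."""
    try:
        return float(s)
    except ValueError:
        return s

def parse_server_block(lines):
    """Parse one 'server ... end' block into a dict; repeated 'task'
    lines accumulate into a list of task names."""
    spec = {"tasks": []}
    spec["name"] = lines[0].split()[1]           # header line, e.g. "server web"
    for line in lines[1:]:
        tok = line.split()
        if not tok or tok[0] == "end":
            continue
        if tok[0] == "task":
            spec["tasks"].append(tok[1])
        elif len(tok) == 3:                      # e.g. "thread count 1"
            spec[tok[0] + "_" + tok[1]] = to_num(tok[2])
        else:                                    # e.g. "staticsize 100"
            spec[tok[0]] = to_num(tok[1])
    return spec
```

Fed the server block from the slide, this yields `{"name": "web", "thread_count": 1.0, "thread_schedP": "fcfs", ..., "tasks": ["node2", "node5", "node9"]}`.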
Feature Enhancements to PerfCenter (Problem Definition)
Among the various enhancements possible, our contributions are the following:
• Memory model: memory can be a bottleneck when deploying a server on a host.
• Individual server utilization on a device: PerfCenter can predict the device utilization of a host, but cannot estimate how much each server contributes to it. This feature enables the user to find the bottleneck server quickly.
• Timeouts and retries: aimed at capturing user behavior such as "stop-reload".
Memory Usage Modeling (PerfCenter system model for memory usage)
Servers:
• Static size of the server
• Per-thread memory usage
• Per-request memory usage (increases with queue length)
Host:
• RAM size of each host
Per-server RAM util = (static size + thread size × total threads + request size × avg. queue length of request queue) / RAM size
Input language specification:
server web
  staticsize 80
  thread size 2
  requestsize 2
end
host host1
  ram 2000
end
Metrics:
util(host_name:ram)              // overall RAM util
util(host_name:server_name:ram)  // RAM util by a server
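The per-server RAM utilization formula translates directly into code. A small sketch, using the example spec from the slide plus an assumed thread count and average queue length (the function name is hypothetical):

```python
def ram_utilization(static_size, thread_size, n_threads,
                    request_size, avg_queue_len, ram_size):
    """Per-server RAM utilization:
    (static size + thread size * threads + request size * avg queue len) / RAM."""
    used = static_size + thread_size * n_threads + request_size * avg_queue_len
    return used / ram_size

# Slide's example server (staticsize 80, thread size 2, requestsize 2)
# on host1 (ram 2000), assuming 10 threads and an average queue length of 5:
# (80 + 2*10 + 2*5) / 2000 = 0.055
util = ram_utilization(80, 2, 10, 2, 5, 2000)
```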
Software Design Changes Required for the Memory Model and Individual Server Utilization
• Memory model:
• Added members static size, thread size, and request size to the software server class,
• Added member RAM size to the host class,
• No change required to dynamic statistics calculation in simulation: the average queue length computed at the end of the simulation is used.
• Individual server utilization of host devices:
• Must keep track of which server is issuing each request to a device,
• Class member update: total busy time and utilization variables added to the software queue class,
• Some additional bookkeeping during simulation (per-server statistics).
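The per-server bookkeeping amounts to attributing each busy interval on a device to the server that issued the request. A sketch of the idea (class and method names are hypothetical, not PerfCenter's actual code):

```python
class DeviceStats:
    """Track busy time per software server on one shared device (e.g. a CPU)."""

    def __init__(self):
        self.busy = {}  # server name -> accumulated busy time

    def record(self, server, start, end):
        """Attribute the busy interval [start, end) to the given server."""
        self.busy[server] = self.busy.get(server, 0.0) + (end - start)

    def utilization(self, server, sim_time):
        """Fraction of total simulated time this server kept the device busy."""
        return self.busy.get(server, 0.0) / sim_time
```

Summing `utilization` over all servers recovers the overall device utilization PerfCenter already reports.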
Timeouts and Retries
• Characteristics of real users of server systems:
• Impatience: users abandon a request if a response is not received within their expected time,
• Retries: users often retry just after abandoning a request (e.g. "stop-reload" behavior in a Web browser).
• This behavior is common in client-server applications.
• Timeouts may affect system performance in the following ways:
• Reduction in throughput,
• Completed requests may have already timed out, so successful requests must be counted separately,
• Utilization may decrease due to lower throughput,
• Average response time may decrease due to an increase in request timeouts.
Timeouts and Retries
When a request is submitted to an application, one of the following can happen:
• Successfully completed [Goodput (G)],
• Timeout during service [Badput (B)]: request processing is not aborted immediately and goes to completion, but the request is counted as failed,
• Timeout in buffer [Timeout-in-buffer rate (Tb)]: the request does not leave the queue immediately; when it is picked up by the software server it is counted as failed,
• Drop [Drop rate (D)].
A timed-out request may possibly be retried.
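The four outcomes can be captured by a small classifier over a request's timeline. A sketch, assuming each request carries a deadline (arrival time plus its sampled timeout value):

```python
def classify(deadline, start=None, completion=None, dropped=False):
    """Classify one request into the four outcome classes above,
    given its deadline, service start and completion times."""
    if dropped:
        return "D"    # dropped: the request never entered the system
    if start > deadline:
        return "Tb"   # timed out while still waiting in the buffer
    if completion > deadline:
        return "B"    # timed out during service: work done, counted as failed
    return "G"        # completed within the deadline: goodput
```

Counting each class over a simulation run and dividing by the simulated time yields the G, B, Tb, and D rates.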
Timeouts and Retries (PerfCenter system model)
The mean timeout value is taken as input along with a distribution; the timeout value of each request is set by sampling from it.
Input language:
loadparams
  timeout distribution_name(distribution_parameters)
  . .
end
E.g.:
loadparams
  timeout exp(0.5)
end
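Setting each request's timeout then amounts to drawing one sample per request from the configured distribution. A sketch, assuming the parameter of `exp(...)` is the mean timeout (the function name is hypothetical):

```python
import random

def assign_timeouts(n, mean_timeout, seed=1):
    """Draw one exponentially distributed timeout value per request,
    with the given mean (expovariate takes the rate, i.e. 1/mean)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_timeout) for _ in range(n)]
```

Over many requests the sample mean converges to the configured mean, e.g. 0.5 for `timeout exp(0.5)` under this reading of the parameter.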
Timeouts and Retries (PerfCenter system model)
Overall G, B, D, and Tb can now be estimated with PerfCenter as follows.
Output language:
gput()         // overall goodput
bput()         // overall badput
buffTimeout()  // overall timeout-in-buffer rate
droprate()     // overall drop rate
Timeouts and Retries: Software Design Changes
• Added members timeout flag and mean timeout to the Request class,
• Added number of requests processed, number of requests timed out in buffer, number of requests timed out in service, goodput, badput, drop rate, and timeout-in-buffer rate to the scenario simulation class,
• No extra events are added.
Validation
• Type of system: open
• Service rate: 100
• Arrival rate: varied from 10 to 100
• Timeout rate: 10
• Timeout distribution: exponential
• Requests simulated: 1,000,000
• Number of repetitions: 20
• Validation done using sanity checks: results should follow expected rules and trends.
• Scenario used for validation: input file
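One such sanity check follows from flow conservation: ignoring retries, every arriving request ends up in exactly one of the four outcome classes, so in steady state the G, B, Tb, and D rates must sum to the arrival rate. A sketch of the check (the function name is hypothetical):

```python
def flows_conserved(arrival_rate, goodput, badput, buff_timeout_rate,
                    drop_rate, tol=1e-6):
    """True iff G + B + Tb + D accounts for the entire arrival rate
    (steady state, no retries)."""
    total = goodput + badput + buff_timeout_rate + drop_rate
    return abs(total - arrival_rate) <= tol
```

For example, at arrival rate 100 the measured rates G = 60, B = 15, Tb = 20, D = 5 pass the check, while G = 60, B = 15, Tb = 20, D = 10 would signal an accounting bug.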
Results Fig : RAM utilization v/s Arrival Rate
Results Goodput decreases as more requests time out in the buffer Fig : G, B, Tb, D v/s Arrival Rate
Results The utilization curve follows the throughput (G + B); it starts decreasing because more requests are timing out in the buffer Fig : Utilization, Throughput v/s Arrival Rate
Results Utilization decreases due to more request timeouts Fig : Individual server utilization v/s Arrival Rate
Results Avg. Response time decreases due to timeouts Fig : Average Response Time v/s Arrival Rate
Summary of Work Done
• Before midterm: background study of
• Queuing theory,
• Simulation modeling,
• Performance issues of multi-tiered systems,
• PerfCenter.
• After midterm: developed an abstraction and an input language, and updated the PerfCenter simulation engine for
• Adding the memory model,
• Updating the utilization model for individual server utilization on a device,
• Adding the timeouts and retries model.
Conclusion
• PerfCenter is a performance modeling tool; with the useful features added here, the most important being timeouts and retries, it can now serve performance analysts better.
• We validated our model using test experiments. Illustrative results show how PerfCenter can be used to estimate application performance in the presence of the following features:
• Memory model,
• Individual server utilization,
• Timeouts and retries model.
• As the results show, these features can change datacenter sizing plans.
• Future work:
• Predicting G, B, Tb, and D for individual queuing systems,
• More validation to increase confidence in the tool,
• More features to increase the power of the tool.
References:
• 1. R. P. Verlekar, V. Apte, P. Goyal, and B. Aggarwal. PerfCenter: A methodology and tool for performance analysis of application hosting centers. MASCOTS '07: Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2007, pages 201–208.
• 2. Supriya Marathe, Varsha Apte, and Akhila Deshpande. PerfCenter: A performance modeling tool for application hosting centers. WOSP '08: Proceedings of the 7th International Workshop on Software and Performance, 2008.
• 3. Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications. PHI Learning Private Limited, Eastern Economy edition, 2009.
References:
• 4. Averill M. Law and W. David Kelton. Simulation Modeling and Analysis. Tata McGraw-Hill, 2000.
• 5. Daniel A. Menasce and Virgilio A. F. Almeida. Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning. Prentice Hall PTR, 2000.
• 6. Supriya Marathe. Performance Modeling for Distributed Systems. Master's thesis, IIT Bombay, Mumbai, India, June 2008.
• 7. Puram Niranjan Kumar. Validation, Defect Resolution and Feature Enhancements of PerfCenter. Master's thesis, IIT Bombay, Mumbai, India, June 2008.