Feedback performance control in software services T.F. Abdelzaher, J.A. Stankovic, C. Lu, R. Zhang, and Y. Lu, Feedback Performance Control in Software Services, IEEE Control Systems, 23(3): 74-90, June 2003.
Overview • SW systems keep growing larger and more complex • Performance guarantees required, e.g., in web-based e-commerce • Control theory • Promising theoretical foundation for perf control in complex SW applications, e.g., real-time scheduling, web servers, multimedia control, storage managers, power management, routing in computer networks, …
Overview • Software performance assurance problems -> Feedback control problems • Focus: performance guarantees for web servers
SW performance control • Less rigorous guarantees on perf and quality • Most SW eng. research deals with the development of functionally correct SW • Functional correctness is not enough! • Timeliness in embedded systems • Correct but delayed action can be disastrous • Non-functional QoS attributes, e.g., timeliness, security, availability, …
Traditional approaches for perf guarantees • Worst case estimates of load & resource availability • Recall EDF, RM, DM, Priority Ceiling Protocol, …
New demand for performance assurance • QoS guarantees required in a broader scope of applications run in open, unpredictable environments • Global communication networks enabling online banking, trading, distance learning, … • Points of massive aggregation suffering unpredictable loads, potential bottlenecks, DoS attacks, … -> Precise workload/system model unknown a priori • Failure to meet QoS requirements -> loss of customers or financial damages • Worst case analysis/overdesign could be overly pessimistic or wasteful • Solid analytic framework for cost-effective perf assurance required
Challenges • How to model SW architecture? • How to map a specific QoS problem into a feedback control system? • How to choose proper SW sensors and actuators to monitor and adjust perf and workloads/resource allocation? • How to design controllers for servers? -> This paper focuses on web servers
QoS metrics • Delay metrics • Proportional to time: queuing delays, execution latencies, service response time • Rate metrics • Inversely proportional to time • Connection bandwidth, throughput, packet rate
Time-related perf attributes • Can be controlled by adjusting resource allocation • Queuing theory can predict perf given a particular resource allocation or vice versa • Classical queuing results typically assume Poisson arrival patterns • Even when this assumption holds, they mostly predict average perf only • Arrival patterns in web applications follow heavy-tailed distributions -> Bursty arrival patterns
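The "predict perf given a resource allocation or vice versa" point can be illustrated with the textbook M/M/1 queue. This is a minimal sketch under the Poisson-arrival, exponential-service assumption named on the slide; the function names and the example rates are hypothetical and do not come from the paper.

```python
def mm1_mean_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda).

    Only valid for a stable queue (arrival_rate < service_rate) and only
    describes the long-run average, not the delay of any single request.
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def mm1_required_service_rate(arrival_rate, target_response_time):
    """Inverse question: service rate needed to meet a mean response time."""
    return arrival_rate + 1.0 / target_response_time


# Hypothetical numbers: 80 requests/s offered to a server that handles 100/s.
mean_delay = mm1_mean_response_time(80.0, 100.0)
needed_rate = mm1_required_service_rate(80.0, 0.05)
print(mean_delay, needed_rate)
```

With bursty, heavy-tailed arrivals these averages can be far from what individual requests experience, which is the motivation for closing the loop with measurements.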
Service architecture • Liquid task model • [Figure] Fig. 1 Server architecture: (a) computing model, (b) control-oriented representation
Liquid task model • Ci << Di • Takes Ci units of time to serve request i • Di is the max tolerable response time • Tolerable response time is finite • Service times are infinitesimal • Progress of requests through the server queues ≈ Fluid flow • Service rate at stage k = dNk(t)/dt where Nk is #requests processed by stage k
Liquid task model • Volume at time T ≈ #requests queued at stage k = $\int_0^T \left(F_{in}(t) - F_k(t)\right)\,dt$ • $F_k$: service rate at stage k • $F_{in}$: request arrival rate to this stage • Valves: points of control, i.e., manipulated variables such as the queue length • Liquid model does not describe how individual requests are prioritized • Control theory can be combined with queuing theory or real-time scheduling
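The volume integral can be approximated numerically. The following is a minimal sketch, assuming rates sampled at a fixed interval and a forward-Euler sum; the clamp at zero and the sample rates are assumptions added for illustration.

```python
def queue_volume(arrival_rates, service_rates, dt=1.0, initial=0.0):
    """Approximate the integral of (F_in - F_k) over time with a running sum.

    arrival_rates and service_rates hold one sample per interval of
    length dt. The clamp at zero is an added assumption: a queue cannot
    hold a negative number of requests.
    """
    volume = initial
    history = []
    for f_in, f_k in zip(arrival_rates, service_rates):
        volume = max(0.0, volume + (f_in - f_k) * dt)
        history.append(volume)
    return history


# Hypothetical rates in requests/s, sampled once per second.
arrivals = [100, 150, 150, 80, 50]
service = [120, 120, 120, 120, 120]
volumes = queue_volume(arrivals, service)
print(volumes)  # [0.0, 30.0, 60.0, 20.0, 0.0]
```

The queue fills while arrivals exceed the service rate and drains afterwards, which is the "liquid level" a controller would try to regulate.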
Server modeling • Difference equation to model web servers • y(k): perf, e.g., delay or throughput, measured at the kth sampling period • u(k): control input at the kth sampling period • ARMA (AutoRegressive Moving Average) model • $y(k) = a_1 y(k-1) + a_2 y(k-2) + \dots + a_n y(k-n) + b_1 u(k-1) + b_2 u(k-2) + \dots + b_n u(k-n)$ • Transfer function can be derived • Web proxy cache model [4] • TCP dynamics [5]
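To make the difference equation concrete, here is a small sketch that simulates it for given coefficients. The coefficient values and the step input are hypothetical, and samples before k = 0 are assumed to be zero.

```python
def simulate_arma(a, b, u):
    """Simulate y(k) = sum_i a[i]*y(k-1-i) + sum_i b[i]*u(k-1-i).

    a and b are the coefficient lists [a1..an] and [b1..bn] (same length);
    u is the control input sequence. Samples before k = 0 are taken as 0.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    y = [0.0] * len(u)
    for k in range(len(u)):
        acc = 0.0
        for i in range(len(a)):
            j = k - 1 - i
            if j >= 0:
                acc += a[i] * y[j] + b[i] * u[j]
        y[k] = acc
    return y


# Hypothetical first-order server: y(k) = 0.5*y(k-1) + 1.0*u(k-1), unit step input.
response = simulate_arma([0.5], [1.0], [1.0] * 5)
print(response)  # [0.0, 1.0, 1.5, 1.75, 1.875]
```

For this first-order example the output approaches b1 / (1 - a1) = 2, the steady-state gain that also follows from the transfer function.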
Resource allocation for QoS guarantees • Allocate more/less resource = open/close a valve • Need actuators to control resource allocation or QoS provided by the system
SW system actuators • Input flow actuators • Admission control • Control queue length, server utilization, … • Reject some requests under overload
SW system actuators • Quality adaptation actuators • Change processing requirements to increase the service rate under overload • E.g., Return abbreviated web page under overload • Tradeoff btwn delay & quality • Service level m in a range [0, M] where 0 is rejection
Resource reallocation actuator • Alter the amount of allocated resources • Usually applicable to multiple classes of clients, e.g., dynamically reallocate disk space to support the service delay ratio 1:2 between two service classes [4,7]
QoS Mapping • Convert common resource management & SW perf assurance problems to FC problems • Absolute convergence guarantee • Relative guarantee • Resource reservation guarantee • Prioritization guarantee • Statistical multiplexing guarantee • Utility optimization guarantee
Absolute convergence guarantee • Convergence to the specified set point (desired perf) • Overshoot: Maximum deviation • Settling time: Time taken to recover the desired perf
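Both quantities can be read off a measured performance trace. The sketch below assumes one sample per sampling period and a tolerance band around the set point; the tolerance parameter and the sample trace are assumptions for illustration.

```python
def overshoot(trace, set_point):
    """Maximum absolute deviation of the trace from the set point."""
    return max(abs(y - set_point) for y in trace)


def settling_time(trace, set_point, tolerance, period=1.0):
    """Time after which the trace stays within +/- tolerance of the set point.

    Returns len(trace) * period if the last sample is still outside the band.
    """
    last_outside = -1
    for k, y in enumerate(trace):
        if abs(y - set_point) > tolerance:
            last_outside = k
    return (last_outside + 1) * period


# Hypothetical trace after a load disturbance, set point 1.0.
trace = [1.0, 1.6, 1.3, 1.1, 1.02, 0.99, 1.0]
peak = overshoot(trace, 1.0)
settled = settling_time(trace, 1.0, tolerance=0.05)
print(peak, settled)
```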
Absolute convergence guarantee • Rate & queue length control • Result in linear FC • (Flow) rate can be directly controlled by actuators • Queue length can be linearly controlled by controlling the flow • E.g., server utilization control loop
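A server utilization control loop can be sketched as follows. This is an illustration only, not the controller from the paper: it assumes an admission-control actuator that admits a fraction of requests, a toy plant in which utilization equals admitted fraction times offered load (capped at 1), and a plain integral control law with a hand-picked gain.

```python
def run_utilization_loop(offered_load, set_point=0.8, gain=0.5, steps=30):
    """Integral control of utilization via the admitted fraction of requests.

    offered_load is the utilization the server would see if every request
    were admitted (e.g., 1.6 means 160% of capacity). Toy model, assumed
    for illustration: utilization = min(1, admitted_fraction * offered_load).
    """
    admitted_fraction = 1.0
    trace = []
    for _ in range(steps):
        utilization = min(1.0, admitted_fraction * offered_load)  # sensor
        error = set_point - utilization
        # Actuator: adjust the admitted fraction, kept within [0, 1].
        admitted_fraction = min(1.0, max(0.0, admitted_fraction + gain * error))
        trace.append(utilization)
    return trace


utilization_trace = run_utilization_loop(offered_load=1.6)
print(round(utilization_trace[-1], 3))
```

In this toy model the loop is stable while gain * offered_load stays below 2; a real design would choose the gain from an identified server model rather than by hand.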
Absolute convergence guarantee • Delay control • More difficult • Delay is inversely proportional to flow • Queuing delay d = Q/r where Q is queue length & r is service rate • Nonlinear
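One common way to handle this nonlinearity is to linearize around an operating point. The derivation below is a sketch under that assumption; the operating point $(Q_0, r_0)$ and the deviation variables are notation introduced here, not taken from the slides.

```latex
% Queuing delay from the slide: d = Q / r.
% First-order Taylor expansion around an assumed operating point (Q_0, r_0):
\begin{align*}
d &= \frac{Q}{r}, \qquad d_0 = \frac{Q_0}{r_0},\\
\Delta d &\approx \left.\frac{\partial d}{\partial Q}\right|_{(Q_0, r_0)} \Delta Q
          + \left.\frac{\partial d}{\partial r}\right|_{(Q_0, r_0)} \Delta r
          = \frac{1}{r_0}\,\Delta Q - \frac{Q_0}{r_0^{2}}\,\Delta r .
\end{align*}
```

The approximation holds only for small deviations from $(Q_0, r_0)$, so the controller gains would need to be revisited when the operating point shifts.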
Relative guarantee • For example, fix the delays of two traffic classes at a ratio 1:3 • Hi: measured perf of class i • Ci: weight of class i • Relative guarantee specifies H1:H2 = C1:C2 = 1:3 • Set point = C1/C2 = 1/3 • Error e = 1/3 – H1/H2
Relative guarantee in Apache web server • Controlled variable: relative delay ratio • Manipulated variable: #allocated processes per class to control connection delay • HTTP protocol summary • A client, e.g., a web browser, establishes a TCP connection with a server process • The client submits an HTTP request to the server over the TCP connection • The server sends the response back to the client • The TCP connection is kept open for the Keep Alive interval, e.g., 15s -> Claim: connection delay dominates service response time -> Scheduling can also significantly affect the relative delay ratio, but it is not considered
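The controlled and manipulated variables can be tied together in a small sketch. It assumes a set point of 1/3 for H1/H2, a fixed process pool split between two classes, and a simple integral-style step with a hand-picked gain; the function names, the gain, and the numbers are hypothetical and are not the controller used in the paper.

```python
def delay_ratio_error(h1, h2, set_point=1.0 / 3.0):
    """Error e = set point - H1/H2 for the measured class delays h1, h2."""
    return set_point - h1 / h2


def reallocate(procs1, total_procs, error, gain=10.0):
    """Shift server processes between the two classes based on the error.

    A negative error means H1/H2 is above the set point (class 1 is too
    slow relative to class 2), so class 1 receives more processes.
    Each class always keeps at least one process.
    """
    target = procs1 - gain * error
    new_procs1 = min(total_procs - 1, max(1, round(target)))
    return new_procs1, total_procs - new_procs1


# Hypothetical measurement: class delays of 0.2 s and 0.4 s, 128 processes.
error = delay_ratio_error(0.2, 0.4)
allocation = reallocate(50, 128, error)
print(allocation)  # (52, 76)
```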
Relative guarantee in Apache web server • System identification based on the ARMA model • Randomly change per class process allocations • Measure response time
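System identification can be sketched as a least-squares fit of the ARMA coefficients to logged input/output data. The example assumes a first-order model y(k) = a*y(k-1) + b*u(k-1) and uses synthetic, noise-free data from a hypothetical system (a = 0.7, b = 0.3) in place of real Apache measurements.

```python
import random


def fit_first_order(u, y):
    """Least-squares estimate of (a, b) in y(k) = a*y(k-1) + b*u(k-1).

    Solves the 2x2 normal equations directly, so no external library
    is needed.
    """
    syy = syu = suu = sy = su = 0.0
    for k in range(1, len(y)):
        y_prev, u_prev, y_now = y[k - 1], u[k - 1], y[k]
        syy += y_prev * y_prev
        syu += y_prev * u_prev
        suu += u_prev * u_prev
        sy += y_prev * y_now
        su += u_prev * y_now
    det = syy * suu - syu * syu
    if det == 0.0:
        raise ValueError("input is not exciting enough to identify the model")
    a = (sy * suu - su * syu) / det
    b = (su * syy - sy * syu) / det
    return a, b


# Randomly varied input, standing in for random per-class process allocations.
rng = random.Random(0)
u = [rng.uniform(1.0, 10.0) for _ in range(50)]
y = [0.0]
for k in range(1, 50):
    y.append(0.7 * y[k - 1] + 0.3 * u[k - 1])  # hypothetical "true" system

a_hat, b_hat = fit_first_order(u, y)
print(round(a_hat, 6), round(b_hat, 6))
```

Varying the input randomly matters: with a constant input the two regressors become nearly collinear and the coefficients cannot be separated.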
Relative guarantee in Apache web server • Perf settings • 4 Linux machines run the Surge web workload generator • 1 Linux machine runs the Apache web server • Suddenly increase #premium clients by 100 at time 870s
Relative guarantee in Apache web server • Perf results • [Figures: open loop (stable?) vs. closed loop]
Related work • ControlWare • CPU scheduling • Storage management • Network routers • Power/heat management • RTDB
Conclusions • Feedback control is applicable to managing performance in SW systems • Future work • Adaptive/robust control • Predictive control • Apply to other computational systems such as embedded systems
Adaptive Control: Self-Tuning Regulator • Dynamically estimate a model of the system via the Recursive Least Squares method • Controller will accordingly set the actuators to support the desired perf.
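The estimation half of a self-tuning regulator can be sketched with a recursive least squares update. The example assumes a first-order model y(k) = a*y(k-1) + b*u(k-1) with parameter vector [a, b]; the initial covariance, the forgetting factor, and the synthetic system (a = 0.7, b = 0.3) are assumptions for illustration, and the controller redesign step is left out.

```python
import random


def rls_update(theta, P, phi, y, forgetting=1.0):
    """One recursive least squares step for a 2-parameter model.

    theta: current estimate [a, b]; P: 2x2 covariance matrix (symmetric);
    phi: regressor [y(k-1), u(k-1)]; y: new measurement y(k).
    """
    p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
             P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = forgetting + phi[0] * p_phi[0] + phi[1] * p_phi[1]
    gain = [p_phi[0] / denom, p_phi[1] / denom]
    error = y - (phi[0] * theta[0] + phi[1] * theta[1])
    new_theta = [theta[0] + gain[0] * error, theta[1] + gain[1] * error]
    # P is symmetric, so phi^T P equals (P phi)^T.
    new_P = [[(P[i][j] - gain[i] * p_phi[j]) / forgetting for j in range(2)]
             for i in range(2)]
    return new_theta, new_P


rng = random.Random(1)
theta = [0.0, 0.0]                  # initial guess for [a, b]
P = [[1e4, 0.0], [0.0, 1e4]]        # large P: low confidence in the guess
y_prev = 0.0
for _ in range(200):
    u_prev = rng.uniform(1.0, 10.0)
    y_now = 0.7 * y_prev + 0.3 * u_prev   # hypothetical "true" system
    theta, P = rls_update(theta, P, [y_prev, u_prev], y_now)
    y_prev = y_now

print([round(t, 4) for t in theta])
```

A forgetting factor slightly below 1 would let the estimate track a system whose parameters drift over time, at the cost of more sensitivity to noise.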
References (HP Storage Systems Lab) • Designing controllable computer systems, Christos Karamanolis, Magnus Karlsson and Xiaoyun Zhu. USENIX Workshop on Hot Topics in Operating Systems (HotOS), June 2005, pp. 49-54, Santa Fe, NM. • Dynamic black-box performance model estimation for self-tuning regulators, Magnus Karlsson and Michele Covell. International Conference on Autonomic Computing (ICAC), pp. 172-182, June 2005, Seattle, WA.
IBM Autonomic Computing Lab • http://www.research.ibm.com/autonomic/index.html • General, broader research issues regarding self-tuning, self-managing systems • Also, visit Joe Hellerstein’s Adaptive Systems Department
Some University Labs • Tarek Abdelzaher: http://www.cs.uiuc.edu/homes/zaher/ • Chenyang Lu: http://www.cse.wustl.edu/~lu/
Announcement • Programming Assignment 1 is posted on the course web page