Multi-Class Latency Bounded Web Services Vikram Kanodia and Edward Knightly Rice Networks Group http://www.ece.rice.edu/networks
Motivation • Poor end-to-end performance of web traffic. • Excessive latencies due to overloaded servers are a dominant factor. • Present-day web servers provide only FCFS service. • Need mechanisms to: • Reduce server latency; and • Control server latency.
Steps Towards Web QoS - 1 • SBAC – Session-Based Admission Control [CherkPhaal99]. • Blocks sessions if the load is above a certain threshold. • Pros: • Prevents the server from going into overload. • Cons: • Only ensures better service for all admitted requests collectively. • Cannot ensure that the requested service is met.
Steps Towards Web QoS - 2 • Operating system hooks: • Mechanisms to support resource reservation among different domains at the OS level. • Resource Containers [BangDrsch99]. • Eclipse/BSD operating system [Silber99]. • Prioritizing incoming requests provides class differentiation [BhattiFried99]. • Distributed server architecture for better throughput [Vpai98].
What is Lacking? • No mechanism to meet a request's targeted delay. • No class-based service model: • Multiple user classes. • Each class has a different response time target. • All classes contend for the same resource. • No means of statistically quantifying the service received.
Key Challenges • The net service rate is a complex, unknown function of CPU, disk, and cache behavior. • It is very difficult to model a request's service demand in terms of low-level system resources. • Interactions between requests belonging to different classes are difficult to predict a priori. • All present-day web QoS schemes are tightly coupled to the server architecture.
First Cut: Baseline Scheme • Latency-targeted service model: • A single user class with a targeted delay to be met by some percentage of all serviced requests. • Goals: • Illustrate an abstraction of the server resources into a simple queuing model. • Highlight key issues for managing multi-class web services. • Serve as a reference for experimental comparisons.
Baseline Scheme: Problem formulation • Assumption: stationary and homogeneous arrivals. • There is some maximum service rate that satisfies the QoS requirements. • All arrivals in excess of the maximum service rate need to be blocked. • How to determine the maximum service rate?
Baseline Scheme: M/M/1 model • Approximate a class's service by an M/M/1 queue with an unknown service rate. • Abstracts the low-level server resources into a virtual server. • The unknown service rate is given by:
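The expression for the service rate did not survive in this copy of the slide. As a hedged sketch using only textbook M/M/1 results (FCFS, Poisson arrivals at rate λ, exponential service at an unknown rate μ), the response time D is exponentially distributed, which suggests one way to back out μ from a measured mean delay \bar{D}. This estimator is our assumption and may differ from the authors' own formula:

```latex
P(D > d) = e^{-(\mu - \lambda)\, d}, \qquad
E[D] = \frac{1}{\mu - \lambda}
\quad\Longrightarrow\quad
\hat{\mu} = \lambda + \frac{1}{\bar{D}}
```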
Baseline Scheme: Admission Control • A new request increases the load to λ′. • Delay violation probability under load λ′: • If P(D > d*) is greater than the fraction of requests allowed to miss the delay target (one minus the targeted fraction that must meet it), block the new request.
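A minimal Python sketch of the baseline admission test, assuming the M/M/1 relations P(D > d) = exp(-(μ - λ)d) and μ = λ + 1/E[D] (the latter is our assumed estimator, not necessarily the slide's). How much a single request raises the load is also an assumption here: the caller supplies it as `added_load`, so λ′ = λ + added_load. All function names are hypothetical.

```python
import math


def mm1_service_rate(arrival_rate: float, mean_delay: float) -> float:
    """Back out mu from E[D] = 1 / (mu - lambda): mu = lambda + 1 / E[D] (assumed estimator)."""
    return arrival_rate + 1.0 / mean_delay


def violation_probability(mu: float, load: float, d_target: float) -> float:
    """P(D > d*) = exp(-(mu - load) * d*) for a stable M/M/1 FCFS queue."""
    if load >= mu:
        return 1.0  # unstable queue: treat the delay target as certainly violated
    return math.exp(-(mu - load) * d_target)


def baseline_admit(arrival_rate: float, mean_delay: float, added_load: float,
                   d_target: float, meet_fraction: float) -> bool:
    """Admit iff P(D > d*) under lambda' = lambda + added_load is at most 1 - meet_fraction."""
    mu = mm1_service_rate(arrival_rate, mean_delay)
    new_load = arrival_rate + added_load
    return violation_probability(mu, new_load, d_target) <= 1.0 - meet_fraction
```

For example, 100 req/s with a measured mean delay of 50 ms gives μ̂ = 120 req/s; with d* = 1 s and a 95% target, the load can grow to about 117 req/s before e^{-(120 - λ′)} exceeds 5%.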
Limitations of Baseline Scheme • No support for multiple service classes. • M/M/1 models each class as independent of the other classes. • Cannot capture inter-class interference. • The assumption of independent, exponentially distributed service times is faulty: • Does not account for highly variable service times. • Ignores temporal correlation among different requests for the same document.
Solution • LMAC: Latency-Targeted Multi-Class Admission Control • Service model: • A minimum fraction of accepted requests will be serviced within the class's delay target. • Mechanism to characterize and control inter-class relationships. • Decouples access control from the actual server architecture or the operating system.
Our Technique: Envelopes • Envelopes: arrival/service rates over intervals of time. • Deterministic [Cruz95] and statistical [QK99,CK00] envelopes are used to manage network QoS. • Envelopes represent the net service received in the presence of other requests being processed concurrently by the server.
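To make the envelope idea concrete, here is a minimal Python sketch, assuming the arrival side is summarized by the largest number of requests seen in any window of a given length over a measured trace. The function names are hypothetical, and this empirical maximum is a deterministic-style envelope; statistical envelopes instead treat the arrivals or service over an interval as a random quantity (for example, through its mean and variance).

```python
from bisect import bisect_left
from typing import List, Sequence


def max_arrivals_in_window(timestamps: Sequence[float], interval: float) -> int:
    """Largest number of arrivals in any half-open window [s, s + interval)."""
    ts = sorted(timestamps)
    best = 0
    for i, start in enumerate(ts):
        # A maximal window can always be shifted right until it starts at an arrival.
        end = bisect_left(ts, start + interval)
        best = max(best, end - i)
    return best


def arrival_envelope(timestamps: Sequence[float], intervals: Sequence[float]) -> List[int]:
    """Empirical envelope: one max-arrival count per interval length."""
    return [max_arrivals_in_window(timestamps, t) for t in intervals]
```

A service envelope can be built the same way from the completion instants observed while the class is backlogged.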
What do Envelopes Buy Us? • A general yet accurate way of describing a class's service and demand. • A higher level of abstraction of low-level system resources. • Capture the effects of temporal correlation and high variability in requests and server latencies. • Model relationships among different user classes in a tractable manner.
Measurement-Based Service Envelope • The envelope is the service received versus interval length while backlogged. • Given the number of concurrently backlogged requests: • Compute the request latency mean and variance. • Use a Gaussian approximation to obtain the targeted percentile delay.
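The Gaussian step can be sketched in a few lines of Python, assuming latency samples have already been grouped by the number of concurrently backlogged requests; the function names and that dictionary layout are hypothetical. The targeted percentile is approximated as mean + z·σ, where z is the standard normal quantile for the target (about 1.645 for the 95th percentile).

```python
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def targeted_percentile_delay(latencies: List[float], percentile: float) -> float:
    """Gaussian approximation of a latency percentile: mean + z * standard deviation."""
    mu = mean(latencies)
    sigma = stdev(latencies)  # sample standard deviation; needs at least two samples
    z = NormalDist().inv_cdf(percentile)  # about 1.645 for the 95th percentile
    return mu + z * sigma


def percentile_by_backlog(samples: Dict[int, List[float]],
                          percentile: float) -> Dict[int, float]:
    """Map each backlog level (concurrent backlogged requests) to its percentile delay."""
    return {n: targeted_percentile_delay(lat, percentile)
            for n, lat in samples.items() if len(lat) >= 2}
```

Backlog levels with fewer than two samples are skipped because a sample variance cannot be formed from them.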
LMAC Algorithm • Ensure that a new arrival maintains the latency target of its own class: • Keep the maximum horizontal distance between the request and service envelopes less than the targeted latency. • How to ensure that the service of other classes is not disrupted?
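The own-class test can be sketched as follows, assuming both envelopes are sampled on a common grid of interval lengths (index k stands for an interval of k slots) and that the arrival envelope passed in already includes the new request. The names are hypothetical; the distance computed is the usual network-calculus horizontal deviation, the maximum over k of the smallest τ with S(k + τ) ≥ A(k).

```python
import math
from typing import Sequence


def horizontal_distance(arrival: Sequence[float], service: Sequence[float]) -> float:
    """Max over k of the smallest tau >= 0 with service[k + tau] >= arrival[k], in slots.

    Returns math.inf if the service envelope never catches up within its horizon.
    """
    worst = 0
    for k, demand in enumerate(arrival):
        tau = 0
        # Walk right until the service envelope covers the demand seen over k slots.
        while k + tau < len(service) and service[k + tau] < demand:
            tau += 1
        if k + tau >= len(service):
            return math.inf
        worst = max(worst, tau)
    return worst


def admits_own_class(arrival_with_new: Sequence[float], service: Sequence[float],
                     target_slots: int) -> bool:
    """Own-class test: the distance must be strictly less than the target, as on the slide."""
    return horizontal_distance(arrival_with_new, service) < target_slots
```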
LMAC Algorithm (cont.) • To ensure that other classes do not suffer: • Assume that the new arrival has strict priority over all other requests. • This is a worst-case assumption. • For all other classes, the request workload remains the same, but the service they receive is reduced.
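A minimal sketch of the worst-case check for the other classes, assuming the same slotted envelope representation as the own-class test. Giving the new arrival strict priority leaves each other class the service max(0, S(k) - A_new(k)); we then make that nondecreasing by taking a running minimum from the right, which is a conservative choice on our part. Each class keeps its own arrival envelope and target, as the slide states. All names are hypothetical, and the horizontal-distance helper is restated so the block runs on its own.

```python
import math
from typing import Dict, List, Sequence, Tuple


def horizontal_distance(arrival: Sequence[float], service: Sequence[float]) -> float:
    """Max over k of the smallest tau >= 0 with service[k + tau] >= arrival[k]."""
    worst = 0
    for k, demand in enumerate(arrival):
        tau = 0
        while k + tau < len(service) and service[k + tau] < demand:
            tau += 1
        if k + tau >= len(service):
            return math.inf
        worst = max(worst, tau)
    return worst


def leftover_service(service: Sequence[float], new_arrival: Sequence[float]) -> List[float]:
    """Service left to a class if the new arrival has strict priority (worst case).

    Both envelopes are assumed to share one grid; zip truncates to the shorter one.
    """
    raw = [max(0.0, s - a) for s, a in zip(service, new_arrival)]
    # Running minimum from the right yields a nondecreasing envelope no larger than raw.
    out = list(raw)
    for k in range(len(out) - 2, -1, -1):
        out[k] = min(out[k], out[k + 1])
    return out


def others_unharmed(classes: Dict[str, Tuple[Sequence[float], Sequence[float], int]],
                    new_arrival: Sequence[float]) -> bool:
    """classes maps name -> (arrival envelope, service envelope, target in slots)."""
    for arrival, service, target in classes.values():
        # Workload unchanged, service reduced: recheck each class against its target.
        if not horizontal_distance(arrival, leftover_service(service, new_arrival)) < target:
            return False
    return True
```

A request would be admitted only if both this check and the own-class test pass.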
Simulation Details • Simulations were performed using a simulator that approximates the OS's management of CPU, disk, caching, etc. • Requests come from a trace generated from the CS departmental server logs at Rice University. • Arrivals are assumed to be Poisson with a given mean rate.
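Poisson arrivals with a given mean rate amount to i.i.d. exponential inter-arrival gaps. The hypothetical helper below generates only the arrival instants; pairing each instant with a request drawn from the Rice trace is not shown.

```python
import random
from typing import List


def poisson_arrivals(rate: float, duration: float, seed: int = 0) -> List[float]:
    """Arrival instants of a Poisson process with the given mean rate over [0, duration).

    Inter-arrival gaps are drawn i.i.d. exponential with mean 1 / rate.
    """
    rng = random.Random(seed)  # private generator so runs are reproducible
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return out
        out.append(t)
```

For instance, `poisson_arrivals(300.0, 60.0)` yields roughly 18,000 instants, in line with a 300 reqs/sec class.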
Experiment 1 • Targeted delay of 1 second for the 95th percentile of all admitted requests. • Demonstrates overload-protection properties similar to SBAC's.
Experiment 2 • Single-class, single-node case. • The baseline scheme does meet its delay target, but is too conservative.
Multi-Class Performance • In the absence of any server-level support: • The performance of each class is bounded by that of the most stringent class. • To investigate a true multi-class scenario: • Devise an artificial resource allocation policy.
Experiment 3 (cont.) • Class A: • Arrival rate 300 reqs/sec, target delay 0.5 sec • Class B: • Arrival rate 200 reqs/sec, target delay 1 sec
Conclusions • A scheme to ensure that a minimum fraction of all accepted requests meet their latency targets. • A way to abstract system resources into a high-level virtual server: • Makes our approach general and independent of the OS and server architecture. • Ability to exploit additional features within the server architecture for higher utilization.
Future Work • Address heterogeneous content: • Content with different service demands, e.g., dynamic content. • Perform experiments with additional traces. • Incorporate LMAC into a real server and test its performance.