Fair Queuing for Aggregated Multiple Links. Josep M. Blanquer and Banu Özden. Proceedings of the ACM SIGCOMM, August 2001. ABSTRACT. Fair queuing algorithms proportionally share a single server among competing flows but do not address the problem of sharing multiple servers.
Fair Queuing for Aggregated Multiple Links Josep M. Blanquer and Banu Özden Proceedings of the ACM SIGCOMM, August 2001
ABSTRACT
• Fair queuing algorithms
  • Proportionally share a single server among competing flows
  • Do not address the problem of sharing multiple servers
• Multi-server applications
  • Link aggregation
  • Multiprocessors
  • Multi-path storage I/O
We introduce a new service discipline for multi-server systems, MSF2Q, that provides guarantees for competing flows. • We prove that this new service discipline is a close approximation of the idealized Generalized Processor Sharing (GPS) discipline. • We calculate its maximum packet delay and service discrepancy with respect to GPS.
1. INTRODUCTION
• A large increase in networked services → a much larger variety of traffic → different network requirements to be met simultaneously over the same links.
• High bandwidth guarantees (backups), low jitter guarantees (video streaming), low delay guarantees (network data acquisition)
• Network resources must be appropriately scheduled.
Fair queuing service disciplines allocate bandwidth fairly among competing traffic.
• Protection from “misbehaving” traffic
• Effective congestion control
• Better services for rate-adaptive applications
• Strict QoS guarantees, with admission control
Growing demand for bandwidth → incremental scaling techniques → grouping multiple links into a single logical interface [3]
• Implementations
  • [1] 3Com’s Dynamic Access
  • [2] Adaptec Duralink Software Suite
  • [12] Hewlett Packard’s Auto-Port Aggregation
  • [14] Intel Load Balancing
  • [6] J. Blanquer et al., Resource Management for QoS in Eclipse/BSD, Proceedings of the First FreeBSD Conference, Berkeley, California, Oct. 1999.
2. BACKGROUND
• GPS (Generalized Processor Sharing)
• Guaranteed fairness, for any flow x that is continuously backlogged during [τ, t]:
  • Wx(τ, t) = the amount of flow x traffic served in the interval [τ, t]
  • ψx = weight of flow x = the proportion of the server bandwidth that flow x receives when it is backlogged
• Guaranteed rate:
  • ri = rate of flow i
  • r = server rate
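For reference, the two GPS properties named on this slide are usually written as below. This is the standard formulation of GPS rather than a copy of the slide's own formulas; the second flow index y and the sum over all flows j are notation assumed here.

```latex
\frac{W_x(\tau, t)}{W_y(\tau, t)} \;\ge\; \frac{\psi_x}{\psi_y}
\quad \text{for every flow } y,
\qquad\qquad
r_i \;=\; \frac{\psi_i}{\sum_j \psi_j}\, r
```

The first inequality holds for any flow x that is continuously backlogged during [τ, t]; the second gives the rate flow i is guaranteed no matter how the other flows behave.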
Generalized Processor Sharing (GPS)
• An idealized system that serves as a reference model for the fair queuing disciplines.
• It assumes that the server can transmit more than one flow simultaneously and that the traffic is infinitely divisible.
• A number of packetized approximations to GPS have been devised:
  • WFQ (Weighted Fair Queueing) ’89 Demers et al.
  • VC (Virtual Clock) ’90 Zhang
  • PGPS (Packet-by-packet Generalized Processor Sharing) ’93 Parekh et al.
  • SCFQ (Self-Clocked Fair Queueing) ’94 Golestani
  • WF2Q (Worst-case Fair Weighted Fair Queueing) ’96 Bennett et al.
  • SFQ (Start-time Fair Queueing) ’96 Goyal et al.
* A New Priority Calculation Method for Sorted-priority Fair Queuing – Liu et al., 2004
B. Current packet priority calculation methods
• The three best-known packet priority calculation methods are [9]:
  • Smallest Finish time First (SFF)
    • Packet selection: PiX(t) + li/ri (li = packet length, ri = rate of flow i)
    • Used by WFQ and SCFQ
  • Smallest Start time First (SSF)
    • Packet selection: PiX(t)
    • Used by SFQ
  • Smallest Eligible Finish time First (SEFF)
    • Pre-selection: sessions with session potentials smaller than the system potential
    • Packet selection: (SFF) PiX(t) + li/ri
    • Used by WF2Q
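The three selection rules can be sketched in a few lines. This is only an illustration under stated assumptions: the `HeadPacket` type and its field names are hypothetical, the session potential is used as the packet's start time, the finish time divides the packet length by the session's rate, and the SEFF eligibility test is taken as "potential no larger than the system potential".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HeadPacket:
    """Head-of-line packet of one session (all names here are illustrative)."""
    session: str
    potential: float  # session potential, used as the packet's start time
    length: float     # packet length
    rate: float       # rate allocated to the session (assumed denominator)

    @property
    def finish(self) -> float:
        # Finish time = start time + transmission time at the session's rate.
        return self.potential + self.length / self.rate


def select_sff(heads: Sequence[HeadPacket]) -> Optional[HeadPacket]:
    """Smallest Finish time First (the rule attributed to WFQ and SCFQ)."""
    return min(heads, key=lambda p: p.finish, default=None)


def select_ssf(heads: Sequence[HeadPacket]) -> Optional[HeadPacket]:
    """Smallest Start time First (the rule attributed to SFQ)."""
    return min(heads, key=lambda p: p.potential, default=None)


def select_seff(heads: Sequence[HeadPacket],
                system_potential: float) -> Optional[HeadPacket]:
    """Smallest Eligible Finish time First (the rule attributed to WF2Q)."""
    # Pre-selection: keep sessions whose potential does not exceed the
    # system potential (assumed reading of "smaller than").
    eligible = [p for p in heads if p.potential <= system_potential]
    # Packet selection: SFF among the eligible sessions only.
    return select_sff(eligible)
```

Note that SEFF can return nothing even when packets are waiting, which is the non-work-conserving behaviour the later slides discuss.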
3. PROPORTIONAL SHARING OF MULTISERVER SYSTEMS
• Numerous applications that use multi-server systems can benefit from service guarantees:
  • Network: multiple network adapters to a web or file server
  • Storage: multiple I/O channels to a RAID server
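System Model
• (MSFQ, N, r): N servers, each with a rate of r [figure; scheduler labeled WFQ]
• (GPS, 1, Nr): one server with a rate of Nr [figure; scheduler labeled WFQ]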
3.1 A Packetized Fair Queuing Discipline for Multi-Servers
• MSFQ’s scheduling discipline is the same as WFQ’s, with (GPS, 1, Nr) as the reference system:
  • When a server is idle and there is a packet waiting for service, MSFQ schedules the “next” packet.
  • The “next” packet is defined as the first packet that would complete service in the (GPS, 1, Nr) system if no more packets were to arrive.
• To compare how well a (MSFQ, N, r) system approximates a (GPS, 1, Nr) system, calculate:
  (i) the worst-case delay
  (ii) the traffic discrepancy
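The scheduling rule can be sketched for the simplest case, where every packet is already queued at t = 0. Under that assumption the GPS completion order is the order of per-flow cumulative finish tags (length divided by weight), so no GPS simulation is needed. The function name, the input layout, and the tie-breaking by flow name are all choices made for this sketch, not part of the slides.

```python
import heapq
from typing import Dict, List, Tuple


def msfq_batch_schedule(
    flows: Dict[str, Tuple[float, List[float]]],
    n_servers: int,
    rate: float,
) -> List[Tuple[float, int, str, int]]:
    """Dispatch packets that all arrive at t = 0 onto n_servers links.

    flows maps a flow name to (weight, [packet lengths]).
    Returns (start_time, server, flow, packet_index) in dispatch order.
    """
    # GPS order: with everything backlogged from t = 0, packets complete
    # in increasing order of their cumulative per-flow finish tags.
    tagged = []
    for name, (weight, lengths) in flows.items():
        tag = 0.0
        for seq, length in enumerate(lengths):
            tag += length / weight
            tagged.append((tag, name, seq, length))
    tagged.sort()  # ties are broken by flow name, an arbitrary choice

    # Each entry is (time the server becomes idle, server id).
    servers = [(0.0, s) for s in range(n_servers)]
    heapq.heapify(servers)

    schedule = []
    for _tag, name, seq, length in tagged:
        # The earliest-idle server takes the "next" packet in GPS order.
        free_at, server = heapq.heappop(servers)
        schedule.append((free_at, server, name, seq))
        heapq.heappush(servers, (free_at + length / rate, server))
    return schedule
```

For instance, with two servers of rate 1, flow A (weight 0.5, two unit packets) and flows B and C (weight 0.25, one unit packet each), both of A's packets start at t = 0 and B and C start at t = 1.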
3.2 Preliminary Properties
• Delay and service properties of MSFQ do not trivially follow from the single server case, WFQ.
• GPS and MSFQ busy periods do not coincide. Example with one packet of length L:
  • (GPS, 1, Nr): one server of rate Nr; finish time Δ1 = L / Nr
  • (MSFQ, N, r): the packet is sent on one of the N servers of rate r; finish time Δ2 = L / r
  • Bits left in MSFQ at time Δ1 = L – [r * (L/Nr)] = L – (L/N) = (N-1)L / N
  • At any time τ, W(0, τ) ≥ W’(0, τ)
• When GPS is busy, MSFQ is busy. However, the converse is not true.
• Thus, for any τ, W(0, τ) ≥ W’(0, τ) (2), where W(0, τ) and W’(0, τ) denote the total number of bits serviced by GPS and MSFQ, respectively, by time τ.
• We will use the term busy period to refer to a busy period in the reference (GPS, 1, Nr) system.
• Work from previous busy periods can accumulate under MSFQ.
• This may happen either at the beginning or in the middle of a busy period.
[Figure: packets 1 to 7, arrival time versus service time, showing a delayed finish]
[Figure: packets 1 to 7, arrival time versus service time, showing a delayed start]
Theorem 1: For any τ, W(0, τ) − W’(0, τ) ≤ (N − 1) Lmax, where Lmax denotes the maximum packet length.
• Proof:
  • The slope of W (GPS) alternates between Nr (during a busy period) and 0 (idle, between two consecutive busy periods).
  • The slope of W’ (MSFQ) is at most Nr at any given time.
[Figure, assuming 3 servers: cumulative service W(0, t) versus t, with arrivals a1 to a9. The GPS curve has slope 0 or Nr; the MSFQ curve has slope r, 2r, or 3r.]
[Case 1] At most N − 1 MSFQ servers are busy at t:
• Since MSFQ is work-conserving, if a server is idle, we know that there is no packet waiting for transmission.
• In the worst case, all the k busy servers have just started transmitting a packet of maximum length (Lmax):
  W(0, t) − W’(0, t) ≤ k Lmax ≤ (N − 1) Lmax (a), where k ≤ N − 1 is the number of busy servers
[Case 2] All MSFQ servers are busy at t:
• Let [t0, t] be the largest interval in which all MSFQ servers are busy.
• In [t0, t] the slope of W’ is Nr while the slope of W is at most Nr, so W(t0, t) ≤ W’(t0, t) and therefore
  W(0, t) − W’(0, t) ≤ W(0, t0) − W’(0, t0) (b)
[Figure: W(t0, t) for the GPS server and W’(t0, t) with all MSFQ servers busy, both with slope Nr, plotted from t0 to t]
• If t0 = 0, then W(0, t) = W’(0, t). Otherwise, if t0 > 0, at most N − 1 servers are busy just before t0, so from (a),
  W(0, t0) − W’(0, t0) ≤ (N − 1) Lmax (c)
• From (b) and (c), we have W(0, τ) − W’(0, τ) ≤ (N − 1) Lmax for any τ.
• This theorem implies the need for a buffer space of (N − 1) Lmax.
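A small numeric check of the Theorem 1 bound in the simplest scenario may help: k ≤ N packets arrive together at an idle system, so GPS drains the backlog at rate Nr while MSFQ sends each packet on its own server at rate r. The function names and the sampled time grid are assumptions of this sketch, and the scenario is far from the worst case, so the measured gap is well inside the bound.

```python
from typing import Sequence


def gps_work(t: float, backlog: float, n: int, r: float) -> float:
    """Bits served by (GPS, 1, Nr) by time t for a backlog present at t = 0."""
    return min(n * r * t, backlog)


def msfq_work(t: float, lengths: Sequence[float], n: int, r: float) -> float:
    """Bits served by (MSFQ, N, r) by time t, one packet per server."""
    if len(lengths) > n:
        raise ValueError("this sketch needs at most one packet per server")
    return sum(min(r * t, length) for length in lengths)


def max_discrepancy(lengths: Sequence[float], n: int, r: float,
                    steps: int = 1000) -> float:
    """Largest sampled W(0, t) - W'(0, t) until MSFQ finishes every packet."""
    horizon = max(lengths) / r
    backlog = sum(lengths)
    return max(
        gps_work(horizon * i / steps, backlog, n, r)
        - msfq_work(horizon * i / steps, lengths, n, r)
        for i in range(steps + 1)
    )
```

With N = 4, r = 1 and a single 1000-bit packet, the largest gap is 750 bits, matching (N − 1)L / N; the theorem's bound for that case is 3000 bits.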
The discrepancy of packet departure times (i.e., the times at which transmission/service begins) between the multi-server and single-server systems
• Let dp be the time at which packet p departs from the (GPS, 1, Nr) system.
• MSFQ packets may not depart in increasing order of dp.
Lemma 1: Packet k will be scheduled no later than
  bk ≤ ak + (Σ i∈P Li) / (Nr)
where ak and bk are, respectively, the arrival time and the scheduling time of packet k over N servers, each with a rate of r; P is the set of packets scheduled before packet k since time ak, including the packets in service at ak; and Li is the length of packet i.
[Figure: packet arrivals from all flows on a time axis, with ak and bk marked]
• Proof:
  • Given a load that must be scheduled before packet k, a work-conserving service discipline schedules packet k latest if the load is equally divided among the N servers such that all of them finish the work at the same time.
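The equal-division argument can be checked numerically. The sketch below takes ak = 0 with all servers idle, dispatches the earlier packets in order to whichever server frees up first, and compares the time at which the next packet could start with the total load divided by the aggregate rate Nr, which is what the proof's argument suggests as the bound. Function names are hypothetical.

```python
import heapq
from typing import Sequence


def start_after(lengths: Sequence[float], n: int, r: float) -> float:
    """Time at which the next packet can start once `lengths` are dispatched.

    Packets are dispatched in order, each to the earliest-idle of n servers
    of rate r, all idle at time 0 (a work-conserving discipline).
    """
    free = [0.0] * n  # a list of equal values is already a valid heap
    for length in lengths:
        earliest = heapq.heappop(free)
        heapq.heappush(free, earliest + length / r)
    return free[0]


def equal_division_bound(lengths: Sequence[float], n: int, r: float) -> float:
    """Total load divided by the aggregate rate of the n servers."""
    return sum(lengths) / (n * r)
```

The bound is met with equality when the load happens to split evenly, for example lengths 3, 1, 1, 1 on two unit-rate servers, and is loose otherwise.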
4. PACKET DELAY
• Theorem 2: For all packets p, … where dp’ and dp are the times at which packet p departs from the (MSFQ, N, r) and (GPS, 1, Nr) systems, respectively.
• Proof:
  • Skipped
5. SERVICE PER-FLOW • Theorem 3: For any τ , Wi(0, τ) − Wi’(0, τ) ≤ NLmax • Proof: • Skipped
6. FAIRNESS
• Example 3:
  • 4 servers
  • 11 flows (fixed packet length L):
    • F1: weight = 0.5, 10 packets at t = 0
    • F2 ~ F11: weight = 0.05, each with 1 packet at t = 0
Scheduled by WFQ (by GPS finish time):
• F1A = 0 + L / 0.5
• F1B = F1A + L / 0.5 = 2L / 0.5
• ……
• F2 = 0 + L / 0.05
• F3 = 0 + L / 0.05
• ……
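Carrying the finish-time arithmetic of Example 3 through makes the burst visible. The indexed notation below (k-th packet of F1, flow index j) is assumed for this worked step; the values follow directly from the weights and the fixed packet length L.

```latex
F_1^{(k)} = \frac{kL}{0.5} = 2kL \quad (k = 1, \dots, 10),
\qquad
F_j = \frac{L}{0.05} = 20L \quad (j = 2, \dots, 11)
```

The first nine packets of F1 have finish times 2L through 18L, all smaller than 20L, so a scheduler that orders purely by finish time sends nine F1 packets before any packet of F2 ~ F11, and the tenth F1 packet ties with them at 20L.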
Scheduled by WF2Q (eligible start time of the head-of-line (HOL) packet + finish time):
• Not smooth?
[Figure: resulting schedule]
• The direct application of the WF2Q technique to multi-server systems does not fix the undesired burstiness problem; moreover, it makes the discipline non-work-conserving.
• [Figure annotation] A packet is not eligible until the previous packet is scheduled → non-work-conserving
6.1 MSF2Q
• (MSF2Q, N, r)
• A packet is outstanding if it is being transmitted.
• Let ôi(t) denote the number of outstanding flow i packets in the MSF2Q system at time t.
• Ŵi(τ, t) = the work completed for flow i under MSF2Q over the interval [τ, t]
• At time t, when a server is idle and there is a packet waiting for service, MSF2Q schedules, among the eligible flows, the packet that would complete service in the GPS system earliest.
• A flow is eligible if it satisfies … or [ … and … ].
Example 3: F1: r1 = 0.5; F2 ~ F11: rx = 0.05; r = 1/4 = 0.25; ô1 = ⌈0.5/0.25⌉ = 2; ôx = ⌈0.05/0.25⌉ = 1
The output of MSF2Q in Example 3:
• Smooth scheduling
Example 3: F1: r1 = 0.5; F2 ~ F11: rx = 0.05; r = 1/4 = 0.25; ô1 = ⌈0.5/0.25⌉ = 2; ôx = ⌈0.05/0.25⌉ = 1
6.2 Properties of MSF2Q
• Theorem 4: Let Li,max denote the maximum packet length of flow i. For any time τ and flow i, the following property holds: … (8)
• Proof:
  • Skipped
7. APPLICATIONS
• Link Aggregation
  • Logical grouping of several Ethernet network interfaces to allow for cost-effectiveness, load balancing, better scalability, and fault tolerance.
  • IEEE 802.3ad
  • Currently ranges from two to eight Fast/Gigabit Ethernet ports in either servers or switching elements.
Storage I/O access
• Connecting a RAID system to a host (e.g., a Web server) with multiple SCSI or Fibre Channel links to improve I/O performance.
• Load balancing, failover
8. RELATED WORK
• Skipped
9. CONTRIBUTIONS AND FUTURE WORK
• Link aggregation, or the aggregation of multiple interfaces into a single logical link, is becoming the predominant approach for bandwidth scaling.
• Numerous fair queuing results previously obtained for single-server systems do not directly apply to multi-server systems.
• We first analyzed the cumulative service, packet delay, and per-flow cumulative service bounds for Weighted Fair Queuing (WFQ) applied to a multi-server system.
• We then presented a new fair queuing algorithm, MSF2Q, that leads to smooth and fair schedules at finer time scales.
Our future plans include:
• Investigation of implementation issues
• Quantitative comparison of the approach presented in this paper to the alternative approach of partitioning flows among servers
• Enhancing the algorithms for multiprocessors and clusters of servers
• Hierarchical GPS
• Servers with different rates
• Misordering of packets