Design Philosophy of Scheduler

• Design a very general and flexible scheduler at the transport layer that can
  • work at different locations in the network
  • provide mechanisms for implementing a wide range of traffic classification and bandwidth management policies.
• Encourage traffic aggregation, and hence increase bandwidth utilization.
• Have a structured implementation and structured programming interfaces.

Original Model – Scheduler at the End-Systems
[Figure: connection relationships between Server 1, Server 2, Client 1, and Client 2 across ISP 1, ISP 2, ISP 3, ISP 4, and the backbone network.]

Corporate Network
[Figure: labels are Corporate Intranet, Client, Rate Control Device, Access Link (T1/T3), ISP Backbone Network.]

ISP Network and Server Farms
[Figure: labels are Home Clients, POP, ISP Network, Layer4/7 Switch, Server Farm, Backbone Network.]

Inter-Domain Scenario
[Figure: labels are Domain A, Domain B, Domain C, Backbone, Client, Edge Device, Access Link, Fat Pipe.]

Examples of User Requirements
• A user requests a rate for each of his ftp connections.
• A user requests that his connection be guaranteed a delay bound d with probability 1 - a.
• A manager requests that connections from his department be given a combined bandwidth of f1.
• Based on the company's mission, the administrator specifies that database traffic be given a bandwidth f2 and realtime video traffic be given a bandwidth l. He also specifies that realtime video be given a delay guarantee of d with probability 1 - a.
• The administrator decides that all users should be given equal bandwidth.

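H-PFQ (Bennett and Zhang 96)
[Figure: link-sharing tree. The link is split among Agency A (50%), Agency B (40%), and Agency C (10%). Each agency is subdivided into classes such as realtime, telnet, ftp, IP, and DECnet, and further into individual connections (conn. 1 ... conn. n), each labelled with its own percentage share.]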

A CBQ Implementation Example
• The ftp class has a minimum bandwidth reservation over an appropriate time scale.
• Can be viewed as a priority scheduler with link-sharing constraints.
[Figure: the link is split between Agency A (20%) and Agency B (80%). Each agency has realtime, interactive, and ftp classes, labelled with a priority and a share: 1, 5%; 2, 5%; 3, 10% under Agency A and 1, 25%; 2, 25%; 3, 30% under Agency B.]

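Our Scheduler
[Figure: class tree with nodes of Type I (rates f11, f12, f13, f21, f22), Type II (priority and delay bound: 1, d1; 2, d2; 3 and 1, d3; 2, d4; 3), and Type III and Type II-Infinity (rates r31, r32, r41, r42, r51, r52, r53; priorities 1, 2, 3).]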

An Instance
[Figure: a link with three priority classes: 1, d1 (Realtime); 2, d2 (Interactive); 3 (Elastic). Further labels: Buf. video, ftp, 67%, 33%.]

Goal of Scheduling
• Satisfy the (vaguely specified) requirements of individual connections, together with admission control.
• Satisfy the requirements of individual classes, which are aggregations of sets of connections with some common characteristics.
• Distribute extra bandwidth prudently.
• Participate in flow control and regulate excess traffic.
• Achieve flow isolation and fairness.
• Always aim at high bandwidth utilization.
• Optimize the operator’s objectives: bandwidth utilization, revenue, or combined user satisfaction, subject to constraints.

Goal of Scheduling – Cont.
• The items in the list overlap. We need to understand these objectives better.
• One approach is to define a precedence relation among the objectives (Shenker et al. 93).
• The hope is to design mechanisms that can achieve any suitable arrangement of objectives.
• Ways to achieve higher utilization:
  • Aggregation of realtime traffic
  • Tradeoff between realtime and elastic connections.

Key Features of H-PFQ
• Goals:
  • Provide a (loose) delay bound for realtime traffic
  • Distribute excess bandwidth according to the tree hierarchy (link sharing)
• What the tree representation means:
  • An edge means a parent-child relationship between classes;
  • a missing edge means disjoint classes;
  • the children of a class form a partition of the class.
• Weighted fair queueing on each level of the tree.
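
The link-sharing goal above can be illustrated with a minimal sketch of the idealized fluid model, in which each class splits its rate among its active children in proportion to their weights. This is not the packet-level H-PFQ algorithm of Bennett and Zhang; the names `ClassNode`, `is_active`, and `fluid_rates`, and the example tree, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    name: str
    weight: float                       # share relative to siblings
    children: list = field(default_factory=list)
    backlogged: bool = False            # only consulted for leaves


def is_active(node):
    """A leaf is active if backlogged; an inner class if any descendant is."""
    if not node.children:
        return node.backlogged
    return any(is_active(child) for child in node.children)


def fluid_rates(node, rate, out=None):
    """Split `rate` among the active children by weight, recursively."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = rate if node.backlogged else 0.0
        return out
    active_weight = sum(c.weight for c in node.children if is_active(c))
    for child in node.children:
        share = rate * child.weight / active_weight if is_active(child) else 0.0
        fluid_rates(child, share, out)
    return out


# Hypothetical tree: three agencies weighted 50/40/10 on a link of rate 100.
link = ClassNode("link", 1.0, [
    ClassNode("A", 50, [
        ClassNode("A.realtime", 30, backlogged=True),
        ClassNode("A.ftp", 20),          # idle
    ]),
    ClassNode("B", 40, backlogged=True),
    ClassNode("C", 10),                  # idle
])
rates = fluid_rates(link, 100.0)
```

In this example the idle share of C is redistributed to A and B in the ratio 50:40, and the idle share of A.ftp stays inside Agency A, which is the sense in which excess bandwidth follows the tree hierarchy.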

Weakness of H-PFQ
• Bandwidth is defined only on the shortest timescale, so realtime traffic cannot be traded off against elastic traffic.
• All requirements are handled by bandwidth allocation, which discourages aggregation.
• It is unnatural to emulate more than two priority levels.
• It is problematic for time-varying link capacity, where delay cannot be guaranteed but multiple priority levels still make sense.
• Limitation of the tree representation: it assumes classes are disjoint.

CBQ (Floyd and Jacobson 95)
• Goals:
  • Satisfy realtime traffic
  • Provide link-sharing services for classes
  • Each class receives its allocated bandwidth over an appropriate time interval
  • Prudent distribution of excess bandwidth (it is for this purpose that the tree hierarchy is defined)
• Not an implementation, but a guideline
• The tree represents bandwidth-sharing requirements.

CBQ Realtime Services
• The realtime class takes priority when the traffic is bursty and lightly loaded. (Note: the bandwidths are measured on different time scales.)
[Figure: link-sharing trees with video and ftp classes (video1, video2, ftp1, ftp2), labelled with priorities and shares: 80%, 80%, 60%, 20%; 1, 20%; 1, 40%; 2, 15%; 2, 5%; 67%, 33%.]

CBQ Realtime Service – Cont. 1
• Link sharing becomes effective when backlogged lower-priority classes do not get the bandwidth assigned to them over their timescale.
• Guards against overloading by realtime traffic due to admission control error or misbehavior.
• Gives up the notion of a hard delay guarantee.
• Allows some realtime applications to adapt to bandwidth fluctuation.
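
To make "bandwidth assigned over their timescale" concrete, here is a minimal sketch that measures a class's rate over a sliding window and flags a backlogged class that is receiving less than its allocation. A plain sliding window is swapped in for simplicity; it is not the estimator used in CBQ itself, and `WindowRateMeter` and `needs_link_sharing` are hypothetical names.

```python
from collections import deque


class WindowRateMeter:
    """Average rate of a class over the last `window` seconds."""

    def __init__(self, window):
        self.window = window            # the class's timescale, in seconds
        self.samples = deque()          # (departure time, bytes sent)
        self.total = 0

    def _expire(self, now):
        # Drop departures that fell out of the measurement window.
        while self.samples and self.samples[0][0] <= now - self.window:
            _, nbytes = self.samples.popleft()
            self.total -= nbytes

    def record(self, now, nbytes):
        self.samples.append((now, nbytes))
        self.total += nbytes
        self._expire(now)

    def rate(self, now):
        self._expire(now)
        return self.total / self.window


def needs_link_sharing(meter, now, allocated_rate, backlogged):
    """True if a backlogged class got less than its allocation over its window."""
    return backlogged and meter.rate(now) < allocated_rate


meter = WindowRateMeter(window=1.0)
meter.record(0.0, 100)
meter.record(0.5, 100)
```

Two classes with different windows can report different rates for the same departures, which is why shares measured on different timescales are not directly comparable.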

CBQ Realtime Service – Cont. 2
• With more low-priority classes, it becomes harder to provide strict priority to the high-priority class. It becomes more like processor sharing.
• If two different priority levels have the same timescale, it becomes processor sharing.

Weakness of CBQ
• Since bandwidths are measured on different time intervals, the weight assignment has no clear meaning. The weights do not have to add up to 1.
• Assumes classes are disjoint.
[Figure: two alternative classifications of the same link: by agency (Agency A 50%, Agency B 40%, Agency C 10%) and by application (audio 20%, video 50%, telnet 10%, ftp 20%, mail 0%).]

Weakness of CBQ – Cont.
• The following classification makes no sense; one of the previous classification schemes has to be used instead.
• What if the classes cannot be represented by a tree?
[Figure: a link split among Agency A, Agency C, and Video, with shares 50%, 40%, 10%.]

Vagueness in User Requirements
• Bandwidth: minimum, maximum, and average bandwidth, and their relevant timescales.
• Traffic classes: short interactive flow, bulk transfer, realtime stream, and buffered stream.
• Delay: average and maximum delay, and probabilistic bounds.
• Transfer size: special handling of short flows.
• Bandwidth adaptability: the application controls the data generation rate and can render the received data at different levels of quality.

Goals of Scheduler Design
• Want to support the following services:
  • Bandwidth guarantee at the shortest time scale
  • Various probabilistic delay guarantees
  • Bandwidth guarantee on longer time scales
  • Priority classes when bandwidth is uncertain
  • Best-effort services
• Want a structured representation of the above.
• Want a more general notion of connection classes.
• Want a representation that leads directly to implementation. Timescales are explicit in the representation.

Scheduler Definition
• Define three class types:
  • Type I: bandwidth guaranteed on the shortest time scale. Each class is associated with a number fi, which is the rate assignment for the class.
  • Type II: the delay guarantee is characterized by a tuple (di, pi, ri), indicating Pr(D > di) < pi, where D is the queueing delay. Note that di can be infinity. The optional ri stands for the average rate of this class and is used for admission control.
  • Type III: best-effort class, with bandwidth guaranteed on a longer time scale. Each class is associated with a pair (mi, ri), where mi is the minimum rate and ri is the average rate.
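
The three class types can be written down as plain records. The sketch below is one possible encoding, assuming Python dataclasses; the field names and the helper `meets_delay_target`, which checks Pr(D > di) < pi empirically on a list of observed delays, are hypothetical and not part of the slides.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TypeI:
    rate: float                          # fi: rate on the shortest time scale


@dataclass(frozen=True)
class TypeII:
    delay_bound: float                   # di; math.inf for Type II-Infinity
    violation_prob: float                # pi
    avg_rate: Optional[float] = None     # optional ri, for admission control

    @property
    def is_infinity(self) -> bool:
        return math.isinf(self.delay_bound)


@dataclass(frozen=True)
class TypeIII:
    min_rate: float                      # mi
    avg_rate: float                      # ri


def meets_delay_target(spec: TypeII, delays: Sequence[float]) -> bool:
    """Empirical check of Pr(D > di) < pi on observed queueing delays."""
    if not delays:
        return True
    late = sum(1 for d in delays if d > spec.delay_bound)
    return late / len(delays) < spec.violation_prob
```

A Type II-Infinity class never violates its bound, so the check passes for it whenever pi is positive.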

Rules for Scheduler Hierarchy
• A Type I class can have Type I, II, or III classes as children.
• A Type II class has no children, except that a Type II-Infinity class may have Type III or Type II-Infinity classes as children.
• A Type III class can have Type III or Type II-Infinity classes as children.
• All children of a class must themselves be of the same type.
• Notice that Type I can only have Type I as parent. Type II can only have Type I as parent, except for Type II-Infinity. Type III can have Type I, Type III, or Type II-Infinity as parent. Type II-Infinity can have Type I, Type III, or Type II-Infinity as parent.
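
These rules are mechanical enough to check in code. The sketch below validates a tree given as nested `(kind, children)` tuples. Two points are assumptions rather than statements from the slides: the root is treated as a Type I class, and ordinary Type II and Type II-Infinity siblings are counted as the same type. The names `Kind`, `ALLOWED_CHILDREN`, and `violations` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    I = "I"
    II = "II"
    II_INF = "II-Infinity"
    III = "III"


# Parent kind -> kinds allowed as children, transcribed from the rules.
ALLOWED_CHILDREN = {
    Kind.I: {Kind.I, Kind.II, Kind.II_INF, Kind.III},
    Kind.II: set(),
    Kind.II_INF: {Kind.III, Kind.II_INF},
    Kind.III: {Kind.III, Kind.II_INF},
}


def family(kind):
    """Assumption: Type II-Infinity counts as Type II among siblings."""
    return Kind.II if kind is Kind.II_INF else kind


def violations(kind, children, path="root"):
    """Return a list of rule violations found in the subtree."""
    errors = []
    if len({family(k) for k, _ in children}) > 1:
        errors.append(f"{path}: children are not all of the same type")
    for i, (child_kind, grandchildren) in enumerate(children):
        child_path = f"{path}/{i}"
        if child_kind not in ALLOWED_CHILDREN[kind]:
            errors.append(
                f"{child_path}: Type {child_kind.value} not allowed "
                f"under Type {kind.value}"
            )
        errors.extend(violations(child_kind, grandchildren, child_path))
    return errors


valid_tree = [
    (Kind.I, [(Kind.II, []), (Kind.II_INF, [(Kind.III, [])])]),
    (Kind.I, [(Kind.III, []), (Kind.III, [])]),
]
```

Calling `violations(Kind.I, valid_tree)` returns an empty list, while a tree that hangs a child under an ordinary Type II class, or mixes Type I and Type III siblings, yields one message per broken rule.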

Our Scheduler
[Figure: same class tree as in the earlier "Our Scheduler" slide.]

Time-Varying Pipe
[Figure: a time-varying pipe with three classes labelled r21, r22, r23.]

Key Features
• For all children of a Type I or Type III class, bandwidth is defined on comparable timescales.
• Priority classes are added.
• The representation is actually a general scheduler (in CBQ terminology), not a link-sharing representation. Link-sharing requirements should be kept separately, not necessarily in a tree structure.

A Scheduler Implementation
• The immediate children of a class are scheduled as follows:
  • Type I classes are scheduled by WFQ.
  • Type II classes are numbered 1, 2, …, n, and the number gives their scheduling priority. For instance, a class with a lower delay bound takes priority over one with a higher delay bound. When two Type II classes have the same delay bound, the one with the smaller pi value takes priority over the one with the larger pi value. Type II-Infinity classes are numbered a priori.
  • Type III classes are scheduled by WFQ among themselves.
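
The numbering rule for Type II classes amounts to a sort on (di, pi). The sketch below shows one way to derive the priority numbers and to pick the next class to serve. It assumes that Type II-Infinity classes rank after all classes with a finite delay bound, in their preassigned order; the dictionary keys and the names `number_type2` and `next_to_serve` are hypothetical.

```python
import math


def number_type2(classes):
    """Return {name: priority}, with priority 1 served first.

    `classes` is a list of dicts with keys 'name', 'd' (delay bound) and
    'p' (violation probability). Type II-Infinity entries have
    d == math.inf and a 'preset' key holding their a-priori rank.
    """
    finite = sorted(
        (c for c in classes if not math.isinf(c["d"])),
        key=lambda c: (c["d"], c["p"]),      # lower bound first, then lower p
    )
    infinite = sorted(
        (c for c in classes if math.isinf(c["d"])),
        key=lambda c: c["preset"],
    )
    return {c["name"]: n for n, c in enumerate(finite + infinite, start=1)}


def next_to_serve(priority, backlogged):
    """Pick the backlogged class with the smallest priority number."""
    ready = [name for name in backlogged if name in priority]
    return min(ready, key=priority.__getitem__) if ready else None


classes = [
    {"name": "video", "d": 0.02, "p": 0.01},
    {"name": "voice", "d": 0.02, "p": 0.001},
    {"name": "interactive", "d": 0.1, "p": 0.05},
    {"name": "bulk", "d": math.inf, "p": 1.0, "preset": 1},
]
priority = number_type2(classes)
```

Here voice and video share a delay bound, so the smaller pi puts voice ahead of video; the example class names and numbers are made up.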

Admission Control and Usage Accounting
• To guarantee delay, admission control is necessary for ordinary Type II classes.
• To guarantee minimum rate, admission control is necessary for Type III classes.
• Admission control is aided by usage accounting.
• Additional connection classes can be defined outside the scheduler.
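
For Type III classes, a very simple admission test is to admit a new class only if the minimum rates of all admitted siblings still fit within the parent's rate. The slides do not specify the actual test, so the sketch below is an assumption, and `MinRateAdmission` is a hypothetical name. The delay-based test needed for ordinary Type II classes is not covered here.

```python
class MinRateAdmission:
    """Tracks minimum-rate reservations of Type III children of one parent."""

    def __init__(self, parent_rate):
        self.parent_rate = parent_rate
        self.reserved = {}               # class name -> minimum rate mi

    def admit(self, name, min_rate):
        # Assumed test: the sum of minimum rates must not exceed the parent rate.
        if sum(self.reserved.values()) + min_rate > self.parent_rate:
            return False
        self.reserved[name] = min_rate
        return True

    def release(self, name):
        # Usage accounting would normally decide when a reservation ends.
        self.reserved.pop(name, None)
```

A rejected class can be retried after another class releases its reservation, for example when usage accounting shows that the class has gone idle.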

Overall Architecture
[Figure: architecture diagram divided into user space and kernel space. Components: Usage Accounting, Class Management, Admission Control / Policing, Classifier, Scheduler, Scheduler Adjustment, Scheduler Measurement, Pricing.]
