550 likes | 695 Views
CS551: Queue Management. Christos Papadopoulos ( http://netweb.usc.edu/cs551/). Congestion control vs. resource allocation. Network’s key role is to allocate its transmission resources to users or applications Two sides of the same coin let network do resource allocation (e.g., VCs)
E N D
CS551: Queue Management Christos Papadopoulos (http://netweb.usc.edu/cs551/)
Congestion control vs. resource allocation • Network’s key role is to allocate its transmission resources to users or applications • Two sides of the same coin • let network do resource allocation (e.g., VCs) • difficult to do allocation of distributed resources • can be wasteful of resources • let sources send as much data as they want • recover from congestion when it occurs • easier to implement, may lose packets
Connectionless Flows • How can a connectionless network allocate anything to a user? • It doesn’t know about users or applications • Flow: • a sequence of packets between same source - destination pair, following the same route • Flow is visible to routers - it is not a channel, which is an end-to-end abstraction • Routers may maintain soft-state for a flow • Flow can be implicitly defined or explicitly established (similar to VC) • Different from VC in that routing is not fixed
Taxonomy • Router-centric v.s. Host-centric • router-centric: address problem from inside network - routers decide what to forward and what to drop • A variant not captured in the taxonomy: adaptive routing! • host centric: address problem at the edges - hosts observe network conditions and adjust behavior • not always a clear separation: hosts and routers may collaborate, e.g., routers advise hosts
..Taxonomy.. • Reservation-based v.s. Feedback-based • Reservations: hosts ask for resources, network responds yes/no • implies router-centric allocation • Feedback: hosts send with no reservation, adjust according to feedback • either router or host centric: explicit (e.g., ICMP source quench) or implicit (e.g., loss) feedback
..Taxonomy • Window-based v.s. Rate-based • Both tell sender how much data to transmit • Window: TCP flow/congestion control • flow control: advertised window • congestion control: cwnd • Rate: still an open area of research • may be logical choice for reservation-based system
Service Models • In practice, fewer than eight choices • Best-effort networks • Mostly host-centric, feedback, window based • TCP as an example • Networks with flexible Quality of Service • Router-centric, reservation, rate-based
Queuing Disciplines • Each router MUST implement some queuing discipline regardless of what the resource allocation mechanism is • Queuing allocates bandwidth, buffer space, and promptness: • bandwidth: which packets get transmitted • buffer space: which packets get dropped • promptness: when packets get transmitted
FIFO Queuing • FIFO:first-in-first-out (or FCFS: first-come-first-serve) • Arriving packets get dropped when queue is full regardless of flow or importance - implies drop-tail • Important distinction: • FIFO: scheduling discipline (which packet to serve next) • Drop-tail: drop policy (which packet to drop next)
Dimensions Scheduling Single class Per-connection state Class-based queuing FIFO Drop position Tail Head Random location Early drop Overflow drop
..FIFO • FIFO + drop-tail is the simplest queuing algorithm • used widely in the Internet • Leaves responsibility of congestion control to edges (e.g., TCP) • FIFO lets large user get more data through but shares congestion with others • does not provide isolation between different flows • no policing
Fair Queuing Demmers90
Fair Queuing • Main idea: • maintain a separate queue for each flow currently flowing through router • router services queues in Round-Robin fashion • Changes interaction between packets from different flows • Provides isolation between flows • Ill-behaved flows cannot starve well-behaved flows • Allocates buffer space and bandwidth fairly
FQ Illustration Flow 1 Flow 2 I/P O/P Flow n Variation: Weighted Fair Queuing (WFQ)
Some Issues • What constitutes a user? • Several granularities at which one can express flows • For now, assume at the granularity of source-destination pair, but this assumption is not critical • Packets are of different length • Source sending longer packets can still grab more than their share of resources • We really need bit-by-bit round-robin • Fair Queuing simulates bit-by-bit RR • not feasible to interleave bits!
Bit-by-bit RR • Router maintains local clock • Single flow: suppose clock ticks when a bit is transmitted. For packet i: • Pi: length, Ai = arrival time, Si: begin transmit time, Fi: finish transmit time. Fi = Si+Pi • Fi = max (Fi-1, Ai) + Pi • Multiple flows: clock ticks when a bit from all active flows is transmitted
Fair Queuing • While we cannot actually perform bit-by-bit interleaving, can compute (for each packet) Fi. Then, use Fi to schedule packets • Transmit earliest Fi first • Still not completely fair • But difference now bounded by the size of the largest packet • Compare with previous approach
Output Flow 1 Flow 2 F=10 F=8 F=5 Flow 1 (arriving) Flow 2 transmitting Output F=10 F=2 Fair Queuing Example Cannot preempt packet currently being transmitted
Delay Allocation • Aim: give less delay to those using less than their fair share • Advance finish times for sources whose queues drain temporarily • Bi = Pi + max (Fi-1, Ai - d) • Schedule earliest Bi first
Allocate Promptness • Bi = Pi + max (Fi-1, Ai - d) • d gives added promptness: • if Ai < Fi-1, conversation is active and d does not affect it: Fi = Pi + Fi-1 • if Ai > Fi-1, conversation is inactive and d determines how much history to take into account
Notes on FQ • FQ is a scheduling policy, not a drop policy • Still achieves statistical muxing - one flow can fill entire pipe if no contenders – FQ is work conserving • WFQ is a possible variation – need to learn about weights off line. Default is one bit per flow, but sending more bits is possible
More Notes on FQ • Router does not send explicit feedback to source - still needs e2e congestion control • FQ isolates ill-behaved users by forcing users to share overload with themselves • user: flow, transport protocol, etc • Optimal behavior at source is to keep one packet in the queue • But, maintaining per flow state can be expensive • Flow aggregation is a possibility
Congestion Avoidance • TCP’s approach is reactive: • detect congestion after it happens • increase load trying to maximize utilization until loss occurs • TCP has a congestion avoidance phase, but that’s different from what we’re talking about here • Alternatively, we can be proactive: • we can try to predict congestion and reduce rate before loss occurs • this is called congestion avoidance
Router Congestion Notification • Routers well-positioned to detect congestion • Router has unified view of queuing behavior • Routers can distinguish between propagation and persistent queuing delays • Routers can decide on transient congestion, based on workload • Hosts themselves are limited in their ability to infer these from perceived behavior
Router Mechanisms • Congestion notification • the DEC-bit scheme • explicit congestion feedback to the source • Random Early Detection (RED) • implicit congestion feedback to the source • well suited for TCP
Design Choices for Feedback • What kind of feedback • Separate packets (source quench) • Mark packets, receiver propagates marks in ACKs • When to generate feedback • Based on router utilization • You can be near 100% utilization without seeing a throughput degradation • Queue lengths • But what queue lengths (instantaneous, average)?
A Binary Feedback Scheme for Congestion Control in Computer Networks (DEC-bit) Ramakrishnan90
The Dec-bit Scheme • Basic ideas: • on congestion, router sets a bit (CI) bit on packet • receiver relays bit to sender in acknowledgements • sender uses feedback to adjust sending rate • Key design questions: • Router: Feedback policy (how and when does a router generate feedback) • Source: Signal filtering (how does the sender respond?)
Why Queue Lengths? • It is desirable to implement FIFO • Fast implementations possible • Shares delay among connections • Gives low delay during bursts • FIFO queue length is then a natural choice for detecting the onset of congestion
The Use of Hysteresis • If we use queue lengths, at what queue lengths should we generate feedback? • Threshold or hysteresis? • Surprisingly, simulations showed that if you want to increase power • Use no hysteresis • Use average queue length threshold of 1 • Maximizes power function Power = throughput/delay
Computing Average Queue Lengths • Possibilities: • Instantaneous • Premature, unfair • Averaged over a fixed time window, or exponential average • Can be unfair if time window different from round-trip time • Solution • Adaptive queue length estimation: busy/idle cycles • But need to account for long current busy periods
Sender Behavior • How often should the source change window? • In response to what received information should it change its window? • By how much should the source change its window? • We already know the answer to this: AIMD • DEC-bit scheme uses a multiplicative factor of 0.875
How Often to Change Window? • Not on every ACK received • Window size would oscillate dramatically because it takes time for a window change’s effects to be felt • If window changes to W, it takes (W+1) packets for feedback about that window to be received • Correct policy: wait for (W+W’) acks • Where W is window size before update and W’ is size after update
Using Received Information • Use the CI bits from W’ acks in order to decide whether congestion still persists • Clearly, if some fraction of bits are set, then congestion exists • What fraction? • Depends on the policy to set the threshold • When queue size threshold is 1, cutoff fraction should be 0.5 • This has the nice property that the resulting power is relatively insensitive to this choice
Changing the Sender’s Window • Sender policy • monitor packets within a window • make change if more than 50% of packets had CI set: • if < 50% had CI set, then increase window by 1 • else new window = window * 0.875 • additive increase, multiplicative decrease for stability
Dec-bit Evaluation • Relatively easy to implement • No per-connection state • Stable • Assumes cooperative sources • Conservative window increase policy • Some analytical intuition to guide design • Most design parameters determined by extensive simulation
Random Early Detection (RED) Floyd93
Random Early Detection (RED) • Motivation: • high bandwidth-delay flows have large queues to accommodate transient congestion • TCP detects congestion from loss - after queues have built up and increase delay • Aim: • keep throughput high and delay low • accommodate bursts
Why Active Queue Management? (Rfc2309) • Lock-out problem • drop-tail allows a few flows to monopolize the queue space, locking out other flows (due to synchronization) • Full queues problem: • drop tail maintains full or nearly-full queues during congestion; but queue limits should reflect the size of bursts we want to absorb, not steady-state queuing
Other Options • Random drop: • packet arriving when queue is full causes some random packet to be dropped • Drop front: • on full queue, drop packet at head of queue • Random drop and drop front solve the lock-out problem but not the full-queues problem
Solving the Full Queues Problem • Drop packets before queue becomes full (early drop) • Intuition: notify senders of incipient congestion • example: early random drop (ERD): • if qlen > drop level, drop each new packet with fixed probability p • does not control misbehaving users
Differences With Dec-bit • Random marking/dropping of packets • Exponentially weighted queue lengths • Senders react to single packet • Rationale: • Exponential weighting better for high bandwidth connections • No bias when weighting interval different from round-trip time, since packets are marked randomly • Random marking avoids bias against bursty traffic
RED Goals • Detect incipient congestion, allow bursts • Keep power (throughput/delay) high • keep average queue size low • assume hosts respond to lost packets • Avoid window synchronization • randomly mark packets • Avoid bias against bursty traffic • Some protection against ill-behaved users
RED Operation Min thresh Max thresh Average queue length P(drop) 1.0 MaxP Avg length minthresh maxthresh
Queue Estimation • Standard EWMA: avg - (1-wq) avg + wqqlen • Upper bound on wq depends on minth • want to set wq to allow a certain burst size • Lower bound on wq to detect congestion relatively quickly
Thresholds • minth determined by the utilization requirement • Needs to be high for fairly bursty traffic • maxth set to twice minth • Rule of thumb • Difference must be larger than queue size increase in one RTT • Bandwidth dependence
Packet Marking • Marking probability based on queue length • Pb = maxp(avg - minth) / (maxth - minth) • Just marking based on Pb can lead to clustered marking -> global synchronization • Better to bias Pb by history of unmarked packets • Pb = Pb/(1 - count*Pb)
RED Variants • FRED: Fair Random Early Drop (Sigcomm, 1997) • maintain per flow state only for active flows (ones having packets in the buffer) • CHOKe (choose and keep/kill) (Infocom 2000) • compare new packet with random pkt in queue • if from same flow, drop both • if not, use RED to decide fate of new packet