This study examines the congestion responsiveness of different traffic models in the Internet. It introduces new metrics and measures the congestion responsiveness of real network traffic. The findings have implications for networking research and practice.
Congestion Responsiveness of Internet Traffic (a fresh look at an old problem) • Ravi Prasad & Constantine Dovrolis • Networking and Telecommunications Group, College of Computing, Georgia Tech
TCP and Internet stability • Stable network: the offered load stays below the capacity (ρ<1) • Otherwise, persistent packet losses • Congestion collapse: fully utilized links, but almost zero per-flow goodput • Conventional wisdom #1: the Internet manages to be stable due to TCP congestion control • TCP: more than 90% of Internet traffic • TCP reduces offered load (send window) upon signs of congestion • Negative-feedback loop, stabilizing queueing system • Conventional wisdom #2: stability can be maintained without admission control or resource reservations
TCP-centric congestion control • If all flows use TCP, or TCP-friendly congestion control, then the Internet will be stable • TCP congestion control -> no congestion collapse • “Promoting the use of end-to-end congestion control in the Internet”, Floyd & Fall, ToN’99 • “Congestion control principles”, Floyd, RFC2914, 2000 • Key modeling unit: persistent flows (they last forever!) • “Rate control in communication networks: shadow prices, proportional fairness and stability”, Kelly et al., JORS’98 • “Congestion control for high performance, stability, and fairness in general networks”, Paganini et al., ToN’05 • Number of active flows does not change with time • Infinitely long flows can be effectively controlled
Flows are generated by users/applications, not by the transport layer! • Examples: user clicks web page, p2p movie download, machine-generated periodic FS synchronization • Session: Set of finite (i.e., non-persistent) flows, generated by single user action • Key issue: session arrival process • Does the session arrival rate reduce during congestion? • [Figure: sender and receiver protocol stacks (application, transport, network) exchanging a request and a response]
Two fundamental flow arrival models • Closed-loop model • Fixed number of users (1, 2, 3, ..., N), each user can generate one session at a time • New session arrival: depends on completion of previous session • E.g., ingress traffic in campus network (student downloads) • Open-loop model • Sessions arrive in network independently of congestion • Theoretically, infinite population of users • E.g., egress traffic at popular Web server • Very different models in terms of congestion responsiveness & stability
Related work • Open-loop traffic model • “Statistical bandwidth sharing: a study of congestion at flow level”, Fredj et al., Sigcomm’01 • “Stability and performance analysis of networks supporting elastic services”, de Veciana et al., ToN’01 • Closed-loop traffic model • “A new method for the analysis of feedback-based protocols with applications to engineering web traffic over the Internet”, Heyman et al., Sigmetrics’99 • “Dimensioning bandwidth for elastic traffic in high-speed data networks”, Berger & Kogan, ToN’00 • Main open issues: • What do the previous two models imply for the congestion responsiveness of aggregate Internet traffic? • Which of the previous two models is closer to real Internet traffic?
Our contributions • Introduce two new metrics for congestion responsiveness of aggregate Internet traffic • Elasticity and instability coefficient • Examine congestion responsiveness of several traffic models, including open-loop, closed-loop, and mixed traffic • Open-loop TCP traffic is less congestion responsive than even UDP traffic! • Closed-loop traffic is more congestion responsive than persistent flows • Design experimental methodology to measure the Closed-loop Traffic Ratio (CTR) • Measure CTR in several Internet packet traces • 70-90% of Internet traffic appears to be closed-loop • Several implications for networking research & practice
Outline • Congestion responsiveness metrics • Elasticity • Instability coefficient • Results for ideal Processor Sharing (PS) server • Closed-loop flow arrival model • Open-loop flow arrival model • Congestion responsiveness of four traffic models • Persistent TCP flows • UDP constant-rate streams • Open-loop TCP flows • Closed-loop TCP flows • Congestion responsiveness of real network traffic • Methodology and measurements • Summary and implications
Elasticity metric • Quantifies the extent to which a traffic aggregate backs off upon a congestion event • U and U’: average throughput of the aggregate traffic before and during the stimulus, respectively • Defined as the fractional drop in throughput: f = (U − U’)/U • Depends on the cause of the congestion event • Canonical congestion event: a persistent TCP transfer (stimulus) that is not limited by the receiver’s window
Elasticity • f = 1: completely responsive • f = 0: completely unresponsive • [Figure: throughput of the stimulus and of the cross-traffic]
Elasticity • Positive elasticity • Negative elasticity: when cross traffic increases its rate upon congestion • [Figure: throughput of the stimulus and of the cross-traffic]
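The elasticity definition can be made concrete with a small calculation. The sketch below assumes f = (U − U’)/U, which is one reading of "fractional change in throughput" that matches the f = 1 and f = 0 cases above. The function name and the sample throughput values are illustrative and do not come from the talk.

```python
def elasticity(u_before: float, u_during: float) -> float:
    """Fractional drop in aggregate throughput caused by the stimulus.

    Assumed form: f = (U - U') / U, with U the throughput before the
    stimulus and U' the throughput while the stimulus is active.
    """
    if u_before <= 0:
        raise ValueError("u_before must be positive")
    return (u_before - u_during) / u_before


# Hypothetical throughput values in Mb/s.
print(elasticity(100.0, 0.0))    # cross-traffic backs off completely
print(elasticity(100.0, 100.0))  # cross-traffic does not react
print(elasticity(100.0, 120.0))  # cross-traffic sends more: negative elasticity
```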
Instability Coefficient • Quantifies whether (and how fast) a traffic aggregate can lead to congestion collapse upon congestion at time t • Defined as dN(t)/dt, where N(t) is the number of active sessions at time t • Coefficient ≤ 0: fixed or decreasing number of active sessions; stable network • Coefficient > 0: increasing number of active sessions; has the potential to cause congestion collapse • The larger the coefficient, the faster the move towards congestion collapse
Instability Coefficient • Simulation of a stable network: instability coefficient = 0 • Open-loop model: session arrival rate 200/sec
Instability Coefficient • Simulation of an unstable network: instability coefficient > 0 • Open-loop model: session arrival rate 400/sec
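Given samples of the number of active sessions N(t), the instability coefficient dN(t)/dt can be approximated numerically. The sketch below uses a least-squares slope over the sampled window; that estimator is an assumption made here for illustration (the slides only give the derivative definition), and the sample series are made up.

```python
def instability_coefficient(times, active_sessions):
    """Least-squares slope of N(t) versus t, in sessions per second.

    Assumption: a straight-line fit over the observation window is an
    acceptable stand-in for dN(t)/dt.
    """
    n = len(times)
    if n < 2 or n != len(active_sessions):
        raise ValueError("need two or more samples of equal length")
    t_mean = sum(times) / n
    n_mean = sum(active_sessions) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((t - t_mean) * (a - n_mean)
              for t, a in zip(times, active_sessions))
    return sxy / sxx


# Hypothetical samples: a flat series and a growing series.
print(instability_coefficient([0, 1, 2, 3], [10, 10, 10, 10]))
print(instability_coefficient([0, 1, 2, 3], [10, 12, 14, 16]))
```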
Closed-loop model – PS server • N users: cycles of transfer and idle periods • S: average session size • TT: average transfer duration • TI: average idle time • TT increases during congestion • Na: number of active sessions • Elasticity: f = 1/(Na+1) • Instability coefficient: cannot be positive indefinitely (Na < N)
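One way to see where f = 1/(Na+1) comes from is to treat the stimulus as one extra customer at the PS server. The sketch below assumes that Na sessions share the full capacity C before the stimulus, that the stimulus then takes an equal 1/(Na+1) share, and that elasticity is the fractional drop in the aggregate's throughput. The function name and numbers are illustrative.

```python
def closed_loop_ps_elasticity(capacity: float, active_sessions: int) -> float:
    """Elasticity of Na sessions at an ideal PS server facing one stimulus flow."""
    if active_sessions < 1:
        raise ValueError("need at least one active session")
    # Before the stimulus, the Na sessions share the whole link.
    u_before = capacity
    # During the stimulus, Na + 1 flows share the link equally.
    u_during = capacity * active_sessions / (active_sessions + 1)
    return (u_before - u_during) / u_before


# Hypothetical 10 Mb/s link with 4 active sessions: expect 1/(4+1).
print(closed_loop_ps_elasticity(10e6, 4))
```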
Open-loop model – PS server • Poisson session arrivals • S: average session size • λ: session arrival rate • Offered load: ρ = λS/C • Stable only if ρ < 1 • Expected throughput for a new transfer: C(1−ρ), the available bandwidth • Elasticity: f = 0 • Instability coefficient: positive if ρ > 1
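The stability condition for the open-loop model can be illustrated with a small event-driven simulation. The sketch below assumes exponentially distributed session sizes in addition to the Poisson arrivals stated above, so the number of active sessions behaves as a birth-death process with aggregate completion rate C/S whenever the server is busy. All parameter values and names are hypothetical.

```python
import random


def simulate_open_loop_ps(arrival_rate, mean_size, capacity, duration, seed=0):
    """Return the number of active sessions at time `duration`.

    Assumptions: Poisson arrivals, exponential session sizes, ideal
    processor sharing, so sessions complete at total rate capacity / mean_size.
    """
    rng = random.Random(seed)
    service_rate = capacity / mean_size
    t, n = 0.0, 0
    while True:
        total_rate = arrival_rate + (service_rate if n > 0 else 0.0)
        t += rng.expovariate(total_rate)
        if t >= duration:
            return n
        if rng.random() < arrival_rate / total_rate:
            n += 1  # a new session arrives, regardless of congestion
        else:
            n -= 1  # one active session completes


# Offered load 0.5 (stable) versus 2.0 (unstable) on the same link.
stable = simulate_open_loop_ps(5.0, 1e6, 10e6, 200.0)
unstable = simulate_open_loop_ps(20.0, 1e6, 10e6, 200.0)
print(stable, unstable)
```

In the overloaded case the session count grows at roughly the arrival rate minus C/S sessions per second, which is a positive instability coefficient.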
Mixed traffic • Internet traffic: mix of open-loop and closed-loop traffic • Mixed traffic can be characterized by the Closed-loop Traffic Ratio (CTR) • f_mix = CTR × f_closed • Instability coefficient of the mix is positive when ρ_open > 1 • Not when ρ_open + ρ_closed > 1
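The two mixed-traffic rules can be written down directly. The sketch below is a literal transcription under the assumption that the open-loop part has zero elasticity (as in the PS results), so only the closed-loop share contributes; the function names and sample loads are hypothetical.

```python
def mixed_elasticity(ctr: float, f_closed: float) -> float:
    """Elasticity of the mix: only the closed-loop share backs off."""
    if not 0.0 <= ctr <= 1.0:
        raise ValueError("CTR must lie in [0, 1]")
    return ctr * f_closed


def mix_can_diverge(rho_open: float, rho_closed: float) -> bool:
    """True when the session count can keep growing.

    rho_closed is deliberately unused: closed-loop users wait for their
    previous session to finish, so their load cannot grow without bound.
    """
    return rho_open > 1.0


# Total offered load 1.3, but the open-loop part is only 0.6.
print(mixed_elasticity(0.8, 0.2), mix_can_diverge(0.6, 0.7))
```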
Persistent TCP transfers • N homogeneous transfers • Stimulus increases RTT and loss rate from (T,p) to (T’,p’) • UMass model to estimate TCP average throughput • Number of transfers remains constant, i.e., the instability coefficient is zero
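The effect of moving from (T,p) to (T’,p’) can be estimated with any TCP throughput formula. The sketch below does not use the UMass model named on the slide; it swaps in the simpler square-root approximation, throughput ≈ (MSS/T)·sqrt(3/(2p)), purely for illustration. The 1460-byte MSS and the sample RTT and loss values are assumptions.

```python
import math


def sqrt_model_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Square-root TCP throughput approximation, in bytes per second."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)


def persistent_tcp_elasticity(rtt, loss, rtt_during, loss_during, mss_bytes=1460):
    """Fractional throughput drop when (T, p) becomes (T', p')."""
    u_before = sqrt_model_throughput(mss_bytes, rtt, loss)
    u_during = sqrt_model_throughput(mss_bytes, rtt_during, loss_during)
    return (u_before - u_during) / u_before


# Hypothetical stimulus: RTT unchanged at 100 ms, loss rate rises from 1% to 4%.
print(persistent_tcp_elasticity(0.1, 0.01, 0.1, 0.04))
```

Under this approximation the result reduces to 1 − (T/T’)·sqrt(p/p’), so it does not depend on the MSS or on the number of transfers N.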
Constant-rate UDP transfers • Fixed number of constant-rate flows • UDP flows do not react to congestion, and they do not retransmit lost packets • Throughput after stimulus: U’= (1-p)U • Elasticity f = p >0 • Truly congestion responsive traffic should have larger elasticity than loss rate • Instability coefficient is zero • Number of flows does not change during congestion • Cannot cause congestion collapse
Open-loop TCP transfers • Poisson stream of TCP flows • Size uniformly distributed between 16 and 20 packets • Arrival rate chosen to vary the offered load • Ideally, f = 0 when ρ < 1 • But negative elasticity is possible with redundant TCP retransmissions • Increased offered load after stimulus • Instability coefficient is positive when ρ > 1 • Possible congestion collapse • Open-loop traffic is the network’s worst enemy
Closed-loop TCP transfers • When loss rate ~ 0 (i.e., small number of sessions) • Stimulus increases RTT from T to T’ • Transfer latency increases from kT to kT’ • With small number of active sessions: • Elasticity: about constant • With large number of active sessions: • Elasticity > 1/(Na+1) • Closed-loop TCP traffic: more elastic than persistent flows
What to measure? • Direct elasticity measurements require packet traces at bottleneck during stimulus • We have access to only a couple of such links • Direct measurements of instability coefficient require packet traces during congestion events • We have access to only a couple of congested links • Alternative: Measure CTR (closed-loop traffic ratio) • Indirect metric for congestion responsiveness • High CTR (close to one): mostly closed-loop traffic • Low CTR (close to zero): mostly open-loop traffic
CTR estimation (overview) • Start with packet trace from Internet link • Per-packet: arrival time, src/dst address & ports, size • Focus only on TCP traffic: HTTP and well-known ports • Identify users: • Downloads: user is associated with unique DST address • Uploads: user is associated with unique SRC address • Multi-user hosts and NATs are a problem (see paper for details) • For each user, identify sessions: • Session: one or more connections (“jobs”) associated with the same user action • E.g., Web page download: multiple HTTP connections • Classify sessions as open-loop or closed-loop: • Successive sessions from the same user: closed-loop • Session from a new user, or session arriving from a known user after a long idle period: open-loop
From Connections to Jobs to Sessions • An HTTP/1.1 connection can stay alive across multiple sessions • Job: segment of a TCP connection that belongs to a single session • Intra-job packet interarrivals: TCP- and network-dependent (short) • Inter-job packet interarrivals: caused by user actions (long) • Classify interarrivals based on a Silence Threshold (STH) • [Figure: example packet trace, annotated with intra-job and inter-job gaps]
1105126179.423931 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478309 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478438 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.478554 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488433 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.488666 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.488918 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539748 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.539870 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.539993 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.549085 163.157.239.61 127.207.1.255 80 2290 154 T 114
1105126179.549399 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.611572 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.611702 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612235 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612507 163.157.239.61 127.207.1.255 80 2289 1420 T 1380
1105126179.612752 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.613121 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
1105126179.672432 163.157.239.61 127.207.1.255 80 2290 1420 T 1380
Silence Threshold (STH) estimation • [Figure: distribution of packet interarrivals, separating intra-job gaps from inter-job gaps]
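Once a silence threshold is chosen, splitting one connection's packets into jobs is a single pass over the arrival times. The sketch below takes the STH as a given input, since the slides do not spell out how it is estimated. The function name, the 1-second threshold, and the timestamps are hypothetical.

```python
def split_into_jobs(timestamps, sth):
    """Split one TCP connection's packet arrival times into jobs.

    A gap longer than `sth` seconds is treated as an inter-job gap and
    starts a new job; shorter gaps stay inside the current job.
    """
    jobs = []
    for ts in sorted(timestamps):
        if jobs and ts - jobs[-1][-1] <= sth:
            jobs[-1].append(ts)
        else:
            jobs.append([ts])
    return jobs


# Hypothetical arrival times (seconds) with one long silence.
print(split_into_jobs([0.0, 0.05, 0.06, 3.0, 3.01], sth=1.0))
```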
Group jobs from same user in sessions • Intuition: jobs from same session will have short interarrivals (machine-generated) • Minimum Session Interarrival (MSI) threshold • MSI aims to distinguish machine-generated from user-initiated events • MSI = 1-5 seconds • [Figure: the example packet trace with intra-job and inter-job gaps marked; job interarrivals shorter than MSI stay in the same session, longer ones start a new session (sessions 1, 2, 3)]
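The grouping step can be sketched as follows. The code assumes each job is a (start, end) pair for a single user and that the interarrival is measured between consecutive job start times; both choices are assumptions made here, and the sample jobs are invented.

```python
def group_into_sessions(jobs, msi):
    """Group one user's jobs into sessions.

    jobs: list of (start, end) times in seconds.
    A job that starts less than `msi` seconds after the previous job's
    start joins the current session; otherwise it opens a new session.
    Returns a list of (session_start, session_end) pairs.
    """
    sessions = []
    prev_start = None
    for start, end in sorted(jobs):
        if prev_start is not None and start - prev_start < msi:
            s0, e0 = sessions[-1]
            sessions[-1] = (s0, max(e0, end))
        else:
            sessions.append((start, end))
        prev_start = start
    return sessions


# Three machine-generated jobs close together, then a later user action.
jobs = [(0.0, 2.0), (0.3, 1.0), (0.8, 4.0), (10.0, 11.0)]
print(group_into_sessions(jobs, msi=1.0))
```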
Classify sessions as open/closed-loop • First session from a user is always open-loop • Session from a returning user is also open-loop, if it starts more than MTT seconds since completion of last session • MTT: Maximum Think Time • Typically, MTT would be several minutes • [Figure: the example packet trace grouped into three sessions; session 1 is open-loop, session 2 starts more than MTT after session 1 and is open-loop, session 3 starts less than MTT after session 2 and is closed-loop]
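The classification rule and the resulting CTR can be sketched in a few lines. The code assumes CTR is the fraction of bytes carried by closed-loop sessions, which the slides do not state explicitly. The session tuples, user keys, and function names are hypothetical.

```python
def classify_sessions(sessions, mtt):
    """Label one user's sessions as 'open' or 'closed'.

    sessions: list of (start, end, size_bytes), times in seconds.
    The first session is open-loop; a later session is open-loop only if
    it starts more than `mtt` seconds after the previous session ended.
    """
    labels = []
    prev_end = None
    for start, end, _size in sorted(sessions):
        if prev_end is None or start - prev_end > mtt:
            labels.append("open")
        else:
            labels.append("closed")
        prev_end = end if prev_end is None else max(prev_end, end)
    return labels


def closed_loop_traffic_ratio(sessions_by_user, mtt):
    """Assumed definition: bytes in closed-loop sessions over all bytes."""
    closed = total = 0
    for sessions in sessions_by_user.values():
        ordered = sorted(sessions)
        for (_s, _e, size), label in zip(ordered, classify_sessions(ordered, mtt)):
            total += size
            if label == "closed":
                closed += size
    return closed / total if total else 0.0


trace = {
    "user-a": [(0, 10, 1000), (60, 70, 3000), (2000, 2010, 500)],
    "user-b": [(5, 6, 500)],
}
print(closed_loop_traffic_ratio(trace, mtt=15 * 60))
```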
Robustness to MSI & MTT thresholds • Examined CTR variation in the following ranges: • MSI: 0.1-2 sec • MTT: 10-25 min • CTR variation < 0.05 • Linear regression slopes: • ΔCTR/ΔMSI = -0.0044 per sec • ΔCTR/ΔMTT = 0.0037 per min • We use: • MSI = 1 sec • MTT = 15 min
Summary • Persistent transfers have very different congestion responsiveness than finite-size transfers • Focus on open-loop and closed-loop flow arrivals • TCP or TCP-like protocols are not sufficient to avoid congestion collapse • Negative feedback at the session/application layer holds the key to network stability • Measurements show high CTR values for most Internet links we examined • Possibly why the Internet is mostly stable
Is AQM an effective controller? • Active Queue Management (AQM) • Most AQM models assume persistent TCP flows • Provides congestion signal to flows • Stabilizes buffer occupancy • Controls link utilization • However, AQM is an ineffective controller in the presence of open-loop TCP traffic • Flow arrival process does not react to AQM drops • Congestion collapse still possible with AQM
Is admission control necessary? • Admission control is an effective way to control the offered load with open-loop traffic • Avoids flow aborts and reattempts • See proposals by J. Roberts and others • However, admission control is not required with closed-loop traffic • Closed-loop traffic is self-regulating • As long as the maximum possible number of active sessions does not exceed a certain threshold
What about TCP-friendliness? • “TCP friendliness” has been proposed for all non-TCP traffic as a way to avoid congestion collapse • However, like TCP sessions, open-loop TCP-friendly sessions can still cause congestion collapse • TCP friendliness is more important for fairness reasons (share bandwidth almost equally with TCP)
Traffic models for simulations-analysis • Time to drop the persistent flows assumption! • It is not realistic • It has very different congestion responsiveness than real Internet traffic • More realistic aggregate traffic models: • Mix of both open-loop and closed-loop finite-size sessions • We need more CTR measurements to characterize the mix • We need mathematical models for closed-loop traffic behavior, considering user behavior under congestion
Session/application congestion control • Several existing applications generate sessions independently of network congestion (bad!) • Example-1: NNTP servers transfer news periodically • Example-2: CDN servers exchange content as needed or periodically • Client-side control mechanism: • Do not start a new session before the current session completes • Server-side control mechanism: • Use admission control when the number of active sessions exceeds a threshold
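The server-side mechanism can be illustrated with a simple counter. This is a minimal sketch of threshold-based session admission, not a mechanism described in detail in the talk; the class name, method names, and the threshold value are hypothetical.

```python
class SessionAdmissionController:
    """Admit new sessions only while the active count is below a threshold."""

    def __init__(self, max_active_sessions: int) -> None:
        if max_active_sessions < 1:
            raise ValueError("threshold must be at least 1")
        self.max_active_sessions = max_active_sessions
        self.active_sessions = 0

    def try_admit(self) -> bool:
        """Return True and count the session if there is room, else False."""
        if self.active_sessions >= self.max_active_sessions:
            return False
        self.active_sessions += 1
        return True

    def release(self) -> None:
        """Call when a session completes."""
        if self.active_sessions > 0:
            self.active_sessions -= 1


controller = SessionAdmissionController(max_active_sessions=2)
print([controller.try_admit() for _ in range(3)])  # third request is refused
controller.release()
print(controller.try_admit())  # room again after a completion
```

A rejected client that waits for a completion before retrying behaves like a closed-loop user, which is the negative feedback at the session layer that the summary points to.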