What Lies Beneath: Understanding Internet Congestion
Leiwen Deng, Aleksandar Kuzmanovic (Northwestern University), Bruce Davie (Cisco Systems)
http://networks.cs.northwestern.edu
Common Wisdom and Our Key Results
• No congestion in the Internet core
  • Links are over-provisioned, hence no congestion
• No correlation among congestion events in the Internet
  • The diversity of traffic and links makes large, long-lasting congestion dependence among links unlikely
• Our key results
  • There is a subset of links (both inter-AS and intra-AS) that exhibits strong congestion intensity
  • Congestion events in the core can be highly correlated (up to 3 ASes apart)
Why Do We Care?
• Congestion in the core
  • Can depend on internal network policies or complex inter-AS relationships
  • Variable queuing delay can lead to jitter, affecting VoIP and streaming applications
• Correlation
  • Provides guidelines for re-routing systems
  • Most tomography models assume link congestion independence
Challenges
• Scalability
  • How to concurrently monitor a large number of Internet links?
  • Need a lightweight monitoring tool
  • Need a triggered monitoring system
• Our approach
  • Pong: a lightweight monitoring tool
    • Per-path overhead: 18 kbps
  • TPong: a triggered monitoring system
    • Capable of monitoring up to 8,000 links concurrently
Congestion Events
• Congestion intensity
  • How frequently do queue build-ups happen over 30-second time scales?
• We focus on persistent congestion events:
  • Intensity > 5%; duration > 2 minutes
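The event definition above can be sketched in code. This is an illustrative sketch, not the authors' implementation: it assumes congestion intensity is reported as one fraction per 30-second window, and flags runs of high-intensity windows lasting at least 2 minutes.

```python
def persistent_events(intensity, window_s=30,
                      min_intensity=0.05, min_duration_s=120):
    """Flag persistent congestion events (illustrative sketch).

    intensity: list of per-window fractions of time the queue was
    building up (one value per 30-second window). Returns (start, end)
    window indices of runs with intensity > 5% lasting >= 2 minutes.
    """
    min_windows = min_duration_s // window_s  # 120 s / 30 s = 4 windows
    events, start = [], None
    for i, x in enumerate(intensity + [0.0]):  # sentinel closes a trailing run
        if x > min_intensity:
            if start is None:
                start = i  # a high-intensity run begins
        elif start is not None:
            if i - start >= min_windows:  # run long enough to be "persistent"
                events.append((start, i - 1))
            start = None
    return events
```

For example, five consecutive windows at 10% intensity (2.5 minutes) qualify as one event, while a 1.5-minute burst does not.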
Coordinated Probing
[Figure: 4-packet probing between source S and destination D in a symmetric path scenario, with forward (f), backward (b), source-side (s), and destination-side (d) probes]
• Combines end-to-end and router-targeted probing
Pong: Coordinated Probing
[Figure: locating congestion points and tracing congestion status via half-path queuing delays between S and D (Δf, Δb, Δs, Δd, Δfs, Δfd)]
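The half-path idea can be illustrated with a small sketch. This is an assumption-laden simplification of the figure, not Pong's actual algorithm: given the end-to-end forward queuing delay (Δf) and the queuing delay on the first half-path up to a targeted router (Δfs), the remainder is attributed to the second half-path, which localizes the congestion point. The threshold value is hypothetical.

```python
def locate_congestion(d_f, d_fs, threshold_ms=5.0):
    """Attribute forward-path queuing delay to half-paths (sketch).

    d_f:  end-to-end forward queuing delay (ms)
    d_fs: queuing delay on the first half-path, S -> targeted router (ms)
    The second half-path's share is d_fd = d_f - d_fs.
    """
    d_fd = d_f - d_fs
    segments = []
    if d_fs > threshold_ms:
        segments.append("first half (S -> router)")
    if d_fd > threshold_ms:
        segments.append("second half (router -> D)")
    return segments or ["no congestion detected"]
```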
Pong: Methodology Highlights
• Coordinated probing
  • Send 4, 3, or 2 packets from two endpoints
• Quality of Measurability (QoM)
  • Able to deterministically detect its own inaccuracy
• Self-adaptivity
  • Switches among different probing schemes based on QoM and path properties
Vantage Point Selection Problem
• How to select vantage points to accurately measure congestion at a given link?
• Link measurability score
  • How well we can measure a specific link from a specific pair of endpoints; a function of:
    • Quality of measurability (QoM) for a given node
    • Queuing-delay threshold quality
    • Observability score
      • Avoid paths that "see" multiple congested links concurrently
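The selection step can be sketched as picking, per link, the endpoint pair with the best measurability score. This is a minimal illustration; the function name and score layout are hypothetical, and the real system additionally limits per-node load.

```python
def select_vantage_points(scores):
    """Pick the best endpoint pair per link (illustrative sketch).

    scores: {link: {(src, dst): measurability_score}}
    Returns {link: (src, dst)} choosing the pair with the highest
    score for each link (ties broken arbitrarily).
    """
    return {link: max(pairs, key=pairs.get) for link, pairs in scores.items()}
```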
Triggered Monitoring System
• Greedy algorithm to determine a subset of links
  • Covers 65% (7,800) of links with 4.9% (1,750) of paths
  • Limits the per-node measurement overhead
• Priority-based Pong path allocation
  • Maximizes quality of measurability
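The greedy selection above resembles classic greedy set cover: repeatedly pick the path that covers the most not-yet-covered links. The sketch below illustrates that idea only; it omits the per-node overhead cap and priority-based allocation, and all names are illustrative.

```python
def greedy_path_cover(paths, max_paths):
    """Greedy set-cover sketch for path selection.

    paths: {path_id: set of link ids traversed by that path}
    Picks up to max_paths paths, each time choosing the path that adds
    the most links not yet covered. Returns (chosen_ids, covered_links).
    """
    covered, chosen = set(), []
    for _ in range(max_paths):
        # path with the largest number of newly covered links
        pid = max(paths, key=lambda p: len(paths[p] - covered), default=None)
        if pid is None or not (paths[pid] - covered):
            break  # nothing new to cover
        chosen.append(pid)
        covered |= paths[pid]
    return chosen, covered
```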
Coverage & Overhead Statistics
• We observe ~36,000 paths
  • N², N = 191 nodes
• Expose ~12,100 links at a time
  • Due to routing changes, we observe ~29,000 links in total
• TMon paths:
  • Up to 2,000 paths running fast-rate probing concurrently
  • Cover up to 8,000 links concurrently
  • 4.9% of paths cover 65% of all links
• Pong paths:
  • Up to 30 Pong paths; cover up to 350 links concurrently
• Overhead per node:
  • Average: 30 kbps; peak: 68 kbps
Measurement Quality
• How good is our vantage-point selection algorithm?
• Link measurability score: 0-6
  • 65% of measurement samples have a non-zero score
  • 80% of measurements are better than "fair"
  • 60% of measurements are better than "good"
• The key point: we know how well or poorly we are doing
Key Findings
• Time-invariant hot spots
• Strong spatial correlation among congested links
• Root-cause analysis
Time-invariant Hot Spots
• Time-of-day effects for the number of congestion events
• A small number of links shows strong time-invariant congestion intensity
Time-invariant Hot Spots
• Contrary to our initial hypothesis, most of these links are not inter-continental links
• They are inter-AS links between large backbone networks, as well as intra-AS links within those networks
Congestion Correlation
• Pair-wise correlation
  • Percent of time two links are concurrently congested
• Pair-wise correlation can be quite extensive
  • E.g., 20% of pairs have correlation greater than 0.7
• Correlation: weekends > weekdays
  • Overall congestion level is smaller during weekends
• Distance between correlated link pairs
  • Up to 3 ASes
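The pairwise metric above can be sketched from per-slot congestion flags. This is one plausible normalization (overlap over union), not necessarily the exact metric used in the talk; names are illustrative.

```python
def pairwise_correlation(a, b):
    """Fraction of congested time two links share (illustrative sketch).

    a, b: boolean time series, one flag per time slot, True/1 when the
    link is congested in that slot. Returns the fraction of slots in
    which both links are congested, out of slots where either one is.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0
```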
Aggregation Effect Hypothesis
• Hypothesis:
  • When upstream traffic converges to a relatively thin aggregation point, traffic surges on an upstream link are likely to create congestion at the thin downstream aggregation link
• Insights:
  • Aggregation points correspond to time-invariant hot spots
  • Interaction between an aggregation point and an upstream link causes link-level correlation
[Figure: aggregation link]
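The hypothesis reduces to simple arithmetic: the aggregation link congests when the traffic converging onto it exceeds its capacity. A minimal sketch with hypothetical numbers:

```python
def aggregation_congested(upstream_mbps, agg_capacity_mbps):
    """Check whether converging upstream traffic overloads a thin
    aggregation link (illustrative sketch, hypothetical numbers).

    upstream_mbps: offered rates (Mbps) of the upstream flows that
    converge onto the aggregation link.
    Returns (is_congested, excess_mbps) where excess queues at the link.
    """
    offered = sum(upstream_mbps)
    return offered > agg_capacity_mbps, max(0.0, offered - agg_capacity_mbps)
```

For instance, three upstream flows of 300 Mbps each overload a 622 Mbps aggregation link by 278 Mbps, even though each upstream link is far from full.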
Root-cause Analysis: Example
[Figure: traffic from 10 Gbps upstream links aggregating into a 622 Mbps link]
Final Statistics
• Table 1: Matched locations in the top ten networks, defined by the number of peers (Europe, North America, Asia)
• Table 2: Matched locations in the top ten ISPs that most aggressively promote customer access
Conclusions
• Triggered monitoring system
  • Measures congestion in a scalable way
  • Key feature: selects vantage points to measure congestion as a function of measurement quality
• Key findings
  • A subset of links experiences time-invariant, high congestion intensity
  • There is strong correlation among congestion events at different links (up to 3 ASes apart)
  • Root cause: the aggregation effect (some links are thinner than others)