What Lies Beneath: Understanding Internet Congestion
Leiwen Deng, Aleksandar Kuzmanovic (Northwestern University), Bruce Davie (Cisco Systems)
http://networks.cs.northwestern.edu
Common Wisdom and Our Key Results
• No congestion in the Internet core
  • Links are over-provisioned, hence no congestion
• No correlation among congestion events in the Internet
  • The diversity of traffic and links makes large, long-lasting congestion dependence among links unlikely
• Our key results
  • There is a subset of links (both inter-AS and intra-AS) that exhibits strong congestion intensity
  • Congestion events in the core can be highly correlated (up to 3 ASes apart)
Why Do We Care?
• Congestion in the core
  • Can depend on internal network policies or complex inter-AS relationships
  • Variable queuing delay can lead to jitter, affecting VoIP and streaming applications
• Correlation
  • Provides guidelines for re-routing systems
  • Most tomography models assume link congestion independence
Challenges
• Scalability
  • How to concurrently monitor a large number of Internet links?
  • Need a lightweight monitoring tool
  • Need a triggered monitoring system
• Our approach
  • Pong: a lightweight monitoring tool
    • Per-path overhead: 18 kbps
  • TPong: a triggered monitoring system
    • Capable of monitoring up to 8,000 links concurrently
Congestion Events
• Congestion intensity
  • How frequently do queue build-ups happen over 30-second time scales?
• We focus on persistent congestion events:
  • Intensity > 5%; duration > 2 minutes
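The event definition above can be sketched in code. This is an illustrative sketch, not the authors' implementation: it assumes congestion intensity is reported as one fraction per 30-second window, and flags runs of high-intensity windows lasting at least 2 minutes.

```python
def persistent_events(intensity, window_s=30,
                      min_intensity=0.05, min_duration_s=120):
    """Flag persistent congestion events (illustrative sketch).

    intensity: list of per-window fractions of time the queue was
    building up (one value per 30-second window). Returns (start, end)
    window indices of runs with intensity > 5% lasting >= 2 minutes.
    """
    min_windows = min_duration_s // window_s  # 120 s / 30 s = 4 windows
    events, start = [], None
    for i, x in enumerate(intensity + [0.0]):  # sentinel closes a trailing run
        if x > min_intensity:
            if start is None:
                start = i  # a high-intensity run begins
        elif start is not None:
            if i - start >= min_windows:  # run long enough to be "persistent"
                events.append((start, i - 1))
            start = None
    return events
```

For example, five consecutive windows at 10% intensity (2.5 minutes) qualify as one event, while a 1.5-minute burst does not.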
Coordinated Probing
[Figure: 4-packet probing between source S and destination D in a symmetric path scenario, with forward (f), backward (b), source-side (s), and destination-side (d) probes]
• Combines end-to-end and router-targeted probing
Pong: Coordinated Probing
[Figure: locating congestion points and tracing congestion status via half-path queuing delays between S and D (Δf, Δb, Δs, Δd, Δfs, Δfd)]
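The half-path idea can be illustrated with a small sketch. This is an assumption-laden simplification of the figure, not Pong's actual algorithm: given the end-to-end forward queuing delay (Δf) and the queuing delay on the first half-path up to a targeted router (Δfs), the remainder is attributed to the second half-path, which localizes the congestion point. The threshold value is hypothetical.

```python
def locate_congestion(d_f, d_fs, threshold_ms=5.0):
    """Attribute forward-path queuing delay to half-paths (sketch).

    d_f:  end-to-end forward queuing delay (ms)
    d_fs: queuing delay on the first half-path, S -> targeted router (ms)
    The second half-path's share is d_fd = d_f - d_fs.
    """
    d_fd = d_f - d_fs
    segments = []
    if d_fs > threshold_ms:
        segments.append("first half (S -> router)")
    if d_fd > threshold_ms:
        segments.append("second half (router -> D)")
    return segments or ["no congestion detected"]
```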
Pong: Methodology Highlights
• Coordinated probing
  • Send 4, 3, or 2 packets from two endpoints
• Quality of Measurability (QoM)
  • Able to deterministically detect its own inaccuracy
• Self-adaptivity
  • Switches among different probing schemes based on QoM and path properties
Vantage Point Selection Problem
• How to select vantage points to accurately measure congestion at a given link?
• Link measurability score
  • How well we can measure a specific link from a specific pair of endpoints; a function of:
    • Quality of measurability (QoM) for a given node
    • Queuing-delay threshold quality
    • Observability score
      • Avoid paths that "see" multiple congested links concurrently
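The selection step can be sketched as picking, per link, the endpoint pair with the best measurability score. This is a minimal illustration; the function name and score layout are hypothetical, and the real system additionally limits per-node load.

```python
def select_vantage_points(scores):
    """Pick the best endpoint pair per link (illustrative sketch).

    scores: {link: {(src, dst): measurability_score}}
    Returns {link: (src, dst)} choosing the pair with the highest
    score for each link (ties broken arbitrarily).
    """
    return {link: max(pairs, key=pairs.get) for link, pairs in scores.items()}
```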
Triggered Monitoring System
• Greedy algorithm to determine a subset of links
  • Covers 65% (7,800) of links with 4.9% (1,750) of paths
  • Limits the per-node measurement overhead
• Priority-based Pong path allocation
  • Maximizes quality of measurability
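The greedy selection above resembles classic greedy set cover: repeatedly pick the path that covers the most not-yet-covered links. The sketch below illustrates that idea only; it omits the per-node overhead cap and priority-based allocation, and all names are illustrative.

```python
def greedy_path_cover(paths, max_paths):
    """Greedy set-cover sketch for path selection.

    paths: {path_id: set of link ids traversed by that path}
    Picks up to max_paths paths, each time choosing the path that adds
    the most links not yet covered. Returns (chosen_ids, covered_links).
    """
    covered, chosen = set(), []
    for _ in range(max_paths):
        # path with the largest number of newly covered links
        pid = max(paths, key=lambda p: len(paths[p] - covered), default=None)
        if pid is None or not (paths[pid] - covered):
            break  # nothing new to cover
        chosen.append(pid)
        covered |= paths[pid]
    return chosen, covered
```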
Coverage & Overhead Statistics
• We observe ~36,000 paths
  • N², N = 191 nodes
• Expose ~12,100 links at a time
  • Due to routing changes, we observe ~29,000 links in total
• TMon paths:
  • Up to 2,000 paths running fast-rate probing concurrently
  • Cover up to 8,000 links concurrently
  • 4.9% of paths cover 65% of all links
• Pong paths:
  • Up to 30 Pong paths; cover up to 350 links concurrently
• Overhead per node:
  • Average: 30 kbps; peak: 68 kbps
Measurement Quality
• How good is our vantage-point selection algorithm?
• Link measurability score: 0-6
  • 65% of measurement samples have a non-zero score
  • 80% of measurements are better than "fair"
  • 60% of measurements are better than "good"
• The key point: we know how well or poorly we are doing
Key Findings
• Time-invariant hot spots
• Strong spatial correlation among congested links
• Root-cause analysis
Time-invariant Hot Spots
• Time-of-day effects for the number of congestion events
• A small number of links shows strong time-invariant congestion intensity
Time-invariant Hot Spots
• Contrary to our initial hypothesis, most of these links are not inter-continental links
• They are inter-AS links between large backbone networks, as well as intra-AS links within those networks
Congestion Correlation
• Pair-wise correlation
  • Percent of time two links are concurrently congested
• Pair-wise correlation can be quite extensive
  • E.g., 20% of pairs have correlation greater than 0.7
• Correlation: weekends > weekdays
  • Overall congestion level is smaller during weekends
• Distance between correlated link pairs
  • Up to 3 ASes
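The pairwise metric above can be sketched from per-slot congestion flags. This is one plausible normalization (overlap over union), not necessarily the exact metric used in the talk; names are illustrative.

```python
def pairwise_correlation(a, b):
    """Fraction of congested time two links share (illustrative sketch).

    a, b: boolean time series, one flag per time slot, True/1 when the
    link is congested in that slot. Returns the fraction of slots in
    which both links are congested, out of slots where either one is.
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0
```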
Aggregation Effect Hypothesis
• Hypothesis:
  • When upstream traffic converges to a relatively thin aggregation point, traffic surges on an upstream link are likely to create congestion at the thin downstream aggregation link
• Insights:
  • Aggregation points correspond to time-invariant hot spots
  • Interaction between an aggregation point and an upstream link causes link-level correlation
[Figure: aggregation link]
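The hypothesis reduces to simple arithmetic: the aggregation link congests when the traffic converging onto it exceeds its capacity. A minimal sketch with hypothetical numbers:

```python
def aggregation_congested(upstream_mbps, agg_capacity_mbps):
    """Check whether converging upstream traffic overloads a thin
    aggregation link (illustrative sketch, hypothetical numbers).

    upstream_mbps: offered rates (Mbps) of the upstream flows that
    converge onto the aggregation link.
    Returns (is_congested, excess_mbps) where excess queues at the link.
    """
    offered = sum(upstream_mbps)
    return offered > agg_capacity_mbps, max(0.0, offered - agg_capacity_mbps)
```

For instance, three upstream flows of 300 Mbps each overload a 622 Mbps aggregation link by 278 Mbps, even though each upstream link is far from full.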
Root-cause Analysis: Example
[Figure: traffic from 10 Gbps upstream links aggregating into a 622 Mbps link]
Final Statistics
• Table 1: Matched locations in the top ten networks, defined by the number of peers (Europe, North America, Asia)
• Table 2: Matched locations in the top ten ISPs that most aggressively promote customer access
Conclusions
• Triggered monitoring system
  • Measures congestion in a scalable way
  • Key feature: selects vantage points to measure congestion as a function of measurement quality
• Key findings
  • A subset of links experiences time-invariant, high congestion intensity
  • There is strong correlation among congestion events at different links (up to 3 ASes apart)
  • Root cause: the aggregation effect (some links are thinner than others)