Control Plane Issues in the Internet: Personal Perspective • 2005.4.11, Monday • Microsoft Research Asia, Beijing, China • Sue B. Moon, Division of Computer Science, Dept. of EECS, KAIST
Overview • Personal Perspective • Single-Hop Delay • Point-to-Point Delay • Routing Anomaly • Path Multiplicity as a Value-Added Service
Personal Experience at Sprint • When I first arrived, I heard … • “No loss” on Sprint backbone network • “Almost no delay” • “Cadillac brand of IP service”
Monitors in San Jose PoP * All monitored links are OC3
Summary of Single-Hop Delay • Packet size is a major factor • Non-work-conserving router behavior is a main cause of large delays (> 1 ms) • Not much queueing observed
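A minimal sketch of how single-hop delay can be computed from an input-link monitor and an output-link monitor. It assumes both monitors timestamp packets on a common clock and that a packet is recognized on both links by hashing header bytes the router does not rewrite (so TTL and header checksum are excluded before hashing). The record format and the `single_hop_delays` function are hypothetical, not the tooling used at Sprint.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical record: (timestamp_seconds, invariant_header_bytes)
Record = Tuple[float, bytes]


def packet_key(header: bytes) -> str:
    """Digest of header bytes that stay the same across the hop (assumed input)."""
    return hashlib.sha1(header).hexdigest()


def single_hop_delays(ingress: List[Record], egress: List[Record]) -> List[float]:
    """Match each egress packet to its ingress copy and return delays in seconds.

    Packets seen on only one link (e.g. routed elsewhere) are skipped.
    """
    first_seen: Dict[str, float] = {}
    for ts, header in ingress:
        first_seen.setdefault(packet_key(header), ts)
    delays = []
    for ts, header in egress:
        t_in = first_seen.get(packet_key(header))
        if t_in is not None and ts >= t_in:
            delays.append(ts - t_in)
    return delays


ingress = [(0.0, b"pkt-A"), (0.0001, b"pkt-B")]
egress = [(0.00004, b"pkt-A"), (0.0013, b"pkt-B"), (0.002, b"pkt-C")]
delays = single_hop_delays(ingress, egress)
large = [d for d in delays if d > 1e-3]  # delays above 1 ms
```

Here `pkt-B` takes about 1.2 ms and would fall into the "large delay" group, while `pkt-C` has no ingress match and is ignored.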
Data Set 3 Delay Distributions
Data Set 3 Hourly Delay Distributions
Identification of Constant Factors: Multi-Paths • Equal-Cost Multi-Path (ECMP) routing • Path choice depends on src/dst addresses and router ID • [Figure: minimum delay of each src/dst flow in Data Set 3, separating into Path 1, Path 2, and Path 3]
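ECMP routers typically keep all packets of one flow on the same path by hashing header fields. The sketch below uses CRC32 over the source address, destination address, and router ID as a hypothetical stand-in for a vendor's hash function; it illustrates why each src/dst pair consistently sees a single path and therefore a single minimum delay.

```python
import zlib


def ecmp_next_hop(src_ip: str, dst_ip: str, router_id: int, num_paths: int) -> int:
    """Return the index (0-based) of the equal-cost next hop for this src/dst pair.

    Hypothetical hash: CRC32 of "src|dst|router_id". Real routers use
    vendor-specific hashes; only the principle (deterministic per pair) matters.
    """
    key = f"{src_ip}|{dst_ip}|{router_id}".encode()
    return zlib.crc32(key) % num_paths


path = ecmp_next_hop("10.0.0.1", "10.0.1.1", 7, 3)
```

Because the result depends only on the addresses and the router ID, a different router ID can map the same pair to a different path, while repeated packets of one pair at one router always follow the same path.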
Connectivity of the Three Paths • Data Set 3 • Fiber propagation delays: 28 ms, 32 ms, 34 ms
Path Separation of Data Set 3 • TTL difference • Minimum delay of each flow (src IP, dst IP)
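One way to turn the two criteria above into a procedure, sketched under assumptions: samples carry hypothetical (src, dst, TTL at ingress, TTL at egress, delay) fields, the TTL difference is constant within a flow, and flows with the same TTL difference whose minimum delays differ by more than a `gap` threshold (1 ms here, an arbitrary choice) are treated as different paths.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Flow = Tuple[str, str]
# Hypothetical sample: (src_ip, dst_ip, ttl_in, ttl_out, delay_seconds)
Sample = Tuple[str, str, int, int, float]


def separate_paths(samples: List[Sample], gap: float = 0.001) -> List[Tuple[int, List[Flow]]]:
    """Group flows into paths by TTL difference, then by gaps in per-flow minimum delay."""
    ttl_diff: Dict[Flow, int] = {}
    min_delay: Dict[Flow, float] = {}
    for src, dst, ttl_in, ttl_out, delay in samples:
        flow = (src, dst)
        ttl_diff[flow] = ttl_in - ttl_out  # assumed constant per flow
        min_delay[flow] = min(delay, min_delay.get(flow, float("inf")))

    by_ttl: Dict[int, List[Tuple[float, Flow]]] = defaultdict(list)
    for flow, d in min_delay.items():
        by_ttl[ttl_diff[flow]].append((d, flow))

    paths: List[Tuple[int, List[Flow]]] = []
    for hops in sorted(by_ttl):
        flows = sorted(by_ttl[hops])  # ascending minimum delay
        cluster = [flows[0]]
        for prev, cur in zip(flows, flows[1:]):
            if cur[0] - prev[0] > gap:  # large jump in minimum delay: new path
                paths.append((hops, [f for _, f in cluster]))
                cluster = []
            cluster.append(cur)
        paths.append((hops, [f for _, f in cluster]))
    return paths
```

Using the minimum rather than the mean delay of each flow keeps queueing noise from blurring the separation between paths.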
Identification of Constant Factors: Packet Size • Path transit time = propagation delay + packet processing time (a function of packet size)
Removing Constant Factors (Data Set 3, Path 1)
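A minimal sketch of removing the constant factors for one path, assuming the smallest delay observed for each packet size approximates propagation plus size-dependent processing. What remains after subtraction is the variable (queueing) delay. The function name and input format are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def variable_delays(samples: List[Tuple[int, float]]) -> List[float]:
    """samples: (packet_size_bytes, delay_seconds) measured on a single path.

    The constant part for each size is taken as the minimum delay seen at
    that size; the returned values are delay minus that constant.
    """
    floor: DefaultDict[int, float] = defaultdict(lambda: float("inf"))
    for size, d in samples:
        floor[size] = min(floor[size], d)
    return [d - floor[size] for size, d in samples]


samples = [(40, 0.0280), (40, 0.0283), (1500, 0.0290), (1500, 0.0300)]
residual = variable_delays(samples)
```

With few packets at some sizes, fitting a line through the per-size minima would give a smoother estimate than the raw minimum per size.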
Data Set 3, Path 1 Variable Delay: Bulk
Variable Delay: Bulk (cont’d) Data Set 3
Impact of Bottleneck Link Load
Data Set 3, Path 1 Variable Delay Revisited: Tail
Closer Look • Queue Build-up and Drain
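The build-up and drain pattern is what a FIFO queue produces when arrivals briefly exceed the link rate. A sketch using the Lindley recursion (a standard single-server queue model, not the method used in the study), with hypothetical arrival times and packet sizes:

```python
from typing import List, Optional


def queue_delays(arrivals: List[float], sizes_bits: List[int], link_bps: float) -> List[float]:
    """Waiting time of each packet in a FIFO queue before its transmission starts.

    Lindley recursion: W[n] = max(0, W[n-1] + S[n-1] - (A[n] - A[n-1])),
    where S is the transmission time of a packet on the link.
    """
    waits: List[float] = []
    wait = 0.0
    prev_arrival: Optional[float] = None
    prev_service = 0.0
    for a, bits in zip(arrivals, sizes_bits):
        if prev_arrival is not None:
            wait = max(0.0, wait + prev_service - (a - prev_arrival))
        waits.append(wait)
        prev_arrival, prev_service = a, bits / link_bps
    return waits


# Burst of four 1000-bit packets every 0.5 ms on a 1 Mbit/s link (1 ms each),
# followed by a quiet period and one more packet.
waits = queue_delays([0.0, 0.0005, 0.001, 0.0015, 0.01], [1000] * 5, 1e6)
```

During the burst the wait grows by 0.5 ms per packet; after the quiet period the queue has fully drained and the last packet waits zero.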
Summary of Pt-to-Pt Delay • Not much queueing most of the time • Severe congestion when bottleneck link utilization > 90% • Congestion periods last longer than 1 sec • Exact causes unknown • Possible causes • Route changes
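A sketch of how congestion periods like these could be flagged from a link utilization time series. It assumes hypothetical fixed-interval utilization samples (for example, derived from SNMP byte counters) and uses the 90% and 1 sec figures from the summary as thresholds.

```python
from typing import List, Optional, Tuple


def congestion_periods(util: List[float], interval_s: float,
                       threshold: float = 0.9,
                       min_len_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for runs where utilization stays above threshold
    for longer than min_len_s. util[i] covers [i*interval_s, (i+1)*interval_s)."""
    periods: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i, u in enumerate(util + [0.0]):  # sentinel closes a run at the end
        if u > threshold and start is None:
            start = i
        elif u <= threshold and start is not None:
            if (i - start) * interval_s > min_len_s:
                periods.append((start * interval_s, i * interval_s))
            start = None
    return periods


util = [0.5, 0.95, 0.97, 0.96, 0.4, 0.92, 0.3]
periods = congestion_periods(util, interval_s=0.5)
```

The 1.5 s run starting at 0.5 s is reported; the single 0.5 s spike at 2.5 s is not.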
Issues in "Good" Routing • Misbehaving routing protocols • BGP misconfigurations • Pathological behaviors • Frequent changes • Even under normal circumstances • Transient behaviors • Inter/intra-domain routing not well understood
VoIP experimental setup [Boutremans2002] • Traffic injected into the network: • 200-byte UDP packets • one every 5 ms • Packets captured and timestamped at the end systems • Traceroute runs continuously during the experiment • Link failures induced on purpose to evaluate convergence time and impact on end-to-end (e2e) connections
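As a quick check of the probe load, assuming the 200 bytes are the full IP packet size, one packet every 5 ms gives:

```latex
\[
\frac{1\ \text{packet}}{5\ \text{ms}} = 200\ \text{packets/s},
\qquad
200\ \tfrac{\text{packets}}{\text{s}} \times 200\ \text{B} \times 8\ \tfrac{\text{bit}}{\text{B}}
= 320\ \text{kbit/s per direction}
\]
```

If the 200 bytes are instead the UDP payload, the 28 bytes of IPv4 and UDP headers raise this to 228 B × 8 × 200 = 364.8 kbit/s per direction.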
Information Sources • IS-IS & BGP listener logs • Router logs from both ends of “failing” links • Controlled bi-directional VoIP traffic between Reston and ATL • SNMP data
Delays (1 sec timescale) • [Figure: delay levels of ~3.4 ms and ~2.6 ms; annotated events: 3 links up, 2 links down, 2 links up, 3 links down]
When the two interfaces went down … 6.6 seconds
When three links came back up • Traffic “black-holed” for 0.975 seconds • Traffic “black-holed” for 1.745 seconds • For 30 secs, packets follow a shorter path
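With one probe every 5 ms, a black-hole period shows up as a run of consecutive missing packets. A sketch, assuming each probe carries a sequence number (a hypothetical detail of the setup); it estimates outage length as the number of missing packets times 5 ms and cannot detect losses at the very start or end of a trace.

```python
from typing import List, Tuple


def outages(received_seqs: List[int], interval_s: float = 0.005) -> List[Tuple[int, float]]:
    """Return (first_missing_seq, approx_duration_s) for each run of lost probes.

    Assumes the sender emits one probe every interval_s with consecutive
    sequence numbers; duplicates and reordering are tolerated by sorting.
    """
    got = sorted(set(received_seqs))
    result: List[Tuple[int, float]] = []
    for prev, cur in zip(got, got[1:]):
        missing = cur - prev - 1
        if missing > 0:
            result.append((prev + 1, missing * interval_s))
    return result


# 195 consecutive probes (seq 100..294) lost: about 0.975 s of black-holing.
gaps = outages(list(range(100)) + list(range(295, 400)))
```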
Approaches To Fix It • Fine-tuning parameters • Timer values [Alattinoglu2002] • Modify Routing Protocols • Suppress advertisement and perform local rerouting using a backwarding table [Lee04] • Centralized path computation [Feamster04,Rexford04]
Our Approach • Key Idea: • Find a disjoint overlay path and send duplicate packets • Assumptions • Sender and receiver both within an AS • Bidirectional link weights • Extra income for extra b/w consumption • Pros and cons • Advantages • No modification to current infrastructure • Selective use by only those that need it • Disadvantages • Extra b/w consumption
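Sending duplicates over two disjoint paths means the receiver must discard the second copy of each packet. A minimal sketch, assuming each packet carries a sequence number; keeping the first arrival masks a loss or outage on one path as long as the other path delivers.

```python
from typing import Iterable, List, Set, Tuple


def deduplicate(arrivals: Iterable[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """arrivals: (arrival_time_s, seq) from both paths.

    Keeps the earliest copy of each sequence number. The unbounded 'seen'
    set is a simplification; a real receiver would use a sliding window.
    """
    seen: Set[int] = set()
    delivered: List[Tuple[float, int]] = []
    for t, seq in sorted(arrivals):  # process in arrival-time order
        if seq not in seen:
            seen.add(seq)
            delivered.append((t, seq))
    return delivered


path_a = [(0.030, 0), (0.035, 1), (0.045, 3)]              # seq 2 lost on path A
path_b = [(0.032, 0), (0.037, 1), (0.042, 2), (0.047, 3)]  # slower but complete
out = deduplicate(path_a + path_b)
```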
Provisioning for Interactive Streaming • Interactive Streaming • Not a driving force behind b/w • A candidate for growing revenue • Examples • VoIP gradually taking over PSTN traffic • Remote viewing of video at the door by cell phone • Online game traffic • "Good" routing more important than bandwidth
Basic Ideas • [Figure: source, destination, and candidate relay nodes]
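A sketch of relay selection under the stated assumptions (sender and receiver in one AS, bidirectional link weights): compute the default shortest path, then choose the candidate relay whose overlay path source, relay, destination shares no link with it, preferring the lowest total weight. Link-disjointness as the criterion, Dijkstra as the intra-AS routing model, and all function names are assumptions for illustration.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link weight}


def make_graph(edges: List[Tuple[str, str, int]]) -> Graph:
    g: Graph = {}
    for a, b, w in edges:  # same weight in both directions
        g.setdefault(a, {})[b] = w
        g.setdefault(b, {})[a] = w
    return g


def shortest_path(g: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra; returns the node list from src to dst, or None if unreachable."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def links(path: List[str]) -> Set[FrozenSet[str]]:
    return {frozenset(e) for e in zip(path, path[1:])}


def cost(g: Graph, path: List[str]) -> int:
    return sum(g[a][b] for a, b in zip(path, path[1:]))


def best_relay(g: Graph, src: str, dst: str,
               candidates: List[str]) -> Optional[Tuple[int, str, List[str]]]:
    """Cheapest (cost, relay, overlay_path) link-disjoint from the default path."""
    default = shortest_path(g, src, dst)
    if default is None:
        return None
    used = links(default)
    best = None
    for r in candidates:
        p1, p2 = shortest_path(g, src, r), shortest_path(g, r, dst)
        if p1 is None or p2 is None:
            continue
        overlay = p1 + p2[1:]
        if links(overlay) & used:
            continue  # shares a link with the default path
        c = cost(g, overlay)
        if best is None or c < best[0]:
            best = (c, r, overlay)
    return best


g = make_graph([("S", "A", 1), ("A", "D", 1), ("S", "B", 2),
                ("B", "D", 2), ("S", "C", 1), ("C", "A", 1)])
choice = best_relay(g, "S", "D", ["B", "C"])
```

Relay C is rejected because its overlay path reuses link A-D of the default path S-A-D, so B is chosen despite the higher weight.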
What I have learned … • “No loss, almost no delay” • Almost. I gained insight into the causes behind them • Debunking the myths [Odlyzko2005] • Streaming real-time traffic • QoS • Content is king • Usage-sensitive pricing
Other Issues Tackled • Traffic Matrix Estimation • Inspired by tomography in other fields • Before the arrival of efficient NetFlow • Network Anomaly Detection • From NIDS/IDS to PCA-based global monitoring • Optimization • Cross-layer resource allocation
Future Work • Personal perspective • More into creating value-added services • MPLS/VPN performance issues
Acknowledgements • Thanks to D. Papagiannaki, B.-Y. Choi, U. Hengartner, C. Boutremans, G. Iannaccone, and M. Cha for help with the slides.