Control Plane Issues in the Internet: Personal Perspective • 2005.4.11 (Monday) • Microsoft Research Asia, Beijing, China • Sue B. Moon, Division of Computer Science, Dept. of EECS, KAIST
Overview • Personal Perspective • Single-Hop Delay • Point-to-Point Delay • Routing Anomaly • Path Multiplicity as a Value-Added Service
Personal Experience at Sprint • When I first arrived, I heard … • “No loss” on Sprint backbone network • “Almost no delay” • “Cadillac brand of IP service”
Monitors in San Jose PoP * All monitored links are OC3
Summary of Single-Hop Delay • Packet size is a major factor • Non-work-conserving behavior of a router is a main cause behind large delay (> 1ms) • Not much queueing observed
Data Set 3 Delay Distributions
Data Set 3 Hourly Delay Distributions
Identification of Constant Factors: Multi-Paths • Equal Cost Multi Paths (ECMP) • Src/Dst addresses, Router ID [Figure: min delay of src/dst flow (Data Set 3), showing Path 1, Path 2, Path 3]
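The slide names source/destination addresses and router ID as the inputs that decide which equal-cost path a packet takes. The following is only a minimal sketch of that idea: the function name `ecmp_next_hop`, the use of SHA-256, and the modulo selection are assumptions for illustration. Real routers use vendor-specific hash functions that are not described in these slides.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, router_id, next_hops):
    """Pick one of several equal-cost next hops for a flow.

    Hypothetical illustration: hash (src, dst, router ID) and reduce
    the result modulo the number of equal-cost next hops.
    """
    key = f"{src_ip}|{dst_ip}|{router_id}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

hops = ["path1", "path2", "path3"]
choice = ecmp_next_hop("10.0.0.1", "10.0.1.1", "router-a", hops)
```

Because the hash depends only on the flow's addresses and the router ID, every packet of a given src/dst flow follows the same path, which is why the minimum delay per flow can reveal distinct paths.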
Three Paths Connectivity • Data Set 3 • Fiber propagation delays: 28 ms, 32 ms, 34 ms
Path Separation of Data Set 3 • TTL difference • Minimum delay of flow (src IP, dst IP) [Figure label: Path 1]
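One way to read the two criteria above is as a grouping procedure: compute the minimum delay of each (src IP, dst IP) flow, then treat flows with the same TTL difference and similar minimum delays as sharing a path. The sketch below assumes this reading. The record layout, the function names, and the 1 ms gap threshold are hypothetical and are not taken from the slides.

```python
def min_delay_per_flow(packets):
    """packets: iterable of (src_ip, dst_ip, ttl_diff, delay_ms)."""
    best = {}
    for src, dst, ttl_diff, delay_ms in packets:
        key = (src, dst, ttl_diff)
        if key not in best or delay_ms < best[key]:
            best[key] = delay_ms
    return best

def separate_paths(packets, gap_ms=1.0):
    """Group flows into paths.

    A new path starts whenever the TTL difference changes or the
    per-flow minimum delay jumps by more than gap_ms (assumed threshold).
    """
    flows = min_delay_per_flow(packets)
    ordered = sorted(flows.items(), key=lambda kv: (kv[0][2], kv[1]))
    paths, prev = [], None
    for (src, dst, ttl_diff), delay in ordered:
        if prev is None or ttl_diff != prev[0] or delay - prev[1] > gap_ms:
            paths.append([])
        paths[-1].append((src, dst))
        prev = (ttl_diff, delay)
    return paths

# Made-up sample records, loosely echoing the 28/32/34 ms propagation delays.
sample = [
    ("a", "b", 5, 28.4), ("a", "b", 5, 28.1), ("c", "d", 5, 28.2),
    ("e", "f", 6, 32.0), ("g", "h", 6, 34.1),
]
groups = separate_paths(sample)
```

With the made-up sample, the two flows near 28 ms land in one group and the flows near 32 ms and 34 ms each form their own group.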
Identification of Constant Factors: Packet Size • Path transit time • Propagation + packet processing (packet size)
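If the path transit time is modeled as a fixed propagation term plus a processing term that grows with packet size, the constant part can be estimated from the minimum delay observed at each packet size. The sketch below assumes a linear model, min delay = a + b * size, fitted by ordinary least squares. The linear form, the function names, and the sample numbers are assumptions for illustration, not the method stated in the slides.

```python
def fit_constant_delay(samples):
    """samples: iterable of (packet_size_bytes, delay_ms).

    Returns (a, b) such that min delay ~= a + b * size, fitted by
    least squares over the per-size minimum delays.
    """
    min_by_size = {}
    for size, delay in samples:
        if size not in min_by_size or delay < min_by_size[size]:
            min_by_size[size] = delay
    xs = list(min_by_size)
    ys = [min_by_size[x] for x in xs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct packet sizes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def variable_delay(size, delay, a, b):
    """Delay left over after removing the size-dependent constant part."""
    return delay - (a + b * size)

# Made-up measurements: (size in bytes, delay in ms).
data = [(40, 28.04), (40, 28.5), (576, 28.576), (1500, 29.5)]
a, b = fit_constant_delay(data)
extra = variable_delay(40, 28.5, a, b)
```

The residual returned by `variable_delay` is the quantity of interest once the constant factors are removed: whatever remains is attributable to queueing or other variable effects.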
Removing Constant Factors (Data Set 3, Path 1)
Variable Delay: Bulk (Data Set 3, Path 1)
Variable Delay: Bulk (cont’d) (Data Set 3)
Impact of Bottleneck Link Load
Variable Delay Revisited: Tail (Data Set 3, Path 1)
Closer Look • Queue Build up & Drain
Summary of Pt-to-Pt Delay • Not much queueing most of the time • Severe congestion when bottleneck link utilization > 90% • Congestion periods longer than 1 sec • Exact causes unknown • Possible causes • Route changes
Issues in "Good" Routing • Misbehaving routing protocols • BGP misconfigurations • Pathological behaviors • Frequent changes • Even under normal circumstances • Transient behaviors • Inter/intra-domain routing not well understood
VoIP experimental setup [Boutremans2002] • Traffic injected in the network: • 200 byte UDP packets • every 5 ms • Packets captured and timestamped at end-systems • Traceroute runs continuously during the experiment • Link failures induced on purpose to evaluate convergence time and impact on e2e connections
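With probes sent at a fixed 5 ms spacing, the length of a connectivity outage can be estimated from runs of consecutive missing probes at the receiver. The sketch below assumes each probe carries a sequence number, which the slides do not state; the function name `outage_periods` and the 0.1 s reporting threshold are also hypothetical.

```python
def outage_periods(received_seqs, interval_s=0.005, min_outage_s=0.1):
    """Return (first_missing_seq, duration_s) for each long loss run.

    received_seqs: sequence numbers seen at the receiver (assumed field).
    interval_s:    probe spacing; 5 ms in the setup above.
    min_outage_s:  shorter loss runs are treated as isolated losses.
    """
    outages = []
    seqs = sorted(set(received_seqs))
    for prev, cur in zip(seqs, seqs[1:]):
        missing = cur - prev - 1
        duration = missing * interval_s
        if missing > 0 and duration >= min_outage_s:
            outages.append((prev + 1, duration))
    return outages

# Made-up trace: probes 100..294 are lost, and probe 350 is lost in isolation.
trace = [s for s in range(400) if not (100 <= s <= 294) and s != 350]
found = outage_periods(trace)
```

In this made-up trace, 195 consecutive missing probes at 5 ms spacing correspond to an outage of about 0.975 seconds, while the single isolated loss falls below the threshold and is not reported.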
Information Sources • IS-IS & BGP listener logs • Router logs from both ends of “failing” links • Controlled bi-directional VoIP traffic between Reston and ATL • SNMP data
Delays (1 sec timescale) [Figure annotations: ~3.4 ms, ~2.6 ms; events: 3 links up, 2 links down, 2 links up, 3 links down]
When the two interfaces went down … 6.6 seconds
When three links came back up • Traffic “black-holed” for 0.975 seconds • Traffic “black-holed” for 1.745 seconds • For 30 secs packets follow a shorter path
Approaches To Fix It • Fine-tuning parameters • Timer values [Alattinoglu2002] • Modify Routing Protocols • Suppress advertisement and perform local rerouting using a backwarding table [Lee04] • Centralized path computation [Feamster04,Rexford04]
Our Approach • Key Idea: • Find disjoint overlay path and send duplicate packets • Assumptions • Sender and receiver both within an AS • Bidirectional link weights • Extra income for extra b/w consumption • Pros and cons • Advantages • No modification to current infrastructure • Selective use by only those that need it • Disadvantages • Extra b/w consumption
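The key idea above can be sketched as a search over candidate relay nodes: compute the default shortest path from sender to receiver, then keep any relay whose two shortest-path legs (sender to relay, relay to receiver) share no link with the default path. This is a minimal sketch under assumptions: the graph encoding, the function names, and the choice of link-disjointness (rather than node-disjointness) are illustrative and are not confirmed by the slides.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: weight}}; returns a node list or None."""
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def links(path):
    """Undirected links used by a path (weights assumed bidirectional)."""
    return {frozenset(edge) for edge in zip(path, path[1:])}

def disjoint_relays(graph, src, dst):
    """Return (default_path, relays whose overlay path avoids every default link)."""
    default = shortest_path(graph, src, dst)
    if default is None:
        return None, []
    used = links(default)
    relays = []
    for relay in graph:
        if relay in (src, dst):
            continue
        leg1 = shortest_path(graph, src, relay)
        leg2 = shortest_path(graph, relay, dst)
        if leg1 is None or leg2 is None:
            continue
        if not ((links(leg1) | links(leg2)) & used):
            relays.append(relay)
    return default, relays

# Hypothetical four-node topology with symmetric link weights.
topology = {
    "S": {"A": 1, "B": 2},
    "A": {"S": 1, "D": 1},
    "B": {"S": 2, "D": 2},
    "D": {"A": 1, "B": 2},
}
default_path, candidates = disjoint_relays(topology, "S", "D")
```

Each leg follows ordinary shortest-path routing, which matches the stated advantage of requiring no modification to the current infrastructure. The sender would then transmit a duplicate of each packet via one of the returned relays.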
Provisioning for Interactive Streaming • Interactive Streaming • Not a driving force behind b/w • A candidate for growing revenue • Examples • VoIP gradually taking over PSTN traffic • Remote video viewing at door by cell phone • Online game traffic • "Good" routing more important than bandwidth
Basic Ideas [Figure: source, destination, and candidate relay nodes]
What I have learned … • No loss, almost no delay • Almost. I gained insight into causes behind • Debunking the myths [Odlyzko2005] • Streaming real-time traffic • QoS • Content is king • Usage-sensitive pricing
Other Issues Tackled • Traffic Matrix Estimation • Inspired by tomography in other fields • Before arrival of efficient NetFlow • Network Anomaly Detection • NIDS, IDS => PCA-based global monitoring • Optimization • Cross-layer resource allocation
Future Work • Personal perspective • More into creating value-added services • MPLS/VPN performance issues
Acknowledgements • Thanks to D. Papagiannaki, B.-Y. Choi, U. Hengartner, C. Boutremans, G. Iannaccone, and M. Cha for help with the slides.