
Active Measurements on the AT&T IP Backbone

Len Ciavattone, Al Morton, Gomathi Ramachandran, AT&T Labs



  1. Active Measurements on the AT&T IP Backbone Len Ciavattone, Al Morton, Gomathi Ramachandran AT&T Labs

  2. Colleagues on This Project
     • Nicole Kowalski
     • Ron Kulper
     • George Holubec
     • Shashi Pulakurti

  3. Measurements for Large Networks
     • Must be:
        • Easily understood
        • Able to estimate or assess customer performance
        • Useful for alarming and associated actions
        • Unlikely to generate false positives
        • As close as possible to real-time notification
        • Part of the traditional fault/passive management system

  4. Traditional Measurements
     • Fault
        • Triggered by hard failures (link, card, router, etc.)
        • Near real-time alarms
     • Passive
        • Element-level monitoring
        • Traffic, drops, device health, and card performance monitored
        • Performance alarming possible per interface
     • What can be added to traditional measurements?
        • Path-level performance information
        • Delay and delay variation measurements
        • Indication of customer degradation (other than hard failures)

  5. Active Measurements
     • Active measurements introduce synthetic traffic into the network
     • Advantages:
        • Probe traffic follows a sampled customer path
        • Delay, delay variation, and sampled loss are directly measurable
        • The customer impact of element-level degradation can be estimated
        • A well-designed sampling methodology allows sound estimation of the levels of degradation seen
        • Can give customers a sense of network behavior (e.g., AT&T’s Network Status Site http://www.att.com/ipnetwork)
     • Disadvantages:
        • Traffic must be introduced into the network
        • Based on sampling, not on customer traffic

  6. Practical Considerations
     • From a practical standpoint, what limits the measurements?
        • Amount of data generated
        • Desire to use a standard, unmodified UNIX kernel
        • Expense of bigger and more powerful servers
        • Cost of deploying new servers in central offices (COs)
        • Difficulty of acquiring an appropriate GPS feed

  7. Measurement Design
     • Poisson sequence: 15-minute duration, λ = 0.3 packets/s, UDP, 278 bytes total; packet loss threshold is a minimum of 3 s
     • Periodic sequence: 1-minute duration, random start time, 20 ms spacing, UDP/IPv4, 60 bytes total; packet loss threshold is a minimum of 3 s
     [Figure: 24-hour timeline divided into repeating 15-minute test cycles]
     Presented at the IETF 50 IPPM meeting by Al Morton
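
The two probe schedules can be sketched as below, using the parameters listed for the Measurement Design. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the uniform distribution of the periodic test's random start time is our assumption (the slide only says "Random Start Time").

```python
import random

# Parameters from the Measurement Design slide.
CYCLE_S = 15 * 60            # one test cycle: 15 minutes
POISSON_RATE = 0.3           # lambda, packets per second
PERIODIC_DURATION_S = 60     # periodic test: 1 minute
PERIODIC_SPACING_S = 0.020   # 20 ms between periodic packets


def poisson_send_times(rate, duration, rng):
    """Send times of a Poisson stream: exponentially distributed gaps."""
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def periodic_send_times(duration, spacing, cycle, rng):
    """Send times of a periodic burst at a random start within the cycle.

    Assumption: the start time is uniform over the part of the cycle in
    which the whole burst still fits.
    """
    start = rng.uniform(0, cycle - duration)
    count = round(duration / spacing)
    return [start + i * spacing for i in range(count)]


rng = random.Random(2001)  # fixed seed so runs are repeatable
poisson = poisson_send_times(POISSON_RATE, CYCLE_S, rng)
periodic = periodic_send_times(PERIODIC_DURATION_S, PERIODIC_SPACING_S, CYCLE_S, rng)
print(len(poisson), "Poisson packets (0.3 * 900 = 270 expected on average)")
print(len(periodic), "periodic packets")  # 60 s / 20 ms = 3000
```

Exponential gaps are what make the first stream a Poisson process; the periodic stream trades that unbiased sampling for dense, fixed-interval coverage of a short window.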

  8. Sampling and Event Detection
     • Poisson sequence
        • The entire 15-minute cycle is tested, with an average inter-arrival time of 3.33 s (1/λ)
        • Assume congestion events last 10 s (minimum length)
        • If an event lasts T = 10 s, the probability of detection by one or more packets is P = 1 - exp(-λT) = 1 - exp(-3) ≈ 0.95
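
The detection probability for the Poisson sequence follows from the fact that the number of Poisson arrivals in a window of length T is itself Poisson-distributed with mean λT. A small sketch of that calculation (the function name is ours, for illustration only):

```python
import math


def poisson_detection_probability(rate, event_s):
    """P(at least one probe falls inside an event lasting event_s seconds).

    The number of Poisson probes in a window of length T is Poisson(rate * T),
    so P(no probe) = exp(-rate * T).
    """
    return 1.0 - math.exp(-rate * event_s)


mean_gap = 1 / 0.3                                 # about 3.33 s between probes
p = poisson_detection_probability(0.3, 10.0)       # lambda * T = 3
print(f"mean gap {mean_gap:.2f} s, P(detect) = {p:.3f}")  # mean gap 3.33 s, P(detect) = 0.950
```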

  9. Sampling and Event Detection
     • Periodic sequence
        • One 1-minute test per 15-minute test cycle (2 if considering round-trip (RT) processes)
        • Assume 10 s congestion events (minimum length) and 1 event per test cycle
        • Consider that only recurring events are actionable: average number of cycles to detection (one-way) = 1/0.0777 ≈ 13 test cycles
     The Poisson probe sequence detects events accurately; the periodic probe sequence is used to characterize recurring events.
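
One way to arrive at the 0.0777 figure (our reconstruction, not stated on the slide) is to treat a 10 s event as starting uniformly at random in the 900 s cycle: it overlaps the 60 s test window if it starts anywhere in a 60 + 10 = 70 s span, so p = 70/900. The mean number of cycles until the first detection is then 1/p, the mean of a geometric distribution. The sketch covers the one-way case only.

```python
CYCLE_S = 15 * 60   # test cycle
TEST_S = 60         # periodic test duration
EVENT_S = 10        # minimum congestion event length


def periodic_overlap_probability(test_s, event_s, cycle_s):
    """Chance that one event, placed uniformly at random in the cycle,
    overlaps the periodic test window (edge effects ignored)."""
    return (test_s + event_s) / cycle_s


p = periodic_overlap_probability(TEST_S, EVENT_S, CYCLE_S)
cycles = 1 / p  # mean of a geometric distribution with success probability p
print(f"p = {p:.4f}, mean cycles to detection = {cycles:.1f}")  # p = 0.0778, mean cycles to detection = 12.9
```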

  10. Metrics
     • Round-trip (RT) loss
     • RT delay (standard deviation, 95th percentile, minimum, mean)
     • Inter-Packet Delay Variation (IPDV) and DV jitter
     • Out-of-sequence events (non-reversing sequence definition, under consideration in the IETF IPPM working group)
     • Approximate one-way loss
     • Degraded seconds or minutes
     • Loss pattern (number of consecutive losses)
     • Distributions of delay variation
     • Traceroutes performed at the beginning of each test
     • 85 metrics kept indefinitely
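
Two of these metrics, the RT delay summary and the loss pattern, can be sketched as follows. The helper names and sample values are hypothetical, and the nearest-rank 95th percentile is one of several common definitions; the slides do not say which one was used.

```python
import math
import statistics


def delay_summary(rtts_ms):
    """Min, mean, sample standard deviation and 95th percentile of RT delay.

    Needs at least two samples (statistics.stdev requires it).
    """
    ordered = sorted(rtts_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "min": ordered[0],
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        "p95": ordered[rank - 1],
    }


def loss_runs(received):
    """Lengths of consecutive-loss runs; received[i] is False if probe i was lost."""
    runs, current = [], 0
    for ok in received:
        if ok:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


print(delay_summary([20.0, 21.0, 22.0, 23.0, 24.0]))
print(loss_runs([True, False, False, True, False, True]))  # [2, 1]
```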

  11. IPDV Definition and Example
     IPDV is a measure of transfer delay variation. For packet n, IPDV(n) = Delay(n) - Delay(n-1). If the nominal transfer time is 10 ms and packet 2 is delayed in transit by an additional 5 ms, then two IPDV values are affected:
        IPDV(2) = 15 - 10 = 5 ms
        IPDV(3) = 10 - 15 = -5 ms
        IPDV(4) = 10 - 10 = 0 ms (unaffected)
     [Figure: packets 1 to 4 at Tx, Rcv, and Playout, annotated “Inter-packet arrival time, longer than send interval”; shading shows time spent in transit and in the receive buffer]
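
The IPDV definition translates directly into code. In this sketch (function name ours), a lost packet is represented as None, on the assumption that any IPDV value involving a lost packet is undefined.

```python
def ipdv(delays_ms):
    """IPDV(n) = Delay(n) - Delay(n-1) for each consecutive pair of packets.

    A lost packet is given as None; any IPDV value that needs it is None too.
    """
    return [
        None if prev is None or curr is None else curr - prev
        for prev, curr in zip(delays_ms, delays_ms[1:])
    ]


# Slide example: 10 ms nominal transfer time, packet 2 delayed an extra 5 ms.
delays = [10, 15, 10, 10]   # Delay(1) .. Delay(4)
print(ipdv(delays))          # [5, -5, 0], i.e. IPDV(2), IPDV(3), IPDV(4)
```

A single delayed packet produces a positive/negative pair of IPDV values, which is why one 5 ms excursion affects two entries.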

  12. IP Packet Sequence
     Arriving packets are compared with the “next expected” RefNum. Packet 2 arrives out of sequence, because Packet 3 has already arrived and the “next expected” packet is Packet 4. Packet 2 is offset by 1 packet, or late by t = (arrival time of Packet 2) - (arrival time of Packet 3).
     [Figure: packets 1 to 4 at Src, Dst, and Playout, annotated “Tolerance on R2 arrival with 2 Packet Buffer”; shading shows time spent in transit and in the receive buffer]
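
The “next expected” comparison can be sketched as below. The counter is non-reversing: it only ever moves forward, so a gap is treated as loss and a lower-numbered arrival as out of sequence. Generalizing the slide's Packet 2 / Packet 3 example to arbitrary reordering (offset = number of higher-numbered packets that arrived first, lateness measured from the first of them) is our assumption, as are the function name and the handling of duplicates. RefNums are assumed to start at first_ref.

```python
def out_of_sequence(arrivals, first_ref=1):
    """Flag out-of-sequence packets with a non-reversing 'next expected' RefNum.

    arrivals: (ref_num, arrival_time) pairs in arrival order.
    Returns {ref_num: (offset, late_by)} for each out-of-sequence packet.
    """
    next_expected = first_ref
    seen = []            # (ref_num, arrival_time) already received, in order
    result = {}
    for ref, t in arrivals:
        if ref >= next_expected:
            # In sequence (a gap means loss); the counter only moves forward.
            next_expected = ref + 1
        elif any(r == ref for r, _ in seen):
            pass         # duplicate: not treated as reordering in this sketch
        else:
            # Higher-numbered packets that arrived before this one.
            earlier_higher = [(r, ts) for r, ts in seen if r > ref]
            result[ref] = (len(earlier_higher), t - earlier_higher[0][1])
        seen.append((ref, t))
    return result


# Slide example: Packet 3 overtakes Packet 2, which arrives 0.5 time units later.
print(out_of_sequence([(1, 0.0), (3, 2.0), (2, 2.5), (4, 3.0)]))  # {2: (1, 0.5)}
```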

  13. Common Problems Detected
     • Route changes
     • Card degradation
     • Low-level fiber errors
     • Effects of maintenance (card swaps, etc.)

  14. Examples of Detection
     • Bit errors that cause low-level (~0.03%) loss can be detected accurately with this method and fixed before customers feel the impact
     • Typically in such cases the degradation is subtle enough that traditional IP alarms do not show the problem clearly
     • Customers aren’t complaining… yet
     • In the case shown, no customer complaints were made and the problem was fixed proactively
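
A back-of-the-envelope check of how visible ~0.03% loss is to each probe stream, using the probe counts implied by the Measurement Design (about 270 Poisson packets per 15-minute cycle, 3000 packets per 1-minute periodic test). The sketch assumes independent per-packet loss, which real bit-error loss need not follow; the function names are illustrative.

```python
# Probe counts implied by the Measurement Design slide.
POISSON_PER_CYCLE = 0.3 * 15 * 60        # lambda * 900 s = 270 packets on average
PERIODIC_PER_TEST = round(60 / 0.020)    # 1 minute at 20 ms spacing = 3000 packets


def expected_losses(packets, loss_ratio):
    """Mean number of lost probes, assuming independent per-packet loss."""
    return packets * loss_ratio


def p_any_loss(packets, loss_ratio):
    """Probability of at least one lost probe under the same assumption."""
    return 1 - (1 - loss_ratio) ** packets


LOSS = 0.0003  # ~0.03%
for name, n in [("periodic test", PERIODIC_PER_TEST), ("Poisson cycle", POISSON_PER_CYCLE)]:
    print(f"{name}: {expected_losses(n, LOSS):.2f} expected losses, "
          f"P(at least one) = {p_any_loss(n, LOSS):.2f}")
```

Under this assumption a periodic test expects roughly one lost packet, while a Poisson cycle sees loss only occasionally, in line with the single and two-packet losses per periodic test reported on the next slide.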

  15. Increasing Bit Errors
     [Figure: loss over time, annotated “Single packet loss per Periodic test”, “Two packet losses per Periodic test”, and “Fiber span taken out of service”]
     More occasional loss was seen with the Poisson probe sequence.

  16. Detection of Route Changes
     [Figure: RT delay vs. time for the periodic sequence, 1:00 to 1:15, with annotations at 1:07 and 1:09]

  17. Poisson Probe Route change detection

  18. Periodic probe (same incident)

  19. The “Blenders”
     • First shown by Steve Casner et al. at the NANOG 22 conference (May 20-22, 2001, “A Fine-Grained View of High Performance Networking”, http://www.nanog.org/mtg-0105/agenda.html)
     • Seem to be properties of routing loops
     • Rare events, but interesting because they may shed light on some properties of route convergence

  20. Simple Blender
     • 88 packets arrive within 64 ms
     • 79 out-of-sequence (OOS) packets, 9 in sequence
     • 7 sequence discontinuities
     • Zero loss
     • Delay and IPDV actually describe this event best

  21. Simple Blender Magnified

  22. Blender 2
     • Scattered loss throughout
     • 250 packets in the event
     • 10 separate sequence discontinuities
     • Delay of first packet: 6 s

  23. Blender 2

  24. Summary
     Active measurements:
     • Can provide a view of customer performance
     • Can be used to alert maintenance personnel proactively
     • Can provide insight into network behavior
     • Can be used to improve planned maintenance
