Internet2 E2E piPEs Project Eric L. Boyd
Internet2 E2E piPEs Goals • Enable end-users & network operators to: • determine E2E performance capabilities • locate E2E problems • contact the right person to get an E2E problem resolved. • Enable remote initiation of partial path performance tests • Make partial path performance data publicly available • Interoperable with other performance measurement frameworks
Project Phases • Phase 1: Tool Beacons • BWCTL (Complete), http://e2epi.internet2.edu/bwctl • OWAMP (Complete), http://e2epi.internet2.edu/owamp • NDT (Complete), http://e2epi.internet2.edu/ndt • Phase 2: Measurement Domain Support • General Measurement Infrastructure (Prototype) • Abilene Measurement Infrastructure Deployment (Complete), http://abilene.internet2.edu/observatory • Phase 3: Federation Support • AA (Prototype – optional AES key, policy file, limits file) • Discovery (Measurement Nodes, Databases) (Prototype – nearest NDT server, web page) • Test Request/Response Schema Support (Prototype – GGF NMWG Schema)
NDT (Rich Carlson) • Network Diagnostic Tool • Developed at Argonne National Lab • Ongoing integration into the piPEs framework • Redirects from a well-known host to the “nearest” measurement node • Detects common performance problems in the “first mile” (edge to campus DMZ) • In deployment on Abilene: • http://ndt-seattle.abilene.ucaid.edu:7123
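A quick way to verify that an NDT beacon is up is to fetch the page it serves on TCP port 7123, as with the Abilene node above. A minimal sketch in Python (a reachability check only; it does not run a measurement):

```python
# Minimal availability check for an NDT beacon: fetch the page the
# server exposes on TCP port 7123. This confirms reachability only,
# not measurement results.
import urllib.request

BEACON = "http://ndt-seattle.abilene.ucaid.edu:7123/"

try:
    with urllib.request.urlopen(BEACON, timeout=10) as resp:
        print(f"{BEACON} responded with HTTP {resp.status}")
except OSError as err:  # urllib errors subclass OSError
    print(f"{BEACON} unreachable: {err}")
```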
NDT Milestones • New Features added • Configuration file support • Scheduling/queuing support • Simple server discovery protocol • Federation mode support • Command line client support • Open Source Shared Development • http://sourceforge.net/projects/ndt/
NDT Future Directions • Focus on improving problem detection algorithms • Duplex mismatch • Link detection • Complete deployment in Abilene POPs • Expand deployment into University campus/GigaPoP networks
How Can you Participate? • Set up BWCTL, OWAMP, NDT Beacons • Set up a measurement domain • Place tool beacons “intelligently” • Determine locations • Determine policy • Determine limits • “Register” beacons • Install piPEs software • Run regularly scheduled tests • Store performance data • Make performance data available via web service • Make visualization CGIs available • Solve Problems / Alert us to Case Studies
Example piPEs Use Cases • Edge-to-Middle (On-Demand) • Automatic 2-Ended Test Set-up • Middle-to-Middle (Regularly Scheduled) • Raw Data feeds for 3rd-Party Analysis Tools • http://vinci.cacr.caltech.edu:8080/ • Quality Control of Network Infrastructure • Edge-to-Edge (Regularly Scheduled) • Quality Control of Application Communities • Edge-to-Campus DMZ (On-Demand) • Coupled with Regularly Scheduled Middle-to-Middle • End User determines who to contact about performance problem, armed with proof
Test from the Edge to the Middle • Divide and conquer: partial path analysis • Install OWAMP and/or BWCTL • Where are the nodes? • http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html • Begin testing: • http://e2epi.internet2.edu/pipes/ami/bwctl/ (key required) • http://e2epi.internet2.edu/pipes/ami/owamp/ (no key required)
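For a first round of tests, the command-line clients can be driven directly or from a small wrapper script. A hedged sketch, assuming the BWCTL and OWAMP clients (bwctl, owping) are installed locally; pmp-node.example.net is a hypothetical placeholder for a node taken from the directory page above:

```python
# Sketch: run a BWCTL throughput test and an OWAMP latency test
# against a measurement node, printing the raw tool output.
# NODE is a placeholder; substitute a host from the directory above.
# bwctl -c <host> runs a test toward <host> for -t seconds;
# owping -c <n> sends n test packets.
import subprocess

NODE = "pmp-node.example.net"  # hypothetical placeholder host

for cmd in (["bwctl", "-c", NODE, "-t", "10"],
            ["owping", "-c", "100", NODE]):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(">>>", " ".join(cmd))
    print(result.stdout or result.stderr)
```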
Abilene Measurement Domain • Part of the Abilene Observatory: http://abilene.internet2.edu/observatory • Regularly scheduled OWAMP (1-way latency) and BWCTL/Iperf (throughput, loss, jitter) tests • Web pages displaying: • Latest results http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now • “Weathermap” http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now • Worst 10 Performing Links http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now • Data available via web service: http://abilene.internet2.edu/ami/webservices.html
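The status pages above can also be harvested programmatically for offline analysis. A sketch that simply saves the latest BWCTL status document; the URL is the one listed above, and since the exact response format is described on the web-services page, no parsing is attempted here:

```python
# Sketch: save the latest Abilene BWCTL status page for offline
# analysis. Only the raw document is stored; its format depends on
# the service, so no parsing is attempted.
import urllib.request

URL = "http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now"

with urllib.request.urlopen(URL, timeout=30) as resp:
    data = resp.read()

with open("bwctl_status_latest.html", "wb") as out:
    out.write(data)
print(f"saved {len(data)} bytes from {URL}")
```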
Quality Control of Abilene Measurement Infrastructure (1) • Problem-Solving Approach • Ongoing measurements start detecting a problem • Ad-hoc measurements used for problem diagnosis • Ongoing Measurements • Expect Gbps flows on Abilene • Stock TCP stack (albeit tuned) • Very sensitive to loss • “Canary in a coal mine” • Web100 just deployed for additional reporting • Skeptical eye • Apparent problem could reflect interface contention
Quality Control of Abilene Measurement Infrastructure (2) • Regularly Scheduled Tests • Track TCP and UDP flows (BWCTL/Iperf) • Track one-way delays (OWAMP) • IPv4 and IPv6 • Observe: • Worst 10 TCP flows • First-percentile TCP flow • Fiftieth-percentile TCP flow • What percentile breaks the 900 Mbps threshold • General Conclusions: • On Abilene, IPv4 and IPv6 are statistically indistinguishable • Consistently low values to one host or across one path indicate a problem
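The percentile bookkeeping described above is easy to reproduce once per-test throughputs are in hand. A sketch using invented sample values (the real data comes from the regularly scheduled tests):

```python
# Sketch: the statistics tracked above: worst 10 flows, 1st and
# 50th percentile, and the lowest percentile that clears the
# 900 Mb/s threshold. The sample throughputs (Mb/s) are invented.
def percentile(vals, p):
    """Nearest-rank percentile of an ascending list (p in 1..100)."""
    idx = max(0, min(len(vals) - 1, round(p / 100 * len(vals)) - 1))
    return vals[idx]

tput = sorted([984, 979, 981, 522, 903, 968, 940, 975,
               991, 887, 960, 978])  # invented samples, Mb/s

print("worst 10:", tput[:10])
print("1st percentile: ", percentile(tput, 1), "Mb/s")
print("50th percentile:", percentile(tput, 50), "Mb/s")
breakpt = next(p for p in range(1, 101) if percentile(tput, p) > 900)
print(f"flows clear 900 Mb/s from the {breakpt}th percentile up")
```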
First two weeks in March: 50th percentile right at 980 Mb/s; 1st percentile about 900 Mb/s. Take it as a baseline.
Beware the Ides of March: 1st percentile down to 522 Mb/s. Circuit problems along the west coast. nb: 50th percentile very robust.
Recovery, sort of; life through 29 April: 1st percentile back up to the mid-800s, lower and shakier. nb: 50th percentile still very robust.
Ah, sudden improvement through 5 May: 1st percentile back up above 900 Mb/s and more stable. But why??
Then, while Matt Z is tearing up the tracks: 1st percentile back down to the 500s. Diagnosis: something is killing Seattle. Oh, and Sunnyvale is off the air.
Matt fixes Sunnyvale, and things get (slightly) worse: both Seattle and Sunnyvale are bad. 1st percentile right at 500 Mb/s. Diagnosis: Web100 interaction.
Matt fixes the Web100 interaction. 1st percentile cruising through 700 Mb/s. Life is good.
Friday the (almost) 13th; JUNOS upgrade induces packet loss for about four hours along many links. 1st percentile falls to 63 Mb/s. Long-distance paths chiefly impacted.
A “Known” Problem • Mid-May: routers all got a new software load to enable a new feature • Everything seemed to come up, but on some links, utilization did not rebound • Worst-10 reflected very low performance across those links • QoS parameter configuration format change…
Nice weekend. 1st percentile rises to 968 Mb/s. But why??
We Found It First • Streams over SNVA-LOSA link all showed problems • NOC responded: Found errors on SNVA-LOSA link • (NOC is now tracking errors more closely…) • Live (URL subject to change): http://abilene.internet2.edu/ami/bwctl_percentile.cgi/TCPV4/1/50/14118254811367342080_14169839516075950080
Example piPEs Use Cases • Edge-to-Middle (On-Demand) • Automatic 2-Ended Test Set-up • Middle-to-Middle (Regularly Scheduled) • Raw Data feeds for 3rd-Party Analysis Tools • http://vinci.cacr.caltech.edu:8080/ • Quality Control of Network Infrastructure • Edge-to-Edge (Regularly Scheduled) • Quality Control of Application Communities • ESNet / ITECs (3+3) [See Joe Metzger’s talk to follow] • eVLBI • Edge-to-Campus DMZ (On-Demand) • Coupled with Regularly Scheduled Middle-to-Middle • End User determines who to contact about performance problem, armed with proof
Example Application Community: VLBI (1) • Very-Long-Baseline Interferometry (VLBI) is a high-resolution imaging technique used in radio astronomy. • VLBI involves using multiple radio telescopes simultaneously in an array to record data, which is then stored on magnetic tape and shipped to a central processing site for analysis. • Goal: electronic transmission of VLBI data over high-bandwidth networks (known as “e-VLBI”).
Example Application Community: VLBI (2) • Haystack <-> Onsala • Abilene, Eurolink, GEANT, NorduNet, SUNET • Users: David Lapsley, Alan Whitney • Constraints • Lack of administrative access (needed for Iperf) • Heavily scheduled, limited windows for testing • Problem • Insufficient performance • Partial Path Analysis with BWCTL/Iperf • Isolated packet loss to local congestion in the Haystack area • Upgraded the bottleneck link
Example Application Community: VLBI (3) • Result • First demonstration of real-time, simultaneous correlation of data from two antennas (32 Mbps, work continues) • Future • Optimize time-of-day for non-real-time data transfers • Deploy BWCTL at 3 more sites beyond Haystack, Onsala, and Kashima
TSEV8 Experiment • Intensive experiment • Data • 18 scans, 13.9 GB of data • Antennas: • Westford, MA and Kashima, Japan • Network: • Haystack, MA to Kashima, Japan • Initially, 100 Mbps commodity Internet at each end, Kashima link upgraded to 1 Gbps just prior to experiment
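As a sanity check on those numbers: the pure transfer time for 13.9 GB over a nominal 100 Mbps path, ignoring protocol overhead, is about 19 minutes, as the quick calculation below shows; this is why the ~1 Mbps throughput seen just before the experiment (next slide) would have been unusable.

```python
# Back-of-the-envelope transfer time for the TSEV8 data set:
# 13.9 GB over a 100 Mbps path, ignoring protocol overhead.
data_bits = 13.9e9 * 8   # 13.9 GB in bits
rate_bps = 100e6         # nominal 100 Mbps link
seconds = data_bits / rate_bps
print(f"ideal transfer time: {seconds:.0f} s (~{seconds / 60:.0f} min)")
# ~1112 s, roughly 19 minutes at full line rate.
```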
Network Issues • In the week leading up to the experiment, the network showed extremely poor throughput: ~1 Mbps! • Network analysis/troubleshooting required: • Traditionally: pair-wise Iperf testing between hosts along the transfer path and step-by-step tracing of link utilization via the Internet2/TransPAC-APAN network monitoring websites: time consuming, error prone, not conclusive • New approach: automated Iperf testing using Internet2’s BWCTL tool (allows partial path analysis), plus a single website integrating link utilization statistics • No maintenance required once set up; for the first time, an overall view of the network and bandwidth on a segment-by-segment basis
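The automated approach amounts to running bwctl on a timer against each node along the path and logging the results. A minimal sketch of such a loop (the host names are placeholders, not the actual TSEV8 nodes, and bwctl is assumed to be installed):

```python
# Sketch: automated segment-by-segment BWCTL testing. Loop over the
# measurement nodes along the path, run a short test against each,
# and append timestamped output to a log for later comparison.
import subprocess
import time

HOSTS = ["node-a.example.net", "node-b.example.net"]  # placeholders
INTERVAL_S = 3600  # one pass per hour

while True:
    for host in HOSTS:
        result = subprocess.run(["bwctl", "-c", host, "-t", "10"],
                                capture_output=True, text=True)
        with open("bwctl_history.log", "a") as log:
            log.write(f"{time.ctime()} {host}\n{result.stdout}\n")
    time.sleep(INTERVAL_S)
```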
E-VLBI Network Monitoring http://web.haystack.mit.edu/staff/dlapsley/tsev7.html
E-VLBI Network Monitoring • Use of centralized/integrated network monitoring helped enable identification of the bottleneck (a hardware fault) • Automated monitoring allows a view of network throughput variation over time • Highlights route changes, network outages • Automated monitoring also helps highlight throughput issues at the end points: • e.g. Network Interface Card failures, untuned TCP stacks • Integrated monitoring provides an overall view of network behavior at a glance
Result • Successful UT1 experiment completed June 30, 2004 • New record time for transfer and calculation of UT1 offset: • 4.5 hours (down from 21 hours)
Acknowledgements • Yasuhiro Koyama, Masaki Hirabaru and colleagues at the National Institute of Information and Communications Technology • Brian Corey, Mike Poirier and colleagues at MIT Haystack Observatory • Internet2, TransPAC/APAN, JGN2 networks • Staff at the APAN Tokyo XP • Tom Lehman, University of Southern California Information Sciences Institute East
American / European Collaboration Goals • Awareness of ongoing Measurement Framework Efforts / Sharing of Ideas (Good / Not Sufficient) • Interoperable Measurement Frameworks (Minimum) • Common means of data extraction • Partial path analysis possible along transatlantic paths • Open Source Shared Development (Possibility, In Whole or In Part) • End-to-end partial path analysis for transatlantic research communities • VLBI: Haystack, Mass. <-> Onsala, Sweden • HENP: Caltech, Calif. <-> CERN, Switzerland
American / European Collaboration Achievements • UCL E2E Monitoring Workshop 2003 • http://people.internet2.edu/~eboyd/ucl_workshop.html • Transatlantic Performance Monitoring Workshop 2004 • http://people.internet2.edu/~eboyd/transatlantic_workshop.html • Caltech <-> CERN Demo • Haystack, USA <-> Onsala, Sweden • piPEs Software Evaluation (In Progress) • Architecture Reconciliation (In Progress)
Example Application Community: ESnet / Abilene (1) • 3+3 Group • US Govt. Labs: LBL, FNAL, BNL • Universities: NC State, OSU, SDSC • http://measurement.es.net/ • Observed: • 400 usec 1-way Latency Jump • Noticed by Joe Metzger • Detected: • Circuit connecting router in the CentaurLab to the NCNI edge router moved to a different path on metro DWDM system • 60 km optical distance increase • Confirmed by John Moore
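The observed jump is consistent with the added fibre distance: light in silica fibre travels at roughly c/1.47 (the refractive index is an assumed value, not a figure from the slide), so 60 km of extra optical path adds on the order of 300 usec one way:

```python
# Sanity check: one-way propagation delay added by ~60 km of extra
# optical path. The group index ~1.47 for silica fibre is an
# assumed value, not taken from the slide.
C = 3.0e8        # speed of light in vacuum, m/s
N = 1.47         # approximate group index of silica fibre
extra_m = 60e3   # 60 km of added optical distance

delay_us = extra_m * N / C * 1e6
print(f"added one-way delay: {delay_us:.0f} usec")
# ~294 usec, the same order as the observed 400 usec jump; the
# remainder plausibly reflects path detail this estimate ignores.
```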
American/European Demonstration Goals • Demonstrate ability to do partial path analysis between “Caltech” (Los Angeles Abilene router) and CERN. • Demonstrate ability to do partial path analysis involving nodes in the GEANT network. • Compare and contrast measurement of a “lightpath” versus a normal IP path. • Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA