ESnet NMTF/NMFG - Status
Les Cottrell, SLAC & Dave Martin, HEPNRC
<cottrell@slac.stanford.edu>, <dem@hep.net>
Presented at the ESCC Meeting, JLAB, Oct 1997
/afs/slac/u/sf/cottrell/talk/escc/oct97

Outline of Talk
• What happened to the NMTF/NMFG?
• What are we measuring?
• How are we measuring?
• Tools we are using/developing
• Coordination with others
• Next Steps
• Summary

What happened to the NMTF/NMFG?
• It evolved
• Some of the original members (BNL & ORNL) were unable to continue the effort
• SLAC & HEPNRC retained focus on monitoring
• ICFA concerned about impact of network performance on HENP research
• Created NTF with various WGs, one on Monitoring
• More focus on HENP issues and international links
• Embraced work done by NMTF/NMFG and supported continued development
• Brought in new partners, in particular INFN and CERN, as well as other collection sites

Mission etc. of the ICFA-NTF WG on Monitoring
• Mission of Group:
  • Obtain as uniform a picture as possible of the present performance of the connectivity used by the ICFA community
• Two meetings so far: CHEP97 (Apr-97) & Santa Fe (Sep-97)
• Produced an interim status report for Sep-97
• Will update for Dec-97, with a final report in Apr-98

Our Main Metric is Ping
• “Universally available”, easy to understand
• No software for clients to install
• Low network impact
• Provides loss, response time, reachability, unpredictability
• Select hosts carefully; concerns over routers, loaded hosts etc. (provide guidelines)
• Does provide useful measures

Ping Response Time vs Bytes

Ping Response vs Web Response
[Plot; axis labels: HTTP GET Response (ms), Minimum Ping Response (ms)]

Method
• Measurement
  • Each Collection site keeps a list of remote hosts to ping at sites it is interested in
  • Every 30 mins, ping each remote host with 11 * 100 byte followed by 10 * 1000 byte pings
  • Min separation of pings is 1 second, timeout 20 seconds
  • Throw away first ping
  • Measure response, packet loss, host unreachable (no answer to any ping)
  • Record data and make available

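The reduction step of the method above (discard the first ping, then derive loss, response time and reachability) can be sketched in Python. This is only an illustration, not the timeping code: the function and field names are hypothetical, the input is assumed to be one round-trip time per ping in send order with `None` for a ping that timed out, and "unreachable" is assumed to mean that no ping in the cycle, including the discarded one, was answered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CycleSummary:
    sent: int                     # pings counted (first one excluded)
    received: int                 # replies among the counted pings
    loss_pct: float               # percentage of counted pings lost
    min_rtt_ms: Optional[float]   # None when nothing was received
    avg_rtt_ms: Optional[float]
    unreachable: bool             # assumption: no reply to any ping at all


def summarize_cycle(rtts_ms: List[Optional[float]]) -> CycleSummary:
    """Summarize one measurement cycle for one remote host.

    rtts_ms holds one entry per ping in send order; None marks a timeout.
    """
    kept = rtts_ms[1:]  # throw away the first ping
    replies = [r for r in kept if r is not None]
    sent = len(kept)
    received = len(replies)
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return CycleSummary(
        sent=sent,
        received=received,
        loss_pct=loss_pct,
        min_rtt_ms=min(replies) if replies else None,
        avg_rtt_ms=sum(replies) / received if replies else None,
        unreachable=all(r is None for r in rtts_ms),
    )
```

For a full cycle as described on the slide, `rtts_ms` would have 21 entries (11 pings of 100 bytes followed by 10 of 1000 bytes), leaving 20 counted pings after the first is dropped.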
Architecture
• Three Types of Sites
  • Remote Sites - need only to respond to ping packets
  • Collecting Sites
    • Collecting Data: Perl script pings nodes, records data in common documented format
    • Serving Data: CGI/Perl script makes data available to Analysis Sites
    • WWW CGI tools make reports available
  • Analysis Sites
    • Retrieving Data: Perl script retrieves data from Collecting Sites
    • Analysis: SAS program analyzes data and generates graphs
    • Reports: WWW form makes customized reports available

Architecture
[Diagram: Collecting sites (with cache) ping Remote sites; Analysis sites (e.g. HEPNRC, e.g. SLAC; with Archive) retrieve the data over HTTP and provide WWW Reports & Data]

Available Tools - Data Collection
• Collect data (timeping)
  • HEPNRC rearchitected, developed & documented
  • Deployed at 12 sites in 6 countries
    • ARM, BNL, CERN, CMU, DoE/GMTN, HEPNRC/FNAL, INFN/CNAF, KEK, Hungary, RAL, SLAC, UMD
    • DESY, IN2P3, TRIUMF, MSU, Beijing also expressed interest, plus commercial sites
• Data available (pingdata) in common format
  • Collected data available from collection site via HTTP
  • Allows data for specific times to be retrieved

Current Deployment
[Map; sites labelled: CERN, DESY, KEK, HEPNRC/FNAL, RAL, CMU, SLAC, BNL, RMKI/KFKI, UMD, INFN/CNAF. Legend: Monitoring Site; ESnet Site (monitored from SLAC); N. American Site (monitored from SLAC); International Site (monitored from SLAC)]

Analysis / Archive Site
• Gathers & archives data
  • HEPNRC gathers data from collection sites a few times daily
  • Archives the data (200 Mbytes/month)
  • Works with collection sites to resolve problems
• Provides Web access to archive data via form (ping_data.pl)

Access to Raw Data

Analysis / Archive Site
• Gathers & archives data
  • HEPNRC gathers data from collection sites a few times daily
  • Archives the data (200 Mbytes/month)
  • Works with collection sites to resolve problems
• Provides Web access to archive data via form (ping_data.pl)
• Provides Web form to allow simple plotting (graph_pings.pl), uses SAS for speed

Form to Select Analysis Graphs

Analysis Tools for Collection Sites
• Short-term analysis / reports
  • Recent data (e.g. last 30 days cached)
  • Web sortable table of latest measurements, colored for quality

Ping Loss Quality
• 0-1% Good
• 1-5% Acceptable
• 5-12% Poor
• 12-25% Poor
• > 25% Unusable
• Similar to Internet Weather Report (<6%, <12%, > 12%)

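The loss bands above map directly onto a small lookup, sketched here in Python. The function name is hypothetical, and treating each band's upper bound as inclusive is an assumption, since the slide does not say which band a boundary value such as exactly 5% belongs to. The slide gives the same label for the 5-12% and 12-25% bands, and the table below mirrors that as written.

```python
# (upper bound in percent, label), taken from the bands on the slide.
# Assumption: the upper bound of each band is inclusive.
LOSS_BANDS = [
    (1.0, "Good"),
    (5.0, "Acceptable"),
    (12.0, "Poor"),
    (25.0, "Poor"),
    (100.0, "Unusable"),
]


def loss_quality(loss_pct: float) -> str:
    """Return the quality label for a packet-loss percentage."""
    if not 0.0 <= loss_pct <= 100.0:
        raise ValueError("loss_pct must be between 0 and 100")
    for upper, label in LOSS_BANDS:
        if loss_pct <= upper:
            return label
    return "Unusable"  # not reached for valid input; kept as a safe default
```

A report generator could use the returned label to pick the cell color for each site.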
Analysis Tools for Collection Sites
• Short-term analysis / reports
  • Recent data (e.g. last 30 days cached)
  • Web sortable table of latest measurements, colored for quality, with output (TSV) for Excel (connectivity.pl)

Latest Ping Measurements

Raw Data from last 24 Hours

Latest Ping Measurements

Ping Performance for Last 180 Days

Analysis Tools for Collection Sites
• Short-term analysis / reports
  • Recent data (e.g. last 30 days cached)
  • Web sortable table of latest measurements, colored for quality, with output (TSV) for Excel (connectivity.pl)
  • Web form to select sites and time frames to be plotted (ping_data_plot.pl)

Request Plot of Collection Site Data

Plot from Collection Site

Tools in Development
• Re-engineering SLAC long term reports
  • exception report

Exception Reports
[Screenshot: Last 10 Weeks Ping Data. Callouts: Click to sort by column; Click here to burrow down to more information; Color highlights extent of exception]

Tools in Development
• Re-engineering SLAC long term reports
  • exception report
  • last 180 days

180 Days SLAC - Stanford
[Plot, Feb-97 to Aug-97. Annotations: Direct connect; Via ESnet; 5.5 ms; 20 ms; 30 ms; Loss 3-6%; Loss < 1%; Uwave & Routing problems]

Tools in Development
• Re-engineering SLAC long term reports
  • exception report
  • last 180 days
  • monthly points going back for years in tabular form with quality coloring, sorting & hyperlinks
    • Loss (by site, and by group of sites)
    • Response (by site, and by group of sites)
    • Reachability (by site, and by group of sites)
    • % time network “Quiescent” or “Busy”

Ping Loss History

TSV Output to Excel for Further Analysis

Ping Response by Group

Prime-time Packet Loss by Group

“Quiescent” Frequency by Group

International Site “Busy” Frequency
[Plot. Annotations: UK - US link upgraded; CERN & IN2P3 track RL.UK; Italian nodes track & look good]

Tools in Development
• Re-engineering SLAC long term reports
  • exception report
  • last 180 days
  • monthly points going back for years in tabular form with quality coloring, sorting & hyperlinks
    • Loss (by site, and by group of sites)
    • Response (by site, and by group of sites)
    • Reachability (by site, and by group of sites)
    • % time network “Quiescent” or “Busy”
  • Ten Worst links in HEP

Ten Worst HEP Links Ranked by % Packets Lost

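A ranking of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code behind the report: the function name, the input layout (a mapping from link name to counts of packets sent and lost) and the link names in the usage note are all hypothetical.

```python
from typing import Dict, List, Tuple


def worst_links(
    counts: Dict[str, Tuple[int, int]], n: int = 10
) -> List[Tuple[str, float]]:
    """Rank links by percentage of packets lost, worst first.

    counts maps a link name to (packets_sent, packets_lost).
    Links with no packets sent are skipped, since their loss is undefined.
    """
    ranked = [
        (link, 100.0 * lost / sent)
        for link, (sent, lost) in counts.items()
        if sent > 0
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

For example, `worst_links({"link-a": (100, 5), "link-b": (100, 20)}, n=1)` returns only the entry for `link-b`, the link with the higher loss percentage.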
What are Typical Uses
• Setting Expectations
• Service Level Contract
• Choosing ISPs
• Identifying problems, and verifying solutions
• Planning for upgrades

Summary to Help Choose Upgrades

Prime Time Packet Loss Jun-Aug 97

Coordination etc.
• XIWT/IPWT interest/deployment

XIWT/IPWT interest
• Austin meeting in Sep-97
• Available tools presented by developers: IWR, CAIDA/NLANR, Intel, Auto Industry/Bellcore, IETF/IPPM Surveyor …
• XIWT/IPWT want to:
  • Measure performance of members' own networks
  • Get tests to validate and understand what to recommend to other commercial customers and for what purposes
  • Build a community within XIWT so that it can evolve to address harder issues
• Selected our tools to initially deploy at 6 sites, including Intel, SBC, HAI, BellSouth, CNRI, NIST

Coordination etc.
• XIWT/IPWT interest/deployment
• MICS funded joint SLAC/LBL proposal on Internet End-to-end performance monitoring for 1 year
• LBL/NIMI project

NIMI (1)
• NIMI = National Internet Measurement Infrastructure, a collaboration of LBL/PSC (V. Paxson, M. Mathis, J. Mahdavi)
• It is a software suite (not hardware). Deploy on “measurement hosts” around the Internet for black box infrastructure measurements.
• Ready for deployment Nov-97. Perl daemon with treno, Poisson packet generation for loss & delays.
• Hooks for other tools such as pathchar, tcpanaly

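Poisson packet generation means that the gaps between probe packets are drawn from an exponential distribution, so the probes arrive as a Poisson process rather than at fixed intervals. The Python sketch below illustrates that general technique only; it is not NIMI code, and the function name and parameters are hypothetical.

```python
import random
from typing import List, Optional


def poisson_send_times(
    rate_per_s: float, duration_s: float, seed: Optional[int] = None
) -> List[float]:
    """Return probe send times (seconds from start) for a Poisson process.

    Inter-send gaps are exponentially distributed with mean 1 / rate_per_s.
    A seed can be given to make the schedule reproducible.
    """
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    rng = random.Random(seed)
    times: List[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)  # exponential gap to the next probe
        if t >= duration_s:
            break
        times.append(t)
    return times
```

Calling `poisson_send_times(0.5, 600.0)` would give a schedule averaging one probe every 2 seconds over 10 minutes, with irregular spacing between individual probes.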
NIMI (2)
• Challenges: accurate clock synchronization (one way measurements), scaling to millions of nimids (NB: end-to-end measurement strategies are usually not cost free, some things may be over-measured), data retrieval, new measurement strategies
• There is no central management
• Both HEPNRC & SLAC plan to install NIMI hosts (PCs running FreeBSD) at their sites

Coordination etc.
• XIWT/IPWT interest/deployment
• MICS funded joint SLAC/LBL proposal on Internet End-to-end performance monitoring for 1 year
• LBL/NIMI project
• Proposed joint work with NLANR to extend Mapnet Java tools to view our data

NLANR Mapnet Tool
• Java Applet
• Zoom & pan
• Select ISPs
• Color:
  • ISP
  • bandwidth
• Mouse over
  • link details
  • node details