Next Generation Network Monitoring Prepared by: Les Cottrell, SLAC http://www.slac.stanford.edu/grp/scs/net/talk07/hec-pk-mar06.ppt
Needs • Advancements in networks improve scientific collaborations and help accelerate discoveries • E.g. High Energy Physics (HEP), seismology, telemedicine, astrophysics, global weather, education … • Modern science relies on the global Internet • Data exchange, interaction & teleconferencing, Grids … • Network problems therefore have increased significance for science • Science depends on cyberinfrastructure that supports efficient diagnosis of network problems along paths traversing multiple network domains • This is an unresolved issue today • Hard to overstate the effort spent today to resolve problems • Effort is often duplicated • Scientists are forced to become part-time network engineers
Why is this hard? • The Internet is very diverse; it is hard to find “invariants”, and telephone-network models do not work • Constantly changing, both short and long term • Changes are not smooth but usually occur in steps, so findings may be out of date • No central organization • Scientific communities span multiple organizations in many countries • A typical path crosses at least 5 administrative domains (campus, regional, backbone, regional and campus) • Domains are autonomous • Measurement is not high on vendors’ priorities • ISPs are concerned about privacy, competitive advantage, public embarrassment • Diagnosis is hard: • Must convince the administrative domains (ADs) that there is a problem and that they could/should help • Need multiple pieces of information from multiple sources (ends, multiple middles…), with no coordinating body
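The observation that changes usually arrive as steps rather than smooth drifts suggests a simple way to flag them in a measurement series such as round-trip times: compare the mean of a window before each sample with the mean of a window after it, and report the position where that difference peaks. The Python sketch below assumes evenly spaced samples; the function name, window size and threshold are hypothetical choices for illustration, not taken from any of the tools named in this talk.

```python
from statistics import mean


def detect_steps(samples, window=5, threshold=10.0):
    """Return indices where a level shift (step) appears to start.

    For each candidate index i, score = |mean(next window) - mean(previous window)|.
    Consecutive indices whose score exceeds `threshold` form one run; the index
    with the largest score in each run is reported as the step position.
    """
    scores = {}
    for i in range(window, len(samples) - window + 1):
        before = mean(samples[i - window:i])
        after = mean(samples[i:i + window])
        scores[i] = abs(after - before)

    steps, run = [], []
    for i in sorted(scores):
        if scores[i] > threshold:
            run.append(i)
        elif run:
            steps.append(max(run, key=scores.get))
            run = []
    if run:  # a run that reaches the end of the series
        steps.append(max(run, key=scores.get))
    return steps


# Hypothetical RTT series (ms): a route change raises RTT from 50 to 120 at sample 10.
rtt_ms = [50] * 10 + [120] * 10
print(detect_steps(rtt_ms))  # [10]
```

Reporting the peak of each run, rather than the first index over the threshold, keeps a single step from being reported several times as the windows slide across it.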
Besides Pathology • Service Level Agreements (setting up and auditing) • Planning, setting expectations • Scheduling for grid computing
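Auditing a Service Level Agreement largely reduces to comparing statistics of a measurement period against agreed targets. The sketch below assumes an SLA expressed as a packet-loss ceiling and a 95th-percentile RTT ceiling; both metrics, the class and function names, and the threshold values are hypothetical examples, not terms from any real agreement.

```python
import math
from dataclasses import dataclass


@dataclass
class SlaTarget:
    max_loss_pct: float    # hypothetical agreed ceiling on packet loss (%)
    max_p95_rtt_ms: float  # hypothetical agreed ceiling on 95th-percentile RTT (ms)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def audit(rtts_ms, sent, received, target):
    """Compare one measurement period against the target; return a list of violations."""
    loss_pct = 100.0 * (sent - received) / sent
    p95 = percentile(rtts_ms, 95)
    violations = []
    if loss_pct > target.max_loss_pct:
        violations.append(f"loss {loss_pct:.2f}% exceeds {target.max_loss_pct}%")
    if p95 > target.max_p95_rtt_ms:
        violations.append(f"p95 RTT {p95} ms exceeds {target.max_p95_rtt_ms} ms")
    return violations


# Hypothetical period: 1000 probes sent, 980 answered, RTTs of 1..100 ms.
print(audit(list(range(1, 101)), sent=1000, received=980,
            target=SlaTarget(max_loss_pct=1.0, max_p95_rtt_ms=90.0)))
```

Using a percentile rather than a mean keeps a handful of very slow probes from dominating the audit result.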
Past Attempts • Many projects over the past 20 years • AMP, MonALISA, IEPM-BW, PingER, Surveyor … • Tended to bundle data generation, sharing, analysis and visualization • Provided many new insights • BUT: • Lacked widespread, uniform deployments • Analysis & visualization hampered by lack of standards for sharing data • Failed to achieve critical mass (packaging, open source, unitary solution imposed, and/or lack of community involvement)
Proposal to Address • Widespread demand for net info from: • Researchers, to know how the network is performing • Advanced net apps such as Grids • Net Ops staff, to diagnose problems • Flexibility in extracting net performance data is needed since: • Networks change quickly, so diagnostic data is a moving target • New tools, metrics and types of analysis are constantly being developed • Lack of effective ways to share performance data across domains
perfSONAR • Partnership of Internet2, GEANT, ESnet • Plus in the US: SLAC, U Delaware, GATech • 13 EU-related NREN deployments of perfSONAR • Provides an open set of protocols + reference implementation for cross-domain sharing of network measurements • Common performance middleware • Open Grid Forum Network Measurement Working Group (NMWG) schema = extensible XML data representation • All development is open source to encourage widespread development, deployment, ownership & involvement • Early framework prototypes deployed in Europe, N and S America (Brazil), also adopted by LHC
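To make "extensible XML data representation" concrete, the Python sketch below separates a description of what was measured (metadata: source, destination, event type) from the measured values (data), and links them by an id reference. This split is loosely modeled on the NMWG approach, but the namespace, element names and attribute names here are illustrative assumptions, not the normative NMWG schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, loosely modeled on the NMWG
# metadata/data split; this is NOT the normative NMWG schema.
NS = "urn:example:nmwg-sketch"
ET.register_namespace("nm", NS)


def q(tag):
    """Qualify a tag name with the sketch namespace."""
    return f"{{{NS}}}{tag}"


def build_message(src, dst, event_type, samples):
    """Describe what was measured (metadata) separately from the values (data)."""
    msg = ET.Element(q("message"))
    meta = ET.SubElement(msg, q("metadata"), {"id": "meta1"})
    subject = ET.SubElement(meta, q("subject"))
    ET.SubElement(subject, q("src"), {"value": src})
    ET.SubElement(subject, q("dst"), {"value": dst})
    ET.SubElement(meta, q("eventType")).text = event_type
    # The data element points back at its metadata by id.
    data = ET.SubElement(msg, q("data"), {"metadataIdRef": "meta1"})
    for timestamp, value in samples:
        ET.SubElement(data, q("datum"), {"time": str(timestamp), "value": str(value)})
    return ET.tostring(msg, encoding="unicode")


def parse_message(xml_text):
    """Recover the subject, event type and samples from a message built above."""
    ns = {"nm": NS}
    root = ET.fromstring(xml_text)
    meta = root.find("nm:metadata", ns)
    data = root.find(f"nm:data[@metadataIdRef='{meta.get('id')}']", ns)
    return {
        "src": meta.find("nm:subject/nm:src", ns).get("value"),
        "dst": meta.find("nm:subject/nm:dst", ns).get("value"),
        "eventType": meta.find("nm:eventType", ns).text,
        "samples": [(int(d.get("time")), float(d.get("value")))
                    for d in data.findall("nm:datum", ns)],
    }


text = build_message("host-a.example", "host-b.example", "ping-rtt", [(1, 12.5), (2, 13.0)])
print(parse_message(text)["samples"])
```

Keeping metadata apart from data is what lets a new measurement type be added by defining a new event type, without changing how archives store or exchange the values.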
Components • Measurement Points (MPs) • Measurement Archives (MAs) • Lookup service: register & discover services • Authentication • Transformation of existing archives (e.g. IEPM) • Resource protector: manages policy details • Topology service: offers topology info on networks
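The Lookup service's "register & discover" role can be sketched as a small registry: MPs and MAs register a record describing what they offer, and clients query by measurement type, optionally filtering by component kind or domain. The Python below is a minimal in-memory sketch; the record fields, the lease-style expiry (ttl) and all names are assumptions for illustration rather than the perfSONAR Lookup service interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    name: str               # hypothetical service name
    kind: str               # "MP" (measurement point) or "MA" (measurement archive)
    domain: str             # administrative domain that runs the service
    event_types: frozenset  # kinds of measurement offered, e.g. {"ping"}
    endpoint: str           # hypothetical access URL


class LookupService:
    """Register-and-discover registry; entries expire unless re-registered."""

    def __init__(self):
        self._records = {}  # name -> (record, expiry time)

    def register(self, record, now, ttl=3600):
        # Re-registering refreshes the lease and replaces any older record.
        self._records[record.name] = (record, now + ttl)

    def discover(self, event_type, now, kind=None, domain=None):
        matches = []
        for record, expires in self._records.values():
            if expires <= now:
                continue  # stale: the service stopped renewing its lease
            if event_type not in record.event_types:
                continue
            if kind is not None and record.kind != kind:
                continue
            if domain is not None and record.domain != domain:
                continue
            matches.append(record)
        return sorted(matches, key=lambda r: r.name)
```

The expiry is a design choice worth noting: without it, services that disappear would linger in the registry and send troubleshooters to dead endpoints.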
Methodology Benefits • Provides a standard interchange format, letting users focus on problem solving • Easier to extend with new sub-components through standard, documented APIs, allowing evolution • E.g. MP tool developer can focus on tool operation and not worry about deployment • Divide & conquer troubleshooting using Lookup & Topology services • Easy to generate trouble reports with access to data in a standard format • Can scale to global size
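Divide & conquer troubleshooting can be read as a bisection over the measurement points along a path. Suppose the Topology and Lookup services yield the ordered points from source to destination (for example the 5 domains campus, regional, backbone, regional, campus), and a measurement from the source to any point tells us whether the problem is visible there. If we assume the problem, once visible, stays visible at every later point, bisection finds the first affected point with a logarithmic number of measurements. The `is_bad` callback and the loss figures below are hypothetical.

```python
def first_bad_point(points, is_bad):
    """Return the index of the first point where the problem is visible.

    points: measurement points ordered from source to destination.
    is_bad(p): True if a source -> p measurement shows the problem.
    Assumes monotonicity: once bad, every later point is also bad.
    Returns None if the destination itself looks fine.
    """
    if not points or not is_bad(points[-1]):
        return None
    lo, hi = 0, len(points) - 1  # invariant: points[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(points[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical packet loss (%) measured from the source to each point.
path = ["campus-a", "regional-a", "backbone", "regional-b", "campus-b"]
loss = {"campus-a": 0.0, "regional-a": 0.0, "backbone": 2.0, "regional-b": 2.1, "campus-b": 2.3}
idx = first_bad_point(path, lambda p: loss[p] > 1.0)
print(path[idx])  # backbone: the problem lies between regional-a and backbone
```

Each probe needs only the cooperation of the domain being measured, which is what makes the approach attractive when no single body coordinates the whole path.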
Compare with Existing • Measurement tool clients must be downloaded when needed, need experts, and usually need a server at the remote end (implicit trust & security challenges) • Network measurement projects (AMP, PingER, IEPM etc.) • Require installation and a host to run on • Lack community involvement, little ownership (e.g. the research team knows more about site connectivity than site people, but the research team is not involved in troubleshooting) • Projects fade as funding ceases • Net & system measurement projects (MonALISA) • Closed development effort; license requires signing over intellectual property rights to Caltech; must rely on Caltech to incorporate new measurement tools • Lack of community involvement and consensus may limit widespread ownership and deployment.
Where are we Now? • perfSONAR consortium exists, includes many NRENs, with active contributions from a large segment of the research community • Set of protocol standards for interoperability • A partially complete reference implementation • Shortcomings: • Development of some important infrastructure still to be completed (e.g. authentication/authorization) • All existing services need work to become production quality, in particular to make them easy to deploy • Even a simple installation can take many hours, and is a big barrier to adoption
Next Steps • Develop a scalable, distributed, redundant Federated Lookup service (like DNS) • Integrate common, existing authentication management into perfSONAR • Design and build the Resource Protector to implement policy • Provide specific, useful diagnostic services as high-quality examples (e.g. for traceroute, ping, one-way delay, SNMP, Layer-2 link services etc.) • Provide a Topology service offering layer-2 & 3 interconnection information • Promote perfSONAR to the research community • Students get reliable data from perfSONAR, request on-demand measurements, provide new analyses
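A DNS-like Federated Lookup service can be pictured as two tiers: a top tier that only knows which per-domain lookup service is responsible for which domain, and per-domain services that hold the actual registrations. Redundancy comes from keeping several replicas per domain and falling back when one is unreachable. The Python sketch below is a simplification under those assumptions: unlike real DNS, the root here queries the delegated service itself instead of returning a referral, and every class and method name is hypothetical.

```python
class DomainLookup:
    """Lookup service authoritative for a single administrative domain."""

    def __init__(self, domain, available=True):
        self.domain = domain
        self.available = available  # False simulates an unreachable replica
        self._services = {}         # event type -> list of endpoints

    def register(self, event_type, endpoint):
        self._services.setdefault(event_type, []).append(endpoint)

    def query(self, event_type):
        return list(self._services.get(event_type, []))


class RootLookup:
    """Top tier that only knows which domain lookup services to refer queries to."""

    def __init__(self):
        self._delegations = {}  # domain suffix -> list of replica DomainLookups

    def delegate(self, lookup):
        self._delegations.setdefault(lookup.domain, []).append(lookup)

    def resolve(self, host, event_type):
        labels = host.split(".")
        # Try the most specific suffix first, e.g. h.a.example, then a.example, then example.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            replicas = self._delegations.get(suffix)
            if not replicas:
                continue
            for replica in replicas:  # redundancy: fall back to the next replica
                if replica.available:
                    return replica.query(event_type)
            raise LookupError(f"no reachable lookup service for {suffix}")
        return []  # no domain claims this host
```

Because the top tier stores only delegations, each domain keeps control of its own registrations, which matches the autonomy of the administrative domains described earlier.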
Impact Science • Science relies on reliable networking. • Debugging problems across domains is extraordinarily difficult today; the growth of switched networks will make it harder. • perfSONAR enables divide and conquer between end & intermediate points: • provides access to relevant data, enables on-demand measurements • reduces the need to coordinate multi-domain admins (scientist > local net admin > regional net admin > backbone admin > …), telephone tag, explaining • Reduces participants, hours, days, frustration etc.
Impact Net Research • Network researchers can more easily build and deploy tools to capture and analyze net behavior: • No need for logins to test boxes, or approval from sysadmins to run servers • Handled by authentication and the Resource Protector service • Common data exchange formats enable access to archives
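The Resource Protector's job of replacing per-box logins and sysadmin approvals can be sketched as a policy check run after authentication: is this identity's group allowed to run this measurement tool, and has it stayed within its request budget? The Python below assumes authentication has already established the identity and group; the policy format, the sliding-window rate limit and all names are hypothetical illustrations, not the perfSONAR Resource Protector design.

```python
from collections import defaultdict, deque


class ResourceProtector:
    """Hypothetical policy check guarding on-demand measurements."""

    def __init__(self, policy, window_s=3600):
        # policy: group -> (set of allowed tools, max requests per window)
        self.policy = policy
        self.window_s = window_s
        self.history = defaultdict(deque)  # identity -> timestamps of granted requests

    def allow(self, identity, group, tool, now):
        """Return True and record the request if policy permits it."""
        rule = self.policy.get(group)
        if rule is None or tool not in rule[0]:
            return False  # unknown group or tool not permitted
        allowed_tools, max_requests = rule
        recent = self.history[identity]
        # Drop grants that have slid out of the rate-limit window.
        while recent and recent[0] <= now - self.window_s:
            recent.popleft()
        if len(recent) >= max_requests:
            return False  # budget for this window used up
        recent.append(now)
        return True


rp = ResourceProtector({"researchers": ({"ping", "traceroute"}, 2)})
print(rp.allow("alice", "researchers", "ping", now=0))
```

Rate limiting per identity is one plausible way a domain could let outsiders trigger measurements without fearing that its test hosts will be flooded.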
Impact Education • perfSONAR eases bringing the net into the classroom: students can interrogate it, run measurements etc. • Incorporate perfSONAR infrastructure components into the learning process • Analyze archived data (do not have to rely on goodwill of end users) • Early prototypes of perfSONAR components featured in UDel CS courses • Excellent pedagogical vehicle for distributed systems • Develop perfSONAR plugins
Benefits Pakistan • Better understanding of customer experience and needs: • utilization, use patterns, event detection, problem diagnosis, planning • Development of better measurement tools, analysis, visualization • Pakistan becomes part of a major international community of NRENs • In Europe, U.S. and S. America • Pakistan research & education gain access to data to analyze
Benefits SLAC • Extend tools to a new country/NREN • Extend diagnosis (important for LHC/Pak collab) • Increased resources for tool development, analysis
Benefits Education • Proven track record • 6 students, all will return to Pakistan • 3 at SLAC now • 1 in a Silicon Valley start-up, 1 in Oxford, 1 returned to NIIT to pursue a PhD • Students get exposure to a National Lab and world-leading researchers • Courses at Stanford • Hands-on exposure to production high-speed networks such as are planned for Pakistan
More information/Questions • Acknowledgements: • Harvey Newman and ICFA/SCIC for a raison d’etre, ICTP for contacts and education on Africa, Mike Jensen for Africa information, NIIT/Pakistan, Maxim Grigoriev (FNAL), Warren Matthews (GATech) for ongoing code development for PingER, USAID MoST/Pakistan for development funding, SLAC for support for ongoing management/operations support of PingER • PingER • www-iepm.slac.stanford.edu/pinger, sdu.ictp.it/pinger/africa.html • Human Development • http://www.gapminder.org/ • Role of Internet Exchanges • event-africa-networking.web.cern.ch/event%2Dafrica%2Dnetworking/workshop/slides/The%20Role%20of%20Internet%20Exchanges.ppt • Case Studies: • https://confluence.slac.stanford.edu/display/IEPM/Sub-Sahara+Case+Study • http://sdu.ictp.it/lowbandwidth/program/case-studies/index.html