Network in EGEE: Building end-to-end network services for the Grid • Mathieu Goutelle – CNRS UREC, France • EGEE-II SA2 “Networking support” • mathieu.goutelle@urec.cnrs.fr
Outline • Short presentation of EGEE, • The network in EGEE: • Network services? • EGEE focus on end-to-end services in a multi-domain context. • Network services: • Resource reservation, • Service Level Agreement. • Operational services: • Monitoring, • EGEE Network Operational Centre. • Summary & conclusion GridNets 2006 – 2006-10-01 – San Jose, CA, USA
EGEE in a nutshell… • EGEE: • 1 April 2004 – 31 March 2006 • 71 partners in 27 countries, federated in regional Grids • EGEE-II: • 1 April 2006 – 31 March 2008 • 91 partners in 32 countries • 13 Federations • Objectives: • Large-scale, production-quality infrastructure for e-Science • Attracting new resources and users from industry as well as science • Improving and maintaining “gLite” Grid middleware
EGEE in a nutshell… • More than 20 applications from 7 domains: • Astrophysics: • MAGIC, Planck • Computational Chemistry • Earth Sciences: • Earth Observation, Solid Earth Physics, Hydrology, Climate • Financial Simulation: • E-GRID • Fusion • Geophysics: • EGEODE • High Energy Physics: • 4 LHC experiments (ALICE, ATLAS, CMS, LHCb) • BaBar, CDF, DØ, ZEUS • Life Sciences: • Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.) • Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.) • Multimedia • Material Sciences • …
EGEE Infrastructure • Scale (June 2006): • ~200 sites in 40 countries • ~25 000 CPUs • >10 PB storage • >35 000 jobs per day • >100 Virtual Organizations • [Map: countries participating in EGEE]
Network infrastructure • Connects 32 NRENs • Over 3M users
Network infrastructure (cont.)
End-to-end network services? • What type of services? • Network services are available to the EGEE sites: • Premium IP and similar (e.g. QBSS), • “lightpath” or network resource reservation, • IPv6, multicast… • Operational services are available to the EGEE sites: • Monitoring of the network (local & backbone), • Operational data (incident, maintenance). • How to ensure the service continuity along the path? • In the last mile? • In a multi-domain context? • What about service availability, interface standardization, inter-domain agreements, etc.?
EGEE focus • Network services: • Network resource reservation: • Bandwidth Allocation and Reservation (BAR), • Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”). • Service Level Agreements (SLAs): • End-to-end SLAs? • Operational services: • Monitoring: • Network Performance Monitoring (NPM), • Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”). • Coordination of operational actions: • Concept of the EGEE Network Operational Centre (ENOC).
Network resource reservation • Based on the framework currently being built by the GÉANT2 project: • Hides the multi-domain and multi-technology issues; • Provides at the Grid level: • A seamless interface for service requests at the “customer” layer; • A high-level view of the network, where a request specifies characteristics rather than a particular service; • Reduced configuration lead-time; • A description of the service level. • Issues remain: • A component (BAR, see dedicated talk) gives access to these interfaces at the middleware layer, but the application layer is not yet ready; • Need for sub-management of the macroscopic reserved resource at the Grid level; • What about domains outside the GÉANT2 cloud?
Quick look at the BAR architecture • Clear demarcation between the Grid and the network: • The network is hidden from the Grid (technology, multi-domain issues…); • The Grid is hidden from the network (which only knows one “EGEE” user); • Allows a two-stage process (reservation & activation) suitable in a Grid context.
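The two-stage process (reservation, then activation) can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the class, field and state names are hypothetical and are not taken from BAR or from the GÉANT2 framework.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    RESERVED = "reserved"
    ACTIVE = "active"


@dataclass
class Reservation:
    src: str                 # source site (hypothetical identifier)
    dst: str                 # destination site (hypothetical identifier)
    bandwidth_mbps: int      # a requested characteristic, not a named service
    user: str = "EGEE"       # the network only knows one "EGEE" user
    state: State = State.REQUESTED

    def reserve(self) -> None:
        """Stage 1: book the resource ahead of time."""
        if self.state is not State.REQUESTED:
            raise RuntimeError(f"cannot reserve from state {self.state.value}")
        self.state = State.RESERVED

    def activate(self) -> None:
        """Stage 2: switch the booked resource on when it is needed."""
        if self.state is not State.RESERVED:
            raise RuntimeError("activation requires a prior reservation")
        self.state = State.ACTIVE
```

Keeping the two stages separate lets the middleware book capacity when a job is scheduled and activate it only when the transfer actually starts.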
SLAs • “SLAs”? • Description of the characteristics of the service provided (e.g. after a successful resource reservation request); • Provided by each domain crossed by the data path; • Either filled in manually by a human, or automatically if the request is handled entirely by software. • Definition of templates in cooperation with GÉANT2: • Based on previous work inside EGEE and answers from GÉANT2 to some open issues (procedures, demarcation point…). • SLA template: • Administrative part (contact, duration, troubleshooting procedures); • SLS (Service Level Specification) part. • The end-to-end SLA is formed using the individual SLAs provided by all domains along the end-to-end path.
SLAs (cont.) • [Figure labels: border-to-border connectivity, end-to-end connectivity] • EGEE end-to-end SLA template: • Concatenation of the individual SLAs of each participating domain; • SLA between the borders of the NRENs cloud (border-to-border SLA); • Difficulty in accommodating and taking into account the “last mile”: • The “last-mile” network may not be participating (no resource reservation system, no SLA, etc.); • Try to address this with static information on these networks to provide service characteristics to the user/application.
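As a rough illustration of what concatenating per-domain SLAs could mean for the SLS part, the sketch below combines three metrics along a path. The composition rules are assumptions made for this example, not taken from the EGEE template: bandwidth as the minimum over the domains, delay as the sum, and availability as the product (which presumes independent failures). A non-participating last-mile network is represented by a static entry.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class SLS:
    domain: str
    bandwidth_mbps: float   # bandwidth offered inside the domain
    delay_ms: float         # one-way delay bound across the domain
    availability: float     # fraction of time in service, 0.0 to 1.0


def concatenate(path: List[SLS]) -> SLS:
    """Combine per-domain SLS entries into one end-to-end SLS (assumed rules)."""
    if not path:
        raise ValueError("the path must cross at least one domain")
    return SLS(
        domain=" > ".join(s.domain for s in path),
        bandwidth_mbps=min(s.bandwidth_mbps for s in path),  # bottleneck
        delay_ms=sum(s.delay_ms for s in path),              # delays add up
        availability=prod(s.availability for s in path),     # independence assumed
    )


# Hypothetical path: a static last-mile entry followed by two participating domains.
path = [
    SLS("last-mile (static)", 100.0, 1.0, 0.99),
    SLS("NREN-A", 1000.0, 5.0, 0.999),
    SLS("GEANT2", 10000.0, 15.0, 0.9999),
]
end_to_end = concatenate(path)
```

In this example the static last-mile entry sets the end-to-end bandwidth, which is why leaving the last mile out of the SLA gives the user an incomplete picture.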
SLA institution • All domains involved in network services provisioning to EGEE as part of the existing network infrastructure hierarchy have to be categorized as one of: • Compliant with the Premium IP service, • Supportive of the Premium IP service, • Indifferent to the Premium IP service.
EGEE focus • Network services: • Network resource reservation: • Bandwidth Allocation and Reservation (BAR), • Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”). • Service Level Agreements (SLAs): • End-to-end SLAs? • Operational services: • Monitoring: • Network Performance Monitoring (NPM), • Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”). • Operational Interface with the network: • Concept of the EGEE Network Operational Centre (ENOC).
Monitoring • Not Yet Another Monitoring Framework! • Role of a Mediator between the various monitoring frameworks and the various clients (diagnostic tools, middleware, etc.); • Network Performance Monitoring (NPM) gives access to data collected by existing monitoring frameworks (site, backbone); • Use of the NMWG interface to access those frameworks and republish data; • Special requirements from some middleware components for faster access to data.
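The mediator role can be sketched as a thin layer that forwards one client query to every framework covering the path and returns the answers side by side. This is an illustration only: the class names and the query signature are hypothetical, the frameworks are in-memory stand-ins, and nothing here reproduces the NMWG interface.

```python
from typing import Dict, List, Protocol, Tuple


class MonitoringFramework(Protocol):
    """What the mediator expects from any existing framework (assumed shape)."""
    name: str

    def covers(self, src: str, dst: str) -> bool: ...
    def query(self, src: str, dst: str, metric: str) -> List[float]: ...


class InMemoryFramework:
    """Stand-in for a real site or backbone monitoring framework."""

    def __init__(self, name: str, data: Dict[Tuple[str, str, str], List[float]]):
        self.name = name
        self._data = data

    def covers(self, src: str, dst: str) -> bool:
        return any(key[:2] == (src, dst) for key in self._data)

    def query(self, src: str, dst: str, metric: str) -> List[float]:
        return self._data.get((src, dst, metric), [])


class Mediator:
    """Collects no data itself; it only relays queries and republishes answers."""

    def __init__(self, frameworks: List[MonitoringFramework]):
        self._frameworks = frameworks

    def query(self, src: str, dst: str, metric: str) -> Dict[str, List[float]]:
        return {
            fw.name: fw.query(src, dst, metric)
            for fw in self._frameworks
            if fw.covers(src, dst)
        }
```

A diagnostic tool or middleware component would then talk to the `Mediator` alone, without knowing which framework holds the data for a given path.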
Operational Interface • The network infrastructure of EGEE is mainly served by a set of NRENs via GÉANT2; • Need for an entity coordinating all the NOCs involved and the Grid Operations: • Concept of an end-to-end Coordination Unit (GÉANT2); • Providing end-to-end operational support. • A single point of contact as an operational interface between EGEE and GÉANT2/NRENs dealing with: • Network problem troubleshooting, • Interactions with network providers and Grid sites, • Notifications from NRENs, • Network SLA installation and monitoring. • Two Functional Entities inside EGEE: • EGEE Network Operational Centre (ENOC); • A Network Trouble Ticket Manager – GGUS.
ENOC • [Diagram labels: Users, GGUS, EGEE Network Support Units, ENOC, NRENs, GÉANT2] • From the EGEE point of view: • GGUS acts as the first-line support (interacts with the user); • Support units are the second-level support. • From the NRENs’ point of view: • EGEE (via the ENOC) is a single entity; • The ENOC is the only point of contact for the NRENs (submitter of the problem).
ENOC (cont.) • Main challenges: • To create a network support structure inside EGEE; • To define the associated network operational procedures. • The ENOC is the user support for network failures: • End-to-end network problem troubleshooting; • Coordination of the actions of all the entities involved in a network incident; • Maintaining an overall view of the end-to-end service by gathering information from all the involved domains; • SLA management: installation and monitoring. • ENOC Operational Procedures have been defined and validated during the first phase of EGEE; • EGEE-II will fully implement ENOC.
ENOC (cont.) • ENOC Service: • Collect tickets from NRENs which agree to provide them to the ENOC; • Forward to GGUS the ones that seem relevant (possible impact on the Grid infrastructure); • Receive tickets assigned to ENOC by the GGUS 1st level support; • Troubleshoot them with the help of monitoring tools; • Contact identified faulty domains or reassign ticket to the associated site if there is no evidence of a backbone problem (e.g. LAN issue). • Main Issues: • Load on the ENOC team (amount of info, etc.); • Heterogeneity of systems the ENOC has to deal with (languages, trouble ticket format, monitoring, etc.).
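The ENOC service steps above amount to two routing decisions, one per ticket source. The following sketch spells them out under explicit assumptions: the function names, the returned action strings and the relevance rule (a ticket matters if its domain connects at least one EGEE site) are hypothetical and do not describe the actual ENOC procedures.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Ticket:
    domain: str      # network domain named in the ticket
    summary: str     # free-text description


def route_nren_ticket(ticket: Ticket, domains_serving_egee: Set[str]) -> str:
    """Tickets collected from NRENs: forward only those that may affect the Grid."""
    # Assumed relevance rule: the domain connects at least one EGEE site.
    if ticket.domain in domains_serving_egee:
        return "forward-to-GGUS"
    return "keep-at-ENOC"


def route_ggus_ticket(faulty_domain: Optional[str]) -> str:
    """Tickets assigned by GGUS: decide the next step after troubleshooting.

    `faulty_domain` is the outcome of the monitoring-based diagnosis,
    or None when there is no evidence of a backbone problem.
    """
    if faulty_domain is not None:
        return f"contact-domain:{faulty_domain}"
    # No backbone evidence, e.g. a LAN issue: hand the ticket to the site.
    return "reassign-to-site"
```

For instance, `route_nren_ticket(Ticket("nren-a", "fibre cut"), {"nren-a"})` yields the forwarding action, while `route_ggus_ticket(None)` sends the ticket back to the associated site.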
ENOC status • ENOC team is ready! • 5 people (2 FTE) including one dedicated to it. • ENOC receives operational information from GÉANT2 and 10 NRENs (more to come): • About 80% of all the EGEE sites covered; • An average of 5 tickets handled per day; • 8 different languages. • Building tools to follow up or enhance the network support: • Network Operational Database (interconnection of administrative domains between the EGEE resource centres); • TT parsing and filtering tool; • Dashboard to present overall status of the “EGEE network”.
EGEE expectations • Towards a better solution to our “multi-domain” and “end-to-end” issues: • Seamless access to network monitoring data: • GÉANT2 will provide such access (perfSONAR), from multiple domains, aggregating data from multiple frameworks; • Network resource reservation: • Requests expressed not in terms of service but of characteristics; • The choice of the underlying technology to fulfil them is up to the network; • Answer to a request = SLA (depending on the current network status & load); • What about the last mile? The non-NREN domains? • Standardization of the operational interface: • Trouble Ticket format (data schema and exchange format); • Access method.
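To make the Trouble Ticket standardization point concrete, the sketch below maps provider-specific ticket fields onto one common schema and serializes the result. Everything in it is an assumption for illustration: the common fields, the provider names `nren-a` and `nren-b`, their field names, and the choice of JSON as exchange format are hypothetical and do not reflect any agreed standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class TroubleTicket:
    """Common data schema (hypothetical field set)."""
    ticket_id: str
    domain: str
    kind: str          # e.g. "incident" or "maintenance"
    start: str         # timestamp kept as text in this sketch
    description: str


# Per-provider mappings: common field name -> provider field name (hypothetical).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "nren-a": {"ticket_id": "id", "kind": "type",
               "start": "begin", "description": "text"},
    "nren-b": {"ticket_id": "ref", "kind": "category",
               "start": "start_time", "description": "summary"},
}


def normalize(domain: str, raw: Dict[str, str]) -> TroubleTicket:
    """Translate one provider-specific ticket into the common schema."""
    mapping = FIELD_MAPS[domain]
    fields = {common: raw[provider] for common, provider in mapping.items()}
    return TroubleTicket(domain=domain, **fields)


def to_exchange_format(ticket: TroubleTicket) -> str:
    """Serialize to the assumed exchange format (JSON with sorted keys)."""
    return json.dumps(asdict(ticket), sort_keys=True)
```

With a shared schema, each new provider costs one mapping entry instead of a new parser, which is the kind of load reduction a standardized interface is expected to bring.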
Summary & conclusion • Focus on providing end-to-end services in a multi-domain context: • Hiding the network complexity from the Grid (users, middleware, Grid support); • Hiding the Grid complexity from the network (single point of contact, operational interface); • Many building blocks depend on the providers: • Resource reservation frameworks, SLA installation, backbone monitoring; • Fortunately, EGEE and GÉANT2 built up a strong collaboration! • Many things remain pending: • Mainly on the operational side (homogenization of the network interface); • How to cope with domains outside the GÉANT2 cloud? • The two infrastructures need to collaborate on these aspects.
Thank you for your attention!