European Update: EGEE-JRA4 and UK
NM-WG, GGF15, Boston, 3rd October 2005
M J Leese, CCLRC Daresbury Laboratory, m.j.leese@dl.ac.uk
Contents
• Update on EGEE-JRA4
• UK update
• Diagnostic Tool demo
• Diagnostic Tool itself (visualisation)
• JRA4 obtaining data from DANTE and Internet2
  • Provides some contrast with perfSONAR
  • More interface than infrastructure:
    • GOCs/NOCs
    • Grid middleware
    • Network-intensive/dependent end users
Mark Leese - Daresbury Laboratory
EGEE-JRA4
• EGEE = Europe’s latest Grid project, successor to EDG
• Joint Research Activity 4 (JRA4) = group responsible for “Development of Network Services”, including Network Performance Monitoring (NPM)
• Not the same as GN2-JRA1 et al.: these are GÉANT2 projects
• There are various monitoring tools and frameworks already available
  • We (JRA4) are not building another one! The work is about standardising access to NPM data across multiple domains, and using it.
• NPM activity includes:
  • Mediator (standardises access to NPM data)
  • Diagnostic Tool (uses the data)
  • Publisher (provides NPM data to Grid middleware)
NPM Status
• Mediator deliverable DJRA4.2 was produced in PM9 (Dec ’04):
  • “Specification of Interfaces for Network Performance Monitoring” document
  • First software prototype
    • Proves we can harness (multi-domain) backbone and end-site tools together
    • Low-level framework only
    • Could be thrown away: designed as a learning exercise to inform the architecture and interfaces
  • For more info (and prototype design doc): https://edms.cern.ch/document/533215/
• 2nd Mediator prototype (MJRA4.3) produced in PM12 (March ’05):
  • Adds certificate-based security
  • Strong focus on deployment of end-to-end monitoring infrastructure (i.e. WP7)
  • For more info: https://edms.cern.ch/document/575484/
• Diagnostic Tool (deliverable MJRA4.6) delivered in PM18 (Sept ’05):
  • More later
• Publisher: discussions ongoing, but work doesn’t start in earnest until October
NPM Architecture (1)
[Diagram: a GOC/NOC Diagnostic Client and "Some Client" each speak NM-WG to the individual monitoring infrastructures: backbones (GN2, perfmonit, piPEs) and end sites (EDG WP7, home-grown).]
NPM Architecture (2)
[Diagram: the GOC/NOC Diagnostic Client and "Some Client" speak NM-WG to the JRA4 NPM Mediator, which in turn speaks NM-WG to the backbone (GN2, perfmonit, piPEs) and end-site (EDG WP7, home-grown) infrastructures.]
NPM Architecture (3)
[Diagram: Client Application → Web Service → NPM Mediator (Discoverer, Aggregator, Response Cache) → Web Service → Network Monitoring Infrastructure / Network Monitoring Point]
• Human and machine users interact via a client application, “speaking” NM-WG
• The Discoverer locates the MP(s) or infrastructures that can answer the client’s query (currently a static list)
• The Aggregator:
  • obtains query results from the MP(s)
  • aggregates the results (if necessary)
• To improve performance and reduce loading, the results of recent requests will be cached
• Discovery, aggregation and caching are all big areas with wide application, but we need time for these, so maybe EGEE-II
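The Mediator’s three roles (discover, aggregate, cache) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the class names, the `Fetcher` callable standing in for an NM-WG web-service call to one MP, and the 300 s cache lifetime. It is not the JRA4 implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A query names a path and a metric, e.g. ("site-a", "site-b", "bandwidth").
Query = Tuple[str, str, str]
Sample = Tuple[float, float]  # (timestamp, value)
# Hypothetical stand-in for an NM-WG request to one MP or infrastructure.
Fetcher = Callable[[Query], List[Sample]]


@dataclass
class Discoverer:
    # Static (src, dst) -> MP list, mirroring the prototype's static list.
    static_map: Dict[Tuple[str, str], List[str]]

    def locate(self, query: Query) -> List[str]:
        src, dst, _metric = query
        return self.static_map.get((src, dst), [])


@dataclass
class Mediator:
    discoverer: Discoverer
    fetchers: Dict[str, Fetcher]  # MP name -> function that queries that MP
    ttl_s: float = 300.0          # assumed cache lifetime in seconds
    cache: Dict[Query, Tuple[float, List[Sample]]] = field(default_factory=dict)

    def query(self, query: Query) -> List[Sample]:
        now = time.monotonic()
        hit = self.cache.get(query)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # recent answer: do not reload the MPs
        # Aggregator: gather samples from every MP found, merge them in time order.
        merged: List[Sample] = []
        for mp in self.discoverer.locate(query):
            merged.extend(self.fetchers[mp](query))
        merged.sort()
        self.cache[query] = (now, merged)
        return merged


if __name__ == "__main__":
    calls = []

    def fake_mp(q: Query) -> List[Sample]:
        calls.append(q)
        return [(2.0, 90.0), (1.0, 95.0)]

    m = Mediator(Discoverer({("site-a", "site-b"): ["mp-1"]}), {"mp-1": fake_mp})
    print(m.query(("site-a", "site-b", "bandwidth")))  # [(1.0, 95.0), (2.0, 90.0)]
    m.query(("site-a", "site-b", "bandwidth"))
    print(len(calls))  # 1: the repeat was answered from the cache
```

Keying the cache on the whole query means two clients asking the same question within the lifetime cost the MPs only one request, which is the "reduce loading" goal above.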
NPM Architecture (4): “Publisher” for Grid Middleware
[Diagram: the JRA4 NPM Publisher collects data via NM-WG from 1..n end-site infrastructures (EDG WP7, home-grown) and feeds the Grid Information System (GIS).]
• The GIS holds data in a summarised form suitable for middleware (e.g. a network cost function)
• The Publisher has two components:
  • Registry: holds information about which MPs to contact regularly for the latest data
  • Data Manager: gathers data from the MPs and publishes it to the GIS in the correct format
• The Publisher is designed to give middleware efficient access to network performance data
• Important: we’re mostly networking people, but this is the Global Grid Forum
  • Very relevant to NMA-RG
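A minimal sketch of the Publisher’s two components, assuming hypothetical `Registry` and `DataManager` classes, a `poll` callable standing in for querying an MP, and a plain dict standing in for the GIS. The cost function (seconds to move 1 GB plus one round-trip time) is invented purely to show the kind of summarised value middleware might consume; the real GIS schema and cost function are not given here.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, str]  # (source site, destination site)


class Registry:
    """Which MPs to contact regularly, per path (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: Dict[Path, List[str]] = {}

    def register(self, path: Path, mp: str) -> None:
        self._entries.setdefault(path, []).append(mp)

    def entries(self) -> Dict[Path, List[str]]:
        return self._entries


def network_cost(bandwidth_mbps: float, rtt_ms: float) -> float:
    # Illustrative cost only (not a JRA4 definition): seconds to move 1 GB
    # (8000 megabits) at the measured bandwidth, plus one round-trip time.
    return 8000.0 / bandwidth_mbps + rtt_ms / 1000.0


class DataManager:
    def __init__(self, registry: Registry,
                 poll: Callable[[str, Path], Tuple[float, float]]) -> None:
        self.registry = registry
        self.poll = poll  # hypothetical: returns (bandwidth_mbps, rtt_ms) from one MP

    def publish(self, gis: Dict[Path, float]) -> None:
        # Poll every registered MP, average per path, write one summary value per path.
        for path, mps in self.registry.entries().items():
            samples = [self.poll(mp, path) for mp in mps]
            bw = mean(s[0] for s in samples)
            rtt = mean(s[1] for s in samples)
            gis[path] = round(network_cost(bw, rtt), 3)


reg = Registry()
reg.register(("site-a", "site-b"), "mp-1")
reg.register(("site-a", "site-b"), "mp-2")
readings = {"mp-1": (300.0, 18.0), "mp-2": (500.0, 22.0)}
gis: Dict[Path, float] = {}
DataManager(reg, lambda mp, path: readings[mp]).publish(gis)
print(gis)  # {('site-a', 'site-b'): 20.02}
```

Middleware then reads a single number per path from the GIS instead of querying MPs itself, which is where the efficiency comes from.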
Diagnostic Tool
• Like perfSONAR, we want to make use of the collected data available via a unified interface, so we are creating a prototype Diagnostic Tool aimed at helping NOCs and GOCs detect and diagnose network problems
• Initial requirements from:
  • joint EGEE JRA4-SA2 user requirements doc
  • UK GOSC and NREN, German NOC, UK projects
  • experience of group members
• The requirements net was not cast wider, as many groups were unsure of:
  • what was available
  • what they wanted/what metrics Grid applications are dependent on
  • what visualisations are possible/the most useful
• So we need a prototype to solicit comments on: a blank piece of paper was staying blank
• Of course, what we can achieve is dependent on the available monitoring infrastructures (WP7, perfmonit etc.)
  • Do we collect all metrics of interest? e.g. traceroute tests have just been added
  • Lack of on-demand tests could be a limiter, although DFN (German NREN) say tests every 5-15 mins are sufficient
  • However, the prototype can be seen as a proof of concept, with these other things coming later
• The kinds of things to be provided are:
  • the usual, e.g. historic plots of available bandwidth
• Can’t/won’t provide:
  • a real-time display of the load on a connection: this is about diagnosing faults, not 24/7 monitoring
  • network topology…although...
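To illustrate how a diagnostic tool might use historic data rather than real-time load, here is a hypothetical heuristic in Python: flag any available-bandwidth sample that falls below half the median of the preceding twelve samples (two hours at a 10-minute test interval). The function name, window and threshold are assumptions, not part of the JRA4 Diagnostic Tool.

```python
from statistics import median
from typing import List, Tuple


def flag_drops(samples: List[Tuple[int, float]], window: int = 12,
               threshold: float = 0.5) -> List[int]:
    """Return timestamps where available bandwidth falls below `threshold` times
    the median of the preceding `window` samples. Heuristic and defaults are
    hypothetical, chosen only to illustrate using historic data for diagnosis."""
    flagged: List[int] = []
    for i in range(window, len(samples)):
        baseline = median(v for _, v in samples[i - window:i])
        ts, value = samples[i]
        if value < threshold * baseline:
            flagged.append(ts)
    return flagged


# Tests every 10 minutes (600 s), inside the 5-15 minute range DFN consider sufficient.
history = [(t * 600, 900.0) for t in range(12)] + [(12 * 600, 300.0), (13 * 600, 880.0)]
print(flag_drops(history))  # [7200]
```

A median baseline is used here because a single earlier bad sample barely moves it, so one fault does not mask the next.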
NSAP
• The GÉANT2-SA3 group plans to deploy Network Service Access Points (NSAPs) in each domain of DANTE’s extended-QoS network:
  • GÉANT2 and NRENs, plus QoS-compliant regional, metropolitan and campus networks
• NSAPs provide access to network services such as BAR
• They will also provide a network topology database via NIS (Network Information Service), for which GN2-SA3 will produce a reference implementation
UK GridMon
• “...design and deploy an infrastructure for network performance monitoring within the UK e-Science community” (June 2002)
• MPs (Monitoring Points) at each UK e-Science Centre
• Full mesh of tests
• Human access (web interface) to monitor performance and find faults
• Plans to add an NM-WG interface for requesting and publishing performance data, aimed at Grid middleware and applications rather than network operators
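A full mesh of tests among n MPs can be enumerated as ordered pairs, giving n(n-1) directional tests. The sketch below assumes directional tests and a simple staggered start schedule; both, and the function names, are illustrative assumptions rather than GridMon’s actual scheduler.

```python
from itertools import permutations
from typing import List, Tuple


def full_mesh(mps: List[str]) -> List[Tuple[str, str]]:
    # Every ordered (source, destination) pair: n MPs give n*(n-1) tests,
    # one per direction, since a path can perform differently each way.
    return list(permutations(mps, 2))


def stagger(pairs: List[Tuple[str, str]], interval_s: int) -> List[Tuple[int, str, str]]:
    # Hypothetical scheduling: spread start times evenly over one test interval
    # rather than starting every test at once.
    step = interval_s / len(pairs)
    return [(round(i * step), src, dst) for i, (src, dst) in enumerate(pairs)]


mesh = full_mesh(["mp-1", "mp-2", "mp-3"])
print(len(mesh))               # 6
print(stagger(mesh, 600)[:2])  # [(0, 'mp-1', 'mp-2'), (100, 'mp-1', 'mp-3')]
```

The quadratic growth is worth noting: each new e-Science Centre adds 2n tests to the mesh, which is one reason test scheduling matters as the infrastructure grows.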
Current UK Work (1)
• Well received and grew interest (e.g. UK HEP/PP community), but…
• Version 1 infrastructure proved to be unsustainable
  • most institutions were helpful, but…
  • varying specs of machines, flavours of Linux, security rules etc.
• V1 MP:
  • Ran tests
  • Stored data locally
  • Served data to human users using a web server running on the MP
  • Would have provided a WS interface using Tomcat running locally
  • Grew interest and was a useful learning exercise
• V2 MP will:
  • Run tests
  • Write data back to central DBs at DL and one other site
  • Revised web interface and WS interface will be provided by machines co-located with the DBs
• The MP is thus much simpler, and the brains of the operation are centralised at two, more accessible, sites
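The V2 design, in which each MP writes its results to two central databases, can be sketched as follows. In-memory SQLite connections stand in for the central DBs at DL and the second site; the `results` table layout and the `write_result` function are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str, str, float]  # (ts, src, dst, metric, value)
SCHEMA = ("CREATE TABLE IF NOT EXISTS results "
          "(ts INTEGER, src TEXT, dst TEXT, metric TEXT, value REAL)")


def write_result(dbs: List[sqlite3.Connection], row: Row) -> int:
    """Write one test result to every central DB and return how many writes
    succeeded; a failure at one site does not block the write to the other."""
    ok = 0
    for db in dbs:
        try:
            with db:  # commits on success, rolls back on error
                db.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?)", row)
            ok += 1
        except sqlite3.Error:
            pass  # a real MP might queue the row locally and retry later
    return ok


# In-memory SQLite databases stand in for the two central sites.
primary, secondary = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (primary, secondary):
    db.execute(SCHEMA)
print(write_result([primary, secondary], (1000, "mp-1", "mp-2", "throughput_mbps", 412.5)))  # 2
```

The MP holds no data and serves no pages, so it needs little more than a test tool and a DB client, which is what makes it easier to sustain across varied machines and security rules.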
Current UK Work (2)
• Revised web interface:
  • status map and graphs as before, plus a human version of the request interface
  • useful contrast with the JRA4 DT:
    • GridMon = UK only, but the DB is co-located and accessed more natively (Perl DBI over TCP)
    • JRA4 DT = can access any NM-WG-compliant infrastructure, but the WS interface is not exactly efficient for graph plotting
• Happy to receive comments/suggestions on MPs and human interfaces
• Lots of different approaches to deployment and dissemination are being used throughout the world. Not necessarily always reinventing the wheel. Hopefully we’ll eventually see what’s best for each scenario.
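Why native DB access suits graph plotting can be illustrated with Python’s DB-API, standing in here for Perl DBI. The `results` table and the `series_for_plot` function are hypothetical, and SQLite stands in for the co-located GridMon database.

```python
import sqlite3
from typing import List, Tuple


def series_for_plot(db: sqlite3.Connection, src: str, dst: str, metric: str,
                    start: int, end: int) -> List[Tuple[int, float]]:
    # One parameterised query returns exactly the points to plot, filtered and
    # ordered by the database itself: no XML request to build or response to parse.
    cur = db.execute(
        "SELECT ts, value FROM results "
        "WHERE src = ? AND dst = ? AND metric = ? AND ts BETWEEN ? AND ? "
        "ORDER BY ts",
        (src, dst, metric, start, end),
    )
    return cur.fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (ts INTEGER, src TEXT, dst TEXT, metric TEXT, value REAL)")
db.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", [
    (1200, "mp-1", "mp-2", "throughput_mbps", 405.0),
    (600, "mp-1", "mp-2", "throughput_mbps", 410.0),
    (600, "mp-2", "mp-1", "throughput_mbps", 380.0),
])
print(series_for_plot(db, "mp-1", "mp-2", "throughput_mbps", 0, 3600))
# [(600, 410.0), (1200, 405.0)]
```

A web-service client would instead wrap the same request in an NM-WG message and parse every returned data point out of XML, a per-point overhead that adds up when drawing long time series.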