Explore the challenges and solution of deploying and supporting a large-scale Multi-Domain Monitoring Service for the Large Hadron Collider Optical Private Network (LHC-OPN), utilizing perfSONAR. Discover the topology, requirements, benefits, and seamless monitoring capabilities across domains.
perfSONAR Multi-Domain Monitoring Service Deployment and Support: The LHC-OPN Use Case Fausto Vetter, Domenico Vicinanza DANTE TNC 2010, Vilnius, 2 June 2010
Agenda • Large Hadron Collider Optical Private Network (LHC-OPN) • Multi-Domain monitoring challenge: • perfSONAR • GÉANT Multi Domain Monitoring Service • GÉANT Service Desk • The LHCOPN case: • Deployment • Support • Monitoring
LHC-OPN • Large Hadron Collider Optical Private Network (LHC-OPN): • Dedicated network to support the LHC experiments • Large amount of data in a grid environment • Network architecture is organized in Tiers • 1 Tier0, 11 Tier1, 140+ Tier2 • Primary users are researchers at different institutes • Requirement: Large amount of data being exchanged • Strategy: Keep traffic segregated from the Internet • Solution: Optical Private Network (LHC-OPN) among Tier 0/1s • Challenge: monitoring effectively in a multi-domain environment
LHC-OPN Topology • Dual-star topology • 10 Gb/s links • Cross-border fibers • Resiliency • Multi-domain [Figure: LHC-OPN Topology]
Monitoring the LHC-OPN: The requirements • Focus of monitoring: • Network Layer (IP) • Physical Layer (Links) • Regular Active Point-to-Point Measurements • One-Way Delay, One-Way Delay Variation, Achievable Bandwidth, Historical Traceroute Changes • Regular Passive Point-to-Point Measurements • Utilization, Input Errors, Packet Discards • End-to-End link monitoring • Managed service • Unified view of the network status and information across all sites • Homogeneous installations and centralized operations
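The two delay metrics listed above can be illustrated with a short calculation. This is a minimal sketch, not the measurement code used in the deployment: the function names and sample timestamps are made up for illustration, and it assumes sender and receiver clocks are synchronized, since one-way delay is only meaningful under that condition. Delay variation is taken here as the difference between consecutive one-way delays.

```python
from statistics import mean


def one_way_delays(send_times, recv_times):
    """Return per-packet one-way delay in seconds (receive minus send)."""
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive timestamp per send timestamp")
    return [r - s for s, r in zip(send_times, recv_times)]


def delay_variation(delays):
    """Return the difference between each delay and the one before it."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# Illustrative timestamps in seconds (hypothetical, not measured data).
send = [0.000, 1.000, 2.000, 3.000]
recv = [0.012, 1.015, 2.011, 3.014]

owd = one_way_delays(send, recv)
ipdv = delay_variation(owd)

print("mean one-way delay (ms):", round(mean(owd) * 1000, 3))
print("largest variation (ms):", round(max(abs(v) for v in ipdv) * 1000, 3))
```

A regular point-to-point measurement would repeat this for every pair of sites and store the results over time, so that a change in delay or variation can be matched against traceroute changes.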
Monitoring the LHC-OPN: The solution - perfSONAR • The Tool: perfSONAR • GÉANT multi-domain monitoring (MDM) tool • Based on Open Grid Forum Standard Monitoring Protocol • Customized, fully managed and supported for LHCOPN • Objective: • Identify network problems across multiple domains • Correctly, efficiently and quickly • Allowing proactive actions • Strategy: • perform network monitoring actions in different network domains • make the information available thanks to a common protocol • cross-domain monitoring capability • access network performance metrics from across multiple domains
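perfSONAR as unifying layer across domains • Each domain (Domain 1 to Domain 4) has its own local monitoring • perfSONAR services sit on top of the domains as a common layer • Clients: perfSONAR UI (visualization), Scripts/API [Figure: perfSONAR services connecting Domains 1-4 to the UI and Scripts/API]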
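The idea of a unifying layer can be sketched in a few lines. This is an illustration only and does not use the perfSONAR protocol or API: the `Metric` record, the per-domain adapter functions, and all values are hypothetical. The point it shows is that once every domain answers in one shared format, a client can merge the answers and reason about the whole path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metric:
    """One measurement in the shared (hypothetical) format."""
    domain: str
    link: str
    name: str
    value: float


# Hypothetical adapters: each wraps a domain's local monitoring and
# returns results in the shared Metric format. Values are made up.
def domain_1() -> List[Metric]:
    return [Metric("Domain 1", "link-a", "utilisation", 0.42)]


def domain_2() -> List[Metric]:
    return [Metric("Domain 2", "link-b", "utilisation", 0.87)]


def collect(sources: Dict[str, Callable[[], List[Metric]]]) -> List[Metric]:
    """Query every domain and merge the answers into one list."""
    merged: List[Metric] = []
    for fetch in sources.values():
        merged.extend(fetch())
    return merged


def busiest(metrics: List[Metric]) -> Metric:
    """Return the utilisation metric with the highest value."""
    candidates = [m for m in metrics if m.name == "utilisation"]
    return max(candidates, key=lambda m: m.value)


view = collect({"Domain 1": domain_1, "Domain 2": domain_2})
print(busiest(view))
```

Keeping the adapters separate from the merge step mirrors the slide: local monitoring stays under each domain's control, and only the exchange format is shared.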
Monitoring the LHC-OPN: The benefits • Effective monitoring across the LHC-OPN domains • perfSONAR enables multi-domain monitoring • Problems can be tracked through the participating domains from a single interface • …proactively solving problems across domains • Effective, distributed monitoring can identify problems before users are affected • … through a customized web portal • Monitoring portal designed according to LHCOPN needs • Easy to integrate into the workflows of the NOCs involved • Fewer disruptions and faster recovery • Easy to initiate and foster collaborative efforts • Fully managed solution: • Low overhead for the Tier0/1 network operators involved • Configuration, Operation and Support carried out by GÉANT SD
perfSONAR at LHC-OPN • 12 sites (1 Tier0, CERN, and 11 Tier1) involved • Several countries across Europe, Asia and America • Access to network measurement data from multiple network domains • Customized version of perfSONAR MDM service for Tier0/1 sites (main contributor to LHCOPN operations) • Customized visualization tool accessed through: • Dedicated web portal • Specific weather maps and further diagnostic tools to visualize measurement results • Monitoring tools, hardware and operating system packed in monitoring boxes: • Easily deployed at any location • Remotely accessible by the service desk for operations and support
GÉANT MDM Service Design for LHCOPN • Two servers installed at each site (Tier0 and Tier1): • Server 1 (HADES): • regular active measurements: one-way delay, one-way delay variation, achievable bandwidth, historical traceroute changes • Server 2 (MDM): • regular passive measurements collecting interface utilisation, input error and packet discard statistics from the site's network elements • Each site provided: • Gigabit port on the border router • Switch • Time Sources • DNS Servers
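Interface utilisation, one of the passive statistics above, is commonly derived from two readings of an interface octet counter. The sketch below shows that arithmetic under stated assumptions: the function name and sample numbers are hypothetical, the counter is assumed to be 64 bits wide, and at most one counter wrap is assumed between readings. It does not show how the deployed service collects the counters.

```python
def utilisation(prev_octets, curr_octets, interval_s, capacity_bps,
                counter_bits=64):
    """Return link utilisation as a fraction of capacity.

    prev_octets / curr_octets: two readings of an interface octet counter.
    interval_s: seconds between the two readings.
    capacity_bps: link capacity in bits per second.
    counter_bits: counter width, used to handle a single wrap-around.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # The modulo keeps the difference correct if the counter wrapped once.
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta_octets * 8 / (interval_s * capacity_bps)


# Hypothetical readings taken 300 s apart on a 10 Gb/s link.
TEN_GBPS = 10_000_000_000
sample = utilisation(0, 187_500_000_000, 300, TEN_GBPS)
print(f"utilisation: {sample:.0%}")
```

Input errors and packet discards can be handled the same way, as counter differences over the polling interval, without the division by capacity.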
GÉANT Application Service Desk • Deployment carried out by the GÉANT Application Service Desk • Dedicated Staff • Manages User Relationships • Responsible for Incident Management • Interacts with Problem Management/Product Management to Improve Products • Acts as a Single Point of Contact: • Usage of Products • Deployment of Products • Debugging Issues on Products • Focus on transition and operations of the services delivered
GÉANT MDM Service Transition • Service deployment: two workflows • Server 1: OS and Software installed and configured by a GÉANT partner • Server 2: OS and Software entirely installed and configured remotely • Phase details: • Pre-Shipment: gather information about deployment details • Pre-Shipment Form • Shipment: servers shipped to GÉANT partner and customer • Receive Boxes: customer and configuration partner receive boxes • Preparation: • Pre-Deployment Form • Third-party supplier prepares servers • Physical Installation • Deployment: software installation • Configuration: service configuration • Validation
perfSONAR services monitoring • Service Monitoring Infrastructure (based on Nagios+Cacti): • Customised set of testing scripts and health checks • 35 checks per server, covering hardware, software and services • Automatic notification, detailed history • Three-layer monitoring: • Hardware layer: CPU, memory, disk space, network interfaces, TCP/UDP traffic, temperature • Resource layer: login attempts, Tomcat RTT, eXist RTT, MySQL, NTP • Service layer: perfSONAR services availability and performance • Additional tools: • Syslog server (with MySQL support) • Security log auditing (with automatic email report tools)
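A hardware-layer health check of the kind listed above can be sketched as a Nagios-style plugin. This is not one of the 35 checks from the deployment: the thresholds and function names are assumptions chosen for illustration. What it follows is the standard Nagios plugin convention of reporting status through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a one-line message.

```python
import shutil
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL",
          UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a Nagios status code.

    The 80/90 thresholds are illustrative defaults, not deployment values.
    """
    if not 0.0 <= used_pct <= 100.0:
        return UNKNOWN
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status code, message) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - cannot size filesystem at {path}"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    return status, f"DISK {LABELS[status]} - {used_pct:.1f}% used on {path}"


def main():
    """Entry point when run as a plugin: print the message, exit with the code."""
    status, message = check_disk()
    print(message)
    sys.exit(status)
```

To use it as a plugin, call `main()` at the end of the script. Separating `classify` from the measurement keeps the threshold logic testable without touching a real disk, and the same pattern extends to the resource and service layers by swapping the measurement function.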
GÉANT MDM Service Operations: incident management procedures • Well-defined procedures for Incident Management [Figures: Incident Management workflow; Incident Management workflow with third-party supplier involved]
Conclusions • GÉANT Application Service Desk: • Effective single point of contact in complex deployments • LHC-OPN use case: • great opportunity for service & support infrastructure • Reasons for a successful deployment: • Preparation phase is crucial • Adequate tools for event and incident management • Customer collaboration was the main factor in the deployment • Continuous service improvement • Periodic meetings with involved parties • Quality audits of the deployment
Final Remarks • Thanks to: • perfSONAR community • GÉANT partners • DANTE • perfSONAR development team • CERN and its partners • Thanks for your attention • Any questions and/or comments?