Introduction LHCOPN dashboard (proposal functional design) Monitor Working Group: • Initiated in Bologna 10th & 11th December 2009 • WLCG MB mandate (see URL below) • First meeting 22nd January 2010 • TC 26th May 2010 • TC 15th June 2010 • Barcelona 28th and 29th June 2010: first proposal Chairman: John Shade (CERN) Website: https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG The full version of the functional design proposal is available at the above URL. Presenter: Hanno Pet <hanno.pet@sara.nl> (NL-T1 / SARA)
The problem LHC experiments and WLCG users do not have enough insight into the functioning of the LHCOPN because: • Monitoring is decentralized at T0/T1 sites • Monitoring is not accessible to them The dashboard should solve these problems!
Requirements (1/4) The requirements of the dashboard are as follows: • Must only provide information about the LHCOPN, keeping in mind the way application layers use the LHCOPN. This means a full mesh of measurements is required • Must provide correct and up-to-date information about each site’s IPv4 connectivity in the LHCOPN • Must be simple for the LHC experiments and the WLCG user community • Must provide more in-depth information for the T0/T1 site router operators. The router operators must be able to drill down into the dashboard to see which measurements are causing the degraded or down status
Requirements (2/4) • Must display a full mesh of end-to-end IPv4 unicast connectivity in the LHCOPN between each T0/T1 site • Must use the application programming interface (API) of the perfSONAR-MDM measurement points to collect the data which is necessary for the functioning of the dashboard • Must collect and display One Way Delay data gathered by the perfSONAR-MDM measurement points (and other parameters in the future) • Must store (historical) data in its own database
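A minimal sketch of what "full mesh" means for the measurements, assuming that one-way delay is measured separately in each direction, so every ordered pair of distinct sites counts as one measurement. The site labels and the `full_mesh` helper are hypothetical and only serve the illustration; the real LHCOPN site list is not given in these slides.

```python
from itertools import permutations

# Hypothetical site labels (assumption, not the real LHCOPN site list).
SITES = ["T0", "T1-A", "T1-B", "T1-C"]


def full_mesh(sites):
    """Return every ordered (source, destination) pair of distinct sites."""
    return list(permutations(sites, 2))


pairs = full_mesh(SITES)
# With n sites a full mesh contains n * (n - 1) directed measurements.
print(len(pairs))
```

With four placeholder sites this yields 12 directed pairs; the count grows quadratically with the number of sites, which matters when choosing the collection interval.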
Requirements (3/4) • Must add new data from perfSONAR-MDM measurement points to its own database every <to be defined> minute(s) • Must refresh dashboard status each <to be defined> minute • Must provide an API for T0/T1 sites to generate alarms in their own NMS • Must be able to make end-to-end IPv4 unicast connectivity reports
Requirements (4/4) • Must be accessible via a web (https) interface for the LHC experiments and WLCG users with a grid certificate • More detailed information will be available for the T0/T1 site router operators with a grid certificate • Must provide an explanation of the impact if end-to-end IPv4 unicast connectivity between two sites becomes degraded or down, or if no data is available
Current perfSONAR-MDM implementation in LHCOPN (1/2) The GEANT application service desk has installed perfSONAR-MDM measurement points at each T0/T1 site with the following applications/tools: • Weathermap based on End to End Monitoring (E2EMON) information • E2EMON information (no E2EMON measurement point) • perfSONAR User Interface (UI) • Alarm Service (prototype based on Nagios)
Current perfSONAR-MDM implementation in LHCOPN (2/2) • Hades Performance Measurements • Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1 Gbit/s TCP Bandwidth Control Test) • One Way Delay (OWD) measurements using OWAMP • One Way Delay Variance / Jitter (OWDV) measurements using OWAMP • Packet loss (measured between Hades nodes) • Traceroute (number of hops between each pair of Hades nodes) • Possibly duplicate packets (measured between Hades nodes) • Possibly out-of-order packets (measured between Hades nodes)
Dashboard approach The first version of the dashboard must be based on: • The “keep it simple” principle • The data which perfSONAR-MDM is already collecting at the moment The proposal is to use One Way Delay (OWD) measurements, made with the One Way Active Measurement Protocol (OWAMP), so that the first version of the dashboard can “monitor” end-to-end IPv4 connectivity between each pair of sites in the LHCOPN (full mesh). OWAMP is therefore “only” used to monitor connectivity and not yet to monitor the delay itself. Later versions of the dashboard could include parameters that are new(er) to perfSONAR-MDM (e.g. packet loss, traceroute, achievable bandwidth, interface status, BGP status, OWD and OWDV).
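One possible way to turn OWD measurements into a connectivity figure, sketched under assumptions that are not in the functional design: each probe in the timeframe is represented either by its measured delay in milliseconds or by `None` if it never arrived, and availability is the percentage of probes that arrived. The `availability` function and this data representation are hypothetical and do not reflect the perfSONAR-MDM API.

```python
def availability(samples):
    """Percentage of probes in the timeframe that arrived.

    samples: list of one-way delays in ms, with None for a lost probe
             (assumed representation, not a perfSONAR-MDM format).
    Returns None when the timeframe contains no samples at all.
    """
    if not samples:
        return None
    received = sum(1 for delay in samples if delay is not None)
    return 100.0 * received / len(samples)


# The delay values themselves are ignored; only arrival counts.
print(availability([12.5, None, 13.1, 12.9]))
```

This matches the idea that the measured delay is not yet evaluated: a probe that arrives late still counts as connectivity.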
Status on the dashboard The status of the end-to-end IPv4 unicast connectivity between sites must be shown on the dashboard in the following way: • Normal, availability of the end-to-end IPv4 unicast connectivity between site A and B is 100% in the given timeframe • Degraded, availability of the end-to-end IPv4 unicast connectivity between site A and B is less than 100% but more than 0% in the given timeframe • Down, availability of the end-to-end IPv4 unicast connectivity between site A and B is 0% in the given timeframe • No data, the dashboard server can connect to the perfSONAR-MDM measurement point at the site but receives no data from the measurement archives.
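The four states can be sketched as a small classification function. This is an illustration only: the `classify` name, the percentage input, and the use of `None` for "no data received" are assumptions, as is the choice to let Down take precedence when availability is exactly 0%.

```python
def classify(availability_pct):
    """Map availability in a timeframe (0-100, or None) to a dashboard status."""
    if availability_pct is None:
        # Measurement point reachable, but the archive returned no data.
        return "No data"
    if availability_pct >= 100.0:
        return "Normal"
    if availability_pct <= 0.0:
        return "Down"
    # Anything strictly between 0% and 100%.
    return "Degraded"


for value in (100.0, 99.9, 0.0, None):
    print(value, "->", classify(value))
```

Any availability strictly between 0% and 100% maps to Degraded, so a single lost probe in the timeframe is enough to change the status; whether that is too sensitive depends on the timeframe chosen.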
Notifications Notification should be done via: • E-mail • RSS feeds • API for integration into T0/T1 site NMS systems for raising alarms • Grid notifications for LHC experiments We need to discuss this with grid notification experts at the LHC experiments and ask them how they would integrate this in their dashboards.
Questions Interesting to know: • Is this the right direction for the dashboard? • Is perfSONAR-MDM able to support this? • Is it possible to use OWAMP like this? • Are T0/T1 sites going to use this? • Are the LHC experiments going to use this? • Are WLCG users (physicists) going to use this? • Do we agree on the functional design?
WRAP UP Read the full version of the functional design! Please send your comments about this functional design to hanno.pet@sara.nl before the 5th of July 2010!! Thank you for your attention!