September 2010 GDB
LHCOPN Update
John Shade / CERN IT-CS

Working Groups
• LHCOPN Operations and Monitoring WGs
• F2F LHCOPN meetings
  • London 8/9 March (http://indico.cern.ch/conferenceDisplay.py?confId=80755)
  • Barcelona 28/29 June (http://indico.cern.ch/conferenceDisplay.py?confId=88698)
• Operations WG
  • Quarterly phone conferences
  • Track correlation between outages and GGUS tickets
• Monitoring WG
  • Conference calls in May/June, numerous e-mail exchanges
  • perfSONAR MDM setup & deployment
  • LHCOPN Dashboard design
• Mailing list: LHCOPN-Interest@cern.ch
J. Shade/GDB LHCOPN Update

Monitoring
• Working with DANTE to get a robust MDM solution in place (perfSONAR rollout had stalled)
• Clarified how to access performance data, and defined requirements for a dashboard for visualisation:
  • See https://twiki.cern.ch/twiki/bin/view/LHCOPN/MonWG
• Comments on Requirements document are still welcome!

Monitoring
• Missing a central view of LHCOPN
• Weathermap and e2emon applications restricted to GEANT portal
• HADES data:
  • Bandwidth Test Control / Achievable Bandwidth (BWCTL, automated 1 Gbit/s TCP Bandwidth Control Test)
  • One Way Delay (OWD)
  • One Way Delay Variance / Jitter (OWDV)
  • Packet loss
  • Traceroute (number of hops between two HADES nodes)
  • Duplicate packets
  • Out of order packets

Initial (simple) algorithm
a) Site status is up when OWD is within +/-15% of baseline and packet loss is less than 0.1% per five minutes
b) Site status is down when packet loss = 100% per five minutes
c) Site status is degraded when measurement values are between a) and b)

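The three status rules above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the dashboard's actual code: the `Sample` record, its field names, and the `site_status` function are hypothetical, and it assumes one aggregated OWD value and one packet-loss percentage per five-minute window, with a positive baseline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One five-minute measurement window (hypothetical record layout)."""
    owd_ms: float       # measured one-way delay, in milliseconds
    baseline_ms: float  # baseline one-way delay for this path (assumed > 0)
    loss_pct: float     # packet loss over the window, in percent (0-100)


def site_status(sample: Sample) -> str:
    """Classify a window as 'up', 'down' or 'degraded'."""
    # Down: every packet in the window was lost.
    if sample.loss_pct >= 100.0:
        return "down"

    # Up: OWD within +/-15% of baseline and loss below 0.1%.
    owd_ok = abs(sample.owd_ms - sample.baseline_ms) <= 0.15 * sample.baseline_ms
    if owd_ok and sample.loss_pct < 0.1:
        return "up"

    # Degraded: anything between the two cases above.
    return "degraded"


if __name__ == "__main__":
    print(site_status(Sample(owd_ms=10.4, baseline_ms=10.0, loss_pct=0.0)))
    print(site_status(Sample(owd_ms=13.0, baseline_ms=10.0, loss_pct=0.0)))
    print(site_status(Sample(owd_ms=10.0, baseline_ms=10.0, loss_pct=100.0)))
```

Checking the "down" case first keeps the "degraded" branch a simple fall-through, which matches its definition as whatever lies between the other two states.
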
Prototype Dashboard
Where do we go from here?
• DANTE baulked at the idea of developing their prototype further and supporting it
• SARA and CERN have picked up the gauntlet
• SARA developers have tested XML query/responses against the central HADES repository at DFN
• TOM team leader is evaluating how best to develop/integrate the LHCOPN dashboard
• Sites already have local monitoring, but we need to provide a central view!
• Nagios probes for sites are also expected

Upcoming Events
• Next F2F LHCOPN meeting will take place at CERN on 7th-8th October
  • Agenda: http://indico.cern.ch/conferenceDisplay.py?confId=102716
  • Includes participants from Internet2, DANTE, T1s etc.
• Topics to be covered include:
  • Tier2 Connectivity Requirements
  • Service Level Definition
  • GGUS
  • Monitoring
  • Operations