OCC, GOD, COD, CIC… a survival guide to the EGEE grid operations acronyms…
Diana for IT-GD-OPS
Overview
• Who we are
• What we do in EGEE operations
• The tools we use
I will not talk about the EGEE operations activities themselves, only about the tools we use!
Who we are
Danica Stojiljkovic, Judit Novak, Diana Bosio, Maite Barroso, Maria Dimou, David Collados Polidura, Farida Naz, Nick Thackray, Steve Traylen, Konstantin Skaburkas, Zdenek Sekera, Romain Wartel, John Shade, Antonio Retico
What we do
• OCC (Operations Coordination Centre)
  • Maite, Nick, John, Maria
• CERN ROC (Regional Operations Centre)
  • Antonio, Danica, Diana, Farida, Steve
• User support and GGUS (Global Grid User Support)
  • Maria, Diana
• SAM (Site Availability Monitoring) service management
  • John, David, Judit, Konstantin
• MPS (Middleware Preview Services) and MQS (Middleware Quality Services), aka PPS
  • Antonio, Danica, Konstantin, Farida
• VOMS service management
  • Steve, Maria
• Gridview
  • Zdenek
• OSCT (Operational Security Coordination Team)
  • Romain
The tools
• GGUS: the main ticketing system
• GOCDB: the site database
• SAM: site monitoring
• CIC portal: the “operations website”
• EGEE broadcast
• COD dashboard
• SAMAP: one of the many SAM user interfaces
• Gstat: a simple overview and statistics
• Gridview: a monitoring interface
• Gridmap: a view of the grid
To access the full functionality of all these tools, you need a certificate from one of the grid CAs.
The tools: GGUS
• Global Grid User Support
• The reference ticketing system of the grid
• http://ggus.org
• You probably already know it, as most of you are (or will be) part of the “software” support units
• It is also a documentation repository for users
• And a way to search tickets and other useful grid documentation, aka “the knowledge base”
Feedback of any kind and further requirements are always considered by the GGUS team.
The tools: GOCDB
• Grid Operations Centres Database
• The database of all sites in EGEE
• Used to declare the names of
  • Sites
  • Service nodes
  • Service managers
• Used to declare (un)scheduled downtimes
• http://goc.gridops.org
If you are not listed there, you are not in the EGEE grid (production and pre-production)!
The tools: SAM
• Used to monitor production sites
  • https://lcg-sam.cern.ch:8443/sam/sam.py
• Also used to monitor uncertified sites
  • https://lcg-sam.cern.ch:8443/sam-uncert/sam.py
• Tests are sent from a central server (hosted at CERN) to every site registered in the GOCDB with monitoring switched on, plus a few ad-hoc sites
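The following is a minimal sketch of the target selection described above. It uses a simplified, hypothetical site record: the `Site` class, its `monitored` flag and `sites_to_test` are illustrations, not the real GOCDB schema or SAM code. The central server tests every registered site with monitoring switched on, plus a few ad-hoc sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical, simplified GOCDB entry (not the real GOCDB schema)."""
    name: str
    monitored: bool  # stands in for the "monitoring on" flag


def sites_to_test(registered: list, ad_hoc: list) -> list:
    """Names of the sites the central SAM server would send tests to:
    every registered site with monitoring on, then any ad-hoc extras,
    without duplicates and in a stable order."""
    targets = [site.name for site in registered if site.monitored]
    for name in ad_hoc:
        if name not in targets:
            targets.append(name)
    return targets
```

For example, `sites_to_test([Site("SITE-A", True), Site("SITE-B", False)], ["SITE-X"])` returns `["SITE-A", "SITE-X"]`.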
The tools: CIC portal
• Core Infrastructure Centre portal
  • For historical reasons dating back to EGEE
• “The” operations portal
• Has several tabs giving different views, depending on the role of the person (RC, ROC, VO, COD…)
• Hosts the COD dashboard
• Used to send EGEE broadcasts
• Stores the VO ID cards
  • Basic information about the VO
  • VO manager
  • Information on the VOMS configuration
• http://cic.gridops.org
The tools: EGEE broadcast
• Used to announce downtimes and other important messages
• Hosted on the CIC portal
• Used to communicate with users
• Used to communicate with the site administrators of other grid sites
• Saves users, site administrators and VO managers from having to remember everybody’s mailing-list addresses
• Groups recipients into logical units, e.g.
  • All sites under one ROC
  • All (pre-)production sites
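The grouping into logical units amounts to turning a target such as “all sites under one ROC” into a list of addresses. The sketch below illustrates this under stated assumptions. `SiteContact`, `broadcast_recipients` and the `example.org` addresses are hypothetical and do not reflect the CIC portal’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteContact:
    """Hypothetical site record: name, ROC, infrastructure, admin mailing list."""
    name: str
    roc: str
    production: bool  # False means pre-production
    admin_list: str


def broadcast_recipients(sites, roc: Optional[str] = None,
                         production: Optional[bool] = None) -> list:
    """Expand a logical target group into a sorted, de-duplicated list of
    mailing-list addresses. None means "do not filter on this attribute"."""
    return sorted({
        s.admin_list
        for s in sites
        if (roc is None or s.roc == roc)
        and (production is None or s.production == production)
    })
```

With `roc="ROC-1"`, only the sites of that ROC are addressed. With `production=True`, every production site is addressed, whatever its ROC. The sender never types the addresses by hand.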
The tools: the COD dashboard
• Used by the CIC On Duty (CODs) to monitor sites daily
• Alarms are raised and, if deemed necessary (as is often the case), a ticket is opened
  • A COD ticket always corresponds to a GGUS ticket
  • In parallel, an e-mail based on a template is sent to the site
• Several escalation steps
  • Tickets are followed up by the CODs, who take escalation steps if deemed necessary
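The follow-up and escalation loop can be pictured as a small state machine. This is a hypothetical sketch only. The step names in `STEPS`, the `CodTicket` class and the 3-day `wait_days` default are assumptions, not the real COD procedure. The one detail taken from the slide is that each COD ticket maps to one GGUS ticket.

```python
from dataclasses import dataclass, field

# Hypothetical escalation ladder; the real COD procedure defines its own steps.
STEPS = ["opened", "reminder 1", "reminder 2", "escalated"]


@dataclass
class CodTicket:
    site: str
    ggus_id: int          # each COD ticket corresponds to one GGUS ticket
    step: int = 0         # index into STEPS
    history: list = field(default_factory=list)

    def follow_up(self, solved: bool, days_waiting: int, wait_days: int = 3) -> str:
        """One COD review of the ticket: close it if the problem is solved,
        otherwise climb one step once the (assumed) waiting period has
        elapsed, never beyond the last step."""
        if solved:
            status = "closed"
        elif days_waiting >= wait_days and self.step < len(STEPS) - 1:
            self.step += 1
            status = STEPS[self.step]
        else:
            status = STEPS[self.step]
        self.history.append(status)
        return status
```

Keeping the history on the ticket mirrors the idea that the CODs follow tickets up over several days rather than in a single action.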
SAMAP
• Accessible via the COD dashboard
• User interface for SAM test submission
• Hosted and maintained at Poznan, but uses two dedicated WMSs, one at CERN and the other at CNAF
• Used by the CODs to send one-shot SAM tests to specific sites
• Can also be used to set up a cron job that sends repeated tests over a few days (max. 3)
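The repeated-submission option can be pictured as a simple schedule generator. This is a hypothetical sketch, not SAMAP’s code. `submission_times` and its parameters are invented for illustration; only the 3-day cap comes from the slide.

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(days=3)  # the "max. 3" days quoted on the slide


def submission_times(start: datetime, every: timedelta, span: timedelta) -> list:
    """Times at which a one-shot SAM test would be resubmitted, from start
    to start + span inclusive, refusing spans over the 3-day maximum.
    (Hypothetical helper, not part of SAMAP.)"""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    if span > MAX_SPAN:
        raise ValueError("repeated submissions are limited to 3 days")
    times = []
    t = start
    while t <= start + span:
        times.append(t)
        t += every
    return times
```

With a 12-hour interval over the full 3 days, this yields seven submissions, including both endpoints.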
Gstat
• Collects useful information on the grid sites
• Has evolved to host several grids, not just EGEE
• The first tool of its kind
• Very useful as a first debugging tool, to spot when something is going wrong
The tools: Gridview
• Interface to SAM
• Developed by the Gridview team at BARC (India)
• Also collects monitoring data from other sources
• Used to measure the availability of sites
• Useful for extracting graphs with various views
• http://gridview.cern.ch/GRIDVIEW/
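Measuring availability boils down to counting the time slots in which a site passed its tests. The following is a minimal sketch under an assumed rule, not Gridview’s actual algorithm. A slot counts as “up” only if every critical test in it passed, and slots with no results are left out. The `availability` function and the test names in the example are hypothetical.

```python
from typing import Optional


def availability(slots: list) -> Optional[float]:
    """Fraction of time slots in which a site was 'up'.

    Each slot is a dict mapping a critical test name to True (passed) or
    False (failed). Assumed rule: a slot is up only if every critical test
    in it passed; empty slots (no results) count as unknown and are left
    out of both numerator and denominator.
    """
    known = [slot for slot in slots if slot]
    if not known:
        return None
    up = sum(1 for slot in known if all(slot.values()))
    return up / len(known)
```

Consider four slots where one slot has no results and one has a failed test. Two of the three known slots are up, so the availability is 2/3.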
Gridmap
• New tool, still under development
• Gives a two-dimensional view of the grid
• Very nice visualization approach
• Sites are represented by squares whose size is proportional to the number of CPUs they have
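The slide does not say whether a square’s side or its area tracks the CPU count. The following is a minimal sketch assuming that area is proportional to CPUs. This is a common treemap-style choice, but it is not confirmed as Gridmap’s method. The hypothetical `square_sides` helper only computes sizes. Packing the squares into a layout is not shown.

```python
import math


def square_sides(cpus: dict, canvas_area: float) -> dict:
    """Side length of each site's square, assuming the squares' areas are
    proportional to the sites' CPU counts and together fill canvas_area.
    Only the sizes are computed; arranging them (layout) is not shown."""
    total = sum(cpus.values())
    if total <= 0:
        raise ValueError("need at least one CPU in total")
    return {site: math.sqrt(canvas_area * n / total) for site, n in cpus.items()}
```

Because the side grows with the square root of the CPU count, a site with 4 times as many CPUs gets a square only twice as wide, but with 4 times the area.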
Were you listening?
• So, now that you are experts…
  • What was the (deliberate) mistake?
  • What does COD stand for?
  • What is OCC?
Thanks for listening, and I hope it was useful!