OCC, GOD, COD, CIC… a survival guide to the EGEE grid operations acronyms…

Presentation Transcript


  1. OCC, GOD, COD, CIC… a survival guide to the EGEE grid operations acronyms… Diana for IT-GD-OPS

  2. Overview • Who we are • What we do in EGEE operations • The tools we use I will not talk about the EGEE operations activities themselves, only about the tools we use!

  3. Who we are • Danica Stojiljkovic, Judit Novak, Diana Bosio, Maite Barroso, Maria Dimou, David Collados Polidura, Farida Naz, Nick Thackray, Steve Traylen, Konstantin Skaburkas, Zdenek Sekera, Romain Wartel, John Shade, Antonio Retico

  4. What we do • OCC (Operations Coordination Centre) • Maite, Nick, John, Maria • CERN ROC (Regional Operations Centre) • Antonio, Danica, Diana, Farida, Steve • User support and GGUS (Global Grid User Support) • Maria, Diana • SAM (Site Availability Monitoring) service management • John, David, Judit, Konstantin • MPS (Middleware Preview Services) and MQS (Middleware Quality Services), a.k.a. PPS (Pre-Production Service) • Antonio, Danica, Konstantin, Farida • VOMS service management • Steve, Maria • Gridview • Zdenek • OSCT (Operational Security Coordination Team) • Romain

  5. The tools • GGUS: the main ticketing system • GOCDB: the site database • SAM: site monitoring • CIC portal: the “operations website” • EGEE broadcast • COD dashboard • SAMap: one of the many SAM user interfaces • Gstat: simple overview and statistics • Gridview: a monitoring interface • Gridmap: a view of the grid To access the full functionality of all these tools, you need a certificate issued by one of the grid CAs (Certification Authorities)

  6. The tools: GGUS • Global Grid User Support • The reference ticketing system of the grid • http://ggus.org • You probably already know it, as most of you are (or will be) part of the “software” support units • It is also a documentation repository for users • And a way to search the tickets and other useful grid documentation, a.k.a. “the knowledge base” Any feedback and further requirements will always be considered by the GGUS team

  7. GGUS homepage

  8. Structure of GGUS support units

  9. The tools: GOCDB • Grid Operations Centres Database • It is the DB for all sites in EGEE • Used to declare the names of • Sites • Service nodes • Service managers • Used to declare (un)scheduled downtimes • http://goc.gridops.org If you are not listed there, you are not in the EGEE grid (production and pre-production)!

  10. GOCDB: CERN site

  11. The tools: SAM • Used to monitor production sites • https://lcg-sam.cern.ch:8443/sam/sam.py • Also used to monitor uncertified sites • https://lcg-sam.cern.ch:8443/sam-uncert/sam.py • Tests are sent from a central server (hosted at CERN) to all the sites registered in the GOCDB with monitoring switched on, and to a few more ad hoc sites

  12. SAM screenshot

  13. The tools: CIC portal • Core Infrastructure Centre portal • For historical reasons dating back to EGEE • “The” operations portal • Has several tabs giving different views, depending on the role of the person (RC, ROC, VO, COD…) • Tool where the COD dashboard is hosted • Tool used to send an EGEE broadcast • Tool where the VO ID cards are stored • They store basic information about the VO • VO manager • Information on VOMS configuration • http://cic.gridops.org

  14. CIC portal homepage

  15. The tools: EGEE broadcast • Used to announce downtimes and other important messages • Hosted on the CIC portal • Used to communicate with users • Used to communicate with site administrators of other grid sites • It saves users, site administrators and VO managers from having to remember everybody’s mailing-list addresses • It groups logical units • All sites under one ROC • All (pre-)production sites
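
The grouping idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not the CIC portal's implementation: the `Site` record, its fields, `resolve_targets()` and the placeholder addresses are all invented for the example. A broadcast target is expressed as filters (one ROC, production or pre-production), and the tool resolves it to a deduplicated list of recipients, so nobody has to remember individual mailing lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    # Hypothetical site record; field names are assumptions for this sketch.
    name: str
    roc: str          # Regional Operations Centre the site belongs to
    production: bool  # True = production site, False = pre-production site
    admin_email: str


def resolve_targets(sites, roc=None, production=None):
    """Return the sorted, deduplicated admin addresses of one logical group.

    roc=None means "any ROC"; production=None means "production and
    pre-production sites alike".
    """
    addresses = {
        s.admin_email
        for s in sites
        if (roc is None or s.roc == roc)
        and (production is None or s.production == production)
    }
    return sorted(addresses)


if __name__ == "__main__":
    sites = [
        Site("SITE-A", "CERN", True, "admins@site-a.example"),
        Site("SITE-B", "CERN", False, "admins@site-b.example"),
        Site("SITE-C", "Italy", True, "admins@site-c.example"),
    ]
    print(resolve_targets(sites, roc="CERN"))       # all sites under one ROC
    print(resolve_targets(sites, production=True))  # all production sites
```

Deduplicating through a set matters when several sites share one admin list: each address then receives the broadcast only once.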

  16. A BROADCAST screenshot

  17. The tools: the COD dashboard • Used by the CIC On Duty (CODs) to monitor sites daily • Alarms are raised and, if deemed necessary (as is often the case), a ticket is opened • A COD ticket always corresponds to a GGUS ticket • An e-mail is sent to the site in parallel, according to a template • Several escalation steps • Tickets are followed up by the CODs, and escalation steps are taken if deemed necessary

  18. COD dashboard

  19. SAMap • Accessible via the COD dashboard • User interface for submitting SAM tests • Hosted and maintained at Poznan, but uses two dedicated WMS, one at CERN, the other at CNAF • Used by the CODs to send one-shot SAM tests to specific sites • Can also be used to set up a cron job that sends several tests over a few days (max. 3)
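
The repeated-submission schedule described above can be sketched as follows, assuming tests are sent at a fixed interval. `submission_times()` and its parameters are hypothetical and not part of SAMap; only the three-day cap comes from the slide.

```python
from datetime import datetime, timedelta

MAX_DAYS = 3  # cap on repeated test submission, as stated on the slide


def submission_times(start, interval_hours, days):
    """List the times at which one-shot tests would be sent.

    Hypothetical helper: submissions start at `start`, repeat every
    `interval_hours`, and stop before `start + days`.
    """
    if days > MAX_DAYS:
        raise ValueError(f"at most {MAX_DAYS} days of repeated tests are allowed")
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    end = start + timedelta(days=days)
    step = timedelta(hours=interval_hours)
    times = []
    t = start
    while t < end:
        times.append(t)
        t += step
    return times


if __name__ == "__main__":
    for t in submission_times(datetime(2008, 1, 1), interval_hours=6, days=1):
        print(t.isoformat())
```

Rejecting requests longer than the cap up front, rather than silently truncating them, makes the limit visible to whoever schedules the tests.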

  20. SAMap screenshot

  21. gstat • Collects useful information on the grid sites • Has evolved to host several grids, not just EGEE • First tool of this kind • Very useful as a first debugging tool to spot when something is going wrong

  22. Gstat screenshot

  23. The tools: gridview • Interface to SAM • Developed by the gridview team at BARC (India) • Also collects monitoring data from other sources • Used to measure site availability • Useful to extract graphs with various views • http://gridview.cern.ch/GRIDVIEW/

  24. Gridview screenshot: data transfer

  25. Gridview job

  26. Gridmap • New tool, still under development • Gives a two-dimensional view of the grid • Very nice visualization approach • Sites are represented by squares whose size is proportional to the number of CPUs they have
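
The sizing rule above can be sketched as follows, assuming "size" means the area of each square (as in a treemap), so the side length grows with the square root of the CPU count. `square_sides()` and the site names are hypothetical; Gridmap's actual layout code is not described in the talk.

```python
import math


def square_sides(cpus_by_site, max_side=100.0):
    """Side length per site such that square AREA is proportional to CPUs.

    The site with the most CPUs gets a square of side `max_side`; every
    other side is scaled by sqrt(cpus / largest).
    """
    largest = max(cpus_by_site.values())  # raises ValueError if empty
    if largest <= 0:
        raise ValueError("at least one site must report a positive CPU count")
    return {
        site: max_side * math.sqrt(cpus / largest)
        for site, cpus in cpus_by_site.items()
    }


if __name__ == "__main__":
    print(square_sides({"SITE-A": 400, "SITE-B": 100, "SITE-C": 25}))
```

With 400, 100 and 25 CPUs the squares get sides of 100, 50 and 25, so their areas stand in the ratio 16:4:1, matching the CPU counts. Scaling the side linearly instead would exaggerate large sites (a 16:4:1 ratio of sides means 256:16:1 in area).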

  27. Gridmap: screenshot

  28. Were you listening? • So, now that you are experts… • What was the (deliberate) mistake? • What does COD stand for? • What is OCC? Thanks for listening, and I hope it was useful!