
Operations structure of the INFN-GRID/Grid.it Production Grid Infrastructure

Presenter (on behalf of the authors): Cristina Vistoli (cristina.vistoli@cnaf.infn.it), Italian grid operation manager, INFN CNAF, Bologna, Italy.


Presentation Transcript


  1. Operations structure of the INFN-GRID/Grid.it Production Grid Infrastructure Presenter (on behalf of the authors): Cristina Vistoli cristina.vistoli@cnaf.infn.it Italian grid operation manager INFN CNAF – Bologna - Italy

  2. Production Quality Grid Infrastructure • Status of the infrastructure • Operations structure and organization • Grid monitoring and management • Usage report and accounting • User and operation support

  3. The Italian Grid Production Infrastructure • about 40 Resource Centers • The grid resources can be accessed through central or VO-specific services (e.g. Resource Brokers) • 28 sites are also part of the EGEE/LCG Grid infrastructure (and are registered in the central database of the Grid Operation Center) • the other 12 sites can be accessed through the Italian grid services only http://grid-it.cnaf.infn.it

  4. Production Infrastructure: Resources

  5. InfnGrid-2_7_0
     • InfnGrid-2_7_0 is a customization of LCG-2_7_0:
     • Support for the following VOs:
       • egrid, babar, zeus, biomed, magic, esr, cms, atlas, lhcb, alice (managed via LDAP VO server);
       • pamela, infngrid, cdf, gridit, compchem, planck, bio, enea, theophys, ingv, inaf, virgo, argo (managed via VOMS server);
       • euchina, eumed (optional and managed via VOMS server).
     • DGAS (DataGrid Accounting System):
       • Patched WMS lcg2.1.73 on the Resource Broker to support DGAS
       • DGAS HLR (Home Location Register) server: it is responsible for keeping the accounting information for both users and grid resources.
     • Network Monitor Element, interfaced with GridIce for data presentation.
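The split between LDAP-managed and VOMS-managed VOs on this slide can be written down as a small lookup. This is only an illustrative sketch: the VO names come from the slide, while the data layout and the function name `vo_backend` are assumptions and do not reflect how InfnGrid itself stores this configuration.

```python
# VO names are copied from the slide. The structure and the function
# name below are hypothetical, chosen only to illustrate the grouping.
LDAP_VOS = ["egrid", "babar", "zeus", "biomed", "magic", "esr",
            "cms", "atlas", "lhcb", "alice"]
VOMS_VOS = ["pamela", "infngrid", "cdf", "gridit", "compchem", "planck",
            "bio", "enea", "theophys", "ingv", "inaf", "virgo", "argo"]
OPTIONAL_VOMS_VOS = ["euchina", "eumed"]


def vo_backend(vo):
    """Return (membership service, is_optional) for a supported VO name."""
    name = vo.lower()
    if name in LDAP_VOS:
        return ("LDAP", False)
    if name in VOMS_VOS:
        return ("VOMS", False)
    if name in OPTIONAL_VOMS_VOS:
        return ("VOMS", True)
    raise KeyError(f"VO not listed for InfnGrid-2_7_0: {vo}")


if __name__ == "__main__":
    for vo in ("cms", "virgo", "eumed"):
        print(vo, vo_backend(vo))
```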

  6. InfnGrid-2_7_0
     • Support for MPI jobs via home directory synchronization with scp, using host-based authentication
     • Customized tools to install and use the grid:
       • installation with a customized version of LCG yaim (ig-yaim);
       • support for interfacing ig-yaim with a Quattor installation;
       • UIPnP: a plug-and-play User Interface that lets a user access the grid from any Linux system without installing RPMs.

  7. InfnGrid-2_7_0: deployed services (diagram). Services shown for INFNGRID-2_7_0: BDII, MyProxy, GridIce, FTS, HLR, LFC, VOMS, RB (DGAS)

  8. Operations Structure and Organization
     The National Grid Central Management Team (CMT):
     • Activities:
       • integration and testing of the InfnGrid middleware release (based on the LCG middleware release)
       • deployment procedures and configuration tools
       • monitoring and control of the status of the grid services and resources
     • Responsibilities:
       • site registration procedure
       • middleware deployment
       • certification procedure for all InfnGrid sites
       • operation of the grid services

  9. Grid Central Management Team
     • Deployment plan
       • The team coordinates the installation and deployment of the grid services. A plan is provided to:
         • ensure that user support and the service level provided to grid users remain acceptable during the upgrade period
         • simplify the certification activities (all resources are thoroughly tested before joining the infrastructure).
     • Site registration procedure
     • Site certification procedure

  10. Operations Support
      • The Italian ROC provides local front-line support to Virtual Organizations, users and Resource Centres
      • The Italian ROC team is organized in daily shifts:
        • 2 people per shift, 2 shifts per day, from Monday to Friday.
      • Activities planned during the shift:
        • log trouble tickets created, updated and closed, and problems on grid services and sites; monitor successful site certification
        • check the actions of the previous shift and the downtime page
        • check the status of production grid services and the GRIS status of production CEs and SEs
        • check the status of the production sites using the Site Functional Tests report
      • Periodic (every 15 days) phone conferences:
        • ROC/CIC teams and site managers
      • Write and provide the ROC report for the weekly EGEE operation meeting
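The shift pattern above (2 people per shift, 2 shifts per day, Monday to Friday) works out to 10 shifts and 20 person-slots per week. A minimal sketch of a round-robin roster under that pattern follows; the staff names and the function `weekly_roster` are hypothetical, and the slides do not say how the ROC actually assigns people to shifts.

```python
from itertools import cycle

# Figures taken from the slide; everything else is an illustrative assumption.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SHIFTS_PER_DAY = 2
PEOPLE_PER_SHIFT = 2


def weekly_roster(staff):
    """Assign staff to shifts in simple round-robin order.

    Returns a list of (day, shift_number, [people]) tuples.
    """
    if len(set(staff)) < PEOPLE_PER_SHIFT:
        raise ValueError("need at least two distinct people to fill a shift")
    pool = cycle(staff)
    roster = []
    for day in DAYS:
        for shift in range(1, SHIFTS_PER_DAY + 1):
            crew = [next(pool) for _ in range(PEOPLE_PER_SHIFT)]
            roster.append((day, shift, crew))
    return roster


if __name__ == "__main__":
    # Hypothetical staff identifiers.
    for day, shift, crew in weekly_roster(["op1", "op2", "op3", "op4", "op5"]):
        print(day, f"shift {shift}:", ", ".join(crew))
```

Round-robin is used here only because it is the simplest fair scheme; any other assignment rule could replace the `cycle` without changing the weekly totals.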

  11. Grid Monitoring • The status of the Italian grid infrastructure is monitored using GridIce • It is one of the monitoring tools used by EGEE • It is used to check: • the status of the submission queues • Process/daemon status in the services (RB, BDII) • VO view: list of the CEs and SEs available to each VO, with their status and capacity • Job monitoring

  12. Monitoring

  13. Accounting • The DataGrid Accounting System (DGAS) has been developed within the EDG and EGEE projects. • It implements resource usage metering and economic accounting in a fully distributed grid environment • It is part of the InfnGrid middleware release and has been deployed on the Italian Grid Infrastructure • Grid computing resources and grid users are registered in appropriate servers, known as HLRs (Home Location Registers), which keep track of every submitted job. An arbitrary number of HLR servers can be used

  14. DGAS HLR flow

  15. Accounting • Accounting data can be retrieved from the HLRs at different aggregation levels: • single user • group of users • VO • resource • A functional test has been developed and is used to monitor the stability of the service. It checks the functionality of the sensors and services running on the CE and the communication between CEs and HLRs • DGAS data for the Italian Grid are aggregated/anonymized and provided to EGEE through an appropriate interface to APEL. • More information at http://www.to.infn.it/grid/accounting/
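The aggregation levels listed on this slide can be illustrated with a short Python sketch. It is not the DGAS query interface: the record fields (`user`, `group`, `vo`, `resource`, `cpu_seconds`), the sample values and the function name `aggregate_usage` are all assumptions made for illustration.

```python
from collections import defaultdict

# Aggregation levels named on the slide, mapped to hypothetical record fields.
LEVELS = ("user", "group", "vo", "resource")


def aggregate_usage(records, level):
    """Sum job counts and CPU time per key at the requested level."""
    if level not in LEVELS:
        raise ValueError(f"unknown aggregation level: {level}")
    totals = defaultdict(lambda: {"jobs": 0, "cpu_seconds": 0.0})
    for rec in records:
        entry = totals[rec[level]]
        entry["jobs"] += 1
        entry["cpu_seconds"] += rec["cpu_seconds"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up job records, one dictionary per submitted job.
    sample = [
        {"user": "u1", "group": "g1", "vo": "cms", "resource": "ce-a", "cpu_seconds": 100.0},
        {"user": "u2", "group": "g1", "vo": "cms", "resource": "ce-b", "cpu_seconds": 200.0},
        {"user": "u3", "group": "g2", "vo": "atlas", "resource": "ce-a", "cpu_seconds": 50.0},
    ]
    for level in LEVELS:
        print(level, aggregate_usage(sample, level))
```

Keeping one record per job and aggregating on demand mirrors the statement that the HLRs keep track of every submitted job; a real deployment would query the HLR rather than hold records in memory.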

  16. Jobs per site (January 15-31) Total jobs = 179,310

  17. Jobs per site (January 15-31)

  18. Jobs per VO (January 15-31)

  19. Jobs report (January 15-31)

  20. User, Operation and VO support • The user support system provides ticket exchange between: • ROC on Duty and site managers • Site managers and the Central Management Team, and vice versa • Site managers and the certification team during installation/upgrade • GGUS to ROC and ROC to GGUS

  21. The support system
      • The Italian ROC ticketing system is built upon XHelp, a suite of web-based tools written in PHP
      • The support system components are accessible from the main interface of the deployment portal (grid-it.cnaf.infn.it), which provides a single sign-on (SSO) point for certificate-based registration and identification.
      • The end user can open a request, then view and follow their own tickets and the related replies
      • A supporter can view tickets assigned to their own groups, add responses and solutions, and change status/priority
      • While working on tickets, side content is always available to all classes of users (according to their access level):
        • Site Functional Tests
        • site downtime calendar
        • file archive
        • net query tools
        • IRC applet, contextual questions and answers
        • reports from daily shifts

  22. Interface with GGUS
      • The Italian ROC support system is interfaced to the GGUS helpdesk application using web-services technologies
      • Secure methods to create and update trouble tickets in the GGUS database are provided by the GGUS application.
      • These methods are called by APIs that wrap the ticket information stored in the XHelp database into SOAP messages and send them to the WSDL contact URL.
      • A trouble ticket submitted by a local user to the XHelp helpdesk that cannot be addressed locally can be escalated by the local supporter across the ROC boundaries.
      • The system allows for ticket assignment to any other support unit of GGUS, as well as to all other ROC helpdesks connected to GGUS via the interface.
      • The ticket is shared among the databases of all the helpdesks involved in the workflow, can be updated from every source, and any update will propagate to all the other systems.
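The step of wrapping ticket information into a SOAP message can be sketched as follows. This is a hedged illustration in Python, not the XHelp code (which the slides describe as PHP) and not the GGUS API: the operation name `CreateTicket`, the namespace `urn:example:helpdesk` and the ticket field names are hypothetical. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace standing in for the one a real WSDL would define.
HELPDESK_NS = "urn:example:helpdesk"


def build_ticket_envelope(ticket, operation="CreateTicket"):
    """Wrap a ticket dictionary into a SOAP envelope and return it as a string.

    The keys of `ticket` must be valid XML element names.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{HELPDESK_NS}}}{operation}")
    for field, value in ticket.items():
        ET.SubElement(call, f"{{{HELPDESK_NS}}}{field}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Made-up ticket content for illustration only.
    print(build_ticket_envelope({
        "subject": "CE not accepting jobs",
        "priority": "urgent",
        "origin": "ROC helpdesk",
    }))
```

In a real interface the resulting message would be posted over HTTPS to the endpoint advertised in the service's WSDL, and an update would use a different operation carrying the ticket identifier; neither detail is specified in the slides.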

  23. GGUS-ROC Basic Workflow (diagram). Elements shown: GGUS System (Web Portal, GGUS/TPM); ROC-1 Helpdesk and ROC-1 Interface; ROC-X Helpdesk and ROC-X Interface; support units SU-1, SU-2, ..., SU-N. Transitions shown: ticket assignment to ROC-1, ticket solved, ticket re-assigned.

  24. A new ticket comes from GGUS. We assign the ticket to the site.

  25. The site's support group reassigns the ticket to GGUS …and adds a response!

  26. Trouble tickets statistics

  27. Authors: CAROTA, Luciana (INFN-CNAF); NEBIOLO, Federico (INFN-Torino); CALTRONI, Andrea (INFN-Padova); DONVITO, Giacinto (INFN-Bari); VERLATO, Marco (INFN-Padova); BAGNASCO, Stefano (INFN-Torino); BRUNETTI, Riccardo (INFN-Torino); DACRUZ, Marcio (INFN-Milano); BARCHIESI, Alex (INFN-Roma); FIORE, Sandro (Univ. Lecce); ARGENTATI, Sabrina (INFN-LNF); DALLA FINA, Simone (INFN-Padova); DELLE FRATTE, Cesare (INFN-Roma2); TURRISI, Rosario (INFN-Padova); GREGORETTI, Francesco (CNR-ICAR Napoli); VISTOLI, Maria Cristina (INFN-CNAF); GAIDO, Luciano (INFN-Torino); SELMI, Matteo (INFN-CNAF); PAGANO, Alfredo (INFN-CNAF); AIFTIMIEI, Cristina (INFN-Padova); CUSCELA, Guido (INFN-Bari); CAVALLI, Alessandro (INFN-CNAF); FERRO, Enrico (INFN-Padova); FANZAGO, Federica (INFN-Padova); FANTINEL, Sergio (INFN-LNL); VACCAROSSA, Luca (INFN-Milano); CESINI, Daniele (INFN-CNAF); PAOLINI, Alessandro (INFN-CNAF); VERONESI, Paolo (INFN-CNAF)
