GSSD Mandate: Progress and next steps
Flavia Donno, CERN/IT
Grid Deployment Board, 9 January 2008

The mandate
Deployment of SRM v2.2 by the end of 2007:
• Establishing a migration plan from SRM v1 to SRM v2 so that the experiments can access the same data from the two endpoints transparently.
• Coordinating with sites, experiments, and developers the deployment of the various SRM v2.2 implementations and the corresponding Storage Classes.
• Coordinating the Glue Schema v1.3 deployment for the Storage Element and ensuring that the information published is correct.
• Coordinating the provision of the necessary information by the Storage Providers in order to monitor the status of storage resources, ensuring that all sites provide the experiments with the requested resources and with the correct usage.
Grid Deployment Board, CERN 9 January 2008

The mandate
• Subsequently the group should clarify the data access strategy for the different experiments, and clarify how they plan to use existing Data Management tools to access files by grid jobs […]. The GSSD group will therefore work to:
  • Ensure transparency of data access and the functionalities required by the experiments (see Baseline Service report).
• One outcome of the GSSD group is to produce precise guidelines per implementation, focusing on deploying the specific storage classes for a specific VO. If needed, GSSD will organize tutorials (for Tier-2s) and agree on mini SC targets with the experiments.

The subgroups
The work of the group started in January 2007. A few subgroups were created for:
• Studying a strategy and proposing technical solutions for the migration from SRM v1 to SRM v2 (goal A)
• Collecting experiment requirements on storage classes and space tokens (goal B)
• Ensuring that the available implementations of SRM v2.2 were functional and responding to user requirements (goal B)
• Creating a Grid Storage Infrastructure for testing activities (goal B)
• Studying the Glue Schema v1.3 and identifying the mandatory object classes to be used for SRM v2.2 (goal C)
• Ensuring that the monitoring information required by the experiments was available (goal D)

The work
• Studying a strategy and proposing technical solutions for the migration from SRM v1 to SRM v2 (goal A)
  • S. Burke coordinator (15/03/2007): how to access files transparently from SRM v1 and SRM v2 endpoints: https://twiki.cern.ch/twiki/pub/LCG/GSSDSubGroups/SRM2Recommendations10.doc
  • F. Donno coordinator (11/07/2007): testing several usage scenarios taken from experiment activities through high-level tools: https://twiki.cern.ch/twiki/pub/LCG/GSSD/SRMV2_v2-1.pdf
  • lcg-utils, gfal and FTS tested against SRM v1 and SRM v2 endpoints; report produced: https://twiki.cern.ch/twiki/bin/view/LCG/Bug_1_10
  • F. Donno, S. Campana, A. Di Girolamo: ATLAS production activities ongoing
  • F. Donno, N. Magini, D. Bonacorsi: CMS transfer tests ongoing: https://twiki.cern.ch/twiki/bin/view/CMS/CMStestingSRM2
  • How long will the SRM v1 endpoints survive?

The work (next step?)
• Studying a strategy and proposing technical solutions for the migration from SRM v1 to SRM v2 (goal A)
  • The testing and production activities of the experiments have revealed a set of problems that should be followed up and tackled together with experiments, sites and storage solution developers (srmcopy vs. urlcopy, monitoring and debugging, hanging requests, file access, etc.)
  • This is more a question of support, but it involves many parties

The work
• Collecting experiment requirements on storage classes and space tokens (goal B)
  • Activity carried out by F. Donno and M. Litmaath
  • Several meetings with LHCb, CMS and ATLAS; first concrete proposals already available in January 2007
  • This activity is ongoing within CCRC08: https://twiki.cern.ch/twiki/bin/view/LCG/GSSD

The work
• Ensuring that the available implementations of SRM v2.2 were functional and responding to user requirements (goal B)
  • Study and analysis of the SRM specification performed (F. Donno); a list of critical issues was compiled and finally resolved by the SRM collaboration in April 2007.
  • S2 test suite reflecting the study and the decisions taken, used to monitor progress. Continuously enhanced with more tests and more test families (basic, usecase, cross, stress). Stress tests are a very heavy activity … Not completed as needed.
  • Deployment started in July 2007 but exposed several problems from the management and operations perspectives.
  • In September 2007 sites agreed to upgrade their production instances to the version supporting SRM v2.2. Development activities were reduced in order to focus on the deployment in production.

The work (next step?)
• Ensuring that the available implementations of SRM v2.2 were functional and responding to user requirements (goal B)
• Open issues still remain:
  • srmChangeSpaceForFiles? srmCopy? Are they needed?
  • How can data in D1 be “purged” (srmPurgeFromSpace not available)?
  • srmGetSpaceMetaData (Used/Available Space), etc.
  • Issues still pending (check previous GSSD report to the GDB)?
  • Lessons learned from CCRC08 and possible new development?
  • How should SRM evolve, if at all?
  • Quotas? ACLs? (a few GSSD meetings were dedicated to this)

The work
• Creating a Grid Storage Infrastructure for testing activities (goal B)
  • From July 2007 to present.
  • Participating sites: IN2P3, FZK, SARA, CERN, CNAF, RAL. Strong support from the storage core teams (S. De Witt, J. Van Eldik, G. Lo Presti, P. Fuhrmann, T. Mkrtchyan, T. Perelmutov)
  • S2 test suite used to monitor the status of the sites (new test families: lcg-utils, bdii, vo).
  • We learned a lot. Experiments could better understand the usage of the space tokens and exercise the high-level tools.

SUPPORT??
The work (next step?)
• Creating a Grid Storage Infrastructure for testing activities (goal B)
  • As an outcome of these activities, technical documentation was produced to help sites configure the new storage services. Precise guidelines are published on the GSSD web pages with instructions specific to implementations and VOs. (goal F)
    • This must continue (support for specific versions only, clear release notes, clear reasons for the upgrade, etc.)
  • Through GSSD, several discussion forums were started to tackle common problems, even across different implementations.
  • GSSD has proposed a model for Storage Support through forums and GGUS. Specialized second-level support units should analyze the reported problems in detail and provide a solution or a detailed analysis to the developers. This should be followed up and implemented.

The work
• Studying the Glue Schema v1.3 and identifying the mandatory object classes to be used for SRM v2.2 (goal C)
  • Activity carried out by F. Donno and M. Litmaath with important input from S. Burke and OSG (Ted Hesselroth)
  • Mandatory object classes identified with the agreement of the developers (both storage and high-level tools) and operations people
  • Example LDIF file provided (by M. Litmaath): https://twiki.cern.ch/twiki/bin/view/LCG/GSSDGLUEProposal
  • S2 test suite used to monitor the status of the sites (bdii).
  • Some input provided for Glue Schema v2.0
  • Initial information providers provided for dCache (by R. Trompert) and DPM (by M. Jouvin)

The work (next step?)
• Studying the Glue Schema v1.3 and identifying the mandatory object classes to be used for SRM v2.2 (goal C)
  • Ensure that developers understand the meaning of the Glue objects and use the correct attributes for retrieving the necessary information
  • A mechanism for validating static and dynamic information should be in place (LDIF or GIP)
  • The discussions concerning the Glue Schema for Storage must involve technical experts, developers and sites
  • The S2 test suite used to monitor the status and coherence of the published information must evolve and be included in SAM. The test suite itself must also be validated.

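As an illustration of the kind of validation mechanism suggested above for published LDIF, here is a minimal Python sketch that checks whether a dump contains every object class in a required set. The contents of `REQUIRED_CLASSES`, the helper names `parse_ldif` and `missing_classes`, and the sample host `se.example.org` are assumptions made for this example; the agreed list of mandatory classes is the one in the GSSD Glue proposal, and this is not the S2 test suite or any existing GIP tool.

```python
from collections import defaultdict

# Assumed set of mandatory Glue v1.3 object classes (illustrative only;
# the agreed list is the one in the GSSD Glue proposal).
REQUIRED_CLASSES = {"GlueSE", "GlueSA", "GlueVOInfo", "GlueSEControlProtocol"}


def parse_ldif(text):
    """Parse plain LDIF text into a list of {attribute: [values]} dicts.

    Handles blank-line entry separators, '#' comments and folded lines
    (continuations starting with one space). Base64 values ('attr:: ...')
    are not decoded in this sketch.
    """
    entries = []
    current = defaultdict(list)
    last_attr = None
    for raw in text.splitlines():
        if not raw.strip():
            # A blank line closes the current entry.
            if current:
                entries.append(dict(current))
            current, last_attr = defaultdict(list), None
            continue
        if raw.startswith("#"):
            continue
        if raw.startswith(" ") and last_attr is not None:
            # Folded line: append to the previous value, dropping the space.
            current[last_attr][-1] += raw[1:]
            continue
        attr, _, value = raw.partition(":")
        last_attr = attr.strip()
        current[last_attr].append(value.strip())
    if current:
        entries.append(dict(current))
    return entries


def missing_classes(entries, required=REQUIRED_CLASSES):
    """Return the required object classes that no entry publishes."""
    published = {oc for e in entries for oc in e.get("objectClass", [])}
    return sorted(set(required) - published)


SAMPLE = """\
dn: GlueSEUniqueID=se.example.org,mds-vo-name=resource,o=grid
objectClass: GlueSE
GlueSEUniqueID: se.example.org

dn: GlueSALocalID=atlas,GlueSEUniqueID=se.example.org,mds-vo-name=resource,o=grid
objectClass: GlueSA
GlueSALocalID: atlas
"""

if __name__ == "__main__":
    print("Missing object classes:", missing_classes(parse_ldif(SAMPLE)))
```

The same check could be applied to static LDIF files or to the output of an information provider; checking attribute values for coherence (for example used versus available space) would need per-attribute rules that are not sketched here.
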
The work (next step?)
• Ensuring that the monitoring information required by the experiments is available (goal D)
  • Activity started, involving the Dashboard people (R. Rocha, J. Andreeva), INFN Bari (G. Donvito), INFN Pisa (M. Ciriello)
  • A report on needed metrics has been published: https://twiki.cern.ch/twiki/bin/viewfile/LCG/GSSDSubGroups?rev=2;filename=report_dpm.pdf
  • However, no further progress has been reported

The work
• Discussing with the experiments the use of existing Data Management tools to access files by grid jobs […]. The GSSD group will therefore work to ensure transparency of data access and the functionalities required by the experiments (see Baseline Service report). (goal E)
  • Activity carried out by F. Donno, P. Charpentier, N. Brook, A. Smith, B. Koblitz, M. Branco, A. Sciabà, D. Bonacorsi
  • As an outcome of this activity, we discussed with the experiments and compiled a list of features needed in the high-level tools in order to integrate SRM v2.2 in the experiment frameworks: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDExRequests
  • Participation in the AA meetings in order to resolve the problem of transparently accessing files via ROOT through the several available interfaces (GFAL, ROOT, RFIO, etc.): https://twiki.cern.ch/twiki/bin/view/LCG/GSSD (see “Transparent data access”)
  • Discussion with OSG/VDT to distribute lcg-utils/gfal also in the USA (this will hopefully be finalized during the visit of A. Roy to CERN)

The work (next step?)
• Discussing with the experiments the use of existing Data Management tools to access files by grid jobs […]. The GSSD group will therefore work to ensure transparency of data access and the functionalities required by the experiments (see Baseline Service report). (goal E)
  • It is important that this activity continues, not only following new developments but also analyzing the problems encountered and providing stable and concrete solutions
  • Close connection with the developers and the sites is needed
  • It is very hard to debug problems involving services at several sites. Would this be tackled by the work of the monitoring group?
  • Newly emerging needs should be analyzed and discussed with developers and sites

The work
• GSSD provided an SRM rollout plan agreed with developers, sites and experiments: https://twiki.cern.ch/twiki/pub/LCG/GSSD/SRM-Rollout-Plan-2007.pdf (goal F)
• Furthermore, GSSD has coordinated and monitored the deployment in production of storage systems providing support for SRM v2.2. All Tier-1s have now upgraded their production systems, with the exception of TRIUMF and PIC. Full SRM v2.2 functionality is not yet available everywhere.
• GSSD has organized a Tier-1/Tier-2 workshop in Edinburgh supported by GridPP, NeSC and WLCG. Further tutorials should follow (goal F)

Conclusions
• Has GSSD achieved its mandate?
• The scope of GSSD has been continuously adjusted during its lifetime, learning from experience in order to achieve the goals
• Many activities still need attention and coordination
• It has been a great experience for me as coordinator of this group! Thank you to all members, and especially developers, site admins and experiments, for the hard work done even during vacations, illness, stress …

A Proposal
• GSSD should continue its work with a new mandate:
  • Follow deployment, test and production activities and ensure coordination and follow-up of problems
  • Ensure coherence of behavior among implementations and solutions for the SRM v2.2 open issues
  • Organize adequate support for storage and data management problems coming from SRM v2.2
  • Ensure that Glue 2.0 offers the needed view and information for storage services
  • Follow experiment activities to make sure the requirements are met and data access is transparent and efficient (including monitoring)
  • Organize tutorials and dissemination events as needed

Thank You