GS-EIS: Experiment Integration Support section
Five staff:
• Harry Renshall: Section Leader
• Simone Campana: ATLAS support
• Roberto Santinelli: LHCb support
• Andrea Sciaba: CMS support
• Patricia Mendez: ALICE support
Four INFN funded CERN fellows:
• Alessandro di Girolamo: ATLAS
• Elisa Lanciotti: LHCb
• Nicolo Magini: CMS and ALICE
• Vincenzo Miccio: CMS
One ASGC funded visitor:
• Gang Qin: ATLAS
In the future we would like to broaden the associations beyond single experiments where possible, e.g. by leveraging common solutions or setting up limited-duration task forces on a particular experiment problem area.
GS Group Meeting - EIS section
H. Renshall:
• member of the WLCG team preparing for LHC startup (CCRC'08), then production running
• deputy group leader: attend/contribute to departmental management activities
• section leader (light administrative load)
• scientific secretary of the LHCC Computing Resources Review Board and also of the Computing Resource Scrutiny Group
• IT link person to the LHCb experiment
Simone Campana:
Experiment Integration Support for ATLAS
• Liaison with WLCG and EGEE
• In ATLAS organization: Facility Coordinator
  • Coordinate GRID middleware and ATLAS Distributed SW deployment/updates/upgrades
  • Primary contact for Tiers Facilities Managers
• Organize, plan and coordinate ATLAS-wide tests
  • Tier-0 throughput, DDM Functional Test, CCRC08
  • Includes scripting/development of tools, debugging, testing, follow-up…
  • This is the most time-consuming activity
  • This activity is carried out in close collaboration with Birger and Stephane Jezequel from ATLAS
    • A lot of overlap, but also different scopes
• Follow-up of Alessandro's activity on monitoring
  • Only a little effort from my side now (he is now independent)
Patricia Mendez:
EIS Commitments
• WLCG Support to the ALICE Experiment
  • Maintenance and support of ALICE VOBOXES together with the AliEn software distribution
  • Implementation of gLite middleware within the ALICE WMS
  • Establishment of site/services contacts with the experiment
  • SAM implementation
  • ALICE FDR setup and planning
EIS Commitments
• Support to communities beyond HEP
  • UNOSAT, Geant4, generic applications (theoretical physics, ITU, Garfield, HARP, QCD...)
  • Creation and setup of VOs: gear, UNOSAT, geant4
  • Resources research and setup
    • Depending on each application's requirements
  • Site/application contact
  • Gridification of the applications and integration into the Grid environment
EIS Commitments
• CCRC'08 exercises and services
  • Follow-up of the ALICE participation as WLCG contact person
  • Additional tasks such as SAM implementation for the experiments, VOBOXES setup, etc.
• EGI proposal
  • Application support working group
• EGEE-III
Roberto Santinelli:
Supporting LHCb: setting up LFC distributed service
The goal
• A redundant and reliable file catalogue service for LHCb based on LFC
• A system that best matches the LHCb use cases
Implementation
• A master LFC at CERN and mirrored replicas at Tier-1 sites using Oracle Streams
• Several technical aspects to consider
  • Coherence of data and access control
  • Latency in the propagation of updates
VO support team contributed to the project
• Definition of the solution and “acceleration” of all steps in the software lifecycle (whenever this was possible)
• Functionality and stress tests
• Readiness of site implementation
The distributed LHCb file catalogue was deployed in time for the currently ongoing combined computing challenge (CCRC’08)
Supporting LHCb: site readiness for CCRC and beyond
• Not only monitoring resources and services (and writing custom tools for that)
• But also working with sites and the WLCG service on:
  • Fixing problems as they arise
  • Negotiating resources
  • Channeling problems to/from the VO
[Figures: FTS matrix of channels between all T1s; SRMv2; service classes; disk space monitoring charts]
Supporting LHCb: SAM tests
Nicolo' Magini - Third EGEE User Forum
• LHCb uses the SAM framework to:
  • Check the availability of Computing Elements
    • Queues, WN hardware and software
  • Detect Operating System and architecture
  • Manage the deployment of LHCb software
    • Install (or remove) and publish appropriate software versions
  • Run test simulation, reconstruction, analysis
• LHCb SAM jobs run with high priority with software manager credentials
• LHCb sensors integrated in DIRAC infrastructure
  • When a pilot job arrives on a WN the test suite is executed
  • Results are published in SAMDB
Andrea Sciaba:
CMS contact in EIS
• “Grid expert” in CMS
  • Giving advice, solving problems reported by CMS users and developers
• Site commissioning
  • Responsible for SAM in CMS
    • Managing SAM test submission
    • Interface with SAM and Dashboard developers
    • Development of CMS SAM tests
  • Debug site problems, mainly those exposed by SAM tests
• VO management
  • CMS VO manager, processing registration requests and solving VOMRS/VOMS issues
  • Interface with VOMRS/VOMS developers
• Middleware testing
  • gLite WMS, CREAM, job priorities
• EGEE TCG
  • “alternate” CMS representative
  • Giving input on middleware-related issues and future developments
• OSG/EGEE Interoperability working group
  • Representing CMS
• Training and documentation
  • Editor of the gLite 3 User Guide
  • Giving tutorials
SAM tests for SRMv2
Nicolo' Magini - Third EGEE User Forum
Start from higher-level functionality: lcg-util tests
• SRMv2-get-SURLs
  • For ops/dteam: get path from BDII and corresponding space tokens
  • For VOs: replace with VO-specific plugins. Developed TFC test for CMS
• SRMv2-lcg-cp
  • Copy a file to the SRMv2, copy it back
• SRMv2-lcg-cr
  • Copy a file to the SRMv2 and register it in the LFC
• SRMv2-lcg-gt
  • Get a TransferURL with supported protocols
• SRMv2-lcg-gt-rm-gt
  • Verify ability to correctly remove a file from SRMv2
• SRMv2-lcg-ls-dir
  • List a directory on SRMv2
• SRMv2-lcg-ls
  • List a file on SRMv2
Other lower-level SRMv2 functionality will be added
VO support activity for CMS: DDT (Debugging Data Transfers)
Nicolo' Magini - Third EGEE User Forum
• During CMS CSA07 period:
  • Define a metric and procedure to commission data transfer links between CMS Tiers
    • ~4 MB/s sustained for 5 days
  • Provide documentation and support on transfer debugging
    • FTS, SRM operations within the CMS PhEDEx middleware
  • ~250 links commissioned in 2007
• Current efforts (CCRC08 and beyond):
  • Scale up the rates to the requirements for the data taking period
    • 20 MB/s over 24h
  • Ongoing support for transfer debugging
    • FTS, SRMv2 etc.
  • Current global traffic in PhEDEx counting DDT + CCRC transfers is approaching the 2008 requirements for CMS
    • 20 Gbps
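To put the two link-commissioning metrics quoted on this slide side by side, the arithmetic can be sketched as below. This is only an illustration: it assumes decimal units (1 TB = 10^6 MB, 1 Gbit/s = 125 MB/s), and the function names `volume_tb` and `links_needed` are made up for the example; they are not part of PhEDEx or the DDT tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def volume_tb(rate_mb_per_s: float, duration_s: float) -> float:
    """Volume in TB moved at a sustained rate (assumes 1 TB = 10**6 MB)."""
    return rate_mb_per_s * duration_s / 1e6


def links_needed(aggregate_gbps: float, link_rate_mb_per_s: float) -> float:
    """Number of links at a given per-link rate that add up to an aggregate rate."""
    aggregate_mb_per_s = aggregate_gbps * 1000 / 8  # Gbit/s -> MB/s (decimal units)
    return aggregate_mb_per_s / link_rate_mb_per_s


csa07_tb = volume_tb(4, 5 * SECONDS_PER_DAY)    # ~4 MB/s sustained for 5 days
ccrc08_tb = volume_tb(20, 1 * SECONDS_PER_DAY)  # 20 MB/s over 24 h

print(f"CSA07 metric:  {csa07_tb:.3f} TB per link")
print(f"CCRC08 metric: {ccrc08_tb:.3f} TB per link")
print(f"20 Gbps corresponds to {links_needed(20, 20):.0f} links at 20 MB/s")
```

Under these assumptions both metrics correspond to roughly 1.7 TB per link, so the CCRC08 target moves the same volume in one fifth of the time.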
Gang Qin:
Storage Space Monitor
• Reliable space information for T0, T1s & T2s:
  • for different storage classes (e.g. Atlas:custodial:nearline, Atlas:replica:online, Atlas etc.)
  • for different time periods
    • Last 24 hours
    • Last month
    • Last year
• Functions for different storage types
  • Cross-check between BDII (LDAP query) & local command
  • DPM: dpm-qryconf
  • CASTOR: stager_qry on site VOBOX
  • dCache: no local query command, LDAP query (BDII)
  • StoRM: no local query command, information provided by site admin
• Current status:
  • cron jobs running to fetch daily data
  • still a lot of inconsistency, since lots of things are changing (SRMv2 space tokens)
• To do: implement SRMv2 function to get space info for each space token
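The cross-check between the BDII value and the locally queried value could look roughly like the sketch below. It is not the monitor's actual code: the function name `space_consistent`, the byte-count inputs, and the 5% tolerance are all assumptions made for illustration, and the queries themselves (LDAP, dpm-qryconf, stager_qry) are left out.

```python
from typing import Optional


def space_consistent(bdii_bytes: int,
                     local_bytes: Optional[int],
                     rel_tol: float = 0.05) -> Optional[bool]:
    """Compare the space published in BDII with the locally queried value.

    Returns None when no local value exists (e.g. a storage type without a
    local query command), otherwise True if the two values agree within
    rel_tol. The 5% default tolerance is an arbitrary illustrative choice.
    """
    if local_bytes is None:
        return None  # nothing to cross-check against
    reference = max(bdii_bytes, local_bytes)
    if reference == 0:
        return True  # both sources report zero
    return abs(bdii_bytes - local_bytes) / reference <= rel_tol


# Hypothetical values, for illustration only
print(space_consistent(100_000, 98_000))  # small difference
print(space_consistent(100_000, 50_000))  # large difference
print(space_consistent(100_000, None))    # no local query available
```

Returning None rather than True for storage types without a local command keeps "not checked" distinguishable from "checked and consistent".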
Lumber: Lemon Sensor
• Monitor the status of user-specified processes
• Process status
  • ‘0’ Everything is OK
  • ‘1’ process is not running (temporarily, a restart is tried)
  • ‘2’ process is closed (e.g. by an expert working on the system)
  • ‘3’ process restart failed (ALARM mail sent)
• Now in production and running on ATLAS VOBOXes
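The four status codes listed on this slide map naturally onto an enumeration. The sketch below is illustrative only and is not the Lumber implementation: the names `ProcessStatus` and `processes_in_alarm` and the sample process names are hypothetical, and only the numeric codes and their meanings come from the slide.

```python
from enum import IntEnum
from typing import List, Mapping


class ProcessStatus(IntEnum):
    """Status codes as listed on the slide; the member names are made up."""
    OK = 0              # everything is OK
    NOT_RUNNING = 1     # not running, a restart is being tried
    CLOSED = 2          # deliberately closed, e.g. by an expert
    RESTART_FAILED = 3  # restart failed, ALARM mail sent


def processes_in_alarm(statuses: Mapping[str, int]) -> List[str]:
    """Return the sorted names of processes whose restart has failed."""
    return sorted(
        name for name, code in statuses.items()
        if ProcessStatus(code) is ProcessStatus.RESTART_FAILED
    )


# Hypothetical process names, for illustration only
sample = {"ddm-agent": 0, "monitor": 3, "cleanup": 1, "transfer": 3}
print(processes_in_alarm(sample))
```

Only code 3 is treated as an alarm here, since codes 1 and 2 describe a restart in progress and an intentional stop respectively.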
Alessandro di Girolamo
Storage & Computing Elements endpoints definition: intersection between GOCDB and TiersOfATLAS (ATLAS-specific site configuration file with the Cloud Model)
• Different services and endpoints might need to be tested using different VOMS credentials
• ATLAS endpoints and paths must be explicitly tested
• The LFC of the Cloud (residing in the T1) is used
ATLAS-specific tests integration in the Service Availability Monitor framework
• Monitor the availability of ATLAS critical Site Services
• Verify the correct installation and the proper functioning of the ATLAS software on each site
• SE:
  • Put, Get and Del for each SRM endpoint
• CE:
  • GangaRobot on each site: execute a real analysis job based on a MC dataset
  • A large part of the OPS suite also keeps running
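The endpoint definition described on this slide is an intersection of two sources. A minimal sketch of that idea follows, assuming the GOCDB endpoints are available as a set of hostnames and the TiersOfATLAS configuration as a hostname-to-cloud mapping. Both data shapes, the function name `endpoints_to_test`, and the example hostnames are assumptions, not the real formats of either source.

```python
from typing import Dict, Set


def endpoints_to_test(gocdb_hosts: Set[str],
                      tiers_of_atlas: Dict[str, str]) -> Dict[str, str]:
    """Keep only endpoints present in both sources, with their ATLAS cloud."""
    return {
        host: cloud
        for host, cloud in tiers_of_atlas.items()
        if host in gocdb_hosts
    }


# Hypothetical hostnames and cloud names, for illustration only
gocdb = {"srm.site-a.example.org", "srm.site-b.example.org"}
toa = {
    "srm.site-a.example.org": "CLOUD1",
    "srm.site-c.example.org": "CLOUD2",
}
print(endpoints_to_test(gocdb, toa))
```

Keeping the cloud alongside each endpoint makes it possible to pick the matching LFC (the one residing in that cloud's T1) when running the tests.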
Tiers of ATLAS integration within the Grid
Great effort in testing the Tiers (Tier-0, 1 and 2) supporting ATLAS:
• commissioning of SRMv2 endpoints: installation, configuration and proper functioning
• verification of the middleware versions and client tools installed
Monitoring of ATLAS-specific critical services
Lumber: a Lemon sensor to monitor the status of critical processes (like DataManagement and monitoring) running on the ATLAS VOBOXes
• fully integrated into Lemon (Exceptions/Alarms)
• availability output possible on Service Level Status (SLS)
The publication of the availability status of experiment-specific services into monitoring frameworks like Lemon and SLS is now in progress
Enzo Miccio
Elisa Lanciotti