Experience Supporting the Integration of LHC Experiments Computing Systems with the LCG Middleware
Simone Campana
LCG Experiment Integration and Support
CERN-IT / INFN-CNAF
Mandate of the LCG/EIS Team
• EIS: Experiment Integration and Support Team
• Help LHC experiments integrate their production environment with the Grid middleware and utilities.
• Offer support during all steps of the integration process
  • understanding the middleware functionality
  • testing new prototype components
  • getting onto the LCG infrastructure
• One person dedicated to each LHC experiment
• Production is the main focus.
• Experiment Support does not mean User Support.
• Experiment Support does not mean GOC.
CHEP06 – 12-17 February 2006 – Mumbai (India)
Main Tasks
• Integration
  • Middleware functionality and usage
  • Functionality tests
  • Customized distributions and missing tools
  • Discuss requirements
    • And bring them to the attention of the developers
• Experiment and User Support
  • Documentation: manuals, guides, FAQ
  • First-line user support
  • Monitoring experiment-specific production systems
• Provide infrastructure expertise
  • Monitoring/managing services
    • Grid and experiment specific
  • Solving site-related problems
  • Service Challenge second-level support (on shift)
Tools… Tools… Tools…
• Data Management
  • Customized version of the LCG Data Management clients
• Workload Management
  • Monitoring of the job “standard error” and “standard output”
    • g-peek
  • Estimate of the normalized CPU and wall-clock time a job has left
• Information System
  • C++ generic API (with LDAP and R-GMA backends)
  • User-friendly querying tools
• Generic framework for job submission
  • Intensively used by GEANT4
• Many others …
• Several functionalities provided by the tools have been integrated into the middleware
  • See the g-peek functionality
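The estimate of the normalized time left to a job could be sketched roughly as below. This is only an illustration, not the EIS tool itself: it assumes that the batch queue limits are expressed in normalized seconds, that raw seconds on a worker node are scaled by the ratio of the node's benchmark rating to a reference rating, and every name in it (`TimeBudget`, `normalized_time_left`, the rating values) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeBudget:
    """Queue limits and consumption for one job (hypothetical structure)."""
    cpu_limit_norm_s: float   # queue CPU limit in normalized seconds (assumed)
    wall_limit_norm_s: float  # queue wall-clock limit in normalized seconds (assumed)
    cpu_used_raw_s: float     # CPU seconds consumed so far on this node
    wall_used_raw_s: float    # wall-clock seconds elapsed so far on this node


def normalization_factor(node_rating: float, reference_rating: float) -> float:
    """Factor converting raw seconds on this node into reference seconds."""
    if node_rating <= 0 or reference_rating <= 0:
        raise ValueError("benchmark ratings must be positive")
    return node_rating / reference_rating


def normalized_time_left(budget: TimeBudget, node_rating: float,
                         reference_rating: float) -> tuple:
    """Return (cpu_left, wall_left) in normalized seconds, clamped at zero."""
    factor = normalization_factor(node_rating, reference_rating)
    cpu_left = budget.cpu_limit_norm_s - budget.cpu_used_raw_s * factor
    wall_left = budget.wall_limit_norm_s - budget.wall_used_raw_s * factor
    return max(cpu_left, 0.0), max(wall_left, 0.0)


# Illustrative values only: a node rated twice the reference machine.
budget = TimeBudget(cpu_limit_norm_s=10000, wall_limit_norm_s=20000,
                    cpu_used_raw_s=1000, wall_used_raw_s=2000)
cpu_left, wall_left = normalized_time_left(budget, node_rating=2000,
                                           reference_rating=1000)
```

A job could call such a function periodically and stop cleanly (for example, by saving its output) once either value drops below a safety margin.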
Monitoring Tools
• ATLAS SC3 Service Monitor
• LHCb-specific Site Functional Tests
Experiment Software Installation
[Diagram; labels: Lcg-asis, UI, Lcg-ManageSoftware, WN, Lcg-ManageVOTag, Tank&Spark, gssklog, CE]
VO-BOX
• First prototype developed and packaged by EIS.
  • Evaluation of the Globus GSI-enabled ssh server
    • and its related configuration
  • Development of an ad hoc proxy renewal server
    • with the related user-level tool
  • Overall configuration of the node type
    • Inclusion of UI clients and gssklog
• Following up installation issues and further discussions on possible evolution
EIS on ALICE
• EIS for Data Challenges 04 and 05
• Offered support for the integration of the ALICE framework with LCG services
  • Integration with existing LCG services
  • Development of new tools
• Follow-up of the production exercise
  • Provided solutions for site-specific problems
  • Follow-up of service deployment at the sites
• Collected ALICE requirements for middleware developers
EIS on ALICE
• Development of ALICE-specific user-level tools
  • Integration of the MonALISA monitoring system with LCG
    • The tools were later generalized for other use cases
  • FTS transfer handling client
    • Then integrated into the ALICE framework
  • Publication of VO-specific services in the Information System
    • Included as part of the VO-BOX middleware component
Some Results of the last PDC04
• Statistics after phase 1 (ended April 4, 2004):
  • ALICE::CERN::LCG is the interface to LCG-2
  • ALICE::Torino::LCG is the interface to GRID.IT
• ~1.3 million files, 26 TB data volume
(S. Bagnasco, SC3 Detailed Planning Workshop, CERN, 13 June 05)
EIS in ATLAS
• Support in the development of the ATLAS framework
  • Data Management
  • Workload Management
• Operational support
  • Exclusion of problematic sites
  • Follow-up of site configuration problems
  • Understanding of failures and suggestion of solutions
[Plot: number of jobs per day, covering Data Challenge 2 and the Rome Production. Captions: “EIS support activities”; “Large event production for Physics Rome workshop”]
Rome Production experience on LCG
• Jobs were distributed to 45 different computing resources.
• The share of jobs per site is generally proportional to the size of the cluster, which indicates an overall good job distribution.
• No single site ran a large majority of the jobs. The site with the largest number of CPU resources (CERN) contributed about 11% of the ATLAS production. Other major sites ran between 5% and 8% of the jobs each.
• This is an achievement toward a more robust and fault-tolerant system that does not rely on a small number of large computing centers.
[Plot: the percentage of ATLAS jobs run at each LCG site]
EIS in ATLAS
• Service Challenge 3
  • Support to the ATLAS Data Management System
    • File Transfer Service (FTS) and LCG File Catalog (LFC)
  • Prototype Data Location Interface (DLI) developed
    • ATLAS WMS and DDM integration
  • Role in the technical coordination of the ATLAS Service Challenge activities
    • ensuring the readiness of the sites before and during the exercise
    • following up issues with the different services
• Testing
  • Several new gLite components (WMS, gpbox, FTS …)
  • In the context of the task force and in collaboration with ARDA
• User Support
  • Analysis on LCG-produced data
EIS in CMS
• LFC evaluation as a POOL file catalog
  • use case: local file catalog
  • performance tests
  • Results: LFC and POOL_LFC interface issues discovered and fixed
• LFC evaluation as a Data Location System
  • implementation of a Python API
  • performance tests
  • Results: LFC was found to be a valid implementation of a DLS; performance issues discovered and fixed
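To make the Data Location System idea concrete: a DLS answers the question "which storage elements hold a copy of this block of data?". The sketch below is a toy in-memory stand-in that shows the kind of operations such a Python API might expose. It does not talk to the LFC and is not the API developed for CMS; the class name, method names and host names are all hypothetical.

```python
from collections import defaultdict


class InMemoryDLS:
    """Toy Data Location System: maps a data block name to storage elements."""

    def __init__(self):
        # block name -> set of storage element host names
        self._locations = defaultdict(set)

    def add_replica(self, block, storage_element):
        """Record that a storage element holds a replica of the block."""
        self._locations[block].add(storage_element)

    def remove_replica(self, block, storage_element):
        """Forget a replica; drop the block entry once no replica is left."""
        self._locations[block].discard(storage_element)
        if not self._locations[block]:
            del self._locations[block]

    def locate(self, block):
        """Return the sorted list of storage elements holding the block."""
        # .get avoids creating an empty entry for unknown blocks
        return sorted(self._locations.get(block, ()))

    def blocks_at(self, storage_element):
        """Return the sorted list of blocks with a replica at the given site."""
        return sorted(block for block, ses in self._locations.items()
                      if storage_element in ses)


# Hypothetical block and host names, for illustration only.
dls = InMemoryDLS()
dls.add_replica("blockA", "se1.example.org")
dls.add_replica("blockA", "se2.example.org")
dls.add_replica("blockB", "se1.example.org")
print(dls.locate("blockA"))
```

A workload management system can use `locate` to send jobs to sites that already host the input data, which is the role the DLS plays in the evaluation above.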
EIS in CMS
• Service Challenge 3
  • fake analysis job submission
  • analysis of job failures and related statistics
  • Results: much better understanding of the stability of the LCG infrastructure under intensive use
• Support
  • active in solving Grid-related problems for the MC production and user analysis (CRAB) activities
  • CMS VO management
The CMS Analysis Jobs
[Plot taken from the CMS Dashboard (ARDA)]
EIS in LHCb
• EIS supported LHCb in many activities:
  • Data Challenge 04
  • Service Challenge 3
  • Analysis exercise
• Operation support
  • chasing and tackling site- and middleware-related problems
  • developing experiment-specific monitoring tools
    • T1-T1 transfer monitor for SC3
    • VO-oriented plug-ins for SFT
EIS in LHCb
• Integration of the LHCb framework and LCG middleware
  • Offering suggestions for optimized middleware usage
• Development of user-level tools
  • Querying the Information System, interactions with SRM, LFC, DLI
  • Repackaged or customized versions of existing tools
    • lcg_utils and GFAL
• User Support
  • Especially for analysis users
  • Using the GGUS portal
• Testing of new components
  • CREAM CE, g-pbox, WMS …
The LHCb Data Challenge
[Plot: number of jobs run versus time, for jobs run in LCG and DIRAC-only sites]
• 187 M produced events; Phase 1 completed
• DIRAC alone: 1.8 × 10^6/day
• LCG in action: 3-5 × 10^6/day
• Plot annotations: LCG paused, LCG restarted
• 61% efficiency for LCG
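As a quick check of what these rates imply, the snippet below compares the daily rate with LCG in action against DIRAC alone. It takes the quoted figures as 1.8 × 10^6 and 3-5 × 10^6 per day and assumes that both count the same quantity (presumably produced events); the function name is just illustrative.

```python
# Rates quoted on the slide, taken as units per day (assumed: produced events).
DIRAC_ALONE_PER_DAY = 1.8e6
WITH_LCG_PER_DAY = (3e6, 5e6)  # lower and upper end of the quoted range


def speedup_range(baseline, boosted_range):
    """Return the (low, high) ratio of the boosted rates to the baseline."""
    low, high = boosted_range
    return low / baseline, high / baseline


low, high = speedup_range(DIRAC_ALONE_PER_DAY, WITH_LCG_PER_DAY)
print(f"Daily rate with LCG: {low:.1f}x to {high:.1f}x the DIRAC-only rate")
# prints "Daily rate with LCG: 1.7x to 2.8x the DIRAC-only rate"
```

Under those assumptions, adding LCG resources roughly doubled the daily rate, and nearly tripled it at the upper end of the range.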
Support to Biomedical community and the WISDOM project
• First non-HEP VO supported by EIS
  • Different needs, access patterns, user scenarios
  • Scattered and heterogeneous community
• Main support activities for Biomed:
  • Improvement of job submission strategy
  • Adaptation of applications to the Grid environment
  • Operational support
  • User support
• Biomedical Data Challenge in July - August 2005
  • ~70000 jobs run
  • 1 TB of data produced
  • equivalent of ~70 CPU years computed
• WISDOM: research on malaria medical care
  • Major success in EGEE
  • 1 million potential medicines tested in 1 week
  • 1000 CPUs employed in EGEE/LCG
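The figures above can be turned into per-job and per-compound averages with a little arithmetic. The sketch below uses only the numbers quoted on the slide; the per-compound value additionally assumes that all 1000 CPUs were busy for the whole week, so it is an upper bound rather than a measurement. Function names are illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24      # 8766 hours
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 seconds


def mean_cpu_hours_per_job(cpu_years, jobs):
    """Average CPU hours per job, given the total CPU years and job count."""
    return cpu_years * HOURS_PER_YEAR / jobs


def max_cpu_seconds_per_item(items, cpus, wall_seconds):
    """Upper bound on CPU seconds per item if every CPU is busy throughout."""
    return cpus * wall_seconds / items


# Biomedical Data Challenge: ~70 CPU years over ~70000 jobs.
hours_per_job = mean_cpu_hours_per_job(cpu_years=70, jobs=70000)

# WISDOM: 1 million compounds in 1 week on 1000 CPUs.
seconds_per_compound = max_cpu_seconds_per_item(
    items=1_000_000, cpus=1000, wall_seconds=SECONDS_PER_WEEK)

print(f"{hours_per_job:.1f} CPU hours per job on average")
print(f"at most {seconds_per_compound:.0f} CPU seconds per compound")
```

This gives roughly 8.8 CPU hours per job for the Data Challenge, and at most about 10 CPU minutes per tested compound for WISDOM.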
GEANT4
• GEANT4: simulation of particle interactions with matter.
  • HEP and nuclear experiments, medical, accelerator, space physics
• 3 major productions on LCG
  • First 2 hosted by dteam and alice, third as a real VO
  • Aimed at testing new versions of the software
• EIS support in the GEANT “Gridification” process
  • Development of tools for job submission and handling
    • Then extended and generalized for other VOs
  • Creation and administration of the GEANT VO
  • Contact point for the EGEE ROC managers
  • Operational support during production
Relief Projects of UNOSAT
• Case Study: Indian Ocean Tsunami Relief and Development
  • 29th Dec 2004: first map distributed online to field users
  • January 2005: 200,000 tsunami maps downloaded in total
• UNOSAT has a huge amount of data to be stored
  • Good amount of storage provided by CERN
  • Running and storing data in LCG/EGEE can certainly assist UNOSAT in their purposes
• The collaboration with LCG started in Summer 2005
  • Gridification process similar to the GEANT4 experience
Summary
• EIS helps integrate VO-specific software environments with the Grid middleware
  • Direct experiment support via a contact person
  • Special middleware distributions
  • Documentation
  • User support
• Data Challenges, Service Challenges and Distributed Productions
  • Follow-up of operational issues
  • maintaining experiment-specific services
  • assisting sites with configuration problems
  • No longer “sporadic” exercises.
• Overall a very interesting and productive experience
  • LHC experiments and other VOs seem to find the EIS team very supportive
Our mailing list: support-eis@cern.ch
Our WEB site: http://lcg.web.cern.ch/LCG/eis.htm