
ARDA status and plans



  1. ARDA status and plans Massimo Lamanna / CERN

  2. Overview • ARDA in a nutshell • ARDA prototypes • 4 experiments • ARDA feedback • Middleware components on the development test bed • ARDA workshops • ARDA personnel • Outlook and conclusions

  3. The ARDA project • ARDA is an LCG project • Its main activity is to enable LHC analysis on the grid • ARDA is contributing to EGEE NA4 • Uses the entire CERN NA4-HEP resource • Interface with the new EGEE middleware (gLite) • By construction, use the new middleware • Use the grid software as it matures • Verify the components in an analysis environment • Contributions to the experiments' frameworks (discussion, direct contribution, benchmarking, …) • Users needed here, namely physicists needing distributed computing to perform their analyses • Provide early and continuous feedback

  4. ARDA prototype overview

  5. ARDA contributions • Integration of the LHCb environment with gLite • Enabling job submission through GANGA to gLite (sketched below) • Job splitting and merging • Result retrieval • Enabling real analysis jobs to run on gLite • Running DaVinci jobs on gLite (custom code: user algorithms) • Installation of LHCb software using the gLite package manager • Participating in the overall development of Ganga • Software process (initially) • CVS, Savannah, release management • Major contributions in new versions • Python “command-line” interface, “Ganga clients”
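For illustration, here is a minimal sketch of the submit/split/merge pattern that Ganga's Python interface exposes. The class and attribute names below are toy approximations for this document, not the actual Ganga 4 GPI.

```python
# Illustrative sketch of Ganga-style job submission to a gLite back-end.
# Class and attribute names approximate the Ganga Python interface;
# they are NOT the actual Ganga 4 API.

class Job:
    """Toy stand-in for a Ganga job object."""
    def __init__(self, application, backend, splitter=None, merger=None):
        self.application = application   # e.g. a DaVinci configuration
        self.backend = backend           # e.g. 'gLite', 'DIRAC', 'Condor'
        self.splitter = splitter         # rule producing sub-jobs
        self.merger = merger             # rule combining sub-job output
        self.subjobs = []

    def submit(self):
        # In Ganga, a back-end handler translates the job into the
        # middleware-specific submission (JDL for gLite).
        self.subjobs = self.splitter(self) if self.splitter else [self]
        for sub in self.subjobs:
            print(f"submitting to {self.backend}: {sub.application}")

def split_by_files(files, per_job):
    """Return a splitter assigning `per_job` input files to each sub-job."""
    def splitter(job):
        return [Job(f"{job.application} [files {i}:{i + per_job}]", job.backend)
                for i in range(0, len(files), per_job)]
    return splitter

j = Job(application="DaVinci v12", backend="gLite",
        splitter=split_by_files([f"evt_{n}.dst" for n in range(100)], 25))
j.submit()   # four sub-jobs; a merger would combine their output on retrieval
```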

  6. Current Status • GANGA job submission handler for gLite is developed • A DaVinci job runs on gLite, submitted through GANGA • Presented at the LHCb software week • Demos in Rio and Den Haag

  7. Ganga4 • Major version • Important contribution from the ARDA team • Interesting concepts • Note: GANGA is a joint ATLAS-LHCb project • Contacts with CMS (exchange of ideas, code snippets, …)

  8. LHCb user: S. K. Paterson (LHCb), Glasgow Univ.

  9. Related activities • GANGA-DIRAC (LHCb production system) • Convergence with GANGA/components/experience • Submitting jobs to DIRAC using GANGA • GANGA-Condor • Enabling submission of jobs through GANGA to Condor • LHCb metadata catalogue performance tests • In collaboration with colleagues from Taiwan • New activity started using the ARDA metadata prototype (new version, collaboration with GridPP/LHCb people) Wei-Long Ueng, ASCC

  10. Ganga clients Wei-Long Ueng, ASCC

  11. ALICE prototype: ROOT and PROOF • ALICE provides • the UI • the analysis application (AliROOT) • gLite provides all the rest • ARDA/ALICE is evolving the ALICE analysis system [Diagram: end-to-end chain from the UI shell through the middleware to the application]

  12. [Diagram: a user session connects to a PROOF master server, which drives PROOF slaves at Sites A, B and C] Demo based on a hybrid system using the 2004 prototype

  13. Interactive Session • Demo at Supercomputing 04 and in Den Haag • Demo in the ALICE software week

  14. Current Status • Developed the gLite C++ API and API Service • providing a generic interface to any grid service • The C++ API is integrated into ROOT • In the ROOT CVS • Job submission and job status query for batch analysis can be done from inside ROOT • A bash interface for gLite commands with catalogue expansion has been developed • More powerful than the original shell • In use in ALICE • Considered a “generic” middleware contribution (essential for ALICE, interesting in general) • First version of the interactive analysis prototype ready • The batch analysis model is improved • Submission and status query are integrated into ROOT • Job splitting based on XML query files (sketched below) • The application (AliRoot) reads files using xrootd without pre-staging
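The splitting step can be pictured with a short sketch: a query file lists logical file names, and each chunk becomes one sub-job. The XML layout below is invented for illustration; the real gLite/AliRoot schema differs.

```python
# Minimal sketch of splitting a batch analysis job from an XML query file,
# in the spirit of the ALICE/ARDA batch model above. The XML schema here
# is hypothetical.
import xml.etree.ElementTree as ET

QUERY = """
<query>
  <file lfn="/alice/run1/evt001.root"/>
  <file lfn="/alice/run1/evt002.root"/>
  <file lfn="/alice/run1/evt003.root"/>
  <file lfn="/alice/run1/evt004.root"/>
</query>
"""

def split(query_xml: str, files_per_subjob: int):
    lfns = [f.get("lfn") for f in ET.fromstring(query_xml).findall("file")]
    # Each chunk of logical file names becomes one sub-job; the worker
    # node would then read the files via xrootd without pre-staging.
    return [lfns[i:i + files_per_subjob]
            for i in range(0, len(lfns), files_per_subjob)]

for n, chunk in enumerate(split(QUERY, 2)):
    print(f"sub-job {n}: {chunk}")
```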

  15. ARDA shell + C/C++ API • A C++ access library for gLite has been developed by ARDA • High performance • Protocol quite proprietary... • Essential for the ALICE prototype • Generic enough for general use • Using this API, grid commands have been added seamlessly to the standard shell (with catalogue expansion, illustrated below)
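"Catalogue expansion" means that wildcard patterns in grid commands are expanded against the file catalogue the way a shell expands globs against the local filesystem. A minimal sketch of the idea, with a mock catalogue and a hypothetical command name:

```python
# Sketch of catalogue expansion for grid commands. The catalogue content
# and the "grid-ls" command are hypothetical.
import fnmatch

CATALOGUE = [
    "/grid/alice/run1/evt001.root",
    "/grid/alice/run1/evt002.root",
    "/grid/alice/run2/evt001.root",
]

def expand(pattern: str):
    """Expand a wildcard pattern against the (mock) file catalogue."""
    matches = fnmatch.filter(CATALOGUE, pattern)
    return matches or [pattern]     # unmatched patterns pass through, as in bash

def run(command_line: str):
    cmd, *args = command_line.split()
    expanded = [m for a in args for m in expand(a)]
    print(cmd, *expanded)           # a real shell would exec the gLite command

run("grid-ls /grid/alice/run1/*.root")
```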

  16. ATLAS/ARDA • Main component: • Contribute to the DIAL evolution • gLite analysis server • “Embedded in the experiment” • AMI tests and interaction • Production and CTB tools • Job submission (ATHENA jobs) • Integration of the gLite Data Management within Don Quijote • Benefit from the other experiments' prototypes • First look at interactivity/resiliency issues • “Agent-based” approach (à la DIRAC) • GANGA (principal component of the LHCb prototype, key component of the overall ATLAS strategy) ADA meeting Tao-Sheng Chen, ASCC

  17. Data Management: Don Quijote • Locate and move data over grid boundaries • ARDA has connected gLite [Diagram: the DQ client talks to DQ servers on GRID3, NorduGrid, LCG and gLite, each fronting an RLS catalogue and storage elements] ADA meeting

  18. ATCOM @ CTB • Combined Testbeam • Various extensions were made to accommodate the new database schema used for CTB data analysis. • New panes to edit transformations, datasets and partitions were implemented. • Production System • A first step is to provide a prototype with limited functionality, but support for the new production system. ADA meeting

  19. Combined Test Beam • Real data processed at gLite • Standard Athena for testbeam • Data from CASTOR • Processed on a gLite worker node • Example: ATLAS TRT data analysis done by PNPI St. Petersburg [Plot: number of straw hits per layer]

  20. ATLAS: first look at interactivity matters • ADA meeting • Using DIANE

  21. CMS Prototype • Aims at an end-to-end prototype for CMS analysis jobs on gLite • Uses the native middleware functionality of gLite • CMS-specific code only for a few tasks on top of the middleware [Diagram: a workflow planner with a gLite back-end and command-line UI; RefDB supplies the dataset and owner name defining a CMS data collection and points to the corresponding PubDB, where the POOL catalogue for that collection is published; the planner registers the required information (POOL catalogue and a set of COBRA META files) in the gLite catalogue, creates and submits jobs to gLite, queries their status and retrieves output]

  22. ARDA-CMS • CMS prototype (ASAP = Arda Support for cms Analysis Processing) • The first version of the CMS analysis prototype, capable of creating, submitting and monitoring CMS analysis jobs on the gLite middleware, was developed by the end of 2004 • It was demonstrated at the CMS week in December 2004 • The prototype evolved to support both RB versions deployed at the CERN testbed (prototype task queue and gLite 1.0 WMS) • Currently submission to both RBs is available and completely transparent for the users (same configuration file, same functionality; see the sketch below) • Plan to implement a gLite job submission handler for CRAB • Users? • Starting from February 2005, CMS users began working on the testbed, submitting jobs through ASAP • Positive feedback; suggestions from the users are implemented asap • Plan to involve more users as soon as the pre-production farm is available • Plan to try in the prototype new functionality provided by the WMS (DAGs, interactive jobs for testing purposes)
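Transparent support for two resource brokers boils down to a dispatch table keyed on one configuration value, so the user-facing file never changes. A minimal sketch, assuming invented configuration keys ("taskqueue", "wms") rather than ASAP's actual ones:

```python
# Sketch of broker-transparent submission: the same configuration file
# drives either back-end; only an internal handler choice differs.
import configparser, io

CONFIG = """
[submission]
broker = wms          ; or: taskqueue
executable = analyse.sh
"""

def submit_taskqueue(job):
    print("queued on prototype task queue:", job)

def submit_wms(job):
    print("submitted to gLite 1.0 WMS:", job)

HANDLERS = {"taskqueue": submit_taskqueue, "wms": submit_wms}

cfg = configparser.ConfigParser(inline_comment_prefixes=";")
cfg.read_file(io.StringIO(CONFIG))
section = cfg["submission"]
HANDLERS[section["broker"]](section["executable"])   # user sees no difference
```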

  23. Connections to Other Projects • Compatibility with Clarens and PhySh • The analysis prototype is Python-based and uses XML-RPC calls for client-server interaction, like Clarens and PhySh • In addition, to enable future integration, the analysis prototype has a CVS repository structured similarly to the PhySh project
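XML-RPC client-server interaction of this kind needs nothing beyond the Python standard library, which is one reason it suits all three Python-based tools. A self-contained sketch (the `task_status` method name is invented for illustration):

```python
# Minimal XML-RPC service/client pair in the style described above,
# using only the Python standard library.
from xmlrpc.server import SimpleXMLRPCServer
import threading, xmlrpc.client

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(lambda name: f"status of task {name}: running",
                         "task_status")
threading.Thread(target=server.serve_forever, daemon=True).start()

client = xmlrpc.client.ServerProxy("http://localhost:8000")
print(client.task_status("higgs-analysis-42"))   # status string over XML-RPC
server.shutdown()
```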

  24. Connections to Other Projects • RefDB re-design and PubDB • Taking part in the RefDB redesign • Developing the schema for PubDB and supervising development of the first PubDB version • Analysis prototype connected to MonAlisa • Tracking the progress of an analysis task is troublesome when the task is split into several (hundreds of) sub-jobs • The analysis prototype associates each sub-job with a built-in ‘identity’ and the capability to report its progress to the MonAlisa system • The MonAlisa service receives and combines the progress reports of single sub-jobs and publishes the overall progress of the whole task (see the sketch below)
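The aggregation logic is simple once every sub-job carries an identity: keep the latest per-sub-job count, sum over the task. A minimal sketch of that idea (the report format and names are hypothetical, not MonAlisa's actual interface):

```python
# Sketch of combining per-sub-job progress reports into overall task
# progress, as the MonAlisa service does for ASAP sub-jobs.
from collections import defaultdict

class ProgressAggregator:
    def __init__(self, total_events: int):
        self.total = total_events
        self.processed = defaultdict(int)   # (task, sub-job id) -> events done

    def report(self, task: str, subjob: str, events_done: int):
        # Keep only the latest report per sub-job identity.
        self.processed[(task, subjob)] = events_done

    def task_progress(self, task: str) -> float:
        done = sum(n for (t, _), n in self.processed.items() if t == task)
        return 100.0 * done / self.total

mon = ProgressAggregator(total_events=10_000)
mon.report("h2tau", "subjob-001", 2_500)
mon.report("h2tau", "subjob-002", 1_500)
print(f"h2tau: {mon.task_progress('h2tau'):.0f}% complete")   # -> 40%
```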

  25. CMS - Using MonAlisa for user job monitoring • A single job is submitted to gLite • The JDL contains job-splitting instructions • The master job is split by gLite into sub-jobs • Dynamic monitoring of the total number of events processed by all sub-jobs belonging to the same master job • Demo at Supercomputing 04

  26. Accessing analysis samples • Digi @ CERN • ARDA with WMS as resource broker • gLite as grid middleware • DST @ CNAF • CRAB for job splitting and monitoring; UI @ Pisa and CERN • LCG as grid middleware • Data on disk @ CNAF • Disk problems made the data unavailable for a long time • Several problems were caused by the catalogues • On DST (and Digi) @ CNAF the program ran well • Statistics for the QCD background are still lacking A. Nikitenko (CMS), convener of the CMS Higgs group

  27. H→2τ→2j analysis: background data available (all signal events processed with ARDA) A. Nikitenko (CMS)

  28. Higgs boson mass (Mττ) reconstruction • The Higgs boson mass was reconstructed after basic off-line cuts: reconstructed E_T(τ jet) > 60 GeV, E_T(miss) > 40 GeV • The Mττ evaluation is shown for the consecutive cuts p_τ > 0 GeV/c, p_ν > 0 GeV/c, Δφ(j1,j2) < 175° • σ(M_H) ∝ σ(E_T(miss)) / sin(Δφ(j1,j2)) • Mττ and σ(Mττ) are in very good agreement with the old results (CMS Note 2001/040, Table 3: Mττ = 455 GeV/c², σ(Mττ) = 77 GeV/c²; ORCA4, Spring 2000 production) A. Nikitenko (CMS)

  29. CMS: A→2τ→2j event at low luminosity A. Nikitenko (CMS)

  30. Prototype Deployment • 2004: • Prototype available (CERN + Madison, Wisconsin) • A lot of activity (4 experiment prototypes) • Main limitation: size • Experiment data available! • Just a handful of worker nodes • 2005: • Coherent move to prepare a gLite package to be deployed on the pre-production service • ARDA contribution: • Mentoring and tutorials • Actual tests! • A lot of testing during 05Q1 • The pre-production service is about to start! Access granted on May 18th!

  31. Workload Management System (WMS) • Last-day monitor • “Hello World!” jobs • 1 per minute • Logging & Bookkeeping info on the web to help the developers [Plots: last day, last week] Hurng-Chun Lee, ASCC
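The monitor is essentially a probe loop: submit a trivial job at a fixed interval and record the outcome. A minimal sketch of that pattern; the submission call below is a placeholder, not the actual gLite CLI invocation the monitor uses:

```python
# Sketch of a "Hello World!" probe monitor: one trivial submission per
# interval, each outcome timestamped for the web report.
import subprocess, time, datetime

def submit_probe() -> bool:
    # Placeholder for the real gLite submission command.
    result = subprocess.run(["echo", "Hello World!"], capture_output=True)
    return result.returncode == 0

def monitor(interval_s: int = 60, rounds: int = 3):
    for _ in range(rounds):
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        print(stamp, "OK" if submit_probe() else "FAILED")
        time.sleep(interval_s)

monitor(interval_s=1)   # 1 s here for the demo; the real monitor ran 1/minute
```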

  32. WMS monitor Hurng-Chun Lee, ASCC

  33. Certification activity • Certification activity • Performed by the operation team • Using tests from other sources • Re-using tests (developed by the operations group) which proved effective in pinning down problems in LCG2 • LCG2 → gLite • A lot of effort from ARDA (mainly Hurng-Chun Lee, ASCC): • Several “storm” tests migrated • Helping other people get up to full speed on this

  34. Data Management • Central component together with the WMS • Early tests started in 2004 • Two main components: • gLiteIO (protocol + server to access the data) • FiReMan (file catalogue) • The two components are not isolated: for example, gLiteIO uses the ACLs as recorded in FiReMan, and FiReMan exposes the physical location of files for the WMS to optimise job submission • Both LFC and FiReMan offer large improvements over RLS • LFC is the most recent LCG2 catalogue • Still some issues remaining: • Scalability of FiReMan • Bulk entry missing for LFC (why it matters is sketched below) • More work needed to understand performance and bottlenecks • Need to test some real use cases • In general, the validation of DM tools takes time!
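Why bulk entry matters: with a per-request overhead, one bulk call amortises the round trip across many entries. A toy timing sketch, assuming a mock catalogue with an invented 5 ms per-request latency (the order of magnitude seen in the FiReMan tests below):

```python
# Toy comparison of single vs. bulk catalogue inserts. The catalogue
# client is a mock; ROUND_TRIP is an assumed latency, not a measurement.
import time

ROUND_TRIP = 0.005   # assumed 5 ms per request

def insert_single(entries):
    for _ in entries:
        time.sleep(ROUND_TRIP)          # one round trip per entry

def insert_bulk(entries):
    time.sleep(ROUND_TRIP)              # one round trip for the whole batch

entries = [f"lfn:/grid/file{i}" for i in range(100)]
for fn in (insert_single, insert_bulk):
    t0 = time.perf_counter()
    fn(entries)
    print(f"{fn.__name__}: {time.perf_counter() - t0:.3f} s "
          f"for {len(entries)} entries")
```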

  35. FiReMan Performance - Inserts • Inserted ~1M entries in bulk with an insert time of ~5 ms • Insert rate measured for different bulk sizes (single, and bulk 1, 10, 100, 500, 1000, 5000) [Plot: inserts per second (0-350) vs. number of threads (1-50)]

  36. FiReMan Performance - Queries • Query rate for an LFN, for single and bulk queries (bulk 1, 10, 100, 500, 1000, 5000) [Plot: entries returned per second (0-1200) vs. number of threads (5-50)]

  37. FiReMan Performance - Queries • Comparison with LFC [Plot: entries returned per second (0-1200) vs. number of threads (1-100) for FiReMan single entry, FiReMan bulk 100, and LFC]

  38. Summary of gLite usage and testing • Info available also under http://lcg.web.cern.ch/lcg/PEB/arda/LCG_ARDA_Glite.htm • gLite version 1 • WMS • Continuous monitor available on the web (active since the 17th of February) • Concurrency tests • Usage with ATLAS and CMS jobs (using the Storage Index) • Good improvements observed • DMS (FiReMan + gLiteIO) • Early usage and feedback (starting November 2004) on functionality, performance and usability • Considerable improvement in performance/stability observed during the last months • Some of the tests given to the development team for tuning; most of the tests given to JRA1 to be used in the testing suite • Performance/stability measurements: heavy-duty testing needed for real validation • Contribution to the common testing effort to finalise gLite 1 (with SA1, JRA1 and NA4-testing) • Migration of certification tests into the certification test suite (LCG → gLite) • Comparison between LFC (LCG) and FiReMan • Mini tutorial to facilitate the usage of gLite within the NA4 testing

  39. Metadata • gLite has provided a prototype interface and implementation, mainly for the Biomed community • The requirements in ARDA (HEP) were not all satisfied by that early version • ARDA preparatory work • Stress testing of the existing experiment metadata catalogues • The existing implementations were shown to share similar problems • ARDA technology investigation • On the other hand, the usage of extended file attributes in modern systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analysed: a sound POSIX standard exists! (see the example below) • Prototype activity in ARDA • Discussion in LCG, EGEE and the UK GridPP Metadata group • Synthesis: • A new interface which will be maintained by EGEE, benefiting from the activity in ARDA (tests and benchmarking of different databases and direct collaboration with LHCb/GridPP)
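The POSIX extended-attribute interface referred to above can be exercised directly from Python on Linux. A small example; the attribute names are invented, and the filesystem holding the file must support user xattrs (ext2/3, ReiserFS, JFS and XFS all do):

```python
# Attaching and reading per-file metadata via POSIX extended attributes.
# Linux only; user attributes live in the "user." namespace. Run on a
# filesystem with user xattrs enabled (hence dir=".", not /tmp/tmpfs).
import os, tempfile

with tempfile.NamedTemporaryFile(dir=".") as f:
    os.setxattr(f.name, "user.lfn", b"/grid/cms/higgs/evt001.root")
    os.setxattr(f.name, "user.owner", b"nikitenko")

    for attr in os.listxattr(f.name):                 # enumerate metadata keys
        print(attr, "=", os.getxattr(f.name, attr).decode())
```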

  40. ARDA metadata prototype: performance • The prototype is very useful: • Investigating technology issues (Web Services optimisation) • A usable system for experiments • LHCb is using the system… [Plots: local tests, CERN-TAIWAN tests] Tao-Sheng Chen and Meng-Hang Ho, ASCC

  41. ARDA workshops and related activities • ARDA workshop (January 2004 at CERN; open) • ARDA workshop (June 21-23 at CERN; by invitation) • “The first 30 days of EGEE middleware” • NA4 meeting (15 July 2004 in Catania; EGEE open event) • ARDA workshop (October 20-22 at CERN; open) • “LCG ARDA Prototypes” • NA4 meeting, 24 November (EGEE conference in Den Haag) • ARDA workshop (March 7-8 2005 at CERN; open) • Wednesday afternoon meetings started in 2005: • Presentations from experts and discussion (not necessarily from ARDA people) Available from http://arda.cern.ch

  42. People • ARDA team: Massimo Lamanna, Birger Koblitz, Andrey Demichev, Viktor Pose, Victor Galaktionov, Derek Feichtinger, Andreas Peters, Hurng-Chun Lee, Dietrich Liko, Frederik Orellana, Tao-Sheng Chen, Julia Andreeva, Juha Herrala, Alex Berejnoi, Andrew Maier, Kuba Moscicki, Wei-Long Ueng • Frank Harris (EGEE NA4) • 2 PhD students: Craig Munro (Brunel Univ.; distributed analysis within CMS), Nuno Santos (Coimbra Univ.; metadata and resilient computing) • Catalin Cirstoiu and Slawomir Biegluk (LCG visitors) • Experiment interfaces: Piergiorgio Cerello (ALICE), David Adams (ATLAS), Lucia Silvestris (CMS), Ulrik Egede (LHCb) • Key contributions from Taiwan and Russia • Plus consultancy from Meng-Hang Ho and his team

  43. Conclusions and outlook • ARDA has been set up to enable distributed HEP analysis on gLite • Contacts have been established • With the experiments • With the middleware • Experiment activities are progressing rapidly • Prototypes for LHCb, ALICE, ATLAS & CMS are on the way • Complementary aspects are studied • Good interaction with the experiments' environments • Always seeking users! (more interested in physics than in middleware… we support them!) • 2005 will be the key year (gLite version 1 is becoming available on the pre-production service) • ARDA is providing early feedback to the development team • First use of components • Trying to run real-life HEP applications • Following the development on the prototype and contributing to the preparation of the pre-production service • Some of the experiment-related ARDA activities could be of general use • Shell access (originally in ALICE/ARDA) • Metadata catalogue (under test in LHCb/ARDA) • (Pseudo-)interactivity is an interesting issue (something in/from all experiments)
