The ARDA Project
Between Grid middleware and LHC experiments
Juha Herrala, ARDA Project, Juha.Herrala@cern.ch
GridPP 10th Collaboration Meeting, CERN, 04 June 2004
http://cern.ch/arda | www.eu-egee.org | cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833
Contents
• Introduction to the LCG ARDA Project
  • History and mandate
  • Relation to the EGEE project
  • ARDA prototypes
  • Relation to Regional Centres
• Towards ARDA prototypes
  • LHCb, ATLAS, ALICE, CMS
• Coordination and forum activities
  • Workshops and meetings
• Conclusions and Outlook
How ARDA evolved
• The LHC Computing Grid (LCG) project's Requirements and Technical Assessment Group (RTAG) for distributed analysis presented its ARDA report in November 2003
  • ARDA = Architectural Roadmap for Distributed Analysis
  • Defined a set of collaborating Grid services and their interfaces
• As a result, the ARDA project was launched by LCG
  • ARDA = A Realisation of Distributed Analysis
  • Its purpose is to coordinate the different activities in the development of the LHC experiments' distributed analysis systems, which will be based on the new service-oriented Grid middleware and infrastructure
  • The generic Grid middleware itself, however, is developed by the EGEE project
• ARDA also sometimes became a synonym for this "second generation" Grid middleware, which was later (May 2004) renamed Glite
  • Generic = no significant functionality that is of interest only to HEP or to any other single science/community
Our starting point / mandate: recommendations of the ARDA working group
• New service decomposition
  • Strong influence of the AliEn system
    • the Grid system developed by the ALICE experiment and used by a wide scientific community (not only HEP)
  • Role of experience and existing technology…
• Web service framework
• Interfacing to existing middleware to enable its use in the experiment frameworks
• Early deployment of (a series of) end-to-end prototypes to ensure functionality and coherence
  • Middleware as a building block
  • Validation of the design
EGEE and LCG
[Timeline diagram: VDT/EDG, LCG-1, LCG-2, EGEE WS MW, EGEE-1, EGEE-2, with ARDA bridging the LCG and EGEE lines]
• LCG is strongly linked with the middleware developed/deployed in EGEE (a continuation of EDG)
  • The core infrastructure of the EGEE Grid operation service will grow out of the LCG service
  • LCG includes many US and Asian partners
  • EGEE includes other sciences
  • A substantial part of the infrastructure is common to both
• Parallel production lines
  • LCG-2 production Grid
    • 2004 data challenges
  • Pre-production prototype
    • EGEE/Glite MW
    • ARDA playground for the LHC experiments
ARDA prototypes
• Support the LHC experiments in implementing their end-to-end analysis prototypes based on the EGEE/Glite middleware
  • ARDA will support each of the LHC experiments equally
  • Close collaboration with the data analysis teams, ensuring end-to-end coherence of the prototypes
  • One prototype per experiment
• Role of ARDA
  • Interface with the EGEE middleware
  • Adapt/verify components of the experiments' analysis environments (robustness with many users, performance with concurrent "read" actions)
    • A Common Application Layer may emerge in the future
  • Feedback from the experiments to the middleware team
• Final target beyond the prototype activity: sustainable distributed analysis services for the four experiments, deployed at the LCG Regional Centres
ARDA @ Regional Centres
• Regional Centres have valuable practical experience and know-how
  • They understand "deployability" issues, which are a key factor for the success of the (EGEE/Glite) middleware
  • Database technologies
  • Web Services
• Some Regional Centres will have the responsibility of providing early installations of the middleware
  • EGEE Middleware test bed
  • Pilot sites might enlarge the available resources and give fundamental feedback on "deployability", complementing EGEE SA1
  • Running ARDA pilot installations
    • ARDA test bed for the analysis prototypes
    • Experiment data available where the experiment prototype is deployed
    • Stress and performance tests could ideally be located outside CERN
    • Experiment-specific components (e.g. a metadata catalogue) which might be used by the ARDA prototypes
  • Exploit the local know-how of the Regional Centres
• Final ARDA goal: a sustainable analysis service for the LHC experiments
ARDA project team
• Massimo Lamanna, Birger Koblitz, Derek Feichtinger, Andreas Peters, Dietrich Liko, Frederik Orellana, Julia Andreeva, Juha Herrala, Andrew Maier, Kuba Moscicki, Andrey Demichev, Viktor Pose, Wei-Long Ueng, Tao-Sheng Chen
  • (team members are organised per experiment: ALICE, ATLAS, CMS, LHCb, with contributions from Russia and Taiwan)
• Experiment interfaces
  • Piergiorgio Cerello (ALICE)
  • David Adams (ATLAS)
  • Lucia Silvestris (CMS)
  • Ulrik Egede (LHCb)
Towards ARDA prototypes
• Existing systems as the starting point
  • Every experiment has different implementations of the standard services
  • Used mainly in production environments
  • Now more emphasis on analysis
Prototype activity
• Provide fast feedback to the EGEE MW development team
  • Avoid uncoordinated evolution of the middleware
  • Coherence between users' expectations and the final product
• Experiments may benefit from the new MW as soon as possible
  • Frequent snapshots of the middleware available
  • Expose the experiments (and the community in charge of deployment) to the current evolution of the whole system
• Experiments' systems are very complex and still evolving
  • Move forward towards new-generation real systems (analysis!)
  • A lot of work (experience and useful software) is invested in the experiments' current data challenges, which makes them a concrete starting point
  • Whenever possible, adapt/complete/refactor the existing components: we do not need yet another system!
• Attract and involve users
  • Prototypes with realistic workload and conditions, so real users from the LHC experiments are required!
Prototype activity
• The initial prototype will have a reduced scope of functionality
  • Components are currently being selected for the first prototype
  • Not all use cases/operation modes will be supported
    • Every experiment has a production system (with multiple backends, such as PBS, LCG, G2003, NorduGrid, …)
    • We focus on end-user analysis on an EGEE-MW-based infrastructure
• Informal Use Cases are still being defined, e.g. a generic analysis case (sketched in code below):
  • A physicist selects a data sample (from the current Data Challenges)
  • With an example/template as a starting point, (s)he prepares a job to scan the data
  • The job is split into sub-jobs and dispatched to the Grid; some error recovery is performed automatically if necessary, and the results are finally merged back into a single output
  • The output (histograms, ntuples) is returned together with simple information on the job-end status
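The generic analysis case above amounts to a split/dispatch/merge loop with simple retry-based error recovery. The following is a minimal, self-contained Python sketch of that flow; the split, dispatch and merge functions are illustrative stand-ins and do not correspond to any specific middleware API.

```python
# Minimal sketch of the generic analysis use case: split a data sample into
# sub-jobs, run them with simple retry-based error recovery, merge the outputs.
# All functions are illustrative stand-ins, not a real Grid/middleware API.

def split(dataset, n_subjobs):
    """Divide the selected data sample into roughly equal slices."""
    size = max(1, len(dataset) // n_subjobs)
    return [dataset[i:i + size] for i in range(0, len(dataset), size)]

def dispatch(files):
    """Stand-in for submitting one sub-job to the Grid and collecting its output."""
    return {"events": 1000 * len(files)}          # e.g. a partial histogram/ntuple

def merge(partial_outputs):
    """Merge the sub-job outputs back into a single result."""
    return sum(p["events"] for p in partial_outputs)

def run_analysis(dataset, n_subjobs=4, max_retries=3):
    outputs = []
    for subjob in split(dataset, n_subjobs):
        for _attempt in range(max_retries):       # simple automatic error recovery
            try:
                outputs.append(dispatch(subjob))
                break
            except Exception:
                continue
        else:
            raise RuntimeError("sub-job failed after %d retries" % max_retries)
    return merge(outputs)

if __name__ == "__main__":
    sample = ["dc04_file_%03d.root" % i for i in range(10)]
    print("merged output:", run_analysis(sample))
```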
Towards ARDA prototypes
• LHCb - ARDA
LHCb
• GANGA as the principal component
  • A friendly user interface for Grid services
  • The LHCb/GANGA plans match naturally with the ARDA mandate
• The goal is to enable physicists (via GANGA) to analyse the data being produced during 2004 for their studies
  • Host the prototype where the LHCb data will be (CERN, RAL, …)
• At the beginning the emphasis will be on
  • Usability of GANGA
  • Validation of the splitting and merging functionality for user jobs
• The DIRAC system is also an important component
  • The LHCb Grid system, used mainly in production so far
  • A useful target for understanding the detailed behaviour of LHCb-specific Grid components, such as the file catalogue
  • Convergence between DIRAC and GANGA is anticipated
GANGA: Gaudi/Athena aNd Grid Alliance
• Gaudi/Athena: the LHCb/ATLAS frameworks
  • Athena uses Gaudi as its foundation
• A single "desktop" for a variety of tasks
  • Helps configure and submit analysis jobs, and keeps track of what users have done, completely hiding the technicalities
    • Resource Broker, LSF, PBS, DIRAC, Condor
  • Job registry stored locally or in the roaming profile
• Automates the configure/submit/monitor procedures
  • Provides a palette of possible choices and specialised plug-ins (pre-defined application configurations, batch/Grid systems, etc.)
• A friendly user interface (CLI/GUI) is essential
  • GUI Wizard Interface
    • Helps users explore new capabilities
    • Browse the job registry
  • Scripting/Command Line Interface (a hypothetical session is sketched below)
    • Automates frequent tasks
    • Python shell embedded into the GANGA GUI
[Diagram: GANGA GUI and internal model sitting on collective and resource Grid services (Bookkeeping Service, WorkLoad Manager, Profile Service, Monitor, file catalog, SE, CE) and driving an instrumented GAUDI program; inputs are JobOptions and Algorithms, outputs are Histograms, Monitoring and Results]
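To make the scripting interface concrete, here is a short, hypothetical GANGA-style Python session of the kind the embedded shell is meant to support. The class and attribute names (Job, Executable, DIRAC, and so on) are illustrative assumptions, not necessarily the 2004 GANGA API, and the session assumes it runs inside the GANGA Python shell where such objects are pre-loaded.

```python
# Hypothetical GANGA-style session (run inside the embedded Python shell);
# names are illustrative assumptions, not the documented 2004 GANGA API.
j = Job(name="bd2kstarmumu_scan")
j.application = Executable(exe="DaVinci", args=["myAnalysis.opts"])
j.backend = DIRAC()                              # could equally be LSF(), PBS() or Condor()
j.splitter = DatasetSplitter(files_per_job=10)   # split the work into sub-jobs
j.merger = RootMerger(files=["histos.root"])     # merge outputs when all sub-jobs finish
j.submit()                                       # configure/submit/monitor handled by GANGA
print(j.status)                                  # the job is tracked in the local registry
```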
ARDA contribution to GANGA
• Release management procedure established
  • Software process and integration
    • Testing, bug-fix releases, tagging policies, etc.
  • Infrastructure
    • Installation, packaging, etc.
  • An ARDA team member is in charge
• Integration with job managers/resource brokers
  • While waiting for the EGEE middleware, we developed an interface to Condor
    • Use of Condor DAGMan for the splitting/merging and error-recovery capability (see the sketch below)
• Design and development in the near future
  • Integration with the EGEE middleware
  • Command Line Interface
  • Evolution of GANGA features
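For reference, Condor DAGMan expresses the split/merge pattern as a DAG of jobs with per-node retry, which is how it provides the error-recovery capability mentioned above. The sketch below writes such a DAG description from Python; the node names and submit-file names are illustrative, while JOB, VARS, PARENT/CHILD and RETRY are standard DAGMan keywords.

```python
# Generate a Condor DAGMan description for a split -> N analysis sub-jobs -> merge
# workflow. Node and submit-file names are illustrative; the keywords are
# standard DAGMan syntax.
N_SUBJOBS = 10

lines = ["JOB split split.sub"]
for i in range(N_SUBJOBS):
    lines.append("JOB analysis%d analysis.sub" % i)
    lines.append('VARS analysis%d subjob="%d"' % (i, i))   # pass the sub-job index
    lines.append("RETRY analysis%d 3" % i)                  # automatic error recovery
lines.append("JOB merge merge.sub")

analysis_nodes = " ".join("analysis%d" % i for i in range(N_SUBJOBS))
lines.append("PARENT split CHILD %s" % analysis_nodes)      # fan out after splitting
lines.append("PARENT %s CHILD merge" % analysis_nodes)      # merge only when all succeed

with open("analysis.dag", "w") as dag:
    dag.write("\n".join(lines) + "\n")

# Submit with: condor_submit_dag analysis.dag
```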
CERN/Taiwan tests on the LHCb metadata catalogue
• Clone the BookkeepingDB in Taiwan
• Install the WS layer
• Performance tests (an illustrative client-side timing sketch follows)
  • Database I/O sensor
  • Bookkeeping Server performance tests (Taiwan and CERN Bookkeeping Server DBs)
  • Web and XML-RPC service performance tests: CPU load, network send/receive sensors, process time
  • Client host performance tests: CPU load, network send/receive sensors, process time
• Feedback to the LHCb metadata catalogue developers
[Diagram: clients and virtual users querying the CERN and Taiwan Bookkeeping Servers (each backed by an Oracle DB), with network, CPU-load, process-time and DB I/O sensors attached]
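As one example of the client-side measurements, the XML-RPC response-time test can be driven with a simple Python client along these lines. The endpoint URL and method name are placeholders (the real bookkeeping service interface is not shown in the slides); xmlrpc.client is the standard-library XML-RPC client (xmlrpclib in the Python 2 of that era).

```python
# Time repeated XML-RPC queries against a bookkeeping-style service.
# URL and method name are placeholders, not the real LHCb bookkeeping API.
import time
import xmlrpc.client   # 'xmlrpclib' in Python 2

SERVICE_URL = "http://bookkeeping.example.org:8080/RPC2"
proxy = xmlrpc.client.ServerProxy(SERVICE_URL)

latencies = []
for _ in range(100):
    t0 = time.time()
    proxy.getDatasetFiles("DC04/some-dataset")    # hypothetical query method
    latencies.append(time.time() - t0)

latencies.sort()
print("median %.3f s, max %.3f s over %d calls"
      % (latencies[len(latencies) // 2], latencies[-1], len(latencies)))
```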
Towards ARDA prototypes
• ATLAS - ARDA
ATLAS
• ATLAS has a relatively complex strategy for distributed analysis, addressing different areas with specific projects
  • Fast response (DIAL)
  • User-driven analysis (GANGA)
  • Massive production on multiple Grids, etc.
  • For additional information see the ATLAS Distributed Analysis (ADA) site: http://www.usatlas.bnl.gov/ADA/
• The ATLAS system within ARDA has been agreed
  • The starting point is the DIAL service model for distributed interactive analysis; users will be exposed to a different user interface (GANGA)
• The AMI metadata catalogue is a key component of the ATLAS prototype
  • MySQL as a back end
  • Genuine Web Server implementation
  • Robustness and performance tests from ARDA
• In the start-up phase, ARDA provided some assistance in developing production tools
AMI studies in ARDA
[Diagram: multiple users querying the metadata database (MySQL) through a SOAP proxy]
• Studied the behaviour using many concurrent clients (a stress-test sketch follows)
• The ATLAS Metadata Catalogue contains file metadata:
  • Simulation/reconstruction version
  • It does not contain physical filenames
• Many problems are still open:
  • Large network traffic overhead due to schema-independent tables
  • The SOAP Web Services proxy is supposed to provide the DB access
    • Note that Web Services are "stateless" (no automatic handles for the concepts of session, transaction, etc.): 1 query = 1 (full) response
    • Large queries might crash the server
    • Should the SOAP front-end proxy re-implement all of the database functionality?
• Good collaboration in place with ATLAS-Grenoble
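A concurrent-client test of this kind can be scripted along the following lines. The proxy URL and the request payload are placeholders (the real AMI SOAP interface is not described here); the point is simply to issue many stateless requests in parallel and record the per-call latency.

```python
# Stress a stateless web-service proxy with many concurrent clients and
# record per-call latency. Endpoint and payload are illustrative placeholders.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

PROXY_URL = "http://ami-proxy.example.org/AMIService"   # placeholder endpoint
SOAP_QUERY = b"<soap:Envelope>...</soap:Envelope>"      # placeholder request body

def one_query(_):
    """Time a single request: with stateless services, 1 query = 1 full response."""
    t0 = time.time()
    req = urllib.request.Request(PROXY_URL, data=SOAP_QUERY,
                                 headers={"Content-Type": "text/xml"})
    urllib.request.urlopen(req).read()
    return time.time() - t0

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=50) as pool:    # 50 concurrent clients
        latencies = list(pool.map(one_query, range(500)))
    print("mean latency: %.2f s over %d calls"
          % (sum(latencies) / len(latencies), len(latencies)))
```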
Towards ARDA prototypes
• ALICE - ARDA
ALICE
• Strategy:
  • ALICE-ARDA will evolve the analysis system presented at SuperComputing 2003 ("Grid-enabled PROOF")
• Where to improve:
  • Heavily connected with the middleware services
  • "Inflexible" configuration
  • No chance to use PROOF on federated Grids like LCG
  • User library distribution
• Activity on PROOF
  • Robustness
  • Error recovery
[Diagram: a user session connecting through TcpRouters to a PROOF master server, which drives PROOF slaves at sites A, B and C]
Improved PROOF system
• Original problem: no support for a hierarchical Grid infrastructure, only local cluster mode
• The remote PROOF slaves look like a local PROOF slave on the master machine (a minimal session sketch follows)
• The booking service is also usable on local clusters
[Diagram: PROOF master with a booking service, reaching the remote PROOF slave servers through rootd/proofd proxies and the Grid services]
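From the user's point of view, the improved system should still look like an ordinary PROOF session. A minimal sketch of such a session via PyROOT is shown below; the master URL, file location and selector name are placeholders, and the calls follow the generic ROOT TProof/TDSet interface rather than the exact 2004 prototype.

```python
# Minimal PROOF analysis session via PyROOT; master URL, file path and
# selector name are placeholders. With the improved system, remote slaves
# reached through the proxies would appear as local slaves here.
import ROOT

proof = ROOT.TProof.Open("proofmaster.example.org")        # connect to the (proxied) master

dataset = ROOT.TDSet("TTree", "esdTree")                    # dataset of trees to analyse
dataset.Add("root://se.example.org//alice/sim/run001.root")

dataset.Process("MySelector.C+")                            # selector runs on all slaves in parallel
proof.Close()
```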
Towards ARDA prototypes
• CMS - ARDA
CMS
• The CMS system within ARDA is still under discussion
• Providing easy access to (and possibly sharing of) data for the CMS users is a key issue in the discussions
[Diagram of the CMS DC04 production chain: RefDB sends reconstruction instructions to McRunjob; reconstruction jobs run on the T0 worker nodes and report summaries of successful jobs back to RefDB; reconstructed data flow through the GDB castor pool to tapes and, via a transfer agent that checks what has arrived and updates RLS and TMDB, to the export buffers]
CMS RefDB
• A potential starting point for the prototype
• The bookkeeping engine used to plan and steer the production across the different phases (simulation, reconstruction, and to some degree into the analysis phase)
• Contains all necessary information except physical file locations (RLS) and information related to the transfer management system (TMDB)
• The actual mechanism for providing these data to analysis users is under discussion
• Performance measurements are underway (with the same philosophy as the LHCb metadata catalogue measurements)
Coordination and forum activities
• Workshops and meetings
Coordination and forum activities
• Forum activities are seen as "fundamental" in the ARDA project definition
  • ARDA will channel information to the appropriate recipients, especially to analysis-related activities and projects outside the ARDA prototypes
  • This ensures that new technologies can be exposed to the relevant community
• ARDA should organise a set of regular meetings
  • The aim is to discuss results, problems and new/alternative solutions, and possibly to agree on a coherent programme of work; a workshop every three months
  • The ARDA project leader organises this activity, which will be truly distributed and led by the active partners
• ARDA is embedded in EGEE
  • NA4, Application Identification and Support
• Special relation with LCG
  • The LCG GAG is a forum for Grid requirements and use cases
  • The experiment representatives coincide with the EGEE NA4 experiment representatives
Workshops and meetings
• 1st ARDA workshop
  • January 2004 at CERN; open
  • Over 150 participants
• 2nd ARDA workshop, "The first 30 days of EGEE middleware"
  • June 21-23 at CERN; by invitation
  • Expected 30 participants
• EGEE NA4 meeting in mid-July
  • NA4/JRA1 (middleware) and NA4/SA1 (Grid operations) sessions
  • Organised by M. Lamanna and F. Harris
• 3rd ARDA workshop
  • Currently scheduled for September 2004, close to CHEP; open
Next ARDA workshop: "The first 30 days of the EGEE middleware"
• CERN, 21-23 June 2004
• Exceptionally by invitation only
• Monday, June 21
  • ARDA team / JRA1 team
  • ATLAS (metadata database services for HEP experiments)
• Tuesday, June 22
  • LHCb (experience in building web services for the Grid)
  • CMS (data management)
• Wednesday, June 23
  • ALICE (interactivity on the Grid)
  • Close-out
• Info on the web: http://lcg.web.cern.ch/LCG/peb/arda/LCG_ARDA_Workshops.htm
Conclusions
Conclusions and Outlook
• LCG ARDA has started
  • Main objective: experiment prototypes for analysis
  • EGEE/Glite middleware is becoming available
  • Good feedback from the LHC experiments
  • Good collaboration within the EGEE project
  • Good collaboration with the Regional Centres; more help is needed
• Main focus
  • Prototyping the distributed analysis systems of the LHC experiments
  • Collaborating with the LHC experiments, the EGEE middleware team and the Regional Centres to set up the end-to-end prototypes
• Aggressive schedule
  • The milestone for the first end-to-end prototypes is already December 2004