Managing distributed computing resources with DIRAC A.Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille 12-17 September 2011, NEC’11, Varna
Outline • DIRAC Overview • Main subsystems • Workload Management • Request Management • Transformation Management • Data Management • Use in LHCb and other experiments • DIRAC as a service • Conclusion
Introduction • DIRAC is first of all a framework to build distributed computing systems • Supporting Service Oriented Architectures • GSI compliant secure client/service protocol • Fine grained service access rules • Hierarchical Configuration service for bootstrapping distributed services and agents • This framework is used to build all the DIRAC systems: • Workload Management • Based on Pilot Job paradigm • Production Management • Data Management • etc
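The hierarchical Configuration Service can be pictured as a tree of sections that a service or agent walks at startup to find its own options. The sketch below is a minimal illustration assuming a nested-dictionary tree. The section layout, the `Matcher` port value and the helper names `get_option` and `service_section` are hypothetical and are not DIRAC's actual configuration API.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical configuration tree; the real DIRAC CS layout and client API differ.
CONFIG: Mapping[str, Any] = {
    "DIRAC": {"Setup": "Production"},
    "Systems": {
        "WorkloadManagement": {
            "Production": {
                "Services": {"Matcher": {"Port": 9170}},  # hypothetical value
            },
        },
    },
}


def get_option(tree: Mapping[str, Any], path: str, default: Any = None) -> Any:
    """Walk a '/'-separated path through nested sections; return default if any level is missing."""
    node: Any = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


def service_section(tree: Mapping[str, Any], system: str, service: str) -> str:
    """Bootstrap step: read the global setup first, then build the service's own section path."""
    setup = get_option(tree, "/DIRAC/Setup")
    return f"/Systems/{system}/{setup}/Services/{service}"


section = service_section(CONFIG, "WorkloadManagement", "Matcher")
port = get_option(CONFIG, section + "/Port", default=9999)
```

Resolving the setup name before the service section is what lets one tree describe several setups side by side.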
Figure (diagram labels): Production Manager, Physicist User, Matcher Service; EGEE, NDG, EELA and CREAM Pilot Directors; EGI/WLCG Grid, NDG Grid, GISELA Grid, CREAM CE
User credentials management • The WMS with Pilot Jobs requires a strict user proxy management system • Jobs are submitted to the DIRAC Central Task Queue with credentials of their owner (VOMS proxy) • Pilot Jobs are submitted to a Grid WMS with credentials of a user with a special Pilot role • The Pilot Job fetches the user job and the job owner’s proxy • The User Job is executed with its owner’s proxy used to access SE, catalogs, etc • The DIRAC Proxy manager service ensures the necessary functionality • Proxy storage and renewal • Possibility to outsource the proxy renewal to the MyProxy server
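The credential hand-over described above can be sketched as follows. This is a simplified model under stated assumptions: a proxy is reduced to a (user, role) pair, and `ProxyStore`, `download_for` and `run_pilot` are hypothetical names standing in for the Proxy Manager and pilot logic, not DIRAC's real interfaces. It only shows the access rule that a pilot-role credential may fetch the job owner's proxy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Proxy:
    """Simplified credential: owner name plus a VOMS-like role (hypothetical model)."""
    user: str
    role: str


@dataclass
class ProxyStore:
    """Hypothetical stand-in for the Proxy Manager: keeps the latest proxy per user."""
    proxies: dict[str, Proxy] = field(default_factory=dict)

    def upload(self, proxy: Proxy) -> None:
        self.proxies[proxy.user] = proxy

    def download_for(self, requester: Proxy, owner: str) -> Proxy:
        # Only credentials carrying the pilot role may fetch another user's proxy.
        if requester.role != "Pilot":
            raise PermissionError("pilot role required to fetch a job owner's proxy")
        return self.proxies[owner]


@dataclass
class Job:
    job_id: int
    owner: str


def run_pilot(pilot: Proxy, task_queue: list[Job], store: ProxyStore) -> Optional[tuple[int, Proxy]]:
    """Fetch the next user job and the owner's proxy under which its payload would run."""
    if not task_queue:
        return None
    job = task_queue.pop(0)
    owner_proxy = store.download_for(pilot, job.owner)
    return job.job_id, owner_proxy
```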
Direct submission to CEs • The gLite WMS is now used only as a pilot deployment mechanism • Limited use of its brokering features • For jobs with input data the destination site is already chosen • Multiple Resource Brokers have to be used because of scalability problems • DIRAC supports direct submission to CEs • CREAM CEs • Individual site policies can be applied • The site chooses how much load it can take (pull vs push paradigm) • Direct measurement of the site state by watching the pilot status info • This is a general trend • All the LHC experiments have declared that they will eventually abandon the gLite WMS
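In the pull paradigm a pilot that is already running at a site asks the central Task Queue for work that fits its slot, so the site effectively decides how much load it takes. The sketch below is a minimal, hypothetical matcher under that assumption: the job fields, the first-fit rule and the site names are illustrative and do not reproduce the real DIRAC Matcher.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedJob:
    job_id: int
    sites: frozenset[str]  # empty = any site; jobs with input data are pinned to their sites
    max_cpu_time: int      # requested CPU time in seconds


def match(task_queue: list[QueuedJob], site: str, cpu_time_left: int) -> Optional[QueuedJob]:
    """Pull model: a running pilot describes its slot and receives the first job that fits."""
    for i, job in enumerate(task_queue):
        site_ok = not job.sites or site in job.sites
        if site_ok and job.max_cpu_time <= cpu_time_left:
            return task_queue.pop(i)
    return None  # nothing fits: the pilot can exit and release the slot
```

Because a job is only handed out once a pilot has actually started, a failing site simply stops pulling work instead of accumulating queued jobs.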
DIRAC sites • Dedicated Pilot Director per (group of) site(s) • On-site Director • Site managers have full control of LHCb payloads • Off-site Director • Site delegates control to the central service • Site must only define a dedicated local user account • Payloads are submitted through an SSH tunnel • In both cases the payload is executed with the owner's credentials
DIRAC Sites • Several DIRAC sites in production in LHCb • E.g. Yandex • 1800 cores • Second largest MC production site • Interesting possibility for small user communities or infrastructures, e.g. • contributing local clusters • building regional or university grids
WMS performance • Up to 35K concurrent jobs in ~120 distinct sites • Limited by the resources available to LHCb • 10 mid-range servers hosting DIRAC central services • Further optimizations to increase the capacity are possible • Hardware, database optimizations, service load balancing, etc
Belle (KEK) use of the Amazon EC2 • VM scheduler developed for Belle MC production system • Dynamic VM spawning taking spot prices and TQ state into account Thomas Kuhr, Belle
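The slide does not describe the Belle VM scheduler's actual algorithm. As a purely hypothetical illustration of "taking spot prices and TQ state into account", the rule below starts enough VMs to drain the waiting jobs, but only while the spot price is under a ceiling and the hourly budget allows. All parameter names and the rule itself are assumptions.

```python
import math


def vms_to_start(waiting_jobs: int, running_vms: int, jobs_per_vm: int,
                 spot_price: float, max_price: float, hourly_budget: float) -> int:
    """Hypothetical scaling rule: size the VM pool from the Task Queue backlog,
    capped by an acceptable spot price and an hourly budget."""
    if spot_price <= 0:
        raise ValueError("spot price must be positive")
    if waiting_jobs == 0 or spot_price > max_price:
        return 0
    needed = math.ceil(waiting_jobs / jobs_per_vm) - running_vms
    affordable = math.floor(hourly_budget / spot_price) - running_vms
    return max(0, min(needed, affordable))
```

For example, 100 waiting jobs at 8 jobs per VM need 13 VMs; with 2 already running and a budget covering 8 VMs at the current price, the rule starts 6 more.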
Belle Use of the Amazon EC2 • Various computing resources combined in a single production system • KEK cluster • LCG grid sites • Amazon EC2 • Common monitoring, accounting, etc Thomas Kuhr, Belle II
Belle II Raw Data Storage and Processing • Starting in 2015 after the KEK upgrade • 50 ab⁻¹ by 2020 • Computing model • Data rate 1.8 GB/s (high rate scenario) • Using KEK computing center, grid and cloud resources • Belle II distributed computing system is based on DIRAC • Diagram labels: MC Production and Ntuple Production; Ntuple Analysis Thomas Kuhr, Belle II
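For scale, assuming continuous data taking at the quoted high-rate figure (an assumption; real duty cycles are lower) and decimal units, the raw data volume per day would be:

```latex
1.8\ \mathrm{GB/s} \times 86\,400\ \mathrm{s/day} = 155\,520\ \mathrm{GB/day} \approx 155.5\ \mathrm{TB/day}
```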
Support for MPI Jobs • MPI Service developed for applications in the GISELA Grid • Astrophysics, BioMed, Seismology applications • No special MPI support on sites is required • MPI software installed by Pilot Jobs • MPI ring usage optimization • Ring reuse for multiple jobs • Lower load on the gLite WMS • Variable ring sizes for different jobs • Possible usage for HEP applications: • PROOF on Demand dynamic sessions
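Ring reuse can be illustrated with a small pool keyed by ring size: a job needing a ring of a given size first takes an idle ring of that size, and only when none is free does a new ring (that is, a new set of pilots) get built. This is a hypothetical sketch of the idea, not the GISELA MPI Service; `RingPool` and its counters are illustrative names.

```python
from collections import defaultdict
from typing import DefaultDict, List


class RingPool:
    """Hypothetical pool of MPI rings, keyed by ring size (number of processes)."""

    def __init__(self) -> None:
        self.idle: DefaultDict[int, List[int]] = defaultdict(list)  # size -> idle ring ids
        self.created = 0  # each new ring stands for a fresh batch of pilot submissions

    def acquire(self, size: int) -> int:
        # Reuse an idle ring of the requested size when one exists.
        if self.idle[size]:
            return self.idle[size].pop()
        self.created += 1
        return self.created  # new ring id

    def release(self, ring_id: int, size: int) -> None:
        # Keep the ring alive for the next job of the same size.
        self.idle[size].append(ring_id)
```

Each reuse avoids a new round of pilot submissions, which is where the lower load on the gLite WMS comes from in this model.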
Coping with failures • Problem: distributed resources and services are unreliable • Software bugs, misconfiguration • Hardware failures • Human errors • Solution: redundancy and asynchronous operations • DIRAC services are redundant • Geographically: Configuration, Request Management • Several instances for any service
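Redundant service instances only help if clients fail over between them. A minimal sketch, assuming a client that knows a list of equivalent URLs and a transport that raises `ConnectionError` on failure; the function name and the `example.org` URLs are hypothetical and not DIRAC's client code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(urls: Sequence[str], call: Callable[[str], T]) -> T:
    """Try each redundant service instance in turn; return the first successful result."""
    errors: List[str] = []
    for url in urls:
        try:
            return call(url)
        except ConnectionError as exc:
            errors.append(f"{url}: {exc}")  # remember why this instance failed, try the next one
    raise ConnectionError("all instances failed: " + "; ".join(errors))


# Usage with a simulated transport in which the first (hypothetical) server is down.
def fake_call(url: str) -> str:
    if "server1" in url:
        raise ConnectionError("timeout")
    return "config from " + url


URLS = [
    "dips://server1.example.org/Configuration/Server",
    "dips://server2.example.org/Configuration/Server",
]
result = call_with_failover(URLS, fake_call)
```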
Request Management system • A Request Management System (RMS) to accept and asynchronously execute any kind of operation that can fail • Data upload and registration • Job status and parameter reports • Requests are collected by RMS instances on VO-boxes at 7 Tier-1 sites • Extra redundancy in VO-box availability • Requests are forwarded to the central Request Database • For keeping track of the pending requests • For efficient bulk request execution
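The core of "accept and execute asynchronously any operation that can fail" is a retry loop: each agent cycle runs the pending requests and keeps the ones that failed for a later cycle. The sketch below assumes failures surface as `OSError` and uses a hypothetical attempt limit; it covers only the retry cycle, not the forwarding to the central Request Database, and none of the names are DIRAC's RMS API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Request:
    name: str
    operation: Callable[[], None]
    attempts: int = 0


def process(pending: List[Request], max_attempts: int = 3) -> Tuple[List[str], List[Request], List[Request]]:
    """One agent cycle: execute every pending request; keep failures for a later cycle."""
    done: List[str] = []
    retry: List[Request] = []
    failed: List[Request] = []
    for req in pending:
        req.attempts += 1
        try:
            req.operation()
            done.append(req.name)
        except OSError:
            # The operation could not complete now (e.g. a storage element is down).
            if req.attempts < max_attempts:
                retry.append(req)
            else:
                failed.append(req)  # hypothetical policy: flag for manual intervention
    return done, retry, failed


def make_flaky(failures: int) -> Callable[[], None]:
    """Simulation helper: an operation that fails a given number of times, then succeeds."""
    remaining = [failures]

    def op() -> None:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise OSError("storage element unavailable")

    return op
```

A job therefore never blocks on a failed upload: it hands the request over and the retry happens later, elsewhere.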
DIRAC Transformation Management • Data driven payload generation based on templates • Generating data processing and replication tasks • LHCb specific templates and catalogs
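Data-driven task generation can be sketched as an input filter plus a task template: newly registered files that pass the filter accumulate, and each complete group becomes one task. This is an illustrative model only; the `Transformation` class, the file-extension filter and the string template are assumptions, not the LHCb templates or the DIRAC Transformation System API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transformation:
    """Hypothetical data-driven transformation: an input filter plus a task template."""
    file_type: str
    group_size: int
    template: str
    unused: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.group_size < 1:
            raise ValueError("group_size must be at least 1")

    def add_files(self, lfns: List[str]) -> None:
        # Only newly registered files matching the input filter feed this transformation.
        self.unused.extend(lfn for lfn in lfns if lfn.endswith("." + self.file_type))

    def create_tasks(self) -> List[str]:
        # Turn complete groups of unused files into tasks; leftovers wait for more data.
        tasks = []
        while len(self.unused) >= self.group_size:
            group = self.unused[: self.group_size]
            self.unused = self.unused[self.group_size :]
            tasks.append(self.template.format(inputs=",".join(group)))
        return tasks
```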
Data Management • Based on the Request Management System • Asynchronous data operations • transfers, registration, removal • Two complementary replication mechanisms • Transfer Agent • user data • public network • FTS service • production data • private OPN network • Smart pluggable replication strategies
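"Pluggable replication strategies" can be read as a registry that maps a class of data to the mechanism that moves it. The sketch below assumes two data classes matching the slide (user data via the Transfer Agent, production data via FTS); the registry, the function names and the storage element names are hypothetical, and the strategies only describe the action instead of performing a transfer.

```python
from typing import Callable, Dict

# A strategy takes (lfn, source SE, destination SE) and returns a description of the action.
Strategy = Callable[[str, str, str], str]

STRATEGIES: Dict[str, Strategy] = {}


def register(kind: str, strategy: Strategy) -> None:
    """Plug in a replication strategy for a class of data."""
    STRATEGIES[kind] = strategy


def transfer_agent(lfn: str, source: str, dest: str) -> str:
    return f"TransferAgent copies {lfn} {source} -> {dest} over the public network"


def fts(lfn: str, source: str, dest: str) -> str:
    return f"FTS job for {lfn} {source} -> {dest} over the OPN"


register("user", transfer_agent)
register("production", fts)


def replicate(kind: str, lfn: str, source: str, dest: str) -> str:
    # Dispatch to whichever strategy is registered for this data class.
    return STRATEGIES[kind](lfn, source, dest)
```

Adding a new mechanism then means registering one more function, without touching the callers of `replicate`.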
ILC using DIRAC • ILC CERN group • Using DIRAC Workload Management and Transformation systems • 2M jobs run in the first year • Instead of 20K planned initially • DIRAC FileCatalog was developed for ILC • More efficient than LFC for common queries • Includes user metadata natively
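Native user metadata means files can be selected by physics attributes instead of by path. The toy catalog below shows only that query model; it says nothing about the DIRAC FileCatalog's storage or its efficiency versus LFC, and the class, method names and metadata keys (`Energy`, `EvtType`) are hypothetical.

```python
from typing import Any, Dict, List


class MetadataCatalog:
    """Toy catalog keeping user metadata next to each logical file name (LFN)."""

    def __init__(self) -> None:
        self.files: Dict[str, Dict[str, Any]] = {}

    def add_file(self, lfn: str, **metadata: Any) -> None:
        self.files[lfn] = dict(metadata)

    def find_files(self, **query: Any) -> List[str]:
        """Return LFNs whose metadata matches every key=value pair in the query."""
        return sorted(
            lfn for lfn, meta in self.files.items()
            if all(meta.get(key) == value for key, value in query.items())
        )
```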
DIRAC as a service • DIRAC installation shared by a number of user communities and centrally operated • EELA/GISELA grid • gLite based • DIRAC is part of the grid production infrastructure • Single VO • French NGI installation • https://dirac.in2p3.fr • Started as a service for grid tutorials support • Serving users from various domains now • Biomed, earth observation, seismology, … • Multiple VOs
DIRAC as a service • Necessity to manage multiple VOs with a single DIRAC installation • Per VO pilot credentials • Per VO accounting • Per VO resources description • Pilot directors are VO aware • Job matching takes pilot VO assignment into account
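VO-aware matching can be sketched with one task queue per VO: a pilot submitted with a given VO's credentials only receives jobs of that VO, and matched jobs are counted per VO for accounting. This is a hypothetical model; `MultiVOMatcher` and the VO names are illustrative, not DIRAC's implementation.

```python
from collections import Counter, deque
from typing import Deque, Dict, Optional


class MultiVOMatcher:
    """Hypothetical matcher for one DIRAC installation serving several VOs."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[int]] = {}  # per-VO task queues
        self.accounting: Counter = Counter()     # matched jobs per VO

    def submit(self, vo: str, job_id: int) -> None:
        self.queues.setdefault(vo, deque()).append(job_id)

    def match(self, pilot_vo: str) -> Optional[int]:
        # A pilot submitted with one VO's credentials only receives jobs of that VO.
        queue = self.queues.get(pilot_vo)
        if not queue:
            return None
        self.accounting[pilot_vo] += 1
        return queue.popleft()
```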
DIRAC Consortium • Other projects are starting to use DIRAC or are evaluating it • CTA, SuperB, BES, VIP (medical imaging), … • Contributing to DIRAC development • Increasing the number of experts • Need for a user support infrastructure • Turning DIRAC into an Open Source project • DIRAC Consortium agreement in preparation • IN2P3, Barcelona University, CERN, … • http://diracgrid.org • News, docs, forum
Conclusions • DIRAC is successfully used in LHCb for all distributed computing tasks in the first years of the LHC operations • Other experiments and user communities started to use DIRAC contributing their developments to the project • The DIRAC open source project is being built now to bring the experience from HEP computing to other experiments and application domains
LHCb in brief • Experiment dedicated to studying CP violation • CP violation is responsible for the dominance of matter over antimatter • Matter-antimatter difference studied using the b-quark (beauty) • High precision physics (tiny difference…) • Single arm spectrometer • Looks like a fixed-target experiment • Smallest of the 4 big LHC experiments • ~500 physicists • Nevertheless, computing is also a challenge…
Tier0 Center • Raw data shipped in real time to Tier-0 • Resilience enforced by a second copy at the Tier-1s • Rate: ~3000 evts/s (35 kB each) at ~100 MB/s • Part of the first pass reconstruction and re-reconstruction • Acting as one of the Tier-1 centers • Calibration and alignment performed on a selected part of the data stream (at CERN) • Alignment and tracking calibration using dimuons (~5/s) • Also used for validation of new calibrations • PID calibration using Ks, D* • CAF – CERN Analysis Facility • Grid resources for analysis • Direct batch system usage (LXBATCH) for SW tuning • Interactive usage (LXPLUS)
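The quoted rate is consistent with the event size, assuming 35 kB per event and decimal units:

```latex
3000\ \mathrm{events/s} \times 35\ \mathrm{kB/event} = 105\,000\ \mathrm{kB/s} \approx 105\ \mathrm{MB/s} \approx 100\ \mathrm{MB/s}
```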
Tier1 Center • Real data persistency • First pass reconstruction and re-reconstruction • Data Stripping • Event preselection in several streams (if needed) • The resulting DST data shipped to all the other Tier1 centers • Group analysis • Further reduction of the datasets, μDST format • Centrally managed using the LHCb Production System • User analysis • Selections on stripped data • Preparing N-tuples and reduced datasets for local analysis
Tier2-Tier3 centers • No local LHCb-specific support assumed • MC production facilities • Small local storage requirements to buffer MC data before shipping to the respective Tier1 center • User analysis • User analysis is not assumed in the baseline Computing Model • However, several centers are willing to contribute • Analysis (stripped) data replicated to T2-T3 centers by site managers • Full or partial sample • Increases the amount of resources capable of running User Analysis jobs • Analysis data at T2 centers available to the whole Collaboration • No special preferences for local users