CRAB: a user-friendly tool for CMS distributed analysis Federica Fanzago, INFN-PADOVA, for the CRAB team
CMS overview • CMS (Compact Muon Solenoid) is one of the four particle physics experiments that will collect data at the LHC (Large Hadron Collider), starting in 2007 at CERN. • CMS will produce a very large quantity of data: ~2 PB of events/year at startup luminosity (2x10^33 cm^-2 s^-1). • Data must be stored and made available for analysis to world-wide distributed physicists. • All events will be stored into files: O(10^6) files/year. • Files will be grouped into Fileblocks: O(10^3) Fileblocks/year. • Fileblocks will be grouped into Datasets: O(10^3) Datasets in total after 10 years of CMS, each 0.1-100 TB. A sketch of this hierarchy is shown below.
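To make the data organization concrete, the following Python sketch models the event → file → fileblock → dataset hierarchy described above. The class names, dataset path and counts are purely illustrative, not part of any CMS data-management API.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative model of the CMS data hierarchy: events are stored in files,
# files are grouped into fileblocks, fileblocks are grouped into datasets.
# Names and numbers are rough orders of magnitude from the text, not real values.

@dataclass
class FileBlock:
    name: str
    files: List[str] = field(default_factory=list)   # O(10^3) files per block

@dataclass
class Dataset:
    name: str                                         # e.g. a physics channel
    blocks: List[FileBlock] = field(default_factory=list)
    size_tb: float = 1.0                              # 0.1 - 100 TB per dataset

# A year of data taking corresponds to roughly O(10^6) files grouped into
# O(10^3) fileblocks, which belong to O(10^3) datasets overall.
example = Dataset(
    name="/ExampleChannel/SimulatedHits/RECO",        # hypothetical dataset path
    blocks=[FileBlock("block-0001", ["evts_0001.root", "evts_0002.root"])],
    size_tb=5.0,
)
print(example.name, "has", sum(len(b.files) for b in example.blocks), "files")
```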
Why the grid… • ISSUES: • How to manage and where to store this huge quantity of data? • How to assure data access to the physicists of the CMS collaboration? • How to obtain enough computing power for processing and data analysis? • How to ensure the availability of resources and data? • How to define local and global policies for data and resource access? • SOLUTION: • CMS will use a distributed architecture based on a grid infrastructure to ensure the availability of remote resources and to give authorized users (belonging to the CMS Virtual Organization) access to remote data. • The grid infrastructure also guarantees enough computing power for simulation, processing and analysis of data.
CMS computing model The CMS offline computing system is arranged in four geographically distributed Tiers: • Tier-0: the CERN computer centre, which receives the recorded data from the online system and offline farm. • Tier-1: regional centres (e.g. the Fermilab, Italy and France regional centres). • Tier-2: Tier-2 centres, with remote data accessible via grid. • Tier-3: institute workstations (e.g. Institute A, Institute B).
Data distribution… • During data acquisition, data from the detector that pass the different trigger levels will be sent to the Tier-0, where they are stored and first-pass reconstructed. • They will then be distributed over the other Tiers depending on the kind of physics data. • Until real data are available, the CMS community needs simulated data to study the detector response and the foreseen physics interactions, and to gain experience with the management and analysis of data. A large amount of simulated data is therefore produced and distributed among the computing centres.
grid middleware… Tools for accessing distributed data and resources are provided by the Worldwide LHC Computing Grid (WLCG), which takes care of the different grid flavours such as LCG/gLite in Europe and OSG in the US. • In the submission chain, the job submission tools on the UI query the data location system for the data and the Information Service for matchmaking; the Workload Management System / Resource Broker (RB) then dispatches jobs to Computing Elements (CE) close to the Storage Elements (SE) that hold the data (a toy sketch of this matchmaking idea follows below). • Main LCG middleware components: Virtual Organizations (CMS, ...), Resource Broker (RB), Replica Catalog (LFC), Computing Element (CE), Storage Element (SE), Worker Node (WN), User Interface (UI).
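The sketch below is a toy Python illustration of the matchmaking idea only: given the SEs hosting the requested data, pick the CEs whose "close SE" holds a copy. Real matchmaking is done by the LCG/gLite Workload Management System; all hostnames and catalogue contents here are hypothetical.

```python
# Toy illustration of resource-broker matchmaking. Purely illustrative;
# not real LCG/gLite middleware code.

# Hypothetical view of the data location system: fileblock -> hosting SEs
DATA_LOCATION = {
    "block-0001": {"se01.cnaf.infn.it", "se02.fnal.gov"},
}

# Hypothetical view of the Information Service: each CE and its close SE
CE_CLOSE_SE = {
    "ce01.cnaf.infn.it": "se01.cnaf.infn.it",
    "ce03.in2p3.fr": "se05.in2p3.fr",
}

def match(fileblock: str) -> list:
    """Return the CEs whose close SE hosts a replica of the requested fileblock."""
    hosting_ses = DATA_LOCATION.get(fileblock, set())
    return [ce for ce, se in CE_CLOSE_SE.items() if se in hosting_ses]

print(match("block-0001"))   # -> ['ce01.cnaf.infn.it']
```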
Analysis in a local environment… • The user writes his own analysis code and configuration parameter card, starting from the CMS-specific analysis software, and builds the executable and libraries. • He applies the code to a given amount of events, whose location is known, splitting the load over many jobs; generally he is allowed to access only local data. • He writes wrapper scripts and uses a local batch system to exploit all the computing power (comfortable as long as the data he is looking for are sitting right by his side); this splitting step is sketched below. • He then submits everything by hand and checks the status and overall progress. • Finally he collects all the output files and stores them somewhere. But now data and resources are distributed.
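In essence, the "split the load over many jobs" step divides a range of events into chunks and generates one wrapper script per chunk for the local batch system. The Python sketch below shows the idea; the executable name, its command-line options and the batch command are hypothetical.

```python
# Illustrative job splitting for a local batch system: divide the requested
# events into chunks and write one shell wrapper per chunk.

TOTAL_EVENTS = 100_000
EVENTS_PER_JOB = 10_000

def make_wrapper(job_id: int, first_event: int, n_events: int) -> str:
    """Write a shell wrapper running the analysis executable on one event range."""
    script = (
        "#!/bin/sh\n"
        f"./myAnalysis --first-event {first_event} --events {n_events} "
        f"--output out_{job_id}.root\n"      # hypothetical executable and options
    )
    path = f"wrapper_{job_id}.sh"
    with open(path, "w") as f:
        f.write(script)
    return path

wrappers = []
for job_id, first in enumerate(range(0, TOTAL_EVENTS, EVENTS_PER_JOB)):
    wrappers.append(make_wrapper(job_id, first, EVENTS_PER_JOB))

# Each wrapper would then be submitted by hand to the local batch system
# (e.g. "bsub wrapper_0.sh" on LSF; command shown only as an example).
print(f"created {len(wrappers)} wrapper scripts")
```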
…and in a distributed environment Distributed analysis is a more complex computing task, because it requires knowing: • which data are available • where the data are stored and how to access them • which resources are available and able to comply with the analysis requirements • grid and CMS infrastructure details Users don't want to deal with these kinds of problems: they want to analyze data in "a simple way", as in the local environment.
Distributed analysis chain To allow distributed analysis, the CMS collaboration is developing a set of tools interfaced with the available grid services, including: • Installation of CMS software via grid on remote resources • Data transfer service: to move and manage a large flow of data among the Tiers • Data validation system: to ensure data consistency • Data location system: catalogues that keep track of the data available at each site and allow data discovery • Dataset Bookkeeping System (DBS): knows which data exist and contains the CMS-specific description of the event data • Data Location Service (DLS): knows where data are stored, mapping fileblocks to SEs • Local file catalog: physical location of the local data on the remote SE • CRAB: CMS Remote Analysis Builder... A sketch of the DBS → DLS → local catalog lookup chain follows below.
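The data location chain can be summarised as a three-step lookup: find which fileblocks make up a dataset (DBS), find which SEs host those fileblocks (DLS), and resolve logical file names into physical locations at the chosen site (local file catalog). The Python sketch below is illustrative only; the dictionaries stand in for the real catalogue services, and all names are hypothetical.

```python
# Illustrative three-step data discovery, mirroring the DBS -> DLS -> local
# file catalog chain described in the text. Not a real CMS API.

DBS = {  # Dataset Bookkeeping System: dataset -> fileblocks and logical files
    "/ExampleChannel/SimulatedHits/RECO": {
        "block-0001": ["lfn/evts_0001.root", "lfn/evts_0002.root"],
    },
}

DLS = {  # Data Location Service: fileblock -> SEs hosting a replica
    "block-0001": ["se01.cnaf.infn.it", "se02.fnal.gov"],
}

LOCAL_CATALOG = {  # per-site local file catalog: logical -> physical file name
    "se01.cnaf.infn.it": {
        "lfn/evts_0001.root": "srm://se01.cnaf.infn.it/cms/evts_0001.root",
        "lfn/evts_0002.root": "srm://se01.cnaf.infn.it/cms/evts_0002.root",
    },
}

def discover(dataset: str) -> dict:
    """Return, per hosting SE, the physical names of the dataset's files."""
    result = {}
    for block, lfns in DBS.get(dataset, {}).items():
        for se in DLS.get(block, []):
            catalog = LOCAL_CATALOG.get(se, {})
            result.setdefault(se, []).extend(catalog.get(lfn, lfn) for lfn in lfns)
    return result

print(discover("/ExampleChannel/SimulatedHits/RECO"))
```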
CRAB: CMS Remote Analysis Builder • CRAB is a user-friendly tool whose aim is to let users with no knowledge of the grid infrastructure create, submit and manage analysis jobs in the grid environment. • It is written in Python and installed on the UI (the grid user access point). • Users only have to develop their analysis code in an interactive environment, decide which data to analyze and how to handle the job output, exactly as in the local case, even though the data reside at remote, distributed sites. • Data discovery on remote resources, resource availability, status monitoring and output retrieval of the submitted jobs are fully handled by CRAB. An illustrative task description is sketched below.
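In practice the user describes the task in a small configuration file on the UI: which dataset to analyse, how many events, how to split the jobs, and what to do with the output. The snippet below writes such a description with Python's configparser; the section and option names are only indicative of the kind of information CRAB needs and are not a reference for its actual configuration syntax.

```python
# Illustrative CRAB-style task description written as an INI file.
# Section/option names are hypothetical stand-ins for the information the
# user provides (dataset, splitting, output handling).
import configparser

cfg = configparser.ConfigParser()
cfg["TASK"] = {
    "dataset": "/ExampleChannel/SimulatedHits/RECO",  # data to analyse (hypothetical)
    "user_code": "myAnalysis.tgz",                    # packaged executable + libraries
    "parameter_card": "analysis_params.cfg",          # analysis configuration card
}
cfg["SPLITTING"] = {
    "total_events": "100000",
    "events_per_job": "10000",
}
cfg["OUTPUT"] = {
    "return_to_ui": "True",          # copy the output back to the User Interface...
    "storage_element": "",           # ...or to a generic Storage Element instead
}

with open("task.cfg", "w") as f:
    cfg.write(f)
```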
CRAB workflow On the UI the user provides the dataset, the number of events and the user code; CRAB interacts with the Data Bookkeeping System, the Data Location System and the WMS, which dispatches the jobs to CEs/WNs close to the SEs holding the data and collects the job output. Main CRAB functionalities: • Input data discovery: obtain the list of sites (SE names) where the data are stored by querying the "data location system" (DBS-DLS) • Packaging of user code: creation of a tgz archive with the user code and parameters • Job creation: a wrapper of the user executable to run on the WN (sh script); a JDL file with the SE names as requirement to drive resource matchmaking (a JDL sketch follows below); job splitting according to the user request • Job submission to the grid • Job status monitoring and output retrieval • Handling of the user output: copy to the UI or to a generic Storage Element • Job resubmission in case of failure
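The key point for data-driven matchmaking is that the names of the SEs hosting the data are written into the job's JDL requirements, so that the Workload Management System only considers CEs close to those SEs. Below is a minimal sketch of generating such a JDL string; the requirement expression uses a common LCG information-system attribute, but the exact attributes and layout CRAB emits may differ, so treat it as illustrative.

```python
# Minimal sketch of a JDL fragment whose Requirements restrict matchmaking to
# CEs close to the SEs hosting the input data. Illustrative only; not the
# literal JDL produced by CRAB.

def make_jdl(executable: str, input_sandbox: list, hosting_ses: list) -> str:
    """Build a minimal JDL string steering the job towards CEs close to the
    SEs that hold the input data."""
    sandbox = ", ".join('"{}"'.format(f) for f in input_sandbox)
    se_clause = " || ".join(
        'Member("{}", other.GlueCESEBindGroupSEUniqueID)'.format(se)
        for se in hosting_ses
    )
    lines = [
        'Type = "Job";',
        'Executable = "{}";'.format(executable),
        'InputSandbox = {{{}}};'.format(sandbox),
        'StdOutput = "job.out";',
        'StdError = "job.err";',
        'Requirements = {};'.format(se_clause),
    ]
    return "\n".join(lines)

# Example usage with hypothetical file and SE names:
print(make_jdl("crab_wrapper.sh",
               ["crab_wrapper.sh", "user_code.tgz"],
               ["se01.cnaf.infn.it", "se02.fnal.gov"]))
```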
CRAB usage (1) More than 1 million jobs have been submitted to the grid using CRAB, by 40-50 users, from worldwide distributed UIs (covering the SC03, SC04 and PTDR activities). Roughly 1500 jobs are submitted each day.
CRAB usage (2) • ~75% job success rate (success means the jobs arrive at the remote sites and produce output). • ~25% of jobs abort due to site setup problems or grid service failures. • ~7000 datasets are available, for O(10^8) total events (the full MC production). • Physicists are using CRAB to analyze remote data stored at LCG and OSG sites.
Conclusion • The CRAB tool is used to analyze remote data and also to continuously exercise the CMS Tiers, proving the robustness of the whole infrastructure. • CRAB shows that CMS users with no knowledge of the grid infrastructure are able to use grid services. • CRAB demonstrates that CMS analysis works in a distributed environment. • Future development will split CRAB into a client-server system, with the aim of minimizing the user effort needed to manage analysis jobs and obtain their results. http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/