Distributed Data Management in CMS
A. Fanfani, University of Bologna, on behalf of the CMS Collaboration
CMS Computing model

[Figure: tiered data flow: Online system → Tier-0 (first-pass reconstruction; RAW and RECO to tape) → Tier-1 sites (scheduled data processing; tape; RAW, RECO, AOD) → Tier-2 sites (analysis, MC simulation)]

• CMS Computing model as described in the Computing TDR
• Online system
  • RAW events are classified into O(50) primary datasets depending on their trigger history
  • primary datasets are grouped into O(10) online streams (RAW)
• Distributed model for all computing, including the serving and archiving of the RAW and RECO data
• Data Management: provide tools to discover, access and transfer event data

J. Hernández: The CMS Computing Model [361]
General principles

• Optimisation for read access
  • data are written once, never modified, and subsequently read many times
• Optimise for the large bulk case
  • management of very large amounts of data and jobs in a large distributed computing system
• Minimise the dependencies of the jobs on the Worker Node
  • application dependencies local to the site, to avoid single points of failure for the entire system
  • asynchronous handling of job output relative to the job finishing
• Site-local configuration information should remain site-local
  • flexibility to configure and evolve the local system as needed, without synchronisation to the rest of the world
Data organization

• CMS expects to produce large amounts of data (events): O(PB)/year
• Event data are stored in files
  • the average file size is kept reasonably large (≥ GB) to avoid scaling issues with storage systems and catalogues when dealing with too many small files; file merging is foreseen
  • O(10^6) files/year
• Files are grouped into fileblocks
  • files are grouped into blocks of 1-10 TB for bulk data management reasons (sketched below)
  • O(10^3) fileblocks/year
• Fileblocks are grouped into datasets
  • datasets are large (100 TB) or small (0.1 TB): size driven by physics
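To make the hierarchy concrete, here is a minimal sketch of files being packed into 1-10 TB fileblocks within a dataset. Names, sizes and the grouping policy are illustrative assumptions, not CMS code:

```python
from dataclasses import dataclass, field

TB = 1e12  # bytes

@dataclass
class File:
    lfn: str      # logical file name
    size: float   # bytes; kept reasonably large (>= ~1 GB) to avoid small-file overheads

@dataclass
class FileBlock:
    name: str
    files: list = field(default_factory=list)

    @property
    def size(self):
        return sum(f.size for f in self.files)

@dataclass
class Dataset:
    name: str
    blocks: list = field(default_factory=list)

def group_into_blocks(dataset_name, files, target_block_size=5 * TB):
    """Pack files into fileblocks of roughly 1-10 TB for bulk data management."""
    dataset = Dataset(dataset_name)
    block, index = FileBlock(f"{dataset_name}#0"), 0
    for f in files:
        if block.files and block.size + f.size > target_block_size:
            dataset.blocks.append(block)
            index += 1
            block = FileBlock(f"{dataset_name}#{index}")
        block.files.append(f)
    if block.files:
        dataset.blocks.append(block)
    return dataset
```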
Data processing workflow

• The Data Management System provides the means to discover, access and transfer event data in a distributed computing environment

[Figure: data/job/info flow: UI with submission tool → WMS → CE → WN at a site; the WN accesses the SE/storage system through the site-local file catalog; PhEDEx moves data between sites]

• DBS (Dataset Bookkeeping System): what data exist?
• DLS (Data Location Service): where do the data exist?
• Site-local file catalog: physical location of the files at the site
• PhEDEx: takes care of data transfer to reach the destination site
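In code terms, configuring jobs for a dataset boils down to two lookups before submission: DBS answers "what data exist" (dataset → fileblocks) and DLS answers "where they exist" (fileblock → storage elements). A hedged sketch with in-memory stand-ins for the two services (dataset, block and site names are invented):

```python
# Illustrative in-memory stand-ins for the DBS and DLS services.
DBS = {  # dataset -> list of fileblocks
    "/PrimaryDatasetX/Reprocessing1/RECO": ["block#001", "block#002"],
}
DLS = {  # fileblock -> storage elements hosting a replica
    "block#001": ["srm.cern.ch", "cmssrm.fnal.gov"],
    "block#002": ["cmssrm.fnal.gov"],
}

def plan_jobs(dataset):
    """Resolve a dataset into (fileblock, candidate sites) pairs for the WMS broker."""
    plan = []
    for block in DBS.get(dataset, []):
        sites = DLS.get(block, [])
        if sites:  # only blocks with at least one known replica are runnable
            plan.append((block, sites))
    return plan

for block, sites in plan_jobs("/PrimaryDatasetX/Reprocessing1/RECO"):
    print(f"submit jobs for {block} to one of {sites}")
```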
Dataset Bookkeeping System (DBS)

• The Dataset Bookkeeping System (DBS) provides the means to define, discover and use CMS event data
• Data definition:
  • dataset specification (content and associated metadata)
  • tracking of data provenance
• Data discovery:
  • which data exist
  • dataset organization in terms of files/fileblocks
  • site-independent information
• Used by:
  • distributed analysis tool (CRAB)
  • MC production system
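A rough illustration of the kind of site-independent record DBS keeps per dataset: content, associated metadata and provenance. All field names and values here are invented for the example; they are not the real DBS schema:

```python
# Hypothetical DBS entry: dataset specification, metadata and provenance.
dataset_record = {
    "dataset": "/PrimaryDatasetX/Reprocessing1/RECO",
    "data_tier": "RECO",
    "parent": "/PrimaryDatasetX/Online/RAW",              # provenance: input data
    "application": {"name": "cmsRun", "version": "CMSSW_X_Y_Z"},  # provenance: producing step
    "fileblocks": {
        "block#001": ["f1.root", "f2.root"],              # organization in files/fileblocks
    },
    # Note: no site information here; locating replicas is the job of the DLS, not DBS.
}
```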
DBS scopes and dynamics

• A DBS instance describing data CMS-wide (global scope): Oracle at CERN
• DBS instances with a more "local" scope (e.g. a working group, official production): light-weight back-end (SQLite)

[Figure: global DBS (Oracle, CERN) alongside local DBS instances for working groups and official production]
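The point of the light-weight local scope is that the same bookkeeping idea can run from a single SQLite file on a working-group or production node, with no server at all. A minimal sketch using only the Python standard library (the schema below is invented for the example, not the real DBS schema):

```python
import sqlite3

def open_local_dbs(path="local_dbs.sqlite"):
    """Create a light-weight, file-based DBS instance with local scope."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS dataset (
                        name   TEXT PRIMARY KEY,
                        parent TEXT)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS fileblock (
                        name    TEXT PRIMARY KEY,
                        dataset TEXT REFERENCES dataset(name))""")
    return conn

dbs = open_local_dbs()
dbs.execute("INSERT OR IGNORE INTO dataset VALUES (?, ?)",
            ("/Test/LocalProduction/RECO", "/Test/LocalProduction/RAW"))
dbs.commit()
```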
Data Location Service (DLS)

[Figure: DLS mapping fileblock1 ... fileblockN to SE_CERN and fileblock1 ... fileblockM to SE_FNAL]

• The Data Location Service (DLS) provides the means to locate replicas of data in the distributed computing system
  • it maps fileblocks to the storage elements (SEs) where they are located
• Few attributes (e.g. custodial replica = considered a permanent copy at a site)
• Very generic: it is not CMS-specific
• Global/local scopes
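A toy version of the mapping the DLS maintains, including the custodial attribute mentioned above (class and site names are hypothetical, not the DLS implementation):

```python
from collections import defaultdict

class DataLocationService:
    """Toy DLS: maps fileblocks to the storage elements hosting a replica."""

    def __init__(self):
        self._replicas = defaultdict(dict)   # block -> {SE: {"custodial": bool}}

    def add_replica(self, block, se, custodial=False):
        self._replicas[block][se] = {"custodial": custodial}

    def get_locations(self, block):
        return list(self._replicas.get(block, {}))

dls = DataLocationService()
dls.add_replica("fileblock1", "srm.cern.ch", custodial=True)   # permanent copy at the site
dls.add_replica("fileblock1", "cmssrm.fnal.gov")               # additional replica
print(dls.get_locations("fileblock1"))
```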
Local data access

[Figure: on the WN, the local file catalog maps LFN1 ... LFN_M to PFN1 ... PFN_M]

• CMS applications read and write files at a site
• The DLS holds the names of the sites hosting the data, not the physical location of the constituent files at the sites
• Local file catalogues provide the site-local information about how to access any given logical file name
  • baseline is to use a "trivial catalogue" (sketched below)
  • need to sustain very high-scale performance
• Site-local discovery mechanism: discover at runtime, on the WN, the site-dependent job configuration needed to access data
• CMS applications interface to storage through a POSIX-like interface
• Storage systems with SRM (disk, mass storage)
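The "trivial catalogue" baseline can be pictured as a handful of site-published rules that turn a logical file name into a physical one for a given access protocol, instead of a per-file database lookup. A hedged sketch (protocols, hostnames and path prefixes are invented examples):

```python
import re

# Site-local rules: (protocol, LFN regex, PFN template). Published by the site,
# discovered by the job at runtime on the worker node.
TRIVIAL_CATALOG = [
    ("dcap", r"^/store/(.*)",
     "dcap://dcache.example.org:22125/pnfs/example.org/data/cms/store/\\1"),
    ("srm", r"^/store/(.*)",
     "srm://srm.example.org/castor/example.org/cms/store/\\1"),
]

def lfn_to_pfn(lfn, protocol):
    """Resolve an LFN to a PFN using the site-local trivial catalogue rules."""
    for proto, pattern, template in TRIVIAL_CATALOG:
        if proto == protocol and re.match(pattern, lfn):
            return re.sub(pattern, template, lfn)
    raise LookupError(f"no rule for {lfn!r} with protocol {protocol!r}")

print(lfn_to_pfn("/store/RECO/run1/file1.root", "dcap"))
```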
Data placement and transfer (PhEDEx)

• Data distribution among Tier-0, Tier-1 and Tier-2 sites
• PhEDEx data placement
  • administrative decision on data movement (how many copies, where, which ones are the custodial copy, …)
  • management at the fileblock level
• PhEDEx data transfer
  • replicating and moving files
  • transfer nodes with site-specific replication and tape management

J. Rehn: PhEDEx high-throughput data transfer management system [track DEPP-389]
T. Barrass: Techniques for high-throughput, reliable transfer systems: break-down of PhEDEx design [poster 390]
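The placement/transfer split can be sketched as follows: subscriptions capture the administrative decision (which blocks go where, and whether the copy is custodial), and the transfer layer derives the remaining work block by block. All names below are invented; this is not the PhEDEx code or schema:

```python
from dataclasses import dataclass

@dataclass
class Subscription:
    block: str
    destination: str          # destination site
    custodial: bool = False   # is this the permanent, tape-backed copy?

def build_transfer_queue(subscriptions, current_replicas):
    """Return the (block, destination) transfers still needed to satisfy the subscriptions."""
    queue = []
    for sub in subscriptions:
        if sub.destination not in current_replicas.get(sub.block, set()):
            queue.append((sub.block, sub.destination))
    return queue

subs = [Subscription("fileblock1", "T1_FNAL", custodial=True),
        Subscription("fileblock1", "T2_Bologna")]
replicas = {"fileblock1": {"T0_CERN", "T1_FNAL"}}
print(build_transfer_queue(subs, replicas))   # only the Tier-2 copy is still missing
```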
Technical description of the components

• DBS
  • database: CERN Oracle RAC (Real Application Clusters) server for CMS
  • server + clients:
    • set of tools to access the schema + CGI "pseudo" server + client CGI API
    • business logic + web service + client web service API
• DLS
  • CMS prototype (placeholder for the evaluation of Grid catalogues): python server with MySQL back-end + client tools; no authentication/authorization mechanisms
  • evaluation of the LCG LFC
• Local file catalogues: site-dependent (XML, MySQL, trivial, ....)
• Storage systems: site-dependent (dCache, Castor, ...)
• PhEDEx

J. Rehn: PhEDEx high-throughput data transfer management system [track DEPP-389]
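Both the DBS CGI "pseudo" server and the DLS python prototype are queried over HTTP by thin clients. As an assumption-laden sketch of what such a thin client call could look like (endpoint URL, API name and response format are all invented, not the real interfaces):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def query_dls(block, server="http://dls.example.org/cgi-bin/dls"):
    """Ask a hypothetical DLS endpoint which storage elements host a fileblock."""
    url = f"{server}?{urlencode({'api': 'getLocations', 'block': block})}"
    with urlopen(url) as response:   # no auth: the prototype has no auth/authz layer
        return response.read().decode().split()
```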
First experience using the system: CRAB

• CRAB (distributed analysis tool)
  • move CRAB job configuration from the current data discovery and location tools (RefDB/PubDBs) to DBS/DLS
• DBS:
  • CERN instance (CERN Oracle, CGI interface), clients in CRAB
  • import from RefDB done: 1.8M files, 2400 datasets, 1.4 GB
• DLS:
  • CERN instance (MySQL, python server), clients in CRAB
  • import from PubDBs: ~3600 fileblocks
• Site-local catalogue discovery:
  • script to locate the correct local file catalogue deployed at all sites
• Testing phase: systematically submitting CRAB jobs configured using DBS/DLS
  • exposed to users in February-March

M. Corvo: CRAB: a tool to enable CMS Distributed Analysis [track DDA-273]
D. Spiga: CRAB Usage and jobs-flow Monitoring [poster 252]
First experience using the system: MC production

• MC Production system
  • since November: integration of the new DM and the new EDM for MC production
• Bringing up a DBS system capable of being used for MC production with the new EDM
  • this part of the API is necessarily more complex (describe the new EDM application, register outputs)
  • it also covers file merging and fileblock management (a merging sketch follows below)
• Orchestrate the interactions with the local-scope DBS/DLS and PhEDEx

C. Jones: The New CMS Event Data Model and Framework [track EPA-242]
P. Elmer: Development of the Monte Carlo Production Service for CMS [track DEPP-278]
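A toy illustration of the file-merging step referenced above: small unmerged production files are grouped until a target size (≥ GB, as in the data organization slide) is reached. The grouping policy and names are invented, not the production system's actual logic:

```python
GB = 1e9

def plan_merges(files, target_size=2 * GB):
    """Group small production files into merge jobs of roughly target_size bytes.

    `files` is a list of (lfn, size_in_bytes) tuples; returns a list of groups,
    each group being the inputs of one merge job.
    """
    groups, current, current_size = [], [], 0
    for lfn, size in files:
        current.append(lfn)
        current_size += size
        if current_size >= target_size:
            groups.append(current)
            current, current_size = [], 0
    if current:
        groups.append(current)
    return groups

small_files = [(f"unmerged_{i}.root", 0.3 * GB) for i in range(10)]
for i, group in enumerate(plan_merges(small_files)):
    print(f"merge job {i}: {len(group)} input files")
```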
Summary

• CMS expects to be managing many tens of petabytes of data at tens of sites around the world
• CMS is developing the Data Management components of the model described in its Computing TDR
• CMS is integrating them into a coherent system to manage the large amounts of data produced, processed and analysed
• Preparation for a CMS data challenge in September 2006