
Distributed Data Management in CMS


Presentation Transcript


  1. Distributed Data Management in CMS
     A. Fanfani, University of Bologna, on behalf of the CMS Collaboration

  2. CMS Computing model
     [Diagram: tiered CMS computing model — Online system → Tier-0 (first-pass reconstruction, tape; RAW, RECO) → Tier-1 sites (scheduled data processing, analysis tasks, tape; RAW, RECO) → Tier-2 sites (analysis, MC simulation; RECO, AOD)]
     • CMS Computing model as described in the Computing TDR
     • Online system
       • RAW events are classified into O(50) primary datasets depending on their trigger history
       • primary datasets are grouped into O(10) online streams
     • Distributed model for all computing, including the serving and archiving of the RAW and RECO data
     → Data Management: provide tools to discover, access and transfer event data
     J. Hernàndez: The CMS Computing Model [361]

  3. General principles
     • Optimisation for read access
       • data are written once, never modified and subsequently read many times
     • Optimise for the large bulk case
       • management of very large amounts of data and jobs in a large distributed computing system
     • Minimise the dependencies of the jobs on the Worker Node
       • application dependencies kept local to the site, to avoid single points of failure for the entire system
       • asynchronous handling of job output relative to the job finishing
     • Site-local configuration information should remain site-local
       • flexibility to configure and evolve the local system as needed, without synchronisation to the rest of the world

  4. Data organization
     • CMS expects to produce large amounts of data (events): O(PB)/year
     • Event data are stored in files
       • average file size is kept reasonably large (≥ 1 GB)
       • avoids scaling issues with storage systems and catalogues when dealing with too many small files
       • file merging is foreseen
       • O(10^6) files/year
     • Files are grouped into fileblocks
       • files are grouped in blocks (1-10 TB) for bulk data management reasons
       • O(10^3) fileblocks/year
     • Fileblocks are grouped into datasets
       • datasets can be large (100 TB) or small (0.1 TB): size driven by physics
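To make the file / fileblock / dataset hierarchy above concrete, here is a minimal Python sketch; the class and field names are illustrative assumptions, not the actual CMS bookkeeping schema.

```python
# Illustrative sketch only: class and field names are not the actual CMS schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EventFile:
    lfn: str          # logical (site-independent) file name
    size_gb: float    # kept reasonably large (>= 1 GB) to avoid small-file overhead

@dataclass
class FileBlock:
    name: str
    files: List[EventFile] = field(default_factory=list)

    def size_tb(self) -> float:
        # fileblocks are the unit of bulk data management, typically 1-10 TB
        return sum(f.size_gb for f in self.files) / 1024.0

@dataclass
class Dataset:
    name: str                                     # overall size driven by physics (0.1-100 TB)
    blocks: List[FileBlock] = field(default_factory=list)

    def size_tb(self) -> float:
        return sum(b.size_tb() for b in self.blocks)
```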

  5. Data processing workflow
     [Diagram: UI/submission tool, WMS, CE, WN, SE/storage system, with data, job and info flows between the components]
     • The Data Management System allows discovering, accessing and transferring event data in a distributed computing environment
     • DBS (Dataset Bookkeeping System): what data exists?
     • DLS (Data Location Service): where does the data exist?
     • Site-local file catalogue: physical location of the files at the site
     • PhEDEx: takes care of data transfer to reach the destination site
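The discovery flow on this slide can be sketched as follows; the in-memory dictionaries merely stand in for DBS and DLS, and nothing here reflects the real services' APIs.

```python
# Toy illustration of the discovery/location flow; the dictionaries stand in
# for DBS and DLS and are not the real services.
from typing import Dict, List

# DBS answers "what data exists?": dataset -> fileblocks
DBS: Dict[str, List[str]] = {
    "/PrimaryDataset/RECO": ["/PrimaryDataset/RECO#block1", "/PrimaryDataset/RECO#block2"],
}

# DLS answers "where does the data exist?": fileblock -> storage elements
DLS: Dict[str, List[str]] = {
    "/PrimaryDataset/RECO#block1": ["se.cern.ch"],
    "/PrimaryDataset/RECO#block2": ["se.fnal.gov", "se.cern.ch"],
}

def plan_jobs(dataset: str) -> Dict[str, List[str]]:
    """Group a dataset's fileblocks by one site hosting them (naive site choice)."""
    plan: Dict[str, List[str]] = {}
    for block in DBS.get(dataset, []):
        sites = DLS.get(block, [])
        if sites:                                   # skip blocks with no known replica
            plan.setdefault(sites[0], []).append(block)
    return plan

if __name__ == "__main__":
    print(plan_jobs("/PrimaryDataset/RECO"))
```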

  6. Dataset Bookkeeping System (DBS)
     • The Dataset Bookkeeping System (DBS) provides the means to define, discover and use CMS event data
     • Data definition:
       • dataset specification (content and associated metadata)
       • tracking of data provenance
     • Data discovery:
       • which data exist
       • dataset organization in terms of files/fileblocks
       • site-independent information
     • Used by:
       • the distributed analysis tool (CRAB)
       • the MC production system
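As a rough illustration of the kind of bookkeeping record DBS might hold (dataset content, metadata, provenance), consider the sketch below; the structure and field names are assumptions, not the actual DBS schema.

```python
# Hypothetical bookkeeping record: content, metadata and provenance of a dataset.
# Field names are illustrative only.
dataset_record = {
    "name": "/PrimaryDataset/Processed/RECO",
    "metadata": {"data_tier": "RECO", "events": 1_000_000},
    "fileblocks": ["/PrimaryDataset/Processed/RECO#block1"],
    "provenance": {
        "parent": "/PrimaryDataset/RAW",                       # data this was produced from
        "processing_step": "first-pass reconstruction at Tier-0",
    },
}
```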

  7. DBS scopes and dynamics
     • A DBS instance describing data CMS-wide (global scope): Oracle-based, hosted at CERN
     • DBS instances with more "local" scope (working group, official production): light-weight (SQLite)
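A toy sketch of the scoping idea, assuming a hypothetical publish step from a local-scope instance to the global one; backends and names are placeholders, not the real migration mechanics.

```python
# Toy illustration of global vs. local DBS scopes; backends and names are placeholders.
SCOPES = {
    "global": {"backend": "Oracle at CERN", "visibility": "CMS-wide"},
    "local":  {"backend": "light-weight SQLite", "visibility": "working group / official production"},
}

def publish_to_global(dataset: str) -> None:
    """Stand-in for migrating bookkeeping records from a local-scope DBS to the global one."""
    print(f"publish {dataset}: {SCOPES['local']['backend']} -> {SCOPES['global']['backend']}")

publish_to_global("/PrimaryDataset/Skim/RECO")
```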

  8. Data Location Service (DLS)
     [Diagram: DLS mapping — SE at CERN hosting fileblock1 … fileblockN, SE at FNAL hosting fileblock1 … fileblockM]
     • The Data Location Service (DLS) provides the means to locate replicas of data in the distributed computing system
       • it maps fileblocks to the storage elements (SEs) where they are located
       • few attributes (e.g. custodial replica = considered a permanent copy at a site)
       • very generic: not CMS-specific
       • global/local scopes
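A sketch of the fileblock-to-SE mapping with the custodial attribute mentioned above; the entries and the locate helper are illustrative, not the real DLS interface.

```python
# Illustrative fileblock -> storage-element mapping; not the real DLS interface.
dls_entries = {
    "/PrimaryDataset/RAW#block1": [
        {"se": "se.cern.ch", "custodial": True},    # permanent (tape-backed) copy at the site
        {"se": "se.fnal.gov", "custodial": False},  # additional replica
    ],
}

def locate(fileblock: str, custodial_only: bool = False) -> list:
    """Return the storage elements hosting a fileblock."""
    replicas = dls_entries.get(fileblock, [])
    if custodial_only:
        replicas = [r for r in replicas if r["custodial"]]
    return [r["se"] for r in replicas]

print(locate("/PrimaryDataset/RAW#block1", custodial_only=True))
```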

  9. Local data access
     [Diagram: on the Worker Node, the local file catalogue maps LFN1 … LFNM to PFN1 … PFNM]
     • CMS applications read and write files at a site
     • The DLS holds the names of the sites hosting the data, not the physical location of the constituent files at the sites
     • Local file catalogues provide the site-local information about how to access any given logical file name
       • baseline is to use a "trivial catalogue"
       • needs to sustain very high-scale performance
     • Site-local discovery mechanism: discover at runtime on the WN the site-dependent job configuration needed to access data
     • CMS applications interface to storage with a POSIX-like interface
     • Storage systems with SRM (disk, mass storage)
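The "trivial catalogue" baseline can be pictured as a simple site-local rewrite rule rather than a per-file lookup; the prefix and protocol below are placeholders for whatever a site configures locally.

```python
# Minimal sketch of a "trivial catalogue": LFN -> PFN via a site-local rewrite rule.
# The prefix/protocol is a placeholder, discovered at runtime on the WN in the real system.
SITE_PFN_PREFIX = "srm://se.example.site/cms"

def lfn_to_pfn(lfn: str) -> str:
    """Map a logical file name to a site-specific physical file name."""
    return SITE_PFN_PREFIX + lfn

print(lfn_to_pfn("/store/PrimaryDataset/RECO/file001.root"))
# -> srm://se.example.site/cms/store/PrimaryDataset/RECO/file001.root
```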

  10. Data placement and transfer (PhEDEx)
      • Data distribution among Tier-0, Tier-1 and Tier-2 sites
      • PhEDEx data placement
        • administrative decision on data movement (how many copies and where, which ones are custodial copies, …)
        • management at fileblock level
      • PhEDEx data transfer
        • replicating and moving files
        • transfer nodes with site-specific replication and tape management
      J. Rehn: PhEDEx high-throughput data transfer management system [track DEPP-389]
      T. Barrass: Techniques for high-throughput, reliable transfer systems: break-down of PhEDEx design [poster 390]
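A hedged sketch of what a fileblock-level placement decision might contain; the field names are illustrative and do not reflect PhEDEx's actual schema.

```python
# Illustrative placement decision at fileblock level; not PhEDEx's real data model.
subscription = {
    "fileblocks": ["/PrimaryDataset/RECO#block1", "/PrimaryDataset/RECO#block2"],
    "destination": "T1_Example_Site",   # hypothetical site name
    "custodial": True,                  # counts as a permanent copy at the destination
    "priority": "normal",
}
```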

  11. Technical description of the components
      • DBS
        • Database: CERN Oracle RAC (Real Application Clusters) server for CMS
        • Server + clients:
          • set of tools to access the schema + CGI "pseudo" server + client CGI API
          • business logic + web service + client web-service API
      • DLS
        • CMS prototype (placeholder for evaluation of Grid catalogues)
          • Python server with MySQL backend + client tools
          • no authentication/authorization mechanisms
        • evaluation of the LCG LFC
      • Local file catalogues: site-dependent (XML, MySQL, trivial, …)
      • Storage systems: site-dependent (dCache, Castor, …)
      • PhEDEx
      J. Rehn: PhEDEx high-throughput data transfer management system [track DEPP-389]

  12. First experience using the system: CRAB
      • CRAB (distributed analysis tool)
        • migrate CRAB job configuration from the current data discovery and location tools (RefDB/PubDBs) to DBS/DLS
      • DBS:
        • CERN instance (CERN Oracle, CGI interface), clients used by CRAB
        • import from RefDB done: 1.8M files, 2400 datasets, 1.4 GB
      • DLS:
        • CERN instance (MySQL, Python server), clients used by CRAB
        • import from PubDBs: ~3600 fileblocks
      • Site-local catalogue discovery:
        • script to locate the correct local file catalogues deployed at all sites
      • Testing phase: systematically submitting CRAB jobs configured using DBS/DLS
        • exposed to users in February-March
      M. Corvo: CRAB: a tool to enable CMS Distributed Analysis [track DDA-273]
      D. Spiga: CRAB Usage and jobs-flow Monitoring [poster 252]

  13. First experience using the system: MC production
      • MC Production system
        • since November: integration of the new DM and the new EDM for MC production
      • Bringing up a DBS system capable of being used for MC production with the new EDM
        • this part of the API is necessarily more complex (describe the new-EDM application, register outputs)
        • this also includes file merging and fileblock management
      • Orchestrate the interactions with local-scope DBS/DLS and PhEDEx
      C. Jones: The New CMS Event Data Model and Framework [track EPA-242]
      P. Elmer: Development of the Monte Carlo Production Service for CMS [track DEPP-278]
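The orchestration mentioned in the last bullet could look roughly like the sketch below; every name here is a hypothetical placeholder, not a real production-system, DBS or PhEDEx call.

```python
# Hedged sketch of the orchestration order only; all names are hypothetical placeholders.
def produce_and_register(output_files, local_dbs, local_dls, phedex_queue):
    # file merging + fileblock management (toy version)
    merged_block = {"name": "/MCSample/Prod#block1", "files": list(output_files)}
    local_dbs.append(merged_block)                                   # register outputs in the local-scope DBS
    local_dls.append((merged_block["name"], "se.production.site"))   # record where the block was produced
    phedex_queue.append(merged_block["name"])                        # hand the block to PhEDEx for placement/transfer

local_dbs, local_dls, phedex_queue = [], [], []
produce_and_register(["file001.root", "file002.root"], local_dbs, local_dls, phedex_queue)
print(phedex_queue)
```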

  14. Summary
      • CMS expects to be managing many tens of petabytes of data at tens of sites around the world
      • CMS is developing the Data Management components of the model described in its Computing TDR
      • CMS is integrating them into a coherent system that allows managing the large amounts of data produced, processed and analysed
      • Preparation for a CMS data challenge in September 2006
