Producing and Reading DST: DC04 and beyond
Vincenzo Innocente, CERN/PH
DAG: “coarse granularity”
• DataSet: a tag that identifies a set of data (by its origin)
  • Users are supposed to know it!
• COBRA Owner: identifies a transformation applied to a dataset
• The Owner/Dataset pair uniquely identifies the output of a transformation: usually a collection of events, each containing a given set of event-data products (collections of RecObjs, for instance)
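The Owner/Dataset pairing can be pictured as a composite key. The following is a minimal Python sketch, not COBRA code: the class name `CollectionKey` and its `path()` helper are hypothetical, and the `/System/Owner/Dataset` layout is taken from the collection names used in the configuration fragments of this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionKey:
    """Hypothetical composite key: one transformation (owner) applied to one dataset."""
    owner: str
    dataset: str

    def path(self) -> str:
        # Assumed layout, mirroring names such as /System/dstOwner/Dataset
        return f"/System/{self.owner}/{self.dataset}"


# Two transformations of the same dataset give two distinct outputs.
digi = CollectionKey("digiOwner", "Dataset")
dst = CollectionKey("dstOwner", "Dataset")

# Frozen dataclasses are hashable, so the pair can index the outputs directly.
outputs = {digi: "collection of digi events", dst: "collection of DST events"}
```

Because the key is the pair rather than the dataset alone, the same dataset can carry any number of derived outputs without ambiguity.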
DAG: “high granularity”
DST production: 1) initialization
# define the transformation to be a dataset initialization
InitDataSet = true
# define input catalog (read only)
InputFileCatalogURL = contact-string-of-hits&digi-meta
# define output catalog
OutputFileCatalogURL = catalog-of-dst-metafile
# force one file for the owner meta
OneMetaFile = true
# force one file of meta for each dataset (from 772)
DataSetMetaFile = true
# define input collection
InputCollections = /System/digiOwner/Dataset/EvC_RunNNN
# define output dataset
OutputDataset = /System/dstOwner/Dataset
# define streams (create new datasets)
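For scripting around such fragments (for instance to check a job configuration before submission), a small reader is enough. This is a sketch under assumptions, not the COBRA parser: it assumes one `key = value` setting per line and that `#` starts a comment; the function name `parse_config` is hypothetical.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict, ignoring '#' comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment wherever it appears on the line.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


init_fragment = """
# define the transformation to be a dataset initialization
InitDataSet = true
OneMetaFile = true
DataSetMetaFile = true
# define input collection
InputCollections = /System/digiOwner/Dataset/EvC_RunNNN
# define output dataset
OutputDataset = /System/dstOwner/Dataset
"""

settings = parse_config(init_fragment)
```

A check such as `settings.get("InitDataSet") == "true"` can then distinguish an initialization job from an EVD production job.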
DST production: 2) produce EVD
# define input catalog (read only)
InputFileCatalogURL = catalog-of-dst-metafile contact-string-of-hits&digi-meta contact-string-of-digi-EVD
# define new output catalog (if winter, cobra adds an oid to the name)
OutputFileCatalogURL = file:local.xml
# define input collection
InputCollections = /System/digiOwner/Dataset/EvC_RunNNN
# define output dataset
OutputDataset = /System/dstOwner/Dataset
# define streams (create new datasets)
# define output run number
OutputRunNumber = NNN
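Since one such fragment is needed per run, it is natural to generate it. The sketch below only assembles text from the keys shown on this slide; the function name `render_evd_config` and its arguments are hypothetical, and treating `NNN` in `EvC_RunNNN` as the run number is an assumption.

```python
def render_evd_config(input_catalogs, output_catalog, digi_owner, dst_owner,
                      dataset, run_number):
    """Return the text of an EVD-production fragment for one run (illustrative only)."""
    lines = [
        "# define input catalog (read only)",
        "InputFileCatalogURL = " + " ".join(input_catalogs),
        "# define new output catalog",
        f"OutputFileCatalogURL = {output_catalog}",
        "# define input collection",
        # Assumption: NNN in EvC_RunNNN stands for the run number.
        f"InputCollections = /System/{digi_owner}/{dataset}/EvC_Run{run_number}",
        "# define output dataset",
        f"OutputDataset = /System/{dst_owner}/{dataset}",
        "# define output run number",
        f"OutputRunNumber = {run_number}",
    ]
    return "\n".join(lines) + "\n"


config_text = render_evd_config(
    input_catalogs=["catalog-of-dst-metafile",
                    "contact-string-of-hits&digi-meta",
                    "contact-string-of-digi-EVD"],
    output_catalog="file:local.xml",
    digi_owner="digiOwner",
    dst_owner="dstOwner",
    dataset="Dataset",
    run_number=42,
)
```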
DST production: products
• Initialization
  • Creation or update of file CARF_System.META.owner
  • Creation of one file META.dataset.owner_ for each dataset
• EVD production
  • Creation of a catalog local_jobid.xml
  • For each output dataset (streams)
    • Creation of EVD files
      • EVDn_DataType.jobid.runid.dataset.owner
    • runid file: owner.dataset.jobid.runid
    • runend file: owner.dataset.jobid.runend
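The naming patterns above can be written as a small helper, which is handy when checking that a job delivered everything it should. This is a sketch: the function `product_names` is hypothetical, and it assumes that `.runid` and `.runend` are literal extensions while `runid` inside the EVD file name is the run identifier.

```python
def product_names(owner, dataset, jobid, runid, n=1, data_type="DataType"):
    """Build the per-dataset file names expected from one EVD production job."""
    return {
        # Pattern from the slide: EVDn_DataType.jobid.runid.dataset.owner
        "evd": f"EVD{n}_{data_type}.{jobid}.{runid}.{dataset}.{owner}",
        # Assumption: '.runid' and '.runend' are literal extensions.
        "runid": f"{owner}.{dataset}.{jobid}.runid",
        "runend": f"{owner}.{dataset}.{jobid}.runend",
    }


names = product_names("dstOwner", "Dataset", jobid="job7", runid="123")
```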
DST production: 3) verification
• The presence of the runend file guarantees that the run has been closed
• Inspecting the runend file, or running findColls or even batchcobra on the output collections, can be used to verify and validate that the job completed its assignment
  • All of this can run on the worker node (WN) or later
• Publish and distribute EVD files
• Publish runid, runend and catalog
  • The meta files were already published after initialization
• Run AttachRun for each runid file
  • If the meta files are on a shared file system, this can even be run on the WN
  • It is more efficient to batch them
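The first check in the list, that every runid file has a matching runend file, is easy to automate. The sketch below is illustrative only: it assumes the `owner.dataset.jobid.runid` and `owner.dataset.jobid.runend` naming from the products slide, the function name `split_closed_runs` is hypothetical, and it does not replace findColls or batchcobra.

```python
import tempfile
from pathlib import Path


def split_closed_runs(directory):
    """Return (closed, unclosed): runid files with and without a matching runend file."""
    d = Path(directory)
    # Path.stem drops only the last extension, leaving 'owner.dataset.jobid'.
    runid_stems = {p.stem for p in d.glob("*.runid")}
    runend_stems = {p.stem for p in d.glob("*.runend")}
    return sorted(runid_stems & runend_stems), sorted(runid_stems - runend_stems)


# Usage on a scratch directory: job1 closed its run, job2 did not.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dstOwner.Dataset.job1.runid",
                 "dstOwner.Dataset.job1.runend",
                 "dstOwner.Dataset.job2.runid"):
        (Path(tmp) / name).touch()
    closed, unclosed = split_closed_runs(tmp)
```

Only the entries in `closed` would be candidates for publication and AttachRun; those in `unclosed` need inspection first.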
DST analysis: 1) output of a job
# define input catalogs (read only)
InputFileCatalogURL = file:local_jobid.xml catalog-of-dst-metafile contact-string-of-hits&digi-virgin_meta
# define input collection
InputCollections = oid-extracted-from-.runid-file
DST analysis: 1a) remote, one job
• Given a .runid file, verify that the files are present
  • FClistLFN -u localCatalog -q "jobid='jobid' AND dataset AND owner"
• Run a modified version of findColls that uses the oid from the .runid file instead of the LFN of the EVD file
# define input catalogs (read only)
InputFileCatalogURL = localCatalog global-catalog-of-dst-metafile contact-string-of-hits&digi-virgin_meta
# define input collection
InputCollections = oid-extracted-from-.runid-file
DST analysis: 2) remote “full” dataset
# define input catalogs (read only)
InputFileCatalogURL = localCatalog global-catalog-of-dst-metafile contact-string-of-hits&digi-virgin_meta
# define input collection
InputCollections = /System/Owner/DataSet
• The job will run on all runs attached to the global-dst-metafile for which EVD files are present in the localCatalog
• FirstEvent and MaxEvents can be used to keep the event interval identical for subsequent processing (more clever selections require FixColl to be run)
• Jobs will crash if AttachRun is run at the same time against global-meta (a backup should be used)
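The selection described here amounts to an intersection followed by a fixed event window. The following Python sketch only illustrates that logic and is not COBRA behaviour: the function names are hypothetical, and treating FirstEvent as a zero-based offset into the concatenated event sequence is an assumption.

```python
from itertools import islice


def runs_to_process(attached_runs, local_runs):
    """Runs attached to the global meta whose EVD files are also in the local catalog."""
    return sorted(set(attached_runs) & set(local_runs))


def event_window(events, first_event=0, max_events=None):
    """Fixed event interval. Assumption: first_event is a zero-based offset."""
    stop = None if max_events is None else first_event + max_events
    return list(islice(events, first_event, stop))


attached = [101, 102, 103, 104]   # runs attached to the global DST meta file
local = [102, 104, 105]           # runs with EVD files in the local catalog
runs = runs_to_process(attached, local)

# Three dummy events per selected run, labelled (run, index).
events = [(run, i) for run in runs for i in range(3)]
window = event_window(events, first_event=1, max_events=4)
```

Note that the window is reproducible only as long as the set of attached runs does not change between jobs, which is one reason to work from a backup of the meta file.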
DST analysis: what is not needed
• Attached MetaData for all previous owners
  • Even if access to mc-truth or digis is required
• Event files from previous owners
  • Unless the analysis wishes to access mc-truth or digis