
Producing and Reading DST: DC04 and beyond





  1. Producing and Reading DST: DC04 and beyond (Vincenzo Innocente, CERN/PH)

  2. DAG: “coarse granularity”
     • DataSet: a tag that identifies a set of data (by its origin); users are supposed to know it!
     • COBRA Owner: identifies a transformation applied to a dataset.
     • The pair Owner/Dataset uniquely identifies the output of a transformation: usually a collection of events, each containing a given set of event-data products (collections of RecObjs, for instance).
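
  To make the Owner/Dataset naming concrete: the collection paths used later in this talk compose the system area, the owner and the dataset (plus an event-collection name). The hypothetical pair below simply reuses the names that appear in the production configuration on the following slides.

     # illustrative only; these names reappear in the configuration examples that follow
     InputCollections = /System/digiOwner/Dataset/EvC_RunNNN   # output of the digi owner for this dataset
     OutputDataset    = /System/dstOwner/Dataset               # same dataset, transformed by the DST owner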

  3. DAG: high granularity

  4. DST production: 1) initialization
     # define the transformation to be a dataset initialization
     InitDataSet = true
     # define input catalog (read only)
     InputFileCatalogURL = contact-string-of-hits&digi-meta
     # define output catalog
     OutputFileCatalogURL = catalog-of-dst-metafile
     # force one file for the owner meta
     OneMetaFile = true
     # force one file of meta for each dataset (from 772)
     DataSetMetaFile = true
     # define input collection
     InputCollections = /System/digiOwner/Dataset/EvC_RunNNN
     # define output dataset
     OutputDataset = /System/dstOwner/Dataset
     # define streams (create new datasets)

  5. DST production: 2) produce EVD
     # define input catalogs (read only)
     InputFileCatalogURL = catalog-of-dst-metafile
                           contact-string-of-hits&digi-meta
                           contact-string-of-digi-EVD
     # define new output catalog (when writing it, COBRA adds an oid to the name)
     OutputFileCatalogURL = file:local.xml
     # define input collection
     InputCollections = /System/digiOwner/Dataset/EvC_RunNNN
     # define output dataset
     OutputDataset = /System/dstOwner/Dataset
     # define streams (create new datasets)
     # define output run number
     OutputRunNumber = NNN

  6. DST production: products
     • Initialization
       • Creation or update of the file CARF_System.META.owner
       • Creation of one file META.dataset.owner_ for each dataset
     • EVD production
       • Creation of a catalog local_jobid.xml
       • For each output dataset (stream):
         • EVD files: EVDn_DataType.jobid.runid.dataset.owner
         • runid file: owner.dataset.jobid.runid
         • runend file: owner.dataset.jobid.runend
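
  Purely to make the patterns above concrete, here is a hypothetical set of files for owner dstOwner, dataset Dataset and jobid 12345; the slides do not spell out exactly how the jobid/runid tokens are substituted, so treat this as an assumed reading.

     local_12345.xml                          # job-local catalog
     EVD0_DST.12345.NNN.Dataset.dstOwner      # an EVD file (one per data type / stream)
     dstOwner.Dataset.12345.runid             # runid file (holds the oid of the output collection)
     dstOwner.Dataset.12345.runend            # runend file (marks the run as closed)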

  7. DST production: 3) verification
     • The presence of the runend files guarantees that the run has been closed.
     • Inspection of runend, running findColls, or even batchcobra on the output collections can be used to verify and validate that the job completed its assignment.
       • All this can run on the worker node or later.
     • Publish and distribute the EVD files.
     • Publish runid, runend and the catalog (meta have already been published after initialization).
     • Run AttachRun for each runid file.
       • If meta are on a shared file system this can even be run on the WN.
       • More efficient to batch them.
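
  A minimal sketch of this verification/publication sequence as a shell fragment, assuming the .runid / .runend suffixes are literal file extensions; the slides do not show the command-line options of findColls, batchcobra or AttachRun, so the invocations below are placeholders rather than the real interfaces.

     # hypothetical sketch; argument forms are assumptions, not the real CLIs
     ls *.runend || echo "no runend file: run not closed"   # runend present => run closed
     findColls <output-collection>                          # inspect the output collections
     # ... publish EVD files, runid/runend files and the local catalog, then:
     for rid in *.runid; do
         AttachRun "$rid"     # attach each run to the published meta (batching is more efficient)
     done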

  8. DST analysis: 1) output of a job
     # define input catalogs (read only)
     InputFileCatalogURL = file:local_jobid.xml
                           catalog-of-dst-metafile
                           contact-string-of-hits&digi-virgin_meta
     # define input collection
     InputCollections = oid-extracted-from-.runid-file

  9. DST analysis: 1a) remote, one job
     • Given a .runid file, verify that the files are present:
       FClistLFN -u localCatalog -q "jobid='jobid' AND dataset AND owner"
     • Run a modified version of findColls that uses the oid from the .runid file instead of the LFN of the EVD file.
     # define input catalogs (read only)
     InputFileCatalogURL = localCatalog
                           global-catalog-of-dst-metafile
                           contact-string-of-hits&digi-virgin_meta
     # define input collection
     InputCollections = oid-extracted-from-.runid-file
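
  For concreteness, the same FClistLFN query with a hypothetical jobid substituted and the local catalog given as the file:local_jobid.xml catalog from the earlier slides; the query structure is taken verbatim from the slide, only the values are assumptions.

     # hypothetical substitution (jobid = 12345)
     FClistLFN -u file:local_12345.xml -q "jobid='12345' AND dataset AND owner"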

  10. DST analysis: 2) remote “full” dataset
     # define input catalogs (read only)
     InputFileCatalogURL = localCatalog
                           global-catalog-of-dst-metafile
                           contact-string-of-hits&digi-virgin_meta
     # define input collection
     InputCollections = /System/Owner/DataSet
     • One will run on all runs attached to the global-dst-metafile for which EVD files are present in the localCatalog.
     • FirstEvent / MaxEvents can be used to ensure that the event interval is identical across subsequent processings (more clever selections require FixColl to be run); see the sketch below.
     • Jobs will crash if AttachRun is run at the same time against the global meta (a backup should be used).
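
  A minimal sketch of pinning the event interval with the two parameters named above; only the parameter names come from the slide, the values are arbitrary placeholders.

     # hypothetical values; only the parameter names appear in the slides
     FirstEvent = 1
     MaxEvents  = 1000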

  11. DST analysis: what is not needed
     • Attached MetaData for all previous owners
       • Even if access to mc-truth or digis is required
     • Event files from previous owners
       • Unless the analysis wishes to access mc-truth or digis
