A proposal: from CDR to CDH
Paolo Valente – INFN Roma [Acknowledgements to A. Di Girolamo]
NA62 collaboration meeting
Requirements/1
1. Execute each operation [transfer, reconstruction, …]
2. Log operations and errors
• Execute/launch the transfer/reconstruction operations
• Typically done with a set of scripts, partly running as daemons, partly controlled by an operator
• The [adapted] NA48 CDR and the [adapted] COMPASS CDR were used in 2007-2008 and during the technical run
• Logging and [in some cases] error recovery
Requirements/2
1. Execute each operation, controlling the sequence of all steps
2. Record every operation, keeping a catalog of all files and the relative operations on them
• Not only execute operations, but also know their status, recognize success/failure, handle anomalies, interface with the operator, …
• Know and control the sequence of operations
• Handle/notify the status of the “sequence”
3. Monitor/display the status of the entire process, following each element during its lifetime
Central Data Recording → Central Data Handler
States and Transitions
• The atomic unit is the burst (e.g. burst 99923-0000, RAW file cdr00099923-0000)
• A burst is connected to a sequence of operations to be performed:
   • First of all, generation of the RAW file
   • From the RAW file, a number of tasks involving generation of other files or file transfers
• Each operation is a transition from one state to another:
   • RAW_on_farm_disk → RAW_file_on_disk_pool → RAW_on_tape
   • RAW_generated → RECO-1_generated → THIN-1_generated → …
• An “operation” is performed for each transition: for RAW → RECO-1, the reconstruction pass-1 has to be executed; for file transfers, the appropriate copy or remote copy command (see the sketch below)
• A new entry has to be created in the file catalog for each transition
• Essentially, 2 kinds of transition:
   • File generation
   • File transfer
[Diagram: transitions and operations — RAW → RECO (Reconstruction), RAW → RAW (Filter), RECO → THIN (Thinning), RECO → RECO (Split), THIN → NTUP (MakeNtup)]
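As a minimal sketch (names and structure are assumptions for illustration, not the actual NA62 code), the allowed transitions and the operation attached to each could be expressed as a lookup table:

```python
# Hypothetical sketch: allowed state transitions of a burst and the
# operation attached to each. All names are illustrative.
TRANSITIONS = {
    # (from_state, to_state): operation to launch
    ("RAW_on_farm_disk", "RAW_on_disk_pool"): "transfer_to_disk_pool",
    ("RAW_on_disk_pool", "RAW_on_tape"):      "transfer_to_tape",
    ("RAW_generated",    "RECO-1_generated"): "reconstruction_pass_1",
    ("RECO-1_generated", "THIN-1_generated"): "thinning",
}

def operation_for(from_state: str, to_state: str) -> str:
    """Return the operation that realizes a given transition."""
    try:
        return TRANSITIONS[(from_state, to_state)]
    except KeyError:
        raise ValueError(f"illegal transition {from_state} -> {to_state}")
```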
The idea
• We must have the catalog of all the files [+ their meta-data, e.g. data quality information, basic information from TDAQ, etc.]
• Link all the files relative to a given burst: the logical unit is the burst
• Define the sequence of states through which each burst has to pass
• Each state transition defines an operation to be performed on the files
• Define a “task” as the operations to be applied to a given set of entries in the file catalog [thus causing a state transition for the relative bursts]
• We build a “Handler” process to control operations: given a task, the Handler will (see the sketch below):
   • Create the list of files on which to execute the command(s)
   • Trigger the execution of the appropriate command(s) on them [typically launching a script]
   • The trigger for starting this can be either automatic or given by an operator
   • Check the execution and notify/handle anomalies or failures
[Diagram: the file catalog and the “Handler” driving the file-storage sequence: on_farm_disk → on_disk_pool → on_T0_tape → Distributed_to_T1]
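A minimal sketch of such a Handler loop, assuming a hypothetical catalog interface (query_bursts, add_replica and set_state are illustrative names, not an existing API):

```python
import subprocess

def handle_task(catalog, task):
    """Hypothetical Handler step: pick bursts in the task's source state,
    run the task's command on each file, and record the outcome."""
    for burst in catalog.query_bursts(state=task.from_state):
        cmd = task.build_command(burst)            # e.g. a transfer script
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode == 0:
            catalog.add_replica(burst, task.to_state)    # new catalog entry
            catalog.set_state(burst, task.to_state)
        else:
            catalog.set_state(burst, task.failed_state)  # notify/handle failure
```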
Burst 99923-0000
• The atomic unit is the burst. A burst is connected to a number of files:
   • There is only one RAW for each burst (cdr00099923-0000.dat)
   • Many RECO, THIN, NTUP, …, files can be generated starting from one burst
   • The files can have multiple copies on different filesystems and at different sites
   • Files of different kinds are generated (RECO, THIN, …)
• Use the burst id as primary key
• Generate the first entry as soon as the RAW file appears on the farm disk
• Then attach to it all the following steps in the lifetime of the burst:
   • cdr00099923-0000.dat
   • cdr00099923-0000.reco-1
   • cdr00099923-0000.reco-1.thin-1
   • cdr00099923-0000.reco-1.thin-2
   • …
   • cdr00099923-0000.reco-2
   • cdr00099923-0000.reco-2.thin-1
   • cdr00099923-0000.reco-2.thin-2
   • …
   • …
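For illustration, deriving the burst primary key from a file name could look like this (the naming convention is taken from the slide; the helper itself is hypothetical):

```python
import re

# cdr00099923-0000.reco-1.thin-2 -> burst key ("00099923", "0000")
BURST_RE = re.compile(r"^cdr(\d{8})-(\d{4})")

def burst_key(filename: str) -> tuple[str, str]:
    """Extract the burst id linking any derived file back to its burst."""
    m = BURST_RE.match(filename)
    if m is None:
        raise ValueError(f"not a burst file: {filename}")
    return m.group(1), m.group(2)
```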
Let’s make a toy example: file storage
For the first step it would be ideal to have the MERGER insert a new record into the catalog for each new burst, as soon as it creates a new RAW file (otherwise we’ll have to poll):
• /merger/../cdr00099923-0000.dat
• /merger/../cdr00099924-0000.dat
• /merger/../cdr00099925-0000.dat
• The Handler queries for bursts in the state on_farm_disk and creates the list of files to be copied
• The Handler creates the appropriate transfer command, e.g. xrdcp file root://eosna62.cern.ch//eos/na62/data/cdr
• The Handler issues the execution of the command on each of the files in the list and checks for success:
   • If success:
      • Create a new entry in the file catalog, corresponding to the new replica of the RAW file:
         • //eos/../cdr00099923-0000.dat
         • //eos/../cdr00099924-0000.dat
         • //eos/../cdr00099925-0000.dat
      • Change the status of the burst N to on_disk_pool
   • Otherwise: handle or just notify the failure
Probably intermediate states are needed in order to correctly handle the progress of the operation: on_farm_disk → on_disk_pool_pending → on_disk_pool_started → on_disk_pool [with on_disk_pool_failed and on_disk_pool_canceled for anomalies], as in the sketch below
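A sketch of that transfer step with the intermediate states, again with hypothetical catalog calls (the xrdcp destination is the one shown on the slide, with an assumed file layout):

```python
import subprocess

def copy_to_disk_pool(catalog, burst, src_path):
    """Hypothetical transfer step: farm disk -> EOS disk pool via xrdcp,
    walking the burst through the intermediate states."""
    dest = "root://eosna62.cern.ch//eos/na62/data/cdr/" + src_path.split("/")[-1]
    catalog.set_state(burst, "on_disk_pool_pending")
    catalog.set_state(burst, "on_disk_pool_started")
    result = subprocess.run(["xrdcp", src_path, dest], capture_output=True)
    if result.returncode == 0:
        catalog.add_replica(burst, dest)            # new entry for the copy
        catalog.set_state(burst, "on_disk_pool")
    else:
        catalog.set_state(burst, "on_disk_pool_failed")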
The database
• The file catalog and the states, plus all necessary information, will be in this database
• Basic tasks of the catalog:
   • Give a unique file-id and relate it to the local filename
   • Relate it to its metadata
• We also want to:
   • Keep the relations between all the files related to the same burst
   • Keep the state related to the reconstruction/transfer steps
• The Handler will trigger the transition, based on the current state of the file
A possible schema (see the sketch below):
• Table: Burst — Number*, MotherRAW [File], RunType, RunNumber, …
• Table: File — Name*, FileType [FileType], CustodialLevel, Version, CreationTimestamp, ModificationTimestamp, DeletionTimeStamp, Site [Site], Storage [Storage], CopyNumber, Mother [File], …
• Table: Site — Name*, SiteType [SiteType], Location, ContactPerson, isActive, … [e.g. NA62-FARM, CERN-PROD, RAL, INFN-CNAF, …]
• Table: Storage — Name*, StorageType [StorageType], isActive, hasReplica, … [e.g. SCRATCH-1, FARMDISK-1, EOSNA62, CASTORNA62, …]
• Table: SiteType — Name*, hasTape, hasDisk, … [e.g. FARM, TIER-0, TIER-1, TIER-2, …]
• Table: StorageType — Name*, isCustodial, … [e.g. TAPE, EOS, DISK, …]
• Table: FileType — Name*, isData, hasVersion, … [e.g. RAW, RECO, THIN, NTUP, …]
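A minimal sketch of the two central tables (column names follow the slide; the sqlite3 backend, the types and the State column are assumptions for illustration only):

```python
import sqlite3

conn = sqlite3.connect("catalog.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS Burst (
    Number     TEXT PRIMARY KEY,        -- e.g. '99923-0000'
    MotherRAW  TEXT REFERENCES File(Name),
    RunType    TEXT,
    RunNumber  INTEGER,
    State      TEXT                     -- assumed column for Handler state tracking
);
CREATE TABLE IF NOT EXISTS File (
    Name              TEXT PRIMARY KEY, -- e.g. 'cdr00099923-0000.dat'
    FileType          TEXT,             -- RAW, RECO, THIN, NTUP, ...
    CustodialLevel    TEXT,
    Version           INTEGER,
    CreationTimestamp TEXT,
    Site              TEXT,             -- e.g. CERN-PROD
    Storage           TEXT,             -- e.g. EOSNA62
    CopyNumber        INTEGER,
    Mother            TEXT REFERENCES File(Name)
);
""")
conn.commit()
```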
Example
[Diagram: lifetime of one burst — the Burst entry points to File entries for RAW (farm), RAW (disk pool), RAW (T0 tape), RAW (T1 disk), RAW (T1 tape); reconstruction & thinning add RECO-1 (T1 disk), RECO-1 (T1 tape), THIN-1 (T1 disk), THIN-1 (T2 disk); a first reprocessing adds RAW (T1 disk, copy 2), RECO-2 (T1 disk), THIN-2 (T1 disk)]
300k bursts/year × 3 years ≈ 1,000,000 bursts × O(100) entries = 100M entries
Which DB technology?
300k bursts/year × 3 years ≈ 1,000,000 bursts × O(100) entries = 100M entries
• Looks huge e.g. for MySQL, but ALiEn (the ALICE distributed environment, including catalog and job management) successfully uses MySQL
• A number of optimizations/tricks can be used (see the sketch below):
   • Partitioning
   • Indexes
   • Common queries/caching
   • …
• Of course there are alternatives. SQUID caching is necessary.
• By the way…
   • ALiEn is a very close example: it uses open source software and can be inspirational or even reused
   • The ALiEn project started to provide a file catalog to ALICE and then expanded
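For instance, continuing the SQLite sketch above (illustrative only; MySQL-style partitioning would be the analogous step there), the Handler's most frequent query is served by an index:

```python
# Illustrative optimization: index the columns hit by the Handler's
# hottest queries, so they become index lookups instead of full scans.
conn.execute("CREATE INDEX IF NOT EXISTS idx_burst_state ON Burst(State);")
conn.execute("CREATE INDEX IF NOT EXISTS idx_file_type   ON File(FileType);")

# The common "bursts in state X" query:
rows = conn.execute(
    "SELECT Number FROM Burst WHERE State = ?", ("on_farm_disk",)
).fetchall()
```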
Grid services
The other piece needed for a complete system…
[Diagram: the complete system — User Interface, Handler, Catalog, Job management, Grid services]
[Slide: WMS solutions of the LHC experiments — ALICE, LHCb, ATLAS]
Pull vs. Push job submission
• gLite: a set of grid middleware components responsible for the distribution and management of tasks across grid resources
• Push model:
   • Works as a super-batch system
   • Jobs are submitted to the WMS, which schedules them to a Grid CE (Computing Element)
   • Computing centres implement their internal batch queues to schedule jobs on the worker nodes
   • Experiments have implemented their own solutions to integrate the middleware and application layers
   • Frameworks were born to manage high-level workflows
   • Direct control on the translation from workflow into grid jobs
Independently, the LHC experiments are evolving towards “pilot job” systems:
• Pull model (see the sketch below):
   • Pilot jobs are asynchronously submitted jobs running on the worker nodes
   • Users submit jobs to a centralized queue
   • Pilot jobs communicate with the WMS (pilot aggregator), pulling user jobs from the repository
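As a toy sketch of the pull model (the queue endpoint and the payload format are invented for illustration; real pilot frameworks are far more elaborate):

```python
import json
import subprocess
import time
import urllib.request

QUEUE_URL = "https://example.org/taskqueue"  # hypothetical central queue

def pilot_loop():
    """Toy pilot job: runs on a worker node and pulls user jobs
    from a central queue until none are left."""
    while True:
        with urllib.request.urlopen(QUEUE_URL + "/next") as resp:
            job = json.load(resp)          # e.g. {"id": ..., "cmd": [...]}
        if not job:
            break                          # queue drained: pilot exits
        subprocess.run(job["cmd"])         # execute the pulled payload
        time.sleep(1)                      # avoid hammering the queue
```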