1 / 11

Pavel Nevski

DDM Operation and Production. Pavel Nevski. DDM Operation Workshop CERN, January 26, 2007. Outlook. DDM Operation in Job Definition DDM Operation in Data Replication. DDM and Job definition.

karellano
Download Presentation

Pavel Nevski

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DDM Operation and Production Pavel Nevski DDM Operation Workshop CERN, January 26, 2007

  2. Outlook • DDM Operation in Job Definition • DDM Operation in Data Replication

  3. DDM and Job definition • Atlas Jobs are defined dynamically – i.e. only after verification that input exists on the proper GRID • This mode requires some data validation before jobs can be submitted • Currently only event generation (evgen) input and output from a different GRID are accepted • Other types of job are running on their original GRIDs

  4. Modified Flow Control • Tasks are going through a chain of states: • After is requested it goes to Pending state to allow possible user’s correction, input verification and output Dataset creation • After a delay (~3 hours) output datasets are created and task goes to submitted or submitting state • When Submitted and jobs start to succeed it goes to Running • When all jobs are terminated (DONE or ABORTED) task is Done,Finished or Failed

  5. Components and their relations Production Manager Web interface (AKTR) Request Table USER metadata Task table DDM Dataset JobDef entry Outputs Production Operation JobExec

  6. Input Data Control integrated with DDM • If input data were produced on the same grid flavor, jobs are released in TOBEDONE status • still, simulations jobs are in WAITING status to allow evgen file replication • If input data were produced on a different grid flavor, jobs are defined in WAITINGCOPY status • If user input (events) is required, jobs are defined as WAITINGINPUT until input is fully available • Input data needed are first collected at CERN or BNL • Inputs are moved using DDM

  7. Event Input • Input events are registered by users on a site they are working (typically CERN, BNL; sometime Lyon, RAL) • Job Definition cron detects evgen jobs which require input events and put them in WAITINGINPUT state • The same cron updates a list of requested inputs with the proper destination grid GRID • Another cron is trying to locate inputs and copy them to the proper GRID

  8. Input replication • Till december 2006 copying was done using subscription mechanism • A lot of problems with delayed execution and especially error analysis. • Repeating subscription almost never helps • Few cross-GRID submissions succeeded • Since December copying is done using dq2_cr • Smooth operation, most of errors are corrected by dq2_cr internally • Remaining problems fixed with repeating copy • A few dozen of tasks succeed during last month

  9. Reconstruction inputs • A validation procedure is developed to verify that outputs are registered in a T1 catalog • For the moment it is too slow to use for all job submission • However it allows to verify at least a posteriorithat a task is closed properly • More work is needed to optimize the dataset validation

  10. Dataset replication using local subscription agent • Presence of a dataset from a “list of interest” is detected on a T1 • Verification of T2 replica (incomplete) • If no replica exists on T2 , subscribe it • Process is repeated until data reach T2 • Needs verification on T2 (?)

  11. Conclusion • Job definition for ATLAS distributed production is integrated with DDM • Common set of scripts is developed to support Production and central Data Distribution

More Related