Learn about the next logical step after job splitting: job merging. Discover how to concatenate the results of subjobs and the benefits of merging. Follow the status of merging and automate the process.
AliEn Job Merging
Pablo Saiz
CAF and Grid User Forum
Job Merging
• Next logical step after job splitting
  • See http://indico.cern.ch/conferenceDisplay.py?confId=31167
• Concatenate the results of all subjobs of a given masterjob
• New status if a masterjob needs merging: INSERTED, SPLITTING, SPLIT, MERGING, DONE
• The ‘merging’ is another job
  • It will wait in the queue like any other job
pablo.saiz@cern.ch
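The status sequence above can be sketched as a small lookup. This is only an illustration: the status names come from the slide, but the function `next_status`, its `needs_merge` flag, and the assumption that a masterjob without merging goes straight from SPLIT to DONE are hypothetical and not taken from AliEn itself.

```python
# Status names are from the slide; the transition logic is an assumption.
MASTER_STATUSES = ["INSERTED", "SPLITTING", "SPLIT", "MERGING", "DONE"]


def next_status(current: str, needs_merge: bool) -> str:
    """Return the status that follows `current` for a masterjob."""
    if current == "DONE":
        return "DONE"  # final status, nothing follows
    following = MASTER_STATUSES[MASTER_STATUSES.index(current) + 1]
    # Assumption: a masterjob that does not request merging skips MERGING.
    if following == "MERGING" and not needs_merge:
        return "DONE"
    return following
```

With this sketch, `next_status("SPLIT", True)` gives `"MERGING"`, while `next_status("SPLIT", False)` gives `"DONE"`.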
1 image is better than 1000 words
[Diagram: the user JDL is split into Subjob 1 … Subjob n, each producing histo.root and analysis.log. "Merge Histo" combines the histo.root files into AllHisto.root; "Merge Logs" targets Alllogs.txt and is marked "ERROR!!". Time axis: INSERTED, SPLIT, MERGING, DONE.]
How to specify Merging
• In the JDL of the masterJob:
  • Merge={"<input>:<jdl>:<output>" (,"<input2>:<jdl2>:<output2>")* }
  • MergeOutputDir="/path/where/you/want/the/output";
    • Default: /proc/<user>/<masterid>/merge
• AliEn will do:
  • submit <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>
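To make the mapping from a Merge entry to the resulting submit call concrete, here is a minimal Python sketch. It is not AliEn code: the function `build_merge_commands` is hypothetical, and the example user name, masterjob id and `procdir` value are made up for illustration. Only the `<input>:<jdl>:<output>` format, the argument order of `submit`, and the default output directory are taken from the slide.

```python
from typing import List, Optional


def build_merge_commands(merge_entries: List[str], master_id: int, user: str,
                         procdir: str, output_dir: Optional[str] = None) -> List[str]:
    """Turn Merge entries into the submit lines described on the slide."""
    if output_dir is None:
        # Default location stated on the slide.
        output_dir = f"/proc/{user}/{master_id}/merge"
    commands = []
    for entry in merge_entries:
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected <input>:<jdl>:<output>, got {entry!r}")
        input_name, jdl, output_name = parts
        # Argument order: <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>
        commands.append(
            f"submit {jdl} {master_id} {input_name} {output_name} "
            f"{user} {procdir} {output_dir}"
        )
    return commands


# Hypothetical values: user "someuser", masterjob 1234, procdir "/proc/someuser/1234".
example = build_merge_commands(
    ["histo.root:/alice/jdl/mergerootfile.jdl:AllHisto.root"],
    master_id=1234, user="someuser", procdir="/proc/someuser/1234",
)
```

Each Merge entry yields one merging job, so a JDL with two entries would produce two submit lines.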
How to start the merging
• Automatically:
  • When all the subjobs are in a final state, AliEn submits the merging job
• masterJob <id> merge
  • Forces the merging of the subjobs that have finished
• By hand:
  • submit <jdl> <masterJobId> <input> <output> <user> <procdir> <outputdir>
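The automatic trigger condition can be sketched as follows. The set of final states (ERROR, DONE) is the one named in this talk; the function names, the dictionary of subjob statuses, and the reading of "finished" as DONE for the forced merge are assumptions made for illustration, not AliEn's actual implementation.

```python
from typing import Dict, List

# Final states as named in this talk; a real system may distinguish more.
FINAL_STATES = {"DONE", "ERROR"}


def all_subjobs_final(statuses: Dict[int, str]) -> bool:
    """True when there is at least one subjob and every subjob is in a final state."""
    return bool(statuses) and all(s in FINAL_STATES for s in statuses.values())


def subjobs_to_force_merge(statuses: Dict[int, str]) -> List[int]:
    """Subjob ids a forced merge would pick up (assumption: only DONE subjobs)."""
    return sorted(job_id for job_id, s in statuses.items() if s == "DONE")


# Hypothetical subjob ids and statuses.
statuses = {101: "DONE", 102: "RUNNING", 103: "ERROR"}
ready = all_subjobs_final(statuses)          # False: subjob 102 is still running
forced = subjobs_to_force_merge(statuses)    # [101]
```

In this example the automatic merge would not start yet, while a forced merge would only consider subjob 101.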
Existing merging JDLs
• /alice/jdl/mergerootfile.jdl
• /alice/jdl/mergerootfile-sequential.jdl
• No requirements. The merging can be executed anywhere!
• User defined
  • Variations of the previous JDLs
• There is no merging for text files:
  • Needed?
ToDo
• Given a masterjobid, follow up the status of the merging
• Automatically put requirements on the execution site for the merging
• More documentation in the bible
• ?
Conclusions
• Merging collects the output of subjobs into a single file
• Performed when all the subjobs are in a final state: ERROR or DONE
• Can also be triggered manually
• Documentation will be added to the bible