MCRunJob in SAMGrid i.t.
Outline
• Architectural relationship
• JIM’s requirements/expectations
• SAMGrid status w.r.t. D0 MC in relation to RunJob/Shahkar
Igor Terekhov, FNAL
SAMGrid/MCRunJob Interaction
[Diagram. Labels, in order: JIM User Interface (client site); Sanity Checks; SAM DH Services; Req details; JIM Local Job Submission; Generate Local Macro (head node at exec site); MCRunJob; (Retrieve)/Store Files; Execute Local Macro (worker node at exec site); JIM Local Job Execution (sandboxing)]
Interactions, cont-d
• Sanity Check – see if the request can be executed at any site. Add requirements on sites based on D0 software (to go away)
• Macro Generation – retrieval of request details (a must) and incorporation of local settings (being revised)
• Execution – the JIM sandbox package initializes the environment and calls MCRJ
• DH – store files from worker nodes (now) or prepare for merging (later)
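The sanity check step can be pictured as a simple matching problem. The following is a minimal sketch, not JIM's actual code: the `Site` and `Request` classes, the `d0_release` field, and the site and release names are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    # Hypothetical: D0 software releases installed at this site.
    releases: set = field(default_factory=set)


@dataclass
class Request:
    request_id: int
    # Hypothetical requirement derived from the request details.
    d0_release: str


def sites_able_to_run(request, sites):
    """Return the names of sites that satisfy the software requirement."""
    return [s.name for s in sites if request.d0_release in s.releases]


def sanity_check(request, sites):
    """A request passes if at least one site could execute it."""
    return len(sites_able_to_run(request, sites)) > 0


# Made-up sites and releases, for illustration only.
sites = [
    Site("site_a", {"rel-1", "rel-2"}),
    Site("site_b", {"rel-2"}),
]
```

A request that no site can satisfy is rejected at the client, before any job is submitted.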
Abstracted Interactions
• Need to validate the request
• Need to prepare job execution
• Need to actually execute it (using external job submission)
• Need to use a Grid data handling system like SAM for file access
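The four needs above map naturally onto four separate entry points. Below is a rough sketch under stated assumptions: `JobContext`, the `events` field, the macro text, and the `fake_submit` stand-in are hypothetical and do not reflect MCRunJob's or JIM's real interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class JobContext:
    request: dict
    macro: str = ""
    outputs: list = field(default_factory=list)
    log: list = field(default_factory=list)


def validate(ctx):
    # Hypothetical rule: a request must ask for a positive number of events.
    if ctx.request.get("events", 0) <= 0:
        raise ValueError("request must ask for at least one event")
    ctx.log.append("validated")


def prepare(ctx):
    # Stand-in for macro generation from the request details.
    ctx.macro = f"generate {ctx.request['events']} events"
    ctx.log.append("prepared")


def execute(ctx, submit):
    # Execution is delegated to an externally supplied submission callable.
    ctx.outputs = submit(ctx.macro)
    ctx.log.append("executed")


def store(ctx, data_handler):
    # Files are handed to a data-handling service, not written to local paths.
    for name in ctx.outputs:
        data_handler(name)
    ctx.log.append("stored")


def fake_submit(macro):
    # Stand-in for a real batch system; returns made-up output file names.
    return ["out_0.dat"]


stored = []
ctx = JobContext(request={"events": 100})
validate(ctx)
prepare(ctx)
execute(ctx, fake_submit)
store(ctx, stored.append)
```

Because submission and data handling are passed in as callables, each stage can be invoked on its own, possibly on different nodes.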
Requirement Derivation
• Must be able to provide several services rather than mix all in one command
• Must use (externally configured) Fabric Adapter services such as:
  • SAM batch adapters
  • JIM sandboxing
• Must NOT assume that “qsub” is in the path, that file systems are shared, or that “rcp” works
• Must have a thin configuration layer
  • Goal is (statistical) reproducibility of Grid job results
  • On-site editing of core files is out of the question
  • More than that, should not have 20 obscure parameters that will ensure that your results will differ
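One way to read the "externally configured" and "do not assume qsub" requirements is that the submission command comes from site configuration rather than from the job tool itself. The sketch below assumes a hypothetical `BatchAdapter` class, a `submit_command` config key, and a made-up `hypothetical_site_submit` command; none of these are the real SAM batch adapter interface.

```python
import shlex
import shutil


class BatchAdapter:
    """Builds a submission command from externally supplied configuration."""

    def __init__(self, config):
        # The command template is site configuration, never hard-coded.
        self.template = config["submit_command"]

    def command_for(self, script):
        # Quote the script name, then split the filled template into argv.
        return shlex.split(self.template.format(script=shlex.quote(script)))

    def is_available(self):
        # Check that the configured executable exists; do not assume it.
        executable = shlex.split(self.template.format(script="x"))[0]
        return shutil.which(executable) is not None


# Hypothetical site configuration; the command name is invented.
site_config = {"submit_command": "hypothetical_site_submit --queue mc {script}"}
adapter = BatchAdapter(site_config)
argv = adapter.command_for("job.sh")
```

The configuration layer stays thin: one template per site, and nothing else for the job tool to know about the local batch system.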
Xmas Wish List (Req-s Continued)
• Should NOT assume control of everything it gets a hold of
• Should not have concepts like “HOME directory” (WM – a grid site is a time-shared condo, not your house)
• Should be able to statistically validate results from multiple sites
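As an illustration of what statistical validation across sites could look like, the sketch below compares the mean of some observable at each site with a reference sample. The choice of a simple z-score on the mean, the threshold of 3.0, and all sample values are assumptions made for this example, not something the slides specify.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean of a sample."""
    return statistics.stdev(values) / math.sqrt(len(values))


def inconsistent_sites(reference, results, threshold=3.0):
    """Flag sites whose mean differs from the reference sample's mean.

    `results` maps a site name to its list of measured values.
    The threshold is in units of the combined standard error.
    """
    flagged = []
    ref_mean = statistics.mean(reference)
    ref_se = standard_error(reference)
    for site, values in results.items():
        diff = statistics.mean(values) - ref_mean
        sigma = math.hypot(standard_error(values), ref_se)
        if sigma > 0 and abs(diff) / sigma > threshold:
            flagged.append(site)
    return flagged


# Made-up numbers for illustration only.
reference = [10.1, 9.9, 10.0, 10.2, 9.8]
results = {
    "site_b": [10.0, 10.1, 9.9, 10.0, 10.1],
    "site_c": [12.0, 12.1, 11.9, 12.2, 11.8],
}
flagged = inconsistent_sites(reference, results)
```

A real validation would likely compare full distributions rather than a single mean, but the structure (reference sample, per-site comparison, explicit threshold) would be similar.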
JIM status and MCRunJob
• Initial phase of SAMGrid integration complete
  • Separated job preparation from execution
  • Inserted fabric management tools
  • Reimplemented SAM file storing in the new Executor framework
  • Miscellaneous bug fixes
  • Much of the work was done by JIMmers because the core folks were doing re-processing
• We are running SAMGrid MC jobs at Wisconsin, Manchester and Lyon
• We need to freeze MCRunJob to complete MC commissioning with SAMGrid:
  • Understand brokering issues
  • Understand monitoring issues
  • Decompose jobs at Grid level
  • Use SAMGrid for unified management of data and job files
T<10min?
Status, Continued
• Expect a few months (depends on D0 participation; warning: Rod is leaving).
• Afterwards (or in parallel, if resources come from outside the SAMGrid team), migrate to the next-generation mc_runjob.