CMS Distributed Production
Michael Thomas
Sep. 20, 2006
With thanks to Ajit Mohapatra from UW
CMS Production System
Old Production Model
• Large-scale MC production in CMS has been going on for the past several years.
• It was based on the agreement/willingness of individual CMS institutes to participate in the MC production.
• CMS software was installed and maintained at the involved sites by their own people.
• MC production, storage and transfer were managed by the sites.
• Sites were responsible for arranging/negotiating the needed resources themselves.
• Each production site performed its task based on directions from the production management group at CERN.
New Grid-based Distributed Production
• Based on the new CMSSW software framework
• Centralized software installation at both LCG and OSG sites
• (Centralized) Production Manager maintains all work to be done (1 OSG operator, 3 for LCG)
• Remote Production Agents pull work from the Production Manager, run and track jobs
• Jobs are monitored and resubmitted in case of failure
• Makes extensive use of Grid resources
• In use since July 2006
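The pull model on this slide (agents fetch work from a central manager, run it, and resubmit on failure) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the `max_attempts` limit, and the `run_job` callback are hypothetical and are not taken from the actual ProdAgent code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProductionManager:
    """Hypothetical central queue holding all work still to be done."""
    pending: deque = field(default_factory=deque)
    done: list = field(default_factory=list)

    def pull(self):
        # Agents ask for work; the manager never pushes it to them.
        return self.pending.popleft() if self.pending else None

    def report_success(self, work_id):
        self.done.append(work_id)


@dataclass
class ProductionAgent:
    """Hypothetical remote agent: pulls work, runs it, resubmits on failure."""
    manager: ProductionManager
    max_attempts: int = 3  # assumed retry limit, not from the slides
    failed: list = field(default_factory=list)

    def run_all(self, run_job):
        # run_job(work_id, attempt) -> bool stands in for a real grid submission.
        while (work_id := self.manager.pull()) is not None:
            for attempt in range(1, self.max_attempts + 1):
                if run_job(work_id, attempt):
                    self.manager.report_success(work_id)
                    break
            else:
                # Every attempt failed; record it instead of retrying forever.
                self.failed.append(work_id)
```

With a simulated runner in which one unit of work fails once and another always fails, the first ends up in `manager.done` after a resubmission and the second ends up in `agent.failed`.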
CMS Computing Infrastructure
ProdAgent
Production System
[Diagram: Production Manager, Production Agents, Sites, Jobs]
Production System
[Diagram: Production/Workflow Manager and Production Agent alongside the hierarchy Workflow Spec * Job Spec * Job * Application]
Production System
Workflow Spec
• Request/Assignment level template
• Has a unique ID
• Corresponds to a CTDR task
• Used by the (central) Production Manager process
Production System II
Job Spec
• Processing job definition
• Has a unique ID within a workflow/request
• Used to create physical jobs
• Dealt with by the Production Agent
Production System III
Job
• Physical job created from a Job Spec
• The combination of JobSpecID and batch/grid job ID is unique
• Corresponds to a CTDR job
• Multiple jobs created from the same JobSpec are the same job in terms of physics (retries, etc.)
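The identity rule for physical jobs can be made concrete with a short sketch: each submission gets its own batch/grid ID, so the pair (JobSpecID, grid ID) is unique, while retries that share a JobSpecID count as the same job in physics terms. The `Job` class, the `submit` helper, and the `grid-N` ID format are hypothetical stand-ins, not the real system's API.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Job:
    job_spec_id: str  # shared by every retry of the same Job Spec
    grid_job_id: str  # assigned per submission by the batch/grid system

    @property
    def key(self):
        # The combination is what identifies one physical job.
        return (self.job_spec_id, self.grid_job_id)


_grid_ids = count(1)  # stand-in for IDs handed out by a batch/grid system


def submit(job_spec_id):
    """Create a new physical job (first try or retry) from a Job Spec ID."""
    return Job(job_spec_id, f"grid-{next(_grid_ids)}")


def same_physics(a, b):
    """Jobs created from the same Job Spec are the same job in physics terms."""
    return a.job_spec_id == b.job_spec_id
```

Submitting the same Job Spec ID twice yields two jobs with different keys for which `same_physics` returns `True`.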
Production System IV
Application
• Application-like nodes within a job, managed by SHREEK
• CMSSW, StageOut, Rescue types of tasks
• Multiple CMSSW apps may run in a single job
• Each app has a unique node name within the workflow
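One way to picture the application nodes is as named entries in a workflow, where the name must be unique and the type is one of the three task types listed. The sketch below is an illustration only: `WorkflowSpec`, `add_node`, and the node names used with it are hypothetical, and it does not reproduce how SHREEK actually manages nodes.

```python
from dataclasses import dataclass, field

# Task types named on the slide.
NODE_TYPES = {"CMSSW", "StageOut", "Rescue"}


@dataclass
class WorkflowSpec:
    workflow_id: str
    # Maps node name -> node type; dicts keep insertion order.
    nodes: dict = field(default_factory=dict)

    def add_node(self, name, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        if name in self.nodes:
            # Enforces "unique node name within the workflow".
            raise ValueError(f"duplicate node name: {name}")
        self.nodes[name] = node_type
```

A workflow can hold several CMSSW nodes followed by a StageOut node, as long as each carries a distinct name; adding a second node with an existing name raises `ValueError`.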
Monitoring
• Prod. Manager receives notifications from Prod. Agent
  • Allocated
  • Final states: success, failure
• Prod. Agent monitors the high-level status of jobs via Message Service and reports to CMS Dashboard
  • Created
  • Running
  • Finished
  • Fail (requeue)
• Job Tracker monitors the job status
  • Queued
  • % completed
  • Output text
• Job publishes data
  • BOSS wrapper captures stdout
  • Job may use ApMon directly to publish to MonALISA
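The high-level job states tracked by the Prod. Agent (Created, Running, Finished, Fail with requeue) suggest a small state machine. The sketch below is a guess at how such tracking could look: the allowed transitions, and the choice that a requeue returns a failed job to Created, are assumptions, and `JobStatusLog` is a hypothetical name.

```python
from enum import Enum


class Status(Enum):
    CREATED = "Created"
    RUNNING = "Running"
    FINISHED = "Finished"
    FAIL = "Fail"


# Assumed transitions; the slide lists the states but not the arrows.
ALLOWED = {
    Status.CREATED: {Status.RUNNING},
    Status.RUNNING: {Status.FINISHED, Status.FAIL},
    Status.FAIL: {Status.CREATED},  # requeue
    Status.FINISHED: set(),         # final state
}


class JobStatusLog:
    """Tracks one job's high-level status and counts requeues."""

    def __init__(self):
        self.status = Status.CREATED
        self.history = [Status.CREATED]
        self.requeues = 0

    def advance(self, new_status):
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.value} -> {new_status.value}"
            )
        if self.status is Status.FAIL:
            # Leaving FAIL is only possible via a requeue.
            self.requeues += 1
        self.status = new_status
        self.history.append(new_status)
```

A job that fails once and then succeeds passes through Running, Fail, Created, Running, Finished and ends with `requeues == 1`; any further transition out of Finished is rejected.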
Data Storage on OSG
Data Transfer and Management
Production Summary
Looking Forward
• Integration of Prod Manager
• Scale to multiple Prod Agents
• Extend production to non-CMS OSG sites
• Ramp up for CSA06