Remote Operation of a Monte Carlo Production Farm Using Globus
Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio State University)
CHEP2003
The Problem:
• High-luminosity experiments need large MC samples (Belle, BaBar require hundreds of millions of MC events)
• Massive computing power needed (farms of Linux machines)
• Farms are typically geographically distributed
  CLEO: two sites
  DELPHI: five sites
  BaBar: two dozen sites (US and Europe)
  Belle: eight sites
Hardware alone is not sufficient:
• Hardware, system-level software maintenance
• Experiment-specific MC software setup
• MC production
  • Job submission
  • Job monitoring (rerun failed jobs)
  • Data transfer
• Coordination
Is there another way?
• Reduced manpower requirements
• More efficient coordination
• Our approach
  • Select one of the steps in the MC production chain
    • MC production
  • Centralize operations
    • Remote submission and monitoring
  • Evaluate GRID tools. Can they help with MC production?
    • Globus toolkit
OSU MC Production Farm
• 27 dual-Athlon nodes (1U)
• 1 dual-Athlon server (4U)
• 840 GB disk in RAID
• OpenPBS batch system
• File/batch queue server
• 600-700k MC events/day
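To put the quoted rate in context, the arithmetic below is a rough sketch and not a figure from the talk. It assumes that all 54 CPUs of the 27 worker nodes run production continuously and that the server does not, and it uses 100 million events as a stand-in for "hundreds of millions". The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# 27 dual-CPU worker nodes from the slide; excluding the server is an assumption.
WORKER_CPUS = 27 * 2


def cpu_seconds_per_event(events_per_day: float, n_cpus: int) -> float:
    """Average CPU time per event if every CPU is busy all day."""
    return n_cpus * SECONDS_PER_DAY / events_per_day


def days_to_produce(n_events: float, events_per_day: float) -> float:
    """Wall-clock days needed for a sample at a constant daily rate."""
    return n_events / events_per_day


for rate in (600_000, 700_000):
    per_event = cpu_seconds_per_event(rate, WORKER_CPUS)
    days = days_to_produce(100e6, rate)
    print(f"{rate} events/day: {per_event:.1f} CPU-s/event, {days:.0f} days per 100M events")
```

Under these assumptions the farm spends roughly 7 to 8 CPU-seconds per event, and a single farm of this size would need about 140 to 170 days for 100 million events, which is consistent with production being spread over several sites.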
Globus Toolkit
• Globus
  • Secure access
    • Certificates for client and server
  • Remote command execution system
    • We observed significant overhead
    • A few seconds for a single command
  • Integrated tools
    • e.g. GridFTP
• Installation at Ohio State
  • Globus 2.2.4 on dedicated server
  • Separate batch queue system for testing
  • No Resource Broker
    • Farm configuration details hidden
    • Loss of dynamic configurability but much simpler
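The per-command overhead mentioned above could be measured with something like the following sketch. It assumes the `globus-job-run` client from Globus Toolkit 2.x is installed and that a valid proxy certificate already exists. The contact string `farm.example.edu` is a placeholder, not the Ohio State server, and the helper names are hypothetical.

```python
import subprocess
import time
from typing import Callable, List


def globus_command(contact: str, executable: str, *args: str) -> List[str]:
    """Build the argv for running one command on a remote gatekeeper."""
    return ["globus-job-run", contact, executable, *args]


def time_command(argv: List[str], run: Callable = subprocess.run) -> float:
    """Return the wall-clock seconds one command takes, overhead included."""
    start = time.perf_counter()
    run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder host; a trivial remote command isolates the Globus overhead.
    argv = globus_command("farm.example.edu", "/bin/date")
    print(" ".join(argv))
    # Uncomment on a machine with Globus installed and a valid proxy:
    # print(f"{time_command(argv):.1f} s")
```

Timing a trivial command such as `/bin/date` gives a lower bound on what every remote call costs, which matters when deciding how many Globus commands a production run should issue.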
MC production I: Job submission
• Typical input information:
  • (MC software release), run range, #events …
• To do:
  • Build MC jobs and submit them
• Choose one option:
  • One Globus command starts whole run range production
    • Many (thousands) of local jobs
    • Still need local script
  • One Globus command starts a single MC production job
    • Too slow
• Submit all production runs at once
• Only submit enough runs to fill queue
  • Re-submitted jobs proceed faster
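The "only submit enough runs to fill queue" option can be sketched as a small top-up step that a local script calls periodically. This is a minimal illustration and not the script used at Ohio State. The queue target of 56 jobs and the run numbers are made-up values, and the actual submission (for example one `qsub` call per run) is left out.

```python
from typing import List, Tuple


def next_batch(
    pending_runs: List[int], jobs_in_queue: int, queue_target: int
) -> Tuple[List[int], List[int]]:
    """Split pending runs into (submit now, keep for later).

    Only as many runs are released as there are free slots below the
    target queue depth, so resubmitted jobs never wait behind the
    whole remaining run range.
    """
    free_slots = max(queue_target - jobs_in_queue, 0)
    return pending_runs[:free_slots], pending_runs[free_slots:]


# Hypothetical run range and queue state.
pending = list(range(1000, 1010))
batch, pending = next_batch(pending, jobs_in_queue=52, queue_target=56)
print("submit now:", batch)
print("still pending:", pending)
```

With a short queue, a crashed job that is resubmitted waits behind at most a queue's worth of jobs instead of behind thousands, which is the reason given on the slide for preferring this option.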
MC production II: Job monitoring
• Job status (“qstat”)
  • Use local script to monitor log files
  • Resubmit crashed jobs locally
  • Monitor through Globus (remotely)
    • Speed?
• Data quality monitoring
  • Check physics histograms
  • Not always done during production
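A local monitoring script of the kind described above might classify each job by scanning its log file and collect the runs that need resubmission. The sketch below is illustrative only. The marker strings are hypothetical, since the talk does not say what the MC executable writes to its logs, and the log contents are passed in as strings to keep the example self-contained.

```python
from typing import Dict, List

# Hypothetical markers; a real MC executable will print something different.
FAILURE_MARKERS = ("Segmentation fault", "FATAL")
SUCCESS_MARKER = "production finished"


def classify_log(text: str) -> str:
    """Return 'crashed', 'done' or 'running' for one job log."""
    # Failure is checked first so that a log containing both kinds of
    # marker is treated as a crash.
    if any(marker in text for marker in FAILURE_MARKERS):
        return "crashed"
    if SUCCESS_MARKER in text:
        return "done"
    return "running"


def runs_to_resubmit(logs: Dict[int, str]) -> List[int]:
    """Run numbers whose logs show a crash, in ascending order."""
    return sorted(run for run, text in logs.items() if classify_log(text) == "crashed")


# Made-up log contents keyed by run number.
logs = {
    1000: "event 10\nproduction finished\n",
    1001: "event 3\nSegmentation fault\n",
    1002: "event 7\n",
}
print(runs_to_resubmit(logs))
```

Because the classification runs on the farm, a remote operator only needs one Globus call to fetch the summary instead of one call per job.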
MC production III: Data transfer
• Easy if MC output is in file format
  • GridFTP …
• Can be complicated otherwise
  • Example would be writing MC into a database
• Remote or local file management?
  • Limited disk space -> delete generated MC
  • Log files
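For file-based output, the transfer and cleanup steps could look like the sketch below. It assumes the `globus-url-copy` client with its basic source-URL, destination-URL form and a `gsiftp://` destination. The host name, paths and helper names are placeholders. The local file is deleted only after the copy reports success, which is one possible answer to the limited-disk-space point above and not necessarily what the authors did.

```python
import os
import subprocess
from typing import Callable, List


def transfer_command(local_path: str, host: str, remote_path: str) -> List[str]:
    """Build a globus-url-copy argv for one local file to a GridFTP server."""
    source = "file://" + os.path.abspath(local_path)
    destination = f"gsiftp://{host}{remote_path}"
    return ["globus-url-copy", source, destination]


def transfer_then_delete(
    local_path: str, host: str, remote_path: str, run: Callable = subprocess.run
) -> bool:
    """Copy a file and free local disk space only if the copy succeeded."""
    result = run(transfer_command(local_path, host, remote_path))
    if result.returncode != 0:
        return False  # keep the file so the transfer can be retried
    os.remove(local_path)
    return True


if __name__ == "__main__":
    # Placeholder names; nothing is executed here.
    print(" ".join(transfer_command("/data/mc/run1000.dat", "storage.example.edu", "/store/run1000.dat")))
```

Log files would need a separate policy, since they are small but are needed for monitoring after the MC output itself has been removed.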
Conclusion
• MC production for a high-luminosity experiment requires significant hardware and manpower resources.
• GRID tools can help to centralize this effort.
• Simple tests show that remote operation of MC farms is possible
  • Relatively easy to set up
  • Globus framework (secure access, remote command execution)
  • Local scripts for job submission, monitoring
• Still, significant software infrastructure (“local scripts”) is required.
• Other parts of the MC production chain need to be addressed before this becomes a realistic option.
  • Remote MC software installation and version management