JIM Deployment for the CDF Experiment.
M. Burgon-Lyon (1), A. Baranowski (2), V. Bartsch (3), S. Belforte (4), G. Garzoglio (2), R. Herber (2), R. Illingworth (2), R. Kennedy (2), U. Kerzel (5), A. Kreymer (2), M. Leslie (3), L. Loebel-Carpenter (2), A. Lyon (2), W. Merrit (2), F. Ratnikov (6), R. St. Denis (1), A. Sill (7), S. Stonjek (2, 3), I. Terekhov (2), J. Trumbo (2), S. Veseli (2), S. White (2)

(1) University of Glasgow, (2) Fermi National Accelerator Laboratory, (3) University of Oxford, (4) Istituto Nazionale di Fisica Nucleare, (5) Universität Karlsruhe, (6) Rutgers University, (7) Texas Tech University

Introduction

The CDF (Collider Detector at Fermilab) Experiment is distributing its computing infrastructure to numerous sites worldwide. The initial target of 25% of computer processing offsite by June 2004 has been achieved. JIM deployment will help achieve the second milestone of 50%, an estimated 4.5 THz, by June 2005. The software used for this task comprises: a mature data handling system called SAM (Sequential Access to data via Metadata); DCAF (Decentralised CDF Analysis Farm) for local job queuing and execution; and JIM (Job and Information Management), used to collect and distribute jobs to SAM stations and DCAF farms.

[Diagram labels, Components of CDF Grid: DCAF Client; JIM Client; JIM Submission; JIM Execution & Monitoring (x6); SAM (x6); DCAF (x3)]

JIM Web Pages

The screenshots show: job submissions to the CDF Oxford JIM site over the past two weeks; the main JIM monitoring page; and a section from the JIM installation manual. The job monitoring pages enable users to download the output of their completed jobs using a web browser. For problem resolution these pages are used in conjunction with SAM TV, a web monitoring tool displaying information on each file, project and site used by the SAM data handling system.
[Diagram labels: SAM Database (FNAL); JIM Broker]

Components of CDF Grid

The diagram above shows how the elements of the CDF Grid fit together. Users currently submit jobs from their terminal to DCAF, which uses SAM to transfer files. Once JIM is fully deployed, users will be encouraged to submit their jobs through the JIM client software, though the old interface may still be used for JIM submissions. The JIM client passes the job to the area submission site for queuing. After communication with the broker, the job is sent to an execution site, which may have a DCAF. The job is then executed, using SAM to transfer files and either DCAF or the local batch system (e.g. PBS) to run it.

Simplification of the JIM installation and upgrade procedure

SAM station installations have been vastly simplified by the creation of a script. The once time-consuming and difficult process can now be completed within a couple of hours, largely unattended. Simplifying the installation procedure was a crucial step in allowing the quick roll-out of SAM, a critical element of the CDF Grid software set-up. A similar script is under development to provide the same ease of installation for JIM, coupled with the developers' efforts to reduce product tailoring requirements. A new product that installs and tailors many of the JIM components has been developed.

Monte Carlo Production

Earlier this year the JIM development team focused on Monte Carlo (MC) production for the D0 experiment. The D0 success rate for MC is now over 99%. A script that makes a tarball from the CDF software environment has been used to run CDF MC on D0 computing facilities at Wisconsin, first manually and then as a JIM submission. Thus the CDF software environment can be transferred around the grid, preventing problems with differing code versions on execution sites. This ensures that shared resources can be used fully for CDF jobs without application version issues.

Challenges and future work

The most challenging step has been the tailoring of local batch systems. Individual execution sites have been tailored successfully with expert help; however, this is not sufficiently easy to reproduce for widespread deployment. Investigations into Grid3 are underway, and the possibility of using a combination of JIM and Grid3 components is under consideration.
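
The tarball approach described under Monte Carlo Production can be illustrated with a minimal Python sketch. It is not the script used by CDF: the directory layout (`cdf_env`, `VERSION`, `bin/run_mc.sh`), the function names and the "execution site" directory are all hypothetical, chosen only to show how packing a software environment into one archive lets a remote site run with exactly the versions that were shipped.

```python
import tarfile
import tempfile
from pathlib import Path


def make_environment_tarball(env_dir: Path, out_path: Path) -> Path:
    """Pack everything under env_dir into a gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname stores paths relative to the environment's own name,
        # so the archive can be unpacked in any directory.
        tar.add(env_dir, arcname=env_dir.name)
    return out_path


def unpack_environment(tarball: Path, dest_dir: Path) -> Path:
    """Unpack the tarball at a (hypothetical) execution site."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir


def demo() -> str:
    """Build a toy environment, ship it, and read back its version file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        env = root / "cdf_env"  # hypothetical environment directory
        (env / "bin").mkdir(parents=True)
        (env / "VERSION").write_text("example-release\n")
        (env / "bin" / "run_mc.sh").write_text("#!/bin/sh\necho running MC\n")

        tarball = make_environment_tarball(env, root / "cdf_env.tar.gz")
        site = unpack_environment(tarball, root / "execution_site")

        # The execution site sees the version that was packed,
        # independent of anything installed locally.
        return (site / "cdf_env" / "VERSION").read_text().strip()


if __name__ == "__main__":
    print(demo())
```

Because the job carries its own copy of the environment, the version seen at the execution site is whatever was packed at submission time, which is the property the poster relies on to avoid code-version mismatches on shared resources.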