This paper discusses the limitations of current job monitoring facilities in GRID systems and proposes a solution to provide specific real-time information for complex applications. It explores the Impala/McRunJob solution, the GRID and application job monitoring, and highlights the importance of application job monitoring through examples like MC event simulation for LHC. The implementation of MySQL server for job monitoring and the challenges of IO buffering are also addressed. The conclusions emphasize the need for enhanced security measures and future plans to re-implement the monitoring scheme using OGSA/Globus3 technologies.
Problem of Application Job Monitoring in GRID Systems
V. Kalyaev (kalyaev@theory.sinp.msu.ru), A. Kryukov (kryukov@theory.sinp.msu.ru)
SINP MSU, Moscow
A.Kryukov NEC-2003, Varna, 15-20 September

Outline
• Introduction
• Impala/McRunJob solution
• GRID and Application Job Monitoring
• Conclusion

Introduction: Job Monitoring in GRID
The GRID provides some monitoring facilities. However, these facilities only record the general status of a job:
• Scheduled
• Running
• Canceled
• Finished
This is completely insufficient for complex applications.

What is Application Job Monitoring?
Consider a very simple example: CMSIM. The summary information this program produces is the number of generated events. The user can use this number to diagnose the event generation process. It is therefore very important to supply the user with application-specific information in real time.

MC Event Simulation for LHC (CMS example)
• Simulation of physical events
  • Pythia
• Detector simulation
  • GEANT-3/4
• Digitization (overlap, noise)
  • ORCA
• Reconstruction
  • ORCA

Impala/McRunJob scheme
[Diagram: a MySQL server and two JOBs, each with a MySQL client]
• Insecure.
• The user has to know where the information is.
• The type of monitoring information is predefined.

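The idea of a job reporting its progress to a database through a client can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the MySQL server, and the table name `job_progress`, its columns, and the helper functions are hypothetical, not part of Impala/McRunJob. The fixed schema reflects the "predefined type of monitoring information" limitation.

```python
import sqlite3


def open_monitor_db():
    """Create an in-memory database (a stand-in for the MySQL server)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical fixed schema: only the event count can be reported.
    db.execute(
        "CREATE TABLE job_progress ("
        "job_id INTEGER PRIMARY KEY, events_done INTEGER NOT NULL)"
    )
    return db


def report_events(db, job_id, events_done):
    """Called from inside the job: store the latest event count."""
    db.execute(
        "INSERT OR REPLACE INTO job_progress (job_id, events_done) VALUES (?, ?)",
        (job_id, events_done),
    )
    db.commit()


def events_done(db, job_id):
    """Called by the user: read the latest event count, or None if unknown."""
    row = db.execute(
        "SELECT events_done FROM job_progress WHERE job_id = ?", (job_id,)
    ).fetchone()
    return None if row is None else row[0]
```

In this sketch the user must already know which database to query and which job identifier to ask for, which mirrors the second limitation listed on the slide.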
MC event generation with GRID
[Diagram: GRID middleware, RB, two PC farms]

MC event generation with GRID
[Diagram: GRID middleware, RB, two PC farms, MySQL server]

Application Job Monitoring Scheme
[Diagram. Nodes: UI, WN, RB, CE. Components: atm-user-register, atm-job-wrapper, atm-job-register, Original job, atm-jdl-parser, edg-job-submit, monitor, atm-job-register-c, atm-register-s, atm-user-register-c, atm-user-register-s, atm-job-monitor-s. Databases: ATM DB, Allowed user DB, Allowed job DB, Job status DB.]

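The role of a job wrapper in such a scheme can be sketched roughly as follows: the wrapper starts the original job, reads its standard output line by line, and hands each line to a reporting function. The slides do not describe how atm-job-wrapper is implemented, so the function name `run_and_monitor` and the `report` callback are assumptions made for illustration only.

```python
import subprocess
import sys


def run_and_monitor(command, report):
    """Run the original job and pass each output line to `report`.

    `report` is a hypothetical callback; a real wrapper would send the
    line to the monitoring service instead.
    """
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered reading on the wrapper side
    ) as proc:
        for line in proc.stdout:
            report(line.rstrip("\n"))
        return proc.wait()


if __name__ == "__main__":
    # Demo: the "original job" is a one-line Python program.
    job = [sys.executable, "-c", "print('completed 20 from 200 events')"]
    exit_code = run_and_monitor(job, print)
    print("job exit code:", exit_code)
```

Note that the wrapper only sees a line once the job itself has written it out, so buffering inside the job still delays the monitoring information.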
Application Job Monitoring: Web Interface
[Diagram: Web Client, Web Server, Authentication, Job status DB]

JDL Example
Executable = "atm-wrapper";
StdOutput = "aliroot.out";
StdError = "aliroot.err";
InputSandbox = {"atm-wrapper","start_aliroot2.sh","rootrc","grun2.C","Config.C"};
OutputSandbox = {"aliroot.err","aliroot.out","galice.root"};
RetryCount = 10;
Arguments = "-id=123 -password=567 -site=test.domain /bin/sh start_aliroot.sh 3.02.04 3.07.01";
Requirements = Member(other.RunTimeEnvironment,"ALICE-3.07.01");
The old JDL file is converted to the new one automatically.

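One way such an automatic conversion could work is sketched below. The slides show only the converted JDL, so everything here is an assumption: the job description is modelled as a plain Python dictionary rather than parsed JDL text, the original job is assumed to have used /bin/sh as its executable, and `wrap_job` is a hypothetical helper, not the actual atm-jdl-parser.

```python
def wrap_job(job, wrapper, job_id, password, site):
    """Return a copy of `job` that runs the original job through `wrapper`.

    The original executable and its arguments are appended to the wrapper
    options, and the wrapper is added to the input sandbox.
    """
    wrapped = dict(job)
    options = f"-id={job_id} -password={password} -site={site}"
    original = f"{job['Executable']} {job.get('Arguments', '')}".strip()
    wrapped["Executable"] = wrapper
    wrapped["Arguments"] = f"{options} {original}"
    sandbox = list(job.get("InputSandbox", []))
    if wrapper not in sandbox:
        sandbox.insert(0, wrapper)
    wrapped["InputSandbox"] = sandbox
    return wrapped


if __name__ == "__main__":
    # Assumed shape of the old job description (not shown on the slides).
    old_job = {
        "Executable": "/bin/sh",
        "Arguments": "start_aliroot.sh 3.02.04 3.07.01",
        "InputSandbox": ["start_aliroot2.sh"],
    }
    new_job = wrap_job(old_job, "atm-wrapper", 123, 567, "test.domain")
    for key, value in new_job.items():
        print(f"{key} = {value!r}")
```

The function copies the input instead of modifying it, so the old job description stays available for comparison.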
Problems of IO Buffering
• If a program sends something like "completed 20 from 200 events" to standard output, the output buffer may fill, and the message reach the monitor, only after 20 hours of work.
Two remedies:
• modify the code to flush the IO buffer explicitly
• forbid the use of IO buffering.

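The effect can be reproduced in a few lines. The sketch below is an illustration, not the authors' code: an in-memory `io.BytesIO` object stands in for the pipe or file behind standard output, and the helper `bytes_delivered` is hypothetical. It counts how many bytes have actually reached the sink right after one short progress message is written.

```python
import io

MESSAGE = "completed 20 from 200 events\n"


def bytes_delivered(line_buffering, explicit_flush):
    """Write one progress message and count the bytes that reached the sink."""
    sink = io.BytesIO()  # stands in for the pipe or file behind stdout
    out = io.TextIOWrapper(sink, encoding="ascii", line_buffering=line_buffering)
    out.write(MESSAGE)
    if explicit_flush:
        out.flush()  # first remedy: flush the buffer explicitly
    return len(sink.getvalue())


if __name__ == "__main__":
    print("buffered, no flush:", bytes_delivered(False, False))
    print("buffered, explicit flush:", bytes_delivered(False, True))
    print("line buffering, no flush:", bytes_delivered(True, False))
```

With default buffering and no flush, the short message stays in the buffer and nothing reaches the sink. An explicit flush, or a stream that flushes at every newline, delivers it immediately. For a Python job, running the interpreter with the `-u` option has a similar effect on standard output; a C program can call `fflush(stdout)` or change the buffering mode with `setvbuf`.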
Conclusions
• Security
  • GSI
  • A user can monitor only his own jobs.
• Monitoring information
  • In the current implementation: standard output.
• There is a Web interface for authorized access to application job status.
• We plan to re-implement the scheme using OGSA/Globus3.