Observations from LCG experts on managing job workflows, resource interfaces, and network challenges in LCG systems. Discusses features of LCG-2 job management and common problems faced. Recommendations and insights for job submission processes.
Observations from LCG
David Smith, Maarten Litmaath, for LCG-GD, CERN
CRM Interface discussion, Rome INFN, 17-18 Feb 2005
For submission LCG uses:
• ‘Workload Management’ software (from the EDG project)
• Condor-G
• Globus GRAM
• The user submits to a submission service, which in turn acts as a client that sends submission requests to the resource interfaces themselves
• The CRM interface periodically reports the status of all managed jobs back to its clients
• Problems:
  • Scaling problems, primarily on the resource interfaces
  • Network requirements can complicate deployment
  • Initially the CRM interface used a fairly simple query of the batch system to detect state changes; this has caused load problems
  • Some problems are higher-level job management issues but may have implications for CRM interfaces
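The load problem from querying the batch system for state changes can be illustrated with a small sketch. It shows one common mitigation, not necessarily the one adopted in LCG: answer per-job status requests from a single cached "all jobs" query instead of querying the batch system once per job. The class name `CachedStatusPoller`, the `query_all` callable, and the 60-second `max_age` are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class CachedStatusPoller:
    """Serve per-job status lookups from one shared batch-system query."""

    def __init__(
        self,
        query_all: Callable[[], Dict[str, str]],
        max_age: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # query_all is a hypothetical hook: one call returns the state of
        # every job known to the batch system, keyed by job id.
        self._query_all = query_all
        self._max_age = max_age
        self._clock = clock
        self._cache: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def status(self, job_id: str) -> str:
        """Return the cached state of job_id, refreshing the cache if stale."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            # One batch-system query refreshes the state of all jobs at once.
            self._cache = self._query_all()
            self._fetched_at = now
        return self._cache.get(job_id, "UNKNOWN")
```

With this arrangement the number of batch-system queries depends on the polling interval rather than on the number of managed jobs, at the cost of status information that may be up to `max_age` seconds old.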
Some features of job management in LCG-2:
• Selection of job destination according to requirements expressed in a JDL
• Optional job resubmission in case of error
• Supply of input files
• Retrieval of nominated output files
• Best resource selection
  • Submit a job to a chosen destination
  • Use a metric to measure response time
  • Once a resource is chosen, the job stays there until it completes (or fails)
• Network connectivity
  • All the LCG-2 service machines require inbound connectivity
  • But several applications, e.g. GRAM, use port ranges to provide callback addresses
  • This has been a source of deployment problems
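The selection and resubmission features listed above can be sketched as a toy matchmaker. This is an illustration under stated assumptions, not the Workload Management implementation: the requirements and rank are plain Python callables standing in for JDL expressions, the attribute names `max_cpu_time` and `est_response` are invented, and excluding already-tried destinations on resubmission is a design choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Resource = Dict[str, float]  # hypothetical: published attributes of one resource


@dataclass
class Job:
    requirements: Callable[[Resource], bool]  # stands in for a JDL requirements expression
    rank: Callable[[Resource], float]         # higher is better
    retry_count: int = 0                      # number of resubmissions allowed


def select_resource(
    job: Job, resources: Dict[str, Resource], exclude: Sequence[str] = ()
) -> Optional[str]:
    """Pick the best-ranked resource that satisfies the job's requirements."""
    candidates = [
        name
        for name, attrs in resources.items()
        if name not in exclude and job.requirements(attrs)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: job.rank(resources[name]))


def run_with_resubmission(
    job: Job, resources: Dict[str, Resource], submit: Callable[[str], bool]
) -> Tuple[bool, List[str]]:
    """Submit the job, resubmitting elsewhere on error up to retry_count times."""
    tried: List[str] = []
    for _ in range(job.retry_count + 1):
        target = select_resource(job, resources, exclude=tried)
        if target is None:
            break  # no remaining resource matches the requirements
        tried.append(target)
        # Once submitted, the job stays on this resource until it
        # completes (True) or fails (False).
        if submit(target):
            return True, tried
    return False, tried
```

Ranking by the negated estimated response time, for example `rank=lambda r: -r["est_response"]`, makes the matchmaker prefer the resource expected to respond soonest.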
Some users have arrived at broadly similar solutions to common problems:
• Submitting a job that contacts an application-specific server to request a task
• This may improve the distribution of tasks and the response time
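The pattern described above, in which the submitted job asks an application-specific server for work, can be sketched in a few lines. This is a minimal in-process illustration: `TaskServer` and `pilot_job` are hypothetical names, a real server would be reached over the network, and looping until the server has no more tasks is an assumption (the slide only says the job requests a task).

```python
import queue
from typing import Callable, Iterable, List, Optional


class TaskServer:
    """Hypothetical application-specific server holding a queue of tasks."""

    def __init__(self, tasks: Iterable[str]) -> None:
        self._tasks: "queue.Queue[str]" = queue.Queue()
        for task in tasks:
            self._tasks.put(task)

    def request_task(self) -> Optional[str]:
        """Hand out the next task, or None when no work is left."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None


def pilot_job(server: TaskServer, run_task: Callable[[str], None]) -> List[str]:
    """Body of the submitted job: pull tasks from the server until none remain."""
    completed: List[str] = []
    while True:
        task = server.request_task()
        if task is None:
            break
        run_task(task)
        completed.append(task)
    return completed
```

Because a task is bound to a job only once that job is actually running on a resource, tasks go to whichever jobs start first, which is one reason this approach may improve task distribution and response time.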