Grid Middleware & TOOLS session summary

Ian Bird, CERN
Rob Gardner, University of Chicago

Introduction
• 82 abstracts submitted
  • 36 oral presentations (7 sessions), 44 posters [2 withdrawn]
• Categories cover a broad range:
  • Experiment experiences
  • Data Management
  • Workload Management
  • Monitoring, Information, Accounting
  • Security & Authorization
  • Fabric & Deployment

Experiment experiences

Grid reliability – Pablo Saiz

Grid efficiency during CMS data challenges – Oliver Gutsche

D0 – reprocessing on OSG – Amber Boehnlein

Common theme: making sites reliable requires debugging sites/systems one by one

Job agents – pilot jobs

Monitoring the AliEn grid environment – Pablo Saiz

Data management

SRM v2.2 – Flavia Donno
• 18-month effort to agree, build, test, deploy new version

dCache – one of several MSS systems
• Patrick Fuhrmann – overview of dCache developments
• Gerd Behrmann – distributed instance for NDGF

LCG Data management tools: LFC, DPM, FTS – Markus Schulz

Examples of services that consider deployment & management issues

CORAL – distributed database access – Dirk Duellmann

Workload management

Pilot jobs?

Pilot jobs – and variants: Such a good idea – everyone wants one …

Stuart Paterson – optimizations in DIRAC

Marianne Bargiotti – integrity checking in DIRAC

Pilots can move intelligence into the job

Paul Nilsson – PanDA experience

gLite WMS developments – Marco Cecchi

Igor Sfiligoi – comparison of WMS

CHEP'07, Victoria

Monitoring, information, etc.

Experiment dashboards – Julia Andreeva
• Monitoring from VO/user perspective

GridICE – monitoring – Guido Cuscela
• Permits different views of running jobs

James Casey – advances in monitoring of grid services

Stephen Burke – 6 years' experience with the GLUE schema

Martin Flechl – details on integration of information systems

Security, authorization, etc

David Groep – glExec
• Supporting pilot jobs

Fabric & Deployment

Greig Cowan – using DPM over the WAN

Addressing failover for core operations services – Alfredo Pagano
• Various strategies

Platform LSF – Robert Stober
• Integrating heterogeneous clusters

Observations
• Solutions exist for most needs now
  • Certainly not all perfect yet
• Experiment layer relatively deep
• Plethora of workload management systems
  • Not so many for data management …
• Service management issues starting to be addressed by some services (DPM, LFC, FTS, GridSite, CORAL)
  • But in general little thought on how site managers should manage services
• Interoperability / interoperation

Observations
• Workload management
  • Everyone wants pilot (aka glidein) jobs (and everyone has written a system to submit them)
• Commonality – to reach a reliable service, experiments need to systematically debug the sites being used:
  • D0, CMS, dashboards, …
• Sophisticated systems to monitor, debug, recover
  • DIRAC, dashboards, grid service monitoring, etc.
  • To improve reliability and help debug the system
