WLCG Middleware Status Report 16th February, 2009
Overview
• The three WLCG middleware stacks:
  • ARC (NDGF): most sites in northern Europe; ~10% of WLCG CPUs
  • OSG: most North American sites; >25% of WLCG CPUs
  • gLite: used by the EGEE infrastructure
• Summary and issues
ARC middleware status
Michael Grønager, Project Director, NDGF
LCG-LHCC Mini Review, CERN, Geneva, February 16th 2009
WLCG sites with ARC
• Tier-1: NDGF
• Tier-2s: Finnish, Norwegian, Slovenian, Swedish
• Tier-3s: Danish, Norwegian, Swedish, Swiss
ARC Status – Current Version
• Current stable release: 0.6.5, "Earth Quake" (December)
• Improved cache scalability: ARC supports caching of files used by several jobs, which boosts performance for e.g. analysis, but scalability issues were detected on large clusters. ARC 0.6.5 makes it possible to split this load across several file servers (see the configuration sketch below).
• Optional patch for replacing Globus MDS with a new solution, EGIIS, which includes BDII; this is deployed at most NDGF-related sites
• Minor issue with LFC fixed
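As an illustration, a hypothetical arc.conf fragment for splitting the cache across several file servers. This is a minimal sketch assuming the ARC 0.x configuration style; the option names and mount points are placeholders and should be checked against the release documentation:

    [grid-manager]
    # Each cachedir line adds an independent cache area; mounting them
    # from different file servers spreads the I/O load of input files
    # that are shared between jobs.
    cachedir="/mnt/cacheserver1/cache"
    cachedir="/mnt/cacheserver2/cache"
    cachedir="/mnt/cacheserver3/cache"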
ARC Status – Next Version
• Next stable release: "Fastelavn"* (February)
• Further scalability improvements:
  • Support for sharing the load across multiple file system servers
  • Support for distributing multiple uploaders and downloaders over multiple machines
  • These new features make ARC ready for running production on large, 5000+ core clusters
• MDS fully replaced by EGIIS and BDII
• Optional publishing of GLUE 1.3 alongside the ARC schema (currently in testing, e.g. at NDGF-BENEDICT-T3)
• KnowARC features starting to appear: optional module for OGF BES submission, based on a new and more modular code base
* aka Mardi Gras
ARC Future
• The production release of ARC (sometimes called "ARC classic") will continue to evolve.
• More and more components will be integrated from e.g. the KnowARC project.
• The KnowARC development adds new service interfaces that adhere to standards like GLUE2, BES and JSDL (a minimal JSDL sketch follows below). These will be incorporated into the production release of ARC.
• There will be no "migrations", but a gradual incorporation of the novel components into the stable branch, like OGF BES in "Fastelavn".
• ARC components will be included in UMD, and ARC now supports building on ETICS.
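For reference, a minimal JSDL document of the kind a BES endpoint consumes. The namespaces are the standard OGF ones; the payload executable and file names are placeholders:

    <jsdl:JobDefinition
        xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
        xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
      <jsdl:JobDescription>
        <jsdl:Application>
          <posix:POSIXApplication>
            <!-- Placeholder payload: any executable available on the CE -->
            <posix:Executable>/bin/echo</posix:Executable>
            <posix:Argument>hello-grid</posix:Argument>
            <posix:Output>stdout.txt</posix:Output>
          </posix:POSIXApplication>
        </jsdl:Application>
      </jsdl:JobDescription>
    </jsdl:JobDefinition>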
Status of OSG Middleware for WLCG
Ruth Pordes, OSG Executive Director
Alain Roy, OSG Software Coordinator
LHCC Mini-Review, Feb 16th 2009
OSG Middleware Scope & Status
• OSG provides packages for Compute Elements, Storage Elements, VO managers, the Worker-Node Client and the User Client.
• OSG middleware is tested to allow applications to interoperate across OSG and EGEE (and NDGF), so WLCG users can transparently use the multiple grids.
• OSG V1.0 remained stable during data taking, cosmic runs, and ramped-up simulation production and analysis in the second half of 2008.
Progress over last 6 months
• Bestman/xrootd Storage Elements are now installed at several Tier-2s; Bestman (with NFS, Lustre or Hadoop) is installed on Tier-3s and a couple of Tier-2s.
• Addition of the WLCG client utilities (LFC, lcg_utils) enables use of the OSG Client with no need to install both the OSG and EGEE client packages (usage example below).
• Roll-out of joint gLite/VO-services/Globus common interfaces and protocols in the security components; significant testing effort across the projects, including SCAS/LCAS, glexec and GUMS.
• EGEE packages continue to be included in the OSG software stack: VOMS/VOMS-Admin, glexec, edg-mkgridmap.
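To illustrate what the combined client gives an OSG user, two typical lcg_utils invocations; the VO name, LFN paths and SE hostname are placeholders:

    # Copy a file out of the grid, resolving the LFN through the LFC
    lcg-cp --vo myvo lfn:/grid/myvo/data/run123.root file:/tmp/run123.root

    # Upload a local file to a storage element and register it in the LFC
    lcg-cr --vo myvo -d se.example.org -l lfn:/grid/myvo/data/run124.root \
        file:/tmp/run124.root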
Software Tools Group
• Part of the new OSG project structure in FY09, led by Alain Roy and Mine Altunay.
• Central hub for all software projects/plans.
• Aims to ensure stakeholders' needs are met from planning to deployment.
• Single point of contact for software providers.
• Inputs: user/VO/site requirements; software providers' timelines/plans.
• Outputs: plans for software stack evolution; point of contact with the EGEE EMT and gLite.
External Software Provision
OSG, US ATLAS and US CMS are working closely with software development groups on:
• Timely deployment of new versions of dCache and Bestman for WLCG needs.
• Evolution of the identity systems (looking at backends to Shibboleth, Kerberos) and compatibility.
• Condor changes to support scalability in the number of jobs.
• Internet2/ESnet for deployment of the perfSONAR network monitoring tools.
• Gratia accounting, the OIM operations database & tools.
• Use of xrootd.
OSG & US ATLAS are working on generalization of PanDA for other users; OSG and US CMS on generalization of the glidein WMS for other users.
OSG support for gLite underpinnings
• We continue to supply a subset of the VDT as RPMs:
  • Condor
  • Globus
  • MyProxy
  • GSI OpenSSH
  • GPT
Current Work
• Major focus is on better support for incremental upgrades, roll-back and forward compatibility.
• Includes a redesign of the packaging to improve native packaging:
  • Debian 5 support for LIGO
• Software upgrades only if really needed:
  • Not looking yet at Globus 4.2
• Interoperability:
  • Testing compatibility of CREAM with the OSG Client stack
  • Ensuring availability, reliability, installed capacity and accounting software and services all report correctly from OSG to EGEE and to WLCG
Currently Supported Platforms
• Linux (32 & 64 bit):
  • RHEL 3, RHEL 4, RHEL 5
  • Debian 4
  • ROCKS 3
  • SuSE Linux 9 (64-bit only)
  • Scientific Linux 3, Scientific Linux 4
• Mac OS 10.4 (client only)
• AIX 5.3 (limited support)
Concerns (nothing new)
• Need to continue to ensure modularity/separation of EGEE services and WLCG, to enable OSG to contribute and peer effectively.
• Need WLCG to work with OSG middleware activities as closely as with the EGEE middleware activities. We are all trying hard here!
• Interoperability activities will become more challenging in an EGI era, where the number of independent software stacks may grow or diverge. OSG is committed to working with EGI partners in these areas.
• OSG is pleased to contribute to the Infrastructure Policy Group. These are pragmatic activities for understanding commonalities and differences.
• OSG remains nervous about the potential of the OGF standards to be really successful.
gLite
• The current release is gLite 3.1.
• It is updated almost every week (30+ updates/year).
• Its purpose is to provide a stable platform for production grid usage.
• It covers:
  • Data management
  • Workload management
  • Information system
  • AAA
• Distributed lifecycle:
  • Tools and formal processes
  • Links teams and tasks
  • Monitors progress
• Large code base (~1.6 million lines of code)
Most Active Areas
• Workload management (access to computing resources):
  • Support for multi-user pilot jobs (see the glexec sketch below)
  • Used by the experiments' frameworks: DIRAC, PanDA, AliEn
• Move to the next OS platform: SL5
• Continuous evolution of other components: FTS, DPM, LFC, …
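In outline, a pilot framework switches identity to the payload owner via glexec roughly as follows. This is a sketch: the glexec binary path depends on the installation, the proxy location is a placeholder, and GLEXEC_CLIENT_CERT is the variable glexec reads to authorize the switch:

    # Inside the pilot job, running as the generic pilot user on the WN:
    export GLEXEC_CLIENT_CERT=/tmp/payload-proxy.pem   # proxy of the payload owner
    /opt/glite/sbin/glexec /bin/id                     # payload runs as the mapped user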
Workload management
• The LCG-RB has been phased out.
• WMS 3.1 on SL4: major update (accumulates patches from more than 8 months):
  • Certified; will be released to production in the next weeks
  • Can handle >30K jobs/day
  • Better support for bulk submission (see the JDL sketch below)
  • Almost ready to support the CREAM-CE: ICE is integrated, but needs more testing
• Support for multi-user pilot jobs:
  • SCAS and glexec on the WNs are late; now under stress testing
  • Still issues with memory management
  • Fails at a 0.03% rate, which is not good enough for an authorization system
  • Scales to >10 Hz (OK for most sites)
  • A pilot service will start during the next week
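Bulk submission to the WMS is expressed as a JDL collection; a minimal sketch with placeholder executables and sandbox names, submitted with glite-wms-job-submit -a collection.jdl:

    [
      Type = "collection";
      // Each node is an independent job handled in one submission
      Nodes = {
        [ Executable = "/bin/hostname"; StdOutput = "node1.out"; OutputSandbox = {"node1.out"}; ],
        [ Executable = "/bin/hostname"; StdOutput = "node2.out"; OutputSandbox = {"node2.out"}; ]
      };
    ]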
Computing Resource Access (CE)
• LCG-CE, in production at all EGEE sites:
  • Legacy service, introduced at the end of 2002
  • Improved over the years to handle 50 users and 4K jobs
  • Good enough for production use, but might be problematic for analysis tasks
• CREAM-CE (direct submission example below):
  • New architecture: Web Service interface, supports the BES standard
  • Parameter passing to batch systems
  • Scalability!!!
  • First version was released to production 8 months ago
  • 13 instances in production + 13 in PPS; used by ALICE
  • New version with many bug fixes is in final certification
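For orientation, direct submission of a JDL job to a CREAM CE looks roughly like this; the endpoint host and queue name are invented, and -a requests automatic proxy delegation:

    glite-ce-job-submit -a -r cream.example.org:8443/cream-pbs-myqueue job.jdl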
Scientific Linux 5
• The SL5 Worker Node pilot phase has come to an end:
  • The experiments encountered no major problems
  • A new formal release is being prepared and will arrive in production soon
• Other activities:
  • Multi-compiler support
  • Support for multiple versions
  • Improved rollback support
• Long term:
  • Support for the new information system schema, GLUE-2 (see the query sketch below)
  • Introduction of the first components of the new EGEE Authorization Framework (policy management system)
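Once sites publish GLUE-2, it is served through the same BDII LDAP infrastructure as today's schema, just under a different base DN. A hypothetical query (the BDII hostname is a placeholder; GLUE-2 entries live under o=glue rather than the GLUE-1 mds-vo-name=local,o=grid):

    ldapsearch -x -LLL -H ldap://bdii.example.org:2170 -b o=glue \
        '(objectClass=GLUE2ComputingService)' GLUE2ServiceID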
Issues and Outlook
• EGEE-III ends early 2010:
  • The new environment for middleware support is under discussion
  • Less CERN involvement in integration and release management
  • Will the new entities be up and running in time?
• gLite Consortium:
  • Discussions on a formal agreement are taking place
  • Required to organize support for the gLite middleware
• The Unified Middleware Distribution is forming:
  • ARC + gLite + UNICORE
  • Move towards standards-based middleware
• WLCG has a wider scope: maintaining interoperability might become more difficult
Summary
• All 3 middleware stacks provide stable production environments:
  • They are aware of scalability issues and have addressed most of them
• All 3 stacks interoperate with each other:
  • And work on improving interoperability and interoperation
• OSG actively supports pilot jobs (glexec/GUMS); gLite will soon (glexec/SCAS)
• The middleware stacks still evolve:
  • Major changes have been successfully introduced into the production system without interrupting the service
• The transition from EGEE-III to EGI, UMD and the gLite consortium will be challenging