London Tier 2 Status Report
GridPP 12, Brunel, 1st February 2005
Owen Maroney
LT2 Sites
• Brunel University
• Imperial College London (including London e-Science Centre)
• Queen Mary University of London
• Royal Holloway University of London
• University College London
LT2 Management
• Management board had its first meeting on 3rd December 2004
• Next meeting 9th March 2005
• Members of the management board:
  • Brunel: Paul Kyberd
  • IC: John Darlington (& Steve McGough)
  • RHUL: Michael Green
  • QMUL: Alex Martin
  • UCL: Ben Waugh
  • Chair: David Colling
  • Secretary: Owen Maroney
Brunel
• 1 WN PBS farm @ LCG-2_2_0
• R-GMA installed, but not APEL
• In the process of adding 60 WNs
• Issues with private networking; attempted to resolve them with LCG-2_2_0
• Will now proceed directly to LCG-2_3
• Investigating installation of SL on the nodes
  • If this goes well, will use YAIM
  • If it goes badly, will fall back to RH7.3 LCFG
Imperial College London
• 66 CPU PBS HEP farm @ LCG-2_2_0
  • APEL installed
  • Upgrading to LCG-2_3_0 (this week!)
    • Will still use RH7.3 LCFGng
  • HEP computing undergoing re-organisation
    • LCG nodes will be incorporated into the SGE cluster and made available to LCG (dependency on the LeSC SGE integration)
    • Will re-install with a RHEL OS at that time
• London e-Science Centre
  • Problems over internal re-organisation
  • SGE farm, 64-bit RHEL
  • Problems with the default installation tool (APT) supplied by LCG
    • Also, LCG-2_3 is not supported on 64-bit systems
  • Working on deploying LCG-2_3 on 32-bit frontend nodes using YUM and RHEL
    • Tarball install on the WNs. Hope this is binary compatible!
  • Then need to work on an SGE information provider (a rough sketch follows this slide)
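By way of illustration of what an SGE information provider involves, here is a minimal sketch, not the actual IC/LeSC implementation: it assumes SGE's `qstat -s r` / `qstat -s p` state filters and emits GLUE 1.x CE state attributes; the queue selection, the LDIF dn and the exact plugin interface expected by the LCG information provider are simplified or omitted.

```python
#!/usr/bin/env python
# Hypothetical dynamic information provider for an SGE cluster (a sketch only).
# Assumes SGE's qstat is on the PATH; GLUE dn and plugin wiring are omitted.
import subprocess

def count_jobs(state_flag):
    """Count SGE jobs in a given state ('r' = running, 'p' = pending)."""
    out = subprocess.run(["qstat", "-s", state_flag],
                         capture_output=True, text=True).stdout
    lines = [line for line in out.splitlines() if line.strip()]
    return max(len(lines) - 2, 0)  # qstat prints a two-line header when jobs exist

if __name__ == "__main__":
    running = count_jobs("r")
    waiting = count_jobs("p")
    # Emit GLUE 1.x CE state attributes for the information system to publish
    print("GlueCEStateRunningJobs: %d" % running)
    print("GlueCEStateWaitingJobs: %d" % waiting)
    print("GlueCEStateTotalJobs: %d" % (running + waiting))
```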
Queen Mary
• 320 CPU Torque farm
• OS is Fedora 2
• Currently running LCG-2_1_1 on the frontends, LCG-2_0_0 on the WNs
  • More up-to-date versions of LCG were not binary compatible with Fedora
• Trinity College Dublin have recently provided a Fedora port of LCG-2_2_0 and are working on a port of LCG-2_3_0
• Will install LCG-2_3_0 frontends as SL3 machines, using YAIM
• Install LCG-2_2_0 on the Fedora WNs
  • Upgrade the WNs to LCG-2_3_0 when the TCD port is ready
Royal Holloway
• Little change:
  • 148 CPU PBS farm
  • APEL installed, but no data reported!
  • Very little manpower available
• Currently running LCG-2_2_0
  • Hoped to upgrade to LCG-2_3_0 during February
• Late breaking news… RHUL PBS server hacked and taken offline…
University College London
• UCL-HEP: 20 CPU PBS farm @ LCG-2_2_0
  • In the process of upgrading to LCG-2_3_0
    • Frontends on SL3 using YAIM
    • WNs stay on RH7.3
• UCL-CCC: 88 CPU PBS farm @ LCG-2_2_0
  • Running APEL
  • Upgrade to LCG-2_3_0 on SL3 during February
Contribution to GridPP
• Promised vs. delivered: no change since GridPP11
• *CPU count includes shared resources where the CPUs are not 100% dedicated to Grid/HEP; the kSI2K value takes this sharing into account (illustrated below)
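To make that footnote concrete, a hypothetical illustration (the numbers are not taken from the LT2 table): the CPU count reports the full shared farm, while the kSI2K figure is scaled by the fraction of cycles actually available to Grid/HEP work.

```latex
% Hypothetical example: 100 CPUs rated at 1.0 kSI2K each, 50% available to Grid/HEP
\[
  \mathrm{kSI2K_{reported}} = N_{\mathrm{CPU}} \times \mathrm{kSI2K_{per\,CPU}} \times f_{\mathrm{share}}
  = 100 \times 1.0 \times 0.5 = 50\ \mathrm{kSI2K}
\]
```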
Usage by VO (APEL)
Usage by VO (Jobs)
Usage by VO (CPU)
Site Experiences (I)
• Storage Elements are all 'classic' gridftp servers
  • Still waiting for a deployment release of an SRM solution
• Problems with the experiments' use of Tier 2 storage
  • Assumption: a Tier 2 SE is used as an import/export buffer for the local farm
    • Input data staged in for jobs on the farm
    • Output data staged out to long-term storage at Tier 0/1
    • Tier 2 is not permanent storage: no backup!
  • In practice: the Grid does not distinguish between SEs. No automatic data-migration tools. No SE "clean-up" tools.
    • All SEs are advertised as "Permanent" by default.
    • "Volatile" and "Durable" settings only appropriate for SRM?
  • SEs fill up with data and become 'read-only' data servers
    • Some data files left on the SE without an entry in RLS: dead space!
    • One VO can fill an SE, blocking all other VOs
    • Disk quota integration with the information provider needed
  • Clean-up tools needed to deal with files older than "x" weeks? (a rough sketch follows this slide)
    • Delete from the SE, and the entry in RLS, if another copy exists
    • Migrate to a different SE (nearest Tier 1?) if it is the only copy
    • But the site admin needs to be in all VOs to do this!
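As a rough illustration of the kind of clean-up tool described above, a hypothetical sketch (no such LCG utility is implied): the age threshold, the storage path, the fallback Tier 1 SE, and the replica-lookup, migration and catalogue-removal helpers are all placeholders standing in for the real RLS and data-management client calls.

```python
#!/usr/bin/env python
"""Hypothetical SE clean-up sketch: for files older than MAX_AGE_WEEKS,
delete the local copy (and its catalogue entry) if another replica exists;
otherwise migrate it to a long-term SE first."""

import os
import time

MAX_AGE_WEEKS = 8                      # the "x" weeks threshold (placeholder)
SE_ROOT = "/storage/se_data"           # classic-SE storage area (placeholder)
FALLBACK_SE = "srm.tier1.example.org"  # nearest Tier 1 SE (placeholder)

def list_replicas(lfn):
    """Placeholder for an RLS lookup returning the SEs holding this file."""
    raise NotImplementedError("query the Replica Location Service here")

def migrate(path, target_se):
    """Placeholder for a third-party copy to the fallback SE."""
    raise NotImplementedError("copy the file to the Tier 1 SE here")

def unregister(lfn, se):
    """Placeholder for removing this SE's replica entry from the catalogue."""
    raise NotImplementedError("remove the RLS entry for this replica here")

def clean(now=None):
    cutoff = (now or time.time()) - MAX_AGE_WEEKS * 7 * 24 * 3600
    for dirpath, _, filenames in os.walk(SE_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > cutoff:
                continue                    # young enough, keep it
            lfn = path[len(SE_ROOT):]       # crude path -> logical file name
            if len(list_replicas(lfn)) <= 1:
                migrate(path, FALLBACK_SE)  # only copy: move it elsewhere first
            unregister(lfn, "this-se")
            os.remove(path)

if __name__ == "__main__":
    clean()
```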
Site Experiences (II)
• Timing and release of LCG-2_3_0 could still have been improved
  • Information flow around the (pre-)release is still a problem
  • But at least a long upgrade period was allowed!
• Structure of the documentation changed
  • Generally an improvement
  • Some documents clearly not proof-read before release
• BUT: NO LT2 sites have managed to upgrade yet! WHY NOT?
  • Lots of absence over the Christmas/New Year period: not really 2 months
  • Perception that the YAIM installation tool was not mature: lots of 'bugs'
    • Bugs fixed quickly, but still the temptation to let other sites 'go first'
  • YAIM did not originally handle a separate CE and PBS server
    • The most common configuration in LT2!
  • Still need to schedule time against other constraints
    • Hardware support posts still not appointed
    • Sites still supported on an unfunded 'best-effort' basis
  • Uncertainty at sites whether the experiments were ready to use SL
• New release schedule proposed by LCG Deployment at CERN should help
  • As should appointment of the hardware support posts
Summary
• Little change since GridPP11
  • R-GMA and APEL installations
  • Additional resources (Brunel, LeSC) still to come online
• Failure to upgrade to LCG-2_3_0 rapidly
  • Significant effort over Summer 2004 put a lot of resources into LCG
  • But the manpower was coming from unfunded 'best-effort' contributions
  • When term-time starts, much less effort is available!
    • Maintenance manageable
    • Upgrades difficult
    • Major upgrades very difficult!
• Use of the resources in practice is turning out to be different to expectations!