UK middleware deployment: Status & plans
GridPP27, CERN, 15th September 2011
Jeremy Coles
Overview
• Baselines and recommended versions
• The UK situation
• Moving away from gLite
• Issues and concerns
• Discussion?
Baselines (are we even up to date with gLite?)
There is a WLCG wiki page listing useful information, and it has been updated recently. In what follows, the figure in (brackets) after each service name is the minimum recommended version listed at https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions; a figure after a site name is the version that site reported. Note that the recommended versions:
• do not necessarily reflect the latest versions of packages available in the gLite/UMD/EMI/... repositories;
• are versions that fix significant bugs or introduce important features.
Versions newer than those indicated are assumed to be at least as good, unless otherwise indicated. Also note that the support status of versions changes in October: http://glite.cern.ch/support_calendar/
The information in the following slides was drawn from site inputs at http://www.gridpp.ac.uk/wiki/Middleware_transition. Some sites provided more detail than others and may be highlighted more often, but all those in the list are likely to be in a similar position!
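The rule that newer versions are assumed to be at least as good as the baseline boils down to a version comparison. The Python sketch below assumes purely numeric components separated by "." or "-" (entries such as "1.7.?" or "2.7.STABLE9-3.7" are out of scope and raise ValueError); `parse_version` and `meets_baseline` are hypothetical helpers, not part of any gLite or WLCG tool. The sample data are the BDII-site entries from the gLite 3.2 slide.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Split a version such as '3.2.11-1' into integers: (3, 2, 11, 1).

    Assumes every component is numeric; anything else raises ValueError.
    """
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


def meets_baseline(installed: str, baseline: str) -> bool:
    """Return True if the installed version is at least the baseline version."""
    a, b = parse_version(installed), parse_version(baseline)
    # Pad the shorter tuple with zeros so '3.2.11' compares as '3.2.11-0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    # Tuples compare element by element, so 10 > 9 numerically (unlike strings).
    return a >= b


# BDII-site entries from the gLite 3.2 slide; the baseline is 3.2.11-1.
bdii_site_baseline = "3.2.11-1"
reported = {"Man": "3.2.11-1", "Lancs": "3.2.10-1", "Glas": "3.2.9", "T1": "3.2.10-1"}
behind = [site for site, v in reported.items() if not meets_baseline(v, bdii_site_baseline)]
print(behind)  # ['Lancs', 'Glas', 'T1']
```

Comparing numerically matters: as plain strings, "3.2.9" would sort above "3.2.10". Treating a missing release suffix as "-0" is a deliberate, conservative choice, so a site that reports only "3.2.11" is flagged against a "3.2.11-1" baseline.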
gLite 3.1
• LCG-CE (3.1.40-0): Brunel; QMUL*; ECDF*; Lancs (3.1.40); Liv (3.1.46); Cam (3.1 condor); T1; UCL
• WMS (3.1.31-0), still the best option: IC; Glas (3.1.31); T1 (3.1); Ox (for gridppnagios)
• BDII-site (3.2.11-1): Brunel; UCL
• UI ( ): Liv (3.1.45)
• VOMS ( ): Glas (2.0.15)
gLite 3.2
• WNs (3.2.11-1): Brunel; IC; QMUL (3.2.10 tarball); RHUL (3.2.11); UCL (3.2); Lancs (3.2.9); Liv (3.2.9); Man (3.2.10); ECDF (3.2.10); Glas (3.2.9 -11); Bham; Bris; Cam; Ox; RALPP; T1 (3.2.7)
• BDII-site (3.2.11-1): Brunel; IC; QMUL (problems -> openldap2.4); RHUL (3.2.11); Lancs (3.2.10-1); Liv (3.2.11); Man (3.2.11-1); ECDF (3.2.11); Glas (3.2.9); Bham; Bris; Cam; Ox; RALPP (+VM); T1 (3.2.10-1)
• BDII-top (3.2.12-1): Man (3.2.11-1); T1 (3.2.10-3)
gLite 3.2 (cont.)
• CREAM CE (3.2.10-0 (1.6.5)): IC (SGE issues*); QMUL (SGE issues); RHUL (3.2.10); UCL; Lancs (3.2.10); Liv (3.2.11); Man (3.2.10 & 3.2.11); ECDF (SGE issues); Glas (3.2.8-2); Bham; Bris; Cam; Ox; RALPP; T1 (3.2.10 & 3.2.6)
• UI (3.2.10-1): Brunel; IC; RHUL; Lancs; Liv (3.2.10); Man (3.2.8); Glas (3.2.8); Bham; Bris; Cam; Ox; RALPP; T1 (3.2.10)
• ARGUS (3.2.4-2 (1.2)): RHUL (3.2.4-2); Liv (3.2.4-2); Man (3.2.4-2); Bham; Ox; RALPP; T1; plus Glasgow (SCAS)
• Glexec_wn (3.2.5-1): Brunel; Liv (3.2.4-1); Man; Glas; Bham; Ox; RALPP; T1 (3.2.2-2)
*SGE issues: see http://tinyurl.com/6gxp5lz (e.g. deadlocks; CreamDB (and InnoDB) setup… timeouts, so change purge_interval in blah.config; tomcat processes survive a gLite restart).
gLite 3.2 (cont. 2)
• VOMS (1.9.19?): Glas
• FTS (2.2.4): T1 (3.2.1-2)
• LFC (1.8.0-1): T1 (3.2.7-2)
• VOBOX (3.2.11-0): T1 (3.2.11); Bham
Note: Frontier (3.2.4?) is not covered, and neither is Frontier/Squid Launchpad (2.7.STABLE9-3.7?).
Not present:
• ARGUS/glexec: IC; QMUL; UCL; Lancs; ECDF -> relocatable install not yet available; Bris; Cam
• UI: QMUL; ECDF (T3)
No entries for Durham or EFDA-JET.
Storage
• SE_dpm_disk/mysql SL4 (1.8.0-2): ECDF; UCL
• SE_dpm_disk/mysql SL5 (1.8.0-1): Brunel; Bham; Shef; RHUL (1.8.1); Man (1.8.1); Lancs (1.7.4-7); Liv (1.7.4-7); Glas (1.8.0); Cam (1.7.4); Ox (1.7.?)
• dCache (1.9.5): IC (1.9.12-8); RALPP (1.9.5)
• Storm (1.5.6): Bris (1.3); QMUL (1.7)
• CASTOR (2.1.11-2): T1 (2.1.10-0)
The last words of Mr Fix It: why move!?
• Bug fixes
• Security updates
• Those maintaining the middleware also have other communities to satisfy, so new functionality has to find a way in…
• The underlying operating systems (and hardware) evolve, and the middleware has to be updated.
"Life cycle"
• RHEL4 to February 29, 2012 (SL4 2012-02-02)
• RHEL5 to March 31, 2014 (SL5 2014-03-03+)
• RHEL6 from November 10, 2010
• ref: http://www.scientificlinux.org/distributions
[Timeline figure: EDG, EGEE, SL3]
(Expected) end of support
For gLite 3.1: the LCG-CE (which means TORQUE_utils, SGE_utils and LSF_utils, and glite-CLUSTER) and the WMS are fully supported until 31st October 2011. Security updates for the FTS stop on 7th October.
For gLite 3.2: bug fixes and minor functionality updates of a certain priority continue until 31st October 2011. Data management services will be supported until April 2012. Security updates for most components (not ARGUS) carry on until 30th April 2012 (when EMI-2 is released).
Track the patches via http://tinyurl.com/gLitePatches. For LCG priorities and news, check https://twiki.cern.ch/twiki/bin/view/EGEE/LCGprioritiesgLite.
UMD/EMI
• The Unified Middleware Distribution (UMD) is the integrated set of software components that EGI makes available from technology providers within the EGI Community. These components are packaged to provide an integrated offering for deployment on the EGI production infrastructure.
• EMI-1 (Kebnekaise) was released on 12 May 2011.
• EGI early adopter (staged rollout) sites took up EMI-1: https://www.egi.eu/earlyAdopters/table
• UMD-1 was released in July 2011: http://www.eu-emi.eu/emi-1-kebnekaise-updates
So, “what next?” advice (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions)
• When moving towards EMI-1 based middleware, the recommendation is to use the UMD repository.
• gLite 3.2 to UMD 1.x: services should be migrated at a convenient time, when the service is moved to new hardware, or when sites or users will benefit significantly.
• gLite 3.1 to UMD 1.x: move from gLite-3.1 directly to UMD/EMI-1. If a required service hasn't yet passed the transition, it is advisable to wait for it to pass UMD validation.
• New services should use UMD if staged rollout is complete and the UMD repository has been updated.
• Services with little persistent state should move to UMD as soon as they move to new hardware or whenever a reinstallation of the node is scheduled. Verify that your local fabric management and monitoring are aware of the path changes that come with the improved structure of EMI-1.
• Storage and catalogue services that are in operation should not move to UMD in the near future. If taking releases directly from the providers, check with them for advice.
• gLite-3.2 clients for workload management and data management are compatible with the UMD versions of the services. Some problems have been spotted on the EMI-1 WN and UI related to SAM tests and lcg_util clients, and more experiment usage and feedback is needed. Given the sensitivity to correct library and binary paths and the complexity of the configuration, the changes to those locations in EMI-1 might have an impact. Until it has been verified that there are no problems, sites should stay with the gLite-3.2 WN and UI.
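The migration advice above is essentially a small decision procedure. The Python sketch below encodes one reading of it. `migration_advice`, the `Advice` values and the service groupings are hypothetical labels chosen for illustration, not an EGI or WLCG tool, and the `umd_validated` flag stands for "has passed UMD validation / staged rollout". Where the advice is silent (a new service whose UMD version is not yet validated), the sketch assumes waiting.

```python
from enum import Enum


class Advice(Enum):
    """Possible outcomes, paraphrasing the WLCG baseline advice (illustrative only)."""
    HOLD = "do not move to UMD yet"
    UMD_NOW = "use UMD/EMI-1 now"
    WAIT_VALIDATION = "wait for the service to pass UMD validation"
    WHEN_CONVENIENT = "migrate at a convenient time, e.g. on new hardware"


# Hypothetical groupings; a site would list the services it actually runs.
STORAGE_AND_CATALOGUE = {"DPM", "dCache", "StoRM", "LFC"}
CLIENTS = {"WN", "UI"}


def migration_advice(service: str, current: str, umd_validated: bool) -> Advice:
    """Suggest a next step for `service`, currently on 'gLite-3.1', 'gLite-3.2' or 'new'."""
    if service in STORAGE_AND_CATALOGUE and current != "new":
        return Advice.HOLD  # storage/catalogues in operation stay put for now
    if service in CLIENTS:
        return Advice.HOLD  # keep the gLite-3.2 WN and UI until WLCG verification
    if current == "gLite-3.2":
        return Advice.WHEN_CONVENIENT
    if current in ("gLite-3.1", "new"):
        # gLite 3.1 goes straight to UMD; new services use UMD once staged rollout
        # is complete. Waiting otherwise is an assumption for new services.
        return Advice.UMD_NOW if umd_validated else Advice.WAIT_VALIDATION
    raise ValueError(f"unknown current release: {current!r}")


for svc, rel, ok in [("WMS", "gLite-3.1", False), ("CREAM", "gLite-3.2", True),
                     ("DPM", "gLite-3.2", True), ("WN", "gLite-3.2", True)]:
    print(f"{svc} ({rel}): {migration_advice(svc, rel, ok).value}")
```

The order of the checks is the design choice that matters: the storage/catalogue and WN/UI rules come first because they override the generic gLite 3.1 and 3.2 rules.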
UK EMI/UMD 1.0 deployment progress so far
• ARGUS: Brunel; ECDF (soon)
• CE: Brunel; Lancs (soon); Shef (soon); Glasgow (in test); Ox (in test); RALPP (UMD 1.1)
• WMS: IC (test)
• DPM: ECDF (soon); Glas (test)
• Storm: QMUL (1.7.0/1.7.1)
• Cluster publisher: RALPP (UMD 1.1)
• ARC CE: Glas (test)
Do we have comments from these sites on their experiences?
Staged rollout: https://www.egi.eu/earlyAdopters/table
Global CREAM 6th Sept
What next?
• “We’ll move when the benefits outweigh the risks”
• “What are the plans for the experiment nodes?”
• “Will there be access to installation and maintenance recipes from the developers?”
• We “… understand the UMD WN has problems!”
• The site plans to “virtualize additional grid services”
• Example questions:
  • What is the main driver for sites?
  • When will the SL6 middleware be released?
  • What currently ‘stops’ us tracking the baseline?
  • When is the best time to transition? (Accounting is now continuous.)
  • Can we really plan in any detail when EMI/LCG/experiment plans are not clear!?
Summary & ‘strategy’
• Confirm UMD validations
• Where possible, install all new hardware with UMD
• Recheck experiment plans
• Storage and catalogues: test, but do not move existing services yet. Upgrades to 1.8 should happen. Storage is largely decoupled.
• Stay with the gLite-3.2 WN and UI until WLCG verification. Migrate resources in stages.
• Those involved with early adoption lead the migration
[Diagram of factors to balance: current spread; EOL of gLite; stability of releases; SL4 -> SL5 -> SL6; experience with UMD (experiments & sites); keeping the site available]