Tier 1 (Grid) Services Ian Collier GridPP Review June 20th 2012
Past Year
• EMI Updates
  • Migration off gLite to EMI-2
  • Formally engaged with Staged Rollout & Early Adopters process
• Virtualisation
  • (Nearly) all services on (Hyper-V) virtualised platform
  • Much easier to set up & manage than a collection of bare-metal servers
  • Notably quick recovery after power events
• CVMFS Stratum 0 for non-LHC VOs
  • Actively used now
  • Responses have been very positive
  • WeNMR the latest, enthusiastic users
Operational Issues
• Batch job start rates
  • Limited
  • Have been testing alternatives to Torque/Maui
    • Condor & SLURM frontrunners
    • Condor looking very good
    • Have been hitting scaling limits with SLURM
  • As a side effect, also looking at ARC CE
  • Next step: test with half of the retiring 2007 WNs on SL6 with new Condor & ARC CE
• (CVMFS) Job timeout failures
  • Low but persistent rate (~5%, varying)
  • Have been testing 2.1.x client
    • Found much worse problems
  • Investigation continuing
Coming Year
• Continue Updates
  • Starting on EMI-3
  • Further Staged Rollout & Early Adoption
  • Complete SL6 migrations
• Virtualisation
  • Shared storage just coming on-line
  • Investigations to make full use of it
    • Replication between buildings, etc.
  • Distribute services
    • Between R89 & Atlas ‘outpost’ as it develops
    • i.e. BDIIs, FTS, CEs, etc., spread between 2 buildings
• CVMFS Stratum 0
  • Erasmus project to build a web interface for software upload
  • Negotiating for sites to replicate
  • Reference architecture may be different from WLCG
  • EGI have picked coordinating network of repositories & replicas
    • Nikhef & OSG, maybe CERN
Configuration Management
• Quattor working well
  • Although we benefit from QWG, we could do so more
  • Made some ‘expedient’ choices early on; ready to revisit now
  • Quattor community more active recently
    • No longer held back by backward compatibility for CERN
• Migration to Aquilon
  • Opportunity to refactor
  • Will allow more automation
  • Will improve workflows
• Of course, track other activities & developments
Cloud
• SCD Cloud
  • Concept well proven
  • ~300 cores, 90-95% use
  • Adding half of the 2007 WNs
  • Member of staff (not rotating graduate) in plan
• Storage
  • Have small Ceph cluster to deploy
    • Image store
    • Object (S3) store service
• Active use cases:
  • Internal (Tier1 & SCT) development & testbeds
  • High level of user trust
• Developing use cases
  • Other users in STFC (ISIS, RAL Space)
  • EGI, GridPP & WLCG Cloud work
Looking to the Future
Starting to think about:
• Post GridPP4
• Cloud is great for ‘disposable’ resources
  • What would it take for us to consider it solid enough for the services now on Hyper-V?
  • What about a cloud layer (& interface) in the batch farm?