Tier1A Status
Martin Bly
28 April 2003
CPU Farm
• Older hardware:
  • 108 dual processors (450MHz, 600MHz and 1GHz)
  • 156 dual processor 1400MHz PIII
• Recent delivery:
  • 80 dual 2.66GHz P4 Xeon
  • 533MHz FSB, 2GB memory
• Next delivery expected in the summer
Operating Systems
• Red Hat 6.2 service will close in May.
• Red Hat 7.2 service has been in production for BaBar for 6 months.
• New Red Hat 7.3 service now available for LHC/other experiments.
• Increasing demand for security updates is becoming problematic.
Disk Farm (last year)
• 26 servers, each with 2 external RAID arrays; 1.7TB disk per server:
  • Excellent performance, well-balanced system
  • Problems with a bad batch of Maxtor drives: many failures and a high error rate. All 620 drives have now been replaced by Maxtor.
  • Still outstanding: the Accusys controller fails to eject bad drives from the RAID set.
Disk Farm (this year)
• Recent upgrade to disk farm:
  • 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
  • 12 Maxtor 200GB DiamondMax Plus 9 drives per array
• Not yet in production, and a few snags:
  • The originally tendered drive, the Maxtor Maxline Plus II, was found not to exist.
  • Infortrend array has a 2TB limit per RAID set, so some (10%) space is wasted!
• Contact Nick White (N.G.H.White@rl.ac.uk) for more info.
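The "10% wasted" figure can be roughly reproduced with a little arithmetic. The sketch below is illustrative only: it assumes each array is configured as a single RAID 5 set with one drive's worth of parity and uses decimal gigabytes, neither of which the slide states. The function name `wasted_fraction` is made up for this example.

```python
def wasted_fraction(drives, drive_gb, parity_drives, limit_gb):
    """Return usable capacity, capacity lost to the size limit, and the lost fraction.

    Assumption (not from the slide): one RAID set per array, with
    `parity_drives` drives' worth of capacity given over to parity.
    """
    usable_gb = (drives - parity_drives) * drive_gb
    wasted_gb = max(0, usable_gb - limit_gb)
    return usable_gb, wasted_gb, wasted_gb / usable_gb


# 12 x 200GB drives per array, 2TB (taken as 2000GB) limit per RAID set.
usable, wasted, frac = wasted_fraction(
    drives=12, drive_gb=200, parity_drives=1, limit_gb=2000
)
print(f"usable {usable} GB, wasted {wasted} GB ({frac:.1%})")
```

Under these assumptions about 200GB of each 2200GB array cannot be addressed, which is in line with the slide's figure. A different RAID level or a hot spare would change the numbers.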
New Projects
• Basic fabric performance monitoring (Ganglia)
• Resource CPU accounting (based on PBS accounting/MySQL)
• New CA in production
• New batch scheduler (Maui)
• Deploy new helpdesk (May)
Ganglia Monitoring
• Urgently needed: live performance and utilisation monitoring
• RAL Ganglia Monitoring (live)
• RAL Ganglia Monitoring (static)
• Scalable solution based on multicast
• Very rapidly deployable, with reasonable support on all Tier1A hardware
• See: http://ganglia.sourceforge.net/
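A minimal sketch of reading Ganglia data from a script. It assumes the usual gmond behaviour of sending the cluster state as XML to any client that connects on TCP port 8649 (the default; a site may configure it differently). The host names, metric values and the helpers `fetch_gmond_xml` and `load_per_cpu` are hypothetical and not taken from the RAL setup.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape gmond reports: CLUSTER > HOST > METRIC.
SAMPLE_XML = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="example-farm">
    <HOST NAME="node001">
      <METRIC NAME="load_one" VAL="2.0"/>
      <METRIC NAME="cpu_num" VAL="2"/>
    </HOST>
    <HOST NAME="node002">
      <METRIC NAME="load_one" VAL="0.5"/>
      <METRIC NAME="cpu_num" VAL="2"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect (port 8649 is an assumed default)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def load_per_cpu(xml_text):
    """Map each host name to its one-minute load divided by its CPU count."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = float(metrics["load_one"]) / int(metrics["cpu_num"])
    return result


print(load_per_cpu(SAMPLE_XML))
```

Against a live daemon the same parser would be called as `load_per_cpu(fetch_gmond_xml("some-node"))`, where `some-node` stands for any host running gmond.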
PBS Accounting Software
• Need to keep track of system CPU and disk usage.
• Home-grown PBS accounting package (Derek Ross):
  • Upload PBS and disk stats into MySQL
  • Process with Perl DBI script
  • Serve via Apache
  • http://www.gridpp.rl.ac.uk/stats
• Contact Derek (D.Ross@rl.ac.uk) for more info.
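The upload-then-query pipeline above can be sketched as follows. This is not the RAL package: the original uses MySQL and a Perl DBI script, whereas this sketch uses Python with an in-memory SQLite database as a stand-in so that it runs anywhere. It assumes the common PBS accounting log layout of `timestamp;record type;job id;key=value ...`, with `E` records written when a job ends. The sample records, table layout and function names are all hypothetical.

```python
import sqlite3

# Hypothetical accounting records; only "E" (job end) lines carry resource usage.
SAMPLE_LOG = """\
04/28/2003 10:00:00;E;101.batch;user=alice group=babar resources_used.cput=01:30:00 resources_used.walltime=02:00:00
04/28/2003 11:00:00;E;102.batch;user=bob group=lhcb resources_used.cput=00:45:00 resources_used.walltime=01:00:00
04/28/2003 12:00:00;S;103.batch;user=alice group=babar
04/28/2003 13:00:00;E;104.batch;user=alice group=babar resources_used.cput=00:30:00 resources_used.walltime=00:40:00
"""


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_records(log_text):
    """Yield (job_id, username, grp, cput_s, walltime_s) for each job-end record."""
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _stamp, record_type, job_id, message = line.split(";", 3)
        if record_type != "E":
            continue
        # Assumes no spaces inside values, which holds for the fields used here.
        fields = dict(item.split("=", 1) for item in message.split())
        yield (
            job_id,
            fields["user"],
            fields["group"],
            hms_to_seconds(fields["resources_used.cput"]),
            hms_to_seconds(fields["resources_used.walltime"]),
        )


def cpu_seconds_by_group(log_text):
    """Load job-end records into a database and total CPU seconds per group."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, username TEXT, "
        "grp TEXT, cput INTEGER, walltime INTEGER)"
    )
    db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", parse_end_records(log_text))
    rows = db.execute("SELECT grp, SUM(cput) FROM jobs GROUP BY grp ORDER BY grp").fetchall()
    db.close()
    return dict(rows)


print(cpu_seconds_by_group(SAMPLE_LOG))
```

Keeping the raw per-job rows in the database, rather than only the totals, lets the web front end produce other breakdowns (per user, per day) with further queries.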
Maui/PBS
• Maui scheduler has been in production for the last 3 months.
• Allows extremely flexible scheduling with many features. But...
  • Not all of it works; we have done much work with the developers on fixes.
  • Major problem: Maui schedules on wall clock time, not CPU time. Had to bodge it!!
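To illustrate why the wall clock versus CPU time distinction matters, the sketch below measures both for a task that mostly waits and for a task that mostly computes. It only demonstrates the two quantities. It does not show how Maui accounts for jobs or how the workaround mentioned above was done, and the function names are made up for the example.

```python
import time


def measure(task):
    """Run task() and return (wall clock seconds, CPU seconds) it consumed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting_task():
    """Stand-in for a job blocked on I/O: uses wall clock time but almost no CPU."""
    time.sleep(0.2)


def computing_task():
    """Stand-in for a CPU-bound job: wall clock and CPU time are roughly equal."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


idle_wall, idle_cpu = measure(waiting_task)
busy_wall, busy_cpu = measure(computing_task)
print(f"waiting:   wall {idle_wall:.2f}s, cpu {idle_cpu:.2f}s")
print(f"computing: wall {busy_wall:.2f}s, cpu {busy_cpu:.2f}s")
```

A job that spends much of its life waiting on disk or network reaches a wall clock limit long before it has used the same amount of CPU time, so limits and fair-share targets expressed in CPU time do not map directly onto a scheduler that counts wall clock time.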
New Helpdesk Software
• Old helpdesk is mail-based and unfriendly.
• With additional staff, urgently need to deploy a new solution.
• Expect new system to be based on free software, probably Request Tracker.
• Hope that the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
• Expect deployment by end of May.
• http://requestracker.gridpp.rl.ac.uk/ (static)
Outstanding Issues/Worries
• We have to run many distinct services, for example Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG …
• Farm management is getting very complex. We need better tools and automation.
• Security is becoming a big concern again.