Tier1A Status Andrew Sansum 30 January 2003
Overview • Systems • Staff • Projects
Lots of Services • TESTBEDS • CPU FARM • CDF • Babar Suns • AFS • DISK FARM • Datastore • Core Services • Support Systems
Lots of Operating Systems • Production Farm: Redhat 6.2 (close to end of life), Redhat 7.2 (in production / Babar), Redhat 7.3 (close to trial service, for LHC) • CDF Service: Redhat 7.1 (Kerberised Fermi distribution), Redhat 7.3 (possible future release) • Solaris Service: Solaris 2.6 / Solaris 8 • EDG Testbed(s): Redhat 6.2 -> Redhat 7.3
Lots of EDG Testbeds! • Production Testbed (CE, SE, 3*WN+NM) • Development Testbed (CE, SE, 1*WN) • RGMA Testbed (CE, SE, WN and RB) • WP5 SE • WP3/WP5 development systems • EDG UI • CE for Redhat 7.2 service
Lots of Grid Testbeds! • Babar • Tier1A • SAMGrid
New Hardware • Disk: expect 40TB; continue with existing IDE technology, but from a different manufacturer • CPU: expect 100 CPUs; move to Pentium 4 or possibly AMD
Some New Staff • Users • Experiment Support Staff (RAL and elsewhere) • GridPP Staff: Traylen, Radden, Bly • ESC/PPD System Staff: Wheeler, White, Sansum, Saunders, Ross, Folkes, Strong • Management: Kelsey, Gordon, Sansum, ... • BITD Support: Networking, Operations, User Reg, AFS
Lots of New Projects • Basic fabric performance monitoring (ganglia) • Resource CPU accounting (based on PBS accounts/mysql) • New CA in production • New batch scheduler (MAUI) • Deploy new helpdesk (end March) • Network Performance tests (CERN/Bristol - also maybe WP7) • Get ready for LCG (February deployment?)
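As an illustration of the CPU accounting item above, here is a minimal sketch of how per-user CPU time might be totalled from PBS accounting records before loading into a database. It assumes the usual semicolon-separated PBS accounting line layout (timestamp; record type; job id; space-separated key=value attributes) and only reads job-end ("E") records. The sample log lines, user names and the function names are hypothetical, and the MySQL loading step is left out.

```python
from collections import defaultdict


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def cpu_seconds_by_user(lines):
    """Sum resources_used.cput per user over job-end ('E') records."""
    totals = defaultdict(int)
    for line in lines:
        # Layout assumed: timestamp;record_type;job_id;attributes
        fields = line.strip().split(";", 3)
        if len(fields) != 4 or fields[1] != "E":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[3].split() if "=" in item
        )
        if "user" in attrs and "resources_used.cput" in attrs:
            totals[attrs["user"]] += hms_to_seconds(attrs["resources_used.cput"])
    return dict(totals)


# Hypothetical sample records, not taken from a real log.
SAMPLE_LOG = [
    "01/30/2003 10:00:00;E;101.pbs.example;user=alice group=babar queue=prod "
    "resources_used.cput=01:00:00 resources_used.walltime=01:10:00",
    "01/30/2003 11:00:00;S;102.pbs.example;user=bob group=cdf queue=prod",
    "01/30/2003 12:00:00;E;102.pbs.example;user=bob group=cdf queue=prod "
    "resources_used.cput=00:30:15 resources_used.walltime=00:40:00",
    "01/30/2003 13:00:00;E;103.pbs.example;user=alice group=babar queue=prod "
    "resources_used.cput=00:00:45 resources_used.walltime=00:02:00",
]

if __name__ == "__main__":
    for user, seconds in sorted(cpu_seconds_by_user(SAMPLE_LOG).items()):
        print(f"{user}: {seconds} CPU seconds")
```

The same per-user totals could then be inserted into an accounting table, one row per user and reporting period.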
Ganglia Monitoring • Urgently needed live performance and utilisation monitoring • RAL Ganglia Monitoring (live) • RAL Ganglia Monitoring (Static) • Scalable solution based on multicast • Very rapidly deployable - reasonable support on all Tier1A Hardware • See: http://ganglia.sourceforge.net/
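For readers who want to pull numbers out of Ganglia themselves, the following is a rough sketch of reading the one-minute load per host from a gmond-style XML dump. It assumes the XML uses HOST elements containing METRIC elements with NAME and VAL attributes, which is the layout gmond is understood to emit; the sample document, cluster name and host names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed gmond XML layout.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
  <CLUSTER NAME="example-farm">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def load_per_host(xml_text, metric_name="load_one"):
    """Return {host name: metric value} for one numeric metric."""
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = float(metric.get("VAL"))
    return values


if __name__ == "__main__":
    for name, load in sorted(load_per_host(SAMPLE_XML).items()):
        print(f"{name}: load_one = {load}")
```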
New CA Deployed • Now fully deployed by E-Science Centre (Jens+Alastair Mills) • In use in UK core GRID • Several PP have RAs defined • Approved by EDG - not yet in distribution. • Once in EDG - termination date for old CA will be set.
New Scheduler (MAUI) • With Redhat 7.2 now using MAUI Scheduler over PBS • Some problems with MAUI scheduling on wallclock time - now corrected. • Testing algorithms, but essentially have a range of strategies we can apply. • Will make changes to queue structure in due course
New Helpdesk Software • Old helpdesk (Remedy) - mail based, unfriendly. • With additional staff, urgently need to deploy new solution. • Expect new system to be based on free software (Bugzilla, Request Tracker …) • Hope that deployed system will also meet needs of Testbed and Tier 2 sites. • Expect deployment by end of March.
Network Performance Tests • Simon Metson, Nick White, +…. • Preparing for CMS production. Must be able to move data to CERN at 100-200Mbit/second. • Currently aggregate 350Mbit/s to Bristol - but under 100Mbit/s to CERN. • Main problem seems to be within CMS infrastructure
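To put the rates above in perspective, this small sketch converts a sustained link rate into the time needed to move a given volume of data. The 1 TB dataset size is a hypothetical round figure chosen for illustration, not a CMS requirement from the slides, and the calculation ignores protocol overhead.

```python
def transfer_hours(terabytes, mbit_per_s):
    """Hours to move `terabytes` (decimal TB) at a sustained rate in Mbit/s."""
    bits = terabytes * 1e12 * 8          # 1 TB = 10^12 bytes, 8 bits per byte
    seconds = bits / (mbit_per_s * 1e6)  # 1 Mbit/s = 10^6 bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Rates quoted on the slide: the 100-200 Mbit/s target and 350 Mbit/s to Bristol.
    for rate in (100, 200, 350):
        print(f"{rate} Mbit/s: {transfer_hours(1.0, rate):.1f} hours per TB")
```

At the lower end of the target range, a single terabyte takes a little under a day to transfer, which is why sustained rates below 100 Mbit/s to CERN are a concern.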
Successes (2002) • Five additional staff online since January 2002. • Fully engaged in EDG testbed. Making an impact in EDG: Steve • Tier1A installation went very well in March/April/May • Tier A service ramp up excellent: • Most successful of the Tier A services. SLAC seem pleased - so far.
Challenges • Complete 2002/2003 tender/deployment • Carry out major EU tenders for 2003/2004 • Expand use of Tier 1 • Need to evolve strategy to cope with diversity of requirements • Deploy the LCG Testbed (What/When?) • Enhance automation / out of hours cover • Improve reporting to GridPP - accountability