Oxford PP Computing Site Report
HEPSYSMAN, 28th April 2003
Pete Gronbech
General Strategy
• Approx 200 Windows 2000 desktop PCs, with Exceed used to access the central Linux systems (see the session sketch below)
• Digital Unix and VMS phased out for general use
• Red Hat Linux 7.3 is becoming the standard
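Desktop access works by running Exceed as a local X display server on the Windows PC, with the applications themselves running on the central Linux systems. A minimal sketch of such a session, assuming a hypothetical desktop name not taken from this report:

    # On a central Linux system, after logging in from a Windows 2000
    # desktop running Exceed as the local X server.
    # "desktop17" is a hypothetical PC name, not from the report.
    export DISPLAY=desktop17.physics.ox.ac.uk:0.0   # send X output to Exceed
    xterm &                                         # window opens on the PC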
Network Access
[Diagram: campus network. SuperJanet 4 at 2.4Gb/s; OUCS firewall; Campus Backbone Router linked to Backbone Edge Routers at 1Gb/s; departments attached at 100Mb/s; Physics sits behind its own firewall and the Physics Backbone Router at 100Mb/s.]
Physics Backbone Upgrade to Gigabit, Autumn 2002
[Diagram: Physics backbone after the upgrade. Linux, Windows 2000 and other servers connect to a central Gb/s switch at 1Gb/s; the Physics Backbone Router links Particle Physics, the Clarendon Lab, Astro, Atmos and Theory at 1Gb/s, with desktops attached at 100Mb/s.]
Autumn 2002
[Diagram: Particle Physics Linux systems, Autumn 2002. PBS batch farm of 4 dual 2.4GHz systems (RH7.3) behind pplxgen; general purpose systems pplx1, pplx2, pplxgen and the file server pplxfs1 (a mix of RH6.2 to RH7.3, with a 1Gb/s link); the CDF system morpheus (Fermi 7.3.1); DAQ systems for MINOS (ppminos1, ppminos2), SNO (pplx3), HARP (ppnt117), CRESST (ppcresst1, ppcresst2) and ATLAS (ppatlas1, atlassbc); and a grid development testbed (pptb01, pptb02, tblcfg, tbse01, tbce01, grid, pplxbatch, SAM testing, EDG UI), mostly RH6.2.]
Zero-D X-3i SCSI-IDE RAID with 12 × 160GB Maxtor drives, supplied by Compusys. This proved to be a disaster and was rejected in favour of bare SCSI disks mounted internally in our rack-mounted file server.
The Linux File Server: pplxfs1, with 8 × 146GB SCSI disks.
General Purpose Linux Server: pplxgen
pplxgen is a dual 2.2GHz Pentium 4 Xeon based system with 2GB RAM, running Red Hat 7.3. It was brought on line at the end of August 2002 to share the load with pplx2 as users migrated off al1 (the Digital Unix server).
The PP batch farm, running Red Hat 7.3 with OpenPBS, can be seen below pplxgen in the rack. This service became fully operational in February 2003.
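For illustration, a minimal OpenPBS job script of the kind users could submit to the farm; the job name, resource request and executable are assumptions, not taken from this report:

    #!/bin/sh
    # Hypothetical example job for the PP batch farm.
    #PBS -N myanalysis        # job name (assumed)
    #PBS -l nodes=1           # request a single node
    #PBS -j oe                # merge stdout and stderr into one file
    cd $PBS_O_WORKDIR         # run from the directory qsub was called in
    ./run_analysis            # hypothetical user executable

A script like this would be submitted with "qsub myjob.sh" and monitored with "qstat".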
February 2003
[Diagram: Linux systems, February 2003. CDF: morpheus (RH7.1), a new pplx1, cdfsam and the MATRIX Dell cluster (Fermi 7.3.1), linked at 1Gb/s; LHCb MC: tbgen01 plus nodes node1 to node9 (largely Fermi 7.3.1) with grid and pplxbatch; grid development testbed: pptb01, pptb02, tblcfg, tbse01, tbce01, tbwn01, tbwn02, an EDG UI and SAM testing (mostly RH6.2, some RH7.3).]
Grid development systems, including the EDG software testbed setup.
New Linux Systems
• Morpheus is an IBM x370: an 8-way SMP with 700MHz Xeons, 4GB RAM and 1TB of Fibre Channel disk
• Installed August 2001; purchased as part of a JIF grant for the CDF group
• Runs Red Hat 7.1
• Will use CDF software developed at Fermilab and here to process data from the CDF experiment
Tape backup is provided by a Qualstar TLS4480 tape robot with 80 slots and dual Sony AIT3 drives; each tape holds 100GB of data. Installed January 2002. NetVault software from BakBone, running on morpheus, is used to back up both the CDF and particle physics systems.
Second round of the CDF JIF tender: Dell cluster "MATRIX", 10 dual 2.4GHz P4 Xeon servers running Fermi Linux 7.3.1 and SCALI cluster software. Installed December 2002.
Approx 7.5TB of SCSI RAID 5 disk is attached to the master node; each shelf holds 14 × 146GB disks. The storage is shared with the worker nodes via NFS (a sketch of a possible export file follows). OpenPBS batch queuing software is used.
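The report does not give the actual export configuration; a minimal /etc/exports sketch on the master node might look like the following, where the mount points are assumptions (node1 to node9 appear in the February 2003 diagram):

    # /etc/exports on the matrix master node -- illustrative only;
    # the mount points are assumptions, not from the report.
    /raid/shelf1   node*(rw,no_root_squash)   # wildcard covers node1..node9
    /raid/shelf2   node*(rw,no_root_squash)

After editing, running "exportfs -ra" re-exports the file systems to the workers.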
Plenty of space in the second rack for expansion of the cluster.
LHCb Monte Carlo Setup
[Diagram: compute node, an 8-way 700MHz Xeon server running RH6.2 with OpenAFS and OpenPBS, behind a grid gateway running RH6.2 with Globus 1.1.3, OpenAFS and OpenPBS.]
The 8-way SMP has now been reloaded as an MS Windows Terminal Server, and LHCb MC jobs will be run on the new PP farm.
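As a hedged sketch of how a job would have reached the gateway's PBS jobmanager under Globus 1.1.x; the gateway hostname here is hypothetical, not from the report:

    # Illustrative only; "ppgrid" is a hypothetical gateway hostname.
    grid-proxy-init                       # create a GSI proxy credential
    globusrun -o -r ppgrid.physics.ox.ac.uk/jobmanager-pbs \
        '&(executable=/bin/hostname)'     # trivial RSL test job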
Problems
• IDE RAID proved to be unreliable and caused lots of downtime
• Problems with NAT (using iptables caused NFS problems and hangs); solved by dropping NAT and using real IP addresses for the PP farm (a sketch of the kind of rule involved follows this list)
• Trouble with ext3 journal errors
• Hackers…
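The NAT that was dropped would have been of roughly this form on the farm's gateway; the interface name is an assumption. Removing it took iptables out of the NFS path entirely:

    # Roughly the kind of 2.4-kernel masquerading setup the farm sat
    # behind; the outgoing interface name is an assumption.
    echo 1 > /proc/sys/net/ipv4/ip_forward                 # enable forwarding
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE   # NAT private farm IPs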
Problems
• Lack of manpower!
• Number of operating systems slowly reducing: Digital Unix and VMS very nearly gone, and NT4 practically eliminated
• Getting closer to standardising on RH7.3, especially as the EDG software is now heading that way
• Still finding it very hard to support laptops, but we now have a standard clone and recommend IBM laptops
• Would be good to have more time to concentrate on security… (see later talk)