CASPUR Site Report
Andrei Maslennikov, Group Leader - Systems
RAL, April 1999

To be covered briefly:
• Central computers
• Other nodes
• Network
• Distributed storage
• Tape-related systems
• CASPUR and HEP
• Gentes/Ateneo project
• Short-term plans
A.Maslennikov - HEPiX - RAL 99

Central computers
• Alpha SMP Cluster 4100 - 28 processors - DU 4.0d
  - interactive (front-end): 1 x 400 MHz/2 GB
  - parallel batch (LSF): 4 x 400 MHz/1 GB + 2 x 600 MHz/2 GB
  - 1999: 20 more EV6 processors (or an upgrade to EV6), 32-processor "Wildfire"?
• Sun SMP - 22 processors - Solaris 2.6
  - interactive + parallel batch (LSF): 1 x 3500/336 MHz/2 GB (8 processors)
  - parallel batch (LSF): 1 x 4500/336 MHz/3.6 GB (14 processors)
  - 1999: waiting for new SMP models
• IBM SP2 - 32 processors - AIX 4.3.2++/PSSP 2.4++
  - interactive: 4 thin nodes (390)
  - serial batch (LSF): 12 thin nodes
  - parallel batch + interactive (EASY): 16 thin nodes
  - 1999: waiting for SP3 offer (need SMP nodes with 4-16 processors)

Other nodes
• Some 200 UNIX nodes under our direct supervision (all UNIX flavours, single nodes and clusters).
• Around 100 PCs running Windows and Linux.
• Worth mentioning:
  - Linux Beowulf cluster (10 PPro 200 + 4 PII 400), running MPI with the GAMMA protocol on Digital FE cards
  - Graphics nodes: 2 Alpha 533au(2) with 4D51T and 4D60T cards with 64 MB of texture memory
  - 2 Power-3 dual-processor AIX nodes

Network
• In 1998 our LAN became fully switched; we currently have around 100 switched 100BaseT ports.
• Switch hardware: several Cabletron and Compaq switches interconnected via Gigabit Ethernet; we also use virtual LANs.
• Principal nodes are on FDDI (22 DEC GigaSwitch ports).
• Planning to try Gigabit Ethernet at host level; a few GE cards are already under test on Sun and Linux.

Distributed Storage
• A TCP/IP-less datastore with true data sharing across platforms is not yet available, so we are still investing in both NFS and AFS solutions.
• NFS is mainly used as a store for large data files, and as an element of the Staging System.
• AFS is used for home directories and as a store for collections of various ready-to-run software. We currently run 6 cells with some 300 GB online, also over WAN.

NFS: one more Filer
• Current NFS server: Network Appliance F540 Filer with 150 GB of formatted RAID space, on FE and FDDI.
• Just ordered: another Filer (F760/600 MHz/1 GB) with 300 GB of RAID disk and GE/FDDI network interfaces
  - 3 times the NFS ops/sec of the F540
  - allows for clustering (better scalability)

AFS: news since last report
• Purchased the AFS source code. This allowed us to compile AFS on Solaris/Intel (thanks to Rainer Toebbicke /CERN, who proved that this is possible).
• University of Rome-3 moved to Solaris/Intel for DB as well (3 servers).
• Abdus Salam Centre for Theoretical Physics joined our AFS license.
• Upgraded central servers (now 3 Alpha 500au on FE and FDDI). They have proved to be very stable and performant.
• We go Fibre Channel!
  - Just ordered 280 GB of RAID-5/FC from Artecon
  - Dual active-active controllers
  - Gadzoox hubs and HBAs from Genroco
  - This system will replace most of the on-site AFS disks.

Tape access
• During 1998, all services which use the tape robotics operated seamlessly: AFS and ADSM backups, staging.
• Some 80 GB were deeply archived via the Staging System.
• With the F540 Filer we stage at 4+ MB/sec, almost at the limit of the Timberline tape.
• In 1999 we plan to replace the STK Silo with a 9840 library:
  - doubles the tape speed
  - BABAR-compliant
  - smaller maintenance fees
  - frees physical space in the computer centre.

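
For a sense of scale, the staging figures on this slide can be turned into a rough transfer-time estimate. This is only a back-of-the-envelope sketch: the function name `transfer_hours` is made up for illustration, 1 GB is taken as 1024 MB, and the 8 MB/sec figure simply assumes that the rest of the staging chain keeps up with a tape drive that is twice as fast, which the slide does not claim.

```python
def transfer_hours(volume_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move volume_gb gigabytes at rate_mb_per_s megabytes/sec."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = volume_gb * 1024 / rate_mb_per_s  # assumption: 1 GB = 1024 MB
    return seconds / 3600


# Figures quoted on the slide: about 80 GB archived, staged at about 4 MB/sec.
current = transfer_hours(80, 4.0)
# Hypothetical: the same volume if the end-to-end rate doubled with the new drives.
doubled = transfer_hours(80, 8.0)

print(f"{current:.1f} h at 4 MB/sec, {doubled:.1f} h at 8 MB/sec")
```

Under these assumptions the whole archived volume corresponds to a few hours of continuous tape streaming, so sustained throughput rather than capacity is the quantity that the drive upgrade would change.
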
CASPUR and HEP
• Geographical AFS system support for INFN
• Regular ASIS mirroring over WAN to 17 INFN Sections across Italy
• Linux system support for INFN
  - Linux tree maintenance
  - AFS-enabled bootable Linux CDs at the latest patch level
• Software collaboration with CERN (ASIS, Linux, AFS)
• Regional Centre for BABAR: full-scale system support

Gentes/Ateneo project
• Scope: provide a turnkey computing environment for a generic research organization / university department.
• Fully Intel-based
• Desktop on Linux and/or WNT
• Just 4 Intel machines make up the core:
  - Entry Point Linux host with a firewall
  - AFS fileserver on Solaris
  - Management Linux host with YARD DBMS and https tools
  - General Services (mail, web, print, efax, ppp, majordomo etc.) on a single Linux (SMP) machine
• WNT/Linux AFS-based integration: single password, common filestore, YARD ODBC
• Client installation: cloning with Norton Ghost
• Progressing well. First presentation: June 1999.

Some short-term plans
• Compile AFS 3.5 Server on Solaris/Intel
  - will improve performance for en masse serving of small files
• Test FC on Linux (QLogic card)
  - first to provide RAID space for the mail spool
  - next to take a look at Global File System (with Seagate disks)
• Test FC on AIX
  - CASPUR will probably be asked to propose a set of high availability services for PCM; IBM DFS with FC RAID might make a good combination.
• Try LoadLeveler on Solaris
  - LSF is becoming too expensive (they charge per CPU)