This CASPUR site report, presented in Edinburgh in May 2004, covers updates on central computers (IBM SMP, HP SMP, Itanium-2 SMP, NEC SX-6i), storage news, network highlights, and projects for 2004.
CASPUR Site Report
Andrei Maslennikov, Lead - Systems
Edinburgh, May 2004
Contents
• Update on central computers
• Storage news
• Network highlights
• Projects 2004
A.Maslennikov - Edinburgh 2004
Central computers
• IBM SMP:
  - 3 frames with 80 POWER-4 CPUs at 1.1 GHz and 144 GB of RAM
  - 1 legacy frame with 64 POWER-3 CPUs at 375 MHz and 64 GB of RAM
  - AIX 5.2 ML2+, AFS and SGEEE on all nodes
  - Very stable, all CPUs are heavily used
  - Under lease until 2006; will probably upgrade to POWER-5 in 2005
• HP SMP:
  - 8 4-CPU ES45 nodes at 1.25 GHz, 64 GB of RAM and 1.2 TB of local FC disk
  - 6 legacy ES40 nodes at 500-600 MHz used for the BIOGRID project
  - Tru64 5.1a++ on all nodes, AFS + SGEEE on 5 standalone nodes
  - TruCluster on 9 nodes (AFS via Translator; powerful Solaris 9 gateway, memcache, modified SSH)
  - Requires a lot of attention, but very fast and fully used (mainly computational chemistry apps)
  - Arriving: 32-CPU EV7 node
• Itanium-2 SMP:
  - 1 single-CPU, 5 biprocessor and 1 quad-CPU node (900 MHz, 1 GHz and 1.5 GHz)
  - RH AS 3 on one node, all others run the CERN CEL3/AS3 build for ia64; AFS, SGEEE
• NEC SX-6i:
  - Single CPU, 4 GB RAM, 8 GFLOPS
  - Speedup of up to 10x against POWER4 for some apps; currently considering an SMP purchase
• Reference: several biprocessor Intel/32bit and AMD/64bit
Storage update
• AFS: 4 cells on site and 6 outside
  - OpenAFS 1.2.11 on Linux
  - Main servers: SuperMicro 2x2.8 GHz, June 2004: 6 TB (Infortrend SATA/FC)
  - Vice partitions on SGI XFS – only one XFS-related problem in 1.5 years
  - Standalone backup server on GigE, 84 GB/hour with 2 LTO2 drives
  - 3 cells have been running Heimdal KDC for 6 months
  - AFS-aware SSH 3.8p1 binary builds (GSSAPI, K5 or AFSpw login+token)
  - Linux / WXP Heimdal single sign-on and AFS homedir in one of the cells.
    Administration: ssh, but have just successfully tested AD (with help of INFN-Lecce)
  - Will soon be migrating INFN’s national cell to K5-MIT (cross-realm and Win issues)
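To give a rough sense of scale for the backup figure above, the sketch below estimates how long one full pass over the 6 TB of AFS server disk would take at the quoted 84 GB/hour. It assumes that the full capacity is occupied, that the aggregate rate is sustained throughout, and that 1 TB = 1000 GB. None of these is stated on the slide, and the function name is purely illustrative.

```python
def full_backup_hours(capacity_tb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to stream capacity_tb at a sustained rate.

    Assumes decimal units (1 TB = 1000 GB) and a constant aggregate rate.
    """
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return capacity_tb * 1000 / rate_gb_per_hour


# Figures from the slide: 6 TB of AFS server disk, 84 GB/hour with 2 LTO2 drives.
hours = full_backup_hours(6, 84)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions a full dump takes about 71 hours, roughly three days. The slide does not describe the actual backup schedule.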
Storage update - 2
• NFS (Mountain View Data):
  - In production for 1.5 years, very stable (runs off XFS, no crashes so far)
  - 2 SuperMicro 2x2.8 GHz, June 2004: 8 TB (Infortrend SATA/FC)
  - 0.5 TB under staging (5 TB archived)
• Digital Library services on GFS:
  - Science Server, Web of Science web services – heavy load
  - 3300 scientific journals, 2.5 million articles in full-text PDF, searchable DB
  - Needed for load balancing: shared filestore with locking
  - On Sistina GFS for 6 months, 3 SuperMicro 2-way servers, 16 TB (Infortrend SATA/FC)
  - EXT3 copy of everything (tape backup is too slow for this number of files)
Network highlights
• A plenitude of networks under control of a Clavister FW
  - Internal workplaces, training class, visitors' room – outgoing connectivity only
  - Internal and external DMZs, lab networks, internal DNS – quite complex
  - Private NAS GigE network outside the FW
  - FW is far from saturation
• Internet Exchange Point - NAMEX
  - About 20 big customers (Telecom, Tiscali, Albacom, mobile operators, industry)
  - Traffic: around 1 Gbit/s
• F-Root Name Server
  - Second in Europe after Madrid, first (and still the only one) in Italy
• IPv6
  - Active member of the 6NET project
  - CASPUR’s web site can be reached over IPv6
CASPUR: principal resources in 2004
(Diagram; labels recovered from the figure and grouped by type)
• Compute: IBM – 150 CPUs (375-1100 MHz); HP – 60 CPUs (667-1200 MHz); Itanium2 – 15 CPUs (0.9-1.5 GHz); NEC SX-6i
• Storage services: Digital Library 16 TB; NFS 8 TB; AFS 6 TB; TSM Backup; AFS Backup and Data Movers
• FC SAN: FC RAID systems 32 TB; FC tape systems 60 / 120 TB
• Networks: Internet; internal infrastructure; private NAS GigE; internal GigEs
Some activities in 2004
• Technology tracking (in collaboration with CERN and other centers) – 1 FTE
  - New storage devices
  - New software solutions in the field of storage
  - Excellent relationship with vendors; tested so far: more than 600 KUSD worth of hardware
• Data replication over WAN (in collaboration with ENEA and GARR) – 0.5 FTE
  - Several centers with identical data inside and outside RDBMS
  - Each center has to be fully autonomous but should be able to forward any new data to all other centers
  - Bidirectional DB and plain data exchanges, with possible mediation by the head organization
  - Data mirroring with a non-disruptive release scheme
• Staging IIa – 1 FTE (funded by CSP/Turin)
  - New version of Tape Dispatcher coming out (general clean-up, virtual tape library support)
  - Remote FC tape / libraries will be supported
• University “La Sapienza” – student accounts
  - Provide an account (space, personal web page, mail etc.) for each of the 150 000 students
  - In progress: active discussions with the Interdepartmental Computing Authority (CITICORD)