Learn how CERN's computer center manages offline farms, Linux & NT technology, performance issues, online farms, and cost considerations. Explore case studies like Nomad & NA49 farms, SGI Challenge with FDDI & HP, and PCSF simulation goals and milestones. Understand Linux issues, PCSF configuration, applications, and scalability. Discover server configurations, work solutions, and applications on PCSF including ATLAS Dice simulation, CMS reconstruction, and event filtering. Delve into Unix RFIO servers, Event Builder, ATLAS usage, and installation processes at CERN.
PC Farms at CERN Frédéric Hemmer CERN-IT/PDP
Disclaimer • This talk covers farms that involve CERN’s computer center. • There are other farms in strictly online environments, as well as “private” farms in buildings.
Overview • Offline farms • Linux farms • NT farms • Issues • PC Technology & Performance • Online farms & quasi-online farms • Cost of ownership • Conclusions
Linux Farms - NOMAD • Proof of concept in Summer 97 • Straight NQS port • SHIFT SW client port • CERNLIB port • NOMAD observed quasi-linear scaling with clock frequency relative to Alphas! • i.e. Alpha @ 266 MHz = PII @ 266 MHz • Now 17 dual PCs, 3 types of motherboard
Linux Farms - NA49 • NA49 had already privately deployed a PC farm on their premises • Requested a new farm to be deployed, in order to benefit from the computer center infrastructure (people and equipment …), in 1H98 • Trivial deployment, running with NQS • Most PCs are branded PCs (HP) • Now completely off RISC for CPU • 18 duals @ 300->400 MHz
NA49 Analysis - data access [Diagram. Labels: SGI Challenge; FDDI; HP K260 (x5); PC (x6); CORE; Tape Servers; Unix Server (x3); HiPPI; 600 GB, 1 Run; 100BT; From experiment: 10-12 TB / month, 1 month/year; Manual Feed; 100 GB Cartridges; SONY DMS]
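As a rough aid to reading the "10-12 TB / month" figure on the NA49 data-access diagram, the sketch below converts it into an average sustained rate and a cartridge count. The 30-day month and the decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) are assumptions made for illustration, and the function names are hypothetical; none of this comes from the slides.

```python
# Back-of-the-envelope conversion of a monthly data volume.
# Assumptions (not from the slides): 30-day month, decimal units.
SECONDS_PER_MONTH = 30 * 24 * 3600


def average_rate_mb_per_s(tb_per_month: float) -> float:
    """Average sustained rate (MB/s) needed to move tb_per_month in one month."""
    return tb_per_month * 1e12 / SECONDS_PER_MONTH / 1e6


def cartridges_per_month(tb_per_month: float, cartridge_gb: float = 100) -> float:
    """Number of cartridges of cartridge_gb needed to hold tb_per_month."""
    return tb_per_month * 1000 / cartridge_gb


for volume_tb in (10, 12):
    rate = average_rate_mb_per_s(volume_tb)
    tapes = cartridges_per_month(volume_tb)
    print(f"{volume_tb} TB/month: about {rate:.2f} MB/s average, {tapes:.0f} cartridges")
```

Under these assumptions the average works out to roughly 4 to 5 MB/s, and to 100 to 120 cartridges of 100 GB for the month of data taking.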
Linux Farms - NA48 • NA48 was using the QSW CS/2 (128 processors) • CS/2 overload -> PCs investigated in late 97 • Installation of 12 dual machines in 1Q98, and more ...
Linux Issues • EEPRO 100 B MP crashes • AFS support (MP) • NFS support (MP) • Commercial software • Manufacturer support for Linux • Very few Linux experts
NT offline Farms • PCSF • Simulation facility but … • COMPASS • Evaluating & benchmarking technology
PCSF - Overview • Configuration • Applications • Data access • Specific work & solutions • Key issues • Conclusions
PCSF - Goals • Make PC+NT a standard option for Physics Data Processing, starting with simulation • Establish a minimum management model for NT farm management • Address scalability issues • Gain Windows NT experience
PCSF Milestones • Joined RD47 in Autumn 96 • Price inquiry issued in 12/96 • Hardware delivered 4/97 • Ready to use 6/97 • RD47 report 10/97 • Expansion 5/98
PCSF Configuration (1) • Server running NT 4.0 Server SP3 • 1 dual-capable PPro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues. • Server running NT Terminal Server Beta 2 • 1 dual PPro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access • Servers running NT 4.0 Workstation SP3 • 9 dual PPros @ 200 MHz, 64 MB, 2*4 GB • 25 dual PIIs @ 300 MHz, 128 MB, 2*4 GB • All equipped with boot PROMs
PCSF Configuration (2) • Machines interconnected with four 3Com 3000 100BaseT switches • Display/Keyboard/Mouse connected to a Raritan multiplexer • PC Duo for remote admin access (there were problems with other products) • All running LSF 3.0 (LSF 3.2 does not work; support is weak) • Completely integrated with NICE
Applications on PCSF • ATLAS Dice simulation • NA45 1996 reconstruction • CMS reconstruction with Objectivity being tested • LHCb simulation code ready • ATLAS reconstruction being ported • ATLAS/Marseille event filter prototype scalability tests
Data access [Diagram. Labels: Unix RFIO Server (x4); Network; NT PC (x6); RFIO; Unix Tape Server; stagexxx commands]
ATLAS Level 3 DAQ [Diagram. Labels: Readout Buffers; Event Builder; 1 GB/s; SFI (x3); Processor Farm; Storage (100 MB/s)]
ATLAS Event Filter • Testbed for evaluating algorithms & sizing • Architecture & simulation studies • Monitoring, system management, feedback, etc. • Interface prototypes (SFI, SFO) • Timescale: prototype -1 (i.e. end 98) • Status: sizing of an initial farm
PCSF Usage
Specific work so far • Installation (Remote Boot, Winstall, NICE replicas, Install Server) • User codes, CERNLIB, SHIFT • Job Starter • PC MGR • WNTS • Web Interface
Installation • Disk cloning + change SID: fastest method, but not very automated • Remote boot • Remote boot install procedures with virtual disk • Use unattended setup, installs Winstall and other things • Third-party packages installed through Winstall • Boot PROM support on some hardware
Porting • Usually porting code from Unix to NT is easy (NA45 code ported in 1 week) • Usually porting the production environment from Unix to NT is difficult (shell scripts) • Porting the build environment is difficult; better to use native tools (Dev Studio) • Mixing Unix and NT build environments, revision control, etc.
Jobstarter • Initially inherited from Unix LSF CERN JobStarter • Rewritten in C++, using PcMgrSvc for drive mapping • Checks execution preconditions • Cleans up at normal and abnormal job end • Kills popup dialog windows (Excel & WinZip in batch)
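The "check preconditions, run, clean up on normal and abnormal end" pattern from the Jobstarter slide can be illustrated with a minimal sketch. This is not the CERN JobStarter (which the slide says was written in C++ and used PcMgrSvc for drive mapping); it is a hypothetical Python wrapper, and the function name `run_job`, the choice of precondition, and the scratch-directory handling are all assumptions made for illustration.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_job(command):
    """Hypothetical job wrapper: check a precondition, run the job in a
    private scratch directory, and clean up however the job ends."""
    # Precondition (illustrative only): the executable must be found.
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"executable not found: {command[0]}")

    scratch = Path(tempfile.mkdtemp(prefix="job_"))
    try:
        # The job's exit code is passed back to the caller unchanged.
        return subprocess.run(command, cwd=scratch).returncode
    finally:
        # Runs on normal exit, non-zero exit, and exceptions alike.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    print(run_job([sys.executable, "-c", "print('hello from the job')"]))
```

Putting the cleanup in a `finally` clause is what makes the normal and abnormal cases share one code path; a real batch wrapper would also need timeouts and signal handling, which are left out here.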
PcMgrSvc/Ctl • Checks • Status of monitored processes/services • Amount of scratch space • Drive mapping(s) • Map/Unmap drives • Sync. with time servers • Generate alarms on request • Gets all parameters from registry
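One of the checks listed for PcMgrSvc, the amount of scratch space, might look roughly like the sketch below. It is an illustration only: the real service is not shown in the slides, the `PARAMS` dictionary is a hypothetical stand-in for the registry parameters the slide mentions, and the threshold value is invented.

```python
import shutil
import tempfile

# Hypothetical parameters; the slide says the real service reads its
# parameters from the registry. The threshold here is an invented example.
PARAMS = {
    "scratch_path": tempfile.gettempdir(),
    "min_free_bytes": 100 * 1024**2,
}


def check_scratch(params):
    """Return a list of alarm messages; the list is empty when the check passes."""
    free = shutil.disk_usage(params["scratch_path"]).free
    if free < params["min_free_bytes"]:
        return [f"scratch space low on {params['scratch_path']}: {free} bytes free"]
    return []


if __name__ == "__main__":
    for alarm in check_scratch(PARAMS):
        print("ALARM:", alarm)
```

Returning the alarms as data rather than printing them directly keeps the check separate from whatever alarm channel is used.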
Web Interface • As a solution to • Remote access from outside CERN • Access from non-NT hosts • Implemented as ASPs with VB • Requires IIS on the server
Web Interface - authentication
Web Interface - Overview
Web Interface - bjobs
Web interface - bjobs result
Windows NT Terminal Server
Next Steps • Finish and understand remote boot issues • Complete remote boot - remote install • AFS Integration • Build up resilience • Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives • Investigate whether WSH is an alternative • Investigate NT’s I/O capabilities
Key Issues • AFS access • LSF support • Boot proms, equipment interoperability • CODE reintegration (Physics & CERNLIB) • Think Windows • Scalability & Management (home grown solution vs. commercial apps.) • Remote & external access
PC with NT • PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing • Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed • Scalability has started to be addressed, but the relatively small number of nodes does not help here • Considerable NT experience has been gained
Issues so far • Linux • EEPRO 100 B MP support • Commercial software • Manufacturer support • Very few local Linux experts • NT • AFS access • LSF support • Think Windows • Remote and external access • PC • Interoperability (cards/MB combinations) • Remote Boot support
PC Technology evolution in 97 • Pentium Pro -> Pentium II • 50 % raw performance increase • but 50 % cache performance reduction • SEC -> new motherboards • 440 FX -> 440 LX (SDRAM, AGP) • Recent MBs -> embedded SCSI, E’net, VGA • 100 Mbit E’net switches standard, 1000 Mbit arriving
PC Technology evolution in 98 • Pentium II @ 300 MHz -> Pentium Xeon @ 450 MHz • MP support • 50 % cache performance increase • Slot 2 -> new motherboards • 440 LX -> 440 BX, 440 NX (100 MHz, EDO) • Recent MBs -> no longer available through Intel, TYAN • 1000 Mbit/s E’net switches standard, >> 1000 Mbit/s arriving
Racking evolution (photos: 1998, 1997)
Fast Ethernet Switches (Oct. 98)
At the back of Fast Ethernet Switches (Oct. 98)
Gigabit Ethernet Switches
Network performance: Results • PCs interconnected through 100BaseT 3Com 3000 switch • Repeated with other H/W • Half duplex behavior • Block size does not matter • Linux uses less CPU than NT • Good unidirectional performance • Disappointing CPU consumption on NT • Disappointing bi-directional performance
PC to PC Network performance
Network performance: issues • Unexplained 0.5 MB/s observed with some eepro100 versions on PCRD hardware, but OK on PCSF • Recent DEC E'net boards with chipset > 21140 give poor performance on Linux • Surprising results PC/Alpha
PC/Alpha Network performance
PC High Performance Networking • HiPPI (5/98): PII, 300 MHz, 440LX, SDRAM, Roadrunner to SGI O2000, 4 CPU, IRIX 6.4 • Transmit: 50 MB/s • Receive: 50 MB/s (53 MB/s with SMP) • Gigabit Ethernet (10/98): PII, 400 MHz, 440 BX, 100 MHz SDRAM, PCI 32/33, Tigon I • 1500 bytes/packet: 28 MB/s, 40% CPU • 9000 bytes/packet: 90 MB/s, 90% CPU
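The two Gigabit Ethernet measurements can be compared more directly by deriving the packet rate and the throughput obtained per percent of CPU. The sketch below uses only the figures quoted on the slide; treating 1 MB as 10**6 bytes is an assumption, and the helper names are hypothetical.

```python
# Gigabit Ethernet figures quoted on the slide:
# packet size in bytes -> (throughput in MB/s, CPU load in percent).
# Assumption: 1 MB = 10**6 bytes.
MEASUREMENTS = {1500: (28, 40), 9000: (90, 90)}


def packets_per_second(mb_per_s, packet_bytes):
    """Packet rate implied by a throughput and a packet size."""
    return mb_per_s * 1e6 / packet_bytes


def mb_per_s_per_cpu_percent(mb_per_s, cpu_percent):
    """Throughput obtained per percent of CPU consumed."""
    return mb_per_s / cpu_percent


for size, (rate, cpu) in MEASUREMENTS.items():
    pps = packets_per_second(rate, size)
    eff = mb_per_s_per_cpu_percent(rate, cpu)
    print(f"{size} bytes/packet: {pps:.0f} packets/s, {eff:.2f} MB/s per CPU %")
```

Under that assumption, the 9000-byte case moves about three times the data with roughly half the packets per second (about 10,000 against about 18,700), and delivers 1.0 MB/s per CPU percent against 0.7.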
Disk performance • PCs connected to SEAGATE ST19171W using two Adaptec 2940 UW • NT needs a lot of tuning (default behavior is to swap data out!) • Block size, BIOS settings, EDO/FPM do not matter • Poor performance • Windows NT even worse • Memory bandwidth is suspected
Disk performance • Striping has no effect • 1 stream, 2 stripes: 21 MB/s (22 max) • 1 stream, 3 stripes: 21 MB/s (33 max)
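The striping figures are easier to compare as a fraction of the quoted maximum. This small sketch uses only the numbers on the slide; the `efficiency` helper is a hypothetical name introduced for illustration.

```python
# Striping results quoted on the slide:
# (number of stripes, measured MB/s, quoted maximum MB/s).
RESULTS = [(2, 21, 22), (3, 21, 33)]


def efficiency(measured, maximum):
    """Measured throughput as a fraction of the quoted maximum."""
    return measured / maximum


for stripes, measured, maximum in RESULTS:
    share = efficiency(measured, maximum)
    print(f"{stripes} stripes: {measured} of {maximum} MB/s ({share:.0%} of max)")
```

With two stripes the measurement sits at about 95% of the quoted maximum, while with three it drops to about 64%: the third stripe adds no throughput.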
Disk performance: issues • Memory bandwidth suspected • Need to test with LX/SDRAM, BX/SDRAM @ 100 MHz • RISC PCI does not support a variety of boards • Combined disk/network performance even worse: 5-6 MB/s on Linux
Memory bandwidth (lmbench)