
PC Farms at CERN



Presentation Transcript


  1. PC Farms at CERN Frédéric Hemmer CERN-IT/PDP

  2. Disclaimer • This covers farms that involve CERN’s computer center. • There are other farms in strictly online environments, or “private” farms in buildings.

  3. Overview • Offline farms • Linux farms • NT farms • Issues • PC technology & performance • Online farms & quasi-online farms • Cost of ownership • Conclusions

  4. Linux Farms - Nomad • Proof of concept in Summer 97 • Straight NQS port • SHIFT SW client port • CERNLIB port • NOMAD observed quasi-linear scaling with clock frequency compared to the Alphas!!! • I.e. Alpha @ 266 MHz = PII @ 266 MHz • Now 17 dual PCs, 3 types of MB
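The quasi-linear clock scaling NOMAD observed turns directly into back-of-envelope farm sizing. A minimal sketch: the 266 MHz parity point is from the slide, while the 300 MHz clock and the capacity unit are illustrative assumptions.

```python
# Back-of-envelope farm sizing, assuming the quasi-linear clock scaling
# NOMAD observed (Alpha @ 266 MHz ~ PII @ 266 MHz).  The 300 MHz clock
# below is an illustrative assumption, not a measurement from the talk.
def relative_speed(clock_mhz, baseline_mhz=266.0):
    """CPU throughput relative to a 266 MHz baseline, assuming linearity."""
    return clock_mhz / baseline_mhz

# 17 dual-CPU PCs at an assumed 300 MHz, in "266 MHz CPU" units:
farm_capacity = 17 * 2 * relative_speed(300)
print(round(farm_capacity, 1))
```

Under the linearity assumption, doubling the clock simply doubles the capacity estimate, which is why the farm could be grown by swapping in faster commodity CPUs.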

  5. Linux Farms - NA49 • NA49 had already deployed a private PC farm on their premises • Requested a new farm to be deployed in order to benefit from the computer center infrastructure (people and equipment …) in 1H98 • Trivial deployment, running with NQS • Most PCs are branded PCs (HP) • Now completely off RISC for CPU • 18 duals @ 300→400 MHz

  6. NA49 Analysis - data access [Diagram: an SGI Challenge and five HP K260 Unix servers linked by FDDI and HiPPI to CORE tape servers and a farm of PCs on 100BaseT; 10-12 TB/month from the experiment (1 month/year, 600 GB per run), manual feed of 100 GB cartridges in a SONY DMS]

  7. Linux Farms (NA48) • NA48 was using the QSW CS/2 (128 proc.) • CS/2 overload → investigated PCs in late 97 • Installation of 12 dual machines in 1Q98, and more ...

  8. Linux Issues • EEPro 100B crashes on MP machines • AFS support (MP) • NFS support (MP) • Commercial software • Manufacturer support for Linux • Very few Linux experts

  9. NT offline Farms • PCSF • Simulation facility but … • COMPASS • Evaluating & benchmarking technology

  10. PCSF - Overview • Configuration • Applications • Data access • Specific work & solutions • Key issues • Conclusions

  11. PCSF - Goals • Make PC+NT a standard option for Physics Data Processing, starting with simulation • Establish a minimal model for NT farm management • Address scalability issues • Gain Windows NT experience

  12. PCSF Milestones • Joined RD47 in Autumn 96 • Price inquiry issued in 12/96 • Hardware delivered 4/97 • Ready to use 6/97 • RD47 report 10/97 • Expansion 5/98

  13. PCSF Configuration (1) • Server running NT 4.0 Server SP3 • 1 dual-capable PPro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues. • Server running NT Terminal Server Beta 2 • 1 dual PPro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access • Servers running NT 4.0 Workstation SP3 • 9 dual PPros @ 200 MHz, 64 MB, 2*4 GB • 25 dual PIIs @ 300 MHz, 128 MB, 2*4 GB • All equipped with boot PROMs

  14. PCSF Configuration (2) • Machines interconnected with four 3Com 3000 100BaseT switches • Display/keyboard/mouse connected to a Raritan multiplexer • PC Duo for remote admin access → there were problems with other products • All running LSF 3.0 → LSF 3.2 does not work; support is weak • Completely integrated with NICE

  15. Applications on PCSF • ATLAS Dice simulation • NA45 1996 reconstruction • CMS reconstruction with Objectivity being tested • LHCB simulation code ready • ATLAS reconstruction being ported • ATLAS/Marseille event filter prototype scalability tests

  16. Data access [Diagram: NT PCs reach data over the network through RFIO, served by Unix RFIO servers and a Unix tape server driven by stagexxx commands]

  17. ATLAS Level 3 DAQ [Diagram: readout buffers feed an event builder at 1 GB/s; SFIs pass events to a processor farm, which writes to storage at 100 MB/s]

  18. ATLAS Event Filter • Testbed for evaluating algorithms & sizing • Architecture & simulation studies • Monitoring, system management, feedback, etc. • Interface prototypes (SFI, SFO) • Timescale: prototype -1 (i.e. end 98) • Status: sizing of an initial farm

  19. PCSF Usage


  21. Specific work so far • Installation (Remote Boot, Winstall, NICE replicas, Install Server) • User codes, CERNLIB, SHIFT • Job Starter • PC MGR • WNTS • Web Interface

  22. Installation • Disk cloning + change SID → fastest method, but not very automated • Remote boot • Remote boot install procedures with virtual disk • Use unattended setup, which installs Winstall and other things • Third-party packages installed through Winstall → boot PROM support only on some hardware

  23. Porting • Usually porting code from Unix to NT is easy (NA45 code ported in 1 week) • Usually porting the production environment from Unix to NT is difficult (shell scripts) • Porting the build environment is difficult; better to use native tools (Dev Studio) → mixing Unix and NT build environments, revision control, etc.

  24. Jobstarter • Initially inherited from the Unix LSF CERN JobStarter • Rewritten in C++, using PcMgrSvc for drive mapping • Checks execution preconditions • Cleans up on normal and abnormal job end • Kills popup dialog windows → Excel & Winzip in batch
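A rough sketch of what such a job starter does, here in Python with invented names (the real JobStarter was C++; drive mapping via PcMgrSvc and popup killing are not shown):

```python
# Hypothetical sketch of a JobStarter-style wrapper: check preconditions,
# run the job in its own scratch area, clean up on normal and abnormal end.
import os
import shutil
import subprocess
import sys
import tempfile

def start_job(cmd):
    """Run cmd in a private scratch directory and return its exit code."""
    # Precondition: the executable must be resolvable before we start.
    if shutil.which(cmd[0]) is None and not os.path.exists(cmd[0]):
        raise RuntimeError("cannot find executable: %s" % cmd[0])
    scratch = tempfile.mkdtemp(prefix="job_")
    try:
        return subprocess.run(cmd, cwd=scratch).returncode
    finally:
        # Cleanup happens whether the job ended normally or not.
        shutil.rmtree(scratch, ignore_errors=True)

print(start_job([sys.executable, "-c", "print('batch job ran')"]))
```

The try/finally structure is the essential point: batch nodes stay usable only if every job, successful or not, leaves no scratch files behind.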

  25. PcMgrSvc/Ctl • Checks • Status of monitored processes/services • Amount of scratch space • Drive mapping(s) • Maps/unmaps drives • Syncs with time servers • Generates alarms on request • Gets all parameters from the registry
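One of those checks sketched in Python (parameter names invented; the real service read its parameters from the NT registry and raised alarms through its own mechanism):

```python
# Minimal sketch of a PcMgrSvc-style scratch-space check.  The parameter
# dictionary stands in for the NT registry the real service used.
import shutil

PARAMS = {"scratch_path": ".", "min_free_bytes": 50 * 1024 ** 2}

def check_scratch(params=PARAMS):
    """Return an alarm message if scratch space is low, else None."""
    free = shutil.disk_usage(params["scratch_path"]).free
    if free < params["min_free_bytes"]:
        return "ALARM: %d bytes free on %s" % (free, params["scratch_path"])
    return None

print(check_scratch())
```

Keeping thresholds in external configuration rather than in the code is what let one service binary monitor every farm node with per-node settings.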

  26. Web Interface • As a solution to • Remote access from outside CERN • Access from non-NT hosts • Implemented as ASPs with VB • Requires IIS on the server

  27. Web Interface - authentication

  28. Web Interface - Overview

  29. Web Interface - bjobs

  30. Web Interface - bjobs result

  31. Windows NT Terminal Server

  32. Next Steps • Finish and understand remote boot issues • Complete remote boot - remote install • AFS Integration • Build up resilience • Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives • Investigate whether WSH is an alternative • Investigate NT’s I/O capabilities

  33. Key Issues • AFS access • LSF support • Boot PROMs, equipment interoperability • Code reintegration (Physics & CERNLIB) • Think Windows • Scalability & management (home-grown solution vs. commercial apps.) • Remote & external access

  34. PC with NT • PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing • Farm management is less of a concern after having built a few tools (alternatives would be SMS or TNG), but some work is still needed • Scalability has started to be addressed, but the relatively small number of nodes does not help here • Considerable NT experience has been gained

  35. Issues so far • Linux • EEPro 100B MP support • Commercial software • Manufacturer support • Very few local Linux experts • NT • AFS access • LSF support • Think Windows • Remote and external access • PC • Interoperability (card/MB combinations) • Remote Boot support

  36. PC Technology evolution in 97 • Pentium Pro → Pentium II • 50 % raw performance increase • but 50 % cache performance reduction • SEC → new motherboards • 440 FX → 440 LX (SDRAM, AGP) • Recent MBs → embedded SCSI, E’net, VGA • 100 Mbit E’net switches standard, 1000 Mbit arriving

  37. PC Technology evolution in 98 • Pentium II @ 300 MHz → Pentium Xeon @ 450 MHz • MP support • 50 % cache performance increase • Slot 2 → new motherboards • 440 LX → 440 BX, 440 NX (100 MHz, EDO) • Recent MBs → no longer available through Intel; TYAN • 1000 Mbit/s E’net switches standard, >> 1000 Mbit/s arriving

  38. Racking evolution: 1997 → 1998

  39. Fast Ethernet Switches (Oct. 98)

  40. At the back of Fast Ethernet Switches (Oct. 98)

  41. Gigabit Ethernet Switches

  42. Network performance: Results • PCs interconnected through a 100BaseT 3Com 3000 switch • Repeated with other H/W • Half-duplex behavior • Block size does not matter • Linux uses less CPU than NT → Good unidirectional performance → Disappointing CPU consumption on NT → Disappointing bi-directional performance

  43. PC to PC Network performance

  44. Network performance: issues • Unexplained 0.5 MB/s observed with some eepro100 versions on PCRD hardware, but OK on PCSF • Recent DEC E’net boards with chipsets newer than the 21140 give poor performance on Linux • Surprising PC/Alpha results

  45. PC/Alpha Network performance

  46. PC High Performance Networking • HiPPI (5/98): PII 300 MHz, 440LX, SDRAM, Roadrunner to SGI O2000 (4 CPUs, IRIX 6.4). Transmit: 50 MB/s; Receive: 50 MB/s (53 MB/s with SMP) • Gigabit Ethernet (10/98): PII 400 MHz, 440BX, 100 MHz SDRAM, PCI 32/33, Tigon I. 1500 bytes/packet: 28 MB/s, 40% CPU; 9000 bytes/packet: 90 MB/s, 90% CPU
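The Gigabit Ethernet figures above make the jumbo-frame benefit easy to quantify when restated as throughput per CPU percent:

```python
# Throughput per unit of CPU, from the Gigabit Ethernet numbers above:
# 28 MB/s at 40% CPU (1500-byte packets) vs 90 MB/s at 90% (9000-byte).
def mb_per_cpu_percent(mb_s, cpu_percent):
    return mb_s / cpu_percent

standard = mb_per_cpu_percent(28, 40)  # 0.70 MB/s per CPU percent
jumbo = mb_per_cpu_percent(90, 90)     # 1.00 MB/s per CPU percent
print(round(jumbo / standard, 2))      # jumbo frames ~1.43x more CPU-efficient
```

The larger frames amortize per-packet interrupt and protocol overhead over six times as many bytes, which is where the extra efficiency comes from.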

  47. Disk performance • PCs connected to a SEAGATE ST19171W using two Adaptec 2940UW controllers • NT needs a lot of tuning (the default behavior is to swap data out!) • Block size, BIOS settings, EDO/FPM do not matter → Poor performance → Windows NT even worse → Memory bandwidth is suspected

  48. Disk performance • Striping has no effect • 1 stream, 2 stripes: 21 MB/s (22 max) • 1 stream, 3 stripes: 21 MB/s (33 max)
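Reading the parenthesised maxima as the ideal stripe aggregate (about 11 MB/s per disk, an inference from the figures, not a number stated in the talk), the flat 21 MB/s measurement is the point:

```python
# Measured vs ideal striped throughput from the figures above.  The
# ~11 MB/s per-disk rate is inferred from the "(22 max)" / "(33 max)"
# parentheses; it is not stated in the talk.
PER_DISK = 11.0  # MB/s per stripe, inferred
for stripes, measured in [(2, 21.0), (3, 21.0)]:
    ideal = stripes * PER_DISK
    print("%d stripes: measured %.0f MB/s, ideal %.0f MB/s"
          % (stripes, measured, ideal))
```

The ideal aggregate keeps growing with stripe count while the measurement does not budge, which is exactly the signature of a bottleneck outside the disks (memory bandwidth was the suspect).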

  49. Disk performance: issues • Memory bandwidth suspected • Need to test with LX/SDRAM, BX SDRAM @ 100 MHz • RISC PCI does not support a variety of boards • Combined disk/network performance even worse: 5-6 MB/s on Linux

  50. Memory bandwidth (lmbench)
