Computer Hardware and Procurement at CERN



  1. Computer Hardware and Procurement at CERN Helge Meinhard (at) cern ch HEPiX fall 2005 @ SLAC

  2. Outline
     • Procedures
     • Hardware (being) procured
     • Power measurements
     • Observations
     HEPiX@SLAC: Hardware procurement at CERN

  3. Procedures

  4. Constraints (1)
     • CERN is an international organisation with strict administrative rules
     • Competitive tendering required covering (at least) member states
     • No way to avoid for commodity equipment
     • Lowest compliant bid wins
     • No negotiations about added value of higher offers

  5. Constraints (2)
     • Different procedures depending on expected volume
       • < 10’000 CHF: IT seeks 3 offers
       • < 200’000 CHF: Formal price enquiry by purchasing service. Four weeks response time
       • < 750’000 CHF: Formal call for tender preceded by market survey. Six weeks response time
       • > 750’000 CHF: As < 750’000 CHF, plus approval by CERN’s Finance Committee (5 sessions/year, papers ready two months in advance)
     (1 CHF = 0.78 USD = 0.65 EUR)
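
The volume thresholds on this slide amount to a small decision rule. A minimal sketch in Python, assuming the thresholds are strict upper bounds as written. The function name is hypothetical, and the slide does not say which procedure applies to a purchase of exactly 750’000 CHF; the sketch assigns it to the Finance Committee case as an assumption.

```python
def procurement_procedure(volume_chf: float) -> str:
    """Return the procedure for an expected purchase volume in CHF.

    Thresholds are taken from the slide; treating exactly 750'000 CHF
    as requiring Finance Committee approval is an assumption.
    """
    if volume_chf < 10_000:
        return "IT seeks 3 offers"
    if volume_chf < 200_000:
        return "formal price enquiry, 4 weeks response time"
    if volume_chf < 750_000:
        return "call for tender after market survey, 6 weeks response time"
    return "call for tender after market survey, plus Finance Committee approval"


if __name__ == "__main__":
    for volume in (5_000, 150_000, 600_000, 1_200_000):
        print(volume, "CHF:", procurement_procedure(volume))
```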

  6. Our problems
     • Procedures badly adapted to quickly evolving computing market
     • Difficult to give preference to “good”, reliable equipment

  7. Our choices (1)
     • For significant purchases (> 100 kCHF) we require (a) sample system(s)
       • with the tender for big tenders
       • on CERN’s request for small tenders
     • Tenders include 3 years on-site warranty for hardware
     • Typical requirements:
       • 4 working hours response / 12 working hours repair for critical machines
       • 3 working days response / 5 working days repair for farm nodes
     • Supplier can subcontract on-site warranty

  8. Our choices (2)
     • Payment within 30 days after provisional acceptance on receipt of bank guarantee of 5% of purchase sum valid until end of warranty period
     • Delivery within 6 weeks, penalty for late delivery: 2% of purchase sum per complete week, max. 10%
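
The late-delivery penalty and the bank guarantee are simple percentages of the purchase sum. A small sketch of the arithmetic, assuming lateness is counted in calendar days and only complete weeks count, as the slide states. The function names are hypothetical.

```python
def late_delivery_penalty(purchase_sum_chf: float, days_late: int) -> float:
    """Penalty: 2% of the purchase sum per complete week late, capped at 10%."""
    complete_weeks = max(days_late, 0) // 7
    percent = min(2 * complete_weeks, 10)
    return purchase_sum_chf * percent / 100


def bank_guarantee(purchase_sum_chf: float) -> float:
    """Bank guarantee of 5% of the purchase sum, held until end of warranty."""
    return purchase_sum_chf * 5 / 100


if __name__ == "__main__":
    # 20 days late is 2 complete weeks, so 4% of 500'000 CHF.
    print(late_delivery_penalty(500_000, 20))
    print(bank_guarantee(500_000))
```

With these rules the cap is reached after five complete weeks of delay; any further delay does not increase the penalty.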

  9. Our choices (3)
     • If more than 10% of systems fail during acceptance or during first month after: right to return the whole batch
     • If a system fails 3 or more times during any 6 months’ period, right to request complete replacement of system
     • If more than 20% of any component fail during any 6 months’ period, right to request complete replacement of this component across batch
     • If CERN adds third-party devices, no impact on warranty obligations for system as delivered
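
The three failure thresholds above can be expressed as checks on failure counts and dates. A hedged sketch, not CERN's actual tracking code: the function names are hypothetical, and approximating "any 6 months’ period" by a sliding window of 183 days is an assumption.

```python
from datetime import date, timedelta

# Assumption: "6 months" approximated as 183 days.
SIX_MONTHS = timedelta(days=183)


def may_return_batch(failed_systems: int, batch_size: int) -> bool:
    """More than 10% of systems failed during acceptance or the first month."""
    return failed_systems * 10 > batch_size


def may_replace_system(failure_dates: list[date],
                       window: timedelta = SIX_MONTHS) -> bool:
    """A system failed 3 or more times within some window-long period."""
    dates = sorted(failure_dates)
    # Three failures fall in one window if the 1st and 3rd of any
    # consecutive triple are no more than `window` apart.
    return any(dates[i + 2] - dates[i] <= window
               for i in range(len(dates) - 2))


def may_replace_component(failed_units: int, installed_units: int) -> bool:
    """More than 20% of one component type failed within the period."""
    return failed_units * 5 > installed_units
```

Integer comparisons are used for the percentages so that a batch with exactly 10% (or 20%) failures does not trigger the clause, matching the "more than" wording.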

  10. Our choices (4)
      • If justified by volume, procure from two suppliers (lowest and second-lowest compliant)
      • Better protection if one delivers crap or nothing at all
      • Better chance for companies to win an order
      • Increased workload on our part

  11. Example of a procurement
      • Procurement of equipment worth < 750 kCHF
      • Approval by Finance Committee not needed
      • Market survey already done
        • Market survey can cover different types of equipment
        • Valid for 1 year
        • If not done yet, add ~ 16 weeks

  12. Steps (1): typical case
      • Fix scope: 2 w
      • Write technical, commercial docs: 3 w
      • IT-internal review
      • Revise technical, commercial docs: 2 w
      • Specification meeting
      • Revise technical, commercial docs: 1 w
      • Tender out
      • Deadline for replies: 6 w
      • Opening of replies: 1 w
      (Total so far: 15 weeks, at best compressible to 12 weeks)

  13. Steps (2): typical case
      (Total from previous slide: 15 w, min. 12 w)
      • Technical analysis of replies: 1 w
      • Visual inspection, mounting: 1 w
      • Benchmarks, reports: 3 w
      • Technical clarifications: 1 w
      • Purchase request, order: 2 w
      • Delivery: 7 w
      • Preliminary acceptance: 6 w
      Total: 36 weeks, compressible to 30 weeks
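
The totals on the two Steps slides can be checked by adding up the per-step durations. A short sketch using only the week counts given on the slides; steps listed without a duration (IT-internal review, specification meeting, tender out) are left out, and the variable and function names are hypothetical. The optional 16 weeks is the market-survey allowance from the procurement example slide.

```python
# (step, duration in weeks) for the typical case, as listed on the slides.
STEPS_1 = [
    ("Fix scope", 2),
    ("Write technical, commercial docs", 3),
    ("Revise docs after IT-internal review", 2),
    ("Revise docs after specification meeting", 1),
    ("Deadline for replies", 6),
    ("Opening of replies", 1),
]
STEPS_2 = [
    ("Technical analysis of replies", 1),
    ("Visual inspection, mounting", 1),
    ("Benchmarks, reports", 3),
    ("Technical clarifications", 1),
    ("Purchase request, order", 2),
    ("Delivery", 7),
    ("Preliminary acceptance", 6),
]
MARKET_SURVEY_WEEKS = 16  # "add ~ 16 weeks" if no valid market survey exists


def total_weeks(market_survey_done: bool = True) -> int:
    """Sum the typical-case durations, optionally adding the market survey."""
    weeks = sum(w for _, w in STEPS_1) + sum(w for _, w in STEPS_2)
    if not market_survey_done:
        weeks += MARKET_SURVEY_WEEKS
    return weeks


if __name__ == "__main__":
    print(sum(w for _, w in STEPS_1))             # 15
    print(total_weeks())                          # 36
    print(total_weeks(market_survey_done=False))  # 52
```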

  14. Hardware (being) procured

  15. Objectives
      • Cover existing needs with as few different models and as few procurement procedures as possible
      • Closely follow technology and market evolution and satisfy requirements with modern hardware at low cost
      (These two objectives are in contradiction.)

  16. From CERN site report 2005/10/11: Fabric Infrastructure and Operations (1)
      • RedHat 7.3 phased out on public services
      • Campaign on storage nodes far advanced
      • New in machine room since Karlsruhe:
        • 200 farm PCs (dual Nocona): in production
        • 116 disk servers (> 5 TB usable each, total of 900 TB gross capacity): part in production, part under acceptance test
        • 112 “midrange servers”: under acceptance test
        • 32-node InfiniBand-based cluster for Theory
      • Refurbishment of machine room proceeding
      • LHS being populated, but power remains limited

  17. Hardware being procured (1)
      • Large volumes – several times < 750 kCHF per year
        • “Farm PCs” – non-redundant, cheap dual-processor work horses
        • “Disk servers” – storage-in-a-box systems with many SATA disks for streaming applications

  18. Hardware being procured (2)
      • Medium-size volumes – once < 750 kCHF per year or once or several times < 200 kCHF per year
        • “Midrange servers” – redundant building blocks for specific applications
        • “Tape servers” – midrange servers with an FC interface
        • “Disk arrays” – autonomous RAID units with FC uplinks
        • SAN infrastructure (most notably FC switches)
        • Head nodes for serial console infrastructure
        • “Small disk servers”, somewhere between disk servers and midrange servers
        • Miscellaneous

  19. Specifications: Farm PCs (1)
      • 2 boxed Intel Noconas of 2.8 GHz
      • Mainboard:
        • BMC (IPMI 1.5 or higher)
        • PXE, USB boot
        • BBS menu
        • Console redirection
        • Configurable to stay off on AC power loss
      • 2 GB ECC memory
        • From mainboard manuf. approved list
        • Upgradable to 4 GB without removing modules

  20. Specifications: Farm PCs (2)
      • 1 disk > 140 GB, IDE not permitted
      • Certified for 24/7, 3 y warranty by disk manuf.
      • 1 GigE providing PXE and IPMI access
      • 19” chassis max. 4 U, with rails
      • Power, reset button
      • Power, disk activity LED
      • Power supply supporting machine + 50 W
      • Active PFC
      • C13 to C14 LSZH power cord
      • Guaranteed to run under RHEL 3 (i386 and x86_64)
      • Delivery within 6 weeks from dispatch of order

  21. Specifications: Disk server (1)
      • 1 or 2 boxed Intel Xeon with EM64T
      • Mainboard as for Farm PCs
      • Now adding support for memory mirroring
      • Memory as for Farm PCs
      • General requirements for disks etc.
      • ≥ 7200 rpm, no EIDE, 3 y warranty, certified for 24/7 by manufacturer
      • Metallic hot-swap trays certified by chassis manuf.
      • Indicators for power and activity for each tray
      • PCB backplanes for disks, multilane cabling
      • “Intelligent” RAID controllers

  22. Specifications: Disk server (2)
      • System disks: 2 x ≥ 140 GB mirrored
      • Data disks: all identical
      • Redundant RAIDs with hot spares (min. 1/15)
      • Total usable capacity per system above 5 TB
      • Battery buffer if controller with active cache
      • 1 GigE providing required performance, PXE, IPMI access
      • 19” chassis rack-mountable with rails
      • Min. 40 TB usable in 42 U high rack
      • Power supply: N+1 redundant, active PFC
      • Guaranteed to run under RHEL 3 (i386 and x86_64)
      • Delivery within 6 weeks from dispatch of order
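
The rack-density requirement (min. 40 TB usable in a 42 U rack) bounds the chassis height once the usable capacity per system is known. A rough sketch of that arithmetic, assuming identical systems and that the whole rack is available to disk servers (no units reserved for switches or power distribution, which the slide does not specify). The function name is hypothetical.

```python
import math

RACK_HEIGHT_U = 42
MIN_USABLE_TB_PER_RACK = 40


def max_chassis_height_u(usable_tb_per_system: float) -> int:
    """Largest chassis height (in U) that still reaches 40 TB usable per rack."""
    systems_needed = math.ceil(MIN_USABLE_TB_PER_RACK / usable_tb_per_system)
    return RACK_HEIGHT_U // systems_needed


if __name__ == "__main__":
    for tb in (5.0, 6.0, 10.0):
        print(f"{tb} TB usable per system: at most {max_chassis_height_u(tb)} U")
```

Under these assumptions, a system offering 5 TB usable needs 8 units per rack and so may be at most 5 U high; larger usable capacity per system relaxes the height limit.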

  23. Specifications: Disk server (3)
      • Performance: memory to disk: iozone with 16 GB files and 256 kb record size
        • Single stream: 40 MB/s write, 40 MB/s read
        • Multi-stream (at least 10): 115 MB/s write, 170 MB/s read (*)
      • Memory to network: iperf
        • Single stream: 100 MB/s write, 100 MB/s read
        • Two streams: 110 MB/s write, 110 MB/s read
        • Two streams in, two streams out: 145 MB/s

  24. Specifications: Disk server (4)
      • Global (disk to network) performance: At least 10 clients transferring 2 GB files via rfio
        • Reading from system: 95 MB/s (*)
        • Writing to system: 90 MB/s (*)
      (*): Requirements scale linearly with usable capacity, numbers for 5000 GB usable
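
The starred (*) requirements are quoted for 5000 GB usable and scale linearly with usable capacity. A minimal sketch of that scaling and of a pass/fail check against measured throughput; the dictionary keys and function names are hypothetical labels, and the measured values would come from the iozone and rfio runs described on the slides.

```python
REFERENCE_CAPACITY_GB = 5000

# Starred requirements in MB/s, as quoted for 5000 GB usable.
STARRED_REQUIREMENTS_MB_S = {
    "iozone multi-stream write": 115,
    "iozone multi-stream read": 170,
    "rfio read from system": 95,
    "rfio write to system": 90,
}


def scaled_requirements(usable_gb: float) -> dict[str, float]:
    """Scale each starred requirement linearly with usable capacity."""
    return {name: base * usable_gb / REFERENCE_CAPACITY_GB
            for name, base in STARRED_REQUIREMENTS_MB_S.items()}


def meets_requirements(usable_gb: float, measured_mb_s: dict[str, float]) -> bool:
    """True if every measured value reaches its scaled requirement.

    A missing measurement counts as a failure.
    """
    required = scaled_requirements(usable_gb)
    return all(measured_mb_s.get(name, 0.0) >= value
               for name, value in required.items())
```

For example, a system with 6000 GB usable would have to sustain 114 MB/s when clients read via rfio, rather than 95 MB/s.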

  25. Power measurements Done by Andras Horvath, CERN

  26. Power measurements http://ahorvath.home.cern.ch/ahorvath/power

  27. Observations

  28. Observations (1)
      • Profile of winning companies
      • Tier-1 suppliers competing with large integrators
      • Small round-the-corner companies eliminated at Market Survey stage
      • Almost always the integrators win
      • Specially tailored solutions responding to our specifications
      • Prices of Tier-1s rather high in Europe

  29. Observations (2)
      • Stress test as (important) part of the acceptance test
      • Introduced ~ 2 years ago (triggered by presentations from SLAC and FNAL at HEPiX)
      • Very useful
      • Based on va-ctcs
      • No longer sufficiently actively maintained
      • Large number of false positives
      • Looking for a replacement

  30. Observations (3)
      • Pushing these procedures through requires dedicated (and knowledgeable) person power
      • Not obvious to run multiple procedures in parallel
      • In particular, if things go wrong, e.g. stress test fails

  31. Summary
      • Computer hardware procurement is an excellent experimental confirmation of two fundamental laws of human nature
      • Murphy: “Everything that can go wrong will go wrong.”
      • Hofstadter: “Things always take longer than you think, even if you take into account Hofstadter’s law.”
