Computer Hardware and Procurement at CERN
Helge Meinhard (at) cern ch
HEPiX fall 2005 @ SLAC

Outline
• Procedures
• Hardware (being) procured
• Power measurements
• Observations
HEPiX@SLAC: Hardware procurement at CERN

Constraints (1)
• CERN is an international organisation with strict administrative rules
• Competitive tendering required covering (at least) member states
  • No way to avoid for commodity equipment
• Lowest compliant bid wins
  • No negotiations about added value of higher offers

Constraints (2)
• Different procedures depending on expected volume
  • < 10’000 CHF: IT seeks 3 offers
  • < 200’000 CHF: Formal price enquiry by purchasing service. Four weeks response time
  • < 750’000 CHF: Formal call for tender preceded by market survey. Six weeks response time
  • > 750’000 CHF: As < 750’000 CHF, plus approval by CERN’s Finance Committee (5 sessions/year, papers ready two months in advance)
(1 CHF = 0.78 USD = 0.65 EUR)

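The volume thresholds above amount to a simple lookup. A minimal sketch in Python follows; the function name and the returned strings are illustrative, and the treatment of a purchase of exactly 750’000 CHF is an assumption, since the slide only covers "<" and ">".

```python
def procurement_procedure(expected_chf: float) -> str:
    """Map an expected purchase volume in CHF to the procedure on the slide."""
    if expected_chf < 0:
        raise ValueError("expected volume must be non-negative")
    if expected_chf < 10_000:
        return "IT seeks 3 offers"
    if expected_chf < 200_000:
        return "Formal price enquiry (4 weeks response time)"
    if expected_chf < 750_000:
        return "Call for tender after market survey (6 weeks response time)"
    # Assumption: exactly 750'000 CHF is handled like the "> 750'000" tier.
    return "Call for tender after market survey, plus Finance Committee approval"


if __name__ == "__main__":
    for volume in (5_000, 150_000, 600_000, 2_000_000):
        print(f"{volume:>9} CHF: {procurement_procedure(volume)}")
```
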
Our problems
• Procedures badly adapted to quickly evolving computing market
• Difficult to give preference to “good”, reliable equipment

Our choices (1)
• For significant purchases (> 100 kCHF) we require (a) sample system(s)
  • with the tender for big tenders
  • on CERN’s request for small tenders
• Tenders include 3 years on-site warranty for hardware
• Typical requirements:
  • 4 working hours response / 12 working hours repair for critical machines
  • 3 working days response / 5 working days repair for farm nodes
• Supplier can subcontract on-site warranty

Our choices (2)
• Payment within 30 days after provisional acceptance on receipt of bank guarantee of 5% of purchase sum valid until end of warranty period
• Delivery within 6 weeks, penalty for late delivery: 2% of purchase sum per complete week, max. 10%

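As a worked example of the late-delivery clause, the sketch below computes the penalty for a given delay. It assumes lateness is counted in calendar days and that a "complete week" is 7 such days; neither detail is stated on the slide, and the function name is illustrative.

```python
def late_delivery_penalty(purchase_sum_chf: float, days_late: int) -> float:
    """Penalty in CHF: 2% of the purchase sum per complete week, capped at 10%."""
    complete_weeks = max(days_late, 0) // 7        # assumption: 7 calendar days
    percent = min(2 * complete_weeks, 10)          # cap at 10% of the purchase sum
    return purchase_sum_chf * percent / 100


if __name__ == "__main__":
    for days in (6, 20, 70):
        print(days, "days late:", late_delivery_penalty(500_000, days), "CHF")
```
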
Our choices (3)
• If more than 10% of systems fail during acceptance or during the first month after: right to return the whole batch
• If a system fails 3 or more times during any 6 months’ period, right to request complete replacement of the system
• If more than 20% of any component type fail during any 6 months’ period, right to request complete replacement of this component across the batch
• If CERN adds third-party devices, no impact on warranty obligations for system as delivered

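The three failure clauses can be checked mechanically from failure records. The following is a rough sketch, not CERN's actual tooling: all names are hypothetical, a 6 months’ period is approximated as 183 days, and the component check expects a failure count that has already been restricted to one such period.

```python
from datetime import date, timedelta

SIX_MONTHS = timedelta(days=183)  # assumption: 6 months approximated as 183 days


def may_return_batch(failed_systems: int, batch_size: int) -> bool:
    """More than 10% of systems failed during acceptance or the month after."""
    return failed_systems * 10 > batch_size


def may_replace_system(failure_dates: list[date]) -> bool:
    """True if any 3 failures of one system fall within a 6 months' period."""
    dates = sorted(failure_dates)
    return any(dates[i + 2] - dates[i] < SIX_MONTHS for i in range(len(dates) - 2))


def may_replace_component(failed_in_period: int, installed: int) -> bool:
    """More than 20% of one component type failed within a 6 months' period."""
    return failed_in_period * 5 > installed


if __name__ == "__main__":
    print(may_return_batch(11, 100))
    print(may_replace_system([date(2005, 1, 10), date(2005, 3, 1), date(2005, 6, 20)]))
    print(may_replace_component(20, 100))
```
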
Our choices (4)
• If justified by volume, procure from two suppliers (lowest and second-lowest compliant)
  • Better protection if one delivers crap or nothing at all
  • Better chance for companies to win an order
  • Increased workload on our part

Example of a procurement
• Procurement of equipment worth < 750 kCHF
  • Approval by Finance Committee not needed
• Market survey already done
  • Market survey can cover different types of equipment
  • Valid for 1 year
  • If not done yet, add ~ 16 weeks

Steps (1): typical case
• Fix scope: 2 w
• Write technical, commercial docs: 3 w
• IT-internal review
• Revise technical, commercial docs: 2 w
• Specification meeting
• Revise technical, commercial docs: 1 w
• Tender out
• Deadline for replies: 6 w
• Opening of replies: 1 w
(Total so far: 15 weeks, at best compressible to 12 weeks)

Steps (2): typical case
(Total from previous slide: 15 w, min. 12 w)
• Technical analysis of replies: 1 w
• Visual inspection, mounting: 1 w
• Benchmarks, reports: 3 w
• Technical clarifications: 1 w
• Purchase request, order: 2 w
• Delivery: 7 w
• Preliminary acceptance: 6 w
Total: 36 weeks, compressible to 30 weeks

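The totals on the two "Steps" slides can be checked by adding up the typical durations. The sketch below does that; steps shown without a duration (IT-internal review, specification meeting, tender out) are left out, and the 16 weeks for a missing market survey is the approximate figure from the "Example of a procurement" slide. Variable and function names are illustrative.

```python
# (step, typical duration in weeks), taken from the two "Steps" slides
TENDER_PHASE = [
    ("Fix scope", 2),
    ("Write technical, commercial docs", 3),
    ("Revise docs after IT-internal review", 2),
    ("Revise docs after specification meeting", 1),
    ("Deadline for replies", 6),
    ("Opening of replies", 1),
]
EVALUATION_PHASE = [
    ("Technical analysis of replies", 1),
    ("Visual inspection, mounting", 1),
    ("Benchmarks, reports", 3),
    ("Technical clarifications", 1),
    ("Purchase request, order", 2),
    ("Delivery", 7),
    ("Preliminary acceptance", 6),
]
MARKET_SURVEY_WEEKS = 16  # approximate ("~ 16 weeks" on the slide)


def total_weeks(market_survey_done: bool = True) -> int:
    """Typical elapsed time in weeks from fixing the scope to acceptance."""
    weeks = sum(w for _, w in TENDER_PHASE + EVALUATION_PHASE)
    if not market_survey_done:
        weeks += MARKET_SURVEY_WEEKS
    return weeks


if __name__ == "__main__":
    print("Tender phase:", sum(w for _, w in TENDER_PHASE), "weeks")
    print("Total:", total_weeks(), "weeks")
    print("Total without prior market survey:", total_weeks(False), "weeks")
```
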
Objectives
• Cover existing needs with as few different models and as few procurement procedures as possible
• Closely follow technology and market evolution and satisfy requirements with modern hardware at low cost
(These two objectives contradict each other.)

From CERN site report 2005/10/11
Fabric Infrastructure and Operations (1)
• RedHat 7.3 phased out on public services
  • Campaign on storage nodes far advanced
• New in machine room since Karlsruhe:
  • 200 farm PCs (dual Nocona): in production
  • 116 disk servers (> 5 TB usable each, total of 900 TB gross capacity): part in production, part under acceptance test
  • 112 “midrange servers”: under acceptance test
  • 32-node Infiniband-based cluster for Theory
• Refurbishment of machine room proceeding
  • LHS being populated, but power remains limited

Hardware being procured (1)
• Large volumes – several times < 750 kCHF per year
  • “Farm PCs” – non-redundant, cheap dual-processor work horses
  • “Disk servers” – storage-in-a-box systems with many SATA disks for streaming applications

Hardware being procured (2)
• Medium-size volumes – once < 750 kCHF per year or once or several times < 200 kCHF per year
  • “Midrange servers” – redundant building blocks for specific applications
  • “Tape servers” – midrange servers with an FC interface
  • “Disk arrays” – autonomous RAID units with FC uplinks
  • SAN infrastructure (most notably FC switches)
  • Head nodes for serial console infrastructure
  • “Small disk servers”, somewhere between disk servers and midrange servers
  • Miscellaneous

Specifications: Farm PCs (1)
• 2 boxed Intel Noconas of 2.8 GHz
• Mainboard:
  • BMC (IPMI 1.5 or higher)
  • PXE, USB boot
  • BBS menu
  • Console redirection
  • Configurable to stay off on AC power loss
• 2 GB ECC memory
  • From mainboard manuf. approved list
  • Upgradable to 4 GB without removing modules

Specifications: Farm PCs (2)
• 1 disk > 140 GB, IDE not permitted
  • Certified for 24/7, 3 y warranty by disk manuf.
• 1 GigE providing PXE and IPMI access
• 19” chassis max. 4 U, with rails
  • Power, reset button
  • Power, disk activity LED
• Power supply supporting machine + 50 W
  • Active PFC
  • C13 to C14 LSZH power cord
• Guaranteed to run under RHEL 3 (i386 and x86_64)
• Delivery within 6 weeks from dispatch of order

Specifications: Disk server (1)
• 1 or 2 boxed Intel Xeon with EM64T
• Mainboard as for Farm PCs
  • Now adding support for memory mirroring
• Memory as for Farm PCs
• General requirements for disks etc.
  • ≥ 7200 rpm, no EIDE, 3 y warranty, certified for 24/7 by manufacturer
  • Metallic hot-swap trays certified by chassis manuf.
  • Indicators for power and activity for each tray
  • PCB backplanes for disks, multilane cabling
  • “Intelligent” RAID controllers

Specifications: Disk server (2)
• System disks: 2 x ≥ 140 GB mirrored
• Data disks: all identical
  • Redundant RAIDs with hot spares (min. 1/15)
  • Total usable capacity per system above 5 TB
  • Battery buffer if controller with active cache
• 1 GigE providing required performance, PXE, IPMI access
• 19” chassis rack-mountable with rails
  • Min. 40 TB usable in 42 U high rack
• Power supply: N+1 redundant, active PFC
• Guaranteed to run under RHEL 3 (i386 and x86_64)
• Delivery within 6 weeks from dispatch of order

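The two capacity requirements (more than 5 TB usable per system, at least 40 TB usable per 42 U rack) together constrain the chassis height. A small sketch of that check follows, assuming the whole 42 U is available for disk servers (no space reserved for switches or power distribution); the function names are hypothetical.

```python
RACK_HEIGHT_U = 42
MIN_USABLE_TB_PER_SYSTEM = 5    # "above 5 TB" per system
MIN_USABLE_TB_PER_RACK = 40     # "min. 40 TB usable in 42 U high rack"


def rack_usable_tb(chassis_height_u: int, usable_tb_per_system: float) -> float:
    """Usable capacity of one rack filled with identical disk servers."""
    systems_per_rack = RACK_HEIGHT_U // chassis_height_u  # assumption: full rack usable
    return systems_per_rack * usable_tb_per_system


def meets_capacity_requirements(chassis_height_u: int, usable_tb_per_system: float) -> bool:
    """Check both the per-system and the per-rack capacity requirement."""
    return (usable_tb_per_system > MIN_USABLE_TB_PER_SYSTEM
            and rack_usable_tb(chassis_height_u, usable_tb_per_system) >= MIN_USABLE_TB_PER_RACK)


if __name__ == "__main__":
    for height in (4, 5, 6):
        print(f"{height} U, 5.5 TB:", rack_usable_tb(height, 5.5), "TB per rack,",
              "ok" if meets_capacity_requirements(height, 5.5) else "not ok")
```
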
Specifications: Disk server (3)
• Performance: memory to disk: iozone with 16 GB files and 256 kB record size
  • Single stream: 40 MB/s write, 40 MB/s read
  • Multi-stream (at least 10): 115 MB/s write, 170 MB/s read (*)
• Memory to network: iperf
  • Single stream: 100 MB/s write, 100 MB/s read
  • Two streams: 110 MB/s write, 110 MB/s read
  • Two streams in, two streams out: 145 MB/s

Specifications: Disk server (4)
• Global (disk to network) performance: At least 10 clients transferring 2 GB files via rfio
  • Reading from system: 95 MB/s (*)
  • Writing to system: 90 MB/s (*)
(*): Requirements scale linearly with usable capacity, numbers for 5000 GB usable

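The footnote says the starred requirements scale linearly with usable capacity, with the quoted numbers valid for 5000 GB. A minimal sketch of that scaling, using the four starred figures from the disk server slides; the dictionary keys and function name are illustrative.

```python
BASELINE_USABLE_GB = 5000

# Starred (*) requirements in MB/s at 5000 GB usable capacity
BASELINE_MB_PER_S = {
    "iozone multi-stream write": 115,
    "iozone multi-stream read": 170,
    "rfio read from system": 95,
    "rfio write to system": 90,
}


def scaled_requirements(usable_gb: float) -> dict[str, float]:
    """Scale the starred throughput requirements linearly with usable capacity."""
    return {name: mb_per_s * usable_gb / BASELINE_USABLE_GB
            for name, mb_per_s in BASELINE_MB_PER_S.items()}


if __name__ == "__main__":
    for name, value in scaled_requirements(6000).items():
        print(f"{name}: {value:.0f} MB/s")
```
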
Power measurements
Done by Andras Horvath, CERN

Power measurements
http://ahorvath.home.cern.ch/ahorvath/power

Observations (1)
• Profile of winning companies
  • Tier-1 suppliers competing with large integrators
  • Small ‘round the corner companies eliminated at Market Survey stage
• Almost always the integrators win
  • Specially tailored solutions responding to our specifications
  • Prices of Tier-1s rather high in Europe

Observations (2)
• Stress test as (important) part of the acceptance test
  • Introduced ~ 2 years ago (triggered by presentations from SLAC and FNAL at HEPiX)
  • Very useful
• Based on va-ctcs
  • No longer sufficiently actively maintained
  • Large number of false positives
  • Looking for a replacement

Observations (3)
• Pushing these procedures through requires dedicated (and knowledgeable) person power
• Not obvious to run multiple procedures in parallel
  • In particular, if things go wrong, e.g. stress test fails

Summary
• Computer hardware procurement is an excellent experimental confirmation of two fundamental laws of human nature
  • Murphy: “Everything that can go wrong will go wrong.”
  • Hofstadter: “Things always take longer than you think, even if you take into account Hofstadter’s law.”