CERN Clusters
Tim Smith, CERN/IT
Heterogeneous Clusters
• 10 years of evolution…
  • HP-UX, IRIX, AIX, DUX, Solaris, WNT
• 4 years ago…
  • Linux additions / replacements
37 cluster configurations!
• e.g. CMS
  • Interactive: Solaris, Linux
  • Batch: Solaris, HP-UX, Linux
Tim Smith: LCW in FNAL
‘RISC’ Decommissioning
The Rise and Fall of PC Clusters
[Chart; series labels: Elonex III, TechAS, Elonex II, Siemens, HP, Elonex I, Cogestra]
Component Architecture
[Diagram labels: high-capacity backbone switch; application servers; CPU nodes; 100/1000baseT switch; disk servers; 1000baseT switch; tape servers]
Concentrated Facilities
• Interactive Cluster
  • 50 bi-processor PCs; 512 MB, 440-800 MHz
• Batch Cluster with chaotic access
  • 280 bi-processor PCs; 0.1-1 GB, 440-800 MHz
• Batch Cluster with scheduled access
  • 190 bi-processor PCs; 512 MB, 600-800 MHz
• Tape and Disk server ‘Clusters’
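As a quick tally of the figures on this slide, the three PC clusters can be summed as follows. This is only a sketch: the dictionary keys are illustrative names, not CERN identifiers.

```python
# Node counts taken from the slide; every PC is bi-processor (2 CPUs).
CLUSTERS = {
    "interactive": 50,
    "batch_chaotic": 280,
    "batch_scheduled": 190,
}
CPUS_PER_NODE = 2


def totals(clusters, cpus_per_node=CPUS_PER_NODE):
    """Return (total nodes, total CPUs) across all clusters."""
    nodes = sum(clusters.values())
    return nodes, nodes * cpus_per_node


if __name__ == "__main__":
    nodes, cpus = totals(CLUSTERS)
    print(f"{nodes} PCs, {cpus} CPUs")
```

That comes to 520 PCs and 1040 CPUs, not counting the tape and disk servers, for which the slide gives no numbers.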
‘Chaotic’ Clusters
[Diagram labels: DNS load balancing; lxplus nodes (lxplus001); LSF; lxbatch nodes (lxbatch001); rfio; disk servers (disk001); tape servers (tape001)]
‘Chaotic’ Clusters
[Diagram labels: DNS load balancing; lxplus nodes (lxplus001); LSF; lxbatch nodes (lxbatch001) grouped under Public_queues, ATLAS_queues, CMS_queues, production_queues; rfio; disk servers (disk001); tape servers (tape001)]
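The slide names DNS load balancing but not its mechanism. A minimal sketch, assuming the alias simply publishes the least-loaded hosts; the selection rule, the host names other than lxplus001, and the load figures are all hypothetical:

```python
def pick_alias_members(loads, n=2):
    """Return the n least-loaded hosts; ties are broken by host name.

    `loads` maps host name -> load metric (lower is better).
    This selection rule is an assumption, not CERN's actual policy.
    """
    ranked = sorted(loads.items(), key=lambda item: (item[1], item[0]))
    return [host for host, _ in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical load figures for three interactive nodes.
    sample = {"lxplus001": 3.2, "lxplus002": 0.4, "lxplus003": 1.1}
    # The DNS alias would resolve to these hosts until the next refresh.
    print(pick_alias_members(sample))
```

Publishing only a few lightly loaded hosts under one alias spreads new logins without users having to choose a node themselves.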
Scheduled Cluster
Management Techniques I
• KickStart & JumpStart (Linux & Solaris)
  • System installation
• ANIS
  • Installation automation (bootp, dhcp, tftp, nfs)
• SUE
  • System post-installation and configuration
• ASIS
  • Application installation (3 GB local)
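Read as a pipeline, the four tools might chain as sketched below. The ordering is an assumption (the slide lists roles, not a sequence), and `install_plan` is a hypothetical helper, not part of any of these tools.

```python
# Tool roles are quoted from the slide; the step ordering is an assumption.
SYSTEM_INSTALLER = {"linux": "KickStart", "solaris": "JumpStart"}


def install_plan(os_name):
    """Return the (tool, role) steps for a node running the given OS."""
    installer = SYSTEM_INSTALLER.get(os_name.lower())
    if installer is None:
        raise ValueError(f"no installer listed for {os_name!r}")
    return [
        ("ANIS", "installation automation (bootp, dhcp, tftp, nfs)"),
        (installer, "system installation"),
        ("SUE", "system post-installation and configuration"),
        ("ASIS", "application installation"),
    ]


if __name__ == "__main__":
    for tool, role in install_plan("Linux"):
        print(f"{tool}: {role}")
```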
Management Techniques II
• Console Mgmt
  • PCM (DEC PolyConsole Manager)
  • Console Concentrators
  • Cross Wiring serial ports
  • Etherlite, VACM
• Power Mgmt
  • NONE
• Monitoring
  • SURE, perfmon, remperf, …