240 likes | 354 Views
Design & Management of the JLAB Farms. Ian Bird, Jefferson Lab May 24, 2001 FNAL LCCWS. Overview. JLAB clusters Aims Description Environment Batch software Management Configuration Maintenance Monitoring Performance monitoring Comments. Clusters at JLAB - 1. Farm
E N D
Design & Management of the JLAB Farms Ian Bird, Jefferson Lab May 24, 2001 FNAL LCCWS
Overview • JLAB clusters • Aims • Description • Environment • Batch software • Management • Configuration • Maintenance • Monitoring • Performance monitoring • Comments
Clusters at JLAB - 1 • Farm • Support experiments – reconstruction, analysis • 250 ( 320) Intel Linux CPU ( + 8 Sun Solaris) • 6400 8000 SPECint95 • Goals: • Provide 2 passes of 1st level reconstruction at average incoming data rate (10 MB/s) • (More recently) provide analysis, simulation, and general batch facility • Systems • First phase (1997) was 5 dual Ultra2 + 5 dual IBM 43p • 10 dual Linux (PII 300) acquired in 1998 • Currently 165 dual PII/III (300, 400, 450, 500, 750, 1GHz) • ASUS motherboards, 256 MB, ~40 GB SCSI, IDE, 100 Mbit • First 75 systems towers, 50 2u rackmount, 40 1u (½u?) • Interactive front-ends • Sun E450’s, 4-proc Intel Xeon, (2 each), 2GB RAM, Gb Ethernet
First purchases, 9 duals per 24” rack Last summer, 16 duals (2u) + 500 GB cache (8u) per 19” rack Recently, 5 TB IDE cache disk (5 x 8u) per 19” Intel Linux Farm
Clusters at JLAB - 2 • Lattice QCD cluster(s) • Existing clusters – in collaboration with MIT, at JLAB: • Compaq Alpha • 16 XP1000 (500 MHz 21264), 256 or 512 MB, 100 Mbit • 12 Dual UP2000 (667 MHz 21264), 256 MB, 100 Mbit • All have Myrinet interconnect • Front-end (login) machine has GB Ethernet, 400 GB fileserver for data staging and transfers MIT JLAB • Anticipated (funded) • 128 cpu (June 2001), Alpha or P4(?) in 1u • 128 cpu (Dec/Jan ?) – identical to 1st 128 • Myrinet
16 single Alpha 21264, 1999 12 dual Alpha (Linux Networks), 2000 LQCD Clusters
Environment • JLAB has central computing environment (CUE) • NetApp fileservers – NFS & CIFS • Home directories, group (software) areas, etc. • Centrally provided software apps • Available in • General computing environment • Farms and clusters • Managed desktops • Compatibility between all environments – home and group areas available in farm, library compatibility, etc. • Locally written software provides access to farm (and mass storage) from any JLAB system • Campus network backbone is Gigabit Ethernet, with 100 Mbit to physicist desktops, OC-3 to ESnet
DST/Cache File Servers 15 TB – RAID 0 Jefferson Lab Mass Storage and Farm Systems 2001 Tape Servers Farm Cache File Servers 4 x 400GB DB Server Work File Servers 10 TB – RAID 5 From CLAS DAQ From Hall A,C DAQ 100 Mbit/s 1000 Mbit/s Batch and Interactive Farm FCAL SCSI
Batch Software • Farm • Use LSF (v 4.0.1) • Pricing now acceptable • Manage resource allocation with • Job queues • Production (reconstruction, etc) • Low-priority (for simulations), High-priority (short jobs) • Idle (pre-emptable) • User + group allocations (shares) • Make full use of hierarchical shares - allows single undivided cluster to be used efficiently by many groups • E.g.
Batch software - 2 • Users do not use LSF directly, use Java client (jsub), that: • Is available from any machine (does not need LSF) • Provides missing functionality, e.g. • Submit 1000 jobs in 1 command • Fetches files from tape, pre-stages before job queued for execution (don’t block farm with jobs waiting for data), • Ensures efficient retrieval of files from tape - e.g. sort 1000 files by tape and by file no. on tape. • Web interface (via servlet) to monitor job status and progress (as well as host, queue, etc.)
Batch software - 3 • LQCD clusters use PBS • JLAB written scheduler • 7 stages – mimic LSF hierarchical behaviour • Users access PBS commands directly • Web interface (portal) – authorization based on certificates • Used to submit jobs between JLAB & MIT clusters
Batch software - 4 • Future • Combine jsub & LQCD portal features to wrap both LSF and PBS • XML-based description language • Provide web-interface toolkit to experiments to enable them to generate jobs based on expt. run data • In context of PPDG
Cluster management • Configuration • Initial configuration • Kickstart, 2 post-install scripts for configuration, sw install (LSF etc), driven by a floppy • Looking at PXE – DHCP (available on newer motherboards) • Avoids need for floppy – just power on • System working (last week) • Software: PXE standard bootprom (www.nilo.org/docs/pxe.html) – talks to DHCP, • bpbatch – pre-boot shell (www.bpbatch.org) - downloads vmlinux, kickstart etc • Alphas configured “by hand + kickstart” • Updates etc. • Autorpm (especially for patches) • New kernels – by hand with scripts • OS upgrades • Rolling upgrades – use queues to manage transition • Missing piece: • Remote, network-accessible console screen access • Have used serial console, KVM switches, monitor on a cart … • Linux Networks Alphas have remote power management – don’t use!
System monitoring • Farm systems • LM78 to monitor temp + fans via /proc • This was our largest failure mode for Pentiums • Mon (www.kernel.org/software/mon) • Used extensively for all our systems – page “on-call” • For batch farm checks mostly – fan, temp, ping • Mprime (prime number search) has checks on memory and arithmetic integrity • Used in initial system burn-in
Performance monitoring • Use variety of mechanisms • Publish weekly tables and graphs based on LSF statistics • Graphs from mrtg/rrd • Network performance, #jobs, utilization, etc
Comments & Issues • Space – very limited • Installing a new STK silo, moved all sys admins out • Now have no admins in same building as machine room • Plans to build a new Computer Center … • Have always been lights-out
Future • Accelerator and experiment upgrades • Expect first data in 2006, full rate 2007 • 100 MB/s data acquisition • 1 – 3 PB/year (1 PB raw, > 1 PB simulated) • Compute clusters: • Level 3 triggers • Reconstruction • Simulation • Analysis – PWA can be parallelized, but needs access to very large reconstructed and simulated datasets • Expansion of LQCD clusters • 10 Tflops by 2005