
Design & Management of the JLAB Farms



  1. Design & Management of the JLAB Farms Ian Bird, Jefferson Lab May 24, 2001 FNAL LCCWS

  2. Overview • JLAB clusters • Aims • Description • Environment • Batch software • Management • Configuration • Maintenance • Monitoring • Performance monitoring • Comments

  3. Clusters at JLAB - 1 • Farm • Support experiments – reconstruction, analysis • 250 (→ 320) Intel Linux CPU (+ 8 Sun Solaris) • 6400 → 8000 SPECint95 • Goals: • Provide 2 passes of 1st-level reconstruction at average incoming data rate (10 MB/s) • (More recently) provide analysis, simulation, and general batch facility • Systems • First phase (1997) was 5 dual Ultra2 + 5 dual IBM 43p • 10 dual Linux (PII 300) acquired in 1998 • Currently 165 dual PII/III (300, 400, 450, 500, 750 MHz, 1 GHz) • ASUS motherboards, 256 MB, ~40 GB SCSI, IDE, 100 Mbit • First 75 systems towers, 50 2u rackmount, 40 1u (½u?) • Interactive front-ends • Sun E450’s, 4-proc Intel Xeon (2 each), 2 GB RAM, Gb Ethernet

  4. Intel Linux Farm • First purchases: 9 duals per 24” rack • Last summer: 16 duals (2u) + 500 GB cache (8u) per 19” rack • Recently: 5 TB IDE cache disk (5 × 8u) per 19” rack

  5. Clusters at JLAB - 2 • Lattice QCD cluster(s) • Existing clusters – in collaboration with MIT, at JLAB: • Compaq Alpha • 16 XP1000 (500 MHz 21264), 256 or 512 MB, 100 Mbit • 12 dual UP2000 (667 MHz 21264), 256 MB, 100 Mbit • All have Myrinet interconnect • Front-end (login) machine has Gb Ethernet, 400 GB fileserver for data staging and transfers MIT ↔ JLAB • Anticipated (funded) • 128 CPU (June 2001), Alpha or P4(?) in 1u • 128 CPU (Dec/Jan?) – identical to 1st 128 • Myrinet

  6. LQCD Clusters • 16 single Alpha 21264 (1999) • 12 dual Alpha (Linux Networks, 2000)

  7. Environment • JLAB has central computing environment (CUE) • NetApp fileservers – NFS & CIFS • Home directories, group (software) areas, etc. • Centrally provided software apps • Available in • General computing environment • Farms and clusters • Managed desktops • Compatibility between all environments – home and group areas available in farm, library compatibility, etc. • Locally written software provides access to farm (and mass storage) from any JLAB system • Campus network backbone is Gigabit Ethernet, with 100 Mbit to physicist desktops, OC-3 to ESnet

  8. Jefferson Lab Mass Storage and Farm Systems, 2001 (diagram) • Tape servers • DST/cache file servers: 15 TB, RAID 0 • Farm cache file servers: 4 × 400 GB • Work file servers: 10 TB, RAID 5 • DB server • Batch and interactive farm • Data in from CLAS DAQ and Hall A/C DAQ • Links: 100 Mbit/s and 1000 Mbit/s Ethernet, FC-AL, SCSI

  9. Batch Software • Farm • Use LSF (v 4.0.1) • Pricing now acceptable • Manage resource allocation with • Job queues • Production (reconstruction, etc.) • Low-priority (for simulations), high-priority (short jobs) • Idle (pre-emptable) • User + group allocations (shares) • Make full use of hierarchical shares – allows a single undivided cluster to be used efficiently by many groups
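The hierarchical shares mentioned above divide the farm among experiment groups and then among users within each group. A minimal sketch of the idea, assuming a simplified model (LSF's actual dynamic-priority formula also weights run time, job slots, and decayed historical usage; the group names and share values below are hypothetical):

```python
# Simplified hierarchical fair-share sketch: walk a share tree, at each
# level picking the child whose usage-to-share ratio is lowest, i.e. the
# most under-served group or user. Not LSF's real formula.

def pick_next(node, usage):
    """Descend the share tree and return the leaf user to serve next."""
    if "children" not in node:
        return node["name"]
    best = min(node["children"],
               key=lambda c: usage.get(c["name"], 0) / c["shares"])
    return pick_next(best, usage)

# Hypothetical tree: two experiment groups splitting one undivided cluster.
tree = {"name": "farm", "children": [
    {"name": "clas", "shares": 60, "children": [
        {"name": "alice", "shares": 1},
        {"name": "bob", "shares": 1}]},
    {"name": "halla", "shares": 40, "children": [
        {"name": "carol", "shares": 1}]},
]}

# Recent usage (arbitrary units): clas has consumed more than its share.
usage = {"clas": 70, "halla": 20, "alice": 50, "bob": 20, "carol": 20}
print(pick_next(tree, usage))  # halla is under-served -> "carol"
```

Because selection is recursive, adding a new group or user only requires a new node with a share value; no per-queue partitioning of hosts is needed, which is what lets a single undivided cluster serve many groups.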

  10. Batch software - 2 • Users do not use LSF directly; they use a Java client (jsub) that: • Is available from any machine (does not need LSF) • Provides missing functionality, e.g. • Submit 1000 jobs in 1 command • Fetches files from tape, pre-stages before the job is queued for execution (don’t block the farm with jobs waiting for data) • Ensures efficient retrieval of files from tape – e.g. sort 1000 files by tape and by file number on tape • Web interface (via servlet) to monitor job status and progress (as well as host, queue, etc.)
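The tape-ordering point is the key efficiency trick: grouping requests by tape volume and reading each tape's files in on-tape order lets the drive stream instead of seeking. A sketch of that ordering, with hypothetical field names (the real jsub metadata layout is not shown in the slides):

```python
# Sketch of the pre-staging order jsub aims for: sort requested files
# first by tape volume, then by file number on that tape, so each tape
# is mounted once and read sequentially. Field names are illustrative.

def stage_order(requests):
    """requests: list of (path, tape_volume, file_no) tuples.
    Returns file paths in efficient tape-retrieval order."""
    return [path for path, tape, file_no
            in sorted(requests, key=lambda r: (r[1], r[2]))]

reqs = [("run42.evt", "VOL007", 3),
        ("run41.evt", "VOL003", 9),
        ("run43.evt", "VOL007", 1)]
print(stage_order(reqs))  # ['run41.evt', 'run43.evt', 'run42.evt']
```

With 1000 files spread over a handful of tapes, this turns hundreds of mount/seek cycles into a few sequential passes.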

  11. View job status

  12. View host status

  13. Batch software - 3 • LQCD clusters use PBS • JLAB-written scheduler • 7 stages – mimics LSF hierarchical behaviour • Users access PBS commands directly • Web interface (portal) – authorization based on certificates • Used to submit jobs between JLAB & MIT clusters

  14. Batch software - 4 • Future • Combine jsub & LQCD portal features to wrap both LSF and PBS • XML-based description language • Provide web-interface toolkit to experiments to enable them to generate jobs based on expt. run data • In context of PPDG

  15. Cluster management • Configuration • Initial configuration • Kickstart, 2 post-install scripts for configuration and software install (LSF etc.), driven by a floppy • Looking at PXE – DHCP (available on newer motherboards) • Avoids need for floppy – just power on • System working (as of last week) • Software: PXE standard bootprom (www.nilo.org/docs/pxe.html) – talks to DHCP • bpbatch – pre-boot shell (www.bpbatch.org) – downloads vmlinux, kickstart, etc. • Alphas configured “by hand + kickstart” • Updates etc. • Autorpm (especially for patches) • New kernels – by hand with scripts • OS upgrades • Rolling upgrades – use queues to manage transition • Missing piece: • Remote, network-accessible console screen access • Have used serial console, KVM switches, monitor on a cart … • Linux Networks Alphas have remote power management – we don’t use it!
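The rolling-upgrade pattern above can be sketched as upgrading hosts in small slices so the rest of the farm keeps running jobs. This is a minimal model, assuming hypothetical host names and slice size; in production each step would wrap the batch system's host close/drain/reopen commands rather than a Python callback:

```python
# Rolling-upgrade sketch: take hosts out of service slice_size at a time.
# In production each slice would be closed to new jobs and allowed to
# drain before upgrading; here the `upgrade` callback stands in for the
# whole close/drain/upgrade/reopen cycle on one host.

def rolling_upgrade(hosts, slice_size, upgrade):
    """Upgrade hosts in order, never more than slice_size at once.
    Returns the hosts in the order they were upgraded."""
    done = []
    for i in range(0, len(hosts), slice_size):
        for host in hosts[i:i + slice_size]:
            upgrade(host)
            done.append(host)
    return done

order = rolling_upgrade([f"farm{n:03d}" for n in range(6)], 2,
                        upgrade=lambda h: None)
print(order)  # farm000 .. farm005, two at a time
```

Driving the transition through the queues means no separate "maintenance mode" is needed: a closed host simply stops receiving dispatches until it is reopened.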

  16. System monitoring • Farm systems • LM78 to monitor temp + fans via /proc • This was our largest failure mode for Pentiums • Mon (www.kernel.org/software/mon) • Used extensively for all our systems – page “on-call” • For batch farm checks mostly – fan, temp, ping • Mprime (prime number search) has checks on memory and arithmetic integrity • Used in initial system burn-in
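A check in the spirit of the LM78 probes can be sketched as threshold tests over parsed sensor readings. On the 2.2/2.4 kernels of the era, lm_sensors exposed readings under /proc (e.g. /proc/sys/dev/sensors/...); exact paths and file formats vary by kernel and chip, so this sketch factors the check out from the file reading, and the thresholds are illustrative, not JLAB's actual limits:

```python
# Sketch of a temperature/fan health check like the LM78 probes fed to
# mon. Parsing of the /proc sensor files is assumed to have produced a
# plain dict of readings; thresholds here are illustrative only.

def check_sensors(readings, max_temp=55.0, min_fan=2000):
    """readings: dict like {'temp': 48.0, 'fan1': 4500, 'fan2': 0}.
    Returns a list of alarm strings; an empty list means healthy."""
    alarms = []
    for name, value in sorted(readings.items()):
        if name.startswith("temp") and value > max_temp:
            alarms.append(f"{name} high: {value:.1f} C")
        if name.startswith("fan") and value < min_fan:
            alarms.append(f"{name} low: {value} rpm")
    return alarms

# A failed fan and the resulting overheating, the dominant failure mode:
print(check_sensors({"temp": 61.0, "fan1": 4500, "fan2": 0}))
```

A monitor like mon would run such a check periodically on every farm node and page the on-call admin whenever the alarm list is non-empty.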

  17. Monitoring

  18. Performance monitoring • Use variety of mechanisms • Publish weekly tables and graphs based on LSF statistics • Graphs from mrtg/rrd • Network performance, #jobs, utilization, etc

  19. Comments & Issues • Space – very limited • Installing a new STK silo meant moving all sysadmins out • Now have no admins in the same building as the machine room • Plans to build a new Computer Center … • Have always been lights-out

  20. Future • Accelerator and experiment upgrades • Expect first data in 2006, full rate 2007 • 100 MB/s data acquisition • 1 – 3 PB/year (1 PB raw, > 1 PB simulated) • Compute clusters: • Level 3 triggers • Reconstruction • Simulation • Analysis – partial-wave analysis (PWA) can be parallelized, but needs access to very large reconstructed and simulated datasets • Expansion of LQCD clusters • 10 Tflops by 2005
