Guide to Building and Operating a Large Cluster
CHEP 2003, 24th March 2003
Alan Silverman, CERN/IT/PS
Outline
• Background and Goals
• The Large Cluster Workshop – input for the Guide
• Cluster Guide Outline
• Some of the questions to be asked
• Future plans and some references
Background
• HEPiX created a Large System SIG to discuss technology specific to large clusters.
• Large Cluster Workshops were held at FNAL in May 2001 and October 2002, co-sponsored by CERN and FNAL.
• Goal: understand what is involved in building large clusters.
• In background reading on Grid technologies, we found many papers and USENIX-type talks on cluster techniques, methods and tools – but often with results and conclusions based on small numbers of nodes.
• What is the "real world" doing? Gathering practical experience was the primary goal.
Goals
• Understand what exists and what might scale to large clusters (1000-5000 nodes and up) – and, by implication, predict what might not scale.
• Produce the definitive guide to building and running a cluster: how to choose, acquire, test and operate the hardware; software installation and upgrade tools; performance management, logging, accounting, alarms, security, etc.
• Maintain this guide.
The Workshop Attendees
• Participation was targeted at sites with a minimum cluster size (at least 100-200 nodes).
• Invitations were sent not only to HEP sites but to other sciences, including biophysics. We also invited participation by technical representatives from commercial firms (sales people were refused!).
• The second workshop was more open, with mostly HEPiX attendees.
• 60 people attended the first workshop, 90+ attended the second.
Cluster Builders Guide
• A framework covering all (we hope) aspects of designing, configuring, acquiring, building, installing, administering, monitoring and upgrading a cluster.
• Not the only way to do it, but it should make cluster owners think of the correct questions to ask and, hopefully, where to start looking for answers.
• Section headings to be filled in as we gain experience.
1. Cluster Design Considerations
  1.1 What are the characteristics of the computational problems?
    1.1.1 Is there a "natural" unit of work?
      1.1.1.1 Executable size
      1.1.1.2 Input data size
      ……….
  1.2 What are the characteristics of the budget available?
    1.2.1 What initial investment is available?
    1.2.2 What is the annual budget available?
    …………
  …….
5. Operations
  5.1 Usage
  5.2 Management
    5.2.1 Installation
    5.2.2 Testing
    ……….
We will now go through some of these topics, highlighting typical questions which should be asked.
Acquisition Procedures - the Options
• FNAL and CERN have formal tender cycles with technical evaluations. But FNAL can select the bidders, while CERN must invite bids Europe-wide and the lowest-price, technically-valid bid wins.
• Also, FNAL qualifies N suppliers for 18-24 months while CERN re-bids each major order, lowest bid wins. Variety is the spice of life?
• KEK's funding agency demands long-term leases; the switch to PCs was delayed by in-place leases with RISC vendors.
Configuration Management
• Identify useful tools in use (for example, VA Linux's VACM and SystemImager, and the Chiba City tools) and some which, strangely, are little used in HENP (e.g. cfengine). UNIX gurus can help choose these; the basic idea behind such tools is sketched below.
• Tool sharing is not so common – historical constraints, different local environments, and less of an intellectual challenge.
• Almost no prior modelling – previous experience is much more the prime planning "method".
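The convergence idea behind tools such as cfengine can be sketched in a few lines: declare the desired state of a node, compare it with reality, and repair only what differs. This is a minimal illustration, not cfengine syntax; the checks, file paths and commands below are hypothetical examples.

```python
#!/usr/bin/env python
"""Minimal convergence loop: declare desired state, detect drift, repair it.
A sketch of the idea behind cfengine-style tools; the entries below are
hypothetical examples, not a real site policy."""
import os
import shutil
import subprocess

# Desired state: (description, check function, repair function)
DESIRED_STATE = [
    ("ntpd is running",
     lambda: subprocess.run(["pgrep", "-x", "ntpd"],
                            stdout=subprocess.DEVNULL).returncode == 0,
     lambda: subprocess.run(["service", "ntpd", "start"])),
    ("/etc/motd is present",                         # hypothetical master copy
     lambda: os.path.exists("/etc/motd"),
     lambda: shutil.copy("/srv/config/motd", "/etc/motd")),
]

def converge():
    for description, is_ok, repair in DESIRED_STATE:
        if is_ok():
            print(f"OK      {description}")
        else:
            print(f"REPAIR  {description}")
            repair()   # repairs are written to be idempotent: safe every pass

if __name__ == "__main__":
    converge()         # typically run from cron on every node
```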
Installation, Upgrading
• Again, many tools are available, for example the NPACI ROCKS toolkit at San Diego – it uses vendor tools and stores everything in packages; if you doubt the validity of the configuration, just re-install the node (see the sketch below).
• Another option is LCFG, as initially used by the European DataGrid WP4, but they are currently building a new tool to overcome some perceived deficiencies of LCFG.
• Burn-in tests are rare, but look at CTCS/Cerberus from VA Linux (handle with care!).
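A sketch of the ROCKS-style philosophy: compare the node's installed package set against a reference manifest and, rather than patching differences by hand, flag the node for re-installation. The manifest path and the reinstall hook below are hypothetical placeholders.

```python
#!/usr/bin/env python
"""If the installed package set drifts from the reference manifest,
schedule a full re-install rather than patching by hand (the ROCKS-style
approach). The manifest path and reinstall hook are hypothetical."""
import subprocess

MANIFEST = "/srv/cluster/package-manifest.txt"   # one "name-version" per line

def installed_packages():
    out = subprocess.run(["rpm", "-qa", "--qf", "%{NAME}-%{VERSION}\n"],
                         capture_output=True, text=True, check=True).stdout
    return set(out.split())

def reference_packages():
    with open(MANIFEST) as f:
        return {line.strip() for line in f if line.strip()}

def main():
    drift = installed_packages() ^ reference_packages()  # symmetric difference
    if drift:
        print(f"{len(drift)} packages differ from the manifest; "
              "flagging node for re-installation")
        subprocess.run(["touch", "/var/run/reinstall-me"])  # hypothetical hook
    else:
        print("package set matches the manifest")

if __name__ == "__main__":
    main()
```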
Monitoring
• Bought-in tools in this area are, for our scale of cluster, expensive and a lot of work to implement, but one must not forget the ongoing support costs of in-house developments.
• Small sites typically write their own tools (see the sketch below) but use vendors' tools where possible (e.g. for the AFS and LSF services). The mon tool is also popular.
• Large sites sometimes create projects (e.g. FNAL's NGOP) when they find no tool sufficiently flexible, scalable or affordable and they think they have enough resources.
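The core of a home-grown monitor is often just a polling loop with an alarm action. A minimal sketch; the node names, the ping-based health check and the contact address are site-specific placeholders, and a real monitor would check much more than reachability.

```python
#!/usr/bin/env python
"""Minimal poll-and-alarm monitor of the kind small sites write themselves.
Node names, the health check and the contact address are placeholders."""
import subprocess
import time

NODES = ["lxb0001", "lxb0002", "lxb0003"]     # hypothetical node names
ADMIN = "cluster-admin@example.org"           # hypothetical contact
POLL_INTERVAL = 60                            # seconds

def node_is_up(node):
    # One ICMP probe with a short timeout; a real monitor would also
    # check daemons, disks, load, temperatures, etc.
    return subprocess.run(["ping", "-c", "1", "-W", "2", node],
                          stdout=subprocess.DEVNULL).returncode == 0

def alarm(node):
    # Placeholder: a real system would mail, page or feed an operator display.
    print(f"ALARM: {node} unreachable, notify {ADMIN}")

def main():
    while True:
        for node in NODES:
            if not node_is_up(node):
                alarm(node)
        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    main()
```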
Data Access
• Future direction is heavily tied to Grid activities.
• All tools must be freely available.
• Network bandwidth and error rates/recovery can be the bottleneck, not access to the discs.
• "A single active physics collaborator can generate up to 20 TB of data per year" (Kors Bos, NIKHEF) – see the arithmetic below.
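To put the 20 TB/year figure in perspective, the sustained average rate it implies is modest; the pressure comes from bursts and from many collaborators at once. A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope: what does 20 TB per year mean as a sustained rate?
TB = 1e12                      # bytes (decimal terabyte)
seconds_per_year = 365 * 24 * 3600

rate_bytes_per_s = 20 * TB / seconds_per_year
print(f"20 TB/year ≈ {rate_bytes_per_s / 1e6:.2f} MB/s sustained")
# ≈ 0.63 MB/s for one collaborator; 100 such users ≈ 63 MB/s, and peak
# transfer rates during bursts are far higher than the yearly average.
```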
CPU, Resource Allocation
• 30% of the audience at the first workshop used LSF, 30% used PBS and 20% used Condor.
• FNAL developed FBS and then FBSng; CCIN2P3 developed BQS.
• The usual trade-off: the resources needed to develop one's own tool, or to adapt public-domain tools, balanced against the cost of a commercial tool and less flexibility with regard to features (one way to keep the choice open is a thin submission wrapper, sketched below).
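Sites that support more than one batch system often hide the differences behind a thin submission wrapper. A minimal sketch, assuming only the standard bsub (LSF) and qsub (PBS) command names; the queue name and options are illustrative, not a real site configuration.

```python
#!/usr/bin/env python
"""Thin wrapper that hides whether the local batch system is LSF or PBS.
Only the standard bsub/qsub command names are assumed; the queue name and
options are illustrative."""
import shutil
import subprocess
import sys

def submit(script, queue="short"):
    if shutil.which("bsub"):                       # LSF front end found
        cmd = ["bsub", "-q", queue, script]
    elif shutil.which("qsub"):                     # PBS front end found
        cmd = ["qsub", "-q", queue, script]
    else:
        raise RuntimeError("no supported batch system found")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.strip())
    return result.returncode

if __name__ == "__main__":
    sys.exit(submit(sys.argv[1]))                  # usage: submit.py job.sh
```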
Security
• Some sites are adopting formal Kerberos-based security schemes; you now need a Kerberos account or a security card to log in to FNAL.
• Elsewhere the usual procedures are in place – CRACK password checking (sketched below), firewalls, local security response teams, etc.
• Many sites, especially those that have been seriously hacked, forbid access from offsite with clear-text passwords.
• Smart cards and certificates are being used more and more.
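The idea behind CRACK-style password checking is to reject candidate passwords that would fall to a dictionary attack before they ever reach the password file. A minimal sketch, with a deliberately simplified rule set; the dictionary path is a common default, not a site standard.

```python
#!/usr/bin/env python
"""Reject weak passwords before they are set, in the spirit of CRACK-style
checking. The dictionary path and rules are simplified examples."""
import getpass

DICTIONARY = "/usr/share/dict/words"     # common location, may differ per site
MIN_LENGTH = 8

def load_dictionary():
    try:
        with open(DICTIONARY) as f:
            return {w.strip().lower() for w in f}
    except FileNotFoundError:
        return set()

def is_weak(password, words):
    p = password.lower()
    if len(password) < MIN_LENGTH:
        return "too short"
    if p in words or p.rstrip("0123456789") in words:
        return "dictionary word (possibly with digits appended)"
    if password.isdigit() or password.isalpha():
        return "single character class"
    return None

if __name__ == "__main__":
    words = load_dictionary()
    reason = is_weak(getpass.getpass("new password: "), words)
    print(f"rejected: {reason}" if reason else "accepted")
```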
Load Balancing
• For distributed application sharing, use remote file sharing or perform local node re-synchronisation?
• Link applications to libraries dynamically (usually the users' preference) or statically (normally the sysadmin's choice)?
• Frequent use of a cluster alias and DNS for load balancing; some quite clever selection algorithms are in use (a minimal one is sketched below).
• Delegate queue management to users – peer pressure works much better on abusers.
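The core of a DNS cluster alias is a periodic selection of the "best" nodes to publish. A minimal least-load sketch follows; the node names and the load metric (1-minute load average read over ssh) are illustrative, and real selectors also weigh memory, logged-in users and node health.

```python
#!/usr/bin/env python
"""Pick the least-loaded nodes to publish behind a DNS cluster alias.
Node names and the load metric are illustrative; real selectors use
richer health and load information."""
import subprocess

CANDIDATES = ["lxplus001", "lxplus002", "lxplus003"]   # hypothetical nodes
PUBLISH = 2                                            # alias size

def load_of(node):
    try:
        out = subprocess.run(["ssh", node, "cat", "/proc/loadavg"],
                             capture_output=True, text=True, timeout=5).stdout
        return float(out.split()[0])        # 1-minute load average
    except Exception:
        return float("inf")                 # unreachable nodes never selected

def select():
    ranked = sorted(CANDIDATES, key=load_of)
    return ranked[:PUBLISH]

if __name__ == "__main__":
    # The selected names would then be fed to the DNS server for the alias.
    print("publish behind alias:", ", ".join(select()))
```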
Future Plans and Meetings
• Complete more of the Guide from notes taken at the workshops.
• Release versions to get feedback from real experts.
• Large Cluster Workshops will be held alongside all or most HEPiX meetings, but the format of each may vary.
• After the Spring 2003 HEPiX at NIKHEF (Amsterdam), we will have 1.5 days of interactive workshops on installation, configuration management and monitoring, 22-23 May.
• The next one will be in late October at TRIUMF, Vancouver.
• To get on the Large System mailing list, contact me at alan.silverman@cern.ch.
References
Most of the overheads presented at the workshops can be found on the web site http://conferences.fnal.gov/lccws/, which also carries the full programmes and the Proceedings, with many links and references (including many links within the Proceedings) and some useful cluster links.
Other useful links for clusters:
• IEEE Cluster Task Force: http://www.ieeetfcc.org
• Top500 Clusters: http://clusters.top500.org