Configuration Life-Cycle Management on the TeraGrid
Ti Leggett
Challenges of Managing Computational Resources
• Software, hardware, and user needs change rapidly
• Maintaining uniform resources
• Handling one-offs
• Staying current with patches and security updates
• Documenting how and what machines run
Managing Configurations
• Unattended OS deployment
  • Jumpstart, Kickstart, YaST
• Cluster distributions
  • OSCAR, ROCKS
• Configuration management systems
  • Cfengine, LCFG, Bcfg2
UC/ANL Cluster Configuration Management
• A microcosm of machine classes
• Cluster goals are to maximize availability, predictability, and reliability
• Originally used SystemImager to duplicate similar classes
• Switched to Bcfg2 in early 2005
Cluster Uniformity
• Necessary for the user
• Necessary for the administrator
• UC/ANL has two compute classes and many management classes running two different OS versions
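As an illustration of what checking uniformity within a machine class can look like, here is a minimal Python sketch. It is not how Bcfg2 works internally; the hostnames, class names, package versions, and the `find_outliers` helper are all hypothetical. The idea is simply to fingerprint each node's package set and flag nodes that differ from the majority of their class.

```python
from collections import Counter


def find_outliers(nodes):
    """Return {class_name: [hostnames]} for nodes that differ from their class majority.

    `nodes` maps hostname -> (class_name, {package: version}).
    All names and versions here are made up for illustration.
    """
    by_class = {}
    for host, (cls, packages) in nodes.items():
        # A frozenset of (package, version) pairs acts as a hashable fingerprint.
        fingerprint = frozenset(packages.items())
        by_class.setdefault(cls, []).append((host, fingerprint))

    outliers = {}
    for cls, members in by_class.items():
        counts = Counter(fp for _, fp in members)
        majority, _ = counts.most_common(1)[0]
        odd = sorted(host for host, fp in members if fp != majority)
        if odd:
            outliers[cls] = odd
    return outliers


# Hypothetical inventory: one compute node runs a different kernel.
nodes = {
    "compute-a1": ("compute-a", {"kernel": "2.6.9", "openssh": "3.9"}),
    "compute-a2": ("compute-a", {"kernel": "2.6.9", "openssh": "3.9"}),
    "compute-a3": ("compute-a", {"kernel": "2.6.5", "openssh": "3.9"}),
    "login1": ("login", {"kernel": "2.6.9", "openssh": "3.9"}),
}

print(find_outliers(nodes))
```

A majority vote is only a heuristic. With a configuration management system, the written specification, not the majority, defines what "uniform" means.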
Security
• Performing security patches
• Auditing cluster status
• Updating machines after extended downtime or maintenance
• Aiding intrusion detection
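The auditing and intrusion-detection points can be pictured as a comparison between a declared specification and what a machine actually reports. The sketch below is an assumption-laden illustration in plain Python, not the Bcfg2 interface. The `audit` function, the package names, and the versions are hypothetical.

```python
def audit(spec, observed):
    """Compare a desired package spec against the observed state of one machine.

    Both arguments map package name -> version string (illustrative data only).
    Returns a report with three lists:
      missing        packages in the spec that are not installed
      wrong_version  (name, installed, wanted) tuples, e.g. unapplied patches
      unmanaged      installed packages the spec does not mention
    """
    report = {"missing": [], "wrong_version": [], "unmanaged": []}
    for name, want in sorted(spec.items()):
        have = observed.get(name)
        if have is None:
            report["missing"].append(name)
        elif have != want:
            report["wrong_version"].append((name, have, want))
    # Anything installed but not specified is worth a second look.
    report["unmanaged"] = sorted(set(observed) - set(spec))
    return report


# Hypothetical spec and machine state.
spec = {"openssh": "4.2", "sudo": "1.6.8"}
observed = {"openssh": "3.9", "netcat": "1.10"}

report = audit(spec, observed)
print(report)
```

In this sketch the `wrong_version` list corresponds to pending patches, for example on a machine returning from extended downtime, while the `unmanaged` list is the kind of unexpected change that can aid intrusion detection.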
Reusability
• Machine failures
  • Disk failures
  • Non-disk failures
• Machine replication
• New machines
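One way to picture why a specification makes rebuilt, replicated, and new machines cheap: a host's configuration is derived from the classes it belongs to, plus any host-specific one-offs. The following Python sketch assumes a simple "later entries win" merge. The class names, settings, and the `resolve` helper are hypothetical and do not reflect any particular tool's format.

```python
def resolve(host, host_classes, class_specs, overrides=None):
    """Build one host's configuration from its classes plus per-host overrides.

    host_classes: hostname -> ordered list of class names
    class_specs:  class name -> {setting: value}
    overrides:    hostname -> {setting: value} for one-off machines (optional)
    Later classes win over earlier ones, and overrides win over all classes.
    """
    config = {}
    for cls in host_classes[host]:
        config.update(class_specs[cls])
    config.update((overrides or {}).get(host, {}))
    return config


# Hypothetical classes and hosts.
class_specs = {
    "base": {"ntp": "enabled", "kernel": "2.6.9"},
    "compute": {"scheduler_client": "enabled"},
}
host_classes = {
    "compute-a1": ["base", "compute"],
    "compute-a9": ["base", "compute"],  # a newly added node needs only this line
}
overrides = {"compute-a9": {"kernel": "2.6.5"}}  # a one-off

print(resolve("compute-a1", host_classes, class_specs, overrides))
print(resolve("compute-a9", host_classes, class_specs, overrides))
```

Under these assumptions, replacing a failed machine or adding a new one means assigning it to existing classes rather than rebuilding its setup by hand.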
Specification as Documentation
• Dealing with administrator absences
• Using version control
• Teaching new administrators
• Dealing with already running and working machines
Future Work
• Reduce dependency on tape backups
• Integrate with tools such as Nagios, Nessus, and iptables
• Integrate with LDAP