O.S.C.A.R. Open Source Cluster Application Resources
Overview • What is O.S.C.A.R.? • History • Installation • Operation • Spin-offs • Conclusions
History • CCDK (Community Cluster Development Kit) • OCG (Open Cluster Group) • OSCAR (Open Source Cluster Application Resources) • IBM, Dell, SGI, and Intel working closely together • ORNL – Oak Ridge National Laboratory
First Meeting • Tim Mattson and Stephen Scott • They agreed on the following principles: • That the adoption of clusters for mainstream, high-performance computing is inhibited by a lack of well-accepted software stacks that are robust and easy to use by the general user. • That the group embraces the open-source model of software distribution. Anything contributed to the group must be freely distributable, preferably as source code under the Berkeley open-source license. • That the group can accomplish its goals by propagating best-known practices built up through many years of hard work by cluster computing pioneers.
Initial Thoughts • Differing architectures (small, medium, large) • Two paths of progress: R&D and ease of use • Aimed primarily at non-computer-savvy users • Scientists • Academics • Homogeneous systems
Timeline • Initial meeting in 2000 • Beta development started the same year • First distribution, OSCAR 1.0, released in 2001 at the LinuxWorld Expo in New York City • Today up to OSCAR 5.1 • Now supports heterogeneous systems • Far more robust • More user-friendly
Installation • Detailed installation notes • Detailed user guide • Basic idea: • Configure the head node (server) • Configure an image for the client nodes • Configure the network • Distribute the node images • Manage your own cluster!
Head Node • Install by running the ./install_cluster eth1 script (the argument is the network interface connected to the cluster) • The GUI will launch automatically • Choose the desired step in the GUI, and make sure each step is complete before proceeding to the next • All further configuration can be done from this system
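A minimal sketch of launching the head-node installation from a shell, assuming the OSCAR distribution has been unpacked under /opt/oscar (an assumed location) and that eth1 is the interface facing the cluster network:

  cd /opt/oscar            # assumed unpack location of the OSCAR distribution
  ./install_cluster eth1   # eth1 is assumed to be the cluster-facing NIC
  # The installation GUI launches automatically; work through its steps in order.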
Download • Subversion is used • The default is the OSCAR SVN repository • A custom SVN repository can be set up • Allows for up-to-date installations • Allows for controlled rollouts across multiple clusters • OPD (the OSCAR Package Downloader) also has powerful command-line functionality (LWP for proxy servers)
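A hedged sketch of running OPD from the command line behind a proxy. LWP honors the standard http_proxy environment variable, but the proxy address and the exact location of the opd script are assumptions and may differ by OSCAR version:

  export http_proxy=http://proxy.example.com:3128   # hypothetical proxy address
  export ftp_proxy=$http_proxy
  ./opd                                             # run from the OSCAR top-level directory (assumed)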
Select & Configure OSCAR Packages • Customize the server to your liking/needs • Some packages can be customized • This step is crucial: the choice of packages can affect performance as well as compatibility
Installation of Server Node • Simply installs the packages that were selected • Automatically configures the server node • The head (server) node is now ready to manage, administer, and schedule jobs for its client nodes
Build Client Image • Choose a name • Specify the packages within the package file • Specify the distribution • Be wary of the automatic reboot if network boot is manually selected as the default
Define Clients • This step creates the network structure of the nodes • It is advisable to assign IPs based on physical links • The GUI has shortcomings regarding multiple IP ranges • An incorrect setup can lead to errors during node installation
Setup Networking • SIS – System Installation Suite • SystemImager • MAC addresses are collected by scanning the network • Each MAC address must be linked to a node • A network boot method must be selected (rsync, multicast, bt) • Make sure clients support PXE boot, or create boot CDs • Your own kernel can be used if the one supplied with SIS does not work
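Two quick sanity checks before imaging, assuming SystemImager serves images over rsync (TCP port 873) and that dhcpd on the head node answers the clients' PXE requests; the service name is distribution-dependent:

  netstat -tln | grep ':873'    # the rsync daemon used by SystemImager should be listening
  service dhcpd status          # DHCP must be up for clients to PXE boot (service name assumed)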
Client Installation and Test • After the network is properly configured, installation can begin • All nodes are installed and rebooted • Once the system imaging is complete, a test can be run to ensure the cluster is working properly • At this point, the cluster is ready to begin parallel job scheduling
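A simple way to confirm the freshly imaged nodes are reachable before running the test step; the node count and naming scheme (oscarnode1 ... oscarnode8) are assumptions:

  for n in $(seq 1 8); do
      host="oscarnode${n}"                           # assumed naming scheme
      ping -c 1 -W 1 "$host" > /dev/null 2>&1 && echo "$host up" || echo "$host DOWN"
  done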
Operation • Admin packages are: • Torque Resource Manager • Maui Scheduler • C3 • pfilter • System Imager Suite • Switcher Environment Manager • OPIUM • Ganglia
Operation • Library packages: • LAM/MPI • OpenMPI • MPICH • PVM
Torque Resource Manager • Server on the head node • “mom” daemon on clients • Handles job submission and execution • Keeps track of cluster resources • Has its own scheduler but uses Maui by default • Commands are not intuitive; the documentation must be read • Derived from OpenPBS • http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.1.1_torque_overview
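A minimal Torque job script as a sketch; the resource requests are placeholder values, and the #PBS directives shown are standard Torque/PBS options:

  # hello.pbs - minimal Torque job script
  #PBS -N hello                 # job name
  #PBS -l nodes=2:ppn=2         # request 2 nodes, 2 processors per node (placeholders)
  #PBS -l walltime=00:05:00     # maximum run time
  cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
  echo "Running on:"
  cat $PBS_NODEFILE             # list of nodes Torque allocated

Submit it from the head node with qsub hello.pbs and monitor it with qstat -a.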
Maui Scheduler • Handles job scheduling • Sophisticated algorithms • Customizable • Much literature on its algorithms • Has a commercial successor called Moab • Accepted as the unofficial HPC standard for scheduling • http://www.clusterresources.com/pages/resources/documentation.php
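Typical Maui client commands for inspecting the queue; these tools ship with Maui, though their output varies by version and <jobid> is a placeholder:

  showq               # running, idle, and blocked jobs
  checkjob <jobid>    # detailed state of a single job
  showbf              # resources currently available for backfill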
C3 - Cluster Command Control • Developed by ORNL • Collection of tools for cluster administration • Commands: • cget, cpush, crm, cpushimage • cexec, cexecs, ckill, cshutdown • cnum, cname, clist • Cluster Configuration Files • http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.3.1_c3_overview
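A short sketch of the C3 tools in action, assuming the default cluster definition in /etc/c3.conf; the file paths are illustrative only:

  cexec uptime                   # run 'uptime' on every node
  cpush /etc/hosts /etc/hosts    # push the head node's hosts file to all nodes
  cget /var/log/messages /tmp/   # gather a file from every node (copies get a node suffix)
  clist                          # list the clusters defined in the configuration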
pfilter • Cluster traffic filter • By default, client nodes can only send outgoing communications outside the scope of the cluster • If it is desirable to open up the client nodes, the pfilter config file must be modified
System Imager Suite • Tool for network Linux installations • Image based; you can even chroot into an image • Also has a database which contains cluster configuration information • Tied in with C3 • Can handle multiple images per cluster • Completely automated once the image is created • http://wiki.systemimager.org/index.php/Main_Page
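As an example of working directly with an image, an administrator can chroot into it before (re)deployment; the path assumes SystemImager's default image directory and an image named oscarimage (both assumptions):

  chroot /var/lib/systemimager/images/oscarimage /bin/bash   # assumed image path
  # ...inspect or tweak the image, then exit and redeploy as usual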
Switcher Environment Manager • Handles “dot” files • Does not limit advanced users • Designed to help non-savvy users • Has guards in place that prevent system destruction • Selects which MPI to use on a per-user basis • Operates on two levels: user and system • The Modules package is included for advanced users (and used by switcher)
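A hedged example of switching a user's default MPI; the tag name depends on which MPI packages are installed, and the exact syntax may vary slightly by switcher version:

  switcher mpi --list            # show the available MPI implementations
  switcher mpi = openmpi-1.2.5   # set the user-level default (tag name is an assumption)
  switcher mpi --show            # confirm the setting; it takes effect at the next login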
OPIUM • Login is handled by the head node • Once a connection is established, client nodes do not require authentication • Synchronization is run by root at intervals • It stores hash values of the password in the .ssh folder along with a “salt” • Password changes must be made on the head node, as all changes propagate from there
Ganglia • Distributed Monitoring System • Low overhead per node • XML for data representation • Robust • Used in most cluster and grid solutions • http://ganglia.info/papers/science.pdf
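Since gmond publishes its state as XML on TCP port 8649 by default, a quick way to peek at the raw data (the head-node hostname is an assumption):

  nc headnode 8649 | head -n 40   # dump the first lines of the cluster-state XML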
LAM/MPI • LAM – Local Area Multicomputer • LAM initializes the runtime environment on a selected number of nodes • Implements MPI-1 and some of MPI-2 • MPICH2 can be used if installed • A two-tiered debugging system exists: snapshots and communication logs • Daemon based • http://www.lam-mpi.org/
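A minimal LAM/MPI session, assuming a boot schema file named hostfile listing the nodes and an already compiled MPI binary ./a.out (both assumptions):

  lamboot hostfile        # start the LAM runtime on the listed nodes
  mpirun -np 4 ./a.out    # launch 4 processes of the binary
  lamhalt                 # shut the runtime down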
Open MPI • Replacement for LAM/MPI • Developed by the same team • LAM/MPI is relegated to maintenance only; all new development goes into Open MPI • Much more robust (more operating systems and schedulers supported) • Full MPI-2 compliance • Much higher performance • http://www.open-mpi.org/
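The equivalent launch with Open MPI needs no separate boot step; the hostfile and binary names are the same assumptions as above:

  mpirun -np 4 --hostfile hostfile ./a.out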
PVM – Parallel Virtual Machine • Fills a similar role to LAM/MPI • Can be run outside the scope of Torque and Maui • Supports Windows nodes as well • Much better portability • Not as robust and powerful as Open MPI • http://www.csm.ornl.gov/pvm/
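A hedged sketch of a PVM console session adding a host and inspecting the virtual machine; the node name is an assumption:

  pvm                      # start the PVM console on the head node
  pvm> add oscarnode1      # add a host to the virtual machine
  pvm> conf                # list the current configuration
  pvm> halt                # shut the virtual machine down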
Spin-offs • HA-OSCAR - http://xcr.cenit.latech.edu/ha-oscar/ • VMware with OSCAR - http://www.vmware.com/vmtn/appliances/directory/341 • SSI-OSCAR - http://ssi-oscar.gforge.inria.fr/ • SSS-OSCAR - http://www.csm.ornl.gov/oscar/sss/
Conclusions • Future Direction • Open MPI • Windows, Mac OS?