Computing cluster at NCG
• Introduction
• Past upgrades
• Current state of the cluster
• Problems with the cluster
• Where to find information about the cluster
• Conclusion
Andrey Shevel@bnl.gov
Introduction
• The cluster was established at the end of 1999.
• It was initially set up and tuned by Jerome Lauret and Andrey Shevel.
• Initially there were 33 machines, each with a 500 MHz CPU and 256 MB of main memory, and around 1 TB of disk space (all disks were connected to a single RAID controller).
• The main machine was a Digital Alpha server.
• About 25 users were registered in the first year of operation (2000).
Past upgrades
• Over time, disk storage has increased fivefold.
• Computing power has increased at least threefold.
• The Alpha server has been retired; the main computer is now an Intel-based server (ram11).
• All file systems are on a separate disk controller.
• Many other improvements.
• All of the above has allowed us to operate for many years almost without support. I am proud to report this.
Current state of the cluster
Computing cluster problems
• Liquid leaking from the upper floor.
• The batteries in both UPSs have expired.
• The UPS automatic shutdown procedure is not working.
• No redundancy for the central machine (it has been affected by water several times in past years).
• The cluster needs to be checked almost every day (power, water, temperature, etc.).
• No remote access to the machine consoles.
• No remote control of electrical power.
• No policies (rules) on how to use the cluster resources.
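The daily checks of power, water, and temperature listed among the problems could in principle be automated. The following is only a minimal sketch of such a check, not a description of anything installed on the cluster: the `Reading` fields, the `check` function, and both thresholds are hypothetical, and how the readings would actually be obtained from the UPSs or room sensors is left open.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One snapshot of the room state (all fields are hypothetical)."""
    on_battery: bool        # True if the UPS reports running on battery
    battery_minutes: float  # estimated UPS runtime left, in minutes
    temperature_c: float    # room temperature in degrees Celsius
    water_detected: bool    # True if a leak sensor reports water


def check(reading, max_temp_c=30.0, min_battery_minutes=10.0):
    """Return a list of alert messages; an empty list means no problem found.

    The default thresholds are assumed values chosen for illustration only.
    """
    alerts = []
    if reading.water_detected:
        alerts.append("water detected near the machines")
    if reading.temperature_c > max_temp_c:
        alerts.append(
            f"temperature {reading.temperature_c:.1f} C exceeds {max_temp_c:.1f} C"
        )
    if reading.on_battery and reading.battery_minutes < min_battery_minutes:
        alerts.append("UPS on battery with little runtime left: shut down")
    return alerts


if __name__ == "__main__":
    # Example with made-up values: hot room, leak, and a nearly empty UPS.
    for message in check(Reading(True, 5.0, 35.0, True)):
        print(message)
```

Run periodically, a check of this kind would replace a daily visit to the room with a notification sent only when something is wrong.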
Upcoming upgrades
• First, move the cluster physically to another place in the same room - DONE
• Install all new machines (9 machines) - in progress
• Prepare an automatic procedure for installing the software - in progress
• Upgrade the version of SL to follow RACF (BNL) - in progress
Where to find information about the cluster
• General info about the cluster: http://ram3.chem.sunysb.edu/ramdata/news.shtml
• User mailing archive: https://ram3.chem.sunysb.edu/ramdata-news
• System mailing archive: https://ram3.chem.sunysb.edu/ramdata-system
The cluster role
• I think the role of the cluster is now even greater than at the beginning (more people are interested in how to use the cluster).
• For those who need a relatively small amount of computing power, the cluster is sufficient. For those who need huge computing power on larger remote clusters, the local cluster is a good gateway to them.
Conclusion
• Several steps must be taken to improve the situation:
• Find one or two volunteers to look after the cluster;
• Find a funding agency to which a new request for financial support for a cluster upgrade can be submitted.
• Perhaps we need to discuss how to use the cluster as the department's computing facility.