Implementation of Package Management in a Cluster Environment
So, Jung-ki
Distributed Computing System LAB, School of Computer Science and Engineering, Seoul National University
Introduction (1/2)
• Supercomputers
  • High-performance processors / high network bandwidth
  • Expensive systems, but Beowulf systems are cost-effective
• Motivation
  • Focus on cluster systems
  • Cluster management systems
    • Manual method / add-on method / integrated method
  • Registry
    • Central repository of information about all aspects of the computer
Introduction (2/2)
• Challenge
  • The integrated method has low availability and reliability
    • Computation nodes cannot be managed separately
    • When a failure occurs, the system cannot be rejuvenated
• Goal (using a registry)
  • Improve the availability and reliability of the integrated method
  • Let the administrator manage a cluster system easily
  • Restore the cluster system from a backup snapshot
Supercomputer: 60.8%
Domestic supercomputer quantity: 14
• Cluster: 4
• MPP: 4
• Constellation: 6
※ SNU: 2 (51/413)
Cluster Management System
• Manual approach
  • The system administrator brings up the entire system manually
• Add-on method
  • Bring up a frontend node, then add cluster packages
  • OSCAR / Warewulf / OpenMosix
• Integrated method
  • Cluster packages are installed and configured during the initial installation
  • Rocks / Scyld
Cluster Management System
• Software stack (top to bottom)
  • Parallel code / Grid / computer lab …
  • Message passing / communication layer
  • Job scheduling and launching
  • Cluster software management
  • Cluster state management / monitoring
  • HPC device drivers
  • Linux environment
  • Linux kernel
• Layer groups in the diagram: Application, HPC, SGE, Cluster Management System, OS (Linux)
Rocks Overview
• Identity
  • A system to build and manage Linux clusters
  • Free: open-source project
• Goal
  • Make clusters easy
• Philosophy
  • Computation nodes are installed 100% automatically
  • Roll: a set of packages
  • Graph / Kickstart
  • Runs on heterogeneous system architectures
  • Does not attempt to update software incrementally
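In Rocks, the kickstart file for an appliance is assembled by walking a graph whose vertices are XML "node" files (tools such as kpp and kgen do this work). The toy sketch below illustrates the idea only; the graph, the vertex names, and the per-vertex package lists are all hypothetical. Every vertex reachable from the appliance's root contributes its packages exactly once.

```python
from collections import deque

# Hypothetical graph: each vertex stands for one XML node file,
# and an edge means "this vertex includes that vertex".
GRAPH = {
    "compute": ["base", "hpc"],
    "frontend": ["base", "server"],
    "base": ["ssh"],
    "hpc": ["ssh", "mpi"],
    "server": ["dhcp"],
    "ssh": [], "mpi": [], "dhcp": [],
}

# Hypothetical packages contributed by each vertex.
PACKAGES = {
    "base": ["kernel"],
    "hpc": ["sge"],
    "ssh": ["openssh"],
    "mpi": ["mpich"],
    "server": ["mysql"],
    "dhcp": ["dhcp"],
}

def traverse(graph, root):
    """Return the vertices reachable from root, breadth-first, each visited once."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for child in graph.get(vertex, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def packages_for(appliance):
    """Collect, without duplicates, the packages of every reachable vertex."""
    result = []
    for vertex in traverse(GRAPH, appliance):
        for pkg in PACKAGES.get(vertex, []):
            if pkg not in result:
                result.append(pkg)
    return result

print(packages_for("compute"))  # ['kernel', 'sge', 'openssh', 'mpich']
```

Because "ssh" is shared by "base" and "hpc", the visited set ensures its package appears only once, which is how a shared subgraph can serve several appliance types.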
Rocks System
• Architecture
  • Front-end node: eth1 → Internet, eth0 → local network
  • Compute nodes (four shown in the diagram): eth0 → local network
What Is a Registry?
• Central repository of information about all aspects of the computer
  • Hardware, OS, application, and user information
• Function
  • Retrieve system information
  • Update / add / delete software
  • Back up & restore the system
• Advantage
  • Makes it easier for applications to access system information
  • Stores large amounts of structured data (system information)
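When the registry is a relational database, "back up & restore" amounts to copying the database. The minimal sketch below uses Python's built-in `sqlite3` module (its `Connection.backup` method) as a stand-in for the MySQL database Rocks actually uses; the single `packages` table and its contents are hypothetical.

```python
import sqlite3

def make_registry():
    """Create a tiny in-memory registry with one hypothetical table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (node TEXT, name TEXT, version TEXT)")
    db.execute("INSERT INTO packages VALUES ('compute-0-0', 'amanda', '2.4.5')")
    db.commit()
    return db

def snapshot(db):
    """Copy the whole registry into a separate database (the backup snapshot)."""
    snap = sqlite3.connect(":memory:")
    db.backup(snap)
    return snap

def restore(db, snap):
    """Overwrite the live registry with the contents of the snapshot."""
    snap.backup(db)

registry = make_registry()
snap = snapshot(registry)
registry.execute("DELETE FROM packages")  # simulate a damaging change
registry.commit()
restore(registry, snap)
print(registry.execute("SELECT node, name FROM packages").fetchall())
# [('compute-0-0', 'amanda')]
```

A real deployment would write the snapshot to disk (for example, with a MySQL dump) rather than keep it in memory; the in-memory target only keeps the sketch self-contained.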
Registry Design
• Original relational schema (H/W information)
  • Aliases: ID (primary key), Node, Name
  • Network: ID (primary key), Node, MAC, IP, Gateway, Name, Device, Module
  • Nodes: ID (primary key), Name, Membership, CPUs, Rack, Rank, Comment
  • Appliances: ID (primary key), Name, Graph, Node
  • Memberships: ID (primary key), Name, Appliance, Distribution
  • Distribution: ID (primary key), Name, Release, Lang
• Appended relation (S/W information)
  • Package: ID (primary key), Node, Name, Version, Release, Install
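The appended S/W relation links each package row to a node through its Node column. A minimal sketch of that link, using `sqlite3` in place of MySQL, with column names taken from the slide and types and sample rows assumed; `Release` is quoted because RELEASE is an SQL keyword.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Original relation (H/W information); column types are assumptions.
CREATE TABLE nodes (
    ID INTEGER PRIMARY KEY,
    Name TEXT, Membership INTEGER, CPUs INTEGER,
    Rack INTEGER, Rank INTEGER, Comment TEXT
);
-- Appended relation (S/W information): one row per package per node.
CREATE TABLE packages (
    ID INTEGER PRIMARY KEY,
    Node INTEGER REFERENCES nodes(ID),
    Name TEXT, Version TEXT, "Release" TEXT, Install INTEGER
);
""")
db.execute("INSERT INTO nodes (ID, Name, Membership, CPUs, Rack, Rank) "
           "VALUES (?, ?, ?, ?, ?, ?)", (1, "compute-0-0", 2, 1, 0, 0))
db.execute('INSERT INTO packages (Node, Name, Version, "Release", Install) '
           "VALUES (?, ?, ?, ?, ?)", (1, "amanda", "2.4.5", "2", 1))

def packages_on(db, node_name):
    """Return (name, version, release) for every package recorded on a node."""
    return db.execute(
        'SELECT p.Name, p.Version, p."Release" FROM packages p '
        "JOIN nodes n ON p.Node = n.ID WHERE n.Name = ?",
        (node_name,),
    ).fetchall()

print(packages_on(db, "compute-0-0"))  # [('amanda', '2.4.5', '2')]
```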
Strategy of Management
• Rocks setup
  • Minimum modification
    • Take advantage of the original Rocks system
    • Deploy the cluster system easily
  • Modify the related source code
    • insert-ethers, kickstart.cgi, Kpp, Kgen, Rgen
• Running system
  • Apply package modifications
  • Package management program: add / update / delete packages
  • DB consistency management program
Rgen
• Appended component
• Package variables
• Registry variables
• Collection method
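The slide names a collection method for Rgen's package variables without detailing it. One plausible approach, sketched here as an assumption and not as the authors' implementation, is to query each node's rpm database with `rpm -qa --queryformat` and parse out name, version, and release. `collect_installed_packages` only works on a machine with the `rpm` command; the sample input (one real package from the slides plus a hypothetical `example-pkg`) exercises the parser alone.

```python
import subprocess

# %{NAME}, %{VERSION}, %{RELEASE} are standard rpm query tags;
# the tab-separated layout is a choice made for this sketch.
QUERYFORMAT = "%{NAME}\t%{VERSION}\t%{RELEASE}\n"

def parse_rpm_listing(text):
    """Turn 'name<TAB>version<TAB>release' lines into sorted tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, version, release = line.split("\t")
        rows.append((name, version, release))
    return sorted(rows)

def collect_installed_packages():
    """Query the local rpm database (requires rpm on this node)."""
    out = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERYFORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rpm_listing(out)

sample = "example-pkg\t1.0\t1\namanda\t2.4.5\t2\n"
print(parse_rpm_listing(sample))
# [('amanda', '2.4.5', '2'), ('example-pkg', '1.0', '1')]
```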
Instruction: Add / Update / Delete
• add -c=compute-0-0 -i=amanda-2.4.5-2.i386
• add -c=all -i=all
• del -c=compute-0-0 -i=amanda-2.4.5-2.i386
• del -c=all -i=all
• Diagram: insert command; modification method (add / delete / update); compute nodes; packages table (package name / version / release)
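A minimal sketch of how instructions of the form on this slide could be parsed, assuming that `-c` names a node (or `all`), that `-i` names a package (or `all`), and that package names follow the usual rpm `name-version-release.arch` pattern. The function name and return shape are hypothetical; no `update` syntax is shown on the slide, so only `add` and `del` are accepted.

```python
import re

# e.g. "add -c=compute-0-0 -i=amanda-2.4.5-2.i386" or "del -c=all -i=all"
INSTRUCTION = re.compile(r"^(add|del)\s+-c=(\S+)\s+-i=(\S+)$")
# Assumed rpm naming: name-version-release.arch
PACKAGE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)

def parse_instruction(line, all_nodes):
    """Return (action, target_nodes, package), where package is a dict or 'all'."""
    m = INSTRUCTION.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized instruction: {line!r}")
    action, target, item = m.groups()
    nodes = list(all_nodes) if target == "all" else [target]
    if item == "all":
        return action, nodes, "all"
    p = PACKAGE.match(item)
    if p is None:
        raise ValueError(f"bad package name: {item!r}")
    return action, nodes, p.groupdict()

nodes = ["compute-0-0", "compute-0-1"]
print(parse_instruction("add -c=compute-0-0 -i=amanda-2.4.5-2.i386", nodes))
```

Expanding `all` against the node list kept in the registry means a single instruction can be fanned out to every compute node.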
Registry Consistency
• Setup time
  • When the frontend node removes / updates a computation node
    • Dependency: change node table → change package table
    • Modify kickstart.cgi / kgen
    • Apply cascading table changes
    ※ MySQL does not support the transaction property
• Running system
  • Package install / delete / update
  • Compute node rpm information must match the frontend node's registry DB
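The two consistency rules above can be sketched as follows, again with `sqlite3` standing in for MySQL and with hypothetical table layouts and function names. `remove_node` applies the cascade in program code (package rows, then the node row) instead of relying on a database-level `ON DELETE CASCADE`; `registry_mismatch` compares a node's rpm package names with what the registry records.

```python
import sqlite3

def make_registry():
    """Hypothetical minimal node and package tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE packages (ID INTEGER PRIMARY KEY, Node INTEGER, Name TEXT);
    """)
    return db

def remove_node(db, node_name):
    """Delete a node and, explicitly, the package rows that depend on it."""
    row = db.execute("SELECT ID FROM nodes WHERE Name = ?", (node_name,)).fetchone()
    if row is None:
        return
    db.execute("DELETE FROM packages WHERE Node = ?", (row[0],))
    db.execute("DELETE FROM nodes WHERE ID = ?", (row[0],))
    db.commit()

def registry_mismatch(db, node_name, rpm_names):
    """Return (missing_from_registry, stale_in_registry) for one node."""
    recorded = {name for (name,) in db.execute(
        "SELECT p.Name FROM packages p JOIN nodes n ON p.Node = n.ID "
        "WHERE n.Name = ?", (node_name,))}
    actual = set(rpm_names)
    return sorted(actual - recorded), sorted(recorded - actual)

db = make_registry()
db.execute("INSERT INTO nodes VALUES (1, 'compute-0-0')")
db.execute("INSERT INTO packages VALUES (1, 1, 'amanda')")
print(registry_mismatch(db, "compute-0-0", ["amanda", "example-pkg"]))
# (['example-pkg'], [])
remove_node(db, "compute-0-0")
print(db.execute("SELECT COUNT(*) FROM packages").fetchone())  # (0,)
```

A non-empty result from `registry_mismatch` indicates the node and the frontend registry have diverged and that the node's rows should be resynchronized.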
Experiment Data
Name        Capacity  Volume
amanda      468 KB    3
HPC         117 MB    53
Rocks roll  1.5 GB    479

Experiment Setup
• Network: public Ethernet
• Frontend node (Rocks.snu.ac.kr): CPU 800 MHz, RAM 768 MB, HDD 40 GB
• Compute nodes (14, Compute-0-(1~14)): CPU 850 MHz, RAM 1 GB, HDD 10 GB
Original Rocks Evaluation
• Average service time: 18 min 14 s
• Average transmit time: 11 min 28 s
• Diagram label: network card DHCP request
Amanda Packages Evaluation
• Average install time: 6.62 s
• Average delete time: 5.57 s
HPC Roll Evaluation
• Average install time: 3 min 38 s
• Average delete time: 1 min 18 s
Conclusion
• A registry brings its advantages to cluster systems
• Improves availability and reliability using the registry
• Lets the administrator manage cluster systems easily
• Restores cluster systems from backup snapshots
Q & A
Questions or comments?
Thank you!