EDG WP4: installation task
LSCCW/HEPiX hands-on, NIKHEF 5/03
German Cancio, CERN IT/FIO
http://cern.ch/wp4-install
Agenda
Part 1:
• General architectural overview
• Component descriptions and current status
Part 2:
• Exercises on software distribution
Part 3:
• Discussion: differences from other solutions (if time permits)
Disclaimer
• This is not a repetition of the WP4 LCFGng tutorial given last year at CERN. I will describe the proposed replacement for LCFG, developed by EDG WP4-install.
• This is work in progress. Most of the subsystems presented here are currently under design or development, although some have already been deployed at CERN.
• There are fewer practical exercises than theory slides ;-(
• Your feedback is a most welcome source of improvements!
EDG WP4: reminder
• WP4 is the 'fabric management' work package of the EU DataGrid project.
• Objective:
  • To develop system management tools for enabling the deployment of very large computing fabrics […] with reduced sysadmin and operation costs.
• Installation task: solutions for
  • automated from-scratch node installation
  • node configuration/reconfiguration
  • software storage, distribution and installation
• Configuration task: solutions for
  • storing, maintaining and retrieving configuration information.
WP4-install architecture
(Diagram: the CDB feeds the installation server and, via CCM profiles, the client nodes; packages in RPM or PKG format flow from the SWRep servers to the SPMA over HTTP, NFS/AFS or FTP, optionally through a local cache; an illustrative profile sketch follows below.)
• Software Repository (SWRep)
  • Packages (in RPM or PKG format) can be uploaded into multiple Software Repositories
  • Client access is via HTTP, NFS/AFS or FTP
  • Management access is subject to authentication/authorization (ACLs)
• Software Package Management Agent (SPMA)
  • SPMA manages the installed packages
  • Runs on Linux (RPM) or Solaris (PKG)
  • SPMA configuration done via an NCM component
  • Can use a local cache for pre-fetching packages (simultaneous upgrades of large farms)
• Node Configuration Manager (NCM)
  • Configuration management on the node is done by NCM components
  • Each component is responsible for configuring a service (network, NFS, sendmail, PBS)
  • Components are notified by the Cdispd whenever there is a change in their configuration
• Automated Installation Infrastructure (AII)
  • DHCP and KickStart (or JumpStart) configurations are regenerated according to CDB contents
  • PXE can be set by the operator to reboot or reinstall a node
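Configuration flows from the CDB to each node as an XML profile fetched by the CCM. As a minimal sketch of what such a profile might contain (the element names and layout here are assumptions for illustration, not the actual CDB schema):

    <?xml version="1.0" encoding="utf-8"?>
    <!-- hypothetical node profile: all element names are illustrative only -->
    <profile node="node01">
      <software>
        <repository>http://swrep.example.org/packages</repository>
        <package name="openssh" version="3.1p1-14" arch="i386"/>
      </software>
      <components>
        <!-- e.g. configure network before sendmail -->
        <component name="sendmail" dependencies="network"/>
      </components>
    </profile>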
AII (Automated Installation Infrastructure)
• Subsystem to automate the node base installation via the network
• Layer on top of existing technologies (base system installer, DHCP, PXE)
• Modules:
  • AII-dhcp:
    • manages the DHCP server entries needed for network installation
  • AII-nbp (network bootstrap program):
    • manages the PXE configuration for each node (boot from HD / start the installation via network); see the sketch below
  • AII-osinstall:
    • manages the OS configuration files required by the OS installation procedure (KickStart, JumpStart)
• More details in the AII design document: http://edms.cern.ch/document/374559
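As an illustration of what AII-nbp manages, here is a minimal PXELINUX configuration entry of the kind that could be generated per node to switch it between "start the installation via network" and "boot from HD". The file path and kickstart URL are assumptions:

    # /tftpboot/pxelinux.cfg/C0A80A01 -- one file per node, named after its
    # IP address in hex (path and naming are assumptions)
    DEFAULT install
    LABEL install
      KERNEL vmlinuz
      APPEND initrd=initrd.img ks=http://install.example.org/ks/node01.cfg

    # after a successful install, AII-nbp would switch the entry back to
    # booting from the local disk:
    # DEFAULT local
    # LABEL local
    #   LOCALBOOT 0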
AII: current status
• Architectural design finished
• Detailed design and implementation progressing
• First alpha version expected mid-July
Node Configuration Management (NCM)
• Client software running on the node which takes care of 'implementing' what is in the configuration profile
• Modules:
  • 'Components'
  • Invocation and notification framework
  • Component support libraries
NCM: Components
• 'Components' (like SUE 'features' or LCFG 'objects') are responsible for updating local config files, and for notifying services if needed
• Components register their interest in configuration entries or subtrees, and get invoked in case of changes
• Components only configure the system
  • Usually, this implies regenerating and/or updating local config files (e.g. /etc/sshd_config)
  • Standard system facilities (SysV scripts) are used for managing services
• Components can notify services via SysV scripts when their configuration changes
• It is possible to define configuration dependencies between components
  • E.g. configure network before sendmail
Component example

sub Configure {
    my ($self) = @_;

    # access configuration information (NVA API)
    my $config = NVA::Config->new();
    my $arch = $config->getValue('/system/architecture');
    $self->Fail("not supported") unless ($arch eq 'i386');

    # (re)generate and/or update local config file(s)
    open (MYCONFIG, '>', '/etc/myconfig');
    …

    # notify affected (SysV) services if required
    if ($changed) {
        system('/sbin/service myservice reload');
        …
    }
}
NCM (contd.)
• cdispd (Configuration Dispatch Daemon)
  • Monitors the config profile, and invokes components via the ncd if there were changes
• ncd (Node Configuration Deployer):
  • Framework and front-end for executing components (via cron, cdispd, or manually)
  • Dependency ordering of components (see the sketch below)
• Component support libraries:
  • For recurring system management tasks (interfaces to system services, system information), log handling, etc.
• More details in the NCM design document: http://edms.cern.ch/document/372643
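A minimal sketch of the kind of dependency ordering the ncd has to perform before running components, assuming each component lists the components it depends on (the data structure and function names are illustrative, not the actual ncd code):

    # hypothetical sketch: run components so that dependencies come first
    use strict;
    use warnings;

    my %deps = (
        network  => [],
        sendmail => ['network'],   # e.g. configure network before sendmail
        nfs      => ['network'],
    );

    my (%seen, @order);
    sub visit {
        my ($c) = @_;
        return if $seen{$c}++;               # each component runs only once
        visit($_) for @{ $deps{$c} || [] };  # dependencies first (no cycle check, for brevity)
        push @order, $c;
    }
    visit($_) for sort keys %deps;
    print "run order: @order\n";   # -> network nfs sendmail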
NCM: Status
• Architectural design finished
• Detailed (class) design progressing
• First version expected mid-July
• Porting/coding of base configuration components to be completed by mid-September
• More than 60 components need to be ported for a complete EDG solution (configuring all EDG middleware services)!
• Pilot deployment on the CERN central interactive/batch facilities expected at the end of the year
SPM (Software Package Mgmt) (I)
SWRep (Software Repository):
• Client-server tool suite for the management of software packages
• Universal repository:
  • Extendable to multiple platforms and package formats (RH Linux/RPM, Solaris/PKG, … others like Debian dpkg)
  • Multiple package versions/releases
• Management ('product maintainers') interface:
  • ACL-based mechanism to grant/deny modification rights (packages associated to 'areas')
  • Current implementation uses SSH
• Client access: via standard protocols
  • HTTP (scalability), but also AFS/NFS, FTP (see the sketch below)
• Replication: using standard tools (e.g. rsync)
  • Availability, load balancing
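Because client access goes over standard protocols, fetching a package from a SWRep server needs no special client software. A minimal sketch over HTTP; the host name, repository layout and cache path are assumptions:

    # hypothetical example: fetch a package from a SWRep server over HTTP
    use LWP::Simple;   # exports getstore() and is_success()

    my $url = 'http://swrep.example.org/packages/redhat73/openssh-3.1p1-14.i386.rpm';
    my $rc  = getstore($url, '/var/spool/spma-cache/openssh-3.1p1-14.i386.rpm');
    die "download failed (HTTP $rc)\n" unless is_success($rc);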
SPM (Software Package Mgmt) (II)
Software Package Management Agent (SPMA):
• Runs on every target node
• Multiple repositories can be accessed (e.g. division- or experiment-specific)
• Plug-in framework allows for portability
  • System-packager-specific transactional interface (RPMT, PKGT)
• Can manage either all or a subset of the packages on a node
  • Useful for add-on installations, and also for desktops
  • Configurable policies (partial or full control, mandatory and unwanted packages, conflict resolution, …); see the sketch below
• Addresses scalability
  • Packages can be stored ahead of time in a local cache, avoiding peak loads on the software repository servers (simultaneous upgrades of large farms)
  • The HTTP protocol allows the use of web proxy hierarchies
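To make the policy idea concrete, a hypothetical fragment of the local SPMA configuration file; every key name here is invented for illustration, the real file format is not shown in this talk:

    # hypothetical SPMA.cfg sketch -- all key names are illustrative only
    policy     = partial        # manage only the listed subset; leave other packages alone
    repository = http://swrep.example.org/packages
    localcache = /var/spool/spma-cache
    mandatory  = openssh-3.1p1-14, perl-5.6.1-3
    unwanted   = lynx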
SPM (Software Package Mgmt) (III)
• SPMA functionality (sketched below):
  • Compares the packages currently installed on the local node with the packages listed in the configuration
  • Computes the necessary install/deinstall/upgrade operations
  • Invokes the packager (rpmt/pkgt) with the resulting transaction set
• The SPMA is driven via a local configuration file
  • For batch nodes/servers: an NCM component generates/maintains this file from CDB information
  • For desktops: a GUI for locally editing the file is possible
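A minimal sketch of the comparison step, assuming the installed and desired package lists are keyed by name with a version-release string (illustrative only, not the actual SPMA code):

    # hypothetical sketch: derive the transaction set for rpmt/pkgt
    use strict;
    use warnings;

    my %installed = (openssh => '3.0p1-1',   lynx => '2.8.4-1');
    my %desired   = (openssh => '3.1p1-14',  perl => '5.6.1-3');

    my @ops;
    for my $pkg (keys %desired) {
        if (!exists $installed{$pkg}) {
            push @ops, "install $pkg-$desired{$pkg}";
        } elsif ($installed{$pkg} ne $desired{$pkg}) {
            push @ops, "upgrade $pkg-$desired{$pkg}";   # a downgrade would be computed the same way
        }
    }
    for my $pkg (keys %installed) {
        push @ops, "deinstall $pkg" unless exists $desired{$pkg};
    }
    print "$_\n" for @ops;   # the whole set is handed to rpmt/pkgt as one transaction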
Software Package Manager (SPM): RPMT
• RPMT (RPM transactions) is a small tool on top of the RPM libraries which, unlike plain RPM, allows multiple simultaneous package operations while resolving dependencies
  • Example: 'upgrade X, deinstall Y, downgrade Z, install T', verifying/resolving the appropriate dependencies
  • Uses basic RPM library calls, no added intelligence
• Ports available for RPM 3 and 4.0.x
  • Will try to feed back to the RPM user community after porting to RPM 4.2
• CERN IT/PS is working on an equivalent Solaris port (PKGT)
SWRep/SPMA architecture
(Diagram: management APIs and a CLI/GUI feed repositories A and B, which are kept in sync from a package inventory; on each client node the SPMA, driven by an SPMA.cfg generated via NCM from the CDB or edited via a GUI, pulls packages (RPM, PKG) into a local cache over HTTP (optionally through an HTTP proxy), FTP, NFS or AFS, and invokes rpmt.)
SPMA & SWRep: current status • First production version available • Being deployed in the CERN Computer Centre (next slide) • Enhanced functionality (package cache management) for mid-October • Solaris port progressing (cf. M. Guijarro’s talk)
SPMA/SWRep deployment @ CERN CC
• Started phasing out legacy software distribution systems (including ASIS) on the central batch/interactive servers (LXPLUS & LXBATCH)
  • Using HTTP as the package access protocol (scalability)
  • > 400 nodes currently running it in production
  • Deployment page: http://cern.ch/wp4-install/CERN/deploy
• Server clustering solution
  • For CDB (XML profiles) and SWRep (RPMs over HTTP)
  • Replication done with rsync
  • Load balancing done with simple DNS round-robin
  • Currently 3 servers in production (800 MHz, 500 MB RAM, Fast Ethernet), giving ~3 × 12 MByte/s throughput
  • Future: may include the use of hierarchical web proxies (e.g. squid)