
EDG WP4: installation task

LSCCW/HEPiX hands-on, NIKHEF 5/03. German Cancio, CERN IT/FIO. http://cern.ch/wp4-install


Presentation Transcript


  1. EDG WP4: installation task • LSCCW/HEPiX hands-on, NIKHEF 5/03 • German Cancio, CERN IT/FIO • http://cern.ch/wp4-install

  2. Agenda Part 1: • General architectural overview • Components description and current status Part 2: • Exercises on software distribution Part 3: • Discussion: differences to other solutions (if time permits)

  3. Disclaimer • This is not a repetition of the WP4 LCFGng tutorial given last year at CERN. I will describe the proposed replacement for LCFG, developed by EDG WP4-install. • This is a work in progress. Most of the subsystems presented here are currently under design/development, although some have already been deployed at CERN. • There are fewer practical exercises than theory slides ;-( • Your feedback is a most welcome source of improvements!

  4. EDG WP4: reminder • WP4 is the ‘fabric management’ work package of the EU DataGrid project. • Objective: to develop system management tools for enabling the deployment of very large computing fabrics […] with reduced sysadmin and operation costs. • Installation task: solutions for • automated from-scratch node installation • node configuration/reconfiguration • software storage, distribution and installation • Configuration task: solutions for • storing, maintaining and retrieving configuration information.

  5. WP4-install architecture
     Software Package Mgmt Agent (SPMA) • SPMA manages the installed packages • Runs on Linux (RPM) or Solaris (PKG) • SPMA configuration is done via an NCM component • Can use a local cache for pre-fetching packages (simultaneous upgrades of large farms)
     Automated Installation Infrastructure • DHCP and Kickstart (or JumpStart) configurations are regenerated according to CDB contents • The operator can set PXE to reboot or reinstall a node
     Software Repository • Packages (in RPM or PKG format) can be uploaded into multiple Software Repositories • Client access uses HTTP, NFS/AFS or FTP • Management access is subject to authentication/authorization
     Node Configuration Manager (NCM) • Configuration management on the node is done by NCM components • Each component is responsible for configuring a service (network, NFS, sendmail, PBS) • Components are notified by the Cdispd whenever their configuration changes
     [Architecture diagram; labels: SWRep Servers, Packages (rpm, pkg), packages (RPM, PKG), Mgmt API, ACL’s, http, nfs, ftp, cache, SPMA, SPMA.cfg, NCM, NCM Components, Cdispd, CCM, CDB, Installation server, PXE, PXE handling, DHCP, DHCP handling, KS/JS, KS/JS generator, Registration, Notification, Node Install, Node (re)install?, Client Nodes]

  6. Base installation (AII)

  7. AII (Automated Installation Infrastructure) • Subsystem to automate the node base installation via the network • Layer on top of existing technologies (base system installer, DHCP, PXE) • Modules: • AII-dhcp: manages the DHCP server's network installation information • AII-nbp (network bootstrap program): manages the PXE configuration for each node (boot from hard disk, or start the installation via the network) • AII-osinstall: manages the OS configuration files required by the OS installation procedure (KickStart, JumpStart) • More details in the AII design document: http://edms.cern.ch/document/374559
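The per-node choice that AII-nbp manages (boot from hard disk, or start a network installation) can be illustrated with a small Python sketch that writes a pxelinux-style configuration. This is not AII code: the function name `pxe_config`, the default kernel and initrd file names, and the example Kickstart URL are assumptions made for illustration, and Python is used here only for readability.

```python
def pxe_config(action, kernel="vmlinuz", initrd="initrd.img", ks_url=None):
    """Return pxelinux configuration text for a single node.

    action is "boot" (boot from the local disk) or "install"
    (start a Kickstart installation over the network).
    All names and defaults here are illustrative assumptions.
    """
    if action == "boot":
        # LOCALBOOT 0 tells pxelinux to continue with the local disk.
        return "default local\nlabel local\n  localboot 0\n"
    if action == "install":
        if ks_url is None:
            raise ValueError("install requires a Kickstart URL")
        # ks=<url> points the Red Hat installer at its Kickstart file.
        return (
            "default install\n"
            "label install\n"
            f"  kernel {kernel}\n"
            f"  append initrd={initrd} ks={ks_url}\n"
        )
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    print(pxe_config("boot"))
    print(pxe_config("install", ks_url="http://example.org/ks/node1.cfg"))
```

Switching a node between the two states then amounts to rewriting one small file on the installation server, which is why an operator can trigger a reinstall without touching the node itself.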

  8. AII: current status • Architectural design finished • Detailed Design, implementation progressing • first alpha version expected mid July

  9. Node Configuration (NCM)

  10. Node Configuration Management (NCM) • Client software running on the node which takes care of “implementing” what is in the configuration profile • Modules: • “Components” • Invocation and notification framework • Component support libraries

  11. NCM: Components • “Components” (like SUE “features” or LCFG “objects”) are responsible for updating local config files and notifying services if needed • Components register their interest in configuration entries or subtrees, and are invoked when these change • Components only configure the system • Usually, this means regenerating and/or updating local config files (e.g. /etc/sshd_config) • Services are managed through standard system facilities (SysV scripts); components can use these scripts to notify services when their configuration changes • Configuration dependencies between components can be defined • E.g. configure network before sendmail
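The registration idea (components declare interest in configuration entries or subtrees and are invoked only when something underneath changes) can be sketched as follows. This is an illustrative Python model, not the NCM implementation: the class `Dispatcher`, its methods, the helper `changed_paths` and every configuration path except /system/architecture are hypothetical.

```python
_MISSING = object()


def changed_paths(old, new):
    """Return the sorted paths whose value differs between two flat profiles."""
    keys = set(old) | set(new)
    return sorted(
        k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING)
    )


def _under(path, prefix):
    """True if path equals prefix or lies in the subtree below it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


class Dispatcher:
    """Maps components to the configuration subtrees they care about."""

    def __init__(self):
        self._interests = {}  # component name -> list of path prefixes

    def register(self, component, *paths):
        self._interests.setdefault(component, []).extend(paths)

    def affected(self, changed):
        """Return the sorted names of components touched by the changes."""
        hit = set()
        for component, prefixes in self._interests.items():
            if any(_under(p, pre) for p in changed for pre in prefixes):
                hit.add(component)
        return sorted(hit)


if __name__ == "__main__":
    d = Dispatcher()
    d.register("network", "/system/network")
    d.register("sendmail", "/software/components/sendmail")
    old = {"/system/network/hostname": "a", "/system/architecture": "i386"}
    new = {"/system/network/hostname": "b", "/system/architecture": "i386"}
    print(d.affected(changed_paths(old, new)))
```

Matching on whole path segments (rather than a plain string prefix) keeps a component registered for /system/network from being woken by a change under an unrelated sibling such as /system/networking.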

  12. Component example

        sub Configure {
            my ($self) = @_;

            # access configuration information
            my $config = NVA::Config->new();
            my $arch   = $config->getValue('/system/architecture');  # NVA API
            $self->Fail("not supported") unless ($arch eq 'i386');

            # (re)generate and/or update local config file(s)
            open (myconfig, '/etc/myconfig');
            # …

            # notify affected (SysV) services if required
            if ($changed) {
                system('/sbin/service myservice reload');
                # …
            }
        }

  13. NCM (contd.) • cdispd (Configuration Dispatch Daemon) • Monitors the config profile, and invokes components via the ncd if there were changes • ncd (Node Configuration Deployer): • framework and front-end for executing components (via cron, cdispd, or manually) • Dependency ordering of components • Component support libraries: • For recurring system mgmt tasks (interfaces to system services, sysinfo), log handling, etc • More details in NCM design document http://edms.cern.ch/document/372643
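Dependency ordering of components, as performed by the ncd, is in essence a topological sort. The sketch below shows one way to compute such an order in Python (Kahn's algorithm with alphabetical tie-breaking). It is an illustration only: the function name `run_order` and the input format are assumptions, and the slides do not say which algorithm the ncd actually uses.

```python
import heapq


def run_order(deps):
    """Return components in an order that respects their dependencies.

    deps maps a component name to the set of components that must be
    configured before it. Raises ValueError on a dependency cycle.
    """
    nodes = set(deps)
    for pres in deps.values():
        nodes.update(pres)

    remaining = {n: set(deps.get(n, ())) for n in nodes}
    dependents = {n: [] for n in nodes}
    for n, pres in remaining.items():
        for p in pres:
            dependents[p].append(n)

    # Components with no outstanding prerequisites are ready to run.
    ready = [n for n, pres in remaining.items() if not pres]
    heapq.heapify(ready)

    order = []
    while ready:
        n = heapq.heappop(ready)  # alphabetical among ready components
        order.append(n)
        for d in dependents[n]:
            remaining[d].discard(n)
            if not remaining[d]:
                heapq.heappush(ready, d)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle among components")
    return order


if __name__ == "__main__":
    print(run_order({"sendmail": {"network"}, "nfs": {"network"}}))
```

With the "configure network before sendmail" example from the slides, network always comes out ahead of sendmail; a cycle is reported as an error rather than silently broken.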

  14. NCM architecture (from design doc.)

  15. NCM: Status • Architectural design finished • Detailed (class) design progressing • First version expected mid July • Porting/coding of base configuration components to be completed by mid September • More than 60 components must be ported for a complete EDG solution (configuring all EDG middleware services)! • Pilot deployment on CERN central interactive/batch facilities expected at the end of the year

  16. Software Distribution (SWRep and SPMA)

  17. SPM (Software Package Mgmt) (I) SWRep (Software Repository): • Client-server tool suite for the management of software packages • Universal repository: • Extensible to multiple platforms and package formats (RHLinux/RPM, Solaris/PKG, … others like Debian dpkg) • Multiple package versions/releases • Management (“product maintainers”) interface: • ACL-based mechanism to grant/deny modification rights (packages are associated with “areas”) • Current implementation uses SSH • Client access: via standard protocols • HTTP (scalability), but also AFS/NFS, FTP • Replication: using standard tools (e.g. rsync) • Availability, load balancing

  18. SPM (Software Package Mgmt) (II) Software Package Management Agent (SPMA): • Runs on every target node • Multiple repositories can be accessed (e.g. division/experiment specific) • Plug-in framework allows for portability • System-packager-specific transactional interface (RPMT, PKGT) • Can manage either all or a subset of the packages on the nodes • Useful for add-on installations, and also for desktops • Configurable policies (partial or full control, mandatory and unwanted packages, conflict resolution…) • Addresses scalability • Packages can be pre-fetched into a local cache, avoiding peak loads on software repository servers (simultaneous upgrades of large farms) • HTTP allows the use of web proxy hierarchies

  19. SPM (Software Package Mgmt) (III) • SPMA functionality: • Compares the packages currently installed on the local node with the packages listed in the configuration • Computes the necessary install/deinstall/upgrade operations • Invokes the packager (rpmt/pkgt) with the right operation transaction set • The SPMA is driven via a local configuration file • For batch nodes/servers: an NCM component generates and maintains this configuration file from CDB information • For desktops: it is possible to write a GUI for editing the configuration file locally
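The comparison step can be illustrated with a minimal Python sketch that derives an operation list from the installed and desired package sets. It is not SPMA code: the function name `plan`, the dictionary input format and the package names are assumptions. It models the "full control" policy only (anything not listed is removed), and it labels a version change as "replace" because deciding between upgrade and downgrade would require RPM's version comparison rules, which are not attempted here.

```python
def plan(installed, desired):
    """Compute the operations that turn `installed` into `desired`.

    Both arguments map package name -> version string.
    Returns a list of (operation, name, version) tuples sorted by name.
    """
    ops = []
    for name in sorted(set(installed) | set(desired)):
        have = installed.get(name)
        want = desired.get(name)
        if want is None:
            ops.append(("deinstall", name, have))
        elif have is None:
            ops.append(("install", name, want))
        elif have != want:
            # Upgrade or downgrade: needs real version comparison to tell.
            ops.append(("replace", name, want))
    return ops


if __name__ == "__main__":
    installed = {"openssh": "3.4p1", "emacs": "21.2", "foo": "1.0"}
    desired = {"openssh": "3.5p1", "emacs": "21.2", "bar": "2.0"}
    for op in plan(installed, desired):
        print(op)
```

The resulting list corresponds to the "operation transaction set" handed to the packager in one go, so that dependencies are checked against the final state rather than after each individual step.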

  20. Software Package Manager (SPM): RPMT • RPMT (RPM transactions) is a small tool on top of the RPM libraries that allows multiple simultaneous package operations while resolving dependencies (unlike RPM) • Example: ‘upgrade X, deinstall Y, downgrade Z, install T’, verifying/resolving the appropriate dependencies • Uses only basic RPM library calls; no added intelligence • Ports available for RPM 3 and 4.0.X • We will try to feed it back to the RPM user community after porting to RPM 4.2 • CERN IT/PS is working on an equivalent Solaris port (PKGT)

  21. SWRep/SPMA architecture [Diagram; labels: Repository A, Repository B, Packages, packages (RPM, PKG), Mgmt API, CLI, GUI, inventory, CDB, NCM/GUI, config, SPMA.cfg, http, ftp, nfs, afs, (HTTP Proxy), cache, SPMA, rpmt, Client nodes]

  22. SPMA & SWRep: current status • First production version available • Being deployed in the CERN Computer Centre (next slide) • Enhanced functionality (package cache management) for mid-October • Solaris port progressing (cf. M. Guijarro’s talk)

  23. SPMA/SWRep deployment @ CERN CC • Started phasing out legacy SW distribution systems (including ASIS) on the central batch/interactive servers (LXPLUS & LXBATCH) • Using HTTP as package access protocol (scalability) • > 400 nodes currently running it in production • Deployment page: http://cern.ch/wp4-install/CERN/deploy • Server clustering solution • For CDB (XML profiles) and SWRep (RPMs over HTTP) • Replication done with rsync • Load balancing done with simple DNS round-robin • Currently, 3 servers in production (800 MHz, 500 MB RAM, FastEthernet) giving ~3 × 12 MB/s throughput • Future: may include usage of hierarchical web proxies (e.g. using squid)
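The throughput figures can be put in perspective with a little arithmetic, sketched here in Python. The server count, the per-server rate and the node count come from the slide; the 100 MB upgrade size per node is purely an assumption chosen for illustration, and the function names are hypothetical.

```python
def aggregate_mb_s(servers, per_server_mb_s):
    """Total throughput when load is spread evenly over the servers."""
    return servers * per_server_mb_s


def fast_ethernet_ceiling_mb_s():
    """Raw FastEthernet line rate: 100 Mbit/s expressed in MB/s."""
    return 100 / 8


def seconds_to_serve(nodes, mb_per_node, total_mb_s):
    """Time to deliver mb_per_node to every node at the aggregate rate."""
    return nodes * mb_per_node / total_mb_s


if __name__ == "__main__":
    total = aggregate_mb_s(3, 12)          # 36 MB/s
    print(total, fast_ethernet_ceiling_mb_s())
    # Assumed: each of 400 nodes pulls a 100 MB upgrade at the same time.
    print(round(seconds_to_serve(400, 100, total) / 60, 1), "minutes")
```

Since 12 MB/s is close to the 12.5 MB/s FastEthernet ceiling, each server is essentially network-bound, which is consistent with the slide's interest in local caches and proxy hierarchies for simultaneous upgrades of large farms.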

  24. Questions / comments?
