
What is Fabric Management?

Fabric Management. CCDB2 RTAG, April 23rd 2002. Tony.Cass@CERN.ch, with much help from German Cancio Melia.


Presentation Transcript


  1. Fabric Management. CCDB2 RTAG, April 23rd 2002. Tony.Cass@CERN.ch, with much help from German Cancio Melia

  2. What is Fabric Management? Maintaining large clusters of servers in specific desired state(s)

  3. What does this mean/involve? • Maintain • Large clusters • In desired state

  4. What does this mean/involve? • Maintain • Install • Upgrade • Verify • Large clusters • In desired state

  5. What does this mean/involve?
     • Maintain
       • Install: two options
         • Image
           • Pro: All systems identical by construction
           • Con: Building & storing images
           • Con: Inflexible; a reboot is almost always required on change, which is disruptive: imagine the impact of an urgent security patch to application code, or of updating routing tables for tierX<->tierY transfers.
         • “Known Process”
           • Pro: Flexible; reboots only when essential
           • Con: Guaranteeing reproducibility, especially over time.
       • Upgrade
       • Verify
     • Large clusters
     • In desired state

  6. What does this mean/involve?
     • Maintain
       • Install: two options
         • Image
           • Early approach: with no standard installation procedures, it was easy to build an image and then replicate it, but very hard to define a “known process” except on paper.
         • “Known Process”
           • Standardised s/w installation systems, e.g. RPM, bring known process fabric management to the fore: define which packages to install, then the installation tool handles the rest, including dependency issues.
       • Upgrade
       • Verify
     • Large clusters
     • In desired state

  7. What does this mean/involve?
     • Maintain
       • Install
       • Upgrade
         • Clearly follows from the choice of installation mechanism.
           • For image systems, upgrade is essentially installation of the new image.
           • For known process systems, software package management and/or configuration systems adjust the node to match the change in desired state.
       • Verify
     • Large clusters
     • In desired state
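The known process upgrade described on slide 7 amounts to comparing what a node has with what it should have and deriving the package operations in between. A minimal sketch of that comparison follows; the function name, the package names and the version strings are all hypothetical, and a real tool (for example one built on RPM) would also have to handle dependencies and ordering.

```python
def plan_package_actions(installed, desired):
    """Return the actions that move a node from `installed` to `desired`.

    Both arguments map package name -> version string (illustrative data only).
    """
    actions = []
    for name in sorted(desired):
        if name not in installed:
            actions.append(("install", name, desired[name]))
        elif installed[name] != desired[name]:
            # Version differs: replace it (this covers upgrades and downgrades).
            actions.append(("replace", name, desired[name]))
    for name in sorted(installed):
        if name not in desired:
            actions.append(("remove", name, installed[name]))
    return actions


# Hypothetical node state and desired state.
installed = {"openssh": "3.1", "kernel": "2.4.9", "telnet": "0.17"}
desired = {"openssh": "3.4", "kernel": "2.4.9", "ntp": "4.1"}

for action in plan_package_actions(installed, desired):
    print(action)
```

For an image based system the same change in desired state would instead mean building a new image and reinstalling the node.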

  8. What does this mean/involve?
     • Maintain
       • Install
       • Upgrade
       • Verify
         • As we’ve seen, verification that software is as desired is essential in known process systems: did we get what we wanted?
         • But also: “do we still have what we want?” This is equally needed for image installs: has anything changed, especially with respect to security?
         • Software monitoring systems should be well integrated with the overall system monitoring.
           • Raise alarms for exceptions and ensure they are followed up, just as for file system full errors.
     • Large clusters
     • In desired state
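The "do we still have what we want?" check on slide 8 can be illustrated with a small drift detector that compares recorded file fingerprints against the current contents and produces alarm messages. This is only a sketch under stated assumptions: the file paths and contents are invented, the files are held in memory rather than read from disk, and forwarding the alarms into a monitoring system is left out.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest identifying the given file content."""
    return hashlib.sha256(content).hexdigest()


def verify_files(baseline, current):
    """Compare current file contents against recorded fingerprints.

    baseline: path -> expected digest; current: path -> content bytes.
    Returns a list of alarm strings; an empty list means no drift was found.
    """
    alarms = []
    for path, expected in sorted(baseline.items()):
        if path not in current:
            alarms.append(f"MISSING {path}")
        elif fingerprint(current[path]) != expected:
            alarms.append(f"CHANGED {path}")
    for path in sorted(set(current) - set(baseline)):
        alarms.append(f"UNEXPECTED {path}")
    return alarms


# Hypothetical baseline recorded at install time, and a later snapshot.
baseline = {
    "/bin/login": fingerprint(b"login-v1"),
    "/etc/passwd": fingerprint(b"root:x:0:0"),
}
current = {"/bin/login": b"login-modified", "/tmp/.hidden": b"?"}

for alarm in verify_files(baseline, current):
    print(alarm)
```

The same check applies to image installs as well as known process installs, since it looks only at the node's present state and not at how that state was produced.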

  9. What does this mean/involve?
     • Maintain
     • Large clusters
       • Many boxes, so need to worry about
         • System errors & failures (what if a system is out for repair during an upgrade?)
         • Mundane box related issues: arrivals, departures, repairs
         • Workflow for system upgrades (drain, upgrade, restart, …)
         • …
       • Most site dependent part of fabric management
     • In desired state
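The upgrade workflow on slide 9 (drain, upgrade, restart) and the question of nodes that are out for repair can be sketched as a rolling upgrade over small batches. Everything here is an assumption made for illustration: the node names, the two state labels, the batch size and the `upgrade` callback do not come from any particular tool.

```python
def rolling_upgrade(nodes, upgrade, batch_size=2):
    """Drain, upgrade and restart production nodes in batches.

    nodes: node name -> state, either "production" or "repair" (hypothetical labels).
    upgrade: callable applied to each node name.
    Returns (log, pending): the ordered steps taken, and the nodes that were
    skipped because they were out for repair and still need the upgrade later.
    """
    pending = [name for name, state in sorted(nodes.items()) if state == "repair"]
    ready = [name for name, state in sorted(nodes.items()) if state == "production"]
    log = []
    for start in range(0, len(ready), batch_size):
        batch = ready[start:start + batch_size]
        for node in batch:
            log.append(("drain", node))
        for node in batch:
            upgrade(node)
            log.append(("upgrade", node))
        for node in batch:
            log.append(("restart", node))
    return log, pending


# Hypothetical cluster: one node is away for repair during the upgrade.
nodes = {"n1": "production", "n2": "repair", "n3": "production"}
upgraded = []
log, pending = rolling_upgrade(nodes, upgraded.append, batch_size=1)
print(log)
print("still to upgrade on return from repair:", pending)
```

Keeping the list of skipped nodes is the point of the sketch: a node that returns from repair must be brought to the current desired state before it rejoins the cluster.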

  10. What does this mean/involve?
      • Maintain
      • Large clusters
      • In desired state
        • Need a way to specify, update and recover the desired state for each system.
        • This is fairly easy (well, apart from recover…); you just need a database associating some key (host name, MAC address) with the software packages & required configuration.
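The database described on slide 10 can be sketched as a keyed store of desired states. The version below keeps every past state so that an earlier one can be made current again; reading "recover" as "return to an earlier recorded state" is an assumption, as are the class name, the method names and the host data.

```python
import copy


class DesiredStateDB:
    """Toy in-memory store: key (e.g. host name or MAC address) -> desired state."""

    def __init__(self):
        self._history = {}  # key -> list of state dicts, oldest first

    def specify(self, key, packages, config):
        """Record the initial desired state for a system."""
        self._history[key] = [{"packages": dict(packages), "config": dict(config)}]

    def update(self, key, packages=None, config=None):
        """Append a new state derived from the latest one."""
        state = copy.deepcopy(self._history[key][-1])
        state["packages"].update(packages or {})
        state["config"].update(config or {})
        self._history[key].append(state)

    def current(self, key):
        """Return a copy of the latest desired state."""
        return copy.deepcopy(self._history[key][-1])

    def recover(self, key, version):
        """Make an earlier version current again, keeping the full history."""
        self._history[key].append(copy.deepcopy(self._history[key][version]))


# Hypothetical host and settings.
db = DesiredStateDB()
db.specify("node001", {"openssh": "3.1"}, {"ntp_server": "ntp.example.org"})
db.update("node001", packages={"openssh": "3.4"})
print(db.current("node001")["packages"])  # state after the update
db.recover("node001", 0)
print(db.current("node001")["packages"])  # back to the first recorded state
```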

  11. What does this mean/involve?
      • Maintain
      • Large clusters
      • In desired state(s)
        • The ease of specification of multiple states is the harder and more important part
          • define characteristics for clusters, not systems
          • host configuration is defined by cluster membership, but it should be possible to override any aspect
          • inheritance is especially useful
            • many system configuration details (ntp, name servers, …) are independent of system function; define these once and propagate to all clusters
            • allow similar clusters to share the definition of their common configuration; this avoids the potential for drift if only one cluster definition is updated.
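The inheritance scheme on slide 11 can be illustrated by resolving a host's configuration from site-wide defaults, through its cluster definition, to per-host overrides. This is a plain Python sketch and not the syntax of quattor, LCFG or any other configuration language; the template names, hosts and settings are hypothetical.

```python
def merge(base, override):
    """Return `base` with `override` applied; nested dicts are merged key by key."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical templates: function-independent settings are defined once in "site".
TEMPLATES = {
    "site": {"parent": None,
             "settings": {"ntp_servers": ["ntp1.example.org"],
                          "dns": {"search": "example.org"}}},
    "batch": {"parent": "site",
              "settings": {"packages": {"batch-client": "1.0"},
                           "dns": {"cache": True}}},
    # A similar cluster shares the "batch" definition and changes one detail.
    "batch-test": {"parent": "batch",
                   "settings": {"packages": {"batch-client": "1.1"}}},
}

# Hosts are defined by cluster membership plus optional overrides.
HOSTS = {
    "node042": {"cluster": "batch",
                "overrides": {"ntp_servers": ["ntp-lab.example.org"]}},
    "node100": {"cluster": "batch-test", "overrides": {}},
}


def resolve_template(name):
    """Resolve a template by applying its settings on top of its ancestors'."""
    template = TEMPLATES[name]
    inherited = resolve_template(template["parent"]) if template["parent"] else {}
    return merge(inherited, template["settings"])


def host_config(host):
    """Cluster configuration with the host's own overrides applied last."""
    entry = HOSTS[host]
    return merge(resolve_template(entry["cluster"]), entry["overrides"])


print(host_config("node042"))
print(host_config("node100"))
```

Because "batch-test" only states its difference from "batch", a later change to "batch" or "site" reaches both clusters, which is the drift-avoidance argument made on the slide.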

  12. Standards Interlude
      • There are none.
        • Software installation tools exist for many platforms and distributions, but all differ
          • Still, a good Fabric Management system should have a high level interface allowing free choice at this level
            • e.g. quattor: interfaced to both RH & Solaris installation tools
        • No widely acknowledged standards for defining system configuration.
          • Choices in this area generally define the different fabric management suites
            • “rules based” systems (cfengine)
            • “configuration language” systems (LCFG(ng), quattor)
        • There is work in this area, but obvious common standards are still far away.
          • CIM, HP/IBM work to define web services based standards, DCML

  13. Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Ganglia • MonALISA

  14. Some Systems
      • ELFms
        • A complete package with
          • quattor (aii/spma/ncm) known process installation
          • Lemon monitoring integrated
          • Leaf for workflow management of software and hardware processes
      • Rocks
      • Cfengine
      • LCFG(ng)
      • OSCAR/SIS
      • Ganglia
      • MonALISA

  15. Some Systems
      • ELFms
      • Rocks
        • RH specific system, kickstart based, but reinstalls nodes for configuration changes.
        • Limited config capabilities
        • No support for multiple package versions (either in repository or on a node)
      • Cfengine
      • LCFG(ng)
      • OSCAR/SIS
      • Ganglia
      • MonALISA

  16. Some Systems
      • ELFms
      • Rocks
      • Cfengine
        • A set of tools to administer and configure systems
        • Rules based approach
          • state is maintained in a set of rule files; cfengine tools read these, check the status and update systems accordingly
      • LCFG(ng)
      • OSCAR/SIS
      • Ganglia
      • MonALISA
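The rules based approach on slide 16 (read the rules, check the status, update accordingly) can be sketched generically as a list of check-and-fix pairs applied to a system. This is not cfengine's rule syntax or API; the rule constructors, the in-memory `system` dictionary and its contents are hypothetical stand-ins for real files and services.

```python
def file_has_content(path, content):
    """Rule: the file at `path` must have exactly `content`."""
    def check(system):
        return system["files"].get(path) == content

    def fix(system):
        system["files"][path] = content

    return (f"file {path}", check, fix)


def service_running(name):
    """Rule: the named service must be running."""
    def check(system):
        return system["services"].get(name) == "running"

    def fix(system):
        system["services"][name] = "running"

    return (f"service {name}", check, fix)


def converge(rules, system):
    """Check every rule and repair only those that fail; return the repairs made."""
    repaired = []
    for name, check, fix in rules:
        if not check(system):
            fix(system)
            repaired.append(name)
    return repaired


# Hypothetical rule set and a system that has drifted from it.
rules = [file_has_content("/etc/motd", "Welcome"), service_running("sshd")]
system = {"files": {"/etc/motd": "old text"}, "services": {"sshd": "running"}}

print(converge(rules, system))  # only the failing rule is repaired
print(converge(rules, system))  # a second pass finds nothing to do
```

The second call returning an empty list shows the property that matters for this style of tool: rules can be applied repeatedly and only act when the system has departed from the stated rules.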

  17. Some Systems
      • ELFms
      • Rocks
      • Cfengine
      • LCFG(ng)
        • Known process installation and configuration
        • Key feature is the introduction of a “language” for description of the required system configuration.
          • this approach was adopted and enhanced by EDG/WP4 for quattor
      • OSCAR/SIS
      • Ganglia
      • MonALISA

  18. Some Systems • ELFms • Rocks • Cfengine • LCFG(ng) • OSCAR/SIS • Image based installation (SIS) • Ganglia • MonALISA

  19. Some Systems
      • ELFms
      • Rocks
      • Cfengine
      • LCFG(ng)
      • OSCAR/SIS
      • Ganglia
        • “a scalable distributed monitoring system for high-performance computing systems”
        • can monitor many standard parameters for systems
        • but not integrated with s/w installation systems for verification
      • MonALISA

  20. Some Systems
      • ELFms
      • Rocks
      • Cfengine
      • LCFG(ng)
      • OSCAR/SIS
      • Ganglia
      • MonALISA
        • Distributed monitoring system
        • Aimed at performance issues, not integration with installation frameworks
        • Can collect input from other monitoring systems (e.g. Lemon) as well as directly from nodes.

  21. Summary
      • Fabric Management is concerned with maintaining large clusters in defined states, handling evolution over time.
      • Installation/Upgrade can be via disk image or a more flexible “known process”
      • No standards (yet) for definition of system configuration
        • Installation toolkits mostly differ in approach in this area.
      • Many monitoring systems, but these are independent developments, mostly concentrating on performance related metrics.
      • ELFms integrates the quattor installation and configuration toolkit with the Lemon monitoring system to provide tight control over node status
        • and adds a (CERN specific) package to manage software and hardware workflows.
