
Advanced Fabric Management

Bill Tomlin for CERN IT/FIO, GRIDPP 10th Collaboration Meeting, June 2004. Managing a large installation: ~2800 nodes in the CERN CC; approaching 10,000 by 2008; frequent mass installs, moves, retirements; daily failures of hardware; heterogeneous H/W & S/W.


Presentation Transcript


  1. Advanced Fabric Management • Bill Tomlin for CERN IT/FIO • GRIDPP 10th Collaboration Meeting, June 2004

  2. Managing a large installation • ~2800 nodes in the CERN CC • Approaching 10,000 by 2008 • Frequent mass installs, moves, retirements • Daily failures of hardware • Heterogeneous H/W & S/W • Multiple functionality (batch, disk, tape, DB, web etc.) • Planning required • Data challenges, test-beds, capacity • Not easy to meet needs: • Find things • Know what’s happening • Maximize availability • Resource CC operations

  3. Fabric Management in a nutshell [diagram labels: quattor (automatic configuration, automatic installation); LEAF with SMS and HMS (high-level control, workflow tools, visualization tools); Lemon (effective monitoring); managed hardware; Effectively Managed Fabric]

  4. Extremely Large Fabric management system (ELFms) = quattor + Lemon + LEAF

  5. Quattor: configuration, installation and management [architecture diagram labels: GUI, CLI, Scripts; SOAP; CDB; pan; RDBMS; SQL; XML; HTTP; CCM; Cache; Node Management Agents; Node]

  6. CDB interfaces…

  7. LEAF – LHC Era Automated Fabric • SMS: State Management System • Issues high-level configuration commands • Nodes automatically take themselves into and out of production • Used during software interventions, e.g. a kernel upgrade for a cluster • Used during hardware interventions, e.g. moving a rack of machines • Validates state transitions • Keeps history: who, when, why • Handles concurrent requests
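The SMS behaviours listed on this slide (validated state transitions, a who/when/why history, concurrent requests) can be illustrated with a minimal sketch. It is not the SMS implementation: the class name `StateManager`, the `ALLOWED` transition table and the node name are hypothetical, and only the "production" and "standby" state names come from the slides.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical transition table: the slides name "production" and "standby"
# but do not define which transitions the real SMS accepts.
ALLOWED = {
    "production": {"standby"},
    "standby": {"production"},
}


@dataclass
class HistoryEntry:
    node: str
    old: str
    new: str
    who: str
    when: datetime
    why: str


class StateManager:
    """Toy state manager: validates transitions and records who/when/why."""

    def __init__(self):
        self._states = {}
        self._history = []
        self._lock = threading.Lock()  # serializes concurrent requests

    def add_node(self, node, state="production"):
        with self._lock:
            self._states[node] = state

    def set_state(self, node, new, who, why):
        with self._lock:
            old = self._states[node]
            if new not in ALLOWED.get(old, set()):
                raise ValueError(f"invalid transition {old} -> {new} for {node}")
            self._states[node] = new
            self._history.append(
                HistoryEntry(node, old, new, who, datetime.now(timezone.utc), why)
            )

    def state(self, node):
        with self._lock:
            return self._states[node]

    def history(self, node):
        with self._lock:
            return [entry for entry in self._history if entry.node == node]


sms = StateManager()
sms.add_node("node001")
sms.set_state("node001", "standby", who="operator", why="kernel upgrade")
```

A single lock is the simplest way to make concurrent requests safe in a sketch like this; a production system would more plausibly rely on database transactions.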

  8. LEAF – LHC Era Automated Fabric • HMS: Hardware Management System • Result of process reengineering • Provides consistent, traceable workflows • Manages: • Installs • Moves • Renames • Retirements • Repairs • Implemented using Remedy • Web interface available • Allows visualization & searching for objects

  9. Use Case: Move rack of machines [diagram; components and actors shown: Node, HMS, SMS, CDB, LAN DB, Sysadmins, Operations] • 1. Import • 2. Set to standby • 3. Update • 4. Refresh • 5. Take out of production • 6. Shutdown work order • 7. Request move • 8. Update SMS • 9. Update LAN DB • 10. Install work order • 11. Set to production • 12. Update CDB • 13. Refresh • 14. Put into production
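The numbered steps of this use case can be read as an ordered workflow. The sketch below is illustrative only: the `Workflow` class is hypothetical, the step labels are copied from the slide, and strict sequential execution is an assumption based on the slide's numbering rather than documented HMS behaviour.

```python
# Step labels as numbered on the slide (1 to 14).
MOVE_RACK_STEPS = [
    "Import",
    "Set to standby",
    "Update",
    "Refresh",
    "Take out of production",
    "Shutdown work order",
    "Request move",
    "Update SMS",
    "Update LAN DB",
    "Install work order",
    "Set to production",
    "Update CDB",
    "Refresh",
    "Put into production",
]


class Workflow:
    """Toy workflow that only accepts steps in their numbered order."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.done = 0  # number of the last completed step

    def complete(self, number):
        # Assumption: steps must be completed strictly in sequence.
        if number != self.done + 1:
            raise RuntimeError(
                f"step {number} requested, but step {self.done + 1} is next"
            )
        self.done = number
        return self.steps[number - 1]

    @property
    def finished(self):
        return self.done == len(self.steps)


wf = Workflow(MOVE_RACK_STEPS)
wf.complete(1)  # "Import"
wf.complete(2)  # "Set to standby"
```

Rejecting out-of-order steps is one simple way to get the consistent, traceable behaviour that the HMS slide describes; the real system is implemented in Remedy and may enforce ordering differently.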

  10. LEAF screenshots

  11. LEAF Status • HMS • In production since late 2002 (installs only) • Rapid evolution: 16 production releases last year • Used successfully to move & install hundreds of machines • Fully integrated (LAN DB, CDB, SMS, other workflow apps) • SMS • First production release in January (stable CDB) • Now used for all quattor-managed nodes (>2000) • All batch and interactive nodes change state automatically

  12. LEAF Next Steps • Consolidate • Evolve smoother processes • Documentation • Populating data (warranties etc.) • Phase-out legacy components • Extend HMS to other equipment types, individual components • Extend SMS for more clusters, states (like shutdown) • Visualization tool to: • Get/set properties and states • Initialize workflows
