Advanced Fabric Management

Bill Tomlin, for CERN IT/FIO
GRIDPP 10th Collaboration Meeting, June 2004

Managing a large installation
• ~2800 nodes in the CERN CC
• Approaching 10,000 by 2008
• Frequent mass installs, moves, retirements
• Daily failures of hardware
• Heterogeneous H/W & S/W
• Multiple functionality (batch, disk, tape, DB, web etc.)
• Planning required
• Data challenges, test-beds, capacity
• Not easy to meet needs:
  • Find things
  • Know what’s happening
  • Maximize availability
  • Resource CC operations

Fabric Management in a nutshell
[Diagram. Labels: quattor; automatic configuration; automatic installation; LEAF; SMS; HMS; high-level control; workflow tools; visualization tools; managed hardware; Lemon; effective monitoring. Centre: Effectively Managed Fabric.]

Extremely Large Fabric management system (ELFms) = quattor + Lemon + LEAF

Quattor: configuration, installation and management
[Diagram. Labels: GUI, CLI, scripts; CDB; pan; RDBMS; SQL; SOAP; HTTP; XML; CCM; cache; node management agents; node.]

CDB interfaces…

LEAF – LHC Era Automated Fabric
• SMS: State Management System
  • Issue high level configuration commands
  • Nodes automatically take themselves into and out of production
  • Used during software interventions e.g. kernel upgrade for a cluster
  • Used during hardware interventions e.g. move a rack of machines
  • Validates state transitions
  • Keep history – who, when, why
  • Handles concurrent requests
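The three SMS properties listed above (validated state transitions, a who/when/why history, and safe handling of concurrent requests) can be illustrated with a small sketch. This is not the SMS implementation: the class name `StateManager`, the transition table and the default initial state are assumptions made for illustration, and only the state names "production" and "standby" come from the slides.

```python
import threading
from datetime import datetime, timezone

# Hypothetical transition table. Only the state names "production" and
# "standby" appear in the slides; the allowed pairs are an assumption.
ALLOWED_TRANSITIONS = {
    ("standby", "production"),
    ("production", "standby"),
}


class StateManager:
    """Illustrative stand-in for a state management service."""

    def __init__(self, initial_state="standby"):
        self._lock = threading.Lock()       # serializes concurrent requests
        self._initial_state = initial_state
        self.states = {}                    # node name -> current state
        self.history = []                   # one record per accepted change

    def set_state(self, node, new_state, who, why):
        """Validate and apply a state change; return the previous state."""
        with self._lock:
            old_state = self.states.get(node, self._initial_state)
            if (old_state, new_state) not in ALLOWED_TRANSITIONS:
                raise ValueError(
                    f"{node}: {old_state} -> {new_state} is not allowed"
                )
            self.states[node] = new_state
            # Keep history: who, when, why.
            self.history.append({
                "node": node,
                "from": old_state,
                "to": new_state,
                "who": who,
                "when": datetime.now(timezone.utc),
                "why": why,
            })
            return old_state


sms = StateManager()
sms.set_state("node001", "production", who="operator", why="install finished")
sms.set_state("node001", "standby", who="operator", why="kernel upgrade")
```

Holding a single lock around the check and the update keeps two simultaneous requests for the same node from both passing validation against a stale state; a real service would more likely rely on database transactions for this.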

LEAF – LHC Era Automated Fabric
• HMS: Hardware Management System
  • Result of process reengineering
  • Provides consistent, traceable workflows
  • Manages:
    • Installs
    • Moves
    • Renames
    • Retirements
    • Repairs
  • Implemented using Remedy
  • Web interface available
  • Allows visualization & searching for objects

Use Case: Move rack of machines
[Diagram. Components: Node, HMS, SMS, LAN DB, CDB, Sysadmins, Operations. Numbered steps:]
1. Import
2. Set to standby
3. Update
4. Refresh
5. Take out of production
6. Shutdown work order
7. Request move
8. Update
9. Update
10. Install work order
11. Set to production
12. Update
13. Refresh
14. Put into production
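The fourteen numbered steps of the rack-move use case form an ordered, traceable workflow. A minimal sketch of that idea follows, assuming a hypothetical `run_workflow` helper and a caller-supplied `execute` callback. Neither is part of HMS, which the slides say is implemented using Remedy. The step labels are copied from the slide, and no assignment of steps to components is implied.

```python
# Step labels as they appear on the slide, in numbered order.
RACK_MOVE_STEPS = [
    "Import",
    "Set to standby",
    "Update",
    "Refresh",
    "Take out of production",
    "Shutdown work order",
    "Request move",
    "Update",
    "Update",
    "Install work order",
    "Set to production",
    "Update",
    "Refresh",
    "Put into production",
]


def run_workflow(steps, execute):
    """Run steps strictly in order and stop at the first failure.

    `execute(number, label)` is a hypothetical callback that performs one
    step and returns True on success. The returned log of
    (number, label, succeeded) tuples is the trace of what was attempted.
    """
    log = []
    for number, label in enumerate(steps, start=1):
        succeeded = bool(execute(number, label))
        log.append((number, label, succeeded))
        if not succeeded:
            break
    return log


# Example: every step succeeds except step 7, so later steps never run.
partial_log = run_workflow(RACK_MOVE_STEPS, lambda number, label: number != 7)
full_log = run_workflow(RACK_MOVE_STEPS, lambda number, label: True)
```

Stopping at the first failed step means a rack is never put back into production (step 14) unless every earlier step was recorded as successful.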

LEAF screenshots

LEAF Status
• HMS
  • In production since late 2002 (installs only)
  • Rapid evolution – 16 production releases last year
  • Used successfully to move & install hundreds of machines
  • Fully integrated (LAN DB, CDB, SMS, other workflow apps)
• SMS
  • First production release in January (stable CDB)
  • Now used for all quattor-managed nodes (>2000)
  • All batch and interactive nodes change state automatically

LEAF Next Steps
• Consolidate
  • Evolve smoother processes
  • Documentation
  • Populating data (warranties etc.)
  • Phase-out legacy components
• Extend HMS to other equipment types, individual components
• Extend SMS for more clusters, states (like shutdown)
• Visualization tool to:
  • Get/set properties and states
  • Initialize workflows
