Enabling Self-Management - Proposal for WSDM 2.0 -
19 Jan. 2005
Fujitsu Limited, IBM Corporation
Self-Management Objective
• Self-management reduces the cost and complexity of operating and maintaining a heterogeneous, multi-vendor IT infrastructure
• Key capabilities
  • Self-configuring
  • Self-healing
  • Self-optimizing
  • Self-protecting
• Self-management must manage all types of resources in a consistent manner
  • Hardware
    • Servers, storage, and network
  • Software
    • Operating systems, middleware products, and business applications
Self-management Mechanism
• Companies targeting self-management establish similar mechanisms
• Those mechanisms can be summarized as:
  • Basic activity cycle
    • Monitoring
    • Analysis and projection
    • Action
  • Policy driven, knowledge-based
[Diagrams: self-management mechanisms of Fujitsu and IBM]
Standardization in Self-management Mechanism
• Monitoring
  • Monitoring the load, the utilization of resources, and the running states of service components, and also detecting faults
  • Covered by MUWS 1.0 and MOWS 1.0
• Analysis and projection
  • Performed against current configurations by evaluating and determining compliance with the established policy and SLAs
  • Analysis is also necessary to predict future resource behavior based on history and projected requirements
  • Not covered by MUWS 1.0 and MOWS 1.0
  • Largely unexplored and difficult area
  • Lower priority than monitoring and action
    • As the first step, standardizing monitoring and action lets us complete “the autonomic loop” in a multi-vendor environment
    • As the second step, standardizing analysis and projection becomes necessary if we want to share “management knowledge” among autonomic managers
Standardization in Self-management Mechanism (cont.)
• Action
  • Execution of plans derived from analysis and projection
    • By interacting directly with managed resources
      • E.g., adjust priorities, change the allocation of processors, or change the allocation of tasks among processors within a cluster
    • By communicating with other managers
      • E.g., make provisioning requests to add processors to the cluster
  • Not fully covered by MUWS 1.0 and MOWS 1.0
  • Standardization is critical for interoperability
    • Particularly actions on managed resources
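The monitoring, analysis, and action phases described above can be illustrated with a minimal Python sketch of one pass through the loop. Everything in it is hypothetical and chosen only for illustration: the `Policy` and `Observation` types, the 80% CPU ceiling, and the action names `restart` and `request_more_processors` do not come from MUWS, MOWS, or any vendor product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Policy:
    """Hypothetical policy: keep CPU utilization at or below a ceiling."""
    max_cpu_utilization: float  # fraction in [0, 1]


@dataclass
class Observation:
    """Result of the monitoring phase for one managed resource."""
    resource_id: str
    cpu_utilization: float
    faulted: bool


def analyze(obs: Observation, policy: Policy) -> List[str]:
    """Analysis phase: compare an observation with the policy and plan actions."""
    plan: List[str] = []
    if obs.faulted:
        plan.append("restart")  # act directly on the managed resource
    if obs.cpu_utilization > policy.max_cpu_utilization:
        plan.append("request_more_processors")  # ask another manager to provision
    return plan


def run_cycle(monitor: Callable[[], List[Observation]],
              policy: Policy,
              act: Callable[[str, str], None]) -> int:
    """One pass of monitoring -> analysis -> action; returns the number of actions issued."""
    issued = 0
    for obs in monitor():
        for action in analyze(obs, policy):
            act(obs.resource_id, action)
            issued += 1
    return issued


def demo_loop() -> List[Tuple[str, str]]:
    """Run one cycle over three made-up servers and return the actions taken."""
    log: List[Tuple[str, str]] = []

    def monitor() -> List[Observation]:
        return [
            Observation("server-1", 0.95, False),  # overloaded
            Observation("server-2", 0.30, True),   # faulted
            Observation("server-3", 0.40, False),  # healthy
        ]

    run_cycle(monitor, Policy(max_cpu_utilization=0.80),
              lambda rid, action: log.append((rid, action)))
    return log
```

In this sketch the `monitor` and `act` callables are the two points where a standardized interface would sit in a multi-vendor environment; `analyze` stays private to each manager, which matches the lower priority given to standardizing analysis and projection.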
Use Cases
• Use cases based on our experience
  • Analysis of common administrative tasks at customers’ sites
  • Customers’ requests to our existing management products
• Job-level service-level attainment
• Data center-level service-level attainment
• Remove a server due to cracking (security violation)
• Applying security patches
(#1) Job Level Management
• Job initiation
  • The IT business activity manager submits a job, which must be executed so that it satisfies the specified service level (such as response time or availability).
• At a later stage, changes in the operating environment of the job may occur.
  • E.g., an increased number of transactions from the web
• The resource requirements are recalculated in the service level attainment loop.
• The provisioning steps (including resource allocation and deployment) are triggered (action) as a result of the changing conditions.
• The resources are brought to a ready state so that the required components of the job can start, including executable resources such as an application server or DBMS.
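As a rough illustration of the recalculation step in the service level attainment loop, the sketch below estimates how many servers a job needs for a given transaction rate and how many must be added or released. The sizing rule (plan for a fixed fraction of each server's capacity) and all names and numbers are assumptions made for this example; the slides do not specify how requirements are computed.

```python
import math


def required_servers(transactions_per_sec: float,
                     capacity_per_server: float,
                     target_utilization: float = 0.8) -> int:
    """Servers needed so that each one runs at or below target_utilization.

    Assumed sizing rule: usable capacity = capacity_per_server * target_utilization.
    """
    if capacity_per_server <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    usable = capacity_per_server * target_utilization
    return max(1, math.ceil(transactions_per_sec / usable))


def provisioning_delta(current_servers: int,
                       transactions_per_sec: float,
                       capacity_per_server: float,
                       target_utilization: float = 0.8) -> int:
    """Positive result: servers to provision. Negative result: surplus servers."""
    needed = required_servers(transactions_per_sec, capacity_per_server,
                              target_utilization)
    return needed - current_servers
```

For example, with three servers running, a capacity of 100 transactions per second per server, and a planning target of 50% utilization, a rise to 420 transactions per second yields `provisioning_delta(3, 420, 100, 0.5) == 6`, which would trigger the provisioning steps.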
(#1-a) Server Provisioning
• Add a server with consistent reconfiguration of LAN and storage/SAN
[Diagram: LANs, network load balancer, network switch, servers with HBAs, FC switch, storage]
Operations to Add a New Server
[Table: operations, with a legend marking those independent of resource type]
(#1-b) Storage Provisioning
• Add a new LUN by adding an HDD to a RAID device in a SAN environment
[Diagram: LANs, network load balancer, network switch, servers with HBAs, FC switch, storage]
Operations to Add a Storage LUN
[Table: operations, with a legend marking those independent of resource type]
(#2) Data Center Level Management
[Diagram: monitoring, analysis and projection, and action loops at the job level and at the data center level. Note: the ‘same’ loop operates at different levels.]
• Objective
  • Improve resource utilization while maintaining the SLAs of running jobs
• Data center level manager
  • Add new resources to prepare for expected load increases
  • Release surplus resources in order to reduce costs
  • Adjust resources allocated to jobs based on policy
    • Example: the priority of a job relative to other jobs
• In the Analysis and Projection phase, the manager evaluates information about the available resources and the current load, together with estimates of the expected future utilization and load
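One way to picture "adjust resources allocated to jobs based on policy" is a priority-ordered allocation from a shared pool, sketched below. The greedy rule, the `Job` type, and the convention that a larger `priority` value means a more important job are assumptions for this example only; an actual data center level manager may apply a very different policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    name: str
    priority: int  # assumption: larger value = more important
    demand: int    # servers requested by the job-level loop


def allocate(jobs: List[Job], pool_size: int) -> Tuple[Dict[str, int], int]:
    """Grant servers to jobs in descending priority order.

    Returns the per-job allocation and the number of surplus servers,
    which the manager could release to reduce costs.
    """
    remaining = pool_size
    allocation: Dict[str, int] = {}
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        granted = min(job.demand, remaining)
        allocation[job.name] = granted
        remaining -= granted
    return allocation, remaining
```

With jobs `web` (priority 3, demand 5), `report` (priority 2, demand 3), and `batch` (priority 1, demand 4) and a pool of 10 servers, the lowest-priority job is the one left short: `web` gets 5, `report` gets 3, and `batch` gets 2.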
Operations for Data Center Level Management
• The operations used for this case are basically the same as those used for job level management
• Management capabilities aggregate separate use-case scenarios to provide higher-order capabilities such as:
  • Workload balancing
    • Includes:
      • Server provisioning
      • Storage provisioning
      • Application management and prioritization
      • Resource de-provisioning as loads reduce
  • More…
(#3) Remove a Server (Self-protecting)
• Remove a server in response to cracking (security violation)
[Diagram: LANs, network load balancer, network switch, servers with HBAs, FC switch, storage]
Operations to Remove a Server
[Table: operations, with a legend marking those independent of resource type]
• After those operations, the software on the server may be re-installed using the software deployment described in case #1
(#4) Apply Security Patches to Servers
[Diagram: LANs, network load balancer, network switch, server, and a patch management server; labels: “New security patches”, “Install patches”]
Operations to Apply Patches
[Table: operations, with a legend marking those independent of resource type]
WSDM 2.0 Proposal
• Management operations (“actions”) can be categorized as:
  • Resource type independent operations
    • Power on/off, Get info, etc.
  • Resource type dependent operations
    • Create volume, Add port, etc.
• Define resource type independent actions
  • As an extension to MUWS 1.0
• Resource type dependent actions are defined in other/existing standards
  • Consider those standards to cover important IT resource types
    • Server, storage, and network
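The split between resource type independent and resource type dependent operations can be sketched as a small Python class hierarchy. The class and method names below mirror the examples on the slide (power on/off, get info, create volume, add port) but are otherwise hypothetical; they are not interfaces defined by MUWS or by any resource-specific standard.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class ManagedResource(ABC):
    """Resource type independent actions: available on every resource."""

    def __init__(self, resource_id: str) -> None:
        self.resource_id = resource_id
        self.powered_on = False

    def power_on(self) -> None:
        self.powered_on = True

    def power_off(self) -> None:
        self.powered_on = False

    def get_info(self) -> Dict[str, object]:
        return {"id": self.resource_id,
                "type": self.resource_type(),
                "powered_on": self.powered_on}

    @abstractmethod
    def resource_type(self) -> str:
        """Name of the concrete resource type."""


class StorageArray(ManagedResource):
    """Adds a storage-specific (resource type dependent) action."""

    def __init__(self, resource_id: str) -> None:
        super().__init__(resource_id)
        self.volumes: List[Tuple[str, int]] = []

    def resource_type(self) -> str:
        return "storage"

    def create_volume(self, name: str, size_gb: int) -> None:
        self.volumes.append((name, size_gb))


class NetworkSwitch(ManagedResource):
    """Adds a network-specific (resource type dependent) action."""

    def __init__(self, resource_id: str) -> None:
        super().__init__(resource_id)
        self.ports: List[int] = []

    def resource_type(self) -> str:
        return "network"

    def add_port(self, port_number: int) -> None:
        self.ports.append(port_number)


def power_on_everything(resources: List[ManagedResource]) -> None:
    """A manager can issue independent actions without knowing the resource type."""
    for resource in resources:
        resource.power_on()
```

A caller such as `power_on_everything` needs only the base interface, which is the part the proposal would define; `create_volume` and `add_port` stay with the type-specific classes, standing in for actions left to other standards.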
Consistency with Resource Type-Specific Standards
• Resource type-specific standards (e.g., SMI-S, SMASH) may define both application interfaces and object models of managed resources
• Actions should be consistent with those models and can wrap their interfaces
[Diagram: a management consumer issues WSDM actions to agents; each agent wraps the API of a subsystem management component defined by a resource type-specific standard, within a computer system; the WSDM actions are kept consistent with those standards]
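The idea that actions "can wrap their interfaces" resembles an adapter pattern, sketched below in Python. The two underlying API classes are invented stand-ins with deliberately different method names; they do not reproduce SMI-S, SMASH, or any real interface, and the agent classes are likewise hypothetical.

```python
from typing import List, Protocol


class PowerControl(Protocol):
    """Uniform, resource type independent action surface seen by the consumer."""

    def power_on(self) -> None: ...
    def is_on(self) -> bool: ...


class HypotheticalServerApi:
    """Stand-in for a server-specific management interface (not a real API)."""

    def __init__(self) -> None:
        self.state = "off"

    def set_power_state(self, state: str) -> None:
        self.state = state


class HypotheticalStorageApi:
    """Stand-in for a storage-specific management interface (not a real API)."""

    def __init__(self) -> None:
        self.spinning = False

    def spin_up(self) -> None:
        self.spinning = True


class ServerAgent:
    """Agent that maps the uniform action onto the server-specific call."""

    def __init__(self, api: HypotheticalServerApi) -> None:
        self._api = api

    def power_on(self) -> None:
        self._api.set_power_state("on")

    def is_on(self) -> bool:
        return self._api.state == "on"


class StorageAgent:
    """Agent that maps the uniform action onto the storage-specific call."""

    def __init__(self, api: HypotheticalStorageApi) -> None:
        self._api = api

    def power_on(self) -> None:
        self._api.spin_up()

    def is_on(self) -> bool:
        return self._api.spinning


def bring_up(agents: List[PowerControl]) -> int:
    """Management consumer: power on every resource; return how many are on."""
    for agent in agents:
        agent.power_on()
    return sum(1 for agent in agents if agent.is_on())
```

The consumer depends only on `PowerControl`, so an agent can be swapped for one that wraps a different underlying standard without changing the consumer, provided the wrapped object model can express the same action.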
WSDM 2.0 Structure Possibility (Document point of view)
[Diagram]
• MOWS
• MUWS 2.0
  • MUWS 1.0
  • Extension: resource-type independent management actions
• Other/existing specifications (no modification)
  • Server management (SMASH)
  • Storage management (SMI-S)
  • Network management (?)
  • etc.
WSDM 2.0 Structure Possibility (Application point of view)
[Diagram; labels as they appear on the slide]
• Management applications using WSDM: software lifecycle management, self-protection function, provisioning, others…
• WSDM exporting management capabilities; resource-specific capabilities provided by the WSDM layer
• CIM model / profile
• Remote access protocols: “WS-CIM”
• CIM server infrastructure, WS server infrastructure, “SMWG” infrastructure
• Manageability access points (typically local, also remote models): CIM server, native access methods
• Managed elements
• Relationship labels: “is used by”, “is transported in”
Open Issues
• Owners of resource type-specific specifications: how to stay consistent with them
  • Server: DMTF SM-WG (SMASH)
  • Storage: SNIA (SMI-S)
• Defining the scope of management activities
  • Basic functionality: power-on, reset, deploy, configure
  • May be extended to include snapshot, replicate, … (particularly considering servers)
• Actions related to deployment and configuration
  • GGF CDDLM’s basic service APIs
  • Deploy actions may be issued not to managed resources (target servers) but to managing resources (deployment management servers)
• Manageability of components of “composite” resources
  • Software-managed virtual servers (e.g., VMware, Java VM)
  • Cascading configuration, e.g., a NAS server and its underlying RAIDs