180 likes | 206 Views
Learn how GridKa offers sustainable Grid services to improve availability and reliability with ITIL. Explore ITIL processes implemented for Configuration, Incident, Problem, and Change Management, and the role of external monitoring tools.
E N D
ITIL and Grid services at GridKaCHEP 2009, 21 - 27 March, PragueTobias König, Dr. Holger Marten Steinbuch Centre for Computing (SCC)
Introduction Karlsruhe Institute of Technology: The cooperation of the Forschungszentrum Karlsruhe GmbH (FZK) and the University of Karlsruhe (TH) • Main aim of GridKa is offering sustainable Grid services • The availability and reliability of IT Services directly affects • The users‘ satisfaction • The reputation of the Computing Centre SCC and also not to forget the economical aspects • Thus it is important to implement processes and tools that increase the availability and reliability of the IT services. The Information Technology Infrastructure Library (ITIL) is a process-orientated framework for the management of IT processes Steinbuch Centre for Computing: The Computing Centre of the KIT GridKa: The German Tier-1 centre hosted by the SCC | Tobias König | CHEP 2009 | 21 - 27 March Prague
General information • The SCC puts emphasis on ITIL v2 Service Support processes • Configuration Management • Incident Management and Service Desk • Problem Management • Change Management • Release Management Only these ITIL processes are currently implemented at the SCC. These processes are the most relevant and most important processes of ITIL for the SCC • The other ITIL Service Delivery processes like • Service Level Management • Availability Management • Capacity Management • Continuity Management • Financial Management are not ITIL standardized and implemented at the SCC at the moment and in consequence they are not part of this talk | Tobias König | CHEP 2009 | 21 - 27 March Prague
Structure of this talk ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview 4 | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal Configuration Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview 5 | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal Configuration Management • The Configuration Management Database (CMDB) is the basis of all ITIL processes • Following items are stored within the CMDB: • All the Configuration Items CIs (HW, SW, Racks) • The CIs are related to the service components • The IT services themselves are compositions of the service components. • Etc. Configuration Management Respon-sibility (internalroles and groups) Externalview Services Internalview Service components CIs | Tobias König | CHEP 2009 | 21 - 27 March Prague
Incident Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Incident Management • Prime service hours:Tickets are directly assigned from the 1st level support to the 2nd level support, the technical experts. The experts start immediately to solve the incident • On-call service hours:An On-Call-Engineer (OCE) starts immediately to solve the incident • Documentation of solution Incident Management Problem Management Respon-sibility (internalroles and groups) Help Desk 1st Level Support Experts 2nd Level Support 5*12 Experts /7*24 On-Call Engineers Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Problem Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Problem Management • Prime service hours: Meeting of the task force (OCEs and Experts) • On-call service hours:The OCE can call other experts the next morning or another OCE during the whole night • Problem Management is the detection of the under-lying causes of an incident and their subsequent resolution and prevention. Incident Management Problem Management Respon-sibility (internalroles and groups) Experts 2nd Level Support 7*12 Experts /7*24 On-Call Engineers Task Force: Problematictickets, weeklyreports, Availability, 7*24 Operations hand over Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal MonitoringTools to support the Incident and Problem Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal MonitoringTools to support the Incident and Problem Management • External: A incident or problem can be recognized from outside by the user or the external monitoring system. These alarms automatically create a ticket • Internal: An incident or problem can be recognized from GridKa’s Nagios system • Planned workflow between both systems Incident Management Problem Management Respon-sibility (internalroles and groups) Externalview ExternalMonitoring / SAM tests External ticket system / GGUS Internalview Internal ticket system / Remedy Internal Monitoring / Nagios, Ganglia 7*24 Knowledgebase / Wiki and Remedy | Tobias König | CHEP 2009 | 21 - 27 March Prague
Change Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Change Management • For changes of the IT infrastructure a Request for Change (RfC) is created in the CMDB • There are 3 kinds of RfCs: • Emergency RfC • Standard RfC • Planned RfC • GridKa’s downtimes are announced externally via EGEE Broadcasts and via the Change Calendar at http://www.gridka.de/monitoring Change Management Respon-sibility (internalroles and groups) Externalview Change calendar for Maintenances Internalview Emergency RfC Standard RfC Planned RfC CMDB (RfCs) | Tobias König | CHEP 2009 | 21 - 27 March Prague
Special management roles ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Special management roles • The Configuration Manager • Responsible that all CIs are stored in the CMDB and that the DBs are up to date • The Change Manager • Responsible for the change process and the formal correctness of the RfCs • Organizes the Change Advisory Board (CAB) • Special role: Contact person to the IT Service management department Configuration Management Change Management Respon-sibility (internalroles and groups) Change Manager Configuration Manager CAB Change Advisory Board GridKa Contactperson to Service Management Supportingexperts in filingRfCs Collectingdata for CMDB Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Thankyouforyourattention! Steinbuch Centre for Computing (SCC)
Discussion ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Help Desk 1st Level Support Configuration Manager Change Manager Experts 2nd Level Support 5*12 Experts /7*24 On-Call Engineers Task Force: Problematictickets, weeklyreports, Availability, 7*24 Operations hand over CAB Change Advisory Board GridKa Contactperson to Service Management Supportingexperts in filingRfCs organizingTask Force and 7*24 operations Collectingdata for CMDB Externalview ExternalMonitoring / SAM tests Change calendar for Maintenances Services External ticket system / GGUS SLA Workflow between Ticket Systems Internalview Emergency RfC Internal ticket system / Remedy Service components Internal Monitoring / Nagios, Ganglia Standard RfC 7*24 Knowledgebase / Wiki and Remedy Planned RfC CMDB (CIs: Services, Service components, HW, SW, RfCs) | Tobias König | CHEP 2009 | 21 - 27 March Prague