180 likes | 351 Views
ITIL and Grid services at GridKa CHEP 2009, 21 - 27 March, Prague Tobias König , Dr. Holger Marten. Steinbuch Centre for Computing (SCC). Introduction. Karlsruhe Institute of Technology: The cooperation of the Forschungszentrum Karlsruhe GmbH (FZK) and the University of Karlsruhe (TH).
E N D
ITIL and Grid services at GridKaCHEP 2009, 21 - 27 March, PragueTobias König, Dr. Holger Marten Steinbuch Centre for Computing (SCC)
Introduction Karlsruhe Institute of Technology: The cooperation of the Forschungszentrum Karlsruhe GmbH (FZK) and the University of Karlsruhe (TH) • Main aim of GridKa is offering sustainable Grid services • The availability and reliability of IT Services directly affects • The users‘ satisfaction • The reputation of the Computing Centre SCC and also not to forget the economical aspects • Thus it is important to implement processes and tools that increase the availability and reliability of the IT services. The Information Technology Infrastructure Library (ITIL) is a process-orientated framework for the management of IT processes Steinbuch Centre for Computing: The Computing Centre of the KIT GridKa: The German Tier-1 centre hosted by the SCC | Tobias König | CHEP 2009 | 21 - 27 March Prague
General information • The SCC puts emphasis on ITIL v2 Service Support processes • Configuration Management • Incident Management and Service Desk • Problem Management • Change Management • Release Management Only these ITIL processes are currently implemented at the SCC. These processes are the most relevant and most important processes of ITIL for the SCC • The other ITIL Service Delivery processes like • Service Level Management • Availability Management • Capacity Management • Continuity Management • Financial Management are not ITIL standardized and implemented at the SCC at the moment and in consequence they are not part of this talk | Tobias König | CHEP 2009 | 21 - 27 March Prague
Structure of this talk ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview 4 | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal Configuration Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview 5 | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal Configuration Management • The Configuration Management Database (CMDB) is the basis of all ITIL processes • Following items are stored within the CMDB: • All the Configuration Items CIs (HW, SW, Racks) • The CIs are related to the service components • The IT services themselves are compositions of the service components. • Etc. Configuration Management Respon-sibility (internalroles and groups) Externalview Services Internalview Service components CIs | Tobias König | CHEP 2009 | 21 - 27 March Prague
Incident Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Incident Management • Prime service hours:Tickets are directly assigned from the 1st level support to the 2nd level support, the technical experts. The experts start immediately to solve the incident • On-call service hours:An On-Call-Engineer (OCE) starts immediately to solve the incident • Documentation of solution Incident Management Problem Management Respon-sibility (internalroles and groups) Help Desk 1st Level Support Experts 2nd Level Support 5*12 Experts /7*24 On-Call Engineers Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Problem Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Problem Management • Prime service hours: Meeting of the task force (OCEs and Experts) • On-call service hours:The OCE can call other experts the next morning or another OCE during the whole night • Problem Management is the detection of the under-lying causes of an incident and their subsequent resolution and prevention. Incident Management Problem Management Respon-sibility (internalroles and groups) Experts 2nd Level Support 7*12 Experts /7*24 On-Call Engineers Task Force: Problematictickets, weeklyreports, Availability, 7*24 Operations hand over Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal MonitoringTools to support the Incident and Problem Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
External/Internal MonitoringTools to support the Incident and Problem Management • External: A incident or problem can be recognized from outside by the user or the external monitoring system. These alarms automatically create a ticket • Internal: An incident or problem can be recognized from GridKa’s Nagios system • Planned workflow between both systems Incident Management Problem Management Respon-sibility (internalroles and groups) Externalview ExternalMonitoring / SAM tests External ticket system / GGUS Internalview Internal ticket system / Remedy Internal Monitoring / Nagios, Ganglia 7*24 Knowledgebase / Wiki and Remedy | Tobias König | CHEP 2009 | 21 - 27 March Prague
Change Management ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Change Management • For changes of the IT infrastructure a Request for Change (RfC) is created in the CMDB • There are 3 kinds of RfCs: • Emergency RfC • Standard RfC • Planned RfC • GridKa’s downtimes are announced externally via EGEE Broadcasts and via the Change Calendar at http://www.gridka.de/monitoring Change Management Respon-sibility (internalroles and groups) Externalview Change calendar for Maintenances Internalview Emergency RfC Standard RfC Planned RfC CMDB (RfCs) | Tobias König | CHEP 2009 | 21 - 27 March Prague
Special management roles ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Special management roles • The Configuration Manager • Responsible that all CIs are stored in the CMDB and that the DBs are up to date • The Change Manager • Responsible for the change process and the formal correctness of the RfCs • Organizes the Change Advisory Board (CAB) • Special role: Contact person to the IT Service management department Configuration Management Change Management Respon-sibility (internalroles and groups) Change Manager Configuration Manager CAB Change Advisory Board GridKa Contactperson to Service Management Supportingexperts in filingRfCs Collectingdata for CMDB Externalview Internalview | Tobias König | CHEP 2009 | 21 - 27 March Prague
Thankyouforyourattention! Steinbuch Centre for Computing (SCC)
Discussion ITIL Processes at GridKa Configuration Management Incident Management Problem Management Change Management Respon-sibility (internalroles and groups) Help Desk 1st Level Support Configuration Manager Change Manager Experts 2nd Level Support 5*12 Experts /7*24 On-Call Engineers Task Force: Problematictickets, weeklyreports, Availability, 7*24 Operations hand over CAB Change Advisory Board GridKa Contactperson to Service Management Supportingexperts in filingRfCs organizingTask Force and 7*24 operations Collectingdata for CMDB Externalview ExternalMonitoring / SAM tests Change calendar for Maintenances Services External ticket system / GGUS SLA Workflow between Ticket Systems Internalview Emergency RfC Internal ticket system / Remedy Service components Internal Monitoring / Nagios, Ganglia Standard RfC 7*24 Knowledgebase / Wiki and Remedy Planned RfC CMDB (CIs: Services, Service components, HW, SW, RfCs) | Tobias König | CHEP 2009 | 21 - 27 March Prague