Service Operation
Agenda/Learning Objectives • Main goals, objectives & business value of Service Operation • Generic concepts & definitions • Event • Alert • Incident • Impact, Urgency & Priority • Service Request • Problem • Workaround • Known Error • Known Error Database
Agenda/Learning Objectives • Key Principles & Models • Conflicting balance in Service Operation • IT Services (external) vs. Technology component (internal) • Stability vs. Responsiveness • Quality of service vs. Cost of Service • Reactive vs. Proactive • Processes • Incident Management • Event Management • Request Fulfillment • Problem Management • Access Management
Agenda/Learning Objectives • Functions • Service Desk • Technical Management • IT Operations Management • IT Operations Control • Facilities Management • Application Management
Goal • The goal of Service Operation is to co-ordinate and carry out the activities and processes required to deliver and manage services at agreed levels to business users and customers • Service Operation is also responsible for the ongoing management of the technology that is used to deliver and support services
Primary Goals, Objectives and Benefits Service Operation
Primary Goals & Objectives • Manage and deliver services at agreed levels • Manage and maintain the technology that is used to deliver and support services • Enable Continual Service Improvement by monitoring performance, assessing metrics and gathering data • Coordinate and execute the processes and activities required to deliver the agreed levels of service to the business
Scope • Service value is modelled in Service Strategy • The cost of service is designed, predicted and validated in Service Design and Service Transition • Measures for optimisation are identified in Continual Service Improvement
Key Principles & Models Service Operation
Achieving Balance • Conflict arises because constant, agreed levels of service need to be delivered in a continually evolving technical and business environment • Getting the balance wrong can mean a service that is too expensive, unable to meet business requirements or unable to respond in good time [Figure: Focus 1 vs. Focus 2]
Internal View vs. External View • Technology Components (Internal) – the components that underpin the ability to deliver the services • Different teams or departments manage the technology, so each tends to focus on achieving good performance and availability of ‘its’ systems • IT Services (External) – how customers/users experience services • Customers/users don’t worry about the details of what technology is used to manage services • Their only concern is that the services are delivered as required and agreed [Figure: Extreme focus on Internal vs. Extreme focus on External]
Stability vs. Responsiveness • A balance is needed between no change (stability), which may ignore changing business requirements, and too frequent change (responsiveness), which may make it impossible to provide stable services that meet business needs • For example, a Business Unit requires additional IT Services, more capacity and faster response times • Responding to this type of change without impacting other services is a significant challenge • Many IT organizations are unable to achieve this balance and tend to focus on either the stability of the IT Infrastructure or the ability to respond to changes quickly [Figure: Extreme focus on Stability vs. Extreme focus on Responsiveness]
Quality vs. Cost • Too much focus on quality – delivers more than necessary at a higher cost • Too much focus on cost – delivers on or under budget, but puts the business at risk through sub-standard services • Service Level Requirements (and good SLAs) can be used to deliver service at an appropriate cost and avoid “over-sizing” • Achieving a balance will ensure delivery of the level of service necessary to meet business requirements at an optimal cost [Figure: Extreme focus on Cost vs. Extreme focus on Quality]
Reactive vs. Proactive • Reactive (fire-fighting?) – does not act unless prompted by an external driver • Proactive – always looking for ways to improve the current situation • Continually scans for potentially impacting changes • Seen as positive behavior but can be expensive • Achieving a balance between reactive and proactive requires: • Formal, integrated Problem and Incident Management processes • The ability to prioritize technical faults and demands • Ongoing involvement from Service Level Management in Service Operation [Figure: Extremely Reactive vs. Extremely Proactive]
Service Operation Processes Service Operation
Useful Definition • Event • Any detectable or discernible occurrence that has significance for the management of a CI or IT Service • Types of Events include: • Information events – e.g. a batch job has finished successfully • Warning events – e.g. a disk drive is 90% full • Exception events – e.g. a server is not responding to a poll • Alert • A warning or notice that a threshold has been reached, something has changed, or a failure has occurred • Alerts are often created and controlled by System Management tools • Can be an event which Event Management has interpreted as requiring action, e.g. a threshold on CPU usage has been exceeded
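The three event types and the step from event to alert can be sketched in a few lines of Python. This is only an illustration: the names `EventType`, `Event`, `classify_disk_usage`, `classify_poll` and `needs_alert` are hypothetical, the 90% threshold is taken from the disk example above, and the rule that every non-information event raises an alert is an assumed policy, not something the definitions prescribe.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    INFORMATION = "information"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    ci: str          # configuration item that produced the event
    kind: EventType
    message: str


# Assumed threshold, borrowed from the "disk drive is 90% full" example.
DISK_WARNING_THRESHOLD = 90.0


def classify_disk_usage(ci: str, percent_used: float) -> Event:
    """Map a disk usage reading to an information or warning event."""
    kind = (EventType.WARNING if percent_used >= DISK_WARNING_THRESHOLD
            else EventType.INFORMATION)
    return Event(ci, kind, f"disk {percent_used:.0f}% full")


def classify_poll(ci: str, responded: bool) -> Event:
    """Map a poll result to an information or exception event."""
    if responded:
        return Event(ci, EventType.INFORMATION, "poll answered")
    return Event(ci, EventType.EXCEPTION, "not responding to poll")


def needs_alert(event: Event) -> bool:
    """Assumed policy: warnings and exceptions require action, so they alert."""
    return event.kind is not EventType.INFORMATION
```

In a real deployment the thresholds and the alerting policy would come from the monitoring tool's configuration rather than from constants in code.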
Useful Definition • Service Request • A request from a user for information, or advice, or for a standard change or for access to an IT Service • e.g. reset password, provide standard IT Services for a new user • Service requests are usually handled by the service desk, and do not require an RFC to be submitted • Incident • Unexpected interruption or reduction in quality of an IT service • Failure of a CI that has not yet impacted service is also an Incident
Useful Definition • Problem • A cause of one or more Incidents • The cause is not usually known at the time the problem record is created • Workaround • A temporary way of overcoming a difficulty and restoring full or limited service (to reduce the impact) • For example, by restarting a failed configuration item • Workarounds for problems are documented in known error records • Workarounds for incidents that do not have associated problem records are documented in the incident record
Useful Definition • Known Error • A problem that has a documented root cause & a workaround • Known errors are created and managed throughout their lifecycle by Problem Management • Known Error Database • A database containing all known error records • This database is created and maintained by Problem Management, and used by both Incident and Problem Management • Part of an organization’s SKMS
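As a rough illustration of how Incident Management might consult a Known Error Database, the sketch below stores known error records and looks up a workaround by matching a symptom against an incident description. The classes `KnownError` and `KnownErrorDatabase`, their fields, and the simple substring match are all assumptions made for the example; real tools use richer matching and categorisation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnownError:
    error_id: str
    symptom: str       # short text that identifies the fault
    root_cause: str    # documented root cause
    workaround: str    # documented workaround


class KnownErrorDatabase:
    """Minimal in-memory stand-in for a KEDB (hypothetical design)."""

    def __init__(self) -> None:
        self._records: List[KnownError] = []

    def add(self, record: KnownError) -> None:
        self._records.append(record)

    def find(self, incident_description: str) -> Optional[KnownError]:
        """Return the first record whose symptom appears in the description."""
        text = incident_description.lower()
        for record in self._records:
            if record.symptom.lower() in text:
                return record
        return None
```

A service desk agent logging an incident would call `find` with the user's description and, on a match, apply the recorded workaround instead of escalating.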
Event Management Service Operation
Definition • Event – any detectable or discernible occurrence that has significance for the management of a CI or IT Service • An Alert can be an Event which Event Management has interpreted as requiring action, e.g. a threshold on CPU usage has been exceeded • Event Management vs. Monitoring • The two areas are very closely related, but slightly different in nature • Event Management works with occurrences that are specifically generated to be monitored • Monitoring is broader: it tracks these occurrences, but it also actively seeks out conditions that do not generate Events
Objectives & Purpose • Event Management provides the ability to detect events, make sense of them, and initiate the appropriate control action • Event Management provides a mechanism for early detection of incidents • In many cases it is possible for the incident to be detected and assigned to the appropriate group for action before any actual service outage occurs • Event Management provides a basis for automated operations, thus increasing efficiency and allowing expensive human resources to be used for more innovative work • It is the basis for operational monitoring and control and the entry point for many Service Operation activities
Roles • It is usually unnecessary to appoint a dedicated Event Manager • Event Management activities are delegated to the Service Desk or IT Operations Management • Technical and Application Management must ensure that the staff are adequately trained and that they have access to the appropriate tools to enable them to perform these tasks
Incident Management Service Operation
Goal • The primary goal of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained • ‘Normal service operation’ is defined here as service operation within Service Level Agreement (SLA) limits
Objectives & Purpose • To restore normal service operation as quickly as possible • Minimize the impact on business operation • Maintain optimal levels of service quality & availability • To manage the lifecycle of incidents
Scope • Incident Management covers anything (any event or occurrence) that disrupts, or could disrupt, a service • Incidents can be generated by: • User notification • Tools (e.g. HP OpenView) • Event notification (important note: not all events will become incidents, as many classes of events are not related to disruptions at all, but are indicators of normal operation, or are simply informational) • IT technical staff • Incidents are reported to and managed by the Service Desk
Incident Model • Many incidents are not new (they involve dealing with something that has happened before and may well happen again) • Many organizations will find it helpful to pre-define ‘standard’ incident models and apply them to appropriate incidents when they occur • An incident model is a way of predefining the steps that should be taken to handle a particular type of incident in an agreed way • The incident model should include: • The steps to be taken to handle the incident • Responsibilities: who should do what • Timescales and thresholds for completion of the actions • Escalation procedures: who should be contacted and when
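The elements listed for an incident model (steps, responsibilities, timescales, escalation) map naturally onto a small data structure. The following is a minimal sketch under assumed names; `Step`, `IncidentModel` and their fields are hypothetical and not taken from any particular service management tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    description: str      # what has to be done
    responsible: str      # who should do it
    target_minutes: int   # timescale for completing the step


@dataclass
class IncidentModel:
    name: str
    steps: List[Step] = field(default_factory=list)
    escalation_contact: str = ""       # who should be contacted
    escalate_after_minutes: int = 0    # when they should be contacted

    def total_target_minutes(self) -> int:
        """Sum of the step timescales for the whole model."""
        return sum(step.target_minutes for step in self.steps)

    def should_escalate(self, elapsed_minutes: int) -> bool:
        """True once the elapsed time exceeds the escalation threshold."""
        return elapsed_minutes > self.escalate_after_minutes
```

A model for a recurring incident, such as a locked account, could then be defined once and attached to every matching incident when it is logged.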
Timescales • Timescales must be agreed for all incident handling stages (these will differ depending upon the priority level of the incident) • They will be based on the overall incident response and resolution targets stated within SLAs • They will in turn be captured as targets within Operational Level Agreements and Contracts • Tools should be used to automate the tracking of timescales and to escalate incidents as required • Support groups must be informed of the defined timescales
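A tool that automates timescales essentially needs a table of targets per priority and a check against the clock. The sketch below shows one way this could look; the priority levels and target durations in `RESOLUTION_TARGETS` are invented for illustration (real values come from the SLA), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical resolution targets per priority level (1 = highest).
RESOLUTION_TARGETS = {
    1: timedelta(hours=1),
    2: timedelta(hours=8),
    3: timedelta(hours=24),
}


def resolution_deadline(logged_at: datetime, priority: int) -> datetime:
    """Latest time by which the incident should be resolved."""
    return logged_at + RESOLUTION_TARGETS[priority]


def is_breached(logged_at: datetime, priority: int, now: datetime) -> bool:
    """True when the resolution target for this priority has been missed."""
    return now > resolution_deadline(logged_at, priority)
```

An escalation job could run `is_breached` periodically over all open incidents and notify the relevant support group when it returns true.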
Major Incidents • A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents • A definition of what constitutes a major incident must be agreed and ideally mapped on to the overall incident prioritization system • Special Major Incident teams may be convened directly under or reporting to the Incident Manager • May run in parallel with Problem Management but service restoration must remain the priority
Metrics • Total number of incidents (as a control measure) • Size of current incident backlog • Breakdown of incidents at each stage (e.g. logged, WIP, closed, etc.) • Number and percentage of major incidents • Mean elapsed time to achieve incident resolution or circumvention, broken down by impact code • Percentage of incidents handled within agreed response time (incident response time) • Targets may be specified in SLAs, for example, by impact and urgency codes • Average cost per incident • Number of incidents reopened and as a percentage of the total • Number and percentage of incidents incorrectly assigned • Number and percentage of incidents incorrectly categorized • Percentage of incidents closed by the service desk without reference to other levels of support • Number and percentage of incidents processed per service desk agent • Number and percentage of incidents resolved remotely, without the need for a visit • Number of incidents handled by each incident model
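Several of these metrics are simple counts, means and percentages over incident records. The sketch below computes four of them from a list of records; the `Incident` fields and the `incident_metrics` function are assumed names for the example, and a real report would read these values from the incident management tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Incident:
    response_minutes: float          # time until first response
    response_target_minutes: float   # agreed response time
    resolution_minutes: float        # elapsed time to resolution
    reopened: bool = False
    closed_by_service_desk: bool = False


def percentage(part: int, total: int) -> float:
    """Percentage of part in total, or 0.0 when there are no incidents."""
    return 100.0 * part / total if total else 0.0


def incident_metrics(incidents: List[Incident]) -> Dict[str, float]:
    total = len(incidents)
    within_target = sum(
        1 for i in incidents if i.response_minutes <= i.response_target_minutes
    )
    return {
        "total": total,
        "mean_resolution_minutes": (
            mean(i.resolution_minutes for i in incidents) if incidents else 0.0
        ),
        "pct_within_response_target": percentage(within_target, total),
        "pct_reopened": percentage(sum(1 for i in incidents if i.reopened), total),
        "pct_closed_by_service_desk": percentage(
            sum(1 for i in incidents if i.closed_by_service_desk), total
        ),
    }
```

Breaking the same figures down by impact code or incident model would only require grouping the records before calling the function.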
Challenges • Having the ability to detect incidents as early as possible • Ensuring all incidents are logged (convincing both users and technical staff) • Availability of information about Problems & Known Errors • Integration into: • Configuration Management: use the CMS to determine relationships between CIs & find the history of CIs • SLM: to correctly assess impact and priority • SLM: to use defined escalation procedures
Critical Success Factors • A good Service Desk is key to successful Incident Management • Clearly defined targets to work to – as defined in SLAs • Adequate customer-oriented and technically trained support staff with the correct skill levels, at all stages of the process • OLAs and UCs that are capable of influencing and shaping the correct behaviour of all support staff • Effective Problem Management process (reduce the volume of incidents)
Value to the Business • The ability to detect and resolve incidents results in higher availability of the service, which in turn means less downtime to the business • The ability to align IT activity to business priorities • This is because Incident Management includes the capability to identify business priorities and allocate resources as necessary • Incident Management is highly visible to the business, and it is therefore easier to demonstrate its value than most areas in Service Operation • For this reason, Incident Management is often one of the first processes to be implemented in service management projects
Roles • Incident Manager • Drive efficiency & effectiveness • Produce management information • Manage work of incident support staff (1st & 2nd line) • Monitor effectiveness of process & recommend improvement • Develop & maintain Incident Management systems • Manage major incidents • Develop & maintain the process & procedures • First-line Support • Carried out by the service desk function
Roles • Second-line Support • Normally a group with greater, but still general, technical skills than the Service Desk • Handles many of the less complicated incidents • Benefits from being co-located with the Service Desk, as communication and access are improved • Third-line Support • Specialist internal and external technical groups • Concentrate on more difficult incidents
Request Fulfillment Service Operation
Purpose, Goal, Objectives • Request Fulfillment is the process of dealing with service requests from users. The objectives of the Request Fulfillment process include: • To provide a channel for users to request and receive pre-defined, pre-authorised standard services • To provide information to users and customers about the availability of services and the procedure for obtaining them • To source and deliver the components of requested standard services (e.g. licenses and software media) • To assist with general information, complaints or comments
Basic Concepts • Many service requests will be frequently recurring, so a predefined process-flow (Request Model) can be devised to aid consistency, control and safety • This is similar in concept to Incident Models but applied to service requests • Service requests will usually be satisfied by implementing a standard change • The value of request fulfilment • Provides quick and effective access to standard services which business staff can use to improve their productivity • Request fulfilment effectively reduces the bureaucracy involved in requesting and receiving access to existing or new services