Setting the Standard for DR

Setting the Standard for DR. John Pollard – 23 March 2006. PAS 77 – Guide to IT Service Continuity Management

PAS 56 Guide to Business Continuity Management [Diagram: Business Continuity Management and the disciplines it draws on: Risk Management, IT Disaster Recovery, Facilities Management, Supply Chain Management, Quality Management, Health & Safety, Knowledge Management, Emergency Management, Security, Crisis Communications & PR] * Source: PAS 56:2003 Guide to Business Continuity Management

IT Service Continuity Management … managing an organisation’s ability to continue to provide a pre-determined and agreed level of IT Services to support the minimum business requirements … * Source: ITIL: Best Practice for Service Delivery

Threats • Loss, damage or denial of access to key infrastructure services • Failure or non-performance of third parties • Loss or corruption of key information • Sabotage, extortion or industrial espionage • Infiltration or attack on critical information systems

Scope • Generic framework and guidelines for a continuity programme, including: • Management structure & responsibilities • How to conduct business criticality & risk assessments • How to define and create an IT Service Continuity plan • How to rehearse an IT Service Continuity plan • Solution architectures and design considerations

What is a PAS? * Source: BSI

Status [Timeline, Q4 2004 to Q4 2006: Group formed; First draft; External review; Expected release; Edit; Revise; Contracts / Structure / Content]

Contributors

ITSC Strategy • Define direction and high-level methods to meet IT service level objectives • Agreed at Board level • Needs to consider the four stages of a major incident: initial response; service recovery; service delivery (following incident); normal service resumption • Enable rehearsal of major incident

ITSC Strategy & Plan [Diagram elements: Business Strategy, Threat Analysis, Business Criticality, IT Service Continuity Strategy, IT Architecture, IT Service Continuity Plan, Rehearsals, Costs, Processes]

Maintaining an ITSC Strategy Monitor

Management Structure • Crisis Management Team (CMT) • Business Continuity Management Team (BCMT) • Incident Management Team (IMT)

Business Criticality & Risk Assessments • Identify business units & processes • Categorise criticality of processes • Identify IT services supporting the business processes • Categorise criticality of IT services • Review: by location; by business unit

Business Criticality Categories • Critical: vital to day-to-day operation • Mandatory: vital to meet statutory requirements • Strategic: important for implementation of long-term strategy • Tactical: important for short/medium term objectives
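The assessment steps and the four categories can be sketched in code. This is only an illustration: the process names, service names, and the rule that an IT service inherits the highest criticality of any process it supports are assumptions, as is ranking the categories in the order the slide lists them.

```python
from enum import IntEnum


class Criticality(IntEnum):
    # Lower value = higher priority. Ranking the categories in slide order
    # is an assumption made for this sketch.
    CRITICAL = 1   # vital to day-to-day operation
    MANDATORY = 2  # vital to meet statutory requirements
    STRATEGIC = 3  # important for long-term strategy
    TACTICAL = 4   # important for short/medium term objectives


# Hypothetical business processes and their assessed criticality.
PROCESS_CRITICALITY = {
    "order-taking": Criticality.CRITICAL,
    "regulatory-reporting": Criticality.MANDATORY,
    "market-analysis": Criticality.STRATEGIC,
}

# Hypothetical mapping of each process to the IT services that support it.
PROCESS_SERVICES = {
    "order-taking": ["web-shop", "payments-db"],
    "regulatory-reporting": ["payments-db", "report-server"],
    "market-analysis": ["report-server"],
}


def service_criticality(process_criticality, process_services):
    """Give each IT service the highest criticality of the processes it supports."""
    result = {}
    for process, services in process_services.items():
        level = process_criticality[process]
        for service in services:
            current = result.get(service)
            if current is None or level < current:
                result[service] = level
    return result


SERVICE_CRITICALITY = service_criticality(PROCESS_CRITICALITY, PROCESS_SERVICES)
```

In this sketch a shared service such as the hypothetical `payments-db` ends up Critical because one of the processes it supports is Critical, even though another is only Mandatory.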

Risk Assessment Process Learn Lessons

ITSC Plan • Part of wider BCM Plan • Model plan should include: • Initial response • Incident assessment • Roles & responsibilities • Procedures • Rehearsing the plan • Maintaining the plan

Recovery Objectives • Recovery Point Objective (RPO): the point in time to which work is restored, e.g. start of day • Recovery Time Objective (RTO): the time required to recover service
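A rehearsal or real incident can be scored against the two objectives. The sketch below is a hedged illustration, not part of PAS 77: it assumes the RPO is expressed as the maximum tolerated gap between the last recovery point and the incident, and the RTO as the maximum tolerated outage. The function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def assess_recovery(incident_at, last_recovery_point, service_restored_at, rpo, rto):
    """Compare one recovery against RPO and RTO, both given as timedeltas."""
    data_loss = incident_at - last_recovery_point  # work that must be redone
    outage = service_restored_at - incident_at     # time without the service
    return {
        "data_loss": data_loss,
        "outage": outage,
        "rpo_met": data_loss <= rpo,
        "rto_met": outage <= rto,
    }


# Hypothetical incident: recovery point is start of day, failure mid-afternoon.
result = assess_recovery(
    incident_at=datetime(2006, 3, 23, 14, 30),
    last_recovery_point=datetime(2006, 3, 23, 6, 0),
    service_restored_at=datetime(2006, 3, 23, 18, 30),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),
)
```

Here the hypothetical service loses eight and a half hours of work and is down for four hours, so both objectives are met, the RTO only just.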

Balancing Cost & Recovery Objectives

IT Architecture – Resilience Considerations • Location & distance between sites • Number of sites • Staff access & proximity • Remote access • Dark site vs. manned site • Staff skill levels • Telecoms connectivity and redundant routing • Automation required • Telephony and email • 3rd party / external links

High Level Process Flow

Task Summary Sheet

Rehearsal • A body to control & coordinate • Objectives & success criteria • Rehearsal plan & scripts • Staff briefing • Logs and critique forms • Observers • Post-rehearsal review

Areas to Rehearse • Callout • Walk through reviews • Walk through exercises • Component rehearsals • Integration rehearsals • Relocation rehearsals • Failover rehearsals • Major incident simulations

Architectures

Site Models • Active / Contingency: cold site • Active / Active: service runs from both sites • Active / Alternate: service can run from either site • Active / Backup: warm standby site • Multi-site and other hybrids

Data Resilience [Diagram layers: Tape/backup, Database, Application, Host, Storage Array, SAN]

Replication Modes • Synchronous: increased write latency; typically OK for OLTP; may impact batch processing; requires greater inter-site bandwidth than other options • Snapshot: point-in-time copy; only valid on completion of transfer; minimal/no performance impact • Near real-time: frequent snapshots; minimal performance impact
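One way to compare the modes is by worst-case data loss. The sketch below rests on assumptions not stated on the slide: that synchronous replication acknowledges a write only once both sites hold it (so no acknowledged write is lost), and that a snapshot becomes usable only when its transfer completes, so a failure just before completion falls back to the previous snapshot. The function name and the example figures are hypothetical.

```python
def worst_case_data_loss_minutes(mode, snapshot_interval_min=0.0, transfer_min=0.0):
    """Estimate worst-case data loss in minutes for a replication mode."""
    if mode == "synchronous":
        # Assumption: a write is acknowledged only after both sites hold it.
        return 0.0
    if mode in ("snapshot", "near-real-time"):
        # A failure just before the newest snapshot finishes transferring
        # leaves only the previous one: one interval plus one transfer old.
        return snapshot_interval_min + transfer_min
    raise ValueError(f"unknown replication mode: {mode}")


# Hypothetical figures: a daily snapshot taking an hour to transfer,
# versus near real-time snapshots every 5 minutes taking 1 minute each.
daily_snapshot = worst_case_data_loss_minutes("snapshot", 24 * 60, 60)
near_real_time = worst_case_data_loss_minutes("near-real-time", 5, 1)
sync = worst_case_data_loss_minutes("synchronous")
```

Under these assumptions the near real-time mode narrows the exposure simply by shortening the snapshot interval, which is why the slide describes it as frequent snapshots.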

A Holistic Approach [Diagram: Service Continuity spanning Technology, People, Processes] Service Continuity is much more than technology

john.pollard@unisys.com

Defining the Standard for DR, Part II - Workshop. John Pollard – Unisys. PAS 77 – Guide to IT Service Continuity Management

Typical Challenges • Tape recovery slow • Manual build is complex • Complex inter-operation between systems • Difficult to define critical and non-critical • Management of failover site • Keeping sites in step • Windows Servers

Synchronous Write Latency [Diagram: Server, Storage Array, Communication link (Transfer time), Storage Array] • Write 1 ≈ 0.5 mSec • Write 2 ≈ 0.5 mSec • Latency = 2 * Write Time + Transfer Time • For 200 kilometres using Fibre Channel: Latency = 2 * 0.5 + 4.0 = 5.0 mSec
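The slide's arithmetic can be reproduced directly. The first function below is the slide's formula with the slide's figures; the second is an added illustration, not from the slide, of why this latency may matter for batch work: a job that issues strictly one write at a time cannot complete more writes per second than the latency allows. Function and variable names are hypothetical.

```python
def synchronous_write_latency_ms(write_time_ms, transfer_time_ms):
    """Latency = 2 * Write Time + Transfer Time, as given on the slide."""
    return 2 * write_time_ms + transfer_time_ms


def max_serial_writes_per_second(latency_ms):
    """Upper bound for a job that waits for each write before issuing the next."""
    return 1000.0 / latency_ms


# Slide figures: 0.5 mSec per array write, 4.0 mSec transfer time over
# 200 kilometres of Fibre Channel.
latency_ms = synchronous_write_latency_ms(0.5, 4.0)
serial_limit = max_serial_writes_per_second(latency_ms)
```

With the slide's figures the latency is 5.0 mSec, which caps a strictly serial writer at 200 writes per second.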

Site Synchronisation • Major challenge • Cultural change is needed • Critical to successful operation • DR systems • Build at recovery time • Slow / complex recovery • Maintain ready to use • How to validate changes • Live run • System dependent

Windows Servers • Build DR servers at recovery time: lengthy recovery process; prone to errors; complex – requires higher skill level • Maintain DR servers ready to use: HW does not have to be identical; complex SW change and configuration management; how to validate releases • Boot servers from storage array: requires matching HW; SW only installed once; simplifies SW change and configuration management; simplifies failover process / improves recovery

Windows Boot from SAN [Diagram labels: Production Site, DR Site; Test Server, Live Server, DR Server; Live Data, Test Data, Live OS, Test OS, Data, OS; Storage Array, Storage Array]

Virtualisation • Reduced investment: fewer servers dedicated for resilience; expand/replace if long term outage • Flexibility: allocate/use servers as required • Potentially reduced capacity: depending on system and scale of incident • Configuration may not have been proved

Service Management Identify Affected Areas • Service Desk • Incident Management • Problem Management • Configuration Management • Change Management • Release Management • Testing

Operational Assessment • Understand people and process • Gap analysis

Delivery Approach: Discover, Model, Design, Implement, Manage • Business Objectives • Current Issues or Problems • Existing/Target Infrastructure • Success Criteria • Vision • Existing Systems, Applications & Services • Physical ‘As-Is’ Model • Logical ‘As-Is’ Model • Data profiling • Security assessment • ‘To-Be’ Logical Model • ‘To-Be’ Physical Model • Project plan • Resource schedule • Develop business case • Implement target environment • Migrate and consolidate applications • Application and middleware integration • Define and implement test strategy • Operational assessment & gap analysis • Implement operational & management processes

Workshop • Determine high-level requirements • Determine Business Drivers • Determine Success Criteria • Overview systems and applications • Identify team members, sponsors, etc. • Agree timelines

Discovery (Servers, Storage, Networking) Audit and map: • Hardware • Software • Services

Data Applications Services Group Systems Analysis

Design • Systems architecture • Operational assessment • Test environment • Project plan and resource schedule • Training requirements

Transition to Future State [Diagram elements: Operational Management, Optimised Architecture, Service Continuity, Application Selection and Development Standards, Data Centre Transformation, Network Design, Storage Design, Training Requirements, Systems Design, Systems Management, Migration Plan, Test Environment and Strategy]

Implementation • Methodology • Call on best practice • Operational management • Cultural change • Keep people informed

john.pollard@unisys.com
