Business Continuity: Ensuring Survival. Ron LaPedis, CBCP, CISSP Sr. Product Manager, Compaq. Agenda. Continuity planning? I thought it was called disaster recovery… Why? Professional practices Continuity planning model Step by step Horror stories Food for thought.
Business Continuity: Ensuring Survival Ron LaPedis, CBCP, CISSP Sr. Product Manager, Compaq
Agenda • Continuity planning? I thought it was called disaster recovery… • Why? • Professional practices • Continuity planning model • Step by step • Horror stories • Food for thought
Some people never learn… …for 10 minutes…her job was to race through work areas and scoop up appointment books, payroll records and Rolodexes needed to carry on business elsewhere… Many tenants’ main concern was getting payroll checks…phone lists and calendars Source: San Francisco Chronicle 11/30/89 Crane Collapse Closes Buildings (Over 1 month after the Loma Prieta earthquake)
Something happens Disaster event occurs Productivity (Single department or multiple departments) Business process loss Time Source: DRII
Disaster recovery Disaster event occurs Productivity Business process loss Time Source: DRII
Continuity planning Business process loss Disaster event occurs Productivity Time Source: DRII
Downtime is lost revenue
Average cost per hour of downtime (US$)
Industry | Application | Cost per hour
Financial | Brokerage operations | $7,840,000
Financial | Credit card sales | $3,160,000
Media | Pay-per-view | $183,000
Retail | Home shopping (TV) | $137,000
Retail | Catalog sales | $109,000
Transportation | Airline reservations | $108,000
Entertainment | Tele-ticket sales | $83,000
Shipping | Package shipping | $34,000
Financial | ATM fees | $18,000
Source: Contingency Planning Research, 2000
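To make the hourly figures concrete, here is a minimal sketch that turns them into the cost of a specific outage. It assumes revenue is lost linearly with time, which the slide does not claim; the function name `outage_cost` and the choice of which rows to include are illustrative only.

```python
# Hourly figures copied from the slide (Contingency Planning Research, 2000).
COST_PER_HOUR_USD = {
    "Brokerage operations": 7_840_000,
    "Credit card sales": 3_160_000,
    "Airline reservations": 108_000,
    "ATM fees": 18_000,
}


def outage_cost(application: str, outage_minutes: float) -> float:
    """Estimate lost revenue for one outage.

    Assumption (not from the slide): loss accrues linearly with downtime.
    """
    return COST_PER_HOUR_USD[application] * outage_minutes / 60.0


if __name__ == "__main__":
    for app in COST_PER_HOUR_USD:
        print(f"{app}: 15-minute outage costs about ${outage_cost(app, 15):,.0f}")
```

Even under this simple linear assumption, a quarter-hour brokerage outage lands in the millions, which is the point the slide is making.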
Downtime is not acceptable • Time zones are no longer a barrier for conducting business • If your site is down, your competition is one click away • Utility failure • Communications failure • System failure • Application failure • OS failure And what about system and database maintenance? • Utility upgrade • Communications upgrade • System upgrade • Application upgrade • OS upgrade
Downtime is controllable • System and network architecture • High-availability systems • Redundant network • Hardened primary site • Remote backup site • Continuity planning • Know what you will do before you need to do it
Continuity planning perspective • Ensures that an event doesn’t become a disaster • Covers a broad spectrum of business and technology issues • The key goal: • Required business process availability
Disaster Recovery Institute International (DRII) Mission DRII’s mission is to provide the leadership and best practices that serve as a base of common knowledge for all business continuity and disaster recovery planners and organizations in the industry.
DRII’s professional practices Pre-planning 1. Project initiation and management 2. Risk evaluation and control 3. Business impact analysis Planning 4. Developing business continuity strategies 5. Emergency response and operations 6. Developing and implementing business continuity plans Post-planning 7. Awareness and training programs 8. Maintaining and exercising business continuity plans 9. Public relations and crisis communication 10. Coordination with public authorities
DRII’s business continuity planning model • Project initiation phase • Functional requirements phase • Design and development phase • Implementation phase • Testing and exercise phase • Maintenance and update phase • Execution phase
Start Maintenance and updating Project initiation Required availability times Business continuity process Testing and exercising Functional requirements Procedures Design and development Implementation It’s a process Source: DRII
Project initiation phase • Management commitment and policies • Objectives and requirements • Baseline assumptions • Project management • Teams • Delphi – Business function knowledge • Corporate team – Infrastructure / common activities • EMT – Emergency Management Team ‘the workers’ • CMT – Crisis Management Team ‘the decision makers’
Project initiation phase Project management • Continuity planning (CP) is a process consisting of programs and projects • It does not take a subject matter expert to manage projects; it takes a project manager • Use your CP experts to perform CP activities, not to manage projects
Maintenance and updating Project initiation Required availability times You are here Business continuity process Testing and exercising Functional requirements Procedures Design and development Implementation Source: DRII
Functional requirements phase • Fact gathering, alternatives and decisions • Risk analysis and controls • Business impact analysis • RTO – Recovery Time Objective – How fast • RPO – Recovery Point Objective – How much • Alternative strategies • Cost benefit analysis and budgeting
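The two objectives can be checked against an actual or exercised incident. The sketch below is one way to do that, assuming three timestamps are known: the last good copy of the data, the moment of failure, and the moment service was restored. The names `Objectives` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # Recovery Time Objective: how fast service must be back
    rpo: timedelta  # Recovery Point Objective: how much recent data may be lost


def evaluate(obj: Objectives, last_good_copy: datetime,
             failure: datetime, restored: datetime) -> dict:
    """Compare one incident (or exercise) against the stated objectives."""
    downtime = restored - failure            # measured against the RTO
    data_loss_window = failure - last_good_copy  # measured against the RPO
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= obj.rto,
        "rpo_met": data_loss_window <= obj.rpo,
    }


if __name__ == "__main__":
    objectives = Objectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
    result = evaluate(
        objectives,
        last_good_copy=datetime(2001, 1, 1, 2, 0),
        failure=datetime(2001, 1, 1, 2, 40),
        restored=datetime(2001, 1, 1, 5, 40),
    )
    print(result)
```

In this made-up incident the service is back within the RTO, but 40 minutes of data are lost against a 15-minute RPO, so the backup or replication interval would need to shrink.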
Functional requirements phase Risk analysis Asset inventory and definition Communication and monitoring Vulnerability and threat assessment Decision Evaluation of controls
Functional requirements phase Risk analysis • Quantitative – Facts and figures, hard • Statistical • Actuarial • Annualized Loss Exposure (ALE) • Objective • Qualitative – Not calculable, soft • Reputation • Future market share • Subjective
Functional requirements phase Risk analysis Controls do not reduce the threat, they reduce the exposure (and hence, the risk)
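A small worked example of the quantitative side, using the common definition of annualized loss as expected loss per event multiplied by expected events per year. That formula, the function names, and all dollar figures below are assumptions for illustration and are not taken from the slides. The control lowers the loss per event (the exposure) while the event frequency (the threat) stays the same.

```python
def annualized_loss_exposure(loss_per_event: float, events_per_year: float) -> float:
    """ALE under the common definition: loss per event times yearly frequency."""
    return loss_per_event * events_per_year


def net_benefit_of_control(ale_before: float, ale_after: float,
                           annual_cost_of_control: float) -> float:
    """Yearly saving from a control after paying for the control itself."""
    return (ale_before - ale_after) - annual_cost_of_control


if __name__ == "__main__":
    # Hypothetical threat: a computer room fire expected once every four years.
    frequency = 0.25
    ale_before = annualized_loss_exposure(400_000, frequency)
    # Hypothetical control: the fire is just as likely (threat unchanged),
    # but each fire now costs far less (exposure reduced).
    ale_after = annualized_loss_exposure(80_000, frequency)
    print(ale_before, ale_after)
    print(net_benefit_of_control(ale_before, ale_after, annual_cost_of_control=30_000))
```

A negative net benefit would suggest the control costs more than the exposure it removes, at least on the quantitative measures; qualitative factors such as reputation still need a separate judgment.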
Functional requirements phase Business impact analysis [Chart: money versus time to recover, with a loss curve and a cost curve; labels mark the acceptable downtime and the maximum cost of control]
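One way to read the business impact chart is as a trade-off: losses grow the longer recovery takes, while faster recovery options cost more. The sketch below picks the option with the lowest combined total. This reading of the chart, the linear loss model, and the three strategies with their numbers are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    name: str
    time_to_recover_hours: float
    cost_of_control: float  # what the recovery capability costs


def total_cost(strategy: Strategy, loss_per_hour: float) -> float:
    """Cost of the control plus the loss incurred while recovering.

    Assumption: loss grows linearly with time to recover.
    """
    return strategy.cost_of_control + loss_per_hour * strategy.time_to_recover_hours


def cheapest(strategies: List[Strategy], loss_per_hour: float) -> Strategy:
    """Return the strategy with the lowest combined cost and loss."""
    return min(strategies, key=lambda s: total_cost(s, loss_per_hour))


if __name__ == "__main__":
    options = [
        Strategy("hot site", 4, 900_000),
        Strategy("warm site", 24, 300_000),
        Strategy("cold site", 72, 80_000),
    ]
    best = cheapest(options, loss_per_hour=20_000)
    print(best.name, total_cost(best, 20_000))
```

With a higher hourly loss the balance shifts toward the faster, more expensive option, which is why the hourly loss figure from the impact analysis has to come before the strategy decision.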
Maintenance and updating Project initiation Required availability times Business continuity process Testing and exercising Functional requirements Procedures You are Here Design and development Implementation Source: DRII
Design and development phase • Scope and objectives • Recovery teams • Cookbook • Key disaster scenario • Escalation, notification, and activation
Design and development phase Recovery teams • Evaluation and declaration • Notification • Emergency response • Interim processing • Salvage • Relocation/reentry
Design and development phase Key disaster scenario “A fire broke out in the computer room. We are unsure of the state of the computers and data stored there. The building has been shut down by the fire department until they are sure that it is safe to enter. They are estimating that we will not have access to the building for a couple of days.”
Design and development phase Escalation, notification, and activation • Who activates the EMT? • How does the EMT get activated? • Who decides to activate the CMT? • How does the CMT get activated? • How does the CMT decide to activate the plan? • What happens if certain members of the CMT are unavailable?
Maintenance and updating Project initiation Required availability times Business continuity process Testing and exercising Functional requirements Procedures Design and development Implementation You are Here Source: DRII
Implementation phase • Emergency response • Command and control • Designation of authority • Scripts • Vendors and resources
Implementation phase Designation of authority • Who is in charge? • If they are not available, who is in charge? • If they are not available, who is in charge? • If they are not available, who is in charge? • Committees cannot be in charge!
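The repeated question on this slide describes an ordered line of succession. A minimal sketch, assuming the plan stores that order as a list of individual names (never a group) and that someone can determine who is reachable; the function and the names are hypothetical.

```python
from typing import Iterable, List


def who_is_in_charge(succession: List[str], available: Iterable[str]) -> str:
    """Return the first reachable person in the line of succession.

    Each entry is one named individual, so a committee can never be returned.
    """
    reachable = set(available)
    for person in succession:
        if person in reachable:
            return person
    raise LookupError("nobody in the line of succession is available")


if __name__ == "__main__":
    line_of_succession = ["Alice", "Bob", "Carol", "Dave"]  # hypothetical names
    print(who_is_in_charge(line_of_succession, available={"Carol", "Dave"}))
```

Raising an error when the list is exhausted is a deliberate choice: an exercise that hits it has found a gap in the plan that should be fixed by extending the list.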
Implementation phase Scripts • Step-by-step listing of activities to be performed every step of the way • In a disaster situation, people do not think rationally • Scripts can be tested, tuned, and tested again • The person who follows a script does not need to be the person who developed the script • Automate as much as possible • One company has 800 automated scripts just for recovering their database!
Implementation phase Vendors and resources • Hot site, warm site, cold site, off-site records storage • Equipment replacement • Rent-a-guard • Salvage experts • Catering • Hotel rooms, rental cars • Local authorities • Police, fire, hospitals, hazmat teams
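An automated script can be as simple as an ordered list of steps that is run, logged, and halted at the first failure so a person can take over. The sketch below shows that shape only; the step descriptions and the `run_script` helper are hypothetical, and the actions are placeholders for real recovery commands.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_script(steps: List[Step]) -> List[str]:
    """Run steps in order, record each result, and stop at the first failure."""
    log = []
    for number, (description, action) in enumerate(steps, start=1):
        ok = action()
        log.append(f"{number}. {description}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later steps may depend on this one, so do not continue
    return log


if __name__ == "__main__":
    # Placeholder actions; a real script would call actual recovery tooling.
    script = [
        ("Confirm backup site power", lambda: True),
        ("Mount replicated volumes", lambda: True),
        ("Start database recovery", lambda: False),
        ("Open application to users", lambda: True),
    ]
    for line in run_script(script):
        print(line)
```

Because the log is returned rather than only printed, the same run can be replayed in an exercise and compared from one test to the next.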
Maintenance and updating Project initiation Required availability times Business continuity process Testing and exercising Functional requirements Procedures You are Here Design and development Implementation Source: DRII
Testing and exercise phase • Training and awareness • Exercise program objectives • Exercise plans, scenarios and exercises • Evaluation and modification
Testing and exercise phase Exercise program objectives • Practice makes perfect – Some companies spend hundreds of hours tweaking parts of their plans to decrease recovery time • Every second counts
Testing and exercise phase Evaluation and modification • What went wrong and how do we fix it for next time? • Do not find someone to blame. A fault found now could save your company later • Were any of our assumptions wrong? • Do we need to revisit a previous phase?
Maintenance and updating Project initiation Required availability times You are Here Business continuity process Testing and exercising Functional requirements Procedures Design and development Implementation Source: DRII
Maintenance and update phase • Remember to budget for this phase. An untested, stale plan is worse than no plan at all! • Review criteria – still current? • Status, reporting, and audits • Distribution and security • Your plan is a competitive asset
Execution phase • If an event becomes a disaster • Decide • Declare • Notify • Execute
Not just an IT problem • IT can recover computers and applications, not business processes • The computers are humming, the applications are loaded… • …and no one is around to use them • Like Cheerios are part of a complete breakfast… • …IT recovery is part of a complete contingency plan
Horror stories • Your backup site is in Atlantic City. You declare during the Miss America pageant (Hurricane Andrew) • Your computer room is in the basement and there’s a fire in the building (Bell Canada) • Will the generators be safe? Do you have a way to refuel them? (Tropical Storm Allison)
Horror stories • You power up the generators and nothing happens • You power up the generators and the power surge blows out your systems • You power up the generators and realize that your air conditioning isn’t on backup power Hint: Exercise your plan!
Food for thought Tapes • Where is your tape backup hardware? • Where are tapes stored until they go offsite? • How quickly do your tapes go offsite? • Are multiple tape copies sent via different routes? • Do you do tape retrieval / restore tests? • For recovery, do you ship tapes in ‘waves?’
Food for thought Replicated enterprise storage • Vendors guarantee disk integrity • Backup disk = primary disk at a bit level • Database integrity is not guaranteed • Your database software needs to recover the database to a consistent state before you can begin processing on the backup system
Physical disk does not equal logical database [Diagram: a source system and a target system, each with Disk 1, Disk 2, and an audit log disk, shown at the moment of a site failure. The database disk cache is flushed infrequently for performance; the audit disk cache is flushed at transaction commit for safety. As a result, some data is not flushed to disk even though its transaction is committed and the log flushed, while other data is on disk but not committed.]
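The recovery step that database software performs on the backup system can be sketched as a toy redo/undo pass over the audit log. This is a deliberately simplified illustration, not any vendor's algorithm: the record layout is invented, and it assumes no two transactions write the same item concurrently (as strict locking would ensure).

```python
from typing import Dict, List, Tuple

# Invented log record layouts:
#   ("write", transaction, item, old_value, new_value)
#   ("commit", transaction)
LogRecord = Tuple


def recover(disk: Dict[str, int], log: List[LogRecord]) -> Dict[str, int]:
    """Bring a bit-exact disk copy to a consistent logical state.

    Redo: reapply writes of committed transactions that never reached disk.
    Undo: roll back writes of uncommitted transactions that did reach disk.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)
    for rec in log:  # redo pass, in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _old, new = rec
            state[item] = new
    for rec in reversed(log):  # undo pass, newest first
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _new = rec
            state[item] = old
    return state


if __name__ == "__main__":
    # D1: T1 committed, but its new value was still in cache at site failure.
    # D2: T3 never committed, but its value had already been flushed to disk.
    replicated_disk = {"D1": 10, "D2": 99}
    audit_log = [
        ("write", "T1", "D1", 10, 11),
        ("commit", "T1"),
        ("write", "T3", "D2", 20, 99),
    ]
    print(recover(replicated_disk, audit_log))
```

The replicated disk in this example is a perfect copy of the primary, yet both of its values are wrong from the database's point of view until the log has been applied.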
Food for thought • Check your third-party site contract • How many other companies in the same threat area use the same vendor? • How soon do you have to vacate? Where will you go? • Have you included workstations and space for them?
Remember that building? • One year later, the tornado-scarred Bank One tower in Fort Worth, Texas, is still closed. 2001/02/10 2000/03/30