Combined Presentations Business Continuity Program (BCP) & Disaster Recovery Plan (DRP) Louis Shallal
Aim of Business Continuity Program BCP • To improve the municipal organizational resilience and capacity to respond and recover from a loss of its operational capability
What is Operational Resilience? An umbrella term that covers events ranging from recovery through to assessment and on-going prevention. [Diagram: Operational Resilience covers Crisis Management, Disaster Recovery and Business Resumption. Labels around an Event: Risk assessment & Mitigation, Short-Term Response, Long-Term Response, Plan Maintenance and Testing. Supporting elements shown include Plan Development/Improvement, Risk Management, Damage Assessment, Recovery Review, Insurance, Legal, Supply Chain, Service Management, Reputation, Physical Infrastructure, People, IT Systems, Applications, Technical Infrastructure, Data Recovery, Location/Resources and Procedures.]
BCP/DRP in context • BCP: The advance preparations necessary to identify the impact of potential business interruptions; formulate recovery strategies; develop business continuity plans; and administer a training, exercise and maintenance process. • DR: The technology aspects of a business continuity plan. Its focus is the restoration, at an alternate location, of data centre services and computer processing capabilities. • Emergency Response: An organisation’s coordinated response to a disaster in an effective and timely manner. The goal is to avoid or minimise injury to personnel and/or damage to company assets. • Disaster: An event, anticipated or unanticipated, that seriously disrupts normal business operations and prevents the company from delivering essential services for a period of time.
Business Continuity vs. Emergency Management • Business Continuity is a process to develop the capacity to respond to and recover from, in an orderly manner, unplanned interruptions that disrupt critical operational functions (e.g., water damage in the main administrative building preventing access). • Emergency Management is the set of actions taken to manage emergencies in the community, including prevention, mitigation, preparedness, response and recovery (e.g., a hazardous material spill, tornado or infectious outbreak).
Focus of Business Continuity vs. Emergency Management • Business Continuity is INWARD looking: internal to the organization • Emergency Management is OUTWARD looking: external focus
Business Continuity vs. Emergency Management • Business Continuity is not mandatory for municipalities; it is mandatory only for Ontario ministries. • Emergency Management is governed by legislation.
Ontario Emergency Management Legislation • The Emergency Management and Civil Protection Act requires the Region to create an Emergency Management Program adopted by a By-Law of Council: • The Program must include: • An Emergency Plan based on identified risks and hazards • Annual training programs and exercises • Public education on risks to public safety and emergency preparedness
Ontario Emergency Management Legislation • Requirements of Ontario Regulation 380/04 under the EMCPA include: • Trained Emergency Management Program Co-ordinator (CEMC) to develop & implement the program • Emergency Management Program Committee • A current by-law adopting the program and plan • A current community risk profile • Designated Emergency Operations Centre with appropriate communications systems • Designated Public Information Officer • It is all about Emergency Training and Public Awareness!
What types of bad stuff… • Natural Events • Technical Failures • Human-Caused
Case Study: Blackout Northeastern North America 2003 • 4:11 pm Aug 14th • System instability • Domino “Failure” • 50 Million affected • Outage of minutes to days • “Perfect” timing • Variable business disruption
York Region Blackout!
Case Study: 1998 Ice Storm • Death and injuries • Loss of livestock • Damage to power grid • Damage to Environment • Damage to Maple Sugar Industry • Loss of Business • High cost of response • Economic Impacts
But… other minor emergencies happen! • Knocked-down hydro and telephone poles • Flooding
Forces Driving Need for BCP • Stakeholder expectations • Public • Political • Regulatory concerns • Legislated requirements • Critical infrastructure • Protection of reputation
Business Continuity Strategies – Phase 2 • Decide on the preferred strategy to develop resumption plans using existing in-house facilities or outsourcing the service • Ensure equipment, services and facilities are in place to allow full, timely implementation of resumption plans • Explore opportunities for risk mitigation to reduce the likelihood of resumption plan activation
IT Business Continuity Plan • The IT BC Plan is to be developed by the IT department in partnership with Emergency Management folks as part of the overall Corporate Business Continuity Plan.
IT Business Continuity Plan • Purpose/Objective: • First, it supports IT’s commitment to swiftly and effectively bring any emergency situation under control. • Second, it leverages the IT DR Plan. • Third, it serves as a guide to an effective response in a crisis situation.
IT Business Continuity Plan • Assumption: • This plan assumes that the Worst Case Scenario is defined as follows: • Administration Centre of the municipality is destroyed at 3:00pm on a Tuesday. • 30% of staff located at the Administration Centre are not available to work.
IT Business Continuity Plan • Components of the Plan: • Resource plan – indicating primary and two alternate resources responsible for a function during a crisis along with detailed contact information • Resumption team lead and three coordinators that will be called to action to ensure critical services continue. • Plan identifies a primary recovery location and two alternative recovery locations with detailed maps to each location. • Identify Key Responsibilities and Key Actions for all positions responsible for the various elements of the plan.
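The resource plan described above (a primary and two alternates per function, with contact details) can be pictured as a small data structure. The following is only a sketch under assumptions: the class names, the `available` flag, and the sample names and phone numbers are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    name: str
    phone: str
    available: bool = True  # assumed flag, updated as staff are reached


@dataclass
class FunctionAssignment:
    """One row of a resource plan: a function, its primary and two alternates."""
    function: str
    primary: Contact
    alternates: List[Contact]

    def __post_init__(self) -> None:
        # The slide specifies a primary and two alternate resources.
        if len(self.alternates) != 2:
            raise ValueError("a function needs exactly two alternates")

    def responder(self) -> Optional[Contact]:
        """Return the first available person, primary first, or None."""
        for person in [self.primary, *self.alternates]:
            if person.available:
                return person
        return None


# Hypothetical example: the primary cannot be reached, so the first
# alternate takes over the function.
network = FunctionAssignment(
    function="Network restoration",
    primary=Contact("A. Primary", "555-0100", available=False),
    alternates=[
        Contact("B. Alternate", "555-0101"),
        Contact("C. Alternate", "555-0102"),
    ],
)
print(network.responder().name)
```

Keeping the alternates in a fixed order makes the hand-over rule explicit, which is the point of naming them in the plan ahead of time.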
IT Business Continuity Plan • Components of the Plan: • Resource Requirements – desktop, phones, and other equipment required to set-up a recovery office. • Vital Recovery Records – location of procedures and system passwords. • External Contact Information – Vendors and Partners • Internal Contact Information – Key Staff in other Departments • Personnel Location Control Form – Who has been contacted and who has not. • Critical Assessment Forms – Assessment of equipment and office damage. • Application List – The applications need to be recovered. • Personnel Notification Procedures
Lessons • Executive Level approval & involvement is Key • Plan an Effective Business Impact Analysis (BIA) • Follow a logical investigative sequence • Identify critical functions • Create plans only for critical functions • Strategy Selection: Develop Resumption Plans • Move from Paper to Capacity… and from theory to practice…. • Test and re-test Systems • Train and re-train People
Lessons: Crisis Communications • Crisis communications plan must dovetail into crisis management plan • Stakeholders must be known • Speak with one voice - consistency • Truth and timeliness are essential • Silence is not golden • Perception becomes reality • AND DON’T FORGET THE SOCIAL MEDIA!
Disaster Recovery Planning (DRP) Strategies for effective and rapid recovery
Presentation Outline • Challenge • Background: DRP principles • Best Practices in DRP • DRP phases • What Results to Achieve
Challenge of Municipalities • Continue to enable business units to provide service to citizens, business, and the community in the case of any unplanned computing services interruption. • It is all about protecting services to citizens, businesses, & community.
Challenge • It is not the delivery of IT services that we need to protect; it is the delivery of municipal services that are highly dependent on technology and IT on a 24/7 basis. • Examples of mission-critical apps: • App to serve the transportation needs of people with physical disabilities • App for delivering our water and wastewater distribution • App for social housing services • App for social services • App for child care services • App for our financial and human resource services • App for managing EMS operations • App for Health Services
Background: DRP Principles • A “Cold Site” is essentially a computer room facility which is ready for build-out. The site has no hardware and is sometimes called a shell site. Recovery times are measured in weeks. • A “Warm Site” is a computer room that is in a ready state, but is not up to date in terms of readiness for immediate failover. Recovery times are measured in several hours to days. • A “Hot Site” is one in which the requisite hardware/software is operating and is active. Typically, hot sites and “HA” sites are designed for mission-critical operations/businesses, such as financial and health care, where downtime is not an acceptable option and network failover can be performed rapidly. Recovery times are measured in minutes. • Recovery time is directly related to the degree of maintenance and scheduled effort desired/committed by Municipal IT. • Summary of recovery times: Cold Site: weeks; Warm Site: days/hours; Hot Site/HA: hours/minutes. • Readiness spectrum, from cold to hot: no hardware; quick ship; dated OS & data; relatively current OS & data; real-time OS/data & failover.
DRP Principles: Recovery Times vs. Relative Cost [Chart: relative cost ($) plotted against timeframe to recovery (months, weeks, < 1 week, < 1 day, 0 mins) for Cold Site, Warm Site and Hot Site.]
DRP Principles: What drives the “Temp Site” Recovery Strategy? • A formal Business Impact Analysis (BIA) quantifying loss is required to decide whether a Cold, Warm or Hot Site is required. • The BIA will recommend the preferred recovery strategy. • The municipality must decide on the right balance between underfunding DRP (as in a Cold Site) and overfunding it (as in a Hot Site).
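One way to picture the balance between underfunding and overfunding is to compare, for each site type, the yearly cost of keeping the site ready plus the loss expected while waiting for recovery. The sketch below is illustrative only: every cost, recovery time, and outage frequency is a made-up placeholder, and the simple cost model is an assumption rather than the presentation's method. A real BIA would supply the actual figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SiteOption:
    name: str
    annual_cost: float     # hypothetical yearly cost of keeping the site ready
    recovery_hours: float  # hypothetical time to resume service after a disaster


def expected_annual_cost(site: SiteOption, loss_per_hour: float,
                         outages_per_year: float) -> float:
    """Site upkeep plus the loss expected while services are down."""
    downtime_loss = outages_per_year * site.recovery_hours * loss_per_hour
    return site.annual_cost + downtime_loss


def preferred_site(sites: List[SiteOption], loss_per_hour: float,
                   outages_per_year: float) -> SiteOption:
    """Pick the option with the lowest expected yearly cost."""
    return min(
        sites,
        key=lambda s: expected_annual_cost(s, loss_per_hour, outages_per_year),
    )


# Placeholder figures, chosen only to show the trade-off.
SITES = [
    SiteOption("Cold Site", annual_cost=50_000, recovery_hours=504),  # about 3 weeks
    SiteOption("Warm Site", annual_cost=250_000, recovery_hours=48),
    SiteOption("Hot Site", annual_cost=900_000, recovery_hours=0.5),
]

for loss in (2_000, 50_000, 2_000_000):
    choice = preferred_site(SITES, loss_per_hour=loss, outages_per_year=0.1)
    print(f"loss per hour {loss}: {choice.name}")
```

With these placeholder numbers the preferred option moves from cold to warm to hot as the hourly loss quantified by the BIA grows, which is the trade-off the slide describes.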
Best Practices in DRP • Common Failings in Disaster Planning • Many organizations underestimate the effort required to do proper Disaster Planning. A big-bang approach often fails under its own weight. • Many organizations put the “cart before the horse” and get in trouble that way. Conduct a BIA (Business Impact Analysis) first! • Many organizations look at DRP as a “one-time” event. DRP testing must be ongoing, at least annually! • Many organizations fail to see the “moving target” of DRP. For example, most municipalities undergo a lot of change in staff and resources, as well as growth. Plans will become obsolete if not continuously updated.
Best Practices in DRP • Solution is to break it down into phases • Must build on successes and momentum • Most organizations cannot sustain adoption rate • Set expectations: DR is not a “quick fix” – long term commitment to running multiple data centres (or SLA for cloud services)
DRP phases • BIA (Business Impact Analysis) • Should be the first work package conducted • Verifies risk and impact factors: • Customer service • Productivity • Loss of revenue • Public image • Financial accountability • Loss of data • Collective bargaining agreement compliance • Employee health and safety compliance • Public safety • Confirms restore time requirements • Leads to the DRP recovery strategy: confirms Cold, Warm or Hot Site
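The impact factors listed above lend themselves to a simple scoring sheet. The sketch below assumes a 0 to 5 rating per factor and optional weights; the scale, the weights, and the sample business functions are hypothetical and are not part of the presentation.

```python
from typing import Dict, List, Optional

# Impact factors taken from the BIA slide.
IMPACT_FACTORS = [
    "customer_service",
    "productivity",
    "loss_of_revenue",
    "public_image",
    "financial_accountability",
    "loss_of_data",
    "collective_agreement_compliance",
    "health_and_safety_compliance",
    "public_safety",
]


def impact_score(ratings: Dict[str, int],
                 weights: Optional[Dict[str, int]] = None) -> int:
    """Sum the ratings (assumed 0-5 scale); unrated factors count as 0."""
    weights = weights or {}
    return sum(ratings.get(f, 0) * weights.get(f, 1) for f in IMPACT_FACTORS)


def rank_functions(bia: Dict[str, Dict[str, int]]) -> List[str]:
    """Order business functions from highest to lowest impact."""
    return sorted(bia, key=lambda name: impact_score(bia[name]), reverse=True)


# Hypothetical ratings for three business functions.
bia = {
    "Payroll": {"productivity": 3, "financial_accountability": 4,
                "collective_agreement_compliance": 4},
    "EMS dispatch": {"public_safety": 5, "customer_service": 5,
                     "loss_of_data": 4},
    "Intranet news": {"productivity": 1},
}

for name in rank_functions(bia):
    print(name, impact_score(bia[name]))
```

A ranking like this is only an input to the discussion: the functions at the top are the candidates for the shortest restore times and the "Priority One" test group.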
DRP Implementation phases • Phase 1 • This phase delivers the base infrastructure for the Disaster Recovery site (which could be on premises or in the cloud). For an on-premises solution, it includes a build-out of the computer room at the site and equipping the room with appropriate hardware and software for recovery. • A cloud solution (IaaS) will eliminate most of the need identified above. • This phase includes a successful test of the “Priority One” applications.
DRP phases • Phase 2 (on premises) • This phase builds on the success of Phase 1. It includes adding enough disk space and additional servers to perform a full recovery. It adds a complete copy of the user base and full permissions from a security perspective. This phase culminates in a successful first user test of the site, including all mission-critical applications. • Phases 1 and 2 should be built in an isolated environment in order to minimize risk to the production systems. The DRP domain is a logically separate portion of the network. Any recovery is limited to the computers and systems located at the DR site.
DRP phases • Phase 3 • Assess how well the first two phases meet current DR needs before expanding them on a much larger scale. • The separate DRP domain is integrated into the production network, i.e., the DRP domain can be “seen” by all computers in the Municipality. • To incorporate the DRP site into the production environment, a large-scale re-architecture of the production environment is required. This MAY include an upgrade to the OS (Windows) core.
DRP phases • Phase 4 • Phase 4 is focused on extending the underlying network infrastructure to build redundancy into the system. This will make the recovery site available to a larger number of municipal sites.
What results are you after? • Complete the BIA. • Build infrastructure for the right “Temp Site.” • Successfully test Priority One applications. • Successfully recover and test all mission-critical applications. Shoot for a success rate of over 95%. • Relocate DRP equipment to an alternate site or to the cloud (data centre).
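The 95% target above is simple arithmetic over the recovery test results. A minimal sketch, assuming each application is recorded as recovered or not recovered; the function names and sample counts are hypothetical.

```python
from typing import Dict

TARGET_PERCENT = 95.0  # the slide asks for a success rate of over 95%


def recovery_success_rate(results: Dict[str, bool]) -> float:
    """Percentage of tested applications that were recovered successfully."""
    if not results:
        raise ValueError("no test results recorded")
    return 100 * sum(results.values()) / len(results)


def meets_target(results: Dict[str, bool]) -> bool:
    """True only when the rate is strictly above the target."""
    return recovery_success_rate(results) > TARGET_PERCENT


# Hypothetical test run: 40 applications, one failed recovery.
results = {f"app-{i:02d}": True for i in range(40)}
results["app-07"] = False

print(recovery_success_rate(results), meets_target(results))
```

Note that "over 95%" is read strictly here, so a run of exactly 19 out of 20 recovered applications would not meet the target.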
Thank you. Questions/Comments Are Welcome: drshallal@drshallal.com