TPL Business Continuity Management System (BCMS)
How vulnerable are we?
• The Tonga Trench is 10,800 m deep and lies 200 km east of the Kingdom of Tonga
• The Tonga Trench has the fastest plate velocity (24 cm/year) of any trench in the world
The History of Business Continuity
Timeline markers: 11/9/2001 and mid 2007
• Fallback plans, contingency plans, alternative planning / Plan B
• Disaster Recovery Planning (DRP): IT or technical contingency plans. The advance planning and preparations that are necessary to minimize loss and ensure continuity of the critical business functions of an organization in the event of disaster. The technological aspect of business continuity planning.
• Business Continuity Planning (BCP): organization-wide contingency plans. The process of developing advance arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions continue with planned levels of interruption or essential change.
• Business Continuity Management System (BCMS): holistic contingency plans
What is a BCM System?
A Business Continuity Management System:
• Is a living system: managed every day and updated whenever the situation changes
• Is holistic
• Requires a strategy/policy
• Restores critical business processes end to end (the focus is not asset recovery only)
• Covers communication with clients, employees and emergency organisations
• Integrates with the business processes of other Business Units (BUs)
• Addresses employee safety & wellbeing
Describing BCM Risk
Risk has four key components:
• Threats: fire, earthquake, power failure, loss of key staff etc.
• Assets: human, mission-critical systems and infrastructure, suppliers, clients, buildings, information and records
• Vulnerabilities: weaknesses in assets such as a single point of failure, inadequate fire protection, poor staffing levels, unreliable IT security, inefficient data backup etc.
• Impact: financial or non-financial (e.g. reputational, health & safety impacts)
Risk description (example):
• “Significant financial and reputational impact to the Company because the IT Manager is unable to restore, on time, the Billing Server that was damaged by a power surge.”
Risk Controls
• Prevention controls (before the ACCIDENT): reduce likelihood
• Mitigation controls (between the ACCIDENT and the HARM): reduce consequence
BCM Risks
• BCM Plans are mitigation controls (i.e. they minimise consequences) rather than prevention controls
• BCM Risks are managed through Quantate as part of TPL’s Risk Management Program
BCM Assumptions
• A secondary or alternative site is always available to the company to restore business-critical processes if the primary site is not accessible because of damage sustained in a disaster.
• Minimum resources (e.g. staff, IT equipment etc.) are still available to restore business-critical functions following a disaster.
• Popua power station is unharmed and the distribution network assets have sustained only minor damage.
• National communication providers (i.e. TCC or Digicel) are available to communicate with the public.
• BCM plans are not applicable to nationwide disasters (e.g. tsunami) that may have damaged all of the company’s primary and secondary sites and its generation and distribution assets.
BS 25999 Standard
• Requires a policy statement
• Requires a focus on restoring end-to-end processes (not just equipment or machinery)
• Requires RTO/RPO analysis for all critical processes
• Requires risk assessment and mitigation
• Requires a hierarchy of plans (Incident Management Plan (IMP), Business Unit Restoration Plans (BURPs), Emergency Response Plans (ERPs) etc.)
• Requires a Command Centre and DR sites
• Requires a Call Tree
• Requires periodic testing, maintenance & auditing of BCM plans
Overview of BS 25999:2007
PLAN: Planning
• BCM Policy
• Steering Committee
• Structure, roles & responsibilities
• BCM scope & objectives
• Top management commitment
DO: Implementation & Operation
• Identify BC threats
• Business impact analysis
• Process criticality analysis
• Gap analysis
• Risk assessment
• Recovery strategies & scenarios
• Generating BCM Plans/BCM Manual
CHECK: Testing & Auditing BCM plans
• Periodic tests: table-top exercises, simulations, live exercises
• Periodic audits
• Management review
ACT: Maintaining and Improving
• Ongoing training
• Update plans whenever there is a major change to the company processes/structure
• Embedding BCM in the culture
BURP vs. DR Plan
• A BURP is a holistic plan used to restore an entire business unit: its key processes, which employees move to the DR site, activation of the call tree, coordination with other business units, and resumption of business as normal.
• A DR Plan is a technical plan designed to restore equipment and machinery to normal operation (e.g. the IT DR Plan).
• Companies use both names interchangeably.
Managing Incidents Within the Capability of TPL Plans
• Identify the incident
• Assess damage and identify lost critical processes
• Declare a disaster if the critical operations cannot be restored within the primary premises
• Shut down the power supply for the safety of the public
• Activate BURPs/DRPs. Move into the DR site to restore operations at a minimum acceptable level
• Restore the damaged distribution assets and network
• Restore power generation
• Resume business as usual at the DR Site
• Monitoring
• Review & Documentation
RTO & RPO (Last Backup)
• The Recovery Time Objective (RTO) is the goal for how quickly TPL needs to recover the interrupted processes.
• The Recovery Point Objective (RPO) is the point in time to which data must be restored to successfully resume the interrupted processes (often thought of as the time between the last backup and the moment an “interruption” occurred).
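The two definitions above can be checked against a single interruption. The following is a minimal sketch, not TPL's method: the `RecoveryObjectives` class, the `evaluate_recovery` function and the billing figures (RTO of 4 hours, RPO of 24 hours, and all timestamps) are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore the process
    rpo: timedelta  # maximum acceptable gap between last backup and interruption


def evaluate_recovery(objectives, last_backup, interruption, restored):
    """Compare one interruption against the RTO and RPO of a process."""
    downtime = restored - interruption             # how long the process was down
    data_loss_window = interruption - last_backup  # work done since the last backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical billing process: RTO of 4 hours, RPO of 24 hours.
billing = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=24))
result = evaluate_recovery(
    billing,
    last_backup=datetime(2024, 1, 10, 2, 0),
    interruption=datetime(2024, 1, 10, 14, 0),
    restored=datetime(2024, 1, 10, 19, 0),
)
print(result["rto_met"], result["rpo_met"])
```

In this example the process was down for 5 hours against a 4-hour RTO, so the RTO is missed, while the 12 hours since the last backup fall within the 24-hour RPO.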
Business Impact Analysis
• BIA identifies the impacts (financial & non-financial) of disasters that cause the company’s critical processes to malfunction.
• The goal of BIA is to identify, categorise and prioritise the mission-critical processes and the resources (e.g. technology, infrastructure, vital records, personnel, suppliers) required to operate these processes within the company.
• Some examples of such processes are customer service & support, order & data processing, payroll, IT & communication, and purchasing and production.
• BIA also identifies interdependencies (telephones, IT facilities, office space etc.) between different business units within the company.
• The priorities of critical processes for subsequent resumption are based on the RTO and RPO of each process.
BIA – Identifying Critical Processes
• Administer the BIA Questionnaire (template supplied) with each BU Manager, then collect, review and analyse the data
• Consider the worst-case scenario: a total loss of personnel, facilities and property.
• Does the disaster affect the critical processes?
• Determine the impacts if a process is lost due to the disaster:
  • Identify and quantify financial impacts (recovery costs, production losses & revenue losses)
  • Identify non-financial impacts:
    • Impact on staff or public wellbeing
    • Impact of damage to premises/assets/records
    • Impact of breaches of statutory regulations
    • Damage to reputation
    • Deterioration of product/service quality
    • Environmental damage
BIA – Identifying Critical Processes
• Determine the minimum resources required for minimum acceptable recovery of each process, manually or using alternative processes:
  • Staff: skills and knowledge
  • Secondary premises
  • Plant, equipment, software, data/records
  • External service providers
• Determine RTOs and RPOs for each process. Note that the shorter the RTO, the greater the cost of recovery. Also, longer RTOs increase the chance that recovery will not be achieved within MTO.
• Determine whether the company is currently able to achieve the RTO/RPO objectives. If not, flag them as risks to be analysed at a later stage
• Rank the processes based on impacts to determine the priority of recovery of critical processes
• Example: results of the RTO/RPO analysis are supplied
• Conduct a process dependency analysis with other business units
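The ranking and gap-flagging steps above can be sketched as follows. This is an illustration under stated assumptions: the `ProcessBIA` class, the 1 to 5 impact scale, the tie-break on shorter RTO and every figure in the sample data are hypothetical, not results of TPL's BIA.

```python
from dataclasses import dataclass


@dataclass
class ProcessBIA:
    name: str
    impact_score: int        # hypothetical 1-5 scale (5 = most severe impact)
    rto_hours: float         # RTO set during the BIA
    achievable_hours: float  # recovery time the company can achieve today


def recovery_priority(processes):
    """Rank processes by impact (highest first); ties go to the shorter RTO."""
    return sorted(processes, key=lambda p: (-p.impact_score, p.rto_hours))


def rto_gaps(processes):
    """Names of processes whose RTO cannot currently be met; flag these as risks."""
    return [p.name for p in processes if p.achievable_hours > p.rto_hours]


# Hypothetical BIA results, for illustration only.
processes = [
    ProcessBIA("Payroll", impact_score=2, rto_hours=48, achievable_hours=24),
    ProcessBIA("Customer service", impact_score=4, rto_hours=8, achievable_hours=12),
    ProcessBIA("IT & communication", impact_score=5, rto_hours=4, achievable_hours=4),
    ProcessBIA("Purchasing", impact_score=4, rto_hours=24, achievable_hours=24),
]

print([p.name for p in recovery_priority(processes)])
print(rto_gaps(processes))
```

With this sample data, only customer service is flagged as a risk, because it would take 12 hours to recover against an 8-hour RTO.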
BIA - Process Dependency Analysis
• Consider the entire chain of processes to be recovered together
• Adjust RTOs if necessary
• Define Maximum Tolerable Outage (MTO)
• BCM Plans are designed to recover company critical processes within MTO
[Figure: dependency chain of processes A, B, C, D and E across BU 1, BU 2 and BU 3, with RTO labels of 2 hrs, 7 hrs, 3 hrs, 5 hrs and 1 hr]
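One way to "adjust RTOs if necessary" is to require that any process another process depends on is recovered no later than its dependant. The sketch below assumes that rule; the dependency arrows and the pairing of RTO values to processes A to E are illustrative assumptions and are not taken from the slide's diagram. The 48-hour MTO assumes that TPL's stated MTO of 2 days means 48 hours.

```python
def adjust_rtos(rto_hours, depends_on):
    """Tighten RTOs so every prerequisite is recovered no later than its dependants.

    rto_hours:  {process: RTO in hours}
    depends_on: {process: [processes that must be running first]}
    """
    effective = dict(rto_hours)
    changed = True
    while changed:  # repeat until a full pass makes no change
        changed = False
        for process, prerequisites in depends_on.items():
            for prerequisite in prerequisites:
                if effective[prerequisite] > effective[process]:
                    effective[prerequisite] = effective[process]
                    changed = True
    return effective


# Assumed pairing of RTOs to processes and assumed dependency arrows.
rto_hours = {"A": 3, "B": 2, "C": 5, "D": 7, "E": 1}
depends_on = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}

effective = adjust_rtos(rto_hours, depends_on)
print(effective)

MTO_HOURS = 48  # assumption: MTO of 2 days expressed in hours
print(all(hours <= MTO_HOURS for hours in effective.values()))
```

Here E has a 1-hour RTO and depends on C, which depends on A, so both C and A are tightened to 1 hour even though their own RTOs were longer.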
BIA – Disaster Declaration
• Scenario 1 – If Estimated Recovery Time < MTO
  • Primary site is intact (example: malfunction of IT servers)
  • Execute the IT DR Plan for all IT issues
• Scenario 2 – If Estimated Recovery Time <= MTO
  • Primary site is inaccessible but DR sites are operational, key staff are available and national communication systems are functional (example: cyclone)
  • Minimum and acceptable reputational/financial losses
  • Execute all BCM Plans including the IMP, BURPs & IT DR Plan
  • Resume company critical processes from the DR sites at a minimum acceptable level
• Scenario 3 – If Estimated Recovery Time > MTO
  • No primary or DR site is operational, key staff are unavailable and the national communication system has failed (example: tsunami)
  • Requires a good communication plan (e.g. radio communication)
  • Reputational/financial losses are inevitable
  • Execute BCM plans with a delay
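The three scenarios can be expressed as a small decision function. This is a sketch only: the function name, the `PLANS` mapping and the handling of a recovery time exactly equal to MTO with the primary site intact are assumptions, and the 48-hour figure assumes that TPL's MTO of 2 days means 48 hours.

```python
def classify_scenario(estimated_recovery_hours, mto_hours, primary_site_intact):
    """Return the scenario number (1, 2 or 3) for a disaster declaration."""
    if estimated_recovery_hours > mto_hours:
        return 3  # recovery cannot be completed within MTO
    # Assumption: a recovery time exactly equal to MTO with the primary site
    # intact is treated as Scenario 1; the slide only states "< MTO" for it.
    return 1 if primary_site_intact else 2


# Plans executed per scenario, as listed on the slide.
PLANS = {
    1: ["IT DR Plan"],
    2: ["IMP", "BURPs", "IT DR Plan"],
    3: ["IMP", "BURPs", "IT DR Plan"],  # executed with a delay
}

MTO_HOURS = 48  # assumption: MTO of 2 days expressed in hours

scenario = classify_scenario(6, MTO_HOURS, primary_site_intact=True)
print(scenario, PLANS[scenario])
```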
Disaster Declaring Criteria
TPL MTO = 2 days
• Level 1: Minor Incident – An incident is minor only when critical business operations are affected and distribution network recovery or IT problems are expected to be resolved within 3 hours. In this case, the incident is resolved on a ‘business as usual’ basis and no disaster is declared.
• Level 2: Major Incident – Major disruption to the critical business operations, with the system outage expected to last more than 1 day but not more than two business days. In this case, the incident is escalated to a semi-disaster, which is declared by the members of the IMT. At this incident level, only the Generation and Distribution DR Plans are invoked, as applicable.
• Level 3: Critical Incident – Critical disruption to the business operations, with the system outage expected to last more than two business days. In this case, the incident is escalated and a disaster is declared by the members of the IMT. At this incident level, the BURPs/DRPs of all critical Business Units (IT, Finance, Generation, and Distribution) are invoked.
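The level thresholds above can be written as a lookup. This is a hedged sketch: it assumes one day equals 24 hours and ignores the distinction between calendar and business days. The criteria do not say how an outage of between 3 hours and 1 day is classified, so the function returns `None` for that range rather than guessing. The names `IncidentLevel`, `incident_level` and `PLANS_INVOKED` are hypothetical.

```python
from enum import Enum


class IncidentLevel(Enum):
    MINOR = 1     # business as usual, no disaster declared
    MAJOR = 2     # semi-disaster declared by the IMT
    CRITICAL = 3  # disaster declared by the IMT


PLANS_INVOKED = {
    IncidentLevel.MINOR: [],
    IncidentLevel.MAJOR: ["Generation DR Plan", "Distribution DR Plan"],
    IncidentLevel.CRITICAL: [
        "IT BURP/DRP",
        "Finance BURP/DRP",
        "Generation BURP/DRP",
        "Distribution BURP/DRP",
    ],
}


def incident_level(expected_outage_hours):
    """Map an expected outage (in hours) to an incident level, or None if undefined."""
    if expected_outage_hours <= 3:
        return IncidentLevel.MINOR
    if expected_outage_hours > 48:   # more than two days (assumed 48 hours)
        return IncidentLevel.CRITICAL
    if expected_outage_hours > 24:   # more than one day, up to two days
        return IncidentLevel.MAJOR
    return None  # 3 to 24 hours: not covered by the stated criteria


for hours in (2, 10, 30, 72):
    level = incident_level(hours)
    print(hours, level.name if level else "not defined by the criteria")
```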
Command Centre Options
Command Centre activities:
• Declaring the disaster based on the damage assessment
• Establishing 24-hour communication channels
• Activating, coordinating and monitoring BURPs/DRPs
• Monitoring and acting on staff health & safety
• Media relations
• Making all key decisions for successful restoration
Possible locations for the Command Centre:
• Head Office (if available)
• Distribution Office (if available)
• Scenic Hotel, Digicel Network Operating Centre
Internal Communication – Call Tree
[Call tree: Incident Management Team → Business Unit Managers → Field Supervisors & Foremen → Linesmen, and Business Unit Managers → Office Supervisors → Staff]
• Brief description of the disaster, loss of life or injuries, damage summary, response and recovery details
• Location of the DR or Network Site to report to, or an instruction to remain at home on standby
• Phone number of the DR Site/Network Supervisor
• Immediate actions to be taken
• Location and time the team should meet at the DR or Network Site
• Instruct everyone notified not to make any statements to the media or on social media.
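A call tree can be walked level by level to see who is notified in each round of calls. The sketch below assumes one reading of the hierarchy named on this slide (Incident Management Team, then Business Unit Managers, then the two supervisor groups, then Linesmen and Staff); the `CALL_TREE` structure and the `notification_rounds` function are hypothetical.

```python
# Assumed reading of the call tree: each key calls the groups listed for it.
CALL_TREE = {
    "Incident Management Team": ["Business Unit Managers"],
    "Business Unit Managers": ["Field Supervisors & Foremen", "Office Supervisors"],
    "Field Supervisors & Foremen": ["Linesmen"],
    "Office Supervisors": ["Staff"],
}


def notification_rounds(tree, root):
    """Breadth-first walk: each round lists the groups called by the previous round."""
    rounds = []
    current = [root]
    seen = {root}
    while current:
        rounds.append(current)
        next_round = []
        for caller in current:
            for callee in tree.get(caller, []):
                if callee not in seen:  # never call the same group twice
                    seen.add(callee)
                    next_round.append(callee)
        current = next_round
    return rounds


for number, groups in enumerate(notification_rounds(CALL_TREE, "Incident Management Team")):
    print(number, groups)
```

Under this reading everyone is reached in three rounds of calls after the Incident Management Team activates the tree, which gives a rough lower bound for call-tree activation trials.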
External Communication
[Diagram: the IMT at the centre, linked to the groups below]
• Business Unit Restoration Plans and Business Unit DR Plans: Distribution Manager, Finance Manager, Generation Manager, Planning & Design Manager, IT Supervisor
• External parties: Police, Fire & Hospital; National Emergency Management Office (Ministry of Works); Tonga Met Service; Radio Tonga; Tonga Defense Service; Government Organisations; Vendors & Suppliers; Embassies & High Commissions
Media Relations
• Only the authorised spokespeople shown below are to comment to the media.
  • John van Brink (CEO) – English media
  • Steven Esau (Finance Manager) – Tongan media
• All unauthorised staff should know that, if they are contacted by the media, they are not to comment and that their standard reply should be “I’m sorry I can’t help you, I am not the appropriate person to speak to. If you provide me your name and contact number, I’ll get the right person to get in touch with you shortly”.
• Jane Guttenbeil and Nau Lavemai coordinate media relations
Head Office BURP
• Minimum Systems Required at DR Site
• Minimum Staff Required at DR Site
Distribution Office BURP
• Minimum Systems Required at DR Site
• Minimum Staff Required at DR Site
Distribution Network DR Plan
Note: H/O staff are expected to support distribution staff with cooking etc. Meter readers are expected to support linesmen in the field.
Livening Priorities
• Airport – Feeder 3
• Hospital – Feeder 1/2
• NEMO/MET Service – Feeder 1
• Emergency evacuation centres (e.g. schools & churches) if applicable – Feeder 1/3
• Prime Minister’s Office – Feeder 2
• King’s Royal Palace – Feeder 2
• Water Board – Feeder 2/1
• Defense organisations – Feeder 2
• Communication organisations (e.g. Radio, TV)
• Government offices – Feeder 2
• Commercial organisations – Feeder 2
Emergency Response Plans
ERPs contain specific emergency procedures that must be followed during a disaster in order to protect people and assets, and to mitigate further damage. For example:
• Building Evacuation
• Damage Assessment
• Spillage (Oil/Chemical/Diesel Fuel) at Power Station
• Fuel Supply Cut Off
• Civil Disturbance
• Hurricane/Storm
• Records Recovery
• Bomb Threat
• Earthquakes
Testing Business Continuity Plans
• The development of BCM plans is not the end of the BCM process. The emphasis of BCM is on management
• Without regular maintenance and testing, the usefulness of the plans in a real crisis may be severely limited
• Testing practises the company’s ability to recover from an incident.
• Test the scenarios identified under the Scenario Planning Section (refer above)
• Tests validate the effectiveness and timeliness (RTO and RPO objectives) of the restoration of critical activities.
• Tests determine the adequacy of SLAs (service level agreements) with third-party suppliers
• Testing identifies communication breakdowns during call-tree activation trials.
• Tests are essential to developing the teamwork, competence, confidence and knowledge that are vital at the time of an incident.
• Testing can be annual, bi-annual etc., and announced or unannounced
Types of Tests
• Table-top checks: the simplest and most frequent form of test. The author of the plan checks its contents to ensure that information (e.g. employee names, contact numbers) is up to date.
• Walk-through: similar to a table-top exercise, but involves all named participants in testing a particular disaster scenario. The participants are brought together to role-play their defined resumption procedures alongside those of others. This is the most common method of testing BCM plans because it is relatively inexpensive.
• Simulation: widens participation to all those involved in business recovery, with prior notice. A simulation may include an interruption such as a building fire, in which people cannot access their normal facilities and must relocate to an alternative location (i.e. the DR site).
• Full or live exercises: the most extensive and expensive form of test, so they are normally undertaken on a yearly or bi-yearly basis. This is the largest scale of test and involves the invocation of all BCM plans (i.e. IMP, BURPs & ERPs) to deal with a scenario that normally involves a move to an alternative site where operations are to be resumed. The dependencies and links between BURPs are the focal point of this type of testing.
Post Test Evaluations
The post-test evaluation should consider the following issues:
• Did the plans help or hinder recovery efforts?
• Did people deviate from the plan and, if so, what was the effect of this?
• Were RTO & RPO objectives achieved?
• Where and when did delays occur?
• What did staff do well?
• What did staff do badly?
• How did the expectations differ from what actually happened?
• Were all BURPs integrated sufficiently to achieve recovery?
• What are the priorities for change?
• Is there a paper or audit trail?
• How should changes be implemented?
• Could the observation process be improved?
• Identify and document all the deficiencies, lessons learned etc.
• Update BCM plans if required based on test outcomes
BCM Maintenance & Auditing
• Ongoing BCMS audits should be conducted to evaluate the entire BCM portfolio of plans and identify gaps/inconsistencies in it.
• If testing has shown that plans have failed to meet the recovery objectives, a fundamental review of the plans may be required.
• In addition, audits should be conducted after a company restructure resulting from a merger or acquisition, after the installation of new systems & facilities (e.g. IT) etc.
• The auditor will issue corrective and preventive actions focusing on continuous improvement
• Ensure the BCMS is current and up to date at all times (i.e. BCM plans are living documents)
• Provide ongoing training on the entire BCM process (e.g. BIA, Risk Management etc.)
• Communicate the BCM initiatives to all employees through newsletters, induction programmes etc.
• Cultivate a BCM culture.