CSE 4482 – Session 9 • Understand system availability and business continuity, and recognize differences between the two. • Comprehend incident response systems and their role in achieving the system availability objective. • Explain disaster recovery planning objectives and its design, implementation, and testing requirements. • Comprehend the link between business continuity and disaster recovery. • Understand the role of backup and recovery in disaster recovery plans.
Power outage at Northwest Airlines • A thunderstorm and lightning at the datacenter location caused the problem. • Systems, down initially, operated in a degraded manner the next morning. • Checking passengers in for flights took a very long time. • NWA triggered manual processes. Lines became longer and so did the delays in departure. • Arrivals were late, and delayed departures occupying gates at the destination airport made arriving flights wait before they could reach a gate. • NWA announced an embargo, limiting itself to what it could handle under the circumstances.
System Availability and Business Continuity • System availability assures you that business will continue to operate. • Business continuity is necessary for systems to add value on an ongoing basis. • The issues of business continuity and systems availability are related and even overlap to a degree.
Incident Response • Incident: A level of interruption in the system availability that appears to be temporary. • An incident can be triggered by an accidental action by an authorized user, or it may result from a threat. • Incidents may be detected by: • End-users, who may describe the symptom but not the cause. • Those monitoring systems and processes, who may detect anomalies that point to an incident that has occurred. • Attack: A series of steps taken by an attacker to achieve an unauthorized result. • Event: An action directed at a target that is intended to result in a change of state, or status, of the target. • An event consists of an action and a target.
Nature of Response to an Incident • Assess the business significance of the incident’s impact. • Identify critical business processes that might have been compromised. • Determine the root causes of the incident. This might present a challenge, for every incident could be of a different variety. The team may need to consult experts from outside the team. • Training in forensics could help the team collect and evaluate evidence systematically. • Standard procedures must be followed for restoring the affected systems and processes, instead of ad hoc, one-off attempts to restore what is compromised or lost.
Preventive Measures • Prevention is better, and could be more cost-effective, than a cure. • Preventive measures require an anticipation or prediction of what might happen in terms of incidents and consequent compromises. • Lessons learned from the organization’s and from others’ experiences can help design and implement effective preventive measures.
Incident Response Team • A multi-skilled group, since the incident may be of any variety and may impact almost any information asset. • May include representation from human resources, legal, information systems, networks and communications, physical security, information security, and public relations. • A top management team member may be designated as a direct contact for counseling and support.
CERT • CERT originally stood for Computer Emergency Response Team. • Also called the CERT Coordination Center (CERT/CC), it is the Internet’s official emergency team. • Provides alerts and offers incident handling and avoidance guidelines. • Is located at Carnegie Mellon University. • www.cert.org
Disaster Recovery • Disaster: An event that causes a significant and perhaps prolonged disruption in system availability. • Disasters can be man-made or natural. • Man-made disasters can be malicious or unintentional. • Disaster recovery is a systematic effort to recover from the impact of a disaster. • Best way to understand recovery is by focusing on post-disaster phases. • Post-disaster phases • Immediate response • Near-term resumption • Recovery toward normalization • Restoration to pre-disaster state
Timeliness of Action and Value of Recovery • Timeliness of action • The timeline of actions planned should reflect value of the action at the time. • Certain steps can wait while others must be taken without delay, to minimize losses. • Value of recovery • Timeliness of action reflects value of the recovery target. • Considering this, recovery tasks should be systematically assigned to each post-disaster phase.
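The idea of assigning recovery tasks to post-disaster phases by how soon they must be done can be sketched in a few lines of Python. This is only an illustration: the hour limits for each phase and the task names and deadlines are hypothetical values chosen for the example, not figures from the course material.

```python
import bisect

# The four post-disaster phases, in order.
PHASES = [
    "Immediate response",
    "Near-term resumption",
    "Recovery toward normalization",
    "Restoration to pre-disaster state",
]

# Hypothetical upper limits (in hours after the disaster) for the first
# three phases; anything later falls into the final phase.
PHASE_LIMITS_HOURS = [24, 72, 720]


def phase_for(deadline_hours):
    """Return the phase in which a task with this deadline must be done."""
    return PHASES[bisect.bisect_left(PHASE_LIMITS_HOURS, deadline_hours)]


def assign_tasks(tasks):
    """Group tasks (name -> deadline in hours) by phase, most urgent first."""
    plan = {phase: [] for phase in PHASES}
    for name, deadline in sorted(tasks.items(), key=lambda kv: kv[1]):
        plan[phase_for(deadline)].append(name)
    return plan


# Hypothetical recovery tasks and the latest time each can wait (hours).
tasks = {
    "Notify recovery teams": 1,
    "Restore check-in system": 12,
    "Resume payroll processing": 60,
    "Rebuild reporting archive": 500,
    "Return to primary datacenter": 2000,
}

plan = assign_tasks(tasks)
for phase in PHASES:
    print(phase, "->", plan[phase])
```

Sorting by deadline before grouping keeps the most time-critical task at the top of each phase, which mirrors the point that some steps can wait while others cannot.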
Disaster Recovery Planning (DRP) • DRP: The definition of business processes, their infrastructure supports and tolerances to interruptions, and formulation of strategies for reducing the likelihood of interruption or its consequences. • Component steps of DRP: • Define the process. • Identify what supports the process and its tolerance to interruptions. • Determine and implement strategies that would reduce the likelihood and consequences of interruptions.
Disaster Recovery Planning (DRP) • Assessing potential losses: Disaster Impact Analysis • What disasters is the firm likely to face? • What is the probability of each type of disaster? • What is the impact of the disaster on the firm?
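One common way to combine the three questions above is to multiply each disaster's probability by its impact and rank the results. The sketch below assumes that simple expected-loss approach, which the slides do not prescribe; the scenario names, probabilities, and dollar amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DisasterScenario:
    name: str
    annual_probability: float  # assumed chance of occurring in a given year
    impact: float              # assumed loss to the firm if it occurs

    def expected_annual_loss(self):
        """Probability-weighted loss: probability times impact."""
        return self.annual_probability * self.impact


def rank_by_expected_loss(scenarios):
    """Return scenarios sorted from highest to lowest expected loss."""
    return sorted(scenarios, key=lambda s: s.expected_annual_loss(), reverse=True)


# Hypothetical scenarios for illustration only.
scenarios = [
    DisasterScenario("power outage", 0.20, 500_000),
    DisasterScenario("flood", 0.02, 4_000_000),
    DisasterScenario("ransomware", 0.10, 1_500_000),
]

ranked = rank_by_expected_loss(scenarios)
for s in ranked:
    print(f"{s.name}: {s.expected_annual_loss():,.0f}")
```

A ranking like this is only a starting point: a low-probability disaster with a very large impact may still deserve attention even if its expected loss is modest.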
Disaster Recovery Planning (DRP) • Value-based recovery planning • Definition of criticality and criteria to determine criticality • Identification of critical business processes and their supports • Identification of the role of information systems resources in the critical process • Determination of process owners and process customers • Determination of the amount of time the business can survive without the process post-disaster • Identify interdependencies between the process and the rest of the business processes and systems • To find critical processes, consider attributes such as importance, key users, tolerance to outage, waiting time between cycles, possibility of data recovery.
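The interplay between tolerance to outage and interdependencies can be illustrated with a short Python sketch. It assumes one possible rule, not stated in the slides: a supporting process must be recovered at least as quickly as the least tolerant process that depends on it. The process names, owners, and hours are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    owner: str
    max_outage_hours: float  # how long the business can survive without it
    depends_on: tuple = ()   # names of processes this one needs


def effective_tolerance(processes):
    """Tighten each process's tolerance to that of its most urgent dependent."""
    tolerance = {p.name: p.max_outage_hours for p in processes}
    changed = True
    while changed:  # repeat until tolerances stop shrinking
        changed = False
        for p in processes:
            for dep in p.depends_on:
                if tolerance[dep] > tolerance[p.name]:
                    tolerance[dep] = tolerance[p.name]
                    changed = True
    return tolerance


def recovery_priority(processes):
    """Return process names ordered from least to most tolerant of outage."""
    tolerance = effective_tolerance(processes)
    return sorted(tolerance, key=lambda name: tolerance[name])


# Hypothetical processes for illustration only.
processes = [
    BusinessProcess("order entry", "sales", 4, depends_on=("customer database",)),
    BusinessProcess("customer database", "IT", 24),
    BusinessProcess("payroll", "finance", 72, depends_on=("HR records",)),
    BusinessProcess("HR records", "HR", 168),
]

tolerance = effective_tolerance(processes)
priority = recovery_priority(processes)
print(tolerance)
print(priority)
```

Here the customer database inherits the 4-hour tolerance of order entry, even though on its own it was rated at 24 hours, which is the kind of adjustment that identifying interdependencies is meant to surface.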
Disaster Recovery Planning (DRP) • Disaster recovery strategies • How do we recover a system given its priority? • Address the question by system components. • Data (e.g., designate off-site storage) • Processing (e.g., back up and store off-site current copies of the software) • Network and communication (e.g., back up and store off-site a copy of the current network configuration) • Dependencies with other systems (e.g., identify how these processes will be interfaced post-disaster)
DRP: Recovery Locations • Recovery location: A site or sites where processes and systems will be recovered post-disaster. • Hot sites: Near-perfect replicas of the operations. • Cold sites: Just the infrastructure (computer operations room, platform for installing hardware, power and communication lines, cabling, etc.). • Warm sites: More than just a cold site, but not quite as ready as a hot site. For example, a warm site may include commonly used computers and operating systems. • Reciprocal agreements: Sharing of similar resources by those in the same or similar computing environments. • Colocations: Recovery is planned using availability of computing resources at the firm’s many locations.
DRP: Teams • Purpose of forming teams is to ensure that recovery tasks are accomplished in an orderly and responsible manner. • The number and nature of teams could vary across organizations. • However, each team should include knowledge and skills necessary to perform its assigned tasks. • Recovery teams can be organized by recovery phases. • Flexibility in assignments is necessary, for an actual disaster may need adjustments to the team. Non-availability of some team members when disaster strikes is also likely.
DRP: Disaster Readiness • Meaning of readiness: Having the assurance that if and when a disaster strikes, the firm has a high likelihood of recovering from the disaster. Testing of the plan is crucial to get this assurance. Disaster readiness practices include: • Walkthroughs: Having a plan preparer walk others through the plan to show how it leads from point A to point B. • Rehearsals: An “as-if” exercise to simulate a disaster’s impact and have the people responsible recreate recovery of “lost” processes and systems. • Compliance (Live) testing: Actual test of recovery with a simulated disaster.
Business Continuity Planning (BCP) • BCP: The totality of plans made to recover the business operations following a disaster. • Recovery of all operations is involved, not just information assets. • Methods and strategies adopted for BCP are comparable to, and often overlap with, those used in DRP.
Business Continuity Planning (BCP) • Business impact analysis is an exercise in risk assessment. • Identify vulnerabilities of the firm. • Assess the business impact • Focus on a particular disaster and determine processes that might be affected, and/or • Analyze all business processes to assess probable business impact in the event that a disaster strikes. • Initiate a planning process to develop methods and strategies to mitigate risk. • Business recovery • Approaches and methods for business recovery are similar to those discussed in disaster recovery planning.
Assurance Considerations • Any assurance that BCP/DRP will be effective requires an examination of such plans from three angles: • Method: Review the method followed in the development of the plan. A sound planning process makes possible a plan that is complete and reliable. • Content: Should have been collected from the “right” participants, and the instruments and methods used to collect data must be valid. The plan should be current. • Testing: Critical components of the plan should be tested, results should be documented, and corrective action, where necessary, should follow.