Disaster Recovery Planning (DRP)

W.lilakiatsakun

Disaster Recovery Planning (DRP)
• DRP is the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-induced disaster.
• DRP is part of a larger process known as business continuity planning (BCP).
• Disaster recovery is the process by which you resume business after a disruptive event.

What is the difference between DRP and BCP? (1/2)
• The event might be:
  • something huge, like an earthquake or the terrorist attacks on the World Trade Center
  • something small, like malfunctioning software caused by a computer virus.
• Many business executives are prone to ignoring "disaster recovery" because disaster seems an unlikely event.

What is the difference between DRP and BCP? (2/2)
• "Business continuity planning" suggests a more comprehensive approach to making sure you can keep making money.
• Often, the two terms are combined under the acronym BC/DR.
• DR and/or BC determines how a company will keep functioning after a disruptive event until its normal facilities are restored.

What do these plans include? (1/2)
• All BC/DR plans need to encompass:
  • how employees will communicate
  • where they will go
  • how they will keep doing their jobs.
• The details can vary greatly, depending on the size and scope of a company and the way it does business.

What do these plans include? (2/2)
• For example, the plan at one global manufacturing company is to:
  • restore critical mainframes with vital data at a backup site within four to six days of a disruptive event
  • obtain a mobile PBX unit with 3,000 telephones within two days
  • recover the company's 1,000-plus LANs in order of business need
  • set up a temporary call center for 100 agents at a nearby training facility.

Events that necessitate disaster recovery
• Natural disasters
• Fire
• Power failure
• Terrorist attacks
• Organized or deliberate disruptions
• Theft
• System and/or equipment failures
• Human error
• Computer viruses
• Testing

Prevention against data loss (1/2)
• Send backups off-site at regular intervals
  • Include software as well as all data, to facilitate recovery
• Create an insurance copy on microfilm or a similar medium and store the records off-site
• Use a remote backup facility, if possible, to minimize data loss
• Storage Area Networks (SANs) spanning multiple sites make data immediately available without the need to recover or synchronize it

Prevention against data loss (2/2)
• Surge protectors: minimize the effect of power surges on delicate electronic equipment
• Uninterruptible power supply (UPS) and/or backup generator
• Fire prevention: more alarms, accessible extinguishers
• Anti-virus software and other security measures

Techniques and technology
• Mirroring
  • Disk mirroring: redundant array of inexpensive disks, level 1 (RAID 1)
  • Server mirroring: web / FTP / email
• RAID: RAID 0 to RAID 6 and combinations (nested RAID)
• On-site data storage
  • Backup: tape / optical disk
• Off-site data storage (backup site)
  • Cold sites
  • Warm sites
  • Hot sites

Mirroring
• Mirroring can occur locally or remotely.
• Local mirroring means that a server has a second hard drive that stores a copy of the data; this second drive is called a mirrored drive.
• Remote mirroring means that a remote server contains an exact duplicate of the data.
• When a write request is issued, data is written to the original drive and then copied to the mirrored drive, providing a mirror image of the primary drive.
• If one of the drives fails, the data is still available from the other copy, so nothing is lost.
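The write path above can be sketched as a small in-memory model. This is a hypothetical illustration, not a real driver: `MirroredVolume` and its methods are invented names, each "drive" is just a `bytearray`, and a failure is simulated by setting a flag.

```python
class MirroredVolume:
    """Toy two-way mirror: every write is applied to both drives."""

    def __init__(self, size: int) -> None:
        self.drives = [bytearray(size), bytearray(size)]  # [primary, mirror]
        self.failed = [False, False]

    def write(self, offset: int, data: bytes) -> None:
        # Apply the write to the primary first, then copy it to the mirror;
        # a drive marked as failed is skipped.
        for drive, failed in zip(self.drives, self.failed):
            if not failed:
                drive[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Serve the read from the first drive that is still healthy.
        for drive, failed in zip(self.drives, self.failed):
            if not failed:
                return bytes(drive[offset:offset + length])
        raise OSError("both drives have failed; the data is lost")


vol = MirroredVolume(16)
vol.write(0, b"payroll")
vol.failed[0] = True   # simulate a failure of the primary drive
print(vol.read(0, 7))  # b'payroll' (served by the mirror)
```

Data is lost only if both copies fail, which is the protection the slide describes.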

Disk mirroring (RAID 1)
• The replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability, currency and accuracy.
• A mirrored volume is a complete logical representation of separate volume copies.

Server mirroring
• Mirror sites are most commonly used to provide multiple sources of the same information, and are particularly valuable for providing reliable access to large downloads.
• Web server
  • To preserve a website or page, especially when it is closed or about to be closed
  • Load balancing
• Email server
  • To protect against loss of email
• FTP server
  • To allow faster downloads for users in a specific geographical location
  • Load balancing
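Load balancing across mirror servers can be as simple as rotating through the list of equivalent hosts and skipping any that are down. The sketch below assumes round-robin selection (one of several possible policies); the hostnames and the `make_picker` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical mirror hostnames; any list of equivalent servers would do.
MIRRORS = ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]


def make_picker(mirrors: list[str]):
    """Return a function that picks mirrors round-robin, skipping ones that are down."""
    ring = cycle(mirrors)

    def pick(down: set[str]) -> str:
        for _ in range(len(mirrors)):
            host = next(ring)
            if host not in down:
                return host
        raise RuntimeError("no mirror available")

    return pick


pick = make_picker(MIRRORS)
print([pick(set()) for _ in range(4)])
# ['mirror-a.example.org', 'mirror-b.example.org', 'mirror-c.example.org', 'mirror-a.example.org']
print(pick({"mirror-b.example.org"}))  # mirror-c.example.org (b is skipped)
```

Round-robin spreads requests evenly; a production setup would more likely use DNS, an HTTP redirector, or geographic selection, as the FTP bullet suggests.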

Redundant arrays of inexpensive disks (RAID)
• RAID distributes data across multiple smaller disks; every level except RAID 0 also stores redundant data, offering protection from a crash that could wipe out all data on a single, shared disk.
• Benefits of RAID include the following:
  • increased storage capacity per logical disk volume
  • high data transfer or I/O rates that improve information throughput
  • lower cost per megabyte of storage.

RAID 0 (stripe set or striped volume)
• RAID level 0 splits data evenly across two or more disks (striping) with no parity information for redundancy.
• It is important to note that RAID 0 provides zero data redundancy: the failure of any one disk destroys the array.
• RAID 0 is normally used to increase performance.
• A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk.
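Striping and the smallest-disk capacity rule can be made concrete with two small functions. This sketch assumes a stripe unit of one block, so consecutive logical blocks rotate across the disks; real arrays use larger stripe units, and the function names are hypothetical.

```python
def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


def raid0_capacity(disk_sizes_gb: list[int]) -> int:
    # Each member contributes only as much space as the smallest disk.
    return len(disk_sizes_gb) * min(disk_sizes_gb)


print([raid0_locate(b, 3) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print(raid0_capacity([500, 1000, 1000]))  # 1500, not 2500
```

Because every third block lives on each disk, losing any single disk leaves holes throughout the data, which is why RAID 0 offers no redundancy.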

RAID 1 (mirroring)
• A RAID 1 creates an exact copy of a set of data on two or more disks.
• This is useful when read performance or reliability is more important than data storage capacity.
• Such an array can only be as big as the smallest member disk.
• A classic RAID 1 mirrored pair contains two disks, which increases reliability over a single disk.
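A rough way to see why a mirrored pair is more reliable: if each disk independently fails with probability p over some period, data is lost only when every copy fails. This is a simplified model (it assumes independent failures and ignores the window in which a failed disk is being replaced), and the 3% figure is hypothetical.

```python
def p_data_loss(p_disk: float, copies: int) -> float:
    """Probability that all copies fail, assuming independent disk failures."""
    return p_disk ** copies


p = 0.03  # hypothetical chance that a given disk fails during the period
print(p_data_loss(p, 1))            # 0.03 (single disk)
print(round(p_data_loss(p, 2), 6))  # 0.0009 (two-way mirror)
```

In practice failures are partly correlated (same batch, same power event), so the real improvement is smaller than this idealized figure.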

RAID 3 (parallel access with a dedicated parity disk)
• RAID level 3 uses byte-level striping with a dedicated parity disk.
• Because any single block of data is spread across all members of the set and resides in the same location on each disk, any I/O operation requires activity on every disk.
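Byte-level striping with a dedicated parity disk can be sketched as follows. Parity is the XOR of the bytes in each row, so any single lost disk equals the XOR of the survivors. The function names and the zero padding of the final row are assumptions made for the illustration.

```python
from functools import reduce
from operator import xor


def raid3_write(data: bytes, n_data_disks: int) -> list[bytearray]:
    """Stripe bytes round-robin across the data disks; the last disk holds parity."""
    data = data + bytes((-len(data)) % n_data_disks)  # zero-pad the final row
    disks = [bytearray() for _ in range(n_data_disks + 1)]
    for row in range(0, len(data), n_data_disks):
        chunk = data[row:row + n_data_disks]
        for i, byte in enumerate(chunk):
            disks[i].append(byte)              # one byte per data disk
        disks[-1].append(reduce(xor, chunk))   # dedicated parity disk
    return disks


def raid3_rebuild(disks: list[bytearray], lost: int) -> bytearray:
    """Recreate a failed disk by XOR-ing the matching bytes of every survivor."""
    survivors = [d for i, d in enumerate(disks) if i != lost]
    return bytearray(reduce(xor, column) for column in zip(*survivors))


disks = raid3_write(b"RAID3!", 3)
print([bytes(d) for d in disks[:3]])        # [b'RD', b'A3', b'I!']
print(raid3_rebuild(disks, 1) == disks[1])  # True
```

Reading even one row back touches every data disk, which is the "activity on every disk" noted above.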

RAID 5 (independent access with distributed parity)
• A RAID 5 uses block-level striping with parity data distributed across all member disks.
• A minimum of three disks is generally required for a complete RAID 5 configuration.
• In the example layout, a read request for block A1 would be serviced by disk 0.
• A simultaneous read request for block B1 would have to wait, because B1 also resides on disk 0, but a read request for B2 could be serviced concurrently by disk 1.
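The A1/B1/B2 argument can be checked by generating a distributed-parity layout. This sketch assumes a left-asymmetric rotation (parity starts on the last disk and moves one disk left per stripe, with data blocks filling the remaining disks in order); this is one common layout, and real controllers may rotate parity differently. The function names are hypothetical.

```python
def raid5_stripe(stripe: int, n_disks: int) -> list[str]:
    """Label each disk's block in one stripe, e.g. ['B1', 'B2', 'Bp', 'B3']."""
    name = chr(ord("A") + stripe)                   # stripe 0 -> 'A', 1 -> 'B', ...
    parity = (n_disks - 1) - (stripe % n_disks)     # parity moves left each stripe
    labels, k = [], 1
    for disk in range(n_disks):
        if disk == parity:
            labels.append(name + "p")
        else:
            labels.append(f"{name}{k}")
            k += 1
    return labels


def disk_of(block: str, n_disks: int) -> int:
    """Return the disk that holds a data block such as 'B2'."""
    stripe = ord(block[0]) - ord("A")
    return raid5_stripe(stripe, n_disks).index(block)


for s in range(4):
    print(raid5_stripe(s, 4))
# ['A1', 'A2', 'A3', 'Ap']
# ['B1', 'B2', 'Bp', 'B3']
# ['C1', 'Cp', 'C2', 'C3']
# ['Dp', 'D1', 'D2', 'D3']
print(disk_of("A1", 4), disk_of("B1", 4), disk_of("B2", 4))  # 0 0 1
```

A1 and B1 share disk 0, so their reads serialize; B2 sits on disk 1 and can be read in parallel. Spreading parity this way also avoids the single parity-disk bottleneck of RAID 3.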

Nested RAID

Storage Model

Storage Area Network
• The Storage Networking Industry Association (SNIA) defines the SAN as a network whose primary purpose is the transfer of data between computer systems and storage elements.
• A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust.

SAN definition
• A SAN is a specialized, high-speed network attaching servers and storage devices.
• It is sometimes referred to as "the network behind the servers."
• A SAN introduces the flexibility of networking to enable one server or many heterogeneous servers to share a common storage utility, which may comprise many storage devices, including disk, tape, and optical storage.

SAN components
• SAN connectivity
  • The connectivity of storage and server components, typically using Fibre Channel (FC)
• SAN storage
  • Tape / RAID / ESS (Enterprise Storage Server) / JBOD (Just a Bunch Of Disks) / SSA (Serial Storage Architecture)
• SAN servers
  • Windows / UNIX / Linux, etc.

Switched Fabric
• An infrastructure specially designed to handle storage communications is called a fabric.
• A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.
• Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution; these bring substantial scalability benefits to the SAN architecture by allowing data to cross between different fabrics without merging them.

Fibre Channel protocol
• Fibre Channel is a layered protocol. It consists of five layers:
  • FC-0: the physical layer, which includes cables, fiber optics, connectors, pinouts, etc.
  • FC-1: the data link layer, which implements the 8b/10b encoding and decoding of signals.
  • FC-2: the network layer, defined by the FC-FS (Framing and Signaling) standards, consists of the core of Fibre Channel and defines the main protocols.
  • FC-3: the common services layer, a thin layer that could eventually implement functions like encryption or RAID.
  • FC-4: the protocol mapping layer, in which other protocols, such as SCSI, are encapsulated into an information unit for delivery to FC-2.
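The cost of FC-1's 8b/10b coding is easy to quantify. Each 8-bit byte is sent as a 10-bit transmission character, which keeps the signal DC-balanced and gives the receiver enough transitions for clock recovery, but uses 20% of the line rate. The sketch below applies this to 1 Gigabit Fibre Channel, which signals at 1.0625 Gbaud; the function name is hypothetical.

```python
def payload_bit_rate(line_rate_baud: int) -> int:
    """Data bits per second left after 8b/10b coding (8 data bits per 10 line bits)."""
    return line_rate_baud * 8 // 10


one_gfc = 1_062_500_000  # 1 Gigabit Fibre Channel line rate, in baud
bits = payload_bit_rate(one_gfc)
print(bits, bits / 8 / 1e6)  # 850000000 106.25  (MB/s, before frame overhead)
```

Frame headers and inter-frame gaps reduce this further, to roughly the 100 MB/s usually quoted for 1GFC.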

IP Storage Networking
• FCIP (Fibre Channel over IP)
  • A method for tunneling Fibre Channel information through an IP network.
• iFCP (Internet Fibre Channel Protocol)
  • A mechanism for transmitting data to and from Fibre Channel storage devices in a SAN, or on the Internet, using TCP/IP.
• Internet SCSI (iSCSI)
  • A transport protocol that carries SCSI commands from an initiator to a target.

FCIP (Fibre Channel over IP)
• FCIP encapsulates FC frames within TCP/IP, allowing islands of FC SANs to be interconnected over an IP-based network.
• TCP/IP is used as the underlying transport to provide congestion control and in-order delivery of FC frames.
• All classes of FC frames are treated the same, as datagrams.
• End-station addressing, address resolution, message routing, and other elements of the FC network architecture remain unchanged.
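Because TCP delivers a byte stream rather than discrete messages, a tunnel endpoint must delimit each FC frame it carries. The sketch below uses a plain 4-byte length prefix as a simplified stand-in; it is an assumption for illustration and not the actual FCIP encapsulation header defined in RFC 3821. The function names are hypothetical.

```python
import struct


def encapsulate(frames: list[bytes]) -> bytes:
    """Prefix each FC frame with a 4-byte big-endian length and concatenate them."""
    return b"".join(struct.pack("!I", len(f)) + f for f in frames)


def decapsulate(stream: bytes) -> list[bytes]:
    """Split a received TCP byte stream back into the original frames, in order."""
    frames, pos = [], 0
    while pos < len(stream):
        (length,) = struct.unpack_from("!I", stream, pos)
        pos += 4
        frames.append(stream[pos:pos + length])
        pos += length
    return frames


frames = [b"FC-frame-1", b"FC-frame-2"]
print(decapsulate(encapsulate(frames)) == frames)  # True
```

The frames themselves pass through untouched, mirroring the point that FC addressing and routing remain unchanged; TCP supplies the ordering and congestion control.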

iFCP
• iFCP is a gateway-to-gateway protocol for implementing a Fibre Channel fabric over a TCP/IP network.
• Traffic between Fibre Channel devices is routed and switched by the TCP/IP network.
• The iFCP layer maps Fibre Channel frames to a predetermined TCP connection for transport.
• FC messaging and routing services are terminated at the gateways, so the fabrics are not merged with one another.
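The "predetermined TCP connection" can be pictured as a lookup table inside the gateway. This simplified model assumes one connection per pair of communicating devices (a local port and a remote port) and merely numbers connections instead of opening real sockets; the class name and the port IDs are hypothetical.

```python
from itertools import count


class IfcpGatewaySketch:
    """Toy iFCP gateway table: one TCP connection per (local, remote) device pair."""

    def __init__(self) -> None:
        self.sessions: dict[tuple[int, int], int] = {}
        self._conn_ids = count(1)  # stands in for opening real TCP connections

    def connection_for(self, local_port: int, remote_port: int) -> int:
        # Reuse the connection already mapped to this device pair;
        # otherwise assign a new one the first time the pair communicates.
        key = (local_port, remote_port)
        if key not in self.sessions:
            self.sessions[key] = next(self._conn_ids)
        return self.sessions[key]


gw = IfcpGatewaySketch()
print(gw.connection_for(0x010100, 0x020200))  # 1
print(gw.connection_for(0x010100, 0x020300))  # 2 (new remote device, new connection)
print(gw.connection_for(0x010100, 0x020200))  # 1 (same pair, same connection)
```

Because each gateway only maps device pairs onto IP connections, the FC fabrics on either side never merge.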

iSCSI
• iSCSI is a SCSI transport protocol for mapping block-oriented storage data over TCP/IP networks.
• The iSCSI protocol enables universal access to storage devices and Storage Area Networks (SANs) over standard TCP/IP networks.
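To make "carries SCSI commands" concrete, the sketch below builds the 10-byte command descriptor block (CDB) for a SCSI READ(10): opcode 0x28, a flags byte, a 4-byte logical block address, a group-number byte, a 2-byte transfer length in blocks, and a control byte. In iSCSI the initiator places such a CDB in a SCSI Command PDU sent over TCP to the target; the PDU framing itself is not shown, and the function name is hypothetical.

```python
import struct


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a SCSI READ(10) CDB (all flag, group, and control fields left at zero)."""
    # > : big-endian, no padding; B = 1 byte, I = 4 bytes, H = 2 bytes.
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


cdb = read10_cdb(lba=2048, blocks=8)
print(len(cdb), cdb.hex())  # 10 28000000080000000800
```

The target reads 8 blocks starting at block 2048 and returns them to the initiator, so the storage appears as a local block device.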

Backup site
• A backup site is a location where a business can easily relocate following a disaster, such as a fire, flood, or terrorist threat. It is an integral part of a business's disaster recovery plan.
• A backup site can be another location operated by the business, or one contracted from a company that specializes in disaster recovery services.
• In some cases, a business will have an agreement with a second business to operate a joint disaster recovery facility.

Cold Sites
• A cold site is the least expensive type of backup site for a business to operate.
• It provides office space in which to operate.
• It includes neither backed-up copies of data and information from the business's original location nor hardware that is already set up.
• The lack of hardware keeps the cold site's startup costs minimal, but means additional time is needed after the disaster to get operations running at a capacity close to that before the disaster.

Warm Sites
• A warm site is a location, already stocked with computer hardware similar to that of the original site, to which the business can relocate after a disaster; however, it does not contain backed-up copies of data and information.

Hot Sites
• A hot site is a duplicate of the business's original site, with full computer systems as well as near-complete backups of user data.
• Ideally, a hot site will be up and running within a matter of hours. This type of backup site is the most expensive to operate.
• Hot sites are popular with stock exchanges and other financial institutions that may need to evacuate due to potential bomb threats and must resume normal operations as soon as possible.

How to choose
• The choice of site type is mainly decided by a company's cost vs. benefit strategy.
• Hot sites are traditionally more expensive than cold sites, since much of the equipment the company needs has already been purchased, so operational costs are higher.
• However, if the company loses a substantial amount of revenue for each day it is inactive, a hot site may be worth the cost.

• The advantage of a cold site is simple: cost. Operating a cold site requires far fewer resources because no equipment is bought before the disaster.
• The downside of a cold site is the potential cost that must be incurred to make it effective after a disaster.
• Purchasing equipment on very short notice may cost more, and the disaster itself may make the equipment difficult to obtain.
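The cost vs. benefit trade-off can be made explicit by adding each option's yearly running cost to its expected outage losses. All figures below (site costs, days to resume, revenue lost per day, and a 10% yearly chance of a disaster) are hypothetical, and the function name is invented for the illustration.

```python
def yearly_cost(site_cost_per_year: float, days_down: float,
                loss_per_day: float, disasters_per_year: float) -> float:
    """Running cost of a site plus the expected revenue lost to outages."""
    return site_cost_per_year + disasters_per_year * days_down * loss_per_day


# Hypothetical figures: (yearly site cost, days until operations resume).
options = {"hot site": (500_000, 0.25), "cold site": (50_000, 10)}

for loss_per_day in (200_000, 1_000_000):
    costs = {name: round(yearly_cost(cost, days, loss_per_day, 0.1))
             for name, (cost, days) in options.items()}
    print(loss_per_day, costs)
# 200000 {'hot site': 505000, 'cold site': 250000}
# 1000000 {'hot site': 525000, 'cold site': 1050000}
```

With these numbers the cold site wins for the smaller business, while the hot site pays for itself once each idle day costs enough, which is the argument made above.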

Discovery Planning Steps (1)
• I. Information gathering
• Step One: Organize the project
  • Appoint a coordinator/project leader, if the leader is not the dean or chairperson.
  • Determine the most appropriate plan organization for the unit (e.g., a single plan at the college level or individual plans at the unit level).
  • Set the project timetable.
  • Draft the project plan, including assignment of task responsibilities.

Discovery Planning Steps (2)
• Step Two: Conduct a business impact analysis. To complete it, most units will perform the following steps:
  • Identify functions, processes and systems
  • Interview information systems support personnel
  • Interview business unit personnel
  • Analyze results to determine critical systems, applications and business processes
  • Prepare an impact analysis of interruption on critical systems
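The "analyze results" step often ends in a ranked list of critical functions, much like the manufacturing example that recovers LANs in order of business need. The sketch below assumes the interviews produced an estimated loss per hour of outage for each function; the function names, figures, and threshold are all hypothetical.

```python
# Hypothetical interview results: estimated revenue lost per hour of outage.
impact_per_hour = {
    "payroll": 2_000,
    "order processing": 15_000,
    "email": 5_000,
    "intranet wiki": 200,
}


def recovery_priority(impact: dict[str, int], critical_threshold: int) -> list[str]:
    """Return the critical functions, costliest interruption first."""
    ranked = sorted(impact, key=impact.get, reverse=True)
    return [name for name in ranked if impact[name] >= critical_threshold]


print(recovery_priority(impact_per_hour, critical_threshold=1_000))
# ['order processing', 'email', 'payroll']
```

A real analysis would also weigh non-financial impacts (legal, safety, reputation), which a single number per hour does not capture.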

Discovery Planning Steps (3)
• Step Three: Conduct a risk assessment
  • The risk assessment helps determine the probability of a critical system becoming severely disrupted and documents the acceptability of these risks to a unit.
  • Review physical security (e.g., secure office, building access after hours)
  • Review backup systems
  • Review data security

Discovery Planning Steps (3, continued)
• Review policies on personnel termination and transfer
• Identify systems supporting mission-critical functions
• Identify vulnerabilities (such as flood, tornado, physical attacks, etc.)
• Assess the probability of system failure or disruption
• Prepare a risk and security analysis
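One common quantitative way to combine probability and impact (not one prescribed by these slides) is annualized loss expectancy: single loss expectancy SLE = asset value × exposure factor, and ALE = SLE × annualized rate of occurrence (ARO). The threats and figures below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    asset_value: float      # value of what is at risk
    exposure_factor: float  # fraction of the asset lost per incident (0..1)
    aro: float              # annualized rate of occurrence (incidents per year)

    def sle(self) -> float:
        # Single loss expectancy: expected loss from one incident.
        return self.asset_value * self.exposure_factor

    def ale(self) -> float:
        # Annualized loss expectancy: SLE times incidents per year.
        return self.sle() * self.aro


threats = [
    Threat("flood", 400_000, 0.5, 0.02),
    Threat("computer virus", 400_000, 0.1, 0.5),
]
for t in sorted(threats, key=lambda t: t.ale(), reverse=True):
    print(f"{t.name}: ALE = {t.ale():,.0f}")
# computer virus: ALE = 20,000
# flood: ALE = 4,000
```

Ranking by ALE shows that frequent, smaller incidents can outweigh rare, severe ones, which helps decide where prevention spending goes first.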
