
CIT 470: Advanced Network and System Administration


gdelarosa

Presentation Transcript


  1. CIT 470: Advanced Network and System Administration: Disaster Recovery

  2. Topics
      • Planning
      • Disasters
      • Mitigation

  3. What is a Disaster Recovery Plan?
      • Considers potential disasters.
      • Describes how to mitigate potential disasters.
      • Makes preparations to enable quick restoration of services.
      • Identifies key services, how quickly each must be restored, and in what order.
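The last bullet implies an explicit ranking of services. A minimal sketch, assuming each service is given a maximum tolerable outage in hours (the `Service` class, its field names, and the example services and hours are all hypothetical), restores the service with the tightest deadline first:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    max_outage_hours: float  # hypothetical: how long the business can tolerate this outage


def restore_order(services: list[Service]) -> list[str]:
    """Return service names from most to least urgent.

    Ties are broken by name so the written plan is deterministic.
    """
    ranked = sorted(services, key=lambda s: (s.max_outage_hours, s.name))
    return [s.name for s in ranked]


if __name__ == "__main__":
    # Hypothetical inventory; real deadlines come from the business, not the sysadmin.
    plan = [
        Service("intranet wiki", 72),
        Service("email", 4),
        Service("customer web site", 1),
        Service("file server", 8),
    ]
    for step, name in enumerate(restore_order(plan), start=1):
        print(f"{step}. {name}")
```

Sorting by deadline alone ignores dependencies between services (a web site that needs its database first, for example); a real plan would have to account for those as well.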

  4. Disaster Recovery Plans
      • Define (un)acceptable loss. How much could you lose in a disaster?
      • Back up everything. Back up data, metadata, and instructions on how to restore your systems.
      • Organize everything. Can you find the backup tapes you need when disaster strikes?

  5. Disaster Recovery Plans
      • Protect against disasters: natural disasters and many more.
      • Document what you have done. The plan must be detailed enough for people to follow during a disaster without additional information.
      • Test, test, test. A disaster recovery plan that has not been tested is not a plan; it's a proposal.

  6. Define Loss
      • Loss of service
        - How much employee productivity is lost?
        - How much customer revenue is lost?
      • Loss of data
        - Irreplaceable data: medical image records, stock purchases.
        - Re-creatable data, but at what cost? Code for a software product, simulation results.

  7. Back Up Everything
      • On each system, back up:
        - Project directories
        - Home directories
        - System files (fstab, kernel, passwd, LVM config, MBR)
      • By type of system:
        - Laptops: plug into the network and back up to the backup server on command.
        - Desktops: store everything on network disks.
        - Servers: permanent connection to the backup system.
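As a rough illustration of collecting a system's files, the sketch below bundles a list of paths into a gzip-compressed tar archive with Python's standard `tarfile` module and returns any paths it could not find, so a missing file is noticed at backup time rather than during a restore. The function name `archive_paths` is hypothetical. Block-level items such as the MBR are not ordinary files and are not covered by this sketch.

```python
import tarfile
from pathlib import Path


def archive_paths(paths: list[str], dest: str) -> list[str]:
    """Write every existing path in `paths` into a gzip-compressed tar at `dest`.

    Directories are added recursively; tarfile strips the leading "/" from
    member names. Returns the paths that did not exist, so the caller can
    flag them instead of silently skipping them.
    """
    missing = []
    with tarfile.open(dest, "w:gz") as tar:
        for p in paths:
            if Path(p).exists():
                tar.add(p)  # recursive for directories
            else:
                missing.append(p)
    return missing
```

For example, `archive_paths(["/etc/fstab", "/etc/passwd", "/etc/lvm"], "/tmp/system-files.tar.gz")` would archive those files if they exist and are readable by the user running it; files readable only by root need the script to run as root.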

  8. Organize Everything
      • What resources do you back up? On what schedule?
      • Media organization
        - Bar code labels on each tape.
        - Tapes stored securely at proper temperature and humidity.
      • Media database
        - Maps servers/drives to tapes and their locations.
        - Indicates whether tapes are on-site or off-site.
        - Must itself be backed up, with a human-readable label.
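The media database described above is essentially an index from what was backed up to where the tape is now. A minimal in-memory sketch follows; the class names, fields, and tab-separated export are assumptions for illustration, not any particular backup product's schema. The `to_text` dump reflects the point that the database itself must be backed up in a human-readable form.

```python
from dataclasses import dataclass


@dataclass
class TapeRecord:
    barcode: str   # matches the bar code label on the physical tape
    server: str
    drive: str     # filesystem or device that was backed up
    location: str  # e.g. a vault shelf or an off-site storage company
    off_site: bool


class MediaDatabase:
    """Hypothetical index mapping (server, drive) to the tapes that hold it."""

    def __init__(self) -> None:
        self._tapes: dict[str, TapeRecord] = {}

    def add(self, rec: TapeRecord) -> None:
        self._tapes[rec.barcode] = rec

    def tapes_for(self, server: str, drive: str) -> list[TapeRecord]:
        return [t for t in self._tapes.values() if t.server == server and t.drive == drive]

    def move(self, barcode: str, location: str, off_site: bool) -> None:
        """Record that a tape has been moved, e.g. shipped to off-site storage."""
        rec = self._tapes[barcode]
        rec.location = location
        rec.off_site = off_site

    def to_text(self) -> str:
        """Plain-text dump suitable for printing or writing to its own labeled tape."""
        lines = ["barcode\tserver\tdrive\tlocation\toff-site"]
        for t in sorted(self._tapes.values(), key=lambda r: r.barcode):
            flag = "yes" if t.off_site else "no"
            lines.append(f"{t.barcode}\t{t.server}\t{t.drive}\t{t.location}\t{flag}")
        return "\n".join(lines)
```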

  9. Protect against Disasters
      • On-site vaults.
      • Off-site storage.
      • Test your media regularly.
      • Store documentation securely too.

  10. Document
      • Store documentation in a portable format.
      • Ensure documentation is accessible during a disaster.
      • Keep paper copies both on-site and off-site.

  11. Test
      • Can other people understand your procedures?
      • Sample test tapes on a regular (weekly) basis.
      • Attempt a full system recovery twice a year.
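Sampling a tape only proves something if the restored files are checked against what was backed up. One way to do that, assuming a checksum manifest was recorded at backup time (the slides do not describe one), is to hash the restored sample and compare. The function names and the manifest format below are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(manifest: dict[str, str], restore_root: Path) -> list[str]:
    """Compare restored files with checksums recorded at backup time.

    `manifest` maps a path relative to `restore_root` to its expected SHA-256
    hex digest. Returns a list of problems; an empty list means the sample
    restored cleanly.
    """
    problems = []
    for rel, expected in sorted(manifest.items()):
        restored = restore_root / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(restored) != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

A checksum match shows the bytes came back intact; it does not show that the restored system boots or that an application can read its data, which is what the full recovery exercise covers.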

  12. What is a Disaster?
      • A catastrophic event that causes loss of data and/or service.
      • Human disasters: errors or intentional acts, e.g. a typo, a backhoe, or hacker tools.
      • Natural disasters
        - Small scale: hardware or power failure.
        - Large scale: hurricane, earthquake, fire.

  13. Types of Disasters
      • User errors: accidental file deletion or overwrite. Very common; snapshots can automate recovery.
      • Sysadmin errors: accidental mass file destruction. Regular backups will prevent loss.
      • Drive failure: a single disk failure. RAID can prevent loss.
      • System failure: loss of an entire system. RAID won't help; you need backups.

  14. Types of Disasters
      • Power/network failure: need a UPS/generator or a redundant network connection.
      • Software failure: software corrupts its own or other applications' data stores. Need regular and perhaps historical backups.
      • Security breach: an attacker or worm destroys or corrupts data. Need long-term historical backups.
      • Natural disaster: potential loss of the entire data center, including backups.
        - Need off-site backups to restore data.
        - Need an off-site (virtual) data center to restore service.
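Why software failures and security breaches call for historical backups: corruption is often noticed days after it begins, so the useful restore point is the newest backup taken before the damage started, not the newest backup overall. A minimal sketch, assuming the backup timestamps and the approximate start of the corruption are known (the function name is hypothetical):

```python
from bisect import bisect_left
from datetime import datetime
from typing import Optional


def last_clean_backup(backup_times: list[datetime], corruption_start: datetime) -> Optional[datetime]:
    """Return the newest backup taken strictly before the corruption began.

    Returns None if every retained backup may already be contaminated, which
    means retention was shorter than the time it took to notice the damage.
    """
    times = sorted(backup_times)
    i = bisect_left(times, corruption_start)  # index of first backup >= corruption_start
    return times[i - 1] if i > 0 else None
```

For instance, with only 14 daily backups retained, damage that began 20 days before it was discovered leaves no clean copy at all; that gap is what long-term historical backups are meant to close.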

  15. Risk Analysis
      • Evaluate the risk cost of a disaster: expected cost = cost × probability.
      • This determines the budget for disaster mitigation.
      • Example: power failure
        - 70% chance per year
        - Average downtime: 4 hours
        - Average web site revenue: $1000/hour
        - Budget = 4 hr × $1000/hr × 0.7/yr = $2800/yr
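The budget rule above is a single multiplication per disaster type, so it is easy to script when several risks are being compared. A minimal sketch (the function name and parameter names are hypothetical); called with the slide's power-failure numbers it reproduces the $2800/yr budget:

```python
def annual_risk_cost(probability_per_year: float, downtime_hours: float, cost_per_hour: float) -> float:
    """Expected yearly loss for one disaster type.

    Cost of one occurrence (downtime x hourly cost) times its yearly probability.
    """
    if probability_per_year < 0:
        raise ValueError("probability must be non-negative")
    return downtime_hours * cost_per_hour * probability_per_year


if __name__ == "__main__":
    # Power-failure example from the slide: 70%/yr, 4 hours down, $1000/hour.
    print(f"${annual_risk_cost(0.7, 4, 1000):,.0f}/yr")
```

Read as a budget, a mitigation (for example a UPS) is economically justified if its yearly cost is below the expected loss it eliminates.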

  16. Disaster Mitigation
      • Power failures: UPS, generator.
      • System failures: redundant CPUs, ECC RAM, NICs, and power supplies; clusters of servers.
      • Network failures: multiple Internet connections from different ISPs.

  17. Disaster Mitigation
      • Drive failures: RAID, backups.
      • Accidental deletion: snapshots, backups.
      • Security incidents: backups.

  18. Redundant Site
      • Redundant site at a different location
        - Location far enough away to be unaffected by whatever disaster took down the primary site.
        - Automatic or manual switchover.
        - DNS names with short expiration times (TTLs).
      • Cheaper solution: use an existing second site
        - Duplicate critical services at both data centers.
        - Rebuild less critical servers at the second site.
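Short DNS expiration times matter because a resolver that cached the old record may keep sending clients to the failed site until that record expires. A rough upper bound on the switchover delay, assuming resolvers honor the TTL (not all do) and that the detection and update times are known, is just a sum; the function name and all values are hypothetical:

```python
def worst_case_redirect_seconds(detect_s: float, update_s: float, ttl_s: float) -> float:
    """Upper bound on time until every TTL-respecting client uses the new site.

    detect_s: time to notice the primary site is down (monitoring or a human)
    update_s: time to change the DNS record once the decision is made
    ttl_s:    TTL on the record; a resolver that cached it just before the
              change may keep the old answer for up to this long
    """
    return detect_s + update_s + ttl_s


if __name__ == "__main__":
    # Hypothetical: 5 minutes to detect, 1 minute to update the record.
    for ttl in (86400, 300):
        print(ttl, worst_case_redirect_seconds(300, 60, ttl))
```

With a one-day TTL the TTL term dominates (86,760 seconds, about 24 hours); with a 5-minute TTL the same failover completes in 660 seconds, which is why the record for a failover name is kept short-lived.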

  19. References
      • Aeleen Frisch, Essential System Administration, 3rd edition, O'Reilly, 2002.
      • Evi Nemeth et al., UNIX System Administration Handbook, 3rd edition, Prentice Hall, 2001.
      • Thomas A. Limoncelli and Christine Hogan, The Practice of System and Network Administration, 2nd edition, Addison-Wesley, 2007.
      • W. Curtis Preston, UNIX Backup & Recovery, O'Reilly, 1999.
